How we compute creator sentiment, and why we trust it.
We're upfront about what we measure, how we measure it, and what we don't (yet) measure. This page is the canonical answer for procurement, due-diligence, and DDQ questions.
The pipeline
We discover cruise YouTube creators, extract their full transcripts, and run an LLM pipeline that produces 50+ deterministic structured fields per video (creator profile, sailing context, sponsorship flags, verified quotes, operational issues, price mentions, recommendations), plus 30+ correlated cruise topic sentiment scores per entity. Every extraction is fact-checked against the transcript, hallucination-scored, and schema-validated. Aggregations roll up to channel-level expertise (Layer 2) and entity-level consensus (Layer 3) with the controls below.
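For concreteness, here is a minimal Python sketch of the kind of per-video record that stage emits and the schema gate it must pass. The field names and the hallucination cutoff are illustrative placeholders, not our production schema.

```python
from dataclasses import dataclass, field

@dataclass
class VideoExtraction:
    """Illustrative subset of the 50+ structured fields extracted per video."""
    video_id: str
    channel_id: str
    is_sponsored: bool
    topic_sentiments: dict[str, float] = field(default_factory=dict)  # topic -> score in [-1, 1]
    hallucination_score: float = 0.0  # 0.0 = fully grounded in the transcript

def passes_schema_gate(rec: VideoExtraction, max_hallucination: float = 0.1) -> bool:
    """Reject records whose sentiment scores are out of range or whose
    extraction drifted too far from the transcript.
    The 0.1 cutoff is a placeholder, not our production threshold."""
    scores_in_range = all(-1.0 <= s <= 1.0 for s in rec.topic_sentiments.values())
    return scores_in_range and rec.hallucination_score <= max_hallucination
```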
The controls
1. Expertise-weighted sentiment
Every channel is scored on an expertise axis derived from four input signals: cruise-content concentration, entity specialization, longevity in the cruise space, and audience validation. Each video's contribution to aggregate scores is weighted by its channel's expertise - so a dedicated cruise-specialist channel with years of brand-specific coverage carries more signal than a one-off general travel vlogger reviewing the same ship. Generic social-listening platforms provide reach metrics as a separate dimension that users filter manually; we build authority weighting into the aggregate by default.
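As a sketch of how those four signals could fold into a single weight - the coefficients, the input normalization, and the five-year saturation point below are placeholders, not our production model:

```python
def expertise_score(cruise_concentration: float,
                    entity_specialization: float,
                    years_in_cruise: float,
                    audience_validation: float) -> float:
    """Fold the four input signals into one expertise weight in [0, 1].
    Inputs other than years are assumed pre-normalized to [0, 1];
    the coefficients and 5-year saturation point are placeholders."""
    longevity = min(years_in_cruise / 5.0, 1.0)
    return (0.35 * cruise_concentration
            + 0.25 * entity_specialization
            + 0.20 * longevity
            + 0.20 * audience_validation)

def expertise_weighted_mean(videos: list[tuple[float, float]]) -> float:
    """Aggregate (sentiment, channel_expertise) pairs by weighted mean."""
    total = sum(w for _, w in videos)
    return sum(s * w for s, w in videos) / total if total else 0.0
```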
2. Credibility discounting
We programmatically detect sponsorships, comped sailings, press trips, and affiliate relationships, and downweight sentiment accordingly. Sponsored content receives the steepest discount; comped sailings and press trips are downweighted to a lesser degree. Standard social-listening platforms do not do this, and it's the difference between consensus scores you can trust and ones inflated by paid promotion.
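A minimal sketch of the discounting step; the multipliers are placeholders chosen only to show the ordering (sponsored discounted hardest), not our production values:

```python
# Placeholder multipliers, ordered to match the policy above:
# sponsored content is discounted hardest, comped sailings and
# press trips less so. Not our production values.
CREDIBILITY_DISCOUNT = {
    "sponsored": 0.3,
    "comped": 0.7,
    "press_trip": 0.7,
    "affiliate": 0.8,
    "organic": 1.0,
}

def effective_weight(channel_expertise: float, disclosure: str) -> float:
    """A video's final contribution = expertise weight x credibility discount."""
    return channel_expertise * CREDIBILITY_DISCOUNT.get(disclosure, 1.0)
```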
3. Per-channel contribution cap
No single creator's videos can dominate any aggregate score, no matter how prolific they are. A hard cap on per-channel contribution prevents a single dominant voice from masquerading as consensus.
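A single-pass sketch of the cap, assuming a hypothetical 20% maximum share per channel; the production cap value differs, and a strict implementation would iterate until shares stabilize:

```python
from collections import defaultdict

def cap_channel_contributions(weights: list[tuple[str, float]],
                              max_share: float = 0.2) -> list[tuple[str, float]]:
    """Scale down any channel whose summed weight exceeds max_share of the
    total pool. Single-pass approximation: a strict cap would re-check
    shares after rescaling. max_share=0.2 is illustrative."""
    total = sum(w for _, w in weights)
    per_channel: dict[str, float] = defaultdict(float)
    for channel, w in weights:
        per_channel[channel] += w
    scale = {ch: min(1.0, max_share * total / s) if s > 0 else 1.0
             for ch, s in per_channel.items()}
    return [(ch, w * scale[ch]) for ch, w in weights]
```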
4. Confidence scoring
Each rollup carries a confidence score combining three input axes: video volume (saturating - additional volume beyond a threshold yields diminishing returns), channel diversity, and expert-tier coverage. Reports that fall below an entity's confidence threshold are suppressed rather than shown with weak data.
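A sketch of how three axes like these can combine - the saturating exponential, the geometric mean, and every constant here are illustrative choices, not our production formula:

```python
import math

def confidence(n_videos: int, n_channels: int, expert_share: float) -> float:
    """Confidence in [0, 1] from three axes; all constants are illustrative.
    - volume saturates: past ~10 videos, each extra video adds little
    - diversity rewards distinct contributing channels
    - expert_share is the fraction of weight from expert-tier channels"""
    volume = 1.0 - math.exp(-n_videos / 10.0)
    diversity = 1.0 - 1.0 / (1 + n_channels)
    return (volume * diversity * expert_share) ** (1.0 / 3.0)  # geometric mean

def publishable(conf: float, threshold: float = 0.5) -> bool:
    """Below the (placeholder) threshold, the report is suppressed."""
    return conf >= threshold
```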
5. Coverage-depth tagging
For every entity mention in every video we tag the depth of coverage: whether the entity is the primary focus of the video, a meaningful portion of it, or a passing mention. Aggregate scores draw from deeper coverage only - passing mentions don't dilute the signal.
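A minimal sketch of the depth filter, with illustrative tier names:

```python
from enum import Enum

class CoverageDepth(Enum):
    PRIMARY = "primary"          # the entity is the main subject of the video
    SUBSTANTIAL = "substantial"  # a meaningful portion of the video
    PASSING = "passing"          # a brief mention only

def aggregate_inputs(mentions: list[tuple[float, CoverageDepth]]) -> list[float]:
    """Keep sentiment from primary and substantial coverage;
    passing mentions never enter the aggregate."""
    return [score for score, depth in mentions if depth is not CoverageDepth.PASSING]
```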
6. Audit traceability
Every output value resolves to its source. Each quote is anchored to a transcript offset; each aggregate cites its contributing videos and channels. Sample any aggregate, see the underlying creator videos, and verify the extracted claims against the source content. A human-evaluation accuracy benchmark is on our roadmap; until it is published, audit traceability is the strongest assurance we offer, and we're upfront about that in every DDQ.
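As a sketch of what quote-level traceability looks like as data - the record shape is illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuoteProvenance:
    """Ties one extracted quote to its exact source span."""
    video_id: str
    channel_id: str
    transcript_offset: int  # character offset of the quote in the transcript
    quote: str

def verify_quote(p: QuoteProvenance, transcript: str) -> bool:
    """Audit check: the quote must appear verbatim at the recorded offset."""
    return transcript[p.transcript_offset:p.transcript_offset + len(p.quote)] == p.quote
```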
Full consumer methodology and visual examples: tripbacon.com/methodology.
DDQ pack & sample data
For institutional buyers we provide a written DDQ response, a data dictionary, sample-size and confidence-distribution disclosures, and our roadmap for human-evaluation accuracy benchmarking. Request below.