How we compute creator sentiment, and why we trust it.
We're upfront about what we measure, how we measure it, and what we don't (yet) measure. This page is the canonical answer for procurement, due-diligence, and DDQ questions.
The pipeline
We discover cruise YouTube creators, extract their full transcripts, and run an LLM pipeline that produces 50+ deterministic structured fields per video (creator profile, sailing context, sponsorship flags, verified quotes, operational issues, price mentions, recommendations), plus 30+ correlated cruise topic sentiment scores per entity. Every extraction is fact-checked against the transcript, hallucination-scored, and schema-validated. Aggregations roll up to channel-level expertise (Layer 2) and entity-level consensus (Layer 3) with the controls below.
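For concreteness, here is a minimal Python sketch of the kind of per-video record that stage emits and the schema gate it must pass. The field names and the hallucination cutoff are illustrative placeholders, not our production schema.

```python
from dataclasses import dataclass, field

@dataclass
class VideoExtraction:
    """Illustrative subset of the 50+ structured fields extracted per video."""
    video_id: str
    channel_id: str
    is_sponsored: bool
    topic_sentiments: dict[str, float] = field(default_factory=dict)  # topic -> score in [-1, 1]
    hallucination_score: float = 0.0  # 0.0 = fully grounded in the transcript

def passes_schema_gate(rec: VideoExtraction, max_hallucination: float = 0.1) -> bool:
    """Reject records whose sentiment scores are out of range or whose
    extraction drifted too far from the transcript.
    The 0.1 cutoff is a placeholder, not our production threshold."""
    scores_in_range = all(-1.0 <= s <= 1.0 for s in rec.topic_sentiments.values())
    return scores_in_range and rec.hallucination_score <= max_hallucination
```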
The controls
1. Expertise-weighted sentiment
Every channel is scored on an expertise axis derived from four input signals: cruise-content concentration, entity specialization, longevity in the cruise space, and audience validation. Each video's contribution to aggregate scores is weighted by its channel's expertise - so a dedicated cruise-specialist channel with years of brand-specific coverage carries more signal than a one-off general travel vlogger reviewing the same ship. Generic social-listening platforms provide reach metrics as a separate dimension that users filter manually; we build authority weighting into the aggregate by default.
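As a sketch of how those four signals could fold into a single weight - the coefficients, the input normalization, and the five-year saturation point below are placeholders, not our production model:

```python
def expertise_score(cruise_concentration: float,
                    entity_specialization: float,
                    years_in_cruise: float,
                    audience_validation: float) -> float:
    """Fold the four input signals into one expertise weight in [0, 1].
    Inputs other than years are assumed pre-normalized to [0, 1];
    the coefficients and 5-year saturation point are placeholders."""
    longevity = min(years_in_cruise / 5.0, 1.0)
    return (0.35 * cruise_concentration
            + 0.25 * entity_specialization
            + 0.20 * longevity
            + 0.20 * audience_validation)

def expertise_weighted_mean(videos: list[tuple[float, float]]) -> float:
    """Aggregate (sentiment, channel_expertise) pairs by weighted mean."""
    total = sum(w for _, w in videos)
    return sum(s * w for s, w in videos) / total if total else 0.0
```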
2. Credibility discounting
We programmatically detect sponsorships, comped sailings, press trips, and affiliate relationships, and downweight sentiment accordingly. Sponsored content receives the steepest discount; comped sailings and press trips are downweighted to a lesser degree. Standard social-listening platforms do not do this, and it's the difference between consensus scores you can trust and ones inflated by paid promotion.
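A minimal sketch of the discounting step; the multipliers are placeholders chosen only to show the ordering (sponsored discounted hardest), not our production values:

```python
# Placeholder multipliers, ordered to match the policy above:
# sponsored content is discounted hardest, comped sailings and
# press trips less so. Not our production values.
CREDIBILITY_DISCOUNT = {
    "sponsored": 0.3,
    "comped": 0.7,
    "press_trip": 0.7,
    "affiliate": 0.8,
    "organic": 1.0,
}

def effective_weight(channel_expertise: float, disclosure: str) -> float:
    """A video's final contribution = expertise weight x credibility discount."""
    return channel_expertise * CREDIBILITY_DISCOUNT.get(disclosure, 1.0)
```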
3. Per-channel contribution cap
No single creator's videos can dominate any aggregate score, no matter how prolific they are. A hard cap on per-channel contribution prevents a single dominant voice from masquerading as consensus.
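A single-pass sketch of the cap, assuming a hypothetical 20% maximum share per channel; the production cap value differs, and a strict implementation would iterate until shares stabilize:

```python
from collections import defaultdict

def cap_channel_contributions(weights: list[tuple[str, float]],
                              max_share: float = 0.2) -> list[tuple[str, float]]:
    """Scale down any channel whose summed weight exceeds max_share of the
    total pool. Single-pass approximation: a strict cap would re-check
    shares after rescaling. max_share=0.2 is illustrative."""
    total = sum(w for _, w in weights)
    per_channel: dict[str, float] = defaultdict(float)
    for channel, w in weights:
        per_channel[channel] += w
    scale = {ch: min(1.0, max_share * total / s) if s > 0 else 1.0
             for ch, s in per_channel.items()}
    return [(ch, w * scale[ch]) for ch, w in weights]
```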
4. Confidence scoring
Each rollup carries a confidence score combining three input axes: video volume (saturating - additional volume beyond a threshold yields diminishing returns), channel diversity, and expert-tier coverage. Reports that fall below an entity's confidence threshold are suppressed rather than shown with weak data.
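A sketch of how three axes like these can combine - the saturating exponential, the geometric mean, and every constant here are illustrative choices, not our production formula:

```python
import math

def confidence(n_videos: int, n_channels: int, expert_share: float) -> float:
    """Confidence in [0, 1] from three axes; all constants are illustrative.
    - volume saturates: past ~10 videos, each extra video adds little
    - diversity rewards distinct contributing channels
    - expert_share is the fraction of weight from expert-tier channels"""
    volume = 1.0 - math.exp(-n_videos / 10.0)
    diversity = 1.0 - 1.0 / (1 + n_channels)
    return (volume * diversity * expert_share) ** (1.0 / 3.0)  # geometric mean

def publishable(conf: float, threshold: float = 0.5) -> bool:
    """Below the (placeholder) threshold, the report is suppressed."""
    return conf >= threshold
```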
5. Coverage-depth tagging
For every entity mention in every video we tag the depth of coverage: whether the entity is the primary focus of the video, a meaningful portion of it, or a passing mention. Aggregate scores draw from deeper coverage only - passing mentions don't dilute the signal.
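A minimal sketch of the depth filter, with illustrative tier names:

```python
from enum import Enum

class CoverageDepth(Enum):
    PRIMARY = "primary"          # the entity is the main subject of the video
    SUBSTANTIAL = "substantial"  # a meaningful portion of the video
    PASSING = "passing"          # a brief mention only

def aggregate_inputs(mentions: list[tuple[float, CoverageDepth]]) -> list[float]:
    """Keep sentiment from primary and substantial coverage;
    passing mentions never enter the aggregate."""
    return [score for score, depth in mentions if depth is not CoverageDepth.PASSING]
```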
6. Audit traceability
Every output value resolves to its source. Each quote is anchored to a transcript offset; each aggregate cites its contributing videos and channels. Sample any aggregate, see the underlying creator videos, and verify the extracted claims against the source content. A human-evaluation accuracy benchmark is on our roadmap; until it is published, audit traceability is the strongest assurance we offer, and we're upfront about that in every DDQ.
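As a sketch of what quote-level traceability looks like as data - the record shape is illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuoteProvenance:
    """Ties one extracted quote to its exact source span."""
    video_id: str
    channel_id: str
    transcript_offset: int  # character offset of the quote in the transcript
    quote: str

def verify_quote(p: QuoteProvenance, transcript: str) -> bool:
    """Audit check: the quote must appear verbatim at the recorded offset."""
    return transcript[p.transcript_offset:p.transcript_offset + len(p.quote)] == p.quote
```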
Full consumer methodology and visual examples: tripbacon.com/methodology.
DDQ pack & sample data
For institutional buyers we provide a written DDQ response, a data dictionary, sample-size and confidence-distribution disclosures, and our roadmap for human-evaluation accuracy benchmarking. Request below.