Predicting Movie Success with 86% Accuracy from Trailers Alone
A North AI neuroscience study. 24 respondents watched 20 movie trailers while we captured EEG and eye-tracking simultaneously. Models predicted a trailer's Metacritic class with 86% accuracy — and gaze synchronisation turned out to inversely predict critical acclaim. Here is what the data showed, and what we are not yet claiming.

Can a machine watching your audience's brain predict whether a film lands — before a single ticket is sold?
We ran the experiment. 24 respondents watched 20 movie trailers while we recorded their eye movements and brain activity at the same time. From that neuro-behavioural signal alone, our models sorted trailers into Metacritic classes with 86% accuracy.
This is a small study with real limitations, and we will be specific about them. But the direction is clear, and one finding genuinely surprised us: the trailers that pulled every viewer's eyes to the same spot tended to score worse with critics, not better.
WHY PRE-RELEASE PREDICTION IS THE HARD PROBLEM
Every studio, brand, and agency faces the same gap. The decision to greenlight, recut, or pour media spend behind a piece of content happens before the market reacts. By the time box office, view counts, or Metacritic scores exist, the money is already committed.
North AI measures how audiences cognitively respond to video content, second by second, so that response can be read before launch rather than after. This study was designed to test a narrow, falsifiable version of that promise: can synchronised EEG and eye-tracking data, collected from a modest panel, predict a trailer's eventual critical reception?
Movie trailers are a useful proving ground. They are short, emotionally engineered, and they carry an external ground-truth label — the Metacritic score of the film they advertise — that no one in the test room can influence.
METHODOLOGY
We collected data from 24 respondents across 54 session records. Each session presented 5 of the 20 unique trailers, producing 270 content-session records in total.
Three data streams were captured for every session:
- Gazepoint eye-tracking — high-fidelity, hardware-based gaze tracking.
- Gazefilter webcam eye-tracking — webcam-based gaze estimation, the same scalable capture method used in North AI's platform.
- EEG — direct brain-activity measurement.
Running a research-grade tracker and a webcam tracker side by side lets us check how much of the lab-grade signal survives in the scalable, webcam-only setup that real deployments depend on.
A note on data hygiene: across the panel, a mean of 7.9% of gaze fell outside the main content area, with an overall mean of 14.7% once off-screen gaze is included. Attention wanders. Any honest model has to be robust to that.
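The two figures above count gaze samples at different boundaries. Here is a minimal sketch of that bookkeeping. It assumes gaze arrives as normalised screen coordinates (0 to 1), that `None` marks a sample the tracker could not place, and that the content area is a hypothetical rectangle. The study's real sample format and content bounds are not published.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical content rectangle in normalised screen coordinates (left, top, right, bottom).
# The study's real content bounds are not published.
CONTENT_BOX = (0.1, 0.1, 0.9, 0.9)

GazeSample = Optional[Tuple[float, float]]  # None = tracker could not place the gaze


def off_content_shares(samples: Sequence[GazeSample],
                       box: Tuple[float, float, float, float] = CONTENT_BOX) -> Tuple[float, float]:
    """Return (share on screen but outside content, share outside content including off-screen)."""
    left, top, right, bottom = box
    outside_on_screen = 0
    off_screen = 0
    for sample in samples:
        if sample is None:
            off_screen += 1
            continue
        x, y = sample
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            off_screen += 1            # beyond the screen edge
        elif not (left <= x <= right and top <= y <= bottom):
            outside_on_screen += 1     # on screen, but not on the trailer itself
    n = len(samples)
    return outside_on_screen / n, (outside_on_screen + off_screen) / n


# Ten hypothetical samples: 7 on content, 1 in the screen margin, 2 off-screen.
samples = [(0.5, 0.5)] * 7 + [(0.05, 0.5), None, (1.2, 0.4)]
print(off_content_shares(samples))  # prints (0.1, 0.3)
```

The first number plays the role of the 7.9% figure and the second the 14.7% figure: off-screen samples only enter the second.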
WHAT WE MEASURED
Eye-tracking and behavioural metrics. Blink count, blink rate (per minute) and blink duration were estimated from the Gazefilter tracker. For each trailer we also computed blink synchronisation: how tightly respondents blinked together, expressed as a mean timestamp difference and as the proportion of blinks synchronised within one second. Blinks matter because each blink is a micro-gap in attention, so a room blinking together is a proxy for a shared cognitive or emotional beat.
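One way to turn the two blink-synchronisation measures into code, as a sketch: for every blink, find the nearest blink by any other respondent watching the same trailer. Then average those gaps, and count how many fall within one second. The nearest-neighbour pairing rule and the `blink_sync` helper are assumptions for illustration. The study does not publish its exact definition.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def nearest_gap(t: float, others: List[float]) -> float:
    """Smallest |t - u| over a sorted list of other respondents' blink times (seconds)."""
    i = bisect_left(others, t)
    candidates = []
    if i < len(others):
        candidates.append(others[i] - t)      # next blink at or after t
    if i > 0:
        candidates.append(t - others[i - 1])  # previous blink before t
    return min(candidates)


def blink_sync(blinks: Dict[str, List[float]], window_s: float = 1.0) -> Tuple[float, float]:
    """Hypothetical helper: (mean nearest-blink gap, share of blinks within window_s of another viewer)."""
    gaps: List[float] = []
    for viewer, times in blinks.items():
        others = sorted(u for v, ts in blinks.items() if v != viewer for u in ts)
        if not others:
            continue
        gaps.extend(nearest_gap(t, others) for t in times)
    if not gaps:
        raise ValueError("need blinks from at least two viewers")
    mean_gap = sum(gaps) / len(gaps)
    share = sum(g <= window_s for g in gaps) / len(gaps)
    return mean_gap, share


# Hypothetical blink timestamps (seconds into one trailer) for three respondents.
blinks = {"r1": [2.0, 10.0], "r2": [2.4, 15.0], "r3": [2.1]}
mean_gap, share = blink_sync(blinks)
print(f"{mean_gap:.2f} s mean gap, {share:.0%} within 1 s")  # prints 2.10 s mean gap, 60% within 1 s
```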
We measured gaze synchronisation (the distance between respondents' gaze points at each timestamp), the proportion of time gaze fell outside the content, and head movement (tracked via three face models, normalised to trailer duration).
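Here is a minimal sketch of the distance-based gaze measure described above. It assumes one (x, y) gaze point per respondent per timestamp, in normalised screen coordinates, with `None` where gaze was lost. Under this definition, a smaller mean distance means tighter synchronisation. The function names and the averaging scheme are illustrative assumptions, not the study's published pipeline.

```python
from itertools import combinations
from math import dist
from typing import Optional, Sequence, Tuple

Point = Optional[Tuple[float, float]]  # None = no valid gaze for that respondent at that timestamp


def mean_pairwise_distance(frame: Sequence[Point]) -> Optional[float]:
    """Average Euclidean distance between every pair of valid gaze points at one timestamp."""
    pts = [p for p in frame if p is not None]
    pairs = list(combinations(pts, 2))
    if not pairs:
        return None  # fewer than two valid viewers: no distance to compute
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def gaze_dispersion(frames: Sequence[Sequence[Point]]) -> float:
    """Mean of the per-timestamp pairwise distances; lower means more synchronised gaze."""
    values = [d for d in (mean_pairwise_distance(f) for f in frames) if d is not None]
    if not values:
        raise ValueError("no timestamp has two or more valid gaze points")
    return sum(values) / len(values)


# Two hypothetical timestamps: everyone on one spot, then two viewers 0.5 apart (one lost).
frames = [[(0.5, 0.5)] * 3, [(0.0, 0.0), (0.3, 0.4), None]]
print(f"{gaze_dispersion(frames):.2f}")  # prints 0.25
```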
EEG metrics. Six measures built from EEG frequency-band power (five band ratios and one alpha-power difference), each a standard index in the neuro-marketing literature:
| EEG metric | What it indexes | Band ratio |
|---|---|---|
| Level of brain activation | How active the brain is | Beta ÷ alpha |
| Attention concentration | Degree of focus | (Theta + alpha) ÷ (beta + gamma) |
| Brain load | How easily information is processed | Theta ÷ alpha |
| Engagement | How captivating the content is | Beta ÷ (alpha + theta) |
| Emotional background | Positive vs negative affect | Difference in alpha power |
| Cognitive processes | Cognitive overload | (Theta + alpha) ÷ (beta + gamma) |
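The ratio rows of the table translate directly into code. This is a minimal sketch that assumes per-band power has already been estimated for each viewer-trailer epoch; the study does not state its band limits or spectral method. The emotional-background row is left out because the table does not say which two alpha values are differenced. The sketch reproduces the table as written, so attention concentration and cognitive processes share the same ratio.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BandPower:
    """Mean power per EEG band for one viewer-trailer epoch (units cancel in the ratios)."""
    theta: float
    alpha: float
    beta: float
    gamma: float


def eeg_indices(bp: BandPower) -> Dict[str, float]:
    """Band-ratio indices as listed in the EEG metric table."""
    slow = bp.theta + bp.alpha
    fast = bp.beta + bp.gamma
    return {
        "brain_activation": bp.beta / bp.alpha,
        "attention_concentration": slow / fast,
        "brain_load": bp.theta / bp.alpha,
        "engagement": bp.beta / (bp.alpha + bp.theta),
        "cognitive_processes": slow / fast,  # same ratio as attention concentration in the table
    }


# Hypothetical band powers for a single epoch.
print(eeg_indices(BandPower(theta=4.0, alpha=8.0, beta=6.0, gamma=2.0)))
# prints {'brain_activation': 0.75, 'attention_concentration': 1.5, 'brain_load': 0.5, 'engagement': 0.5, 'cognitive_processes': 1.5}
```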
FINDING 1 — EEG AND EYE-TRACKING MOVE TOGETHER
The first question is whether the cheap, scalable signal (webcam eye-tracking) carries any of the same information as the expensive one (EEG). If the two are statistically independent, webcam-only deployments lose the neural story.
They are not independent. We found eight statistically significant correlations between EEG metrics and Gazefilter eye-tracking metrics [1]:
| EEG metric | Eye-tracking metric | r | p | Sig. | n |
|---|---|---|---|---|---|
| Brain activation | Head movement distance | 0.353 | 0.000002 | ** | 170 |
| Attention concentration | Blink rate | 0.217 | 0.004 | ** | 170 |
| Attention concentration | No-content gaze time | 0.186 | 0.015 | * | 170 |
| Attention concentration | Blink-synchronised proportion | −0.202 | 0.008 | ** | 170 |
| Brain load | Blink-synchronised proportion | −0.224 | 0.003 | ** | 170 |
| Brain load | No-content gaze time | −0.181 | 0.018 | * | 170 |
| Engagement | Head movement distance | 0.346 | 0.000003 | ** | 170 |
| Cognitive processes | Blink rate | 0.209 | 0.006 | ** | 170 |
* p < .05 · ** p < .01 · n = 170 paired observations
The effects are small (the strongest correlation, r = 0.353, explains about 12% of the variance), but they are consistent and their directions make sense. Higher engagement and brain activation track with more head movement. Higher brain load tracks with lower blink synchronisation, which fits intuition: when a trailer is hard to process, viewers fall out of lockstep.
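Each p-value in the table can be sanity-checked from r and n alone. This is a minimal sketch using the Fisher z-transform with a normal approximation. The study does not state which test it used, and an exact t-test on r gives close values at n = 170. The sketch also prints r², the share of variance explained, which is where the 12% figure comes from.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 using the Fisher z normal approximation."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * NormalDist().cdf(-abs(z))


def stars(p: float) -> str:
    """Significance marks as used in the table: ** p < .01, * p < .05."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# r values copied from the correlation table (n = 170 paired observations each).
table_r = [0.353, 0.217, 0.186, -0.202, -0.224, -0.181, 0.346, 0.209]
for r in table_r:
    p = fisher_p_value(r, 170)
    print(f"r = {r:+.3f}   r^2 = {r * r:.3f}   p = {p:.1e}   {stars(p)}")
```

The recomputed significance marks match the table row for row.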
Assumption vs fact: it is a fact that these correlations are significant at this sample size. It is an assumption — a reasonable one — that the webcam channel is therefore a usable proxy for neural state in deployment. Correlations this size confirm shared signal, not interchangeability.
FINDING 2 — SYNCHRONISED GAZE PREDICTS LOWER CRITICAL SCORES
This is the counterintuitive result. Across the 20 trailers, gaze synchronisation correlated with Metacritic rating at r = −0.685 (p = 0.001, n = 20) [1]. The more tightly viewers' eyes locked onto the same point, the lower the film's critical score tended to be.
That inverts the naive reading of "synchronisation = good." A plausible interpretation: trailers that funnel all attention to a single obvious focal point are often the formulaic, single-hook promos attached to weaker films. Trailers for better-reviewed films may distribute attention across a richer frame — more to look at, more for the eye to explore.
We flag this as the single finding most in need of replication. With n = 20 titles, one correlation coefficient is suggestive, not settled. A −0.685 on 20 points is the kind of result that survives or dies on the next 80 titles.
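A confidence interval shows why 20 points is thin. This sketch uses the standard Fisher z approximation for a Pearson correlation's 95% interval, applied to the reported r = −0.685 with n = 20. It is an approximation, not a re-analysis of the raw data.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_ci(r: float, n: int, level: float = 0.95) -> tuple:
    """Approximate confidence interval for a Pearson r via the Fisher z-transform."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)                          # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return tanh(z - crit * se), tanh(z + crit * se)


lo, hi = pearson_ci(-0.685, 20)
print(f"95% CI for r: [{lo:.2f}, {hi:.2f}]")  # prints 95% CI for r: [-0.87, -0.35]
```

The interval excludes zero, but it stretches from a very strong to a moderate negative association. Narrowing it is exactly what a larger title set would buy.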
FINDING 3 — THE MODELS
We trained classifiers under stratified cross-validation with 5 splits, each using 16 training and 4 test titles. There were two targets: a binary success-rate label and Metacritic class.
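Assume the five splits are the folds of a single stratified partition. In that case, 20 titles yield exactly 16 training and 4 test titles per fold, with each class spread as evenly as possible across folds. Below is a minimal, dependency-free sketch; scikit-learn's `StratifiedKFold` is the usual library choice. The labels are hypothetical, since the study's class split is not published. The guard clause is this sketch's own choice. It shows why a class with a single title, like the lone 'Generally Unfavorable' trailer, cannot appear in every test fold.

```python
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; indices of each class are dealt round-robin over k folds."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    smallest = min(len(ix) for ix in by_class.values())
    if smallest < k:
        # A class with fewer members than folds cannot appear in every test fold.
        raise ValueError(f"smallest class has {smallest} member(s); need at least {k}")
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        for i in by_class[y]:
            folds[pos % k].append(i)  # keep dealing across classes so fold sizes stay equal
            pos += 1
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical binary labels for 20 trailers (the real split is not published).
labels = [1] * 12 + [0] * 8
for train, test in stratified_kfold(labels, k=5):
    print(len(train), len(test), [labels[i] for i in test])
```

Each fold prints `16 4` followed by its test labels.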
Success-rate prediction [1]:
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Random Forest | 0.72 ± 0.17 | 0.63 ± 0.11 | 0.87 ± 0.27 | 0.72 ± 0.17 |
| Support Vector Classifier | 0.70 ± 0.10 | 0.73 ± 0.16 | 0.93 ± 0.13 | 0.80 ± 0.07 |
| Isolation Forest | 0.75 ± 0.16 | 0.77 ± 0.20 | 0.88 ± 0.15 | 0.80 ± 0.16 |
Metacritic-class prediction [1]:
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Random Forest | 0.86 ± 0.00 | 0.75 ± 0.00 | 1.00 ± 0.00 | 0.86 ± 0.00 |
| Support Vector Classifier | 0.80 ± 0.10 | 0.80 ± 0.10 | 1.00 ± 0.00 | 0.89 ± 0.06 |
| Isolation Forest | 0.75 ± 0.16 | 0.83 ± 0.21 | 0.80 ± 0.19 | 0.80 ± 0.17 |
Note: Metacritic classes were collapsed to 2 classes — only one trailer fell into 'Generally Unfavorable', too few to learn from.
The headline number is the 86% Metacritic-class accuracy from the Random Forest. Taken together, the neuro-behavioural feature set sorts trailers into the right critical tier far better than chance.
THE PART WE WON'T OVERSELL
A confident headline deserves an honest footnote, so here is the skeptical read of our own results.
- The Random Forest's 86% has a recall of 1.00 and a standard deviation of 0.00. With a 2-class target and only 4 test samples per fold, that pattern is consistent with a model leaning hard on the majority class. The Support Vector Classifier (more variance, recall also pinned at 1.00) tells a similar story. We read 86% as encouraging, not as a validated production accuracy.
- n is small at every level. 24 respondents, 20 titles, 4-sample test folds. The cross-validation guards against some overfitting, but it cannot manufacture statistical power that 20 titles do not contain.
- Collapsing to 2 classes inflates the ceiling. A binary problem with an uneven split is easier than the 3- or 4-tier problem a studio actually cares about.
- Correlations are real but small. "Statistically significant" at n = 170 is a low bar for effect size. These are signals to build on, not laws.
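The majority-class concern can be made concrete. The sketch below scores a classifier that predicts the positive class for every title. It uses a hypothetical 4-title fold with three positives, since the real fold compositions are not published.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical 4-title test fold with 3 positives, scored by a model that always says "positive".
fold_truth = [1, 1, 1, 0]
always_positive = [1, 1, 1, 1]
print(binary_metrics(fold_truth, always_positive))
```

Using no signal at all, this predictor scores precision 0.75, recall 1.00 and F1 ≈ 0.86, with accuracy 0.75 on that fold. If every fold has the same composition, the scores are identical across folds, giving a standard deviation of 0.00.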
None of this negates the study. It scopes it. The honest claim is: a modest neuro panel carries enough signal to beat chance on pre-release critical reception, and the method is worth scaling. That is a different — and more defensible — sentence than "we predict hits."
WHAT THE DATA SAYS ABOUT FOCUS GROUPS
One result cuts across the whole study. Our internal-dependency analysis compared what respondents did (gaze, blink, EEG) with what they reported, and the relationship trended negative, with non-linear segments. Behaviour and self-report did not reliably agree.
That is the quiet case against the focus group. When a viewer's stated reaction diverges from their measured one, the questionnaire is recording a story the brain has already edited. For pre-launch decisions, the measured channel is the one with predictive value.
KEY FINDINGS FROM THIS STUDY
→ 24 respondents · 54 sessions · 20 trailers · 270 content-session records
→ EEG and webcam eye-tracking are significantly correlated (8 effects, all p < .05)
→ Gaze synchronisation inversely predicts Metacritic score (r = −0.685)
→ Metacritic class predicted at 86% accuracy (Random Forest)
→ Success-rate label predicted at up to 75% accuracy (Isolation Forest)
→ Behaviour and self-report diverge — qualitative methods miss signal
→ Effects are small and n is modest — replication at scale is the next step
Frequently Asked Questions
Can you really predict a movie's success from its trailer?
In this study, models sorted 20 trailers into the correct Metacritic class 86% of the time using only EEG and eye-tracking data captured while respondents watched. That demonstrates predictive signal in the neuro-behavioural response to a trailer. It is not yet a validated box-office forecaster — the sample is small, the target was simplified to two classes, and the headline accuracy needs replication on a larger, balanced set of titles before it should drive a greenlight decision.
What does gaze synchronisation measure, and why did more of it predict worse scores?
Gaze synchronisation is how closely different viewers look at the same point on screen at the same moment. We found it correlated negatively with Metacritic ratings (r = −0.685). The working interpretation is that trailers which force every eye to one obvious focal point are often the formulaic promos attached to weaker films, while better-reviewed films offer richer frames that distribute attention. This is our most striking result and also the one most in need of confirmation at larger n.
Why use both EEG and webcam eye-tracking?
EEG is the high-fidelity measure of brain state but is impractical at scale. Webcam eye-tracking is scalable but indirect. Running them together let us show they carry overlapping signal (eight significant correlations), which supports, though does not yet establish, using the scalable webcam channel as a proxy for neural response in real-world deployments.
Is 86% accuracy reliable?
It is promising but should be read with caution. The Random Forest achieved 86% on a 2-class Metacritic target with a recall of 1.00 and zero variance across folds — a pattern that can indicate reliance on the majority class when test folds are as small as four samples. We treat it as evidence that the method works in principle and as motivation to scale the dataset, not as a production-grade accuracy figure.
What would make these models more trustworthy?
A larger, more balanced sample: more respondents per trailer, more titles, and enough examples in each Metacritic tier to predict 3 or 4 classes rather than 2. That would raise statistical power, reduce reliance on majority-class behaviour, and let the gaze-synchronisation finding be confirmed or rejected. The current results solve a deliberately simple version of the problem; the next study scales it.
About the author
Rishi Kapoor is CEO of North AI, a neuroscience and AI R&D company building pre-launch attention measurement for video. He works with brands, studios, and agencies on translating neuro-behavioural signal into decisions made before media spend is committed.
Tags: #Neuroscience #EyeTracking #EEG #PredictiveModelling #CreativeTesting #MovieMarketing #AttentionAnalytics #NorthAI
Reference: [1] North AI Ltd Research.