
You ran Lighthouse, hit a green 98, and shipped. Three weeks later Search Console flags the page as failing Core Web Vitals. If that has ever happened to you, you have run into the central problem with Lighthouse: the number it gives you and the number Google actually ranks on are measured in completely different worlds. Lighthouse is a useful diagnostic tool. It is a terrible scoreboard. Here is exactly where it misleads, and what to watch instead.
A Lighthouse run is a single synthetic page load on an emulated mid-range Android phone with throttled CPU and a simulated slow 4G connection. It happens once, in a clean environment, with no real user attached. That is lab data.
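What Google uses for ranking is field data: the Chrome User Experience Report (CrUX), assembled from real Chrome users who actually visited your page. It is a 28-day rolling window, reported at the 75th percentile, meaning at least three out of four real visits must hit the target. The thresholds you are graded against are: Largest Contentful Paint (LCP) at or under 2.5 seconds, Interaction to Next Paint (INP) at or under 200 milliseconds, and Cumulative Layout Shift (CLS) at or under 0.1.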
A page can score 100 in Lighthouse and still fail CrUX, and a page can score in the 60s and pass comfortably. The synthetic run does not know that 40% of your real traffic is on aging phones over patchy mobile networks, or that your visitors mostly land on a heavy archive page rather than the homepage you tested. The lab is a controlled experiment; the field is your actual audience. Only one of them affects rankings.
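To make the 75th-percentile idea concrete, here is a minimal sketch in Python. It assumes the published "good" thresholds (LCP 2.5 s, INP 200 ms, CLS 0.1) and uses a simple nearest-rank percentile; CrUX aggregates its data in its own way, so treat this as an illustration of the principle, not a reproduction of Google's pipeline. The sample values and the function names are made up for the example.

```python
import math

# "Good" thresholds for the three Core Web Vitals.
THRESHOLDS = {"lcp_ms": 2500, "inp_ms": 200, "cls": 0.1}


def percentile_75(samples):
    """Nearest-rank 75th percentile: the smallest sample value such that
    at least 75% of all samples are at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def passes(metric, samples):
    """True if the 75th percentile of the samples meets the threshold."""
    return percentile_75(samples) <= THRESHOLDS[metric]


# Hypothetical LCP samples in milliseconds (not real measurements).
lab_like_visits = [1200, 1300, 1400, 1500]
field_visits = [1200, 1300, 1400, 1500, 2600, 2900, 3100, 4800]

print(percentile_75(lab_like_visits), passes("lcp_ms", lab_like_visits))
print(percentile_75(field_visits), passes("lcp_ms", field_visits))
```

Half of the hypothetical field visits are just as fast as the lab-like ones, yet the page still fails, because the slow half pulls the 75th percentile past 2.5 seconds.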
This is the single most important thing to understand. In March 2024, Google replaced First Input Delay with INP as a Core Web Vital. INP measures responsiveness across the whole visit — it watches how long the page takes to visually respond after a user taps, clicks, or types, and reports near the worst interaction.
Lighthouse cannot measure INP. There are no real interactions during a synthetic load — nobody clicks anything — so there is nothing to time. Instead, Lighthouse reports Total Blocking Time (TBT) as a lab proxy for interactivity. TBT and INP are correlated but they are not the same, and the gap between them is exactly where sites get burned. You can drive TBT to near zero in the lab and still ship terrible INP in the field.
Here is the most common way this happens on WordPress. Optimization plugins like WP Rocket, Perfmatters, and FlyingPress offer a feature usually called "Delay JavaScript Execution" — it holds back nearly all scripts (analytics, ad tags, chat widgets, sliders) until the user's first interaction.
In a Lighthouse run, no interaction ever happens, so none of that JavaScript executes. TBT plummets and your Performance score jumps, often by 20 or 30 points. It looks like a miracle fix.
Then a real visitor's first tap fires every deferred script at once. The main thread chokes processing all of it, and the response to that very first interaction — the one INP is most likely to record — is slow. Your lab score went up; your field INP got worse. The tool rewarded the exact behavior that hurts real users. This is not a bug in the plugins; used carefully (delaying only non-critical scripts) they help. It is a demonstration that the lab score and the ranked metric can move in opposite directions.
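The opposite movement is easy to see in a toy model. The Python sketch below assumes two things: that blocking time is the part of each main-thread task beyond 50 ms, which is how TBT is defined, and that every delayed script runs back to back on the first interaction before the page can paint a response. The second assumption is a deliberate simplification; real plugins and browsers schedule work in more complicated ways. The script names and durations are hypothetical.

```python
LONG_TASK_BUDGET_MS = 50  # only the part of a task beyond 50 ms counts as blocking


def total_blocking_time(task_durations_ms):
    """Sum the blocking portion of each main-thread task during load."""
    return sum(max(0, d - LONG_TASK_BUDGET_MS) for d in task_durations_ms)


def first_interaction_latency(deferred_durations_ms, handler_ms):
    """Toy assumption: all delayed scripts execute back to back on the
    first interaction, then the event handler runs, then the page paints."""
    return sum(deferred_durations_ms) + handler_ms


# Hypothetical main-thread cost of each third-party script, in ms.
scripts = {"analytics": 120, "ads": 300, "chat": 180, "slider": 90}
theme_task_ms = 40  # hypothetical first-party work that always runs on load

# Without delay: every script runs during the load Lighthouse observes.
tbt_without_delay = total_blocking_time([theme_task_ms, *scripts.values()])

# With delay: Lighthouse never interacts, so none of the scripts run.
tbt_with_delay = total_blocking_time([theme_task_ms])

# The real visitor's first tap then pays for all of them at once.
first_tap_ms = first_interaction_latency(scripts.values(), handler_ms=30)

print(tbt_without_delay, tbt_with_delay, first_tap_ms)
```

In this model, lab TBT drops from 490 ms to 0 while the first tap takes 720 ms to respond, far past the 200 ms INP threshold.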
The big number at the top is a weighted blend of several lab metrics. As of Lighthouse 10, TBT carries 30% of the weight, LCP and CLS carry 25% each, and First Contentful Paint and Speed Index carry 10% each. Because it is a blend, two pages with the same score can have very different real problems: one might have great paint times but janky interactivity, the other the reverse. Worse, you can "fix the score" by improving whichever metric is cheapest to game rather than the one your users actually feel. Always scroll past the number and read the individual metrics. The score is a summary; the metrics are the truth.
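A short Python sketch shows how a blend hides the difference. It assumes the Lighthouse 10 weights (TBT 30%, LCP 25%, CLS 25%, First Contentful Paint 10%, Speed Index 10%), which can change between versions, and it takes each metric's 0 to 1 sub-score as a given input; Lighthouse derives those sub-scores from its own scoring curves, which are not modeled here. Both pages are invented.

```python
# Assumed weights (Lighthouse 10); check the version you are running.
WEIGHTS = {"tbt": 0.30, "lcp": 0.25, "cls": 0.25, "fcp": 0.10, "si": 0.10}


def performance_score(metric_scores):
    """Weighted blend of per-metric scores (each 0.0 to 1.0), scaled to 0-100."""
    return round(100 * sum(WEIGHTS[m] * metric_scores[m] for m in WEIGHTS))


# Hypothetical page A: paints quickly, but the main thread is busy.
page_a = {"tbt": 0.50, "lcp": 1.00, "cls": 1.00, "fcp": 1.00, "si": 1.00}

# Hypothetical page B: responsive, but the main content paints late.
page_b = {"tbt": 1.00, "lcp": 0.40, "cls": 1.00, "fcp": 1.00, "si": 1.00}

print(performance_score(page_a), performance_score(page_b))
```

Both pages land on the same score, but one needs JavaScript work and the other needs image or server work. Only the individual metrics tell you which.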
Run Lighthouse three times on the same page and you will often see the score swing by 10 points or more. The simulated throttling, your machine's CPU contention, background tabs, and especially third-party scripts (ad networks and A/B tools serve different payloads each load) all introduce variance. A single run feels authoritative because it produces one tidy number, but it is one sample from a noisy distribution. If you must use the lab score to compare before/after, run it several times and look at the median — never trust a one-shot result.
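If you collect the scores from several runs by hand, summarizing them takes a few lines of Python. This is only a sketch: the function name, the three-run minimum, and the scores themselves are assumptions made for the example.

```python
from statistics import median


def summarize_runs(scores):
    """Return the median and the min-to-max spread of repeated lab runs."""
    if len(scores) < 3:
        raise ValueError("use at least three runs; one sample is just noise")
    return {"median": median(scores), "spread": max(scores) - min(scores)}


# Hypothetical Performance scores from five runs before and after a change.
before = [71, 84, 78, 80, 75]
after = [82, 79, 88, 76, 85]

print(summarize_runs(before))
print(summarize_runs(after))
```

Here the medians differ by 4 points while each set of runs swings by 12 or 13, so a single before/after pair could have shown anything from a large gain to a regression.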
There are several ways to run Lighthouse, and they do not agree:
When a developer's DevTools shows 95 and the client's PageSpeed Insights (PSI) shows 68, nobody is lying; they ran two different tests. Pick one environment, ideally PSI because it also reports field data, and stop comparing apples to oranges.
A few more places where a clean lab score hides real-world trouble on WordPress:
None of this means ignore Lighthouse. It means use it for what it is good at and stop treating the score as the goal.
Chase the green number and you will optimize for a robot that never clicks anything. Chase your field Core Web Vitals and you will optimize for the people Google is actually measuring — which is the only audience that moves your rankings.