
> How the Top 50 US News Sites Handle Their Own JSON-LD

I scraped the JSON-LD off one article from each of the 50 biggest non-paywalled US news sites. 50/50 valid, zero syntax errors -- but real gaps: Mother Jones ships none, Fox has no author, Vox has no image. Full data and raw blocks for every site.


Every page on this site shows you JSON-LD that someone wrote on purpose. This post is the opposite: a look at what the big newsrooms actually ship, scraped straight off one recent article from each of 50 popular, non-paywalled US news sites. No cherry-picking, no cleanup. Just whatever was in the <script type="application/ld+json"> tags the day I looked.

The short version: structured data on news sites is in good shape. 50 of 50 sites resolved, and there were zero JSON syntax errors across the entire set. But "valid" and "complete" are not the same thing, and the gaps are where it gets interesting.

how the data was gathered

For each site I grabbed the homepage, picked one recent article, pulled the raw HTML, and extracted every application/ld+json block. Most sites went through Firecrawl. A handful that Firecrawl will not serve (The Atlantic, ZDNet) or that block bots at the homepage (SFGate, Houston Chronicle) were fetched directly with a browser user-agent. ABC News renders its homepage entirely in JavaScript, so that one was driven through a real headless browser (Playwright) and the JSON-LD was read from the rendered DOM.
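The extraction step itself can be sketched with nothing but Python's standard library. This is an illustration, not the tooling used for this post: the class name `JsonLdExtractor` and the function `extract_json_ld` are made up for the example, and it assumes you already have the page's HTML as a string (from Firecrawl, a direct fetch, or a rendered DOM).

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # raw text of each completed block
        self._buffer = None  # text chunks while inside a matching script tag

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def extract_json_ld(html):
    """Return a (raw_text, parsed_or_None) pair for each ld+json block."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    results = []
    for raw in parser.blocks:
        try:
            results.append((raw, json.loads(raw)))
        except json.JSONDecodeError:
            # Keep the raw text so a syntax error stays visible in the results.
            results.append((raw, None))
    return results


# Hypothetical page: one valid block, one unrelated script, one broken block.
sample = """
<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "NewsArticle", "headline": "Example"}</script>
<script type="text/javascript">var x = 1;</script>
<script type="application/ld+json">{"@type": "Organization", </script>
</head></html>
"""

blocks = extract_json_ld(sample)
for raw, parsed in blocks:
    print("valid" if parsed is not None else "syntax error", len(raw.encode("utf-8")), "bytes")
```

Keeping the raw text next to the parsed value means one pass gives you both the syntax check and the byte count.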

One gotcha worth calling out: four sites first served me a live blog instead of a normal article (USA Today, Al Jazeera, HuffPost, and a Google News read-page). Live blogs use LiveBlogPosting and stuff every rolling update into a liveBlogUpdate array -- HuffPost's was 124 KB on its own. Those were swapped for standard articles so the comparison is apples-to-apples.
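A check like the following could flag a live blog before it skews a comparison. It is a sketch under assumptions: `iter_nodes` and `is_live_blog` are hypothetical helper names, and the input is a list of already-parsed JSON-LD blocks for one page.

```python
def iter_nodes(data):
    """Yield every dict nested anywhere inside parsed JSON-LD."""
    if isinstance(data, dict):
        yield data
        for value in data.values():
            yield from iter_nodes(value)
    elif isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)


def is_live_blog(parsed_blocks):
    """True if any node in any block is typed LiveBlogPosting."""
    for node in iter_nodes(parsed_blocks):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]  # @type may be a single string or a list
        if isinstance(types, list) and "LiveBlogPosting" in types:
            return True
    return False


# Hypothetical pages for illustration.
live = [{
    "@context": "https://schema.org",
    "@type": "LiveBlogPosting",
    "liveBlogUpdate": [{"@type": "BlogPosting", "headline": "Update 1"}],
}]
normal = [{
    "@context": "https://schema.org",
    "@graph": [{"@type": "NewsArticle", "headline": "Example"}],
}]

print(is_live_blog(live), is_live_blog(normal))
```

Walking every nested node, rather than only the top level, matters because many sites wrap their markup in an `@graph` array.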

One caveat before the findings: this is a single article per site, captured on 2026-06-30. It is a snapshot, not a full-site audit. A site that nailed it on the article I grabbed could be inconsistent elsewhere, and vice versa. Treat the per-site calls below as "here is what this one page shipped," not a grade for the whole publication. Every raw block is linked in the table at the bottom so you can check my work.

what they (almost) all have in common

Most common types across all 50:

| Type | Sites |
| --- | --- |
| ImageObject | 48 |
| Person (author) | 48 |
| NewsArticle | 46 |
| WebPage | 37 |
| Organization | 31 |
| BreadcrumbList | 22 |
| NewsMediaOrganization | 20 |
| SpeakableSpecification | 11 |

SpeakableSpecification (voice-assistant markup) on 11 sites is the surprise -- CNBC, TechCrunch, The Verge, Forbes and others still ship it even though Google deprecated the feature. It is harmless, but it is dead weight.
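Tallies like the ones in the table above come from counting each type once per site. Here is a rough sketch of that counting, not the code behind the table: `collect_types` and `sites_per_type` are hypothetical names, the site names are placeholders, and skipping the `@context` key is my own choice so that term definitions such as `"@type": "@id"` are not counted as types.

```python
from collections import Counter


def collect_types(data, found=None):
    """Return the set of @type values nested anywhere in parsed JSON-LD."""
    if found is None:
        found = set()
    if isinstance(data, dict):
        types = data.get("@type")
        if isinstance(types, str):
            found.add(types)
        elif isinstance(types, list):
            found.update(t for t in types if isinstance(t, str))
        for key, value in data.items():
            if key == "@context":
                continue  # term definitions are not entity types
            collect_types(value, found)
    elif isinstance(data, list):
        for item in data:
            collect_types(item, found)
    return found


def sites_per_type(pages):
    """pages maps site name -> list of parsed blocks; each type counts once per site."""
    counts = Counter()
    for blocks in pages.values():
        counts.update(collect_types(blocks))
    return counts


# Placeholder data for illustration only.
pages = {
    "site-a.example": [{
        "@type": "NewsArticle",
        "author": {"@type": "Person", "name": "A"},
        "image": {"@type": "ImageObject", "url": "https://site-a.example/a.jpg"},
    }],
    "site-b.example": [
        {"@graph": [{"@type": ["NewsArticle", "Article"]}, {"@type": "Person", "name": "B"}]},
        {"@context": {"label": {"@id": "Text", "@type": "@id"}}, "@type": "Dataset"},
    ],
}

counts = sites_per_type(pages)
for type_name, n in counts.most_common():
    print(type_name, n)
```

Using a set per site is what keeps a page with ten `ImageObject` nodes from counting ten times.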

the leanest one that still does it right: BBC

BBC News was the standout for minimalism: 541 bytes of NewsArticle + Person + Organization and nothing else. No breadcrumbs, no speakable, no images-as-objects. Every field Google needs for an article rich result, and not one byte more.

{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "How to make Google put preferred sources up top when you search",
  "description": "A new feature lets you choose which publishers – including the BBC – appear at the top of your search results. Here's how to use it.",
  "image": [
    "https://ychef.files.bbci.co.uk/1280x720/p0mxnd5b.jpg"
  ],
  "datePublished": "2026-01-28T17:30:00.000Z",
  "dateModified": "2026-01-28T20:27:00.263Z",
  "author": {
    "@type": "Person",
    "name": "BBC Staff",
    "url": ""
  },
  "publisher": {
    "@type": "Organization",
    "name": "BBC"
  }
}

If you ever want a template for "the minimum that is actually correct," this is close to it. Compare that to the biggest payloads below.
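To compare any page against that shape, a small presence check is enough. The sketch below is illustrative: the list in `EXPECTED_FIELDS` is my assumption, taken from what the BBC block contains rather than from an official requirements list, and `find_article` and `missing_fields` are hypothetical helpers.

```python
# Assumption: these are the fields the BBC block carries, used here as a yardstick.
EXPECTED_FIELDS = ("headline", "image", "datePublished", "author", "publisher")
ARTICLE_TYPES = {"Article", "NewsArticle", "OpinionNewsArticle"}


def find_article(data):
    """Return the first node typed as an article, searching depth-first."""
    if isinstance(data, dict):
        types = data.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if isinstance(types, list) and any(
            isinstance(t, str) and t in ARTICLE_TYPES for t in types
        ):
            return data
        children = list(data.values())
    elif isinstance(data, list):
        children = data
    else:
        return None
    for child in children:
        hit = find_article(child)
        if hit is not None:
            return hit
    return None


def missing_fields(parsed_blocks):
    """List the expected fields that are absent or empty on the article node."""
    article = find_article(parsed_blocks)
    if article is None:
        return list(EXPECTED_FIELDS)
    # Nulls, empty strings, empty lists, and empty objects all count as missing.
    return [f for f in EXPECTED_FIELDS if article.get(f) in (None, "", [], {})]


# Trimmed copy of the BBC block from this post.
bbc = [{
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "How to make Google put preferred sources up top when you search",
    "image": ["https://ychef.files.bbci.co.uk/1280x720/p0mxnd5b.jpg"],
    "datePublished": "2026-01-28T17:30:00.000Z",
    "author": {"@type": "Person", "name": "BBC Staff", "url": ""},
    "publisher": {"@type": "Organization", "name": "BBC"},
}]

# Hypothetical page with an empty image list and no author.
thin = [{"@graph": [
    {"@type": "WebPage"},
    {
        "@type": "NewsArticle",
        "headline": "Example",
        "image": [],
        "datePublished": "2026-06-30",
        "publisher": {"@type": "Organization", "name": "Example"},
    },
]}]

print(missing_fields(bbc))
print(missing_fields(thin))
```

This only checks top-level presence. It would not notice, for example, that the BBC author has an empty `url`.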

who's biggest, who's leanest

Biggest article payloads (single normal article, not live blogs):

| Site | Bytes | Notes |
| --- | --- | --- |
| Google News | 57,490 | aggregator read-page; mirrors source + extras |
| Wired | 26,352 | rich author/CreativeWork graph |
| NBC News | 24,582 | huge custom @context (see below) |
| Fox News | 24,362 | big site/org graph, thin article |
| Business Insider | 16,613 | |
| CNN | 16,390 | one dense NewsArticle block |

Leanest that still do it right:

| Site | Bytes |
| --- | --- |
| BBC News | 541 |
| Slate | 1,249 |
| RealClearPolitics | 1,353 |
| NPR | 1,653 |
| New York Post | 1,842 |
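Rankings like these two reduce to summing the size of each page's raw blocks and sorting. A minimal sketch, assuming "bytes" means the UTF-8 length of the raw script text (the post does not spell out the exact measurement); `payload_bytes` and `rank_by_size` are hypothetical names and the sites are placeholders.

```python
def payload_bytes(raw_blocks):
    """Total UTF-8 size of the raw ld+json text for one page."""
    return sum(len(raw.encode("utf-8")) for raw in raw_blocks)


def rank_by_size(pages):
    """pages maps site -> list of raw block strings; returns largest first."""
    sizes = {site: payload_bytes(blocks) for site, blocks in pages.items()}
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


# Placeholder pages for illustration only.
pages = {
    "small.example": ['{"a": 1}'],
    "big.example": ['{"a": 1}', '{"b": [1, 2, 3]}'],
}

for site, size in rank_by_size(pages):
    print(f"{site}: {size:,} bytes")
```

Encoding before measuring matters for non-ASCII text: a character like "é" is one character but two bytes in UTF-8.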

gaps and quirks (the "who has errors" part)

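Nobody shipped invalid JSON. But several articles were missing required fields or doing something unusual. Three stand out: Mother Jones shipped no JSON-LD at all, the Fox News article had no author, and the Vox article had no image.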

Two things that look like errors but are not:

Here is NBC's Dataset block. None of this is for search engines -- it is CMS plumbing wrapped in schema.org clothes:

{
  "@context": {
    "@vocab": "http://schema.org",
    "pageType": {
      "@id": "Text",
      "@type": "@id"
    },
    "vertical": {
      "@id": "Text",
      "@type": "@id"
    },
    "subVertical": {
      "@id": "Text",
      "@type": "@id"
    },
    "section": {
      "@id": "Text",
      "@type": "@id"
    },
    "subSection": {
      "@id": "Text",
      "@type": "@id"
    },
    "label": {
      "@id": "Text",
      "@type": "@id"
    },
    "packageId": {
      "@id": "Text",
      "@type": "@id"
    },
    "sponsor": {
      "@id": "Text",
      "@type": "@id"
    },
    "ecommerceEnabled": {
      "@id": "Text",
      "@type": "@id"
    },
    "videoPlayerCount": {
      "@id": "Text",
      "@type": "@id"
    },
    "appVersion": {
      "@id": "Text",
      "@type": "@id"
    },
    "tags": {
      "@id": "Text",
      "@type": "@id"
    },
    "gatedContentEnabled": {
      "@id": "Text",
      "@type": "@id"
    },
    "contentClassifications": {
      "@id": "Text",
      "@type": "@id"
    }
  },
  "@type": "Dataset",
  "name": "additionalTaxonomy",
  "description": "This is additional taxonomy that helps us with analytics",
  "url": "https://www.nbcnews.com/news/us-news/medicaid-disabilities-cost-trump-funding-cuts-big-beautiful-bill-rcna351975",
  "pageType": "Article",
  "vertical": "news",
  "subVertical": "",
  "section": "news",
  "subSection": "us-news",
  "label": "",
  "packageId": "",
  "sponsor": "",
  "ecommerceEnabled": false,
  "videoPlayerCount": 0,
  "appVersion": "5.819.0",
  "tags": "",
  "gatedContentEnabled": false,
  "contentClassifications": "UNRESTRICTED"
}

It validates. It is also a reminder that "has lots of JSON-LD" is not the same as "has good JSON-LD." Half of NBC's 24 KB is internal taxonomy no search engine asked for.
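One way to put a number on that kind of claim is to split a page's payload by the top-level type of each block. This is a rough sketch, not the measurement used above: `bytes_by_top_type` and `share` are hypothetical helpers, the two sample blocks are tiny stand-ins, and it only looks at separate script blocks, not at nodes inside a single block.

```python
import json


def bytes_by_top_type(raw_blocks):
    """Map each block's top-level @type to the UTF-8 bytes of its raw text."""
    totals = {}
    for raw in raw_blocks:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            label = parsed.get("@type", "(untyped)")
        else:
            label = "(array)"
        if isinstance(label, list):
            label = "+".join(str(t) for t in label)
        totals[label] = totals.get(label, 0) + len(raw.encode("utf-8"))
    return totals


def share(raw_blocks, type_name):
    """Fraction of the page's JSON-LD bytes that sit in blocks of one type."""
    totals = bytes_by_top_type(raw_blocks)
    total = sum(totals.values())
    return totals.get(type_name, 0) / total if total else 0.0


# Tiny stand-in blocks; a real page would pass its full raw script text.
article = '{"@type": "NewsArticle", "headline": "Example"}'
dataset = '{"@type": "Dataset", "name": "additionalTaxonomy"}'
raw_blocks = [article, dataset]

print(bytes_by_top_type(raw_blocks))
print(f"Dataset share: {share(raw_blocks, 'Dataset'):.0%}")
```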

patterns by org type

the full set: all 50, with the raw JSON-LD

Every site below links to the exact JSON-LD blocks I pulled, pretty-printed as a file you can open or download. Byte counts are the raw extracted size; "blocks" is how many separate ld+json script tags were on the page.

| # | Site | Blocks | Bytes | Top types | Raw JSON-LD |
| --- | --- | --- | --- | --- | --- |
| 1 | Associated Press (apnews.com) | 3 | 8,947 | NewsArticle, ImageObject, WebPage, Person, Organization | view |
| 2 | NPR (npr.org) | 1 | 1,653 | NewsArticle, Organization, ImageObject, WebPage, Person | view |
| 3 | CBS News (cbsnews.com) | 4 | 9,488 | NewsMediaOrganization, ImageObject, ContactPoint, PostalAddress, NewsArticle | view |
| 4 | ABC News (abcnews.go.com) | 3 | 1,822 | WebSite, WebPage, ImageObject, NewsArticle, Person | view |
| 5 | NBC News (nbcnews.com) | 2 | 24,582 | NewsArticle, @id, PropertyValue, ImageObject, Person | view |
| 6 | USA Today (usatoday.com) | 1 | 3,493 | NewsArticle, WebPage, ImageObject, Organization, Person | view |
| 7 | CNBC (cnbc.com) | 1 | 2,238 | NewsArticle, SpeakableSpecification, Person, NewsMediaOrganization, ImageObject | view |
| 8 | Fox News (foxnews.com) | 1 | 24,362 | WebSite, SearchAction, NewsMediaOrganization, ImageObject, ContactPoint | view |
| 9 | CNN (cnn.com) | 1 | 16,390 | NewsArticle, Person, ImageObject, Organization, WebPage | view |
| 10 | The Hill (thehill.com) | 1 | 2,669 | NewsArticle, WebPage, Organization, ImageObject, Person | view |
| 11 | Axios (axios.com) | 1 | 3,220 | NewsMediaOrganization, ImageObject, NewsArticle, WebPage, BreadcrumbList | view |
| 12 | PBS NewsHour (pbs.org/newshour) | 1 | 14,368 | VideoObject, NewsMediaOrganization, ImageObject, NewsArticle, Person | view |
| 13 | U.S. News & World Report (usnews.com) | 2 | 6,349 | NewsArticle, ImageObject, Person, WebPage, Organization | view |
| 14 | Newsweek (newsweek.com) | 1 | 5,553 | NewsArticle, WebPage, ImageObject, Person, NewsMediaOrganization | view |
| 15 | Time (time.com) | 1 | 2,297 | NewsArticle, WebPage, Person, ImageObject, Organization | view |
| 16 | Politico (politico.com) | 2 | 4,723 | NewsArticle, WebPage, Person, NewsMediaOrganization, ImageObject | view |
| 17 | RealClearPolitics (realclearpolitics.com) | 1 | 1,353 | OpinionNewsArticle, ImageObject, Organization, Person | view |
| 18 | Vox (vox.com) | 2 | 9,097 | NewsArticle, Organization, ImageObject, Person, SpeakableSpecification | view |
| 19 | Salon (salon.com) | 1 | 3,076 | Organization, ImageObject, BreadcrumbList, ListItem, Thing | view |
| 20 | The Daily Beast (thedailybeast.com) | 1 | 2,097 | NewsArticle, Organization, ImageObject, Person, WebPageElement | view |
| 21 | Mother Jones (motherjones.com) | 0 | 0 | none | n/a |
| 22 | Reason (reason.com) | 2 | 15,605 | Article, CommentAction, Thing, Person, Organization | view |
| 23 | The Atlantic (theatlantic.com) | 3 | 3,944 | WebSite, SearchAction, Organization, ImageObject, QuantitativeValue | view |
| 24 | Slate (slate.com) | 1 | 1,249 | NewsArticle, ImageObject, Person, Organization, WebPageElement | view |
| 25 | Reuters (reuters.com) | 2 | 5,261 | NewsArticle, WebPage, Person, WebPageElement, NewsMediaOrganization | view |
| 26 | BBC News (bbc.com/news) | 1 | 541 | NewsArticle, Person, Organization | view |
| 27 | Al Jazeera (aljazeera.com) | 4 | 3,171 | NewsArticle, Person, NewsMediaOrganization, ImageObject, WebPage | view |
| 28 | The Guardian US (theguardian.com/us) | 1 | 2,391 | NewsArticle, Organization, ImageObject, CreativeWork, Product | view |
| 29 | MarketWatch (marketwatch.com) | 1 | 3,690 | WebPage, ImageObject, NewsArticle, Person, WebPageElement | view |
| 30 | Fortune (fortune.com) | 1 | 1,888 | NewsArticle, @json, NewsMediaOrganization, ImageObject, Person | view |
| 31 | Business Insider (businessinsider.com) | 2 | 16,613 | NewsArticle, WebPage, Person, NewsMediaOrganization, ContactPoint | view |
| 32 | Forbes (forbes.com) | 1 | 3,072 | WebPage, SpeakableSpecification, WebPageElement, BreadcrumbList, ListItem | view |
| 33 | TechCrunch (techcrunch.com) | 1 | 5,466 | NewsArticle, SpeakableSpecification, WebPage, ReadAction, ImageObject | view |
| 34 | The Verge (theverge.com) | 2 | 4,017 | NewsArticle, Organization, ImageObject, Person, SpeakableSpecification | view |
| 35 | Ars Technica (arstechnica.com) | 2 | 2,978 | WebSite, SearchAction, EntryPoint, Organization, ImageObject | view |
| 36 | Engadget (engadget.com) | 2 | 6,134 | Article, WebPage, BreadcrumbList, ListItem, ImageObject | view |
| 37 | Wired (wired.com) | 2 | 26,352 | NewsArticle, Person, CreativeWork, WebPage, Organization | view |
| 38 | ZDNet (zdnet.com) | 1 | 7,038 | BreadcrumbList, ListItem, NewsArticle, Person, ImageObject | view |
| 39 | Yahoo News (yahoo.com/news) | 2 | 2,801 | NewsArticle, Person, ImageObject, Organization, VideoObject | view |
| 40 | Google News (news.google.com) | 5 | 57,490 | NewsArticle, ImageObject, Person, CreativeWork, Product | view |
| 41 | HuffPost (huffpost.com) | 1 | 5,058 | NewsMediaOrganization, ImageObject, ContactPoint, WebSite, WebPage | view |
| 42 | BuzzFeed News (buzzfeednews.com) | 3 | 5,620 | NewsArticle, Organization, ImageObject, Person, Comment | view |
| 43 | Daily Kos (dailykos.com) | 2 | 3,985 | WebSite, SearchAction, EntryPoint, Organization, WebPage | view |
| 44 | Mediaite (mediaite.com) | 1 | 6,911 | SiteNavigationElement, NewsArticle, WebPage, Person, ImageObject | view |
| 45 | Raw Story (rawstory.com) | 2 | 3,519 | NewsArticle, Person, ImageObject, WebPage, Organization | view |
| 46 | SFGate (sfgate.com) | 2 | 7,757 | NewsArticle, NewsMediaOrganization, ImageObject, Place, imageObject | view |
| 47 | Houston Chronicle (chron.com) | 2 | 5,701 | NewsArticle, NewsMediaOrganization, ImageObject, Place, PostalAddress | view |
| 48 | NJ.com (nj.com) | 2 | 5,810 | NewsArticle, WebPage, WebPageElement, CreativeWork, Article | view |
| 49 | New York Post (nypost.com) | 1 | 1,842 | NewsArticle, NewsMediaOrganization, ImageObject, PostalAddress, ContactPoint | view |
| 50 | ProPublica (propublica.org) | 1 | 4,931 | Article, WebPage, ReadAction, ImageObject, BreadcrumbList | view |

how this was built

The scrape, extraction, and per-page classification all ran through the same JSON-LD audit tooling I wrote about in the JSON-LD audit skill. If you want to run this on your own pages -- or on a competitor's -- that skill reads what is already on a URL, parses it, and tells you what is complete, what is broken, and what is missing. Same engine, pointed at one site instead of 50.

Want to write any of these from scratch instead? The NewsArticle example and the generator are the place to start.