> How the Top 50 US News Sites Handle Their Own JSON-LD
I scraped the JSON-LD off one article from each of the 50 biggest non-paywalled US news sites. 50/50 valid, zero syntax errors -- but real gaps: Mother Jones ships none, Fox has no author, Vox has no image. Full data and raw blocks for every site.
Every page on this site shows you JSON-LD that someone wrote on purpose. This post is the opposite: a look at what the big newsrooms actually ship, scraped straight off one recent article from each of 50 popular, non-paywalled US news sites. No cherry-picking, no cleanup. Just whatever was in the `<script type="application/ld+json">` tags the day I looked.
The short version: structured data on news sites is in good shape. 50 of 50 sites resolved, and there were zero JSON syntax errors across the entire set. But "valid" and "complete" are not the same thing, and the gaps are where it gets interesting.
how the data was gathered
For each site I grabbed the homepage, picked one recent article, pulled the raw HTML, and extracted every application/ld+json block. Most sites went through Firecrawl. A handful that Firecrawl will not serve (The Atlantic, ZDNet) or that block bots at the homepage (SFGate, Houston Chronicle) were fetched directly with a browser user-agent. ABC News renders its homepage entirely in JavaScript, so that one was driven through a real headless browser (Playwright) and the JSON-LD was read from the rendered DOM.
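The extraction step itself is small. Below is a minimal Python sketch of it using only the standard library's `html.parser`. It assumes you already have the page HTML: fetching, Firecrawl, and the Playwright rendering are out of scope, and a JavaScript-rendered page like ABC's has to be rendered first because this parser only sees the HTML it is given. `LdJsonExtractor` and `parse_ld_json` are hypothetical names for illustration, not the audit tooling used for this post.

```python
import json
from html.parser import HTMLParser


class LdJsonExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_ld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # Match the type attribute case-insensitively; valueless attributes come back as None.
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._in_ld = True
                self._buf = []

    def handle_data(self, data):
        # Script content arrives as raw text and may be split across several calls.
        if self._in_ld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self.blocks.append("".join(self._buf).strip())
            self._in_ld = False


def parse_ld_json(html):
    """Return (parsed_blocks, errors) for every ld+json block in an HTML string."""
    extractor = LdJsonExtractor()
    extractor.feed(html)
    extractor.close()
    parsed, errors = [], []
    for raw in extractor.blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(f"{exc.msg} at line {exc.lineno}, column {exc.colno}")
    return parsed, errors


page = (
    '<script type="application/ld+json">{"@type": "NewsArticle"}</script>'
    '<script>var tracking = 1;</script>'
)
print(parse_ld_json(page))  # ([{'@type': 'NewsArticle'}], [])
```

An empty `errors` list for every page is what "zero JSON syntax errors" means here: each block survived a strict `json.loads`.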
One gotcha worth calling out: four sites first served me a live blog instead of a normal article (USA Today, Al Jazeera, HuffPost, and a Google News read-page). Live blogs use LiveBlogPosting and stuff every rolling update into a liveBlogUpdate array -- HuffPost's was 124 KB on its own. Those were swapped for standard articles so the comparison is apples-to-apples.
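Catching live blogs before they skew the byte counts is easy to automate. A rough sketch, assuming each block is a raw JSON string: walk every node, and if one is typed `LiveBlogPosting`, report how many `liveBlogUpdate` entries it carries and roughly how large they are. The helper names are illustrative, and the size is measured on re-serialized JSON, so it will not match the raw page bytes exactly.

```python
import json


def iter_nodes(data):
    """Yield every dict in a parsed JSON-LD value, including @graph members and nested nodes."""
    if isinstance(data, dict):
        yield data
        for value in data.values():
            yield from iter_nodes(value)
    elif isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)


def live_blog_report(raw_block):
    """Return (is_live_blog, update_count, update_bytes) for one raw ld+json block."""
    for node in iter_nodes(json.loads(raw_block)):
        node_type = node.get("@type")
        types = node_type if isinstance(node_type, list) else [node_type]
        if "LiveBlogPosting" in types:
            updates = node.get("liveBlogUpdate", [])
            if isinstance(updates, dict):  # a single update may not be wrapped in a list
                updates = [updates]
            # Size of the re-serialized array; whitespace differs from the raw page bytes.
            size = len(json.dumps(updates, ensure_ascii=False).encode("utf-8"))
            return True, len(updates), size
    return False, 0, 0


sample = '{"@type": "LiveBlogPosting", "liveBlogUpdate": [{"@type": "BlogPosting"}, {"@type": "BlogPosting"}]}'
print(live_blog_report(sample)[:2])  # (True, 2)
```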
One caveat before the findings: this is a single article per site, captured on 2026-06-30. It is a snapshot, not a full-site audit. A site that nailed it on the article I grabbed could be inconsistent elsewhere, and vice versa. Treat the per-site calls below as "here is what this one page shipped," not a grade for the whole publication. Every raw block is linked in the table at the bottom so you can check my work.
what they (almost) all have in common
- 46 of 50 include `NewsArticle`. The other main types: ProPublica, Reason, and Engadget use plain `Article`; RealClearPolitics uses `OpinionNewsArticle`; Mother Jones uses nothing.
- Author (`Person`) and `ImageObject` appear on 48 of 50. Author and image markup are effectively universal.
- A publisher/organization block appears on 49 of 50: either `NewsMediaOrganization` (the news-specific type, 20 sites) or generic `Organization`.
- The core article fields are nearly always present. Headline, image, datePublished, author, and publisher are complete on virtually every site.
Most common types across all 50:
| Type | Sites |
|---|---|
| ImageObject | 48 |
| Person (author) | 48 |
| NewsArticle | 46 |
| WebPage | 37 |
| Organization | 31 |
| BreadcrumbList | 22 |
| NewsMediaOrganization | 20 |
| SpeakableSpecification | 11 |
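A count like this one can be sketched in a few lines of Python. The assumption is that the Sites column counts each type once per site, no matter how many times it appears on the page, so each site's types are gathered into a set before counting. The sketch also skips `@context`, so term definitions such as `{"@type": "@id"}` are not mistaken for node types. Names and sample data are hypothetical.

```python
import json
from collections import Counter


def collect_types(data, found):
    """Add every @type string found anywhere in a parsed JSON-LD value to the set `found`."""
    if isinstance(data, dict):
        node_type = data.get("@type")
        for name in (node_type if isinstance(node_type, list) else [node_type]):
            if isinstance(name, str):
                found.add(name)
        for key, value in data.items():
            if key != "@context":  # term definitions such as {"@type": "@id"} are not node types
                collect_types(value, found)
    elif isinstance(data, list):
        for item in data:
            collect_types(item, found)


def count_types_across_sites(sites):
    """sites maps a site name to its raw ld+json strings; each type is counted once per site."""
    counter = Counter()
    for raw_blocks in sites.values():
        site_types = set()
        for raw in raw_blocks:
            collect_types(json.loads(raw), site_types)
        counter.update(site_types)
    return counter


sites = {
    "site-a": ['{"@type": "NewsArticle", "author": {"@type": "Person", "name": "A"}}'],
    "site-b": ['{"@type": "NewsArticle"}', '{"@type": "NewsArticle"}'],
    "site-c": [],  # e.g. a page with no JSON-LD at all
}
print(count_types_across_sites(sites).most_common())  # [('NewsArticle', 2), ('Person', 1)]
```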
SpeakableSpecification (voice-assistant markup) on 11 sites is the surprise -- CNBC, TechCrunch, The Verge, Forbes and others still ship it even though Google deprecated the feature. It is harmless, but it is dead weight.
the leanest one that still does it right: BBC
BBC News was the standout for minimalism: 541 bytes of NewsArticle + Person + Organization and nothing else. No breadcrumbs, no speakable, no images-as-objects. Every field Google needs for an article rich result, and not one byte more.
```json
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "How to make Google put preferred sources up top when you search",
  "description": "A new feature lets you choose which publishers – including the BBC – appear at the top of your search results. Here's how to use it.",
  "image": [
    "https://ychef.files.bbci.co.uk/1280x720/p0mxnd5b.jpg"
  ],
  "datePublished": "2026-01-28T17:30:00.000Z",
  "dateModified": "2026-01-28T20:27:00.263Z",
  "author": {
    "@type": "Person",
    "name": "BBC Staff",
    "url": ""
  },
  "publisher": {
    "@type": "Organization",
    "name": "BBC"
  }
}
```
If you ever want a template for "the minimum that is actually correct," this is close to it. Compare that to the biggest payloads below.
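If you would rather generate that shape than hand-write it, here is a small Python sketch that builds the same fields as the BBC block and wraps them in a script tag. The function names and example values are placeholders; it mirrors the BBC block's structure, not any official template. One deliberate difference: it leaves out `author.url` instead of shipping it as an empty string.

```python
import json


def minimal_news_article(headline, image_urls, published, modified, author_name, publisher_name):
    """A BBC-style minimal NewsArticle: the article, one Person, one Organization."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "image": list(image_urls),
        "datePublished": published,
        "dateModified": modified,
        # author.url is simply omitted here rather than shipped as an empty string.
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@type": "Organization", "name": publisher_name},
    }


def to_script_tag(data):
    """Serialize a JSON-LD dict into an ld+json script tag."""
    # ensure_ascii=False keeps characters such as en dashes readable. Escaping "</" as
    # "<\/" (still valid JSON) stops a value like "</script>" from ending the tag early.
    body = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_script_tag(minimal_news_article(
    headline="Example headline",
    image_urls=["https://example.com/lead.jpg"],
    published="2026-01-28T17:30:00Z",
    modified="2026-01-28T20:27:00Z",
    author_name="Jane Doe",
    publisher_name="Example News",
)))
```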
who's biggest, who's leanest
Biggest article payloads (single normal article, not live blogs):
| Site | Bytes | Notes |
|---|---|---|
| Google News | 57,490 | aggregator read-page; mirrors source + extras |
| Wired | 26,352 | rich author/CreativeWork graph |
| NBC News | 24,582 | huge custom @context (see below) |
| Fox News | 24,362 | big site/org graph, thin article |
| Business Insider | 16,613 | |
| CNN | 16,390 | one dense NewsArticle block |
Leanest that still do it right:
| Site | Bytes |
|---|---|
| BBC News | 541 |
| Slate | 1,249 |
| RealClearPolitics | 1,353 |
| NPR | 1,653 |
| New York Post | 1,842 |
gaps and quirks (the "who has errors" part)
Nobody shipped invalid JSON. But several articles were missing core fields or doing something unusual:
- Mother Jones -- no JSON-LD at all. The only site of the 50 with zero structured data on the article I pulled. For a WordPress site that is a one-plugin fix; surprising for a national outlet.
- Fox News -- no author in the article schema. Headline, date, and publisher are there; the byline is not.
- Vox -- no `image` on the `NewsArticle`. Google recommends an image for article rich results, so it is a real miss.
- Daily Kos -- no `publisher`. Another core field absent.
- PBS NewsHour is video-first -- the page leads with a `VideoObject` and the `NewsArticle` is a thin wrapper, so most of the article metadata lives on the video entity.
Two things that look like errors but are not:
- NBC News defines a large custom `@context` that maps internal CMS fields to schema.org `Text` and coerces their values to IRIs (`"@type": "@id"`), then ships a second `Dataset` block literally labeled "additionalTaxonomy." It is valid JSON-LD 1.1. It is also their analytics taxonomy leaking into public structured data (see below).
- Fortune declares a `stnMeta` term as `@json` in its context: `["https://schema.org", {"stnMeta": {"@id": "https://stnvideo.com", "@type": "@json"}}]`. Also valid 1.1 (`@json` marks a JSON literal, a feature new in 1.1), just unusual.
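Neither is an error, but both are worth spotting when you read someone's markup. A small Python sketch (the helper name is hypothetical) that lists the terms a block defines inline in its `@context` and how each term's values are coerced. A plain `"https://schema.org"` context returns an empty dict. It only inspects top-level `@context` entries and skips keywords like `@vocab`.

```python
import json


def custom_context_terms(raw_block):
    """Map each term defined inline in a block's @context to its @type coercion (or None)."""
    data = json.loads(raw_block)
    nodes = data if isinstance(data, list) else [data]
    terms = {}
    for node in nodes:
        if not isinstance(node, dict):
            continue
        ctx = node.get("@context")
        for entry in (ctx if isinstance(ctx, list) else [ctx]):
            if not isinstance(entry, dict):
                continue  # plain context URLs such as "https://schema.org"
            for term, definition in entry.items():
                if term.startswith("@"):
                    continue  # keywords like @vocab, not term definitions
                terms[term] = definition.get("@type") if isinstance(definition, dict) else None
    return terms


fortune_style = ('{"@context": ["https://schema.org", {"stnMeta": {"@id": "https://stnvideo.com", '
                 '"@type": "@json"}}], "@type": "NewsArticle"}')
print(custom_context_terms(fortune_style))  # {'stnMeta': '@json'}
```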
Here is NBC's Dataset block. None of this is for search engines -- it is CMS plumbing wrapped in schema.org clothes:
```json
{
  "@context": {
    "@vocab": "http://schema.org",
    "pageType": {
      "@id": "Text",
      "@type": "@id"
    },
    "vertical": {
      "@id": "Text",
      "@type": "@id"
    },
    "subVertical": {
      "@id": "Text",
      "@type": "@id"
    },
    "section": {
      "@id": "Text",
      "@type": "@id"
    },
    "subSection": {
      "@id": "Text",
      "@type": "@id"
    },
    "label": {
      "@id": "Text",
      "@type": "@id"
    },
    "packageId": {
      "@id": "Text",
      "@type": "@id"
    },
    "sponsor": {
      "@id": "Text",
      "@type": "@id"
    },
    "ecommerceEnabled": {
      "@id": "Text",
      "@type": "@id"
    },
    "videoPlayerCount": {
      "@id": "Text",
      "@type": "@id"
    },
    "appVersion": {
      "@id": "Text",
      "@type": "@id"
    },
    "tags": {
      "@id": "Text",
      "@type": "@id"
    },
    "gatedContentEnabled": {
      "@id": "Text",
      "@type": "@id"
    },
    "contentClassifications": {
      "@id": "Text",
      "@type": "@id"
    }
  },
  "@type": "Dataset",
  "name": "additionalTaxonomy",
  "description": "This is additional taxonomy that helps us with analytics",
  "url": "https://www.nbcnews.com/news/us-news/medicaid-disabilities-cost-trump-funding-cuts-big-beautiful-bill-rcna351975",
  "pageType": "Article",
  "vertical": "news",
  "subVertical": "",
  "section": "news",
  "subSection": "us-news",
  "label": "",
  "packageId": "",
  "sponsor": "",
  "ecommerceEnabled": false,
  "videoPlayerCount": 0,
  "appVersion": "5.819.0",
  "tags": "",
  "gatedContentEnabled": false,
  "contentClassifications": "UNRESTRICTED"
}
```
It validates. It is also a reminder that "has lots of JSON-LD" is not the same as "has good JSON-LD." Half of NBC's 24 KB is internal taxonomy no search engine asked for.
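To put a number on that for any page, you can split its JSON-LD bytes by block. A rough sketch with hypothetical names: it sums the raw UTF-8 size of each block, grouped by the block's top-level `@type`. This only works when the extra data sits in its own block, as NBC's `Dataset` does; bytes spent on a large inline `@context` inside the article block are still counted under that block's type.

```python
import json


def bytes_by_block_type(raw_blocks):
    """Sum the raw UTF-8 size of a page's ld+json blocks, grouped by each block's top-level @type."""
    totals = {}
    for raw in raw_blocks:
        data = json.loads(raw)
        node_type = data.get("@type", "(no top-level @type)") if isinstance(data, dict) else "(array)"
        key = ",".join(node_type) if isinstance(node_type, list) else str(node_type)
        totals[key] = totals.get(key, 0) + len(raw.encode("utf-8"))
    return totals


def share_of(totals, block_type):
    """Fraction of the page's JSON-LD bytes that sits in blocks of one top-level type."""
    total = sum(totals.values())
    return totals.get(block_type, 0) / total if total else 0.0


blocks = [
    '{"@type": "NewsArticle", "headline": "h"}',
    '{"@type": "Dataset", "name": "additionalTaxonomy"}',
]
totals = bytes_by_block_type(blocks)
print(totals, round(share_of(totals, "Dataset"), 2))
```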
patterns by org type
- Wire services / TV networks (AP, Reuters, NBC, CBS, ABC, CNN) -- consistently complete, often with the news-specific `NewsMediaOrganization` publisher type and breadcrumbs.
- Tech press (Wired, Verge, TechCrunch, Ars, Engadget, ZDNet) -- heavy on `Person`, `CreativeWork`, breadcrumbs, and speakable; clearly SEO-tuned.
- Hearst regionals (SFGate, Houston Chronicle) -- share a template: `NewsArticle` + `NewsMediaOrganization` + `Place`/`PostalAddress` (local geo signals).
- Opinion-heavy outlets (Mother Jones, Daily Kos, Salon) -- the weakest markup of the set; this is where the gaps cluster.
the full set: all 50, with the raw JSON-LD
Every site below links to the exact JSON-LD blocks I pulled, pretty-printed as a file you can open or download. Byte counts are the raw extracted size; "blocks" is how many separate ld+json script tags were on the page.
| # | Site | Blocks | Bytes | Top types | Raw JSON-LD |
|---|---|---|---|---|---|
| 1 | Associated Press apnews.com | 3 | 8,947 | NewsArticle, ImageObject, WebPage, Person, Organization | view |
| 2 | NPR npr.org | 1 | 1,653 | NewsArticle, Organization, ImageObject, WebPage, Person | view |
| 3 | CBS News cbsnews.com | 4 | 9,488 | NewsMediaOrganization, ImageObject, ContactPoint, PostalAddress, NewsArticle | view |
| 4 | ABC News abcnews.go.com | 3 | 1,822 | WebSite, WebPage, ImageObject, NewsArticle, Person | view |
| 5 | NBC News nbcnews.com | 2 | 24,582 | NewsArticle, @id, PropertyValue, ImageObject, Person | view |
| 6 | USA Today usatoday.com | 1 | 3,493 | NewsArticle, WebPage, ImageObject, Organization, Person | view |
| 7 | CNBC cnbc.com | 1 | 2,238 | NewsArticle, SpeakableSpecification, Person, NewsMediaOrganization, ImageObject | view |
| 8 | Fox News foxnews.com | 1 | 24,362 | WebSite, SearchAction, NewsMediaOrganization, ImageObject, ContactPoint | view |
| 9 | CNN cnn.com | 1 | 16,390 | NewsArticle, Person, ImageObject, Organization, WebPage | view |
| 10 | The Hill thehill.com | 1 | 2,669 | NewsArticle, WebPage, Organization, ImageObject, Person | view |
| 11 | Axios axios.com | 1 | 3,220 | NewsMediaOrganization, ImageObject, NewsArticle, WebPage, BreadcrumbList | view |
| 12 | PBS NewsHour pbs.org/newshour | 1 | 14,368 | VideoObject, NewsMediaOrganization, ImageObject, NewsArticle, Person | view |
| 13 | U.S. News & World Report usnews.com | 2 | 6,349 | NewsArticle, ImageObject, Person, WebPage, Organization | view |
| 14 | Newsweek newsweek.com | 1 | 5,553 | NewsArticle, WebPage, ImageObject, Person, NewsMediaOrganization | view |
| 15 | Time time.com | 1 | 2,297 | NewsArticle, WebPage, Person, ImageObject, Organization | view |
| 16 | Politico politico.com | 2 | 4,723 | NewsArticle, WebPage, Person, NewsMediaOrganization, ImageObject | view |
| 17 | RealClearPolitics realclearpolitics.com | 1 | 1,353 | OpinionNewsArticle, ImageObject, Organization, Person | view |
| 18 | Vox vox.com | 2 | 9,097 | NewsArticle, Organization, ImageObject, Person, SpeakableSpecification | view |
| 19 | Salon salon.com | 1 | 3,076 | Organization, ImageObject, BreadcrumbList, ListItem, Thing | view |
| 20 | The Daily Beast thedailybeast.com | 1 | 2,097 | NewsArticle, Organization, ImageObject, Person, WebPageElement | view |
| 21 | Mother Jones motherjones.com | 0 | 0 | none | n/a |
| 22 | Reason reason.com | 2 | 15,605 | Article, CommentAction, Thing, Person, Organization | view |
| 23 | The Atlantic theatlantic.com | 3 | 3,944 | WebSite, SearchAction, Organization, ImageObject, QuantitativeValue | view |
| 24 | Slate slate.com | 1 | 1,249 | NewsArticle, ImageObject, Person, Organization, WebPageElement | view |
| 25 | Reuters reuters.com | 2 | 5,261 | NewsArticle, WebPage, Person, WebPageElement, NewsMediaOrganization | view |
| 26 | BBC News bbc.com/news | 1 | 541 | NewsArticle, Person, Organization | view |
| 27 | Al Jazeera aljazeera.com | 4 | 3,171 | NewsArticle, Person, NewsMediaOrganization, ImageObject, WebPage | view |
| 28 | The Guardian US theguardian.com/us | 1 | 2,391 | NewsArticle, Organization, ImageObject, CreativeWork, Product | view |
| 29 | MarketWatch marketwatch.com | 1 | 3,690 | WebPage, ImageObject, NewsArticle, Person, WebPageElement | view |
| 30 | Fortune fortune.com | 1 | 1,888 | NewsArticle, @json, NewsMediaOrganization, ImageObject, Person | view |
| 31 | Business Insider businessinsider.com | 2 | 16,613 | NewsArticle, WebPage, Person, NewsMediaOrganization, ContactPoint | view |
| 32 | Forbes forbes.com | 1 | 3,072 | WebPage, SpeakableSpecification, WebPageElement, BreadcrumbList, ListItem | view |
| 33 | TechCrunch techcrunch.com | 1 | 5,466 | NewsArticle, SpeakableSpecification, WebPage, ReadAction, ImageObject | view |
| 34 | The Verge theverge.com | 2 | 4,017 | NewsArticle, Organization, ImageObject, Person, SpeakableSpecification | view |
| 35 | Ars Technica arstechnica.com | 2 | 2,978 | WebSite, SearchAction, EntryPoint, Organization, ImageObject | view |
| 36 | Engadget engadget.com | 2 | 6,134 | Article, WebPage, BreadcrumbList, ListItem, ImageObject | view |
| 37 | Wired wired.com | 2 | 26,352 | NewsArticle, Person, CreativeWork, WebPage, Organization | view |
| 38 | ZDNet zdnet.com | 1 | 7,038 | BreadcrumbList, ListItem, NewsArticle, Person, ImageObject | view |
| 39 | Yahoo News yahoo.com/news | 2 | 2,801 | NewsArticle, Person, ImageObject, Organization, VideoObject | view |
| 40 | Google News news.google.com | 5 | 57,490 | NewsArticle, ImageObject, Person, CreativeWork, Product | view |
| 41 | HuffPost huffpost.com | 1 | 5,058 | NewsMediaOrganization, ImageObject, ContactPoint, WebSite, WebPage | view |
| 42 | BuzzFeed News buzzfeednews.com | 3 | 5,620 | NewsArticle, Organization, ImageObject, Person, Comment | view |
| 43 | Daily Kos dailykos.com | 2 | 3,985 | WebSite, SearchAction, EntryPoint, Organization, WebPage | view |
| 44 | Mediaite mediaite.com | 1 | 6,911 | SiteNavigationElement, NewsArticle, WebPage, Person, ImageObject | view |
| 45 | Raw Story rawstory.com | 2 | 3,519 | NewsArticle, Person, ImageObject, WebPage, Organization | view |
| 46 | SFGate sfgate.com | 2 | 7,757 | NewsArticle, NewsMediaOrganization, ImageObject, Place, imageObject | view |
| 47 | Houston Chronicle chron.com | 2 | 5,701 | NewsArticle, NewsMediaOrganization, ImageObject, Place, PostalAddress | view |
| 48 | NJ.com nj.com | 2 | 5,810 | NewsArticle, WebPage, WebPageElement, CreativeWork, Article | view |
| 49 | New York Post nypost.com | 1 | 1,842 | NewsArticle, NewsMediaOrganization, ImageObject, PostalAddress, ContactPoint | view |
| 50 | ProPublica propublica.org | 1 | 4,931 | Article, WebPage, ReadAction, ImageObject, BreadcrumbList | view |
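The Blocks, Bytes, and Top types columns above can be approximated with a short Python sketch. Assumptions: bytes are the UTF-8 size of each raw block (the tooling behind this table may measure slightly differently), and "top types" means the five most frequent `@type` values on the page. Unlike the table, which lists `@id` for NBC and `@json` for Fortune, this version skips `@context`, so those context keywords are not counted as types. Calling it with an empty list reproduces the Mother Jones row.

```python
import json
from collections import Counter


def count_types(data, counter):
    """Count every @type string in a parsed JSON-LD value, skipping @context definitions."""
    if isinstance(data, dict):
        node_type = data.get("@type")
        for name in (node_type if isinstance(node_type, list) else [node_type]):
            if isinstance(name, str):
                counter[name] += 1
        for key, value in data.items():
            if key != "@context":
                count_types(value, counter)
    elif isinstance(data, list):
        for item in data:
            count_types(item, counter)


def summary_row(raw_blocks, top_n=5):
    """Blocks, raw UTF-8 bytes, and the most frequent @type values for one page."""
    counter = Counter()
    for raw in raw_blocks:
        count_types(json.loads(raw), counter)
    return {
        "blocks": len(raw_blocks),
        "bytes": sum(len(raw.encode("utf-8")) for raw in raw_blocks),
        # Ties keep first-seen order, which Counter.most_common guarantees.
        "top_types": [name for name, _ in counter.most_common(top_n)],
    }


print(summary_row([]))  # {'blocks': 0, 'bytes': 0, 'top_types': []}
```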
how this was built
The scrape, extraction, and per-page classification all ran through the same JSON-LD audit tooling I wrote about in the JSON-LD audit skill. If you want to run this on your own pages -- or on a competitor's -- that skill reads what is already on a URL, parses it, and tells you what is complete, what is broken, and what is missing. Same engine, pointed at one site instead of 50.
Want to write any of these from scratch instead? The NewsArticle example and the generator are the place to start.