jsonld@web:/compare-json-ld-50-news-sites$ cat ./compare-json-ld-50-news-sites.jsonld


> How the Top 50 US News Sites Handle Their Own JSON-LD

I scraped the JSON-LD off one article from each of the 50 biggest non-paywalled US news sites. All 50 resolved with zero syntax errors -- but there are real gaps: Mother Jones ships none, Fox has no author, Vox has no image. Full data and raw blocks for every site.

published: Jun 30, 2026
modified: Jun 30, 2026
size: 22.8K
path: /compare-json-ld-50-news-sites/

Every page on this site shows you JSON-LD that someone wrote on purpose. This post is the opposite: a look at what the big newsrooms actually ship, scraped straight off one recent article from each of 50 popular, non-paywalled US news sites. No cherry-picking, no cleanup. Just whatever was in the <script type="application/ld+json"> tags the day I looked.

The short version: structured data on news sites is in good shape. 50 of 50 sites resolved, and there were zero JSON syntax errors across the entire set. But "valid" and "complete" are not the same thing, and the gaps are where it gets interesting.

how the data was gathered

For each site I grabbed the homepage, picked one recent article, pulled the raw HTML, and extracted every application/ld+json block. Most sites went through Firecrawl. A handful that Firecrawl will not serve (The Atlantic, ZDNet) or that block bots at the homepage (SFGate, Houston Chronicle) were fetched directly with a browser user-agent. ABC News renders its homepage entirely in JavaScript, so that one was driven through a real headless browser (Playwright) and the JSON-LD was read from the rendered DOM.
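
The extraction step described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not the tooling used for this post: the names LdJsonExtractor and extract_ld_json are made up, and it assumes the HTML has already been fetched (by Firecrawl, a direct request, or a headless browser).

```python
import json
from html.parser import HTMLParser


class LdJsonExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # raw text of each ld+json script tag
        self._buffer = None  # text chunks while inside a matching tag

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def extract_ld_json(html):
    """Return (parsed, errors): parsed JSON values, and raw strings that failed to parse."""
    parser = LdJsonExtractor()
    parser.feed(html)
    parser.close()
    parsed, errors = [], []
    for raw in parser.blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            errors.append(raw)
    return parsed, errors
```

Keeping the unparseable blocks in a separate list is what makes a "zero syntax errors" claim checkable: an empty errors list across every page is the evidence.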

One gotcha worth calling out: four sites first served me a live blog instead of a normal article (USA Today, Al Jazeera, HuffPost, and a Google News read-page). Live blogs use LiveBlogPosting and stuff every rolling update into a liveBlogUpdate array -- HuffPost's was 124 KB on its own. Those were swapped for standard articles so the comparison is apples-to-apples.
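
One way to catch that case automatically is to walk the parsed blocks, collect every @type, and flag pages where LiveBlogPosting shows up. The sketch below is hypothetical (collect_types and is_live_blog are illustrative names, not part of any library). It deliberately skips @context so that term definitions such as "@type": "@id" are not mistaken for schema.org types.

```python
def collect_types(node, found=None):
    """Recursively gather every @type value in a parsed JSON-LD structure."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if isinstance(types, list):
            found.update(t for t in types if isinstance(t, str))
        for key, value in node.items():
            # @context holds term definitions, not typed nodes, so skip it.
            if key != "@context":
                collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)
    return found


def is_live_blog(blocks):
    """True if any node in the page's JSON-LD blocks is a LiveBlogPosting."""
    return "LiveBlogPosting" in collect_types(blocks)
```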

One caveat before the findings: this is a single article per site, captured on 2026-06-30. It is a snapshot, not a full-site audit. A site that nailed it on the article I grabbed could be inconsistent elsewhere, and vice versa. Treat the per-site calls below as "here is what this one page shipped," not a grade for the whole publication. Every raw block is linked in the table at the bottom so you can check my work.

what they (almost) all have in common

46 of 50 use NewsArticle as the main type. The rest: ProPublica, Reason, and Engadget use plain Article; RealClearPolitics uses OpinionNewsArticle; Mother Jones uses nothing.
Author (Person) and ImageObject appear on 48 of 50. Author and image markup are effectively universal.
A publisher/organization block appears on 49 of 50 -- either NewsMediaOrganization (the news-specific type, 20 sites) or generic Organization.
The required Google "Article" fields (headline, image, datePublished, author, and publisher) are present on nearly every site; the exceptions are covered under gaps and quirks.

Most common types across all 50:

Type	Sites
ImageObject	48
Person (author)	48
NewsArticle	46
WebPage	37
Organization	31
BreadcrumbList	22
NewsMediaOrganization	20
SpeakableSpecification	11
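
The "Sites" column counts sites, not occurrences: a type used five times on one page still counts once. A small sketch of that tally, assuming each site has already been reduced to the set of types found on its page (sites_per_type is an illustrative name, and the three sample rows are taken from the full table at the bottom):

```python
from collections import Counter


def sites_per_type(site_types):
    """Count how many sites use each type, given a {site: set_of_types} mapping."""
    counts = Counter()
    for types in site_types.values():
        counts.update(set(types))  # set() guarantees one vote per site per type
    # Sort by count descending, then by name so ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = {
    "BBC News": {"NewsArticle", "Person", "Organization"},
    "RealClearPolitics": {"OpinionNewsArticle", "ImageObject", "Organization", "Person"},
    "Mother Jones": set(),  # no JSON-LD on the page
}
for type_name, count in sites_per_type(sample):
    print(f"{type_name}\t{count}")
```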

SpeakableSpecification (voice-assistant markup) on 11 sites is the surprise -- CNBC, TechCrunch, The Verge, Forbes and others still ship it even though Google deprecated the feature. It is harmless, but it is dead weight.

the leanest one that still does it right: BBC

BBC News was the standout for minimalism: 541 bytes of NewsArticle + Person + Organization and nothing else. No breadcrumbs, no speakable, no images-as-objects. Every field Google needs for an article rich result, and not one byte more.

{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "How to make Google put preferred sources up top when you search",
  "description": "A new feature lets you choose which publishers – including the BBC – appear at the top of your search results. Here's how to use it.",
  "image": [
    "https://ychef.files.bbci.co.uk/1280x720/p0mxnd5b.jpg"
  ],
  "datePublished": "2026-01-28T17:30:00.000Z",
  "dateModified": "2026-01-28T20:27:00.263Z",
  "author": {
    "@type": "Person",
    "name": "BBC Staff",
    "url": ""
  },
  "publisher": {
    "@type": "Organization",
    "name": "BBC"
  }
}

If you ever want a template for "the minimum that is actually correct," this is close to it. Compare that to the biggest payloads below.

who's biggest, who's leanest

Biggest article payloads (single normal article, not live blogs):

Site	Bytes	Notes
Google News	57,490	aggregator read-page; mirrors source + extras
Wired	26,352	rich author/CreativeWork graph
NBC News	24,582	huge custom @context (see below)
Fox News	24,362	big site/org graph, thin article
Business Insider	16,613
CNN	16,390	one dense NewsArticle block

Leanest that still do it right:

Site	Bytes
BBC News	541
Slate	1,249
RealClearPolitics	1,353
NPR	1,653
New York Post	1,842

gaps and quirks (the "who has errors" part)

Nobody shipped invalid JSON. But several articles were missing required fields or doing something unusual:

Mother Jones -- no JSON-LD at all. The only site of the 50 with zero structured data on the article I pulled. For a WordPress site that is a one-plugin fix; surprising for a national outlet.
Fox News -- no author in the article schema. Headline, date, and publisher are there; the byline is not.
Vox -- no image on the NewsArticle. That is a required field for Google's article rich results, so it is a real miss.
Daily Kos -- no publisher. Another required field absent.
PBS NewsHour -- video-first. The page leads with a VideoObject and the NewsArticle is a thin wrapper, so most of the article metadata lives on the video entity.
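
Gaps like these can be found mechanically. The sketch below is an illustration, not the audit tooling behind this post: it looks for the first node typed NewsArticle, Article, or OpinionNewsArticle and reports which of the five fields discussed earlier are absent or empty. The names ARTICLE_TYPES, CHECKED_FIELDS, and missing_article_fields are assumptions made for the example.

```python
ARTICLE_TYPES = {"NewsArticle", "Article", "OpinionNewsArticle"}
CHECKED_FIELDS = ("headline", "image", "datePublished", "author", "publisher")


def iter_nodes(node):
    """Yield every dict in a parsed JSON-LD structure, skipping @context."""
    if isinstance(node, dict):
        yield node
        for key, value in node.items():
            if key != "@context":
                yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


def node_types(node):
    """Return a node's @type as a set, whether it was a string or a list."""
    types = node.get("@type")
    if isinstance(types, str):
        return {types}
    if isinstance(types, list):
        return {t for t in types if isinstance(t, str)}
    return set()


def missing_article_fields(blocks):
    """List checked fields that are absent or empty on the first article node.

    Returns None when the page has no article node at all.
    """
    for node in iter_nodes(blocks):
        if node_types(node) & ARTICLE_TYPES:
            return [f for f in CHECKED_FIELDS if node.get(f) in (None, "", [], {})]
    return None
```

A return value of None corresponds to the Mother Jones case (no article markup to inspect), while a list such as ["author"] corresponds to the Fox News case.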

Two things that look like errors but are not:

NBC News defines a large custom @context that types internal CMS fields as @id, then ships a second Dataset block literally labeled "additionalTaxonomy." It is valid JSON-LD 1.1. It is also their analytics taxonomy leaking into public structured data (see below).
Fortune declares a stnMeta term as @json in its context: ["https://schema.org", {"stnMeta": {"@id": "https://stnvideo.com", "@type": "@json"}}]. Also valid 1.1, just unusual.
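
Both quirks live in @context rather than in the data, so a quick way to spot them is to list the terms a block defines locally. A rough sketch, assuming the block has already been parsed into a Python dict (custom_context_terms is a made-up helper, and it only inspects inline contexts; it does not fetch or expand remote ones):

```python
def custom_context_terms(block):
    """Map each locally defined @context term to its declared @type (or None)."""
    context = block.get("@context")
    entries = context if isinstance(context, list) else [context]
    terms = {}
    for entry in entries:
        if not isinstance(entry, dict):
            continue  # a plain string like "https://schema.org" is a remote context
        for term, definition in entry.items():
            if term.startswith("@"):
                continue  # keywords such as @vocab are not term definitions
            if isinstance(definition, dict):
                terms[term] = definition.get("@type")
            else:
                terms[term] = None  # simple term-to-IRI mapping, no type coercion
    return terms


fortune = {
    "@context": [
        "https://schema.org",
        {"stnMeta": {"@id": "https://stnvideo.com", "@type": "@json"}},
    ],
    "@type": "NewsArticle",
}
print(custom_context_terms(fortune))
```

A block whose @context is just the schema.org URL returns an empty dict, which is what most of the 50 sites would produce.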

Here is NBC's Dataset block. None of this is for search engines -- it is CMS plumbing wrapped in schema.org clothes:

{
  "@context": {
    "@vocab": "http://schema.org",
    "pageType": {
      "@id": "Text",
      "@type": "@id"
    },
    "vertical": {
      "@id": "Text",
      "@type": "@id"
    },
    "subVertical": {
      "@id": "Text",
      "@type": "@id"
    },
    "section": {
      "@id": "Text",
      "@type": "@id"
    },
    "subSection": {
      "@id": "Text",
      "@type": "@id"
    },
    "label": {
      "@id": "Text",
      "@type": "@id"
    },
    "packageId": {
      "@id": "Text",
      "@type": "@id"
    },
    "sponsor": {
      "@id": "Text",
      "@type": "@id"
    },
    "ecommerceEnabled": {
      "@id": "Text",
      "@type": "@id"
    },
    "videoPlayerCount": {
      "@id": "Text",
      "@type": "@id"
    },
    "appVersion": {
      "@id": "Text",
      "@type": "@id"
    },
    "tags": {
      "@id": "Text",
      "@type": "@id"
    },
    "gatedContentEnabled": {
      "@id": "Text",
      "@type": "@id"
    },
    "contentClassifications": {
      "@id": "Text",
      "@type": "@id"
    }
  },
  "@type": "Dataset",
  "name": "additionalTaxonomy",
  "description": "This is additional taxonomy that helps us with analytics",
  "url": "https://www.nbcnews.com/news/us-news/medicaid-disabilities-cost-trump-funding-cuts-big-beautiful-bill-rcna351975",
  "pageType": "Article",
  "vertical": "news",
  "subVertical": "",
  "section": "news",
  "subSection": "us-news",
  "label": "",
  "packageId": "",
  "sponsor": "",
  "ecommerceEnabled": false,
  "videoPlayerCount": 0,
  "appVersion": "5.819.0",
  "tags": "",
  "gatedContentEnabled": false,
  "contentClassifications": "UNRESTRICTED"
}

It validates. It is also a reminder that "has lots of JSON-LD" is not the same as "has good JSON-LD." Half of NBC's 24 KB is internal taxonomy no search engine asked for.

patterns by org type

Wire services / TV networks (AP, Reuters, NBC, CBS, ABC, CNN) -- consistently complete, often with the news-specific NewsMediaOrganization publisher type and breadcrumbs.
Tech press (Wired, Verge, TechCrunch, Ars, Engadget, ZDNet) -- heavy on Person, CreativeWork, breadcrumbs, and speakable; clearly SEO-tuned.
Hearst regionals (SFGate, Houston Chronicle) -- share a template: NewsArticle + NewsMediaOrganization + Place/PostalAddress (local geo signals).
Digital-native opinion (Mother Jones, Daily Kos, Salon) -- the weakest markup of the set; this is where the gaps cluster.

the full set: all 50, with the raw JSON-LD

Every site below links to the exact JSON-LD blocks I pulled, pretty-printed as a file you can open or download. Byte counts are the raw extracted size; "blocks" is how many separate ld+json script tags were on the page.

#	Site	Blocks	Bytes	Top types	Raw JSON-LD
1	Associated Press apnews.com	3	8,947	NewsArticle, ImageObject, WebPage, Person, Organization	view
2	NPR npr.org	1	1,653	NewsArticle, Organization, ImageObject, WebPage, Person	view
3	CBS News cbsnews.com	4	9,488	NewsMediaOrganization, ImageObject, ContactPoint, PostalAddress, NewsArticle	view
4	ABC News abcnews.go.com	3	1,822	WebSite, WebPage, ImageObject, NewsArticle, Person	view
5	NBC News nbcnews.com	2	24,582	NewsArticle, @id, PropertyValue, ImageObject, Person	view
6	USA Today usatoday.com	1	3,493	NewsArticle, WebPage, ImageObject, Organization, Person	view
7	CNBC cnbc.com	1	2,238	NewsArticle, SpeakableSpecification, Person, NewsMediaOrganization, ImageObject	view
8	Fox News foxnews.com	1	24,362	WebSite, SearchAction, NewsMediaOrganization, ImageObject, ContactPoint	view
9	CNN cnn.com	1	16,390	NewsArticle, Person, ImageObject, Organization, WebPage	view
10	The Hill thehill.com	1	2,669	NewsArticle, WebPage, Organization, ImageObject, Person	view
11	Axios axios.com	1	3,220	NewsMediaOrganization, ImageObject, NewsArticle, WebPage, BreadcrumbList	view
12	PBS NewsHour pbs.org/newshour	1	14,368	VideoObject, NewsMediaOrganization, ImageObject, NewsArticle, Person	view
13	U.S. News & World Report usnews.com	2	6,349	NewsArticle, ImageObject, Person, WebPage, Organization	view
14	Newsweek newsweek.com	1	5,553	NewsArticle, WebPage, ImageObject, Person, NewsMediaOrganization	view
15	Time time.com	1	2,297	NewsArticle, WebPage, Person, ImageObject, Organization	view
16	Politico politico.com	2	4,723	NewsArticle, WebPage, Person, NewsMediaOrganization, ImageObject	view
17	RealClearPolitics realclearpolitics.com	1	1,353	OpinionNewsArticle, ImageObject, Organization, Person	view
18	Vox vox.com	2	9,097	NewsArticle, Organization, ImageObject, Person, SpeakableSpecification	view
19	Salon salon.com	1	3,076	Organization, ImageObject, BreadcrumbList, ListItem, Thing	view
20	The Daily Beast thedailybeast.com	1	2,097	NewsArticle, Organization, ImageObject, Person, WebPageElement	view
21	Mother Jones motherjones.com	0	0	none	n/a
22	Reason reason.com	2	15,605	Article, CommentAction, Thing, Person, Organization	view
23	The Atlantic theatlantic.com	3	3,944	WebSite, SearchAction, Organization, ImageObject, QuantitativeValue	view
24	Slate slate.com	1	1,249	NewsArticle, ImageObject, Person, Organization, WebPageElement	view
25	Reuters reuters.com	2	5,261	NewsArticle, WebPage, Person, WebPageElement, NewsMediaOrganization	view
26	BBC News bbc.com/news	1	541	NewsArticle, Person, Organization	view
27	Al Jazeera aljazeera.com	4	3,171	NewsArticle, Person, NewsMediaOrganization, ImageObject, WebPage	view
28	The Guardian US theguardian.com/us	1	2,391	NewsArticle, Organization, ImageObject, CreativeWork, Product	view
29	MarketWatch marketwatch.com	1	3,690	WebPage, ImageObject, NewsArticle, Person, WebPageElement	view
30	Fortune fortune.com	1	1,888	NewsArticle, @json, NewsMediaOrganization, ImageObject, Person	view
31	Business Insider businessinsider.com	2	16,613	NewsArticle, WebPage, Person, NewsMediaOrganization, ContactPoint	view
32	Forbes forbes.com	1	3,072	WebPage, SpeakableSpecification, WebPageElement, BreadcrumbList, ListItem	view
33	TechCrunch techcrunch.com	1	5,466	NewsArticle, SpeakableSpecification, WebPage, ReadAction, ImageObject	view
34	The Verge theverge.com	2	4,017	NewsArticle, Organization, ImageObject, Person, SpeakableSpecification	view
35	Ars Technica arstechnica.com	2	2,978	WebSite, SearchAction, EntryPoint, Organization, ImageObject	view
36	Engadget engadget.com	2	6,134	Article, WebPage, BreadcrumbList, ListItem, ImageObject	view
37	Wired wired.com	2	26,352	NewsArticle, Person, CreativeWork, WebPage, Organization	view
38	ZDNet zdnet.com	1	7,038	BreadcrumbList, ListItem, NewsArticle, Person, ImageObject	view
39	Yahoo News yahoo.com/news	2	2,801	NewsArticle, Person, ImageObject, Organization, VideoObject	view
40	Google News news.google.com	5	57,490	NewsArticle, ImageObject, Person, CreativeWork, Product	view
41	HuffPost huffpost.com	1	5,058	NewsMediaOrganization, ImageObject, ContactPoint, WebSite, WebPage	view
42	BuzzFeed News buzzfeednews.com	3	5,620	NewsArticle, Organization, ImageObject, Person, Comment	view
43	Daily Kos dailykos.com	2	3,985	WebSite, SearchAction, EntryPoint, Organization, WebPage	view
44	Mediaite mediaite.com	1	6,911	SiteNavigationElement, NewsArticle, WebPage, Person, ImageObject	view
45	Raw Story rawstory.com	2	3,519	NewsArticle, Person, ImageObject, WebPage, Organization	view
46	SFGate sfgate.com	2	7,757	NewsArticle, NewsMediaOrganization, ImageObject, Place, imageObject	view
47	Houston Chronicle chron.com	2	5,701	NewsArticle, NewsMediaOrganization, ImageObject, Place, PostalAddress	view
48	NJ.com nj.com	2	5,810	NewsArticle, WebPage, WebPageElement, CreativeWork, Article	view
49	New York Post nypost.com	1	1,842	NewsArticle, NewsMediaOrganization, ImageObject, PostalAddress, ContactPoint	view
50	ProPublica propublica.org	1	4,931	Article, WebPage, ReadAction, ImageObject, BreadcrumbList	view
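
For anyone reproducing the Blocks and Bytes columns, here is one plausible way to compute them. It assumes "raw extracted size" means the UTF-8 byte length of each script tag's contents, summed per page. That is my reading of the column, not a documented definition, and payload_stats is an illustrative name.

```python
def payload_stats(raw_blocks):
    """Summarize a page's ld+json payload from the raw text of each script tag."""
    sizes = [len(raw.encode("utf-8")) for raw in raw_blocks]
    total = sum(sizes)
    # Share of the page's JSON-LD taken by each block (0.0 when there is none).
    shares = [size / total if total else 0.0 for size in sizes]
    return {"blocks": len(raw_blocks), "bytes": total, "shares": shares}
```

The per-block shares are what back up a statement like "half of this page's payload is one block": each share is that block's bytes divided by the page total.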

how this was built

The scrape, extraction, and per-page classification all ran through the same JSON-LD audit tooling I wrote about in the JSON-LD audit skill. If you want to run this on your own pages -- or on a competitor's -- that skill reads what is already on a URL, parses it, and tells you what is complete, what is broken, and what is missing. Same engine, pointed at one site instead of 50.

Want to write any of these from scratch instead? The NewsArticle example and the generator are the places to start.