SUN vs NotebookLM and Spotify+Anthropic

AI can generate a podcast in seconds. But generating one a person actually wants to listen to is a much harder problem, and it's the one that matters for commute, errand, and everyday listening.
To find out where SUN really stands, we ran a large, fair head-to-head test of AI-generated podcasts: 600 blind listener-preference comparisons across 100 topics. It pitted SUN's two pipelines against the strongest publicly available competitors, Google's NotebookLM and the Save-to-Spotify pipeline running on Anthropic's flagship Claude Opus 4.7 model.
Every script was judged blind, head-to-head, by a state-of-the-art language model acting as a listener. Names of the products were never shown. The only question: which of these two anonymized podcasts would you keep listening to?
The headline result
Sun Course finished first by every metric we measured.
Pairwise listener-preference comparisons across variants, judged by OpenAI GPT-5.5 (reasoning effort: high).

Bradley-Terry scores across the four pipelines, with 95% confidence intervals. Higher is better. Sun Course leads decisively; Save-to-Spotify is anchored at the bottom.
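For readers unfamiliar with Bradley-Terry scores: the model gives each pipeline a positive strength, and the probability that one pipeline is preferred over another is its strength divided by the sum of the two. The sketch below fits those strengths with a simple iterative update. It is an illustration only, not the analysis code behind the chart. The function names and the three-item win counts are hypothetical, and confidence intervals are not computed.

```python
def fit_bradley_terry(wins, iters=200, tol=1e-10):
    """Fit Bradley-Terry strengths from a square win-count matrix.

    wins[i][j] is the number of times item i was preferred over item j.
    Assumes every item has at least one win; otherwise its strength
    collapses to zero and the update can divide by zero.
    """
    n = len(wins)
    strength = [1.0] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        scale = sum(updated)
        updated = [s / scale for s in updated]  # normalize so strengths sum to 1
        converged = max(abs(a - b) for a, b in zip(updated, strength)) < tol
        strength = updated
        if converged:
            break
    return strength


def win_probability(strength, i, j):
    """Model probability that item i is preferred over item j."""
    return strength[i] / (strength[i] + strength[j])


# Hypothetical toy data (NOT the study's results): three items, 10 comparisons per pair.
toy_wins = [
    [0, 6, 8],
    [4, 0, 7],
    [2, 3, 0],
]
toy_strengths = fit_bradley_terry(toy_wins)
print([round(s, 3) for s in toy_strengths])
```

With only two items the fit reduces to the raw win share, so a 68-to-32 split maps to a 0.68 modeled preference probability. With more items, each score pools evidence from every matchup the item appears in.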
Sun Course beat NotebookLM 68 to 32 and Save-to-Spotify 79 to 21 across the 100 topics. Both margins are far larger than chance alone would plausibly produce, and both remain statistically significant after Bonferroni correction for multiple comparisons.
Sun Podcast, SUN's podcast-style pipeline, finished second in the overall ranking and stayed directionally ahead of both external competitors, though its margins were smaller and would need a larger sample to confirm statistically.
Nearly two out of three matchups go to SUN
Across every matchup between a SUN pipeline and an external competitor, 400 blind listener-preference judgments in total, SUN won 255. That is a 63.75% aggregate win rate against the best AI audio products publicly available today.

Across all SUN-vs-external matchups, SUN pipelines won 255 of 400.
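The counts quoted in this article follow directly from the design of four pipelines compared pairwise on 100 topics. The short check below reproduces them. The dictionary layout and variable names are illustrative assumptions; the only figures taken from the article are the pipeline list, the topic count, and the 255 SUN wins.

```python
from itertools import combinations

# Pipeline name -> which side it belongs to (labels are illustrative).
PIPELINES = {
    "Sun Course": "SUN",
    "Sun Podcast": "SUN",
    "NotebookLM": "external",
    "Save-to-Spotify": "external",
}
TOPICS = 100
SUN_WINS = 255  # reported SUN wins across SUN-vs-external matchups

# Every unordered pair of pipelines is compared once per topic.
pairs = list(combinations(PIPELINES, 2))
cross_pairs = [(a, b) for a, b in pairs if PIPELINES[a] != PIPELINES[b]]

total_comparisons = len(pairs) * TOPICS      # 6 pairs x 100 topics
sun_vs_external = len(cross_pairs) * TOPICS  # 4 pairs x 100 topics
win_rate = SUN_WINS / sun_vs_external

print(total_comparisons, sun_vs_external, f"{win_rate:.2%}")
```

The remaining 200 comparisons are the two same-side matchups, Sun Course against Sun Podcast and NotebookLM against Save-to-Spotify, which do not enter the aggregate win rate.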
Why does this happen?
The win doesn't come from sounding fancier or packing in more facts. When we look at what separates the scripts listeners stay with from the ones they drift away from, the difference tracks how listenable a script is (how clearly its ideas are sequenced and oriented for the ear) far more than how much it manages to say.
That reflects how we think about the problem. A course isn't meant to be read; it's meant to be heard. We studied what makes people finish a podcast, and what makes them quietly drift away somewhere in the middle, and we built that understanding into how every Sun Course is made, from the foundation up, not as a finishing pass.
The result is audio that's genuinely easier to follow, from the moment you press play to the moment it ends. You stop working to keep up, and start to actually listen.
A great course doesn't ask more of you. It asks less.
How the test was run
- 100 topics, balanced across nine content categories and four target durations (5, 15, 30, 45 minutes).
- Four pipelines: Sun Course and Sun Podcast (SUN's two production pipelines), NotebookLM (Google), and Save-to-Spotify on Claude Opus 4.7 (Anthropic).
- 600 blind pairwise comparisons: every pipeline against every other on every topic. No cached judgments were reused.
- The judge was a state-of-the-art language model (OpenAI GPT-5.5, reasoning effort: high) acting as a blinded listener. It saw only "Podcast A" and "Podcast B," never the product names.
- Statistical inference used exact two-sided sign tests on the matched-topic outcomes, with Bonferroni correction for multiple comparisons.
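As a rough illustration of the last step, the sketch below computes an exact two-sided sign test and applies a Bonferroni adjustment to the two Sun Course results reported above. Two things here are assumptions rather than details from the study: the family size of 6 (one test per pipeline pair) and the absence of ties, which matches the reported 68 + 32 and 79 + 21 totals. The function names are hypothetical.

```python
from math import comb


def sign_test_two_sided(wins, losses):
    """Exact two-sided sign test against a 50/50 null (ties excluded)."""
    n = wins + losses
    k = max(wins, losses)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail and cap at 1.
    return min(1.0, 2 * tail)


def bonferroni(p_value, num_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * num_tests)


NUM_TESTS = 6  # assumed family size: one test per pipeline pair

for label, wins, losses in [
    ("Sun Course vs NotebookLM", 68, 32),
    ("Sun Course vs Save-to-Spotify", 79, 21),
]:
    p = sign_test_two_sided(wins, losses)
    print(f"{label}: p = {p:.2e}, adjusted = {bonferroni(p, NUM_TESTS):.2e}")
```

Bonferroni is a conservative choice: multiplying each p-value by the number of tests keeps the chance of any false positive across the whole family below the chosen threshold, at the cost of some power for smaller margins.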
What's next
We will follow up with a human listening panel to check whether the AI-judge result holds with real listeners, and we will publish those results when they are ready.
© SUN · 2026-05-20 · sun.ai