How does ChatGPT decide what to cite?
ChatGPT blends three sources: its training data (weighted by how often and how prominently a brand appears), a licensed Reddit feed that biases it toward high-karma threads, and a live browsing tool that pulls top-ranking search results (typically from Bing) for fresh queries. Volume and authority across those three decide who gets named.
Think of these as three layers. The first and biggest is the training corpus: anything published on the indexed web before the model's cutoff date is fair game, weighted heavily toward frequency (brands mentioned in thousands of contexts get cited more) and authority (news and editorial coverage beats random blogs).
The second layer is OpenAI's data licensing deal with Reddit, in force since 2024. ChatGPT disproportionately pulls from threads with high scores and long comment chains, which is why Reddit SEO has become the single most outsized lever in AI visibility right now.
The third layer is the browsing tool for real-time queries. For anything that needs fresh data, ChatGPT runs a live search (typically Bing) and cites the top-ranked results. So traditional SEO still matters — just for a smaller subset of queries than before.