How Do LLM Citations Work? (And Why Care for GEO/AEO)
Current GEO strategies tend to focus on optimizing for citations. But do we actually understand how those work?
Generative AI platforms like ChatGPT, Claude, and Gemini often provide references for users to click and find more information.
These links are called citations. They may appear inline within the answer, in a separate panel (often to the right of the answer itself), or both.
Generative Engine Optimization (GEO) services often prioritize “citation optimization,” but few businesses understand how AI citations actually work.
Understanding the algorithm behind being cited is key to creating an effective GEO strategy. But first, here are a couple of free and unique SEO+AI education opportunities to join this week:
Please JOIN US to discuss this topic LIVE on LinkedIn (and bring your questions!)
Please ask your questions in this week’s AMA on Reddit hosted by SE Ranking rep Bogdan Krupin!
Here’s what we know about AI citations:
Where do LLMs find citations?
There is no official data answering this question. Based on overlaps between citation visibility and search rankings, as well as on independent tests, here is what we know so far:
ChatGPT*, Gemini (and AI Mode), and Grok use Google search
Claude and Perplexity use Brave search
ChatGPT* also has long-term agreements with top publications which it can cite without searching any external engines.
Maintaining high rankings in both of those search engines increases your site’s chances of being cited.
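The overlap between citations and search rankings can be checked on your own data. Here is a minimal sketch, assuming you have already collected (by hand or with your own tooling) the URLs an AI answer cited and the ranked results a search engine returned for the same query. The function names and sample URLs are hypothetical, and no AI platform or search engine API is called:

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial differences don't hide a match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")  # requires Python 3.9+
    path = parts.path.rstrip("/")
    return host + path


def citation_overlap(cited_urls, ranked_urls, top_n=10):
    """Share of cited URLs that also appear in the top_n search results."""
    cited = {normalize(u) for u in cited_urls}
    ranked = {normalize(u) for u in ranked_urls[:top_n]}
    if not cited:
        return 0.0
    return len(cited & ranked) / len(cited)


# Hypothetical sample data for one prompt/query pair.
cited = ["https://www.example.com/guide/", "https://blog.example.org/post"]
ranked = ["https://example.com/guide", "https://other.example.net/"]
print(citation_overlap(cited, ranked))
```

A high overlap across many prompts would be consistent with an LLM relying on that search engine, though it does not prove it.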
What types of LLM citations are there?
Again, we don’t have any official statements as to how AI citations work, but from patents and independent studies, we know about these four types of AI citations:
1. Grounded citations
These are AI citations that influence the answer itself, i.e., LLMs run searches, “read” the content of the pages they found, and synthesize the answer from those pages.
In this case, if found and cited, your content becomes part of an AI answer.
2. Ungrounded (reverse) citations
I refer to these types of citations as “reverse citations” because the process is reversed here: an LLM pulls an answer from its training data (it already “knows” the answer) and then finds URLs that support this answer.
This process likely exists to keep the AI answers more accurate and unbiased.
In this case, citations do not influence the answer content. To be surfaced in an answer, your business needs to be known as a solution to an underlying problem or an answer to a prompt.
We don’t know how many citations are reverse, or how likely each specific LLM is to give answers before searching. The recent NY Times article indicates that half of Gemini citations are ungrounded, but it doesn’t cite the source of that data.
3. Invisible citations
In many cases, URLs are found, retrieved, “read” by an AI agent, and used to create an answer, but they never get cited.
The recent study from Ahrefs indicates that about half of URLs retrieved by ChatGPT remain uncited, but invisible citations can differ based on the source. For example, Reddit threads almost always impact an answer, but very few of them are cited.
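To see how invisible citations vary by source in your own tests, you can compare what was retrieved with what was cited. This is a rough sketch, assuming you have access to the list of URLs retrieved for an answer (not every platform exposes this). The function name and sample URLs are hypothetical:

```python
from collections import defaultdict
from urllib.parse import urlsplit


def invisible_share_by_domain(retrieved_urls, cited_urls):
    """For each domain, the fraction of retrieved URLs that were never cited."""
    cited = set(cited_urls)
    totals = defaultdict(int)
    uncited = defaultdict(int)
    for url in set(retrieved_urls):  # de-duplicate repeated retrievals
        domain = urlsplit(url).netloc.lower()
        totals[domain] += 1
        if url not in cited:
            uncited[domain] += 1
    return {domain: uncited[domain] / totals[domain] for domain in totals}


# Hypothetical sample data for one answer.
retrieved = [
    "https://forum.example.com/t/1",
    "https://forum.example.com/t/2",
    "https://shop.example.com/a",
]
cited = ["https://shop.example.com/a"]
print(invisible_share_by_domain(retrieved, cited))
```

In this sample, both forum threads were retrieved but never cited, while the shop page was both retrieved and cited.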
4. “Ghost” citations
This is when getting cited doesn’t help your brand appear in the answer. It can be a result of the “reverse citation” process. It can also happen because your own cited content didn’t do a good job explaining how your business or product helps solve the underlying problem, i.e., your brand wasn’t part of your own content’s context.
According to Kevin Indig, about 60% of citations are “ghost citations” that fail to get the brand included in the answer.
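You can estimate your own ghost citation rate from a sample of AI answers. This is a minimal sketch, assuming each answer is stored as a dict with its text and cited URLs. The data layout, the brand “Acme”, and the domain are all hypothetical, and the brand check is a naive substring match:

```python
from urllib.parse import urlsplit


def cites_domain(cited_urls, own_domain):
    """True if any cited URL is hosted on own_domain (ignoring a www. prefix)."""
    return any(
        urlsplit(u).netloc.lower().removeprefix("www.") == own_domain
        for u in cited_urls
    )


def ghost_rate(answers, brand, own_domain):
    """Share of answers citing own_domain whose text never mentions the brand."""
    owned = [a for a in answers if cites_domain(a["citations"], own_domain)]
    if not owned:
        return 0.0
    ghosts = [a for a in owned if brand.lower() not in a["text"].lower()]
    return len(ghosts) / len(owned)


# Hypothetical sample: two answers cite the brand's site, one mentions the brand.
answers = [
    {"text": "Acme Boots are a solid pick for long trips.",
     "citations": ["https://www.acme.example/boots"]},
    {"text": "Look for cushioned soles and a wide toe box.",
     "citations": ["https://acme.example/guide"]},
    {"text": "Try several pairs in store.",
     "citations": ["https://other.example/tips"]},
]
print(ghost_rate(answers, brand="Acme", own_domain="acme.example"))
```

Tracking this rate alongside the raw citation count shows whether your citations are actually putting your brand into the answer.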
What are the actionable applications?
Knowing how LLM citations work can help you prioritize your GEO strategy better, as well as create content that helps your business’s AI visibility:
Being cited is not a priority
Many GEO strategies focus on the number of “owned” citations for important prompts, i.e., they measure the strategy’s success by how many times your own site is cited. Knowing that citations may not influence an answer or, vice versa, that a URL can be used to create an answer but never cited, helps shift the priority from getting cited to being visible in the answer.
Optimizing for AI citations is still a good idea, but your content strategy should focus on putting your products in context. This way, even if your URL isn’t cited, the answer will still include your product.
Being present in the training data is fundamental
LLMs may or may not search before giving an answer. They may search only after they have already created an answer, and they may or may not use a retrieved page’s content in that answer. Being associated in the training data with the specific questions people ask is the only way to ensure consistent visibility in AI answers.
The training data often defines how LLMs search, too. For example, if you prompt ChatGPT with “Best shoes for long-distance traveling”, it will search for specific brands and product names it already knows.
Your GEO strategy should not be all about getting cited. It should prioritize building a brand that is known for solving specific problems.





