SEO isn’t dead—it’s evolving. And understanding how AI models think about your brand is quickly becoming just as important as what you rank for in traditional search.
My guest today, Dan Petrovic, is the managing director of DEJAN, a machine learning specialist, and one of the most forward-thinking voices in modern SEO. Dan has been pushing the boundaries of the field for decades—and famously predicted back in 2013 that we’d be chatting with Google by 2023. He wasn’t wrong.
In this conversation, Dan breaks down the mechanics of AI visibility, including what he calls primary bias—the internal leanings models hold about brands before a single search result appears. Dan demystifies the difference between model memory and grounded search results, and why that distinction matters more than most marketers realize. He explains how geographic signals can undermine your global AI visibility, and how one client fixed the issue with a simple address change. We also dig into practical tools—including TreeWalker and AIRank—that expose how models actually perceive your brand.
If you’ve been optimizing for search engines, this episode shows you how to optimize for the AI systems now standing between your content and your audience.
In This Episode
- [07:05] – Dan Petrovic mentions his agency’s development of new tools and methods, and the positive reception from the SEO community.
- [13:20] – Dan describes the process of tracking model visibility and the importance of understanding the associations made by AI models.
- [29:17] – Dan emphasizes the importance of understanding the primary biases and decision-making processes of AI models.
- [31:42] – Dan shares a case study of a client whose brand was not recommended by AI models due to its European origins and the need to Americanize the brand.
- [34:18] – Dan emphasizes the importance of understanding the model’s internal perception and making necessary adjustments to improve visibility.
- [41:03] – Dan advises on optimizing content for both human readers and AI models, ensuring that the information is clear and self-contained.
- [56:53] – Dan discusses the potential impact of AI on traditional websites and the emergence of new interfaces for user interactions.
Dan, it’s so great to have you on the show.
Thank you very much for inviting me. I’m excited to share some thoughts with you.
Awesome. Well, I’m excited to hear them and to discuss the rapidly changing world of SEO and AI and how this is causing you to pivot your agency and all your clients.
So let’s first start with your origin story. You’ve been in this industry for a long time. I think decades for sure. I don’t know exactly how many decades, but it’s been a while. And I actually think I knew you when I was still based in New Zealand, which was like two thousand, and now I’m in Israel. So yeah, let’s go through your origin story.
It’s hard to say, yeah, like I remember your book. I think there was a mention. That’s how we cross paths. I think you mentioned something that I did a long time ago in your book. And I still have it somewhere on my shelf, with your signature. How nice is that?
So yeah, we go way back. We’ve seen everything that’s happened in SEO since basically the start of Google, even before Google, through all the search engines’ transformations. So we’ve seen the evolution. We’ve had that benefit as sort of senior members of the industry: we’ve seen things come and go and new things arrive. My agency evolved from, I guess, reputation and authority building. We found our moment in link building, primarily because I had a very large private link network in the early days, maybe 2000, 2001, 2002, a thousand-domain network, properly private, nobody knew about it, and I made a ton of money that way. But I didn’t see sustainability in such practices.

So I decided to start an agency, but I started losing money. And this was a bit of a learning curve: going too big, hiring a lot of people, having lots of offices, learning business lessons. So we’ve settled on a configuration that makes us not quite an agency, but a collective of senior SEO consultants working together on very large, complex projects. That’s the current configuration. That’s the long story short, because this episode, I suppose, is not about our history, but about where we’re at now.
Yeah, and where we’re going.
Yeah, exactly. And look, the way things were configured internally, we closed down all the offices, we discouraged flying.
We encouraged meetings to be via video. We were like early adopters of all that. We adopted Skype day one, you know, saying, “Let’s not fly.” And then carbon emissions became a thing. And then COVID came and just sealed the deal. Basically, it was okay if your kid came into the meeting and was just in the background. You know, before that was a scandal.
So we basically discourage people from commuting to offices. We can work from home if we want to work from home, and my guys can work from a serviced office if they want that experience, or if they can’t focus at home. So it’s complete freedom.
And so we’ve set it up so people have the time to think about things rather than being stuck in traffic or, you know, this and that. So I think it was serendipity that when COVID happened, it normalized all these practices we already had in place. Everything we were doing became normal, and we were like, “Yep, this is great. The world’s finally catching up with all these ideas.”
AI models are savant geniuses, but like a child: super intelligent, but very gullible and naive. Share on X

But so in 2013, going a little bit back in history, I think it was inspired by Rand Fishkin, I always like to predict things. And I was like, “Let’s make a bold prediction.” And I said, “Well, we’ll chat to Google in 2023. We’ll send it on a task, it’ll come back, pick up some information, do research, and summarize things for us.” And one reader of the blog post I published said, “What do you mean? Google will be like a chatbot?” And I’m like, “Yeah, that’s exactly what it’ll be like.” And then 2023 came, and it was like, “Yeah, man.”
Because that prediction was exact, I like to brag about it wherever I get a chance. A lot of the predictions I made over the years were pretty accurate, but not with the exact timing. So we’ve been preparing for what’s coming for quite a while. And the bottlenecks were, believe it or not, natural language processing platforms like Quill and all the others; they wouldn’t talk to us. We were too small. They were dealing with large corporations, and you had to spend hundreds of thousands, even millions, of dollars to become a client. And they were all snobs, you know? So entering machine learning and AI was really, really hard, because you needed data scientists and machine learning experts, and it was all very costly.
Then ChatGPT came along, and then Gemini, Claude, and everything else. For types like us, that was just wings: we could develop, we could iterate. So that’s what’s been happening for the last three years. We’ve been shipping new features, new tools, new methods, new ways of doing things, and people have been really enjoying the stuff that we pump out.
We don’t think too much, we discover something, and share it immediately with the SEO community. And the SEO community has been contributing back, and it’s been a really fruitful cooperation between the community and us. Currently, there’s a massive tech stack and lots of tools. We’re rolling up our sleeves and saying, “All right, let’s put all those great ideas into practice, measure the results, and see what happens.” So that’s the seed of a new SEO.
But why am I saying all this? We’ve been through this before, when mobile came along, and with basically every new thing, like when Google started introducing deep learning into their search, you know, passage indexing and so on. Machine learning is not new. Google’s been in the machine learning industry since almost day one. They’ve had all these things in place. I mean, BERT, my God. Vector embeddings. All these things have been there the whole time, and we’ve been aware of them. It’s just that the barrier to entry was a little bit too high, so we were looking at it from the outside.
Same as with algorithm updates, you know, the mysterious gods of Google, how do they do those things? I spent two or three years studying all this very hard. I pretended I was a junior and just went and studied hard: machine learning, all the basic concepts, and skilled up. And I feel really good right now about the level of knowledge I have, the depth of understanding, just enough to get me through.
If you're not in organic search through traditional SEO, you're not going to be one of the grounding results that support the AI model. Share on X

But when I talk to people, I still say, “I don’t know how to do any of this. I’m figuring things out.” I like to be honest about the whole thing. Yeah. So that’s as long-story-short as I can make it.
You’re several steps ahead of many folks in the SEO industry. So you’re definitely a leader in that field. And some of these tools have become go-to sources for information and insight. Maybe you should run through the tools you have. I know you have the semantic compression tool, and a bunch of different resources for the SEO industry. So can you walk us through some of those and their use cases?
Yeah. I literally have a page for anyone keen to see all this at dejan.ai/tool. It’s a growing list, about 20 tools at the moment. I’ve probably missed a few that I forgot to add. And so the idea is to have a collection of tools to offer to the community.
Yes, some of those tools are a flex. You’re like, look what we can do. But they do demo our internal capability in a small way. Obviously, we’ve got a processing pipeline where everything is batched and we can do things at volume, while the tools do one thing with manual entry, you know, but they do showcase what we can do. But that’s only part of it. The other part is that people use the tools and say, “This doesn’t work,” or “How does this work?” They ask questions.
And you’re saying I’m several steps ahead. I’m not quite sure about that, because as soon as SEOs jump on the tools I have, they contribute equally intelligent ideas, which we put back into the tools, and they help us build them further. It’s kind of like an open-source mentality. So, before I get into the actual tools and name many of them, I’d like to present the basic framework for thinking about AI optimization.
AI is the presentation layer for search and information retrieval. And you should know, because you wrote about that a long time ago. We’ve got information retrieval, ranking, relevance, and all that, same as before. All that is the same. And then you’ve got the interpretive layer that presents the information to the user in a slightly different way: rather than lots of links and open tabs, it summarizes, organizes, and provides a report-like answer, whether it’s Gemini, Claude, ChatGPT, or Google AI Mode.
So we’ve got SEO, or organic search, as up-to-date, fresh knowledge feeding into AI, which interprets it and shows it to the user. So basically, the backbone, the backend, and the presentation layer. What’s happened just now is we went from a ten-link interface to a very different one, and it evolved a little bit faster than usual. It was a quantum leap for us.
Understanding and control are the two pillars of SEO, just like mechanistic interpretability and model steering are in AI.
Because we were used to little incremental changes. A bit of this, a bit of that. Barry Schwartz reports there’s a new button on the SERPs. But this time it was just like, whoa, complete change. It’s a paradigm shift. That took people by surprise. But I still want to underscore that AI is search’s presentation layer. So we’ve established that fact.
Now, the mentality of thinking about the whole thing. In SEO, we’ve had two major pillars of work. The first one was understanding, and the second one was control. Why is this happening? What happened to the site? Why did traffic drop? Why did traffic go up? What fell in rankings? Which keyword am I missing? Understanding what Google did, why did Google do what it did?
Once we have the understanding established, we have the second element: control over page optimization, authority building, content, and so on. I found beautiful symmetry, beautiful parallels between SEO, what SEO used to be and what SEO is now, because guess what? In machine learning, control is model steering, and understanding is mechanistic interpretability. So those two things are the new jargon for us SEOs, but they are like well-established terminology in the machine learning industry. So basically, understanding control, mechanistic interpretability and model steering. What is mechanistic interpretability? It’s getting into the model’s head.
AI psychology: why did it do that? Why did it say what it said when I put in that prompt, and why did it say something different when I put in a different prompt? This is interesting. Why did it say 10 different things when I put in the same prompt 10 different times? Because it’s a stochastic parrot. It has temperature and top-k sampling, which allow it to say different things when we ask the same thing 10 different times.
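The temperature and top-k sampling Dan mentions can be sketched in a few lines. This is a toy illustration, not any vendor’s actual decoding code; the tokens and scores below are invented.

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=3, rng=None):
    """Toy next-token sampler: temperature scaling plus top-k filtering.

    `logits` is a dict of {token: raw score} (hypothetical values).
    Higher temperature flattens the distribution, making output more
    varied; top-k keeps only the k most likely tokens before sampling.
    """
    rng = rng or random.Random()
    # Keep only the k highest-scoring candidates.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Temperature-scaled softmax over the survivors (numerically stable).
    scaled = [(tok, score / temperature) for tok, score in top]
    m = max(s for _, s in scaled)
    weights = [math.exp(s - m) for _, s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices([tok for tok, _ in scaled], weights=probs)[0]

logits = {"helps": 3.1, "builds": 2.7, "analyzes": 2.5, "pasta": -4.0}
# With top_k=3, "pasta" can never be sampled; repeated calls still vary
# among the survivors because sampling is stochastic.
print(sample_next_token(logits, temperature=0.8, top_k=3))
```

This is why the same prompt yields different answers on each refresh: the model rolls weighted dice at every token.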
So this puts all these AI visibility rank trackers in hot water on that point. Why are we tracking a hundred arbitrary prompts when we know that when we refresh the screen, they’ll give us different answers every time? What’s the point of that? There are many things I’ve been critical of in the industry. So yeah, that’s the theory part. Now the practice: what do you actually do? You asked about the tools.
So we have a free tool called AIRank. You just log in with Google and measure your visibility, but we don’t measure it through a bunch of random prompts. We have a bi-directional probing methodology that works like this. We ask the model, “What does this brand do?” And the model gives us 10 things, sorted in order from 1 to 10. Familiar to the SEO mentality, yeah.
AI is a stochastic parrot. It says different things each time due to temperature, top-k sampling, and probabilistic decision-making.
And then we let the user enter their own list of things they consider important. We call those entities. We allow the user to enter the entities they consider important to their business: what they do and what they want to be visible for. And we say to the AI, “Name the top brands for this entity.” And we repeat the same question for each of the listed entities. So number one, the first question is, “What does this brand do? List 10 things.”
Second, the opposite direction. We’re going from brands to entities, then from entities back to brands. So, for each entity that the user specified, we go through the list of brands. So what do we end up with? We end up with structured data. Like the model literally can’t give us anything else because we are forcing structured output using JSON. So the model gives us a list of 1-10 things your brand does. And for each thing that you said your brand does, what are the top 10 competitors? And we do that on an ongoing basis.
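The forced-JSON structure Dan describes can be pictured with a small sketch: one probe going from brand to entities, one going from an entity back to a ranked brand list. The field names, brands, and entities below are invented for illustration; they are not AIRank’s actual schema.

```python
import json

# Hypothetical structured output a model could be forced to return.
# Probe 1: brand -> the things the model believes the brand does.
brand_probe = json.loads("""
{
  "brand": "ExampleCo",
  "entities": ["custom jerseys", "team apparel", "sports printing"]
}
""")

# Probe 2: entity -> the top brands the model associates with it.
entity_probe = json.loads("""
{
  "entity": "custom jerseys",
  "brands": ["BrandA", "ExampleCo", "BrandC"]
}
""")

def brand_rank(probe, brand):
    """1-based rank of `brand` in an entity probe, or None if absent."""
    try:
        return probe["brands"].index(brand) + 1
    except ValueError:
        return None

print(brand_rank(entity_probe, "ExampleCo"))  # 2
```

Because the output is forced into this shape, every probe yields comparable, machine-readable ranks instead of free-form prose.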
We track the model’s visibility on an ongoing basis. So that’s the primary tool. It gives you the first part, which is about understanding, yeah, mechanistic interpretability, but like a little glimpse of it.
So using that bi-directionality, we’re able to understand the associations the models make with your brand and your entities, as well as with other brands associated with those entities.
And associations.
This might sound like a bit of a word salad, but we might accompany this with some helpful imagery, because it’s actually really quite simple. And I have some handy slides from SEO Week in New York.
What we do next, then, is track that visibility weekly, because daily is pointless. Daily changes don’t happen with models. Models don’t even find you weekly. They find you maybe once a month, every six months, or something like that. Model releases are annual, I would say. Like GPT-4 to 5 takes a year for that sort of leap.
And that’s the whole reason the models, on their own, can’t be generative engines. People think the models replace the search; they don’t. They cannot possibly do that because models cannot hold up-to-date world knowledge in their weights. Training a model is too expensive, and you can’t train it every day to keep up with the world’s knowledge. Models will always depend on search.

There’s just no way around it. Search is quick to update. Google’s got enormous systems built around that, and nothing can beat it. So models are lean, mean, and intelligent, and they tap into search to stay up to date on the world. And their internal knowledge, what they feel and perceive about the world, is just enough to get by, to use the tools, and to generate new knowledge. Obviously, if you don’t know what a CMS is, you can’t look up WordPress. So they know enough, but they all rely on search grounding. All right, so we’ve got that tool out of the way. We have our visibility scoring and all that. So what do we do with the two pieces of information? Over time, we get the average ranking position, and over time, we get the number of times your brand appears on that list.
Because those two values are on different scales, one can go into the hundreds and the other is always between 1 and 10, we normalize both. We normalize them to the range [0, 1] and multiply them. That’s it: the product of those two normalized values, the brand mention frequency and the brand position. So the questions the tool answers are: how frequently is my brand mentioned when somebody’s prompting about this concept?
And when it is mentioned, what position is it in? So we multiply those two and get a brand visibility score. Nice and simple. And we have two modes: one with model memory only and one with grounding. Obviously, as soon as you plug in grounding from search results, the answers are completely different, because the model is getting fresh world knowledge from search engine indexes. Okay, so that’s visibility.
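The scoring Dan describes, normalizing mention frequency and list position to [0, 1] and multiplying, can be sketched like this. The exact normalization (for instance, inverting position so rank 1 scores highest) is an assumption reconstructed from the conversation; the real tool may weight things differently.

```python
def visibility_score(mentions, max_mentions, avg_position):
    """Sketch of a normalized brand visibility score.

    mentions:     how often the brand appeared across probes
    max_mentions: total number of probes (so mentions/max_mentions is 0..1)
    avg_position: average list position when mentioned (1 = top, 10 = last)

    Both factors are normalized to [0, 1] and multiplied; position is
    inverted so rank 1 contributes 1.0 and rank 10 contributes 0.1.
    """
    freq_norm = mentions / max_mentions if max_mentions else 0.0
    pos_norm = (10 - avg_position + 1) / 10  # rank 1 -> 1.0, rank 10 -> 0.1
    return freq_norm * pos_norm

# A brand mentioned in 8 of 10 probes at average position 2:
print(round(visibility_score(8, 10, 2), 2))  # 0.72
```

The product form means a brand must be both frequently mentioned and highly placed to score well; either factor near zero drags the score down.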
Google wins the AI race not because of early adoption, but because it controls the ecosystem: search index, data centers, TPUs, and talent.
Yeah, so for those who are not familiar with terms like grounding and retrieval-augmented generation (or RAG), let’s set that up so they understand what that means.
All right, so you’ve got a model that has an idea about things and some answers it can give you directly. Just ask it, “What’s the formula for water?” H2O, it just gives you the answer. But there’s a little decision-making classifier in these AI systems that sits outside the model and determines when something the user has typed deserves grounding, kind of like the query-deserves-freshness concept. It was Mike King who gave me the idea for that. I think he said “the query deserves grounding.” And I really liked that, so I used it. So there’s a little classifier in these AI systems that determines when a user’s prompt equals a query.
Queries were short, now they’re long. Still a query. So the query deserves grounding. So, what did the user just type in? Can I answer that safely? Or do I need to call for help from a search index? And this is an interesting thing for people to understand.
Google, the way it’s designed at the moment with the Gemini model, grounds everything 100% of the time. Is water wet? Search lookup. Is the sky blue? Search lookup. They just ground everything; they want to be safe. They own the search index, which retrieves information cheaply, so they do it for everything. No matter what you do, they always ground everything, in full.
That puts them at a huge advantage, right? Over OpenAI, Anthropic, and so forth, since they have the search index. They don’t have to pay somebody else to get access to all that real-time search information.
Yeah, I’ve been saying for three years, “Google wins the AI race.” As soon as they released Gemini, I was like, “Yep, that’s it.” I saw it, because you could be OpenAI and have the advantage of being an early adopter, with market share and all that. But ultimately, Google’s got the talent, tens of thousands of engineers. They’ve got the energy problem solved. They’ve got the data centers. These are very large logistical challenges. They’re independent of NVIDIA GPUs; they’ve got their own TPUs. They’re buying equipment from ASML in the Netherlands.
They’re manufacturing their own stuff. They’re a self-sustaining ecosystem. So Google will win this. There will be other players and this and that, but Google’s got this. Well, let’s not give financial advice, but I’ve done some things as soon as I saw what was happening. And another big player just did the same thing last week, I think: dumped a whole lot of stock A to buy a whole lot of stock B. I’ll just leave it at that.
Grounding is essential. Providing real-world data to AI helps prevent hallucinations and ensure accuracy.
Yeah, grounding, just to establish the terminology here. Grounding is necessary. Grounding is providing real, fresh world data to the model to prevent hallucination. The model hallucinates by design: it’s supposed to say the most probable thing, and when that fails, it’ll just say something, anything. It has to say something. So Google’s done a good thing.
Yeah, and the something it says is never, “Shoot, I just don’t know the answer to this one. That was a great question.”
I mean, I prefer it when my AI says it doesn’t know. In fact, I have custom instructions that say: if you don’t know the answer, “I don’t know” is fine. So I give it that option. It’s similar to when you’re doing surveys or polls online: you should always give people an option to say “just show me the results” or “I don’t know,” an option to exit rather than being forced into making a selection, because that’s how you skew your data collection and pollute your data.
But so yeah, grounding compensates for models’ lack of up-to-date knowledge. Google grounds its results quite aggressively because it can, and OpenAI does it selectively. Google will ground a single sentence or a single chunk of the generative answer with one, two, three, or sometimes five citations from other websites. And they can get that cheaply from their cache and index. OpenAI is a bit more selective: the OpenAI model will be presented with grounding context and will be sparing with what it selects.
So OpenAI will pick and choose which ones to embed, and it’ll often ground one chunk with just one citation. That’s a bit of a behavioral difference. I think it’s important to understand, because people ask, “Oh, how does AI crawl websites?” That’s a nonsensical question, because there’s no such thing as “AI.” Google does it its own way. OpenAI does it its own way. Anthropic does it their own way. Grok does it their own way. It’s platform-dependent, and these are all software choices. It’s like asking, “Oh, does AI use schema?”
Some do, some do not. It’s how their engineers choose to build things. But when the information gets to the model in a grounding situation, it doesn’t have the schema. Models are perfectly capable of understanding schema, but the way things are supplied to the model is, at best, plain text or Markdown. So the preprocessor that renders your page may use schema and structured data to extract the star ratings, prices, and all those product specs, and it supplies that to the model, but by the time it gets to the model, it’s plain text. Those are the slight differences.
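To illustrate the point that structured data, even when read by a preprocessor, reaches the model as plain text, here is a toy flattener. The JSON-LD block and the output format are invented for illustration; no platform’s actual pipeline is implied.

```python
import json

# Hypothetical JSON-LD block from a product page. Per Dan's point, the
# preprocessor may parse it, but the model ultimately receives plain text.
jsonld = """
{
  "@type": "Product",
  "name": "Custom Team Jersey",
  "aggregateRating": {"ratingValue": "4.8", "reviewCount": "213"},
  "offers": {"price": "59.00", "priceCurrency": "EUR"}
}
"""

def flatten_schema(raw):
    """Render structured data as the plain-text lines a model might see."""
    data = json.loads(raw)
    return "\n".join([
        f"{data['name']}",
        f"Rating: {data['aggregateRating']['ratingValue']} "
        f"({data['aggregateRating']['reviewCount']} reviews)",
        f"Price: {data['offers']['price']} {data['offers']['priceCurrency']}",
    ])

print(flatten_schema(jsonld))
```

The schema markup itself never reaches the model; only whatever the preprocessor chooses to serialize out of it does.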
So are you saying that folks who are reporting that schema is helpful for AI SEO, or GEO, that that’s not a thing?
Well, I used to say that it’s not a thing. And then I got corrected by Andrea Volpini. I’d said, “It’s a design choice. Some do, some do not,” and that the schema doesn’t get through, because I did testing and it didn’t get through. But he made the point that we don’t actually have evidence either way.
Models will always depend on search. There's just no way around it. Share on X

So it could be a relatively simple test. You could generate a page with schema and everything. And if information that’s only in the schema, but not on the visible page, gets through to the model, it’ll be as plain text; it does not reach the model as schema in its exact format.
But if it reaches the model even as text, and that information wasn’t visible on the page, it was only in the JSON-LD, that’ll be proof enough. So yeah, if somebody’s willing to do that test, I haven’t gotten around to it myself. If someone wants to prove me wrong, I would love that, because I do love structured data, and I would love for models to use it. But there’s another layer growing underneath right now.
Agent-to-agent communications, agent-to-agent protocols, payment systems, and agent payment protocols. The future of commerce will abandon the traditional surfaces, tablets, mobiles, desktops, and websites will be optional. So we’ll start seeing things happen in the background.
I’ll need something; I’ll just tell my AI what I need, and things will show up at my front door. It’ll know my preferences and my price ranges, and things will happen in the background. The question is, what does an SEO do when there’s agent-to-agent communication, like my personal shopper liaising with eBay’s backend to find the best deals?
Looking up reviews, checking in with me: “Hey, I found one, you want this? Here are three options.” And I go, “Yeah, this one,” or it just proceeds. Or it doesn’t even ask me anymore, just delivers that pet food to my door once per week, and it’s sorted.

So yeah, we’ll start to see some major, major shifts. We’re not there yet, but I believe that in maybe two years, it’ll become a bit more serious.
When do you think websites will be optional or no longer a thing, and it’ll just be agent-to-agent communication?
I don’t mean websites will disappear; I think a lot of people will still gravitate towards them. I am an old-school guy, I suppose. I am progressive in that I like to see what’s new, but I prefer to have control when I’m shopping. I know one thing, and I’ve studied human behavior for a very long time to understand how Google does things. I’ve been preaching for decades that Chrome is the biggest ranking factor.
But so, people are lazy. And if we can reduce our cognitive load and reduce friction, if we can reduce the effort involved in making a purchase, even if we know it might not be the best deal, you might get a slightly worse deal than if you did it manually. Still, if it’s gonna save you like six hours of research, you’re just gonna go ahead and do it. It’s just human nature.
Obviously, if you’re buying a vehicle or a house, or picking where to live next, those types of decisions will take a bit more effort. But I’m talking about, what should we eat on the weekend? I’m having a party, you know, small decisions. I have six guests coming over to our place and I’m looking for catering. “My AI, just surprise me. Get some stuff.” You know, I’ve got one guy with lactose intolerance, one gluten-free.
And one has halal preferences. So just do it, and it’ll be done. That’s the type of future we’re looking forward to. But yeah, obviously that’s forward-looking stuff; we’re not quite there yet, and we still have a lot of work to do. Which brings us back, I think I didn’t quite complete the picture. People might be wondering: so we had visibility, but what do you do then with that information? There’s another element I help people with: understanding more granular levels of confidence. In a similar way, there’s another tab in that tool that I enable for people who ask for it.
Primary bias is the new click-through rate. AI models are selected based on perception, not clicks.
Right. Yeah.
It’s something we use quite heavily internally, called AI relevance. Basically, we run a binary analysis: would you recommend this website if you saw it in the grounding results, yes or no? And then we build a probability matrix, a set of recommendations, and so on. But then what do you do? You know your competitors’ strengths and weaknesses, and you know your own. What do you do after that?
We have a tool called TreeWalker.ai that helps you really understand what happens. Now we’re getting into the model’s head, the mechanistic interpretability. Not quite, but almost, because we’re analyzing the output, not the model’s weights themselves.
So, models, when they do token completion, are fancy autocomplete, yeah, let’s call them that. They predict the next token, and they’re allowed to sample from a pool of options for every next token they’re gonna say. We tap into that probability space and allow the model to walk many paths, for as long as each path meets a certain threshold. So we end up with one sentence, the, say, golden path.
You know, the main sentence that the model chose to say about your brand. But then we say, “Okay, at this point, diverge.” Take a little side quest and say this other thing that you couldn’t say in this universe, but you would have said if there was a roll of the dice type thing.
So we actually call it a parallel universe. So we can see all the tokens that the models could have sampled from, and we allow it to say all these things. And we look for high and low confidence at every junction point. So we can say, for example, Dejan AI is an Australian AI SEO company. What happens next?
So we get a sample of what the model wants to say. It can say, for example, “helps,” “analyzes,” or “builds,” and these are meaningful things it could say. But if it says something weird, like “pasta,” well, where did that come from? It doesn’t fit; it’s incongruent with what I’m about. Or if the model isn’t sure about a certain thing and its confidence levels are very low, we call that a high-entropy spot, and you can see it on the graph: it dips to the bottom of the graph in red, for every token in the sentence completion.
So, effectively, what that does is tell you where the model stops being confident about your brand and starts flip-flopping between different things about you. That gives you a chance to get really close to the model’s opinion of you. None of this has anything to do with traditional SEO, because search isn’t plugged in yet. We are probing the model’s raw thoughts, right?
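The high- and low-entropy junction points Dan describes can be quantified with Shannon entropy over the next-token distribution. A sketch, with invented probability values:

```python
import math

def token_entropy(probs):
    """Shannon entropy (bits) of a next-token probability distribution.

    Low entropy = the model is confident about the next token; high
    entropy = a "red" junction point where the completion could diverge
    into a parallel path.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions at two points in a sentence
# about a brand. Values are invented for illustration.
confident = [0.90, 0.05, 0.03, 0.02]   # model is sure what comes next
uncertain = [0.25, 0.25, 0.25, 0.25]   # maximum uncertainty over 4 tokens

print(round(token_entropy(confident), 2))
print(token_entropy(uncertain))  # 2.0 bits, i.e. log2(4)
```

Plotting this value for every token position gives exactly the kind of graph described above: dips into high entropy mark the spots where the model’s story about a brand becomes unstable.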
And this is what I refer to as primary bias. Primary bias is a new term I use internally, and I guess I’m broadcasting it to the universe, hoping it gets picked up. Primary bias is what skews the model selection rate, which replaces the click-through rate. Because we are now in the agentic era, and an AI system picks between 10 different results to present to the user. How does it make that choice? It doesn’t click; it selects, right? It selects, say, three out of the 10 results. But why did it pick those? And that’s exactly why we’re going into all that fancy business with high-entropy points, low-entropy points, the paths, token predictions, high confidence, and low confidence.

We are trying to understand whether models feel confident about you and whether they are aligned with certain concepts but not others. So primary bias can be positively geared towards your brand and a certain entity. The association is very strong. It could be negatively geared towards the thing that you want to be visible for.
And that primary bias, according to my tests and observations, is the primary factor influencing a model's choices when grounded. Now, grounding is traditional SEO. We're talking about 10 or 100 search results supplied to the model as context. The model reviews all of them and picks and chooses: okay, I like this one, I like that one, that one's relevant, this one not so much, and it leaves you out.
And the reason for it leaving you out will be in that probing you did before. Basically, probing the model is like a model survey. Let me give you an illustration that brings it closer to practical reality. We have a client in Europe, specifically Germany, who manufactures custom sports jerseys. And for some reason, OpenAI just would not recommend them to global audiences.
So Canada, the United States, Australia, even the UK. It's like, what's going on? Why the hesitation? They're ranking well in Google. They're being supplied to the AI as grounding context, but they're just not being selected. The selection rate is low; therefore, the model's primary bias must be against the brand. So we did probing and analysis, and I used all these tools to figure out what to do for this client.
When I figured it out, I shared it with the community. So, guess what it was? European origins. They are a German brand, so they are recommended when someone searches for these products within Germany and within the European Union. But as soon as you're outside, the brand is no longer considered relevant.
What?
Brands aren’t just entities. They are locations, perceptions, and probabilities in the model’s mind.
So two things were problematic. One was the model's internal perception. The second reinforced it: at the bottom of the website, they listed the GmbH and German phone numbers, addresses, and other details. Basically, it was never going to happen. First of all, the model already had an opinion of them as German-centric.
And the other problem was that when they were supplied as one of the grounding results, that perception was reinforced, since the HTML and text supplied contained mostly German content. So we did two things. One was a campaign of model steering: think of it as link building or digital PR, seeding content. That's what we were trying to do, Americanizing the brand, or rather de-Europeanizing it. The second was simple on-site optimization: put the American address on the .com website, because they have a few TLDs. American address, American phone number, LLC instead of GmbH. A week later, the results were in, and the hesitation was gone. It was as simple as that.
I could have solved that problem so much earlier if I had just known what to do. So that's the benefit of knowing all this, understanding the models. But as soon as the grounding is switched off, say OpenAI handles a prompt that does not trigger grounding, the brand is still not being recommended, because the model is running off its memory. So we have to wait six months for Google to update from Gemini 2.0 to Gemini 2.5 to get into the model's head. Meaning, when there's no grounding, no search supplied to the model, does the model natively mention them? Say, name the top 10 brands associated with such-and-such entity.
And they're not number one, but they're, like, the fifth result out of 10 now. It's obviously stochastic; it shifts around in position. But they're there natively. So primary bias is there, it's well aligned, they're a known entity in their market. It's not just about brand and entity association; it's brand and location. It's quite complex. So we've got both things handled: the grounding situation was fixed, very simply, and the model's internal perception was fixed too.
So those two concepts, model understanding, or mechanistic interpretability, and model steering, plus understanding that models rely on search to function properly, are the most important ones. If you understand those, you're already way ahead of the crowd, and you can focus your efforts on what matters. But say you want to optimize, to design a PR campaign to get your brand better visibility in AI search. Where do you begin? There are so many things you could do.
My advice is to begin with the things you find to be wobbly when probing the models: the model says one thing or the other, but the confidence is low. Those are the key items. Don't try to fix something that's already going well, where the model has a very strong association between your brand and that entity in that geo region. Don't work on that. Work on the weak areas that are of commercial interest.
Even small, incognito brands oscillate wildly in AI visibility. Authority stabilizes your timeline.
So I know that's a lot to absorb, but I'm just pushing this into the ether, hoping that as many people as possible adopt the reasoning and get a bit more comfortable thinking about this. I don't want people to treat AI as this mystical thing, with the mentality of praying to the Google lords at algorithm updates, fingers crossed, let's see what happens. We can now influence things much more than we could before.
It’s funny that you mentioned the mystical, because when you’re talking about things like parallel universes and entropy and nodes and things, I was thinking of timeline theory. Do you know about timeline theory and have multiple timelines?
So for our listener who's not familiar with this concept: in your life review, after you pass back into non-physical, you try to figure out what you did right and wrong in your life. You review not just what you did and didn't do, but also what would have happened if you'd done the thing you were meant to do but didn't, or if you'd restrained yourself and not done the bad thing you ended up doing because you were reactive. You can access this kind of infinite choose-your-own-adventure book. Right? Have you ever read those choose-your-own-adventure books as a kid?
Yeah, you’ve got a very salient image right behind you, showing different parts of your background. So that’s exactly how Treewalker works.
Right. Yeah, it's a virtual image, too. So if I want to review my mistakes and missed opportunities, and who I could have been if I'd made all the bold right choices, I get to examine this infinite choose-your-own-adventure book. That's the general explanation of timeline theory. But if you get into specifics, things like nodes, and high-chaos time periods where a lot of the timelines intersect, where something that happens in one timeline also happens in 500 million others because it's a high-chaos point in time, a node, I was just thinking you're speaking a similar language to what you were describing with parallel universes, entropy, and nodes. Just curious, are you familiar with this whole concept?
Yeah, absolutely. So this is not just the nature of deep learning models; this is the nature of our reality. We're not going to get too far into science and quantum mechanics; first of all, I'm not a scientist, so I can't speak properly about it. If you think you understand quantum mechanics, you do not understand quantum mechanics.
Seeding content with proper brand associations can influence AI even more than links.
But we do understand that there's the uncertainty principle and collapsing states: when you observe a system, it collapses to a certain state. The analogy is there; it's not quite exact, but the models find a path, and it's probabilistic. That's why the discipline of trying to understand what models do and how they do it is called mechanistic interpretability. Models are probabilistic by nature, but we're trying to get as close as we can to the things we can trace and calculate. That's what we've been busy with. The Tree Walker tool gives you a glimpse of what happens and all the probable paths the model could have taken.
And the high-contention points you mentioned, the nodes that are quite chaotic, are precisely the high-entropy points in our tool. When you see a high-entropy point, it will be red and at the bottom, meaning the model's confidence is very low and it could say anything: a lot of different tokens could be used at that point to describe your brand in that part of the sentence. But I should mention that a concept can be quite strong in one part of a sentence and really weak in another, and that depends on the preceding tokens. So you can't read the tool's output as simply "we're doing poorly for that token and well for this one."
You have to contextualize it against the preceding tokens. For example, I analyzed one client website that said, "We do tapware; this is the brand that does tapware." So "tapware" was really strong. But if we precede the word "tapware" with "luxury," the probability of "tapware" goes really, really low.
So we learned something from that: the model does not associate the brand with luxury products, just general products. And that gives a direction for the optimization path. It tells you what type of content you should be publishing on the website, and what type of content you should be seeding outside, getting it into the models' training data and into grounding situations so it ends up in citations, to compensate for the model's lack of understanding. So even if the model is leaning against you being a luxury brand in its head, when it sees "luxury" in a grounding situation, it's like, okay, luxury.
And the good news, for now, is that the models are still naive. Think savant: genius, but a child. It's super smart, super intelligent, but very gullible, very naive. You can just say anything on the grounding source. If it's your page and you say, "We are the best, pick this one," they will believe it. Of course, you don't want to do too many embarrassing things like that. I'm not saying there will be penalties later; a penalty comes from a human reviewing your content. But you don't want to burn your brand by forcing AI visibility. That's probably one thing we shouldn't wrap up without mentioning: I think brand recognition is huge. During my model probing, I found two patterns.
Tokens are words. Chunks are groups of tokens, the building blocks AI uses to understand your content.
One is low oscillation, and one is high oscillation, meaning your rankings are up and down every week. For brands like Adidas, Puma, and Reebok, the oscillation is non-existent: you're glued to the top of the rankings graph. You are just a peaceful timeline, flowing along, maybe meandering a little here and there, but not much.
If you're a small, incognito brand and we're probing you for a certain entity, you're all over the place. So I found this interesting pattern: the amount of oscillation your brand shows for a certain entity is inversely correlated with brand authority. If your brand is highly authoritative, it won't meander or wobble; it's fixed in the timeline, always there. But if your brand is not as strongly associated with an entity, you're going to be jumping wildly up and down.
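The oscillation idea is easy to quantify. A minimal sketch, with made-up weekly rank data: the standard deviation of a brand's position across repeated probes acts as a rough volatility score, and a lower score suggests a stronger, more stable association.

```python
from statistics import pstdev

def oscillation(ranks):
    """Population std deviation of a brand's rank across repeated probes.
    Low = stable, authoritative association; high = weak, wobbly one."""
    return pstdev(ranks)

# Hypothetical weekly positions from repeated "top 10 brands for X" probes
big_brand   = [1, 1, 1, 2, 1, 1, 1, 1]   # glued to the top
small_brand = [3, 9, 2, 7, 10, 4, 8, 5]  # all over the place

print(round(oscillation(big_brand), 2))
print(round(oscillation(small_brand), 2))
```

Tracked over time, a falling oscillation score would be one signal that brand-building work is landing.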
So that's something to keep in mind. There you go: the full-circle importance of branding comes right back as one of the key things. Obviously, the model's primary biases are many; I can't detail them all in a quick, short call. But branding is a factor, what's in the URL is a factor, what's written on the page is a factor, and so are many other things that associate you with the brand.
So people who say link building is dead are dead wrong, because digital PR is still a thing and will continue to be. Brands are important, and if you are not getting your brand mentioned on other websites, you’re becoming obsolete.
Yeah, link building became a dirty thing because there's always money exchanged. If you're a blogger, you know what somebody wants when they reach out, and you don't want to be linked with that. We had silly rules: you can have one link to your client, but two links to Wikipedia and a .gov website to make it look natural. All these ridiculous rules are in place, so I'm glad it's on the way out, because link building isn't the main thing anymore.
Seeding content with the right brand associations is what's replacing it. Whether you get a link or not now matters mainly for traditional organic search visibility, which is still important for AI. Like I've mentioned, I've said it ten times now: when something is important, I repeat it often.
If you're not an organic search result through traditional SEO, you're not going to be one of the top results supplied to the model to support what it says. So you've got to be there first, and link building will help with that, among other things: relevance, authority, clicks, traffic, Chrome, Android, whatever signals Google uses. That's one thing. But content seeding and brand association are quite valuable even if you don't get a link, for model training, grounding, citations, and brand mentions. So yeah, there's a lot of work for us SEOs to do.
Unlinked mentions on the right sites can be more valuable to AI than do-follow links on weak sites.
Yeah, so unlinked mentions on the right websites are more valuable than do-follow links on sites that are not as impressive to an AI model.
I would say so, yeah.
One thing you mentioned multiple times, and for many folks who are listening, they’re familiar with this term if they’re following the AI world, but tokens. You also mentioned chunks. And I’d love for you, for those who are not familiar with those two terms, to provide the actual definitions and to differentiate between a token and a chunk for our listeners.
Yeah, a token is a word, or sometimes a subword. If you have a complex word, it might be broken up into two tokens, sometimes three, though not very frequently. So a sentence of 10 words might be 15 tokens. OpenAI has a very handy tool: if you Google "OpenAI Tokenizer," you can paste your content in and see visually, in nice colors, how it splits into tokens. You can even switch to the geeky part, token IDs, because models don't see text; they see numbers. Those numbers are converted to vectors, the vectors go through the neural network, the network spits tokens back out, and those tokens get converted to the text you see. Tokens are just words or word parts.
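The text-to-numbers round trip can be shown with a toy sketch. This is not the real tokenizer (OpenAI's is a learned byte-pair encoding; the Tokenizer tool shows the genuine splits); the vocabulary here is entirely hypothetical, just illustrating that the model only ever handles IDs.

```python
# Hypothetical vocabulary: a real tokenizer learns ~100k subword pieces.
vocab = {"custom": 101, "sports": 102, "jersey": 103, "s": 104}
inverse = {i: t for t, i in vocab.items()}

def encode(words):
    """Map words to token IDs; split an unknown plural into stem + 's'."""
    ids = []
    for w in words:
        if w in vocab:
            ids.append(vocab[w])
        elif w.endswith("s") and w[:-1] in vocab:
            ids.extend([vocab[w[:-1]], vocab["s"]])  # "jerseys" -> two tokens
    return ids

ids = encode("custom sports jerseys".split())
print(ids)                        # [101, 102, 103, 104] - 3 words, 4 tokens
print([inverse[i] for i in ids])  # IDs decoded back to token strings
```

Note how "jerseys" becomes two tokens: that's the word/subword distinction Dan describes, and why 10 words can be 15 tokens.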
Chunks are groups of tokens, parts of a larger text that models use for a variety of purposes, such as retrieval and locating the most salient point in search results. Let's say you have a very large document, like 100,000 words. They might cut it into chunks of 128, 512, or 1,024 tokens.
So chunks are used for a variety of information retrieval purposes, including passage ranking, snippet generation, and so on: arbitrary content slicing into meaningful semantic units.
Some chunkers are quite dumb; they just cut at a fixed word count. Some are semantic. A semantic chunker is a bit different: it can include a bit of extra content if that completes a sentence, or a little less if that fits the semantic unit better, and that can be very useful. Google's got chunking pretty much sorted. You don't have to think too much about it as long as you have a good page of content with clear semantic units.
You’re not waffling on too much. Everything is nice and clean. I think you’ll be all right. You don’t have to engineer your page chunks and things like that, because Google will undo all that work and do its own thing. They’ve got their mastery of chunking techniques, and they’ve got many, many different methods. So that’s not something you can just roll up your sleeves for. Oh, I’m going to go chunk optimize my landing page. Google’s got that. That part’s done. There are other things you could do to address the page’s semantic structure, improve its quality, and ensure you’ve included all the important entities and concepts that should be on that page.
So yeah: tokens, chunks. Chunks are bunches of tokens, and you have a bunch of chunks within a larger document. It's a Matryoshka-type situation.
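The "dumb" chunking Dan contrasts with semantic chunking is just fixed-size slicing. A minimal sketch (the chunk sizes and token list are illustrative, not any vendor's actual parameters):

```python
def chunk_fixed(tokens, size, overlap=0):
    """'Dumb' chunking: fixed-size windows over a token list.
    Optional overlap repeats trailing tokens so context isn't cut
    mid-thought at every boundary - a common retrieval trick."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]

tokens = "dejan builds ai seo tools for brand visibility tracking".split()

print(chunk_fixed(tokens, size=4))             # hard cuts every 4 tokens
print(chunk_fixed(tokens, size=4, overlap=1))  # windows share 1 token
```

A semantic chunker would instead keep whole sentences or sections together, which is why self-contained passages survive chunking so much better.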
Yeah, so if we're talking about an AI model that's not as sophisticated as Google at analyzing chunks, you might have something where you mention the brand name in a previous paragraph, but then only refer to the brand as "it." Out of context, that paragraph doesn't really make a lot of sense, and it doesn't reinforce the attributes you want to associate with your brand.
Yeah, and not even just less sophisticated models. State-of-the-art models like GPT 5.1, which just got released last week, still rely on chunks, or let's call them text segments. I discovered just last week that OpenAI uses internal tooling for this. It's not something the model does; a model is just a file that gets served context and does things with it.
So their tooling, when it scans a web page, gives the model a context window of the page. They don't show it the full page; they show it a chunk they decide on. Say you have a thousand-word page: they choose which part to show, so the model gets this narrow view it can peek through. The model can then move that window, like by line count: "take me to line one hundred." It gets that little chunk and has the option to expand it, but never to the full page. They are worried about copyright; they are really freaking out, even in the system instructions, about the model repeating the page's content verbatim, and there are guardrails in place to prevent that. I believe this windowing of the context at OpenAI is primarily about that, though of course there's also the matter of efficiency.
The model can move the window and follow links, so there's agentic behavior there, an element of decision-making. But for what it's worth, the practical takeaway is this: rather than writing "it," mention the entity by name, and make sure as many of your independent content segments as possible are self-contained. They should stand on their own legs if isolated from the rest of the page.
Yeah, so you don’t want to be taken out of context. And that’s just good, common sense. If you think about how to be quotable in sound bites, you don’t want to be taken out of context by the media. You don’t want to be taken out of context by the AI as well.
That's a comment I had from the audience at one of the conferences I did recently. Somebody raised their hand and asked, how can I be sure of what the model says? Because there's a mediator between your website and the user: users aren't going to your page to read content; the content is taken from your page and provided to the user. Is the model interpreting your intention as it was intended, or is there some lost-in-translation situation where the model is speaking for you but with the wrong message or voice? That's a whole new topic, but certainly, proper content structuring helps make sure that even if your content is windowed, like we spoke about before, it still says what it needs to say in isolation. Of course, you don't want to write as if each of your passages is a weird self-contained unit that doesn't sit well with the rest, because that can get really strange if somebody actually reads your page.
It’s like you don’t need to have an intro and a conclusion in each paragraph. It needs to flow for a normal reader. We’re not optimizing only for machines. Yes, we are increasingly supplying information to the models. But if somebody lands on your page, it’ll still make a lot of sense. So don’t just optimize for machines. Optimize for users as well.
This is something that we’ve been talking about for decades in a different way in the SEO world, and that is shingles, like a much smaller window, maybe it’s five or six words or whatever, instead of dozens or hundreds. But this idea of comparing shingles across documents for things like duplicate content, filtering, and so on. This isn’t a totally foreign concept for us SEO folks.
Technically, they are just chunks. When you translate to machine learning speak, you know that to them, that’s just a chunk, a small little semantic unit, so yeah.
So I know we're about out of time, but I'm really curious about one last thing before we wrap up. Ahrefs, a very popular, well-respected toolset in the SEO world, is now tracking AI citations with a tool they call BrandRadar. And I know we talked about how you can't really rank in an AI engine. So what's your take on BrandRadar? Do you think it's a ridiculous tool and concept, or are there things worth examining and tracking with it?
So I’ve got friends in a lot of these visibility-tracking tools and colleagues I respect. So I’m not going to go and talk garbage about what they do.
You’re not throwing anyone under the bus?
No, look, I've been vocally critical about the methodology they use, and I've presented mine as the correct way of doing things, but each to their own. I think arbitrary prompt tracking is nonsensical busywork, akin to AI workslop, to use the latest terminology.
You get the model to output garbage, then create busywork for yourself processing that garbage to get some sort of distillation out of it, when you could distill the output straight from the model, which is what I do. Why go in circles, creating all this busywork and paying thousands of dollars for these AI visibility-tracking tools? So I say screw arbitrary prompt tracking.
When I say "friends," I'm talking about people like Marcus Tolbert. He leads an amazing enterprise SEO tooling division and has some great ideas. But we all understand that I disagree with the methodology of simply tracking arbitrary prompts; I think it's nonsensical.
But I do see utility in some of these tools, including Ahrefs and Profound; they're the sleek new players in the industry. While they all do silly things I disagree with, they provide utility in one sense: they scrape actual results. Let's say you put some prompt into Google AI Mode; they scrape the HTML and show what was actually displayed for that prompt.
A bit of map results, images, or scraping OpenAI: I see value in that. Understanding the true layout and makeup of the answers provides a level of insight on top of what I do. I can also generate arbitrary prompts and URLs and perform citation mining. I use APIs, I allow for grounding with search results, and I understand the model's choices.
And I can get an amazing amount of prospecting lists and opportunities from the choices I know that Gemini and OpenAI’s GPT have already made. But this is simply a synthetic model probing and synthetic citation generation. These are not real user interactions. These are not real user chat sessions. Normal user chat sessions are not single-prompt situations. People talk back and forth, back and forth. They have a whole exchange before the results are brought up. How can you possibly predict that chain of conversations to get to the real situation that they’ve had?
So they can do all that, and the layouts they scrape are valuable, but it's still built on synthetic data. And when they say, "We have some real queries, real stuff," I mean, what kind of user would allow their real prompts to be shared? And what type of conversations would those be? Certainly not very sensitive ones. So, as with clickstream data, I wouldn't trust any of that. I'm highly skeptical, but I don't say there's no utility in what they do. They offer a sleek interface that produces very good reports, saving you some time in figuring out what to do. If you want to do proper visibility tracking yourself, it's not that hard: API calls. If you want to outsource the whole thing and you're willing to pay the bill, go ahead; your life is simpler. Just don't present it as the ground truth. It's not.
Yeah, I mean, it can also help you make a relative comparison with competitors in terms of visibility. So, you know, it does have some utility, but it’s based on a flawed premise.
Yeah, call it model surveying.
Awesome. Well, this is a great way to wrap things up. If our listener or viewer wants to potentially work with you or learn from you, where should we send them?
Dejan.ai. It's very easy to get in touch with me there. If you want to reach out, book a time to chat; there's a scheduler there. Explain your needs, and we can chat about what you have on your plate.
Awesome. All the public tools are available at dejan.ai/tool.
Exactly right, very simple.
Yeah, and you’re very active on LinkedIn. You have great posts there. So that’s another great spot for folks to follow your thought leadership.
Yep, shoot through a message. If you have a question about a tool or something like that, how does this work? I’m happy to help.
Dan, thank you so much. Thanks for sharing your wisdom. Thanks for being such a leading light in the SEO industry for so long. And thanks for being on the show.
Thank you so much. Thank you for having me.
All right, and thank you, listener. We’ll catch you in the next episode. I’m your host, Stephan Spencer, signing off.
Your Checklist of Actions to Take
- Implement bi-directional brand visibility tracking. I use the AI Rank tool to probe models in two directions: first, I ask “what does this brand do?” to get 10 ranked associations, then I input my own important entities and ask “name top brands for this entity” for each one.
- Probe model confidence with Tree Walker analysis. I tap into the probability space by allowing the model to walk multiple paths at decision points, predicting its next token. This reveals “high entropy points” where the model has low confidence and says different things about my brand, versus “low entropy points” where associations are strong and stable.
- Optimize high-entropy weak points first. I don’t waste time fixing what’s already working well; instead, I focus on areas of commercial interest where the model’s associations are weak or conflicting.
- Launch model steering campaigns through content seeding. I treat content seeding and brand association building like the new link building—creating and distributing content that influences how models perceive my brand, even without traditional backlinks.
- Track both grounded and memory-only model outputs. I measure visibility in two distinct modes: with model memory only, and with search grounding enabled. Understanding both tells me where I need traditional SEO versus where I need to influence the model’s internal knowledge.
- Focus on “primary bias”—the model’s internal perception that skews its selection rate when choosing which grounded results to present to users. Even when my content appears among the 10 search results provided to the model, primary bias determines whether the model actually selects and cites it.
- Structure content as self-contained semantic units. I write content segments that stand on their own when isolated, since models often view pages through narrow context windows or extract specific chunks.
- Prioritize traditional SEO for grounding supply. I ensure my clients maintain strong traditional organic search visibility because if I’m not ranking well in search, I won’t appear in the grounding results that models use to answer queries.
- Monitor brand oscillation as an authority signal. Brand ranking volatility inversely correlates with brand authority. If my brand visibility metrics are jumping up and down significantly, it signals I need to focus on building stronger brand recognition and authority to stabilize my position in model outputs.
- Connect with Dan Petrovic for AI SEO guidance. Access all of Dan’s public tools available at Dejan.ai/tool, where I can experiment with semantic compression, AI rank tracking, and tree walker analysis.
About the Host
STEPHAN SPENCER
Since coming into his own power and having a life-changing spiritual awakening, Stephan is on a mission. He is devoted to curiosity, reason, wonder, and most importantly, a connection with God and the unseen world. He has one agenda: revealing light in everything he does. A self-proclaimed geek who went on to pioneer the world of SEO and make a name for himself in the top echelons of marketing circles, Stephan’s journey has taken him from one of career ambition to soul searching and spiritual awakening.
Stephan has created and sold businesses, gone on spiritual quests, and explored the world with Tony Robbins as a part of Tony’s “Platinum Partnership.” He went through a radical personal transformation – from an introverted outlier to a leader in business and personal development.
About the Guest
DAN PETROVIC
Dan Petrovic is the managing director of DEJAN and a well-known name in the field of search engine optimisation. Dan is a machine learning specialist, innovator and a highly regarded search industry event speaker.