Barry Adams, it’s great to have you on the show.
Thanks for having me, Stephan.
I’ve had the pleasure of co-presenting with you before and being on panels with you before. I’ve always been impressed with what you share with the audience. You get technical and you share stuff that is not just general knowledge, but it’s done in an accessible way. I’m excited for you to share in that same way in this interview. No pressure. One of the things that I thought would be a great starting point is if you could describe how Google works from the crawler, indexer and all that perspective. Walk through what that looks like and why it’s important.
It’s something that I stumbled upon as I was doing SEO. It became a bit of a mental model of how I approach SEO. It’s supported by the way Google writes about things, although they never make it explicit because they don’t want to lift the hood too much and tell people exactly how the system works. The more you learn SEO, the more you learn how Google interacts with websites. You start to see the three main processes of any web search engine, be it Google, Bing, Yahoo or any other search engine, which are the crawler, the indexer and the ranking system or query engine. Google is quite good at conflating crawling and indexing. They tend to call it Googlebot in most contexts, but they’re two different processes. The crawler, which is Googlebot, is the discovery system. That’s the system involved with trying to find new stuff on the web, be it new pages that it didn’t know existed before or new content on existing pages.
When I talk about technical SEO, I’m primarily focused on how websites interact with that aspect of Google, with the crawling system specifically. The goal is to make sure that, especially for large-scale websites, Google can very efficiently crawl the right pages and doesn’t waste time crawling the wrong URLs, URLs that add no value to a website’s visibility in search. When we talk about indexing, for me that’s very much related to making sure that Google can understand what you’re presenting, that it ranks your pages for the right queries and that you send the right relevancy signals. This is a lot of classic on-page SEO with title tags, descriptions and good body content. It also covers things like structured data, which a lot of people see as a purely technical exercise. For me, this is more of a relevancy exercise. You help make pages more relevant so that the indexer and Google’s ecosystem can make good sense of them. It can understand when and in which contexts it can show your pages. For me, that is on-page SEO as well, and that goes beyond writing good content to looking at other technical aspects that help pages become easier for search engines to understand.
One thing that Dawn Anderson and I talked about in our episode when she was on the show was crawl budget. I’m curious if you see that there’s possibly another thing in Google, an index budget. Not just crawl budget, but the idea that you could be using up your index budget separately from your crawl budget.
I don’t know if I have a clear idea on that yet. I don’t think a website can necessarily be limited in the number of pages it can rank. What I am seeing is that Google is getting very good at deciding the topics a website could rank for. I’ll give you an example. I work with a lot of news publishers. Historically, news publishers, especially the larger ones, have had a bit of a free-for-all in Google search results. That’s because they are such authoritative websites and they are absolute link magnets. They get a lot of external link value. It makes it very easy for a news publisher to write about topics that it generally wouldn’t write about and that aren’t necessarily newsworthy, but still rank very highly. It’s purely because they have so much link power. An example is The Daily Telegraph here in the UK, which for a very long time had the number one ranked result for top 100 songs of all time. They had a pretty good article on that and it wasn’t newsworthy, but it was the number one result in both Google.com and Google.co.uk because The Telegraph is an authoritative news publisher. What we’re seeing is that Google is trying to limit news publishers from exploiting their own authority to that degree and is getting very good at putting websites in specific topical buckets.
If you do a search now for the top 100 songs of all time, you will get primarily music-focused websites as the top ranked results. The top ranked result is the Rolling Stone website. The Telegraph is at the bottom of page one, if it is even still on the first page of search results. We see similar things happening with other news publishers that have certain editorial specialties, be it celebrity news, political news or sports news. They are still very good at claiming rankings outside of the news carousel, in normal searches that fit with their editorial expertise. The moment they step outside of those little bubbles, they find it very hard to achieve good rankings even though they have all the link signals there. If we’re talking about index budget, I think we’re talking more about Google training websites to stay in their lane and adhere to their topical specialties, and not necessarily be able to claim search results outside of the usual areas that they tend to cover.
What would be the implications here for our audience? Maybe we also bring into account here that concept of EAT, which is talked about or written about in the quality rater guidelines; expertise, authoritativeness and trustworthiness. It sounds like there’s maybe some relevancy of EAT to this conversation as well.
Those Google quality rater guidelines are very useful because they hint at what Google expects from websites. We still have to try and translate that into how Google would see it in the algorithmic sense, because Google doesn’t want to manually curate the search results. Google uses the quality raters as feedback to find ways to algorithmically classify websites on certain attributes like quality and trustworthiness. Links still matter a great deal to that, and relevancy of links as well. A lot of the EAT signals will be translated into algorithmic link signals in Google’s ranking algorithms. It’s very hard for us to second-guess exactly what Google uses in those algorithms. It also tends to be niche-specific, as we have seen in recent algorithm updates, especially in the past year. Specific niches and specific industries have been hit really hard and others have been entirely unaffected. It seems to me that Google is trying to narrow down specific quality signals that, thanks to the Google quality raters, it sees are not being evaluated properly. It is trying to find ways to algorithmically implement that and apply algorithmic quality evaluations to specific websites in specific niches without revamping the entire way the ranking algorithm works.
It’s a very iterative process with a lot of trial and error. For example, in early March 2018, we had a massive update, in the UK at least, that very strongly affected a lot of what we call tabloid news publishers, who lost a lot of visibility and a lot of traffic. Then in April, we saw a refinement of that where some of those tabloid news publishers managed to reclaim a little bit of visibility, whereas others stayed more or less where they were after the first update. That’s what we see happening now with the August update and the recent update. They seem to be refinements of what Google has been doing to try and find the golden ratio of the right signals. It makes it harder for SEOs to game the system and harder for specific types of news websites to very easily claim a lot of traffic from Google.
When do you think we’re going to get to the point where it’s so opaque, because the machine learning algorithms are going to dream up their own signals and determine their own algorithms, that it will be impossible for the programmers who wrote the code to see what the attributes are, and for us as SEOs to figure that out through trial and error?
We’re not far off that already. The thing about those machine learning systems is that they’re very outcome-driven. How they get to that outcome is entirely irrelevant for both Google’s engineers and us as users of the search engine, as long as the outcomes are correct, high-quality search results that answer people’s questions. RankBrain was Google’s first stab at that and a pretty successful one, where it became an outcome-based system. They just looked at the search results and whether those results fulfilled the specific intent that the user had when they searched, creating an implicit feedback loop so that the most useful results tended to resurface more and more. Because it was very outcome-based, it’s very hard to second-guess exactly how the machine system got there.
That also makes our lives a little bit easier to a certain extent, because if we manage to create content and websites that fulfill a user’s demands, that fulfill the query and do what people expect them to do, we have to worry less and less about the technical aspects of getting to that stage. The downside of that for me is that it can narrow the opportunities that you get from search. There will be a very real risk that in every specific niche, there will be a few websites that tend to dominate. They are seen as the most useful, or maybe they have the most brand power and people implicitly trust them the most. Therefore, when they see that website pop up in search results, they click on that and never look at one of the other sites. That becomes a bit challenging if you’re a small business trying to compete with that brand power.
It’s a brand bias.
That’s something we’ve been seeing develop in Google’s rankings for quite a long time, where the specific attributes being put into the ranking algorithms tend to favor big brand websites.
Could you differentiate for our audience, authority versus trust? Those are two very important concepts to differentiate.
People tend to equate both of those to link value and link power in a historic SEO context. Something can be very trustworthy without having a great deal of authority. If you have a very specific niche website that writes a lot of good content and produces good research and insights on a very niche topic, it’s highly trustworthy but may not have any of the classic authority signals like a massive link profile. Historically, those websites would have lost to other publishers who do have very strong link profiles but don’t necessarily have the same quality of content production. Google is trying to find ways to differentiate those two signals so that it can float the most trusted content that is also reasonably authoritative.
Sometimes they get it wrong. An example of that would be when Google had this whole kerfuffle about ranking Holocaust denial websites in search results. That’s one of those cases where the algorithm went wrong, where it tended to reward websites more on their topical expertise. Holocaust denial websites tend to write a lot about the topic, and the algorithm did not look enough at the authority signals associated with that. Those sites shouldn’t be ranking for that and shouldn’t be seen as a trusted source. That was an unpleasant side effect of Google trying to find the right balance between authority and trust to rank content on those niche searches.
There is the old-world order of authority versus trust based on links where trust is the distance from a trusted site. Like is it one click away or is it ten clicks away, that thing. Then there’s maybe the new world order where Google figures out through perhaps machine learning, AI, other signals, other indicators that it’s a trusted site. Maybe it has a rigorous research methodology. These are things I don’t normally hear talked about in the SEO community. There could be new ways to identify trust beyond the current link-worthy one click away or distance away from a trusted seed site.
That’s the historic PageRank-style way of looking at it: taking a seed set of trusted websites and then seeing how many clicks it takes to get to your site on average. An interesting development in the news space was when a company called NewsGuard was launched on NewsGuardTech.com. They also have a browser plug-in, and this is an initiative of journalists from different backgrounds. They are evaluating news websites to see to what level they adhere to certain journalistic standards with regard to quality of reporting, fact-checking, accuracy and printing corrections. The browser plug-in shows a green, orange or red mark whenever that news website is listed in search results, on social media websites or just popping up in any other context. It’s fascinating because it shows that it’s very hard to find algorithms to make those evaluations. You have to rely on humans to make a semi-subjective evaluation of quality and trust, and of whether or not a publisher, a website or a specific piece of content should be given the prominence that other signals might say it deserves.
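The click-distance idea from the start of this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site names and the link graph are hypothetical, and the sketch uses the fewest clicks from any seed rather than the average mentioned in the interview. Nothing here reflects how Google actually computes trust.

```python
from collections import deque


def click_distance(link_graph, seeds):
    """Breadth-first search: fewest clicks from any seed site to each reachable site."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        for target in link_graph.get(site, []):
            if target not in distance:
                distance[target] = distance[site] + 1
                queue.append(target)
    return distance


# Hypothetical link graph: each site maps to the sites it links to.
link_graph = {
    "trusted-seed.example": ["news.example", "music.example"],
    "news.example": ["blog.example"],
    "blog.example": ["far-away.example"],
}

distances = click_distance(link_graph, ["trusted-seed.example"])
for site, clicks in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{site}: {clicks} click(s) from a trusted seed")
```

Sites that never appear in the result are simply unreachable from the seed set, which in this simplified model would mean no link-based trust at all.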
It wouldn’t surprise me if Google would start relying more and more on those human signals. That’s the whole point of that quality raters program as well: to incorporate those human evaluations and find ways to build them into the algorithm. They attempted with the machine learning systems, with the RankBrain algorithms, to circumvent the process to a certain extent. I don’t think they’ve successfully managed that because there are too many flaws and too many exceptions to the rule. It might work for 90% of queries, but the other 10% skews the results, and then we get these untrustworthy search results because they happen to align with the signals that the machine learning systems look for. It’s fascinating to see how Google is trying to develop this because in the end, Google wants to say it’s all algorithm. It gives them plausible deniability to say, “We’re not an editorial publisher, we’re just a search engine. Whatever we do, you can’t sue us for the results.” On the other hand, they want to mimic human evaluations as much as possible to try and serve the best possible results, ones that people would appreciate and would also agree are the best quality search results.
It’s only a matter of time now, at least according to Ray Kurzweil, who is one of the top futurists in the world and a great predictor of the advances in technology. As for when we’ll hit these different milestones, he says 2029 is about the date that computers will reach human-level intelligence, which is pretty scary but also pretty exciting at the same time. In the next few years, we’ll have expert systems that will be realistically as good as or better than humans at these humanlike thought processes around, “Is this trustworthy? Is this authoritative? Is this valuable? Is this well-written? Is this credible?”
I’m not entirely confident about those predictions. We tend to underestimate the complexity of the human intellect and tend to overestimate the capability of machine intellect. We will find artificial intelligence systems becoming very capable in narrow spaces like, for example, self-driving cars. In most contexts they will work fine, but there are a lot of contexts to cover. Even in fairly narrow applications of AI like self-driving cars, there are situations where humans can make decisions much faster and much better than machine systems ever can. We have the right context to make those decisions and we are biologically equipped to make those decisions in a split second. Our brain is built up of layers upon layers of different types of biological computing, all the way from the basic reptilian brain to the prefrontal cortex, which makes us self-aware.
We have barely begun to build machine systems that can mimic the simplistic fight-or-flight models of the reptilian brain, let alone make machines that can evaluate any aspect of daily life as a human can. The more we build those machines and the more advanced they get, the more we realize how far away we still are from modeling the human brain in a machine system and making it even remotely human-like in its capabilities. I’m a bit more of a pessimist than Ray Kurzweil on that particular topic. I don’t think we’re going to get human-level intellect anytime soon. I do think we’re going to get machine systems that are vastly superior to humans in very narrow areas. A general AI that can think on a par with a human in any given context is, for me, not something we have the capability to achieve, let alone something we will achieve in the next ten to twenty years.
I listened to an interview with Ray Kurzweil by Peter Diamandis, who’s the creator of The X Prize. That was mind-blowing. I’ve been a follower of both of those guys for a while now, and of other futurists like John Smart and so forth. I don’t know if optimistic is the right word, but I’m a realist. The Law of Accelerating Returns describes how the speed of evolution of information technologies in particular has been increasing because price performance doubles every year or so. That’s been holding up for the last 50 years and there’s no end in sight to that continuing. Do you follow any of those futurists and what they’re prognosticating?
Absolutely. I’m a huge fan of that blue-sky thinking and we should always try to aim for those things. I tend to read a lot outside of the technology sphere as well. If you look at the history of science, people 100 or 200 years ago thought that they already knew everything that there was to know. These predictions about the next leap in human evolution, the posthuman development that Ray Kurzweil and similar futurists are predicting, are nothing new either. Every time we seem to be able to get to that level, something new about how biology works and how the universe works is discovered that makes us realize we’re still very far off. The thought of uploading your brain into a computer is as old as computers are. The closer we seem to get to that, the further away we seem to get. We are starting to realize that the human brain is much more complex than we ever could have begun to imagine.
We are at the stage where we have no idea how consciousness works so how could we possibly replicate that into a computer? You need to first understand something before you could digitize it and then create a machine that can mimic it. This is something that was predicted back in the 1940s and 1950s. We’re 70 years later and still, it’s a couple of decades away. That line always seems to be moving ahead. The more we know, the more we learn about the universe in general, the human psyche and the intelligence that we take for granted because we seem to have it, we never contemplate our own intelligence. The more we know about it, the more we realize how little we know and how far we still have to go.
We’ll agree to disagree on the timings and how quickly we will end up.
Don’t get me wrong, for me, it can’t come soon enough. I’d love to be able to upload my brain and be semi-immortal in a digital cloud but I’m preparing myself that’s not going to happen in my lifetime.
There’s this thing called longevity escape velocity, a point in the very near future, even within the next five or ten years. If you manage to stay alive that long, then for every year that you live after reaching longevity escape velocity, technology provides you an extra year of life beyond that.
There was someone who said that the first person who will become 1,000 years old has already been born. I want to believe that, but it also scares the crap out of me. It goes against everything about how we as people are set up to exist. Mortality is part of our existence and drives a lot of what we do, like procreating and advancing ourselves so that we can build better lives for our children. We only do that because we know we’re going to die. What happens if we stop dying? Will we still go forward or will we find ourselves going around in circles? Will that make for a worse world rather than a better world? Those are interesting philosophical questions to have, I suppose.
Let’s bring this back to practical stuff for our audience. I want to get into how to do best practice SEO. We gave them some insight into how the crawler, indexer and query engine work and some discussion around crawl budget and EAT. Let’s talk about your process for doing an SEO audit. What does an SEO audit look like, and what are its scope and benefits?
A lot of SEO agencies and consultants sell SEO audits as finished products, and I’m guilty of that as well to a certain extent. The importance of an audit should always be derived from the context in which it is done. For me, an audit serves a greater purpose, which is always to help the client become more successful online in whatever context that happens to be. I try to always cater the outcome of the audit to that overarching goal, be it an audit for a new website replacing an old website or an audit for a website with a very old tech base that is not going to change. That always tends to define and guide how I make the recommendations and the type of recommendations that I make. The recommendation for an old website based on a very old tech stack could be, “Throw it out and start over.” That’s not going to be realistic in a lot of scenarios. You have to cater your recommendations to that specific context and say, this is the more or less immutable truth of this website. What can we do that is not immutable that can help this website perform better?
You have to have conversations with the product owners, the developers and the marketers who own that website and who are involved with it to try and find that context. It’s not a matter of me sitting on the outside, throwing the report over the wall and saying good luck. Within that context, the SEO audit itself follows that mental model of SEO: technical SEO focusing on the crawler; relevancy and on-page SEO focusing on the indexer; and authority, links and trust signals focusing on the ranking engine. There is a little overlap between those three. That model helps me create an audit report that makes sense and speaks to different stakeholders within the business. The technical part you can primarily leave to the developers who keep the website ticking over. The marketers tend to love the on-page stuff as well as the authority and link building stuff. Some of the more technical on-page stuff like structured data speaks to both developers and marketers.
As for the people who pay the bills, the executive level, you want to make sure they are a part of that loop as well. Maybe you would include an executive summary or a very high-level overview of the action points so that they can feel that they have a bit of ownership there as well, and it’s not just something that goes over their heads and that they have no comprehension of. It’s about trying to make the audit useful and trying to understand the context in which you deliver an audit for a website. Some website owners want specific audits for specific aspects of the site. I get contacted quite a lot to do an audit purely focusing on the technical aspects of the site, which I enjoy the most because I come from an IT background myself. That technical digging into a website to try and find ways to make it faster, to make it better, to make it more optimized is something I get a lot of satisfaction from. I never considered myself a marketer. I was always more of a technology geek. For me, the technical side of SEO has always been the most attractive in that regard.
The rabbit hole goes pretty deep with technical SEO. Let’s take some points from a technical audit. Are you checking to see for example whether something is accessible and executable from the particular version of Chrome? Maybe we should back up and describe how Google is based on an old version of Chrome, what a headless browser is and all that stuff.
It’s important to understand that Google has a two-stage indexing process. The first stage is based purely on the HTML source code and that is nearly instantaneous. I’m nearly sure that when Googlebot crawls the page, it does that first-pass indexing at the same time. That’s why the indexing process is so fast, and that’s also why Google tends to conflate crawling and indexing when it talks about Googlebot, because it’s the same system. That first-pass indexing system is based purely on the raw HTML source code and doesn’t do anything with it. It doesn’t render it. It purely analyzes the code and tries to extract what that page is about based on the title tag, metadata, on-page content, headlines, etc. Then there’s a second-stage indexing process in which Google renders the page in a headless browser. A headless browser is a browser without a user interface. It’s purely under-the-hood code without anything materializing on your screen.
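As a rough illustration of what a first pass over raw, unrendered HTML can and cannot see, here is a minimal Python sketch using the standard library’s `html.parser`. The class name `FirstPassParser` and the sample page are hypothetical, and this is not a description of Google’s actual indexer. It only shows that links created by JavaScript at render time are absent from the source code.

```python
from html.parser import HTMLParser


class FirstPassParser(HTMLParser):
    """Collects title, meta description, headings and links from raw HTML only."""

    HEADING_TAGS = ("h1", "h2", "h3")

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.headings = []
        self.links = []
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._current = tag
        elif tag in self.HEADING_TAGS:
            self._current = tag
            self.headings.append("")
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current in self.HEADING_TAGS:
            self.headings[-1] += data


# Hypothetical page: one link is in the source, another is only added by JavaScript.
raw_html = """
<html><head>
<title>Top 100 Songs</title>
<meta name="description" content="A ranked list of songs.">
</head><body>
<h1>Top 100 Songs of All Time</h1>
<a href="/songs/page-2">Next page</a>
<script>
var a = document.createElement("a"); a.href = "/js-only"; document.body.appendChild(a);
</script>
</body></html>
"""

parser = FirstPassParser()
parser.feed(raw_html)
print(parser.title, parser.description, parser.headings, parser.links)
```

The `/js-only` link never shows up in `parser.links`, because nothing here executes the script. Only a rendering step would reveal it.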
A couple of days later, Google renders the page in Chrome 41 in the headless browser, sees the internal links, starts crawling those links and has to recalculate the internal link graph to see how much PageRank each of those sub-pages gets. Then at that second level it stops again, because the source code doesn’t have any links. It has to wait again for the indexer to render those pages, a couple of days or sometimes even weeks later. Only then can it crawl the third level of pages, and it has to recalculate that internal link graph again to see how much PageRank is sent to those third-level pages. Combine that with what we know about how Google prioritizes crawling: Googlebot tends to prioritize pages that it already knows and that have a high amount of PageRank associated with them. The higher the PageRank, the more often Google will try to recrawl and re-index the same page.
I don’t care what framework they use because they all have their pros and cons. I’m entirely agnostic on that topic. If you have a good developer who knows what they’re doing, it doesn’t matter what framework they use. They will still find a good way to get there in the end anyway. For me, it’s purely about what is presented on the front end to both users and search engines. The method used to make that happen is less important to me, be it Angular, React, Vue, a universal approach or whatever it ends up being. I’ve seen terrible websites launch with React, which is theoretically one of the more search engine friendly platforms. The websites were entirely unsuitable for search engines.
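To make the repeated recalculation of the internal link graph more concrete, here is a small Python sketch. It uses the textbook PageRank power iteration as a stand-in, since Google’s actual computation is not public. The URLs, the damping factor of 0.85 and the iteration count are assumptions chosen purely for illustration.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank over an internal link graph given as {page: [linked pages]}."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no known outgoing links: spread its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank


# Hypothetical site. Before rendering, only the links on the home page are known.
links_before_render = {"/": ["/category-a", "/category-b"]}

# After the category pages are rendered, their JavaScript-injected links become known,
# so the graph grows and every score has to be computed again.
links_after_render = {
    "/": ["/category-a", "/category-b"],
    "/category-a": ["/", "/category-a/article-1"],
    "/category-b": ["/"],
}

rank_before = internal_pagerank(links_before_render)
rank_after = internal_pagerank(links_after_render)
for page, score in sorted(rank_after.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy model the article page does not exist in the graph at all until the category page has been rendered, which mirrors the level-by-level delay described above.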
Let’s move on and talk about mobile. I know you’re going to be speaking at BrightonSEO, doing a training workshop on mobile. Can you give us a bit of a rundown on some of the key points you’re going to share? I know we can only go so far because that’s a three-hour workshop. Some of the key points that you’d like our audience to walk away with in terms of mobile.
It’s something that even nowadays some SEOs and marketers tend to ignore a bit because they think, “Our target audience isn’t mobile first.” I did a corporate training not long ago for a client who makes industrial grade digging machines. You’d think, “That’s a very B2B-focused industry and that’s an old school construction type industry,” but even they said they had about 30% to 40% of their traffic coming from mobile devices. That’s a significant chunk of traffic. That’s just people in the field looking up, for example, service manuals for the machines that they happen to have in the field there. Never underestimate the adoption of mobile in any given industry. It’s probably bigger than you think. Also, you have to be careful with looking at your website data and thinking you have a minority of mobile traffic. Maybe that’s because your website isn’t particularly mobile-friendly.
It’s important to understand your audience, understand how people use your website and your online presence, and try to optimize for that. When it comes to mobile optimization, you want to have full parity between desktop and mobile as much as possible. All the ranking signals that are present on desktop should also be present on mobile. Even with responsive design, you have to be a little bit careful, because theoretically with a responsive design the source code is identical regardless of whether you look at it from a desktop or a mobile device. In the rendered page, though, the links might be in a different area. Some content might be hidden on the screen and that can have an SEO impact as well. For example, we have this aspect of Google’s ranking system based on the Reasonable Surfer patent, where Google applies certain measurements of the likelihood that someone is going to click on a link to determine the value of that link.
If there’s a link that’s fairly prominent in the body of a page, it means people are very likely to click on that link, which means there will be more PageRank flowing through that link. Whereas with a link that’s fairly deeply hidden on the page, maybe in the footer or way down in the content, it is much less likely someone’s going to click on it. That link would carry less value and less weight in Google’s ranking. The same applies to mobile. On mobile we sometimes have to play a few tricks with the page’s layout to make sure that people get a good user experience and can still read what we want them to read. It also means that sometimes links are placed in a different context or a different place on the page entirely. That can affect the amount of PageRank that flows through that link. At the moment, Google is in the process of switching over to mobile-first indexing. Historically, Google has used a desktop user agent to crawl websites primarily, and at most had a secondary mobile user agent crawl a website to compare the two. That’s going to switch around.
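The reasonable-surfer idea described here, where a page’s link value is split according to how likely each link is to be clicked, can be sketched as a simple weighted distribution. Google has not published how it weights links, so the function name, the URLs and every weight below are invented for illustration only.

```python
def distribute_link_value(page_value, click_weights):
    """Split a page's link value across its links in proportion to click likelihood."""
    total = sum(click_weights.values())
    if total == 0:
        return {url: 0.0 for url in click_weights}
    return {url: page_value * weight / total for url, weight in click_weights.items()}


# Hypothetical click-likelihood weights for the same page in two layouts.
# On desktop the related article sits in the body; on mobile it is pushed far down.
desktop_weights = {"/featured-article": 5.0, "/related-article": 2.0, "/terms": 0.5}
mobile_weights = {"/featured-article": 5.0, "/related-article": 0.5, "/terms": 0.5}

desktop_value = distribute_link_value(1.0, desktop_weights)
mobile_value = distribute_link_value(1.0, mobile_weights)

for url in desktop_weights:
    print(f"{url}: desktop {desktop_value[url]:.2f}, mobile {mobile_value[url]:.2f}")
```

Under these made-up numbers, the related article receives a noticeably smaller share on mobile even though the link itself is still present, which is the layout effect being described.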
For a lot of websites, it already has switched, so that Googlebot will crawl the website primarily with a mobile user agent and use what it finds with that mobile user agent to index the content. If there’s any content missing on mobile that you do have on desktop, Google is not going to see it anymore. It’s not going to index that content anymore. The same goes for links: if some links are very prominent on desktop and not on mobile, that can have all kinds of repercussions for how your website ranks and how Google crawls your website once it switches over to mobile-first indexing. Those are things you have to keep in mind. Even with a responsive design, you have to do some extra checks on your website to make sure that it presents as identical an experience as possible on mobile as it does on desktop, including all things like structured data, canonical tags, HTML tags, pagination and all those other aspects. Make sure that you give your mobile site the best possible chance to rank in Google search results.
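A parity check of the kind described here could start from something like the following Python sketch, which compares a few signals between the desktop and mobile HTML of the same page. The class and function names and the sample markup are hypothetical, and a real audit would compare many more signals than these three.

```python
from html.parser import HTMLParser


class SignalCollector(HTMLParser):
    """Collects links, the canonical URL and JSON-LD block count from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = set()
        self.canonical = None
        self.json_ld_blocks = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.add(attrs["href"])
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self.json_ld_blocks += 1


def collect(html):
    collector = SignalCollector()
    collector.feed(html)
    return collector


def parity_report(desktop_html, mobile_html):
    """Report which of the collected signals exist on desktop but not on mobile."""
    desktop, mobile = collect(desktop_html), collect(mobile_html)
    return {
        "links_missing_on_mobile": sorted(desktop.links - mobile.links),
        "canonical_matches": desktop.canonical == mobile.canonical,
        "structured_data_missing_on_mobile": desktop.json_ld_blocks > mobile.json_ld_blocks,
    }


# Hypothetical desktop and mobile versions of the same page.
desktop_html = """
<head><link rel="canonical" href="https://www.example.com/page">
<script type="application/ld+json">{"@type": "Article"}</script></head>
<body><a href="/a">A</a> <a href="/b">B</a></body>
"""
mobile_html = """
<head><link rel="canonical" href="https://www.example.com/page"></head>
<body><a href="/a">A</a></body>
"""

report = parity_report(desktop_html, mobile_html)
print(report)
```

The sketch works on the raw source only, so for JavaScript-heavy sites the two HTML strings would need to come from rendered snapshots instead.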
If you have a different mobile setup, perhaps dynamic serving instead of responsive, or a completely separate mobile site with different URLs, then there are other considerations that you’ve got to check for, things like the bi-directional linkages. If you have a separate mobile site, rel=alternate needs to point to the mobile site from the desktop site and the rel=canonical on the mobile site needs to point to the desktop site. You’re bi-directionally linking the two sites together. If you’re dynamic serving, you need to have the Vary HTTP header present so that Google knows that you’re varying the content based on the user agent, essentially changing the source code for mobile users versus desktop users, so they don’t think you’re baiting and switching or doing some cloaking.
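Both checks mentioned here can be expressed as small functions. The following Python sketch assumes you already have the HTML of the desktop and mobile pages and the response headers as a plain dictionary. The function names and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkTagCollector(HTMLParser):
    """Collects (rel, href) pairs from <link> tags."""

    def __init__(self):
        super().__init__()
        self.link_tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attrs = dict(attrs)
            self.link_tags.append(((attrs.get("rel") or "").lower(), attrs.get("href")))


def link_tags(html):
    collector = LinkTagCollector()
    collector.feed(html)
    return collector.link_tags


def is_bidirectional(desktop_url, desktop_html, mobile_url, mobile_html):
    """True if desktop has rel=alternate to mobile and mobile has rel=canonical to desktop."""
    points_to_mobile = ("alternate", mobile_url) in link_tags(desktop_html)
    points_to_desktop = ("canonical", desktop_url) in link_tags(mobile_html)
    return points_to_mobile and points_to_desktop


def varies_on_user_agent(headers):
    """True if the response headers contain a Vary header that lists User-Agent."""
    for name, value in headers.items():
        if name.lower() == "vary":
            return "user-agent" in [part.strip().lower() for part in value.split(",")]
    return False


# Hypothetical separate-URL setup.
desktop_url = "https://www.example.com/page"
mobile_url = "https://m.example.com/page"
desktop_html = (
    '<head><link rel="alternate" media="only screen and (max-width: 640px)" '
    'href="https://m.example.com/page"></head>'
)
mobile_html = '<head><link rel="canonical" href="https://www.example.com/page"></head>'

print(is_bidirectional(desktop_url, desktop_html, mobile_url, mobile_html))
print(varies_on_user_agent({"Vary": "Accept-Encoding, User-Agent"}))
```

The comparison is an exact string match on the URLs, so trailing slashes or protocol differences would count as a mismatch. That strictness is a deliberate simplification in this sketch.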
What are the fixes that are required then in that scenario? I’m guessing a lot of our audience are either doing the infinite scroll thing or going to in the future.
I’m always a fan of implementing old-fashioned pagination, and it can behave differently on the mobile screen than it does in the source code. For me, it’s always important to give Google crawlable links to page two or page three of a certain list of content. That’s an old-fashioned a href link with the page two URL in there, which should be uniquely accessible and only show the content that belongs on the second page of your section or your category. There are various ways to implement that. The easiest way usually is that when people scroll down, you load page two of the articles when they hit the bottom. Then you use the HTML5 pushState to update the URL to be page equals two, or whatever it is that you want to implement there to indicate pagination. As people scroll down, you do the same for page three, page four, page five, etc. That works as long as the HTML source code has those a href links in there with the URLs pointing to page two, page three, page four. Ideally, you also have pagination meta tags on the page so that Google knows it’s looking at a paginated list of pages.
You’re talking about the rel="prev" and rel="next" link tags that go in the head portion of the HTML?
Absolutely. Google doesn’t perform actions. Google doesn’t scroll. What is loaded initially is the only thing that Google is going to see. It also means that the links to products or articles or whatever it is that is listed on the first page are the only links that Googlebot is going to crawl and the only links that are going to pass internal link value. Anything that requires Ajax, or whatever technology you use to load page two and page three, is going to be invisible to Googlebot for all intents and purposes. The content and links underneath there are not going to perform as well, or might not perform at all, compared to the old-fashioned desktop links.
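To make the server-side half of this concrete, here is a small Python sketch that produces the markup each paginated page would need in its initial HTML: the rel="prev" and rel="next" link tags for the head and plain anchor links for the body. The `?page=N` URL scheme and the function names are assumptions for illustration. The scroll handling and URL update in the browser are a separate client-side concern that this sketch does not cover.

```python
def page_url(base_url, page):
    """Build the URL for one page of the list (assumed ?page=N scheme)."""
    return base_url if page == 1 else f"{base_url}?page={page}"


def pagination_markup(base_url, page, total_pages):
    """Return (head_tags, body_links) for the given page of a paginated list."""
    head = []
    if page > 1:
        head.append(f'<link rel="prev" href="{page_url(base_url, page - 1)}">')
    if page < total_pages:
        head.append(f'<link rel="next" href="{page_url(base_url, page + 1)}">')
    # Plain anchor links that a crawler can follow without scrolling or scripts.
    body = [
        f'<a href="{page_url(base_url, n)}">Page {n}</a>'
        for n in range(1, total_pages + 1)
        if n != page
    ]
    return head, body
```

Because these links are in the source code from the start, the infinite scroll can be layered on top for users without hiding page two and beyond from a crawler.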
Another mistake that oftentimes happens is that developers think they know SEO. Then they’ll start implementing canonicals in a way that they think is helpful but that thwarts Googlebot, and that is to rel=canonical to the very first page of the pagination series from all the other pages, from the second to the last.
That’s one of my perpetual bugbears. They get it wrong in almost every implementation of pagination.
Why don’t you describe what the end result is in that scenario because it is bad for SEO?
Canonical tags are intended to identify which version of a page you want to rank if there are duplicate versions of that page. For example, you can create a duplicate version of a page very easily by adding some tracking parameters to the end of its URL. The canonical tag is supposed to signpost to search engines, “Don’t use this version, use this version instead.” When you talk about a paginated list of content, say page three of a list, the canonical version of page three is just page three. It’s not duplicate content. Page three shows different content from page one, page two, page four, etc. The canonical tag for page three should just point to page three. The only exception to that is if you have a view-all page. That’s where a lot of people get the confusion from, because in Google’s documentation they say that if you have a view-all page where you can see all products and all articles, you can point the canonical tag to that one.
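The rule for paginated pages can be expressed as a short Python sketch: strip the parameters that only create duplicates, but keep the page number, so that page three canonicalizes to page three rather than to page one. The list of tracking parameters and the function name are assumptions for this example, and your own site may use different ones.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of parameters that create duplicates without changing content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid"}


def canonical_url(url):
    """Drop tracking parameters and fragments, keep everything else (including page=N)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this approach, a tracked link to page three and the clean URL for page three both resolve to the same canonical, while page one and page three stay distinct.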
The problem with that is that if they do, then the view-all page is, theoretically at least, what’s going to show up in the search results, and that could be a very slow-loading, very long page that is not a great user experience.
What I see is that if you use canonical tags incorrectly, Google tends to ignore them anyway and makes its best guess at what you mean through the internal link structure. If you have a view-all page that doesn’t have any external or even internal links pointing to it except the canonical tag, it might not rank at all. Google is just going to use whatever it thinks is the best possible page, and sometimes it might not even rank the category page at all in the first place. It’s all about consistency of signals. Try not to confuse search engines, so that they have a very clear picture of where each page sits in the overall structure of your website. That gives them the best chance to select the right page in the right context when it comes to ranking pages from your site in search results.
Speaking of consistency of signals, a lot of people don’t recognize or understand that the XML sitemap is used as a canonicalization signal by Google as well. If you’re including non-canonical URLs in the XML sitemap, that’s a differing signal from what you’re sending in the canonical tags. That’s not good.
It’s a powerful signal. An XML sitemap will never replace an old-fashioned web crawl. A crawl of the site is still supreme when it comes to discovering content and evaluating the ranking power of every URL. But the XML sitemap is a canonical signal. I always try to explain it in a way that makes sense to people when they try to explain this to others, which is that I see an XML sitemap as an extra internal link to a URL. If a page is listed in an XML sitemap, it’s the same as having a link from your home page to that particular URL. That’s a very powerful link to have. That’s why you need to make sure that the XML sitemap contains the right canonical destination URLs, so that again you reinforce that signal. You make sure that those pages have the purest ranking power and there’s no confusion there.
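A consistency check between the sitemap and the canonical tags could look something like the Python sketch below. It assumes a standard sitemap in the sitemaps.org namespace and a dictionary, built separately from your own crawl, that maps each URL to the canonical URL declared on that page. The function names and that dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return every <loc> value listed in an XML sitemap string."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def non_canonical_entries(sitemap_xml, canonical_of):
    """List sitemap URLs whose declared canonical points somewhere else.

    canonical_of maps a URL to the canonical URL found on that page.
    URLs missing from the mapping are treated as self-canonical.
    """
    return [
        url for url in sitemap_urls(sitemap_xml)
        if canonical_of.get(url, url) != url
    ]
```

Any URL this returns is sending two different signals, one from the sitemap and one from its canonical tag, which is exactly the inconsistency discussed above.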
One last topic related to Accelerated Mobile Pages, AMP. What’s your take on that? Is that essential for our audience to implement AMP or is it something that we shouldn’t be doing Google’s job for them?
I have very strong opinions on AMP. I work with news publishers and we don’t get a choice. We have to implement AMP or we lose a lot of visibility in mobile search results, because the top stories carousel on mobile is exclusively AMP. It’s very rare that you see a normal, non-AMP story there. But there’s nothing you could do in AMP that you can’t do with normal HTML. Google makes AMP faster because Google cheats. Google preloads AMP articles from its own Google AMP cache, which performs its own performance optimizations, and then pre-renders those pages in the Chrome browser, especially on Android phones, when you get that search result in front of you. When you tap on the AMP article, it has already loaded in the background, so it feels like a very fast, near-instantaneous experience. Inherently, AMP isn’t any faster than old-fashioned HTML websites are. Google wants to force it down your throat because if you have your website in AMP, or an AMP version of your website, you’re making Google’s life a lot easier.
Those articles can be very easily crawled and very easily indexed as well, because there’s integrated structured data, which is mandatory. It’s very clean HTML, so Google’s indexers can make good sense of it. The World Wide Web is called the World Wide Web and not the Google Wide Web. Google’s got this the wrong way around. Google is forcing websites to create their pages in a way that Google can easily understand. I’m with them to a certain extent on that. With AMP, they are overreaching, in that they want to make the entire web like AMP so that Google needs no effort at all to crawl and index web pages anymore. Anything that’s not AMP-compliant they can ignore and throw out. If we start embracing that at scale, it means losing the foundation of the free and open web. Theoretically, AMP is an open source project. In practice, most of the code is written by Google engineers. Most of the decisions about the AMP standard are made by Google engineers.
Also, Google has a history of abandoning these sorts of pet projects when it loses interest in them, letting them flounder or just pulling the plug entirely. Websites could invest a lot in trying to implement AMP at the moment, and there might be some temporary benefits from that. In the long run, Google might just stop giving prominent rankings to AMP pages in news or in other contexts like recipes, and fall back on showing the best possible results regardless of the technology they are built on. In that case, all the effort you put into creating an AMP version of your website has gone to waste, when you could have put the same amount of effort into building fast-loading websites based on already existing HTML5 standards. There’s no reason why you shouldn’t be doing that in the first place. If Google cared about a fast-loading web with optimal mobile experiences, they could just enhance the ranking signals that they already have for load speed and mobile usability, rather than try to create a secondary standard for the web that nobody asked for and nobody wants. It’s not necessary in the first place.
You’re begrudgingly implementing AMP because there’s no other way when you’re dealing with news publishers. They’ve got to play Google’s game in order to get in that carousel. What about the other sites? What about non-news publishers, should they implement AMP?
No, absolutely not. The only reason AMP still exists is that Google feeds it. Google forces people to accept it. For me, that’s not the hallmark of a good standard. If you introduce a new technology and it doesn’t survive without you forcing people to use it or giving it massive prominence in the search results, then you’re not creating a standard that people asked for or that people need. That’s my main objection to AMP. Other technologies, like Angular, which Google uses, or React, which Facebook uses, survived because of the merits of those particular technologies. We may moan about them as SEOs, but they do fulfill a purpose, and developers can build better websites or build websites faster with those technologies.
AMP, on the other hand, wouldn’t exist anymore if it wasn’t for Google’s artificial feeding of the standard. It would have floundered and disappeared. Google has forced websites like news websites to adopt it. Google is trying to make people build entire websites in AMP and puts a lot of Google engineering behind it. That’s the only reason that the AMP standard is still in existence. It’s entirely unnecessary. If you’re not seeing your own particular niche dominated by AMP carousels and AMP articles, then by all means stay away from AMP. Rather, invest your time and effort into making your websites in old-fashioned HTML and CSS as fast and as mobile-friendly as possible.
This has been a fabulous, fascinating, geeked-out interview. I appreciate your time, Barry. If somebody wants to work with you, how would they get in touch?
My website is www.PolemicDigital.com. I’m also on Twitter @Badams, which is the only social media network I’m still a member of because I deleted Instagram and Facebook. Google me, Barry Adams SEO, you’ll tend to find me quite quickly.
Barry, thank you again. Thank you to our audience. I appreciate your dedication to making the web a better place by making it more Google-friendly, crawler friendly, indexer friendly and ranking engine friendly. It’s time to implement some of what you’ve learned. We’ll catch you on the next episode of Marketing Speak. In the meantime, have a fantastic week.
- Polemic Digital
- Barry Adams
- Dawn Anderson – previous episode
- Ray Kurzweil interview with Peter Diamandis
- Bartosz Góralewicz – previous episode
- Accelerated Mobile Pages
- Twitter – @Badams
Your Checklist of Actions to Take
☑ Understand the three main processes on a search engine: crawler, indexer and ranking system. Having an in-depth understanding of each process gives me an opportunity to build pages that will rank better in search engines such as Google, Bing or Yahoo.
☑ Make sure that my pages adhere to Google’s quality rater guidelines, which emphasize expertise, authoritativeness and trustworthiness.
☑ Be informed about the conceptual difference of authority versus trust.
☑ Revisit Dawn Anderson’s episode and gain a deeper understanding of crawl budget.
☑ Know the importance of an SEO audit. Barry states it should serve a greater purpose which is to help clients become more successful online.
☑ Communicate openly with the product owner, developer, and marketers of the website when performing an SEO audit. Provide tips on how a website can perform better.
☑ Ask this question: “What’s the lowest bandwidth and CPU capabilities I expect that my user will have?” Barry says we must aim to design a website for the lowest common denominator.
☑ Research Accelerated Mobile Pages (AMP) and learn how I can use it for my website’s visibility in mobile search results.
About Barry Adams
Barry Adams is an award-winning SEO consultant specializing in SEO for news publishers and ecommerce websites. He’s been doing SEO since 1998, and works with some of the world’s largest media brands. Barry regularly speaks at conferences around the world and serves as chief editor at StateOfDigital.com.