In this Episode
- [01:07] – Stephan starts off by geeking out a bit on SEO, pointing out that there’s too much emphasis on the fundamentals.
- [01:46] – In terms of data mining, how do you make data-driven decisions that are based on more than a hunch? In his answer, Bill talks about some of the things he does to offer value to people.
- [07:28] – Stephan goes back to recap a few of the things Bill has been saying to ensure they’re clear for listeners. Bill then clarifies and elaborates on some of his earlier information.
- [10:04] – One of the amazing things that Google is willing to give users is impressions. Bill then describes how to do what he calls the “poor man’s yield analysis.”
- [12:45] – Bill discusses the importance of a statement answering the question “why us?” This should be shorter than an elevator speech.
- [15:39] – What are some of Bill’s other favorite search tips and tricks to find out what’s going on with your SEO?
- [17:47] – Stephan asks Bill a little bit of a trick question about how you tease out the HTTPS URLs. In his answer, Bill talks about canonicals.
- [21:48] – What do you do when your canonical is not getting obeyed?
- [23:32] – Stephan talks about what he finds when he audits clients’ websites. He then uses the example of a specific client, and Bill offers his advice for the situation Stephan described.
- [26:14] – Stephan points out that XML sitemaps are supposed to only have canonical URLs in them.
- [30:56] – What should listeners pay attention to with regards to HTTPS migrations?
- [34:12] – Bill discusses chains of redirects and what he would recommend that listeners do about them.
- [37:17] – We learn about the benefits of using a 410 instead of a 404, with Bill describing the intended uses of each.
- [41:36] – If you want to reduce the amount of wasted crawl equity or crawl budget, would Bill recommend disallows, noindexes, or 410s to carve pages out of the search results? He likes to do a plain noindex nofollow, he explains.
- [44:35] – Stephan describes a common approach that he thinks is a huge mistake, which is to disallow these pages. He and Bill then discuss this.
- [48:39] – Bill discusses whether link sculpting is still possible.
- [51:30] – We learn about the benefits of being able to have a cascading drop-down and the impact of hamburger menus.
- [55:04] – We move into the lightning round. What’s Bill’s take on iframes from an SEO standpoint?
- [55:27] – What’s Bill’s take on 302s versus 301s?
- [56:58] – Is there a tool to track how many clicks away your pages are from an outside link source?
- [57:53] – Are there any URL best practices that Bill thinks are no longer relevant or no longer true?
- [59:24] – Bill elaborates on what he meant by saying that there’s no need to localize URLs.
- [60:58] – Does Bill have any favorite tools for log analysis and looking at what spiders (like Googlebot) are doing?
- [62:38] – Bill talks about where listeners can find him.
Too many folks who know how to build a website, or use keyword research tools call themselves SEO experts. Today on this show, we have a real SEO expert, his name is Bill Hunt. Bill is the president of Back Azimuth Consulting and formerly was in charge of search strategy for Ogilvy and for IBM. He consults with high profile companies and has his own suite of awesome tools. He co-authored the book Search Engine Marketing, Inc. In this episode, he’s going to share some really ninja SEO strategies and tactics, you’re going to really enjoy it. I’m your host Stephan Spencer, let’s get on with the show. Bill, it’s great to have you on the show.
I’m glad to be here. Great company of people come before me so hopefully I can do you justice.
I’m sure you will. Let’s geek out a bit on SEO because I think too much emphasis is placed on the fundamentals when people are talking about SEO. They don’t get super geeky and I want to get super geeky.
Let’s talk about data mining. How do you make data-driven decisions that are based on more than a hunch? Because a lot of SEO is, “Let’s make a hunch.” Or “Let’s follow the best practice that everybody else is doing.” What’s your approach? What’s some of the magic that you do differently?
I look at it a number of ways. For the last probably six years that I’ve really been into this, I think the number one thing I look at is yield. I know that’s sort of a funky word for search but are we getting enough out of what we have? If you look at almost anything out there related to keywords today in search, it’s all about more. How do I find more? I’ve had people, we’ve imported 10 million words into a database and they’re like, “How do we get more?” Are we doing enough with what we have? I think that’s the first data mining. What I mean by yield is if there’s a certain amount of opportunity, think of it like airline seats. An airline really makes money when the entire plane is full at the higher side of the equation. That’s one thing I look at, what is a fair share of search? We’ve seen for years these rank to click curves and stuff like that but we can use a simple number, 5%, because there used to be, it doesn’t work as well now but used to be 10 paid ads, 10 organic or free listings and a 1 out of 20 chance of being clicked was 5%. When somebody is doing a search and you’re getting less than 5%, we clearly have a yield problem, any ranking in the top three. We know that on a branded query, it’s closer to 50%. On a non-branded, it could be anywhere from a fraction of 1% to 50%, 60%, 70% about the relevance. I like to spin that number. If you’re trying to justify that 5% is pretty high, that means 95% of the people who wanted what you sell and maybe a market leader don’t necessarily want it from you. When we find these, we can say, “Why aren’t they clicking on us?” We have the high position and a lot of times, it’s our description. Other people like Marcus Tober have talked about intent. Why did someone do that query? That’s what we’re looking at, is your snippet matching that intent? That’s a first one we do, is yield. Do we get enough share of that? Another really interesting thing that very few people do and I’ve been trying to get people to do. 
The second reason we built Data Prism was to do co-optimization. What happens when paid and organic are there together? Most of the time, everybody immediately goes to the cannibalization effect, that paid is stealing that free traffic. We’re doing a few more of these models now for people because of this exact-ish match thing Google has, where now it’s not literally the same phrase but multiple versions of it. People are starting to see what’s happening. There are three ways you can mine the data: when we have both, when we have one, and what happens when we add the other one. If you’re not ranking for something, what happens to the data mix when you are? With not provided, we already have a natural limitation. But if you set up your Search Console correctly, what I do with my enterprise brands is, for each category of product that they have, we build a sub folder. We have some people that might have 5,000 to 10,000 sub accounts within Search Console and we can pull in all that data. Now we’ve got a larger sample, but literally the first thing we do is ask, of your top 20 most expensive words, how are you doing organically? Never mind how the two work together: for anything that you’re willing to pay that much for, are you even ranking and, more importantly, does your SEO team even know that you’re buying it? That’s the other data mining thing we do, just trying to put all the sets together. What is paid doing? What is organic doing? What are we getting out of analytics? What’s the total universe, and how much of that do we have in a trading capacity, that we’re doing something with? It’s very interesting that people use tools like BrightEdges, Conductors, Covarios… They have a natural limitation of what they can do, say 5,000 words or 10,000 words, because of cost. You may not be looking at the performance for a wider set of words. Those are some of the types of things we’re doing. We’re finding some amazing opportunities for people.
In one case, for a company, we then sliced that by phases of the buy cycle: take the words, apply buy cycle attributes to them, and ask what share of that you are getting. Many people do a lot upfront on awareness. Some people do a lot on the last query, which is typically more of a branding query. But there are a few in the middle, and we find some companies may be missing as much as 85% of that. That’s where the real action is happening because people are trying to make a decision. This is often missed by content marketing because it’s not top of the funnel and it’s not the bottom of the funnel that we’re often monitoring with A/B testing and stuff like that. Those are some of the models that we’ve been spending the most time on and that have been getting the most value for people.
Let’s recap a few key points because it’s really important that our listeners understand first of all what exactly is not provided and it’s the bane of the SEO’s existence but some of our listeners won’t know what we’re referring to when we talk about not provided. And then your gem of a piece of advice here of build sub folders in Google Search Console is genius so that you can get more data around keywords to get around this not provided issue. Let’s further flesh this out a bit for our listeners who aren’t as familiar with this.
Sure. Trying to be politically correct with not provided, Google had the idea, well not just Google but a lot of search engines had the idea that by giving us the words people use to come to the website, that we could use that for evil marketing purposes. They basically strip it out because you have a referral. Whenever anybody visits the website, it tells you where they came from and from a search engine, it would tell you exactly what keyword phrase they use. That was a gold mine for us to say, “Hey, we have eight pages. It had this many visits from this keyword.” When that went away, not provided literally is the statement that’s in the analytics tool that says, “Yeah, you have visits from organic search but we don’t know what phrase they use.” The only place we can get that truly is through Google Search Console. For the old timers, that was Google Webmaster tools. In there, Google is giving us a sampling of those words. As we were saying, create the additional folder. Typically, when you download it into an Excel file, they give you about 500 rows to 1,000 rows. If you use the API to extract that data, you can get up to 3,000 rows. Imagine if you had 10 sub accounts. One for blue widgets and green widgets or however you segment your website, you can get up to 3,000 for each of those or 1,000 on a simple download. It extends the amount of data at a keyword level where you have nowhere else to get it. There are some tools that are starting to do some approximations based on ranks and click ratios and stuff like that but I like to start with let’s take what Google is willing to give us and if we want to apply any other calculated data across that, then go for it but at least take what they’re willing to give you.
One of the amazing things that Google is willing to give us that we can’t get anywhere else is impressions, the number of impressions where people have seen our search listing but haven’t clicked. That’s gold.
Exactly. And you get the clicks. While we’re on there, how do we do the poor man’s yield analysis? That’s a perfect segue into that. What I tell people to do is go into your Google Search Console, simply go into Search Analytics, check all the boxes at the top (impressions, click rate, and number of clicks), and sort it by impressions. Simply look down that list to see words where you’re ranking in the top three to five positions organically but getting less than a 5% click rate. You don’t need any fancy specialty tools or anything, just simply use that. You can export to Excel, put in a ratio, highlight it, whatever you want to do. But that’s going to give you some things where you’ve already done the first part of the battle, which is ranking. It’s just that people aren’t clicking, and you can see that plain as day because, exactly as you said, you can see the number of times you were shown and the number of times you were clicked. If you’re not getting clicked and you’re ranking well, then there’s your problem: your description is not connecting with the user. It’s fascinating how many people never look at that description, even for their brand name. When they do, they’re immediately shocked that it’s just a bunch of countries or a bunch of menu navigation. That’s all someone has to go on when choosing to click. They’re not going to click. They’re going to click on somebody, maybe ranking higher or even lower than you, that has a more compelling and interesting offer to answer their question or suit their need.
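The filter Bill describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `QueryRow` fields stand in for the columns of a Search Console export, the sample rows are made up, and the thresholds (top five positions, 5% click rate) are the rules of thumb from the conversation rather than anything Google defines.

```python
from dataclasses import dataclass


@dataclass
class QueryRow:
    # Hypothetical field names mirroring a Search Console export.
    query: str
    clicks: int
    impressions: int
    position: float  # average organic position


def low_yield_queries(rows, max_position=5.0, min_ctr=0.05):
    """Return (query, ctr) pairs that rank well but earn under min_ctr,
    ordered by impressions so the biggest opportunities come first."""
    flagged = []
    for row in sorted(rows, key=lambda r: r.impressions, reverse=True):
        if row.impressions == 0:
            continue  # no data, nothing to judge
        ctr = row.clicks / row.impressions
        if row.position <= max_position and ctr < min_ctr:
            flagged.append((row.query, ctr))
    return flagged


# Made-up sample data for illustration only.
rows = [
    QueryRow("blue widgets", clicks=30, impressions=2000, position=2.4),
    QueryRow("green widgets", clicks=120, impressions=1500, position=1.8),
    QueryRow("widget repair", clicks=5, impressions=900, position=11.0),
]

for query, ctr in low_yield_queries(rows):
    print(f"{query}: {ctr:.1%}")
```

In this sample only "blue widgets" is flagged: it ranks around position two yet earns a 1.5% click rate, which is the kind of listing whose snippet is worth rewriting first.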
The description, the snippet portion of the search listing is dependent on the search query. Sometimes, you’ll get a list of countries or you’ll get copyright all rights reserved sort of mumbo jumbo at the bottom of your footer that’s being used. Sometimes, that will be the meta description but it’s not going to always just be the meta description if you’re not using any good keywords in your meta description and somebody’s putting in a bunch of important keywords and it’s in a body copy but not the meta description. You may end up getting a snippet served up by Google that looks like gibberish because it’s being extracted from different parts of the page.
Right. An interesting thing about that concept is, how do we give them something to use? For lack of a better word, and I still use it today whether I work with IBM or not, it’s the “why IBM” statement. Because one of the things with scale is you have to give people a concept or a set of rules or a set of parameters that they can repeat time and time again. We created this thing called the “why IBM” statement, which was essentially a sentence or two, somewhere between 145 and 200 characters. I try to stay away from actual character counts but people need parameters. Somewhere in that ballpark, it really said why us and, as you said, contained the primary keyword phrase. What you’ll find is that if you write something like this, it answers the question “why us?” in the context of that product: we are the market leader in x and we have point, point, point. If we keep it brief, it’s not even as long as an elevator speech, and you can have multiple of these on a page. What you said is so true, and if you want to test this, geeking out with a little bit of a formula, you can take a keyword phrase and go into the search box on Google. After the phrase, put in site colon and then put in your domain name. The first thing it’s going to show you is all the pages that relate to that phrase, with the first page being the one for what you typed. Then change that query even slightly, putting in a buy cycle attribute or anything like that, even changing the color, and see if Google changes your description. It’s pretty interesting when people do that. By looking at the content on the page, you can start to see how and where they’re pulling that information for your search result listing. If there’s a particular query where you’re not getting the description you want, then you need to go in and make some changes to that page.
Yup. That’s a great tip. I love geeking out with query operators and different advanced search approaches with Google. In fact, I have a whole book on it. That’s Google Power Search. I’m coming up with a second edition of that. Listeners, keep an eye out for that. That’s going to be coming pretty soon in the next month or so. Besides keyword and site colon, and there’s no space after the colon. That’s something that a lot of people mess up. They won’t work if you put a space after the colon. Site colon and then your domain name, what will be some other of your favorite advanced search tips and tricks to find stuff and do a forensic analysis on what’s going on with your SEO?
The other thing too, if we break down SEO, is that I like to keep things simple. I break SEO into four parts: indexability, relevance (being the on-page bits), authority, and lastly clickability. We were talking about clickability with yield analysis. The other thing to look at is the number of pages that you have. Because if you don’t buy the lottery ticket, you could have the winning numbers, but without that ticket in your hand, you can’t win. On an enterprise level, which is where I typically work, it’s interesting how many people never look at how many pages are indexed. One thing you can do is use that same site operator, leave the keyword phrase off, and put in just the domain. With the www, without the www, with HTTP, without HTTP, all of those things allow us to see how many pages there are; it’s not a completely accurate number. If you have 50,000 pages, 10,000, or even 100, and this comes back saying it only found 10, then clearly we have a problem. Too many pages is a problem and too few pages is a problem. I think that’s a really awesome diagnostic tool because, starting with that, you can get an idea of whether you have a crawl problem. On a page level, there’s a similar type of operator: you can use info colon and put in the URL, and it’ll actually tell you whether that URL is indexed, plus information about it like the cache date. You can actually see the text as the search engine sees it. Those are probably two others that are really good forensic diagnostic tools that you can use at a URL or at a domain level.
Yup, and I love going directly to the cache by using cache colon and then the URL. That’s another nifty trick, but yeah, I use the info colon all the time to see if it’s indexed. Good stuff. Here’s a little bit of a trick question. How do you tease out the HTTPS URLs? Because site:https://www.yourcompany.com doesn’t display only HTTPS URLs, it’ll also display HTTP URLs.
For me, I built a tool with which I can specifically query those. But I think the way I would do it just using the operators would probably be an allinurl, using the HTTPS and the domain, so that would bring back just those with HTTPS.
That’s very clever. I like it. You could use allinurl or inurl. You don’t actually need allinurl, do you? You can use inurl:https as the additional operator along with site:https:// and your domain, or even just site: and your domain, because that HTTPS portion of the site: operator gets ignored. So site:yourcompany.com and then inurl:https. That should work.
You’re specifically telling it that it has to have that element of HTTPS.
Very cool. You’re a ninja.
One other thing while we’re talking about this: when you do that info:, one of the things to look at is whether it’s the actual URL. This is one of the things that I built into my, for lack of a creative word, index checker tool, because of what we’re finding now with a canonical or the illusion of a canonical. For those that don’t know what a canonical is, it’s just simply saying, “Hey, if you find this page, we prefer that you use this other version of the page.” It’s really handy when you use a lot of tracking parameters on your pages. It strips all that out and consolidates your link value to the one. So look at exactly that: is the URL that you queried the one that’s there? We see this a lot with international and with canonicals where people have implemented them together. A very interesting example of this: I was at a conference in Australia and the photographer of the conference came up and asked me a question about international, because all of their Australian, UK, and Singapore pages had disappeared from Google. They had no idea why. The only thing they did was put in this hreflang, and he’s the photographer. He just happened to be in a meeting where they were talking about how this devastated their traffic. We did a simple look, and what had happened was that somebody read a blog post that said when you have international pages, you want to use a canonical to the mother ship page. If you’ve got just a regular .com or if you’ve got US as your global page, you canonical to that, which was completely wrong. When they set this tag and Google came to the Australian page, it read the directive you gave it: hey Google, if you found this Australian page, we’d actually prefer that you use this US page. In this case, it was the US home page.
Google just follows the direction you gave it because it assumes you know what you’re doing and simply said, “Okay, we’ll drop all of your Australian, Singapore, and UK pages and we’ll take these US pages.” Again, this was something that they read, didn’t really know how to use it, implemented it, and it was potentially devastating to their business. This info colon query can help you see some of those things.
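The mistake in that story, a regional page declaring a different page as its canonical, is easy to check for mechanically. The following is a minimal sketch using only Python’s standard library; the `example.com` URLs and the inline HTML are hypothetical, and a real audit would fetch each page rather than use a string.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_mismatch(page_url, html):
    """Return the declared canonical if it differs from page_url, else None."""
    finder = CanonicalFinder()
    finder.feed(html)
    declared = finder.canonical
    # Treat a trailing slash difference as the same URL (a simplification).
    if declared is None or declared.rstrip("/") == page_url.rstrip("/"):
        return None
    return declared


# Hypothetical Australian page that points its canonical at the US home page.
au_page = (
    '<html><head>'
    '<link rel="canonical" href="https://www.example.com/">'
    '</head></html>'
)
print(canonical_mismatch("https://www.example.com/au/", au_page))
```

A non-empty result is not always an error, since tracking-parameter URLs are supposed to point elsewhere, but a country page that names another country’s page as its canonical is the pattern described above.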
In regards to the canonical link element or canonical tag, as many people colloquially refer to it, it’s only a hint. As far as Google’s concerned, they may or may not obey it. It’s not an absolute directive.
What do you do if your canonical is not getting obeyed?
Most of the time, in my experience, when they don’t obey it, you’re giving them mixed signals. I think when they say they may or may not, it’s sort of a cop out. Where I see them not obeying it is when you’re submitting a sitemap. In this case, if they were submitting an XML sitemap, they clearly listed these Australian URLs as something that Google should index. But when Google comes to visit the pages from that laundry list and finds that it’s being told to use another one, it is smart enough to say, “Hey, you do not know what you’re doing on one of these two. We’ll take the chance that the fact that you submitted a sitemap tells us you do want those.” And they’ll ignore that canonical. The only other time I’ve really seen it in an enterprise is when so many signals are pointing to a different page. You have a UK site, but even in the UK, everybody is pointing to the US. The only way we’ve been able to break that, and I’ve only really seen it with big companies, is using hreflang, which is a way to tell the search engines that you have alternative language versions of different pages. In the UK, they speak a different English than in the US or in Canada, so we want to designate when we have that. That’s the other way I’ve seen them sort of override your canonical wishes: when either the signals from external forces are more powerful or you’ve done something that would be devastating to your business.
Regularly, when I’m auditing clients’ websites, I’m finding stuff that just shouldn’t be in the index. It’s canonicalized out. They don’t have the non-canonical version that shouldn’t be there in their XML sitemaps. There’s really no reason for Google to not obey this canonical. I’ll give you an example. I won’t name the client, but they’re using Google Analytics as most of us do, and you have UTM source, UTM medium, etc., these UTM parameters. They even went as far as to specify these in Google Search Console, rather than leaving them on “Let Googlebot decide,” so that they’re being even more explicit in telling Google, “Okay, these are superfluous parameters.” And yet, you do a site colon search on their domain and then inurl: on utm_medium or utm_source or whatever, and there are hundreds of pages showing up. Now, granted, it’s a large website and a few hundred of these isn’t that big of a deal, but there are times where there are thousands of pages of things that shouldn’t be in there. It’s got me scratching my head, like what the heck? Why is Google not following an obvious and well executed canonical? Don’t know.
Yeah, it’s pretty interesting. If I saw something like that, the first thing I’d do is make sure that the actual HTTP header for the page is not forcing a different one. There’s one I just had recently where, again, the page had a canonical. The other thing we see is that you have a page that has uppercase words but in the canonical it’s lowercase. You don’t need to serve the canonical in both places, but they installed it in the header too and didn’t tell anybody, and there it goes back to uppercase. We’ve had cases where they are actually appending to the URL in the HTTP header, which is what’s keeping the UTM and stuff like that, as opposed to trimming it, which is what they do in the body. A lot of times, there’s a secondary source of that signal, but if you’ve got that many of them, it’s pretty interesting that they’re not obeying it. They’re usually pretty good about it as long as, like you said, you do that diagnostic and all the other things are in line. But it is honestly just a suggestion. The more you can make sure that that’s the only suggestion, the better off you’re going to be.
Yeah. The key point here is that we oftentimes, as expert SEOs, assume that folks are doing the right things. XML sitemaps are really only supposed to have canonical URLs in them. No non-canonical URLs, nothing duplicate.
Just plain. That’s a great point that you brought up. When you look at your website, you just go today. No matter where you are in the food chain. Even if you are running your company, just have somebody show this to you. What’s interesting is when you open it and there are 10,000 errors, or the one we see a lot, which is that people will leave the redirects in. They’ll go from HTTP to HTTPS. They’ll do something where there are 10,000 URLs that have to have some sort of change, or if you’re an ecommerce site and you sunset a product and you don’t take it off, the normal view from a lot of IT people is that Google should be able to sort this out. I’m not really a subscriber to the idea that there’s a limit or that there’s a crawl allocation for you, but let’s just imagine there was and Google only gives you 10,000 requests. If, of those 10,000 requests, the first 1,000 have 2 redirects or the redirect goes to a canonical, you’ve just wasted a big chunk of your allocation by wasting their resources. If you put that into human terms, it means you moved from this location and put a sign on the door saying that you moved around the block to a bigger space, so people walk around the block. They go there and you’ve grown yet again. There’s a sign there saying no, we’re over on Elm Street. How many times do you think that person is going to go to each of these addresses before they say, “Look, there’s other places I can go.” I think that’s something a lot of people don’t understand. It’s not really the IT team’s fault because most of the time, to them, a hammer’s a hammer and getting somebody from point A to point B, regardless of how many steps it takes, is getting people from point A to point B. It’s us on the search side who need to educate people that there is a search friendlier way to do things. So even though there are six types of redirects, really only one is recommended for search. These are all things that, when you start adding them all together, are one reason your snippet is not good.
The page may not be indexed. The page may not have the right pieces of relevance on it to be relevant to that topic. Nobody may link to it, so how do you take something that’s perfectly written from an SEO perspective and hope that it’s going to be ranked over other pages that people, through votes, through links, think are more relevant? That’s the irony with a lot of this search stuff: there’s so much of the basics that we could focus on that would improve your business, whereas everybody wants the sexy. They want something that sounds cool. They want something that’s overly technical. They want these things when often, some of the easiest things to focus on are some of the simplest, and they can move the needle almost immediately.
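As a rough illustration of the sitemap hygiene discussed above, the sketch below lists sitemap entries that redirect or that declare a different canonical. It assumes you already have crawl results for each URL; the `CRAWL` dictionary, the sample sitemap, and the `example.com` URLs are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml.strip())
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def sitemap_problems(sitemap_xml, crawl):
    """crawl maps URL -> (status_code, canonical_url) from a crawl you ran.
    Returns (url, reason) for entries that do not belong in the sitemap."""
    problems = []
    for url in sitemap_urls(sitemap_xml):
        status, canonical = crawl.get(url, (None, None))
        if status is None:
            problems.append((url, "not crawled"))
        elif status != 200:
            problems.append((url, f"returns {status}"))
        elif canonical and canonical != url:
            problems.append((url, f"canonical points to {canonical}"))
    return problems


# Hypothetical sitemap and crawl data for illustration only.
SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/widgets</loc></url>
  <url><loc>http://www.example.com/old-widgets</loc></url>
  <url><loc>https://www.example.com/widgets?utm_source=news</loc></url>
</urlset>
"""
CRAWL = {
    "https://www.example.com/widgets": (200, "https://www.example.com/widgets"),
    "http://www.example.com/old-widgets": (301, None),
    "https://www.example.com/widgets?utm_source=news": (200, "https://www.example.com/widgets"),
}

for url, reason in sitemap_problems(SITEMAP, CRAWL):
    print(url, "->", reason)
```

Here the redirecting HTTP URL and the tracking-parameter URL are both reported, while the clean canonical URL passes.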
Let’s take migrating to HTTPS as an example. You mentioned leaving redirects in; in some cases, you don’t want to have a redirect when you’re migrating over. I think you should have the robots.txt on the HTTP version not redirect, so that there is a robots.txt file on the old site that has a sitemap directive to the old sitemap to help discovery by Googlebot of all the redirects, at least for a short period of time until the redirects that point to the HTTPS site are all discovered. These nuances, I think, are oftentimes overlooked. What would some of the gotchas be with regards to HTTPS migration? Because this is going to be a big priority for a lot of folks now that Chrome has the warning message about a security issue or whatever, so people are going to suddenly be highly motivated to migrate over to HTTPS, even if it’s just a content site, if they have any kind of forms where they’re asking for information.
I think historically, most people have done HTTPS migrations when they’ve done some other refresh, either migrating to a new CMS or rebuilding the site. They do it all in one whack. The first thing I see is that when they do it in that manner as part of a site refresh, a lot of the old content never gets redirected or you end up with double redirects: you had the old page, it then goes to a version on HTTP, which then pops over to the HTTPS version. A lot of people keep both sites up. While this is going on, you need two full load balancing. But as you start to see that content being picked up, you can use something like log analysis: did Google fetch this page? Those types of things are the ways that I typically do it. You nailed it with the XML sitemap. I will submit a new XML sitemap for the old site so that Google sees that it’s new and they’ll fetch these pages. That sort of violates what I said about making it clean, but it does prompt them to go get the new pages. At the same time, you submit the new pages via an XML sitemap, and the index checker tool we built, which is part of this site migration suite that we have, basically does that. It goes and looks: have the old pages dropped out? You can start to use that as a clue for whether they are resonating. That’s number one: they don’t migrate correctly. Number two, a lot of this is just a wildcard from HTTP to HTTPS. We’re seeing problems with certificates. A lot of people are using free certificates from who knows where, and those are getting gigged by people. The other is mixing resources. On enterprise sites, most of them have parsed out their assets to an asset server so we don’t have to run images and stuff like that through an HTTPS server just to serve an image. That helps you on the mobile site as well. Then there’s using these sort of protocol-independent URLs with the slashes to say, “Hey, it works fine with HTTP and HTTPS.” That doesn’t work if you throw a canonical in there. You can’t have both.
We see a lot of ecommerce sites trying to do this because they haven’t migrated their images onto the HTTPS side yet, so they try to get away with this but end up getting caught in the middle somehow because, again, they’ve got a canonical saying HTTPS but they’re giving sort of that neutrality by using both HTTP and HTTPS. Those can be big problems with this. I totally agree. If you’ve got any kind of form, you don’t want that. That’s a potential barrier for people who want to come into your site if you haven’t done that migration.
Yeah. Even the mom and pop shops, the authors, speakers, and consultants with a smallish website really need to get their sites moved over to HTTPS. Also, what about chains of redirects? That can be a real problem too, right?
It is a problem. This is low-hanging fruit. Like most SEOs, I have a pretty big toolbox of stuff that I’ve had to build because there’s nothing in the marketplace. This was one of them. We can take your list of redirects and pull it into, again, a not very interestingly named tool, Redirect Checker. It will follow the chain until it resolves to either a final page or a breakage. I totally recommend people do that. I built this originally when I was at IBM because we’d have as many as 10 or 15 chained redirects. The easiest thing I would suggest that you do, again taking a poor man’s approach to this, is use your favorite link analysis tool, whether Ahrefs or Majestic. Most of them will tell you the links that you have to your site, especially to old HTTP URLs. Then what you can do is look at those, simply sort by the most links, and see where the redirect goes. You can load that into Screaming Frog or one of those tools where you can upload a list of URLs to see if they redirect and how many redirects they have. That’s the easiest thing because it creates an interesting table for you that says here’s the old URL and then the five, six, seven, eight, nine hops in the chain. All you really have to do in many cases is delete each of these hops, working from right to left from the URL that loads correctly, so that you collapse all these different redirects, and then reload that. What you end up with is the redirect from six refreshes ago not going through all these hops but going directly to the most current URL. It does reduce the number of redirects that you have exponentially. The other thing is, if there are no backlinks to these pages and there are no page views or any other signal of value, then in theory you can use a 410 to get rid of it, which means permanently gone, or you don’t even need the redirect anymore because nobody has been coming through it.
It is one of these things where there really isn’t a tool that makes this easy to do, which is why most people don’t invest any time doing it. But on reducing the size of that, I find a lot of the time there are a lot of these great links out there that just get lost over generations of migrations, especially at the enterprise level. People have reviewed an HP printer that’s still very much available, or a server or something like that. Those are the ones where you definitely want to make sure you reduce those hops, and just make sure that it’s as correct as it can be when you’re managing it.
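The chain-collapsing step described above can be sketched in a few lines. This is a minimal illustration, not the Redirect Checker tool mentioned in the interview: it assumes the redirects have already been exported as a simple old-URL to new-URL mapping, and the function name `flatten_redirects`, the `max_hops` limit, and the sample paths are all hypothetical.

```python
def flatten_redirects(redirects, max_hops=20):
    """Point every old URL straight at the last URL in its chain.

    `redirects` maps a source URL to its immediate redirect target.
    Returns {source: (final_url, hops)}; final_url is None when the
    chain loops back on itself or exceeds max_hops.
    """
    flattened = {}
    for start in redirects:
        seen = {start}
        current = start
        hops = 0
        # Walk the chain until we reach a URL that no longer redirects.
        while current in redirects:
            current = redirects[current]
            hops += 1
            if current in seen or hops > max_hops:
                current = None  # redirect loop or runaway chain
                break
            seen.add(current)
        flattened[start] = (current, hops)
    return flattened


# Hypothetical redirect rules left over from several site refreshes.
example = {
    "/printers/old-model": "/products/printers/old-model",
    "/products/printers/old-model": "/shop/printers/model",
    "/shop/printers/model": "/printers/model",
    "/a": "/b",
    "/b": "/a",
}

result = flatten_redirects(example)
for source, (final_url, hops) in result.items():
    print(source, "->", final_url, "(hops: %d)" % hops)
```

Each surviving rule can then be rewritten as a single hop from the old URL to `final_url`, and any entry that comes back as `None` is a loop that needs manual attention.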
You mentioned a 410 rather than a 404. What would be the benefits of using a 410 permanently gone versus a 404 file not found?
I think it’s in the name. A 404 usually means it’s broken, not found. The purpose of that was to say, if you look at this report, these are pages that are broken either by accident or because someone copied the URL wrong, whereas a 410 means that you’ve deliberately said this page is gone and never coming back. What you reduce is the search engine coming back to fetch that. To the user, they see the same thing. They see that this page is gone, but the question is where did they get to it from? They will never get to it again from the search engine because it’s gone. I think some people can argue, “Well, we want them to come through a 404 page because we have a search box or we have this.” Two things with that. Most people don’t. And having them go to a page that tells the engine the page is gone means the engine will drop the page, again reducing the amount of resources that it takes to crawl. You’re never going to do this to a page that’s got links, but to a page that is literally dead, a page that doesn’t have any value of any sort. Why not get rid of those completely instead of just continuously serving that 404 as they come back to refresh this page over time?
Right. Super important not to 410 a page that has links pointing to it because those links are like votes that are being thrown out, those no longer count if it’s hit a 410 or a 404 for that matter.
Right. Where do you even send them to? I encounter this a lot. The one I get the most is seasonality. It’s fascinating how many companies do this. Let’s say you’re doing something for Cinco de Mayo. Cinco de Mayo is done, the campaign’s over. We shut all that down and nothing happens to it until next year. And then everything starts from zero because now it’s Cinco de Mayo 2018, whereas you could either keep a generic Cinco de Mayo page and maybe put a countdown timer on it, so that the past 10 years of links to your Cinco de Mayo campaign pages keep that link equity, or you could roll it up to a broader campaign page. But do something with this. I think that’s one of the challenges we have in this industry. Everything seems to be all or nothing. We’re going to get rid of a set of products so A, we’re going to redirect it to a higher level, B, we’re going to keep it there and put up a searchable 404 page, or C, we’re going to redirect everything back to the home page, whereas we should step back and say, “Where is a logical home?” That’s the first thing the engines will tell you. Just don’t wholesale send all this stuff up to the next category or back to the homepage. If it’s legitimately gone and there’s no tradable value to it, no backlinks, no traffic, no nothing, then get rid of it. Just get rid of it. As for the chance that you might help the one person who misspelled something and ended up on the 404 page, you can look at the data, and that goes back to how we started with data mining. We’ve mined a lot of this for people to make the business case for why they should do this or that. We find that people just don’t have that behaviour. If they find a 404 page, sometimes they might search, but most of the time it’s easier to hit the back button and look for something else.
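The rule of thumb in this exchange (no backlinks and no traffic means 410; otherwise find a logical home rather than redirecting wholesale) can be written down as a small decision helper. This is only a sketch under stated assumptions: the function name `retire_action`, its parameters, and the "zero means no value" threshold are illustrative choices, not something specified in the interview.

```python
def retire_action(backlinks, visits, logical_home=None):
    """Suggest what to do with a URL whose content is being retired.

    backlinks    -- number of external links pointing at the URL
    visits       -- recent page views (any period you choose)
    logical_home -- a closely related live URL, or None if there is none

    Returns a (status, target) tuple.
    """
    # No links, no traffic: the page has no value left, so say it is gone.
    if backlinks == 0 and visits == 0:
        return ("410", None)
    # The page still carries value and has a sensible successor.
    if logical_home:
        return ("301", logical_home)
    # Value but no obvious home: flag it for a person to decide instead of
    # sending everything to the category page or the homepage.
    return ("review", None)


print(retire_action(0, 0))
print(retire_action(12, 340, "/campaigns/cinco-de-mayo/"))
print(retire_action(3, 0))
```

The "review" outcome is deliberate: it keeps the script from making the all-or-nothing choice the discussion warns against.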
Sure. Makes sense. Let’s say that you are wanting to reduce the amount of wasted crawl equity or crawl budget, because we’re talking about this hypothetically. Would you recommend disallows, noindexes, or 410s to carve out thin pages, low value pages, or junk out of the search results? Let’s say it’s tag pages, and a lot of those tag pages have only zero or one products or blog posts in them, so it’s really thin content of low value. What’s the preferred approach? I know I’ve got my recommendation but let’s hear yours first.
There are a couple of scenarios in there. Let’s say tag pages. Tag pages, you can canonical them back to the core category page, and you can set that up fairly standard. If you’re a blogger or a consultant or a speaker and you’ve got 300 or 400 pages, then by default it’s creating one of those. That’s one way I would do it. Definitely, you can block those because they don’t add any value. Let’s take the one that I see the most, where I’m in ecommerce or I sell something that has multiple variables. What people have done is, let’s say it’s an apartment. There might be 18 to 20 desired attributes of an apartment, so they automatically create a URL for all 18 of those variations. You nailed it when you said maybe there’s only one apartment or one product that meets that criteria. When there’s very little content on a page like that, meaning there’s only one apartment or one blue widget in size 14, a lot of times the engine will mark that as a soft 404 because it’s there. The worst one, and I truly believe search engines can read this, is when you say “we’re out of stock” or “we did not find anything that met this criteria” and it’s there 10,000 times, because why would they show somebody a product page that’s out of stock or has been eliminated, or where there’s no choice on that page, especially when I use the plural form, blue widgets? I’m expecting to see more than one, not a page with this one listing. All of those, I like to clean out. There are two schools of thought. There are some that say noindex but put follow, because we want the engine to carry through. The problem with that follow is they’re probably getting into other pages like that, and you sort of continue to send them down these rat holes. I like to just do a plain noindex, nofollow and/or canonical it back up. There are a lot of sites that have put a lot of logic into this to say if we have less than three, then we canonical it up one level. Any one of these can work.
They get more sophisticated the more you have the problem. Again, the easy way is a noindex, nofollow so the engines don’t even get down into that. There are even greater chances of having that. That’s the way I would typically deal with it, and then work my way back to a more fuzzy-logic type of thing that actually writes it in on load: if we’re less than three, then we send it back up one level. That’s a nice thing about using schema with breadcrumbs, because now you’ve got your hierarchical path. That makes it pretty easy to identify that and write that canonical.
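The "fewer than three, canonical it up one level" logic could look something like the sketch below. It assumes the page template knows how many items matched and what the parent (breadcrumb) URL is. The function name `thin_page_head_tags`, the `min_items` default, and the choice to fall back to noindex, nofollow only when no parent URL is known are illustrative assumptions; the interview describes both options without prescribing how to combine them.

```python
from html import escape


def thin_page_head_tags(item_count, parent_url, min_items=3):
    """Return the <head> tags to add for a listing page.

    item_count -- number of products/posts/apartments on the page
    parent_url -- URL one level up in the breadcrumb path, or None
    min_items  -- pages with fewer items than this are treated as thin
    """
    if item_count >= min_items:
        return []  # enough content: let the page stand on its own
    if parent_url:
        # Thin page with a known parent: canonical it up one level.
        href = escape(parent_url, quote=True)
        return ['<link rel="canonical" href="%s">' % href]
    # Thin page with nowhere to roll up to: keep it out of the index.
    return ['<meta name="robots" content="noindex, nofollow">']


print(thin_page_head_tags(5, "https://example.com/widgets/"))
print(thin_page_head_tags(1, "https://example.com/widgets/"))
print(thin_page_head_tags(0, None))
```

Either tag only works if the crawler is allowed to fetch the page, so these URLs should not also be disallowed in robots.txt.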
Yup. One approach you didn’t suggest, and this is the one that’s used pretty much across the board and I think it’s a huge mistake a lot of website owners make, is to disallow these pages. Instead of noindexing, they’re disallowing them in the robots.txt file. After all, that’s so easy to do. They’re disallowing wp-admin, disallowing these discontinued product pages, disallowing tag URLs and all that sort of stuff. It’s like, “That doesn’t work. These pages are still showing up in the search results. Here, I’ll show you.” And then I’ll do a site: search with an inurl: and show them how there is this not so user-friendly message about the robots.txt in the snippet of the search listing. That’s not what you intended, is it? Any pointers around that sort of situation, where somebody’s got disallows in place and they really shouldn’t be doing disallows?
If you have a disallow in place and you’re now going to use the meta robots noindex, that has to be visible to Googlebot. The only way it’s going to become visible is if you remove the disallow.
Same thing with canonicals. Nothing on a page is visible, technically, unless you let them fetch the page. That goes back to your migration question. A lot of people, when they change their domain or they split the company or they sell the company and it gets sucked up into a bigger one, the first thing they want to do is get rid of that server, so they throw a robots block on it. Then all of those redirects and all of those pages the engines are trying to fetch to find where things went, they can’t, because you’ve blocked it. Most people don’t think about that. The same is true when you do the migration, because maybe you did some debugging on HTTPS for your launch. Remove the robots block when you do the launch. I’d say 99.8% of the time, a site gets launched and you go in and check and nobody’s removed the block on the new version or on the HTTPS version. It’s very interesting. Like you’re saying, with this, they try to be overly complex as well, doing things with scripting and link sculpting and trying to create these narrow channels for them to come through, and they end up screwing up more things than they fix.
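A quick way to catch the conflict described here is to test the URLs you are noindexing, canonicalizing, or redirecting against the robots.txt rules. The sketch below uses Python's standard `urllib.robotparser`; the helper name `blocked_urls`, the sample rules, and the example.com URLs are hypothetical. One caveat: the standard-library parser does simple prefix matching and does not implement Google's wildcard extensions, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs the given crawler is not allowed to fetch.

    Any noindex tag, canonical tag, or redirect on these URLs is
    invisible to the crawler until the disallow is removed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt contents and URLs to check.
robots_txt = """\
User-agent: *
Disallow: /tag/
"""

urls = [
    "https://example.com/tag/blue-widgets/",
    "https://example.com/blog/post/",
]

hidden = blocked_urls(robots_txt, urls)
for url in hidden:
    print("Blocked, so on-page directives will not be seen:", url)
```

Running the same check against the new HTTPS host right after launch would also surface a leftover staging block.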
I think it’s harder. Probably the easiest way is to throw it in some sort of AJAX, Angular, Ember type of thing, which is killing a lot of people. I think somebody just posted today, I saw the headline, that Google confirmed that anything after a hash key or a pound sign, they’re not reading, so if you wanted to block something, use any one of those ways. It really minimizes what they can crawl. But that’s the other thing, if the URL is on the page. I just cleaned up one for a company. They were getting like a gazillion 404s. It was the same pages time and time again. What happened is the developer turned off the script in the page for the pagination but left the paginated URL in the page. Here, you’ve got a URL, because they’re just scraping a URL syntax and adding it to the database. Simply turning off the AJAX load of the pagination in the display didn’t take away the URLs. It’s a lot of things like that, I think, that make it harder because they can’t ferret it out more, which is what we’ve always wanted. Some of these crafty ways of channelling element, trying to get things. The opposite is even more of an impact now with hamburger menus on mobile-friendly sites, specifically mobile or responsive. I’ve seen so much prime real estate being removed. If all the links are coming to your home page and you’ve only got two or three links out because that’s all that will fit on a hamburger menu, you’re losing a lot of the homepage equity that historically would cascade down into these internal pages. That’s another thing we see. Somebody might be ranking for a particular service, redesign the site, take that service and just make it “services” now as opposed to the service that they had, and you’ll see them tank because they’re not able to share in that homepage link equity or that category page equity because they’ve removed that connection.
Yeah, that’s such a critical point. The way that you spend that hard earned link equity, most of it is flowing into your homepage, where usually you get the most links, is with really well thought out internal links to feature products to the key landing pages and subcategories that are really important to your business. If you stop exposing that from the homepage level, through the hamburger menus and through cleaning things up from a screen real estate point of view, you’ve just tanked your strategic flow of link equity down into your site tree.
Before they do it.
Before they do it. And many times, it’s not that that team wants it. It’s just the search team saying, “Hey, I think this is going to kill us. How do we quantify this?” Because that’s an interesting thing. My wife and I were just having this discussion. Why do the branding agencies have such power over these decisions when you show facts and logic and data on what the negative impact can be, yet people don’t care until that traffic goes away? That’s when search comes back on the senior team’s radar. I wrote about this. It’s like, “Does your CFO understand the cost of not ranking?” We’re starting to see in financial reports now that a lot of companies, especially ecommerce companies, actually post in their quarterly financials a little statement of risk that, hey, if Google changes their algorithm, we could A, lose traffic, and B, that may cause an increase in marketing cost because we have to pay for traffic now that we were historically getting for free. I think that’s an interesting thing now. I think a lot of people have a problem with being able to communicate the negative impact of something. We often say, “Hey, it’s going to kill us.” But we don’t quantify it. If you can quantify it and you can find somebody, and I’m finding the CFO is a pretty interesting person to be able to present that kind of data to: what is the impact and how do we mitigate it? Little by little, people are saying, “Okay, I don’t think we can take that risk.” And they’re willing to think about it. And as a good SEO, it’s your job to figure out how do we play nice with that, how do we make this new design? How do we make this migration? How do we coexist in this new content-grazing, content-marketing-is-the-leader environment, where many of the formats that these guys use aren’t really conducive to the traditional ways that people approach SEO?
Let’s go into a lightning round here for a few minutes because we’re getting close to the end of the interview and I wanted to cover a few more topics.
Iframes. What’s your take on iframes from an SEO standpoint?
I don’t think anybody should be using them; there’s no business need to use them. I think Google can get into them somewhat, but in tests that I’ve done between iframes and modals and other things like that, they sometimes will score it but still give credit back to the original page. We have a case I’m working on with a client. They put all their PDFs in them. Google totally ignored the page that references the iframes and went straight in and got the PDFs. They can get into them, because we saw in this case that about 100,000 PDFs were indexed as opposed to the page that they built to track and house them. I would avoid them. I think there are other ways you can design content nowadays than using iframes.
I’m not in the all or nothing bucket, so I still have hundreds of examples of sites that have done a 302, and where the negative impact came in is the display URL. The URL that was being shown to people in the search results was the original URL and not the destination. When we’re doing an HTTPS, we’re doing things related to the homepage or category pages or we’re doing, the big one we see it now in load IPD [00:55:19] for global sites, 302 is brilliant because it basically takes whatever Google found before they went through the IP redirector. The wholesale, I do agree with him that I do think it passes the value, but I still quite strongly recommend 301s versus 302s except when we think the IT team is going to screw up the migration. Then we do a 302 and then flip that over to a 301 once everything’s in place.
What if you wanted to track how many clicks away your various pages of your website are from an external link source. To see if pages are just not very interesting to the outside world if you’re let’s say three clicks away from an external link source in a particular page, that page doesn’t look very important. Is there any approach or tool that utilizes that thought process?
I’ve not seen a tool that utilizes that but I think it’s actually a pretty interesting thing. It would be interesting to look at that data. I think the only thing to me is you’re going to have to boost that somehow. Try to elevate it either through the way you’re linking to it, especially on page, that’s where we see this a lot, where people put stuff too deep from an external. I don’t know. That’s an interesting question. I don’t do that kind of analysis. I don’t have a really cool answer for you.
Okay, no problem. Any kind of URL best practices that you think are no longer relevant or no longer true? Like is the hyphen versus underscore thing still valid or use a short URL instead of a long URL still valid and that sort of thing?
My number one pet peeve with URLs is from a global perspective, trying to build hreflang. There’s no need to localize them. It’s been shown time and time again that keywords in the URL aren’t that beneficial, so there’s no reason to localize those. That’s the big one. Shorter obviously is better. Again, for every example of shorter is better, there’s moderate, so a lot of companies have country, language, business unit, sub business unit, then page, so four or five away. I think something that’s logically structured is easy for the engines. That folds into now using things like the breadcrumb: the more logically we can name things, and using breadcrumb schema, the more likely Google is going to break it up as alternative gateways into the site. That’s the only reason I’m still friendly toward named folders. Look at Amazon, Best Buy, and a lot of these guys; they’re using alphanumeric URL structures. This makes it easier at a database level. Yeah, those are my big ones. Anything gimmicky, stay away from it, and just manage a site that’s going to be robust and allow people to understand the basic hierarchy.
Awesome. When you said there’s no need to localize these URLs, could you get a little more specific about what you mean about that? For somebody who’s not into international SEO who might be listening.
Let’s say you’re Nike and you sell sneakers. They might be called sneakers here, or running shoes in the US, but in Germany, they’re something else, so that folder for running shoes or men’s, we see this a lot, men’s running shoes. [00:59:02] for men. That’s exactly what it is. Either you use that, or in Japanese or Chinese, you’re using those local characters. There’s a lot of people that have said, “Hey, if you have a directory or you have the page name or all this stuff in words, then if you localize them for people in Germany, you can rank better.” We’ve definitely seen that that’s not as true a statement as it used to be. But that’s literally what it is. Men’s running shoes in 21 different ways to say men’s running shoes is really not necessary, and it just makes the system management much harder. Keep it men’s running shoes and translate everything else on the page, but the URL doesn’t need to be translated.
Got it. And that doesn’t mean, though, that you’re trying to have one URL for every country version; you’re actually still going to have, like, a UK directory or a UK subdomain or something like that.
Yeah, exactly.
Okay, cool. Last question, super quick one. Log analysis, looking at what the spiders are doing, Googlebot, etc. Any favorite tools for that?
Nothing really favorite. We created our own that we use. I think Sawmill, because I’m on a Mac and Sawmill works on a Mac. That’s the only one, if I had to give one. I think Botify is doing some of this, and then there’s another tool I read about recently, but we just sort of pull it in and parse it, because the number one thing we’re looking for in log analysis is frequency of visit to a page. Frequency of visit, that’s the other one, and then crawl patterns are the things we’re looking at, mainly from a diagnostic standpoint. But just frequency of visit to a page is a pretty powerful piece of data, so any tool that can do that would be a good benefit to you.
Frequency of visit by Googlebot to a page versus users, which is what we’re able to get from Google Analytics, but we’re not able to see what Googlebot’s up to using Google Analytics.
Right. It gives you an idea of the freshness if you change something, because the number one question we get asked is, how long is it going to take for this to improve? If we know the tempo at which the spider is coming to those pages, we can pretty much tell you how long we’ll be waiting. Obviously, with Fetch as Google, we can do it, but this gives us an idea of how frequently the bulk or specific pages are being visited.
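For anyone without a log analysis tool, the "frequency of visit" number can be pulled out of raw access logs with a short script. This is a minimal sketch, not the in-house tool mentioned above. It assumes logs in the common Apache/Nginx "combined" format, and the regular expression, the `googlebot_hits` name, and the sample lines are illustrative. It also trusts the user-agent string, which can be spoofed; confirming real Googlebot traffic would need a reverse DNS check that is not shown here.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user agent fields of a
# combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count how often a Googlebot user agent requested each path."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


# Made-up sample log lines for illustration.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    '66.249.66.1 - - [10/Oct/2017:13:55:36 +0000] "GET /products/widgets HTTP/1.1" 200 2326 "-" "%s"' % GOOGLEBOT,
    '66.249.66.1 - - [11/Oct/2017:09:12:01 +0000] "GET /products/widgets HTTP/1.1" 200 2326 "-" "%s"' % GOOGLEBOT,
    '203.0.113.9 - - [11/Oct/2017:09:15:44 +0000] "GET /products/widgets HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Macintosh) Chrome/61.0"',
    '66.249.66.1 - - [12/Oct/2017:02:30:10 +0000] "GET /about HTTP/1.1" 200 1100 "-" "%s"' % GOOGLEBOT,
]

hits = googlebot_hits(sample)
for path, count in hits.most_common():
    print(path, count)
```

Dividing each count by the number of days the log covers gives a rough crawl tempo per page.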
Yeah. This is great stuff so thank you so much, Bill. I enjoyed geeking out as you can tell. I loved it and I’m sure our listeners got a ton of value out of it too. If folks wanted to work with you, hire your agency to do a migration or SEO project, how will they find you?
Our site is back-azimuth.com. Back Azimuth. They can definitely reach us there. If you’re interested in hreflang, then you can go to hrefbuilder.com. That’s for the international stuff. If they want to hear more of my rants and raves about things, they can go to my blog at whunt.com.
Awesome. Thanks, Bill. It was a pleasure and thank you listeners. We’ll catch you on the next episode of Marketing Speak. This is your host, Stephan Spencer, signing off.
- @billhunt on Twitter
- Bill Hunt on LinkedIn
- Search Engine Marketing, Inc
- Data mining
- Marcus Tober
- “(not provided)” on Google Analytics
- Google Search Console
- UTM codes
- XML sitemaps
- Redirect chains
- Tag pages
- Link sculpting
- Hamburger menus
- <iframe> HTML element
- Christoph Cemper on Marketing Speak
- 302s versus 301s
Your Checklist of Actions to Take
☑ Create subfolders in my Google Search Console to get more “not provided” (by analytics) keywords.
☑ Check my search yield by going into Google Search Console and sorting by impressions. Then look at the list for keywords where I rank in the top 3-5 organic positions but am getting a click rate of 5% or less. These are my low yield keywords.
☑ Rewrite my descriptions for the newly discovered low yield keywords. Answer the question “why us” to get my descriptions to resonate with users.
☑ Create a search query with the keyword phrase and site:MyDomain. This will show me how Google is creating my descriptions and snippets.
☑ Check the number of pages I have indexed using site:MyDomain without the keyword phrase. Having too many or too few indexed pages can be problematic.
☑ To query HTTPS-only pages, use the site: and inurl: operators together. It should look like this: site:MyDomain inurl:https.
☑ Only have canonical URLs in my XML sitemap. Having both canonical and non-canonical URLs will create duplicates.
☑ I want to make sure my content is migrated effectively, my redirects are recognized, and my new pages are indexed by Google when switching to HTTPS or creating a site refresh. Use Back Azimuth’s Site Migration Tool Set to do this correctly.
☑ I can find my redirect chains by using a link tool like Majestic or Ahrefs and sorting through the links to see where the redirects go. I can delete the intermediate hops in a chain of multiple redirects so that every old URL redirects straight to the final, current page. If a redirect has no links or search value, I can delete it altogether.
☑ If a page is permanently gone, I can return a 410 status code to tell the search engine that the page is gone. Do not 410 pages with links going to them or I lose the link value.
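The search yield check in the list above can be run over an export of query data. This is a rough sketch that assumes each row carries a query, clicks, impressions, and average position; the field names, the `low_yield_keywords` function, and the sample numbers are hypothetical, and the thresholds simply mirror the "top 3-5 positions, 5% or less" rule of thumb.

```python
def low_yield_keywords(rows, max_position=5.0, max_ctr=0.05):
    """Return well-ranked queries with a low click rate.

    The result is sorted by impressions, largest first, so the biggest
    missed opportunities come first.
    """
    low_yield = []
    for row in rows:
        if row["impressions"] == 0:
            continue  # no impressions, so no click rate to judge
        ctr = row["clicks"] / row["impressions"]
        if row["position"] <= max_position and ctr <= max_ctr:
            low_yield.append({**row, "ctr": ctr})
    return sorted(low_yield, key=lambda r: r["impressions"], reverse=True)


# Made-up rows for illustration.
rows = [
    {"query": "blue widgets", "clicks": 30, "impressions": 1000, "position": 2.4},
    {"query": "widget repair", "clicks": 200, "impressions": 1500, "position": 1.8},
    {"query": "cheap widgets", "clicks": 5, "impressions": 4000, "position": 4.1},
    {"query": "widget history", "clicks": 1, "impressions": 900, "position": 12.0},
]

report = low_yield_keywords(rows)
for row in report:
    print(row["query"], row["impressions"], round(row["ctr"], 4))
```

The queries it returns are the ones whose descriptions are worth rewriting first.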
About Bill Hunt
Bill is the President of Back Azimuth Consulting, specializing in helping companies implement Global Digital Marketing Strategies and “Customer Journey Models” to identify missed opportunities with content on a global scale. Bill also works with global companies to develop their global search programs to enable them to scale.