Maintaining a top spot in search results is like hitting a moving target, given the swift evolution of Google’s algorithms.
In this episode, we dive deep into the realm of diagnosing and bouncing back from penalties. With Kaspar’s unique vantage point as a former key figure in Google’s webspam team, he sheds light on the intricacies of algorithm shifts and the triggers behind website flags. He shares effective strategies, utilizing data from server logs, Bing analytics, and tools like Screaming Frog to accurately identify SEO issues. Together, we explore the multi-faceted nature of penalties—ranging from page-specific issues to site-wide dilemmas and even keyword-targeted glitches. A discussion on the potential risks of disavow files is particularly enlightening. Kaspar advises not getting distracted by the temptation of vanity link building projects. Instead, site owners should focus on optimizing technical elements first in order to achieve lasting improvements in organic search visibility. For those steering the unpredictable waters of SEO, this episode is your compass to stay ahead and remain prominent in the digital realm.
So, without any further ado, on with the show!
In This Episode
- [02:37] – Kaspar Szymanski highlights prioritizing SEO efforts, as finite resources require strategic decision-making.
- [07:52] – Kaspar discusses the significance of the crawl budget for large websites, emphasizing its criticality in improving site performance.
- [11:37] – Kaspar stresses the advantage of having a unique selling proposition for business and SEO success.
- [19:21] – Kaspar mentions using internal tools to analyze backlinks, including crawlers, commercial tools, and parsing server logs.
- [22:10] – Kaspar and Stephan indicate the value of analyzing data from multiple sources, including Google Search Console and Bing Webmaster Tools, to gain a comprehensive understanding of a website’s SEO performance.
- [27:31] – Kaspar provides an example of a client struggling to rank for a commercial query despite having a relevant website, pointing out the relevance of analyzing anchor text distribution in SEO.
- [29:59] – Kaspar explains that a sudden drop in website traffic can be caused by either a manual penalty or an algorithmic issue and provides examples of both.
- [35:00] – Kaspar elaborates on saving and preserving web server logs for actionable insights, citing the ability to identify which parts of a website are prioritized by Google and understand how landing pages are crawled and indexed over time.
- [42:18] – Stephan and Kaspar talk about using multiple tools for server log analysis to double-check findings and identify outliers.
- [49:55] – Kaspar advises against submitting a disavow file without a penalty in place.
Kaspar, it’s great to have you back.
It’s a pleasure to be on the show again.
Let’s talk about what’s keeping you up at night, or something that has changed in the SEO world and the discipline of SEO. It could be something that you’re working on and concerned about, or that your clients are concerned about: something that may involve AI, changes in Google policies, or the technology that Google is applying to eliminate spam. What’s concerning you these days?
I want to get across a couple of things that are very much on my mind. Stephan, you did mention AI. AI is the new trend, the new favorite topic of many colleagues. I’m fond of AI.
Personally, I consider current developments to be more language models than anything else. They’re probably more applicable in some languages than others. It’s a fantastic tool to automate where automation is warranted.
At times, I’m surprised how much enthusiasm is being poured into a task that takes a few minutes. The AI solution is produced over several weeks, and so, at times, it’s overkill.
It is true and accurate that AI can be a phenomenal solution. And it has been. Automated solutions have been around for a long time, especially in content. Content can be written, and words can be editorial. Data can be content, too.
It wasn’t called AI. There was really no AI behind it, but automated solutions were in place. They were usable, applicable, and useful for users in the context of the user journey in their respective verticals on the respective websites. But that’s not so much what’s keeping me up at night, if we’re talking in those terms.
I have a couple of favorite topics. I would love to talk about server logs. This is where a lot of potential lies for every single large website out there. We can park that for a moment. I recently dealt with questions about the pros and cons of website consolidation.
Let’s say you have one, two, three, or more websites. How do you go about it? Do you keep them? Do you run them separately? I’d love to talk about that when the time allows.
The one thing that recently has been very much on my mind is SEO priorities. That may not be a popular topic. This is not something that is on everybody’s mind these days. But I realized something that is rather concerning because I don’t think that, in an awful lot of cases, resources are applied in the most impactful way.
We all have finite resources. Every SEO team, every developer team, and every organization has only finite resources. We have to pick and choose which area we tackle. In most cases, our website offers a lot of avenues for action. There can be technical solutions that can be improved. Our technical setup—the infrastructure and architecture, the content on a page, and the backlink profile—can be reviewed and risks mitigated in many avenues. But it does make a difference in what we prioritize.
We were recently working with a client who was very well-positioned to do well. However, they were very much concerned about their URL structure. This is a very common question and very representative of the topic.
They were keen on having the right terms, the keywords, in the URL, which is fine to have. However, if you don’t have them, making that change and changing your URL structure is a huge undertaking, where a lot can go south, and fluctuations are to be anticipated.
It makes zero difference for SEO because whether you have that particular keyword you care for in the URL structure is of no consequence for the ranking. Google doesn’t care one way or the other. Whether we have “private health insurance” in the URL or “12345” in the URL, it doesn’t make any difference. This is just one example.
In another instance, the question was, “Can we have a hard rule for how many outgoing links and how many internal links we should have per page?” We can have that hard rule, but unless it’s completely out of proportion, unless it says we need to have a couple of hundred links, it’s really of little consequence. The more important thing is that we don’t want to have orphan pages. This is rather basic SEO if you think about it. But what the discussion is really about is priorities.
In comparison to, for instance, managing the crawl budget or having the canonicals set up correctly, these action items pale, even though they can be legitimate and can be done. If you’re asking, “What keeps you up at night?” it is providing the right answer to complex questions and making it absolutely clear that we have to focus on where the impact is largest.
An awful lot of times, this takes a lot of convincing because, of course, there are already to-do lists and plans within larger organizations, and they don’t necessarily overlap with the more current findings. Larger organizations tend to take a little bit of time before they introduce the changes they planned a while ago.
How impactful is content pruning and managing that crawl budget if you had to prioritize that versus other typical SEO issues, such as identifying better keywords to target, link building, or fixing other technical issues? Where would content pruning go?
I love that question because it allows me to talk about the topic I’m so fond of. In a nutshell, the honest answer is you can’t tell hypothetically because every website is different. If you have a small website that is a couple hundred pages large, the crawl budget isn’t really an issue.
Let’s assume we’re talking about a substantially large website, say 10-15 million landing pages. For a website that large, crawl budget is critical. I’m saying this because it doesn’t matter much what we change on the website, whether it’s the content or the technical solutions, whatever it is, even if it’s pure optimum. Even if the content is phenomenal, it won’t make a difference if the crawl budget isn’t there, because Google will take too long, potentially months or even longer, to realize that some changes have been introduced.
Crawl budget, in the case of a large website, can be critical. But again, this is not a silver bullet. This is not a solution that applies to every single situation.
For instance, for another website, large or small, if the canonicals are an issue and are unintentionally being paired with noindex, this is, of course, a huge problem. Another website in a very competitive environment may happen to have a legacy backlink profile. I’m calling it a legacy backlink profile, but we know what we’re talking about: there has been some link building that Google may frown upon. It has been done over time. These tend to mushroom as time passes.
Out of the blue, we have a hundred million backlinks that are not exactly squeaky clean as far as Google would be concerned. This can be a huge liability because overnight, that can trigger a penalty. That penalty, if left to linger, is of course going to drag us down. But it will also prevent us from improving the site because any release, better structure, technical setup, or new content will be impaired in its ranking if that penalty isn’t fixed.
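To make the canonical and noindex pairing mentioned above concrete, here is a minimal Python sketch of how such a conflict could be flagged on a single page. It uses only the standard library. The `HeadSignals` class and the `conflicting_signals` function are hypothetical names introduced for illustration, not part of any tool discussed in this episode, and a real audit would also need to consider HTTP headers and rendered HTML.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the canonical URL and robots meta directives from one HTML page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = set()

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonical = attrs.get("href")
        elif tag == "meta" and attrs.get("name", "").lower() == "robots":
            directives = attrs.get("content", "").split(",")
            self.robots.update(d.strip().lower() for d in directives)


def conflicting_signals(html):
    """Return True when a page is noindexed while also declaring a canonical URL."""
    parser = HeadSignals()
    parser.feed(html)
    return "noindex" in parser.robots and parser.canonical is not None
```

Run over a crawl export, a check like this would only produce a list of candidate pages. Whether each pairing is intentional still has to be judged per site.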
It is individual. I like to compare the situation we’re talking about here right now to driving a car. Again, I’m from the south. I grew up in Southern Germany. You may expect some examples from the automotive industry. There are Mercedes and Porsche. I’m comparing this to a situation where we have a nice car.
A website generating revenue is comparable to a car because you must maintain it. Before we go on vacation with the family, I always bring the car in for a checkup. I want to make sure that the fluids are up to par, tire pressure, brakes, lights, you name it. These things are important for the road ahead, and the website is comparable. You have to audit it, and you have to check it regularly. Of course, not on a monthly basis, but more or less every year, depending on what the budget allows.
It is important to realize what’s out there. What is Google seeing? In particular, I’m a big fan of Bing. I don’t think Bing should be neglected here because it’s another source of organic traffic. What are the major search engines seeing? What kind of signals are they detecting? How can we improve them based on current technology, not the technology from a year ago or two years ago?
Given things have changed quite a lot in the last several years, with the advent of AI being used for content farms, nefarious link building, and all sorts of crazy things, what’s the risk that a website owner has these days in terms of negative SEO attacks and algorithms misidentifying their site as spam? What’s the risk these days?
Without painting a grim picture, without scaring people, the risk is there. It has to be said. Algorithms are being fine-tuned regularly. But if algorithms, particularly Google algorithms, worked perfectly as intended, there would be no link building. Yet it still exists as a perfectly working industry, and a very large one, as a matter of fact.
Algorithms are large-scale solutions, but they sometimes do not get it right. The risk for legitimate businesses is not particularly large. Yes, it can happen. But most of the time, it’s something that can be avoided. More often, these are situations where signals are unintentionally impaired, be that by content that is stale or maybe entirely outdated.
A very common example I’m particularly fond of because I see that happening many times is when the notion that more landing pages are better is being introduced as a policy. That frequently leads to a situation where filtering is being opened up for indexing.
Say there is a website with products available in different varieties, solutions, and colors, you name it. Whenever we do that, every single variation becomes a separate URL. That can boost the total volume of landing pages, but it also requires more crawl budget.
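As a rough illustration of how quickly indexable filters multiply landing pages, the following Python sketch enumerates one URL per combination of filter values. The `variant_urls` helper, the facet names, and the example domain are all assumptions made up for this example.

```python
from itertools import product


def variant_urls(base, facets):
    """Yield one URL per combination of facet values, as when every filter is indexable."""
    names = sorted(facets)
    for values in product(*(facets[name] for name in names)):
        query = "&".join(f"{name}={value}" for name, value in zip(names, values))
        yield f"{base}?{query}"


# Hypothetical category page with three filters.
facets = {
    "color": ["red", "blue", "green"],
    "size": ["s", "m", "l", "xl"],
    "material": ["cotton", "wool"],
}
urls = list(variant_urls("https://shop.example.com/shirts", facets))
print(len(urls))  # 3 * 4 * 2 = 24 URLs for a single category page
```

The sketch only counts full combinations. Partial filter selections and different parameter orders would add further URLs competing for the same crawl budget.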
We work a lot, for instance, in the retail and travel industries, and both are very competitive. In a particularly competitive environment, you’re not just competing against the other players in the market and your primary competitors; you’re also competing internally. Your URLs and your landing pages are competing against each other, and a lot of cannibalization is going on.
I’m talking about situations where algorithms don’t necessarily mistakenly detect signals and read them poorly. Most of the time, situations are unintentionally created by the website operator or owner, and the consequences are sometimes dire.
One of my favorite articles I ever penned for Search Engine Land was SEO horror stories. These are particularly grim stories. These are the worst-case scenarios that you can think of. These happen, but it’s not the norm. It doesn’t make an awful lot of sense for website operators to be particularly worried about that.
Looking from their vantage point, it’s much more important to focus on the one thing critical to the business’s long-term success and the website: the unique selling proposition. This term is being highlighted, even tossed around a little bit by Google now and then.
John Mueller, a former peer from when I was back at Google, also mentioned that recently. It is important to understand and make sure that’s the one thing the business stands out for, be that a community, the pricing, the best selection, or a product that is completely unbeatable. This is something that needs to be communicated so that the users understand.
That also means simple things, such as the snippet representation. At times, even businesses with a unique selling proposition don’t have the brand recognition that they could be having. This is something that impacts CTR and user signals because if I don’t recognize the brand of the landing page that is ranking for the query I’m currently looking at, I’m less likely to click on it.
This is one more reason why it’s so important. If you have that unique selling proposition that is clearly defined, make sure to tell the user, “This is why you want to come to us, and this is why we are better, different, and more interesting.”
There is another way of ensuring success if you want. Because in some industries, a unique selling proposition is a challenge. That is given, of course. Take, for instance, the affiliate industry, which is quite competitive.
In an area where your website and your offer are more or less similar to your competitors’, one thing that will give you the upper hand in the rankings is performance. If all other signals are roughly the same and the website loads faster, Google loves that website more. The reason is not because Google cares for that website but because users prefer faster-loading websites.
We are a pure SEO consulting business. When I’m saying us, I’m primarily referring to the other co-founder/brother. He’s also somebody that you have the opportunity to talk to, Fili Wiese, the giant Dutchman and the best SEO out there. I’m very confident saying that.
We don’t do actual implementations. We work with clients and research teams. We address questions about how they can do it. I’m working with large organizations, and you might empathize with that. Frequently, the situation is one where we happen to have a lot of stakeholders and a lot of vested interests in larger organizations. It’s a larger group of people where expectations have to be managed, and there are more in-depth and longer discussions.
Our job is frequently to clearly outline, “This is the signal we’re currently seeing. This is where we can be. To get there, this is what needs to be done within your framework, given the acumen of the team that is currently in place. Let’s look at what can be done.” But we don’t implement the changes. The client would, in fact, implement them based on the recommendations, pending their approval, of course.
Do you have somebody that you recommend to implement PageSpeed optimization, or do you just try and coach them through it and hope they implement that well?
Most of the time, the latter, to be honest. We sometimes recommend third parties, but this is on a very case-by-case basis, especially for something that we don’t do. In a recent collaboration, for example, there was an inquiry about paid search, which we don’t do at all.
Even though Fili has some expertise in the field, since he used to work in that area as an engineer, in these cases we would pass on the inquiry to a trusted partner. Sometimes, they are former Googlers as well. There’s a handful of people for whom we have the necessary insight to gauge the quality of the work being delivered and their work ethic. This is also important in our industry.
You may have had an experience where a client says, “I did not get what I was hoping for from an engagement, and I happen to be disappointed.” It’s very important because we’re such a small industry to have high standards in terms of work ethic and living up to expectations.
In some cases, we do recommend third parties, but most of the time, we really stick to what we believe we’re best at, and that is literally technical SEO: crawling a website with a multitude of tools, including our own proprietary tools, reading the signals, comparing those signals against each other, understanding what they mean, really digging into the code, and translating this into actionable advice.
What are some of those proprietary tools? What do they do?
The proprietary ones that we have? These are not public tools. In fact, it is little known that Fili used to be a lead developer at Google Search. Fili had been coding and developing at Google. Those tools we could not possibly take with us as we left the company to start Search Brothers. But, of course, we were at liberty to retain Fili’s brains.
We rebuilt tools that mimic this kind of experience, deployed on a much smaller, much more boutique scale, of course, not on a global level, but really for the benefit of individual clients. Unfortunately, I have to say, none of these are public. We contemplated that idea a while ago. However, we are in the lucky situation that we tend to be in demand. I have to say that as well.
We never got around to doing that. I think it’s going to stay like that for the foreseeable future. Simply put, there is just too much demand for qualified answers to SEO questions. And we are addressing those needs primarily.
When you say that these tools aren’t public, I understand they’re proprietary. But are they also secret? You can’t even tell prospects that you’re talking to that you have a tool that does X, Y, and Z, and it’s just not a feature that’s available in any of the enterprise SEO tools.
No. We can talk about that, most certainly. We have our own crawlers, of course, to begin with. We also crawled backlinks, so not just the websites but also backlinks. I may recommend some third-party tools just to give the listeners or audience an idea of which direction we’re coming from.
We utilize commercial tools as well. There will be Deepcrawl, aka Lumar nowadays, Oncrawl, Screaming Frog, and multitudes of tools that we tap into as data sources. We use these crawls next to our own. We do parse server logs. Server logs are a separate topic. Happy to talk about that. One of my favorites.
When a client is actually in a position to provide server logs covering an extended period of time, that’s a treasure trove of data, and we parse those as well. Of course, there are tools that we utilize in order to filter and sort backlink profiles. Some of the larger clients we work with have their backlink profiles in the billions rather than in the millions.
These are, of course, substantial enough. While investigating these and factoring in the anchor text, the quality of the websites, and so forth, it is important to pre-filter and find common patterns within the data available. That’s something we do as well with the tools that are quite internal, indeed.
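For readers who have not parsed server logs before, a minimal Python sketch of the idea follows: count which sections of a site a given crawler requests most often. It assumes logs in the common Apache/Nginx "combined" format, and the `bot_hits_by_section` function is a hypothetical helper, not one of the proprietary tools mentioned here. It also trusts the user-agent string, which can be spoofed, so real analysis would verify crawler IP addresses separately.

```python
import re
from collections import Counter

# Combined Log Format:
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bot_hits_by_section(lines, bot_token="Googlebot"):
    """Count requests per top-level directory for user agents containing bot_token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or bot_token not in match.group("agent"):
            continue
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        first_segment = path.strip("/").split("/", 1)[0]
        hits["/" + first_segment] += 1
    return hits
```

Called with an open log file, for example `bot_hits_by_section(open("access.log"))` where the file name is a placeholder, the resulting counter gives a first impression of which parts of the site the crawler prioritizes. Comparing it across months requires logs that have been preserved over that period.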
I’m curious why you didn’t mention Botify. I don’t use it personally.
It totally slipped my mind. You are absolutely right. I’m a huge fan of the tool, as well as the Botify team. We have the privilege of working closely together. Botify is indeed one of my favorites. Among the handful I work with, there are others. There’s Majestic, there’s SEMrush, there’s Ahrefs. I’m probably forgetting a handful. These are all great tools.
On top of those, and I really want to mention that as well, is Google Search Console, which provides the insight. The data are, of course, just samples, but these are the insights into how Google reads the SEO signals of a website.
We do not want to forget Bing Webmaster Tools. Yes, the market share tends to be rather small in some areas, such as Germany; some people say around 5%. Yet it is another major search engine and another opportunity to tap into more data, so why not utilize it?
If 95% of the organic traffic is coming from Google, and, for the sake of argument, let’s say it’s 5% coming from Bing, isn’t it like the 80/20 rule? Do you focus on the 20% that is going to deliver 80% of the outcome or the value? If you monkey around with stuff that’s not really a big needle mover, it’s, at best, 5% of the traffic or just the organic traffic. Maybe, what’s the point?
You’re absolutely right with regard to 80/20. You do want to focus on wherever the bigger bang for your buck is. For the sake of the discussion, let’s stick to the 5%. That 5% can make a tremendous difference because 5% of 100 visitors isn’t really much, but 5% of 50 million visits, which is 2.5 million visits, is quite large, especially if they’re converting. It’s not to be rejected easily.
If we think about it for a moment, if Google traffic weren’t forthcoming, and this is something that can happen with or without the website operators’ wrongdoing, we might be very grateful for those remaining 5%, which will constitute whatever traffic is still out there.
To answer your actual question of why Bing matters, or why Bing Webmaster Tools matters: it’s not so much in order to optimize in any different way for Bing.
As a matter of fact, if one were to optimize for Google, this tends to work really well for Bing as well. Rather, it’s about the data source. It’s different types of data. It’s different types of samples. And it’s free of charge. We can tap into it at no charge whatsoever, so why not do it, given that data is really the key in order to optimize in an awful lot of cases? In the case of large websites, it is the key, most definitely.
Can you give an example where you worked with a client and found something in Bing Webmaster Tools that wasn’t visible, or you didn’t get the information from Google Search Console or from third-party tools?
Certainly. Backlinks. That’s a very good example. It’s a very good question, indeed. Because backlinks, no matter where we’re looking, are mere samples. It’s never about an individual backlink. It’s not that one backlink. We’re not going to point fingers right now in this recording talking about which backlinks are particularly prone to triggering undesirable attention on Google’s part.
It’s never about that one backlink. But the larger the sample, the easier it is to understand what was happening in the past. If we’re talking about large, old websites, are these directories that are still lingering around from the mid-2000s? Is this PR? Is this something that was conducted around 2010-2015? Or is it something else entirely?
The idea here is to understand the general trends in order to address the risks. Backlink samples are a good example of where you want to tap into the highest number of sources, because if the backlink profile is large, it takes quite a while to collect a critical mass. Even if the samples are overlapping, you can normalize these. You can have a sample that is as close as possible to what Google might be utilizing as a backlink profile.
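A minimal Python sketch of that normalization step might look like this. It merges backlink URL exports from several sources and keeps one entry per normalized source URL. The `normalize` and `merge_samples` helpers and the normalization rules (lowercase host, no `www.` prefix, no scheme, no trailing slash) are assumptions chosen for illustration; they are not the method used by any tool named in this conversation.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a backlink URL to a comparable key: bare lowercase host plus path."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_samples(*samples):
    """Union several backlink exports, keeping one entry per normalized source URL."""
    merged = {}
    for sample in samples:
        for url in sample:
            # The first spelling seen for a key is kept as its representative.
            merged.setdefault(normalize(url), url)
    return merged
```

With overlapping exports from, say, Google Search Console and Bing Webmaster Tools, the merged dictionary is larger than either sample alone while counting each linking page only once.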
Only then are you in a position to say that those patterns tend to be problematic. This is something where we want to be careful and not necessarily disavow straightaway. At times, yes, we do that as well, but only when you are in a position to gauge the risk levels.
Yup. I presume that you are disavowing not just in the Google Search Console but in Bing Webmaster Tools.
Yes, but I’ll be very honest with you. Most clients are primarily focused on Google Search. I also have to say that. It is rare for a client to say, “I want to focus our efforts on Bing.”
Right. But if you’re putting together a disavow file, it would make sense that you would take a few minutes to upload it to Bing Webmaster Tools.
Across the board, indeed.
What would be an example of something that you found through Bing Webmaster Tools for a client relating to backlinks and the quality of those? Then you took some action and got some results? What would that example be?
This is actually an interesting question. For a moment, hypothetically speaking, we’re looking at a client that has a hard time ranking for a particularly relevant query. Again, I’m from Germany, so let’s say it is private health insurance, one of the most popular queries in Germany, and one of the most competitive if you want. Let’s say for a moment the client has a really hard time ranking, with private health insurance being the desirable anchor.
Anchor is the key here because if that’s the question, if that’s the issue, we would certainly look into the backlink profile on a holistic level. We will look, of course, at the landing pages as well. Are they fast? Do they provide the actual service that is desired? Does it actually say private health insurance?
Another step is certainly looking into what the anchor text distribution is. This is something that is not being discussed enough out there. If the anchor text distribution is overwhelmingly commercial, say the top 10 anchors that are identified are all commercial, that is really close to a smoking gun, as far as Google is concerned.
Let’s say for a moment that private health insurance is among those top 10. The next step is, of course, to look into where these originate from. Do they all look identical? Does a large chunk of those look like press releases that mushroomed? First, they were, of course, released over the press release wires, but then they mushroomed.
The same type of article, which tends to be rather short, with the exact match anchor text that says private health insurance, is all over the place. That is the exact reason why. Of course, Google detects those things. Those algorithms we discussed prior may not be perfect, but they are quite capable of detecting what is a commercial anchor text and a commercial query and matching those two together.
This would be a very good example where more samples are very beneficial in order to understand why a website that arguably can rank for a commercial query doesn’t rank for that query in particular.
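The anchor text check described in this answer can be sketched in a few lines of Python. The `top_anchor_share` function and the list of commercial terms are hypothetical, and no particular share is known to trigger anything on Google’s side; the sketch only shows how the top anchors and their commercial share could be computed from a backlink sample.

```python
from collections import Counter


def top_anchor_share(anchors, commercial_terms, top_n=10):
    """Return the top_n anchors and the fraction of them containing a commercial term."""
    counts = Counter(a.strip().lower() for a in anchors if a.strip())
    top = counts.most_common(top_n)
    if not top:
        return [], 0.0
    commercial = sum(
        1 for text, _ in top if any(term in text for term in commercial_terms)
    )
    return top, commercial / len(top)


# Made-up sample of anchor texts from a merged backlink export.
anchors = (
    ["Private Health Insurance"] * 5
    + ["example.com"] * 3
    + ["cheap insurance quotes"] * 2
    + ["click here"]
)
top, share = top_anchor_share(anchors, {"insurance"}, top_n=4)
print(top[0], share)
```

A high share among the top anchors is only a prompt to look at where those links originate, as discussed above, not a verdict in itself.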
Do you find, when you’re working with clients, that there are page-level penalties, keyword-level penalties, sitewide-level penalties, and maybe other types of penalties that are maybe even a combination of some penalties at a keyword and a page level? How do you diagnose when those are the case?
In a nutshell, yes, to what you just described, all of it. Again, this is a topic close to my heart. I’ve been writing for a long time on that topic, on understanding really what algorithms are, how they fare against manual spam action, that’s the euphemism being utilized for penalties, really, what the differences are, how to tackle those.
To address your initial question: yes. The real penalties can be actually quite nuanced. And they frequently are. The idea really is to tackle the behavior that Google deems to be outside Google Webmaster Guidelines, to be really a violation, but leave the rest of the website where it is, or leave it ranking, not really applying a very vindictive approach if you want.
In these instances, let’s say for a moment, we happen to have an isolatable pattern that demonstrates doorway behavior. This is the kind of pattern that would indeed typically be tackled selectively and either removed altogether or ranked very poorly. We already discussed the case where there are excessive anchor texts that are commercial in nature, and the links are clearly identified as spam. That website would most likely not rank for that particular term, either for individual landing pages or for the entire website altogether.
What we see also quite frequently is where websites are being tackled in a multitude of ways because they indeed demonstrate a multitude of violations. There are doorway pages that are link content pages. They’re buying links, from Google’s vantage point, very obviously. They’re disseminating links that pass anchor text, which can be considered very unlikely to be merit-based. These websites are, of course, tackled for that violation as well.
Most of these websites never make that public. If we were to talk to the largest media outlets, be that in the US, Germany, or wherever else, none of them would really admit that the links they are offering are not PageRank-passing links. Those links really aren’t worth much in terms of PageRank, for the very simple reason that those websites are being detected by Google as selling PageRank-passing links and are highlighted as such in Google Search Console, but they don’t make that public.
It’s a wide topic because, on the other hand of the spectrum, we also have algorithms. Algorithms and Google penalties are most easily distinguished from each other. If there is a Google Search Console message that says the website has been penalized, that’s actually a manual spam action, aka penalty. If there is no such information, yet the website seems to be nosediving and traffic nosediving in organic search, that’s an algorithmic issue, but it’s not a penalty.
The best comparison is to look at a navigational app. If you want to go from A to B, you get the best route. If you pick and choose a different route, that’s just different signals. It’s not like you can’t get there. You’re just slower in getting there. Whenever websites take a nosedive and traffic unexpectedly, typically, there is a good reason for that, be that a Google algorithmic update, where the calculation method has changed, or something that was actually changed within the website quite recently.
If you happen to open the staging server of several million landing pages for indexing, that’s going to dilute your content and technical signals. We see that happening quite a bit. The staging server should not be open for indexing, yet it happens. Content duplication becomes a huge issue.
These are cannibalizing the main production site.
Yes. Very often, when there is a staging server, there will be an awful lot of placeholder content or content that really wasn’t meant for users or for indexing. It was just, let’s write something here because we need the landing page for testing. All of that becomes available for crawling, indexing, and ranking.
Of course, it’s going to spoil user signals as well because whenever users experience those landing pages, they’re going to take a look and be like, “That’s not what I was looking for. I’m going to go back, I’m going to refine the query, I’m going to look for something else.” Their behavior indicates to Google that this isn’t a great experience. Of course, that’s going to have a negative impact on the rankings.
Again, it’s not intentional, but it happens. Most of the time, the moment when it’s being noticed, when it’s being realized, is really when it’s a little bit too late. Let’s face it: there is a lag in Google Search Console data. There is a lag before Google Search Console data is populated. By the time we notice the dip, it is already a little bit too late. It’s not too late to fix it, but it’s going to take some work.
Yeah. All right, what are some of your favorite success stories where you identified a penalty or a technical misconfiguration and changed their whole business, saved some jobs, catapulted somebody’s career, etc.?
I’m not sure we catapulted somebody’s career. One of my favorite stories of all time is a real estate business. I will allocate them to the small and medium businesses area. We worked, I think, three or four years in a row. After the third year, it wasn’t discontinued, it was concluded, because as a real estate developer, the guys ran out of stuff to sell. They sold everything.
The real estate business, given our times, is not particularly difficult to optimize because it’s in demand. But this was a couple of years back, and this was really a heartwarming story where we temporarily lost a client, but we gained their fullest confidence. Of course, they were happy enough that they sold out all the merchandise they had.
When we talk about penalties, we had, of course, plenty of opportunities to fix penalties. When that’s being done, when penalties are about to be lifted, it is, in my book, the most critical to get it done right at the very first instance. The reason for me saying this is because this is a collaborative effort. It’s not just the consultants providing the right kind of data, the right kind of solution, including the reconsideration request rationale for Google, the one that makes sense.
It is critical that the client, the website operator, also plays ball, meaning that lifting a link-related penalty only makes sense if we actually disavow whatever happens to be the reason for the penalty, without the idea or the notion of instantly revoking or subsequently removing the disavow file, because then that penalty is most likely to be reintroduced.
When we’re dealing with a client who happens to be suffering from a penalty that is related to their content, a thin content penalty, it is critically important to understand the nature of their entire network. The reason for me saying that is if you happen to have a network of 50, 100, or even more websites, and they all look and feel the same, and they all tap into the same database, it is most unlikely for Google to look at your reconsideration request with a favorable eye because those websites are very unlikely to provide a unique selling proposition.
As far as Google is concerned, this is really just templated content, spam content, if you want, so it’s important to work together. I want to say that if that collaboration is given, if there is full transparency both ways, penalties can be fixed. It’s not a problem. Technical issues can be fixed.
The one thing that has to be kept in mind is that, sometimes, it takes a little bit longer than expected. For instance, in terms of backlinks, if we happen to have a profile of 100 million backlinks, we need to pull those backlinks first to understand what we have. If we’re changing something on the website that tackles a technical issue, those landing pages need to be crawled first.
Crawl budget becomes a really important issue in this discussion, which brings us to my favorite topic, which is server logs. If we still have time, I would love to talk about that because I think there are some real takeaways for all the listeners.
Yeah, let’s do it.
The reason for me mentioning server logs is because whenever I go to a conference, and I go to conferences to share and to speak about SEO on a very regular basis—it’s a real privilege to travel around and talk to cool people—I ask the audience, how many of you guys are aware that you save and preserve your raw web server logs? The answers vary, but most of the time, I would say about 10% say, yes, we would go ahead and save and preserve server logs.
I also want to say not every website has to do that because some websites are literally very small. It can be beneficial, but it is not going to be enough data to really draw any actionable conclusions. When we’re talking about large websites, having server logs collected over an extended period of time makes a huge difference.
For instance, we can understand which parts of the website are being prioritized by Google and then compare. Are these the types of landing pages that we actually want to be crawled and indexed in ranking as time passes? Is it something else altogether? Is it the FAQ part or the Wiki pages? Or is it a blog that is only supplemental?
This is the most basic finding. There are, of course, HTTP responses. Are we responding with 200 OK for actual 200 OK pages, or are our responses 5xx responses? Or are we actually returning 200 OK for actual errors, expired content, or expired product landing pages? All of these things can be much better identified if we happen to have server logs. Server logs that have not been recorded can never be retrieved, and this is very important.
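To make the two basic findings concrete (which parts of the site bots prioritize, and which HTTP responses they receive), here is a minimal Python sketch. It assumes the logs are in the common Apache/Nginx "combined" format and that bots can be recognized by a user-agent substring; the `BOT_TOKENS` list and the function names are illustrative, not part of any tool mentioned in the conversation. User agents can be spoofed, so a real audit would also verify the requesting IP addresses.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# IP identity user [time] "METHOD path protocol" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short list of user-agent substrings.
BOT_TOKENS = {"googlebot": "Googlebot", "bingbot": "Bingbot"}


def classify_bot(agent):
    """Return a bot name if the user agent contains a known token, else None."""
    lowered = agent.lower()
    for token, name in BOT_TOKENS.items():
        if token in lowered:
            return name
    return None


def section_of(path):
    """Reduce a request path to its first segment, e.g. /blog/post -> /blog."""
    clean = path.split("?", 1)[0]
    parts = [part for part in clean.split("/") if part]
    return "/" + parts[0] if parts else "/"


def summarize(lines):
    """Count bot requests per site section and per status class (2xx, 5xx, ...)."""
    by_section = Counter()
    by_status = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        bot = classify_bot(match.group("agent"))
        if bot is None:
            continue  # only bot traffic is of interest here
        by_section[(bot, section_of(match.group("path")))] += 1
        by_status[(bot, match.group("status")[0] + "xx")] += 1
    return by_section, by_status
```

Calling `summarize(open("access.log"))` on a log file (the file name is a placeholder) returns two counters. Note that logs alone show a 200 response, not whether the page behind it is really an error or expired product, so the status counts are a starting point for spotting such cases rather than proof.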
If I may share a piece of advice for our listeners, something that you can consider a takeaway and maybe implement within your organization and benefit from as time passes: go ahead, save and preserve your web server logs. Given our past experience, there will frequently be two obstacles, one being the legal team saying, “Oh, that might not be legal.” Yes, it is.
You’re only saving and preserving server logs that pertain to bots, to understand which kind of landing pages are being requested by bots. We’re not talking about users, so only bots. The other question or concern being raised is, “Oh, it’s going to be so expensive.” But it’s not, luckily. Even though everything is getting more expensive, hard drives are not really that expensive, and server logs can be zipped. They can be compressed. It’s not going to be a huge drain on the budget.
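As a rough illustration of how little is involved in keeping only bot requests and compressing them, here is a small Python sketch using the standard library's `gzip` module. The `BOT_MARKERS` tuple and the function names are assumptions made for the example; which bots to keep is a decision for each organization.

```python
import gzip

# Hypothetical user-agent substrings; extend to whichever bots matter to you.
BOT_MARKERS = ("googlebot", "bingbot")


def is_bot_line(line):
    """True if the raw log line mentions one of the bot markers."""
    lowered = line.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def compress_bot_lines(lines):
    """Keep only bot requests and return (gzip-compressed bytes, lines kept)."""
    kept = [line.rstrip("\n") for line in lines if is_bot_line(line)]
    payload = "".join(line + "\n" for line in kept).encode("utf-8")
    return gzip.compress(payload), len(kept)
```

The returned bytes can be written straight to a `.gz` file for long-term storage. Log lines are highly repetitive text, so they tend to compress well, which is the point being made about storage cost.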
If you happen to embrace that recommendation in six months or a year from now, you will be sitting on a treasure trove of data. Please make sure to factor it in when you conduct an audit. The findings will be much more precise, and you will be able to actually pinpoint those landing pages that are your cash cows in order to make sure that you put your best foot forward, even in the most competitive environments.
When I heard some of the use cases for server log analysis, like identifying when 404 error pages are returning a status code of 200, these different use cases you can identify without the server logs.
Yes, but you will never have the full picture. You can identify it, but you won’t be able to say which bits and pieces of the website are actually being prioritized. You can also not identify which kind of bots are actually crawling my site. Are these scraper bots? Are these Google bots and Bing bots? This is all the insight that is only possible with server logs.
There is a certain level of technical acumen that is required in order to actually take advantage of that data. Not every organization is large enough to justify that cost, but then you can collect, save, and preserve that data. If it’s two years of server logs that are being ultimately utilized when the audit is happening, that’s perfectly fine. But it really is an opportunity lost if the server logs are never being recorded because, again, there’s no way whatsoever to recover that data. It is lost for eternity.
Yeah. What tool are you using for the server log analysis? Is it Screaming Frog, Log Analyzer?
Among others, we have our own solution as well, but Screaming Frog is a very good one, indeed.
Oncrawl has a Log Analyzer as well.
They do. I’m a big fan here. I believe it’s a French operation, a French company that’s offering Oncrawl. We’re very keen on Oncrawl as well. The idea really is, and this is something that maybe warrants a little bit of explaining. Why utilize a multitude of tools?
Of course, the results are not 100% overlapping. But at times, one result stands out. Whenever that’s the case, either there is a false positive, or you’re actually finding something that the other tools omitted, which can happen.
The functionality of the tools that we utilize isn’t identical either. Sometimes, you can actually pinpoint what you want to be crawling, and sometimes it’s not possible. You can cap crawling so you can make sure that it’s not excessive. In cases where there is, for instance, filtering open to crawling, you can crawl virtually forever. But it’s not necessarily the kind of data that gives you additional information.
The reason for using a multitude of different tools is really to have this opportunity to double-check findings and to look for outliers that are out there. It is something that I would recommend, and it is also something that takes time. Server performance is a factor here, and so is the hosting service. At times, when crawling, you have to be careful not to impair the actual user traffic. This is not something that you ever want to do.
No matter what you’re doing, if you’re conducting an audit externally or internally, you need to make sure that the user experience isn’t impaired in any way. The website cannot be slowed because of the crawling, which is the reason why crawling can take a really long time; five URLs per second tends to be a good pace. But at times, we crawl with one URL per second, and you have to live with that. Give it time. That is very necessary in order to crawl the critical mass.
If it’s a 50 million-page website—
It takes a bit of time, right?
You’ve been doing that for a year or something.
With the large websites, in my book, maybe starting at 50 million. Fifteen million is already very, very substantial, but what you probably consider the giants, particularly in the retail area, it does take time. If it’s one-off engagement, of course, there are several weeks of run-up time, where you just continuously crawl in order to get the data that you need. It doesn’t have to be 100% sample, but it needs to be representative enough.
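The arithmetic behind "it takes a bit of time" is easy to work out. The short Python sketch below simply divides the number of URLs by the crawl pace, using the figures mentioned in the conversation (a 50 million-page site, five URLs per second, one URL per second); the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def crawl_days(url_count, urls_per_second):
    """Days of uninterrupted crawling needed at a constant pace."""
    if urls_per_second <= 0:
        raise ValueError("urls_per_second must be positive")
    return url_count / urls_per_second / SECONDS_PER_DAY


print(round(crawl_days(50_000_000, 5), 1))  # about 115.7 days at five URLs per second
print(round(crawl_days(50_000_000, 1), 1))  # about 578.7 days at one URL per second
```

At these paces a full crawl of such a site runs for months, which is why a representative sample gathered over several weeks is the practical option.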
I’m a big fan of the operator of Link Detox, but we’re not utilizing the tool itself. We’re tapping into the data source, but we’re not relying on the analysis itself.
Christoph is very cool, of course. If he sees that, I hope he does. I want to say that Link Research Tool is very cool. I’m very much hoping they’re going to reintroduce the annual event that they were offering in Vienna. I’d love to be part of that.
No, we’re not passing any judgment in terms of not utilizing Link Research Tools or the Detox Analyzer. The reason for us doing it differently is because we have our own tools, and we rely to a large extent while pre-filtering on the instinct gained over respective ten years working at Google. It turns out to be a very reliable method as far as we’re concerned.
I know we’re getting close to the end of time here. But one thing that I think is important to clarify for our listener is when a disavow file is going to potentially be helpful and when it’s not. There’s a manual link penalty, and you upload a disavow file. That’s a good action to take. But what about if there’s an algorithmic adjustment?
You kept the nuclear question to the very end because as much as I would love to give our audience a hopeful outlook and the answer we all strive for, the honest answer really is there is no way of telling in advance. I’ve seen everything. Frequently, there is plateauing, but it’s not given. I’ve seen websites actually decline, lose traffic, and lose organic visibility upon submitting the disavow file.
I’ve seen the opposite as well, and I want to say that this is the more optimistic and preferred outcome: websites benefiting from that single action. Nothing else was done other than disavowing backlinks that were deemed to be in violation of Google Webmaster Guidelines, and subsequently the websites became much more visible for relevant queries.
Unfortunately, there is no way of predicting that. I have never seen a tool that can do that. There is a risk. Unless there is a penalty, submitting a disavow file should be lower than P0 priority, so it shouldn’t be the first thing to do.
Backlinks are a big topic. I happen to be talking about backlinks at conferences recently quite a lot. It seems to be popular. Backlinks are a great topic, and they can be obviously beneficial. There are various ways of dealing with backlinks and how backlinks can be built in more or less safe ways.
We won’t be able to address all of the fine nuances of link building today, but I want to say one thing. Most websites out there can very well do without any link building or without any backlinks, as a matter of fact. There are some bigger fish to fry. Very frequently, technical signals and user signals, which can be very nicely fine-tuned, are literally the bigger fish to fry, the bigger return on investment if we apply finite resources.
One of my favorite things to look at in Google Search Console is sorting traffic over an extended period of time, looking for high-impression landing pages that have very low CTR. In turn, this indicates that user expectations are not met and that users happen to be very reluctant to click on those landing pages. This is almost low-hanging fruit if we happen to experiment and improve those snippets. It’s not something to do across the board, unless you happen to have only 150 landing pages. Then you can do that across the board.
You can pick and choose a few that already are very visible, where users don’t click on them. You can improve the snippets, test around, and see if you can tap into that reservoir of traffic that is already there for the taking.
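The selection described here, pages that are already visible but rarely clicked, can be sketched in a few lines of Python. The field names (`page`, `impressions`, `clicks`) and the two thresholds are assumptions for illustration; they would need to be mapped to whatever a Search Console export actually contains and tuned per site.

```python
def low_ctr_candidates(rows, min_impressions=1000, max_ctr=0.01):
    """Return (page, impressions, ctr) for visible pages with a low click-through rate.

    rows: iterable of dicts with hypothetical keys 'page', 'impressions', 'clicks'.
    Both thresholds are placeholders, not recommended values.
    """
    candidates = []
    for row in rows:
        impressions = row["impressions"]
        if impressions == 0 or impressions < min_impressions:
            continue  # not visible enough to be worth testing
        ctr = row["clicks"] / impressions
        if ctr <= max_ctr:
            candidates.append((row["page"], impressions, ctr))
    # Highest-impression pages first: the largest untapped reservoir of traffic.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates
```

The resulting short list is where snippet experiments would start; everything else is left alone.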
There’s bigger fish to fry in my book than backlinks, even though they can be very, very important. There are simply more signals than just traffic, or just technical signals, or just content signals. They all play a factor, and they tend to affect each other as well. It’s good advice not to focus on just one exclusively as we go ahead and optimize the website.
There must have been occasions where you’ve had a client submit a disavow file, and then they’ve lost traffic organically.
Yes, of course. That can happen. Clients come to us and say, we have submitted this disavow file. Why is this happening to us? We have already had business partners in the past whose disavow file included both Google and their own website as a pattern. These things happen, intentionally or not.
Very frequently, they can be undone. At times, they can be correlated. But as much as we would love to be able to predict which direction it is going to go, whether it’s going to be an upward or downward trajectory, it’s not something where we can say with confidence, yeah, we are absolutely sure this is the way this thing works.
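The accidental self-disavow mentioned above is the kind of mistake a quick check can catch before a file is submitted. The Python sketch below assumes the plain-text disavow format of one URL or one `domain:` entry per line, with `#` marking comments; the function names and the list of protected domains are illustrative.

```python
from urllib.parse import urlparse


def disavowed_hosts(text):
    """Extract host names from disavow file text (URL lines and domain: lines)."""
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.lower().startswith("domain:"):
            host = line[len("domain:"):].strip().lower()
        else:
            # URL lines need a scheme to be parsed; others are skipped here.
            host = (urlparse(line).hostname or "").lower()
        if host:
            hosts.append(host)
    return hosts


def risky_entries(text, protected_domains):
    """Return disavowed hosts that fall under a domain you never meant to disavow."""
    flagged = []
    for host in disavowed_hosts(text):
        for domain in protected_domains:
            if host == domain or host.endswith("." + domain):
                flagged.append(host)
                break
    return flagged
```

Running `risky_entries(file_text, ["example.com", "google.com"])` with your own domain in place of `example.com` should return an empty list before anything is uploaded.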
In the case where the traffic has gone down after submitting a disavow file, do you then remove the disavow file, update it, or just leave it alone? What’s the next action?
It depends. If we happen to be the party building the disavow file, we generally tend to have a very high level of confidence. But then again, we would not recommend you submit the disavow file initially before all the hard work has been done unless there was a penalty in place. If there is a penalty in place, if the penalty is link-related, the disavow file has to stay in place. Simply put, if it’s removed, the penalty is most likely to reappear, and we basically have to start from scratch.
There’s the waiting game again because the reconsideration request is a manual process. There is no confirmed turnaround time whatsoever. At times, it takes a couple of hours or days. At other times, it takes weeks or even months, so this is a huge, huge, huge risk. I do not recall off the top of my head a situation where, without a penalty in place, submitting the disavow file was on top of the list because it never is. It really is a low priority.
What does that mean that it’s something to not do? Do you recommend the client not submit a disavow file if there’s no message in Google Search Console of a link penalty?
It depends. Sometimes, we do recommend maintaining the disavow file in case it’s needed. If the website fails to rank for relevant queries, where they have the relevant content, we may suggest to them to submit a disavow file nonetheless, but it is never the P0. It is never the top priority on the list.
If the priority, which is the topic we started our conversation with perfectly, is actually embraced, if the technical signals are fine-tuned, and if the content signals are unambiguous, any potential loss from submitting the disavow file can be offset with improved signals. This is the reason why priority is so important.
Again, there are no guarantees. There is no way of telling a client it’s going to go one way or the other. But generally speaking, the upward trajectory is, of course, the objective. If we keep an eye on the order of things, on the order of battle if you want, the order of the SEO battle, if that’s something that we keep an eye on and embrace, typically the trajectory is a desirable one, up and to the right.
What happens if you need a Google Insider to press a button or move something along, and you haven’t gotten any action yet? Do you pick up the phone and call John Mueller, Gary Illyes, or somebody else inside of Google that you know? What do you do?
I would love to say yes. I suppose that would further boost our sales, but the honest answer is no. We maintain contacts other than John and Gary. There are still a handful of people that I had the privilege to work with at Google. I’m visiting there occasionally, going for a free lunch. Again, growing up in southern Germany, you never say no to a free lunch.
I refrain from really disclosing any information from a client. To begin with, there is an NDA, of course. You never talk about the client outside of the contractual obligation, so we don’t do that. Secondly, simply put, I do not believe that any of my former colleagues would actually go ahead and say, “You know what, I know you so well, let’s forget about that penalty. Let’s push that button that we talked about.” That’s just never gonna happen.
Even the attempt would rather shatter those networks and those relationships that we built over the years. Unfortunately, we don’t do that. We don’t have that special channel. While we maintain those contacts, we don’t have that special channel where we can say, come on, guys, help us out here. Luckily, in my experience, it’s not needed.
Yeah. Awesome. If our listener wants to work with you guys, maybe do a technical audit or some ongoing engagement, how do they get in touch?
That’s perfectly simple. You go to searchbrothers.com, which I use as the handle for our conversation to make it easier for our audience. That’s one way to get in touch. The other, of course, is to come and talk to us, either myself or Fili Wiese, if we happen to run into each other at one of the conferences that are upcoming.
Luckily, the conference season is in full swing. 2020 was a difficult year for conference speakers. 2021 still wasn’t really what it used to be. The previous year and this year are really, really good. I very much look forward to many conferences to come. If we have the chance to talk face-to-face, let’s do that. Otherwise, drop us a line, and we’ll be glad to have the conversation and see if we can help each other.
All right. Thanks very much, Kaspar. Thank you, listener. We’ll catch you in the next episode. I’m your host, Stephan Spencer, signing off.
Connect with Kaspar Szymanski
Previous Marketing Speak Episode
Your Checklist of Actions to Take
Understand the impact of AI, especially language models, on SEO. AI is a valuable tool for automation in my SEO.
Evaluate URL structure changes. Remember that minor changes may not significantly affect my SEO rankings, and large-scale changes can be risky.
Consider the size of my website to determine the importance of my crawl budget. For larger websites, a crawl budget can be critical; for smaller sites, it may be less of a concern.
Highlight my unique selling proposition. Focus on my unique selling proposition and brand recognition to stand out in search results.
Maintain and analyze server logs. This is crucial for insights on how search engine bots crawl and interact with my website.
Distinguish between page-level penalties, keyword-level penalties, sitewide-level penalties, and other types of penalties. Diagnose the root cause of penalties and take corrective actions, such as addressing doorway pages or thin content issues.
Utilize proprietary SEO tools for analyzing my private data. These tools can mimic search engine experience and insights and provide an in-depth analysis for my individual clients.
Use a combination of SEO tools to cross-verify results and identify outliers. Each tool may have unique features or different data sources that can provide a comprehensive analysis.
Practice caution with algorithmic adjustments and with using a disavow file. Without a penalty in place, submitting a disavow file can move traffic in either direction and is never the top priority.
Collaborate with SEO expert Kaspar Szymanski and visit his website at searchbrothers.com. Take advantage of upcoming conferences and industry events to meet Kaspar in person.
About Kaspar Szymanski
Kaspar Szymanski is a renowned SEO expert, a former senior member of the famed Google Search team, and one of very few former Googlers with extensive policy driving, webspam hunting, and webmaster outreach expertise. Nowadays, Kaspar applies his skills to recover websites from Google penalties and help clients grow their website visibility in search engine results.