
Google Clarifies Google's Crawler File Size Limits Doc Again

Google Web

Earlier this month, we reported that Google updated two of its help documents around Google's crawler file size limits. Well, Google made a clarification to one of those documents the other day after some confusion within the SEO industry.

This was the help document that was updated, and it now specifically says that "a Google crawler like Googlebot may have a smaller size limit (for example, 2MB), or specify a larger file size limit for a PDF than for HTML."

The new version reads:

By default, Google's crawlers and fetchers only crawl the first 15MB of a file, and any content beyond this limit is ignored. However, individual projects may set different limits for their crawlers and fetchers, and also for different file types. For example, a Google crawler like Googlebot may have a smaller size limit (for example, 2MB), or specify a larger file size limit for a PDF than for HTML.

The older version read:

By default, Google's crawlers and fetchers only crawl the first 15MB of a file. Any content beyond this limit is ignored. Individual projects may set different limits for their crawlers and fetchers, and also for different file types. For example, a Google crawler may set a larger file size limit for a PDF than for HTML.

This change was made a couple of days ago.

Forum discussion at X.

  •  

Google On If A Younger Web Site Can Beat An Older Website In Search

Old New Website Google

A thread at Reddit asked if a one-year-old website can beat a four-year-old website in SEO and perform better in Google Search. John Mueller from Google answered with his classic "it depends."

The answer depends on what that four-year-old site has been doing over the years versus what you have been doing with your one-year-old site. If the older site has not done much, and the new site has been doing good things to build up its user base and such, then the newer site can do better.

The question posted was:

Can a 1-year-old site realistically beat a 4-year-old competitor in SEO, and what actually helps close that gap fastest?

If a competitor has been around for 4 years and we started just 1 year ago, is it still realistic to beat them in SEO? What strategies actually help close the gap faster when competitors have more age, authority, and links?

John Mueller's answer:

A website growing older is inevitable; growing worthwhile is earned.

What happened in those 4 years? What happened in the 1 year? Some older websites have done amazing things over the years, and you can't just jump in with a site that has 2 links more and is 3 points closer to 100, and expect to be considered more relevant (whether by search engines or people). Other sites may have squandered their time in the domain registry and survived by being the proverbial one-eyed man of the web. (I think I'm trying to say "it depends"?)

If your site has been around for a year now, and don't have a clear understanding of the differences in value (not SEO metrics) between these sites, I'd recommend taking a step back and first trying to get an objective bigger picture view, and then - in most cases - thinking more strategically (users, marketing, functionality, business, promotion, users) rather than purely SEO-centric. Likely your site is built on a modern setup, and you sound SEO-focused, so technically there's probably not a lot to break or fix.

Forum discussion at Reddit.

  •  

Google Hit Self-Promotional Listicles In Recent Unconfirmed Updates?

Google Listicles

Google may have hit those self-promotional and self-serving listicle articles in one of the more recent unconfirmed Google search ranking updates. Lily Ray dug into a pattern she spotted of these types of content pieces, mostly in the SaaS space, being hit hard by the January Google updates.

Lily Ray wrote Is Google Finally Cracking Down on Self-Promotional Listicles? "The most popular 'GEO' tactic might indeed be risky for SEO purposes after all," she added. She then dug through examples of these types of tactics performing super well, only to see those gains mostly wiped out in the past month.

The example she provided at the top was for [best content marketing agencies], but you've seen these - best X and top Y types of listicle articles. They almost always put the publisher's own company or tool in the top position and then list competitors below.

Here is one of those screenshots she posted:

Best Content Marketing Agency Google

But try it yourself, say for [best seo agency] in AI Mode, and you get this type of page ranking in those responses. They list their own agency followed by others. These self-promotional listicles were the "hack" to do well here for a while, and supposedly a lot of them are no longer ranking in Google.

Ranking Chart Declines

"As with many SEO trends before it, what works today may quietly become a liability tomorrow," Lily wrote. Danny Sullivan from Google said similar things recently but this is the trend, this is what we do.

Glenn Gabe added on X:

Yes, IT'S HAPPENING! With the latest volatility, Lily saw a bunch of sites dropping, but it was their blog content dropping heavily. And when digging into that, there were self-serving listicles all over the place. I have seen the same thing and have been digging in heavily with the latest volatility we have seen. Lily does a great job breaking everything down. And to me, I'm glad this is finally happening. Like I have explained before, it was super embarrassing for the sites publishing those self-serving listicles, we knew Google would crack down on that at some point, etc.

BTW, I feel like this could be a reviews system update and could be based on all the self-serving listicle crap flooding Google's index (and ranking in AIOs, AI Mode, etc.) And of course, any AI Search platform leveraging Google's results would be impacted downstream. This is big news. I would read Lily's post and then start digging into the movement.

Yes, IT'S HAPPENING! With the latest volatility, Lily saw a bunch of sites dropping, but it was their blog content dropping heavily. And when digging into that, there were self-serving listicles all over the place. I have seen the same thing and have been digging in heavily with… https://t.co/HxKwl3VBFP

— Glenn Gabe (@glenngabe) February 3, 2026

Here he says he thinks it goes beyond listicles but also is an update to the Google reviews system, formerly product reviews system.

Good Morning Google Land! This is the February 4th edition of "Reviews Update Notes". Hey, it doesn't roll like "Core Update Notes" but it fits... :) Also, at least I *think* it was a reviews update. Google stopped announcing reviews updates a while ago since the system is now… pic.twitter.com/uTk7VmAdrF

— Glenn Gabe (@glenngabe) February 4, 2026

I also love the Mt. AI, although different situation from listicles, same conceptual issue:

I think I'll coin this "Mt. AI". Yet another example of a site publishing a ton of AI-generated content that surged... and ultimately came crashing down. I'm sure the site owners were saying, "AI content is amazing! Look at our trending!" ... until the crash happened. And that… pic.twitter.com/DfWslqThrd

— Glenn Gabe (@glenngabe) January 21, 2026

Forum discussion at X.

  •  

Google On Serving Markdown Pages To LLM Crawlers

Google Text File Laptop

Google's John Mueller responded to a question on the pros and cons of serving raw Markdown pages to LLM crawlers and bots. John didn't say much, but he did list a number of concerns and things you should be on top of if you do go down that avenue.

Markdown is a lightweight markup language used to create and edit technical documents using plain text and special characters for formatting. Markdown files are converted into HTML by a Markdown parser, which allows browsers to display the content to readers.

The question posted on Reddit was, "What is the actual risk/reward impact of serving raw Markdown to LLM bots?"

John replied with these concerns:

  • Are you sure they can even recognize MD on a website as anything other than a text file?
  • Can they parse & follow the links?
  • What will happen to your site's internal linking, header, footer, sidebar, navigation?
  • It's one thing to give it a MD file manually, it seems very different to serve it a text file when they're looking for a HTML page.

John then wrote on Bluesky, "Converting pages to markdown is such a stupid idea. Did you know LLMs can read images? WHY NOT TURN YOUR WHOLE SITE INTO AN IMAGE?"

So keep these questions in mind when considering doing this.
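If you do want to experiment anyway, here is a minimal sketch, using Python's standard library, of the tactic being discussed: content negotiation that serves a Markdown variant only to known LLM fetchers and HTML to everyone else. The bot list, paths and content are placeholders I made up, and John's questions above about links, navigation, and whether bots treat text/markdown as anything more than a text file still apply.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative only: which fetchers get Markdown. Not an exhaustive or official list.
LLM_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

HTML_BODY = b"<html><body><h1>Example page</h1><p>HTML version for browsers and Googlebot.</p></body></html>"
MD_BODY = b"# Example page\n\nMarkdown version of the same content.\n"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if any(bot in ua for bot in LLM_BOTS):
            body, ctype = MD_BODY, "text/markdown; charset=utf-8"
        else:
            body, ctype = HTML_BODY, "text/html; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Local test only; try: curl -A "GPTBot" http://127.0.0.1:8000/
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```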

Hat tip to Gagan on this:

Creating markdown pages for LLM crawlers??
Here is what Google's John Mueller said about this
"Are you sure they can even recognize MD on a website as anything other than a text file? Can they parse & follow the links? What will happen to your site's internal linking, header,… pic.twitter.com/KjvPk8t1NC

— Gagan Ghotra (@gaganghotra_) February 3, 2026

Also check out:

This morning I made a small change to my site: I made every page available as Markdown for AI agents and crawlers. I expected maybe a trickle. Within an hour, I was seeing hundreds of requests from ClaudeBot, GPTBot, and OpenAI's SearchBot. https://t.co/UD0h22AZEC

— Dries Buytaert (@Dries) January 14, 2026

Forum discussion at Reddit.

Update: From both Google and Bing:

Got answers from both Google and now also Bing (thx @facan) about using separate .md pages for LLM crawlers

I'm not saying it's right or wrong either way (jury is still out as many folks are testing the impact now), but worth reviewing these official responses… pic.twitter.com/ot6DdRBg44

— Lily Ray (@lilyraynyc) February 5, 2026
  •  

Googlebot File Limit Is 15MB But 64MB For PDF & 2MB For Other File Types

Google Spider

We have known for a long time that Google will crawl web pages up to the first 15MB, but now Google has updated some of its help documentation to clarify that it will crawl the first 64MB of a PDF file and the first 2MB of other supported file types.

The 64MB and 2MB items might not be new, but I don't think I covered them before. I know I covered that Google will crawl up to 2MB of your disavow file, but there are no other mentions of 2MB in my coverage.

This help document was updated to now read:

When crawling for Google Search, Googlebot crawls the first 2MB of a supported file type, and the first 64MB of a PDF file. From a rendering perspective, each resource referenced in the HTML (such as CSS and JavaScript) is fetched separately, and each resource fetch is bound by the same file size limit that applies to other files (except PDF files).

Once the cutoff limit is reached, Googlebot stops the fetch and only sends the already downloaded part of the file for indexing consideration. The file size limit is applied on the uncompressed data. Other Google crawlers, for example Googlebot Video and Googlebot Image, may have different limits.

Then Google also updated this document to add the 15MB limit, but that was not new - it now says:

By default, Google's crawlers and fetchers only crawl the first 15MB of a file. Any content beyond this limit is ignored. Individual projects may set different limits for their crawlers and fetchers, and also for different file types. For example, a Google crawler may set a larger file size limit for a PDF than for HTML.

Google explained that "While moving over the information about the default file size limits of Google's crawlers and fetchers to the crawler documentation, we also updated the Googlebot documentation about its own file size limits." "The original location of the default file size limits was not the most logical place as it applies to all of Google's crawlers and fetchers, and the move enabled us to be more precise about Googlebot's limits," Google added.

The more precise details are useful to know.

There is some confusion around whether the limit for HTML files is 15MB or 2MB, so I asked John Mueller, who replied on Bluesky saying, "In short (gotta run), Googlebot is one of Google's crawlers, but not all of them." He added, "Google has a lot of crawlers, which is why we split it. It's extremely rare that sites run into issues in this regard, 2MB of HTML (for those focusing on Googlebot) is quite a bit. The way I usually check is to search for an important quote further down on a page - usually no need to weigh bytes."

"Sorry for missing this - like I mentioned in the other thread, we have a bunch of different crawlers (I know SEOs focus on Googlebot, but there's life outside of textual web-search :-)), so we have the general limit + the Googlebot-specifics documented," he added later.
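If you'd rather check with code than by searching for a quote, here is a small sketch (the URL is a placeholder) that measures a page's uncompressed size against the documented limits (2MB for Googlebot's web-search crawl of HTML, 15MB as the general default), since, per the doc, the limit applies to the uncompressed data.

```python
import gzip
import urllib.request

LIMITS = {
    "Googlebot (HTML, web search)": 2 * 1024 * 1024,    # 2MB
    "Default crawler/fetcher limit": 15 * 1024 * 1024,  # 15MB
}

# Placeholder URL; the limits apply to uncompressed bytes, so decompress first.
req = urllib.request.Request("https://www.example.com/", headers={"Accept-Encoding": "gzip"})
with urllib.request.urlopen(req, timeout=10) as resp:
    raw = resp.read()
    if resp.headers.get("Content-Encoding") == "gzip":
        raw = gzip.decompress(raw)

print(f"Uncompressed size: {len(raw):,} bytes")
for name, limit in LIMITS.items():
    status = "within" if len(raw) <= limit else "over"
    print(f"  {name}: {status} the {limit // (1024 * 1024)}MB limit")
```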

Forum discussion at X.

  •  

Google: Search Algorithms, Spam Detections & Policies Don't Fundamentally Change With AI Search

Google Algorithm Code

Google's John Mueller said that when it comes to AI Search and the changes that come with that, Google's core search algorithms, spam detection methods, spam policies, and other search systems do not fundamentally change.

He said there are some changes, but there are always some changes because "none of these things live in isolation, search evolves outside of AI, and the web is a dynamic place," he added.

This stems from a question Lily Ray asked on Bluesky - she wrote:

Can you comment on whether anything has changed with regard to Google handling web spam, Helpful Content, etc. with the rise of AI search?

Does Google still plan to develop new algorithm updates and spam policies aimed at manipulation of search results? Can we expect Google to continue building new types of manual actions?

Just want to see if you can comment on whether anything has changed on this front as search moves more toward AIO/AI Mode.

John replied:

I don't think these things fundamentally change. Search has a looong history, with lots of experience & expertise. Of course there's *some* change - none of these things live in isolation, search evolves outside of AI, and the web is a dynamic place.

Here is a screenshot of those posts:

Bluesky Posts Johnmu

Forum discussion at Bluesky.

  •  

Google's Top Crawling Challenges In 2025

Lizzi Google Crawley

Gary Illyes, along with Martin Splitt, of Google posted a podcast explaining the top crawling challenges Google noticed during its 2025 year of crawling. The top challenges Google had with crawling included faceted navigation, action parameters, irrelevant parameters, calendar parameters and other "weird" parameters.

Here is the podcast embed:

These issues with crawling can impact a site's performance because bots might get stuck crawling the site in a loop and cause server issues because of the load the bot is putting on the server's resources. And as Gary said, "once it discovers a set of URLs, it cannot make a decision about whether that URL space is good or not unless it crawled a large chunk of that URL space."

Here is how Gary Illyes put the challenges by percentage:

  • Faceted Navigation was 50%: This occurs on websites (often e-commerce) that allow users to filter and sort items by various dimensions like price, category, or manufacturer. These combinations create a massive number of unique URL patterns. Googlebot may try to crawl all of them to determine their value, potentially crashing the server or rendering the site useless for users due to heavy load.
  • Action Parameters was 25%: These are URL parameters that trigger a specific action rather than changing the page content significantly. Common examples include parameters like ?add_to_cart=true or ?add_to_wishlist=true. Adding these parameters doubles or triples the URL space (e.g., a product page URL vs. the same URL with an "add to cart" parameter), causing the crawler to waste resources on identical content. These are often injected by CMS plugins, such as those for WordPress (see the log sketch after this list).
  • Irrelevant Parameters was 10%: These are parameters that don't change the content's state and that Googlebot generally ignores or finds irrelevant, such as session IDs and UTM tracking parameters. Googlebot struggles to determine whether these random strings change the page content, and it may crawl aggressively to test whether the parameters are meaningful, especially if they don't follow standard naming conventions.
  • WordPress Plugins or Widgets was 5%: These are cases where widgets or plugins add event tracking or other parameters to URLs. This was a notable challenge for Google because of the open-source nature of the WordPress ecosystem.
  • Other "Weird Stuff" was 2%: This catch-all category includes rare technical errors, such as accidentally double-encoding URLs (e.g., percent-encoding a URL that was already encoded). The crawler decodes the URL once but is left with a still-encoded string, often leading to errors or broken pages that the crawler attempts to process anyway.

This was an interesting podcast - here is the transcript if you want it.

Forum discussion at X.

Image credit Lizzi Sassman

  •  

Google: Don't Spend Too Much Time On Redirects Analysis For SEO

Google Redirect Line Laser

Google's John Mueller said he would "caution against assuming that you need to do this level of analysis for all URLs on a website in order to achieve optimal SEO" when it comes to reviewing bad redirects or CSP settings. Why? Because bad redirects or CSP settings are often plainly visible during normal browsing, and if you see an issue, that is enough analysis.

In short, you don't need fancy SEO tools or features to find issues with bad redirects or CSP settings because you would likely see those anyway when just using your web browser and visiting those URLs.

John said this in a Reddit thread asking if "auditing redirect chains in DevTools [is] a massive time-sink." The answer is: yes, it can be.

John wrote:

There are a bunch of browser extensions that do this already (eg Redirect Path from Ayima is one I see a lot in screenshots, and CSP is very different from redirects, so I don't understand the connection). I don't recall a time when I ran into something like this causing SEO issues which weren't also visible to average users in their browsers.

I use Redirect Path from Ayima; it is a nice browser extension.
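If you do want to spot-check a redirect chain without an extension, a few lines of Python (using the third-party requests library; the URL is a placeholder) will print every hop, though as John notes, a bad redirect is usually obvious the moment you load the page in a browser.

```python
import requests  # third-party: pip install requests

# Print each hop in the redirect chain, then the final destination.
resp = requests.get("https://www.example.com/old-page", timeout=10)
for hop in resp.history:
    print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
print(resp.status_code, resp.url)
```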

John goes into why it may be fun to dig in:

This is not to discourage you from digging into minute technical details, chasing through rabbit holes, and then making tools to make it easier for you :-). I have also spent days & weeks analyzing technical quirks & puzzles that in the end I realized ultimately don't matter, but which were "fun" (only 1/4 was done out of spite) & somewhat educational along the way. It's probably not healthy to over-fixate on these things, but I learn minutiae (that will never really matter, I know).

But ultimately, it is probably overkill, especially for SEO - John wrote:

This is mostly just to say I applaud your desire to understand all of these details, it's not unreasonable to practice digging into this from time to time, but I'd caution against assuming that you need to do this level of analysis for all URLs on a website in order to achieve optimal SEO/EtcO. There are many things that can subtly & invisibly go wrong with websites, but usually bad redirects or CSP settings will generally be very visible to people using browsers.

Forum discussion at Reddit.

  •  

Google On Recent Google Search Ranking Volatility - No Insights To Share

Google Code

As you know, January was an incredibly intense month of Google Search ranking volatility. It surprised me that Google has still not confirmed that a Google search ranking update took place. Now, John Mueller of Google was asked about it and said, "Unfortunately, I don't have any insights / updates to share."

John seemed to imply that the changes we are seeing (or that he is seeing) are related to the normal dynamic nature of the web. "The web & search is a very dynamic place, it's expected that things change over time," John explained.

John Mueller was asked:

We've recently noticed a drop in keyword rankings specifically in the USA region, while other regions look stable. There haven't been major site changes from our side, so I wanted to ask if there are any recent updates, signals.

John Mueller replied:

I understand the desire to learn more about search, but unfortunately I don't have any insights / updates to share. The web & search is a very dynamic place, it's expected that things change over time. I'd recommend not just looking at numbers, but digging into the details, if it's important to you.

Here are some of the unconfirmed Google search ranking updates I covered in January:

And the volatility from late last week is finally starting to calm down a bit today.

John basically is saying he is not able to share any insights about any changes on Google's end when it comes to rankings or signals.

As you know, Google does push out smaller core updates without announcing them - so maybe that is what we covered above. Or maybe they were not core related; it is hard to know for sure.

But Google has nothing to share about these updates, not yet at least...

Here are those posts on Bluesky:

John Update Post

Forum discussion at Bluesky.

  •  

Google Search Console Adding AI Visibility Reporting?

Google Data Chart

John Mueller from Google dropped a very John-like hint that maybe, just maybe, Google Search Console will add AI visibility reporting. John was asked about it again and responded, saying, "While I have nothing to announce, I can say for sure that very few things online are permanent."

John was asked, "Also, are there any plans to include AIO/AI Mode insights in Google Search Console?" by 'ªMarcus Anthony Cyganiak'¬. Lily Ray, replied, like I would have, saying, "Well yeah, also that, but we already know the answer."

But John said, "many things aren't permanent decisions." He added, "Things change - SEOs write about what changed - while I have nothing to announce, I can say for sure that very few things online are permanent."

Here is that post:

Kinda like the rest of search, many things aren't permanent decisions that are never discussed. Things change - SEOs write about what changed - while I have nothing to announce, I can say for sure that very few things online are permanent. It's also good to have input from others, like you all.

'" John Mueller (@johnmu.com) January 30, 2026 at 7:00 AM

Recently, we caught Bing testing an AI Performance report in Bing Webmaster Tools. While it doesn't show click data, it does give us more visibility than we had before.

Like Bing, Google currently lumps its AI search features with web search in the Search Console reporting. Everyone I know wants this data broken out but we also doubt Google or Bing will show us the click-through rate of AI search features compared to normal search results. Bing's AI performance report dodges that, of course. So does lumping the data together.

Will we see Search Console come out with AI reports? Possibly. But I highly doubt it would include click data.

Forum discussion at Bluesky.

Update: John Mueller said this was NOT a hint but just a comment:

Not a hint, just a comment.

— John Mueller (@johnmu.com) February 2, 2026 at 8:38 AM
  •  

February 2026 Google Webmaster Report

Google Webmaster Report

Welcome to the Google Webmaster report, where I sum up all the more important Google organic search topics that occurred over the past month - just in case you (or I) missed it. January was an incredibly volatile month regarding unconfirmed Google search ranking movement. I posted about it several times.

Google officially made the switch to have AI Overviews flow into AI Mode, resulting in less traffic for publishers and site owners. Google is being forced to allow sites to opt out of AI search experiences, but will you block Google? Many say no. Personal Intelligence is now in both AI Mode and Gemini, and Google is personalizing some AI answers, which are now powered by Gemini 3. But don't worry, core search signals are built into those AI experiences.

Google will demote prediction content in top stories and news. Plus, we posted about an affidavit covering Google search signals, and tons of random SEO tips.

Local search is also suffering from AI Overviews, which is a shame but at least review appeals are no longer delayed.

Yep, Google is appealing the search monopoly ruling - no one is surprised. And Apple is going with Google Gemini to power Siri and Apple Intelligence.

Those were some of the larger changes over the past month; make sure to check out the January 2026 Google Webmaster report if you missed it.

Here are the bigger Google SEO stories from the past 30 days:

Google Algorithm Updates/ Volatility:

Google AI:
Google SEO:
Google Local & Business Profiles:
Google User Interface:
Google Business:

Forum discussion at WebmasterWorld.

  •  

Google Search Adds Preferred Sources Help Docs

Robot Paper Park Bench

Several weeks after Google rolled out support for Preferred Sources globally, Google added official help documentation for site owners, to help them understand what it is all about and how to encourage their readers to select their site as a preferred source.

In December, Google rolled out Preferred sources globally after rolling it out in the US and India in August and beta testing it in June.

Now the new help documentation is available here if you need it.

Google wrote:

If you're a website owner, you can help your audience find your publication as a preferred source in Google Search. When a user selects your site as a preferred source, your content is more likely to appear for them during relevant news queries in "Top Stories".

As a reminder, if you love this site, you can add this site as a preferred source on Google by clicking here.

The preferred sources feature is available globally in English for queries that trigger the "Top Stories" feature.

Only domain-level and subdomain-level sites are eligible to appear in the source preferences tool. For example, https://www.example.com/ and https://code.example.com/ are eligible for preferred sources, but the subdirectory https://www.example.com/blog isn't eligible.

Google added, "These methods are examples on how you can build your audience and help people find your site as a preferred source. It's not required to do them in order to appear as a preferred source."

Here is a screenshot of the help document, so I can archive it myself:

Google Preferred Sources Help Documentation

Forum discussion at X.

  •  

Poll: 33% Will Block Google AI Search Experience: AI Mode & AI Overviews

Google Bar Chart

I ran a poll yesterday on X asking, "Would you block Google from using your content for AI Overviews and AI Mode?" About 33% of the over 350 respondents said they would block Google from using and showing their content in the AI search experiences, 42% said they would not block Google, and 25% are not sure yet.

As a reminder, Google, after being forced by the UK regulatory body, said it is exploring ways to allow us to block our sites from being used in search AI experiences, like AI Overviews and AI Mode.

Here is that poll:

Would you block Google from using your content for AI Overviews and AI Mode - Google may be giving us more controls - take my poll below. https://t.co/60M3Vt0YlN

'" Barry Schwartz (@rustybrick) January 28, 2026

Question: Would you block Google from using your content for AI Overviews and AI Mode?

  • 33.2% - Yes, I'd block Google
  • 41.9% - No, I wouldn't block
  • 24.9% - I am not sure yet.

I wish the poll had more responses, but I do think most people simply do not know yet.

Forum discussion at X.

  •  

Google Exploring Ways To Allow Sites To Opt Out Of AI Overviews & AI Mode

Google Ai Cards

Google just announced it is looking into ways to allow websites to specifically opt out of Google using their content in the Search generative AI features, such as AI Overviews and AI Mode. This comes in response to the UK Competition and Markets Authority's (CMA) new requirements for Google Search - the CMA posted more here.

These new requirements include ways for Google to "provide to websites to manage their content in Search AI features."

Google said, "we're now exploring updates to our controls to let sites specifically opt out of Search generative AI features." This will be based on how they allow sites to control how their sites show up for featured snippets and Google Extended, which does not impact search AI features.

Google added, "Our goal is to protect the helpfulness of Search for people who want information quickly, while also giving websites the right tools to manage their content. We look forward to engaging in the CMA's process and will continue discussions with website owners and other stakeholders on this topic."

Google made it clear that the new controls "need to avoid breaking Search in a way that leads to a fragmented or confusing experience for people. As AI increasingly becomes a core part of how people find information, any new controls also need to be simple and scalable for website owners."
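For context, until any new opt-out ships, the existing published controls Google points to are the snippet controls and the Google-Extended robots.txt token, which does not affect Search AI features. Here is a minimal sketch, with a placeholder domain, of checking whether a site already disallows Google-Extended:

```python
import urllib.robotparser

# Placeholder domain. Google-Extended is the existing robots.txt token for
# Gemini/grounding uses; per Google, it does not control AI Overviews or
# AI Mode today, which is what the new opt-out being explored would cover.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()
print("Google-Extended allowed:", rp.can_fetch("Google-Extended", "https://www.example.com/"))
```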

The CMA wrote, "To provide certainty for stakeholders, the CMA published possible measures it might take in a roadmap in June 2025. The CMA is now consulting on the proposed conduct requirements below. The measures have been designed to support innovation and growth, ensuring people benefit from a high-quality digital experience:"

  • Publisher controls: Making sure content publishers get a fairer deal by giving them more choice and transparency over how their content is used in Google's AI Overviews. Publishers will be able to opt out of their content being used to power AI features such as AI Overviews or to train AI models outside of Google search. Google will also be required to take practical steps to ensure publisher content is properly attributed in AI results.
  • Fair ranking: Making sure Google's approach to ranking search results is fair and transparent for businesses, with an effective process for raising and investigating issues. Google will be required to demonstrate to the CMA and its users that it ranks search results fairly, including in its AI Overviews and AI Mode.
  • Choice screens: Making it easier for people to switch the search services they use by making default choice screens on Android mobiles a legal requirement and introducing choice screens on the Chrome browser.
  • Data portability: Making it easier for people and businesses to make use of Google search data.

I am looking forward to content creators, publishers and site owners getting more control over how search engines can use their content. I mean, Google already thought of ways but went with the least helpful option. I guess this is back on the table.

Forum discussion at X.

If you are on X, take my poll:

Would you block Google from using your content for AI Overviews and AI Mode - Google may be giving us more controls - take my poll below. https://t.co/60M3Vt0YlN

— Barry Schwartz (@rustybrick) January 28, 2026
  •  

New GoogleBot: Google Messages

Google Message Robot

Google added a new robot to its list of user-triggered fetchers in the Google crawlers documentation. This specific bot is named Google Messages and it is a fetcher "used to generate link previews for URLs sent in chat messages," Google wrote.

The User-Agent token in HTTP requests is "GoogleMessages."

Google said they added it "to help site owners identify traffic from Google Messages when it generates link previews for URLs sent in chat messages."
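Since the stated purpose is helping site owners identify this traffic, here is a quick sketch (the log file name and format are assumptions) that tallies which of your URLs the "GoogleMessages" user agent is fetching for link previews:

```python
import re
from collections import Counter

# Tally which URLs the Google Messages fetcher previews, based on the
# documented "GoogleMessages" user-agent token. "access.log" is an assumption.
previewed = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if "GoogleMessages" not in line:
            continue
        match = re.search(r'"(?:GET|HEAD) (\S+)', line)
        if match:
            previewed[match.group(1)] += 1

for path, count in previewed.most_common(5):
    print(count, path)
```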

Googlebot Google Messages

Forum discussion at X.

  •  

Google Search Monopoly Appeal Legal Docs Mention Search Signals

Google Legal

As you know, Google has appealed its search monopoly ruling and, with that, filed a number of new documents with the court. One is an affidavit of Elizabeth Reid, Google's Vice President and Head of Search. The other is of Jesse Adkins, Director of Product Management for Search Syndication and Search Ads Syndication.

In the Affidavit of Elizabeth Reid — Document #1471, Attachment #2, Reid talks about why Google thinks it should not go through with some of the court's remedies.

Specifically, Google does not want to go through with the "Required Disclosures of Data" and Section V titled "Required Syndication of Search Results." Why? Reid wrote, "Google will suffer immediate and irreparable harm as a result of the transfer of this proprietary information to Google's competitors, and may additionally suffer irreparable financial and reputational harm should the data provided to competitors be leaked or hacked."

The details Google would have to give competitors include:

  • a unique identifier ('DocID') of each document (i.e., URL) in Google's Web Search Index and information sufficient to identify duplicates;
  • 'a DocID to URL map'; and
  • "for each Doc ID, the (A) time that the URL was first seen, (B) time that the URL was last crawled, (C) spam score, and (D) device-type flag."

Google thinks handing this over will:

(1) give its competitors an unfair advantage, because Google spent many years working on these methods;

(2) give away which URLs Google thinks are more important than others;

(3) allow spammers to reverse engineer some of its algorithms; and

(4) make private information from searchers available to its competitors.

Google wrote:

First, Google's crawling technology processes webpages on the open web, relying on proprietary page quality and freshness signals to focus on webpages most likely to serve users' information needs. Second, Google marks up crawled webpages with proprietary page understanding annotations, including signals to identify spam and duplicate pages. Finally, Google builds the index using the marked-up webpages generated in the annotation phase. Google's index employs a proprietary tiering structure that organizes webpages based on how frequently Google expects the content will need to be accessed and how fresh the content needs to be (the fresher the content needs to be, the more frequently Google must crawl the webpage).

It goes on to read, "The image below from the demonstrative (RDXD-28.005) shows the fraction of pages (in green) that make it into Google's web index, compared with the pages that Google crawls (in red). Under the Final Judgment, Google must disclose to Qualified Competitors the curated subset reflected in green."

Google Indexed Urls Vs All Urls

Yea, that shows how many URLs Google knows about versus how many are indexed by Google. That is a huge difference!

Google added:

If spammers or other bad actors were to gain access to Google's spam scores from Qualified Competitors via data leaks or breaches — a realistic outcome given the tremendous value of the data — Google's search quality would be degraded and its users exposed to increased spam, thereby weakening Google's reputation as a trustworthy search engine.

The disclosure of the spam signal values for Google's indexed webpages via a data leak or breach would degrade Google's search quality and diminish Google's ability to detect spam. As I testified at the remedies hearing, the open web is filled with spam. Google has developed extensive spam-fighting technologies to attempt to keep spam out of the index. Fighting spam depends on obscurity, as external knowledge of spam-fighting mechanisms or signals eliminates the value of those mechanisms and signals.

If spammers or other bad actors gained access to Google's spam scores, they could bypass Google's spam detection technologies and hamstring Google in its efforts to combat spam. For example, spammers commonly buy or hack legitimate websites and replace the content with spam, an attack made easier if spammers can use Google's spam scores to target webpages Google has assessed as low spam risk. In this way, the compelled disclosures are likely to cause more spam and misleading content to surface in response to user queries, compromising user safety and undermining Google's reputation as a trustworthy search engine.

Then it gets into GLUE and RankEmbed:

(i) 'User-side Data used to build, create, or operate the GLUE statistical model(s)' and (ii) 'User-side Data used to train, build, or operate the RankEmbed model(s),' 'at marginal cost.'

The 'User-side Data' encompassed by Section IV.B of the Final Judgment includes highly sensitive user data, including but not limited to the user's query, location, time of search, and how the user interacted with what was displayed to them, for example hovers and clicks.

The data used to build Google's 'Glue' model also includes all web results returned and their order, as well as all search features returned and their order. The Glue model captures this data for the preceding thirteen months of search logs.

You can also review the Affidavit of Jesse Adkins — Document #1471, Attachment #3; that one is on the ads side.

Forum discussion at Marie Haynes' private forums (sorry).

  •  

Google Search Team Does Not Endorse LLMs.txt Files

Google Llms Files

Yep, we're back with more comments from Google on the LLMs.txt file. Another question came up on Bluesky asking whether the fact that some Google properties still have LLMs.txt files up is some sort of endorsement from Google. John Mueller from Google said, simply, "no," it is not an endorsement.

John was asked by Esben Rasmussen, "Sorry for being late to the party, but I just spotted this ai.google.dev/api/llms.txt. @johnmu.com Is this an endorsement from Google?"

John wrote on Bluesky, "I'm tempted to say something snarky since this has come up so often, but to be direct, no."

Here is a screenshot of that conversation:

Bluesky Convo

As a reminder, several weeks ago, the CMS platform Google uses began supporting LLMs.txt files and it was added to a lot of Google's various developer docs. This includes the Google Search dev docs but shortly after it was added, the search team removed it from its specific developer docs. Other teams didn't care or didn't notice and left it up. John said it was added for other reasons, not for what you might think.

Google has been saying that no one uses the LLMs.txt file, that Google won't use it, that it can be useless, and you probably should noindex it if you do use it.
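If you do keep an LLMs.txt file around anyway, the noindex suggestion has to happen via an HTTP header, since a plain-text file can't carry a robots meta tag. Here is a small sketch, with a placeholder URL, that simply checks whether an X-Robots-Tag header is being sent for it:

```python
import urllib.request

# Placeholder URL. A text file like llms.txt can only be noindexed with an
# X-Robots-Tag response header; this just checks whether one is being sent.
req = urllib.request.Request("https://www.example.com/llms.txt", method="HEAD")
with urllib.request.urlopen(req, timeout=10) as resp:
    print("Status:", resp.status)
    print("X-Robots-Tag:", resp.headers.get("X-Robots-Tag", "(not set)"))
```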

So here it is again, Google's Search team is not a fan of the LLMs.txt file. Although I doubt it hurts to have one...

Forum discussion at Bluesky.

  •  

Google Warns On Hosting With Free Subdomain Hosts

Dirty Google Data Center

Google's John Mueller once again warned about hosting your website on a free subdomain hosting service, because such services are magnets for "a lot of spam & low-effort content."

You want to make sure to host your site on a server that is not overrun by spam and low-quality sites.

John said this on Reddit the other day, but it was not the first time he has said this. He has actually said it a number of times, such as don't host on cheap TLDs because they can hurt your efforts, and Google even targets whole TLDs sometimes.

John wrote on Reddit, "A free subdomain hosting service attracts a lot of spam & low-effort content."

His rationale:

It's a lot of work to maintain a high quality bar for a website, which is hard to qualify if nobody's getting paid to do that (and just generally tricky: do they throw people out if they don't agree with the quality?). For you, this means you're basically opening up shop on a site that's filled with - potentially - problematic "flatmates". This makes it harder for search engines & co to understand the overall value of the site - is it just like the others, or does it stand out in a positive way? On a domain name of your own you stand on your own, with its pros and cons. (And with a domain of your own, I'd also watch out for the super-cheap TLDs, which come with similar hurdles.)

Here are the other reasons John gave for why this person's site won't rank well, outside of the free subdomain hosting.

* You're publishing content on a topic that's already been extremely well covered. There are sooo many sites out there which offer similar things. Why should search engines show yours? There's sooo much competition out there, with people who have worked on their sites for more than a decade, many of them with professional web-oriented backgrounds. Yes, sometimes a "new take" on an "old topic" makes sense, but then I'd expect that users would recognize that and link to your site, even sending direct traffic.

* These things take time & work / promotion. Especially if it's a new site (even if you had it on a good domain name of your own), getting indexed is one thing, but appearing in search results often just takes time, especially when there are alternative sites already.

Another thing to keep in mind is that search engines are just one part of the web. If you love making pages with content like this, and if you're sure that it hits what other people are looking for, then I'd let others know about your site, and build up a community around it directly. Being visible in popular search results is not the first step to becoming a useful & popular web presence, and of course not all sites need to be popular.

This doesn't mean you need to pay a fortune for your host. Just don't go free.

Forum discussion at Reddit.

  •  

Google: Comments Link Spam Has No Effect On SEO/Search

Google Link Shield

Google's John Mueller said that link spam left in a comments section has no effect on Google Search or SEO (and maybe not even on your own website's performance in Google Search?).

He wrote on Bluesky, "These links all have no effect - they're from spammers dropping links into comments. These would not have any effect, positive nor negative, on your site."

It seems like John is specifically saying that even if I have a lot of spammy links in the comments section on my site, those links would not have a positive or negative impact on the performance of this site in Google Search.

Here was the post he was replying to:

I would like to bring an issue to your attention that is caused via below mentioned domain. I have emailed the owner also but no response yet! What I must do now? I have already disavowed in GSC too but our web traffic is constantly dropping.

URL - mattsoncreative.com/blog/2013/09...

This was the message I sent to the owner last week [We would like to remove this unwanted anchor tag (porn) directing to our blog either you remove it completely and block the user & its IP for further creating more such links/ or change it to sperm cramps.

For more details Kindly have a look at the below image.]


Here is John's reply:

These links all have no effect - they're from spammers dropping links into comments. These would not have any effect, positive nor negative, on your site.

— John Mueller (@johnmu.com) January 18, 2026 at 3:25 AM

Google has long downplayed link spam as a waste of time to deal with. Google also said link spam doesn't work in forums and that you can generally ignore link spam.
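Even though John says these links have no effect either way, most comment systems mark user-submitted links with rel="ugc" or rel="nofollow" as basic hygiene. Here is a small sketch, run over a made-up HTML snippet, that flags comment links missing those attributes:

```python
from html.parser import HTMLParser

# Flag outbound links that carry neither rel="nofollow" nor rel="ugc".
# The sample HTML below is made up for illustration.
class LinkAudit(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        rel = (attr_map.get("rel") or "").lower()
        if "nofollow" not in rel and "ugc" not in rel:
            print("unmarked link:", attr_map.get("href"))

sample = (
    '<a href="https://spam.example/casino" rel="ugc nofollow">ok, marked</a>'
    '<a href="https://spam.example/pills">not marked</a>'
)
LinkAudit().feed(sample)
```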

Forum discussion at Bluesky.

  •  

Google To Prioritize Removing Prediction News Content From Search & News

Google Sports News Tv

Rajan Patel, the VP of Engineering for Search, said on X that Google is making "changes to ranking" to remove prediction content from showing up in the Google Search top stories and news sections.

This comes after some sites have been posting "prediction" content, predicting that certain sports trades may happen even though they have not happened yet, and those "stories" show up in the news section as if they had actually occurred.

Matt Mikle has shared a number of examples on X since the beginning of the year where, when you search for certain sports teams or players, fake news comes up. The truth is, these posts are labeled in the site's "predictions" category, but nowhere in the title or image of the post do you see that it is a prediction. You need to click over to the content and scroll to the bottom of the page to learn that this is not news, but rather a prediction that has not necessarily happened and may never happen.

Here are some examples but Matt posted many more:

Google Search Ranking Prediction News Sites

Rajan Patel from Google wrote on X:

This is definitely an opportunity for us to improve and we're working on it. We make changes to ranking thoughtfully and after considerable experimentation and analysis, so it won't be a quick fix type of thing but it is something we're prioritizing.

Sorry for the slow reply on this. This is definitely an opportunity for us to improve and we're working on it. We make changes to ranking thoughtfully and after considerable experimentation and analysis, so it won't be a quick fix type of thing but it is something we're…

— Rajan Patel (@rajanpatel) January 16, 2026

So we won't see changes tomorrow but Google is "prioritizing" its efforts to resolve this.

Although, maybe this should be a manual action because of the misleading content policy? That policy reads, "We don't allow preview content that misleads users to engage with it by promising details which aren't reflected in the underlying content."

Prediction articles in itself aren't bad as long as it says it in the title. But these titles make it seem like the move or trade already happened which is pure click bait pic.twitter.com/CtnagmLVfM

— Matt Mikle (@Moneyman2626) January 16, 2026

Forum discussion at X.

  •