Search engine cache

The link for the cached version of a web page in search results from Google (top), Bing (middle) and Yandex (bottom)

A search engine cache is a cache of web pages that shows the page as it was when it was indexed by a web crawler. Cached versions of web pages can be used to view the contents of a page when the live version cannot be reached, has been altered or taken down.^[1]

A web crawler collects the contents of a web page, which is then indexed by a web search engine. The search engine might make the copy accessible to users. Site webmasters can use robots.txt^[2] or meta tags^[3] to instruct crawlers not to offer a cached copy; search engines whose crawlers obey these restrictions will then not make one available to their users.
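As an illustration of the opt-out described above, the following minimal sketch checks whether a page asks search engines not to show a cached copy. It relies on the `noarchive` directive quoted in the references; the class name `RobotsMetaParser` and the function `allows_cached_copy` are hypothetical names chosen for this example, and the handling of the `X-Robots-Tag` header is simplified (it ignores crawler-specific prefixes). It is not how any particular search engine implements the check.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # Directives are comma-separated and case-insensitive.
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def allows_cached_copy(html, x_robots_tag=None):
    """Return False if the page or header carries a noarchive directive.

    x_robots_tag is the raw value of an X-Robots-Tag response header,
    if one was sent (simplified: no per-crawler prefixes are parsed).
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = set(parser.directives)
    if x_robots_tag:
        for token in x_robots_tag.split(","):
            token = token.strip().lower()
            if token:
                directives.add(token)
    return "noarchive" not in directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noarchive"></head></html>'
    print(allows_cached_copy(page))
```

A crawler that honours the directive would skip offering a cached link for any page where this function returns `False`.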

Search engine cache can be used for crime investigation,^[4] legal proceedings^[5] and journalism.^[6]^[1] Examples of search engines that offer their users cached versions of web pages are Bing, Yandex Search, and Baidu.

Search engine cache may not be fully protected by the usual laws that protect technology providers from copyright infringement claims.^[7]

Google Cache

Google retired its web caching service in 2024.^[8] The service was designed for websites that might show up in a Google search result but are temporarily offline. As a "cache", it was not designed for archival purposes, and cached copies expired. Google said that the Internet as of 2024 was much more reliable than it had been "way back" in earlier days, and that its cache service was therefore no longer important to maintain.^[8]

Google pointed to the Wayback Machine as a better alternative, and suggested that it might work with the Internet Archive in the future.^[8] In September 2024, Google and the Internet Archive announced a collaboration providing links to the Wayback Machine from within Google Search.^[9]

References

[1] Wilfried Ruetten (2012). The Data Journalism Handbook. O'Reilly Media, Inc. ISBN 9781449330064. When a page becomes controversial, the publishers may take it down or alter it without acknowledgment. If you suspect you're running into the problem, the first place to turn is Google's cache of the page as it was when it did its last crawl.
[2] "Robots meta tag, data-nosnippet, and X-Robots-Tag specifications". noarchive: Do not show a cached link in search results.
[3] "Special tags that Google understands - Search Console Help". noarchive - Don't show a Cached link for a page in search results.
[4] Todd G. Shipley, Art Bowker (2013). Investigating Internet Crimes: An Introduction to Solving Crimes in Cyberspace. Newnes. ISBN 9780124079298. For the investigator this can be a valuable piece of information. Depending on when Google crawled the site, the last page may contain information different from the current page. Documenting and capturing Google's cached page of a webpage can therefore be important step to ensure this time snapshot is preserved.
[5] Steven Mark Levy (2011). Regulation of Securities: SEC Answer Book. Aspen Publishers Online. ISBN 9781454805434. The World Wide Web is not as ephemeral as one might think. An increasing number of older web pages are available online through such services as the Wayback Machine, Yahoo Cache, or Bing Cache. Some plaintiffs' lawyers and corporate gadflies use these services as a matter of routine.
[6] Cleland Thom (2014-10-23). "Google's caches and .com search engine provide 'right to be forgotten' solutions". Press Gazette. Journalists can also access delisted content via the Google cache.
[7] Herman De Bauw, Valerie Vandenweghe (June 2011). "Brussels Court of Appeal upholds judgment against Google News and Google Cache". Archived from the original on 2015-04-26. For the cache function, the Court rejected the exception of a "technically necessary copy". This exception exempts temporary reproduction which is a necessary part of a technical process applied by an intermediary for transmission in a network between third parties. According to the Court, the cache copy that Google stores on its server is not technically necessary for efficient transmission.
[8] "Google Search's cache links are officially being retired". 2 February 2024.
[9] Freeland, Chris (September 11, 2024). "New Feature Alert: Access Archived Webpages Directly Through Google Search". The Internet Archive. Retrieved 2024-09-11.
