Author Topic: Cache on search engines  (Read 517 times)

Xeen

  • VIP
  • Member
  • ***
  • Posts: 1,065
  • Kudos: 55
Cache on search engines
« on: 16 August 2004, 18:51 »
You know when you search for something on Yahoo or Google, next to each search result there is a link that says "cached" which takes you to a cached version of the result? Well, I was just wondering - how long are those things cached for? Is it forever, or do they expire and get deleted at some point? Also, who caches them?

Orethrius

  • Member
  • **
  • Posts: 1,783
  • Kudos: 982
Cache on search engines
« Reply #1 on: 16 August 2004, 19:02 »
quote:
Originally posted by xeen:
You know when you search for something on Yahoo or Google, next to each search result there is a link that says "cached" which takes you to a cached version of the result? Well, I was just wondering - how long are those things cached for? Is it forever, or do they expire and get deleted at some point? Also, who caches them?


Those results are cached by the search engine's own crawler bot, probably not unlike WebReaper or similar products. If I recall correctly, they're held in a "common cache" on the search servers, and a page's cached copy is replaced whenever the bot comes across a new version of the site. Since the bots can only process so many sites at a time (even working in parallel, there are limits), it may be some time before the bot comes back around, sees the new version, and caches it. If a robots exclusion file on the site disallows the bot (don't remember the filename offhand), the bot won't fetch those pages, so they will never be cached. I still think a better way to do this is by combined hit count/relevance, but whatever.
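
For what it's worth, here is a rough Python sketch of that "replace the cached copy when the bot sees a new version" idea. It is only an illustration, not how Yahoo or Google actually do it: the CommonCache class, its method names, and the example URL are all made up for this post, and comparing content hashes is just one assumed way of noticing that a page has changed.

```python
import hashlib


class CommonCache:
    """Hypothetical stand-in for a search engine's page cache."""

    def __init__(self):
        # Maps url -> (content digest, page body).
        self._pages = {}

    def store(self, url, body):
        """Cache body for url; return True if the cached copy changed."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        old = self._pages.get(url)
        if old is not None and old[0] == digest:
            # Same content as on the last crawl: keep the existing copy.
            return False
        # New page or new version: overwrite whatever was cached before.
        self._pages[url] = (digest, body)
        return True

    def cached(self, url):
        """Return the cached body for url, or None if never crawled."""
        entry = self._pages.get(url)
        return entry[1] if entry else None


cache = CommonCache()
url = "http://example.com/"  # placeholder address

first = cache.store(url, "<html>version 1</html>")    # first crawl
repeat = cache.store(url, "<html>version 1</html>")   # unchanged page
updated = cache.store(url, "<html>version 2</html>")  # site was edited
```

Until the bot comes back around and calls store() again, cached() keeps handing out the old copy, which is why the "cached" link can lag behind the live site.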

Proudly posted from a Gentoo Linux system.

Quote from: Calum
even if you're renting you've got more rights than if you're using windows.


KernelPanic

  • VIP
  • Member
  • ***
  • Posts: 1,878
  • Kudos: 222
Cache on search engines
« Reply #2 on: 16 August 2004, 22:05 »
quote:
Originally posted by Midnight Candidate/BOB:
If the robots exclusion is present on the site (don't remember the filename offhand), it will never be cached.


robots.txt
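
If you want to see how a well-behaved bot reads that file, here is a small sketch using Python's standard urllib.robotparser module. The rules in ROBOTS_TXT, the "ExampleBot" user agent, the may_crawl helper, and the example.com URLs are placeholders invented for illustration, not taken from any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt contents: every bot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed = may_crawl(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html")
blocked = may_crawl(ROBOTS_TXT, "ExampleBot", "http://example.com/private/notes.html")

print(allowed)  # True: nothing disallows this path
print(blocked)  # False: the path falls under Disallow: /private/
```

A bot that honours the file never fetches the disallowed pages, so there is nothing for it to cache. Keep in mind the file is only a request; it does not stop a bot that chooses to ignore it.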
Contains scenes of mild peril.