quote:
Originally posted by xeen:
You know when you search for something on Yahoo or Google, next to each search result there is a link that says "cached" which takes you to a cached version of the result. Well, I was just wondering: how long are those things cached for? Is it forever, or do they expire and get deleted at some point? Also, who caches them?
Those results are cached by a crawler bot, probably not unlike WebReaper or similar products. If I recall correctly, the copies are held in a "common cache" on the search servers, and a page's cached copy is replaced whenever the bot comes across a new version of the site. Since the bots can only process so many sites at a time (even working in parallel, there are limits), it may be some time before the bot comes back around, sees the new version, and caches it. In other words, a cached copy sticks around until the next crawl replaces it. If the site has a robots exclusion file (robots.txt) that disallows the bot, the page will never be crawled, so it will never be cached. I still think a better way to do this is by combined hit count/relevance, but whatever.
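
For anyone who wants to see the idea in code, here is a rough Python sketch of the behavior described above. It is not how Yahoo or Google actually implement it. The class name SnapshotCache, the bot name "examplebot", the example.com URLs, and the robots.txt contents are all made up for illustration, and comparing content hashes is just one assumed way of detecting "a new version of the site". The robots check uses the standard library's urllib.robotparser.

```python
import hashlib
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an example site (assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


class SnapshotCache:
    """Toy stand-in for the "common cache": one stored copy per URL."""

    def __init__(self, robots_txt, user_agent="examplebot"):
        self.user_agent = user_agent  # hypothetical bot name
        self.robots = RobotFileParser()
        # Parse the rules from text so the sketch needs no network access.
        self.robots.parse(robots_txt.splitlines())
        self.snapshots = {}  # url -> (content hash, content)

    def visit(self, url, content):
        """Simulate the bot coming back around to a page.

        Returns "skipped", "cached", "refreshed", or "unchanged".
        """
        # Pages excluded by robots.txt are never stored.
        if not self.robots.can_fetch(self.user_agent, url):
            return "skipped"
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        old = self.snapshots.get(url)
        if old is not None and old[0] == digest:
            return "unchanged"  # same version, keep the existing copy
        # First visit, or a new version: store or overwrite the copy.
        self.snapshots[url] = (digest, content)
        return "cached" if old is None else "refreshed"


cache = SnapshotCache(ROBOTS_TXT)
print(cache.visit("http://example.com/index.html", "version 1"))
print(cache.visit("http://example.com/index.html", "version 2"))
print(cache.visit("http://example.com/private/notes.html", "secret"))
```

The old copy is only replaced when the bot actually revisits the page and finds different content, which is why a stale "cached" link can hang around for a while.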