Search engine spiders (e.g. Googlebot, Yahoo! Slurp) are a good thing: if they don't spider your website, you will probably not appear in search results. At the same time, they create a lot of unnecessary management work on the developer side. Here are a few of the things I have noticed:

1) They create a lot of unnecessary active sessions on the server if you have session management turned on (which most sites do).
2) The number of bots keeps growing, and it's really hard to tell which ones to block and which ones to allow.
3) You do want the major search engines to spider you, but you don't necessarily want every search engine to do so, and it becomes hard for a developer to maintain this list.

Apart from the above, another major problem is when someone intentionally tries to download or crawl your website using a custom-written script or a site downloader. In this case the Browser/Referrer is usually spoofed to look like something it is not. These crawlers most likely do not respect robots.txt at all.

My question is: apart from checking the name of the bot and restricting its session to a few seconds, what other checks do people put in place to deal with these problems?
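
For illustration, the name check and short bot session mentioned above could be sketched roughly like this in Python. The token lists, timeout values, and function names are all assumptions made up for the example, not taken from any real setup. Note that it only looks at the User-Agent string, so a crawler that spoofs a browser User-Agent will still be classified as a browser.

```python
# Hypothetical allowlist: substrings of the User-Agent header for crawlers
# that should be allowed to index the site.
ALLOWED_BOT_TOKENS = ("googlebot", "slurp")

# Hypothetical generic markers that many self-identifying crawlers include.
GENERIC_BOT_MARKERS = ("bot", "crawler", "spider")

# Assumed timeouts: a few seconds for bots, 30 minutes for normal visitors.
BOT_SESSION_SECONDS = 5
BROWSER_SESSION_SECONDS = 30 * 60


def classify_user_agent(user_agent):
    """Return 'allowed_bot', 'other_bot', or 'browser' for a User-Agent string."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in ALLOWED_BOT_TOKENS):
        return "allowed_bot"
    # A missing User-Agent is treated as a bot, since browsers normally send one.
    if not ua or any(marker in ua for marker in GENERIC_BOT_MARKERS):
        return "other_bot"
    return "browser"


def session_timeout_seconds(user_agent):
    """Pick a session lifetime: short for any kind of bot, normal otherwise."""
    if classify_user_agent(user_agent) == "browser":
        return BROWSER_SESSION_SECONDS
    return BOT_SESSION_SECONDS
```

A request handler would call `session_timeout_seconds(...)` with the incoming User-Agent header when creating the session, and could separately reject anything classified as `other_bot`.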