How’s Forbidden Land Twisted PageGetter?
May 6th, 2007 at 12:50 AM by Matt Freedman
Oh, that’s right, you won’t be able to read this post, since you’re banned! Anywho…
I recently noticed that a bot identifying itself as “Twisted PageGetter” has been crawling this blog (more specifically, my feed) quite frequently. Last month it made about 10,932 hits, and this month (as of May 6, 2007 at about 2 AM) it has made about 1,883. That’s some serious crawling. I’ve yet to find any information on what exactly this bot does or what it crawls for, so I decided to block it and send it a nice 403 Forbidden error. Not that it uses much bandwidth (it didn’t even use 1 MB last month), and I have lots of bandwidth anyway. No, I banned it because it’s throwing off my stats. Plus, it’s not useful and doesn’t seem to be doing anything good.
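If you’re curious how per-bot hit counts like these can be tallied, here’s a minimal Python sketch that reads access-log lines and counts hits and bytes sent per User-Agent. It assumes your server writes Apache’s “combined” log format; the function name summarize_agents and the sample log lines are made up for illustration, not taken from this blog’s real logs.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption about how your server logs):
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_agents(lines):
    """Return (hits, bytes_sent) Counters keyed by User-Agent string."""
    hits = Counter()
    bytes_sent = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        agent = match.group("agent")
        hits[agent] += 1
        size = match.group("size")
        # %b logs "-" when no response body was sent
        if size != "-":
            bytes_sent[agent] += int(size)
    return hits, bytes_sent


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only
    sample = [
        '207.0.19.182 - - [06/May/2007:02:00:00 -0400] "GET /feed/ HTTP/1.1" 200 5120 "-" "Twisted PageGetter"',
        '192.0.2.10 - - [06/May/2007:02:01:00 -0400] "GET / HTTP/1.1" 200 20480 "-" "Mozilla/5.0"',
        '207.0.19.182 - - [06/May/2007:02:05:00 -0400] "GET /feed/ HTTP/1.1" 403 202 "-" "Twisted PageGetter"',
    ]
    hits, sizes = summarize_agents(sample)
    print(hits["Twisted PageGetter"])   # 2
    print(sizes["Twisted PageGetter"])  # 5322
```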
If you’re wondering, its User-Agent is “Twisted PageGetter” and its IP address is 207.0.19.182. If you want to block it, you can put this code in your .htaccess file:
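RewriteEngine On
# Match requests whose User-Agent contains "Twisted PageGetter" (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} "Twisted PageGetter" [NC]
# Leave the URL unchanged ("-") and answer with 403 Forbidden
RewriteRule ^.*$ - [F,L]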
That’ll give it a 403 Forbidden error whenever it tries to access your site. Hopefully it’ll “get the message” and stop crawling your site.
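To make it clearer what the RewriteCond and RewriteRule directives above actually match, here’s a rough Python analogue written as WSGI middleware. This is a swapped-in technique for illustration, not what Apache runs. The RewriteCond pattern is an unanchored regular expression, so it matches “Twisted PageGetter” anywhere in the User-Agent, and [NC] makes the match case-insensitive; [F] answers with 403 Forbidden. The names is_blocked, block_twisted_pagegetter and hello_app are hypothetical.

```python
import re

# Same pattern as the RewriteCond; [NC] corresponds to re.IGNORECASE.
# The pattern is unanchored, so re.search finds it anywhere in the string.
BLOCKED_AGENT = re.compile(r"Twisted PageGetter", re.IGNORECASE)


def is_blocked(user_agent):
    """Mirror of: RewriteCond %{HTTP_USER_AGENT} "Twisted PageGetter" [NC]"""
    return BLOCKED_AGENT.search(user_agent or "") is not None


def block_twisted_pagegetter(app):
    """Hypothetical WSGI middleware: answer matching requests with 403, like [F]."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else gets through to the wrapped application
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Tiny stand-in application used for the demo."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


if __name__ == "__main__":
    app = block_twisted_pagegetter(hello_app)
    statuses = []

    def start_response(status, headers):
        statuses.append(status)

    app({"HTTP_USER_AGENT": "twisted pagegetter"}, start_response)
    app({"HTTP_USER_AGENT": "Mozilla/5.0"}, start_response)
    print(statuses)  # ['403 Forbidden', '200 OK']
```

On an Apache host the .htaccess rule is still the simpler route; the Python version is only there to show the matching logic in one place.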
With any luck, “Twisted PageGetter” will be gone from my blog now… 😛
Update [May 6, 2007]: Twisted PageGetter tried to crawl my feed again today. You know what it got? A big fat 403 Forbidden error! Ha!
7 Responses to “How’s Forbidden Land Twisted PageGetter?”
May 16th, 2007 at 1:36 AM
The spider used by Bloglines also shows up as “Twisted PageGetter”, so you may want to think twice about blocking it. (see http://ascher.ca/blog/2004/11/09/bloglines-twisted/)
May 17th, 2007 at 5:58 PM
Hello,
I’m an engineer for Bloglines, and I’d just like to let you know that “Twisted PageGetter” is not Bloglines and, AFAIK, never was. The IP address mentioned by Steve above has never hosted Bloglines services, so I’m not sure where Steve got this impression.
Bloglines crawlers always have “Bloglines” in their user-agent, even new services or crawlers in development/testing.
Thanks,
-Paul
May 19th, 2007 at 8:52 PM
Thanks for taking the time to come and comment on this matter, Paul.
I actually contacted Bloglines, and no, they don’t use “Twisted PageGetter” as their spider’s User-Agent.
September 20th, 2007 at 3:07 AM
I had to block it too. Really annoying. It seems more people have trouble with 207.0.19.182.
January 27th, 2008 at 1:10 PM
Twisted PageGetter – block or not…
I found a bot identifying itself as Twisted PageGetter in my site’s access log.
After googling it, I found Matt’s blog, where he suggests blocking this bot with an .htaccess entry.
I tried it, and it works quite well.
But: I am using …
January 29th, 2008 at 12:35 PM
[…] PageGetter is… SpotPlex? A while back, I made a post on a bot called Twisted PageGetter. Twisted PageGetter is a bot that I found while searching through this blog’s access logs, […]
May 10th, 2008 at 11:01 PM
[…] a Digg-like site based on page views instead of voting, has shut down their site and their hyper […]