Matt's Blog

How’s Forbidden Land Twisted PageGetter?

May 6th, 2007 at 12:50 AM by Matt Freedman

Oh, that’s right, you won’t be able to read this post, since you’re banned! Anywho…

I recently noticed that a bot identifying itself as “Twisted PageGetter” has been crawling this blog (more specifically, my feed) quite frequently. Last month it made about 10,932 hits, and this month (as of May 6, 2007 at about 2 AM) about 1,883 more. That’s some serious crawling. I’ve yet to find any information on what exactly this bot does or what it crawls for, so I decided to block it and send it a nice 403 Forbidden error. It’s not that it uses much bandwidth (it didn’t even use 1 MB last month), and I have lots of bandwidth anyway. No, I banned it because it’s throwing off my stats. Plus, it’s not useful and doesn’t seem to be doing anything good.

If you’re wondering, its User-Agent is “Twisted PageGetter” and its IP Address is “207.0.19.182”. If you want to block it, you can put this code in your .htaccess file:

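# Return 403 Forbidden to any client whose User-Agent contains "Twisted PageGetter" (case-insensitive)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "Twisted PageGetter" [NC]
RewriteRule ^.*$ - [F,L]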

That’ll give it a 403 Forbidden Error when it tries to access your site. Hopefully it’ll “get the message”, and stop crawling your site.
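
Since the bot also seems to come from a single IP address, a variation on the rule above could match either the User-Agent or that address. This is only a sketch: it assumes mod_rewrite is enabled on your server and that the bot keeps using 207.0.19.182, which I can’t promise. The `[OR]` flag joins the two conditions, so a request is refused if either one matches.

```apache
RewriteEngine On
# Condition 1: the User-Agent contains "Twisted PageGetter" (case-insensitive) ...
RewriteCond %{HTTP_USER_AGENT} "Twisted PageGetter" [NC,OR]
# Condition 2: ... or the request comes from the IP address seen in the logs
# (assumption: the bot does not change addresses)
RewriteCond %{REMOTE_ADDR} ^207\.0\.19\.182$
# Refuse every matching request with 403 Forbidden
RewriteRule ^ - [F]
```

Matching on the IP address as well should still catch the bot if it changes its User-Agent string, but it would also block anything else that happens to share that address, so use it with care.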

Hopefully “Twisted PageGetter” will be gone from my Blog now… 😛

Update [May 6, 2007]: Twisted PageGetter tried to crawl my Feed again today. You know what it got? A big fat 403 Forbidden Error! Ha!

[Image: Twisted PageGetter Blocked]

7 Responses to “How’s Forbidden Land Twisted PageGetter?”

  1. Steve says:

    The spider used by Bloglines also shows up as “Twisted PageGetter”, so you may want to think twice about blocking it. (see http://ascher.ca/blog/2004/11/09/bloglines-twisted/)

  2. Paul Querna says:

    Hello,

    I’m an engineer for Bloglines, and would just like to let you know that the “Twisted PageGetter” is not Bloglines and, AFAIK, never was Bloglines. The IP address mentioned by Steve above has never hosted Bloglines services, so I’m not sure where Steve got this impression.

    Bloglines crawlers always have “Bloglines” in their user-agent — even new services or crawlers in development/testing.

    Thanks,

    -Paul

  3. Matt says:

    Thanks for taking the time to come and comment on this matter, Paul.

    I actually contacted Bloglines, and no, they don’t use “Twisted PageGetter” as their spider’s User-Agent.

  4. Champi says:

    I had to block it too.. real annoying.. seems more people have trouble with 207.0.19.182

  5. ezXplain says:

    Twisted PageGetter – block or not…

    I found a Bot identifying itself as Twisted PageGetter in the access log of my site.
    After I googled for it, I found on Matt’s Blog that he suggests blocking this Bot with an htaccess entry.
    This is working quite well, I tried it.
    But: I am using …

  6. Matt’s Blog » Blog Archive » Twisted PageGetter is… SpotPlex? says:

    […] PageGetter is… SpotPlex? Awhile back, I made a post on a bot called Twisted PageGetter. Twisted PageGetter is a bot that I found while searching through this blog’s access logs, […]

  7. SpotPlex Closed Down « Matt's Blog says:

    […] a Digg-like site based on page views instead of voting, has shut down their site and their hyper […]
