Bad Behaviour and PubSub
-
I use the excellent BB plugin from Michael Hampton, as no doubt a significant number of you do as well. The plugin has served me very well over the last year or so and I obviously recommend it very highly.
Over time, my BB log table has shown that the two most common problems (from BB’s point of view) are:
1. Required header ‘Accept’ missing.
2. Header ‘Pragma’ without ‘Cache-Control’ prohibited for HTTP/1.1 requests.
Maybe this is not the case for some of you, but anyway, if I could just get to the point.
It’s my understanding that (1) is typically caused by misconfigured personal firewalls, proxies, download accelerators and privacy software. For instance, I have found that the customers (or staff) of Time Warner Telecom are by far the biggest culprits.
The issue that seems somewhat contentious is (2), and I’ll cite PubSub as an example. Their crawler sends an HTTP/1.1 request to our site, only to be blocked by BB with a “Header ‘Pragma’ without ‘Cache-Control’ prohibited for HTTP/1.1 requests” error.
I have reported the issue to them, but it seems that they have a much more “liberal” interpretation of RFC 2616 than Michael Hampton does. They’re basically saying that the RFC states that an HTTP/1.1 request only should, not must, send the “Cache-Control” field if it sends the (deprecated) “Pragma” field. The upshot seems to be that even though the RFC says that “Cache-Control” should be present when “Pragma” is present, they don’t consider their crawler “faulty”. Why? Because it doesn’t say “must”, that’s why.
Michael Hampton sees it from a different point of view: if they’re just going to provide the “Pragma” field without “Cache-Control”, then they should be using HTTP/1.0, not HTTP/1.1 as they currently do.
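To make the disputed rule concrete, here is a rough Python sketch of the check as I understand it from the error message. This is not BB's actual code; the function name and the way headers are passed in are my own assumptions, purely for illustration.

```python
def pragma_without_cache_control(protocol, headers):
    """Return True when a request would trip the rule described above.

    protocol: request protocol string, e.g. "HTTP/1.1"
    headers:  dict mapping request header names to values
    (Hypothetical helper, not part of the plugin.)
    """
    # Header names are case-insensitive, so normalise them first.
    names = {name.lower() for name in headers}
    return (
        protocol.upper() == "HTTP/1.1"
        and "pragma" in names
        and "cache-control" not in names
    )


# HTTP/1.1 with Pragma only: this is the case that gets blocked.
print(pragma_without_cache_control("HTTP/1.1", {"Pragma": "no-cache"}))
# The same header over HTTP/1.0 is not covered by the rule.
print(pragma_without_cache_control("HTTP/1.0", {"Pragma": "no-cache"}))
# Sending both headers over HTTP/1.1 is fine as well.
print(pragma_without_cache_control(
    "HTTP/1.1", {"Pragma": "no-cache", "Cache-Control": "no-cache"}))
```

Seen this way, a crawler has two ways out: add “Cache-Control” alongside “Pragma”, or send the request as HTTP/1.0.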
Of course, BB provides the mechanism to whitelist user agents and IP addresses, so that’s one solution. The other, I guess, is to do nothing and just forget about the potential traffic PubSub may send my way. I guess the question is: Why do I really need traffic from PubSub anyway?
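For anyone wondering what the whitelist option amounts to, here is a minimal Python sketch of the idea: a request from a listed IP range or user agent skips the other checks. It does not reflect how BB stores or matches its whitelist; the IP range (a documentation-only block), the user agent string and the exact-match behaviour are all assumptions of mine.

```python
import ipaddress

# Hypothetical entries: a documentation-only IP range and a made-up user agent.
WHITELISTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
WHITELISTED_USER_AGENTS = {"ExampleCrawler/1.0"}


def is_whitelisted(remote_addr, user_agent):
    """Return True if the request should skip all further checks."""
    addr = ipaddress.ip_address(remote_addr)
    # An address inside any listed network is let through.
    if any(addr in net for net in WHITELISTED_NETWORKS):
        return True
    # Otherwise fall back to an exact match on the user agent string.
    return user_agent in WHITELISTED_USER_AGENTS


print(is_whitelisted("192.0.2.10", "Mozilla/5.0"))
print(is_whitelisted("198.51.100.7", "ExampleCrawler/1.0"))
print(is_whitelisted("198.51.100.7", "Mozilla/5.0"))
```

The trade-off is that a user agent string is trivial to forge, so whitelisting by IP range is the safer of the two when the crawler's addresses are known.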
A very long post, I know.
My question to all of you BB users out there is how do you handle the issue of (2) above? Who have you found to be the culprits? Have you tried contacting their webmasters and what responses have you got?