It's not only the chief

by Volker Weber

Do they pay the webmaster by LOC?


The competitor is more concise :-)

Klaus Pohlmann, 2006-02-20

Looks like that approach would be more efficient for the White House in Washington... but maybe some of the directories would actually be interesting to crawl, and he gives away their names on purpose so that some smart journalist can get to the information more easily?

Ragnar Schierholz, 2006-02-20

Did you notice that it's recursive? 5th line of "Disallow"...

Disallow: /robots.txt

Wouldn't that mean that a well-behaved robot would bail out at that point and not follow the remaining directives? ;-)

Chris Linfoot, 2006-02-20

Interesting observation... is that actually covered by the standard? I mean, those guys who designed the robots.txt mechanism, would they have thought about what happens if robots.txt denies access to itself? Somehow, I have my doubts...

Ragnar Schierholz, 2006-02-20


"robots.txt" is interpreted by many search engines. How this recursive instruction is interpreted depends on the implementation of the crawler. Formally, the behaviour is undefined, so anything could happen :-)

In practice, the instruction will probably have no effect. The resource "robots.txt" has already been loaded by the time the parser reaches the instruction "Disallow: /robots.txt", and permission to load the resource cannot be denied retroactively.
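As a rough illustration of that point, here is a minimal sketch using the robots.txt parser from Python's standard library. It shows only how this one parser behaves, not what any particular search engine does. The rule set, the host example.com, and the user agent name "ExampleBot" are made up for the example; only the "Disallow: /robots.txt" line comes from the file discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set; only the self-referencing line is taken from the
# discussion, the other paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /robots.txt
Disallow: /archive/
"""


def build_parser(text):
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The self-referencing rule is stored like any other path prefix.
blocked_self = not parser.can_fetch("ExampleBot", "http://example.com/robots.txt")
# A rule that appears after it is still honoured, so the parser did not bail out.
blocked_later = not parser.can_fetch("ExampleBot", "http://example.com/archive/2006.html")
# A path that no rule mentions stays allowed.
allowed_other = parser.can_fetch("ExampleBot", "http://example.com/index.html")

print(blocked_self, blocked_later, allowed_other)
```

With this parser the line is just one more path prefix: the file is parsed to the end, and the rule could only matter if a crawler asked the parser for permission before fetching robots.txt again.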

Timo Stamm, 2006-02-20

Maybe it was just a misunderstanding on the webmaster's part ... maybe he had "sitemap.xml" and "robots.txt" mixed up?

Steffen Gutermann, 2006-02-21
