It's not only the chief
by Volker Weber
Do they pay the webmaster by LOC?
Comments
The competitor is more concise :-)
Looks like that approach would be more efficient for the White House in Washington... but actually, maybe some of the directories would be interesting to crawl, and he gives away their names on purpose so that some smart journalist can get to the information more easily?
Did you notice that it's recursive? 5th line of "Disallow"...
Disallow: /robots.txt
Wouldn't that mean that a well behaved robot would bail out at that point and not follow the remaining directives? ;-)
Interesting observation... is that actually covered by the standard? I mean, those guys who designed the robots.txt mechanism, would they have thought about what happens if robots.txt denies access to itself? Somehow, I have my doubts...
Ragnar,
"robots.txt" is interpreted by many search engines. How this recursive instruction is interpreted depends on the implementation of the crawler. Formally, the behaviour is undefined, so anything could happen :-)
In practice, the instruction will probably have no effect. The resource "robots.txt" has already been loaded by the time the parser reaches the instruction "Disallow: /robots.txt", and permission to load the resource cannot be denied retroactively.
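For anyone who wants to try this, here is a minimal sketch using Python's standard urllib.robotparser as one example of a parser implementation. The file content, the host example.com, and the agent name "ExampleBot" are made up for illustration; only the line "Disallow: /robots.txt" comes from the discussion above. It only checks that this particular parser keeps reading the directives that follow the self-referencing line; other crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration).
# Only the "Disallow: /robots.txt" line is taken from the discussion.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /robots.txt
Disallow: /private/
"""

# The file is already in memory at this point, so the self-referencing
# rule cannot stop the parser from reading the rest of it.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Ask the parser whether a hypothetical crawler may fetch the path."""
    return parser.can_fetch("ExampleBot", "http://example.com" + path)


# A rule listed before the self-referencing line.
blocked_before = not allowed("/cgi-bin/search")
# A rule listed after the self-referencing line.
blocked_after = not allowed("/private/report.html")
# A path that no rule mentions.
open_path = allowed("/index.html")

print(blocked_before, blocked_after, open_path)
```

If the rule after "Disallow: /robots.txt" is still honoured, the parser did not bail out at that line, which matches the "no effect" guess for this implementation.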
Maybe it was just a misunderstanding on the webmaster's part ... maybe he had "sitemap.xml" and "robots.txt" mixed up?