A few weeks ago, I saw a flurry of conversation about how you can now disallow OpenAI from indexing your personal website using robots.txt:
User-agent: GPTBot
Disallow: /
That felt a bit “ex post facto” as they say. Or, as Jeremy put it, “Now that the horse has bolted—and ransacked the web—you can shut the barn door.”
But folks seemed to be going ahead and doing it anyway, and I thought to myself, “Yeah, I should probably do that too…” (especially given how “fucking rude” AI is in not citing its sources).
But I never got around to it.
Tangentially, Manuel asked: what if you updated your robots.txt and blocked all bots? What would happen? Well, he did it and after a week he followed up. His conclusion?
the vast majority of automated tools out there just don't give a fuck about what you put in your robots.txt
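For context, “blocking all bots” presumably means the standard wildcard form, which disallows every crawler that chooses to honor the file (I don’t know Manuel’s exact rules, but this is the canonical way to write it):
# Applies to any crawler that actually respects robots.txt
User-agent: *
Disallow: /
Note the operative word there: “chooses.” robots.txt is a request, not an access control.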
That’s when I realized why I hadn’t yet added any rules to my robots.txt: I have zero faith in it.
Perhaps that faith is not totally based in reality, but this is what I imagine a robots.txt file doing for my website: