Good morning, everyone! Sorry I haven't checked in for quite a while, but life's other obstacles sometimes get in the way!
Paul_123: Agents...
Until they figure out we're onto them... use their user-agent strings as a death-trap:
156472 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36
43616 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36
The two top scrapers claim to be APPLE + CHROME + SAFARI
Do some digging and see whether there's a REAL browser out there claiming to be both Safari AND Google Chrome; if not, there's the first security trap at our front door.
For REAL Safari on macOS: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15
For REAL Safari on an iPhone: Mozilla/5.0 (iPhone; CPU iPhone OS 17_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Mobile/15E148 Safari/604.1
Note: no mention of CHROME anywhere... so stuffing every browser name into one string is likely a tactic to "please" almost any website/server that checks the user-agent.
I haven't looked, but if there's an actual G00gle browser FOR APPLE for some reason, then user-agent traps may not be the ticket.
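The user-agent trap proposed above boils down to looking for both a "Chrome/" and a "Safari/" product token in the same string. Below is a minimal Python sketch of that test, run against the strings quoted in this post; the function and constant names are hypothetical, and it only inspects tokens, nothing else. One caveat worth checking before relying on it: desktop Chrome's own user-agent string has this same shape ("... Chrome/<version> Safari/537.36"), so on its own this check would also match real Chrome users, which is exactly the "actual G00gle browser FOR APPLE" case mentioned above.

```python
import re

# Product tokens as they appear in the UA strings quoted in the post.
CHROME_TOKEN = re.compile(r"\bChrome/\d")
SAFARI_TOKEN = re.compile(r"\bSafari/\d")

# Strings copied from the post (hit counts omitted).
SCRAPER_MAC = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36")
SCRAPER_WIN = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36")
REAL_SAFARI_MAC = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
                   "(KHTML, like Gecko) Version/17.5 Safari/605.1.15")
REAL_SAFARI_IPHONE = ("Mozilla/5.0 (iPhone; CPU iPhone OS 17_5_1 like Mac OS X) "
                      "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 "
                      "Mobile/15E148 Safari/604.1")


def claims_chrome_and_safari(user_agent: str) -> bool:
    """True if the UA carries both a Chrome/ and a Safari/ product token."""
    return bool(CHROME_TOKEN.search(user_agent)) and bool(SAFARI_TOKEN.search(user_agent))


def looks_like_plain_safari(user_agent: str) -> bool:
    """Shape of the REAL Safari strings above: Version/ and Safari/, but no Chrome/."""
    return ("Version/" in user_agent
            and bool(SAFARI_TOKEN.search(user_agent))
            and not CHROME_TOKEN.search(user_agent))


if __name__ == "__main__":
    for name, ua in [("scraper (Mac)", SCRAPER_MAC), ("scraper (Win)", SCRAPER_WIN),
                     ("Safari (Mac)", REAL_SAFARI_MAC), ("Safari (iPhone)", REAL_SAFARI_IPHONE)]:
        print(name, claims_chrome_and_safari(ua), looks_like_plain_safari(ua))
```

Because genuine Chrome matches too, this is probably better used as one signal among several (for example together with the cookie checks below) than as a trap by itself.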
That said, you can instead use cookie batter as bait...
1. Plant a session cookie that expires in, oh, say 5 seconds - most INTERACTIVE websites use session cookies for even simple tasks like logins
2a. If the bot follows suit (accepts the cookie and sends it back), wait for around 5 hits within that time frame, then self-jail that IP for however long sounds fair; humans can't "read" a web page at 1 page per second. The actual time and logic will have to be tweaked based on TLC.net's server response time to make it truly worthwhile.
2b. If the bot finds a way AROUND session cookies, do a bounce-test: on landing, if $_SESSION['self_test'] is empty, set it and goto ./test.php; if test.php finds $_SESSION['self_test'] STILL empty, the client never sent the session cookie back. That's a red flag for bots and humans alike, since cookies aren't something most people turn off in the settings or preferences of major browsers. Note: the "self_test" key name has to be randomized so bots can't "learn" that tampering with "self_test" is their ticket in. Since the session data itself is stored SERVER SIDE (the cookie only carries the session ID), that kind of tampering should be rare.
3. Kill a bot's connection for even 15 seconds once this flag's been tripped, and you're likely to force it to turn away OR throttle itself. For humans... pretending their F5 key got stuck... a 15-second ban isn't the end of the world.

4. Next comes the three-strike rule... trip the above hits-per-second limit three times in a row within so many minutes, and all hits thereafter get redirected (HTTP 301/302) back to the client itself (127.0.0.1). That "should", in theory, actually slow the bot down overall, since all of these thousands of sockets hitting us get redirected... and are then left waiting for "localhost" to answer on port 12345.
In theory. 
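Steps 1 through 4 can be sketched as a per-IP counter. What follows is a minimal, in-memory Python sketch under loudly stated assumptions: the thresholds (5 hits in 5 seconds, a 15-second jail, three strikes within an assumed 10-minute window) come from or stand in for the numbers above, a 429 response stands in for "killing the connection", and the names HitGuard, new_self_test_key and bounce_test_passed are hypothetical. It is not wired to PHP sessions or to the TLC.net web server; it only models the decision logic.

```python
import secrets
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Set, Tuple

# Hypothetical thresholds; the post says these must be tuned to the server.
WINDOW_SECONDS = 5.0      # step 1: the ~5-second cookie / counting window
MAX_HITS = 5              # step 2a: ~5 hits inside the window trips the flag
JAIL_SECONDS = 15.0       # step 3: short "self-jail"
STRIKE_WINDOW = 600.0     # step 4: "within so many minutes" (assumed 10 minutes)
MAX_STRIKES = 3           # step 4: three strikes
TARPIT_URL = "http://127.0.0.1:12345/"  # step 4: send the bot to its own localhost


def new_self_test_key() -> str:
    """Step 2b: random session key name, so bots can't learn a fixed 'self_test'."""
    return "st_" + secrets.token_hex(8)


def bounce_test_passed(store: Dict[str, dict], cookie_sid: Optional[str], key: str) -> bool:
    """Step 2b on the test page: did the client send back the session cookie
    whose server-side data got `key` on the landing page?"""
    return cookie_sid is not None and key in store.get(cookie_sid, {})


class HitGuard:
    """Per-IP hit counter for steps 2a, 3 and 4 (single process, in memory)."""

    def __init__(self) -> None:
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.strikes: Dict[str, Deque[float]] = defaultdict(deque)
        self.jailed_until: Dict[str, float] = {}
        self.tarpitted: Set[str] = set()

    def check(self, ip: str, now: Optional[float] = None) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, redirect target or None) for one request."""
        now = time.monotonic() if now is None else now

        if ip in self.tarpitted:                  # step 4: out of strikes
            return 302, TARPIT_URL
        if now < self.jailed_until.get(ip, 0.0):  # step 3: still jailed
            return 429, None

        hits = self.hits[ip]                      # step 2a: sliding window
        hits.append(now)
        while now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        if len(hits) < MAX_HITS:
            return 200, None

        # Flag tripped: record a strike and start a fresh counting window.
        hits.clear()
        strikes = self.strikes[ip]
        strikes.append(now)
        while now - strikes[0] > STRIKE_WINDOW:
            strikes.popleft()
        if len(strikes) >= MAX_STRIKES:
            self.tarpitted.add(ip)
            return 302, TARPIT_URL
        self.jailed_until[ip] = now + JAIL_SECONDS
        return 429, None


if __name__ == "__main__":
    guard = HitGuard()
    for t in range(6):  # one hit per second from the same address
        print(t, guard.check("203.0.113.7", now=float(t)))
```

Keeping the timestamps in deques makes each check cheap; a real server with several worker processes would need shared storage (a database or cache) instead of per-process dicts.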
It's funny, but it's "AI" that brought me here! (Automated Idiocracy)
I was running a scenario through one of M$'s LLMs, asking what the challenges would be in installing vLLM/Ollama/etc. onto TinyCore (it laughed, basically telling me it'd be a painful experience), and I remembered a TLC member asking, about a year or so ago, why we don't have an AI doing our extension builds (as maintainers). That's somewhat what I'm finagling... which led me here.
So I did a little more digging, and the LLM actually knows quite a bit about the ins and outs of the OS and the content of the wiki and forum, so yes, there's SOME good that's come from it. But the crawlers' tact and respect are virtually non-existent, so we may have to teach it a few graces, whether it likes it or not.
NOTE: Google's crawler isn't overly socket-friendly either, so whatever keeps the beasts away may also keep the spiders away.