Tiny Core Linux
Off-Topic => Off-Topic - Tiny Core Lounge => Topic started by: gadget42 on August 23, 2025, 01:49:31 AM
-
question for admin/mods: wondering reason for increased forum website traffic?
iirc, up until quite recently the most online ever was about 1k less?
did we get a mention somewhere recently?
or is it just increased ai/bot/llm/lvm/etc activities?
-
just noticed that a few hours after i posted the above commentary on the increased traffic from ai/bot/llm/lvm/etc, the ai/bot/llm/lvm/etc traffic doubled. ai/bot/llm/lvm/etc is ruining the open web for everyone everywhere.
-
It’s bots. There is no easy way to remove them from the online count.
-
imho i think it is better that forum members/visitors ARE able to see the bot traffic
i would not want it hidden at/on any website, in fact it should be actively called out by all the websites under siege.
-
Most of them are well behaved, honoring rate settings. I’ve not seen it really affect the load of the server.
-
https://tech.slashdot.org/story/25/08/31/1820249/are-ai-web-crawlers-destroying-websites-in-their-hunt-for-training-data
-
wowza!
Most Online Ever: 16857 (December 09, 2025, 12:37:20 PM)
anyone know the _what/why_ regarding this recent rather large spike in MOE traffic?
(might be ai/bot/llm/lvm/etc using residential based proxies which would massively increase the "individual" entity traffic based on originating ip addresses)
(re: residential proxies, see for example _randomly_referenced_ oxylabs.io and www[.]webshare.io)
-
after reading a recent commentary that included information about aisuru botnet here:
https://blog.cloudflare.com/ddos-threat-report-2025-q3/#aisuru-breaking-records-with-ultrasophisticated-hyper-volumetric-ddos-attacks
more searching resulted in a couple pieces from krebs:
https://krebsonsecurity.com/2025/10/ddos-botnet-aisuru-blankets-us-isps-in-record-ddos/
https://krebsonsecurity.com/2025/10/aisuru-botnet-shifts-from-ddos-to-residential-proxies/
snippet tidbit(mostly because an earlier post referenced oxylabs.io and www[.]webshare.io):
Today, Spur says it is tracking an unprecedented spike in available proxies across all providers, including;
LUMINATI_PROXY 11,856,421
NETNUT_PROXY 10,982,458
ABCPROXY_PROXY 9,294,419
OXYLABS_PROXY 6,754,790
IPIDEA_PROXY 3,209,313
EARNFM_PROXY 2,659,913
NODEMAVEN_PROXY 2,627,851
INFATICA_PROXY 2,335,194
IPROYAL_PROXY 2,032,027
YILU_PROXY 1,549,155
-
naturally the bots are still out of control:
https://forum.tinycorelinux.net/index.php/topic,28089.0.html
-
Most Online Today: 31582. Most Online Ever: 31582 (Today at 10:43:55 AM)
-
Yup. They all nailed the server at the same time about 11:40 EDT.
It’s why a lot of servers are putting cloudflare in front of them.
-
It is probably my fault. I check TC a few times a week.
Sorry
Vic
-
Hi Paul_123
There were about 16000 users around 11:40. They really managed
to slow the site down. When I returned later on I saw the number
had peaked to over 31000.
-
It's why a lot of servers are putting cloudflare in front of them.
It's not clear if that means you're considering doing the same, but I'll just make the point that when most sites do that (or start using any other service that requires Javascript to try and verify humanity) I stop visiting.
I know it's a tough problem to solve (my own website was getting crippled by millions of bot hits a day a while ago), and other common solutions like blocking IPs from certain countries may cut off other users, but I'm just sharing my point of view.
-
Hits from today by user agent (top 20 only):
156472 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36
43616 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36
11518 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
9614 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36
4645 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.7680.177 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
3814 Mozilla/5.0 (X11; Linux i686; rv:109.0) Gecko/20100101 Firefox/115.0
2841 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36
2662 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36
2157 Mozilla/5.0 (X11; Linux x86_64; rv:149.0) Gecko/20100101 Firefox/149.0
1728 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:149.0) Gecko/20100101 Firefox/149.0
1683 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36
1605 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; ClaudeBot/1.0; +claudebot@anthropic.com)
1543 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
1446 Mozilla/5.0 (X11; Linux x86_64; rv:140.0) Gecko/20100101 Firefox/140.0
1255 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)
1079 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36
927 Terra Cotta 0.1 https://www.github.com/ceramicTeam/CeramicTerracotta
788 Wget
759 Mozilla/5.0 (compatible; Thinkbot/0.5.8; +In_the_test_phase,_if_the_Thinkbot_brings_you_trouble,_please_block_its_IP_address._Thank_you.)
755 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 newsai/1.0 Safari/537.36
The first user agent was the offender. They launched almost 60 requests per second for about 20 minutes. Here is the real problem: this attack came from 104,000 different IP addresses.
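For anyone who wants to produce a similar tally from their own logs, here is a minimal sketch in Python. It assumes the access log uses the common "combined" format, where the client address is the first field and the User-Agent is the last double-quoted field; the function names and the sample lines are made up for illustration and are not taken from the forum server.

```python
import re
from collections import Counter

# Assumption: "combined" log format, User-Agent is the last quoted field.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')


def top_user_agents(lines, n=20):
    """Return the n most frequent User-Agent strings with their hit counts."""
    counts = Counter()
    for line in lines:
        match = UA_AT_END.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


def distinct_ips_for(lines, user_agent):
    """Count how many different client addresses sent the given User-Agent."""
    ips = set()
    for line in lines:
        match = UA_AT_END.search(line)
        if match and match.group(1) == user_agent:
            ips.add(line.split(" ", 1)[0])  # first field is the client address
    return len(ips)


# Made-up sample lines using documentation-only address ranges.
sample = [
    '192.0.2.10 - - [01/Jan/2026:11:40:01 -0400] "GET /index.php HTTP/1.1" 200 512 "-" "ExampleBrowser/1.0"',
    '198.51.100.7 - - [01/Jan/2026:11:40:01 -0400] "GET /index.php HTTP/1.1" 200 512 "-" "ExampleBrowser/1.0"',
    '203.0.113.99 - - [01/Jan/2026:11:40:02 -0400] "GET /index.php HTTP/1.1" 200 512 "-" "ExampleBrowser/1.0"',
    '192.0.2.10 - - [01/Jan/2026:11:40:03 -0400] "GET /robots.txt HTTP/1.1" 200 64 "-" "Wget"',
]

top = top_user_agents(sample, n=2)
spread = distinct_ips_for(sample, "ExampleBrowser/1.0")
```

Counting distinct addresses per user agent is what separates a single noisy client from a botnet: the same hit count spread over many addresses points to the latter.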
-
Hi Paul_123. How awful.
I'll just make the point that when most sites do that (or start using any other service that requires Javascript to try and verify humanity) I stop visiting.
It's not like server administrators have great options in this situation.
I have my own HTTP server which I use just for myself, family, and a few friends. The server was getting hammered with tens of thousands of visits an hour, every hour, every day. I don't have premium hardware, so sometimes I couldn't access my own HTTP server >:(
The options I considered were 1) shut off the server, 2) use a Javascript gatekeeper (e.g., Anubis), and 3) put the server on a nonstandard port. I actually tried all three options for a while. Doing without the server was too painful. Anubis worked well but added too much complexity to my otherwise barebones setup. Using a nonstandard port turned out to be the right balance for me.
Using a nonstandard port does not eliminate the problem (some bots are more sophisticated and do port scanning), but it eliminates >50% of bot traffic, bringing the noise down to a tolerable level. Would using a nonstandard port be worth trying for the TCL forum? The problem is that this would prevent a lot of legitimate, human users from finding the forum.
-
Based on the port scanning going on, I doubt it. Might slow them down for a couple of days. And would just frustrate users.
Yesterday's hit was the first botnet that I know of. Otherwise the bots have been fairly respectful.
-
Otherwise the bots have been fairly respectful.
perhaps a bit in parallel to this topic,
i post this because i happened to read the above just before reading the quote below, from https://lists.gnu.org/archive/html/help-guix/2026-04/msg00047.html
the two seem to be vastly differing perspectives:
GPTBot alone did 109,552 accesses to my website in march, so I think
they are telling the truth in a very misleading way.
The websites that go into these stats have together about 2000 HTML
documents (https://www.1w6.org has 811, https://www.draketo.de/node has
827 and https://www.draketo.de/ has 296).
99% of these change less than once per year.
If GPTbot crawls them every day, that’s 2000x30 = 60.000 accesses per
month -- which is pretty close to the 109,552 accesses I see.
But I built these websites over 20 years. The oldest articles are from
2007.
A human goes there, reads 1-20 articles and leaves again. Maybe to
return later when there’s a new article (I have RSS feeds).
An LLM goes there and crawls everything. Every day.
There even was a week where GPT tried every possible combination of
search inputs on 1w6.org -- including repeated arguments, likely until
it hit the URL length limit of the server. My log analysis tool needed
days to complete the analysis after that week. And I give thanks to my
hoster that they didn’t boot me then (and that I don’t have to pay for
excess bandwidth).
-
The different perspective is that I expect some level of scraping. It's just the times we live in. I specifically use a host that allows for unlimited bandwidth. Anything I can do to limit it will be obtrusive to the real users.
-
Hi Paul_123
... I specifically use a host that allows for unlimited bandwidth. Anything I can do to limit it will be obtrusive to the real users.
If you are talking about downloading extensions from the repo, then
yes, I would agree with that statement.
But this is a simple forum that's not littered with ads and videos.
Even attachments are limited to 200K in size (and total). How much
bandwidth is really needed for reading the forum?
I lowered the download speed on one of my machines to 1Mbit/sec
and had no trouble navigating the website.
If it's possible to set a speed limit that's comfortable for human
consumption, but less comfortable for bots scraping web pages, it
might be worth considering.
Just a thought.
-
External bandwidth is never an issue and never what throttles the site. It's the PHP processing and database processes that jam up the CPU. Things are already rate limited for IP addresses and sessions. But a botnet avoids all of these limits.
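To illustrate why per-address limits do not help here, a rough sketch in Python follows. It is not the forum's actual limiter: the class, the limit of 30 requests per minute, and the simulated addresses are all assumptions for illustration. Using the figures from earlier in the thread (roughly 60 requests per second for 20 minutes, from 104,000 addresses), the average works out to less than one request per address, so no single address ever comes near a per-IP limit.

```python
from collections import defaultdict


class PerIpLimiter:
    """Fixed-window limiter: at most `limit` requests per address per window.

    Illustration only: old windows are never pruned.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)

    def allow(self, ip, now):
        bucket = (ip, int(now // self.window))
        self.counts[bucket] += 1
        return self.counts[bucket] <= self.limit


def allowed_count(limiter, requests):
    """Number of (ip, timestamp) requests the limiter lets through."""
    return sum(1 for ip, now in requests if limiter.allow(ip, now))


# Both scenarios send 600 requests in 10 seconds (60 per second).
single_client = [("192.0.2.1", i / 60) for i in range(600)]
botnet = [(f"10.0.{i // 256}.{i % 256}", i / 60) for i in range(600)]

# Hypothetical limit: 30 requests per address per minute.
single_allowed = allowed_count(PerIpLimiter(30, 60), single_client)
botnet_allowed = allowed_count(PerIpLimiter(30, 60), botnet)

# Figures quoted in the thread: ~60 req/s for ~20 minutes from 104,000 addresses.
avg_requests_per_ip = 60 * 20 * 60 / 104_000
```

With these made-up numbers the single client is cut off after 30 requests, while every one of the 600 botnet requests gets through, even though the load on PHP and the database is identical in both cases.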
-
I'll just make the point that when most sites do that (or start using any other service that requires Javascript to try and verify humanity) I stop visiting.
It's not like server administrators have great options in this situation.
I have my own HTTP server which I use just for myself, family, and a few friends. The server was getting hammered with tens of thousands of visits an hour, every hour, every day. I don't have premium hardware, so sometimes I couldn't access my own HTTP server >:(
The options I considered were 1) shut off the server, 2) use a Javascript gatekeeper (e.g., Anubis), and 3) put the server on a nonstandard port. I actually tried all three options for a while. Doing without the server was too painful. Anubis worked well but added too much complexity to my otherwise barebones setup. Using a nonstandard port turned out to be the right balance for me.
In my case I was able to identify a common argument in the request URL strings in all the requests coming from the botnet that was making millions of requests per day to my site. By adding a rule in the Apache configuration I blocked requests matching the bot's requests, and since that prevented loading the PHP module for them the server was then able to handle all the requests the botnet could send without running out of RAM anymore. I still needed to significantly increase overall connection limit settings in Apache and the Linux kernel itself, but then it was able to absorb the attack which continued for a week or two before finally giving up.
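The Apache rule itself is not shown here, so the following is only a sketch of the matching logic, written in Python rather than as server configuration. The argument name `botarg` is a placeholder, since the post does not say which argument the botnet was sending, and `should_block` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit

# Placeholder: the actual argument the botnet used is not given in the post.
BLOCKED_QUERY_KEYS = {"botarg"}


def should_block(request_url, blocked_keys=BLOCKED_QUERY_KEYS):
    """Return True if the URL's query string contains a blocked argument name."""
    query = urlsplit(request_url).query
    # keep_blank_values=True so "?botarg" and "?botarg=" are also caught.
    params = parse_qs(query, keep_blank_values=True)
    return any(key in blocked_keys for key in params)


print(should_block("/index.php?topic=28089.0"))          # ordinary forum URL
print(should_block("/index.php?topic=1&botarg=xyz"))     # carries the marker
```

The saving described in the post comes from rejecting the request in the web server before PHP is involved, so a check like this only pays off if it runs at that early stage.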
That was with a $1/month VPS, but I was lucky it was a crazy bot using a pointless argument in requested URLs (I guess it was running some idiotic AI-generated code), so I could block it without affecting human (or even sensible crawler) visitors at all. I've read accounts of other people identifying similar ways of blocking bots with web server rules to filter request URLs. Others have blocked impossible or unlikely User-Agents (really old browsers without sufficiently modern HTTPS support to really connect), since some botnets seem to use a pool of random browser User-Agents which isn't up to date. I could have blocked South American and Asian IP addresses since all the hundreds of thousands of IPs the botnet used seemed to be from there, but I didn't want to. Maybe that would be another option for your personal site though. Others block IPs based on the owners of IP blocks (eg. cloud/VPS hosting companies).
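As a rough sketch of the outdated User-Agent idea, the Python below pulls the Chrome major version out of a User-Agent string and flags it when it falls below a cutoff. The cutoff of 100 is an arbitrary assumption for illustration, and `chrome_major` and `looks_outdated` are hypothetical helper names.

```python
import re

CHROME_VERSION = re.compile(r"Chrome/(\d+)\.")

# Arbitrary cutoff chosen for illustration; pick one that suits your visitors.
MIN_CHROME_MAJOR = 100


def chrome_major(user_agent):
    """Return the Chrome major version in the User-Agent, or None if absent."""
    match = CHROME_VERSION.search(user_agent)
    return int(match.group(1)) if match else None


def looks_outdated(user_agent, min_major=MIN_CHROME_MAJOR):
    """True only when a Chrome version is present and older than the cutoff."""
    major = chrome_major(user_agent)
    return major is not None and major < min_major


old_ua = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
new_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36"
firefox_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:140.0) Gecko/20100101 Firefox/140.0"

flags = [looks_outdated(ua) for ua in (old_ua, new_ua, firefox_ua)]
```

The cutoff needs some care: declared crawlers sometimes advertise older Chrome versions (the bingbot entry in the listing earlier in this thread reports Chrome/116), and a botnet with an up-to-date User-Agent pool passes straight through.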
Lots of answers, but I agree no single one is perfect for every situation.
-
another bot saga
https://words.filippo.io/dependabot/