Tiny Core Linux

Off-Topic => Off-Topic - Tiny Core Lounge => Topic started by: gadget42 on August 23, 2025, 01:49:31 AM

Title: question for admin/mods: wondering reason for increased forum website traffic?
Post by: gadget42 on August 23, 2025, 01:49:31 AM: question for admin/mods: wondering reason for increased forum website traffic?

iirc, up until quite recently the most online ever was about 1k less?

did we get a mention somewhere recently?

or is it just increased ai/bot/llm/lvm/etc activities?
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: gadget42 on September 01, 2025, 02:20:56 AM: just noticed that a few hours after i posted the above commentary on the increased traffic from ai/bot/llm/lvm/etc, the ai/bot/llm/lvm/etc traffic doubled. ai/bot/llm/lvm/etc is ruining the open web for everyone everywhere.
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: Paul_123 on September 01, 2025, 08:55:22 AM: It’s bots. There is not an easy way to remove them from the online count
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: gadget42 on September 02, 2025, 09:43:50 AM: imho i think it is better that forum members/visitors ARE able to see the bot traffic

i would not want it hidden at/on any website, in fact it should be actively called out by all the websites under siege.
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: Paul_123 on September 02, 2025, 10:01:59 AM: Most of them are well behaved, honoring rate settings. I’ve not seen it really affect the load of the server.
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: gadget42 on September 02, 2025, 06:36:16 PM: https://tech.slashdot.org/story/25/08/31/1820249/are-ai-web-crawlers-destroying-websites-in-their-hunt-for-training-data
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: gadget42 on December 21, 2025, 06:11:10 AM: wowza!

Most Online Ever: 16857 (December 09, 2025, 12:37:20 PM)

anyone know the _what/why_ regarding this recent rather large spike in MOE traffic?
(might be ai/bot/llm/lvm/etc using residential based proxies which would massively increase the "individual" entity traffic based on originating ip addresses)
(re: residential proxies, see for example _randomly_referenced_ oxylabs.io and www[.]webshare.io)
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: gadget42 on December 29, 2025, 10:09:49 AM: after reading a recent commentary that included information about aisuru botnet here:

https://blog.cloudflare.com/ddos-threat-report-2025-q3/#aisuru-breaking-records-with-ultrasophisticated-hyper-volumetric-ddos-attacks

more searching resulted in a couple pieces from krebs:

https://krebsonsecurity.com/2025/10/ddos-botnet-aisuru-blankets-us-isps-in-record-ddos/

https://krebsonsecurity.com/2025/10/aisuru-botnet-shifts-from-ddos-to-residential-proxies/

snippet tidbit(mostly because an earlier post referenced oxylabs.io and www[.]webshare.io):
Quote
Today, Spur says it is tracking an unprecedented spike in available proxies across all providers, including;

LUMINATI_PROXY 11,856,421
NETNUT_PROXY 10,982,458
ABCPROXY_PROXY 9,294,419
OXYLABS_PROXY 6,754,790
IPIDEA_PROXY 3,209,313
EARNFM_PROXY 2,659,913
NODEMAVEN_PROXY 2,627,851
INFATICA_PROXY 2,335,194
IPROYAL_PROXY 2,032,027
YILU_PROXY 1,549,155
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: gadget42 on April 04, 2026, 07:59:52 AM: naturally the bots are still out of control:

https://forum.tinycorelinux.net/index.php/topic,28089.0.html
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: gadget42 on April 07, 2026, 04:28:55 PM: Most Online Today: 31582. Most Online Ever: 31582 (Today at 10:43:55 AM)
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: Paul_123 on April 07, 2026, 04:43:03 PM: Yup. They all nailed the server at the same time about 11:40 EDT.

It’s why a lot of servers are putting cloudflare in front of them.
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: Vic on April 07, 2026, 04:45:00 PM: It is probably my fault. I check TC a few times a week.

Sorry

Vic
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: Rich on April 07, 2026, 04:52:06 PM: Hi Paul_123
There were about 16000 users around 11:40. They really managed
to slow the site down. When I returned later on I saw the number
had peaked to over 31000.
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: CNK on April 07, 2026, 08:08:17 PM: Quote from: Paul_123 on April 07, 2026, 04:43:03 PM
It's why a lot of servers are putting cloudflare in front of them.

It's not clear if that means you're considering doing the same, but I'll just make the point that when most sites do that (or start using any other service that requires Javascript to try and verify humanity) I stop visiting.

I know it's a tough problem to solve (my own website was getting crippled by millions of bot hits a day a while ago), and other common solutions like blocking IPs from certain countries may cut off other users, but I'm just sharing my point of view.
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: Paul_123 on April 07, 2026, 08:10:43 PM: Hits from today by useragent only the top 20

Code:
156472 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36
43616 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36
11518 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
9614 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36
4645 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.7680.177 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
3814 Mozilla/5.0 (X11; Linux i686; rv:109.0) Gecko/20100101 Firefox/115.0
2841 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36
2662 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36
2157 Mozilla/5.0 (X11; Linux x86_64; rv:149.0) Gecko/20100101 Firefox/149.0
1728 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:149.0) Gecko/20100101 Firefox/149.0
1683 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36
1605 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; ClaudeBot/1.0; +claudebot@anthropic.com)
1543 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
1446 Mozilla/5.0 (X11; Linux x86_64; rv:140.0) Gecko/20100101 Firefox/140.0
1255 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)
1079 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36
927 Terra Cotta 0.1 https://www.github.com/ceramicTeam/CeramicTerracotta
788 Wget
759 Mozilla/5.0 (compatible; Thinkbot/0.5.8; +In_the_test_phase,_if_the_Thinkbot_brings_you_trouble,_please_block_its_IP_address._Thank_you.)
755 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 newsai/1.0 Safari/537.36
The first user agent was the offender. They launched almost 60 requests per second for about 20 minutes. Here is the real problem: this attack came from 104,000 different IP addresses.
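
For anyone wanting to build a similar tally from their own logs, here is a rough sketch. It assumes the web server writes the common "combined" access log format, where the user agent is the last double-quoted field on each line; the forum's actual log format is not stated in this thread, and the sample lines are made up.

```python
import re
from collections import Counter

# Assumption: "combined" log format, user agent is the last quoted field.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')


def top_user_agents(log_lines, n=20):
    """Return the n most frequent user agents as (agent, hits) pairs."""
    counts = Counter()
    for line in log_lines:
        match = UA_AT_END.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


# Hypothetical sample lines, not taken from the forum's logs.
sample = [
    '203.0.113.5 - - [07/Apr/2026:11:40:01 -0400] "GET /index.php HTTP/1.1" 200 512 "-" "Wget"',
    '203.0.113.6 - - [07/Apr/2026:11:40:02 -0400] "GET /index.php HTTP/1.1" 200 512 "-" "Wget"',
    '203.0.113.7 - - [07/Apr/2026:11:40:03 -0400] "GET /index.php HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
]

for agent, hits in top_user_agents(sample):
    print(hits, agent)
```

Feeding it an open log file instead of the sample list gives the same kind of "hits, user agent" listing as above.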
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: GNUser on April 08, 2026, 09:14:48 AM: Quote from: Paul_123 on April 07, 2026, 08:10:43 PM
Hits from today by useragent only the top 20
Hi Paul_123. How awful.

Quote from: CNK on April 07, 2026, 08:08:17 PM
I'll just make the point that when most sites do that (or start using any other service that requires Javascript to try and verify humanity) I stop visiting.
It's not like server administrators have great options in this situation.

I have my own http server which I use just for myself, family, and few friends. The server was getting hammered with tens of thousands of visits an hour, every hour, every day. I don't have premium hardware, so sometimes I couldn't access my own http server >:(

The options I considered were 1) shut off the server, 2) use a Javascript gatekeeper (e.g., Anubis), and 3) put the server on a nonstandard port. I actually tried all three options for a while. Doing without the server was too painful. Anubis worked well but added too much complexity to my otherwise barebones setup. Using a nonstandard port turned out to be the right balance for me.

Using a nonstandard port does not eliminate the problem (some bots are more sophisticated and do port scanning), but it eliminates >50% of bot traffic, bringing the noise down to a tolerable level. Would using a nonstandard port be worth trying for the TCL forum? The problem is that this would prevent a lot of legitimate, human users from finding the forum.
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: Paul_123 on April 08, 2026, 10:46:01 AM: Based on the port scanning going on, I doubt it. Might slow them down for a couple of days. And would just frustrate users.

Yesterday's hit was the first botnet that I know of. Otherwise the bots have been fairly respectful.
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: mocore on April 09, 2026, 04:12:16 AM: Quote from: Paul_123 on April 08, 2026, 10:46:01 AM
Otherwise the bots have been fairly respectful.

perhaps a bit in parallel to this topic,
i post as i just happened to read the above and then the quote below from https://lists.gnu.org/archive/html/help-guix/2026-04/msg00047.html

which seem to be vastly differing perspectives

Quote from: help-guix/2026-04/msg00047
GPTBot alone did 109,552 accesses to my website in march, so I think
they are telling the truth in a very misleading way.

The websites that go into these stats have together about 2000 HTML
documents (https://www.1w6.org has 811, https://www.draketo.de/node has
827 and https://www.draketo.de/ has 296).

99% of these change less than once per year.

If GPTbot crawls them every day, that’s 2000x30 = 60.000 accesses per
month -- which is pretty close to the 109,552 accesses I see.

But I built these websites over 20 years. The oldest articles are from
2007.

A human goes there, reads 1-20 articles and leaves again. Maybe to
return later when there’s a new article (I have RSS feeds).

An LLM goes there and crawls everything. Every day.

There even was a week where GPT tried every possible combination of
search inputs on 1w6.org -- including repeated arguments, likely until
it hit the URL length limit of the server. My log analysis tool needed
days to complete the analysis after that week. And I give thanks to my
hoster that they didn’t boot me then (and that I don’t have to pay for
excess bandwidth).
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: Paul_123 on April 09, 2026, 08:41:08 AM: The different perspective is that I expect some level of scraping. It's just the times we live in. I specifically use a host that allows for unlimited bandwidth. Anything I can do to limit it will be obtrusive to the real users.
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: Rich on April 09, 2026, 02:27:45 PM: Hi Paul_123
Quote from: Paul_123 on April 09, 2026, 08:41:08 AM
... I specifically use a host that allows for unlimited bandwidth. Anything I can do to limit it will be obtrusive to the real users.
If you are talking about downloading extensions from the repo, then
yes, I would agree with that statement.

But this is a simple forum that's not littered with ads and videos.
Even attachments are limited to 200K in size (and total). How much
bandwidth is really needed for reading the forum?

I lowered the download speed on one of my machines to 1Mbit/sec
and had no trouble navigating the website.

If it's possible to set a speed limit that's comfortable for human
consumption, but less comfortable for bots scraping web pages, it
might be worth considering.

Just a thought.
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: Paul_123 on April 09, 2026, 04:07:14 PM: External bandwidth is never an issue and never what throttles the site. It's the PHP processing and database processes that jam up the CPU. Things are already rate limited for IP addresses and sessions. But a botnet avoids all of these limits.
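
To put rough numbers on why per-IP limits do not catch a botnet, here is a quick back-of-the-envelope calculation. It only uses the approximate figures quoted earlier in this thread for the April 7 burst and treats the traffic as evenly spread across addresses, which is an assumption.

```python
# Approximate figures quoted earlier in the thread for the April 7 burst.
requests_per_second = 60   # "almost 60 requests per second"
duration_minutes = 20      # "for about 20 minutes"
distinct_ips = 104_000     # "104,000 different ip addresses"

total_requests = requests_per_second * duration_minutes * 60
# Assumption: requests are spread evenly over all addresses.
requests_per_ip = total_requests / distinct_ips

print(f"total requests: {total_requests}")                 # total requests: 72000
print(f"average requests per IP: {requests_per_ip:.2f}")   # average requests per IP: 0.69
```

At less than one request per address on average, no individual IP comes anywhere near a per-IP or per-session threshold, even though the combined load is enough to jam up the CPU.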
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: CNK on April 09, 2026, 08:25:04 PM: Quote from: GNUser on April 08, 2026, 09:14:48 AM
Quote from: CNK on April 07, 2026, 08:08:17 PM
I'll just make the point that when most sites do that (or start using any other service that requires Javascript to try and verify humanity) I stop visiting.
It's not like server administrators have great options in this situation.

I have my own http server which I use just for myself, family, and few friends. The server was getting hammered with tens of thousands of visits an hour, every hour, every day. I don't have premium hardware, so sometimes I couldn't access my own http server >:(

The options I considered were 1) shut off the server, 2) use a Javascript gatekeeper (e.g., Anubis), and 3) put the server on a nonstandard port. I actually tried all three options for a while. Doing without the server was too painful. Anubis worked well but added too much complexity to my otherwise barebones setup. Using a nonstandard port turned out to be the right balance for me.

In my case I was able to identify a common argument in the request URL strings in all the requests coming from the botnet that was making millions of requests per day to my site. By adding a rule in the Apache configuration I blocked requests matching the bot's requests, and since that prevented loading the PHP module for them the server was then able to handle all the requests the botnet could send without running out of RAM anymore. I still needed to significantly increase overall connection limit settings in Apache and the Linux kernel itself, but then it was able to absorb the attack which continued for a week or two before finally giving up.
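
The matching logic behind that kind of rule can be sketched in a few lines of Python. This is not the Apache rule itself, only an illustration of the test it performs; the argument name `botarg` is a made-up placeholder, since the post does not say which argument the botnet used.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical argument name, for illustration only.
BLOCKED_ARGS = {"botarg"}


def should_block(request_target: str) -> bool:
    """Return True if the request's query string carries a blocked argument."""
    query = urlsplit(request_target).query
    params = parse_qs(query, keep_blank_values=True)
    return any(name in BLOCKED_ARGS for name in params)


print(should_block("/index.php?topic=1&botarg=x"))
print(should_block("/index.php?topic=1"))
```

The point of doing this check in the web server rather than in the application is the one made above: rejected requests never reach PHP, so they cost very little.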

That was with a $1/month VPS, but I was lucky it was a crazy bot using a pointless argument in requested URLs (I guess it was running some idiotic AI-generated code), so I could block it without affecting human (or even sensible crawler) visitors at all. I've read accounts of other people identifying similar ways of blocking bots with web server rules to filter request URLs. Others have blocked impossible or unlikely User-Agents (really old browsers without sufficiently modern HTTPS support to really connect), since some botnets seem to use a pool of random browser User-Agents which isn't up to date. I could have blocked South American and Asian IP addresses since all the hundreds of thousands of IPs the botnet used seemed to be from there, but I didn't want to. Maybe that would be another option for your personal site though. Others block IPs based on the owners of IP blocks (eg. cloud/VPS hosting companies).
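
As a rough illustration of the "unlikely User-Agent" idea, the sketch below flags requests that claim a very old Chrome major version. The cut-off of 100 is an arbitrary value picked for the example, not a recommendation, and it only looks at Chrome-style version strings.

```python
import re

CHROME_VERSION = re.compile(r"Chrome/(\d+)\.")

# Assumed cut-off, chosen for illustration only.
MIN_CHROME_MAJOR = 100


def looks_stale(user_agent: str) -> bool:
    """Return True if the agent claims a Chrome major version below the cut-off."""
    match = CHROME_VERSION.search(user_agent)
    return bool(match) and int(match.group(1)) < MIN_CHROME_MAJOR


old = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
new = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36"
print(looks_stale(old), looks_stale(new))
```

Any such cut-off risks locking out real people on old machines, so it is a trade-off rather than a clean fix.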

Lots of answers, but I agree no single one is perfect for every situation.
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: gadget42 on April 21, 2026, 12:48:36 AM: another bot saga

https://words.filippo.io/dependabot/
Title: Re: question for admin/mods: wondering reason for increased forum website traffic?
Post by: CentralWare on June 07, 2026, 10:08:08 AM: Good morning, everyone! Sorry I haven't checked in (in quite a bit) but life's other obstacles sometimes get in the way! ???

Paul_123: Agents...
Until they figure out we're onto them... use their agent tags as a death-trap:
Code:
156472 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36
43616 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36
The two top scrapers claim to be APPLE + CHROME + SAFARI
Do some digging and see if there's a REAL browser out there claiming to be safari AND google, IF NOT, there's the first security trap at our front door.

For a REAL macOS: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15
For a REAL iPhone: Mozilla/5.0 (iPhone; CPU iPhone OS 17_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Mobile/15E148 Safari/604.1
Note: no mention of CHROME anywhere... that's likely a tactic to "please most any website/server" by user-agent.
I haven't looked, but if there's an actual G00gle browser FOR APPLE for some reason, then user-agent traps may not be the ticket.

That said, you can instead use cookie batter as bait...

1. Plant a session cookie that expires in, oh, say 5 seconds - most INTERACTIVE websites use session cookies for even simple tasks like logins

2a. If bot follows suit, wait for around 5 hits within that time frame then self-jail that IP for however long sounds fair - humans can't "read" a web page at 1 page per second - the actual time and logic will have to be tweaked based on TLC.net's server response time to make it truly worthwhile

2b. If bot finds a way AROUND session cookies, do a bounce-test (on landing, if $_SESSION['self_test'] is empty, goto ./test.php, if test.php detects $_SESSION['self_test'] is STILL empty, that's a red flag for bots and humans alike. It's not something people normally can "turn off" in settings or preferences in major browsers.) Note: "self_test" has to be randomized to prevent bots from "learning" that if they want in, they need to tamper with "self_test" in order to come in without issue. Session cookies are stored SERVER SIDE so that's rare to happen.

3. Kill a bot's connection for even 15 seconds once this flag's been tripped and you're likely to force it to turn away OR throttle itself. For humans... pretending their F5 key got stuck... a 15 second ban isn't the end of the world :)

4. Next comes the three-strike-rule... trip the above hits-per-second three times in a row within so many minutes and all hits thereafter get redirected (header 301/302) to themselves (127.0.0.1) which "should" in theory actually slow the bot down overall as all of these thousands of sockets hitting us are being redirected... and now waiting for "localhost" to answer on port 12345. In theory. :)
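
The hit counting, short ban and three-strike steps above could be sketched roughly as follows. This is a simplified in-memory illustration in Python rather than PHP session code; the class name and the ten-minute strike memory are assumptions, and the client key could be an IP address or a session ID.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5    # short window from step 1
MAX_HITS = 5          # hits inside one window that trip the flag (step 2a)
BAN_SECONDS = 15      # short time-out from step 3
STRIKE_LIMIT = 3      # strikes before the redirect in step 4
STRIKE_MEMORY = 600   # assumed value for "so many minutes"


class Throttle:
    """Hypothetical helper that tracks hits, bans and strikes per client."""

    def __init__(self):
        self.hits = defaultdict(deque)     # client -> recent hit times
        self.strikes = defaultdict(deque)  # client -> recent strike times
        self.banned_until = {}             # client -> time the ban ends

    def check(self, client, now):
        """Return 'allow', 'ban' or 'redirect' for one request at time now."""
        strikes = self.strikes[client]
        while strikes and now - strikes[0] > STRIKE_MEMORY:
            strikes.popleft()  # forget strikes that are too old
        if len(strikes) >= STRIKE_LIMIT:
            return "redirect"
        if now < self.banned_until.get(client, 0.0):
            return "ban"
        hits = self.hits[client]
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()  # drop hits that fell out of the window
        hits.append(now)
        if len(hits) >= MAX_HITS:
            hits.clear()
            strikes.append(now)
            self.banned_until[client] = now + BAN_SECONDS
            return "redirect" if len(strikes) >= STRIKE_LIMIT else "ban"
        return "allow"


throttle = Throttle()
for second in range(5):
    print(second, throttle.check("203.0.113.5", second))
```

The caller would translate 'ban' into dropping the connection and 'redirect' into the 301/302 response. All the thresholds would need tuning against real response times, as noted in step 2a.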

It's funny, but it's "AI" that brought me here! (Automated Idiocracy)
I was running a scenario through one of M$ LLMs asking what the challenges would be to install vLLM/ollama/etc. onto TinyCore (it laughed, basically telling me it'll be a painful experience) and I remembered a TLC member asking about a year or so ago why we don't have an AI doing our extension builds (as maintainers) - which is somewhat what I'm finagling... which led me to here.

So I did a little more digging and the LLM actually knows quite a bit about the ins and outs of the OS and the content of the wiki and forum, so yes, there's SOME good that's come from it, but tactfulness and respect of the crawlers is virtually non-existent, so we may have to teach it a few graces. Whether it likes it or not.

NOTE: Google crawler isn't overly socket-friendly either, so what keeps the beasts away may also keep the spiders away.