
Author Topic: question for admin/mods: wondering reason for increased forum website traffic?  (Read 2285 times)

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1832
Quote from: Paul_123
Hits from today by useragent (only the top 20)
Hi Paul_123. How awful.

I'll just make the point that when most sites do that (or start using any other service that requires Javascript to try and verify humanity) I stop visiting.
It's not like server administrators have great options in this situation.

I have my own HTTP server which I use just for myself, family, and a few friends. The server was getting hammered with tens of thousands of visits an hour, every hour, every day. I don't have premium hardware, so sometimes I couldn't access my own server >:(

The options I considered were 1) shut off the server, 2) use a Javascript gatekeeper (e.g., Anubis), and 3) put the server on a nonstandard port. I actually tried all three options for a while. Doing without the server was too painful. Anubis worked well but added too much complexity to my otherwise barebones setup. Using a nonstandard port turned out to be the right balance for me.

Using a nonstandard port does not eliminate the problem (some bots are more sophisticated and do port scanning), but it eliminates >50% of bot traffic, bringing the noise down to a tolerable level. Would using a nonstandard port be worth trying for the TCL forum? The problem is that this would prevent a lot of legitimate, human users from finding the forum.
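For what it's worth, moving a web server to a nonstandard port is a small change. A minimal sketch, assuming an Apache setup (the port number and hostname are arbitrary examples):

```apache
# Minimal sketch, assuming Apache 2.4; 8443 is an arbitrary
# nonstandard port. Naive bots that only probe 80/443 never find
# the site, though port scanners still can.
Listen 8443 https
<VirtualHost *:8443>
    ServerName example.org
    DocumentRoot /var/www/html
    # ... usual TLS directives (SSLEngine, SSLCertificateFile, ...) ...
</VirtualHost>
```

Visitors then need the port in the URL, e.g. https://example.org:8443/ — which is exactly the discoverability cost mentioned above.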
« Last Edit: April 08, 2026, 09:24:03 AM by GNUser »

Offline Paul_123

  • Administrator
  • Hero Member
  • *****
  • Posts: 1537
Based on the port scanning going on, I doubt it. It might slow them down for a couple of days, and it would just frustrate users.

Yesterday's hit was the first botnet that I know of.  Otherwise the bots have been fairly respectful.

Offline mocore

  • Hero Member
  • *****
  • Posts: 745
  • ~.~
Quote from: Paul_123
Otherwise the bots have been fairly respectful.

Perhaps a bit parallel to this topic: I'm posting because I just happened to read the above, and then the quote below from https://lists.gnu.org/archive/html/help-guix/2026-04/msg00047.html

They seem to be vastly differing perspectives.

Quote from: help-guix/2026-04/msg00047
GPTBot alone did 109,552 accesses to my website in march, so I think
they are telling the truth in a very misleading way.

The websites that go into these stats have together about 2000 HTML
documents (https://www.1w6.org has 811, https://www.draketo.de/node has
827 and https://www.draketo.de/ has 296).

99% of these change less than once per year.

If GPTbot crawls them every day, that’s 2000 × 30 = 60,000 accesses per
month -- which is pretty close to the 109,552 accesses I see.

But I built these websites over 20 years. The oldest articles are from
2007.

A human goes there, reads 1-20 articles and leaves again. Maybe to
return later when there’s a new article (I have RSS feeds).

An LLM goes there and crawls everything. Every day.

There even was a week where GPT tried every possible combination of
search inputs on 1w6.org -- including repeated arguments, likely until
it hit the URL length limit of the server. My log analysis tool needed
days to complete the analysis after that week. And I give thanks to my
hoster that they didn’t boot me then (and that I don’t have to pay for
excess bandwidth).



Offline Paul_123

  • Administrator
  • Hero Member
  • *****
  • Posts: 1537
The different perspective is that I expect some level of scraping. It's just the times we live in. I specifically use a host that allows unlimited bandwidth. Anything I can do to limit it will be obtrusive to the real users.



Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12687
Hi Paul_123
Quote from: Paul_123
... I specifically use a host that allows for unlimited bandwidth. Anything I can do to limit it will be obtrusive to the real users.
If you are talking about downloading extensions from the repo, then
yes, I would agree with that statement.

But this is a simple forum that's not littered with ads and videos.
Even attachments are limited to 200K in size (and total). How much
bandwidth is really needed for reading the forum?

I lowered the download speed on one of my machines to 1Mbit/sec
and had no trouble navigating the website.

If it's possible to set a speed limit that's comfortable for human
consumption, but less comfortable for bots scraping web pages, it
might be worth considering.

Just a thought.
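If anyone wants to experiment with that idea, Apache's stock mod_ratelimit can cap per-response bandwidth. A sketch, where the 128 KB/s figure is just a guess at "comfortable for humans, annoying for scrapers", not a tested recommendation:

```apache
# Sketch: throttle every response to ~128 KiB/s with mod_ratelimit
# (bundled with Apache 2.4). The value is an assumption, not a
# tested recommendation.
<Location "/">
    SetOutputFilter RATE_LIMIT
    SetEnv rate-limit 128
</Location>
```

Note this only limits bandwidth per response; it doesn't stop a scraper from opening many slow connections in parallel.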

Offline Paul_123

  • Administrator
  • Hero Member
  • *****
  • Posts: 1537
External bandwidth is never an issue and never what throttles the site. It's the PHP processing and database queries that jam up the CPU. Things are already rate limited per IP address and session, but a botnet avoids all of these limits.
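For reference, per-IP throttling of the kind mentioned above is often done with the third-party mod_evasive module. The thresholds below are illustrative only, and they demonstrate the weakness being described: a botnet spreading requests across thousands of IPs stays under every per-IP limit.

```apache
# Illustrative mod_evasive thresholds (third-party Apache module).
# A botnet with many source IPs stays below each per-IP limit.
<IfModule mod_evasive24.c>
    DOSPageCount      10   # max hits on one page per interval, per IP
    DOSSiteCount      100  # max hits site-wide per interval, per IP
    DOSPageInterval   1    # interval in seconds
    DOSSiteInterval   1
    DOSBlockingPeriod 60   # seconds an offending IP stays blocked
</IfModule>
```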

Offline CNK

  • Wiki Author
  • Sr. Member
  • *****
  • Posts: 430
Quote from: GNUser
I'll just make the point that when most sites do that (or start using any other service that requires Javascript to try and verify humanity) I stop visiting.
It's not like server administrators have great options in this situation.

I have my own HTTP server which I use just for myself, family, and a few friends. The server was getting hammered with tens of thousands of visits an hour, every hour, every day. I don't have premium hardware, so sometimes I couldn't access my own server >:(

The options I considered were 1) shut off the server, 2) use a Javascript gatekeeper (e.g., Anubis), and 3) put the server on a nonstandard port. I actually tried all three options for a while. Doing without the server was too painful. Anubis worked well but added too much complexity to my otherwise barebones setup. Using a nonstandard port turned out to be the right balance for me.

In my case I was able to identify a common argument in the request URL strings of all the requests coming from the botnet that was making millions of requests per day to my site. By adding a rule to the Apache configuration I blocked requests matching that pattern, and since the PHP module was then never loaded for them, the server could handle everything the botnet could send without running out of RAM. I still needed to significantly increase the overall connection limits in Apache and in the Linux kernel itself, but after that it was able to absorb the attack, which continued for a week or two before finally giving up.
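A rule of that kind might look like the following in Apache 2.4, where "botarg" is a placeholder for whatever telltale query argument the botnet actually used (the real one isn't given above):

```apache
# Hypothetical sketch: deny any request whose query string contains
# the bot's telltale argument ("botarg" is a placeholder, not the
# real pattern). Denied requests never reach the PHP handler, so
# they cost almost no CPU or RAM.
<If "%{QUERY_STRING} =~ /botarg/">
    Require all denied
</If>
```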

That was with a $1/month VPS, but I was lucky it was a crazy bot using a pointless argument in the requested URLs (I guess it was running some idiotic AI-generated code), so I could block it without affecting human (or even sensible crawler) visitors at all. I've read accounts of other people finding similar ways to block bots with web server rules that filter request URLs. Others have blocked impossible or unlikely User-Agents (really old browsers without modern enough HTTPS support to actually connect), since some botnets seem to draw from a pool of random browser User-Agents that isn't kept up to date. I could have blocked South American and Asian IP addresses, since all the hundreds of thousands of IPs the botnet used seemed to be from there, but I didn't want to. Maybe that would be another option for your personal site though. Others block IPs based on the owners of IP blocks (e.g. cloud/VPS hosting companies).
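As a sketch of the User-Agent approach: deny browsers too old to complete a modern TLS handshake. The patterns below are illustrative examples of implausible User-Agent strings, not a vetted list; anything like this should be tuned against your own logs.

```apache
# Sketch: reject User-Agents claiming to be ancient browsers that
# could not really negotiate modern HTTPS (patterns are examples,
# not a complete or vetted list).
<If "%{HTTP_USER_AGENT} =~ /MSIE [1-6]\.|Firefox\/[1-9]\./">
    Require all denied
</If>
```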

Lots of answers, but I agree no single one is perfect for every situation.