Investigation on residential IP botnets and fingerprinting

### Is your feature request related to a problem? Please describe.

Hello,

I’m Neil, and I work for the French nonprofit organization [La Contre-Voie](https://lacontrevoie.fr/). We host a Gitea instance on [git.lacontrevoie.fr](https://git.lacontrevoie.fr/), which is (partly) protected by Anubis.

I would like to open a meta-issue about the recent traffic waves that hit our service, and how we can fight them.

### Environment information

We use Docker on Debian Stable, with Caddy as a reverse proxy. Its `forward_auth` directive only consults Anubis for file browsing, pull requests, issues and related routes; repository main pages and account pages are not restricted.
Docker, Caddy and Anubis are each running their latest version.

Here is our Anubis configuration file. It is quite aggressive, but it works well for us:
```
bots:
  - import: (data)/bots/_deny-pathological.yaml
  - import: (data)/bots/aggressive-brazilian-scrapers.yaml
  - import: (data)/meta/ai-block-aggressive.yaml
  - import: (data)/crawlers/_allow-good.yaml
  - import: (data)/clients/x-firefox-ai.yaml
  - import: (data)/common/keep-internet-working.yaml

  - name: generic-browser
    user_agent_regex: >-
      Mozilla|Opera
    action: WEIGH
    weight:
      adjust: 10

dnsbl: false

openGraph:
  enabled: false
  considerHost: false
  ttl: 24h

status_codes:
  CHALLENGE: 200
  DENY: 403

store:
  backend: memory
  parameters: {}

thresholds:
  - name: minimal-suspicion
    expression: weight <= 0
    action: CHALLENGE
    challenge:
      algorithm: fast
      difficulty: 4
      report_as: 4
  - name: mild-suspicion
    expression:
      all:
        - weight > 0
        - weight < 10
    action: CHALLENGE
    challenge:
      algorithm: fast
      difficulty: 5
      report_as: 5
  - name: moderate-suspicion
    expression:
      all:
        - weight >= 10
        - weight < 20
    action: CHALLENGE
    challenge:
      algorithm: fast
      difficulty: 5
      report_as: 5
  - name: mild-proof-of-work
    expression:
      all:
        - weight >= 20
        - weight < 30
    action: CHALLENGE
    challenge:
      algorithm: fast
      difficulty: 6
      report_as: 6
  - name: extreme-suspicion
    expression: weight >= 30
    action: CHALLENGE
    challenge:
      algorithm: fast
      difficulty: 8
      report_as: 8
```

### What happened?

Yesterday we received a huge traffic wave that lasted 17 hours, from 14:00 to 06:00. We managed to mitigate the wave around 02:00.

*On the second graph below, the yellow curve is Caddy, and the green one is Gitea.*

<img width="1354" height="580" alt="Image" src="https://github.com/user-attachments/assets/200712ac-8de5-4811-8fd3-4b489b87b597" />

Over that 17-hour time span, our server received 8 million queries originating from 680,000 different IPs, which represents 277 GB of Internet traffic. The log file size was 2.1 GB.

During those 17 hours, no single IP exceeded 800 queries.
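For scale, here is a quick back-of-envelope calculation from the figures above. It assumes decimal gigabytes and traffic spread evenly over the window, which real traffic is not, so treat the results as rough averages only.

```python
# Back-of-envelope averages from the figures reported above.
# Assumptions: 1 GB = 10**9 bytes, and traffic spread evenly over the window.
QUERIES = 8_000_000
UNIQUE_IPS = 680_000
HOURS = 17
TRAFFIC_BYTES = 277 * 10**9
MAX_QUERIES_PER_IP = 800

avg_queries_per_ip = QUERIES / UNIQUE_IPS
avg_queries_per_second = QUERIES / (HOURS * 3600)
avg_bytes_per_query = TRAFFIC_BYTES / QUERIES
busiest_ip_per_minute = MAX_QUERIES_PER_IP / (HOURS * 60)

print(f"{avg_queries_per_ip:.1f} queries per IP on average")
print(f"{avg_queries_per_second:.0f} queries per second on average")
print(f"{avg_bytes_per_query / 1000:.1f} kB per query on average")
print(f"{busiest_ip_per_minute:.2f} queries per minute for the busiest IP")
```

Under these assumptions, even the busiest IP averaged less than one query per minute, well below any per-IP threshold that would not also hit regular users.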

We’ve tried to identify a few culprit AS numbers, but a tremendous number of ASes are involved and none of them really stands out among the others.

No Tencent, no Alibaba, no Huawei, no Chinanet (not among the "top ASes", at least).

<img width="1003" height="654" alt="Image" src="https://github.com/user-attachments/assets/294f05a3-7645-4b15-bb7f-db68b2ad924e" />

Furthermore, most of them seem to originate from **residential IPs**, and they are able to solve **level 6 challenges**. To finally slow down the attack at 02:00, we had to raise the challenge difficulty to level 7 or 8 for everyone, which means every visitor had to sit through the Anubis wall for roughly 30 seconds to 1 minute, or even more. (I don’t mind the cute catgirl, but the wait is painful for end users.)
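To give a sense of why each extra level hurts so much, here is a rough sketch of the expected work per challenge. It assumes the difficulty is the number of leading zero hex digits required in a SHA-256 hash; the `solve` helper and the challenge string are purely illustrative and are not Anubis code.

```python
import hashlib
from itertools import count


def expected_attempts(difficulty: int) -> int:
    """Average number of hashes needed if each leading hex digit must be 0."""
    return 16 ** difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce; only practical for small difficulties."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


for level in (4, 5, 6, 7, 8):
    print(f"difficulty {level}: ~{expected_attempts(level):,} hashes on average")
```

Under that assumption, each level multiplies the expected work by 16, so going from level 6 to level 8 costs a legitimate visitor about 256 times more hashing, exactly as it does for the bots.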

### Regular situation

Usually, Anubis blocks regular bots quite well. Here are the stats from last week, a pretty normal week for our Gitea; the log file spans 7 days and weighs 28 MB, for almost 1 GB of traffic.

<img width="995" height="638" alt="Image" src="https://github.com/user-attachments/assets/df6e211b-69f6-4480-ab77-169c41b25f25" />

We get a lot of traffic from Huawei, China Mobile and similar networks, but it is mostly blocked (or challenged) by Anubis and we are not impacted much.

### Solution you would like.

#### Moving forward

Understanding how this kind of attack works and blocking it is quite a pain, because we can hardly weigh the souls of residential IPs, or even AS numbers; even though those Brazilian Telefonica ASes are quite common, they’re not the only ones, and we can’t just outright block a whole country.

I think the attack may originate from some kind of botnet built on a popular desktop or mobile application, whose operator sells or gives organizations the right to run queries from millions of devices behind residential IPs. Unfortunately, this is purely hypothetical; we don’t have enough metrics to know for sure.

**Addendum:** I got linked to a [blog post](https://jan.wildeboer.net/2025/04/Web-is-Broken-Botnet-Part-2/) explaining how some app SDKs monetize residential IP connections by letting other companies send queries through them. It matches our situation quite well.

Thus, I think Anubis should integrate, as an **opt-in** feature, further **fingerprinting** and web client analysis to give us a better picture of our threat model. Those fingerprinting mechanisms could then be used to weigh the souls of those IPs once we know them better.

We can take inspiration from the [EFF’s fingerprinting tool](https://coveryourtracks.eff.org/) to identify which kinds of fingerprinting signals can be used:
- Browser plugins
- Installed fonts
- Screen size
- Canvas fingerprint hash
- WebGL fingerprint hash
- Platform, CPU model, GPU model, CPU class and threads
- Device RAM
- Touch support
- AudioContext fingerprint
- …

It seems that we can get all this information from JavaScript APIs in some way.
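As a rough illustration of how such signals could be used server-side, the sketch below hashes a set of client-reported attributes into a stable identifier and counts how many distinct IPs share it. The attribute names, the `fingerprint_id` and `ips_per_fingerprint` helpers and the sample data are all hypothetical; this is not an existing Anubis feature.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint_id(attributes: dict) -> str:
    """Hash client-reported attributes into a stable, order-independent ID."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def ips_per_fingerprint(observations):
    """Map each fingerprint ID to the set of IPs that presented it."""
    seen = defaultdict(set)
    for ip, attributes in observations:
        seen[fingerprint_id(attributes)].add(ip)
    return seen


# Hypothetical sample: two IPs with an identical client, one with a different one.
observations = [
    ("192.0.2.10", {"screen": "1920x1080", "touch": False, "threads": 8}),
    ("198.51.100.7", {"threads": 8, "touch": False, "screen": "1920x1080"}),
    ("203.0.113.5", {"screen": "390x844", "touch": True, "threads": 6}),
]

for fp, ips in ips_per_fingerprint(observations).items():
    print(fp, len(ips))
```

A fingerprint shared by thousands of unrelated residential IPs within a short window could then raise the weight of requests carrying it, instead of blocking by IP or AS. Popular devices legitimately share fingerprints too, so this would be one signal among others rather than proof.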

I know those fingerprinting techniques are privacy-invasive by design, but let’s be clear: this data would be used solely for security purposes, not for advertising, data reselling or user behavior profiling.

### Describe alternatives you have considered.

Blocking an IP after a certain number of queries no longer works, because their IP pool is too big, which makes fail2ban-like software ineffective.

Blocking AS numbers at our reverse proxy no longer works either, because the IPs come from too many ASes, including residential ones. Rate limiting per IP or per AS number fails for the same reason.
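The problem can be reproduced with a toy per-IP limiter. This is a minimal sketch with made-up traffic, not a model of fail2ban or of the real logs; the `blocked_requests` helper and the limit value are illustrative only.

```python
from collections import Counter


def blocked_requests(requests, limit):
    """Count requests rejected once an IP exceeds `limit` requests in the window."""
    seen = Counter()
    blocked = 0
    for ip in requests:
        seen[ip] += 1
        if seen[ip] > limit:
            blocked += 1
    return blocked


LIMIT = 100  # illustrative per-IP threshold for one time window

# Same total volume (100,000 requests), two very different shapes.
single_source = ["192.0.2.1"] * 100_000
distributed = [f"ip-{i % 10_000}" for i in range(100_000)]  # 10 requests per IP

print(blocked_requests(single_source, LIMIT))  # 99900
print(blocked_requests(distributed, LIMIT))    # 0
```

With the same total volume, the single noisy source is almost entirely blocked while the distributed pool passes untouched, because no individual address ever reaches the limit.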

Challenging suspicious IPs with Anubis still barely works, but they are now able to solve hard challenges, and raising the difficulty severely degrades the user experience.

Thank you for your time, please share your feedback!

