Investigation on residential IP botnets and fingerprinting #1313

@neil-lcv-cs

Is your feature request related to a problem? Please describe.

Hello,

I’m Neil, and I work for the French nonprofit organization La Contre-Voie. We host a Gitea instance at git.lacontrevoie.fr, which is (partly) protected by Anubis.

I would like to open a meta-issue about the recent traffic waves that hit our service, and how we can fight them.

Environment information

We use Docker on Debian Stable, with Caddy as a reverse proxy. The forward_auth directive only asks Anubis when a visitor browses files, pull requests, issues or related routes; repository main pages and account pages are not restricted.
Docker, Caddy and Anubis are each running their latest version.

Here is our Anubis configuration file. Quite aggressive, though it works quite well for us:

bots:
  - import: (data)/bots/_deny-pathological.yaml
  - import: (data)/bots/aggressive-brazilian-scrapers.yaml
  - import: (data)/meta/ai-block-aggressive.yaml
  - import: (data)/crawlers/_allow-good.yaml
  - import: (data)/clients/x-firefox-ai.yaml
  - import: (data)/common/keep-internet-working.yaml

  - name: generic-browser
    user_agent_regex: >-
      Mozilla|Opera
    action: WEIGH
    weight:
      adjust: 10

dnsbl: false

openGraph:
  enabled: false
  considerHost: false
  ttl: 24h

status_codes:
  CHALLENGE: 200
  DENY: 403

store:
  backend: memory
  parameters: {}

thresholds:
  - name: minimal-suspicion
    expression: weight <= 0
    action: CHALLENGE
    challenge:
      algorithm: fast
      difficulty: 4
      report_as: 4
  - name: mild-suspicion
    expression:
      all:
        - weight > 0
        - weight < 10
    action: CHALLENGE
    challenge:
      algorithm: fast
      difficulty: 5
      report_as: 5
  - name: moderate-suspicion
    expression:
      all:
        - weight >= 10
        - weight < 20
    action: CHALLENGE
    challenge:
      algorithm: fast
      difficulty: 5
      report_as: 5
  - name: mild-proof-of-work
    expression:
      all:
        - weight >= 20
        - weight < 30
    action: CHALLENGE
    challenge:
      algorithm: fast
      difficulty: 6
      report_as: 6
  - name: extreme-suspicion
    expression: weight >= 30
    action: CHALLENGE
    challenge:
      algorithm: fast
      difficulty: 8
      report_as: 8

What happened?

Yesterday we received a huge traffic wave that lasted 17 hours, from 14:00 to 06:00. We managed to mitigate the wave around 02:00.

On the second graph below, the yellow curve is Caddy, and the green one is Gitea.

Image

Over this 17-hour time span, our server received 8 million queries originating from 680,000 different IPs, which represents 277 GB of Internet traffic. The log file size was 2.1 GB.

During those 17 hours, no single IP exceeded 800 queries.
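
To put these numbers in perspective, here is a minimal Python sketch that derives a few averages from the figures quoted above. The inputs are the rounded values from this report, and treating "GB" as 10^9 bytes is an assumption, so the results are only orders of magnitude.

```python
# Figures quoted in this report (rounded in the original).
QUERIES = 8_000_000
DISTINCT_IPS = 680_000
DURATION_HOURS = 17
TRAFFIC_BYTES = 277 * 10**9  # assumption: "GB" means 10^9 bytes


def wave_averages(queries, distinct_ips, hours, traffic_bytes):
    """Return simple averages describing how thinly the load was spread."""
    seconds = hours * 3600
    return {
        "queries_per_ip": queries / distinct_ips,
        "queries_per_second": queries / seconds,
        "queries_per_ip_per_hour": queries / distinct_ips / hours,
        "bytes_per_query": traffic_bytes / queries,
    }


if __name__ == "__main__":
    stats = wave_averages(QUERIES, DISTINCT_IPS, DURATION_HOURS, TRAFFIC_BYTES)
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

With these inputs, each IP sent about 12 queries on average over the whole wave (less than one per hour), while the server as a whole saw roughly 131 queries per second. The load only becomes visible in aggregate, not per client.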

We’ve tried to identify a couple of culprit AS numbers, but a tremendous number of ASes are involved and none of them really stands out from the others.

No Tencent, no Alibaba, no Huawei, no Chinanet (not in the "top ASes", at least).

Image

Furthermore, most of them seem to originate from residential IPs. They are able to solve level 6 challenges. To finally slow down the attack at 02:00, we had to further increase the challenge difficulty to level 7 or 8 for everyone, which means we had to inflict the Anubis wall on everyone for ~30 seconds to 1 minute, or even more. (I don’t mind the cute catgirl, but the wait is painful for end users.)
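
To illustrate why each extra difficulty level hurts so much, here is a rough Python sketch of the expected proof-of-work cost. It assumes that a difficulty of N means "the hash must start with N zero hex digits"; that is my understanding of the challenge, not something verified against the Anubis source. The function names are hypothetical.

```python
def expected_attempts(difficulty: int) -> int:
    """Average number of hashes tried before one passes.

    Assumption: a hash passes when it starts with `difficulty` zero hex
    digits, and each hex digit is uniformly distributed, so one attempt
    succeeds with probability 16 ** -difficulty.
    """
    if difficulty < 0:
        raise ValueError("difficulty must be non-negative")
    return 16 ** difficulty


def relative_cost(new: int, old: int) -> float:
    """How many times more work level `new` needs compared with level `old`."""
    return expected_attempts(new) / expected_attempts(old)


if __name__ == "__main__":
    for level in (4, 5, 6, 7, 8):
        print(
            f"difficulty {level}: ~{expected_attempts(level):,} hashes, "
            f"{relative_cost(level, 6):g}x the cost of level 6"
        )
```

Under that assumption, moving from level 6 to level 8 multiplies the work by 256 for every visitor, legitimate or not, which would explain why raising the difficulty is such a blunt instrument.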

Regular situation

Usually, Anubis blocks regular bots quite well. Here are the stats from last week, a pretty normal week for our Gitea; the log file spans 7 days and weighs 28 MB, for almost 1 GB of traffic.

Image

We get a lot of Huawei, China Mobile and similar networks, though the traffic is mostly blocked (or challenged) by Anubis and we do not get impacted that much.

Solution you would like.

Moving forward

Understanding how this kind of attack works and blocking it is quite a pain, because we can hardly weigh the souls of residential IPs, or even of AS numbers. Even though those Brazilian Telefonica ASes are quite common, they’re not the only ones, and we can’t just outright block a whole country.

I think the attack may originate from some kind of botnet that rides on a popular desktop or mobile application, whose publisher sells or gives organizations the right to run queries on millions of devices behind residential IPs. Unfortunately, this is purely hypothetical; we don’t have enough metrics to know for sure.

Addendum: I was pointed to a blog post explaining how some app SDKs monetize residential IP connections by allowing other companies to send queries through them. It matches our situation quite well.

Thus, I think Anubis needs to integrate, as an opt-in feature, further fingerprinting and web client analysis to get a better picture of our threat model. Those fingerprinting mechanisms could then be used to weigh the souls of those IPs once we know them better.

We can get inspiration from the EFF’s fingerprinting tool to identify which kinds of fingerprinting can be used:

  • Browser plugins
  • Installed fonts
  • Screen size
  • Canvas fingerprint hash
  • WebGL fingerprint hash
  • Platform, CPU model, GPU model, CPU class and threads
  • Device RAM
  • Touch support
  • AudioContext fingerprint

It seems that we can get all this information from JavaScript APIs in some way.
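
As a rough idea of what the server side could do with such signals, here is a minimal Python sketch. It assumes the client has already reported its attributes as a flat dictionary; the attribute names, the sample values and both function names are hypothetical, and nothing here reflects an existing Anubis API.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint_id(attributes: dict) -> str:
    """Hash client-reported attributes into a stable identifier.

    Keys are sorted before hashing, so the same set of attributes always
    gives the same digest regardless of the order they were reported in.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ips_per_fingerprint(observations):
    """Map each fingerprint to the set of IPs that reported it.

    `observations` is an iterable of (ip, attributes) pairs.
    """
    seen = defaultdict(set)
    for ip, attributes in observations:
        seen[fingerprint_id(attributes)].add(ip)
    return seen


if __name__ == "__main__":
    # Hypothetical sample, using documentation-only IP ranges.
    sample = [
        ("192.0.2.1", {"screen": "1920x1080", "threads": 8, "touch": False}),
        ("192.0.2.2", {"touch": False, "threads": 8, "screen": "1920x1080"}),
        ("198.51.100.7", {"screen": "390x844", "threads": 6, "touch": True}),
    ]
    for fingerprint, ips in ips_per_fingerprint(sample).items():
        print(fingerprint[:12], len(ips))
```

A fingerprint shared by an unusually large number of IPs could then feed into the weight, though popular devices legitimately share fingerprints too, so this would only be one signal among others.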

I know those fingerprinting techniques are privacy-invasive by design, but let’s be clear: this data will be used solely for security purposes, not for advertising / data reselling / user behavior profiling.

Describe alternatives you have considered.

Blocking an IP after a certain number of queries doesn’t work anymore, because their IP pool is too big, which makes fail2ban-like software ineffective.

Blocking AS numbers at our reverse proxy doesn’t work anymore, because the IPs come from too many ASes, including residential ones. Rate limiting per IP or per AS number doesn’t work either, for the same reason.
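
The following Python sketch illustrates why a per-IP cap stops helping once the load is spread out. The traffic shapes are made-up examples rather than our measured data, and `blocked_share` is a hypothetical helper, not part of fail2ban, Caddy or Anubis.

```python
from collections import Counter


def blocked_share(requests_per_ip: Counter, limit: int) -> float:
    """Fraction of all requests that a per-IP cap of `limit` would reject.

    Each IP is served its first `limit` requests; anything beyond that
    counts as blocked.
    """
    total = sum(requests_per_ip.values())
    if total == 0:
        return 0.0
    blocked = sum(max(0, count - limit) for count in requests_per_ip.values())
    return blocked / total


if __name__ == "__main__":
    # Hypothetical traffic shapes, not measured data.
    single_scraper = Counter({"203.0.113.5": 10_000})
    distributed = Counter({f"ip-{i}": 12 for i in range(1_000)})
    print("single scraper:", blocked_share(single_scraper, limit=100))
    print("distributed:   ", blocked_share(distributed, limit=100))
```

A single noisy scraper loses almost all of its requests to the cap, while a pool of clients that each stay far below the limit loses none, even though the total volume is larger.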

Challenging suspicious IPs with Anubis still barely works, but they are now able to solve hard challenges, and it severely degrades the user experience.

Thank you for your time, please share your feedback!
