Implement a more robust malware detector #7748

Closed
ewjoachim opened this issue Apr 5, 2020 · 7 comments
Labels
malware-detection Issues related to automated malware detection.

Comments

@ewjoachim
Contributor

ewjoachim commented Apr 5, 2020

Hello there. I'm probably going to say a bunch of obvious things, sorry in advance :/

Current YARA-based malware detector can be circumvented easily:

  • It's regex-based, and the regexes don't account for all the leeway in writing Python (e.g. import builtins written with extra spaces will happily go undetected, because not every space in the patterns has been marked as repeatable)
  • Even if it were AST-based, I'm afraid it would still be hard to tame this snake. I mean... one would think they've been thorough and then they realize timeit does eval or that platform has a popen method... Did I mention that ().__class__.__bases__[0].__subclasses__()[88] is <class 'zipimport.zipimporter'>? I think it's endless...
    That being said, maybe there IS such a thing as being thorough. I doubt it. Maybe detecting nearly all dunder methods AND unusual standard lib modules and functions AND a few builtins... Maybe a whitelist? I'm afraid this would make more noise than signal, but maybe we should try.
    (For reference, https://ctf-wiki.github.io/ctf-wiki/pwn/linux/sandbox/python-sandbox-escape/)
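The two points above can be illustrated with a minimal sketch. The pattern below is a hypothetical stand-in for a regex rule (it is not one of the actual YARA rules), and the helper names are made up for the example. Extra whitespace defeats the regex, an AST walk sees through the whitespace, and a dynamically built import slips past both.

```python
import ast
import re

# Hypothetical rule in the spirit of a regex signature: exactly one space.
NAIVE_RULE = re.compile(r"import builtins")


def regex_flags(source):
    """Return True if the naive regex rule matches the source text."""
    return NAIVE_RULE.search(source) is not None


def ast_imports(source):
    """Collect module names that appear in import statements."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names


spaced = "import   builtins\n"                 # extra spaces between the tokens
dynamic = "m = __import__('built' + 'ins')\n"  # module name assembled at runtime

print(regex_flags(spaced))    # the regex misses the spaced variant
print(ast_imports(spaced))    # the AST walk still sees the import
print(ast_imports(dynamic))   # neither approach sees the dynamic import
```

The last case is the "endless" part: the name only exists once the code runs, so no amount of static pattern tightening recovers it.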

So... There is one remaining way to know what a script does: executing it in a sandboxed environment, but this raises questions too:

  • How to sandbox Python? My expertise there is close to zero, but I seem to recall PyPy (yes, with a y) could do that (and the idea of including PyPy in PyPI is a nice level of meta ;) )
  • Is it even possible to sandbox Python in a way that the code can't tell it's sandboxed? Because if it can figure out it's sandboxed, it can still deactivate the malicious parts, and then it's almost useless...
    (One advantage of this approach would be to be able to extract metadata from sdists though, which I believe is another problem that exists out there)

So many questions... I hope this hasn't already been answered in another issue, I couldn't find anything when I searched.

Ping @xmunoz and @woodruffw to continue the discussion.

@pradyunsg
Contributor

There was a fairly public effort, pysandbox, to create a "python sandbox" that was discontinued since it's really really [redacted] difficult to sandbox Python in-process.

More details are in this LWN article: https://lwn.net/Articles/574215/

@xmunoz added the malware-detection label Apr 6, 2020
@ewjoachim
Contributor Author

Thanks a lot! This goes in the direction we were heading I guess, leaving at least a few options that were suggested:

  • PyPy (but I’m afraid the execution context would be so different that it would make it trivially easy to detect the sandbox)
  • Solutions around seccomp and namespaces are hinted at, which I believe could point toward Docker. A bit of googling says I may have to read more about SELinux, SMACK, AppArmor, Tomoyo, and this feels like a rabbit hole :)

I have clearly reached my competency level, and continued a bit beyond. I’d love to learn more but I won’t be able to suggest a lot, and at this point, anything I might add will likely be a laughable proof of the Dunning-Kruger effect...

@ewjoachim changed the title from "Implement more robust malware detector" to "Implement a more robust malware detector" Apr 6, 2020
@woodruffw
Member

PEP 578 + the new audit API in Python 3.8 would probably work well for this purpose. We'd still need some amount of sandboxing, though.
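As a rough illustration of what the audit API offers, here is a minimal sketch using sys.addaudithook (Python 3.8+). The observe helper and the set of watched events are assumptions made for this example. Note that the hook only observes: the sample still runs with full privileges, which is why some amount of sandboxing would still be needed.

```python
import os
import sys

# Event names from CPython's audit events table; this selection is arbitrary.
WATCHED = {"open", "os.system", "subprocess.Popen", "socket.connect"}

_seen = []
_recording = False


def _hook(event, args):
    # Audit hooks cannot be removed once added, so a flag gates recording.
    if _recording and event in WATCHED:
        _seen.append(event)


sys.addaudithook(_hook)


def observe(source, **names):
    """Run source and return the watched audit events it raised (hypothetical helper)."""
    global _recording
    code = compile(source, "<sample>", "exec")
    _seen.clear()
    _recording = True
    try:
        exec(code, dict(names))  # NOT sandboxed: the code really executes
    finally:
        _recording = False
    return list(_seen)


print(observe("f = open(path); f.close()", path=os.devnull))
print(observe("x = 1 + 1"))
```

Because the events are raised at runtime, this catches the open call no matter how the source text obfuscates it, which is the gap that static regex or AST rules leave.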

@ewjoachim
Contributor Author

I'm now more convinced that the way to go would rather be to provide a way for 3rd parties to be warned anytime a release is uploaded, and an API to report status on those ("found safe" / "malware" / ...), along the lines of the API we already have for CVEs etc. Implementing malware detection within the warehouse codebase and/or on the PyPI server itself is not the solution.

@abitrolly
Contributor

a way for 3rd parties to be warned anytime a release is uploaded

Looks like the only API for that is the XML-based RSS feeds: https://warehouse.pypa.io/api-reference/feeds.html
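A third party relying on those feeds would have to poll and de-duplicate on its own. Here is a minimal sketch of that, assuming the feed is plain RSS 2.0 with a title and a link per item. The sample document below is made up for the example and is not real feed output, and new_items is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the general shape of an RSS 2.0 feed (not real PyPI output).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example updates feed</title>
    <item>
      <title>example-package 1.0.0</title>
      <link>https://pypi.org/project/example-package/1.0.0/</link>
    </item>
    <item>
      <title>another-example 0.2.1</title>
      <link>https://pypi.org/project/another-example/0.2.1/</link>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs not seen before and remember their links."""
    fresh = []
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append((title, link))
    return fresh


seen = set()
print(new_items(SAMPLE_FEED, seen))  # first poll: both items are new
print(new_items(SAMPLE_FEED, seen))  # second poll of the same feed: nothing new
```

In practice the feed text would come from an HTTP request on a timer, and anything published between two polls that has already scrolled off the feed would be missed.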

@abitrolly
Contributor

For effective event propagation through the network, it may be possible to use https://docs.libp2p.io/concepts/publish-subscribe/ so that every interested party could run their own node that receives notifications automatically, without polling PyPI endpoints.

As an extension to that, nodes can sign the events/package hashes with the result of validation checks and submit them to the network the same way.
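To make the signing idea a bit more concrete, here is a minimal standard-library sketch. It uses an HMAC with a shared key purely as a stand-in: a real network of independent nodes would need asymmetric signatures so that anyone can verify a report without being able to forge one. The field names and both helpers are hypothetical.

```python
import hashlib
import hmac
import json


def make_report(package_bytes, verdict, node_key):
    """Build a signed report binding a verdict to a package hash."""
    payload = {
        "sha256": hashlib.sha256(package_bytes).hexdigest(),
        "verdict": verdict,  # e.g. "found safe" or "malware"
    }
    # Canonical JSON so signer and verifier hash the same bytes.
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    signature = hmac.new(node_key, message, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_report(report, node_key):
    """Check that the signature matches the payload."""
    message = json.dumps(report["payload"], sort_keys=True).encode("utf-8")
    expected = hmac.new(node_key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, report["signature"])


key = b"demo-key-not-a-real-secret"
report = make_report(b"fake sdist contents", "found safe", key)
print(verify_report(report, key))

# Changing the verdict after signing invalidates the report.
report["payload"]["verdict"] = "malware"
print(verify_report(report, key))
```

Signing the hash together with the verdict means a report cannot be replayed against a different file or have its verdict flipped in transit.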

That would require quite a bit of prototyping, so I propose participating in Gitcoin grants to attract more people who can help: psf/fundable-packaging-improvements#40

@dstufft
Member

dstufft commented May 23, 2023

This is gone now #13647

@dstufft closed this as not planned May 23, 2023