Implement a more robust malware detector #7748

ewjoachim · 2020-04-05T23:48:45Z

Hello there. I'm probably going to say a bunch of obvious things, sorry in advance :/

The current YARA-based malware detector can be circumvented easily:

It's regex-based, and the regexes don't account for all the leeway in writing Python (e.g. "import builtins" written with extra spaces between the two words will happily go undetected, because not all spaces have been marked as repeatable).
Even if it were AST-based, I'm afraid it would still be hard to tame this snake. I mean... one would think they've been thorough, and then they realize timeit does eval, or that platform has a popen method... Did I mention that ().__class__.__bases__[0].__subclasses__()[88] is <class 'zipimport.zipimporter'>? I think it's endless...
That being said, maybe there IS such a thing as being thorough. I doubt it. Maybe detecting nearly all dunder methods AND unusual standard lib modules and functions AND a few builtins... Maybe a whitelist? I'm afraid this would produce more noise than signal, but maybe we should try.
(For reference, https://ctf-wiki.github.io/ctf-wiki/pwn/linux/sandbox/python-sandbox-escape/)
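
A minimal sketch of the two points above, not the actual Warehouse rules: NAIVE_RULE is a hypothetical pattern that expects exactly one space, and ast_flags is a hypothetical AST check for builtins imports and dunder attribute access. It shows the AST check catching what the regex misses, and then missing an indirect route through timeit, which compiles and runs the string it is given.

```python
import ast
import re

# Hypothetical rule in the spirit of a pattern that expects exactly one space.
NAIVE_RULE = re.compile(r"import builtins")


def regex_flags(source):
    """Return True if the naive regex matches anywhere in the source."""
    return NAIVE_RULE.search(source) is not None


def ast_flags(source):
    """Return True on any import of builtins or any dunder attribute access."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(alias.name == "builtins" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            if node.module == "builtins":
                return True
        elif isinstance(node, ast.Attribute):
            if node.attr.startswith("__") and node.attr.endswith("__"):
                return True
    return False


# Extra spaces defeat the regex but not the parser.
spaced = "import   builtins\n"
# The dunder walk from the comment above is visible in the AST.
dunder_walk = "().__class__.__bases__[0].__subclasses__()\n"
# Neither check sees anything odd here, yet the string gets executed.
indirect = "import timeit\ntimeit.timeit('print(1)', number=1)\n"

for name, sample in [("spaced", spaced), ("dunder_walk", dunder_walk), ("indirect", indirect)]:
    print(name, "regex:", regex_flags(sample), "ast:", ast_flags(sample))
```

Each rule added to cover a case like the last one widens the net a little, which is where the worry about noise versus signal comes from.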

So... There is one remaining way to know what a script does: executing it in a sandboxed environment, but this raises questions too:

How to sandbox Python? My expertise there is close to zero, but I seem to recall PyPy (yes, with a y) could do that (and the idea of including PyPy in PyPI is a nice level of meta ;) )
Is it even possible to sandbox Python in such a way that it doesn't know it's sandboxed? Because if it can figure out it's sandboxed, it can still deactivate the malicious parts, and then it's almost useless...
(One advantage of this approach would be to be able to extract metadata from sdists though, which I believe is another problem that exists out there)

So many questions... I hope this hasn't already been answered in another issue, I couldn't find anything when I searched.

Ping @xmunoz and @woodruffw to continue the discussion.

pradyunsg · 2020-04-06T00:38:44Z

There was a fairly public effort, pysandbox, to create a "python sandbox" that was discontinued since it's really really [redacted] difficult to sandbox Python in-process.

More details are in this LWN article: https://lwn.net/Articles/574215/

ewjoachim · 2020-04-06T07:07:09Z

Thanks a lot! This goes in the direction we were heading, I guess, leaving at least a few options that were suggested:

PyPy (but I’m afraid the execution context would be so different that it would make it trivially easy to detect the sandbox)
Solutions around seccomp and namespaces are hinted at, which I believe could point toward Docker. A bit of googling says I may have to read more about SELinux, SMACK, AppArmor, Tomoyo, and this feels like a rabbit hole :)

I have clearly reached my competency level, and continued a bit beyond. I’d love to learn more, but I won’t be able to suggest a lot, and at this point anything I might add will likely be a laughable proof of the Dunning-Kruger effect...

woodruffw · 2020-05-28T16:40:29Z

PEP 578 + the new audit API in Python 3.8 would probably work well for this purpose. We'd still need some amount of sandboxing, though.
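
As a rough sketch of what the audit API offers (assuming Python 3.8+): sys.addaudithook registers a callback that receives every audit event raised in the process. The observe helper, the WATCHED_EVENTS selection and the "<untrusted>" filename are assumptions made for illustration. The hook only records what happens; the code still runs, which is why sandboxing would still be needed.

```python
import sys

# A few event names from CPython's audit events table; the selection is arbitrary.
WATCHED_EVENTS = frozenset(
    {"compile", "exec", "import", "open", "os.system", "subprocess.Popen", "socket.connect"}
)

_observed = []


def _record(event, args):
    """Audit hook: remember watched events. Hooks cannot be removed once added."""
    if event in WATCHED_EVENTS:
        _observed.append(event)


sys.addaudithook(_record)


def observe(source):
    """Run source (hypothetical helper) and return the watched events it raised."""
    start = len(_observed)
    code = compile(source, "<untrusted>", "exec")  # raises a "compile" event
    exec(code, {"__name__": "__untrusted__"})      # raises an "exec" event
    return _observed[start:]


events = observe("value = eval('1 + 1')")
print(events)
```

Because the events are raised by the interpreter itself, the eval call is reported however the source text was obfuscated.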

ewjoachim · 2021-09-03T21:40:17Z

I'm now more convinced that the way to go would rather be to provide a way for 3rd parties to be warned anytime a release is uploaded, and an API to report status on those ("found safe" / "malware" / ...), along the lines of the API we already have for CVEs etc. Implementing malware detection within the warehouse codebase and/or on the PyPI server itself is not the solution.

abitrolly · 2021-09-07T18:10:47Z

> a way for 3rd parties to be warned anytime a release is uploaded

Looks like the only API for that is the XML-based RSS feeds: https://warehouse.pypa.io/api-reference/feeds.html
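
For illustration, a third party polling such a feed might look roughly like this. The sketch assumes a plain RSS 2.0 layout with item, title and link elements; SAMPLE_FEED is made-up data rather than real PyPI output, and parse_items and unseen are hypothetical helpers. Fetching the feed over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for a downloaded feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example release feed</title>
    <item>
      <title>example-package 1.0.1</title>
      <link>https://example.invalid/project/example-package/1.0.1/</link>
    </item>
    <item>
      <title>other-package 0.3.0</title>
      <link>https://example.invalid/project/other-package/0.3.0/</link>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return a list of {"title", "link"} dicts, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def unseen(items, seen_links):
    """Return items not seen on a previous poll and remember their links."""
    fresh = [item for item in items if item["link"] not in seen_links]
    seen_links.update(item["link"] for item in fresh)
    return fresh


seen = set()
for release in unseen(parse_items(SAMPLE_FEED), seen):
    print("new release:", release["title"])
```

A feed only holds the most recent entries, so a poller that runs too rarely can miss releases.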

abitrolly · 2021-09-07T19:29:11Z

For effective event propagation through the network, it may be possible to use https://docs.libp2p.io/concepts/publish-subscribe/ so that every interested party could run their own node that receives notifications automatically, without polling PyPI endpoints.

As an extension to that, nodes can sign the events/package hashes with the result of validation checks and submit them to the network the same way.
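
A very rough sketch of what a signed verdict could look like, under loud assumptions: the message layout and the helper names are invented here, and HMAC with a shared secret is only a stand-in because the standard library has no public-key signatures. A real peer-to-peer network would need asymmetric signatures so that anyone can verify a node's verdict without being able to forge it.

```python
import hashlib
import hmac
import json


def package_digest(data):
    """Return the SHA-256 hex digest of a package file's bytes."""
    return hashlib.sha256(data).hexdigest()


def sign_verdict(secret, digest, verdict):
    """Bundle a digest and a verdict, and attach an HMAC-SHA256 signature."""
    payload = json.dumps({"sha256": digest, "verdict": verdict}, sort_keys=True)
    signature = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_verdict(secret, message):
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(
        secret, message["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, message["signature"])


secret = b"demo-only-secret"  # placeholder, never hard-code a real key
message = sign_verdict(secret, package_digest(b"abc"), "found safe")
print(verify_verdict(secret, message))
```

Signing the file hash rather than the package name ties the verdict to one exact artifact.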

That would require quite a bit of prototyping, so I propose participating in Gitcoin grants to attract more people who can help: psf/fundable-packaging-improvements#40

dstufft · 2023-05-23T13:46:28Z

This is gone now #13647

xmunoz added the malware-detection label Apr 6, 2020

ewjoachim changed the title ~~Implement more robust malware detector~~ Implement a more robust malware detector Apr 6, 2020

xmunoz mentioned this issue Apr 26, 2021

Productionize Malware Detection psf/fundable-packaging-improvements#38

Open

jspeed-meyers mentioned this issue May 13, 2021

Reduce Typosquatting Harm via Social Distancing for Top PyPI Packages #9527

Open

dstufft closed this as not planned May 23, 2023
