Skip to content

Added rules regarding metaprogramming detection in yara #7425

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 10, 2020

Conversation

ewjoachim
Copy link
Contributor

@ewjoachim ewjoachim commented Feb 23, 2020

Without those additions, accessing eval/exec without mentionning neither "eval" nor "exec" not breaking any existing rule is as simple as:

vars(__builtin__)[dir(__builtin__)[102]]("""malicious_code()""")
# or
import builtins
builtins.__dict__["lave"[::-1]]("""malicious_code()""")

Note: even with these rules, we can still bypass the rules in a number of ways (let me know if it's acceptable to start listing them publicly :) )
(Edit: the more I think about it, the more I find ways :D )

@xmunoz
Copy link
Contributor

xmunoz commented Apr 5, 2020

This seems reasonable. @ewjoachim, can you add a test or two for these new rules? Look here for examples of how to do that:
https://github.com/pypa/warehouse/blob/master/tests/unit/malware/checks/setup_patterns/test_check.py

@ewjoachim
Copy link
Contributor Author

Seeing what's done, I'm writing a very simple detection test for all the rules, so that we can check all of the regexes there. This, once again, shows how easy it is to circumvent the whole thing. I think it should really be checked with AST rather than regexes. Do you think a rewrite with astpath (or even bellybutton) has a chance of being merged (in another PR of course) ?

Without those additions, accessing eval/exec without mentionning neither "eval" not "exec" not breaking any existing rule is as simple as:
```python
vars(__builtin__)[dir(__builtin__)[102]]("""malicious_code()""")
import builtins
builtins.__dict__["lave"[::-1]]("""malicious_code()""")
```
@ewjoachim
Copy link
Contributor Author

Here you are, tests for every rule :)

@xmunoz
Copy link
Contributor

xmunoz commented Apr 5, 2020

I agree that AST is the way forward for this stuff. Without some more advanced parsing, we end up with issues like this: #7475

Another security researcher has some some work on python AST parsing for malware detection and described it here:
https://medium.com/@bertusk/detecting-cyber-attacks-in-the-python-package-index-pypi-61ab2b585c67

Feel free to open an issue and work on that. I'd be happy to provide a review.

Copy link
Contributor

@xmunoz xmunoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look good, thanks for adding them :)

Make sure to get everything passing and then we should be good for a merge.

@xmunoz
Copy link
Contributor

xmunoz commented Apr 5, 2020

@woodruffw do you have anything to add here?

@woodruffw
Copy link
Member

Thanks for the ping!

do you have anything to add here?

I agree completely that an AST-based matcher is the way to go -- we should treat the current YARA check as a proof of concept that's easy to circumvent.

Another (more complex) option here is a mocked Python interpreter with models for potentially malicious functions. Running setup.py under this would allow us to record the calls without needing to worry about AST changes or common obfuscations like string chunking.

@ewjoachim
Copy link
Contributor Author

Another (more complex) option here is a mocked Python interpreter with models for potentially malicious functions. Running setup.py under this would allow us to record the calls without needing to worry about AST changes or common obfuscations like string chunking.

Hm, yes. However creating we're getting with AST, I agree with you that we'll never be smart enough to detect code that does nasty stuff without executing it.

Your link @xmunoz was very interesting, but the researcher decided to go through static analysis, and, well, is it worth doing something that will make false-positives, and probably false-negatives too, quite easily ?

Just re-reading the current rules, I realized it still didn't take care of:

import inspect
inspect.getmodule([].__class__).__getattribute__("lave"[::-1])("print('malicious')")

(which took me only a few minutes to write). This really is a pandora box.

Sadly, while I'd love to learn more, I currently have no experience whatsoever regarding sandboxing Python :( Time to learn ! And, as you suggest, continue the discussion in a dedicated ticket :)

@ewjoachim
Copy link
Contributor Author

ewjoachim commented Apr 5, 2020

BTW, do you want me to add __getattr__, __getattribute__ and import inspect to the current PR, or are we declaring it a lost cause for now ?

@ewjoachim
Copy link
Contributor Author

Ow gosh, I've just discovered the documentation, and it looks like I didn't increment the version number. Doing this now ! https://warehouse.pypa.io/development/malware-checks/

@xmunoz
Copy link
Contributor

xmunoz commented Apr 8, 2020

BTW, do you want me to add __getattr__, __getattribute__ and import inspect to the current PR, or are we declaring it a lost cause for now ?

If you want, but I'm not going to push for it. If you're happy with the PR, I'll re-assign the reviewer to someone that can merge it.

@ewjoachim
Copy link
Contributor Author

I think I'm happy with it. It will always be time to add stuff later.

@xmunoz xmunoz requested a review from ewdurbin April 9, 2020 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants