Event-based Malware check #7249

Merged on Jan 27, 2020 (41 commits).

41 commits
0b07fb6 requirements: Introduce yara (woodruffw, Jan 16, 2020)
a768b7e [WIP] malware/check: SetupPatternCheck (woodruffw, Jan 16, 2020)
70ad771 malware/checks: Give MalwareCheckBase.run/scan args, kwargs (woodruffw, Jan 16, 2020)
3f14d38 malware: Add check preparation (woodruffw, Jan 16, 2020)
d80894c malware/checks: Unpack file path correctly (woodruffw, Jan 17, 2020)
1728476 docker-compose: Override FILES_BACKEND for worker (woodruffw, Jan 17, 2020)
d8cc3d4 [WIP] malware/checks: setup.py extraction (woodruffw, Jan 17, 2020)
e1bb29c malware/checks: setup_patterns: Fix enum, seek (woodruffw, Jan 17, 2020)
cc785b8 malware/checks: setup_patterns: Apply YARA rules (woodruffw, Jan 21, 2020)
c570cf5 malware/checks: setup_patterns: Prefer get over filter (woodruffw, Jan 21, 2020)
88a360c warehouse/{admin,malware}: Consistent enum names (woodruffw, Jan 21, 2020)
3af7445 warehouse/{admin,malware}: More enum changes (woodruffw, Jan 21, 2020)
5040928 tests: Update admin, malware tests (woodruffw, Jan 21, 2020)
be9a146 tests: Fix enum, more test fixes (woodruffw, Jan 21, 2020)
d1f79f3 tests: Add prepare tests (woodruffw, Jan 21, 2020)
0f2693c malware/changes: base: Unpack id correctly (woodruffw, Jan 21, 2020)
d6845c2 tests: Begin adding SetupPatternCheck tests (woodruffw, Jan 21, 2020)
fa70ef2 malware/checks: setup_patterns: Fix enum (woodruffw, Jan 21, 2020)
b41ed3f tests: More SetupPatternCheck tests (woodruffw, Jan 21, 2020)
496ea7f warehouse/malware: setup_patterns: Fix enums (woodruffw, Jan 21, 2020)
7ce0224 tests: More SetupPatternCheck tests (woodruffw, Jan 21, 2020)
abbab63 tests: Add license header (woodruffw, Jan 21, 2020)
ae6a4d8 malware/checks: setup_patterns: Add TODO (woodruffw, Jan 21, 2020)
e5a275b tests: More SetupPatternCheck tests (woodruffw, Jan 21, 2020)
78fbd7f tests: More SetupPatternCheck tests (woodruffw, Jan 21, 2020)
dac88cd tests: Complete extraction tests for SetupPatternCheck (woodruffw, Jan 22, 2020)
b80bcd2 tests: Fix test (woodruffw, Jan 22, 2020)
59a4b3f malware/checks: Add docstring for prepare (woodruffw, Jan 22, 2020)
7fe7304 malware/checks: blacken (woodruffw, Jan 22, 2020)
62f74db malware/checks: Document, expand YARA rules (woodruffw, Jan 22, 2020)
25bf555 tests, warehouse: Restructure utilities (woodruffw, Jan 23, 2020)
29a9121 malware: Order some enums, reduce SetupPatternCheck verdicts (woodruffw, Jan 23, 2020)
4e07c4e malware/models: Add missing __lt__ (woodruffw, Jan 23, 2020)
706cfba malware/checks: Always embed the model object in the prepared arguments (woodruffw, Jan 23, 2020)
8f00d35 malware/checks: Avoid raw bytes (woodruffw, Jan 23, 2020)
7604e0e malware/changes: Remove unused import (woodruffw, Jan 24, 2020)
bb5a372 tests: Fixup malware tests (woodruffw, Jan 24, 2020)
f883483 warehouse/malware: blacken (woodruffw, Jan 24, 2020)
7471e74 tests: Fill in malware coverage (woodruffw, Jan 24, 2020)
61e03d7 tests, warehouse: Add a benign verdict for SetupPatternCheck (woodruffw, Jan 24, 2020)
00afc84 tests: blacken (woodruffw, Jan 24, 2020)
1 change: 1 addition & 0 deletions docker-compose.yml
@@ -93,6 +93,7 @@ services:
     env_file: dev/environment
     environment:
       C_FORCE_ROOT: "1"
+      FILES_BACKEND: "warehouse.packaging.services.LocalFileStorage path=/var/opt/warehouse/packages/ url=http://files:9001/packages/{path}"
     links:
       - db
       - redis
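
The worker override passes the storage backend as a single string: a dotted class path followed by `key=value` options, with `{path}` left as a placeholder inside `url`. How Warehouse itself parses this string is not part of this diff; the sketch below uses a hypothetical `parse_backend_spec` helper and assumes whitespace-separated tokens, purely to illustrate the shape of the value.

```python
def parse_backend_spec(spec):
    """Split a FILES_BACKEND-style string into (class_path, options).

    Hypothetical helper: assumes the first whitespace-separated token is a
    dotted class path and every remaining token is a key=value pair.
    """
    class_path, *pairs = spec.split()
    options = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed option: {pair!r}")
        options[key] = value
    return class_path, options


spec = (
    "warehouse.packaging.services.LocalFileStorage "
    "path=/var/opt/warehouse/packages/ "
    "url=http://files:9001/packages/{path}"
)
class_path, options = parse_backend_spec(spec)

# The url option is a template; "{path}" is filled in per stored file.
file_url = options["url"].format(path="ab/cd/example-1.0.tar.gz")
```
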
1 change: 1 addition & 0 deletions requirements/main.in
@@ -55,5 +55,6 @@ typeguard
 webauthn
 whitenoise
 WTForms>=2.0.0
+yara-python
 zope.sqlalchemy
 zxcvbn
14 changes: 14 additions & 0 deletions requirements/main.txt
@@ -593,6 +593,20 @@ wired==0.2.1 \
 wtforms==2.2.1 \
     --hash=sha256:0cdbac3e7f6878086c334aa25dc5a33869a3954e9d1e015130d65a69309b3b61 \
     --hash=sha256:e3ee092c827582c50877cdbd49e9ce6d2c5c1f6561f849b3b068c1b8029626f1
+yara-python==3.11.0 \
+    --hash=sha256:105d851e050b32951ee577148c7f1b18c0a7c64432fef8159069191d522fba86 \
+    --hash=sha256:1d35c7f606465015de02143dfa4e1ad2f4ee85fdb5d5af756b51b2bac62ac7bc \
+    --hash=sha256:24cd492d6bf8ecedb128f5b02886770be9df03bd1b84ab06a978d45bb1a8ff92 \
+    --hash=sha256:58cfc837e7769811afbfb19b1db952ec01e50cdbf9df576fb587e1e343694526 \
+    --hash=sha256:5b8d708751a66d1507d819218d06baccdf5527c147c2bd3062f087e2f367a17d \
+    --hash=sha256:6f90bb264470235549e1bb4e355fa82895409cd46f27aceecaddfbf55e66ed71 \
+    --hash=sha256:70d39c2238c5854e7cd8f11595317dc4d89417e88035d8acca24bcc58a93150f \
+    --hash=sha256:8d255349d69d833bca604b4215bdf499c87357172512273feb934f6442b8e6b2 \
+    --hash=sha256:8e44f9600607cb1d74a0f26df5d0a1c06ea54f4601206124f47f1bbb58e6a374 \
+    --hash=sha256:9e4fafc327e3a343c545dcf5f173fa8bc712aebffe5f034d205c0bac1f1c5df6 \
+    --hash=sha256:c919ee656139ed46a0056e8a3de179bbc98d42a2be6fb85c95b1e2ec65396b34 \
+    --hash=sha256:e4124414d3cff9a10669569a89f585f81c8114b283ab48b2e756e0347a89de0a \
+    --hash=sha256:f104f0bb21a0867f22e750bb4e05de629ec9f37facc84daf963385a86371b0d9
 zipp==1.0.0 \
     --hash=sha256:8dda78f06bd1674bd8720df8a50bb47b6e1233c503a4eed8e7810686bde37656 \
     --hash=sha256:d38fbe01bbf7a3593a32bc35a9c4453c32bc42b98c377f9bff7e9f8da157786c
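
Because `requirements/main.txt` pins each package with `--hash` lines, pip's hash-checking mode accepts a downloaded yara-python artifact only if its SHA-256 digest is one of the listed values; the several hashes presumably correspond to the different wheels and the sdist published for 3.11.0. A minimal sketch of that acceptance test, using hypothetical helper names (`pinned_hashes`, `is_pinned`) rather than anything from pip itself:

```python
import hashlib


def pinned_hashes(requirement_block):
    """Collect sha256 digests from a pip-compile style block (hypothetical helper)."""
    prefix = "--hash=sha256:"
    hashes = set()
    for line in requirement_block.splitlines():
        # Drop indentation and the trailing line-continuation backslash.
        line = line.strip().rstrip("\\").strip()
        if line.startswith(prefix):
            hashes.add(line[len(prefix):])
    return hashes


def sha256_hexdigest(data):
    """Return the hex SHA-256 digest of a bytes payload."""
    return hashlib.sha256(data).hexdigest()


def is_pinned(data, allowed_hashes):
    """Accept an artifact only if its digest matches one of the pinned hashes."""
    return sha256_hexdigest(data) in allowed_hashes
```
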
8 changes: 6 additions & 2 deletions tests/common/checks/hooked.py
@@ -26,10 +26,14 @@ class ExampleHookedCheck(MalwareCheckBase):
     def __init__(self, db):
         super().__init__(db)
 
-    def scan(self, file_id=None):
+    def scan(self, **kwargs):
+        file_id = kwargs.get("obj_id")
+        if file_id is None:
+            return
+
         self.add_verdict(
             file_id=file_id,
-            classification=VerdictClassification.benign,
+            classification=VerdictClassification.Benign,
             confidence=VerdictConfidence.High,
             message="Nothing to see here!",
         )
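
The hooked check now takes `**kwargs` instead of a fixed `file_id` parameter, so a caller can hand the same keyword set to every check while each check picks out only what it needs (here `obj_id`) and does nothing when it is absent. The sketch below reproduces that pattern with hypothetical stand-ins (`SketchCheckBase` and two simplified enums), not Warehouse's real `MalwareCheckBase`:

```python
from enum import Enum


class VerdictClassification(Enum):
    # Simplified stand-in for the enum in warehouse.malware.models.
    Benign = "benign"


class VerdictConfidence(Enum):
    # Simplified stand-in; only the member used below is defined.
    High = "high"


class SketchCheckBase:
    """Hypothetical base class: it only collects verdicts in memory."""

    def __init__(self):
        self._verdicts = []

    def add_verdict(self, **fields):
        self._verdicts.append(fields)


class SketchHookedCheck(SketchCheckBase):
    def scan(self, **kwargs):
        # Pull out only the argument this check cares about.
        file_id = kwargs.get("obj_id")
        if file_id is None:
            return  # nothing to scan, so no verdict is recorded

        self.add_verdict(
            file_id=file_id,
            classification=VerdictClassification.Benign,
            confidence=VerdictConfidence.High,
            message="Nothing to see here!",
        )


check = SketchHookedCheck()
check.scan()  # no obj_id: ignored
check.scan(obj_id=42, file_url="unused")  # extra keywords are tolerated
```
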
12 changes: 6 additions & 6 deletions tests/unit/admin/views/test_checks.py
@@ -72,11 +72,11 @@ def test_no_check_state(self, db_request):
             views.change_check_state(db_request)
 
     @pytest.mark.parametrize(
-        ("final_state"), [MalwareCheckState.disabled, MalwareCheckState.wiped_out]
+        ("final_state"), [MalwareCheckState.Disabled, MalwareCheckState.WipedOut]
     )
     def test_change_to_valid_state(self, db_request, final_state):
         check = MalwareCheckFactory.create(
-            name="MyCheck", state=MalwareCheckState.disabled
+            name="MyCheck", state=MalwareCheckState.Disabled
         )
 
         db_request.POST = {"check_state": final_state.value}
@@ -104,7 +104,7 @@ def test_change_to_valid_state(self, db_request, final_state):
 
         assert check.state == final_state
 
-        if final_state == MalwareCheckState.wiped_out:
+        if final_state == MalwareCheckState.WipedOut:
             assert wipe_out_recorder.delay.calls == [pretend.call("MyCheck")]
 
     def test_change_to_invalid_state(self, db_request):
@@ -134,11 +134,11 @@ class TestRunBackfill:
         ("check_state", "message"),
         [
             (
-                MalwareCheckState.disabled,
+                MalwareCheckState.Disabled,
                 "Check must be in 'enabled' or 'evaluation' state to run a backfill.",
             ),
             (
-                MalwareCheckState.wiped_out,
+                MalwareCheckState.WipedOut,
                 "Check must be in 'enabled' or 'evaluation' state to run a backfill.",
             ),
         ],
@@ -160,7 +160,7 @@ def test_invalid_backfill_parameters(self, db_request, check_state, message):
         assert db_request.session.flash.calls == [pretend.call(message, queue="error")]
 
     def test_sucess(self, db_request):
-        check = MalwareCheckFactory.create(state=MalwareCheckState.enabled)
+        check = MalwareCheckFactory.create(state=MalwareCheckState.Enabled)
         db_request.matchdict["check_name"] = check.name
 
         db_request.session = pretend.stub(
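
In these tests only the enum member names change (`disabled` to `Disabled`, `wiped_out` to `WipedOut`), while the admin form still posts `final_state.value`, which the view can map back to a member by value. The commit list also mentions ordering some enums and adding a missing `__lt__`. The models file is not shown here, so the following is only a sketch of one way to get both behaviours: the `Evaluation` member and every string value are assumptions (the values echo the wording of the backfill error message), and members compare by declaration order.

```python
import enum


class OrderedEnum(enum.Enum):
    """Enum base whose members compare by declaration order."""

    def __lt__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        members = list(type(self))
        return members.index(self) < members.index(other)


class MalwareCheckState(OrderedEnum):
    # Member names follow the diff; "Evaluation" and all values are assumed.
    Enabled = "enabled"
    Evaluation = "evaluation"
    Disabled = "disabled"
    WipedOut = "wiped_out"


# A form submits the value; looking it up by value recovers the member.
posted = MalwareCheckState.WipedOut.value
state = MalwareCheckState(posted)
```

Defining only `__lt__` is enough for `sorted()`, and Python falls back to the reflected `__lt__` for `>`; `<=` and `>=` would need their own methods.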
11 changes: 11 additions & 0 deletions tests/unit/malware/checks/__init__.py
@@ -0,0 +1,11 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
11 changes: 11 additions & 0 deletions tests/unit/malware/checks/setup_patterns/__init__.py
@@ -0,0 +1,11 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
145 changes: 145 additions & 0 deletions tests/unit/malware/checks/setup_patterns/test_check.py
@@ -0,0 +1,145 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pretend
import pytest
import yara

from warehouse.malware.checks.setup_patterns import check as c
from warehouse.malware.models import (
    MalwareCheckState,
    VerdictClassification,
    VerdictConfidence,
)

from .....common.db.malware import MalwareCheckFactory
from .....common.db.packaging import FileFactory


def test_initializes(db_session):
    check_model = MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)

    assert check.id == check_model.id
    assert isinstance(check._yara_rules, yara.Rules)


@pytest.mark.parametrize(
    ("obj", "file_url"), [(None, pretend.stub()), (pretend.stub(), None)]
)
def test_scan_missing_kwargs(db_session, obj, file_url):
    MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)
    check.scan(obj=obj, file_url=file_url)

    assert check._verdicts == []


def test_scan_non_sdist(db_session):
    MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)

    file = FileFactory.create(packagetype="bdist_wheel")

    check.scan(obj=file, file_url=pretend.stub())

    assert check._verdicts == []


def test_scan_no_setup_contents(db_session, monkeypatch):
    monkeypatch.setattr(
        c, "fetch_url_content", pretend.call_recorder(lambda *a: pretend.stub())
    )
    monkeypatch.setattr(
        c, "extract_file_content", pretend.call_recorder(lambda *a: None)
    )

    MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)

    file = FileFactory.create(packagetype="sdist")

    check.scan(obj=file, file_url=pretend.stub())

    assert len(check._verdicts) == 1
    assert check._verdicts[0].check_id == check.id
    assert check._verdicts[0].file_id == file.id
    assert check._verdicts[0].classification == VerdictClassification.Indeterminate
    assert check._verdicts[0].confidence == VerdictConfidence.High
    assert (
        check._verdicts[0].message
        == "sdist does not contain a suitable setup.py for analysis"
    )


def test_scan_benign_contents(db_session, monkeypatch):
    monkeypatch.setattr(
        c, "fetch_url_content", pretend.call_recorder(lambda *a: pretend.stub())
    )
    monkeypatch.setattr(
        c,
        "extract_file_content",
        pretend.call_recorder(lambda *a: b"this is a benign string"),
    )

    MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)

    file = FileFactory.create(packagetype="sdist")

    check.scan(obj=file, file_url=pretend.stub())

    assert len(check._verdicts) == 1
    assert check._verdicts[0].check_id == check.id
    assert check._verdicts[0].file_id == file.id
    assert check._verdicts[0].classification == VerdictClassification.Benign
    assert check._verdicts[0].confidence == VerdictConfidence.Low
    assert check._verdicts[0].message == "No malicious patterns found in setup.py"


def test_scan_matched_content(db_session, monkeypatch):
    monkeypatch.setattr(
        c, "fetch_url_content", pretend.call_recorder(lambda *a: pretend.stub())
    )
    monkeypatch.setattr(
        c,
        "extract_file_content",
        pretend.call_recorder(
            lambda *a: b"this looks suspicious: os.system('cat /etc/passwd')"
        ),
    )

    MalwareCheckFactory.create(
        name="SetupPatternCheck", state=MalwareCheckState.Enabled
    )
    check = c.SetupPatternCheck(db_session)

    file = FileFactory.create(packagetype="sdist")

    check.scan(obj=file, file_url=pretend.stub())

    assert len(check._verdicts) == 1
    assert check._verdicts[0].check_id == check.id
    assert check._verdicts[0].file_id == file.id
    assert check._verdicts[0].classification == VerdictClassification.Threat
    assert check._verdicts[0].confidence == VerdictConfidence.High
    assert check._verdicts[0].message == "process_spawn_in_setup"
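
Taken together, these tests pin down the check's decision table: non-sdist files produce no verdict; an sdist without a usable `setup.py` is Indeterminate with High confidence; no rule match is Benign with Low confidence; a match is a Threat with High confidence, with the matching rule's name as the message. The sketch below reproduces that table with Python's `re` module standing in for YARA. The single regex rule and the `scan_setup_py` function are hypothetical and much cruder than the PR's actual YARA rules.

```python
import re
from enum import Enum


class Classification(Enum):
    # Stand-ins for the VerdictClassification members used in the tests.
    Indeterminate = "indeterminate"
    Benign = "benign"
    Threat = "threat"


class Confidence(Enum):
    Low = "low"
    High = "high"


# Hypothetical stand-in for compiled YARA rules: rule name -> bytes regex.
RULES = {
    "process_spawn_in_setup": re.compile(rb"\bos\.system\(|\bsubprocess\.\w+\("),
}


def scan_setup_py(packagetype, setup_contents):
    """Return (classification, confidence, message), or None for no verdict."""
    if packagetype != "sdist":
        return None  # wheels and other dists are skipped without a verdict
    if setup_contents is None:
        return (
            Classification.Indeterminate,
            Confidence.High,
            "sdist does not contain a suitable setup.py for analysis",
        )
    for rule_name, pattern in RULES.items():
        if pattern.search(setup_contents):
            # As in the tests, the matching rule's name becomes the message.
            return Classification.Threat, Confidence.High, rule_name
    return (
        Classification.Benign,
        Confidence.Low,
        "No malicious patterns found in setup.py",
    )
```
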
93 changes: 93 additions & 0 deletions tests/unit/malware/checks/test_utils.py
@@ -0,0 +1,93 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import io
import tarfile
import zipfile

import pretend

from warehouse.malware.checks import utils


def test_fetch_url_content(monkeypatch):
    response = pretend.stub(
        raise_for_status=pretend.call_recorder(lambda: None), content=b"fake content"
    )
    requests = pretend.stub(get=pretend.call_recorder(lambda url: response))

    monkeypatch.setattr(utils, "requests", requests)

    io = utils.fetch_url_content("hxxp://fake_url.com")

    assert requests.get.calls == [pretend.call("hxxp://fake_url.com")]
    assert response.raise_for_status.calls == [pretend.call()]
    assert io.getvalue() == b"fake content"


def test_extract_file_contents_zip():
    zipbuf = io.BytesIO()
    with zipfile.ZipFile(zipbuf, mode="w") as zipobj:
        zipobj.writestr("toplevelgetsskipped", b"nothing to see here")
        zipobj.writestr("foo/setup.py", b"these are some contents")
    zipbuf.seek(0)

    assert utils.extract_file_content(zipbuf, "setup.py") == b"these are some contents"


def test_extract_file_contents_zip_no_file():
    zipbuf = io.BytesIO()
    with zipfile.ZipFile(zipbuf, mode="w") as zipobj:
        zipobj.writestr("foo/notsetup.py", b"these are some contents")
    zipbuf.seek(0)

    assert utils.extract_file_content(zipbuf, "setup.py") is None


def test_extract_file_contents_tar():
    tarbuf = io.BytesIO()
    with tarfile.open(fileobj=tarbuf, mode="w:gz") as tarobj:
        contents = io.BytesIO(b"these are some contents")
        member = tarfile.TarInfo(name="foo/setup.py")
        member.size = len(contents.getbuffer())
        tarobj.addfile(member, fileobj=contents)

        contents = io.BytesIO(b"nothing to see here")
        member = tarfile.TarInfo(name="toplevelgetsskipped")
        member.size = len(contents.getbuffer())
        tarobj.addfile(member, fileobj=contents)
    tarbuf.seek(0)

    assert utils.extract_file_content(tarbuf, "setup.py") == b"these are some contents"


def test_extract_file_contents_tar_empty():
    tarbuf = io.BytesIO(b"invalid tar contents")

    assert utils.extract_file_content(tarbuf, "setup.py") is None


def test_extract_file_contents_tar_no_file():
    tarbuf = io.BytesIO()
    with tarfile.open(fileobj=tarbuf, mode="w:gz") as tarobj:
        contents = io.BytesIO(b"these are some contents")
        member = tarfile.TarInfo(name="foo/notsetup.py")
        member.size = len(contents.getbuffer())
        tarobj.addfile(member, fileobj=contents)

        contents = io.BytesIO(b"nothing to see here")
        member = tarfile.TarInfo(name="toplevelgetsskipped")
        member.size = len(contents.getbuffer())
        tarobj.addfile(member, fileobj=contents)
    tarbuf.seek(0)

    assert utils.extract_file_content(tarbuf, "setup.py") is None
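
The extraction tests imply a contract for `extract_file_content`: given an in-memory zip or tar archive, return the bytes of `<top-level directory>/setup.py` (the usual `name-version/setup.py` layout of an sdist), skip files sitting directly at the archive root, and return `None` when nothing matches or the data is not a readable archive. The real implementation is not in this diff; the following is a stdlib-only sketch that satisfies those expectations, with the "exactly one directory level" rule being an assumption drawn from the test fixtures.

```python
import io
import tarfile
import zipfile


def extract_file_content(archive, filename):
    """Return the bytes of <topdir>/<filename> from a zip or tar stream, else None."""
    if zipfile.is_zipfile(archive):
        archive.seek(0)  # is_zipfile() moves the stream position
        with zipfile.ZipFile(archive) as zf:
            for name in zf.namelist():
                parts = name.split("/")
                # Only match files exactly one directory below the root.
                if len(parts) == 2 and parts[1] == filename:
                    return zf.read(name)
        return None

    archive.seek(0)
    try:
        # Default mode "r" auto-detects plain, gzip, bz2 and xz tarballs.
        with tarfile.open(fileobj=archive) as tf:
            for member in tf.getmembers():
                parts = member.name.split("/")
                if member.isfile() and len(parts) == 2 and parts[1] == filename:
                    return tf.extractfile(member).read()
    except tarfile.TarError:
        return None  # not a readable archive
    return None
```
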