Commit 2aaa162

Add performance benchmarks (#748)
* Configure asv and add import benchmark
* Add item (de)serialization benchmarks & use 10 reps
* Add catalog, collection benchmarks and tweak settings
* Add convenience script for running locally
* Use default Python
* Add benchmark workflow to CI
* Match label condition to label name in repo
* Fix lint errors
* Add virtualenv to benchmark deps
* Fix artifact name, increase failure threshold
* rm: benchmarks workflow
  I'm of the opinion that we _shouldn't_ run benchmarks on Github runners, so I'm removing this workflow.
* refactor: use classes directly
* refactor: move benchmarks up a level
  This lets simple command like `asv dev` work out of the box.
* feat: add projection benchmarks
  I'm not really sure how useful this is, but it was asked for so at least we have something.
* feat: add large catalog benchmarks
* fix: benchmark config
* feat: add benchmark docs
* ci: add benchmark check
  This doesn't run benchmarks, but just checks to make sure they build.
* ci: set the asv machine
* ci: install pystac for benchmarks
* docs: add more text about running benchmarks
* bench: use timeraw for import

Co-authored-by: Pete Gadomski <[email protected]>
1 parent f34d5d6 commit 2aaa162

16 files changed: 348 additions, 0 deletions

.github/workflows/continuous-integration.yml

Lines changed: 21 additions & 0 deletions
@@ -158,3 +158,24 @@ jobs:
 
       - name: Install dev dependencies
         run: pip install -r requirements-dev.txt
+
+  check-benchmarks:
+    # This checks to make sure any API changes haven't broken any of the
+    # benchmarks. It doesn't do any actual benchmarking, since (IMO) that's not
+    # appropriate for CI on Github actions.
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+          cache: "pip"
+          cache-dependency-path: requirements-bench.txt
+      - name: Install pystac
+        run: pip install .
+      - name: Install benchmark dependencies
+        run: pip install -r requirements-bench.txt
+      - name: Set asv machine
+        run: asv machine --yes
+      - name: Check benchmarks
+        run: asv dev -a repeat=1 -a rounds=1 --strict

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -157,3 +157,6 @@ dmypy.json
 
 # Cython debug symbols
 cython_debug/
+
+# asv environments
+.asv

asv.conf.json

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+{
+    "version": 1,
+    "project": "pystac",
+    "project_url": "https://pystac.readthedocs.io/",
+    "repo": ".",
+    "branches": [
+        "main"
+    ],
+    "dvcs": "git",
+    "environment_type": "virtualenv",
+    "show_commit_url": "http://github.com/stac-utils/pystac/commit/",
+    "matrix": {
+        "req": {
+            "orjson": [
+                null,
+                ""
+            ]
+        }
+    },
+    "benchmark_dir": "benchmarks",
+    "env_dir": ".asv/env",
+    "results_dir": ".asv/results",
+    "html_dir": ".asv/html"
+}
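
The `matrix` entry above asks asv to build one environment per listed value of each requirement. The following is only a rough illustration, not asv's own code: it assumes asv's documented convention that `null` means "do not install the package" and an empty string means "install the latest version", and `describe_matrix` is a hypothetical helper written for this note.

```python
import json
from itertools import product
from typing import Dict, List

# A trimmed copy of the relevant part of asv.conf.json.
CONFIG = json.loads(
    """
    {
        "environment_type": "virtualenv",
        "matrix": {"req": {"orjson": [null, ""]}}
    }
    """
)


def describe_matrix(config: dict) -> List[Dict[str, str]]:
    """Enumerate the requirement combinations implied by the matrix."""
    reqs = config["matrix"]["req"]
    names = sorted(reqs)
    combos = []
    for values in product(*(reqs[name] for name in names)):
        env = {}
        for name, value in zip(names, values):
            if value is None:
                env[name] = "not installed"  # JSON null
            elif value == "":
                env[name] = "latest"  # empty string
            else:
                env[name] = value  # a pinned version string
        combos.append(env)
    return combos


environments = describe_matrix(CONFIG)
for env in environments:
    print(env)
```

Under that reading, every benchmark runs twice: once without `orjson` and once with it, which matches the `virtualenv-py3.10` and `virtualenv-py3.10-orjson` labels in the sample output in `docs/contributing.rst`.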

benchmarks/__init__.py

Whitespace-only changes.

benchmarks/_base.py

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
class Bench:
2+
# Repeat between 10-50 times up to a max time of 5s
3+
repeat = (10, 50, 2.0)
4+
5+
# Bump number of rounds to 4
6+
rounds = 4
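
For readers new to asv, here is a minimal sketch of how a benchmark class picks up these shared settings. The `Bench` class is copied from the file above; `JsonRoundTripBench` and its payload are hypothetical and only use the standard library, so the sketch runs without pystac installed. asv reads the `repeat` tuple as `(min_repeat, max_repeat, max_time)` and calls every method whose name starts with `time_`.

```python
import json


class Bench:
    # (min_repeat, max_repeat, max_time in seconds)
    repeat = (10, 50, 2.0)
    rounds = 4


class JsonRoundTripBench(Bench):
    """Hypothetical benchmark; not part of this commit."""

    def setup(self) -> None:
        # asv calls setup() before timing, so this cost is excluded.
        self.payload = {"id": "an-id", "values": list(range(100))}

    def time_round_trip(self) -> None:
        # Only the body of a time_* method is measured.
        json.loads(json.dumps(self.payload))


# The subclass inherits the timing settings from Bench.
min_repeat, max_repeat, max_time = JsonRoundTripBench.repeat
print(min_repeat, max_repeat, max_time, JsonRoundTripBench.rounds)
```

Keeping the settings on one base class means a later change to `repeat` or `rounds` applies to every benchmark that derives from it.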

benchmarks/_util.py

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+import os
+from typing import Union, TYPE_CHECKING
+
+if TYPE_CHECKING:
+    PathLike = os.PathLike[str]
+else:
+    PathLike = os.PathLike
+
+
+def get_data_path(rel_path: Union[str, PathLike]) -> str:
+    """Gets the absolute path to a file based on a path relative to the
+    tests/data-files directory in this repo."""
+    rel_path = os.fspath(rel_path)
+    return os.path.abspath(
+        os.path.join(os.path.dirname(__file__), "..", "tests", "data-files", rel_path)
+    )

benchmarks/catalog.py

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+import datetime
+import json
+import os
+import shutil
+import tempfile
+from pathlib import Path
+from tempfile import TemporaryDirectory
+from pystac import (
+    Catalog,
+    StacIO,
+    Collection,
+    Extent,
+    TemporalExtent,
+    SpatialExtent,
+    Item,
+)
+
+from ._base import Bench
+from ._util import get_data_path
+
+
+class CatalogBench(Bench):
+    def setup(self) -> None:
+        self.temp_dir = tempfile.mkdtemp()
+
+        self.stac_io = StacIO.default()
+
+        self.catalog_path = get_data_path("examples/1.0.0/catalog.json")
+        with open(self.catalog_path) as src:
+            self.catalog_dict = json.load(src)
+        self.catalog = Catalog.from_file(self.catalog_path)
+
+    def teardown(self) -> None:
+        shutil.rmtree(self.temp_dir, ignore_errors=True)
+
+    def time_catalog_from_file(self) -> None:
+        """Deserialize a Catalog from file."""
+        _ = Catalog.from_file(self.catalog_path)
+
+    def time_catalog_from_dict(self) -> None:
+        """Deserialize a Catalog from dictionary."""
+        _ = Catalog.from_dict(self.catalog_dict)
+
+    def time_catalog_to_dict(self) -> None:
+        """Serialize a Catalog to a dictionary."""
+        self.catalog.to_dict(include_self_link=True)
+
+    def time_catalog_save(self) -> None:
+        """Serialize a Catalog to a JSON file."""
+        self.catalog.save_object(
+            include_self_link=True,
+            dest_href=os.path.join(self.temp_dir, "time_catalog_save.json"),
+            stac_io=self.stac_io,
+        )
+
+
+class WalkCatalogBench(Bench):
+    def setup_cache(self) -> Catalog:
+        return make_large_catalog()
+
+    def time_walk(self, catalog: Catalog) -> None:
+        for (
+            _,
+            _,
+            _,
+        ) in catalog.walk():
+            pass
+
+    def peakmem_walk(self, catalog: Catalog) -> None:
+        for (
+            _,
+            _,
+            _,
+        ) in catalog.walk():
+            pass
+
+
+class ReadCatalogBench(Bench):
+    def setup(self) -> None:
+        catalog = make_large_catalog()
+        self.temporary_directory = TemporaryDirectory()
+        self.path = str(Path(self.temporary_directory.name) / "catalog.json")
+        catalog.normalize_and_save(self.temporary_directory.name)
+
+    def teardown(self) -> None:
+        shutil.rmtree(self.temporary_directory.name)
+
+    def time_read_and_walk(self) -> None:
+        catalog = Catalog.from_file(self.path)
+        for _, _, _ in catalog.walk():
+            pass
+
+
+class WriteCatalogBench(Bench):
+    def setup(self) -> None:
+        self.catalog = make_large_catalog()
+        self.temporary_directory = TemporaryDirectory()
+
+    def teardown(self) -> None:
+        shutil.rmtree(self.temporary_directory.name)
+
+    def time_normalize_and_save(self) -> None:
+        self.catalog.normalize_and_save(self.temporary_directory.name)
+
+
+def make_large_catalog() -> Catalog:
+    catalog = Catalog("an-id", "a description")
+    extent = Extent(
+        SpatialExtent([[-180.0, -90.0, 180.0, 90.0]]),
+        TemporalExtent([[datetime.datetime(2023, 1, 1), None]]),
+    )
+    for i in range(0, 10):
+        collection = Collection(f"collection-{i}", f"Collection {i}", extent)
+        for j in range(0, 100):
+            item = Item(f"item-{i}-{j}", None, None, datetime.datetime.now(), {})
+            collection.add_item(item)
+        catalog.add_child(collection)
+    return catalog

benchmarks/collection.py

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+import json
+import os
+import shutil
+import tempfile
+from pystac import StacIO, Collection
+
+from ._base import Bench
+from ._util import get_data_path
+
+
+class CollectionBench(Bench):
+    def setup(self) -> None:
+        self.temp_dir = tempfile.mkdtemp()
+
+        self.stac_io = StacIO.default()
+
+        self.collection_path = get_data_path("examples/1.0.0/collection.json")
+        with open(self.collection_path) as src:
+            self.collection_dict = json.load(src)
+        self.collection = Collection.from_file(self.collection_path)
+
+    def teardown(self) -> None:
+        shutil.rmtree(self.temp_dir, ignore_errors=True)
+
+    def time_collection_from_file(self) -> None:
+        """Deserialize a Collection from file."""
+        _ = Collection.from_file(self.collection_path)
+
+    def time_collection_from_dict(self) -> None:
+        """Deserialize a Collection from dictionary."""
+        _ = Collection.from_dict(self.collection_dict)
+
+    def time_collection_to_dict(self) -> None:
+        """Serialize a Collection to a dictionary."""
+        self.collection.to_dict(include_self_link=True)
+
+    def time_collection_save(self) -> None:
+        """Serialize a Collection to a JSON file."""
+        self.collection.save_object(
+            include_self_link=True,
+            dest_href=os.path.join(self.temp_dir, "time_collection_save.json"),
+            stac_io=self.stac_io,
+        )

benchmarks/extensions/__init__.py

Whitespace-only changes.

benchmarks/extensions/projection.py

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+import datetime
+
+from pystac import Item
+from pystac.extensions.projection import ProjectionExtension
+
+from .._base import Bench
+
+
+class ProjectionBench(Bench):
+    def setup(self) -> None:
+        self.item = Item("an-id", None, None, datetime.datetime.now(), {})
+
+    def time_add_projection_extension(self) -> None:
+        _ = ProjectionExtension.ext(self.item, add_if_missing=True)

benchmarks/import_pystac.py

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+class ImportPySTACBench:
+    repeat = 10
+
+    def timeraw_import_pystac(self) -> str:
+        return """
+        import pystac
+        """

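A `timeraw_*` benchmark returns source code as a string instead of doing the work itself, and asv runs that code in a separate process. This matters for imports: in a long-lived process a second `import` is served from `sys.modules` and costs almost nothing. The sketch below shows the idea only and is not how asv measures it. `time_import` is a hypothetical helper, the standard-library module `json` stands in for `pystac` so the example runs anywhere, and the figure it reports includes interpreter start-up.

```python
import subprocess
import sys
import time


def time_import(module: str) -> float:
    """Time `import <module>` in a fresh interpreter, in seconds."""
    start = time.perf_counter()
    # A new process starts with an empty import cache for this module.
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


elapsed = time_import("json")
print(f"import json took {elapsed:.3f}s including interpreter start-up")
```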
benchmarks/item.py

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+import json
+import os
+import shutil
+import tempfile
+from pystac import StacIO, Item
+
+from ._base import Bench
+from ._util import get_data_path
+
+
+class ItemBench(Bench):
+    def setup(self) -> None:
+        self.temp_dir = tempfile.mkdtemp()
+
+        self.stac_io = StacIO.default()
+
+        self.item_path = get_data_path("item/sample-item-asset-properties.json")
+        with open(self.item_path) as src:
+            self.item_dict = json.load(src)
+        self.item = Item.from_file(self.item_path)
+
+    def teardown(self) -> None:
+        shutil.rmtree(self.temp_dir, ignore_errors=True)
+
+    def time_item_from_file(self) -> None:
+        """Deserialize an Item from file"""
+        _ = Item.from_file(self.item_path)
+
+    def time_item_from_dict(self) -> None:
+        """Deserialize an Item from dictionary."""
+        _ = Item.from_dict(self.item_dict)
+
+    def time_item_to_dict(self) -> None:
+        """Serialize an Item to a dictionary."""
+        self.item.to_dict(include_self_link=True)
+
+    def time_item_save(self) -> None:
+        """Serialize an Item to a JSON file."""
+        self.item.save_object(
+            include_self_link=True,
+            dest_href=os.path.join(self.temp_dir, "time_item_save.json"),
+            stac_io=self.stac_io,
+        )

docs/contributing.rst

Lines changed: 39 additions & 0 deletions
@@ -73,6 +73,45 @@ flag to Git commit commands, as in ``git commit --no-verify``.
 .. [#] In rare cases changes to one file might invalidate an unchanged file, such as
     when modifying the return type of a function used in another file.
 
+Benchmarks
+^^^^^^^^^^
+
+PySTAC uses `asv <https://asv.readthedocs.io>`_ for benchmarking. Benchmarks are
+defined in the ``./benchmarks`` directory. Due to the inherent uncertainty in
+the environment of Github workflow runners, benchmarks are not executed in CI.
+If your changes may affect performance, use the provided script to run the
+benchmark suite locally. This script will compare your current ``HEAD`` with
+the **main** branch and report any improvements or regressions.
+
+.. code-block:: bash
+
+    scripts/bench
+
+The benchmark suite takes a while to run, and will report any significant
+changes to standard output. For example, here's a benchmark comparison between
+v1.0.0 and v1.6.1 (from `@gadomski's <https://github.com/gadomski>`_ computer)::
+
+           before           after         ratio
+         [eee06027]       [579c071b]
+         <v1.0.0^0>       <v1.6.1^0>
+    -        533±20μs         416±10μs     0.78  collection.CollectionBench.time_collection_from_file [gadomski/virtualenv-py3.10-orjson]
+    -         329±8μs         235±10μs     0.72  collection.CollectionBench.time_collection_from_dict [gadomski/virtualenv-py3.10-orjson]
+    -        332±10μs          231±4μs     0.70  collection.CollectionBench.time_collection_from_dict [gadomski/virtualenv-py3.10]
+    -         174±4μs          106±2μs     0.61  item.ItemBench.time_item_from_dict [gadomski/virtualenv-py3.10]
+    -         174±4μs          106±2μs     0.61  item.ItemBench.time_item_from_dict [gadomski/virtualenv-py3.10-orjson]
+           before           after         ratio
+         [eee06027]       [579c071b]
+         <v1.0.0^0>       <v1.6.1^0>
+    +        87.1±3μs          124±5μs     1.42  catalog.CatalogBench.time_catalog_from_dict [gadomski/virtualenv-py3.10]
+    +        87.1±4μs          122±5μs     1.40  catalog.CatalogBench.time_catalog_from_dict [gadomski/virtualenv-py3.10-orjson]
+
+When developing new benchmarks, you can run a shortened version of the benchmark suite:
+
+.. code-block:: bash
+
+    asv dev
+
+
 CHANGELOG
 ^^^^^^^^^
 

requirements-bench.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+asv==0.5.1
+virtualenv==20.13.1

requirements-dev.txt

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 -r ./requirements-docs.txt
 -r ./requirements-test.txt
+-r ./requirements-bench.txt
 
 jupyter==1.0.0

scripts/bench

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+set -e
+
+if [[ -z $ASV_FACTOR ]]; then
+    ASV_FACTOR=1.25;
+fi
+
+asv continuous --split -e --interleave-rounds \
+    --factor ${ASV_FACTOR} \
+    main HEAD;
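
The `--factor` value sets how far the after/before ratio must move before asv reports a change. As a simplified sketch of that threshold (asv also takes measurement uncertainty into account, which is left out here), the hypothetical `classify` helper below applies the script's default of 1.25 to ratios taken from the sample output in `docs/contributing.rst`.

```python
def classify(ratio: float, factor: float = 1.25) -> str:
    """Label an after/before timing ratio against a threshold factor."""
    if ratio > factor:
        return "regression"  # noticeably slower than before
    if ratio < 1 / factor:
        return "improvement"  # noticeably faster than before
    return "unchanged"


# Ratios copied from the sample comparison in the contributing docs,
# plus 1.10 as a made-up value that stays inside the threshold.
for ratio in (0.78, 0.61, 1.42, 1.10):
    print(f"{ratio:.2f} -> {classify(ratio)}")
```

Setting `ASV_FACTOR` in the environment before running `scripts/bench` overrides the 1.25 default, so a stricter or looser threshold can be tried without editing the script.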
