Commit c087cf0

Allow Notebook output to be preserved (#771)
* Fix diff command
* Allow keeping certain output cells
* Document possible cell tags
* Reflect ruff usage and notebook options in docs
* Describe ipynb -> doc
1 parent bb254de commit c087cf0

5 files changed: +121 -52 lines changed

TESTING.md

Lines changed: 18 additions & 32 deletions
@@ -78,7 +78,7 @@ To run only integration tests that are marked as `encrypted_only`, call:
 
 ```bash
 pytest graphdatascience/tests/integration --encrypted-only
-````
+```
 
 
 ### GDS library versions
@@ -90,52 +90,38 @@ For this reason only tests compatible with the GDS library server version you ar
 
 ## Style guide
 
-The code follows a rather opinionated style based on [pep8](https://www.python.org/dev/peps/pep-0008/).
+The code and examples use [ruff](https://docs.astral.sh/ruff/) to format and lint.
 You can check all code using all the below mentioned code checking tools by running the `scripts/checkstyle` bash script.
 There's also a `scripts/makestyle` to do formatting.
+Use `SKIP_NOTEBOOKS=true` to only format the code.
 
-
-### Linting
-
-To enforce pep8 conformity (with the exception of using max line length = 120) [flake8](https://flake8.pycqa.org/en/latest/) is used.
-To run it to check the entire repository, simply call:
-
-```bash
-flake8
-```
-
-from the root. See `.flake8` for our custom flake8 settings.
+See `pyproject.toml` for the configuration.
 
 
-### Formatting
+### Static typing
 
-For general formatting we use [black](https://black.readthedocs.io/en/stable/) with default settings.
-black can be run to format the entire repository by calling:
+The code is annotated with type hints in order to provide documentation and allow for static type analysis with [mypy](http://mypy-lang.org/).
+Please note that the `typing` library is used for annotation types in order to stay compatible with Python versions < 3.9.
+To run static analysis on the entire repository with mypy, just run:
 
 ```bash
-black .
+mypy .
 ```
 
-from the root. See the `[tool.black]` section of `pyproject.toml` for our custom black settings.
+from the root. See `mypy.ini` for our custom mypy settings.
 
-Additionally [isort](https://pycqa.github.io/isort/) is used for consistent import sorting.
-It can similarly be run to format all source code by calling:
 
-```bash
-isort .
-```
+## Notebook examples
 
-from the root. See `.isort.cfg` for our custom isort settings.
+The notebooks under `/examples` can be run using `scripts/run_notebooks`.
 
 
-### Static typing
+### Cell Tags
 
-The code is annotated with type hints in order to provide documentation and allow for static type analysis with [mypy](http://mypy-lang.org/).
-Please note that the `typing` library is used for annotation types in order to stay compatible with Python versions < 3.9.
-To run static analysis on the entire repository with mypy, just run:
+*Verify version*
+If you only want to let CI run the notebook given a certain condition, tag a given cell in the notebook with `verify-version`.
+As the name suggests, the tag was introduced to only run for given GDS server versions.
 
-```bash
-mypy .
-```
+*Teardown*
 
-from the root. See `mypy.ini` for our custom mypy settings.
+To make sure certain cells are always run even in case of failure, tag the cell with `teardown`.
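
Note on the cell tags documented in this hunk: in an `.ipynb` file, tags are stored in each cell's `metadata.tags` list. The sketch below is only an illustration of how tagged cells can be located; it is not the logic of `scripts/run_notebooks`, which this diff does not show. The helper name `cells_with_tag` and the sample notebook are hypothetical.

```python
from typing import Any, Dict, List


def cells_with_tag(notebook: Dict[str, Any], tag: str) -> List[int]:
    """Return the indices of all cells whose metadata carries `tag`."""
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


def make_cell(source: str, tags: List[str]) -> Dict[str, Any]:
    """Build a minimal code cell (hypothetical helper, nbformat 4 layout)."""
    metadata: Dict[str, Any] = {"tags": tags} if tags else {}
    return {
        "cell_type": "code",
        "metadata": metadata,
        "source": source,
        "outputs": [],
        "execution_count": None,
    }


# Hand-written stand-in for a parsed .ipynb file.
sample_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        make_cell("check_server_version()", ["verify-version"]),
        make_cell("run_example()", []),
        make_cell("drop_graph()", ["teardown"]),
    ],
}

print(cells_with_tag(sample_notebook, "verify-version"))  # [0]
print(cells_with_tag(sample_notebook, "teardown"))  # [2]
```

A runner could use such indices to decide which cells gate the run and which must still execute after a failure; how the real runner does this is an assumption here.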

examples/README.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# Examples
+
+This folder contains example notebooks on how to use the `graphdatascience` Python client.
+
+
+## Custom cell tags for notebooks
+
+*Preserve cell outputs*
+
+By default, `makestyle` will remove all cell outputs. If you want to preserve some outputs, tag the cell with `preserve-output`.
+
+
+## Update /tutorials in docs
+
+Every notebook is also available as an `adoc` version living under `doc/pages/tutorials/`.
+The latest published version can be viewed at https://neo4j.com/docs/graph-data-science-client/current/.
+
+To update the adoc version, run
+
+```bash
+./scripts/nb2doc/convert.sh
+```
+
+On how to render the docs locally, see the doc [README](../doc/README).
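
To make the `preserve-output` rule from this README concrete, here is a dependency-free sketch that operates on a parsed notebook dictionary. It is an approximation for illustration: the script added in this commit uses an nbconvert `Preprocessor` instead, and the function name `strip_outputs` and the sample cells are hypothetical.

```python
import copy
from typing import Any, Dict

PRESERVE_TAG = "preserve-output"


def strip_outputs(notebook: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `notebook` with outputs cleared on untagged code cells."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no outputs
        if PRESERVE_TAG in cell.get("metadata", {}).get("tags", []):
            continue  # tagged cells keep their outputs and execution count
        cell["outputs"] = []
        cell["execution_count"] = None
    return cleaned


# Hand-written sample: one tagged and one untagged code cell.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["preserve-output"]},
            "source": "df.head()",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "kept"}],
            "execution_count": 3,
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": "print('noise')",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "noise"}],
            "execution_count": 4,
        },
    ]
}

cleaned = strip_outputs(notebook)
print(len(cleaned["cells"][0]["outputs"]))  # 1
print(len(cleaned["cells"][1]["outputs"]))  # 0
```

Working on a deep copy keeps the input untouched, which makes it easy to compare the notebook before and after cleaning.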

scripts/checkstyle

Lines changed: 3 additions & 10 deletions
@@ -17,19 +17,12 @@ NOTEBOOKS="./examples/*.ipynb" # ./examples/dev/*.ipynb"
 for f in $NOTEBOOKS
 do
   NB=$(cat $f)
-  FORMATTED_NB=$(python -m jupyter nbconvert \
-    --clear-output \
-    --stdout \
-    --ClearOutputPreprocessor.enabled=True \
-    --ClearMetadataPreprocessor.enabled=True \
-    --ClearMetadataPreprocessor.preserve_cell_metadata_mask='tags' \
-    --log-level CRITICAL \
-    $f)
+  FORMATTED_NB=$(python scripts/clean_notebooks.py -i "$f" -o stdout)
 
   if [[ "$FORMATTED_NB" != "$NB" ]];
   then
-    echo "Notebook $f is not correctly formatted"
-    diff --color=always --suppress-common-lines --minimal --side-by-side $NB $FORMATTED_NB
+    echo "Notebook $f is not correctly formatted. See diff below for more details."
+    diff --color=always --suppress-common-lines --minimal --side-by-side <(echo "$NB") <(echo "$FORMATTED_NB")
     exit 1
   fi
 done
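
On the `diff` fix in this hunk: the old call passed the notebook contents themselves as arguments, where `diff` expects file names, so the new version wraps both strings in process substitution. For comparison, the same check can be sketched in Python with `difflib` from the standard library. The function name `notebook_diff` and the sample strings are hypothetical and not part of the commit.

```python
import difflib


def notebook_diff(original: str, formatted: str, name: str) -> str:
    """Return a unified diff of two notebook texts, or "" if they are equal."""
    diff_lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        formatted.splitlines(keepends=True),
        fromfile=f"{name} (on disk)",
        tofile=f"{name} (cleaned)",
    )
    return "".join(diff_lines)


# Stand-ins for the file content and the cleaner's output.
on_disk = '{"outputs": ["42"]}\n'
cleaned_text = '{"outputs": []}\n'

report = notebook_diff(on_disk, cleaned_text, "example.ipynb")
if report:
    print("Notebook example.ipynb is not correctly formatted:")
    print(report)
```

Comparing strings in memory avoids any confusion between contents and paths, at the cost of losing the side-by-side layout that `diff --side-by-side` provides.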

scripts/clean_notebooks.py

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+# reasons for not using nbconvert cli tool:
+#   * cannot keep output based on a given tag
+
+import argparse
+import logging
+from enum import Enum
+from pathlib import Path
+
+import nbconvert
+from nbconvert.preprocessors import Preprocessor
+
+PRESERVE_CELL_OUTPUT_KEY = "preserve-output"
+METADATA_TAG_KEY = "tags"
+
+
+class OutputMode(Enum):
+    STDOUT = "stdout"
+    INPLACE = "inplace"
+
+
+class CustomClearOutputPreprocessor(Preprocessor):
+    """
+    Removes the output from all code cells in a notebook.
+    Option to keep cell output for cells with a given metadata tag
+    """
+
+    def preprocess_cell(self, cell, resources, cell_index):
+        """
+        Apply a transformation on each cell. See base.py for details.
+        """
+        if cell.cell_type == "code" and PRESERVE_CELL_OUTPUT_KEY not in cell["metadata"].get(METADATA_TAG_KEY, []):
+            cell.outputs = []
+            cell.execution_count = None
+        return cell, resources
+
+
+def main(input_path: Path, output_mode: OutputMode) -> None:
+    logger = logging.getLogger("NotebookCleaner")
+    logger.info(f"Cleaning notebooks from `{input_path}`, mode: `{output_mode}`")
+
+    exporter = nbconvert.NotebookExporter()
+
+    metadata_cleaner = nbconvert.preprocessors.ClearMetadataPreprocessor(preserve_cell_metadata_mask=METADATA_TAG_KEY)
+    output_cleaner = CustomClearOutputPreprocessor()
+
+    exporter.register_preprocessor(metadata_cleaner, enabled=True)
+    exporter.register_preprocessor(output_cleaner, enabled=True)
+
+    if input_path.is_file():
+        notebooks = [input_path]
+    else:
+        notebooks = [f for f in input_path.iterdir() if f.is_file() and f.suffix == ".ipynb"]
+
+    logger.info(f"Formatting {len(notebooks)} notebooks.")
+
+    for notebook in notebooks:
+        output = exporter.from_filename(notebook)
+
+        formatted_notebook = output[0]
+
+        if output_mode == OutputMode.INPLACE:
+            with notebook.open(mode="w") as file:
+                file.write(formatted_notebook)
+        elif output_mode == OutputMode.STDOUT:
+            print(formatted_notebook)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("-o", "--output", choices=[e.value for e in OutputMode])
+    parser.add_argument("-i", "--input", default="examples", help="path to the notebook file or folder")
+
+    args = parser.parse_args()
+
+    main(Path(args.input), OutputMode(args.output))
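
One observation on the argument handling in this script: `-o/--output` has no default and is not required, so calling the script without it leaves `args.output` as `None`, and `OutputMode(None)` raises a `ValueError`. Both callers in this commit pass `-o` explicitly, so this only matters for manual use. A minimal sketch of one possible guard follows; the default value and the helper name `parse_output_mode` are assumptions for illustration, not part of the commit.

```python
import argparse
from enum import Enum
from typing import List


class OutputMode(Enum):
    STDOUT = "stdout"
    INPLACE = "inplace"


def parse_output_mode(argv: List[str]) -> OutputMode:
    """Parse `-o/--output` and convert the string to an OutputMode member."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-o",
        "--output",
        choices=[mode.value for mode in OutputMode],
        # Assumption: fall back to the non-destructive mode when -o is omitted.
        default=OutputMode.STDOUT.value,
    )
    args = parser.parse_args(argv)
    return OutputMode(args.output)


print(parse_output_mode(["-o", "inplace"]))  # OutputMode.INPLACE
print(parse_output_mode([]))  # OutputMode.STDOUT
```

Passing `required=True` instead of a default would be an equally reasonable choice if an explicit mode should always be given.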

scripts/makestyle

Lines changed: 1 addition & 10 deletions
@@ -13,13 +13,4 @@ if [ "${SKIP_NOTEBOOKS:-false}" == "true" ]; then
   exit 0
 fi
 
-echo "Cleaning notebooks"
-python -m jupyter nbconvert \
-  --clear-output \
-  --inplace \
-  --ClearOutputPreprocessor.enabled=True \
-  --ClearMetadataPreprocessor.enabled=True \
-  --ClearMetadataPreprocessor.preserve_cell_metadata_mask='tags' \
-  --log-level CRITICAL \
-  ./examples/*.ipynb \
-  ./examples/dev/*.ipynb
+python scripts/clean_notebooks.py -i examples/ -o inplace
