This example shows how to run the Titanic example locally to debug your code. For more information on the debug mode, see the documentation.
In this example, everything is run locally without a connection to a deployed Substra platform. This requires the user to create all assets locally.
We use the same objective, data manager and algorithms as the Titanic example (see the assets folder in the Titanic
example).
In order to run this example, you'll need to:
- use Python 3
- have Docker installed
- install substra
- checkout this repository
All commands in this example are run from the substra/examples/debugging/ folder.
Follow the data preparation phase instructions from the Titanic example to generate the data samples on disk.
The local_debugging.py script contains the code to add the assets to the platform, train the algorithm and make predictions.
The differences with the Titanic example (executed on a running Substra platform) are:
- the client is given the argument debug=True and there is no need for a username and password
- adding the dataset and algorithm must be done in the same script, since the objects (traintuple, objective...) are saved in memory and erased at the end of the script
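For reference, the client setup could look something like this minimal sketch (the exact Client constructor arguments may vary between substra versions):

```python
import substra

# Debug mode: no URL, username or password is needed; all assets created
# through this client are kept in memory and discarded when the script ends.
client = substra.Client(debug=True)
```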
Apart from this, this example is the same as the Titanic example.
In this example, the dataset and objective are those from the Titanic example, registered on the deployed Substra platform, whereas the algo, traintuple and testtuple are created locally.
For this, run the Titanic example and keep the Substra platform running. You will need
the assets_keys.json generated while running the Titanic example.
The debugging.py script contains the code to load the Titanic asset keys, then create a traintuple and a testtuple using those keys.
In this setup, you can also create any asset you want, as in the example with only local assets.
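A minimal sketch of that flow is shown below; the assets_keys.json field names such as dataset_key and objective_key, as well as the algo_key placeholder, are illustrative assumptions (only train_data_sample_keys appears verbatim in this example):

```python
import json

import substra

client = substra.Client(debug=True)

# Keys of the assets registered on the deployed platform by the Titanic example.
with open("assets_keys.json") as f:
    assets_keys = json.load(f)

algo_key = "..."  # key of the algo added locally earlier in the script

# The traintuple and testtuple are created locally but reference the remote
# dataset and data samples through their keys.
traintuple_key = client.add_traintuple(
    {
        "algo_key": algo_key,
        "data_manager_key": assets_keys["dataset_key"],
        "train_data_sample_keys": assets_keys["train_data_sample_keys"],
    },
)
testtuple_key = client.add_testtuple(
    {
        "objective_key": assets_keys["objective_key"],
        "traintuple_key": traintuple_key,
    },
)
```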
With debug=True in the client, the deployed Substra platform is read-only: all new assets are created locally.
The data cannot leave the deployed platform. When a traintuple or testtuple uses a dataset from the remote platform,
it runs on the fake data that the dataset opener generates (see the fake_X and fake_y methods in the
dataset opener). The number of samples generated by the fake methods is equal to the number
of data samples that the traintuple or testtuple uses.
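To make this concrete, the fake data methods of an opener might look roughly like the following sketch; the class name, column names and the n_samples argument are assumptions for illustration, and the real opener lives in the Titanic example's assets folder:

```python
import numpy as np
import pandas as pd
import substratools as tools


class FakeDataOpener(tools.Opener):
    # A real opener also implements the prediction save/load methods.

    def get_X(self, folders):
        # Real data: load the feature rows from the data sample folders.
        ...

    def get_y(self, folders):
        # Real data: load the labels from the data sample folders.
        ...

    def fake_X(self, n_samples):
        # Debug mode: generate one random feature row per data sample used.
        return pd.DataFrame({"Age": np.random.randint(1, 80, size=n_samples)})

    def fake_y(self, n_samples):
        # Debug mode: generate matching random labels.
        return pd.DataFrame({"Survived": np.random.randint(0, 2, size=n_samples)})
```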
In the Titanic example, there are 10 data samples, so the traintuple we create in the script uses 10 samples:
```python
traintuple_key = client.add_traintuple(
    {
        ...
        "train_data_sample_keys": assets_keys["train_data_sample_keys"],
    },
)
```

So in this mode, the traintuple runs on 10 fake data samples generated by the opener.