This example shows how to run the Titanic example locally to debug your code. For more information on the debug mode, see the documentation.
In this example, everything is run locally without a connection to a deployed Substra platform. This requires the user to create all assets locally.
We use the same objective, data manager and algorithms as the Titanic example (see the assets folder in the Titanic
example).
In order to run this example, you'll need to:
- use Python 3
- have Docker installed
- install substra
- checkout this repository
All commands in this example are run from the substra/examples/debugging/ folder.
Follow the data preparation phase instructions from the Titanic example to generate the data samples on disk.
The local_debugging.py script contains the code to add the assets to the platform, train the algorithm and make predictions.
The differences with the Titanic example (executed on a running Substra platform) are:
- the client is given the argument debug=True and there is no need for a username and password
- adding the dataset and algorithm must be done in the same script, since the objects (traintuple, objective...) are saved in memory and erased at the end of the script
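For reference, the client setup could look something like this minimal sketch (the exact Client constructor arguments may vary between substra versions):

```python
import substra

# Debug mode: no URL, username or password is needed; all assets created
# through this client are kept in memory and discarded when the script ends.
client = substra.Client(debug=True)
```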
Apart from this, this example is the same as the Titanic example.
In this example, the dataset and objective are those from the Titanic example, registered on the deployed Substra platform, whereas the algo, traintuple and testtuple are created locally.
For this, run the Titanic example and keep the Substra platform running. You will need
the assets_keys.json generated while running the Titanic example.
The debugging.py script contains the code to load the Titanic asset keys, then create a traintuple and a testtuple using those keys.
In this setup, you can also create any asset you want, as in the example with only local assets.
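A minimal sketch of that flow is shown below; the assets_keys.json field names such as dataset_key and objective_key, as well as the algo_key placeholder, are illustrative assumptions (only train_data_sample_keys appears verbatim in this example):

```python
import json

import substra

client = substra.Client(debug=True)

# Keys of the assets registered on the deployed platform by the Titanic example.
with open("assets_keys.json") as f:
    assets_keys = json.load(f)

algo_key = "..."  # key of the algo added locally earlier in the script

# The traintuple and testtuple are created locally but reference the remote
# dataset and data samples through their keys.
traintuple_key = client.add_traintuple(
    {
        "algo_key": algo_key,
        "data_manager_key": assets_keys["dataset_key"],
        "train_data_sample_keys": assets_keys["train_data_sample_keys"],
    },
)
testtuple_key = client.add_testtuple(
    {
        "objective_key": assets_keys["objective_key"],
        "traintuple_key": traintuple_key,
    },
)
```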
With debug=True in the client, the deployed Substra platform is read-only: all new assets are created locally.
The data cannot leave the deployed platform. When a traintuple or testtuple uses a dataset from the remote platform,
it runs on the fake data that the dataset opener generates (see the fake_X and fake_y methods in the
dataset opener). The number of samples generated by the fake methods is equal to the number
of data samples that the traintuple or testtuple uses.
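To make this concrete, the fake data methods of an opener might look roughly like the following sketch; the class name, column names and the n_samples argument are assumptions for illustration, and the real opener lives in the Titanic example's assets folder:

```python
import numpy as np
import pandas as pd
import substratools as tools


class FakeDataOpener(tools.Opener):
    # A real opener also implements the prediction save/load methods.

    def get_X(self, folders):
        # Real data: load the feature rows from the data sample folders.
        ...

    def get_y(self, folders):
        # Real data: load the labels from the data sample folders.
        ...

    def fake_X(self, n_samples):
        # Debug mode: generate one random feature row per data sample used.
        return pd.DataFrame({"Age": np.random.randint(1, 80, size=n_samples)})

    def fake_y(self, n_samples):
        # Debug mode: generate matching random labels.
        return pd.DataFrame({"Survived": np.random.randint(0, 2, size=n_samples)})
```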
In the Titanic example, there are 10 data samples, so the traintuple we create in the script uses 10 samples:
```python
traintuple_key = client.add_traintuple(
    {
        ...
        "train_data_sample_keys": assets_keys["train_data_sample_keys"],
    },
)
```

So in this mode, the traintuple runs on 10 fake data samples generated by the opener.