
Commit f66ff94

Feature / Documentation updates (#618)
* Use latest conventions for using data and schema examples
* Rename models for using data examples
* Add chaining example with a static customer filter model
* Fixes for using data tutorial
* Update model names in tests
* Use new API methods in dynamic / optional models
* Update hello world model and tutorial
* Update config path resolution in hello world docs
* Update chaining examples
* Remove old model 1 / model 2 example
* Update e-2-e tests
* Update example tests in Python after renaming
* Add chaining tutorial
* Add doc comments for get / put file methods
* Update references to renamed models and parameters
* Rename Connacht in example data files
* Bump netty for compliance
1 parent 1a3702c commit f66ff94

33 files changed, +393 -313 lines
doc/modelling/tutorial/chaining.rst

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@

***************************
Chapter 3 - Chaining Models
***************************

This tutorial is based on example code which can be found in the |examples_repo|.


Adding a second model
---------------------

In :doc:`using_data` we wrote a simple model to perform PnL aggregation on
some customer accounts. In this example we add a second model, to pre-filter the account
data. We can chain these two models together to create a flow. TRAC will run the flow
for us as a single job.

First, here is a new model that we can use to build the chain:

.. literalinclude:: ../../../examples/models/python/src/tutorial/chaining.py
    :caption: src/tutorial/chaining.py
    :name: chaining_py_part_1
    :language: python
    :lines: 22 - 50
    :linenos:
    :lineno-start: 22

The model takes a single parameter, ``filter_region``, and filters out any records in the
dataset that match that region. The schemas of the input and output datasets are the same.
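
As a minimal sketch, a filter model along these lines might look like this (the field
definitions and labels here are illustrative assumptions, the exact contents of
``chaining.py`` may differ):

.. code-block:: python
    :class: container

    import tracdap.rt.api as trac

    class CustomerDataFilter(trac.TracModel):

        def define_parameters(self):
            return trac.define_parameters(
                trac.P("filter_region", trac.STRING, label="Region to filter out"))

        def define_inputs(self):
            # The input and output schemas are the same, so the same fields appear twice
            customer_loans = trac.define_input_table(
                trac.F("id", trac.STRING, label="Customer account ID"),
                trac.F("region", trac.STRING, label="Customer home region"),
                trac.F("loan_amount", trac.FLOAT, label="Principal loan amount"))
            return {"customer_loans": customer_loans}

        def define_outputs(self):
            filtered_loans = trac.define_output_table(
                trac.F("id", trac.STRING, label="Customer account ID"),
                trac.F("region", trac.STRING, label="Customer home region"),
                trac.F("loan_amount", trac.FLOAT, label="Principal loan amount"))
            return {"filtered_loans": filtered_loans}

        def run_model(self, ctx: trac.TracContext):
            filter_region = ctx.get_parameter("filter_region")
            customer_loans = ctx.get_pandas_table("customer_loans")
            # Drop records matching the filter region, keep everything else
            filtered_loans = customer_loans[customer_loans["region"] != filter_region]
            ctx.put_pandas_table("filtered_loans", filtered_loans)
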
Notice that the input dataset key, ``customer_loans``, is the same key we used in the
``PnlAggregation`` model. Since this input is expected to refer to the same dataset, it
makes sense to give it the same key. The output key, ``filtered_loans``, is different, so
we will have to tell TRAC how to connect these models together.


Defining a flow
---------------

To run a flow locally, we need to define the flow in YAML. Here is an example of a flow YAML file
that wires together the customer data filter with our PnL aggregation model:

.. literalinclude:: ../../../examples/models/python/config/chaining_flow.yaml
    :caption: config/chaining_flow.yaml
    :name: chaining_flow_yaml
    :language: yaml

The flow describes the chain of models as a graph, with **nodes** and **edges**. This example has
one input, two models and one output, which are defined as the flow *nodes*. Additionally,
the model nodes have to include the names of their inputs and outputs, so that TRAC can
understand the shape of the graph. The model inputs and outputs are called **sockets**.

TRAC wires up the *edges* of the graph based on name. If all the names are consistent and unique,
you might not need to define any edges at all! In this case we only need to define a single edge,
to connect the ``filtered_loans`` output of the filter model to the ``customer_loans`` input of
the aggregation model.
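
Putting the nodes and the single edge together, the complete flow definition looks like this
(reconstructed from the ``chaining_flow.yaml`` changes later in this commit):

.. code-block:: yaml
    :class: container

    nodes:

      customer_loans:
        nodeType: "INPUT_NODE"

      customer_data_filter:
        nodeType: "MODEL_NODE"
        inputs: [customer_loans]
        outputs: [filtered_loans]

      pnl_aggregation:
        nodeType: "MODEL_NODE"
        inputs: [customer_loans]
        outputs: [profit_by_region]

      profit_by_region:
        nodeType: "OUTPUT_NODE"

    edges:

      - source: { node: customer_data_filter, socket: filtered_loans }
        target: { node: pnl_aggregation, socket: customer_loans }
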
In this example the input and output nodes will be connected automatically, because their names
match the appropriate model inputs and outputs. If we wanted to define those extra two edges
explicitly, it would look like this:

.. code-block:: yaml
    :class: container

    - source: { node: customer_loans }
      target: { node: customer_data_filter, socket: customer_loans }

    - source: { node: pnl_aggregation, socket: profit_by_region }
      target: { node: profit_by_region }

Notice that the input and output nodes do not have *sockets*; this is because each input and
output represents a single dataset, while models can have multiple inputs and outputs.

.. note::
    Using a consistent naming convention for the inputs and outputs of models in a single project
    can make it significantly easier to build and manage complex flows.


Setting up a job
----------------

Now that we have a flow definition, we will need a job config file in order to run it.
Here is an example job config for this flow, using the two models we have available:

.. literalinclude:: ../../../examples/models/python/config/chaining.yaml
    :caption: config/chaining.yaml
    :name: chaining_yaml
    :language: yaml
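
Reconstructed from the ``chaining.yaml`` changes later in this commit, the job config
looks like this (the enclosing ``job`` / ``runFlow`` keys are assumed from the other
config examples):

.. code-block:: yaml
    :class: container

    job:
      runFlow:

        flow: ./chaining_flow.yaml

        parameters:
          eur_usd_rate: 1.2071
          default_weighting: 1.5
          filter_defaults: false
          filter_region: munster

        inputs:
          customer_loans: "inputs/loan_final313_100.csv"

        outputs:
          profit_by_region: "outputs/chaining/profit_by_region.csv"

        models:
          customer_data_filter: tutorial.chaining.CustomerDataFilter
          pnl_aggregation: tutorial.using_data.PnlAggregation
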
The job type is now ``runFlow`` instead of ``runModel``. We supply the path to the flow YAML
file, which is resolved relative to the job config file. The parameters section holds the
parameters needed by all the models in the flow. For the inputs and outputs, the keys
(``customer_loans`` and ``profit_by_region`` in this example) have to match the input and
output nodes in the flow.

In the models section, we specify which model to use for every model node in the flow.
It is important to use the fully-qualified name for each model, which means the Python
package structure should be set up correctly. See :doc:`hello_world` for a refresher on
setting up the repository layout and package structure.


Running a flow locally
----------------------

A flow can be launched locally as a job in the same way as a model.
You don't need to pass the model class (since we are not running a single model),
so just the job config and sys config files are required:

.. literalinclude:: ../../../examples/models/python/src/tutorial/chaining.py
    :caption: src/tutorial/chaining.py
    :name: chaining_py_part_2
    :language: python
    :lines: 53 - 55
    :linenos:
    :lineno-start: 53
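
A minimal sketch of that launch code, assuming a ``launch_job`` entry point with a
``dev_mode`` flag alongside the ``launch_model`` function used in earlier chapters:

.. code-block:: python
    :class: container

    if __name__ == "__main__":
        import tracdap.rt.launch as launch
        # No model class is passed, just the job config and sys config
        launch.launch_job("config/chaining.yaml", "config/sys_config.yaml", dev_mode=True)
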
This approach works well in some simple cases, such as this example, but for large codebases with
lots of models and multiple flows it is usually easier to launch the flow directly. You can launch
a TRAC flow from the command line like this:

.. code-block::
    :class: container

    python -m tracdap.rt.launch --job-config config/chaining.yaml --sys-config config/sys_config.yaml --dev-mode

You can set this command up to run from your IDE and then use the IDE tools to run the command
in debug mode, which will let you debug into all the models in the chain. For example, in PyCharm
you can set this command up as a Run Configuration.

.. note::
    Launching TRAC from the command line does not enable dev mode by default;
    always use the ``--dev-mode`` flag for local development.

doc/modelling/tutorial/hello_world.rst

Lines changed: 5 additions & 3 deletions
@@ -275,9 +275,11 @@ this, but the model will fail to deploy)!

 Paths for the system and job config files are resolved in the following order:

-1. If absolute paths are supplied, these take top priority
+1. If an absolute path is supplied, this takes priority
 2. Resolve relative to the current working directory
-3. Resolve relative to the directory containing the Python module of the model
+3. Search relative to parents of the current directory
+4. Resolve relative to the directory containing the model
+5. Search relative to parents of the directory containing the model

 Now you should be able to run your model script and see the model output in the logs:

@@ -287,7 +289,7 @@ Now you should be able to run your model script and see the model output in the

 2022-05-31 12:19:36,104 [engine] INFO tracdap.rt.exec.engine.NodeProcessor - START RunModel [HelloWorldModel] / JOB-92df0bd5-50bd-4885-bc7a-3d4d95029360-v1
 2022-05-31 12:19:36,104 [engine] INFO __main__.HelloWorldModel - Hello world model is running
-2022-05-31 12:19:36,104 [engine] INFO __main__.HelloWorldModel - The meaning of life is 42
+2022-05-31 12:19:36,104 [engine] INFO __main__.HelloWorldModel - The input number is 42
 2022-05-31 12:19:36,104 [engine] INFO tracdap.rt.exec.engine.NodeProcessor - DONE RunModel [HelloWorldModel] / JOB-92df0bd5-50bd-4885-bc7a-3d4d95029360-v1

doc/modelling/tutorial/index.rst

Lines changed: 1 addition & 0 deletions
@@ -7,4 +7,5 @@ Modelling Tutorial

 ./hello_world
 ./using_data
+./chaining
 ./inputs_and_outputs

doc/modelling/tutorial/inputs_and_outputs.rst

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@

 ****************************
-Chapter 3 - Inputs & Outputs
+Chapter 4 - Inputs & Outputs
 ****************************

 This tutorial is based on example code which can be found in the |examples_repo|.

doc/modelling/tutorial/using_data.rst

Lines changed: 27 additions & 25 deletions
@@ -79,7 +79,7 @@ lenient type handling for input files.
     :name: using_data_py_part_3
     :language: python
     :class: container
-    :lines: 68 - 77
+    :lines: 68 - 78
     :linenos:
     :lineno-start: 68

@@ -94,9 +94,9 @@ Models are free to define multiple outputs if required, but this example only ha
     :name: using_data_py_part_4
     :language: python
     :class: container
-    :lines: 79 - 85
+    :lines: 80 - 87
     :linenos:
-    :lineno-start: 79
+    :lineno-start: 80

 Now the parameters, inputs and outputs of the model are defined, we can implement the
 :py:meth:`run_model() <tracdap.rt.api.TracModel.run_model>` method.
@@ -117,9 +117,9 @@ schema for this input.
     :name: using_data_py_part_5
     :language: python
     :class: container
-    :lines: 87 - 93
+    :lines: 89 - 95
     :linenos:
-    :lineno-start: 87
+    :lineno-start: 89

 Once all the inputs and parameters are available, we can call the model function. Since all the inputs
 and parameters are supplied using the correct native types there is no further conversion necessary,
@@ -129,9 +129,9 @@ they can be passed straight into the model code.
     :name: using_data_py_part_6
     :language: python
     :class: container
-    :lines: 95 - 97
+    :lines: 97 - 99
     :linenos:
-    :lineno-start: 95
+    :lineno-start: 97

 The model code has produced a Pandas dataframe that we want to record as an output. To do this, we can use
 :py:meth:`put_pandas_table() <tracdap.rt.api.TracContext.put_pandas_table>`. The dataframe should match
@@ -151,41 +151,42 @@ columns will be dropped.
     :name: using_data_py_part_7
     :language: python
     :class: container
-    :lines: 99
+    :lines: 101
     :linenos:
-    :lineno-start: 99
+    :lineno-start: 101

 The model can be launched locally using :py:func:`launch_model() <tracdap.rt.launch.launch_model()>`.

 .. literalinclude:: ../../../examples/models/python/src/tutorial/using_data.py
     :name: using_data_py_part_8
     :language: python
     :class: container
-    :lines: 102-104
+    :lines: 104-106
     :linenos:
-    :lineno-start: 102
+    :lineno-start: 104

 Configure local data
 --------------------

 To pass data into the local model, a little bit more config is needed in the *sys_config* file
-to define a storage bucket. In TRAC storage buckets can be any storage location that can hold
-files. This would be bucket storage on a cloud platform, but you can also use local disks or other
-storage protocols such as network storage or HDFS, so long as the right storage plugins are available.
+to define a storage location. For development this can be a local folder, although in production
+deployments storage locations can be cloud buckets or use other protocols such as network storage
+or HDFS, so long as the right storage plugins are available.

-This example sets up one storage bucket called *example_data*. Since we are going to use a local disk,
+This example sets up one storage location called *example_data*. Since we are going to use a local disk,
 the storage protocol is *LOCAL*. The *rootPath* property says where this storage bucket will be on disk -
 a relative path is taken relative to the *sys_config* file by default, or you can specify an absolute path
 here to avoid confusion.

-The default bucket is also where output data will be saved. In this example we have only one storage
-bucket configured, which is used for both inputs and outputs, so we mark that as the default.
+The example config also sets the default storage location and format, which controls where
+output data will be saved. In this example we have only one storage
+location configured, which is used for both inputs and outputs, so we mark that as the default.

 .. literalinclude:: ../../../examples/models/python/config/sys_config.yaml
     :caption: config/sys_config.yaml
     :name: sys_config.yaml
     :language: yaml
-    :lines: 2-12
+    :lines: 2-15

 In the *job_config* file we need to specify what data to use for the model inputs and outputs. Each
 input named in the model must have an entry in the inputs section, and each output in the outputs
@@ -277,22 +278,23 @@ Now we can re-write our model to use the new schema files. First we need to impo
     :linenos:
     :lineno-start: 20

-Then we can load schemas from the schemas package in the
+Then we can load schemas from the schemas package in the model's
 :py:meth:`define_inputs() <tracdap.rt.api.TracModel.define_inputs>` and
 :py:meth:`define_outputs() <tracdap.rt.api.TracModel.define_outputs>` methods:

 .. literalinclude:: ../../../examples/models/python/src/tutorial/schema_files.py
     :name: using_data_part_10
     :language: python
     :class: container
-    :lines: 47 - 57
+    :lines: 39 - 51
     :linenos:
-    :lineno-start: 47
+    :lineno-start: 39

-Notice that the :py:func:`load_schema() <tracdap.rt.api.load_schema>` method is the same
-for input and output schemas, so we need to use
-:py:class:`ModelInputSchema <tracdap.rt.metadata.ModelInputSchema>` and
-:py:class:`ModelOutputSchema <tracdap.rt.metadata.ModelOutputSchema>` explicitly.
+Notice that the :py:func:`load_schema() <tracdap.rt.api.load_schema>` method only creates
+the :py:class:`SchemaDefinition <tracdap.rt.metadata.SchemaDefinition>`; to use this schema for
+model inputs and outputs we need to call
+:py:func:`define_input() <tracdap.rt.api.define_input>` and
+:py:func:`define_output() <tracdap.rt.api.define_output>` explicitly.

 .. seealso::
     Full source code is available for the
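
As a sketch of the pattern this new wording describes, loading a schema file and wrapping
it with the new API methods might look like this (the class name and schema file names are
illustrative assumptions, not taken from ``schema_files.py``):

.. code-block:: python
    :class: container

    import tracdap.rt.api as trac
    import tutorial.schemas as schemas  # hypothetical package holding the CSV schema files

    class UsingDataModel(trac.TracModel):

        # Parameter definitions and run_model() omitted for brevity

        def define_inputs(self):
            customer_loans = trac.load_schema(schemas, "customer_loans.csv")
            return {"customer_loans": trac.define_input(customer_loans, label="Customer loans data")}

        def define_outputs(self):
            profit_by_region = trac.load_schema(schemas, "profit_by_region.csv")
            return {"profit_by_region": trac.define_output(profit_by_region, label="Profit by region")}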

examples/models/python/config/chaining.yaml

Lines changed: 6 additions & 6 deletions
@@ -5,17 +5,17 @@ job:
     flow: ./chaining_flow.yaml

     parameters:
-      param_1: 42
-      param_2: "2015-01-01"
-      param_3: 1.5
+      eur_usd_rate: 1.2071
+      default_weighting: 1.5
+      filter_defaults: false
+      filter_region: munster

     inputs:
       customer_loans: "inputs/loan_final313_100.csv"
-      currency_data: "inputs/currency_data_sample.csv"

     outputs:
       profit_by_region: "outputs/chaining/profit_by_region.csv"

     models:
-      model_1: tutorial.model_1.FirstModel
-      model_2: tutorial.model_2.SecondModel
+      customer_data_filter: tutorial.chaining.CustomerDataFilter
+      pnl_aggregation: tutorial.using_data.PnlAggregation

examples/models/python/config/chaining_flow.yaml

Lines changed: 11 additions & 8 deletions
@@ -4,18 +4,21 @@ nodes:
   customer_loans:
     nodeType: "INPUT_NODE"

-  currency_data:
-    nodeType: "INPUT_NODE"
-
-  model_1:
+  customer_data_filter:
     nodeType: "MODEL_NODE"
-    inputs: [customer_loans, currency_data]
-    outputs: [preprocessed_data]
+    inputs: [customer_loans]
+    outputs: [filtered_loans]

-  model_2:
+  pnl_aggregation:
     nodeType: "MODEL_NODE"
-    inputs: [preprocessed_data]
+    inputs: [customer_loans]
     outputs: [profit_by_region]

   profit_by_region:
     nodeType: "OUTPUT_NODE"
+
+
+edges:
+
+  - source: { node: customer_data_filter, socket: filtered_loans }
+    target: { node: pnl_aggregation, socket: customer_loans }

examples/models/python/config/chaining_2.yaml renamed to examples/models/python/config/dynamic_chaining.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 job:
   runFlow:

-    flow: ./chaining_flow_2.yaml
+    flow: ./dynamic_chaining_flow.yaml

     models:
       dynamic_filter: tutorial.dynamic_io.DynamicDataFilter
File renamed without changes.

examples/models/python/config/hello_world.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@ job:
   runModel:

     parameters:
-      meaning_of_life: 42
+      input_number: 42
