***************************
Chapter 3 - Chaining Models
***************************

This tutorial is based on example code which can be found in the |examples_repo|.

Adding a second model
---------------------

In :doc:`using_data` we wrote a simple model to perform PnL aggregation on
some customer accounts. In this example we add a second model, to pre-filter the account
data. We can chain these two models together to create a *flow*, which TRAC will run
for us as a single job.

First, here is a new model that we can use to build the chain:

.. literalinclude:: ../../../examples/models/python/src/tutorial/chaining.py
    :caption: src/tutorial/chaining.py
    :name: chaining_py_part_1
    :language: python
    :lines: 22 - 50
    :linenos:
    :lineno-start: 22

The model takes a single parameter, ``filter_region``, and filters out any records in the
dataset that match that region. The schemas of the input and output datasets are the same.

Notice that the input dataset key, ``customer_loans``, is the same key we used in the
``PnLAggregation`` model. Since this input is expected to refer to the same dataset, it
makes sense to give it the same key. The output key, ``filtered_loans``, is different, so
we will have to tell TRAC how to connect these models together.

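To make the filtering step concrete, here is a minimal pandas sketch of the same logic
outside of TRAC. The ``region`` column name and the sample values are assumptions for
illustration only; check the schema files in the examples repo for the real column names:

```python
import pandas as pd

def filter_by_region(customer_loans: pd.DataFrame, filter_region: str) -> pd.DataFrame:
    # Drop every record whose region matches the filter_region parameter
    # ("region" is an assumed column name for this illustration)
    return customer_loans[customer_loans["region"] != filter_region]

loans = pd.DataFrame({
    "loan_id": [1, 2, 3],
    "region": ["UK", "US", "UK"],
})

filtered = filter_by_region(loans, "US")  # keeps the two UK records
```

Inside the TRAC model, the same filter sits in the model's run method, with the input and
output datasets exchanged through the model context rather than passed as arguments.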
Defining a flow
---------------

To run a flow locally, we need to define the flow in YAML. Here is an example of a flow YAML file
that wires together the customer data filter with our PnL aggregation model:

.. literalinclude:: ../../../examples/models/python/config/chaining_flow.yaml
    :caption: config/chaining_flow.yaml
    :name: chaining_flow_yaml
    :language: yaml

The flow describes the chain of models as a graph, with **nodes** and **edges**. This example has
one input, two models and one output, which are defined as the flow *nodes*. Additionally,
the model nodes have to include the names of their inputs and outputs, so that TRAC can
understand the shape of the graph. The model inputs and outputs are called **sockets**.

TRAC wires up the *edges* of the graph by name. If all the names are consistent and unique,
you might not need to define any edges at all! In this case we only need to define a single edge,
to connect the ``filtered_loans`` output of the filter model to the ``customer_loans`` input of
the aggregation model.

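Written out in the flow file's edge syntax, that single explicit edge would look something
like this (node and socket names as used in this example; treat this as a sketch and compare
it against the full flow file above):

```yaml
edges:
  - source: { node: customer_data_filter, socket: filtered_loans }
    target: { node: pnl_aggregation, socket: customer_loans }
```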
In this example the input and output nodes will be connected automatically, because their names
match the appropriate model inputs and outputs. If we wanted to define those extra two edges
explicitly, it would look like this:

.. code-block:: yaml
    :class: container

    - source: { node: customer_loans }
      target: { node: customer_data_filter, socket: customer_loans }

    - source: { node: pnl_aggregation, socket: profit_by_region }
      target: { node: profit_by_region }

Notice that the input and output nodes do not have *sockets*. This is because each input and
output represents a single dataset, while models can have multiple inputs and outputs.

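The name-matching rule is easy to picture with a toy sketch. The function below is purely
illustrative, not TRAC's actual implementation: it connects an output socket to any input
socket with the same name. For the two model nodes in this chapter it finds nothing to
connect, which is exactly why the edge from ``filtered_loans`` to ``customer_loans`` has
to be declared explicitly:

```python
def auto_wire(model_nodes: dict) -> list:
    """Illustrative name-based wiring: connect each model output socket
    to any other model's input socket that has the same name."""
    edges = []
    for src, src_def in model_nodes.items():
        for out_socket in src_def["outputs"]:
            for tgt, tgt_def in model_nodes.items():
                if tgt != src and out_socket in tgt_def["inputs"]:
                    edges.append((src, out_socket, tgt, out_socket))
    return edges

# The two model nodes from this chapter: the socket names do not line up,
# so no model-to-model edge can be inferred automatically
nodes = {
    "customer_data_filter": {"inputs": ["customer_loans"], "outputs": ["filtered_loans"]},
    "pnl_aggregation": {"inputs": ["customer_loans"], "outputs": ["profit_by_region"]},
}

# With consistent names, the edge is found without any explicit wiring
aligned = {
    "filter": {"inputs": ["loans"], "outputs": ["loans_out"]},
    "agg": {"inputs": ["loans_out"], "outputs": ["pnl"]},
}
```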
.. note::
    Using a consistent naming convention for the inputs and outputs of models in a single project
    can make it significantly easier to build and manage complex flows.


Setting up a job
----------------

Now that we have a flow definition, we will need a job config file in order to run it.
Here is an example job config for this flow, using the two models we have available:

.. literalinclude:: ../../../examples/models/python/config/chaining.yaml
    :caption: config/chaining.yaml
    :name: chaining_yaml
    :language: yaml

The job type is now ``runFlow`` instead of ``runModel``. We supply the path to the flow YAML
file, which is resolved relative to the job config file. The parameters section contains the
parameters needed by all the models in the flow. For the inputs and outputs, the keys
(``customer_loans`` and ``profit_by_region`` in this example) have to match the input and
output nodes in the flow.

In the models section, we specify which model to use for each model node in the flow.
It is important to use the fully-qualified name for each model, which means the Python
package structure should be set up correctly. See :doc:`hello_world` for a refresher on
setting up the repository layout and package structure.

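As an illustration, if the filter model class were ``CustomerDataFilter`` in
``src/tutorial/chaining.py`` and the aggregation model were ``PnLAggregation`` from the
previous chapter (class and module names assumed here for illustration, not taken from the
real files), the models section would map each flow node to a fully-qualified class name
along these lines:

```yaml
models:
  customer_data_filter: tutorial.chaining.CustomerDataFilter
  pnl_aggregation: tutorial.using_data.PnLAggregation
```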
Running a flow locally
----------------------

A flow can be launched locally as a job in the same way as a model.
You don't need to pass the model class (since we are not running a single model),
so just the job config and sys config files are required:

.. literalinclude:: ../../../examples/models/python/src/tutorial/chaining.py
    :caption: src/tutorial/chaining.py
    :name: chaining_py_part_2
    :language: python
    :lines: 53 - 55
    :linenos:
    :lineno-start: 53

This approach works well in simple cases, such as this example, but for large codebases with
lots of models and multiple flows it is usually easier to launch the flow directly. You can
launch a TRAC flow from the command line like this:

.. code-block:: shell
    :class: container

    python -m tracdap.rt.launch --job-config config/chaining.yaml --sys-config config/sys_config.yaml --dev-mode

You can set this command up to run from your IDE and then use the IDE tools to run the command
in debug mode, which will let you debug into all the models in the chain. For example, in PyCharm
you can set this command up as a Run Configuration.

.. note::
    Launching TRAC from the command line does not enable dev mode by default;
    always use the ``--dev-mode`` flag for local development.