
Commit f66ff94

Feature / Documentation updates (#618)
* Use latest conventions for using data and schema examples
* Rename models for using data examples
* Add chaining example with a static customer filter model
* Fixes for using data tutorial
* Update model names in tests
* Use new API methods in dynamic / optional models
* Update hello world model and tutorial
* Update config path resolution in hello world docs
* Update chaining examples
* Remove old model 1 / model 2 example
* Update e-2-e tests
* Update example tests in Python after renaming
* Add chaining tutorial
* Add doc comments for get / put file methods
* Update references to renamed models and parameters
* Rename Connacht in example data files
* Bump netty for compliance
1 parent 1a3702c commit f66ff94

33 files changed, +393 -313 lines
doc/modelling/tutorial/chaining.rst

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@

***************************
Chapter 3 - Chaining Models
***************************

This tutorial is based on example code which can be found in the |examples_repo|.


Adding a second model
---------------------

In :doc:`using_data` we wrote a simple model to perform PnL aggregation on
some customer accounts. In this example we add a second model, to pre-filter the account
data. We can chain these two models together to create a flow. TRAC will run the flow
for us as a single job.

First, here is a new model that we can use to build the chain:

.. literalinclude:: ../../../examples/models/python/src/tutorial/chaining.py
    :caption: src/tutorial/chaining.py
    :name: chaining_py_part_1
    :language: python
    :lines: 22 - 50
    :linenos:
    :lineno-start: 22

The model takes a single parameter, ``filter_region``, and filters out any records in the
dataset that match that region. The schemas of the input and output datasets are the same.
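
As a minimal sketch, a filter model along these lines might look like this (the field
definitions and labels here are illustrative assumptions, the exact contents of
``chaining.py`` may differ):

.. code-block:: python
    :class: container

    import tracdap.rt.api as trac

    class CustomerDataFilter(trac.TracModel):

        def define_parameters(self):
            return trac.define_parameters(
                trac.P("filter_region", trac.STRING, label="Region to filter out"))

        def define_inputs(self):
            # The input and output schemas are the same, so the same fields appear twice
            customer_loans = trac.define_input_table(
                trac.F("id", trac.STRING, label="Customer account ID"),
                trac.F("region", trac.STRING, label="Customer home region"),
                trac.F("loan_amount", trac.FLOAT, label="Principal loan amount"))
            return {"customer_loans": customer_loans}

        def define_outputs(self):
            filtered_loans = trac.define_output_table(
                trac.F("id", trac.STRING, label="Customer account ID"),
                trac.F("region", trac.STRING, label="Customer home region"),
                trac.F("loan_amount", trac.FLOAT, label="Principal loan amount"))
            return {"filtered_loans": filtered_loans}

        def run_model(self, ctx: trac.TracContext):
            filter_region = ctx.get_parameter("filter_region")
            customer_loans = ctx.get_pandas_table("customer_loans")
            # Drop records matching the filter region, keep everything else
            filtered_loans = customer_loans[customer_loans["region"] != filter_region]
            ctx.put_pandas_table("filtered_loans", filtered_loans)
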
Notice that the input dataset key, ``customer_loans``, is the same key we used in the
``PnlAggregation`` model. Since this input is expected to refer to the same dataset, it
makes sense to give it the same key. The output key, ``filtered_loans``, is different, so
we will have to tell TRAC how to connect these models together.


Defining a flow
---------------

To run a flow locally, we need to define the flow in YAML. Here is an example of a flow YAML file
that wires together the customer data filter with our PnL aggregation model:

.. literalinclude:: ../../../examples/models/python/config/chaining_flow.yaml
    :caption: config/chaining_flow.yaml
    :name: chaining_flow_yaml
    :language: yaml

The flow describes the chain of models as a graph, with **nodes** and **edges**. This example has
one input, two models and one output, which are defined as the flow *nodes*. Additionally,
the model nodes have to include the names of their inputs and outputs, so that TRAC can
understand the shape of the graph. The model inputs and outputs are called **sockets**.

TRAC wires up the *edges* of the graph based on name. If all the names are consistent and unique,
you might not need to define any edges at all! In this case we only need to define a single edge,
to connect the ``filtered_loans`` output of the filter model to the ``customer_loans`` input of
the aggregation model.
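
Putting the nodes and the single edge together, the complete flow definition looks like this
(reconstructed from the ``chaining_flow.yaml`` changes later in this commit):

.. code-block:: yaml
    :class: container

    nodes:

      customer_loans:
        nodeType: "INPUT_NODE"

      customer_data_filter:
        nodeType: "MODEL_NODE"
        inputs: [customer_loans]
        outputs: [filtered_loans]

      pnl_aggregation:
        nodeType: "MODEL_NODE"
        inputs: [customer_loans]
        outputs: [profit_by_region]

      profit_by_region:
        nodeType: "OUTPUT_NODE"

    edges:

      - source: { node: customer_data_filter, socket: filtered_loans }
        target: { node: pnl_aggregation, socket: customer_loans }
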
In this example the input and output nodes will be connected automatically, because their names
match the appropriate model inputs and outputs. If we wanted to define those extra two edges
explicitly, it would look like this:

.. code-block:: yaml
    :class: container

    - source: { node: customer_loans }
      target: { node: customer_data_filter, socket: customer_loans }

    - source: { node: pnl_aggregation, socket: profit_by_region }
      target: { node: profit_by_region }

Notice that the input and output nodes do not have *sockets*; this is because each input and
output represents a single dataset, while models can have multiple inputs and outputs.

.. note::
    Using a consistent naming convention for the inputs and outputs of models in a single project
    can make it significantly easier to build and manage complex flows.


Setting up a job
----------------

Now that we have a flow definition, we will need a job config file in order to run it.
Here is an example job config for this flow, using the two models we have available:

.. literalinclude:: ../../../examples/models/python/config/chaining.yaml
    :caption: config/chaining.yaml
    :name: chaining_yaml
    :language: yaml
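
Reconstructed from the ``chaining.yaml`` changes later in this commit, the job config
looks like this (the enclosing ``job`` / ``runFlow`` keys are assumed from the other
config examples):

.. code-block:: yaml
    :class: container

    job:
      runFlow:

        flow: ./chaining_flow.yaml

        parameters:
          eur_usd_rate: 1.2071
          default_weighting: 1.5
          filter_defaults: false
          filter_region: munster

        inputs:
          customer_loans: "inputs/loan_final313_100.csv"

        outputs:
          profit_by_region: "outputs/chaining/profit_by_region.csv"

        models:
          customer_data_filter: tutorial.chaining.CustomerDataFilter
          pnl_aggregation: tutorial.using_data.PnlAggregation
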
The job type is now ``runFlow`` instead of ``runModel``. We supply the path to the flow YAML
file, which is resolved relative to the job config file. The parameters section holds the
parameters needed by all the models in the flow. For the inputs and outputs, the keys
(``customer_loans`` and ``profit_by_region`` in this example) have to match the input and
output nodes in the flow.

In the models section, we specify which model to use for every model node in the flow.
It is important to use the fully-qualified name for each model, which means the Python
package structure should be set up correctly. See :doc:`hello_world` for a refresher on
setting up the repository layout and package structure.


Running a flow locally
----------------------

A flow can be launched locally as a job in the same way as a model.
You don't need to pass the model class (since we are not running a single model),
so just the job config and sys config files are required:

.. literalinclude:: ../../../examples/models/python/src/tutorial/chaining.py
    :caption: src/tutorial/chaining.py
    :name: chaining_py_part_2
    :language: python
    :lines: 53 - 55
    :linenos:
    :lineno-start: 53
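
A minimal sketch of that launch code, assuming a ``launch_job`` entry point with a
``dev_mode`` flag alongside the ``launch_model`` function used in earlier chapters:

.. code-block:: python
    :class: container

    if __name__ == "__main__":
        import tracdap.rt.launch as launch
        # No model class is passed, just the job config and sys config
        launch.launch_job("config/chaining.yaml", "config/sys_config.yaml", dev_mode=True)
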
This approach works well in some simple cases, such as this example, but for large codebases with
lots of models and multiple flows it is usually easier to launch the flow directly. You can launch
a TRAC flow from the command line like this:

.. code-block::
    :class: container

    python -m tracdap.rt.launch --job-config config/chaining.yaml --sys-config config/sys_config.yaml --dev-mode

You can set this command up to run from your IDE and then use the IDE tools to run the command
in debug mode, which will let you debug into all the models in the chain. For example, in PyCharm
you can set this command up as a Run Configuration.

.. note::
    Launching TRAC from the command line does not enable dev mode by default;
    always use the ``--dev-mode`` flag for local development.

doc/modelling/tutorial/hello_world.rst

Lines changed: 5 additions & 3 deletions
@@ -275,9 +275,11 @@ this, but the model will fail to deploy)!

 Paths for the system and job config files are resolved in the following order:

-1. If absolute paths are supplied, these take top priority
+1. If an absolute path is supplied, this takes priority
 2. Resolve relative to the current working directory
-3. Resolve relative to the directory containing the Python module of the model
+3. Search relative to parents of the current directory
+4. Resolve relative to the directory containing the model
+5. Search relative to parents of the directory containing the model

 Now you should be able to run your model script and see the model output in the logs:

@@ -287,7 +289,7 @@ Now you should be able to run your model script and see the model output in the

 2022-05-31 12:19:36,104 [engine] INFO tracdap.rt.exec.engine.NodeProcessor - START RunModel [HelloWorldModel] / JOB-92df0bd5-50bd-4885-bc7a-3d4d95029360-v1
 2022-05-31 12:19:36,104 [engine] INFO __main__.HelloWorldModel - Hello world model is running
-2022-05-31 12:19:36,104 [engine] INFO __main__.HelloWorldModel - The meaning of life is 42
+2022-05-31 12:19:36,104 [engine] INFO __main__.HelloWorldModel - The input number is 42
 2022-05-31 12:19:36,104 [engine] INFO tracdap.rt.exec.engine.NodeProcessor - DONE RunModel [HelloWorldModel] / JOB-92df0bd5-50bd-4885-bc7a-3d4d95029360-v1

doc/modelling/tutorial/index.rst

Lines changed: 1 addition & 0 deletions
@@ -7,4 +7,5 @@ Modelling Tutorial

 ./hello_world
 ./using_data
+./chaining
 ./inputs_and_outputs

doc/modelling/tutorial/inputs_and_outputs.rst

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@

 ****************************
-Chapter 3 - Inputs & Outputs
+Chapter 4 - Inputs & Outputs
 ****************************

 This tutorial is based on example code which can be found in the |examples_repo|.

doc/modelling/tutorial/using_data.rst

Lines changed: 27 additions & 25 deletions
@@ -79,7 +79,7 @@ lenient type handling for input files.
     :name: using_data_py_part_3
     :language: python
     :class: container
-    :lines: 68 - 77
+    :lines: 68 - 78
     :linenos:
     :lineno-start: 68

@@ -94,9 +94,9 @@ Models are free to define multiple outputs if required, but this example only ha
     :name: using_data_py_part_4
     :language: python
     :class: container
-    :lines: 79 - 85
+    :lines: 80 - 87
     :linenos:
-    :lineno-start: 79
+    :lineno-start: 80

 Now the parameters, inputs and outputs of the model are defined, we can implement the
 :py:meth:`run_model() <tracdap.rt.api.TracModel.run_model>` method.
@@ -117,9 +117,9 @@ schema for this input.
     :name: using_data_py_part_5
     :language: python
     :class: container
-    :lines: 87 - 93
+    :lines: 89 - 95
     :linenos:
-    :lineno-start: 87
+    :lineno-start: 89

 Once all the inputs and parameters are available, we can call the model function. Since all the inputs
 and parameters are supplied using the correct native types there is no further conversion necessary,
@@ -129,9 +129,9 @@ they can be passed straight into the model code.
     :name: using_data_py_part_6
     :language: python
     :class: container
-    :lines: 95 - 97
+    :lines: 97 - 99
     :linenos:
-    :lineno-start: 95
+    :lineno-start: 97

 The model code has produced a Pandas dataframe that we want to record as an output. To do this, we can use
 :py:meth:`put_pandas_table() <tracdap.rt.api.TracContext.put_pandas_table>`. The dataframe should match
@@ -151,41 +151,42 @@ columns will be dropped.
     :name: using_data_py_part_7
     :language: python
     :class: container
-    :lines: 99
+    :lines: 101
     :linenos:
-    :lineno-start: 99
+    :lineno-start: 101

 The model can be launched locally using :py:func:`launch_model() <tracdap.rt.launch.launch_model()>`.

 .. literalinclude:: ../../../examples/models/python/src/tutorial/using_data.py
     :name: using_data_py_part_8
     :language: python
     :class: container
-    :lines: 102-104
+    :lines: 104-106
     :linenos:
-    :lineno-start: 102
+    :lineno-start: 104

 Configure local data
 --------------------

 To pass data into the local model, a little bit more config is needed in the *sys_config* file
-to define a storage bucket. In TRAC storage buckets can be any storage location that can hold
-files. This would be bucket storage on a cloud platform, but you can also use local disks or other
-storage protocols such as network storage or HDFS, so long as the right storage plugins are available.
+to define a storage location. For development this can be a local folder, although in production
+deployments storage locations can be cloud buckets or use other protocols such as network storage
+or HDFS, so long as the right storage plugins are available.

-This example sets up one storage bucket called *example_data*. Since we are going to use a local disk,
+This example sets up one storage location called *example_data*. Since we are going to use a local disk,
 the storage protocol is *LOCAL*. The *rootPath* property says where this storage bucket will be on disk -
 a relative path is taken relative to the *sys_config* file by default, or you can specify an absolute path
 here to avoid confusion.

-The default bucket is also where output data will be saved. In this example we have only one storage
-bucket configured, which is used for both inputs and outputs, so we mark that as the default.
+The example config also sets the default storage location and format, which controls where
+output data will be saved. In this example we have only one storage
+location configured, which is used for both inputs and outputs, so we mark that as the default.

 .. literalinclude:: ../../../examples/models/python/config/sys_config.yaml
     :caption: config/sys_config.yaml
     :name: sys_config.yaml
     :language: yaml
-    :lines: 2-12
+    :lines: 2-15

 In the *job_config* file we need to specify what data to use for the model inputs and outputs. Each
 input named in the model must have an entry in the inputs section, and each output in the outputs
@@ -277,22 +278,23 @@ Now we can re-write our model to use the new schema files. First we need to impo
     :linenos:
     :lineno-start: 20

-Then we can load schemas from the schemas package in the
+Then we can load schemas from the schemas package in the model's
 :py:meth:`define_inputs() <tracdap.rt.api.TracModel.define_inputs>` and
 :py:meth:`define_outputs() <tracdap.rt.api.TracModel.define_outputs>` methods:

 .. literalinclude:: ../../../examples/models/python/src/tutorial/schema_files.py
     :name: using_data_part_10
     :language: python
     :class: container
-    :lines: 47 - 57
+    :lines: 39 - 51
     :linenos:
-    :lineno-start: 47
+    :lineno-start: 39

-Notice that the :py:func:`load_schema() <tracdap.rt.api.load_schema>` method is the same
-for input and output schemas, so we need to use
-:py:class:`ModelInputSchema <tracdap.rt.metadata.ModelInputSchema>` and
-:py:class:`ModelOutputSchema <tracdap.rt.metadata.ModelOutputSchema>` explicitly.
+Notice that the :py:func:`load_schema() <tracdap.rt.api.load_schema>` method only creates
+the :py:class:`SchemaDefinition <tracdap.rt.metadata.SchemaDefinition>`; to use this schema for
+model inputs and outputs we need to call
+:py:func:`define_input() <tracdap.rt.api.define_input>` and
+:py:func:`define_output() <tracdap.rt.api.define_output>` explicitly.

 .. seealso::
     Full source code is available for the
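
As a sketch of the pattern this new wording describes, loading a schema file and wrapping
it with the new API methods might look like this (the class name and schema file names are
illustrative assumptions, not taken from ``schema_files.py``):

.. code-block:: python
    :class: container

    import tracdap.rt.api as trac
    import tutorial.schemas as schemas  # hypothetical package holding the CSV schema files

    class UsingDataModel(trac.TracModel):

        # Parameter definitions and run_model() omitted for brevity

        def define_inputs(self):
            customer_loans = trac.load_schema(schemas, "customer_loans.csv")
            return {"customer_loans": trac.define_input(customer_loans, label="Customer loans data")}

        def define_outputs(self):
            profit_by_region = trac.load_schema(schemas, "profit_by_region.csv")
            return {"profit_by_region": trac.define_output(profit_by_region, label="Profit by region")}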

examples/models/python/config/chaining.yaml

Lines changed: 6 additions & 6 deletions
@@ -5,17 +5,17 @@ job:
     flow: ./chaining_flow.yaml

     parameters:
-      param_1: 42
-      param_2: "2015-01-01"
-      param_3: 1.5
+      eur_usd_rate: 1.2071
+      default_weighting: 1.5
+      filter_defaults: false
+      filter_region: munster

     inputs:
       customer_loans: "inputs/loan_final313_100.csv"
-      currency_data: "inputs/currency_data_sample.csv"

     outputs:
       profit_by_region: "outputs/chaining/profit_by_region.csv"

     models:
-      model_1: tutorial.model_1.FirstModel
-      model_2: tutorial.model_2.SecondModel
+      customer_data_filter: tutorial.chaining.CustomerDataFilter
+      pnl_aggregation: tutorial.using_data.PnlAggregation

examples/models/python/config/chaining_flow.yaml

Lines changed: 11 additions & 8 deletions
@@ -4,18 +4,21 @@ nodes:
   customer_loans:
     nodeType: "INPUT_NODE"

-  currency_data:
-    nodeType: "INPUT_NODE"
-
-  model_1:
+  customer_data_filter:
     nodeType: "MODEL_NODE"
-    inputs: [customer_loans, currency_data]
-    outputs: [preprocessed_data]
+    inputs: [customer_loans]
+    outputs: [filtered_loans]

-  model_2:
+  pnl_aggregation:
     nodeType: "MODEL_NODE"
-    inputs: [preprocessed_data]
+    inputs: [customer_loans]
     outputs: [profit_by_region]

   profit_by_region:
     nodeType: "OUTPUT_NODE"
+
+
+edges:
+
+  - source: { node: customer_data_filter, socket: filtered_loans }
+    target: { node: pnl_aggregation, socket: customer_loans }

examples/models/python/config/chaining_2.yaml renamed to examples/models/python/config/dynamic_chaining.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 job:
   runFlow:

-    flow: ./chaining_flow_2.yaml
+    flow: ./dynamic_chaining_flow.yaml

     models:
       dynamic_filter: tutorial.dynamic_io.DynamicDataFilter
File renamed without changes.

examples/models/python/config/hello_world.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@ job:
   runModel:

     parameters:
-      meaning_of_life: 42
+      input_number: 42
