
Commit baf4cab

Updates to CLI and deployment docs. Closes #1517 (#1613)
* Updates to CLI and deployment docs.
* WIP- Update custom operations and integrations sections
* Edits to compute/storage adapter docs and operations
* Add quick start to getting started docs
* Fix typo
* fix ordered list order
* Fixed broken links
* TrainVisualize -> TrainValidate
* Fix coloring for operations
* Fixed weird phrasing and typo?
* Fixed typos
* WIP- Add operation feedback plots
* Misc edits
* Fix broken link
* Move quick start to be before overview

Co-authored-by: Umesh Timalsina <[email protected]>
1 parent 3339cdc commit baf4cab

20 files changed (+232, -246 lines)

docs/deployment/dockerized.rst

Lines changed: 0 additions & 45 deletions
This file was deleted.

docs/deployment/native.rst

Lines changed: 1 addition & 27 deletions
@@ -55,17 +55,7 @@ By default, DeepForge will start on `http://localhost:8888`. However, the port c
  Worker
  ~~~~~~
- The DeepForge worker can be started with
-
- .. code-block:: bash
-
-     deepforge start --worker
-
- To connect to a remote deepforge instance, add the url of the DeepForge server:
-
- .. code-block:: bash
-
-     deepforge start --worker http://myaddress.com:1234
+ The DeepForge worker (used with WebGME compute) enables users to connect their own machines for any required computation. It can be installed from `https://github.com/deepforge-dev/worker`. It is recommended to install `Conda <https://conda.io/en/latest/>`_ on the worker machine so any dependencies can be installed automatically.

  Updating
  ~~~~~~~~
@@ -109,22 +99,6 @@ and navigate to `http://localhost:8888` to start using DeepForge!
  Alternatively, if jobs are going to be executed on an external worker, run `./bin/deepforge start -s` locally and navigate to `http://localhost:8888`.

- DeepForge Worker
- ~~~~~~~~~~~~~~~~
- If you are using `./bin/deepforge start -s` you will need to set up a DeepForge worker (`./bin/deepforge start` starts a local worker for you!). DeepForge workers are slave machines connected to DeepForge which execute the provided jobs. This allows the jobs to access the GPU, etc., and provides a number of benefits over trying to perform deep learning tasks in the browser.
-
- Once DeepForge is installed on the worker, start it with
-
- .. code-block:: bash
-
-     ./bin/deepforge start -w
-
- Note: If you are running the worker on a different machine, put the address of the DeepForge server as an argument to the command. For example:
-
- .. code-block:: bash
-
-     ./bin/deepforge start -w http://myaddress.com:1234
-
  Updating
  ~~~~~~~~
  Updating can be done the same as any other git project; that is, by running `git pull` from the project root. Sometimes the dependencies need to be updated, so it is recommended to run `npm install` following `git pull`.

docs/deployment/overview.rst

Lines changed: 4 additions & 7 deletions
@@ -5,20 +5,17 @@ DeepForge Component Overview
  ----------------------------
  DeepForge is composed of four main elements:

- - *Server*: Main component hosting all the project information and is connected to by the clients.
- - *Database*: MongoDB database containing DeepForge, job queue for the workers, etc.
- - *Worker*: Slave machine performing the actual machine learning computation.
  - *Client*: The connected browsers working on DeepForge projects.
-
- Of course, only the *Server*, *Database* (MongoDB) and *Worker* need to be installed. If you are not going to execute any machine learning pipelines, installing the *Worker* can be skipped.
+ - *Server*: Main component hosting all the project information and is connected to by the clients.
+ - *Compute*: Connected computational resources used for executing pipelines.
+ - *Storage*: Connected storage resources used for storing project data artifacts such as datasets or trained model weights.

  Component Dependencies
  ----------------------
  The following dependencies are required for each component:

- - *Server* (NodeJS v8.11.3)
+ - *Server* (NodeJS LTS)
  - *Database* (MongoDB v3.0.7)
- - *Worker*: NodeJS v8.11.3 (used for job management logic) and Python 3. If you are using the deepforge-keras extension, you will also need Keras and `TensorFlow <https://tensorflow.org>`_ installed.
  - *Client*: We recommend using Google Chrome and are not supporting other browsers (for now). In other words, other browsers can be used at your own risk.

  Configuration

docs/deployment/quick_start.rst

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
Quick Start
===========
The recommended (and easiest) way to get started with DeepForge is using docker-compose. First, install `docker <https://docs.docker.com/engine/installation/>`_ and `docker-compose <https://docs.docker.com/compose/install/>`_.

Next, download the docker-compose file for DeepForge:

.. code-block:: bash

    wget https://raw.githubusercontent.com/deepforge-dev/deepforge/master/docker/docker-compose.yml

Then start DeepForge using docker-compose:

.. code-block:: bash

    docker-compose up

and now DeepForge can be used by opening a browser to `http://localhost:8888 <http://localhost:8888>`_!

For detailed instructions about deployment installations, check out our `deployment installation instructions <../getting_started/configuration.rst>`_. An example of customizing a deployment using docker-compose can be found `here <https://github.com/deepforge-dev/deepforge/tree/master/.deployment>`_.
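As a concrete illustration of such customization, the port published on the host can be changed with a standard docker-compose override file. The snippet below is a hedged sketch and is not part of this commit; the service name ``server`` is an assumption and must be replaced with the service name actually used in the downloaded docker-compose.yml:

.. code-block:: yaml

    # docker-compose.override.yml -- illustrative sketch, not part of the commit.
    # "server" is an assumed service name; match it to the downloaded compose file.
    version: "3"
    services:
      server:
        ports:
          - "9000:8888"   # publish DeepForge on host port 9000 instead of 8888

When this file sits next to docker-compose.yml, `docker-compose up` merges it automatically with the base configuration.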

docs/fundamentals/custom_operations.rst

Lines changed: 96 additions & 37 deletions
@@ -9,68 +9,127 @@ Operations are used in pipelines and have named inputs and outputs. When creatin
  .. figure:: operation_editor.png
     :align: center
-    :scale: 45 %

-    Editing the "Train" operation from the "CIFAR10" example
+    Editing the "TrainValidate" operation from the "redshift" example

- The interface editor is provided on the left and presents the interface as a diagram showing the input data and output data as objects flowing into or out of the given operation. Selecting the operation node in the operation interface editor will expand the node and allow the user to add or edit attributes for the given operation. These attributes are exposed when using this operation in a pipeline and can be set at design time - that is, these are set when creating the given pipeline. The interface diagram may also contain light blue nodes flowing into the operation. These nodes represent "references" that the operation accepts as input before running. When using the operation, references will appear alongside the attributes but will allow the user to select from a list of all possible targets when clicked.
+ The interface editor is provided on the right and presents the interface as a diagram showing the input data and output data as objects flowing into or out of the given operation. Selecting the operation node in the operation interface editor will expand the node and allow the user to add or edit attributes for the given operation. These attributes are exposed when using this operation in a pipeline and can be set at design time - that is, these are set when creating the given pipeline. The interface diagram may also contain light blue nodes flowing into the operation. These nodes represent "references" that the operation accepts as input before running. When using the operation, references will appear alongside the attributes but will allow the user to select from a list of all possible targets when clicked.

  .. figure:: operation_interface.png
     :align: center
     :scale: 85 %

-    The train operation accepts training data, a model and attributes for shuffling data, setting the batch size, and the number of epochs.
+    The TrainValidate operation accepts training data, a model, and attributes for setting the batch size and the number of epochs.

- On the right of the operation editor is the implementation editor. The implementation editor is a code editor specially tailored for programming the implementations of operations in DeepForge. It also is synchronized with the interface editor. A section of the implementation is shown below:
+ The operation editor also provides an interface to specify an operation's Python dependencies. DeepForge uses :code:`conda` to manage Python dependencies for an operation. This pairs well with the integration of the various compute platforms that are available to the user, and the only requirement for a user is to have conda installed on their computing platform. You can specify operation dependencies using a conda environment `file <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#create-env-file-manually>`_ as shown in the diagram below:
+
+ .. figure:: operation_environment.png
+    :align: center
+
+    The operation environment contains the Python dependencies for the given operation.
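For reference, such a conda environment file is plain YAML. The example below is an illustrative sketch only (it does not appear in this commit); the environment name and package list are assumptions that simply mirror the libraries imported by the TrainValidate operation:

.. code-block:: yaml

    # environment.yml -- illustrative sketch; the name and package selection
    # are assumptions, not taken from the commit.
    name: trainvalidate
    channels:
      - defaults
    dependencies:
      - python=3.7
      - numpy
      - scikit-learn
      - matplotlib
      - pip
      - pip:
          - tensorflow
          - keras

Declaring dependencies this way is what allows them to be installed automatically on whichever compute resource runs the operation.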
+ To the left of the operation editor is the implementation editor. The implementation editor is a code editor specially tailored for programming the implementations of operations in DeepForge. It is also synchronized with the interface editor. A section of the implementation is shown below:

  .. code:: python

+     import numpy as np
+     from sklearn.model_selection import train_test_split
      import keras
+     import time
      from matplotlib import pyplot as plt

-     class Train():
-         def __init__(self, model, shuffle=True, epochs=100, batch_size=32):
-             self.model = model
-
-             self.epochs = epochs
-             self.shuffle = shuffle
-             self.batch_size = batch_size
-             return
-
+     import tensorflow as tf

-         def execute(self, training_data):
-             (x_train, y_train) = training_data
-             opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)
-             self.model.compile(loss='categorical_crossentropy',
-                                optimizer=opt,
-                                metrics=['accuracy'])
-             plot_losses = PlotLosses()
-             self.model.fit(x_train, y_train,
-                            self.batch_size,
-                            epochs=self.epochs,
-                            callbacks=[plot_losses],
-                            shuffle=self.shuffle)
-
-             model = self.model
-             return model
+     import tensorflow as tf
+     config = tf.compat.v1.ConfigProto()
+     config.gpu_options.allow_growth = True
+     sess = tf.compat.v1.Session(config=config)

- The "Train" operation uses capabilities from the :code:`keras` package to train the neural network. This operation sets all the parameters using values provided to the operation as either attributes or references. In the implementation, attributes are provided as arguments to the constructor making the user defined attributes accessible from within the implementation. References are treated similarly to operation inputs and are also arguments to the constructor. This can be seen with the :code:`model` constructor argument. Finally, operations return their outputs in the :code:`execute` method; in this example, it returns a single output named :code:`model`, that is, the trained neural network.
+     class TrainValidate():
+         def __init__(self, model, epochs=10, batch_size=32):
+             self.model = model
+             self.batch_size = batch_size
+             self.epochs = epochs
+             np.random.seed(32)
+             return

- After defining the interface and implementation, we can now use the "Train" operation in our pipelines! An example is shown below.
+         def execute(self, dataset):
+             model = self.model
+             model.summary()
+             model.compile(optimizer='adam',
+                           loss='sparse_categorical_crossentropy',
+                           metrics=['sparse_categorical_accuracy'])
+             X = dataset['X']
+             y = dataset['y']
+             y_cats = self.to_categorical(y)
+             model.fit(X, y_cats,
+                       epochs=self.epochs,
+                       batch_size=self.batch_size,
+                       validation_split=0.15,
+                       callbacks=[PlotLosses()])
+             return model.get_weights()
+
+         def to_categorical(self, y, max_y=0.4, num_possible_classes=32):
+             one_step = max_y / num_possible_classes
+             y_cats = []
+             for values in y:
+                 y_cats.append(int(values[0] / one_step))
+             return y_cats
+
+         def datagen(self, X, y):
+             # Generates a batch of data
+             X1, y1 = list(), list()
+             n = 0
+             while 1:
+                 for sample, label in zip(X, y):
+                     n += 1
+                     X1.append(sample)
+                     y1.append(label)
+                     if n == self.batch_size:
+                         yield [[np.array(X1)], y1]
+                         n = 0
+                         X1, y1 = list(), list()
+
+
+     class PlotLosses(keras.callbacks.Callback):
+         def on_train_begin(self, logs={}):
+             self.i = 0
+             self.x = []
+             self.losses = []
+
+         def on_epoch_end(self, epoch, logs={}):
+             self.x.append(self.i)
+             self.losses.append(logs.get('loss'))
+             self.i += 1
+
+             self.update()
+
+         def update(self):
+             plt.clf()
+             plt.title("Training Loss")
+             plt.ylabel("CrossEntropy Loss")
+             plt.xlabel("Epochs")
+             plt.plot(self.x, self.losses, label="loss")
+             plt.legend()
+             plt.show()
+
+ The "TrainValidate" operation uses capabilities from the :code:`keras` package to train the neural network. This operation sets all the parameters using values provided to the operation as either attributes or references. In the implementation, attributes are provided as arguments to the constructor, making the user-defined attributes accessible from within the implementation. References are treated similarly to operation inputs and are also arguments to the constructor. This can be seen with the :code:`model` constructor argument. Finally, operations return their outputs in the :code:`execute` method; in this example, it returns a single output named :code:`model`, that is, the trained neural network.
+
+ After defining the interface and implementation, we can now use the "TrainValidate" operation in our pipelines! An example is shown below.

  .. figure:: train_operation.png
     :align: center
     :scale: 85 %

-    Using the "Train" operation in a pipeline
+    Using the "TrainValidate" operation in a pipeline

- Operation feedback
+ Operation Feedback
  ------------------
  Operations in DeepForge can generate metadata about their execution. This metadata is generated during the execution and provided back to the user in real-time. An example of this includes providing real-time plotting feedback. When implementing an operation in DeepForge, this metadata can be created using the :code:`matplotlib` plotting capabilities.

- .. figure:: graph_example.png
+ .. figure:: plotloss.png
     :align: center
     :scale: 75 %

-    An example graph of the loss function while training a neural network
-
- Detailed information about the available operation metadata types can be found in the `reference <reference/feedback_mechanisms.rst>`_.
+    An example graph of the loss function while training a neural network.

docs/fundamentals/integration.rst

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
Storage and Compute Adapters
============================
DeepForge is designed to integrate with existing computational and storage resources and is not intended to be a competitor to existing HPC or object storage frameworks.
This integration is made possible through the use of compute and storage adapters. This section provides a brief description of these adapters as well as the currently supported integrations.

Storage Adapters
----------------
Projects in DeepForge may contain artifacts which reference datasets, trained model weights, or other associated binary data. Although the project code, pipelines, and models are stored in MongoDB, this associated data is stored using a storage adapter. Storage adapters enable DeepForge to store this associated data using an appropriate storage resource, such as an object store with an S3-compatible API.
This also enables users to "bring their own storage", as they can connect their existing cyberinfrastructure to a public deployment of DeepForge.
Currently, DeepForge supports 3 different storage adapters:

1. S3 Storage: Object storage with an S3-compatible API such as `minio <https://play.min.io>`_ or `AWS S3 <https://aws.amazon.com/s3/>`_
2. SciServer Files Service: Files service from `SciServer <https://sciserver.org>`_
3. WebGME Blob Server: Blob storage provided by `WebGME <https://webgme.org/>`_
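For local experimentation with the S3 storage adapter, an S3-compatible store such as minio can be run alongside DeepForge. The snippet below is an illustrative sketch only and is not part of this commit; the credentials are placeholders and the service would be added to whatever compose file is used for the deployment:

.. code-block:: yaml

    # Illustrative docker-compose service for a local S3-compatible store (minio).
    # Not part of the DeepForge compose file; credentials are placeholders.
    services:
      minio:
        image: minio/minio
        command: server /data
        ports:
          - "9000:9000"
        environment:
          MINIO_ROOT_USER: deepforge
          MINIO_ROOT_PASSWORD: change-me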

Compute Adapters
----------------
Similar to storage adapters, compute adapters enable DeepForge to integrate with existing cyberinfrastructure used for executing some computation or workflow. This is designed to allow users to leverage their existing HPC or other computational resources with DeepForge. Compute adapters provide an interface through which DeepForge is able to execute workflows (e.g., training a neural network) on external machines.

Currently, the following compute adapters are available:

1. WebGME Worker: A worker machine which polls for jobs via the `WebGME Executor Framework <https://github.com/webgme/webgme/wiki/GME-Executor-Framework>`_. Registered users can connect their own compute machines, enabling them to use their personal desktops with DeepForge.
2. SciServer-Compute: Compute service offered by `SciServer <https://sciserver.org>`_
3. Server Compute: Execute the job on the server machine. This is similar to the execution model used by Jupyter notebook servers.
Binary image files changed: 59.3 KB, 38.4 KB, and 6.16 KB (file names not captured in this view)

docs/fundamentals/plotloss.png

27.3 KB