Commit 75d8471

fix typos
1 parent 561486d commit 75d8471

1 file changed: +6 -6 lines changed

CodeFlareSDK_Design_Doc.md

+6 -6
@@ -4,7 +4,7 @@
 
 The primary purpose for the CodeFlare SDK is to provide a pythonic interaction layer between a user and the CodeFlare Stack (a set of services that enable advanced queuing, resource management and distributed compute on a kubernetes cluster).
 
-The reason that this SDK is needed is due to the fact that many of the benefits associated with the CodeFlare stack are aimed at making the work of data scientists simpler and more efficient. However, since all parts of the CodeFlare stack are separate Kubernetes services, there needs to be something that unifies the interactions between the user and these separate services. Furthermore, we do not expect the average user to be experienced working with kubernetes infrastructure, and want to provide them with a python native way of interacting with these services.
+The reason that this SDK is needed is due to the fact that many of the benefits associated with the CodeFlare stack are aimed at making the work of data scientists simpler and more efficient. However, since all parts of the CodeFlare stack are separate Kubernetes services, there needs to be something that unifies the interactions between the user and these separate services. Furthermore, we do not expect the average user to be experienced working with kubernetes infrastructure, and want to provide them with a python native way of interacting with these services.
 
 The SDK should support any operation that a user would need to do in order to successfully submit machine learning training jobs to their kubernetes cluster.
 
@@ -24,11 +24,11 @@ The CodeFlare SDK is a python package that allows a user to programmatically def
 
 In order to achieve this we need the capacity to:
 
-* Generate valid appwrapper yaml files based on user provided parameters
-* Get, list, watch, create, update, patch, and delete appwrapper custom resources on a kubernetes cluster
+* Generate valid AppWrapper yaml files based on user provided parameters
+* Get, list, watch, create, update, patch, and delete AppWrapper custom resources on a kubernetes cluster
 * Get, list, watch, create, update, patch, and delete RayCluster custom resources on a kubernetes cluster.
 * Expose a secure route to the Ray Dashboard endpoint.
-* Define, submit, monitor and cancel Jobs submitted via Torchx. TorchX jobs must support both Ray and MCAD-Kubernetes scheduler backends.
+* Define, submit, monitor and cancel Jobs submitted via TorchX. TorchX jobs must support both Ray and MCAD-Kubernetes scheduler backends.
 * Provide means of authenticating to a Kubernetes cluster
 
 ![](/docs/images/sdk-diagram.png)
@@ -51,7 +51,7 @@ Finally we will use the Kubernetes python client to delete the AppWrapper via `C
 
 ### Training Jobs:
 
-For the submission of Jobs we will rely on the [TorchX](https://pytorch.org/torchx/latest/) job launcher to handle orchestrating the distribution of our model training jobs across the available resources on our cluster. We will support two distributed backend schedulers: Ray and Kuberentes-MCAD. Torchx is designed to be used primarily as a CLI, so we will wrap a limited subset of its functionality into our SDK so that it can be used as part of a python script.
+For the submission of Jobs we will rely on the [TorchX](https://pytorch.org/torchx/latest/) job launcher to handle orchestrating the distribution of our model training jobs across the available resources on our cluster. We will support two distributed backend schedulers: Ray and Kuberentes-MCAD. TorchX is designed to be used primarily as a CLI, so we will wrap a limited subset of its functionality into our SDK so that it can be used as part of a python script.
 
 Users can define their jobs with `DDPJobDefinition()` providing parameters for the script they want to run as part of the job, the resources required for the job, additional args specific to the script being run and scheduler being used.
 
@@ -79,7 +79,7 @@ In either case, users can log out and clear the authentication inputs with `.log
 * Has no notion of MCAD. Does not support any other backends besides Ray
 * Existing CodeFlare CLI
 * Is not pythonic.
-* Nothing (let users define their own appwrappers manually)
+* Nothing (let users define their own AppWrappers manually)
 * Antithetical to the purpose of the SDK.
 
 ## Security Considerations
