# Run template

[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/editor)

This sample demonstrates how to run an
[Apache Beam](https://beam.apache.org/)
template on [Google Cloud Dataflow](https://cloud.google.com/dataflow/docs/).
For more information, see the
[Running templates](https://cloud.google.com/dataflow/docs/guides/templates/running-templates)
docs page.

The following examples show how to run the
[`Word_Count` template](https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/WordCount.java),
but you can run any other template.

For the `Word_Count` template, you must pass an `output` Cloud Storage path prefix,
and you can optionally pass an `inputFile` Cloud Storage file pattern for the inputs.
If `inputFile` is not passed, it defaults to `gs://apache-beam-samples/shakespeare/kinglear.txt`.
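
These template parameters are ultimately sent in the `parameters` field of a Dataflow
[`projects.templates.launch`](https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.templates/launch)
request. A minimal sketch of that request body for `Word_Count` (the job name and bucket below are placeholders, not values from this sample):

```py
# Sketch of the LaunchTemplateParameters body that a template runner
# such as main.py sends to projects.templates.launch for Word_Count.
launch_body = {
    "jobName": "wordcount-example",
    "parameters": {
        # Optional: defaults to the King Lear sample text when omitted.
        "inputFile": "gs://apache-beam-samples/shakespeare/kinglear.txt",
        # Required: Cloud Storage path prefix for the output files.
        "output": "gs://<your-gcs-bucket>/wordcount/outputs",
    },
}
```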
## Before you begin

Follow the
[Getting started with Google Cloud Dataflow](../README.md)
page, and make sure you have a Google Cloud project with billing enabled
and a *service account JSON key* set up in your `GOOGLE_APPLICATION_CREDENTIALS` environment variable.
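
To double-check that the key is picked up, you can load the default credentials from Python; this is a quick sanity check, assuming the `google-auth` package is available (it is typically installed as a dependency of the Google API client library):

```py
import os

import google.auth

# The path you exported earlier; None means the variable is not set.
print(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))

# Raises DefaultCredentialsError if no credentials can be found.
credentials, project_id = google.auth.default()
print("Loaded credentials for project:", project_id)
```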

Additionally, for this sample you need the following:

1. Create a Cloud Storage bucket.

    ```sh
    export BUCKET=your-gcs-bucket
    gsutil mb gs://$BUCKET
    ```

1. Clone the `python-docs-samples` repository.

    ```sh
    git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
    ```

1. Navigate to the sample code directory.

    ```sh
    cd python-docs-samples/dataflow/run_template
    ```

1. Create a virtual environment and activate it.

    ```sh
    virtualenv env
    source env/bin/activate
    ```

1. Install the sample requirements.

    ```sh
    pip install -U -r requirements.txt
    ```

## Running locally

* [`main.py`](main.py)
* [REST API dataflow/projects.templates.launch](https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.templates/launch)

To run a Dataflow template from the command line:

```sh
python main.py \
    --project <your-gcp-project> \
    --job wordcount-$(date +'%Y%m%d-%H%M%S') \
    --template gs://dataflow-templates/latest/Word_Count \
    --inputFile gs://apache-beam-samples/shakespeare/kinglear.txt \
    --output gs://<your-gcs-bucket>/wordcount/outputs
```
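
To confirm the job was submitted, you can list recent Dataflow jobs from Python; a minimal sketch using the Google API client library (assuming the `google-api-python-client` package is installed and `GOOGLE_APPLICATION_CREDENTIALS` is set as described above):

```py
from googleapiclient.discovery import build

# Credentials are read from GOOGLE_APPLICATION_CREDENTIALS.
dataflow = build("dataflow", "v1b3")

# List jobs in the project and print each job's name and current state.
response = dataflow.projects().jobs().list(projectId="your-gcp-project").execute()
for job in response.get("jobs", []):
    print(job["name"], job["currentState"])
```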

## Running in Python

* [`main.py`](main.py)
* [REST API dataflow/projects.templates.launch](https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.templates/launch)

To run a Dataflow template from Python, import the sample as a module and call its `run()` function.

```py
import main as run_template
```
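
A minimal sketch of the full call, assuming `run()` accepts the project, a job name, the template path, and a dict of template parameters (check [`main.py`](main.py) for the exact signature; the job name and bucket below are placeholders):

```py
import main as run_template

# Launch the Word_Count template; the values below are placeholders.
run_template.run(
    project="your-gcp-project",
    job="unique-job-name",
    template="gs://dataflow-templates/latest/Word_Count",
    parameters={
        "inputFile": "gs://apache-beam-samples/shakespeare/kinglear.txt",
        "output": "gs://<your-gcs-bucket>/wordcount/outputs",
    },
)
```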

## Running in Cloud Functions

* [`main.py`](main.py)
* [REST API dataflow/projects.templates.launch](https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.templates/launch)

To run a Dataflow template via an HTTP request as a REST API, deploy this sample into a Cloud Function. Once the function is deployed, launch a template by POSTing its parameters to the function:

```sh
PROJECT=$(gcloud config get-value project) \
REGION=$(gcloud config get-value functions/region)

curl -X POST "https://$REGION-$PROJECT.cloudfunctions.net/run_template" \
    -d project=$PROJECT \
    -d job=wordcount-$(date +'%Y%m%d-%H%M%S') \
    -d template=gs://dataflow-templates/latest/Word_Count \
    -d inputFile=gs://apache-beam-samples/shakespeare/kinglear.txt \
    -d output=gs://<your-gcs-bucket>/wordcount/outputs
```
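
The same request can be made from Python instead of `curl`; a minimal sketch using the third-party `requests` package (an assumption here, it is not part of this sample), with the same placeholder bucket:

```py
import datetime

import requests  # assumed extra dependency: pip install requests

PROJECT = "your-gcp-project"
REGION = "your-functions-region"  # for example, us-central1

# Send the same form fields as the curl example above.
response = requests.post(
    f"https://{REGION}-{PROJECT}.cloudfunctions.net/run_template",
    data={
        "project": PROJECT,
        "job": "wordcount-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S"),
        "template": "gs://dataflow-templates/latest/Word_Count",
        "inputFile": "gs://apache-beam-samples/shakespeare/kinglear.txt",
        "output": "gs://<your-gcs-bucket>/wordcount/outputs",
    },
)
print(response.status_code, response.text)
```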