-[](https://travis-ci.org/alibaba/flink-ai-extended)
+# Deep Learning on Flink

-# deep-learning-on-flink
+Deep Learning on Flink aims to integrate Flink and deep learning frameworks
+(e.g. TensorFlow, PyTorch, etc) to enable distributed deep learning training and
+inference on a Flink cluster.

-Deep Learning on Flink aims to integrate Flink and deep learning frameworks (e.g. TensorFlow, PyTorch, etc).
-It runs the deep learning tasks inside a Flink operator, so that Flink can help establish a distributed environment,
-manage the resource, read/write the records and handle the failures.
+It runs the deep learning tasks inside a Flink operator so that Flink can help
+establish a distributed environment, manage the resource, read/write the data
+with the rich connectors in Flink and handle the failures.

Currently, Deep Learning on Flink supports TensorFlow and PyTorch.

-**contents**
-
-- [TensorFlow support](#tensorflow-support)
-  * [Support Version](#support-version)
-  * [Quick Start](#quick-start)
-    + [Setup](#setup)
-    + [Build From Source](#build-from-source)
-    + [Build Source in virtual environment](#build-source-in-virtual-environment)
-    + [Example](#example)
-  * [Distributed Running](#distributed-running)
-    + [Deployment](#deployment)
-    + [Running Distributed Programs](#running-distributed-programs)
-  * [Distributed Running Example](#distributed-running-example)
-    + [Setup & Build](#setup---build)
-    + [Start Service](#start-service)
-    + [Prepare data & code](#prepare-data---code)
-    + [Submit train job](#submit-train-job)
-    + [Visit Flink Cluster](#visit-flink-cluster)
-    + [Stop all docker containers](#stop-all-docker-containers)
-    + [Summary](#summary)
-  * [Optional Tools](#optional-tools)
-    + [Build framework and tensorflow python package Independently](#build-framework-and-tensorflow-python-package-independently)
-    + [Build custom virtual environment package](#build-custom-virtual-environment-package)
-- [Structure](#structure)
-- [For More Information](#for-more-information)
-- [License](#license)
-
-# TensorFlow support
-TensorFlow is an open-source deep learning system developed by Google that is widely used in the field of deep learning. Native TensorFlow is inconvenient for distributed use and resource management, and it cannot integrate with the existing, widely used big data processing frameworks.
-
-Flink is a data processing framework. It is widely used in data extraction, feature preprocessing and data cleaning.
-
-This project combines TensorFlow with Flink and provides users with more convenient and useful tools.
-**Currently, the Flink job code can be written either in Java with the Flink Java API or in Python with PyFlink. The algorithm code is written in Python.**
-
## Support Version
-TensorFlow: 1.15.0 & 2.3.1
-
-Flink: 1.11.x
+TensorFlow: 1.15.x & 2.3.x
+PyTorch: 1.x
+Flink: 1.14.x

-## Quick Start
+## Getting Started
+
+To get your hands dirty, you can follow the [quick start](doc/quick_start.md)
+to submit an example job to a local standalone Flink cluster.
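A minimal sketch of that flow, assuming a standard Flink standalone distribution
and a PyFlink-based example script (the script path below is a placeholder; the
authoritative steps are in [quick start](doc/quick_start.md)):

```shell
# From the root of the Flink distribution, start a local standalone cluster.
./bin/start-cluster.sh

# Submit the example as a PyFlink job; replace the path with the example
# script referenced in the quick start guide.
./bin/flink run -py /path/to/example_job.py

# Stop the local cluster once the job has finished.
./bin/stop-cluster.sh
```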
+
+## Build

### Setup

@@ -165,140 +137,6 @@ mvn clean install
```shell
deactivate
```
-
-### Example
-
-1. tensorflow add example
- **<p>python code:</p>**
-
-```python
-import tensorflow as tf
-import time
-import sys
-from flink_ml_tensorflow.tensorflow_context import TFContext
-
-def build_graph():
-    global a
-    i = 1
-    a = tf.placeholder(tf.float32, shape=None, name="a")
-    b = tf.reduce_mean(a, name="b")
-    r_list = []
-    v = tf.Variable(dtype=tf.float32, initial_value=tf.constant(1.0), name="v_" + str(i))
-    c = tf.add(b, v, name="c_" + str(i))
-    add = tf.assign(v, c, name="assign_" + str(i))
-    sum = tf.summary.scalar(name="sum_" + str(i), tensor=c)
-    r_list.append(add)
-    global_step = tf.contrib.framework.get_or_create_global_step()
-    global_step_inc = tf.assign_add(global_step, 1)
-    r_list.append(global_step_inc)
-    return r_list
-
-def map_func(context):
-    tf_context = TFContext(context)
-    job_name = tf_context.get_role_name()
-    index = tf_context.get_index()
-    cluster_json = tf_context.get_tf_cluster()
-
-    cluster = tf.train.ClusterSpec(cluster=cluster_json)
-    server = tf.train.Server(cluster, job_name=job_name, task_index=index)
-    sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False,
-                                 device_filters=["/job:ps", "/job:worker/task:%d" % index])
-    t = time.time()
-    if 'ps' == job_name:
-        from time import sleep
-        while True:
-            sleep(1)
-    else:
-        with tf.device(tf.train.replica_device_setter(worker_device='/job:worker/task:' + str(index), cluster=cluster)):
-            train_ops = build_graph()
-            hooks = [tf.train.StopAtStepHook(last_step=2)]
-            with tf.train.MonitoredTrainingSession(master=server.target, config=sess_config,
-                                                   checkpoint_dir="./target/tmp/s1/" + str(t),
-                                                   hooks=hooks) as mon_sess:
-                while not mon_sess.should_stop():
-                    print(mon_sess.run(train_ops, feed_dict={a: [1.0, 2.0, 3.0]}))
-                    sys.stdout.flush()
-
-```
- **<p>java code:</p>**
- add maven dependencies
-```xml
-<?xml version="1.0" encoding="UTF-8"?>
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
-    <modelVersion>4.0.0</modelVersion>
-
-    <groupId>org.flinkextended</groupId>
-    <artifactId>flink-ai-extended-examples</artifactId>
-    <version>0.3.0</version>
-    <packaging>jar</packaging>
-    <dependencies>
-        <dependency>
-            <groupId>org.flinkextended</groupId>
-            <artifactId>flink-ml-tensorflow</artifactId>
-            <version>0.3.0</version>
-        </dependency>
-        <dependency>
-            <groupId>org.apache.curator</groupId>
-            <artifactId>curator-framework</artifactId>
-            <version>2.7.1</version>
-        </dependency>
-        <dependency>
-            <groupId>org.apache.curator</groupId>
-            <artifactId>curator-test</artifactId>
-            <version>2.7.1</version>
-            <exclusions>
-                <exclusion>
-                    <groupId>com.google.guava</groupId>
-                    <artifactId>guava</artifactId>
-                </exclusion>
-            </exclusions>
-        </dependency>
-        <dependency>
-            <groupId>com.google.guava</groupId>
-            <artifactId>guava</artifactId>
-            <version>20.0</version>
-        </dependency>
-    </dependencies>
-
-    <build>
-        <plugins>
-            <plugin>
-                <groupId>org.apache.maven.plugins</groupId>
-                <artifactId>maven-compiler-plugin</artifactId>
-                <version>3.1</version>
-                <configuration>
-                    <source>1.8</source>
-                    <target>1.8</target>
-                </configuration>
-            </plugin>
-        </plugins>
-    </build>
-</project>
-```
-*You can refer to the following POM*
-
-[example pom.xml](flink-ml-examples/pom.xml)
-
-```java
-class Add {
-    public static void main(String[] args) throws Exception {
-        // local zookeeper server.
-        TestingServer server = new TestingServer(2181, true);
-        String script = "./add.py";
-        StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
-        // if zookeeper has other address
-        Map<String, String> prop = new HashMap<>();
-        prop.put(MLConstants.CONFIG_STORAGE_TYPE, MLConstants.STORAGE_ZOOKEEPER);
-        prop.put(MLConstants.CONFIG_ZOOKEEPER_CONNECT_STR, "localhost:2181");
-        TFConfig config = new TFConfig(2, 1, prop, script, "map_func", null);
-        TFUtils.train(streamEnv, null, config);
-        JobExecutionResult result = streamEnv.execute();
-        server.stop();
-    }
-}
-```

## Distributed Running
### Deployment