Skip to content

Add samples for the Cloud ML Engine #824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Feb 28, 2017
Merged

Add samples for the Cloud ML Engine #824

merged 16 commits into from
Feb 28, 2017

Conversation

elibixby
Copy link
Contributor

Add samples for triggering online prediction from code.

@nikhilk @brandondutra @JayLoomis since I can't make you reviewers on this repo.

Tests are to follow. This is to solicit initial feedback while I write tests. Note that I don't think we should highlight predict_from_files in the docs, as that's redundant and not as efficient as batch prediction. It's mainly there for testing and to make the file runnable (A repository policy).

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Feb 24, 2017
version=None,
force_tfrecord=False):
import json
import itertools

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the local imports? This is not a DF job.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way the necessary inputs will show up in snippets in the docs. The [START foo] and [END foo] blocks indicate a displayable chunk for the docs

@elibixby
Copy link
Contributor Author

Note. On discussing with @jonparrott I'm going to remove predict_from_files and write a short webapp.


# Requests to online prediction
# can have at most 100 instances
args = [instances] * 100

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we making 100 copies of this tuple. This looks wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 copies of the generator. This is how you batch generators in python (it's weird). But I'm deleting this code anyway in favor of a webapp.

batch,
version=version
))
return results

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are the results saved or printed?

args = [instances] * 100
instance_batches = itertools.izip(*args)

results = []

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the input data could need batching (be large), but the results don't need batching? I'm ok with this script not doing batching. Depends on what others say.

# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations under
# the License.
"""Examples of using the Cloud ML Engine's online prediction service."""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments on authentication. When should this work, or what needs to be true for the script to work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in get_ml_engine_service, can you add a link to the doc page describing how I can download a service account file?

@@ -0,0 +1 @@
{"age": 25, "workclass": " Private", "education": " 11th", "education_num": 7, "marital_status": " Never-married", "occupation": " Machine-op-inspct", "relationship": " Own-child", "race": " Black", "gender": " Male", "capital_gain": 0, "capital_loss": 0, "hours_per_week": 40, "native_country": " United-States"}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does your batching code really work? Hard to tell with just 1 prediction row

@elibixby
Copy link
Contributor Author

@brandondutra Switched to user input stream to avoid problems of batching and file reading which don't really belong in an online prediction sample.

@elibixby
Copy link
Contributor Author

@xlcheng

@elibixby
Copy link
Contributor Author

@brandondutra PTAL

@elibixby
Copy link
Contributor Author

@jonparrott Installing TensorFlow appears to be broken... @jonparrott can you PTAL?

import json
while True:
try:
user_input = json.loads(raw_input("Valid JSON >>>"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of raw-input (cannot re-run this quickly, and typing valid json is a pain). But this allows interactive input and "python predict.py < my_data.json". Maybe add file-level comments on these two ways of using this script?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the main reason I wanted to do it this way, is we already have a solution for batch prediction (via the API) and a 100 request limit seems really bad if the use-case we are highlighting is predicting from files.



# [START census_to_example_bytes]
def census_to_example_bytes(json_instance):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting the file path to be in the census example (in cloudml-samples). Is this file part of the census sample or a more generic 'calling online prediction' sample? If the latter, we need better warnings that this will not work with every model, and we need to describe what the model is expecting.

If this is not part of the census sample, a s/census/json/g is needed.

Sorry if this is a bad question, I not familiar with python-docs-samples

Copy link
Contributor Author

@elibixby elibixby Feb 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we have sort of a hard separation between "things run as part of training" and "code run in your own client to send requests to the prediction service" The former being in cloudml-samples (and in the future tf/garden) and the latter being in python-docs-samples.

I will definitely add some better prose around this in the form of a docstring, and we'll also make it clear in docs.

from predict import census_to_example_bytes, predict_json


MODEL = 'census'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to think you want these files in GoogleCloudPlatform/cloudml-samples/census/something

@@ -0,0 +1,173 @@
# Copyright 2016 Google Inc. All Rights Reserved. Licensed under the Apache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This license header looks weird, copy it from elsewhere?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/2016/2017/g

# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations under
# the License.
"""Examples of using the Cloud ML Engine's online prediction service."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: blank line between license and docstring.

@@ -0,0 +1,173 @@
# Copyright 2016 Google Inc. All Rights Reserved. Licensed under the Apache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a shebang

# [END import_libraries]


# [START authenticating]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally show constructing the service in each snippet instead of centralizing it. Every indirection adds cognitive load to the users.

# [START predict_json]
def predict_json(project, model, instances, version=None):
"""Send data instances to a deployed model for prediction
Args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank newline above here.

to data.
version: [optional] str, version of the model to target.
Returns:
A dictionary of prediction results defined by the model.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally encourage snippets to be simple enough not to require this, but I understand if that's not reasonable here. If you're going to go full docstring, follow Napoleon style:

Args:
    project (str): ...
    model (str): ...
    instances (Mapping[ str, dict ]): ...
    version (str): optional ...

Returns:
   Mapping [str, ...] : ...

Returns:
A dictionary of prediction results defined by the model.
"""
import base64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't import here, import at the top.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you highlight that this import is only necessary for this snippet?

Is that not important?

for example_bytes in example_bytes_list
]}
).execute()
if 'error' in response:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blank new line to separate control statements.


def main(project, model, version=None, force_tfrecord=False):
"""Send user input to the prediction service."""
import json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't import here.

import json
while True:
try:
user_input = json.loads(raw_input("Valid JSON >>>"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do the users find out what kind of json to send here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on their model. This snippet will be part of a docs page that is attempting to explain just that. This will be at the end "now that you know what the prediction service does, here's how you call it".

@@ -0,0 +1,38 @@
# Online Prediction with the Cloud Machine Learning Engine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have hand-written readmes in here any more. Please move all of this to the documentation and just link to the docs from here. I can add an auto-generated readme later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool.

# the License.

"""Examples of using the Cloud ML Engine's online prediction service."""
from __future__ import print_function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessary.

model (str): model name.
instances ([Mapping[str: any]]): dictionaries from string keys
defined by the model deployment, to data with types that match
expected tensors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Period? Also, maybe it's just my unfamiliarity with tensorflow, but this reads like gibberish.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it doesn't make much sense with context. But could also use some rewording.

Args:
project (str): project where the Cloud ML Engine Model is deployed.
model (str): model name.
instances ([Mapping[str: any]]): dictionaries from string keys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any is capital. What's the key and value here?

expected tensors
version: str, version of the model to target.
Returns:
Mapping[str: any]: dictionary of prediction results defined by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the key and value here?

# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations under
# the License.
"""Tests for predict.py ."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank newline both above and below this.

@@ -0,0 +1,68 @@
# Copyright 2016 Google Inc. All Rights Reserved. Licensed under the Apache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2017, also, these headers still seem different from the ones in the rest of the repo.


import pytest

from predict import census_to_example_bytes, predict_json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just import predict, please don't import individual members.

predict_json(PROJECT, MODEL, [{"foo": "bar"}], version=VERSION)


# TODO(elibixby) Run on Travis when TensorFlow PyPi package supports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't put todos in code, just file an issue or bug to track it.

@@ -0,0 +1 @@
tensorflow>=1.0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use ranges, pin the version and dpebot will handle updating it.

Copy link
Contributor

@theacodes theacodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after final nits, pending Travis.

import googleapiclient.discovery
# [END import_libraries]

import six
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this goes in the same section as import googleapiclient.discovery

assert base64.b64encode(b) is not None


def test_predict_tfrecord():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not write a real test and mark it with pytest.mark.xfail('reason')?

@elibixby elibixby merged commit b0417c9 into master Feb 28, 2017
@elibixby elibixby deleted the mlengine branch February 28, 2017 02:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants