Commit df0153e

authored
Merge pull request #426 from GoogleCloudPlatform/nl
Nl
2 parents bd7a093 + ae433ee commit df0153e

5 files changed

+668
-0
lines changed

language/README.md

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,9 @@ This directory contains Python examples that use the

- [api](api) has a simple command line tool that shows off the API's features.

- [movie_nl](movie_nl) combines sentiment and entity analysis to find the
  actors/directors who are the most and least popular in the IMDb movie reviews.

- [ocr_nl](ocr_nl) uses the [Cloud Vision API](https://cloud.google.com/vision/)
  to extract text from images, then uses the NL API to extract entity information
  from those texts, and stores the extracted information in a database in support

language/movie_nl/README.md

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
# Introduction
This sample is an application of the Google Cloud Platform Natural Language API.
It uses the [IMDb movie reviews data set](https://www.cs.cornell.edu/people/pabo/movie-review-data/)
from [Cornell University](http://www.cs.cornell.edu/) and performs sentiment and entity
analysis on it. It combines sentiment analysis and entity recognition
to identify the actors/directors who are the most and least popular.

### Set Up to Authenticate With Your Project's Credentials

Please follow the [Set Up Your Project](https://cloud.google.com/natural-language/docs/getting-started#set_up_your_project)
steps in the Quickstart doc to create a project and enable the
Cloud Natural Language API. After completing those steps, make sure that you
[Set Up a Service Account](https://cloud.google.com/natural-language/docs/common/auth#set_up_a_service_account),
and export the following environment variable:

```
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-project-credentials.json
```

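Before running the sample, it can help to confirm that the variable is visible to Python and points at a real file. The following is a minimal sketch using only the standard library; the helper name `credentials_problem` is hypothetical and is not part of this sample's code.

```python
import os


def credentials_problem(environ=os.environ):
    """Return a description of the problem, or None if the variable looks usable.

    Hypothetical helper for illustration; it only checks that the variable is
    set and that the path exists, not that the key itself is valid.
    """
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return "GOOGLE_APPLICATION_CREDENTIALS points to a missing file: " + path
    return None


if __name__ == "__main__":
    print(credentials_problem() or "credentials file found")
```
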
**Note:** If you get an error saying your API hasn't been enabled, make sure
that you have correctly set this environment variable, and that the project that
you got the service account from has the Natural Language API enabled.

## How it works
This sample uses the Natural Language API to annotate the input text. The
movie review document is broken into sentences using the `extract_syntax` feature.
Each sentence is sent to the API for sentiment analysis. The positive and negative
sentiment values are combined into a single overall sentiment for the
movie document.

In addition to the sentiment, the program extracts the entities of type
`PERSON`, which are taken to be the people involved in the movie (the actors,
the director, and anyone else important). Each of these entities is assigned the
sentiment value of the document it appears in, which makes it possible to rank
the most and least popular actors/directors.

### Movie document
We define a movie document as a set of reviews. Each review is an individual
sentence, and we use the NL API to extract the sentences from the document.
An example movie document:

```
Sample review sentence 1. Sample review sentence 2. Sample review sentence 3.
```

### Sentences and Sentiment
Each sentence from the above document is assigned a sentiment as below.

```
Sample review sentence 1 => Sentiment 1
Sample review sentence 2 => Sentiment 2
Sample review sentence 3 => Sentiment 3
```

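The sentence-to-sentiment mapping can be sketched without calling the API. In the sketch below, `split_sentences` is a naive regular-expression stand-in for the API's sentence extraction and `toy_score` is a hypothetical scorer; both are assumptions made for illustration, and a real run would obtain the sentences and their sentiment values from the Natural Language API.

```python
import re


def split_sentences(document):
    """Naive stand-in for the API's sentence extraction: split after ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def score_sentences(document, score):
    """Pair every sentence with the value returned by score(sentence)."""
    return [(s, score(s)) for s in split_sentences(document)]


def toy_score(sentence):
    # Hypothetical scorer used only for illustration, not the API's model.
    return 1.0 if "good" in sentence else -1.0


doc = "The acting was good. The plot dragged."
for sentence, value in score_sentences(doc, toy_score):
    print(sentence, "=>", value)
```
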
### Sentiment computation
The final sentiment is computed by simply adding the sentence sentiments.

```
Total Sentiment = Sentiment 1 + Sentiment 2 + Sentiment 3
```


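As a small worked example of the sum, assuming each sentence sentiment is a signed number (positive for favorable sentences, negative for unfavorable ones), the computation might look like this. The function name and the scores are hypothetical.

```python
def total_sentiment(sentence_sentiments):
    """Add per-sentence sentiment values into one document-level value."""
    return sum(sentence_sentiments)


# Hypothetical scores for three review sentences: two positive, one negative.
scores = [0.75, -0.25, 0.5]
print(total_sentiment(scores))  # 0.75 - 0.25 + 0.5 = 1.0
```

Because the values are added rather than averaged, a document with more sentences can accumulate a larger total than a shorter one with the same per-sentence sentiment.
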
### Entity extraction and Sentiment assignment
Entities with type `PERSON` are extracted from the movie document using the NL
API. Because each entity is mentioned in a particular movie document,
it is associated with that document's sentiment.

```
Document 1 => Sentiment 1

Person 1
Person 2
Person 3

Document 2 => Sentiment 2

Person 2
Person 4
Person 5
```

Based on the above data, we can calculate the sentiment associated with Person 2,
who appears in both documents:

```
Person 2 => (Sentiment 1 + Sentiment 2)
```

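The per-person aggregation above can be sketched as a dictionary of running totals. The function name and the sentiment values are hypothetical, and the sketch assumes that a person counts once per document even if mentioned several times in it; the sample's own code may handle repeated mentions differently.

```python
from collections import defaultdict


def person_sentiments(documents):
    """Map each person to the summed sentiment of the documents mentioning them.

    `documents` is an iterable of (document_sentiment, people) pairs.
    """
    totals = defaultdict(float)
    for sentiment, people in documents:
        # Assumption: count each person once per document.
        for person in set(people):
            totals[person] += sentiment
    return dict(totals)


documents = [
    (1.5, ["Person 1", "Person 2", "Person 3"]),   # Document 1
    (-0.5, ["Person 2", "Person 4", "Person 5"]),  # Document 2
]
print(person_sentiments(documents)["Person 2"])  # 1.5 + (-0.5) = 1.0
```
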
## Movie Data Set
This sample uses the Cornell Movie Review data as its input. Follow the instructions below to download and extract the data.

### Download Instructions

```
$ curl -O http://www.cs.cornell.edu/people/pabo/movie-review-data/mix20_rand700_tokens.zip
$ unzip mix20_rand700_tokens.zip
```

## Command Line Usage
To use the movie analyzer, follow the instructions below. (Note that the `--sample` parameter below runs the script on
fewer documents, and can be omitted to run it on the entire corpus.)

### Install Dependencies

Install [pip](https://pip.pypa.io/en/stable/installing) if not already installed.

Then, install dependencies by running the following pip command:

```
$ pip install -r requirements.txt
```

### How to Run

```
$ python main.py analyze --inp "tokens/*/*" \
    --sout sentiment.json \
    --eout entity.json \
    --sample 5
```

After the run, you should see that the log file `movie.log` has been created.

## Output Data
The program produces sentiment and entity output in JSON format. For example:

### Sentiment Output
```
{
  "doc_id": "cv310_tok-16557.txt",
  "sentiment": 3.099,
  "label": -1
}
```

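One way to inspect the sentiment output is to compare the sign of the computed `sentiment` with the `label` field. The sketch below makes two assumptions that the README does not confirm: that the records are stored one JSON object per line, and that `label` records the review's known polarity (negative values for negative reviews). The second record is made up for illustration.

```python
import json


def parse_records(lines):
    """Parse one JSON object per non-empty line (assumed file layout)."""
    return [json.loads(line) for line in lines if line.strip()]


def label_agreement(records):
    """Fraction of records whose sentiment sign matches the sign of `label`."""
    if not records:
        return 0.0
    matches = sum(1 for r in records if (r["sentiment"] > 0) == (r["label"] > 0))
    return float(matches) / len(records)


lines = [
    '{"doc_id": "cv310_tok-16557.txt", "sentiment": 3.099, "label": -1}',
    '{"doc_id": "hypothetical_doc.txt", "sentiment": -1.25, "label": -1}',  # made up
]
print(label_agreement(parse_records(lines)))  # 0.5: only the second record agrees
```
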
### Entity Output

```
{
  "name": "Sean Patrick Flanery",
  "wiki_url": "http://en.wikipedia.org/wiki/Sean_Patrick_Flanery",
  "sentiment": 3.099
}
```

### Entity Output Sorting
To sort and rank the generated entities, use the same `main.py` script. For example,
this will print the top 5 actors with negative sentiment:

```
$ python main.py rank --entity_input entity.json \
    --sentiment neg \
    --reverse True \
    --sample 5
```
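
The ranking step can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the logic of `main.py`: the function `rank_entities` is hypothetical, it treats "top" as "strongest sentiment of the requested polarity", and it does not model the `--reverse` flag because the README does not describe its exact effect. The entity names and values are made up.

```python
def rank_entities(entities, sentiment="neg", sample=5):
    """Return up to `sample` entities of the requested polarity, strongest first."""
    if sentiment == "neg":
        chosen = [e for e in entities if e["sentiment"] < 0]
        chosen.sort(key=lambda e: e["sentiment"])  # most negative first
    else:
        chosen = [e for e in entities if e["sentiment"] > 0]
        chosen.sort(key=lambda e: e["sentiment"], reverse=True)  # most positive first
    return chosen[:sample]


# Hypothetical entity records in the shape of the entity output.
entities = [
    {"name": "A", "sentiment": -2.0},
    {"name": "B", "sentiment": 3.0},
    {"name": "C", "sentiment": -0.5},
    {"name": "D", "sentiment": -4.0},
]
for entity in rank_entities(entities, sentiment="neg", sample=2):
    print(entity["name"], entity["sentiment"])
```
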
