
Commit b235539

Author: Yundi Qian

Two changes: 1) a Python script to combine multiple training corpora into one; 2) push Colibrow's PR (#8). Thanks, Colibrow!

1 parent 345011d, commit b235539


2 files changed: +77, -1 lines changed

compiler_opt/tools/combine_training_corpus.py

Lines changed: 74 additions & 0 deletions

```python
# coding=utf-8
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

r"""Combine multiple training corpora into a single training corpus.

Usage: to combine the training corpora corpus1 and corpus2 into
combinedcorpus, first structure the files as follows:

    combinedcorpus
    combinedcorpus/corpus1
    combinedcorpus/corpus2

Running this script with

    python3 compiler_opt/tools/combine_training_corpus.py \
      --root_dir=$PATH_TO_combinedcorpus

generates the combinedcorpus/module_paths file. In this way corpus1 and
corpus2 are combined into combinedcorpus.
"""

import os

from absl import app
from absl import flags
from absl import logging

import tensorflow as tf

flags.DEFINE_string('root_dir', '', 'root dir of module paths to combine.')

FLAGS = flags.FLAGS

_FILE_NAME = 'module_paths'


def main(argv):
  if len(argv) > 1:
    raise app.UsageError('Too many command-line arguments.')

  module_names = []

  for sub_dir in tf.io.gfile.listdir(FLAGS.root_dir):
    path = os.path.join(FLAGS.root_dir, sub_dir, _FILE_NAME)

    logging.info('processing %s', path)

    if not tf.io.gfile.exists(path):
      logging.error('%s does not exist.', path)
      continue

    with tf.io.gfile.GFile(path, 'r') as f:
      module_names.extend(
          [os.path.join(sub_dir, name.rstrip('\n')) for name in f])

  with tf.io.gfile.GFile(os.path.join(FLAGS.root_dir, _FILE_NAME), 'w') as f:
    for module in module_names:
      f.write(module + '\n')


if __name__ == '__main__':
  app.run(main)
```

docs/demo/demo.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -145,7 +145,9 @@ fx build
 
 Fuchsia build conveniently generates a size report. Let's copy it for reference.
 
-**Note** The `--args=clang_embed_bitcode=true` option above adds the compilation
+**Note**
+The `clang_prefix` is the absolute path of $LLVM_INSTALLDIR/bin (replace it with
+yours). The `--args=clang_embed_bitcode=true` option above adds the compilation
 flag `-Xclang=-fembed-bitcode=all`. This can be seen in the compilation database.
 The effect of this is that the object files have the llvm bytecode produced by
 clang, before the optimization passes, and the clang command line, captured in
```
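The note says the `-Xclang=-fembed-bitcode=all` flag can be seen in the compilation database. A minimal sketch for checking that, under stated assumptions: the database is a Clang-style JSON compilation database (`compile_commands.json`), its location in the Fuchsia output directory is not given here and must be supplied by you, and the helper name `entries_with_flag` is hypothetical. Flags passed through response files (`@file.rsp`) are not expanded by this sketch.

```python
import json
import shlex


def entries_with_flag(db_path, flag='-Xclang=-fembed-bitcode=all'):
  """Hypothetical helper: list the 'file' of each entry that passes `flag`.

  Each entry in a JSON compilation database carries either an 'arguments'
  list or a single shell-quoted 'command' string; both forms are handled.
  """
  with open(db_path) as f:
    entries = json.load(f)
  matches = []
  for entry in entries:
    # Prefer the pre-split argument list; otherwise split the command string.
    args = entry.get('arguments') or shlex.split(entry.get('command', ''))
    if flag in args:
      matches.append(entry['file'])
  return matches
```

Comparing `len(entries_with_flag(path))` with the total number of entries gives a quick sense of whether the flag reached every compile step.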
