Add convert to recordio format function #183

gongweibao · 2017-06-26T12:31:01Z

typhoonzero

This also needs a Dockerfile, a k8s YAML file, and a README.md.

typhoonzero · 2017-06-26T13:04:01Z

docker/convert/convert.py

+    path = output_path + "/" + name
+    mkdir_p(path)
+
+    mod.convert(path)


Shall we convert and split at the same time? @Yancey1989

The split already happens during convert.

I see the convert function here: https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/dataset/common.py#L167. It also requires other parameters, such as reader and num_shards, so maybe this code cannot run correctly, or did I miss something?

Oh, sorry, I should have linked all the related PRs. The PR related to this one is:
PaddlePaddle/Paddle#2608
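To make the "convert and split at the same time" idea concrete, here is a minimal sketch. It is not Paddle's implementation and does not write RecordIO: the function name convert_sharded, its parameter order, the pickle output, and the file naming are all assumptions made for illustration. Only the notion of a reader and a num_shards parameter comes from the discussion above.

```python
import os
import pickle
import tempfile


def convert_sharded(output_path, reader, num_shards, name_prefix):
    """Write samples from reader() round-robin into num_shards pickle files.

    Hypothetical stand-in: it does not produce RecordIO and is not Paddle's code.
    All samples are buffered in memory, which is only acceptable for a toy example.
    """
    paths = [
        os.path.join(output_path, "%s-%05d.pickle" % (name_prefix, i))
        for i in range(num_shards)
    ]
    shards = [[] for _ in range(num_shards)]
    # Sample k goes to shard k % num_shards, so the split happens during conversion.
    for index, sample in enumerate(reader()):
        shards[index % num_shards].append(sample)
    for path, samples in zip(paths, shards):
        with open(path, "wb") as f:
            pickle.dump(samples, f)
    return paths


def toy_reader():
    """A reader in the sense used here: a callable that yields samples."""
    for i in range(10):
        yield (i, i * i)


out_dir = tempfile.mkdtemp()
shard_paths = convert_sharded(out_dir, toy_reader, 3, "toy")
```

Because each sample is assigned to a shard as it is read, no separate split pass is needed afterwards.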

typhoonzero · 2017-06-26T13:04:59Z

docker/convert/convert.py

+            raise
+
+def convert(output_path, name):
+    print "proc " + name


Use logging instead of print; it records the time and the log level.
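A minimal sketch of that suggestion, using only the standard logging module. The logger name "convert", the helper name announce, and the format string are assumptions for illustration, not taken from the PR.

```python
import logging

# The format string puts a timestamp and the level on every record.
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("convert")
logger.setLevel(logging.INFO)


def announce(name):
    """Log which dataset is being processed (replaces: print "proc " + name)."""
    logger.info("proc %s", name)


announce("mnist")
```

Passing the name as an argument rather than concatenating it lets logging skip the string formatting when the level is disabled.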

Yancey0623 · 2017-06-27T05:36:36Z

docker/convert/convert.py

+    print "proc " + name
+    mod = __import__("paddle.v2.dataset." + name, fromlist=[''])
+
+    path = output_path + "/" + name


I think we can use os.path.join(output_path, name) instead of a hard-coded delimiter.

Good idea. Done.
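For reference, a small sketch of the path handling after this change. The body of mkdir_p is not shown in the diff, so the version below is an assumed, common implementation, and dataset_dir is a hypothetical helper name.

```python
import errno
import os
import tempfile


def mkdir_p(path):
    """Create path and any missing parents; do nothing if it already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Re-raise anything other than "directory already exists".
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def dataset_dir(output_path, name):
    """Join with the platform's separator instead of a literal "/", then create it."""
    path = os.path.join(output_path, name)
    mkdir_p(path)
    return path


root = tempfile.mkdtemp()
print(dataset_dir(root, "mnist"))
```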

Yancey0623 · 2017-06-27T05:40:56Z

docker/convert/convert.py

+    path = output_path + "/" + name
+    mkdir_p(path)
+
+    mod.convert(path)


I see the convert function here: https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/dataset/common.py#L167. It also requires other parameters, such as reader and num_shards, so maybe this code cannot run correctly, or did I miss something?

typhoonzero · 2017-06-27T06:33:08Z

docker/convert/Dockerfile

@@ -0,0 +1,4 @@
+FROM paddlepaddle/paddle
+ADD ./convert.py /convert/


You can add multiple files in a single ADD line to reduce the number of image layers.

typhoonzero · 2017-06-27T06:34:11Z

docker/convert/convert.py

+import logging
+import logging.config
+
+logging.config.fileConfig('logging.conf')


Use logging.config.dictConfig, so we don't need a separate config file.
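A minimal sketch of configuring logging from an in-code dictionary. The logger name "convert", the formatter and handler names, and the format string are assumptions for illustration; the actual settings in logging.conf are not shown in the diff.

```python
import logging
import logging.config

# Same kind of information a logging.conf file would carry, kept in the script.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "default",
            "stream": "ext://sys.stderr",
        },
    },
    "loggers": {
        "convert": {"handlers": ["console"], "level": "INFO", "propagate": False},
    },
}

logging.config.dictConfig(LOGGING)
logging.getLogger("convert").info("logging configured from a dict")
```

With the configuration inlined, the image no longer needs to ship a second file next to convert.py.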

typhoonzero · 2017-06-27T06:34:28Z

docker/convert/Dockerfile

+FROM paddlepaddle/paddle
+ADD ./convert.py /convert/
+ADD ./logging.conf /convert/
+ADD .cache/paddle/dataset /root/.cache/paddle/dataset


Add a default command to this Dockerfile.

There is no default command; the path has to be specified.

I didn't get your point. The default command could be CMD ["/program", "argument"]; it can be overridden by the k8s YAML definitions.

Thanks. Done.

helinwang

Sorry, could you let me know why users need to submit a k8s job to convert the public datasets?
I thought we would convert them locally (or with a k8s job) and upload them to the cluster's public storage, so users would not need to convert them again?

gongweibao · 2017-06-28T01:18:35Z

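@helinwang The "users" here are actually ourselves. When we have a new cluster or need to regenerate the data, we only need to change the path in convert_app.yaml and then run kubectl create -f convert_app.yaml to write the data to the corresponding location. I'd better rewrite the README in Chinese; my English still doesn't express this clearly.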

helinwang

@gongweibao Got it, thanks! LGTM.

gongweibao and others added 3 commits June 26, 2017 19:49

first add

93f3c58

rm not need

f0da31d

fix bugs

8247b5d

gongweibao requested review from Yancey0623, helinwang and typhoonzero June 26, 2017 12:31

typhoonzero reviewed Jun 26, 2017

add logging

5135d6c

Yancey0623 reviewed Jun 27, 2017

Your Name added 2 commits June 27, 2017 14:16

add docker file

32b7ea7

fix by yancey's comment

279060b

typhoonzero reviewed Jun 27, 2017

Your Name and others added 8 commits June 27, 2017 17:29

fix yaml

fdbe8c0

fix bugs

1357ad8

fix bugs

67c1d9f

fix convert bug

599d873

fix by wuyi's comment

aa0f747

add readme

deab719

Merge branch 'convertdataset' of https://github.com/gongweibao/cloud into convertdataset

1ab5b67

rm logging.confg

53f8417

helinwang requested changes Jun 27, 2017

gongweibao and others added 3 commits June 28, 2017 09:22

modify README.md

af4dbfd

modify README.md

a0017f0

add start command

87dd9d7

helinwang approved these changes Jun 28, 2017

gongweibao merged commit 19eff0c into PaddlePaddle:develop Jun 29, 2017

gongweibao deleted the convertdataset branch August 21, 2017 08:18

4 participants

gongweibao commented Jun 26, 2017 •

edited

Loading

Yancey0623 Jun 27, 2017 •

edited

Loading

Yancey0623 Jun 27, 2017 •

edited

Loading

gongweibao commented Jun 28, 2017 •

edited

Loading