Add voc2012 dataset for image segment #2785
Merged
Commits (11 total; the diff below shows changes from 8 commits):
338dd13 Add voc2012 dataset for image segment (wanghaoshuang)
a698824 Merge branch 'develop' into voc_dataset (wanghaoshuang)
a5239ac Merge branch 'develop' of https://github.com/paddlepaddle/paddle into… (wanghaoshuang)
c4f301d Modify comments and fix code format. (wanghaoshuang)
1ba879b Use PIL to read image in palette mode (wanghaoshuang)
0978d33 Merge branch 'develop' of https://github.com/paddlepaddle/paddle into… (wanghaoshuang)
2437631 Merge branch 'develop' of https://github.com/paddlepaddle/paddle into… (wanghaoshuang)
4a5c371 fix python dependency for voc2012 dataset (wanghaoshuang)
302c4f1 rename voc_seg to voc2012 (wanghaoshuang)
b142a6b Merge branch 'develop' of https://github.com/paddlepaddle/paddle into… (wanghaoshuang)
ceb9a73 fix import err (wanghaoshuang)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| # Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| import paddle.v2.dataset.voc_seg | ||
| import unittest | ||
| | ||
| | ||
| class TestVOC(unittest.TestCase): | ||
|     def check_reader(self, reader): | ||
|         # Count samples; each RGB image must hold 3x the elements of its label mask. | ||
|         count = 0 | ||
|         for image, label in reader(): | ||
|             self.assertEqual(image.size, 3 * label.size) | ||
|             count += 1 | ||
|         return count | ||
| | ||
|     def test_train(self): | ||
|         count = self.check_reader(paddle.v2.dataset.voc_seg.train()) | ||
|         self.assertEqual(count, 2913) | ||
| | ||
|     def test_test(self): | ||
|         count = self.check_reader(paddle.v2.dataset.voc_seg.test()) | ||
|         self.assertEqual(count, 1464) | ||
| | ||
|     def test_val(self): | ||
|         count = self.check_reader(paddle.v2.dataset.voc_seg.val()) | ||
|         self.assertEqual(count, 1449) | ||
| | ||
| | ||
| if __name__ == '__main__': | ||
|     unittest.main() | ||
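
The size check in the test above can be exercised without downloading the VOC archive. The following is a minimal sketch, not part of the PR: `FakeArray` and `fake_reader` are hypothetical stand-ins for the NumPy arrays and the dataset reader, and only the `size` attribute is modelled. It shows why an H x W x 3 image has exactly three times as many elements as its H x W label mask.

```python
class FakeArray(object):
    """Hypothetical stand-in for a NumPy array; only `shape` and `size` are modelled."""

    def __init__(self, *shape):
        self.shape = shape

    @property
    def size(self):
        # Total element count is the product of all dimensions.
        total = 1
        for dim in self.shape:
            total *= dim
        return total


def fake_reader():
    """Hypothetical reader yielding (image, label) pairs in HWC / HW layout."""
    for height, width in [(4, 6), (2, 3)]:
        yield FakeArray(height, width, 3), FakeArray(height, width)


def check_reader(reader):
    """Count the samples and verify the 3:1 element ratio for each pair."""
    count = 0
    for image, label in reader():
        assert image.size == 3 * label.size
        count += 1
    return count


sample_count = check_reader(fake_reader)
```

With a stub like this, the counting and ratio logic can be checked quickly, while the real tests still cover the archive contents.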
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| # Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| """ | ||
| Image dataset for segmentation. | ||
| The 2012 dataset contains images from 2008-2011 for which additional | ||
| segmentations have been prepared. As in previous years the assignment | ||
| to training/test sets has been maintained. The total number of images | ||
| with segmentation has been increased from 7,062 to 9,993. | ||
| """ | ||
| | ||
| import tarfile | ||
| import io | ||
| import numpy as np | ||
| from paddle.v2.dataset.common import download | ||
| from paddle.v2.image import * | ||
| from PIL import Image | ||
| | ||
| __all__ = ['train', 'test', 'val'] | ||
| | ||
| VOC_URL = 'http://host.robots.ox.ac.uk/pascal/VOC/voc2012/\ | ||
| VOCtrainval_11-May-2012.tar' | ||
| | ||
| VOC_MD5 = '6cd6e144f989b92b3379bac3b3de84fd' | ||
| SET_FILE = 'VOCdevkit/VOC2012/ImageSets/Segmentation/{}.txt' | ||
| DATA_FILE = 'VOCdevkit/VOC2012/JPEGImages/{}.jpg' | ||
| LABEL_FILE = 'VOCdevkit/VOC2012/SegmentationClass/{}.png' | ||
| | ||
| CACHE_DIR = 'voc2012' | ||
| | ||
| | ||
| def reader_creator(filename, sub_name): | ||
| | ||
|     tarobject = tarfile.open(filename) | ||
|     # Index archive members by name once so each sample is a dictionary lookup. | ||
|     name2mem = {} | ||
|     for ele in tarobject.getmembers(): | ||
|         name2mem[ele.name] = ele | ||
| | ||
|     def reader(): | ||
|         set_file = SET_FILE.format(sub_name) | ||
|         sets = tarobject.extractfile(name2mem[set_file]) | ||
|         for line in sets: | ||
|             line = line.strip() | ||
|             data_file = DATA_FILE.format(line) | ||
|             label_file = LABEL_FILE.format(line) | ||
|             data = tarobject.extractfile(name2mem[data_file]).read() | ||
|             label = tarobject.extractfile(name2mem[label_file]).read() | ||
|             # The label PNG is opened in palette mode, so its array holds class indices. | ||
|             data = Image.open(io.BytesIO(data)) | ||
|             label = Image.open(io.BytesIO(label)) | ||
|             data = np.array(data) | ||
|             label = np.array(label) | ||
|             yield data, label | ||
| | ||
|     return reader | ||
| | ||
| | ||
| def train(): | ||
|     """ | ||
|     Create a train dataset reader containing 2913 images in HWC order. | ||
|     """ | ||
|     return reader_creator(download(VOC_URL, CACHE_DIR, VOC_MD5), 'trainval') | ||
| | ||
| | ||
| def test(): | ||
|     """ | ||
|     Create a test dataset reader containing 1464 images in HWC order. | ||
|     """ | ||
|     return reader_creator(download(VOC_URL, CACHE_DIR, VOC_MD5), 'train') | ||
| | ||
| | ||
| def val(): | ||
|     """ | ||
|     Create a val dataset reader containing 1449 images in HWC order. | ||
|     """ | ||
|     return reader_creator(download(VOC_URL, CACHE_DIR, VOC_MD5), 'val') | ||
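
The reader pattern in `reader_creator` (index the tar members by name once, then stream samples listed in a set file) can be tried on a small in-memory archive. The sketch below is an illustration under stated assumptions, not the PR's code: the `sets/` and `data/` paths, `build_archive`, and the byte payloads are hypothetical, and image decoding with PIL and NumPy is left out so that only the standard library is needed. It also decodes each set-file line explicitly, since `extractfile` yields bytes on Python 3, whereas the PR targets Python 2.

```python
import io
import tarfile

# Hypothetical archive layout for this sketch; the real module uses the VOCdevkit paths.
SET_FILE = 'sets/{}.txt'
DATA_FILE = 'data/{}.bin'


def build_archive(samples, subset):
    """Build an in-memory tar holding one set file plus one payload per sample."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tar:

        def add(name, payload):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

        listing = '\n'.join(samples).encode('ascii') + b'\n'
        add(SET_FILE.format(subset), listing)
        for key, payload in samples.items():
            add(DATA_FILE.format(key), payload)
    buf.seek(0)
    return buf


def reader_creator(fileobj, subset):
    """Return a reader that lazily yields (key, payload) pairs for one subset."""
    tar = tarfile.open(fileobj=fileobj)
    # Index members by name once, mirroring the name2mem dictionary in the PR.
    members = {member.name: member for member in tar.getmembers()}

    def reader():
        set_file = tar.extractfile(members[SET_FILE.format(subset)])
        for raw_line in set_file:
            key = raw_line.decode('ascii').strip()
            if not key:
                continue
            yield key, tar.extractfile(members[DATA_FILE.format(key)]).read()

    return reader


archive = build_archive({'a': b'123', 'b': b'4567'}, 'val')
reader = reader_creator(archive, 'val')
samples = list(reader())
```

Because the reader is a closure over the open archive and the name index, calling `reader()` again starts a fresh pass over the same subset without reopening the file.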
PEP 8 (https://www.python.org/dev/peps/pep-0008/#package-and-module-names) says that underscores can be used in a module name if they improve readability.
Here the underscore doesn't really improve readability. How about naming it voc2012 instead?
Thx. I have renamed it to voc2012.