Commit 07974bc

j3su5pro-intel, ma-pineda, gera-aldama, flezaalva, and aagalleg authored and committed
Dataset_librarian code updates (#1187)
* Changed location of installation of GCP CLI
* Adding packaging tests
* Test unit tests
* Test functional tests
* Fix for downloader
* Adding a specific version to avoid issues on installation
* Interconnection experiment
* Test updated workflow
* Fixed out_report.xml path
* Interconnection for GCP
* Adding updates for wheel and setuptools
* Experiment requirements
* Changes on interconnection project and setup
* Changes to ignore copied files
* Adding fix to handle encoding issues
* Adding changes to readme
* Taking current folder to get packages
* Fix typo (Signed-off-by: Felipe Leza Alvarez <[email protected]>)
* Interconnection sample
* Hotfix for Python 3.8
* Fix for setup code
* Ignoring sample file
* Renaming interconnection correctly
* Fix on upgrade for pip, wheel and setuptools
* Interoperability POC sample fix
* Fix for .sample copy
* Setup for bash
* Hotfix: change "test" strings to "test_unittest" so that unit tests won't use files used by samples
* Fixing requirements and setup
* Removing wheel update from setup
* Fixes the problem of uploading a dataset to GCP
* Fixed unwanted changes on main branch
* Fixed AWS functional test name
* Upload a folder using the AWS connector
* Last changes on interoperability are applied
* Updated license
* Removing code of conduct reference; we do not have one
* Skips row 0 from Excel when creating dataframe
* Refactor
* Delete unused gitignore
* Change access keys format
* Refactored names
* First version of license header
* Removing readme files on this branch
* Deleting files created from setup
* Headers
* Adding headers into main packages
* Adding headers on sample code
* Updating files for publishing
* Get GCP credentials
* Changed setup.sh file
* Removed init file on interoperability folder
* Big refactoring, moving data_connector into datasets
* Complete sample link
* Test WF
* Fixed path for unit tests
* Changed trigger to PR
* Create sample link
* Merging readmes
* Removing license only for data connector
* Missed recursive flag
* Delete unused names
* Ignore outputs of Jupyter notebook
* Removing commented block
* Removing deprecated folder
* Include headers in init files
* Adding headers in all tests
* Removing coverage omit from tox configuration file (this only works on MZ env)
* Removed GCP auth commands instead of commenting them
* Adding headers
* Removed sensitive information on GCP
* Removing values to make it easier for users to fill in
* Solving names for public repo
* Removing default values
* Removing sample values
* Removing values
* Fixing path on script
* Removing extra files
* Add security file
* Updating files and structure for publishing
* Updates for packaging
* Updating readme
* Updating metadata
* Updating source code
* Updating gitignore
* Updating gitignore
* Updating metadata
* Updating gitignore
* Removing extracted files
* Updating repo
* Removing dataset egg info
* Updating file permissions
* Updating file permissions
* Removing key
* Updating imports
* Updating readme
* Removing error
* Removing data_connector changes
* Updating conda recipes
* Fixed bugs in setup.sh; added Azure src and dependencies
* Removing conda folders
* Updating blank space at the end of files
* Updating readme
* Validation/scans (#56)
* Fixed dataset_api requirements file
* Merging from data_connector
* Updating gitignore
* Returning dependencies
* Returning training code
* Creating and renaming sample files
* Adding format
* New readme proposals
* Fix on toml to avoid refactor
* Readme agenda
* Conda folder is unavoidable
* Exclude conda and egg folders
* Adding badges in main readme... will see if we should use rst format for main readme only
* Simple entry point for sample doc
* Change header for sub_linked section
* Modifications to current class invocation
* Adding relative link to documentation in AWS main readme file
* Terms and conditions requirements update
* Changes on Azure readme file
* Removing previous terms and conditions
* Updating path for datasets_urls
* Updating path for datasets_urls
* Removing data connector changes
* Updating blank last line
* Updated documentation with current code functionality
* Update documentation
* Added code sample for upload, download and list blobs for OAuth
* First definition in GCP readme for BigQuery
* Sample connection with OAuth
* Adding readme sample for GCP service account connection with GCP
* Connection documentation finished
* Updating TPP file
* Updating with feedback

---------

Signed-off-by: gera-aldama <[email protected]>
Signed-off-by: Felipe Leza Alvarez <[email protected]>
Co-authored-by: Miguel Pineda <[email protected]>
Co-authored-by: Gerardo Dominguez <[email protected]>
Co-authored-by: gera-aldama <[email protected]>
Co-authored-by: Felipe Leza Alvarez <[email protected]>
Co-authored-by: aagalleg <[email protected]>
Co-authored-by: Leza Alvarez, Felipe <[email protected]>
Co-authored-by: ma-pineda <[email protected]>
1 parent decd63e commit 07974bc

Some content is hidden: large commits have some content hidden by default.
45 files changed, +2735 -268 lines changed

datasets/data_connector/.gitignore

Lines changed: 4 additions & 2 deletions
@@ -12,9 +12,11 @@ __pycache__/
 .venv
 
 # Folders
-local_channel/
 dist/
 inspect_package-conda/
 inspect_package-pip/
-conda_package/
 build/
+
+# conda
+conda/local_channel/
+conda/extracted

datasets/data_connector/MANIFEST.in

Lines changed: 3 additions & 1 deletion
@@ -3,4 +3,6 @@ prune docker
 prune samples
 prune tests
 prune docs
-exclude *.txt __init__.py .gitignore poetry.lock tox.ini MANIFEST.in
+prune conda
+prune data_connector.egg-info
+exclude *.txt __init__.py .gitignore poetry.lock tox.ini MANIFEST.in

datasets/data_connector/README.md

Lines changed: 24 additions & 25 deletions
@@ -1,13 +1,27 @@
+# Cloud Data Connector
+
+[![Intel: AI](https://img.shields.io/badge/intel-AI-0071C5)](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/overview.html)
+[![Python](https://img.shields.io/badge/Python-3.8/3.9/3.10-green)](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/overview.html)
+[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+[![security: bandit](https://img.shields.io/badge/security-bandit-yellow.svg)](https://github.com/PyCQA/bandit)
+[![security: SNYK](https://img.shields.io/badge/Security-SNYK-yellow)](https://github.com/PyCQA/bandit)
+[![security: BDBA](https://img.shields.io/badge/Security-BDBA-yellow)](https://github.com/PyCQA/bandit)
+[![security: Checkmarks](https://img.shields.io/badge/Security-Checkmarks-yellow)](https://github.com/PyCQA/bandit)
+
+
+_____
 ## Overview
 ---
 data_connector is a tool to connect to AzureML, Azure blob, GCP storage, GCP Big Query and AWS storage S3.
 The goal is provide all cloud managers in one place and provide documentation for an easy integration.
 
-For more details, visit the [Data Connector](repo link) GitHub repository.
+For more details, visit the [Data Connector](https://github.com/IntelAI/models/tree/master/datasets/data_connector) GitHub repository.
+<br/><br/>
 
 ## Hardware Requirements
 ---
 The hardware should comply with the same requirements that the cloud service.
+<br/><br/>
 
 ## How it Works
 ---
@@ -20,7 +34,7 @@ The package contains the following modules:
 | data_connector.azure |
 
 Each module is capable of connect, download and upload operation to it-s corresponding cloud service.
-
+<br/><br/>
 
 ## Getting Started with data_connector
 ---
@@ -33,23 +47,15 @@ conda activate venv
 
 You can install the package with:
 ```bash
-python -m pip install intel-cloud-data-connector
+python -m pip install cloud-data-connector
 ```
 
-Please follow module specific documentation for use case, hands-examples. This documentation can be found inside the package.
+Please follow module specific documentation for use case, hands-examples.
 1. data_connector/azure/README.md
 2. data_connector/azure/AzureML.md
 3. data_connector/aws/README.md
 4. data_connector/gcp/README.md
-
-<!---
-## Learn More
----
-For more information about data_connector, see these guides and software resources:
-- github/repo/link
-TODO: Update public repo
--->
-
+<br/><br/>
 
 ## Getting Started with data_connector.azure
 ---
@@ -102,11 +108,7 @@ Data connector provides a tool to connect to Azure ML workspaces and upload conf
 ```
 How to get a [Connection String](https://learn.microsoft.com/en-us/answers/questions/1071173/where-can-i-find-storage-account-connection-string)?
 
-<!---
-TODO: Update link from public repo
-![Azure Connection String Sample](data_connector/../../../docs/img/connection_string.png)
------
--->
+![Azure Connection String Sample](https://github.com/IntelAI/models/blob/master/datasets/data_connector/docs/img/connection_string.png)
 
 Also you can get connection strings using Azure CLI
 ```bash
@@ -137,12 +139,8 @@ Or just
 downloader = Downloader(connector=connector)
 downloader.download()
 ```
-<!---
-TODO: Update link from public repo
-[See sample](../../samples/azure/blob_sample.py)
------
--->
-
+[See sample Here](https://github.com/IntelAI/models/blob/master/datasets/data_connector/samples/azure/blob_sample.py)
+<br/><br/>
 
 
 ## Getting Started with data_connector.aws
@@ -327,7 +325,7 @@ uploader = Uploader(conection_object)
 # upload a file
 uploader.upload(bucket_name, 'path/to_local_file.csv', 'path/to_object_name.csv')
 ```
-
+<br/><br/>
 
 ## Getting Started with data_connector.gcp
 ---
@@ -420,6 +418,7 @@ For service account:
 python -m samples.gcp.bigquery -p <project_name> -c <credentials_path>
 ```
 User must provide the project name using flag (-p) and the local path to the JSON file with the credentials (-c).
+<br/><br/>
 
 
 ## Support
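The `-p` and `-c` flags above are the only interface the BigQuery sample documents in this diff; the sample's source code itself is not shown here. Purely as an illustration of how those two flags might be consumed, a hypothetical entry point could parse them as follows (the long option names and help strings are assumptions, not taken from the repository):

```python
# Hypothetical illustration only: this is not the repository's
# samples/gcp/bigquery module, it just mirrors the documented -p and -c flags.
import argparse


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="BigQuery sample invocation")
    parser.add_argument("-p", "--project", required=True,
                        help="GCP project name")
    parser.add_argument("-c", "--credentials", required=True,
                        help="path to the service account JSON credentials file")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    print(f"project={args.project}, credentials={args.credentials}")
```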
Lines changed: 63 additions & 52 deletions
@@ -1,4 +1,4 @@
-# Data Connector AWS S3
+# Cloud Data Connector: AWS S3
 
 Data Connector for AWS S3 allows you to connect to S3 buckets and list contents, download and upload files.

@@ -8,7 +8,8 @@ To access S3 buckets, you will need to sign up for an AWS account and create acc
 
 Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS.
 
-## Hot to get your access key ID and secret access key
+How to get your access key ID and secret access key
+---
 
 1. Open the IAM console at https://console.aws.amazon.com/iam/.
 2. On the navigation menu, choose Users.
@@ -34,31 +35,50 @@ You can add more configuration settings listed [here](https://boto3.amazonaws.co
 You need to import the DataConnector class.
 
 ```python
-from data_connector.aws.connector import Connector
+from data_connector.aws import Connector as aws_connector
 ```
 
 Connector class has the method connect(), it creates an AWS S3 object, by default the function will create a S3 connector using the credentials saved in your environment variables.
 
 ```python
-connector = Connector()
+aws_bucket_connector = aws_connector()
 ```
 
 Call the connect() method, this will return a connection object for S3.
 
 ```python
-conection_object = connector.connect()
+aws_conection_object = aws_bucket_connector.connect()
 ```
 
+Downloader
+---
 Import the Downloader class and use the connection object returned by connect() function.
 
+
 ```python
-from data_connector.aws.downloader import Downloader
+from data_connector.aws import Downloader as aws_downloader
 
-downloader = Downloader(conection_object)
+aws_file_downloader = aws_downloader(aws_conection_object)
 ```
 
-The Downloader class has two methods:
 
+
+To download a file use the `download(container_obj, data_file, destiny)` method and specify the next parameters.
+
+- container_obj: The name of the bucket to download from.
+- data_file: The name of the file to download from.
+- destiny: The path to the file to download to.
+
+```python
+from data_connector.aws import Downloader as aws_downloader
+
+downloader = aws_downloader(aws_conection_object)
+file_name = "path/to_file.csv"
+downloader.download(bucket_name, file_name, 'path/to_destiny.csv')
+```
+List Blobs
+---
+[Downloader](#downloader) class has two methods:
 - list_blobs(container_obj): The function to get a list of the objects in a bucket.
 - download(container_obj, data_file, destiny): The function to download a file from a S3 bucket.
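Putting the pieces of this hunk together, the intended end-to-end flow looks like the following. This is a minimal sketch, not code from the repository: it assumes `data_connector.aws` exposes `Connector` and `Downloader` as the new imports show, and it uses the documented `list_blobs` name rather than the `list_blob(` spelling that appears further below.

```python
# Minimal sketch, not repository code: assumes data_connector.aws exposes
# Connector and Downloader as shown in the new imports above, and uses the
# documented list_blobs method name.
from data_connector.aws import Connector as aws_connector
from data_connector.aws import Downloader as aws_downloader

bucket_name = "MY_BUCKET_NAME"            # placeholder bucket name

connector = aws_connector()               # reads AWS credentials from the environment
connection_object = connector.connect()   # returns an S3 connection object

downloader = aws_downloader(connection_object)
print(downloader.list_blobs(bucket_name))              # list the objects in the bucket
downloader.download(bucket_name, "path/to_file.csv",   # object key to download
                    "path/to_destiny.csv")             # local destination path
```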

@@ -67,71 +87,62 @@ A first step with buckets is to list their content using the `list_blobs(contain
 - container_obj: The bucket name to list.
 
 ```python
-from data_connector.aws.downloader import Downloader
+from data_connector.aws import Downloader as aws_downloader
 
-downloader = Downloader(conection_object)
+aws_bucket_downloader = aws_downloader(aws_conection_object)
 
-list_blobs = downloader.list_blobs('MY_BUCKET_NAME')
+list_blobs = aws_bucket_downloader.list_blob(
+    'MY_BUCKET_NAME'
+)
 print(list_blobs)
 ```
 
-To download a file use the `download(container_obj, data_file, destiny)` method and specify the next parameters.
-
-- container_obj: The name of the bucket to download from.
-- data_file: The name of the file to download from.
-- destiny: The path to the file to download to.
-
-```python
-from data_connector.aws.downloader import Downloader
-
-downloader = Downloader(conection_object)
-file_name = "path/to_file.csv"
-downloader.download(bucket_name, file_name, 'path/to_destiny.csv')
-```
+Upload
+---
 
 You can import an Uploader class and use the upload method to send a file from you local machine to a bucket. You need to add the connector object to Uploader constructor.
 
 ```python
-from data_connector.aws.uploader import Uploader
-from data_connector.aws.connector import Connector
+from data_connector.aws import Uploader as aws_uploader
+from data_connector.aws import Connector as aws_connector
 
-connector = Connector()
-conection_object = connector.connect()
-uploader = Uploader(conection_object)
+aws_bucket_connector = aws_connector()
+aws_conection_object = aws_bucket_connector.connect()
+aws_bucker_uploader = aws_uploader(aws_conection_object)
 
 ```
 Specify the next parameters in upload function.
 
 - container_obj: The name of the bucket to upload to.
 - data_file: The path to the file to upload.
-- object_name: The name of the file to upload to.
+- object_name: The name of file in cloud bucket
 
 ```python
-from data_connector.aws.uploader import Uploader
-
-uploader = Uploader(conection_object)
-uploader.upload(bucket_name, 'path/to_local_file.csv', 'path/to_object_name.csv')
+aws_bucker_uploader.upload('<bucket_name>', '<path/to_local_file>', '<path/to_object_name>')
 ```
 
+
+Samples
+---
 ### List objects in a bucket
 
 ```python
 # import the dataconnector package
-from data_connector.aws.connector import Connector
-from data_connector.aws.downloader import Downloader
+from data_connector.aws import Connector as aws_connector
+from data_connector.aws import Downloader as aws_downloader
 
 # specify a S3 bucket name
 bucket_name = 'MY_BUCKET_NAME'
 # create a connector
-connector = Connector()
+aws_bucket_connector = aws_connector()
 # connect to aws using default AWS access keys
 # connect() method uses the configurations settings for AWS account
-conection_object = connector.connect()
+aws_conection_object = aws_bucket_connector.connect()
 # list files from bucket
 # create a downloader to list files
-downloader = Downloader(conection_object)
+aws_downloader = aws_downloader(aws_conection_object)
 # use the list_blobs function
-list_blobs = downloader.list_blobs(bucket_name)
+list_blobs = aws_downloader.list_blobs(bucket_name)
 # list_blobs functions returns all objects in bucket
 print(list_blobs)
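The upload path introduced in this hunk follows the same pattern as the download flow. A minimal sketch, assuming the `Uploader` import and the `upload(container_obj, data_file, object_name)` signature documented above; the bucket name and file paths are placeholders:

```python
# Minimal sketch, not repository code: follows the Uploader usage and the
# upload(container_obj, data_file, object_name) signature documented above.
from data_connector.aws import Connector as aws_connector
from data_connector.aws import Uploader as aws_uploader

bucket_name = "MY_BUCKET_NAME"            # placeholder bucket name

connector = aws_connector()               # connect with the default AWS access keys
connection_object = connector.connect()

uploader = aws_uploader(connection_object)
# local file -> object key inside the bucket
uploader.upload(bucket_name, "path/to_local_file.csv", "path/to_object_name.csv")
```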

@@ -141,40 +152,40 @@ print(list_blobs)
 
 ```python
 # import the dataconnector package
-from data_connector.aws.connector import Connector
-from data_connector.aws.downloader import Downloader
+from data_connector.aws import Connector as aws_connector
+from data_connector.aws import Downloader as aws_connector
 
 # specify a S3 bucket name
 bucket_name = 'MY_BUCKET_NAME'
 # create a connector
-connector = Connector()
+aws_bucket_connector = aws_connector()
 # connect to aws using default aws access keys
-conection_object = connector.connect()
+aws_conection_object = aws_bucket_connector.connect()
 # download a file from bucket
 # create a Downloader object using a connector object
-downloader = Downloader(conection_object)
+aws_downloader = aws_connector(aws_conection_object)
 # specify the object name to download
 file_name = "path/to_file.csv"
 # download the object
-downloader.download(bucket_name, file_name, 'path/to_destiny.csv')
+aws_downloader.download(bucket_name, file_name, 'path/to_destiny.csv')
 ```
 
 ### Upload a file
 
 ```python
 # import dataconnector package
-from data_connector.aws.connector import Connector
-from data_connector.aws.uploader import Uploader
+from data_connector.aws import Connector as aws_connector
+from data_connector.aws import Uploader as aws_uploader
 
 # specify a S3 bucket name
 bucket_name = 'MY_BUCKET_NAME'
 # create a connector
-connector = Connector()
+aws_bucket_connector = aws_connector()
 # connect to aws using default aws access keys
-conection_object = connector.connect()
+aws_conection_object = aws_bucket_connector.connect()
 # Upload a file
 # create a uploader object using a connection object
-uploader = Uploader(conection_object)
+aws_bucket_uploader = aws_uploader(aws_conection_object)
 # upload a file
-uploader.upload(bucket_name, 'path/to_local_file.csv', 'path/to_object_name.csv')
+aws_bucket_uploader.upload(bucket_name, 'path/to_local_file.csv', 'path/to_object_name.csv')
 ```
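Note that the download sample in this hunk imports `Downloader` under the alias `aws_connector`, which rebinds the name that the first import gave to `Connector`; as written, `aws_bucket_connector = aws_connector()` would therefore construct a `Downloader` rather than a `Connector`. A cleaned-up sketch of the same sample, assuming the import paths introduced by this commit, could read:

```python
# Minimal sketch, not repository code: the same download flow with distinct
# aliases so Connector and Downloader are not imported under the same name.
from data_connector.aws import Connector as aws_connector
from data_connector.aws import Downloader as aws_downloader

bucket_name = "MY_BUCKET_NAME"            # placeholder bucket name
file_name = "path/to_file.csv"            # placeholder object key

connector = aws_connector()               # connect with the default AWS access keys
connection_object = connector.connect()

downloader = aws_downloader(connection_object)
downloader.download(bucket_name, file_name, "path/to_destiny.csv")
```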
