
Commit a663a27

Authored by j3su5pro-intel, flezaalv, aagalleg, gera-aldama and ma-pineda
Data Connector HF for public repo (#1331)
* Changes on interconnection project and setup
* Changes to ignore copied files
* Adding fix to handle encoding issues
* Adding changes to readme
* Taking current folder to get packages
* Fix typo (×2)
* Fixes
* Fix typo
* Interconnection sample
* Hotfix for Python 3.8
* Fix for setup code
* Ignoring sample file
* Renaming interconnection correctly
* Renaming
* Renaming correctly
* Renaming again; last change deletes setup
* Fix on upgrade for pip, wheel and setuptools
* Interoperability POC sample fix
* Fix for .sample copy
* Setup for bash
* Hotfix: change "test" strings to "test_unittest" so that unit tests won't use files used by samples
* Fixing requirements and setup
* Typo fix
* Removing wheel update from setup
* Fixes the problem with uploading a dataset to GCP
* Fixed unwanted changes on main branch (×2)
* Fixed AWS functional test name
* Upload a folder using the AWS connector
* Change on package name: finish editing interconnection to rename it to interoperability
* Last changes on interoperability are applied
* Updated license
* Removing code of conduct reference; we don't have one
* Skips row 0 from Excel when creating the dataframe
* Refactor
* Modification on refactoring
* Delete unused gitignore
* Change access keys format
* Refactored names
* Removing old name
* First version of license header
* Removing readme files on this branch
* Deleting files created from setup
* Headers
* Removing not-implemented packages
* Adding headers into main packages
* Adding headers on sample code
* Updating files for publishing
* Get GCP credentials (×4)
* Changed setup.sh file
* Removed init file on interoperability folder
* Updating readme
* Removing bad folder
* Big refactoring: moving data_connector into datasets
* Complete sample link
* Test WF
* Fixed path for unit tests
* Changed trigger to PR
* Create sample link
* Merging readmes (×3)
* Removing license only for data connector
* Missed recursive flag
* Delete unused names
* Ignore outputs of Jupyter notebook
* Removing commented block
* Removing deprecated folder
* Include headers in init files
* Adding headers in all tests
* Removing coverage omit from tox configuration file (this only works on the MZ env)
* Removing empty lines
* Removed GCP auth commands instead of commenting them
* Adding headers
* Removed sensitive information on GCP
* Removing values to make it easier for users to fill them in
* Solving names for the public repo
* Removing default values
* Removing sample values
* Removing values
* Fixing path on script
* Removing extra files
* Add security file
* Updating files and structure for publishing
* Updates for packaging
* Updating readme
* Updating metadata
* Updating source code
* Updating gitignore (×2)
* Updating metadata
* Updating gitignore
* Removing extracted files
* Updating repo
* Removing dataset egg-info
* Updating file permissions (×2)
* Removing key
* Merging from parent (×4)
* Updating imports
* Updating readme
* Removing error
* Removing data_connector changes (×9)
* Updating conda recipes (×2)
* Fixed bugs in setup.sh; added Azure src and dependencies
* Removing conda folders
* Updating blank space at the end of files
* Updating readme
* Validation/scans (#56)
* Fixed dataset_api requirements file
* Merging from data_connector
* Updating gitignore
* Fixing gitignore
* Returning dependencies
* Returning training code
* Creating and renaming sample files
* Adding format
* New readme proposals
* Fix on toml to avoid refactor
* Readme agenda
* Conda folder is inevitable
* Exclude conda and egg folders
* Adding badges in main readme; we will see if we should use RST format for the main readme only
* Simple entry point for sample doc
* Change header for sub_linked section
* Modifications to current class invocation
* Adding relative link to documentation in AWS main readme file
* Terms and conditions requirements update
* Changes on Azure readme file
* Removing previous terms and conditions
* Updating path for datasets_urls (×2)
* Removing data connector changes
* Updating blank last line
* Updated documentation with current code functionality
* Update documentation
* Added code sample for upload, download and list blobs with OAuth
* First definition in dcp readme for BigQuery
* Sample connection with OAuth
* Adding readme sample for service account connection with GCP
* Connection documentation finished
* Updating TPP file
* Updating with feedback (×4)
* Restoring lost changes for conda recipes
* Updating conda recipes (×2)
* Updating conda recipe
* Updating conda description
* Updating changes from data_connector
* Updating conda recipe
* Hotfix for bad import
* Hotfix: binary storage stream downloads should be written to files as binary
* Updating gitignore
* Patching data connector to 1.0.1 (×2)
* Updating recipes
* Fix for toml `where` for package build
* Ignoring build folder
* The toml file is always included; it makes no sense to exclude it
* Fix typo in conda install command
* Fix Apache version name
* Fixing typo in conda description
* Fixing typo in conda recipe meta.yaml
* Removing spaces for consistency

---------

Signed-off-by: Felipe Leza Alvarez <[email protected]>
Co-authored-by: Felipe Leza Alvarez <[email protected]>
Co-authored-by: aagalleg <[email protected]>
Co-authored-by: Gerardo Dominguez <[email protected]>
Co-authored-by: Leza Alvarez, Felipe <[email protected]>
Co-authored-by: Miguel Pineda <[email protected]>
Co-authored-by: ma-pineda <[email protected]>
Co-authored-by: gera-aldama <[email protected]>
1 parent 6270dd0 commit a663a27

File tree

8 files changed: +94 −101 lines changed

datasets/data_connector/.gitignore

Lines changed: 3 additions & 1 deletion
@@ -18,5 +18,7 @@ inspect_package-pip/
 build/
 
 # conda
+conda/local-channel/
 conda/local_channel/
-conda/extracted
+conda/extracted/
+conda/build/

datasets/data_connector/conda/conda_recipe/meta.yaml

Lines changed: 0 additions & 57 deletions
This file was deleted.
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+Data connector is a tool to connect to AzureML, Azure Blob, GCP Storage, GCP BigQuery and AWS S3 storage. The goal is to provide all cloud managers in one place, with documentation for an easy integration.
+
+***
+
+### Prerequisites
+Have either Python `3.8`, `3.9` or `3.10` already installed.
+
+***
+
+### Installation Command
+```bash
+conda install cloud-data-connector -c microsoft -c intel -c conda-forge
+```
+
+***
+
+### PyPI Package
+[Here](https://pypi.org/project/cloud-data-connector/)
+
+***
+
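One detail worth surfacing from the recipe and `pyproject.toml` later in this commit: the distribution is published as `cloud-data-connector`, but the importable package is `data_connector`. A minimal smoke test, assuming the install command above succeeded:

```python
# The PyPI/conda name is cloud-data-connector, but (per pyproject.toml's
# include pattern and the conda recipe's `test: imports:` section) the
# import package is data_connector.
import data_connector

print(data_connector.__file__)  # confirms which installed copy was imported
```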

datasets/data_connector/conda/pacakges.yaml

Lines changed: 0 additions & 30 deletions
This file was deleted.
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+{% set name = "cloud-data-connector" %}
+{% set version = "1.0.1" %}
+
+package:
+  name: {{ name|lower }}
+  version: {{ version }}
+
+source:
+  url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/cloud_data_connector-{{ version }}.tar.gz
+  sha256: c0e23333b9d3b021a94516dc4a67b47abbdd18c5f011ceec834acd2b673988a5
+
+build:
+  noarch: python
+  script: |
+    {{ PYTHON }} -m pip install . -vv
+  number: 0
+
+requirements:
+  host:
+    - python>=3.8,<3.11
+    - setuptools>=61.0
+    - setuptools-scm
+    - pip
+  run:
+    - python>=3.8,<3.11
+    # - azureml>=0.2.7  # not available in conda
+    - azure-ai-ml>=2023.06.01  # microsoft only
+    - azure-core>=2023.06.01
+    - azure-identity>=2023.06.01
+    - azure-storage-blob>=1.4.1
+    # - azureml-core>=1.49.0  # not available in conda
+    - boto3>=1.26.154
+    - google-api-core>=2.0.0
+    - google-auth>=1.33.0
+    - google-auth-oauthlib>=0.4.1
+    - google-cloud-bigquery>=2.1.0
+    - google-cloud-storage>=2.1.0
+    - packaging>=21.3
+    - python-dotenv>=1.0.0
+
+test:
+  imports:
+    - data_connector
+
+about:
+  summary: 'Data connector is a tool to connect to AzureML, Azure Blob, GCP Storage, GCP BigQuery and AWS S3 storage. The goal is to provide all cloud managers in one place, with documentation for an easy integration.'
+  license: 'Apache License, Version 2.0'
+  about_license_url: https://www.apache.org/licenses/LICENSE-2.0.html
+
+extra:
+  recipe-maintainers:
+    - Jose de Jesus Herrera Ledon <[email protected]>
+    - Alberto Gallegos Muro <[email protected]>
+    - Felipe Leza Alvarez <[email protected]>
+    - Miguel Pineda Juarez <[email protected]>
+    - Gerardo Dominguez Aldama <[email protected]>
datasets/data_connector/data_connector/azure/downloader.py

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ def download(
         storage_stream_downloader = blob_container_client.download_blob(
             data_file
         ).readall()
-        with open(destiny, mode="w") as downloaded_blob:
+        with open(destiny, mode="wb") as downloaded_blob:
             downloaded_blob.write(storage_stream_downloader)
         self.container_client = blob_container_client
         return blob_container_client
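The fix is one character but load-bearing: `readall()` on the Azure storage stream downloader returns `bytes`, and a file opened in text mode refuses `bytes`. A self-contained sketch of the failure and the fix (no Azure account needed; the payload stands in for a downloaded blob):

```python
# readall() yields bytes; writing bytes to a text-mode file raises TypeError,
# which is why the downloader now opens its destination with mode="wb".
payload = b"\x89PNG\r\n\x1a\n"  # stand-in for downloaded binary blob content

try:
    with open("blob.out", mode="w") as f:   # old behavior: text mode
        f.write(payload)
except TypeError as err:
    print(f"text mode fails: {err}")

with open("blob.out", mode="wb") as f:      # fixed behavior: binary mode
    f.write(payload)                        # bytes are written verbatim
```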

datasets/data_connector/pyproject.toml

Lines changed: 12 additions & 12 deletions
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "data_connector"
-version = "1.0.0"
+version = "1.0.1"
 requires-python = ">=3.8,<3.11"
 authors = [
     { name="IntelAI", email="[email protected]"}
@@ -23,22 +23,22 @@ classifiers = [
 dependencies = [
     "azureml>=0.2.7",
     "azure-ai-ml>=1.4.0",
-    "azure-storage-blob>=12.14.1",
+    "azureml-core>=1.49.0",
     "azure-identity>=1.12.0",
+    "azure-storage-blob>=1.4.1",
     "azure-core>=1.26.3",
-    "azureml-core>=1.49.0",
-    "boto3>=1.26.65",
-    "google-api-core>=2.11.0",
-    "google-auth>=2.16.2",
-    "google-auth-oauthlib>=1.0.0",
-    "google-cloud-bigquery>=3.7.0",
-    "google-cloud-storage>=2.7.0",
-    "packaging<22.0,>=20.0",
+    "boto3>=1.26.154",
+    "google-api-core>=2.0.0",
+    "google-auth>=1.33.0",
+    "google-auth-oauthlib>=0.4.1",
+    "google-cloud-bigquery>=2.1.0",
+    "google-cloud-storage>=2.1.0",
+    "packaging>=21.3",
     "python-dotenv>=1.0.0"
 ]
 
 [tool.setuptools.packages.find]
-where = ["data_connector"] # list of folders that contain the packages (["."] by default)
+where = ["."] # list of folders that contain the packages (["."] by default)
 include = ["data_connector*"]
-exclude = ["data_connector.egg-info", "pyproject.toml"]
+exclude = ["data_connector.egg-info"]
 namespaces = false
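The `[tool.setuptools.packages.find]` change searches from the project root rather than inside `data_connector/`, which is what lets the `data_connector*` include pattern actually match the import package. A hedged sketch of what discovery now returns, assuming it runs from `datasets/data_connector/` with the layout in this commit:

```python
# Mirrors the new find directives in pyproject.toml; run from the project root.
from setuptools import find_packages

packages = find_packages(
    where=".",                             # search from the project root
    include=["data_connector*"],           # the import package and subpackages
    exclude=["data_connector.egg-info"],   # build metadata is not a package
)
print(packages)  # e.g. ['data_connector', 'data_connector.azure', 'data_connector.gcp', ...]
```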

datasets/data_connector/samples/gcp/bigquery.py

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@
 from data_connector.gcp.query import Query
 from dotenv import load_dotenv
 from google.cloud import bigquery
+from google.api_core.exceptions import BadRequest
 
 load_dotenv()
 
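The added import gives the sample a way to catch queries that BigQuery rejects instead of crashing. A hedged sketch of that pattern, not the sample's actual code; the query string is hypothetical and the client assumes application-default credentials:

```python
from google.api_core.exceptions import BadRequest
from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default credentials are set up

try:
    # Hypothetical query for illustration; invalid SQL raises BadRequest.
    rows = client.query("SELECT name FROM dataset.table LIMIT 5").result()
    for row in rows:
        print(row.name)
except BadRequest as err:
    print(f"BigQuery rejected the query: {err}")
```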