Setup GCP BigQuery Integration Tests for Analysis Modules #244
Merged
6 commits (all by mayurkmmt)
f51ea78 feat: Setup GCP BigQuery Integration Tests for Analysis Modules
00783ab fix: change the code to check ibis functionality rather than the output
54db275 Merge branch 'main' of github.com:data-simply/pyretailscience into fe…
cc13d22 fix: changed the code as per the comments
17de115 Merge branch 'main' of github.com:data-simply/pyretailscience into fe…
0cd7587 feat: created big query test cases for utils file
@@ -0,0 +1 @@
GCP_PROJECT_ID =
@@ -0,0 +1,63 @@
name: BigQuery Integration Tests

on:
  workflow_dispatch:
    inputs:
      test_suite:
        type: choice
        description: Test Suite to Run
        default: "all"
        options:
          - all
          - cohort_analysis
          - composite_rank
          - cross_shop
          - customer_decision_hierarchy
          - haversine
          - hml_segmentation
          - product_association
          - revenue_tree
          - rfm_segmentation
          - segstats_segmentation
          - threshold_segmentation

permissions:
  contents: read

concurrency:
  group: "bigquery-tests"
  cancel-in-progress: true

jobs:
  integration-tests:
    name: Run BigQuery Integration Tests
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install uv Package
        run: |
          pip install --upgrade pip
          pip install uv==0.5.30

      - name: Install Dependencies
        run: |
          uv sync

      - name: Set up GCP Authentication
        uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - name: Run Integration Tests
        env:
          TEST_SUITE: ${{ inputs.test_suite }}
        run: |
          uv run pytest tests/integration/bigquery -v \
            $(if [ "$TEST_SUITE" != "all" ]; then echo "-k $TEST_SUITE"; fi)
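The final workflow step appends `-k <suite>` to the pytest command only when a specific suite was picked from the dropdown. As a rough illustration of that conditional, here is the same logic as a small Python function. The helper name `build_pytest_args` is hypothetical and is not part of this PR.

```python
def build_pytest_args(test_suite: str) -> list[str]:
    """Build the pytest argument list the way the workflow's shell conditional does.

    `build_pytest_args` is an illustrative helper, not code from the PR.
    """
    args = ["pytest", "tests/integration/bigquery", "-v"]
    if test_suite != "all":
        # -k takes a keyword expression used to select a subset of tests
        args += ["-k", test_suite]
    return args
```

With `"all"` the full directory runs unfiltered; any other choice narrows the run to tests matching that keyword.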
@@ -0,0 +1,33 @@
"""BigQuery integration test fixtures."""

import os

import ibis
import pytest
from dotenv import load_dotenv
from google.cloud import bigquery
from loguru import logger

load_dotenv()
client = bigquery.Client(project="pyretailscience-infra")


@pytest.fixture(scope="session")
def bigquery_connection():
    """Connect to BigQuery for integration tests."""
    try:
        conn = ibis.bigquery.connect(
            project_id=os.environ.get("GCP_PROJECT_ID"),
        )
        logger.info("Connected to BigQuery")
    except Exception as e:
        logger.error(f"Failed to connect to BigQuery: {e}")
        raise
    else:
        return conn


@pytest.fixture(scope="session")
def transactions_table(bigquery_connection):
    """Get the transactions table for testing."""
    return bigquery_connection.table("test_data.transactions")
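The connection fixture reads `GCP_PROJECT_ID` with `os.environ.get`, which returns `None` when the variable is unset. One possible guard, sketched below, fails early with an explicit message instead. The helper `resolve_project_id` is an assumption made for illustration and does not appear in the PR.

```python
import os
from collections.abc import Mapping


def resolve_project_id(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured GCP project ID or raise a descriptive error.

    Hypothetical helper: the PR itself passes os.environ.get(...) straight through.
    """
    project_id = env.get("GCP_PROJECT_ID", "").strip()
    if not project_id:
        raise RuntimeError("GCP_PROJECT_ID is not set; cannot run BigQuery integration tests")
    return project_id
```

A fixture could call this helper before connecting, so a missing `.env` entry surfaces as a configuration error rather than a connection failure.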
@@ -0,0 +1,20 @@
"""Integration tests for Cohort Analysis with BigQuery."""

from pyretailscience.analysis.cohort import CohortAnalysis


def test_cohort_analysis_with_bigquery(transactions_table):
    """Integration test for CohortAnalysis using BigQuery backend and Ibis table.

    This test ensures that the CohortAnalysis class initializes and executes successfully
    using BigQuery data with various combinations of aggregation parameters.
    """
    limited_table = transactions_table.limit(5000)

    CohortAnalysis(
        df=limited_table,
        aggregation_column="unit_spend",
        agg_func="sum",
        period="week",
        percentage=True,
    )
@@ -0,0 +1,23 @@
"""Integration tests for Composite Rank Analysis with BigQuery."""

import pytest

from pyretailscience.analysis.composite_rank import CompositeRank


@pytest.mark.parametrize("ignore_ties", [False, True])
def test_tie_handling(transactions_table, ignore_ties):
    """Test handling of ties during rank calculation."""
    rank_cols = [
        ("unit_spend", "desc"),
        ("customer_id", "desc"),
    ]
    result = CompositeRank(
        df=transactions_table,
        rank_cols=rank_cols,
        agg_func="mean",
        ignore_ties=ignore_ties,
    )
    assert result is not None
    executed_result = result.df
    assert executed_result is not None
@@ -0,0 +1,51 @@
"""Integration tests for Cross Shop Analysis with BigQuery."""

import pytest

from pyretailscience.analysis.cross_shop import CrossShop


@pytest.mark.parametrize(
    "group_3_col",
    [
        "category_1_name",
        None,
    ],
)
def test_cross_shop_with_bigquery(transactions_table, group_3_col):
    """Test CrossShop with data fetched from BigQuery.

    This parameterized test verifies that CrossShop can be initialized
    and run with data from BigQuery using different combinations of group columns,
    value columns, and aggregation functions without throwing exceptions.
    """
    transactions_df = transactions_table.limit(5000)
    group_1_col = "brand_name"
    group_2_col = "category_0_name"
    group_1_vals = transactions_df[group_1_col].execute().dropna().unique()
    group_2_vals = transactions_df[group_2_col].execute().dropna().unique()

    group_1_val = group_1_vals[0]
    group_2_val = group_2_vals[0]

    group_3_val = None
    if group_3_col is not None:
        group_3_vals = transactions_df[group_3_col].execute().dropna().unique()
        if len(group_3_vals) == 0:
            pytest.skip(f"Not enough unique values for {group_3_col}")
        group_3_val = group_3_vals[0]

    labels = ["Group 1", "Group 2"] if group_3_col is None else ["Group 1", "Group 2", "Group 3"]

    CrossShop(
        df=transactions_table,
        group_1_col=group_1_col,
        group_1_val=group_1_val,
        group_2_col=group_2_col,
        group_2_val=group_2_val,
        group_3_col=group_3_col,
        group_3_val=group_3_val,
        labels=labels,
        value_col="unit_quantity",
        agg_func="count",
    )
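The test picks the first non-null unique value of each group column and only guards the third column against an empty result. A small helper could apply the same check uniformly. The sketch below is plain Python over an iterable of values; the name `first_non_null` is hypothetical, and the PR itself works on executed pandas Series instead.

```python
from collections.abc import Iterable
from typing import Any, Optional


def first_non_null(values: Iterable[Any]) -> Optional[Any]:
    """Return the first value that is not None, or None if there is none.

    Illustrative stand-in for `.dropna().unique()[0]` with an emptiness check.
    """
    for value in values:
        if value is not None:
            return value
    return None
```

A test could then call `pytest.skip` whenever the helper returns `None` for any group column, not just the third.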
35 changes: 35 additions & 0 deletions
tests/integration/bigquery/test_customer_decision_hierarchy.py
@@ -0,0 +1,35 @@
"""Integration tests for Customer Decision Hierarchy Analysis with BigQuery."""

import pytest

from pyretailscience.analysis.customer_decision_hierarchy import CustomerDecisionHierarchy


@pytest.mark.parametrize(
    ("method", "exclude_same_transaction"),
    [
        ("truncated_svd", False),
        ("truncated_svd", None),
        ("yules_q", False),
        ("yules_q", None),
    ],
)
def test_customer_decision_hierarchy_with_bigquery(
    transactions_table,
    method,
    exclude_same_transaction,
):
    """Test CustomerDecisionHierarchy with data fetched from BigQuery.

    This parameterized test verifies that CustomerDecisionHierarchy can be initialized
    and run with data from BigQuery using different combinations of product columns
    and methods without throwing exceptions.
    """
    transactions_df = transactions_table.limit(5000).execute()

    CustomerDecisionHierarchy(
        df=transactions_df,
        product_col="product_name",
        exclude_same_transaction_products=exclude_same_transaction,
        method=method,
    )
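One of the parametrized methods is `yules_q`. For background, Yule's Q for a 2x2 contingency table with cells a, b, c, d is (ad - bc) / (ad + bc). The sketch below computes it for two products from sets of customer IDs. The function name, the set-based inputs, and the zero-denominator fallback are all assumptions for illustration, not the library's implementation.

```python
def yules_q(bought_x: set[str], bought_y: set[str], all_customers: set[str]) -> float:
    """Compute Yule's Q association between two products.

    Illustrative only; pyretailscience may compute this differently.
    """
    a = len(bought_x & bought_y)  # customers who bought both
    b = len(bought_x - bought_y)  # bought x only
    c = len(bought_y - bought_x)  # bought y only
    d = len(all_customers - bought_x - bought_y)  # bought neither
    denominator = a * d + b * c
    if denominator == 0:
        # Assumption: treat the undefined case as "no association"
        return 0.0
    return (a * d - b * c) / denominator
```

Values near 1 suggest the two products tend to be bought by the same customers, and values near -1 suggest they tend to be bought by different customers.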
@@ -0,0 +1,24 @@
"""Tests for the date utility functions with BigQuery integration."""

from datetime import UTC, datetime

from pyretailscience.utils.date import filter_and_label_by_periods


def test_filter_and_label_by_periods_with_bigquery(transactions_table):
    """Test filter_and_label_by_periods with data using Ibis.

    This test verifies that filter_and_label_by_periods can process data
    through an Ibis without throwing exceptions.
    """
    limited_table = transactions_table.limit(1000)
    period_ranges = {
        "Q1": (datetime(2023, 1, 1, tzinfo=UTC), datetime(2023, 3, 31, tzinfo=UTC)),
        "Q2": (datetime(2023, 4, 1, tzinfo=UTC), datetime(2023, 6, 30, tzinfo=UTC)),
    }
    result = filter_and_label_by_periods(limited_table, period_ranges)

    assert result is not None

    df = result.execute()
    assert df is not None
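The `period_ranges` mapping pairs each label with a `(start, end)` datetime tuple. To make the idea concrete, here is a rough pure-Python illustration of filtering and labeling dates by such ranges. It assumes inclusive bounds and uses a hypothetical `label_by_period` helper; it is not how `filter_and_label_by_periods` is implemented.

```python
from datetime import datetime, timezone


def label_by_period(
    dates: list[datetime],
    period_ranges: dict[str, tuple[datetime, datetime]],
) -> list[tuple[datetime, str]]:
    """Keep dates that fall inside a period and attach that period's label.

    Assumes inclusive start and end bounds; dates outside every range are dropped.
    """
    labeled = []
    for date in dates:
        for label, (start, end) in period_ranges.items():
            if start <= date <= end:
                labeled.append((date, label))
                break  # first matching period wins
    return labeled


utc = timezone.utc
example_ranges = {
    "Q1": (datetime(2023, 1, 1, tzinfo=utc), datetime(2023, 3, 31, tzinfo=utc)),
    "Q2": (datetime(2023, 4, 1, tzinfo=utc), datetime(2023, 6, 30, tzinfo=utc)),
}
example_dates = [
    datetime(2023, 2, 15, tzinfo=utc),
    datetime(2023, 5, 1, tzinfo=utc),
    datetime(2023, 8, 20, tzinfo=utc),
]
example_labels = [label for _, label in label_by_period(example_dates, example_ranges)]
```

In this example the February date is labeled `Q1`, the May date `Q2`, and the August date is filtered out because it falls in neither range.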