
Conversation

@mvanwyk (Contributor) commented Jul 3, 2024

PR Type

Enhancement, Documentation


Description

  • Refactored multiple Jupyter Notebook examples to load data from a parquet file instead of simulating it (see the sketch after this list).
  • Updated transaction data examples in notebooks to reflect new data.
  • Improved exception handling and added type annotations in data_contracts.ipynb.
  • Removed data simulation instructions from README and added placeholder text.
  • Updated mkdocs configuration to reflect new examples structure.
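
At its core, each affected notebook swaps its simulation setup for a single load of the shared pre-simulated dataset; a minimal sketch of the new pattern (the path is as quoted from the notebooks):

import pandas as pd

# Pre-simulated transaction data committed with the repo; the relative
# path is as used from within docs/examples/.
df = pd.read_parquet("../../data/transactions.parquet")
df.head()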

Changes walkthrough 📝

Relevant files

Enhancement

data_contracts.ipynb: Refactor data contracts example to load data from a parquet file
docs/examples/data_contracts.ipynb (+101/-110)
  • Replaced data simulation code with data loading from a parquet file.
  • Updated transaction data examples to reflect the new data.
  • Improved exception handling in the top_customers function.
  • Added type annotations and docstrings for better clarity.

retention.ipynb: Refactor retention example to load data from a parquet file
docs/examples/retention.ipynb (+173/-33)
  • Removed data simulation setup.
  • Added data loading from a parquet file.
  • Updated transaction data examples to reflect the new data.
  • Minor formatting improvements.

gain_loss.ipynb: Refactor gain/loss example to load data from a parquet file
docs/examples/gain_loss.ipynb (+66/-41)
  • Replaced data simulation code with data loading from a parquet file.
  • Updated transaction data examples to reflect the new data.
  • Minor formatting improvements.

cross_shop.ipynb: Refactor cross-shop example to load data from a parquet file
docs/examples/cross_shop.ipynb (+67/-42)
  • Replaced data simulation code with data loading from a parquet file.
  • Updated transaction data examples to reflect the new data.
  • Minor formatting improvements.

Documentation

README.md: Update README to remove data simulation instructions
README.md (+1/-27)
  • Removed the section on generating simulated data.
  • Added placeholder text for future updates.

Configuration changes

mkdocs.yml: Update mkdocs configuration to reflect the new examples structure
mkdocs.yml (+1/-3)
  • Reorganized the examples section.
  • Removed the reference to the data simulation example.

Additional files (token-limit)

segmentation.ipynb: ...
docs/examples/segmentation.ipynb (+294/-269)


    Summary by CodeRabbit

    • New Features

      • Updated documentation to reflect the transition from simulating to loading pre-simulated data.
    • Documentation

      • README.md now mentions that simulated transaction data functionality is "Coming Soon."
      • Updated multiple example notebooks to load pre-simulated data instead of generating it.
      • Revised navigation in mkdocs.yml for better clarity and structure.
    • Chores

      • Updated .gitignore to exclude .csv files instead of .parquet files.
      • Removed click dependency and a script entry in pyproject.toml.


    coderabbitai bot commented Jul 3, 2024

    Walkthrough

    The recent changes primarily focus on shifting the data workflow from generating simulated data to loading pre-simulated data. This impacts multiple notebooks and documentation files, altering the instructions and examples accordingly. Additionally, the .gitignore file was updated to exclude .csv instead of .parquet files, and the pyproject.toml was modified to remove certain dependencies and script entries. Navigation in mkdocs.yml was also restructured for better clarity and organization.

Changes

.gitignore: Updated to exclude .csv instead of *.parquet files.
README.md: Removed the section on generating simulated transaction data; replaced it with "Coming Soon."
docs/examples/cross_shop.ipynb: Changed from simulating data to loading pre-simulated data; updated the displayed data.
docs/examples/data_contracts.ipynb: Updated text and functionality for loading data; added a new class and type hints in function parameters.
docs/examples/gain_loss.ipynb: Switched from simulating to loading pre-simulated data; updated brand names and prices.
docs/examples/retention.ipynb: Significant changes to load data from a file; included new imports and updated output visualizations.
…/examples/… (multiple files): Similar changes grouped across multiple notebook files for brevity.
mkdocs.yml: Rearranged the navigation structure; removed outdated sections and links.
pyproject.toml: Removed the click dependency; reordered some package versions; removed a script entry.

    Poem

    In the realm where data flows,
    Files transformed and notebooks glowed,
    From simulating days to pre-simulated ways,
    Cleaner paths now boldly showed.
    CSVs we shall hide,
    In structured lines, our progress pried.
    🌟🚀 A celebratory leap, with code we keep! 🚀🌟


    @qodo-merge-pro bot added the documentation (Improvements or additions to documentation), enhancement (New feature or request), and Review effort [1-5]: 3 labels on Jul 3, 2024

    qodo-merge-pro bot commented Jul 3, 2024

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Data Consistency:
    Ensure that the new data source (parquet files) maintains consistency with the previous simulated data, especially in terms of data structure and content.

    Exception Handling:
    Review the changes in exception handling in data_contracts.ipynb to ensure they are appropriate and provide clear error messages.

    Documentation Updates:
    Verify that all documentation and comments accurately reflect the changes made, especially in Jupyter notebooks and the README file.
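
For context, the function at the center of the exception-handling and annotation changes can be pieced together from the diff hunks quoted in the code suggestions below (with the pd.DataFrame typo fixed). The validation check and message here are illustrative stand-ins for lines not shown on this page:

import pandas as pd


def top_customers(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the top n customers by total spend."""
    # Illustrative check; the notebook builds msg from its own validation logic.
    if "total_price" not in df.columns:
        msg = "dataframe does not contain the expected total_price column"
        raise ValueError(msg)
    return df.sort_values("total_price", ascending=False).head(n).reset_index(drop=True)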


    qodo-merge-pro bot commented Jul 3, 2024

    PR Code Suggestions ✨

    Possible bug
    Correct the case sensitivity in the DataFrame type hint

    Replace the use of pd.Dataframe with pd.DataFrame to correct the case sensitivity issue in
    the type hint, which could lead to runtime errors or issues with static type checkers.

    docs/examples/data_contracts.ipynb [812]

    -def top_customers(df: pd.Dataframe, n: int=5) -> pd.DataFrame:
    +def top_customers(df: pd.DataFrame, n: int=5) -> pd.DataFrame:
     
    Suggestion importance[1-10]: 10

    Why: The correction from pd.Dataframe to pd.DataFrame is essential: pandas has no Dataframe attribute, and since annotations in a def are evaluated at definition time (absent from __future__ import annotations), the cell raises AttributeError as soon as the function is defined. Static type checkers would flag it as well.
    Best practice
    Add data validation after loading the dataframe to ensure it contains all expected columns

    It's recommended to validate the data loaded from external sources to ensure it meets
    expected formats and constraints. This can prevent issues arising from malformed or
    unexpected data.

    docs/examples/segmentation.ipynb [197-198]

     df = pd.read_parquet("../../data/transactions.parquet")
    +# Ensure the dataframe contains expected columns
    +expected_columns = {'transaction_id', 'transaction_datetime', 'customer_id', 'product_id', 'product_name', 'category_0_name', 'category_0_id', 'category_1_name', 'category_1_id', 'brand_name', 'brand_id', 'unit_price', 'quantity', 'total_price', 'store_id'}
    +assert expected_columns.issubset(df.columns), "Dataframe is missing one or more expected columns"
     df.head()
     
    Suggestion importance[1-10]: 9

    Why: This suggestion adds a crucial validation step to ensure the data meets expected formats, which can prevent downstream errors due to malformed data.

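    One caveat on the assert-based check above: assertions are stripped when Python runs with -O, so an explicit raise is more robust. A sketch reusing the same names:

    # expected_columns and df as defined in the suggestion above
    missing_columns = expected_columns - set(df.columns)
    if missing_columns:
        raise ValueError(f"Dataframe is missing expected columns: {sorted(missing_columns)}")
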
    Set the random seed outside the function call for consistent outputs

    Ensure that the random seed is set outside the function call for reproducibility. This
    practice helps in maintaining consistent outputs for the random choices made in the
    notebook.

    docs/examples/cross_shop.ipynb [246-248]

    -df.loc[shoes_idx, "category_1_name"] = np.random.RandomState(42).choice(
    +rng = np.random.RandomState(42)
    +df.loc[shoes_idx, "category_1_name"] = rng.choice(
         ["Shoes", "Jeans"], size=shoes_idx.sum(), p=[0.5, 0.5],
     )
     
    Suggestion importance[1-10]: 8

    Why: Naming the generator keeps the seeding in one place and makes it reusable for subsequent random operations. Note that the original call already seeds RandomState(42), so the outputs were reproducible before this change; the gain here is maintainability.
    Use a custom exception for clearer error handling

    Instead of raising a generic ValueError, raise a more specific custom exception to provide
    clearer error handling specific to the domain or application.

    docs/examples/data_contracts.ipynb [817]

    -raise ValueError(msg)
    +class ContractValidationError(Exception):
    +    pass
    +raise ContractValidationError(msg)
     
    Suggestion importance[1-10]: 7

    Why: Using a custom exception provides clearer, domain-specific error handling, which is a best practice for maintainable code. In practice the exception class should be defined once at module level rather than immediately before the raise, as the inline diff might suggest.
    Robustness
    Add error handling around the file reading operation to manage potential exceptions

    Consider adding error handling for file reading operations to manage exceptions that may
    occur if the file is missing or corrupt.

    docs/examples/segmentation.ipynb [197]

    -df = pd.read_parquet("../../data/transactions.parquet")
    +try:
    +    df = pd.read_parquet("../../data/transactions.parquet")
    +except Exception as e:
    +    print(f"Failed to read data: {e}")
    +    # Handle the error appropriately, possibly re-raise or log
     
    Suggestion importance[1-10]: 9

    Why: Adding error handling improves robustness when the data file is missing or corrupt. Note that merely printing the error, as sketched, leaves df undefined for later cells, so re-raising or falling back to a known-good source is usually the better choice in a notebook.
    Enhancement
    Add a data type expectation for the 'total_price' column

    Ensure that the ExpectationConfiguration for the 'total_price' column includes a check for
    the column's data type, enhancing data validation and consistency.

    docs/examples/data_contracts.ipynb [895-897]

     ExpectationConfiguration(
         expectation_type="expect_column_to_exist",
         kwargs={"column": "total_price"},
     ),
    +ExpectationConfiguration(
    +    expectation_type="expect_column_values_to_be_of_type",
    +    kwargs={"column": "total_price", "type_": "float"},
    +),
     
    Suggestion importance[1-10]: 9

    Why: Including a data type expectation for the 'total_price' column enhances data validation and consistency, ensuring that the data meets expected standards.

    Add a check for an empty DataFrame to prevent errors

    Add a check to ensure that the DataFrame df is not empty before proceeding with sorting
    and returning the top customers. This prevents potential errors when operating on an empty
    DataFrame.

    docs/examples/data_contracts.ipynb [819]

    +if df.empty:
    +    return df
     return df.sort_values("total_price", ascending=False).head(n).reset_index(drop=True)
     
    Suggestion importance[1-10]: 8

    Why: Adding a check for an empty DataFrame enhances the robustness of the function by preventing potential errors when operating on an empty DataFrame.

    Use pandas to_html for dynamic HTML table generation

    Replace the hard-coded HTML table with a dynamic generation using pandas DataFrame to_html
    method, which can be customized with CSS classes and other HTML attributes. This approach
    enhances code readability and maintainability.

    docs/examples/retention.ipynb [36-148]

    -<table border="1" class="dataframe">
    -    <thead>
    -        ...
    -    </thead>
    -    <tbody>
    -        ...
    -    </tbody>
    -</table>
    +df.to_html(classes='dataframe', border=1)
     
    Suggestion importance[1-10]: 8

    Why: This suggestion enhances code readability and maintainability by leveraging pandas' built-in functionality, reducing the need for hard-coded HTML.

    Possible issue
    Add a check to ensure the DataFrame is not empty to prevent runtime errors

    To ensure that the DataFrame is not empty before performing operations, add a check to
    confirm that df is not empty after loading the data. This check prevents potential errors
    in subsequent operations if the data file is missing or empty.

    docs/examples/cross_shop.ipynb [195-196]

     df = pd.read_parquet("../../data/transactions.parquet")
    +if df.empty:
    +    raise ValueError("Data file is empty or not found.")
     df.head()
     
    Suggestion importance[1-10]: 9

    Why: This suggestion addresses a potential runtime error, which is crucial for ensuring the robustness of the code.

    Maintainability
    Replace hardcoded file paths with environment variables for better flexibility and maintainability

    To avoid hardcoding file paths, consider using a configuration file or environment
    variables to manage file paths, making the code more flexible and easier to maintain
    across different environments.

    docs/examples/segmentation.ipynb [197]

    -df = pd.read_parquet("../../data/transactions.parquet")
    +import os
    +data_path = os.getenv('DATA_PATH', '../../data/')
    +df = pd.read_parquet(data_path + "transactions.parquet")
     
    Suggestion importance[1-10]: 8

    Why: Using environment variables for file paths enhances the flexibility and maintainability of the code, making it easier to adapt to different environments.

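    A pathlib variant of the same idea avoids manual string concatenation (a sketch; the DATA_PATH variable is the assumption made in the suggestion above):

    import os
    from pathlib import Path

    import pandas as pd

    data_dir = Path(os.getenv("DATA_PATH", "../../data"))
    df = pd.read_parquet(data_dir / "transactions.parquet")
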
    Encapsulate data loading logic into a function for improved readability and reusability

    For better readability and maintenance, consider using a function to encapsulate the data
    loading logic, especially if similar data loading patterns are used multiple times in the
    notebook.

    docs/examples/segmentation.ipynb [197-198]

    -df = pd.read_parquet("../../data/transactions.parquet")
    +def load_data(file_path):
    +    return pd.read_parquet(file_path)
    +
    +df = load_data("../../data/transactions.parquet")
     df.head()
     
    Suggestion importance[1-10]: 7

    Why: Encapsulating the data loading logic into a function enhances code readability and reusability, especially if similar patterns are used multiple times in the notebook.

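    Taken together with the validation and error-handling suggestions above, one possible shape for such a loader (a sketch; the column set is abridged from the earlier suggestion):

    from pathlib import Path

    import pandas as pd

    # Abridged from the expected_columns set in the earlier suggestion.
    EXPECTED_COLUMNS = {"transaction_id", "transaction_datetime", "customer_id", "total_price"}


    def load_transactions(file_path: str) -> pd.DataFrame:
        """Load the pre-simulated transactions and verify the expected schema."""
        path = Path(file_path)
        if not path.is_file():
            raise FileNotFoundError(f"Data file not found: {path}")
        df = pd.read_parquet(path)
        missing = EXPECTED_COLUMNS - set(df.columns)
        if missing:
            raise ValueError(f"Dataframe is missing expected columns: {sorted(missing)}")
        return df


    df = load_transactions("../../data/transactions.parquet")
    df.head()
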
    Replace inline CSS with external CSS file for DataFrame styling

    Consider using CSS classes instead of inline styles for the DataFrame HTML representation
    to improve maintainability and separation of concerns. This change will make it easier to
    manage styles globally and reduce redundancy in the notebook.

    docs/examples/retention.ipynb [23-35]

    -<style scoped>
    -    .dataframe tbody tr th:only-of-type {
    -        vertical-align: middle;
    -    }
    -    ...
    -</style>
    +<link rel="stylesheet" type="text/css" href="dataframe_style.css">
     
    Suggestion importance[1-10]: 7

    Why: Using an external CSS file improves maintainability and separation of concerns, but it requires additional setup to ensure the CSS file is available and correctly linked.

    Use a variable for the file path to enhance flexibility and maintainability

    Replace the hard-coded file path with a variable that can be set at the top of the
    notebook. This change makes the notebook more flexible and easier to maintain, especially
    when the data source changes or when the notebook is used in different environments.

    docs/examples/cross_shop.ipynb [195]

    -df = pd.read_parquet("../../data/transactions.parquet")
    +data_file_path = "../../data/transactions.parquet"
    +df = pd.read_parquet(data_file_path)
     
    Suggestion importance[1-10]: 7

    Why: Using a variable for the file path makes the code more flexible and easier to maintain, which is a good practice but not critical.

    Improve variable naming for better readability

    Consider using a more descriptive variable name instead of shoes_idx to enhance code
    readability. For example, shoes_category_filter would provide more context about the
    purpose of the variable.

    docs/examples/cross_shop.ipynb [245]

    -shoes_idx = df["category_1_name"] == "Shoes"
    +shoes_category_filter = df["category_1_name"] == "Shoes"
     
    Suggestion importance[1-10]: 6

    Why: The suggestion improves code readability by using a more descriptive variable name, which is beneficial for maintainability but not critical.

    Readability
    Improve DataFrame text display formatting in the notebook

    Ensure the DataFrame display in 'text/plain' output is properly formatted for better
    readability. Consider using pd.set_option to adjust display settings like max_columns,
    max_rows, or precision.

    docs/examples/retention.ipynb [153-179]

    -"   transaction_id transaction_datetime  customer_id  product_id  \\\n",
    -"0            7108  2023-01-12 17:44:29            1          15   \n",
    -...
    +pd.set_option('display.max_columns', None)
    +pd.set_option('display.precision', 2)
    +df.head()
     
    Suggestion importance[1-10]: 6

    Why: Adjusting display settings can improve readability, but the current formatting is already fairly readable. This is a minor enhancement.

    Use Python dictionary syntax for arrow properties to enhance readability

    Replace the manually quoted arrowprops dictionary literal (a matplotlib annotation property, not HTML) with Python's dict() keyword syntax, which enhances code readability and maintainability.

    docs/examples/retention.ipynb [311]

    -"arrowprops={\"facecolor\": \"black\", \"arrowstyle\": \"-|>\", \"connectionstyle\": \"arc3,rad=-0.25\", \"mutation_scale\": 25},\n",
    +"arrowprops=dict(facecolor='black', arrowstyle='-|>', connectionstyle='arc3,rad=-0.25', mutation_scale=25),\n",
     
    Suggestion importance[1-10]: 5

    Why: The existing code is already quite readable, and this change offers only a slight improvement in readability and maintainability.


    @coderabbitai bot left a comment

    Actionable comments posted: 1

    Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL

    Commits

    Files that changed from the base of the PR, between commits 9a3a0b9 and 405b312.

    Files ignored due to path filters (2)
    • data/transactions.parquet is excluded by !**/*.parquet
    • poetry.lock is excluded by !**/*.lock
    Files selected for processing (8)
    • .gitignore (1 hunks)
    • README.md (1 hunks)
    • docs/examples/cross_shop.ipynb (7 hunks)
    • docs/examples/data_contracts.ipynb (6 hunks)
    • docs/examples/gain_loss.ipynb (7 hunks)
    • docs/examples/retention.ipynb (3 hunks)
    • mkdocs.yml (1 hunks)
    • pyproject.toml (2 hunks)
    Files skipped from review due to trivial changes (4)
    • .gitignore
    • README.md
    • mkdocs.yml
    • pyproject.toml
    Additional comments not posted (23)
    docs/examples/data_contracts.ipynb (7)

    13-13: LGTM!

    The change from "creating and simulating some data" to "loading some simulated data" is consistent with the PR objective.


    62-181: LGTM!

    The transaction data table has been updated with new data values. These changes align with the PR objective of using pre-simulated data.


    192-192: LGTM!

    The code now loads data from a Parquet file, which is consistent with the PR objective.


    812-817: LGTM!

    The function signature now includes type annotations, which enhance code readability and maintainability.


    882-891: LGTM!

    The new class CustomCustomerLevelContract is well-documented and follows best practices for extending data contracts.


    Line range hint 890-918: LGTM!

    The __init__ method now includes type annotations and additional expectations for the total_price column, improving code readability and maintainability.


    812-817: LGTM!

    The code cell validates the custom contract and clips the total_price values to meet the contract expectations, ensuring data integrity.

    docs/examples/gain_loss.ipynb (7)

    17-17: LGTM!

    The markdown cell correctly reflects the change in data loading.


    66-69: LGTM!

    The displayed table data is consistent and correctly formatted.


    Line range hint 84-88: LGTM!

    The displayed table data is consistent and correctly formatted.


    Line range hint 102-106: LGTM!

    The displayed table data is consistent and correctly formatted.


    118-185: LGTM!

    The displayed table data is consistent and correctly formatted.


    198-199: LGTM!

    The code correctly updates to load data from a Parquet file.


    263-263: LGTM!

    The code correctly reassigns rows and applies discounts based on the new data.

    docs/examples/cross_shop.ipynb (7)

    14-14: Update text to reflect loading of simulated data.

    The text change correctly reflects the new approach of loading simulated data instead of generating it.


    63-63: Verify data formatting and consistency.

    Ensure that the displayed data values are correctly formatted and consistent with the rest of the dataset.


    81-81: Verify data formatting and consistency.

    Ensure that the displayed data values are correctly formatted and consistent with the rest of the dataset.


    99-99: Verify data formatting and consistency.

    Ensure that the displayed data values are correctly formatted and consistent with the rest of the dataset.


    115-182: Verify data formatting and consistency.

    Ensure that the displayed data values are correctly formatted and consistent with the rest of the dataset.


    193-196: Import necessary libraries and load data from a Parquet file.

    The imports and data loading code appear to be correct.

    Ensure that the data file path "../../data/transactions.parquet" is valid and accessible.


    247-247: Randomly assign category name for shoes.

    The code appears to correctly randomly assign the category name "Shoes" or "Jeans" to the rows where the category name is currently "Shoes".

    Ensure that the random assignment logic is correct and necessary.

    docs/examples/retention.ipynb (2)

    213-214: LGTM!

    The output text provides useful statistics about the dataset.


    311-311: LGTM!

    The changes to the plot aesthetics are appropriate and enhance the visualization.

    Comment on lines +20 to +194 (docs/examples/retention.ipynb)

    The quoted hunk is the notebook's first data cell. Its output shows the first five rows of the transactions dataframe:

       transaction_id transaction_datetime  customer_id  product_id  \
    0            7108  2023-01-12 17:44:29            1          15
    1            7108  2023-01-12 17:44:29            1        1317
    2            4553  2023-02-05 09:31:42            1         509
    3            4553  2023-02-05 09:31:42            1         735
    4            4553  2023-02-05 09:31:42            1        1107

                                    product_name category_0_name  category_0_id  \
    0                               Spawn Figure            Toys              1
    1                                  Gone Girl           Books              8
    2                              Ryzen 3 3300X     Electronics              3
    3                 Linden Wood Paneled Mirror            Home              5
    4   Pro-V Daily Moisture Renewal Conditioner          Beauty              7

           category_1_name  category_1_id       brand_name  brand_id  unit_price  \
    0       Action Figures              1   McFarlane Toys         3       27.99
    1  Mystery & Thrillers             53  Alfred A. Knopf       264       10.49
    2  Computer Components             21              AMD       102      120.00
    3           Home Decor             30     Pottery Barn       147      599.00
    4            Hair Care             45          Pantene       222        4.99

       quantity  total_price  store_id
    0         2        55.98         6
    1         1        10.49         6
    2         3       360.00         4
    3         1       599.00         4
    4         1         4.99         4

    The source cell drops the old simulation setup (loading ../../data/default_data_config.yaml with strictyaml, setting config["transactions"]["start_date"], running Simulation(seed=42, config=config), and building the dataframe from sim.transactions) and instead ends with a direct load of the pre-simulated data:

    df = pd.read_parquet("../../data/transactions.parquet")
    df.head()

    Tip

    Codebase Verification

    File not found: ../../data/transactions.parquet

    The Parquet file ../../data/transactions.parquet does not exist at that path relative to the repository root, where the verification script below ran. From the notebooks' own directory (docs/examples/), however, the same relative path resolves to data/transactions.parquet, which this PR adds (it was excluded from review by the !**/*.parquet path filter), so this finding is likely a false positive. Ensure the file is present at data/transactions.parquet when running the notebooks.

    Analysis chain

    LGTM! Verify the existence of the Parquet file.

    The changes to display the dataframe and the imports are appropriate.

    However, ensure that the Parquet file ../../data/transactions.parquet exists and is accessible.

    Scripts executed

    The following scripts were executed for the analysis:

    Script:

    #!/bin/bash
    # Description: Verify the existence of the Parquet file.
    
    # Test: Check if the Parquet file exists. Expect: File exists.
    if [ -f "../../data/transactions.parquet" ]; then
      echo "File exists."
    else
      echo "File does not exist."
    fi
    

    Length of output: 95

    @mvanwyk mvanwyk merged commit 1a55803 into main Jul 3, 2024
    @mvanwyk mvanwyk deleted the split_out_data_sim branch July 3, 2024 18:43