report_analysis

Analyze Team Reports for audit

Setting up Environment

Using command line, install requirements.txt from the root directory
```
pip install -r requirements.txt
```

Running the Script

from the root directory run main.py
```
python main.py
```

Implementation

The script will process reports, that is dated on previous working day.
Excel outputs will go to the output dir ./output separated in each report folder.

Setting up settings.py

You can change the following configuration in settings.py

Date of run
Set which reports to run
Set What team should be inluded in the report analysis
Set up directory paths
- Root directory - NEVER CHANGE ROOT DIRECTORY
- Report source directory
- Output directory

NOTE: If you will make changes on the directory names, make sure that it exists in your root directory

Update Date of Run

In settings.py, the default date of run is date today. The automation will process all data dated the previous weekday.

from datetime import date

DATE = date.today()

You may set the date of run to different date.

For example, you want to run the automation on October 12 2020, Monday.

DATE = date(2020, 10, 12)

The automation will process all data on previous working day. October 9 2020, Friday.

Setting up Team

In settings.py, TEAMS variable sets which team will be included in the report analyais. If you want to add or remove team, just set it up from the list.

TEAMS = ['BONDS', 'EQD STRUCT', 'FXMM', 'FUNDING']

NOTES:

Teams that is in the list, should have it's corresponding folder in the reports directory.
Lookup for folder name is CASE SENSITIVE. Team name in the list should match the exact folder name.

Setting up Reports to Run

You may decide what reports to run in settings.py.

RUN ALL REPORTS
```
RUN_ALL = True
```
If you want to run only specific reports, Change RUN_ALL to False and set your report as 1 or 0 in REPORTS dictionary.
1. 1 -> automation will run the report
2. 0 -> automation will not run the report Example: If you want to only run, ExceptionRecords and MissingInMX reports;
```
RUN_ALL = False

REPORTS = {
    'ExceptionRecords': 1,
    'Matched': 0,
    'MISMATCH': 0,
    'MissingInMX': 1,
    'MissingInSAA': 0,
    'UnclassifiedException': 0,
}
```

To run specific report in the REPORTS dict, it has a counterpart setup in main.py.

Go to main.py and add the pipeline class of the specific report in pipes array and make sure IMPORT THAT PIPELINE CLASS from pipelines module.

from pipelines.exception_records import ExceptionRecordsPipeline
from pipelines.matched import MatchedPipeline
from pipelines.mismatch import MismatchPipeline
from pipelines.mx import MissingInMXPipeline
from pipelines.saa import MissingInSaaPipeline
from pipelines.unclassified_exception import UnclassifiedExceptionPipeline

pipes = [
    ExceptionRecordsPipeline,
    MismatchPipeline,
    UnclassifiedExceptionPipeline,
    MatchedPipeline,
    MissingInSaaPipeline,
    MissingInMXPipeline
]

Pipeline Class

Every reports that is being run has its corresponding Pipeline Class in pipelines module.

Each reports is separated in different py file in pipelines directory, with filename representing it's report name.

Example of the report string, and how to import into main.py:

ExceptionRecords report is in ./pipelines/exception_records.py with a pipeline class ExceptionRecordsPipeline.
```
from pipelines.exception_records import ExceptionRecordsPipeline
```
MissingInMX report is in ./pipelines/mx.py with a pipeline class MissingInMXPipeline.
```
from pipelines.mx import MissingInMXPipeline
```
MissingInSAA report is in ./pipelines/saa.py with a pipeline class MissingInSaaPipeline.
```
from pipelines.saa import MissingInSaaPipeline
```

Architecture of Pipeline Class

main.py calls and runs the pipeline class in the following order: For this example we will use MissingInSaaPipeline
1. Initialize the pipeline class
  1. This will initialize the class and runs the __init__ method from the class.
```
pipeline = MissingInSaaPipeline()
```
  2. Initialization includes:
    1. Inheritance of ReportReader class which gives us the ability to read report for our specific pipeline class.
    2. Being able to generate a dateframe for the MASTER report which can be accessed through self.master_df
    3. Generate a consolidated report dataframe from each team through self.consolidated_df
    4. Set the Comments and Team columns of self.master_df dataframe to empty or nan values.
2. Run self.process_df() method:
```
pipeline.process_df()
```
  1. process_df method will now process the self.master_df which is the master report dataframe read the the ReportReader class.
    
    Process includes:
    1. self.__add_comment - applied to the self.master_df and loop through it's row using apply method of pandas Dataframe class.
      
      Comments will be matched with the consolidated report and perform matching if a row in self.master_df is in self.consolidated_df.
    2. self.__add_team - applied to the self.master_df and loop through it's row using apply method of pandas Dataframe class and look for the Team name in consolidated_df.
3. Run self.end_process() method:
```
pipeline.endprocess()
```
4. end_process method is going to export the processed data: consolidated_df, and no_comments_df from step iii. - self.process_df(self) and write it to excel file int OUT_DIR -> ./output/
  
  The output filename consist of report_name, date_of_process, and the df type: consolidated_df or no_comments_df.
```
Example:
1. filename:
     "ExceptionRecords_16Oct2020_consolidated.xlsx"
2. output directory:
     "./output/ExceptionRecords/"
3. filepath:
     "./output/ExceptionRecords/ExceptionRecords_16Oct2020_consolidated.xlsx"
```

Other attributes and methods in pipeline class

self.no_comments_df() - This is a property attribute that generates a dataframe of rows with no comments from the master_df.

Creating new Report or Pipeline class

In template.py from the pipelines module, it consist the basic template of the pipeline class.

In order to create new pipeline for your new report:

Copy template.py and rename the py file into filename that represents your report or part of your report.
Change TemplatePipeline into the name of your report.
Set the class attribute report to your report name.

Example - Your report name is MyNewReport:
```
class MyNewReport:
   report = "MyNewReport"
```
Set criteria in __add_comment and __add_team method of which columns you want to match your master_df with consolidated_df on looking up comments and team.
```
criteria = {
    'ColumnName1': df['ColumnName1'],
    'ColumnName2': df['ColumnName2']
} 
```
Set your new pipeline class in settings.py and main.py to run your new report. Refer to Setting up Reports to Run section of this documentation.

NOTE:

This will only do the most basic matching and look up of comments for reports.
If there are special cases for your new report, you might need to add code somewhere in the pipeline class to implement the special cases.

ReportReader Class

ReportReader is a class in utils module which reader your report for you.

Usage

To instantiate the ReportReader class, you need to pass in report as parameter which is your report name.
```
reader = ReportReader('MissingInMX')
```
NOTE: Always make your report name the same as your report folder name CASE SENSITIVE
self.master_df - This will return you a dataframe of the master of the report you passed in instantiating the class.
```
df = reader.master_df
```
self.consolidated_df - This will return you a dataframe of the consolidated report for all the teams that is set up in settings.py. Reference: Setting Up Team
self.read_report() - This will return you a dateframe of one report

Parameters:
1. team: str
  - Required parameter
  - This will tell the class to read the reports for the specific team that you passed in.
  - The report that will be read is whichever report you passed in to ReportPath class as instance parameter.
```
df = reader.read_report(team='BONDS')
```
2. any_file: Optional[bool]
  - Optional parameter
  - if you set any_file to true, it will make the ReportReader class read any filename in the team directory regardless of the report name
```
df = reader.read_report(team='BONDS', any_file=True)
```
3. filename: Optional[str]
  - Optional paramater
  - This will make the ReportReader class to read the filename that you passed as parameter. If filename is not found, the automation will fail and stop.

ReportPath Class

ReportPath is a class that gives you the paths, filenames, and globs information for a specific report and team.

ReportPath class is going to look for the specific excel file that you will need to read.

Usage

To instantiate the ReportPath Class, you need to pass the following parameters:
1. report: str - Report name
  - Required parameter
  - This will specify which report directory you are looking for.
2. team: str - Team name
  - Required parameter
  - This will specify which team directory you are looking for.
```
rpath = ReportPath(report='MissingInMX', team='BONDS')
```
1. any_file: Optional[bool]
  - Optional parameter
  - default is False
  - If you passed this parameter, it will disregard the report name as glob and just look for any excel files that is in the team directory
  - NOTE: Your team directory must only have one excel file. If the script found 0 or more than 1 excel file in the directory, you will get an error and the automation will fail.
  - You might only need to use this for special cases of your report
```
rpath = ReportPath(report='MissingInMX', team='BONDS', any_file=True)
```
2. filename: Optional[str]
  - Optional parameter
  - If you passed this parameter, it will disregard the report name as glob, and just look for the specific filename that you passed in the class instance.
```
rpath = ReportPath(report='MissingInMX', team='BONDS', filename='specific_filename.xlsx')
```

Attributes and Methods of ReportPath Class

self.process_date - returns the previous working date from the date of run in string format
```
adate = rpath.process_date # Example result: '16 Oct 2020'
```
self.glob_path - returns the glob path that you can use for glob.glob(path) method of finding filenames in directory. The glob is CASE INSENSITIVE when looking for excel files.
```
glob_path = rpath.glob_path 
# Output: '/Users/pluggle/Documents/Github/report_analysis/src/BONDS/201016/[Mm][Ii][Ss][Ss][Ii][Nn][Gg][Ii][Nn][Mm][Xx]*.xlsx'
```

self.filepath - returns the absolute path of our file

filepath = rpath.filepath
# Output: '/Users/pluggle/Documents/Github/report_analysis/src/BONDS/201016/MissingInMX.xlsx'

self.dir - returns the directory of report and team that you passed in.

directory = rpath.dir
# Output: '/Users/pluggle/Documents/Github/report_analysis/src/BONDS/201016'

self.glob_filename - returns the filename parth of your globpath. Again, this glob filename is CASE INSENSITIVE in looking for filenames of the excel files.
```
glob_fn = rpath.glob_filename
# Output: '[Mm][Ii][Ss][Ss][Ii][Nn][Gg][Ii][Nn][Mm][Xx]*.xlsx'
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

report_analysis

Table of Contents

Setting up Environment

Running the Script

Implementation

Setting up settings.py

Update Date of Run

Setting up Team

Setting up Reports to Run

Pipeline Class

Architecture of Pipeline Class

Other attributes and methods in pipeline class

Creating new Report or Pipeline class

ReportReader Class

Usage

ReportPath Class

Usage

Attributes and Methods of ReportPath Class

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
output		output
pipelines		pipelines
src		src
utils		utils
.gitignore		.gitignore
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt
settings.py		settings.py

codewithpatch/report_analysis

Folders and files

Latest commit

History

Repository files navigation

report_analysis

Table of Contents

Setting up Environment

Running the Script

Implementation

Setting up settings.py

Update Date of Run

Setting up Team

Setting up Reports to Run

Pipeline Class

Architecture of Pipeline Class

Other attributes and methods in pipeline class

Creating new Report or Pipeline class

ReportReader Class

Usage

ReportPath Class

Usage

Attributes and Methods of ReportPath Class

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages