Analyze Team Reports for audit
- Setting Environment Variables
- Running the Script
- Implementation
- Using command line, install
requirements.txtfrom the root directorypip install -r requirements.txt
- from the root directory run
main.pypython main.py
- The script will process reports, that is dated on previous working day.
- Excel outputs will go to the output dir
./outputseparated in each report folder.
You can change the following configuration in settings.py
- Date of run
- Set which reports to run
- Set What team should be inluded in the report analysis
- Set up directory paths
- Root directory - NEVER CHANGE ROOT DIRECTORY
- Report source directory
- Output directory
NOTE: If you will make changes on the directory names, make sure that it exists in your root directory
In settings.py, the default date of run is date today. The automation will process all data dated the previous weekday.
from datetime import dateDATE = date.today()You may set the date of run to different date.
For example, you want to run the automation on October 12 2020, Monday.
DATE = date(2020, 10, 12)The automation will process all data on previous working day. October 9 2020, Friday.
In settings.py, TEAMS variable sets which team will be included in the report analyais. If you want to add or remove team, just set it up from the list.
TEAMS = ['BONDS', 'EQD STRUCT', 'FXMM', 'FUNDING']NOTES:
- Teams that is in the list, should have it's corresponding folder in the reports directory.
- Lookup for folder name is CASE SENSITIVE. Team name in the list should match the exact folder name.
You may decide what reports to run in settings.py.
- RUN ALL REPORTS
RUN_ALL = True
- If you want to run only specific reports, Change
RUN_ALLtoFalseand set your report as1or0inREPORTSdictionary.1-> automation will run the report0-> automation will not run the report Example: If you want to only run,ExceptionRecordsandMissingInMXreports;RUN_ALL = False REPORTS = { 'ExceptionRecords': 1, 'Matched': 0, 'MISMATCH': 0, 'MissingInMX': 1, 'MissingInSAA': 0, 'UnclassifiedException': 0, }
- To run specific report in the
REPORTSdict, it has a counterpart setup inmain.py.- Go to
main.pyand add the pipeline class of the specific report inpipesarray and make sure IMPORT THAT PIPELINE CLASS frompipelinesmodule.from pipelines.exception_records import ExceptionRecordsPipeline from pipelines.matched import MatchedPipeline from pipelines.mismatch import MismatchPipeline from pipelines.mx import MissingInMXPipeline from pipelines.saa import MissingInSaaPipeline from pipelines.unclassified_exception import UnclassifiedExceptionPipeline pipes = [ ExceptionRecordsPipeline, MismatchPipeline, UnclassifiedExceptionPipeline, MatchedPipeline, MissingInSaaPipeline, MissingInMXPipeline ]
- Go to
Every reports that is being run has its corresponding Pipeline Class in pipelines module.
Each reports is separated in different py file in pipelines directory, with filename representing it's report name.
Example of the report string, and how to import into main.py:
ExceptionRecordsreport is in./pipelines/exception_records.pywith a pipeline classExceptionRecordsPipeline.from pipelines.exception_records import ExceptionRecordsPipeline
MissingInMXreport is in./pipelines/mx.pywith a pipeline classMissingInMXPipeline.from pipelines.mx import MissingInMXPipeline
MissingInSAAreport is in./pipelines/saa.pywith a pipeline classMissingInSaaPipeline.from pipelines.saa import MissingInSaaPipeline
main.pycalls and runs the pipeline class in the following order: For this example we will useMissingInSaaPipeline-
Initialize the pipeline class
- This will initialize the class and runs the
__init__method from the class.pipeline = MissingInSaaPipeline()
- Initialization includes:
- Inheritance of
ReportReaderclass which gives us the ability to read report for our specific pipeline class. - Being able to generate a dateframe for the
MASTERreport which can be accessed throughself.master_df - Generate a consolidated report dataframe from each team through
self.consolidated_df - Set the
CommentsandTeamcolumns ofself.master_dfdataframe to empty ornanvalues.
- Inheritance of
- This will initialize the class and runs the
-
Run
self.process_df()method:pipeline.process_df()
-
process_dfmethod will now process theself.master_dfwhich is the master report dataframe read the theReportReaderclass.Process includes:
-
self.__add_comment- applied to theself.master_dfand loop through it's row usingapplymethod ofpandas Dataframeclass.Comments will be matched with the consolidated report and perform matching if a row in
self.master_dfis inself.consolidated_df. -
self.__add_team- applied to theself.master_dfand loop through it's row usingapplymethod ofpandas Dataframeclass and look for the Team name inconsolidated_df.
-
-
-
Run
self.end_process()method:pipeline.endprocess()
-
end_processmethod is going to export the processed data:consolidated_df, andno_comments_dffromstep iii.-self.process_df(self)and write it to excel file intOUT_DIR->./output/The output filename consist of
report_name,date_of_process, and the df type:consolidated_dforno_comments_df.Example: 1. filename: "ExceptionRecords_16Oct2020_consolidated.xlsx" 2. output directory: "./output/ExceptionRecords/" 3. filepath: "./output/ExceptionRecords/ExceptionRecords_16Oct2020_consolidated.xlsx"
-
self.no_comments_df()- This is a property attribute that generates a dataframe of rows with no comments from themaster_df.
In template.py from the pipelines module, it consist the basic template of the pipeline class.
In order to create new pipeline for your new report:
-
Copy
template.pyand rename the py file into filename that represents your report or part of your report. -
Change
TemplatePipelineinto the name of your report. -
Set the class attribute
reportto your report name.Example - Your report name is
MyNewReport:class MyNewReport: report = "MyNewReport"
-
Set criteria in
__add_commentand__add_teammethod of which columns you want to match yourmaster_dfwithconsolidated_dfon looking up comments and team.criteria = { 'ColumnName1': df['ColumnName1'], 'ColumnName2': df['ColumnName2'] }
-
Set your new pipeline class in
settings.pyandmain.pyto run your new report. Refer to Setting up Reports to Run section of this documentation.
NOTE:
- This will only do the most basic matching and look up of comments for reports.
- If there are special cases for your new report, you might need to add code somewhere in the pipeline class to implement the special cases.
ReportReader is a class in utils module which reader your report for you.
-
To instantiate the
ReportReaderclass, you need to pass inreportas parameter which is your report name.reader = ReportReader('MissingInMX')
NOTE: Always make your report name the same as your report folder name CASE SENSITIVE
-
self.master_df- This will return you a dataframe of the master of the report you passed in instantiating the class.df = reader.master_df -
self.consolidated_df- This will return you a dataframe of the consolidated report for all the teams that is set up insettings.py. Reference: Setting Up Team -
self.read_report()- This will return you a dateframe of one reportParameters:
team: str- Required parameter
- This will tell the class to read the reports for the specific
teamthat you passed in. - The report that will be read is whichever report you passed in to ReportPath class as instance parameter.
df = reader.read_report(team='BONDS')
any_file: Optional[bool]- Optional parameter
- if you set
any_fileto true, it will make the ReportReader class read any filename in theteamdirectory regardless of thereportname
df = reader.read_report(team='BONDS', any_file=True)
filename: Optional[str]- Optional paramater
- This will make the ReportReader class to read the
filenamethat you passed as parameter. Iffilenameis not found, the automation will fail and stop.
ReportPath is a class that gives you the paths, filenames, and globs information for a specific report and team.
ReportPath class is going to look for the specific excel file that you will need to read.
-
To instantiate the ReportPath Class, you need to pass the following parameters:
-
report: str- Report name- Required parameter
- This will specify which
reportdirectory you are looking for.
-
team: str- Team name- Required parameter
- This will specify which
teamdirectory you are looking for.
rpath = ReportPath(report='MissingInMX', team='BONDS')
-
any_file: Optional[bool]- Optional parameter
- default is
False - If you passed this parameter, it will disregard the report name as glob and just look for any excel files that is in the
teamdirectory - NOTE: Your team directory must only have one excel file. If the script found 0 or more than 1 excel file in the directory, you will get an error and the automation will fail.
- You might only need to use this for special cases of your report
rpath = ReportPath(report='MissingInMX', team='BONDS', any_file=True)
-
filename: Optional[str]- Optional parameter
- If you passed this parameter, it will disregard the report name as glob, and just look for the specific
filenamethat you passed in the class instance.
rpath = ReportPath(report='MissingInMX', team='BONDS', filename='specific_filename.xlsx')
-
-
self.process_date- returns the previous working date from the date of run in string formatadate = rpath.process_date # Example result: '16 Oct 2020'
-
self.glob_path- returns the glob path that you can use forglob.glob(path)method of finding filenames in directory. The glob is CASE INSENSITIVE when looking for excel files.glob_path = rpath.glob_path # Output: '/Users/pluggle/Documents/Github/report_analysis/src/BONDS/201016/[Mm][Ii][Ss][Ss][Ii][Nn][Gg][Ii][Nn][Mm][Xx]*.xlsx'
-
self.filepath- returns the absolute path of our filefilepath = rpath.filepath # Output: '/Users/pluggle/Documents/Github/report_analysis/src/BONDS/201016/MissingInMX.xlsx'
-
self.dir- returns the directory ofreportandteamthat you passed in.directory = rpath.dir # Output: '/Users/pluggle/Documents/Github/report_analysis/src/BONDS/201016'
-
self.glob_filename- returns the filename parth of yourglobpath. Again, this glob filename is CASE INSENSITIVE in looking for filenames of the excel files.glob_fn = rpath.glob_filename # Output: '[Mm][Ii][Ss][Ss][Ii][Nn][Gg][Ii][Nn][Mm][Xx]*.xlsx'