Analyze Team Reports for audit
- Setting Environment Variables
- Running the Script
- Implementation
- Using command line, install
requirements.txt
from the root directorypip install -r requirements.txt
- from the root directory run
main.py
python main.py
- The script will process reports, that is dated on previous working day.
- Excel outputs will go to the output dir
./output
separated in each report folder.
You can change the following configuration in settings.py
- Date of run
- Set which reports to run
- Set What team should be inluded in the report analysis
- Set up directory paths
- Root directory - NEVER CHANGE ROOT DIRECTORY
- Report source directory
- Output directory
NOTE: If you will make changes on the directory names, make sure that it exists in your root directory
In settings.py
, the default date of run is date today. The automation will process all data dated the previous weekday.
from datetime import date
DATE = date.today()
You may set the date of run to different date.
For example, you want to run the automation on October 12 2020, Monday.
DATE = date(2020, 10, 12)
The automation will process all data on previous working day. October 9 2020, Friday.
In settings.py
, TEAMS
variable sets which team will be included in the report analyais. If you want to add or remove team, just set it up from the list.
TEAMS = ['BONDS', 'EQD STRUCT', 'FXMM', 'FUNDING']
NOTES:
- Teams that is in the list, should have it's corresponding folder in the reports directory.
- Lookup for folder name is CASE SENSITIVE. Team name in the list should match the exact folder name.
You may decide what reports to run in settings.py
.
- RUN ALL REPORTS
RUN_ALL = True
- If you want to run only specific reports, Change
RUN_ALL
toFalse
and set your report as1
or0
inREPORTS
dictionary.1
-> automation will run the report0
-> automation will not run the report Example: If you want to only run,ExceptionRecords
andMissingInMX
reports;RUN_ALL = False REPORTS = { 'ExceptionRecords': 1, 'Matched': 0, 'MISMATCH': 0, 'MissingInMX': 1, 'MissingInSAA': 0, 'UnclassifiedException': 0, }
- To run specific report in the
REPORTS
dict, it has a counterpart setup inmain.py
.- Go to
main.py
and add the pipeline class of the specific report inpipes
array and make sure IMPORT THAT PIPELINE CLASS frompipelines
module.from pipelines.exception_records import ExceptionRecordsPipeline from pipelines.matched import MatchedPipeline from pipelines.mismatch import MismatchPipeline from pipelines.mx import MissingInMXPipeline from pipelines.saa import MissingInSaaPipeline from pipelines.unclassified_exception import UnclassifiedExceptionPipeline pipes = [ ExceptionRecordsPipeline, MismatchPipeline, UnclassifiedExceptionPipeline, MatchedPipeline, MissingInSaaPipeline, MissingInMXPipeline ]
- Go to
Every reports that is being run has its corresponding Pipeline Class in pipelines
module.
Each reports is separated in different py file in pipelines
directory, with filename representing it's report name.
Example of the report string, and how to import into main.py
:
ExceptionRecords
report is in./pipelines/exception_records.py
with a pipeline classExceptionRecordsPipeline
.from pipelines.exception_records import ExceptionRecordsPipeline
MissingInMX
report is in./pipelines/mx.py
with a pipeline classMissingInMXPipeline
.from pipelines.mx import MissingInMXPipeline
MissingInSAA
report is in./pipelines/saa.py
with a pipeline classMissingInSaaPipeline
.from pipelines.saa import MissingInSaaPipeline
main.py
calls and runs the pipeline class in the following order: For this example we will useMissingInSaaPipeline
-
Initialize the pipeline class
- This will initialize the class and runs the
__init__
method from the class.pipeline = MissingInSaaPipeline()
- Initialization includes:
- Inheritance of
ReportReader
class which gives us the ability to read report for our specific pipeline class. - Being able to generate a dateframe for the
MASTER
report which can be accessed throughself.master_df
- Generate a consolidated report dataframe from each team through
self.consolidated_df
- Set the
Comments
andTeam
columns ofself.master_df
dataframe to empty ornan
values.
- Inheritance of
- This will initialize the class and runs the
-
Run
self.process_df()
method:pipeline.process_df()
-
process_df
method will now process theself.master_df
which is the master report dataframe read the theReportReader
class.Process includes:
-
self.__add_comment
- applied to theself.master_df
and loop through it's row usingapply
method ofpandas Dataframe
class.Comments will be matched with the consolidated report and perform matching if a row in
self.master_df
is inself.consolidated_df
. -
self.__add_team
- applied to theself.master_df
and loop through it's row usingapply
method ofpandas Dataframe
class and look for the Team name inconsolidated_df
.
-
-
-
Run
self.end_process()
method:pipeline.endprocess()
-
end_process
method is going to export the processed data:consolidated_df
, andno_comments_df
fromstep iii.
-self.process_df(self)
and write it to excel file intOUT_DIR
->./output/
The output filename consist of
report_name
,date_of_process
, and the df type:consolidated_df
orno_comments_df
.Example: 1. filename: "ExceptionRecords_16Oct2020_consolidated.xlsx" 2. output directory: "./output/ExceptionRecords/" 3. filepath: "./output/ExceptionRecords/ExceptionRecords_16Oct2020_consolidated.xlsx"
-
self.no_comments_df()
- This is a property attribute that generates a dataframe of rows with no comments from themaster_df
.
In template.py
from the pipelines module, it consist the basic template of the pipeline class.
In order to create new pipeline for your new report:
-
Copy
template.py
and rename the py file into filename that represents your report or part of your report. -
Change
TemplatePipeline
into the name of your report. -
Set the class attribute
report
to your report name.Example - Your report name is
MyNewReport
:class MyNewReport: report = "MyNewReport"
-
Set criteria in
__add_comment
and__add_team
method of which columns you want to match yourmaster_df
withconsolidated_df
on looking up comments and team.criteria = { 'ColumnName1': df['ColumnName1'], 'ColumnName2': df['ColumnName2'] }
-
Set your new pipeline class in
settings.py
andmain.py
to run your new report. Refer to Setting up Reports to Run section of this documentation.
NOTE:
- This will only do the most basic matching and look up of comments for reports.
- If there are special cases for your new report, you might need to add code somewhere in the pipeline class to implement the special cases.
ReportReader
is a class in utils module which reader your report for you.
-
To instantiate the
ReportReader
class, you need to pass inreport
as parameter which is your report name.reader = ReportReader('MissingInMX')
NOTE: Always make your report name the same as your report folder name CASE SENSITIVE
-
self.master_df
- This will return you a dataframe of the master of the report you passed in instantiating the class.df = reader.master_df
-
self.consolidated_df
- This will return you a dataframe of the consolidated report for all the teams that is set up insettings.py
. Reference: Setting Up Team -
self.read_report()
- This will return you a dateframe of one reportParameters:
team: str
- Required parameter
- This will tell the class to read the reports for the specific
team
that you passed in. - The report that will be read is whichever report you passed in to ReportPath class as instance parameter.
df = reader.read_report(team='BONDS')
any_file: Optional[bool]
- Optional parameter
- if you set
any_file
to true, it will make the ReportReader class read any filename in theteam
directory regardless of thereport
name
df = reader.read_report(team='BONDS', any_file=True)
filename: Optional[str]
- Optional paramater
- This will make the ReportReader class to read the
filename
that you passed as parameter. Iffilename
is not found, the automation will fail and stop.
ReportPath
is a class that gives you the paths, filenames, and globs information for a specific report and team.
ReportPath class is going to look for the specific excel file that you will need to read.
-
To instantiate the ReportPath Class, you need to pass the following parameters:
-
report: str
- Report name- Required parameter
- This will specify which
report
directory you are looking for.
-
team: str
- Team name- Required parameter
- This will specify which
team
directory you are looking for.
rpath = ReportPath(report='MissingInMX', team='BONDS')
-
any_file: Optional[bool]
- Optional parameter
- default is
False
- If you passed this parameter, it will disregard the report name as glob and just look for any excel files that is in the
team
directory - NOTE: Your team directory must only have one excel file. If the script found 0 or more than 1 excel file in the directory, you will get an error and the automation will fail.
- You might only need to use this for special cases of your report
rpath = ReportPath(report='MissingInMX', team='BONDS', any_file=True)
-
filename: Optional[str]
- Optional parameter
- If you passed this parameter, it will disregard the report name as glob, and just look for the specific
filename
that you passed in the class instance.
rpath = ReportPath(report='MissingInMX', team='BONDS', filename='specific_filename.xlsx')
-
-
self.process_date
- returns the previous working date from the date of run in string formatadate = rpath.process_date # Example result: '16 Oct 2020'
-
self.glob_path
- returns the glob path that you can use forglob.glob(path)
method of finding filenames in directory. The glob is CASE INSENSITIVE when looking for excel files.glob_path = rpath.glob_path # Output: '/Users/pluggle/Documents/Github/report_analysis/src/BONDS/201016/[Mm][Ii][Ss][Ss][Ii][Nn][Gg][Ii][Nn][Mm][Xx]*.xlsx'
-
self.filepath
- returns the absolute path of our filefilepath = rpath.filepath # Output: '/Users/pluggle/Documents/Github/report_analysis/src/BONDS/201016/MissingInMX.xlsx'
-
self.dir
- returns the directory ofreport
andteam
that you passed in.directory = rpath.dir # Output: '/Users/pluggle/Documents/Github/report_analysis/src/BONDS/201016'
-
self.glob_filename
- returns the filename parth of yourglobpath
. Again, this glob filename is CASE INSENSITIVE in looking for filenames of the excel files.glob_fn = rpath.glob_filename # Output: '[Mm][Ii][Ss][Ss][Ii][Nn][Gg][Ii][Nn][Mm][Xx]*.xlsx'