Monthly Audit Issue Tracker by using Python-docx package
Table of Contents
- For every first date of the month, track company's audit action plans status in order to prevent overdue or aging.
- Generate monthly Audit Action Plan Tracking Report in Python.
- Perform data cleaning for a raw data from Database.
- Filter and generate data in certain format.
- Configure word document formats: such as font, style, color and margin.
- Program: Python
- Packages: pandas, Python-docx, datetime
- Files:
- Automation.ipynb --> Python Script
- tblReport Query.xlsx --> raw data file exported from Data Repository
! Every record in the data is made up and does not reflect the real-life information !
FIELDS:
- Project ID (Format YYYY-###): Identification for each project
- Project Name: Audit project name
- Issue Number: Issue numbering
- DepartmentResponsible: Name of department audited
- Action Status: Status of action plan as ("Open"|"Closed"|"Pending Verification")
- Target Date: Initial target date to complete action plan
- Revised Target Date: Target date when extedned one time or Latest Revised Target Date
- 1st Revised Target Date: Target date when extended two times
- 2nd Revised Target Date: Target Date when extended three times
- 3rd Revised Target Date: Target Date when extended four times
- Management Action: Action Detail in Issue Number
#Revised Date chronic order
for i in range(0, len(Data_Repository)):
if pd.isnull(Data_Repository.iloc[i]['3rd Revised Target Date']) == False:
list = [Data_Repository.iloc[i]['1st Revised Target Date'], Data_Repository.iloc[i]['2nd Revised Target Date'], Data_Repository.iloc[i]['3rd Revised Target Date']]
list.sort(key = lambda date: datetime.strptime(date, '%m.%d.%y'))
Data_Repository.loc[i, '1st Revised Target Date'] = Data_Repository.loc[i, '1st Revised Target Date'] = list[0]
Data_Repository.loc[i, '2nd Revised Target Date'] = Data_Repository.loc[i, '2nd Revised Target Date'] = list[1]
Data_Repository.loc[i, '3rd Revised Target Date'] = Data_Repository.loc[i, '3rd Revised Target Date'] = list[2]
elif pd.isnull(Data_Repository.iloc[i]['2nd Revised Target Date']) == False:
list = [Data_Repository.iloc[i]['1st Revised Target Date'], Data_Repository.iloc[i]['2nd Revised Target Date']]
list.sort(key = lambda date: datetime.strptime(date, '%m.%d.%y'))
Data_Repository.loc[i, '1st Revised Target Date'] = Data_Repository.loc[i, '1st Revised Target Date'] = list[0]
Data_Repository.loc[i, '2nd Revised Target Date'] = Data_Repository.loc[i, '2nd Revised Target Date'] = list[1]
- Notice that "Payment Processing" project has error in Target Date; 1st Revised Target Date comes after 2nd Revised Target Date.
- This is due to human error, entering the date in wrong order.
- Since "Target Date" and "Revised Target Date" are not influenced by this error, simply re-arrange 1st & 2nd & 3rd Revised Target Date in chronic order by using list.sort.
Every code for 4. Body is inside of function called body. Thus, be mindful of indentation.
def body(To, cc, dept, Date1, Date2, evidence):
doc = docx.Document()
section = doc.sections[0]
section.top_margin = Inches(0.62)
section.bottom_margin = Inches(0.31)
section.left_margin = Inches(0.75)
section.right_margin = Inches(0.81)
normal_style = doc.styles['Normal']
normal_style.font.name = 'Arial'
normal_style.font.size = Pt(10)
normal_style.font.color.rgb = RGBColor(31, 73, 125)
-
This is a preset of word document format.
-
doc = docx.Document() is Document constructor from Python-docx package.
- Every Document objects must follow after the initial constructor.
-
section. configure fortmat of [Layout --> Margins] in Document.
-
normal_style. configure format of [Home --> Styles --> Normal] in Document.
Note that body function has six different arguments
* To & cc & evidence --> defined in dictionary from {Departments} section
* dept --> defined in list from {Departments} section
* Date1 & Date2 --> already defined in {Config This Before Run} Section
main1 = '''
To: {}
cc: {}
'''.format(To, cc)
- main1 prints names of employees who will receive report email.
- "To:" for head of department
- "cc:" for relevant employees
line1 = '''
Subject: {} Open Audit Issues / Action Plans Summary as of {}
'''.format(dept, Date1)
- line1 prints subject line of email.
- contains the name of department and date of report.
line2 = '''
Please find the Outstanding Audit Issues/Action Plans Summary as of {} (attached).
When actions have been completed, please provide the supporting evidence to close the action. Email evidence to {}.
Should you need to revise the target completion date, please send an email to Internal Audit DH, or designee, noting the revised date, reason for the delay and interim action to mitigate risk, as applicable.
Please note: Target Date extensions will need to be approved by the respective Division Head, via email.
'''.format(Date2, evidence)
- line2 prints body paragraph of email.
- "evidence" for whom to send evidence of action plan. Mostly auditor who is in charge of (in this case, Ted Jung).
parag = doc.add_paragraph(main1, 'Normal')
parag.add_run(line1).font.color.rgb = RGBColor(0, 32, 96)
parag.add_run(line2)
- To add texts in Document, initial paragraph must be created by parag..
- multiple paragraphs can exist as different groups.
- After initial texts of paragraph, additional lines can be added by add_run.
- each added line is subordinated to paragraph and follows paragraph's format unless defined seperately like 'line1'.
excel = Data_Repository[Data_Repository['DepartmentResponsible'] == dept]
n1 = 2
- Filter only necessary departments in the report.
- First two rows (n1 = 2) in the table are title and column names.
if excel['3rd Revised Target Date'].notnull().any() == True:
excel = excel[['Issue # Ref', 'Action Status', 'Target Date', '1st Revised Target Date', '2nd Revised Target Date', '3rd Revised Target Date', 'Revised Target Date', 'Audit Name', 'Brief Description']]
elif excel['2nd Revised Target Date'].notnull().any() == True:
excel = excel[['Issue # Ref', 'Action Status', 'Target Date', '1st Revised Target Date', '2nd Revised Target Date', 'Revised Target Date', 'Audit Name', 'Brief Description']]
elif excel['1st Revised Target Date'].notnull().any() == True:
excel = excel[['Issue # Ref', 'Action Status', 'Target Date', '1st Revised Target Date', 'Revised Target Date', 'Audit Name', 'Brief Description']]
elif excel['Revised Target Date'].notnull().any() == True:
excel = excel[['Issue # Ref', 'Action Status', 'Target Date', 'Revised Target Date', 'Audit Name', 'Brief Description']]
else:
excel = excel[['Issue # Ref', 'Action Status', 'Target Date', 'Audit Name', 'Brief Description']]
- Default format --> 'else:'.
- If action plan gets extended, then:
- If extended one time --> Revised Target Date
- If extended two times --> Revised Target Date and 1st Revised Target Date
- If extended three times --> Revised Target Date and 1st Revised Target Date and 2nd Revised Target Date
- If extended four times --> Revised Target Date and 1st Revised Target Date and 2nd Revised Target Date and 3rd Revised Target Date
- Number of columns will vary on the maximum number of Target Date extended.
- Each department has different number of extension and therefore it is necessary to set columns differently by each circumstance.
For example:
Commercial Credit_01.02.23.docx does not have any action extended its Target Date and therefore has only 5 columns.
Servicing & Loyalty_01.02.23.docx has one action extended four times, and therefore has 9 columns
columns = excel.columns
n2 = len(columns)
excel = excel.fillna('')
- Replace nan to '' for clear visualization in the table, otherwise you would see word nan in blank space.
def make_rows_bold(*rows):
for row in rows:
for cell in row.cells:
for paragraph in cell.paragraphs:
for run in paragraph.runs:
run.font.bold = True
- Function to make text of the cell in table Bold.
shading_elm_1 = parse_xml(r'<w:shd {} w:fill="#DEEAF6"/>'.format(nsdecls('w')))
- Function to fill first row cells with certain color.
def set_table_header_bg_color(cell):
"""
set background shading for Header Rows
"""
tblCell = cell._tc
tblCellProperties = tblCell.get_or_add_tcPr()
clShading = OxmlElement('w:shd')
clShading.set(qn('w:fill'), "#D9D9D9")
tblCellProperties.append(clShading)
- Function to fill second row cells with certain color.
- Another way of coding the color.
def set_table_header_bg_color2(cell):
"""
set background shading for Header Rows
"""
tblCell = cell._tc
tblCellProperties = tblCell.get_or_add_tcPr()
clShading = OxmlElement('w:shd')
clShading.set(qn('w:fill'), "#FFC7CE")
tblCellProperties.append(clShading)
- Function to fill Target Date that is overdue with pink.
- Each color requires each function to define.
#Table Basic Format
table = doc.add_table(rows = n1, cols = n2)
table.allow_autofit = True
table.alignment = WD_TABLE_ALIGNMENT.LEFT
table.cell(0, 0).text = 'High-level Status' #First row text
make_rows_bold(table.rows[0]) #First row Bold
table.cell(0, 0).merge(table.rows[0].cells[-1]) #First row merge
table.cell(0,0)._tc.get_or_add_tcPr().append(shading_elm_1) #First row color
for col in range(len(columns)): #Column Names
table.cell(1, col).text = columns[col]
table.cell(1,col).paragraphs[0].paragraph_format.alignment = WD_TABLE_ALIGNMENT.CENTER
for each_cell in table.rows[1].cells: #Second row color
set_table_header_bg_color(each_cell)
- Every subordiante codes must follow after 'doc.add_table'.
- To indicate particular cell, two ways:
- table.cell(0,0)
- table.rows[0].cells[0]
#Table Adding Contents
for i, row in enumerate(excel.iloc[0:].itertuples()):
table_row = table.add_row().cells
for col in range(0, n2):
table_row[col].text = str(row[col+1]) #+1 for ignore index
table_row[col].paragraphs[0].paragraph_format.alignment = WD_TABLE_ALIGNMENT.LEFT
- Loop creating rows by number of action plans in each department.
- Index column is in 0 at 'excel'. Therefore, add 1 to skip Index.
#Expired Action Highlight
if len(excel.columns) >= 6:
for row in range(3, len(table.rows) + 1): #3 here starts from 1
if table.cell(row, -3).text != '': #-3 = 'Revised Target Date #If this cell has been revised
if datetime.strptime(table.cell(row, -3).text, "%m.%d.%y") < today:
set_table_header_bg_color2(table.rows[row -1].cells[-3])
else: #If cell has not been revised
if datetime.strptime(table.cell(row -1, 2).text, "%m.%d.%y") < today: #2 = Target Date
set_table_header_bg_color2(table.rows[row - 1].cells[2])
elif len(excel.columns) == 5:
for row in range(3, len(table.rows) + 1):
if datetime.strptime(table.cell(row, -3).text, "%m.%d.%y") < today: #Since no Revised column, -3 = Target Date
set_table_header_bg_color2(table.rows[row -1].cells[2])
- If the department has at least one action plan revised before --> Number of Clolumn >= 6
- If the department has no action plan revsied ever --> Number of Column = 5
- Action plan should have been remediated by the latest Target Date (either revised or not). If not, then overdue action plan and needs attention.
#Change whole table text color
for row in table.rows:
for cell in row.cells:
paragraphs = cell.paragraphs
for paragraph in paragraphs:
for run in paragraph.runs:
run.font.color.rgb = RGBColor(0,0,0)
- Text color is previously influenced by {Word.docx} font color. Therefore, re-set the color only for Table
