JulianHayward/Azure-MG-Sub-Governance-Reporting

Push AzGovViz output to repository fails when PSRule.csv exceeds 100 MB

extmaper opened this issue · 17 comments

When -DoPSRule is used and the PSRule.csv file size exceeds 100 MB, the workflow fails:

remote: error: File wiki/AzGovViz_***_PSRule.csv is 100.65 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com./

In case the CSV exceeds this limit, we could remove the description column from the output, as this field is quite large.

Thoughts?

CSV columns:

  • resourceType
  • subscriptionId
  • mgPath
  • resourceId
  • pillar
  • category
  • severity
  • rule
  • description
  • recommendation
  • link
  • ruleId
  • result
  • errorMsg

I think it would be ok to remove the description column as proposed.
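A minimal sketch of the proposed column removal, written in Python for illustration rather than in the project's own PowerShell. The function name `drop_columns`, the sample rows, and the comma delimiter are assumptions made for the example; the delimiter actually used in the generated CSV may differ and can be passed in.

```python
import csv
import io


def drop_columns(src, dst, drop=("description",), delimiter=","):
    """Copy CSV rows from src to dst, omitting the named columns.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    reader = csv.DictReader(src, delimiter=delimiter)
    # Keep the original column order, minus the dropped columns.
    kept = [name for name in (reader.fieldnames or []) if name not in drop]
    writer = csv.DictWriter(
        dst,
        fieldnames=kept,
        delimiter=delimiter,
        extrasaction="ignore",  # silently skip the dropped columns
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept


# Made-up sample data using a subset of the column names listed above.
sample = io.StringIO(
    "resourceId,rule,description,result\n"
    "/subscriptions/0000/rg/vm1,Example.Rule,A long explanatory text,Fail\n"
)
slim = io.StringIO()
kept_columns = drop_columns(sample, slim)
print(slim.getvalue())
```

Rows are streamed one at a time, so the approach does not need to hold a file of this size in memory.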

please try v6_major_20220717_1

@extmaper does this work for you?

closing / no response

@JulianHayward - Sorry for the late reply, I have been on vacation. The implemented change for the 100 MB file limit works as expected. Thanks for the fast update.

@JulianHayward we get this error again:
remote: error: File wiki/AzGovViz_***.csv is 103.71 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com./

What other optimizations can be done to make it work?
Running version 6.3.0

@cajohanikea which file is it this time?

@JulianHayward The log says "File wiki/AzGovViz_***.csv". This is the file generated after the scan of the environment, which will be uploaded to the App Service.

The parameters we are using are: -LargeTenant -NoPIMEligibility -GitHubActionsOIDC

The file is called AzGovViz__managementgroupid_.csv

What do you say about using Git Large File Storage? - https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage

Git Large File Storage - please give it a try and share if/how it worked.
I guess there are more files close to the 100 MB limit? Which are the next largest files?

some ideas..

  • start sorting out those files that are hitting the limit right before the commit/push (loss of data)
  • compress those files (loss of readable git history)
  • evaluate Git Large File Storage
  • more?
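To see which files are at or near the limit before the commit/push, the output folder can be scanned by size. The following Python sketch is illustrative only: `largest_files`, `over_limit`, and the demo file names are hypothetical, and the threshold of 100 * 1024 * 1024 bytes is an assumption about how the 100 MB limit is measured.

```python
import os
import tempfile

# Assumption: the limit is treated as 100 * 1024 * 1024 bytes.
LIMIT_BYTES = 100 * 1024 * 1024


def largest_files(root, top=5):
    """Return (size_in_bytes, path) tuples under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes.append((os.path.getsize(path), path))
    sizes.sort(reverse=True)
    return sizes[:top]  # top=None returns every file


def over_limit(root, limit=LIMIT_BYTES):
    """Return the paths of all files under root that are larger than limit."""
    return [path for size, path in largest_files(root, top=None) if size > limit]


# Demo on a throwaway folder with a tiny limit, so no large files are needed.
with tempfile.TemporaryDirectory() as demo:
    for name, size in (("small.csv", 10), ("big.csv", 300)):
        with open(os.path.join(demo, name), "wb") as fh:
            fh.write(b"x" * size)
    demo_top = [(size, os.path.basename(p)) for size, p in largest_files(demo)]
    demo_over = [os.path.basename(p) for p in over_limit(demo, limit=100)]

print(demo_top)
print(demo_over)
```

What happens to the files returned by `over_limit` (remove, compress, or move elsewhere) is a separate decision; the sketch only reports them.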

The files after that are:

AzGovViz_managementgroupid_PSRule.csv - 77 MB
AzGovViz_managementgroupid_RoleAssignments.csv - approx. 40 MB
AzGovViz_managementgroupid_DefinitionInsights.html - approx. 30 MB

I don't think the PSRule file is used anymore since we don't specify the parameter -DoPSRule on script execution?

For compression, do you mean using a tool like gzip, and what do you mean by "loss of readable git history"?
How would the compression impact the data available in AzGovViz?
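As a rough illustration of the gzip option, here is a Python sketch (not the project's code; `gzip_file` and the sample rows are made up for the example). The data survives a round trip unchanged, but Git stores the .gz file as a binary blob, so diffs between commits would no longer show which rows changed. That is one way to read "loss of readable git history".

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(path):
    """Write a gzip-compressed copy of path next to it and return its path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


# Demo with a small, repetitive CSV; real compression ratios will differ.
with tempfile.TemporaryDirectory() as demo:
    csv_path = os.path.join(demo, "example.csv")
    content = "resourceId;result\n" + "/subscriptions/0000;Fail\n" * 10000
    with open(csv_path, "w", newline="") as fh:
        fh.write(content)

    gz_path = gzip_file(csv_path)
    original_size = os.path.getsize(csv_path)
    compressed_size = os.path.getsize(gz_path)

    # Decompressing gives back exactly the original text.
    with gzip.open(gz_path, "rt", newline="") as fh:
        round_trip_ok = fh.read() == content

print(original_size, compressed_size, round_trip_ok)
```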

@cajohanikea please check the branch issue122next.
Check pipeline updates
(the cleanup is uncommented..)

@JulianHayward The way I understand the solution, the CSV files are used for change tracking within GitHub, and the HTML files are sent to the App Service and used in the solution.

Since the CSV files do not affect the solution's functionality in the App Service but the HTML files would, should the clean-up focus on CSV files only?

In addition, what if the change tracking files, instead of being checked in, were sent to, for example, a storage account?
Then there would not be any file size limitations to consider, and the user would have two options for change tracking in larger environments: delete the files or keep them on an external system.

Maybe I'm misunderstanding; please let me know.

@cajohanikea please ping me on LinkedIn / let's discuss scenarios/dependencies

@JulianHayward The feature in branch issue122next solves the problem by removing the AzGovViz_.csv file.
For us, the other files are not close to the limit: AzGovViz_*_RoleAssignments.csv is at 48 MB (we are not using the -DoPSRule parameter and we use the -LargeTenant parameter).

Run Write-Host "Checking files in $($env:OutputPath) for GitHub 100MB file size limit"
Checking files in wiki for GitHub 100MB file size limit
Ref: https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#file-size-limits
Found total of 23170 files
Found 1 files hitting the GitHub file size limit
File 'AzGovViz_***.csv' size 104.378873825073MB exceeds the GitHub 100MB file size limit - removing file /home/runner/work/******/******/wiki/AzGovViz_***.csv

@cajohanikea merged - thanks!