Add report generation to workflow (move off of worker VM)
giancarloaf opened this issue · 2 comments
giancarloaf commented
The report generation script currently lives in a crontab on the "worker" VM. We would like to move this into the data-pipeline workflow at the end of the dataflow pipeline.
tunetheweb commented
It's a little bit trickier as we have three groups of reports:
- The bulk of the reports can be run once the old tables are ready
- Some reports depend on the `httparchive.blink.*` tables, which aren't updated until the 1st of the month as they depend on two BigQuery scheduled tasks (`materialize_blink_features` and then `Materialize Blink Feature Percentages`, which is dependent on the first job). Could they be run as part of the pipeline so we don't have to wait until the first?
- The CrUX data is not available until the 2nd Tuesday of the month, so we currently run that on the 15th of the month.
Would be lovely to clean all this up!
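As a rough illustration of the CrUX timing constraint above, here is a minimal Python sketch that computes the 2nd Tuesday of a month. The function names `second_tuesday` and `crux_reports_ready` are hypothetical and not part of the existing scripts. It assumes the 2nd Tuesday is the only gate, which may not match how CrUX data actually lands.

```python
import datetime as dt


def second_tuesday(year: int, month: int) -> dt.date:
    """Return the date of the 2nd Tuesday of the given month."""
    first = dt.date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + dt.timedelta(days=days_to_first_tuesday + 7)


def crux_reports_ready(today: dt.date) -> bool:
    """Hypothetical gate: True once the 2nd Tuesday of this month has arrived."""
    return today >= second_tuesday(today.year, today.month)


if __name__ == "__main__":
    # The 2nd Tuesday always falls on the 8th to 14th, so the 15th is always after it.
    for month in range(1, 13):
        print(second_tuesday(2024, month))
```

Since the 2nd Tuesday can never be later than the 14th, running on the 15th is a safe fixed date. A workflow step could instead check the computed date and run up to a week earlier.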
giancarloaf commented
I believe I found the current report generation script from the worker VM under igrigorik's user crontab:
giancarlo_faranda@worker:~$ sudo su igrigorik
igrigorik@worker:/home/giancarlo_faranda$ crontab -l
#0 15 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_csv.sh `date +\%b_1_\%Y`' >> /var/log/HAimport.log 2>&1
#0 8 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_csv.sh mobile_`date +\%b_1_\%Y`' >> /var/log/HAimport.log 2>&1
#0 10 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_har.sh chrome' >> /var/log/HA-import-har-chrome.log 2>&1
#0 11 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_har.sh android' >> /var/log/HA-import-har-android.log 2>&1
# Attempt to run the reports everyday
0 8 * * * /bin/bash -l -c 'cd /home/igrigorik/code && sql/generate_reports.sh -th `date "+\%Y_\%m_01"` -l ALL' >> /var/log/generate_reports.log 2>&1
# Run the reports on the 2nd to pick up blink table updates
0 7 2 * * /bin/bash -l -c 'cd /home/igrigorik/code && sql/generate_reports.sh -th `date -d "-1 month" "+\%Y_\%m_01"` -l ALL' >> /var/log/generate_last_months_reports.log 2>&1
# Run the CrUX reports on 15th
0 7 15 * * /bin/bash -l -c 'cd /home/igrigorik/code && sql/generate_reports.sh -tfh `date -d "-1 month" "+\%Y_\%m_01"` -r "*crux*" -l ALL' >> /var/log/crux_reruns.log 2>&1
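If these jobs move into the workflow, the date argument that the crontab builds with `date "+%Y_%m_01"` and `date -d "-1 month" "+%Y_%m_01"` would need to be computed there instead. A minimal Python sketch of that calculation follows. The helper name `report_date_arg` and its `months_back` parameter are hypothetical, and only the `YYYY_MM_01` format is taken from the crontab above.

```python
import datetime as dt


def report_date_arg(today: dt.date, months_back: int = 0) -> str:
    """Format the first day of a month as YYYY_MM_01.

    months_back=0 mirrors `date "+%Y_%m_01"` (current month);
    months_back=1 mirrors `date -d "-1 month" "+%Y_%m_01"` (previous month).
    """
    # Count months since year 0 so that stepping back across January is handled.
    month_index = today.year * 12 + (today.month - 1) - months_back
    year, month0 = divmod(month_index, 12)
    return f"{year:04d}_{month0 + 1:02d}_01"


if __name__ == "__main__":
    print(report_date_arg(dt.date(2024, 3, 2)))      # daily run: current month
    print(report_date_arg(dt.date(2024, 1, 15), 1))  # blink / CrUX reruns: previous month
```

Working in whole months avoids any dependence on the day of the month the job happens to run on.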