VM Setup Guide: https://docs.google.com/document/d/1lqg1ISPt8mezwkp_yapuwRHbmotTTP4sRr9wIR6qZBo/
This repo contains the material for generating data for the Advanced SQL class.
Generating one week of data takes approximately:
- ~60 minutes
- 120 MB disk
- ? MB of memory
- 100% CPU
Building one year of data takes approximately:
- 22 hours
- 6 GB disk
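The one-week and one-year figures above can be compared with a quick back-of-the-envelope calculation. The sketch below scales the one-week numbers to 52 weeks. Linear scaling is an assumption made for illustration, not a measured property of the generator.

```shell
#!/bin/sh
# Measured one-week figures, taken from the list above.
week_minutes=60
week_disk_mb=120
weeks_per_year=52

# Linear extrapolation to one year (an assumption, not a measurement).
year_minutes=$((week_minutes * weeks_per_year))
year_hours=$((year_minutes / 60))
year_disk_mb=$((week_disk_mb * weeks_per_year))

echo "Linear estimate for one year: ${year_hours} hours, ${year_disk_mb} MB disk"
```

The disk estimate (6240 MB) is close to the 6 GB listed above. The time estimate (52 hours) is well above the listed 22 hours, which suggests either that the per-week cost drops in a longer build or that the two figures were measured under different conditions.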
make build
# To push to Docker Hub...
make push
Then use the VM configuration guide and the config.sh script on the VM to ready the machine for class.
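Before starting a long `make build`, it may be worth confirming that enough disk space is free, since a one-year build needs about 6 GB. The helper below is a minimal sketch: `has_free_mb` is a hypothetical name and is not part of this repo, and the path to check is an assumption (pass whichever directory the build actually writes to).

```shell
#!/bin/sh
# has_free_mb REQUIRED_MB [PATH]
# Hypothetical helper, not part of this repo. Succeeds when the
# filesystem holding PATH (default: current directory) has at least
# REQUIRED_MB megabytes available.
has_free_mb() {
    required_mb=$1
    target_path=${2:-.}
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    available_kb=$(df -Pk "$target_path" | awk 'NR==2 {print $4}')
    [ $((available_kb / 1024)) -ge "$required_mb" ]
}

# Example (left commented out so that sourcing this file does not
# start a build): require roughly 6 GB before a one-year build.
# has_free_mb 6144 . && make build
```

The 6144 MB threshold simply restates the 6 GB figure above; adjust it for shorter builds.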
Links:
- https://docs.google.com/document/d/1lqg1ISPt8mezwkp_yapuwRHbmotTTP4sRr9wIR6qZBo/
- https://github.com/LogstonEducation/PDL-Advanced-SQL-Material-Prep/blob/master/questions.md
- https://towardsdatascience.com/learning-sql-201-optimizing-queries-regardless-of-platform-918a3af9c8b1
- https://cs.stanford.edu/people/nick/how-hard-drive-works/
- https://blogs.umass.edu/Techbytes/2017/04/04/hard-drives-how-do-they-work/
Books:
- https://www.amazon.com/Database-Internals-Deep-Distributed-Systems/dp/1492040347
- https://jakevdp.github.io/PythonDataScienceHandbook/
- https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321
- https://pages.cs.wisc.edu/~remzi/OSTEP/ (Part 3)