This repository showcases how Python's pandas library can be used to interact with a SQL database via SQLAlchemy. The primary use case for this type of integration is a data science workflow that needs to plug into a software development workflow. The repository is organized around a scheduled job that fetches data, preprocesses it, and writes it to a SQL database.
For illustrative purposes, a small CSV file with fake data is included to simulate what raw data on an organization's employees might look like.
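As a rough illustration of the fetch, preprocess, and write steps, here is a minimal sketch. It is not the repository's actual code: the function names (`fetch`, `preprocess`, `write`), the table name `employees`, the column names, and the inline CSV text are all assumptions made for the example, and an in-memory SQLite database stands in for a real SQL server.

```python
import io

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical stand-in for the bundled CSV; the columns are assumptions.
RAW_CSV = """name,department,salary
 Alice ,Engineering,100000
Bob,Sales,
Carol,Engineering,120000
"""


def fetch(source):
    """Read raw employee data from a CSV path or file-like object."""
    return pd.read_csv(source)


def preprocess(df):
    """Trim whitespace from names and drop rows with no salary."""
    df = df.copy()
    df["name"] = df["name"].str.strip()
    return df.dropna(subset=["salary"]).reset_index(drop=True)


def write(df, engine, table="employees"):
    """Write the frame to a SQL table, replacing any existing table."""
    df.to_sql(table, engine, if_exists="replace", index=False)


# In-memory SQLite keeps the sketch self-contained; a real deployment
# would pass its own database URL to create_engine.
engine = create_engine("sqlite:///:memory:")
write(preprocess(fetch(io.StringIO(RAW_CSV))), engine)

# Read the rows back through the same engine to confirm the round trip.
result = pd.read_sql("SELECT name, salary FROM employees", engine)
```

Passing the SQLAlchemy engine to both `to_sql` and `read_sql` is what lets the same pandas code target a different database by changing only the connection URL.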
All code that will run as part of the scheduled job goes in here.
There is a `main` function which, in this example, is run as a scheduled job using Python's `apscheduler` library. `main` should call all functions that are required to perform the workflow. Beyond this, there is no restriction on how modules should be structured inside of the main folder.