Welcome to my MSc project!
This readme is intended to give an orientation around the repository, but for full context it is advisable to read the accompanying report.
Please note: large files, including data, have not been committed to the repository. Many scripts therefore won't run 'out of the box', as they require certain data and files to have been prepared first.
You will be able to run some scripts (see the 'getting started' section), but you won't get full use of the repo just by cloning it, since some of the data (e.g. station data) will be unavailable to you.
This is a multipurpose repository containing various sub-packages. Broadly there are four sections, some of which have their own dedicated readme file with more details. The following list is roughly in order of 'dependence', with earlier items being needed by later items.
- `tfl_api_logger`: largely devoted to scripts which make API calls and log data to CSV. The most important one is probably `bikeStationStatus.py`, which gathers station data and was scheduled to execute periodically on a Raspberry Pi device.
- `cycle_journey_prep`: scripts for downloading, combining and cleaning TfL's open journey data.
- `database_creation`: used by `tfl_project/create_sqlite_database.py` and containing various scripts which, if the prerequisite CSVs are ready, create a SQLite database and perform many steps to 'extract, transform and load' data into a proper database schema. The key output is `tfl_project/data/bike_db.db`. Unfortunately, you'll need to ask me for a copy of the station data for this to work.
- `simulation`: all modules and scripts related to running simulations of the BSS.
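If you have built (or been given) the key output named above, `tfl_project/data/bike_db.db`, a quick hypothetical way to peek at its schema with the standard library is sketched below; it falls back to an empty in-memory database if the file is absent, so the table list is only meaningful once the database has been created.

```python
# Sketch: list the tables in the project's SQLite database.
# The path comes from the README; the fallback to ":memory:" is my addition
# so the snippet still runs before the database has been built.
import sqlite3
from pathlib import Path

db_path = Path("tfl_project/data/bike_db.db")
conn = sqlite3.connect(str(db_path) if db_path.exists() else ":memory:")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
conn.close()
print(tables)  # empty list if the database hasn't been created yet
```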
There is also a Notebooks directory for exploration and analysis.
Project code is intended to be run from the repository root.
Make sure your IDE, interpreter or terminal is set to use the repository root as the working directory (including when running scripts), or you may encounter issues with relative references and imports.
There are a handful of scripts which, if run directly from the terminal or CMD, may produce 'module not found' errors. This is because running something like `python foo/bar/some_script.py` in the terminal does not automatically add the terminal's working directory to the Python interpreter's path, so imports beginning with `foo.` can fail. The solution is either to use an IDE which automatically appends the CWD to the Python `sys.path`, or to run the script as a module (e.g. `python -m foo.bar.some_script`). The examples below are written with this in mind, so you shouldn't have any problems running them in the CMD.
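If you'd rather not run a script as a module, the `sys.path` workaround mentioned above can be sketched in a couple of lines placed at the top of a script (this assumes you launched Python from the repository root; it is a workaround sketch, not something the repo's scripts necessarily contain):

```python
# Workaround sketch: prepend the current working directory (assumed to be
# the repository root) to sys.path so that "import tfl_project...." style
# imports resolve even when a script is run directly rather than via -m.
import sys
from pathlib import Path

repo_root = Path.cwd()  # assumes you launched from the repository root
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))
```

Running with `python -m` remains the cleaner option, since it leaves the scripts themselves unmodified.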
Use Conda to create a virtual environment with the libraries that were used in development. If you are using Windows you can recreate my environment exactly using:
conda env create -f windows_environment.yml
Alternatively, a cross-platform file can be used:
conda env create -f environment.yml
If you want to try pulling and aggregating the (post-2014) journey data from TfL, you can run:
python tfl_project/cycle_journey_prep/combineCycleData.py
Be warned that this is very slow: it fetches roughly 5GB of data.
After that you'd run:
python tfl_project/cycle_journey_prep/clean_combined_cycle_data.py
to clean it, but again this is a very slow step.
See tfl_project/cycle_journey_prep/README_cyclejourneys.md for more info.
This was fetched especially for the project and is too large to commit. Unfortunately the full functionality of the repo will not be available without it, although you are welcome to ask me for a copy of the station data.
You can run, for example:
python -m tfl_project.simulation.scenario_scripts.sim1_warehouses_bigger_5am
to perform simulations and get resulting output files.
If you're using the terminal or cmd, it's necessary to run it as a module, as above.
This is possible because I committed the contents of `tfl_project/simulation/files/pickled_cities/london_warehouses` to version control: the equivalent of "here's one I made earlier".
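A quick sanity check that the committed pickled-city files reached your clone could look like the sketch below (run from the repository root; the directory path comes from this README, and an empty list simply means the path is missing):

```python
# Sketch: list whatever files were committed under the pickled-cities
# directory named above, so you can confirm the clone brought them down.
from pathlib import Path

pickle_dir = Path("tfl_project/simulation/files/pickled_cities/london_warehouses")
found = sorted(p.name for p in pickle_dir.glob("*")) if pickle_dir.exists() else []
print(found)  # empty list if run outside the repository
```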
Some scripts, such as `sim0_base_5am_no_rebal.py`, won't work out of the box unless you have prepared the database accordingly: the pre-prepared cities are large files, so they weren't all put in version control.
With a little work you will be able to run, if you wish:
python -m tfl_project.tfl_api_logger.bikeStationStatus
to see the result of a single request and log. Run it as a module, as above.
In reality this was run on a schedule by a cron job; see tfl_project/tfl_api_logger/README_api_log.md for more details.
Two prerequisite actions are:
- Get a TfL API token and save it as `tfl_api_logger/apiCredentials.txt` (ID on the first line, token on the second line).
- Change `out_csv` to an actual location available on your machine.
See tfl_project/tfl_api_logger/README_api_log.md for more info.
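Reading the two-line credentials file described above could look like the following sketch (the helper name is my invention; the path and the ID-then-token layout come from this README):

```python
# Sketch: parse the credentials file described above -- app ID on the
# first line, API token on the second. Adjust the path if yours differs.
from pathlib import Path

def read_tfl_credentials(path="tfl_api_logger/apiCredentials.txt"):
    lines = Path(path).read_text().splitlines()
    return lines[0].strip(), lines[1].strip()
```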