Hey! We are stoked that you are interested in joining the team at Blue Onion Labs.
We have crafted the following test to see how you approach pulling and manipulating data. We want to get a general idea of how you approach some common types of problems that we encounter here at Blue Onion (we are really proficient at integrations!).
spacexdata.com provides an API to query attributes about SpaceX launches (https://github.com/r-spacex/SpaceX-API/blob/master/docs/v4/README.md). For this exercise we are going to be working with one resource in particular:
- The Starlink Schema
For this exercise, there is no need to pull directly from the API, as we have a pull of historical data here in this repo in starlink_historical_data.json.
We want to achieve a few goals:
- To import the SpaceX Satellite data as a time series into a database
- To be able to query the data to determine the last known latitude/longitude of the satellite for a given time
Stand up your favorite kind of database (and ideally it would be in a form that would be runnable by us, via something like docker-compose).
Write a script (in whatever language that you prefer, though Ruby, Python, or Javascript would be ideal for us) to import the relevant fields in starlink_historical_data.json as a time series. The relevant fields are:
- spaceTrack.creation_date (represents the time that the lat/lon records were recorded)
- longitude
- latitude
- id (this is the starlink satellite id)
Again, the goal is that we want to be able to query the database for the last known position for a given starlink satellite. Don't hesitate to use any tools/tricks you know to load data quickly and easily!
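As a rough illustration of the import step, the sketch below pulls the four relevant fields out of each record. It is not the submitted script: the function name `extract_rows`, the sample records, the ISO-8601 timestamp format, and the choice to skip records with missing values are all assumptions made for the example.

```python
import json
from datetime import datetime


def extract_rows(records):
    """Yield (satellite_id, creation_date, latitude, longitude) tuples.

    Hypothetical helper: assumes creation_date is an ISO-8601 string and
    skips records that lack a timestamp or a position.
    """
    for rec in records:
        created = (rec.get("spaceTrack") or {}).get("creation_date")
        lat, lon = rec.get("latitude"), rec.get("longitude")
        if created is None or lat is None or lon is None:
            continue  # assumed policy: drop incomplete records
        yield (rec["id"], datetime.fromisoformat(created), float(lat), float(lon))


# Made-up sample shaped like the fields listed above (not real data).
sample = json.loads("""[
  {"id": "sat-a", "latitude": 10.5, "longitude": -20.25,
   "spaceTrack": {"creation_date": "2021-01-26T06:26:10"}},
  {"id": "sat-b", "latitude": null, "longitude": null,
   "spaceTrack": {"creation_date": "2021-01-26T06:26:10"}}
]""")
rows = list(extract_rows(sample))
```

The resulting tuples could then be bulk-inserted with whatever database driver is in use. Whether incomplete records should be dropped or kept is a design choice this sketch does not settle.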
Write a query to fetch the last known position of a satellite (by id), given a time T. Include this query in your README or somewhere in the project submission.
Connect to the database and run a query in the following style to get the record with the answer from the time series: SELECT * FROM getlocation('5eed7716096e5900069857f0', now());
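One possible shape for the "last known position" lookup is sketched below. To keep it runnable without a server it uses Python's built-in sqlite3 rather than Postgres, and the table name `positions`, its columns, and the helper `last_known_position` are hypothetical; the actual `getlocation` function is not shown here and may differ.

```python
import sqlite3

# Latest row at or before the requested time for one satellite.
LAST_KNOWN_SQL = """
SELECT satellite_id, creation_date, latitude, longitude
FROM positions
WHERE satellite_id = ? AND creation_date <= ?
ORDER BY creation_date DESC
LIMIT 1
"""


def last_known_position(conn, satellite_id, at_time):
    """Return the most recent row at or before at_time, or None."""
    return conn.execute(LAST_KNOWN_SQL, (satellite_id, at_time)).fetchone()


# In-memory demo with made-up rows; ISO-8601 strings sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions "
    "(satellite_id TEXT, creation_date TEXT, latitude REAL, longitude REAL)"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?)",
    [
        ("sat-a", "2021-01-01T00:00:00", 1.0, 2.0),
        ("sat-a", "2021-01-02T00:00:00", 3.0, 4.0),
        ("sat-a", "2021-01-03T00:00:00", 5.0, 6.0),
    ],
)
result = last_known_position(conn, "sat-a", "2021-01-02T12:00:00")
```

The same WHERE / ORDER BY / LIMIT pattern carries over to Postgres, where an index on (satellite_id, creation_date) would likely help; with psycopg2 the placeholders would be `%s` instead of `?`.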
Write some logic (via a combination of query + application logic, most likely) to fetch from the database the closest satellite at a given time T, and a given position on the globe as a (latitude, longitude) coordinate.
No need to derive any fancy math for distances from a point on the globe to a position above the earth. You can just use the Haversine formula. Example libraries to help here:
For Python: https://github.com/mapado/haversine
For Ruby: https://github.com/kristianmandrup/haversine
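For reference, the Haversine distance is also short enough to write by hand. The sketch below is one way the application-side half of the "closest satellite" step might look, assuming the last known positions at time T have already been fetched from the database; `haversine_km`, `closest_satellite`, and the sample coordinates are illustrative names and values, not part of the challenge repo.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; altitude is ignored


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def closest_satellite(positions, lat, lon):
    """Pick the (satellite_id, latitude, longitude) tuple nearest to (lat, lon).

    positions is assumed to hold each satellite's last known position at T.
    """
    return min(positions, key=lambda p: haversine_km(lat, lon, p[1], p[2]))


# Made-up positions for three hypothetical satellites.
positions = [
    ("sat-a", 0.0, 0.0),
    ("sat-b", 40.0, -75.0),
    ("sat-c", -33.0, 151.0),
]
nearest = closest_satellite(positions, 41.0, -74.0)
```

A linear scan like this is simple and is probably fine for a few thousand satellites; pushing the distance computation into the database is an alternative if the set grows.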
- Run through it one last time to make sure it works!
- Push the code up to your repo one last time (or save your working directory to a 'zip')
- Reach out to us with your solution
If you have any questions at all during the challenge do not hesitate to reach out! Whether it be a question about the requirements, submitting, anything, just send us a note!
I definitely ran out of time. I expect you should be able to run as follows:
- docker-compose up (to start the postgres db)
- python -m venv <env-dir> (the venv module requires a target directory)
- Activate the virtual environment, then install the dependencies: pip install -r requirements.txt
- python rewrite_json.py
- Connect to the database using postgres/postgres on port 5432 with your db tools of choice.
- Run a query similar to the one below to see the data for a given id and timestamp: SELECT * FROM getlocation('5eed7716096e5900069857f0', now());
I did not do what I would consider thorough validation/testing of the data. This would be my very next step. I chose to omit the data that was questionable. This would have to be handled in a production setting, likely through some thoughtful work.