DDCC

Primary language: Python. License: MIT.

DivvyDose Code Challenge

Flask application that merges Bitbucket and GitHub API information for a pair of user accounts and returns it as a single JSON object.

Installation instructions

Using Python 3.6, create and activate a virtual environment:

python3 -m venv /path/to/env
source /path/to/env/bin/activate

Install requirements with pip:

pip install -r requirements.txt

NOTE: The GitHub API's rate limiting is quite restrictive unless you authenticate as a GitHub user when making requests. I currently read the GitHub username and password from the environment variables GITHUB_USER and GITHUB_PASS. Set these however you like; my example below sets them on the command line at runtime. Also note that basic auth does NOT work if your user has 2-factor authentication enabled, so either temporarily disable it or create a dummy account without 2-factor auth.
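As a minimal sketch of how those two variables could be turned into basic-auth credentials: the helper name `github_auth` is hypothetical and not taken from the project, and the sketch assumes the HTTP client accepts a `(user, password)` tuple, as the `auth=` parameter of `requests.get` does.

```python
import os


def github_auth(env=os.environ):
    """Return a (user, password) tuple for HTTP basic auth, or None.

    Hypothetical helper, not part of the project. Returning None means
    requests go out unauthenticated and hit the stricter anonymous
    rate limit.
    """
    user = env.get("GITHUB_USER")
    password = env.get("GITHUB_PASS")
    if user and password:
        return (user, password)
    return None


# The tuple can be handed to a client that supports basic auth,
# e.g. requests.get(url, auth=github_auth()).
```

Passing the environment in as a parameter keeps the helper easy to test without touching the real process environment.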

Run Flask:

GITHUB_USER=username GITHUB_PASS=password FLASK_APP=app.py python -m flask run

Follow the link printed in the console to your running app, usually http://127.0.0.1:5000

Usage

The route that merges the profiles is exposed at /merge. It takes two query parameters: bb_name for the Bitbucket user and gh_name for the GitHub user. Here's an example URL: http://127.0.0.1:5000/merge?bb_name=pygame&gh_name=miguelgrinberg
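The same URL can be built programmatically so that user names are escaped correctly. This is a small sketch using only the standard library; `merge_url` and `BASE_URL` are illustrative names, and the base address assumes the default Flask development server.

```python
from urllib.parse import urlencode

# Assumes the default Flask development server address.
BASE_URL = "http://127.0.0.1:5000"


def merge_url(bb_name, gh_name, base_url=BASE_URL):
    """Build the /merge URL for a Bitbucket user and a GitHub user."""
    query = urlencode({"bb_name": bb_name, "gh_name": gh_name})
    return f"{base_url}/merge?{query}"


print(merge_url("pygame", "miguelgrinberg"))
# http://127.0.0.1:5000/merge?bb_name=pygame&gh_name=miguelgrinberg
```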

Data Format

Here's an annotated sample of the JSON output:

{
	account_size: 40307881, # size of the merged accounts
	commits: 8875, # total commits of the merged accounts across all branches
	languages: { 
		count: 11, # total number of unique languages used
		list: [ # a deduped list of languages used
			"python",
			"shell",
			"batchfile",
			"html",
			"javascript",
			"css",
			"coffeescript",
			"ruby",
			"c",
			"c++",
			"hcl"
		]
	},
	open_issues: 355, # total open issues
	repo_count: {
		forked: 57, # number of forked repositories
		original: 72 # number of non-forked repositories
	},
	repo_topics: {
		count: 48, # count of all topics across all GitHub repos
		list: [
			"webapp",
			"unittest",
			"serverless-deployments",
			...etc
		]
	},
	repo_watchers: 15430, # total number of watchers/followers across repos
	stars: {
		given: 218, # total number of GitHub repos the users have starred
		received: 15165 # total number of stars on the users' own GitHub repos
	},
	user_watchers: 5866 # total number of users following the merged accounts
}
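The `languages` entry above pairs a deduped list with its count. One way such a field could be produced is sketched below; `merge_languages` is a hypothetical helper rather than the project's code, and lowercasing the names is an assumption inferred from the lowercase values in the sample output.

```python
def merge_languages(*language_lists):
    """Combine language names into a deduped list plus its count.

    Names are lowercased so "Python" and "python" count once, and
    first-seen order is preserved. Hypothetical helper.
    """
    seen = {}
    for languages in language_lists:
        for name in languages:
            if name:  # skip None/empty entries: a repo may report no language
                seen.setdefault(name.lower(), None)
    unique = list(seen)
    return {"count": len(unique), "list": unique}


github_langs = ["Python", "Shell", "HTML"]
bitbucket_langs = ["python", None, "C"]
print(merge_languages(github_langs, bitbucket_langs))
# {'count': 4, 'list': ['python', 'shell', 'html', 'c']}
```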

Notes and Considerations

  • Chose GET as the HTTP verb, since the route only retrieves read-only data and nothing is altered
  • Since the route uses GET, chose query params to specify the user accounts to merge
  • Currently all responses are checked for a 200 status. If an API returns anything else, the response is ignored and the data from that particular call is not added to the totals. Given more time, I would handle this with retries and, depending on the data being retrieved, return an error response instead of the data when a call is unsuccessful.
  • Code is not terribly efficient: many REST calls are made (at least 4 per repo). I also refactored only for readability, not efficiency. Given more time, I would try to reduce the number of API calls by studying the API documentation more thoroughly, and refactor the code to run faster, probably by reducing iterations where possible.
  • Used Flake8 standard for linting
  • Did not have enough time to write good unit test coverage. As is, I would unit test by mocking the requests library's responses with truncated real JSON responses from the GitHub and Bitbucket APIs. I wrote two very basic tests to show a mocking strategy; you can run them with the command python -m unittest tests/test_app.py
  • I tried to make my code run as a series of function calls so I could test each logical segment individually. However, even the three functions I wrote (create the data object, merge in the GitHub data, merge in the Bitbucket data) could be broken down further, with separate functions for getting follower data, commit data, etc. This would help the readability, maintainability, and testability of the code.
  • Bitbucket doesn't have topics or stars, so the data for those comes from GitHub only
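The mocking strategy and the "ignore anything but 200" behavior described in the notes above could look roughly like the sketch below. It is not the project's test file: `count_followers` and `fake_response` are hypothetical names, the URL is a placeholder, and the HTTP getter is passed in as an argument (shaped like `requests.get`) so the example runs without the requests library or network access.

```python
import unittest
from unittest import mock


def count_followers(http_get, url):
    """Return the number of followers listed at `url`, or 0 on a non-200.

    `http_get` is any callable shaped like requests.get. Hypothetical helper.
    """
    response = http_get(url)
    if response.status_code != 200:
        return 0  # mirrors the "ignore non-200 responses" behavior
    return len(response.json())


def fake_response(status_code, payload):
    """Build a stand-in for an HTTP response carrying canned JSON."""
    response = mock.Mock()
    response.status_code = status_code
    response.json.return_value = payload
    return response


class CountFollowersTest(unittest.TestCase):
    URL = "https://example.invalid/followers"  # placeholder, never fetched

    def test_counts_entries_on_200(self):
        payload = [{"login": "a"}, {"login": "b"}]  # truncated sample data
        http_get = mock.Mock(return_value=fake_response(200, payload))
        self.assertEqual(count_followers(http_get, self.URL), 2)
        http_get.assert_called_once_with(self.URL)

    def test_ignores_non_200(self):
        http_get = mock.Mock(return_value=fake_response(403, {"message": "denied"}))
        self.assertEqual(count_followers(http_get, self.URL), 0)
```

Saved as a module, the test case can be run with python -m unittest in the same way as the project's own tests. Splitting the work into small functions like this one is also what makes each segment testable in isolation.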