Gitpip is a small microservice that communicates with two APIs, Pipedrive and GitHub.
In short, it tracks a number of GitHub users' gists: gists are routinely scanned, and if a gist wasn't seen before, it is saved and published as a Pipedrive activity.
Using the GitHub API, you should be able to query a user's publicly available gists and create a deal/activity in Pipedrive for each gist. Implement an application that periodically checks for a user's publicly available gists; this application should also have a web endpoint that shows the gists for that user that were added since the last visit.
All in all, the app has the following requirements:
- it must query some `user`'s gists via GitHub's **Gist** API
- it must post a `gist` from some `user` as an activity or deal to **Pipedrive's** API
- it must have some kind of `periodic` check that will add unseen `gist`s (see the sketch right after this list)
- it must have an endpoint where recently added `gist`s are shown, for some given `user`
- it must have an endpoint that shows all `user`s that are being tracked
- Proper logging
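The periodic check boils down to a long-running ticker loop. A minimal sketch, with `runRoutine` as a hypothetical stand-in for the real fetch-and-persist logic:

```go
package main

import (
	"log"
	"time"
)

// runRoutine is a hypothetical stand-in for the real logic: query
// GitHub for each tracked user's gists, save the unseen ones, and
// post them to Pipedrive.
func runRoutine() error {
	// ... fetch gists, diff against the database, post new ones ...
	return nil
}

func main() {
	// The service checks every three hours; run once at startup,
	// then again on every tick.
	ticker := time.NewTicker(3 * time.Hour)
	defer ticker.Stop()

	for ; ; <-ticker.C {
		if err := runRoutine(); err != nil {
			log.Printf("routine failed: %v", err)
		}
	}
}
```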
A few interpretations and assumptions on my part:

- I understand posting a `gist` as posting its contents as an activity, with the deal bearing the user's username.
- The periodic checks obviously imply the need for some kind of state.
- It should show the gists for that user that were added since the *last visit*. I'm assuming that a visit is some kind of a `session`; hence, getting the latest gists will first check for the last `session` in order to use it as the starting timestamp, from which all added gists with at least that timestamp, for that specific `user`, will be returned. After this, a new `session` will be added (sketched below).
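In other words, the endpoint is a read-then-mark operation. Here is an illustrative sketch of the idea, not the project's actual code; the table and column names follow the schema described below but are assumptions:

```go
package repository

import (
	"database/sql"
	"time"
)

// Gist holds just the columns this sketch reads back.
type Gist struct {
	GistID     string
	RawURLLink string
	Title      string
}

// LatestGists returns every gist added at or after the previous
// session's timestamp, then records a new session so the next call
// filters from now on. Illustrative only.
func LatestGists(db *sql.DB, username string) ([]Gist, error) {
	var since time.Time // zero value means "no previous session": return everything
	err := db.QueryRow(
		`SELECT s.created_at FROM sessions s
		 JOIN users u ON u.user_id = s.user_id
		 WHERE u.username = $1
		 ORDER BY s.created_at DESC LIMIT 1`, username).Scan(&since)
	if err != nil && err != sql.ErrNoRows {
		return nil, err
	}

	rows, err := db.Query(
		`SELECT gist_id, raw_url_link, gist_file_title FROM gists
		 WHERE username = $1 AND created_at >= $2`, username, since)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var gists []Gist
	for rows.Next() {
		var g Gist
		if err := rows.Scan(&g.GistID, &g.RawURLLink, &g.Title); err != nil {
			return nil, err
		}
		gists = append(gists, g)
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}

	// Mark this visit: the next call will only see gists newer than now.
	_, err = db.Exec(
		`INSERT INTO sessions (user_id, created_at)
		 SELECT user_id, NOW() FROM users WHERE username = $1`, username)
	return gists, err
}
```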
I wrote it in Golang, and I tried to make it as raw as possible, that is, with the least amount of external libraries. I am a huge advocate for YAGNI and KISS; only two, `gorilla/mux` and `logrus`, were used.
From the functionality requirements and logging needs, we can derive the following relational structure (I'm omitting the types), which also explains the rationale; a Go mapping is sketched after the list:
- `user_id` (does not have to be GitHub's user id), `username`, `created_at`
- `gist_unique_id` (this is the gist's file id; every gist can have multiple files), `gist_id` (the full gist hash), `raw_url_link` (the link to the gist file's text, to be posted as an activity), `username` (the user's username, breaking the normal form for the sake of simplicity), `gist_file_title`, `created_at`
- `routine_id` (this is the id of each every-three-hours gist fetch), `created_at`
- `routine_id`, `gist_id`, `user_id` (the point of this table is to keep track of all gists of all users that were added on each routine)
- `session_id` (this will hold each GetLatestGists session; every time that endpoint is queried, it will look for the previous session and respond accordingly), `user_id`, `created_at`
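Mapped to Go, that structure could look roughly like the sketch below; the real definitions live in `domain_types.go` and may differ, and the JSON tags are my assumption about how the responses are shaped:

```go
package main

import "time"

// These structs sketch the relational structure above.
type User struct {
	UserID    string    `json:"user_id"` // does not have to be GitHub's user id
	Username  string    `json:"username"`
	CreatedAt time.Time `json:"created_at"`
}

type Gist struct {
	GistUniqueID  string    `json:"gist_unique_id"` // per-file id; a gist can have multiple files
	GistID        string    `json:"gist_id"`        // the full gist hash
	RawURLLink    string    `json:"raw_url_link"`   // link to the gist file's text
	Username      string    `json:"username"`       // denormalized for simplicity
	GistFileTitle string    `json:"gist_file_title"`
	CreatedAt     time.Time `json:"created_at"`
}

type Routine struct {
	RoutineID string    `json:"routine_id"` // one per three-hour fetch
	CreatedAt time.Time `json:"created_at"`
}

type Session struct {
	SessionID string    `json:"session_id"`
	UserID    string    `json:"user_id"`
	CreatedAt time.Time `json:"created_at"`
}
```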
- A directory where I keep the docker-compose file and the Kubernetes deployments and services generated by kompose.
- A directory where all the code lies in:
  - `domain_types.go` holds all types used for `SQL` and `JSON`
  - `handler.go` is where the endpoint handlers are
  - `repository.go` is where all the database logic lies
  - `utils.go` is where the GitHub and Pipedrive API access logic is at (sketched right after this list)
- The SQL init script.
- `wait-for-it.sh`, the script that waits for another service to load. The licensing is in it, and it gives the needed credits.
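To give an idea of what the `utils.go` side does, here is a rough sketch against GitHub's public `GET /users/<user>/gists` endpoint and Pipedrive's activities endpoint. The Pipedrive URL shape and payload are my assumptions here, not the project's exact code:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// githubGist captures the few fields of GitHub's gist JSON we need.
type githubGist struct {
	ID    string `json:"id"`
	Files map[string]struct {
		Filename string `json:"filename"`
		RawURL   string `json:"raw_url"`
	} `json:"files"`
}

// fetchGists lists a user's publicly available gists from the GitHub API.
func fetchGists(username string) ([]githubGist, error) {
	resp, err := http.Get(fmt.Sprintf("https://api.github.com/users/%s/gists", username))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("github: unexpected status %d", resp.StatusCode)
	}
	var gists []githubGist
	return gists, json.NewDecoder(resp.Body).Decode(&gists)
}

// postActivity posts a gist's contents as a Pipedrive activity. The
// company-domain URL and body are a best-guess sketch of Pipedrive's
// activities endpoint.
func postActivity(subject, note string) error {
	url := fmt.Sprintf("https://%s.pipedrive.com/api/v1/activities?api_token=%s",
		os.Getenv("PIPEDRIVE_ORG"), os.Getenv("PIPEDRIVE_TOKEN"))
	body, _ := json.Marshal(map[string]string{"subject": subject, "note": note})
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("pipedrive: unexpected status %d", resp.StatusCode)
	}
	return nil
}
```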
Everything was developed with Docker, attempting to use lightweight images since the very first commit.
The goal of using `docker-compose` is, other than the obvious, to easily convert it to `kubernetes` later using `kompose`.
All sensitive information is required as environment variables, which ought to be passed by modifying the given `docker-compose.yml` or Kubernetes files. I know environment variables can be a security risk: if a malicious actor `exec`s into the container, they can easily get the sensitive credentials.
- /users/<some_user>/<some_unique_id> - Adds a new user.
- /users - Gets all tracked users.
- /health - Returns "Alive" with status 200 if the service is functional and not blocked.
- /latestgists/<some_user> - Gets all newly added gists for a specified, already-tracked user, AND records the session from which the next call will filter gists. It will only show gists if a routine has happened; otherwise it won't show any. With the Kubernetes deployment it is not possible to know when the last routine happened.
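Wired up with gorilla/mux, the endpoints above could look like this minimal sketch; the handler names are illustrative stand-ins for the real ones in `handler.go`:

```go
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/mux"
)

// Hypothetical stand-ins for the real handlers in handler.go.
func addUserHandler(w http.ResponseWriter, r *http.Request)     {}
func listUsersHandler(w http.ResponseWriter, r *http.Request)   {}
func latestGistsHandler(w http.ResponseWriter, r *http.Request) {}

func main() {
	r := mux.NewRouter()
	r.HandleFunc("/users/{user}/{id}", addUserHandler).Methods(http.MethodPost)
	r.HandleFunc("/users", listUsersHandler).Methods(http.MethodGet)
	r.HandleFunc("/latestgists/{user}", latestGistsHandler).Methods(http.MethodPost)

	// /health only needs to answer the liveness probe.
	r.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("Alive"))
	}).Methods(http.MethodGet)

	log.Fatal(http.ListenAndServe(":8080", r))
}
```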
I have compiled two binaries, for macOS and generic Linux. They should suffice to get it up and running, given that you have the following environment variables set up:
- PIPEDRIVE_TOKEN
- PIPEDRIVE_ORG
- POSTGRES_CONNECTION_STRING
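A fail-fast check of the three variables above is a sketch worth showing; `mustGetenv` is a hypothetical helper, not necessarily how the real code does it:

```go
package main

import (
	"log"
	"os"
)

// mustGetenv aborts startup when a required variable is missing, so a
// misconfigured deployment dies immediately instead of mid-request.
func mustGetenv(key string) string {
	v := os.Getenv(key)
	if v == "" {
		log.Fatalf("missing required environment variable %s", key)
	}
	return v
}

func main() {
	for _, key := range []string{
		"PIPEDRIVE_TOKEN",
		"PIPEDRIVE_ORG",
		"POSTGRES_CONNECTION_STRING",
	} {
		mustGetenv(key)
	}
	// ... start the service with the validated configuration ...
}
```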
You can also use `docker-compose` as follows:
First, let’s build the microservice image.
docker build -f Dockerfile -t gistdrive:1.0 .
Then, let us spin up the compose (make sure to fill in your credentials):
docker-compose -f docker-compose.yml up -d
And voilà!
curl "localhost:8080/users"
Should return nothing
curl -X POST -H "Context-Type: application/json" "http://localhost:8080/users/<some_username>/<some_unique_id_not_necessarily_githubs_id>"
Should return a new User.
And at last
curl -X POST -H "Context-Type: application/json" "http://localhost:8080/latestgists/<some_added_username>"
Should return the newly added gists, relative to the last time you made a POST to that endpoint, and given that a `routine` has happened (every three hours). So if it didn't show anything now, come back in 3 hours :) (sorry)
All access is secured by RBAC, with firewall rules restricting any access other than from the specified microservice endpoint.
Everything is in `europe-north1-a`, Finland.
The project was deployed on a Google Kubernetes Engine instance with 3 nodes, a replication factor of 2, and 12 GB of total memory. It uses hardened nodes in order to prevent malicious nodes from trying to take over the cluster.
The `health` check is done using a `livenessProbe` querying the `/health` endpoint, and the readiness check uses a `readinessProbe` with the `wait-for-it.sh` script in order to wait for the Postgres pod. I know that in order for Postgres to have persistence it needs a volume; for the sake of simplicity, I decided to make it ephemeral.
I did not use any provisioning tool due to lack of time.
This should suffice for resilience and scalability, within this very specific context.
The images were stored on Google's Artifact Registry under private, source-controlled registries.
All logging (database, service, and general Kubernetes) is routed to Stackdriver.
I know that logrus is writing everything to `stderr`; I didn't have enough time to fix it.
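For reference, the likely fix is a one-liner with logrus's standard API, redirecting the default logger at startup:

```go
package main

import (
	"os"

	log "github.com/sirupsen/logrus"
)

func init() {
	// logrus defaults to stderr; pointing the standard logger at
	// stdout keeps ordinary log lines from reading as errors.
	log.SetOutput(os.Stdout)
}

func main() {
	log.Info("logging to stdout now")
}
```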
The load-balanced external IP is 35.228.33.*forty-six* (the actual last octet is 46; I'm just spelling it out in order to not get busted by crawlers), on port 8080.
All endpoints are accessible from there.
I wrote some 300 lines of tests, but they were quite shameful, please don't look.
However, if you really want to check them out, you can just look at the past commit.