- For each HTTP API request on `/analysis?duration=x&dimension=y`:
  - Fetch social media post events from the Upfluence SSE API during the `x` duration
  - Parse the received data to extract the timestamp and the `y` dimension values
  - Get the first and last timestamp values of the received data
  - Sort the data by the `y` dimension values
  - Compute 3 percentile values from the sorted data: p_50 (median), p_90 and p_99 (see the sketch after this list)
  - Return a JSON object with the total number of items in the set, the first and last timestamps, and the computed percentiles
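The percentile step can be as simple as a nearest-rank lookup on the sorted slice. The snippet below is a minimal sketch of that idea, assuming integer dimension values, an already ascending-sorted slice and a nearest-rank definition; the package and function names are illustrative, not the project's actual API.

```go
package stats

import "math"

// Percentile returns the nearest-rank percentile p (0 < p <= 100) of a
// slice that is already sorted in ascending order.
func Percentile(sorted []int, p float64) int {
	if len(sorted) == 0 {
		return 0
	}
	// Nearest-rank: ceil(p/100 * N) gives a 1-based rank into the sorted slice.
	rank := int(math.Ceil(p / 100.0 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}
```

With such a helper, p_50, p_90 and p_99 are just `Percentile(values, 50)`, `Percentile(values, 90)` and `Percentile(values, 99)` applied to the sorted dimension values.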
* The full goals / subject of this exercise are kept private
- Read the goals (the subject)
- Upstream API discovery via CLI tools (of course it's curl)
- Write down the data object models
- Implement a draft (quick & dirty) project version with a minimal separation of concerns
- Re-read the goals to avoid missing features
- Write unit tests with a mocked upstream API for the critical components (parameter validation and statistics computation)
- Refactor to stay DRY and best-practices compliant
The current implementation consists of:

- A Golang API server built with the `echo` framework
- A package for the API struct definitions in `api`
- A `Subscribe` method that listens to the stream and sends the events to a channel during the `x` duration (see the sketch after this list)
- Methods to sort the set and compute the percentiles
- A GET API handler
- A `main` function to start the API server
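As a rough sketch of the Subscribe idea (read the upstream SSE stream for a fixed duration and push each event payload to a channel), assuming a plain `net/http` client and a `chan string`; the names, URL handling and error policy below are illustrative, not the project's real code:

```go
package upfluence

import (
	"bufio"
	"context"
	"net/http"
	"strings"
	"time"
)

// Subscribe streams raw SSE payloads into out until the duration elapses.
func Subscribe(streamURL string, d time.Duration, out chan<- string) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	defer close(out)

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, streamURL, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		// SSE event payloads are carried on "data:" lines.
		if strings.HasPrefix(line, "data:") {
			out <- strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		}
	}
	// A read interrupted by the deadline simply means the duration elapsed.
	if ctx.Err() == context.DeadlineExceeded {
		return nil
	}
	return scanner.Err()
}
```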
- Memory usage is a bit too high in `SubscribeFull`, as we deserialize and keep the whole `StreamEvent` in RAM without needing the extra fields it contains. This is partly mitigated in the `SubscribeLight` method, as only the timestamps and dimension values are sent to the channel.
- Each client connection to the API creates a new upstream API listener; the connection is not shared. I could have used a single shared stream of events, with mutexes guarding reads, for every consumer request.
- The current draft implementation parses and formats all the upstream API data; we should probably avoid keeping all the keys we don't need (cf. the memory usage point above).
- The error handling in the Subscribe methods is a bit too strict and doesn't allow a retry if the connection fails or is interrupted.
- The whole runtime environment (e.g. the PORT this server listens on or the upstream API URL) is hardcoded, which is bad practice (see the configuration sketch after this list).
- This project uses the Golang `log` package to print info, warnings and errors to stdout; the message format and the logger package should reflect the ones used by the organisation.
- The `Percentiles` method shouldn't have any reference to the API models and should only take the unsorted slice of "dimension" values, for a better separation of concerns (fixed in `PercentilesV2`).
- Wondering whether "NonAccurate" events should be removed from the statistical set (for example, if likes is the dimension, the Article type of stream event does not have this property).
- Wondering whether properties of different event kinds that have different names but represent the same dimension should be counted as such (e.g. are repins the same as retweets?).
- The Stat method and the API models could have been part of the public API (`pkg` directory) for this project, but as it is only a test, all these parts have been put in the `internal` directory.
- Regarding the upfluence package, the organisation should have a dedicated client with proper structs and options; this implementation is voluntarily simple.
- Slice sorting has been implemented with Go's `slices` sort functions; as no external lib is allowed, the plan was to re-implement it with either a partition sort for small lists or a quicksort for larger ones (EDIT: naive Quicksort and Partition sort algorithms are implemented in the sort package).
- In the Subscribe method, a `strings.Replace` call is used to isolate the JSON string from the `data:` prefix, which seems a bit overkill (see the `strings.TrimPrefix` sketch after this list).
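On the hardcoded environment point above, a tiny sketch of reading those settings from environment variables instead; the variable names `PORT` and `UPSTREAM_URL` and their fallback values are assumptions for illustration only:

```go
package config

import "os"

// getenv returns the value of key, or fallback when the variable is unset or empty.
func getenv(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

var (
	// Hypothetical defaults; the real upstream URL is not hardcoded here.
	Port        = getenv("PORT", "8080")
	UpstreamURL = getenv("UPSTREAM_URL", "http://localhost:9000/stream")
)
```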
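And on the `strings.Replace` point, `strings.TrimPrefix` states the intent (strip a leading `data:` marker from an SSE line) more directly; the payload below is only an illustrative shape, not the upstream schema:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Illustrative SSE line; the real payload shape comes from the upstream API.
	line := `data: {"timestamp": 1660000000, "likes": 12}`
	payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
	fmt.Println(payload) // {"timestamp": 1660000000, "likes": 12}
}
```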
GET `/analysis`: get a statistical analysis of the Upfluence SSE stream data

| name | type | data type | description |
|------|------|-----------|-------------|
| duration | required | string (query parameter) | A duration, e.g. 10s, 24h, 1d ... |
| dimension | required | string (query parameter) | An analysis dimension: likes, comments, favorites, retweets |
| http code | content-type | response |
|-----------|--------------|----------|
| 200 | `application/json` | `{"total_posts": number, "minimum_timestamp": UNIX timestamp number, "maximum_timestamp": UNIX timestamp number, "likes_p50": number, "likes_p90": number, "likes_p99": number}` |
| 400 | `application/json` | `{"error": "a useful error message describing the bad parameter"}` |
| 204 | NA | No content, if no SSE data match the required dimension parameter |
```shell
curl -X GET -H "Content-Type: application/json" "http://localhost:8080/analysis?duration=10s&dimension=likes"
```
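For reference, a possible Go shape for the 200 response documented above; only the JSON keys come from the table, the struct name and numeric types are assumptions, and the `_p50` / `_p90` / `_p99` key prefix follows the requested dimension (here `likes`):

```go
package api

// AnalysisResponse mirrors the documented 200 body for the likes dimension.
type AnalysisResponse struct {
	TotalPosts       int   `json:"total_posts"`
	MinimumTimestamp int64 `json:"minimum_timestamp"`
	MaximumTimestamp int64 `json:"maximum_timestamp"`
	LikesP50         int   `json:"likes_p50"`
	LikesP90         int   `json:"likes_p90"`
	LikesP99         int   `json:"likes_p99"`
}
```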
- Install a recent Go / Golang version for your favorite distribution (HELP).
- Optionally install the GNU `make` tool and a container building tool compatible with the `dockerfile` recipe (e.g. buildah, podman, docker cli ...).
- Once done, you are ready to run the API command with `go run cmd/server/main.go` executed in your favorite shell.
- If you have the `make` tool set up, run `make help` or just `make` from the repo root to see the available commands.