A simple application to generate, read, and show tokens.
We can see this problem as a set of processes: first you generate the tokens, then you read them and save them into a database. I decided to tackle this challenge by breaking it into two applications: a producer and a consumer.
The producer is the simpler of the two: a token generator that creates a file of random tokens, one per line, each consisting of seven lowercase letters (a-z), and saves it to storage/tokens by default.
The process of generating tokens is entirely based on random values. There is a charset consisting of all the lowercase letters ([a-z]) and, for each position in a string of size 7, one letter is randomly picked from the charset.
The default values for the following parameters match what was asked in the challenge description. You can override any of them on the command line (for example, --amount=1000000 sets amount to 1M).
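As a rough illustration of the idea above, here is a minimal sketch of the per-position random pick. It assumes math/rand as the random source (the actual source used by the producer may differ), and the names randomToken and generateTokens are hypothetical, chosen only for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"math/rand"
	"os"
)

// charset holds every character a token may contain: the lowercase letters a-z.
const charset = "abcdefghijklmnopqrstuvwxyz"

// randomToken builds a token by picking one random letter from charset
// for each of the length positions.
func randomToken(rng *rand.Rand, length int) string {
	b := make([]byte, length)
	for i := range b {
		b[i] = charset[rng.Intn(len(charset))]
	}
	return string(b)
}

// generateTokens returns amount tokens of the given length.
// The seed parameter is an assumption made so the sketch is reproducible.
func generateTokens(seed int64, amount, length int) []string {
	rng := rand.New(rand.NewSource(seed))
	tokens := make([]string, amount)
	for i := range tokens {
		tokens[i] = randomToken(rng, length)
	}
	return tokens
}

func main() {
	// Write one token per line through a buffered writer.
	w := bufio.NewWriter(os.Stdout)
	defer w.Flush()
	for _, tok := range generateTokens(1, 5, 7) {
		fmt.Fprintln(w, tok)
	}
}
```

With 26 letters and 7 positions there are 26^7 (about 8 billion) possible tokens, so duplicates are rare in a file of 1M tokens but still expected to show up.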
- amount: the number of tokens to be generated (default: 1000000)
- length: the length of the generated token (default: 7)
- path: the file location to save the generated tokens (default: storage/tokens)
The consumer is where the tokens are read and inserted into the database. Since finding duplicates was also part of the problem, I decided to use an auxiliary data structure: all the entries are read from the file and saved into a hash map that maps each token to its total, the number of times that token appears.
It is also here that the frequency of non-unique tokens is printed out, concurrently with saving the data into the database.
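The counting step could look roughly like the sketch below. It reads one token per line and increments a map entry for each; the names countTokens and countString are hypothetical, and the in-memory input stands in for the tokens file.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countTokens reads one token per line from r and returns a map from
// token to total, the number of times that token appears.
func countTokens(r io.Reader) (map[string]int, error) {
	counts := make(map[string]int)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		tok := sc.Text()
		if tok == "" {
			continue // ignore blank lines
		}
		counts[tok]++
	}
	return counts, sc.Err()
}

// countString is a small convenience wrapper for in-memory input.
func countString(s string) (map[string]int, error) {
	return countTokens(strings.NewReader(s))
}

func main() {
	counts, err := countString("abcdefg\nzzzzzzz\nabcdefg\n")
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	for tok, total := range counts {
		fmt.Println(tok, total)
	}
}
```

Any token whose total is greater than 1 is a duplicate, so a single pass over the file is enough to find them all.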
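One way to picture the concurrent report is the sketch below: a goroutine prints the non-unique tokens while the main goroutine does the saving. The function names are hypothetical, and saveAll is only a placeholder for the real database work.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// duplicateReport returns one "token: total" line for every token that
// appears more than once, sorted by token.
func duplicateReport(counts map[string]int) []string {
	var lines []string
	for tok, total := range counts {
		if total > 1 {
			lines = append(lines, fmt.Sprintf("%s: %d", tok, total))
		}
	}
	sort.Strings(lines)
	return lines
}

// saveAll is a placeholder for the database insert (an assumption for
// this sketch); it only reports how many distinct tokens it would save.
func saveAll(counts map[string]int) int {
	return len(counts)
}

func main() {
	counts := map[string]int{"abcdefg": 2, "zzzzzzz": 1, "qwertyu": 3}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		// The report runs in its own goroutine.
		defer wg.Done()
		for _, line := range duplicateReport(counts) {
			fmt.Println(line)
		}
	}()

	// Saving happens while the report is being printed. Both sides only
	// read the map, so no locking is needed.
	saved := saveAll(counts)

	wg.Wait()
	fmt.Println("saved", saved)
}
```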
For the consumer, the available parameters were added to make it easier to test different configurations for batch size and number of workers. The default values here worked for my hardware, but might not be the best set for other machines.
- batch: the number of tokens to be present in a single insert (default: 100)
- workers: the number of goroutines used to access the database (default: 30)
- path: the file location to read the generated tokens from (default: storage/tokens)
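To show how batch and workers interact, here is a minimal sketch of a worker pool that consumes batches from a channel. It is not the project's actual code: chunk and insertAll are hypothetical names, and the insert callback stands in for a single multi-row INSERT against the database.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// chunk splits tokens into batches of at most size elements.
func chunk(tokens []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var batches [][]string
	for len(tokens) > 0 {
		n := size
		if len(tokens) < n {
			n = len(tokens)
		}
		batches = append(batches, tokens[:n])
		tokens = tokens[n:]
	}
	return batches
}

// insertAll fans the batches out to a fixed number of worker goroutines.
// It returns one of the errors reported by insert, or nil if there were none.
func insertAll(tokens []string, batchSize, workers int, insert func([]string) error) error {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan []string)
	errs := make(chan error, 1)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range jobs {
				if err := insert(batch); err != nil {
					select {
					case errs <- err: // keep the first error
					default: // an error is already recorded
					}
				}
			}
		}()
	}

	for _, batch := range chunk(tokens, batchSize) {
		jobs <- batch
	}
	close(jobs)
	wg.Wait()
	close(errs)
	return <-errs // nil when no worker reported an error
}

func main() {
	tokens := []string{"aaaaaaa", "bbbbbbb", "ccccccc", "ddddddd", "eeeeeee"}
	var inserted int64
	err := insertAll(tokens, 2, 3, func(batch []string) error {
		// A real implementation would run one INSERT per batch here.
		atomic.AddInt64(&inserted, int64(len(batch)))
		return nil
	})
	fmt.Println(inserted, err)
}
```

Larger batches mean fewer round trips to the database, while more workers mean more concurrent connections, which is why the best values depend on the machine and the database setup.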
Run the following commands in order to see the project working.
- Prepare the database:
make db
- Generate the tokens (and save to file):
make produce
- Consume the tokens (from file, save to db):
make consume
There's also a shortcut to run the tests: make test.
I tried my best to implement everything as asked, and I hope I did. The database access is done using workers, the inserts are batched, the report generation runs in a separate goroutine, a hash map is used to find the duplicates (in memory), and the tokens are generated and written to a file in chunks. I also tried changing a few configuration parameters to speed up the communication with Postgres.