IMT Digest is a utility that downloads a file of arbitrary size from an internet source, applies the IMT hash to its contents, and saves the hash to a file.
It includes a throttle feature that limits the download speed so that it doesn't consume all the available bandwidth.
$ go get github.com/FcoManueel/imtdigest
The simplest way to use it is to provide an output file path and a URL:
$ imtdigest -file "/tmp/imt-hash.txt" -url "https://www.google.com/"
If you want the download to be throttled, provide the -rate
flag followed by a number of bytes per second.
The --help output describes all the available flags:
$ imtdigest --help
Usage of imtdigest:
  -file string
        The path of the file where the output will be saved. (required)
  -rate int
        Limits the max download rate. Units are in bytes/second. (optional)
  -url string
        The URL from which the data will be fetched. (required)
For the throttling feature I decided to go with a token bucket algorithm.
There's nothing in the standard library for that, but the Go team does maintain an implementation of it.
Although my aim was to use as few external packages as possible, I considered it fair to include that one given its nature and origin.
The throttling I applied works only at the application level. Other factors can affect the effective transfer rate (e.g. data can still be transferred while there's space in the OS receive buffer for the given TCP connection, even if the application thread consuming the data is sleeping).
The throttling might be good enough (for the purpose of the exercise and the allotted time, anyway) since we are likely dealing with sufficiently large files, but there's definitely room for improvement.
The current implementation is especially bad for small file sizes, and if I were to start again I would go with a custom/simpler implementation of the token bucket.
I added this project to my public GitHub for ease of installation and evaluation, but let me know if you would prefer that I take it down or make it private.
This is fairly irrelevant to the intent of the exercise, but I found it interesting enough to share. The logic provided for the hashing function specifies that when i=0, h[i-1] should be considered to be 0. This, however, might be an issue for a hashing function, since it causes every 0 byte to flush the hash (see hash_test.go for examples of this).