It's an online shop where you can buy bundles of digital goods, like video games, eBooks, audiobooks, and comics, for a price that you set yourself! What you pay is split into three portions, one going towards charity, one to pay developers or authors, and one as a tip for Humble Bundle itself. You can even set rations between those portions, how much of your money each party receives! More details: https://www.humblebundle.com/
Some bundles are large! I mean, laaarge! If a bundle contains 15 games in versions for Windows, Linux and Mac, it's already 45 files. And very often there are additional patches, soundtracks, extra packs and bonuses. It's not unusual for a comics bundle to contain over 100 files you can download.
Now, you probably don't need all those files, you can just cherry-pick the platform and the format you are most interested in and download just them. If that's the case then you probably don't need a 'downloader' :-)
For others who like to have files they bought on their disk, be it for easier access or as a backup, a 'downloader' is a piece of software that automates the downloading of those files.
Yes, it's a Humble Bundle downloader. It's an application written in Erlang that given a Humble Bundle key downloads files contained in that bundle. But not only. It remembers MD5/SHA1 checksums of all downloaded files so that it doesn't download files that have been downloaded previously. Oh yes, and also allows to define regex patterns to skip particular files from the download if, for example, you are not interested in 32-bit Linux packages or games with German user interface.
Hmmm. You mean, why I wrote it or why you should use it? :-) Not that there is a fierce competition, I just couldn't find anything that would remotely have the features I needed. Then I thought that I will spend less time writing it than downloading all the files in all the bundles that I have bought so far and will buy in the future.
But let's see what Humbundee is capable of:
Humbundee will try to download a file using qbittorrent-nox
first and will fallback to wget
if there was an error when processing the torrent file. This is mainly to avoid exploiting Humble Bundle servers if the file can be downloaded from somewhere else. And I chose qbittorrent because that was the only bittorrent client I could find that worked from command line and was properly supporting web seeds.
All downloaded files are checked if they are of the expected size and if their MD5/SHA1 checksums match those advertised on the bundle page (actually, SHA1 checksums are not advertised but they are available in the metadata describing the bundle).
Humbundee maintains a comprehensive log per each downloaded bundle. Not only it logs all files and torrents downloaded in the bundle, but also records all encountered problems, and at the end adds a single line of the download summary. See for yourself:
2016-02-06 20:29:45.731145> Finished all downloads, Expected: 45, Downloaded: 30, Ignored: 0, Excluded: 15, Errors: 0, Status: ok, closing log.
You may not want to download installation packages for Mac OS X or Linux 32-bit if you don't own those platforms. Or you may not speak German and don't want games featuring that language. Humbundee allows you to define a list of regex patterns to exclude downloads if the pattern is present in a name, URL, or platform associated with a particular download.
Sometimes the same game is included in more than one bundle. You are smart enough to not download again games you already downloaded from other bundles (if you remember them). A not-so-smart downloader may try to download everything you throw at it. But not Humbundee. It remembers all downloaded files, not by their names but sizes and MD5/SHA1 checksums, so it knows when a file has been downloaded previously even if it has a different name.
If a bundle contains a few dozens of files you don't want to end up with all of them in a single folder. Humbundee automatically organizes all downloaded files into a hierarchy of folders under the main folder named after the bundle key. See the configuration section below for more details.
By this I mean that at this moment Humbundee probably won't work on Windows, and certainly it won't compile on Windows. It may work in a Unix-like environment on Windows (cygwin, MSys2) but that hasn't been tested.
Installed in version that supports maps
. I developed Humbundee with Erlang 18 but 17 should be fine (albeit I didn't check that). Check if it's installed and in which version with:
erl -eval 'erlang:display(erlang:system_info(otp_release)), halt().' -noshell
It's a standard on Linux and available to be installed on other platforms. On *BSD check with:
which gmake
It's needed to clone the repository to your disk and also to fetch the build scripts (done automatically). At least version 1.6.x
would be needed.
Available to be installed from a package on most systems. Check if it's installed and available in the path with:
qbittorrent-nox --help
Also widely available to be installed from a package. Check if it's installed and available in the path with:
wget --help
You need to be logged in to Humble Bundle in order to download files from your bundle. But currently Humbundee doesn't allow you to log in. Instead, it expects you to log in using a normal browser and export a cookie which it can then use use to access the part of the website that only you are authorized to use. The cookie must be in a format compatible with wget
. I use this extension to export the cookie from Chrome.
When you ask it to download a bundle, Humbundee downloads a JSON
file that describes the bundle and starts a new Erlang gen_server
called hbd_id
to process the file. Humbundee can download multiple bundles simultaneously. Each hbd_id
spawns one Erlang process per bundle position, which is usually a game, eBook or soundtrack. Then each of those processes spawns one new Erlang process per file that can be downloaded. There are usually multiple files per bundle position, e.g. Linux, Windows and Mac versions of a game, or EPUB, PDF and CBZ versions of an eBook.
So, at this moment we have one Erlang process per each file that can be downloaded. For simplicity lets focus on what a single process does.
Firstly, it checks if the file can be downloaded at all. Some files, like streams or embedded JavaScript applications can't be downloaded. And so it dies if the file can't be downloaded. Secondly, it checks if the file has already been downloaded, and it dies if it has. Thirdly, it checks if the file should be excluded, i.e. if any of the regex patterns match, and it dies if yes.
If all is good the process tries to download the torrent file with wget
. If the torrent file doesn't exist, or can't be downloaded, or is incorrect, or can't be parsed, then the process discards information about the torrent file and the file will be downloaded as if the torrent didn't exist.
Please note that up to this moment all processing was done in parallel, including downloading torrent files with wget
. Whereas downloading dozens of short torrent files simultaneously isn't a big deal, downloading so many actual files very likely would be a big deal. Therefore the actual downloading is managed by a separate gen_server
called hbd_pool
. It cooperates with qbittorrent-nox
to download files with torrents and uses wget
to download files without torrents.
Each process that hasn't died so far adds its download to hbd_pool
and waits until the download is completed. The download isn't started immediatelly but hbd_pool
adds them to a queue and starts when the amount of active downloads is less than the specified threshold. Active downloads are those either added to qbittorrent-nox
or started with wget
.
This part is a bit hacky because qbittorrent-nox
doesn't provide any API to control the downloads. Torrent downloads are added to qbittorrent-nox
, which is configured to keep all pending downloads in the temp
folder and all finished downloads in the downloads
folder. The temp
folder is also used by hbd_pool
to store the temporary versions of files that are being downloaded by wget
. Every two seconds hbd_pool
performs three important checks:
- List completed downloads in the
downloads
folder. Every completed download is moved to the destination folder constructed from the download properties, i.e.:<key>/<publisher>/<title> - <platform>
. - Start new downloads from the queue up to the maximum allowed amount of active downloads.
- List pending downloads in the
temp
folder if the map with active downloads isn't yet emptied. Thetemp
folder will be empty if allwget
downloads have finished and ifqbittorrent-nox
isn't actively downloading any files. This may happen if all seeds mentioned in the torrent file are inactive. In that casehbd_pool
considers all active downloads older than 10 seconds as stale and starts downloading them withwget
.
Lets reiterate the last point. When the temp
folder is empty but the map containing all active downloads isn't yet empty it means that some files have been added to qbittorrent-nox
but are not being actively downloaded. If those files have been added to qbittorrent-nox
more than 10 seconds ago we can safely assume that they won't be started in the foreseeable future and treat them as stale - i.e. try to download them with wget
instead. This probably needs an additional explanation. The thing is that each download always contains an URL to download the file with HTTP/S but the torrent file is optional. Additionally, the web seed in the torrent file is not the same as the URL to download the file with HTTP/S. The former doesn't always work but the later should always work. That's why downloading the stale file with wget
from the HTTP/S URL is usually a safe fallback.
This brings us to yet another quite important point. Sometimes the torrent downloads are tad slow or they stop and don't seem to be progressing anymore. In that case just delete the download from qbittorrent-nox
along with the temporary file from the temp
folder (when deleting the download from the list qbittorrent-nox
will show you a checkbox asking if the file should be also deleted). Once temp
becomes empty all unfinished downloads will be detected as stale and hbd_pool
will start downloading them with wget
.
Now lets come back to the processes started by hbd_id
which are still waiting for the downloads to be completed. For each file that hbd_pool
moves from the downloads
folder to the destination folder it also sends back a response to the waiting process which initially added the file to hbd_pool
. Each hbd_id
remembers how many processes have been started to download files in the given bundle. The bundle download is finished when all those processes die after receiving either an OK that the file has been downloaded properly or an error. When no child process is left hbd_id
adds a summary to the log file, then the log file is closed and the hbd_id
dies as well.
Each file contained in a bundle and processed by Humbundee is added to an Index. Index is simply a mnesia table containing idx
records. The table is indexed by the bundle key and MD5 and SHA1 checksums of the file. Additionally it stores the file size and available JSON metadata of the downloaded file. You can query entries in the table with humbundee:read/1
and delete with humbundee:delete/1
.
All downloaded files are automatically added to the index. However, it's also possible to index files contained in a bundle without downloading them. This will be useful if you have already downloaded a bundle manually but still want to add it to the index so that Humbundee doesn't download its files again in case they are also present in another bundle. For that you can use humbundee:index/1
(see the Available commands section below).
This can be also used for testing purposes, i.e. to verify that exclude patterns work as expected. For that you could employ the following steps:
- Configure Humbundee (add desired exclude patterns, configure folders, etc).
- Start Humbundee
- Index the bundle with
humbundee:index/1
- Check download logs for any errors and also to verify if the desired files have been excluded
- Delete the download folder from the
destination_dir
folder - Delete the bundle from the index with
humbundee:delete/1
and repeat
Please note that hbd_idx
only cares about duplicate files and is not trying to index all downloaded files. Therefore, if the same file with the same MD5/SHA1 checksum is contained in multiple bundles with possibly different names, titles, publishers and bundle Id's, only one version of that file will be stored in the index with one specific set of metadata information.
The main, default configuration is in the humbundee/etc/sys.config.src file.
When Humbundee starts it additionally tries to read file .humbundee.conf
from the user's home directory, i.e.: ~/.humbundee.conf
. Then for each option found in the user configuration file it replaces the default value with the user defined one. In other words, user options take precedence but if they are not defined then the defaults are used.
%% -*- erlang -*-
{cookie, <<"/home/user/Downloads/cookies.txt">>}.
{download_dir, <<"/home/user/archive/downloads">>}.
{temp_dir, <<"/home/user/archive/temp">>}.
{destination_dir, <<"/home/user/archive/humblebundle">>}.
{exclude_regex_list,
[
{name, <<"^MOBI$">>, [caseless]},
{name, <<"^32-bit">>},
{name, <<"32-bit$">>},
{name, <<".rpm$">>},
{name, <<"^720p$">>},
{name, <<"^German$">>, [caseless]},
{platform, <<"ebook">>}
]}.
{workers, 20}.
The meaning of these options is as follows:
Location of the cookie file
Folder where qbittorrent-nox
stores completed downloads. Humbundee monitors this folder and moves completed files to an appropriate subfolder in the destination_dir
.
Folder where qbittorrent-nox
and wget
store incomplete downloads.
Amount of downloads that can be active at the same time (the sum of downloads that hbd_pool
adds to qbittorrent-nox
or starts with wget
can't be bigger that this threshold).
A parent folder for completed bundles. All files in completed bundles are stored in subfolders named according to the following pattern:
Key
.etr, .json, .log
*_torrents
*_data
*- Publisher
*- Title - platform
Where:
Key
is the bundle key, an alphanumeric set of characters associated with each bundle..etr
,.json
,.log
are files produced when downloading the bundle._torrents
contains all downloaded torrent files._data
is the parent folder for all downloaded files.Publisher
is usually the company behind the digital goods, games, comics, etc.Title
is the title of the digital item, a game title, a book title, etc.Platform
is one of android, linux, windows, comics, comedy, ebook, audio, etc.
List of tuples {Match, Pattern}
or {Match, Pattern, Options}
where:
Match
- one ofname
,link
,platform
andany
.Pattern
- theRegexp
variable accepted by re:compile/2Options
- list of options accepted by either re:compile/2 or re:run/4
Humbundee tries to match Pattern
against the content of specific fields in the JSON file according to Match
and excludes the file from the download if the match is successful. The following fields in the Humble Bundle JSON file are matched when the Match
is:
name
- fields: folder (payee, usually the publisher), title (human_name, usually the position name on the bundle page), machname (a shortened concatenation of the title and platform), name (text on the download button)link
- fields: web and bittorrent urlsplatform
- field: platformany
- all above, file is excluded if any of those fields match
The Options
list may contain options accepted by both, re:compile/2 and re:run/4. Humbundee uses those accepted by re:compile/2 when compiling the regex and filters them out so that when re:run/4 is later called to check the match only options accepted by re:run/4 are passed to it.
qbittorrent-nox -d
On the Downloads tab edit Save files to location:. It must have the same value as download_dir
configured for Humbundee in the previous section.
On the same Downloads tab edit Keep incomplete torrents in:. It must have the same value as temp_dir
configured for Humbundee in the previous section.
It may be a good idea to tick the Append .!qB extension to incomplete files check box to distinguish files downloaded with qbittorrent-nox
from files downloaded by hbd_pool
with wget
- they will have the extension .hbd!
.
Other options, especially those on Connection and Speed tabs can be configured to match your network link speed. They don't affect Humbundee in any way.
Install Erlang, GNU make, git, qbittorrent-nox, wget.
Log in to Humble Bundle and download the cookie in a format compatible with wget
. This extension can be used to export the cookie from Chrome.
Instead of updating the default configuration humbundee/etc/sys.config.src it's easier to create a new file .humbundee.conf
in the user's home directory, i.e.: vi ~/.humbundee.conf
.
See the section All about configuration above for details.
See the section All about configuration above for details.
Note1: use gmake
(GNU make) instead of make
on *BSD systems.
Note2: make dev
must be issued after make get-deps
has completed, make get-deps dev
won't work.
git clone https://github.com/amiramix/humbundee.git
cd humbundee/
make get-deps
make dev
./bin/init.esh
./bin/start.esh
to_erl ../hbd/shell/
Humbundee is managed from Erlang shell by calling functions exposed in the humbundee.erl module. See the next section for details.
q().
This is to be issued in the Erlang shell of course. Alternatively, in another console do in the Humbundee root folder:
./bin/stop.esh
The following commands can be issued in the Erlang shell to start and control downloads. All commands accept either strings or binaries.
humbundee:download(Key) -> ok | {error, Error}
Types:
Key = binary() | string()
Error = any()
Starts downloading the bundle with the given key. The function spawns a new process to handle the download and returns immediately. You can start downloading another bundle at any time, all downloads are handled in parallel. Please use humbundee:status/1
to get information about the status of downloading a particular bundle and humbundee:status/0
to get information about the status of the download queue hbd_pool
.
humbundee:index(Key) -> ok | {error, Error}
Types:
Key = binary() | string()
Error = any()
Works almost exactly like humbundee:download/1
, however each download process dies just before the file would be added to the download queue. The primary purpose of this function is to add a bundle to the index without actually downloading files contained in the bundle. This is useful if you have already downloaded a bundle but still want the files contained in the bundle to be indexed so that Humbundee doesn't try to download those files if they are distributed again in another bundle. Files indexed in a particular bundle can be queried using humbundee:read/1
and deleted from the index using humbundee:delete/1
.
humbundee:status() -> [{downloading, Keys}, {bad, Bad}, {waiting, Waiting}, {wget, Pids}]
Returns status of the download queue in hbd_pool
.
Where:
Keys
- List of active downloads - files currently being downloaded.Bad
- List of files downloaded with an error. Those files are not moved to the destination folder but left in thedownloads
folder for further manual inspection.Waiting
- Amount of downloads waiting in the queue - files not yet started downloading.Pids
- List of processes currently downloading files withwget
.
humbundee:status(Key) -> [{id, Id}, {cookie, Cookie}, {out, Out}, {count, Count},
{stats, {OK, Ignored, Excluded, Error}}, {pids, Pids}]
Types:
Key = binary() | string()
Returns the status of downloading a bundle with the specified key.
Where:
Id
- The Key of the bundleCookie
- Location of the cookie fileOut
- Root of the download directory for files in this bundleCount
- Amount of expected downloads for this bundleOK
- Amount of files downloaded successfully so farIgnored
- Amount of ignored files so farExcluded
- Amount of excluded files so farError
- Amount of downloads completed with an error so farPids
- Remaining processes that are either still processing downloads or which added their downloads tohbd_pool
and are waiting for the downloads to be completed
humbundee:read(Id) -> [#idx{}]
Types:
Id = binary() | string()
Returns list of idx records for which the provided Id matched one of the following: bundle key, file MD5 checksum, file SHA1 checksum.
The function automatically recognizes if the provided Id is a checksum or a bundle key and acts appropriately. For MD5/SHA1 it will usually return a list with one item - the file for which the MD5/SHA1 matches. Unless there is no such file, in which case it will return an empty list. For bundle keys it will return a list with all items belonging to that bundle.
humbundee:delete(Id) -> [Result]
Types:
Id = binary() | string()
Works exactly like humbundee:read/1
but instead of returning the read records deletes them and returns a list containing results of deleting those records - one result per deleted record.
humbundee:index(<<"U6tNd3hXZUT8uCg2">>).
humbundee:delete("U6tNd3hXZUT8uCg2").
humbundee:download("U6tNd3hXZUT8uCg2").
humbundee:status(<<"U6tNd3hXZUT8uCg2">>).
humbundee:status().
humbundee:read("U6tNd3hXZUT8uCg2").
humbundee:read("8b6f1882f93f3c5009900e09759fbcb3e8f04deb").
humbundee:delete(<<"5029763c40d31842637a5e8f4165a245">>).
Where:
U6tNd3hXZUT8uCg2 - bundle key
8b6f1882f93f3c5009900e09759fbcb3e8f04deb - SHA1 checksum
5029763c40d31842637a5e8f4165a245 - MD5 checksum
Command ./bin/init.esh
installs Humbundee into the folder ../hbd/
. So, the folder containing Humbundee sources and hbd
become siblings. The hbd
folder is called node root. It contains files produced by the node when it starts and downloads bundles, and also will contain the downloaded files if you haven't changed the default configuration. One of the folders that will be useful for troubleshooting is hbd/log
. Especially yolf.log
will contain the starting configuration and any errors encountered when starting the node. Crashes will also be logged to lager
logs. All messages printed by the Erlang VM during the booting process will be logged to erlang.log.*
logs.