📃 A way to easily find files that exist within a Duplicati backup
When Duplicati creates a backup, it creates an index file called `filelist.json`. This file can often be gigabytes in size, making it very hard to search for files within the backup.
This project creates an index file that is on average 40x smaller than `filelist.json` and is easily searchable with the program.
The program uses ijson to stream the massive JSON index and creates a "MARISA trie" of the paths found. Since the items are stored in the "trie" (as seen below), they can be searched by prefix very quickly.
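To illustrate why prefix lookups on a trie are quick, here is a minimal sketch using a plain dictionary-based trie. This is not the MARISA implementation the project uses, and `PathTrie`, `add`, and `with_prefix` are hypothetical names chosen for this example. The point is only that a lookup walks one node per character of the prefix and then collects whatever sits beneath that node, without scanning unrelated paths.

```python
# Illustrative only: a plain dict-based trie, NOT the MARISA trie used by the project.
class PathTrie:
    def __init__(self):
        self.root = {}

    def add(self, path):
        """Insert a path one character at a time."""
        node = self.root
        for ch in path:
            node = node.setdefault(ch, {})
        node[""] = True  # the empty-string key marks the end of a stored path

    def with_prefix(self, prefix):
        """Return every stored path that starts with `prefix`, sorted."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []  # no stored path shares this prefix
            node = node[ch]
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            for key, child in current.items():
                if key == "":
                    results.append(text)
                else:
                    stack.append((child, text + key))
        return sorted(results)


trie = PathTrie()
for p in ["/home/a/notes.txt", "/home/a/photo.jpg", "/home/b/notes.txt"]:
    trie.add(p)

print(trie.with_prefix("/home/a/"))
# → ['/home/a/notes.txt', '/home/a/photo.jpg']
```

A MARISA trie offers the same kind of prefix query while storing the paths far more compactly than nested dictionaries would.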
- Install Python 3 and Poetry
- Run `poetry install` in the project root directory
- Retrieve the `filelist.json` file from the backup's `*.dlist.zip` archive
- To create an index:
  - Run `poetry run python -m jsonpy create [input] [output]`
  - If you run this command without arguments it will, by default, ingest a file called `filelist.json` and spit out `index.marisa.gz`
- To search an existing index:
  - Run `poetry run python -m jsonpy search [input] [search term]`
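The retrieval step above can also be scripted. The following is a minimal sketch using Python's standard `zipfile` module. It assumes the `*.dlist.zip` archive is an ordinary, unencrypted zip with `filelist.json` at its top level. The helper name `extract_filelist` and the archive name in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_filelist(dlist_zip, dest_dir="."):
    """Extract filelist.json from a dlist archive and return the extracted path.

    Assumes an unencrypted zip with filelist.json at the top level.
    """
    with zipfile.ZipFile(dlist_zip) as archive:
        # ZipFile.extract raises KeyError if the member is not in the archive.
        extracted = archive.extract("filelist.json", path=dest_dir)
    return Path(extracted)


# Example call (hypothetical archive name):
# extract_filelist("my-backup.dlist.zip")
```

The extracted file can then be passed as the `[input]` argument of the `create` command.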
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.