
magnetico2bitmagnet processes a magnetico SQLite database, extracts its data and prints it in a JSON format supported by bitmagnet.

Primary language: Python

magnetico2bitmagnet

This repository contains three experimental scripts for importing data into bitmagnet.

The first script extracts data from a .sqlite3 database generated by magnetico and exports it in a format supported by bitmagnet's /import endpoint.
The data can be exported to one or more files. You can then import it into bitmagnet by using cat to pipe the file(s) into curl, which sends them to the bitmagnet /import endpoint.
Alternatively, the output can be piped directly into curl without first exporting it to files.
Note: File information cannot be imported.
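The export step can be sketched roughly as follows. This is a minimal illustration, not the script's actual code, and it rests on two assumptions. The first is magnetico's usual schema: a torrents table with info_hash (BLOB), name, total_size and discovered_on (Unix timestamp) columns. The second is that bitmagnet's import format is one JSON object per line, with infoHash, name, size, publishedAt and source fields. Check both against your own database and bitmagnet version. The function names and the "magnetico" source label are hypothetical. Names are read as raw bytes and decoded with errors="replace" as a stand-in for the scripts' own fallback decoding.

```python
import json
import sqlite3
import sys
from datetime import datetime, timezone
from itertools import islice


def iter_magnetico_records(db_path, source="magnetico"):
    """Yield one import record (a dict) per row of magnetico's torrents table.

    Assumed schema: torrents(info_hash BLOB, name TEXT, total_size INTEGER, discovered_on INTEGER).
    Assumed bitmagnet fields: infoHash, name, size, publishedAt, source.
    """
    conn = sqlite3.connect(db_path)
    # Return TEXT columns as raw bytes so badly encoded names don't raise inside sqlite3
    conn.text_factory = bytes
    try:
        rows = conn.execute(
            "SELECT info_hash, name, total_size, discovered_on FROM torrents"
        )
        for info_hash, name, total_size, discovered_on in rows:
            yield {
                "infoHash": bytes(info_hash).hex(),
                # Placeholder for a smarter decoder such as decode_with_fallback
                "name": name.decode("utf-8", errors="replace"),
                "size": total_size,
                "publishedAt": datetime.fromtimestamp(
                    discovered_on, tz=timezone.utc
                ).isoformat(),
                "source": source,
            }
    finally:
        conn.close()


def write_json_lines(records, out=None):
    """Write one JSON object per line (to stdout by default, so it can be piped into curl)."""
    out = out if out is not None else sys.stdout
    for record in records:
        out.write(json.dumps(record, ensure_ascii=False) + "\n")


def write_chunks(records, prefix, chunk_size=100_000):
    """Split records over prefix_0000.jsonl, prefix_0001.jsonl, ... and return the paths."""
    records = iter(records)
    paths = []
    while True:
        chunk = list(islice(records, chunk_size))
        if not chunk:
            break
        path = f"{prefix}_{len(paths):04d}.jsonl"
        with open(path, "w", encoding="utf-8") as fh:
            write_json_lines(chunk, fh)
        paths.append(path)
    return paths
```

Calling write_json_lines(iter_magnetico_records("magnetico.sqlite3")) from a small script streams the records to stdout for direct piping. write_chunks covers the multi-file variant. The generator keeps memory use flat even for large databases.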
Usage examples are available here.

The second script searches a specified directory (optionally recursively) for .torrent files and exports their data in a format supported by bitmagnet's /import endpoint.
The data can be exported to one or more files. You can then import it into bitmagnet by using cat to pipe the file(s) into curl, which sends them to the bitmagnet /import endpoint.
Alternatively, the output can be piped directly into curl without first exporting it to files.
Note: File information cannot be imported.
Usage examples are available here.
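The core of reading a .torrent file is bencode parsing and computing the info hash. A self-contained sketch of both follows. It is not this repository's implementation. It follows the BitTorrent v1 rule that the info hash is the SHA-1 of the info dictionary exactly as it appears in the file, which is why the parser records the raw byte span instead of re-encoding the dictionary. The output field names (infoHash, name, size, source) are assumed from bitmagnet's import format. The "torrent-files" source label and all function names are hypothetical.

```python
import hashlib
from pathlib import Path


def _decode(data, i):
    """Decode the bencoded value starting at data[i]; return (value, index after it)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<number>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":  # list: l<values>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":  # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():  # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        return data[start:end], end
    raise ValueError(f"invalid bencode at offset {i}")


def parse_torrent(data):
    """Return (info_hash_hex, info_dict) for the raw bytes of a .torrent file."""
    if data[:1] != b"d":
        raise ValueError("a .torrent file must be a bencoded dictionary")
    i, info, info_span = 1, None, None
    while data[i:i + 1] != b"e":
        key, i = _decode(data, i)
        start = i
        value, i = _decode(data, i)
        if key == b"info":
            info, info_span = value, (start, i)
    if info_span is None:
        raise ValueError("missing info dictionary")
    # v1 info hash: SHA-1 over the original bytes of the info dictionary
    return hashlib.sha1(data[info_span[0]:info_span[1]]).hexdigest(), info


def torrent_to_record(data, source="torrent-files"):
    """Build an import record (assumed bitmagnet field names) from .torrent bytes."""
    info_hash, info = parse_torrent(data)
    if b"files" in info:  # multi-file torrent: sum the individual file lengths
        size = sum(f[b"length"] for f in info[b"files"])
    else:  # single-file torrent
        size = info[b"length"]
    return {
        "infoHash": info_hash,
        "name": info[b"name"].decode("utf-8", errors="replace"),
        "size": size,
        "source": source,
    }


def find_torrent_files(directory, recursive=False):
    """List .torrent files in a directory, descending into subdirectories if recursive."""
    root = Path(directory)
    return sorted(root.rglob("*.torrent") if recursive else root.glob("*.torrent"))
```

For each path returned by find_torrent_files, torrent_to_record(path.read_bytes()) yields one record. The records can be written out one JSON object per line in the same way as the magnetico export.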

The third script searches a specified directory (optionally recursively) for .torrent files and exports the data required for direct insertion into the PostgreSQL database used by bitmagnet.
Unlike the other two scripts, it can import file information.
Usage examples are available here.
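File information comes from the torrent's info dictionary. In the BitTorrent v1 layout, a single-file torrent has name and length keys. A multi-file torrent has a files list whose entries hold a length and a path given as a list of components. The sketch below walks both cases and works on an already-decoded info dictionary with bytes keys, for example one produced by a bencode parser. The (info_hash, index, path, size) row layout is a hypothetical illustration, not bitmagnet's actual table definition, so check the real schema before inserting anything. Rows like these would typically be passed to a parameterized INSERT through a PostgreSQL driver. v2-only torrents, which use a file tree, are not handled.

```python
def iter_files(info):
    """Yield (index, path, size) for each file in a decoded v1 info dictionary.

    Keys are bytes, as produced by a typical bencode decoder.
    """
    if b"files" not in info:
        # Single-file torrent: the name is the file name and length is its size
        yield 0, info[b"name"].decode("utf-8", errors="replace"), info[b"length"]
        return
    for index, entry in enumerate(info[b"files"]):
        # Multi-file torrent: path components are relative to the torrent's root directory
        parts = (p.decode("utf-8", errors="replace") for p in entry[b"path"])
        yield index, "/".join(parts), entry[b"length"]


def file_rows(info_hash, info):
    """Build hypothetical (info_hash, index, path, size) rows for a files table."""
    return [(info_hash, index, path, size) for index, path, size in iter_files(info)]
```

Keeping the traversal separate from the row format makes it easy to adapt the sketch if bitmagnet's file table needs different or additional columns.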

Known Issues

This section lists a few known issues that I haven't been able to fix.

Encoding

Decoding torrent names correctly is really challenging, because a name is stored as raw bytes and can be in any encoding. As a result, names in Asian languages or in Slavic languages written in Cyrillic may be decoded incorrectly.
I've run tests with charset_normalizer, but it didn't yield better results than a manual list of encodings. Maybe I am using it wrong.

It's impossible to give an exact percentage of incorrectly decoded names, but I expect it to be below 1%.
Examples:
Byte array from either a magnetico database or a .torrent file: b'C\xd0\xb8\xd0\xb4\xd0\xaf\xd0\xb4\xd0\xbe\xd0\xbc\xd0\xb0'
Desired result: CидЯдома (hex: 43 438 434 42F 434 43E 43C 430)
Possible result: C邽迡觓迡郋邾訄 (hex: 43 90BD 8FE1 89D3 8FE1 90CB 90BE 8A04)
Possible result: Cﾐｸﾐｴﾐｯﾐｴﾐｾﾐｼﾐｰ (hex: 43 FF90 FF78 FF90 FF74 FF90 FF6F FF90 FF74 FF90 FF7E FF90 FF7C FF90 FF70)

Byte array from either a magnetico database or a .torrent file: b'\xb9D\xa8\xe3\xbdc\xa1@'
Desired result: 道具箱 followed by an ideographic space (hex: 9053 5177 7BB1 3000)
Possible result: ｹDｨ羶c｡@ (hex: FF79 44 FF68 7FB6 63 FF61 40)
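The examples above show why a fixed list of encodings is fragile. Shift_JIS and Big5 accept many byte sequences that were never meant for them, so whichever codec comes first in the list and decodes without an error wins. The sketch below reproduces this with the two byte arrays above. decode_first_match is a hypothetical helper written for illustration, not the scripts' decode_with_fallback, and the candidate orders are arbitrary.

```python
def decode_first_match(raw, candidates):
    """Return (text, encoding) for the first candidate codec that decodes raw without error.

    A codec that merely accepts the bytes wins, even if it is the wrong one.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: keep what we can and mark the rest with U+FFFD
    return raw.decode("utf-8", errors="replace"), "utf-8 (with replacements)"


CYRILLIC = b"C\xd0\xb8\xd0\xb4\xd0\xaf\xd0\xb4\xd0\xbe\xd0\xbc\xd0\xb0"
BIG5_NAME = b"\xb9D\xa8\xe3\xbdc\xa1@"

for order in (["utf-8", "big5", "shift_jis"], ["shift_jis", "utf-8", "big5"]):
    for raw in (CYRILLIC, BIG5_NAME):
        text, encoding = decode_first_match(raw, order)
        # Print the chosen codec and the code points, like the hex listings above
        print(order, encoding, " ".join(f"{ord(ch):X}" for ch in text))
```

With utf-8 first, the Cyrillic bytes decode to CидЯдома. The Big5 bytes fail as UTF-8 (0xB9 cannot start a UTF-8 sequence) and fall through to big5. With shift_jis first, every byte from 0xA1 to 0xDF becomes a halfwidth katakana character, which produces the Cﾐｸﾐｴﾐｯ... result listed above.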

Alternative decode_with_fallback function using charset_normalizer:

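from charset_normalizer import from_bytes

def decode_with_fallback(byte_sequence, preferred_encoding=None):
    # Try the caller's preferred encoding first, if one was given
    if preferred_encoding:
        try:
            return byte_sequence.decode(preferred_encoding)
        except (UnicodeDecodeError, LookupError):
            pass

    # Let charset_normalizer rank only these candidate code pages
    matches = from_bytes(
        byte_sequence,
        cp_isolation=['utf-8', 'shift_jis', 'euc_jp', 'big5', 'gbk', 'gb18030', 'cp1251', 'latin1'],
        threshold=0.2,
        language_threshold=0.1,
        enable_fallback=True
    )

    best = matches.best()
    if best is None:
        # No plausible match (best() returned None): decode as UTF-8 and mark bad bytes
        return byte_sequence.decode('utf-8', errors='replace')
    return str(best)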

Disclaimer

A lot of the code was generated by OpenAI's ChatGPT 4. I can write, read and understand Python myself, but why spend hours figuring out how to build or troubleshoot something when you can just ask an LLM?
I never copy code blindly: I understand and test it before incorporating it into the scripts, and I check other sources to confirm that the solution proposed by ChatGPT is valid.

Acknowledgments

  • Thanks to the developers of magnetico and bitmagnet for their excellent work on making self-hosted DHT crawlers accessible.