Retcon means retroactive continuity. I needed a name that described the app's function and would not conflict with other app names, so I chose retcon.
[Hand waving] Whatever I want. But mostly it is designed to track meta information, files, and remote representations of creative works and personalities. Think of it as the IMDb for niche hobbies.
Install the requirements and initialize the database tables:
# Install Python modules if you need them
python3 -m pip install -r requirements.txt
python3 ./manage.py migrate # Initialize the database
python3 ./manage.py importlang # Create language models from system locales
python3 ./manage.py createsuperuser # Create the administrative user
If you have updated Retcon, run migrate to bring the database up to date:
python3 ./manage.py migrate
To run the server
python3 ./manage.py runserver
Human-friendly interaction is limited. For an interface that is human friendly, if not great, go to
http://localhost:8000/admin
For machine-friendly interaction, see
http://localhost:8000/api
To export a list of users for Hydrus:
http://localhost:8000/api/site/<id>/users.txt
e.g.
http://localhost:8000/api/site/1/users.txt
Models and API are subject to change without notice. I recommend you abstract out any access to them if you write something that interacts with the system.
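Since the models and API may change, one way to follow that advice is to keep every Retcon URL in a single wrapper. The sketch below is hypothetical: RetconClient, its method names, and the assumption that users.txt holds one user per line are illustrations, not part of Retcon. The fetch function is injectable so the wrapper can be tested without a running server.

```python
from typing import Callable, List
from urllib.request import urlopen


def _http_get(url: str) -> str:
    """Fetch a URL and decode the body as UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


class RetconClient:
    """Hypothetical thin wrapper: the only place that knows Retcon's URL layout."""

    def __init__(
        self,
        base_url: str = "http://localhost:8000",
        fetch: Callable[[str], str] = _http_get,
    ) -> None:
        self.base_url = base_url.rstrip("/")
        self._fetch = fetch

    def users_txt_url(self, site_id: int) -> str:
        # Mirrors http://localhost:8000/api/site/<id>/users.txt
        return f"{self.base_url}/api/site/{site_id}/users.txt"

    def site_users(self, site_id: int) -> List[str]:
        # Assumption: one entry per line; blank lines are skipped.
        text = self._fetch(self.users_txt_url(site_id))
        return [line.strip() for line in text.splitlines() if line.strip()]
```

Against a running instance, RetconClient().site_users(1) would fetch http://localhost:8000/api/site/1/users.txt. If the URL layout changes, only users_txt_url needs updating.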
I recommend quite an elaborate URL pattern for capturing usernames from websites. For example, the pattern I use for Twitter is:
^(?:https?:\/\/)?(?:www\.)?(?:twitter\.com\/){1,2}([^\/]+)\/?$
^(?:https?:\/\/)?(?:www\.)?
: Matches the URL with or without the scheme and the www subdomain.
(?:twitter\.com\/){1,2}
: The actual domain and trailing slash, matched one or two times. The second repetition handles bad URL substitutions (e.g. twitter.com/{username} where username=twitter.com/foobar).
([^\/]+)
: The actual capture of the username string.
\/?$
: An optional trailing slash, then the end of the URL.
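To check how the pattern behaves, it can be exercised directly with Python's re module. This is only a sketch: extract_username is a hypothetical helper, not a function in Retcon.

```python
import re
from typing import Optional

# The Twitter capture pattern from above, compiled once.
TWITTER_PATTERN = re.compile(
    r"^(?:https?:\/\/)?(?:www\.)?(?:twitter\.com\/){1,2}([^\/]+)\/?$"
)


def extract_username(url: str) -> Optional[str]:
    """Return the captured username, or None if the URL doesn't match."""
    match = TWITTER_PATTERN.match(url)
    return match.group(1) if match else None
```

For example, extract_username("https://www.twitter.com/twitter.com/foobar") returns "foobar" thanks to the {1,2} repetition, while a URL with extra path segments such as https://twitter.com/foobar/status/123 returns None.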
Unfortunately, this can't be done generically, because some sites use a subdomain as their user pattern.
e.g. username.tumblr.com
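For a subdomain-based site the capture group moves in front of the domain. The pattern below is a hypothetical example for Tumblr written in the same style; it is not taken from Retcon's data. It would also capture www from www.tumblr.com, so a real pattern may need to exclude that.

```python
import re
from typing import Optional

# Hypothetical pattern: the username is the subdomain, not a path segment.
TUMBLR_PATTERN = re.compile(r"^(?:https?:\/\/)?([^.\/]+)\.tumblr\.com\/?$")


def capture_tumblr_username(url: str) -> Optional[str]:
    """Return the subdomain used as the username, or None if no match."""
    match = TUMBLR_PATTERN.match(url)
    return match.group(1) if match else None
```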
By comparison, the substitution pattern is much easier. Simply use a [Python format string](https://www.python.org/dev/peps/pep-3101/#format-strings) in which the username is substituted for {}.
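As an illustration, substitution is a single call to str.format. The template strings below are hypothetical examples for the two sites mentioned above, not records shipped with Retcon.

```python
# Hypothetical substitution patterns: {} is replaced by the username.
TWITTER_SUBSTITUTION = "https://twitter.com/{}"
TUMBLR_SUBSTITUTION = "https://{}.tumblr.com"


def build_url(substitution: str, username: str) -> str:
    """Fill the {} placeholder in a substitution pattern with the username."""
    return substitution.format(username)
```

Note that build_url(TWITTER_SUBSTITUTION, "twitter.com/foobar") produces https://twitter.com/twitter.com/foobar, which is the kind of bad substitution the {1,2} repetition in the Twitter capture pattern tolerates.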
In many instances, company and website records match up almost 1 to 1, seemingly duplicating information. This is just a side effect of many companies having a web presence. When the archived work is from a company which became defunct before the web, it makes sense for the two to be separate.
Retcon uses image sequences to describe all kinds of visual files. Once decoded, there is functionally no difference between a video, an animation, or an image. A video and an animated GIF are both sequences of image frames, and a still image is simply an image sequence of length 1. Thus any algorithm which can be applied to one such sequence can also be applied to all of them. By abstracting every kind of image source into an image sequence, Retcon can remain mostly unconcerned with the storage format for metadata purposes.
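A minimal sketch of the idea, assuming frames are already decoded into grids of grayscale pixel values. ImageSequence, from_still, and per_frame_brightness are hypothetical names, not Retcon's models, and mean brightness stands in for whatever per-frame algorithm you actually want to run.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical frame representation: rows of grayscale pixel values (0-255).
Frame = List[List[int]]


@dataclass
class ImageSequence:
    """Decoded frames, independent of the original storage format."""
    frames: List[Frame]

    @classmethod
    def from_still(cls, frame: Frame) -> "ImageSequence":
        # A still image is just a sequence of length 1.
        return cls(frames=[frame])

    def __len__(self) -> int:
        return len(self.frames)


def mean_brightness(frame: Frame) -> float:
    """Average pixel value of a single frame."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def per_frame_brightness(seq: ImageSequence) -> List[float]:
    # The same code path serves stills, GIFs and videos alike.
    return [mean_brightness(f) for f in seq.frames]
```

Because a still image is wrapped as a one-frame sequence, callers never need a separate code path for it.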
==File management is a work in progress. DO NOT USE IN PRODUCTION==
File management tools that deal with named files take a prefix. This portion of the path is omitted for the purposes of storage, which allows you to relocate paths.
For example, if the prefix is /Volumes/ and you have a file with the path /Volumes/a/b/c, then the path a/b/c will be stored.
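The prefix handling can be sketched with pathlib. to_stored_path and to_full_path are hypothetical helpers, and PurePosixPath assumes POSIX-style paths like the example above.

```python
from pathlib import PurePosixPath


def to_stored_path(prefix: str, full_path: str) -> str:
    """Drop the prefix so only the relocatable part of the path is stored.

    Raises ValueError if full_path does not start with prefix.
    """
    return str(PurePosixPath(full_path).relative_to(PurePosixPath(prefix)))


def to_full_path(prefix: str, stored_path: str) -> str:
    """Rebuild a usable path by joining a (possibly new) prefix with the stored path."""
    return str(PurePosixPath(prefix) / stored_path)
```

With this split, moving a drive from /Volumes/ to /mnt/backup/ only changes the prefix you pass in; the stored a/b/c stays the same.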
Management commands which should currently work (run them as python3 ./manage.py <command>):
scanpath <prefix> <root>
: Catalogues files under root recursively.
hashpaths <prefix>
: Calculates file hashes for catalogued files.
identifyfileMIME <prefix>
: Indexes the file types of catalogued files.
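A rough, stdlib-only sketch of what the three commands do. The function names mirror the commands but are hypothetical, SHA-256 is an assumed hash algorithm, and mimetypes.guess_type (which only looks at file extensions) stands in for whatever MIME detection identifyfileMIME really uses. The real commands store their results in the database rather than returning them. root is assumed to lie under prefix, with both given in the same form.

```python
import hashlib
import mimetypes
from pathlib import Path
from typing import Dict, List, Optional


def scan_path(prefix: str, root: str) -> List[str]:
    """Catalogue files under root recursively, as paths relative to prefix."""
    prefix_path = Path(prefix)
    return sorted(
        str(p.relative_to(prefix_path))
        for p in Path(root).rglob("*")
        if p.is_file()
    )


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_paths(prefix: str, catalogue: List[str]) -> Dict[str, str]:
    """Hash every catalogued file, re-attaching the prefix to find it on disk."""
    return {rel: hash_file(Path(prefix) / rel) for rel in catalogue}


def identify_mime(catalogue: List[str]) -> Dict[str, Optional[str]]:
    """Guess each file's MIME type from its extension (None if unknown)."""
    return {rel: mimetypes.guess_type(rel)[0] for rel in catalogue}
```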
Contributions and feedback are always welcome. If you want to complete or suggest a feature, fix a bug, or write some tests, feel free to do so and open a pull request.
If you didn't come here from the Discord, you should also check out Hydrus: https://hydrusnetwork.github.io/hydrus/