🚨🚨🚨
This app is no longer maintained. It may depend on outdated dependencies, which can cause installation problems or contain security vulnerabilities. Please use or fork it with caution.
🚨🚨🚨
Sisyphe is a generic NodeJS terminal application that recursively analyses folders. Its code is organised as a git monorepo managed with lerna.
Tested with NodeJS@8.X, Redis@3.2.6
Works on Linux/OSX/Windows
Example: start a quick local Redis instance with Docker:
docker run --name sisyphe-redis -p 6379:6379 redis:3.2.6
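Before starting Sisyphe, it can help to confirm that Redis is actually reachable. The sketch below is a hypothetical helper and not part of Sisyphe: `isPortOpen` is an illustrative name, and the default host assumes Redis is published on the local machine as in the docker command above. It only checks that the TCP port accepts connections, not that a Redis server is answering on it.

```javascript
const net = require('net');

// Resolve to true if a TCP connection to host:port succeeds within timeoutMs,
// and to false on a connection error or timeout.
function isPortOpen(port, host = '127.0.0.1', timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (result) => {
      socket.destroy(); // always release the socket before resolving
      resolve(result);
    };
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.once('error', () => finish(false));
  });
}
```

For the container started above, `isPortOpen(6379).then(console.log)` would be a typical call.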
- Download the latest Sisyphe version
- Run npm install (this will also execute an npm postinstall script)
- ... that's it.
To test Sisyphe and its workers, run:
npm run test
To display the available options, run:
./app.js --help
-V, --version output the version number
-n, --corpusname <name> Corpus name
-s, --select <name> Select all module to deal with
-c, --config-dir <path> Configuration folder path
-t, --thread <number> The number of process which sisyphe will take
-b, --bundle <number> Regroup jobs in bundle of jobs
-r, --remove-module <name> Remove module name from the workflow
-q, --quiet Silence output
-l, --list List all available workers
-h, --help output usage information
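To make the -b option more concrete: bundling means grouping individual jobs so that several are handed over at once. The following is only an illustration of that idea, not Sisyphe's actual implementation; `bundleJobs` is a hypothetical name.

```javascript
// Split a flat list of jobs into bundles of at most `size` jobs each.
// The last bundle may be smaller when the list does not divide evenly.
function bundleJobs(jobs, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const bundles = [];
  for (let i = 0; i < jobs.length; i += size) {
    bundles.push(jobs.slice(i, i + size));
  }
  return bundles;
}
```

Larger bundles mean fewer hand-offs between processes, at the cost of coarser scheduling; the right value depends on the corpus.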
Start Sisyphe on any folder that contains files:
node app -n corpusname ~/Documents/customfolder/corpus
node app -n corpusname -c ~/Documents/customfolder/corpusResources ~/Documents/customfolder/corpus
Sisyphe is now running in the background, using all of your computer's threads. Grab a coffee and wait; it will notify you when it's done :)
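If you want to launch an analysis from another Node script, the commands above can be wrapped as follows. This is a minimal sketch under stated assumptions: `buildSisypheArgs` and `runSisyphe` are hypothetical helpers, `sisypheDir` is assumed to be the directory where Sisyphe was downloaded, and only flags listed in the help output are used.

```javascript
const { spawn } = require('child_process');

// Build the argument list for `node app ...` from a plain options object.
// corpusName and corpusPath are required; configDir and threads are optional.
function buildSisypheArgs({ corpusName, corpusPath, configDir, threads }) {
  const args = ['app', '-n', corpusName];
  if (configDir) args.push('-c', configDir);
  if (threads) args.push('-t', String(threads));
  args.push(corpusPath);
  return args;
}

// Hypothetical helper: start Sisyphe from its own directory and
// forward its output to the current terminal.
function runSisyphe(options, sisypheDir) {
  return spawn('node', buildSisypheArgs(options), {
    cwd: sisypheDir,
    stdio: 'inherit',
  });
}
```

The returned child process emits an `exit` event, which is a convenient place to trigger any follow-up step.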
Sisyphe writes its results (errors, info, duration...) to sisyphe/out/{timestamp}-corpusname/
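Since each run gets its own directory, a small script can pick the most recent one for a given corpus. The sketch below assumes that {timestamp} is a purely numeric prefix; that detail is an assumption, and `latestOutputDir` is a hypothetical helper, not part of Sisyphe.

```javascript
const fs = require('fs');
const path = require('path');

// Return the path of the newest "<timestamp>-<corpusName>" entry in outDir,
// or null if none matches. Assumes the timestamp part is numeric.
function latestOutputDir(outDir, corpusName) {
  const suffix = '-' + corpusName;
  const candidates = fs.readdirSync(outDir)
    .filter((name) => name.endsWith(suffix) && name.length > suffix.length)
    .map((name) => ({ name, stamp: Number(name.slice(0, -suffix.length)) }))
    .filter((entry) => Number.isFinite(entry.stamp))
    .sort((a, b) => b.stamp - a.stamp); // newest first
  return candidates.length > 0 ? path.join(outDir, candidates[0].name) : null;
}
```

A call such as `latestOutputDir('sisyphe/out', 'corpusname')` would then give the directory to inspect after a run.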
For a control panel and a fully integrated app, see Sisyphe-monitor
Sisyphe has a server that lets you control it and obtain more information about its execution.
To access these features, start the server with:
npm run server
Sisyphe ships with a list of default modules (focused on XML & PDF).
These URLs need to be updated once the branch has been merged.
- FILETYPE Detects MIME type, extension, corrupted files...
- PDF Extracts information from PDFs (version, author, metadata...)
- XML Checks that documents are well-formed and valid against their DTDs, extracts elements from tags...
- XPATH Generates a complete list of XPaths from the submitted folder
- OUT Exports data to a JSON file & an ElasticSearch database
- NB Tries to assign categories to an XML document using its abstract
- MULTICAT Tries to assign categories to an XML document using its identifiers
- TEEFT Tries to extract keywords from a full text
- SKEEFT Tries to extract keywords from a structured full text using the teeft algorithm and the text structure
When you work on a worker:
- Commit your changes as usual
- Run npm run updated (to check which workers have changed)
- Run npm run publish (it will ask you to change the version of the worker module & publish it to GitHub)
Some bugs may occur with certain files when the 'skeeft' module runs on Windows. Please disable it (for example with the --remove-module option) until this is fixed.