Generates similarity document scores from cli. Useful when combined with static site generators.
$ similar-documents -h
usage: $ similar_documents -o result.json -k 5 -t japanese *.md
$ similar_documents -h
positional arguments:
documents list[str]
optional arguments:
-h, --help show this help message and exit
-o None, --output-file None
Optional: write output file (default STDOUT)
-k 5, --top-k 5 Number of similar documents
-d, --debug Show debug messages
-t None, --tokenizer None
tokenizer [japanese] (default None(space splitting))
-e utf-8, --encoding utf-8
file encoding(default utf-8)
$ similar-documents -t japanese -k 2 /data/markdowns/recently/*.md | jq . | head -20
{
"/data/markdowns/recently/2021-04-01.md": [
[
"/data/markdowns/recently/2021-04-26.md",
0.3123780045484529
],
[
"/data/markdowns/recently/2021-04-03.md",
0.17384380113610887
]
],
"/data/markdowns/recently/2021-04-02.md": [
[
"/data/markdowns/recently/2021-04-26.md",
0.10715535963136594
],
[
"/data/markdowns/recently/2021-04-23.md",
0.09411751563901728
]
pip install -U similar-documents
$ docker build -t similar-doc .
# examples: run on windows
$ docker run -it -v C:\Users\yu1\Dropbox\secon-sites\data:/data/ --rm similar-doc bash -c 'similar-documents -t japanese -k 2 -o /data/result.json `ls /data/markdowns/*/*.md`'
MIT
- Yuichi Tateno