itkach/slob

Sorting in storage instead of RAM

Closed this issue · 3 comments

Hello @itkach

Could you allow sorting to happen in storage rather than in RAM (in slob.py), if that is possible?

I use the Pyglossary tool (which uses slob.py under the hood) to convert many file types to slob, so that I can use the Aard2 app as my default multi-dictionary viewer.

But during converting of large files as wikipedias or wiktionaries "which have huge number of words" to slob, but sorting fails due to low memory (inspect of I have 6 gb RAM in my device).

I am in love with the Aard2 app and slob files, which I can work with freely, so if sorting in storage is possible it would be an amazing breakthrough.

Thanks in advance.

Hello @itkach
It seems you are busy these days, or perhaps you didn't like the way I expressed my issue. I appreciate what you are doing, but this problem may cost me hundreds of dollars (which I don't have right now) to replace my device with one that has more memory, because I really need this.

So a simple answer would be enough for me, even if I don't get any solution soon.

Hello @itkach

Could you allow indexing to happen in storage rather than in RAM (in slob.py), if that is possible?

I use the Pyglossary tool (which uses slob.py under the hood) to convert many file types to slob, so that I can use the Aard2 app as my default multi-dictionary viewer.

But when converting large files such as Wikipedias or Wiktionaries, which produce a very large index, to slob, indexing fails due to low memory, even though my device has 6 GB of RAM.

The only thing slob.py does in memory is sort a list of integers, which is relatively compact (e.g. a list of 10 million integers appears to take up ~85 MB in RAM according to sys.getsizeof() and needs about the same again for sorting). I'm not sure what you are asking for when you say "indexing to be somewhere in storage" - the dictionary index is already part of the .slob file.
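As an aside, the figure above can be checked roughly on 64-bit CPython; note this is an illustrative snippet, not slob.py's actual code, and sys.getsizeof() only counts the list's pointer array, not the int objects it points to:

```python
import sys

# Rough reproduction of the estimate above (64-bit CPython).
# sys.getsizeof() reports only the list object itself, so this is a lower bound.
keys = list(range(10_000_000))
print(sys.getsizeof(keys) / 1024 / 1024, "MB")   # roughly 80 MB

# sorted() builds a second list of the same size, so peak memory during
# sorting is about double while both lists are alive.
keys_sorted = sorted(keys)
print(sys.getsizeof(keys_sorted) / 1024 / 1024, "MB")
```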

A number of Wikipedias and Wiktionaries are already available as .slob... are you trying to convert them yourself? Using Pyglossary? From what source? I'm not sure I understand your workflow.

To be clear, I am not actively working on this project and have no plans to redesign anything.

I convert Kiwix files (.zim) that are not available as slob files yet, such as Wikipedia English mini ("mini" means it contains only the introduction of each article, not the full text), to use them as dictionaries. Medical Wikipedia is also available separately as a zim file, but your slob files only cover the usual Wikipedia and Wiktionary that we all know.

During conversion with Pyglossary in the Termux app for Android, at the end of the conversion (during sorting) Termux is forcibly closed due to a low-memory crash. This only occurs with zim files that have a large number of articles.

So my request is to make the sorting process use less memory.
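For what it's worth, here is a minimal sketch of the kind of on-disk "external" merge sort the request is asking about. This is not part of slob.py and all names are hypothetical; it only illustrates how sorting could keep a bounded number of items in RAM by writing sorted chunks to temporary files and merging them lazily:

```python
import heapq
import pickle
import tempfile

def external_sort(items, key=None, chunk_size=1_000_000):
    """Sort an arbitrarily large iterable using temporary files on disk.

    Only chunk_size items are held in RAM at once; sorted chunks are
    written to storage and merged lazily with heapq.merge().
    """
    chunks = []
    buf = []
    for item in items:
        buf.append(item)
        if len(buf) >= chunk_size:
            chunks.append(_dump_sorted(buf, key))
            buf = []
    if buf:
        chunks.append(_dump_sorted(buf, key))
    return heapq.merge(*(_load(f) for f in chunks), key=key)

def _dump_sorted(buf, key):
    # Sort one chunk in RAM, then spill it to a temporary file.
    buf.sort(key=key)
    f = tempfile.TemporaryFile()
    for item in buf:
        pickle.dump(item, f)
    f.seek(0)
    return f

def _load(f):
    # Stream items back from a spilled chunk file.
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return
```

Whether something like this would fit slob.py's writer is of course up to the maintainer; it is only meant to make the request concrete.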

The Kiwix app is a great app that plays zim files (many Wikipedias and Wiktionaries, including subspecialty editions), and it also supports fuzzy search.

This is the source for the zim files; please have a look at them:

https://wiki.kiwix.org/wiki/Content_in_all_languages