[Mostly Android] Is it possible to use a model without loading it into RAM?
BurrHM opened this issue · 1 comment
Lister, for example, can load massive (tens of GB) text files instantly, and you can search them, scroll through them, or jump to the end. My understanding is that it reads the data directly off the disk and keeps only the small part currently being viewed in memory.
In modes 1, 2, 3, 6, and 7, Lister allows viewing files of any size, as it keeps only a small part of the file in memory (approximately 64 KB); the rest is loaded automatically when scrolling through the text.
Is something like this possible for Maid for Android?
The reason is that mobile RAM is very limited compared to what you can get on a desktop. I can accept the performance hit, because the alternative is not being able to run the model at all.
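For illustration, here is a minimal sketch of the kind of on-demand access I mean, using POSIX mmap (the file path and window size below are just placeholders). The file is mapped into the address space without being read, and only the pages that are actually touched get loaded from disk:

```cpp
// Minimal sketch (POSIX mmap): map a large file without reading it into RAM,
// then touch only a small window of it. The kernel pages data in on demand,
// so resident memory stays roughly the size of the pages actually accessed.

#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    const char *path = "/sdcard/models/large-model.gguf";  // hypothetical path
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return 1; }

    // Map the whole file read-only. This only reserves address space;
    // no file data is loaded until a page is first touched.
    void *base = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    // Access a 64 KiB window near the end of the file: only the pages
    // backing this range are faulted in from disk.
    const size_t window = 64 * 1024;
    size_t offset = (st.st_size > (off_t)window) ? (size_t)st.st_size - window : 0;
    const unsigned char *p = (const unsigned char *)base + offset;
    unsigned long checksum = 0;
    for (size_t i = 0; i < window && (off_t)(offset + i) < st.st_size; ++i)
        checksum += p[i];
    printf("window checksum: %lu\n", checksum);

    munmap(base, st.st_size);
    close(fd);
    return 0;
}
```

The trade-off is exactly the one I mentioned: pages that are not resident have to be faulted in from storage, which is much slower than RAM, but the model becomes usable at all.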
Not sure if it's possible, but if it is, it's something the llama.cpp devs would have to implement.