stream load. Is it possible?
Opened this issue · 5 comments
vinnitu commented
I want to load a network resource into an index, but it fails:
import requests
import io
import pickle
import hnswlib
def get_stream(url):
    response = requests.get(url)
    stream_data = response.content
    return io.BytesIO(stream_data)
model = pickle.load(get_stream('http://example.com/model')) # it works
index = hnswlib.Index(space='cosine', dim=128)
index.load_index(get_stream('http://example.com/index.hnsw')) # doesn't work
I get this error:
TypeError: load_index(): incompatible function arguments. The following argument types are supported:
1. (self: hnswlib.Index, path_to_index: str, max_elements: int = 0, allow_replace_deleted: bool = False) -> None
Invoked with: <hnswlib.Index(space='cosine', dim=128)>, <_io.BytesIO object at 0x7fd364e557c0>
Is this a reasonable idea?
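Since the signature in the error message only accepts a `str` path, one possible workaround is to write the downloaded bytes to a temporary file and pass that file's path instead. This is only a minimal sketch: the helper name `load_via_tempfile` is hypothetical and not part of hnswlib, and `loader` stands for any callable that takes a filesystem path.

```python
import os
import tempfile


def load_via_tempfile(data, loader):
    """Write `data` (bytes) to a temporary file and call `loader(path)`.

    `loader` is any callable that accepts a filesystem path,
    for example the bound method `index.load_index`.
    """
    fd, path = tempfile.mkstemp(suffix=".hnsw")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # The file is closed at this point, so the loader can reopen it.
        return loader(path)
    finally:
        # Remove the temporary file whether or not loading succeeded.
        os.remove(path)
```

With the snippet above, the call would look like `load_via_tempfile(get_stream(url).getvalue(), index.load_index)`. The cost is one extra copy of the index on disk for the duration of the load.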
vinnitu commented
I am not sure, but can we pass io.BytesIO as std::ifstream?
Line 152 in 3f34296
void loadIndex(const std::ifstream &input, SpaceInterface<dist_t> *s) {
    std::streampos position;
    readBinaryPOD(input, maxelements_);
    readBinaryPOD(input, size_per_element_);
    readBinaryPOD(input, cur_element_count);
    data_size_ = s->get_data_size();
    fstdistfunc_ = s->get_dist_func();
    dist_func_param_ = s->get_dist_func_param();
    size_per_element_ = data_size_ + sizeof(labeltype);
    data_ = (char *) malloc(maxelements_ * size_per_element_);
    if (data_ == nullptr)
        throw std::runtime_error("Not enough memory: loadIndex failed to allocate data");
    input.read(data_, maxelements_ * size_per_element_);
    input.close();
}
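The three header reads in this function only need sequential reads, which any byte stream can provide. As an illustration, here is a Python sketch that reads such a header from any file-like object. The function name `read_header` is hypothetical, and the layout is an assumption: it treats each of the three fields as an 8-byte little-endian unsigned integer, which depends on the platform that wrote the file.

```python
import io
import struct


def read_header(stream):
    """Read three consecutive unsigned 64-bit little-endian integers."""
    fmt = "<3Q"  # assumption: each field is an 8-byte little-endian size_t
    size = struct.calcsize(fmt)
    raw = stream.read(size)
    if len(raw) != size:
        raise ValueError("stream ended before the header was complete")
    max_elements, size_per_element, cur_element_count = struct.unpack(fmt, raw)
    return max_elements, size_per_element, cur_element_count


# Works the same for an in-memory buffer as for an open binary file.
buf = io.BytesIO(struct.pack("<3Q", 1000, 520, 42))
print(read_header(buf))  # (1000, 520, 42)
```

The function never asks where the bytes come from, which is the property a stream-based loader would need on the C++ side as well.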
vinnitu commented
As a first step, we could split the function in two:
// The stream is taken by non-const reference: reading changes its state,
// so a const reference would not compile.
void loadStream(std::ifstream &input, SpaceInterface<dist_t> *s) {
    readBinaryPOD(input, maxelements_);
    readBinaryPOD(input, size_per_element_);
    readBinaryPOD(input, cur_element_count);
    data_size_ = s->get_data_size();
    fstdistfunc_ = s->get_dist_func();
    dist_func_param_ = s->get_dist_func_param();
    size_per_element_ = data_size_ + sizeof(labeltype);
    data_ = (char *) malloc(maxelements_ * size_per_element_);
    if (data_ == nullptr)
        throw std::runtime_error("Not enough memory: loadIndex failed to allocate data");
    input.read(data_, maxelements_ * size_per_element_);
}

// Opens the file and delegates the actual parsing to loadStream.
void loadIndex(const std::string &location, SpaceInterface<dist_t> *s) {
    std::ifstream input(location, std::ios::binary);
    std::streampos position;
    loadStream(input, s);
    input.close();
}
vinnitu commented
Unfortunately, we can't simply do this, because the loading code uses .seekg() and .tellg() (we could simplify the loading code and remove them).
Also, std::ifstream may not be compatible with io.BytesIO, so we might need std::istringstream instead.
What do you think?