Support for saving embeddings bin file in laser_encoder
vmenan opened this issue · 4 comments
Hi everyone,
First and foremost, thank you so much for building laser_encoder support. It is very useful. The previous embed.sh pipeline was able to save the embedding.bin file; is this supported in laser_encoder?
Thank you so much for the support!
Hi @vmenan!
The old embed.sh is still working (at least, it is supposed to). It now uses laser_encoders under the hood, but it should not affect the results.
So if you prefer, you can still use the embed.sh pipeline that saves the embedding.bin file.
The new laser_encoders package is intended for users who want to implement their own pre- or post-processing of the data, including the way the embeddings are saved (or maybe used without saving).
I apologize for the delayed reply, @avidale. Yes, it does. laser_encoders gives more control to the user, which is brilliant. I was able to solve my issue with a simple function. Anyone could easily implement it, but I'm sharing the code here in case someone wants a quick solution.
import numpy as np

def append_to_bin_file(file_name, numpy_array):
    # Convert NumPy array to bytes
    binary_data = numpy_array.tobytes()
    try:
        # Open the file in binary append mode ('ab')
        with open(file_name, 'ab') as file:
            # Append binary data to the file
            file.write(binary_data)
        print(f"Array appended to {file_name} successfully.")
    except Exception as e:
        print(f"An error occurred: {e}")
This function appends the embeddings to a ".bin" file. I chose to append rather than overwrite so that it also works when someone is loading data in chunks due to RAM limitations.
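For anyone who also needs to read the file back: because tobytes() writes raw values with no header, the reader has to know the dtype and the embedding width in advance. Below is a minimal round-trip sketch, not a definitive implementation. It assumes float32 embeddings with 1024 dimensions (check what your encoder actually returns), uses random arrays as a stand-in for real encoder output, and the names EMBED_DIM, load_bin_file and demo_round_trip are made up for illustration. The append function here is a variant of the one above that casts each chunk to float32 so all chunks share one dtype.

```python
import os
import tempfile

import numpy as np

EMBED_DIM = 1024  # assumed embedding width; adjust to the encoder you use


def append_to_bin_file(file_name, numpy_array):
    """Append an array to file_name as raw float32 bytes (no header)."""
    # Cast explicitly so every chunk is stored with the same dtype.
    data = np.ascontiguousarray(numpy_array, dtype=np.float32)
    with open(file_name, "ab") as f:
        f.write(data.tobytes())


def load_bin_file(file_name, dim=EMBED_DIM):
    """Read the raw float32 file back into an (n_rows, dim) array."""
    flat = np.fromfile(file_name, dtype=np.float32)
    return flat.reshape(-1, dim)


def demo_round_trip(n_chunks=3, rows_per_chunk=4, dim=EMBED_DIM):
    """Write several chunks, read them back, return (original, restored)."""
    rng = np.random.default_rng(0)
    # Random stand-in data; in practice each chunk would come from the encoder.
    chunks = [
        rng.standard_normal((rows_per_chunk, dim)).astype(np.float32)
        for _ in range(n_chunks)
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "embedding.bin")
        for chunk in chunks:
            append_to_bin_file(path, chunk)
        restored = load_bin_file(path, dim)
    return np.concatenate(chunks), restored
```

Note that append mode keeps adding to an existing file, so delete or rename the old embedding.bin before starting a fresh run, otherwise the new rows end up after the old ones.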