browsermt/bergamot-translator

Support model-loading from .npz files

jerinphilip opened this issue · 3 comments

XapaJIaMnu/translateLocally#89

Marian supports this path. It may be possible for us to support this as well, but it shouldn't come at the cost of making TranslationModel complicated. The current path is:

// (1) For WebAssembly: string config, a valid MemoryBundle; config entries corresponding to the bundle are missing. Thin.
TranslationModel(const std::string& config, MemoryBundle&& memory, size_t replicas = 1)
  : TranslationModel(parseOptionsFromString(config, /*validate=*/false), std::move(memory), replicas){};
  
// (2) Constructor converting an options-based load into a fast-load by building the MemoryBundle from Options.
TranslationModel(const Config& options, size_t replicas = 1)
  : TranslationModel(options, getMemoryBundleFromConfig(options), replicas) {}

// (3) The constructor which does the heavy lifting. Chooses the mmap-based fast path if memory is non-empty.
// Falls back to options-based filesystem loads otherwise.
TranslationModel(const Config& options, MemoryBundle&& memory, size_t replicas = 1);

We have already expressed the intention to clean up MemoryBundle a bit (affecting getMemoryBundleFromConfig) in #338. If we can find a way to (a) not load into MemoryBundle from .npz, or alternatively, (b) load into MemoryBundle from .npz, we may be able to achieve what we want.

We'd prefer (b) for when #257 is solved, perhaps?

Marian checks based on the file extension (https://github.com/marian-nmt/marian-dev/blob/b29cc07a95f49df7825f3a92e860bd642db0e812/src/common/io.cpp#L13-L21), which we can use.
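A minimal sketch of how that extension check could gate the fast path on our side (the helper and enum below are hypothetical illustrations, not existing bergamot-translator code):

#include <string>

// Hypothetical helper mirroring marian's extension check (io.cpp#L13-L21):
// pick the mmap fast path only for .bin models, fall back to the
// options/filesystem path for .npz.
static bool endsWith(const std::string& name, const std::string& suffix) {
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

enum class LoadPath { MmapBinary, FilesystemNpz };

inline LoadPath chooseLoadPath(const std::string& modelPath) {
  if (endsWith(modelPath, ".bin")) {
    return LoadPath::MmapBinary;   // contiguous bytes; pointer handed to the graph
  }
  return LoadPath::FilesystemNpz;  // let marian parse the NumPy archive from disk
}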

@XapaJIaMnu, does this sound like the sketch of a plan?

Yes sounds good!

@XapaJIaMnu: I had a brief conversation with @graemenail regarding the mmap path. The following is what I understood.

  1. The mmap path places loaded content in contiguous storage and provides a pointer to the graph.
  2. The NumPy (.npz) file loads into a std::vector<io::Item>, which is not consumable by the fast loading path.

Because of 2, (b) as mentioned in #341 (comment) does not appear straightforward. #342 has a working prototype of (a), where we return an empty AlignedMemory and loading falls back to the non-fast path. Is it possible to convert .npz to .bin on the fly and get closer to (b)? Are there other approaches I am missing? I have limited knowledge of the source that does these things and could use input from those with expertise in the area.
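For the on-the-fly conversion idea, a rough sketch of what I have in mind, assuming marian's io::loadItems / io::saveItems can be used this way (convertNpzToBin and the temporary-file placement are placeholders, not existing code; error handling and cleanup are omitted):

#include <string>
#include <vector>

#include "common/io.h"  // marian: io::Item, io::loadItems, io::saveItems

// Hypothetical conversion step: parse the .npz into io::Items, re-serialize
// them as .bin next to the original, and hand that path to the existing
// mmap-based fast path, approximating (b).
std::string convertNpzToBin(const std::string& npzPath) {
  std::vector<marian::io::Item> items = marian::io::loadItems(npzPath);
  std::string binPath = npzPath + ".bin";  // placeholder location
  // saveItems dispatches on extension; the .bin suffix should select the
  // binary writer used by the fast path (my reading of marian's io.cpp).
  marian::io::saveItems(binPath, items);
  return binPath;
}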

Also, please bear in mind that .npz loads are suggested for quick debugging use cases, so if (b) takes longer we may settle for (a) and defer spending more time here, in order to get the final functionality in quickly. I intend to check in a test in BRT to ensure this stays stable.

I appreciate any help you can provide.

ugh, this is a nightmare.
Both the fast and the slow loading paths eventually go through this function: https://github.com/browsermt/marian-dev/blob/08b1544636fe13eaf1fbacb17c6fb050abfb8d42/src/common/binary.cpp#L30
As memory is reordered on the fly based on the architecture, we never have a direct way of feeding bytes into the graph. There shouldn't be anything preventing (b) from working, since we do eventually go through the function mentioned above, and therefore through the intermediate vector<io::Item> representation.
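To make the convergence point concrete, a small sketch of how I read the two paths ending at the same intermediate representation (signatures assumed from marian's common/io.h and common/binary.h; graph wiring omitted):

#include <string>
#include <vector>

#include "common/binary.h"  // marian: io::binary::loadItems(const void*, ...)
#include "common/io.h"      // marian: io::Item, io::loadItems(const std::string&)

// Slow path: parse the NumPy archive from disk into io::Items.
std::vector<marian::io::Item> itemsFromNpz(const std::string& npzPath) {
  return marian::io::loadItems(npzPath);
}

// Fast path: reinterpret the .bin bytes already sitting in AlignedMemory;
// with mapped=true the items point into the caller's buffer instead of copying.
std::vector<marian::io::Item> itemsFromBinaryBlob(const void* binaryBlob) {
  std::vector<marian::io::Item> items;
  marian::io::binary::loadItems(binaryBlob, items, /*mapped=*/true);
  return items;
}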