jatinchowdhury18/RTNeural

JSON exception when trying to load PyTorch model

nrydanov opened this issue · 4 comments

Hello, thanks a lot for your work.

I run into a problem when trying to load my PyTorch model, saved using the instructions provided in the README: my plug-in crashes. I debugged it with LLDB and got something like

what(): [json.exception.out_of_range.403] key 'in_shape' not found.

As I understand it, RTNeural is trying to find an in_shape parameter that is not stored in the exported JSON file. The JSON file contains all the weights, but the state_dict doesn't actually have such a key.

I'm currently investigating the issue in the source code, but maybe you could help me? Am I doing something wrong?

Hello!

Would it be possible to share some code snippets showing how you're exporting the model from PyTorch, as well as how you're attempting to load the model in RTNeural?

Thanks!

Thanks for such a quick response!

Of course I can share some code.
I save the model using

torch.save(
    {
        "model": model.state_dict(),
        "best_model": checkpoint["best_model"],
        "optimizer": optimizer.state_dict(),
        "best_loss": best_loss,
        "last_epoch": epoch,
        "scheduler": scheduler.state_dict(),
    },
    save_path,
)

in my training loop,

and then convert it from .pt to JSON like this:

import argparse
import json
from json import JSONEncoder

import torch


class EncodeTensor(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, torch.Tensor):
            return obj.cpu().detach().numpy().tolist()
        return super().default(obj)

def main(args):
    model = get_model(args.model)
    model_config = model.Settings(_env_file=f"{args.config}/model.cfg")
    model = model(model_config)

    model.load_state_dict(torch.load(args.checkpoint)['model'])

    with open(args.output, 'w') as json_file:
        json.dump(model.state_dict(), json_file, cls=EncodeTensor)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--checkpoint", type=str, required=True)
    parser.add_argument("--model", type=str, required=True)
    parser.add_argument("--config", type=str, required=True)
    parser.add_argument("--output", type=str, required=True)
    args = parser.parse_args()
    main(args)

Then I just use the result of this script as input to a JUCE plug-in with RTNeural as the backend.

this->chooser->launchAsync (folderChooserFlags, [this] (const FileChooser& c) {
    auto path = c.getResult().getFullPathName();

    std::ifstream jsonStream(path.toStdString(), std::ifstream::binary);
    auto model = RTNeural::json_parser::parseJson<double>(jsonStream);
    model->reset();
    processorRef.model = std::move(model);
});

I thought something might be wrong with how the path to the JSON file is passed via FileChooser, but I also can't see any in_shape entries in the output of cat output.json.

I only see lstm.weight_ih_l0, lstm.bias_hh_l0, and other LSTM-specific weights in the output JSON file.

Thanks for the extra information; this is very helpful! Unfortunately, RTNeural does not currently support inferring the network architecture from the model's JSON representation for PyTorch models (although there is a PR working towards adding that functionality, see #79).

In the meantime, it is possible to manually load your network's layer weights from the JSON file exported from PyTorch, as documented in the README.

The general approach is to do something like:

std::ifstream jsonStream(path.toStdString(), std::ifstream::binary);
nlohmann::json modelJson;
jsonStream >> modelJson;

// create the model (the input size must be specified up-front)
auto model = std::make_unique<RTNeural::Model<double>>(in_size);

// create the LSTM layer
auto lstm = std::make_unique<RTNeural::LSTMLayer<double>>(in_size, out_size);
RTNeural::torch_helpers::loadLSTM<double> (modelJson, "lstm.", *lstm); // load the layer weights
model->addLayer(lstm.release()); // add the LSTM to the model
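Once the weights are loaded, running inference looks something like this (a minimal sketch, assuming the model's final layer produces a single output and buffer holds numSamples mono samples):

model->reset(); // clear the recurrent state before processing a new stream

for (int n = 0; n < numSamples; ++n)
    buffer[n] = model->forward (&buffer[n]); // process one sample at a time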

I should also mention a couple of things related to RTNeural's performance:

First, using float will generally provide better performance than double, especially with the XSIMD and Eigen backends, since a CPU with SSE or NEON instructions can use vector operations to process 4 floats at a time, but only 2 doubles.

Second, RTNeural::ModelT will be much faster than RTNeural::Model, with the restriction that RTNeural::ModelT requires the model architecture to be known at compile-time. If you do need the model architecture to be somewhat "dynamic", it is possible to achieve that sort of behaviour for a subset of allowed model architectures using std::variant (for more information see #88).
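As a rough illustration, a compile-time version might look something like the sketch below. The layer sizes (1 → 8 → 1), the "dense." prefix, and the assumption that your PyTorch module names its sub-modules lstm and dense are all placeholders, not your actual architecture:

// compile-time model: <type, input size, output size, layer types...>
RTNeural::ModelT<float, 1, 1,
    RTNeural::LSTMLayerT<float, 1, 8>,
    RTNeural::DenseT<float, 8, 1>> modelT;

// load the PyTorch weights into each layer, accessed by its index in the template list
// (modelJson is the nlohmann::json object read from the exported file, as above)
RTNeural::torch_helpers::loadLSTM<float> (modelJson, "lstm.", modelT.get<0>());
RTNeural::torch_helpers::loadDense<float> (modelJson, "dense.", modelT.get<1>());

modelT.reset();
float x = 0.0f; // one input sample
float y = modelT.forward (&x); // same sample-by-sample interface as the dynamic model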

Of course, if you're still in the prototyping phase, performance might not be a big concern, so it might make sense to just do whatever is simplest for now.

Thanks a lot for all the information (especially about the float vs. double case and the compile-time feature).

I'll try this approach for now (I'm finishing my bachelor's degree soon), but I can't wait to see this feature finished. I might even help with it when I have a bit more time.

Thanks again for a quick response and such a cool project!