A test repo on optimising Hugging Face models.

We test how to speed up inference in 0_base_model.

Then we test a vLLM deployment in 1_scale_up_serving. To query the vLLM endpoint, we have the notebook 2_testing_remote_server.
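
As a rough illustration of what querying the vLLM endpoint can look like, here is a minimal sketch using only the Python standard library. It assumes the server was started with vLLM's OpenAI-compatible API, which serves a `/v1/completions` route. The base URL, the model name, and the helper names `build_completion_request` and `query_completion` are placeholders for illustration and are not taken from the notebook.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=64, temperature=0.0):
    """Build a POST request for an OpenAI-compatible /v1/completions route."""
    payload = {
        "model": model,  # must match the model name the server was launched with
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_completion(base_url, model, prompt, timeout=30.0, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Assumed values: replace with the address and model of your own deployment.
    text = query_completion("http://localhost:8000", "my-model", "Hello, my name is")
    print(text)
```

Keeping request construction separate from the network call makes it possible to inspect the payload without a running server.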