support for larger models
Green-Sky opened this issue · 8 comments
Currently, larger models don't load (tested with https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K):
clip_model_load: loading model from '../models/laion_clip-vit-h-14-laion2b-s32b-b79k/ggml-model-f16.bin' - please wait...
clip_model_load: n_vocab = 49408
clip_model_load: num_positions = 77
clip_model_load: t_hidden_size = 1024
clip_model_load: t_n_intermediate = 4096
clip_model_load: t_n_head = 16
clip_model_load: t_n_layer = 24
clip_model_load: image_size = 224
clip_model_load: patch_size = 14
clip_model_load: v_hidden_size = 1280
clip_model_load: v_n_intermediate = 5120
clip_model_load: v_n_head = 16
clip_model_load: v_n_layer = 32
clip_model_load: ftype = 1
clip_model_load: ggml ctx size = 1887.22 MB
.................................................................................................................
clip_model_load: model size = 1882.50 MB / num tensors = 909
clip_model_load: model loadded
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 100900784, available 100663296)
zsl: /home/green/workspace/clip.cpp/ggml/src/ggml.c:4131: ggml_new_tensor_impl: Assertion `false' failed.
Aborted (core dumped)
After modifying line 713 (commit a12792d) with a `* 100ul` (see the sketch after the log below), it gets further, but now fails with:
clip_model_load: ggml ctx size = 1887.22 MB
.................................................................................................................
clip_model_load: model size = 1882.50 MB / num tensors = 909
clip_model_load: model loadded
zsl: /home/green/workspace/clip.cpp/ggml/src/ggml.c:11044: ggml_compute_forward_soft_max_f32: Assertion `!isnan(sp[i])' failed.
zsl: /home/green/workspace/clip.cpp/ggml/src/ggml.c:11044: ggml_compute_forward_soft_max_f32: Assertion `!isnan(sp[i])' failed.
Aborted (core dumped)
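For context, the change is roughly of this shape. This is a hypothetical sketch, not the actual contents of line 713; the function and variable names are illustrative, though the 96 MB base matches the 100663296 bytes reported as available in the first log:

```cpp
#include "ggml.h"

#include <cstddef>

// Hypothetical shape of the workaround: the compute context gets a fixed
// memory pool, and the first log shows only 100663296 bytes (96 MB) were
// available, so the expression is scaled with `* 100ul` to let the graph
// allocation succeed.
static struct ggml_context * make_compute_ctx(void) {
    const size_t buf_size = 96ul * 1024 * 1024 * 100ul; // was 96 MB; `* 100ul` is the patch

    struct ggml_init_params params = {
        /*.mem_size   =*/ buf_size,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    return ggml_init(params);
}
```

Blindly scaling the pool by 100 is only a stopgap to get past graph allocation, which is why the follow-up error below still appears.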
Reproduced the issue. Will patch it after testing with other model checkpoints.
I had an "oh!" moment just when pouring coffee into my cup :D Instead of hardcoding a single memory size or running a warmup pass just to learn the memory requirement, we can decide based on the number of tensors, which essentially indicates the model size. It sounds like overengineering, but it's not -- it's the only way of automatically setting the correct memory size without any initial delay or intervention from the library user, or at least the only way I could come up with.
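A minimal sketch of that idea, assuming the tensor count has already been read from the model file header; the function names and size thresholds are illustrative, not the values used in clip.cpp:

```cpp
#include "ggml.h"

#include <cstddef>
#include <cstdint>

// Hypothetical heuristic: map the tensor count (which tracks the model
// variant) to a compute-buffer budget. The thresholds are illustrative.
static size_t compute_buf_size_for(int32_t n_tensors) {
    const size_t mb = 1024 * 1024;
    if (n_tensors >= 900) return 512 * mb; // e.g. ViT-H/14: 909 tensors in the log above
    if (n_tensors >= 500) return 256 * mb; // mid-sized variants
    return 128 * mb;                       // base models
}

// Allocate the compute context once, sized from the tensor count, so the
// library user never has to pass a memory size and no warmup pass is needed.
static struct ggml_context * init_compute_ctx(int32_t n_tensors) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ compute_buf_size_for(n_tensors),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    return ggml_init(params);
}
```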
I think this relates to ggerganov/ggml#260
Yes, so let's think of it as a workaround until the proper mechanism is implemented in GGML.
I pushed it in #11. I think we can already merge it:
1. The memory is allocated based on the model variant.
2. The NaN issue is resolved for larger models with a patch size of 14.
nice. yea, it's been an issue ever since i started with llama.cpp
To 2., you mean once ggerganov/ggml#274 is merged?
Yeah, but until then I've checked out the fixing branch from my fork. We can check out the upstream master again once it's merged.