Attempting to pass model params to ExLlama on startup causes an AttributeError

Question

InconsolableCellist opened this issue a year ago · 2 comments

Summary

It appears that self.model_config is None in ExLlama's class.py (https://github.com/0cc4m/KoboldAI/blob/exllama/modeling/inference_models/exllama/class.py#L423). The code path reached by passing --model_parameters assumes it has already been set.

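Additionally, play.sh cannot handle JSON that contains spaces: it forwards its arguments unquoted, so the shell splits them on whitespace (the default IFS). Spaces between the key-value pairs are exactly how the help text for --model_parameters tells you to format your JSON.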

Steps to reproduce:

Attempt to start with something like the following, which tries to set model_parameters: ./play.sh --host --model airoboros-l2-70b-gpt4-2.0 --model_backend ExLlama --model_parameters "{'0_Layers':35,'1_Layers':45,'model_ctx':4096}"

Actual Behavior:

Output:

Colab Check: False, TPU: False
INIT       | OK         | KAI Horde Models

 ## Warning: this project requires Python 3.9 or higher.

INFO       | __main__:<module>:680 - We loaded the following model backends: 
Huggingface GPTQ
KoboldAI Old Colab Method
KoboldAI API
Huggingface
Horde
Read Only
OpenAI
ExLlama
Basic Huggingface
GooseAI
INFO       | __main__:general_startup:1395 - Running on Repo: https://github.com/0cc4m/KoboldAI.git Branch: exllama
MESSAGE    | Welcome to KoboldAI!
MESSAGE    | You have selected the following Model: airoboros-l2-70b-gpt4-2.0
ERROR      | __main__:<module>:10948 - An error has been caught in function '<module>', process 'MainProcess' (12311), thread 'MainThread' (140613286643520):
Traceback (most recent call last):

> File "aiserver.py", line 10948, in <module>
    run()
    └ <function run at 0x7fe259d69ca0>

  File "aiserver.py", line 10849, in run
    command_line_backend = general_startup()
                           └ <function general_startup at 0x7fe25a321dc0>

  File "aiserver.py", line 1634, in general_startup
    model_backends[args.model_backend].set_input_parameters(arg_parameters)
    │              │    │                                   └ {'0_Layers': 35, '1_Layers': 45, 'model_ctx': 4096, 'max_ctx': 2048, 'compress_emb': 1, 'ntk_alpha': 1, 'id': 'airoboros-l2-7...
    │              │    └ 'ExLlama'
    │              └ Namespace(apikey=None, aria2_port=None, cacheonly=False, colab=False, configname=None, cpu=False, customsettings=None, f=None...
    └ {'Huggingface GPTQ': <modeling.inference_models.gptq_hf_torch.class.model_backend object at 0x7fe22525b2b0>, 'KoboldAI Old Co...

  File "/home/.../KoboldAI-Client-llama/modeling/inference_models/exllama/class.py", line 423, in set_input_parameters
    self.model_config.device_map.layers = []
    │    └ None
    └ <modeling.inference_models.exllama.class.model_backend object at 0x7fe22b4f31c0>

AttributeError: 'NoneType' object has no attribute 'device_map'

Expected Behavior:

The model parameters can be set at startup

Environment:

$ git remote -v
origin  https://github.com/0cc4m/KoboldAI.git (fetch)
origin  https://github.com/0cc4m/KoboldAI.git (push)

$ git status
On branch exllama
Your branch is up to date with 'origin/exllama'


  commit 973aea12ea079e9c5de1e418b848a0407da7eab7 (HEAD -> exllama, origin/exllama)
  Author: 0cc4m <picard12@live.de>
  Date:   Sun Jul 23 22:07:34 2023 +0200
  
      Only import big python modules for GPTQ once they get used

Additionally, the following change should be made in play.sh:

$ git diff play.sh
  diff --git a/play.sh b/play.sh
  index 8ce7b781..3e88ae28 100755
  --- a/play.sh
  +++ b/play.sh
  @@ -3,4 +3,4 @@ export PYTHONNOUSERSITE=1
   if [ ! -f "runtime/envs/koboldai/bin/python" ]; then
   ./install_requirements.sh cuda
   fi
  -bin/micromamba run -r runtime -n koboldai python aiserver.py $*
  +bin/micromamba run -r runtime -n koboldai python aiserver.py "$@"

This lets you pass in JSON model params with spaces between the key-value pairs, as the help output instructs you:

$ ./play.sh --host --model airoboros-l2-70b-gpt4-2.0 --model_backend ExLlama --model_parameters help
...

INFO | __main__:general_startup:1395 - Running on Repo: https://github.com/0cc4m/KoboldAI.git Branch: exllama
MESSAGE | Welcome to KoboldAI!
MESSAGE | You have selected the following Model: airoboros-l2-70b-gpt4-2.0
ERROR | __main__:general_startup:1627 - Please pass through the parameters as a json like "{'[ID]': '[Value]', '[ID2]': '[Value]'}" using --model_parameters (required parameters shown below)
ERROR | __main__:general_startup:1628 - Parameters (ID: Default Value (Help Text)): 0_Layers: [None] (The number of layers to put on NVIDIA GeForce RTX 3090.)
1_Layers: [0] (The number of layers to put on NVIDIA GeForce RTX 3090.)
max_ctx: 2048 (The maximum context size the model supports)
compress_emb: 1 (If the model requires compressed embeddings, set them here)
ntk_alpha: 1 (NTK alpha value)
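
To illustrate why the quoting in play.sh matters, here is a rough Python simulation of the two shell expansions. It is only a sketch: the helper names are made up for this example, and it approximates unquoted $* as "join with spaces, then split on whitespace", ignoring pathname expansion and non-default IFS values.

```python
def expand_unquoted_star(args):
    # Hypothetical helper: approximates unquoted $* with the default IFS.
    # The arguments are joined and then re-split on any whitespace.
    return " ".join(args).split()


def expand_quoted_at(args):
    # Hypothetical helper: "$@" hands every argument through unchanged.
    return list(args)


# Arguments as the user's shell would deliver them to play.sh.
args = ["--model_parameters", "{'0_Layers': 35, '1_Layers': 45}"]

broken = expand_unquoted_star(args)
intact = expand_quoted_at(args)

print(len(broken), broken)  # the JSON string is torn into several pieces
print(len(intact), intact)  # the JSON string is still one argument
```

With $*, aiserver.py would receive the JSON in fragments, so only the first fragment ends up as the value of --model_parameters. With "$@" the whole string arrives as a single argument.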

Answer 1 · 2023-09-07T08:13:52.000Z

I ran into the same problem, but I was able to get it working without the model_backend and model path parameters. Instead, I opened the model manually through the web interface menu and selected the ExLlama backend there.

Answer 2 · 2023-09-07T14:24:47.000Z

Never mind, I figured out a way. self.model_config is None at self.model_config.device_map.layers because it is never initialized on this code path. The only place it gets initialized is the is_valid() function in the KoboldAI\modeling\inference_models\exllama\class.py file, and is_valid() is called when a user opens the model through the web interface menu.

To fix this, I made a small change to the get_requested_parameters() function in the class.py file. I added the following lines at the very beginning:

if not self.model_config:
    self.model_config = ExLlamaConfig(os.path.join(model_path, "config.json"))
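
As a self-contained illustration of this guard, here is a minimal sketch using stand-in classes. FakeConfig and Backend are hypothetical and only mimic the shape of the real code. The point is that the config gets created on first use even when is_valid() was never called, and is not recreated afterwards.

```python
import os


class FakeConfig:
    # Hypothetical stand-in for ExLlamaConfig; it only records the path.
    def __init__(self, path):
        self.path = path


class Backend:
    # Hypothetical stand-in for the ExLlama model_backend class.
    def __init__(self):
        self.model_config = None  # stays None unless something initializes it

    def get_requested_parameters(self, model_name, model_path):
        # The guard from the fix: initialize the config lazily if missing.
        if not self.model_config:
            self.model_config = FakeConfig(os.path.join(model_path, "config.json"))
        return []  # the real function builds the parameter list here


backend = Backend()
backend.get_requested_parameters("some-model", "models/some-model")
first_config = backend.model_config

# A second call reuses the existing config instead of creating a new one.
backend.get_requested_parameters("some-model", "models/some-model")
print(backend.model_config is first_config)
```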

However, it turns out there was one more thing that needed to be changed within the same function. I removed the square brackets from

"default": [layer_count if i == 0 else 0]

and changed it to

"default": layer_count if i == 0 else 0

This is needed because of the following lines in the set_input_parameters() function, also in the class.py file:

for i, l in enumerate(layers):
    if l > 0:

With the square brackets, each default is a single-element list instead of an integer, so the l > 0 comparison does not work.
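
To see why the brackets matter, here is a minimal sketch of the same loop shape, run against both forms of the default. The count_gpu_layers helper is hypothetical and not part of KoboldAI. In Python 3, comparing a list with an integer using > raises a TypeError.

```python
def count_gpu_layers(layers):
    # Hypothetical helper that mirrors the loop in set_input_parameters().
    total = 0
    for i, l in enumerate(layers):
        if l > 0:  # only valid when l is a number
            total += l
    return total


# Defaults without brackets: plain integers, the comparison works.
print(count_gpu_layers([35, 0]))

# Defaults with brackets: each entry is a list, so l > 0 fails.
try:
    count_gpu_layers([[35], [0]])
    bracketed_failed = False
except TypeError as exc:
    bracketed_failed = True
    print("TypeError:", exc)
```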