ndif-team/nnsight

Trouble loading GPTBigCode models

Closed this issue · 3 comments

I'm happy to try to debug this, but in case the error is obvious to an nnsight hacker, here is the error I'm getting.

This is the model:

https://huggingface.co/bigcode/gpt_bigcode-santacoder

This is the code that raises the error below. I am able to load Pythia as shown in the tutorial (a sketch of that working load follows the failing snippet).

from nnsight import LanguageModel

model = LanguageModel("bigcode/gpt_bigcode-santacoder", device_map='cuda:0')
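
For comparison, a minimal sketch of the Pythia load that works in the same environment (the exact model id here is an assumption based on the tutorial):

from nnsight import LanguageModel

# Loads fine in the same environment; the model id is assumed from the
# nnsight tutorial, and any Pythia checkpoint should behave the same.
model = LanguageModel("EleutherAI/pythia-70m", device_map='cuda:0')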

Error (from the gpt_bigcode load above):

/work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/transformers/utils/hub.py:123: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/transformers/utils/import_utils.py:1382, in _LazyModule._get_module(self, module_name)
   1381 try:
-> 1382     return importlib.import_module("." + module_name, self.__name__)
   1383 except Exception as e:

File ~/miniconda3/lib/python3.11/importlib/__init__.py:126, in import_module(name, package)
    125         level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1204, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1176, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1147, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:690, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:940, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:64
     60 # Fused kernels
     61 # Use separate functions for each case because conditionals prevent kernel fusion.
     62 # TODO: Could have better fused kernels depending on scaling, dropout and head mask.
     63 #  Is it doable without writing 32 functions?
---> 64 @torch.jit.script
     65 def upcast_masked_softmax(
     66     x: torch.Tensor, mask: torch.Tensor, mask_value: torch.Tensor, scale: float, softmax_dtype: torch.dtype
     67 ):
     68     input_dtype = x.dtype

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/torch/jit/_script.py:1381, in script(obj, optimize, _frames_up, _rcb, example_inputs)
   1380     _rcb = _jit_internal.createResolutionCallbackFromClosure(obj)
-> 1381 fn = torch._C._jit_script_compile(
   1382     qualified_name, ast, _rcb, get_default_args(obj)
   1383 )
   1384 # Forward docstrings

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/torch/jit/_recursive.py:1010, in try_compile_fn(fn, loc)
   1009 rcb = _jit_internal.createResolutionCallbackFromClosure(fn)
-> 1010 return torch.jit.script(fn, _rcb=rcb)

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/torch/jit/_script.py:1378, in script(obj, optimize, _frames_up, _rcb, example_inputs)
   1377     return maybe_already_compiled_fn
-> 1378 ast = get_jit_def(obj, obj.__name__)
   1379 if _rcb is None:

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/torch/jit/frontend.py:331, in get_jit_def(fn, def_name, self_name, is_classmethod)
    317 """
    318 Build a JIT AST (TreeView) from the given function.
    319 
   (...)
    329     self_name: If this function is a method, what the type name of `self` is.
    330 """
--> 331 parsed_def = parse_def(fn) if not isinstance(fn, _ParsedDef) else fn
    332 type_line = torch.jit.annotations.get_type_line(parsed_def.source)

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/torch/_sources.py:120, in parse_def(fn)
    119 def parse_def(fn):
--> 120     sourcelines, file_lineno, filename = get_source_lines_and_file(
    121         fn, ErrorReport.call_stack()
    122     )
    123     sourcelines = normalize_source_lines(sourcelines)

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/torch/_sources.py:23, in get_source_lines_and_file(obj, error_msg)
     22     filename = inspect.getsourcefile(obj)
---> 23     sourcelines, file_lineno = inspect.getsourcelines(obj)
     24 except OSError as e:

File ~/miniconda3/lib/python3.11/inspect.py:1244, in getsourcelines(object)
   1243 object = unwrap(object)
-> 1244 lines, lnum = findsource(object)
   1246 if istraceback(object):

File ~/miniconda3/lib/python3.11/inspect.py:1063, in findsource(object)
   1056 """Return the entire source file and starting line number for an object.
   1057 
   1058 The argument may be a module, class, method, function, traceback, frame,
   1059 or code object.  The source code is returned as a list of all the lines
   1060 in the file and the line number indexes a line in that list.  An OSError
   1061 is raised if the source code cannot be retrieved."""
-> 1063 file = getsourcefile(object)
   1064 if file:
   1065     # Invalidate cache if needed.

File ~/miniconda3/lib/python3.11/inspect.py:940, in getsourcefile(object)
    937 """Return the filename that can be used to locate an object's source.
    938 Return None if no way can be identified to get the source.
    939 """
--> 940 filename = getfile(object)
    941 all_bytecode_suffixes = importlib.machinery.DEBUG_BYTECODE_SUFFIXES[:]

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/torch/package/package_importer.py:696, in _patched_getfile(object)
    695         return _package_imported_modules[object.__module__].__file__
--> 696 return _orig_getfile(object)

File ~/miniconda3/lib/python3.11/inspect.py:920, in getfile(object)
    919     return object.co_filename
--> 920 raise TypeError('module, class, method, function, traceback, frame, or '
    921                 'code object was expected, got {}'.format(
    922                 type(object).__name__))

TypeError: module, class, method, function, traceback, frame, or code object was expected, got builtin_function_or_method

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[1], line 3
      1 from nnsight import LanguageModel
----> 3 model = LanguageModel("bigcode/gpt_bigcode-santacoder", device_map='cuda:0')

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/nnsight/models/LanguageModel.py:46, in LanguageModel.__init__(self, tokenizer, automodel, *args, **kwargs)
     43 self.local_model: PreTrainedModel = None
     44 self.automodel = automodel if not isinstance(automodel, str) else getattr(modeling_auto, automodel)
---> 46 super().__init__(*args, **kwargs)

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/nnsight/models/AbstractModel.py:104, in AbstractModel.__init__(self, repoid_path_model, dispatch, alter, *args, **kwargs)
     99                 self.meta_model: Module = Module.wrap(
    100                     copy.deepcopy(self.local_model).to("meta")
    101                 )
    102         else:
    103             self.meta_model: Module = Module.wrap(
--> 104                 self._load_meta(self.repoid_path_clsname, *args, **kwargs).to(
    105                     "meta"
    106                 )
    107             )
    109 # Wrap all modules in our Module class.
    110 for name, module in self.meta_model.named_children():

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/nnsight/models/LanguageModel.py:59, in LanguageModel._load_meta(self, repoid_or_path, *args, **kwargs)
     54 self.tokenizer = AutoTokenizer.from_pretrained(
     55     repoid_or_path, config=self.config, padding_side="left"
     56 )
     57 self.tokenizer.pad_token = self.tokenizer.eos_token
---> 59 return self.automodel.from_config(self.config, trust_remote_code=True)

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py:440, in _BaseAutoModelClass.from_config(cls, config, **kwargs)
    438     return model_class._from_config(config, **kwargs)
    439 elif type(config) in cls._model_mapping.keys():
--> 440     model_class = _get_model_class(config, cls._model_mapping)
    441     return model_class._from_config(config, **kwargs)
    443 raise ValueError(
    444     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    445     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    446 )

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py:387, in _get_model_class(config, model_mapping)
    386 def _get_model_class(config, model_mapping):
--> 387     supported_models = model_mapping[type(config)]
    388     if not isinstance(supported_models, (list, tuple)):
    389         return supported_models

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py:740, in _LazyAutoMapping.__getitem__(self, key)
    738 if model_type in self._model_mapping:
    739     model_name = self._model_mapping[model_type]
--> 740     return self._load_attr_from_module(model_type, model_name)
    742 # Maybe there was several model types associated with this config.
    743 model_types = [k for k, v in self._config_mapping.items() if v == key.__name__]

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py:754, in _LazyAutoMapping._load_attr_from_module(self, model_type, attr)
    752 if module_name not in self._modules:
    753     self._modules[module_name] = importlib.import_module(f".{module_name}", "transformers.models")
--> 754 return getattribute_from_module(self._modules[module_name], attr)

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py:698, in getattribute_from_module(module, attr)
    696 if isinstance(attr, tuple):
    697     return tuple(getattribute_from_module(module, a) for a in attr)
--> 698 if hasattr(module, attr):
    699     return getattr(module, attr)
    700 # Some of the mappings have entries model_type -> object of another model type. In that case we try to grab the
    701 # object at the top level.

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/transformers/utils/import_utils.py:1372, in _LazyModule.__getattr__(self, name)
   1370     value = self._get_module(name)
   1371 elif name in self._class_to_module.keys():
-> 1372     module = self._get_module(self._class_to_module[name])
   1373     value = getattr(module, name)
   1374 else:

File /work/arjunguha-research-group/arjun/venvs/jan2024/lib/python3.11/site-packages/transformers/utils/import_utils.py:1384, in _LazyModule._get_module(self, module_name)
   1382     return importlib.import_module("." + module_name, self.__name__)
   1383 except Exception as e:
-> 1384     raise RuntimeError(
   1385         f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
   1386         f" traceback):\n{e}"
   1387     ) from e

RuntimeError: Failed to import transformers.models.gpt_bigcode.modeling_gpt_bigcode because of the following error (look up to see its traceback):
module, class, method, function, traceback, frame, or code object was expected, got builtin_function_or_method
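
One way to narrow this down (a diagnostic sketch, not something already tried above): the failure happens while transformers lazily imports the GPTBigCode modeling module, at the module-level @torch.jit.script decorator. Importing that module in a fresh interpreter, without nnsight, would show whether the module is broken on its own or only under nnsight:

# Diagnostic sketch: run in a fresh interpreter with nnsight NOT imported.
# This triggers the same module-level @torch.jit.script compilation that
# fails in the traceback above.
from transformers.models.gpt_bigcode import modeling_gpt_bigcode
print("modeling_gpt_bigcode imported OK")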

@arjunguha What version of transformers are you using? I see this in the model card:

Model Summary
This is the same model as SantaCoder but it can be loaded with transformers >=4.28.1 to use the GPTBigCode architecture. We refer the reader to the SantaCoder model page for full documentation about this model.

main: Uses the gpt_bigcode model. Requires the bigcode fork of transformers.
main_custom: Packaged with its modeling code. Requires transformers>=4.27. Alternatively, it can run on older versions by setting the configuration parameter activation_function = "gelu_pytorch_tanh".
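
You can check the installed version with a one-liner (plain transformers, nothing nnsight-specific):

import transformers
print(transformers.__version__)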

I'm using transformers 4.36.2, so the main branch and not the main_custom version that packages its own modeling code.

SantaCoder seems to work in the current release of nnsight.

from nnsight import LanguageModel

model = LanguageModel("bigcode/gpt_bigcode-santacoder", device_map='cuda:0')

with model.trace("# the following python function computes the sqrt"):
    test = model.transformer.h[0].attn.output.save()
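
After the with block exits, the saved proxy holds the actual activation. A minimal sketch of reading it back, assuming this nnsight release exposes saved values via .value and that the attention module returns the usual Hugging Face tuple with the attention output first:

# Sketch: assumes saved proxies expose .value in this nnsight release.
# GPTBigCode's attention forward returns a tuple; hidden states come first.
attn_out = test.value
print(attn_out[0].shape)  # (batch, seq_len, hidden_size)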

Closing this issue.