mandiant/Ghidrathon

Cannot use transformers package

Forsworns opened this issue · 8 comments

Thanks for your great work on this plugin.

My env:

MacOS 12.5 (x86), Python 3.11, Ghidrathon master branch, Ghidra 10.4

Background

Recently I've been trying to write a script like Gepetto to analyze bytecode with an LLM. The langchain package seems to work well with Ghidrathon, but I want to use a local LLM (after all, it's free, lol). So I played with Hugging Face and tried to use transformers in Ghidrathon.

Problem

But I cannot import transformers, just try this in Ghidrathon:

from transformers import AutoTokenizer

and I got a CPython error

File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/data/__init__.py", line 26, in <module>
    from .metrics import glue_compute_metrics, xnli_compute_metrics
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/data/metrics/__init__.py", line 19, in <module>
    from scipy.stats import pearsonr, spearmanr
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/scipy/stats/__init__.py", line 608, in <module>
    from ._stats_py import *
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/scipy/stats/_stats_py.py", line 37, in <module>
    from numpy.testing import suppress_warnings
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/jep/shared_modules_hook.py", line 134, in exec_module
    self.sharedImporter.sharedImport(fullname)
TypeError: CalledProcessError.__init__() missing 1 required positional argument: 'cmd'

I guess it is due to the lazy module loading in the transformers package. I don't know whether I need to place some packages under PYTHON_SHARED_MODULES or PYTHON_INCLUDE_PATHS, or whether this is related to a known limitation of JEP.
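For anyone reading the traceback above: the final TypeError is the message Python itself produces whenever subprocess.CalledProcessError is constructed with only one argument. The sketch below only demonstrates that Python behavior. Whether the Jep import hook actually re-created an exception this way is an assumption, not something confirmed in this thread.

```python
import subprocess

# CalledProcessError requires both a return code and the command.
ok = subprocess.CalledProcessError(1, ["some-command", "--flag"])
print(ok.returncode, ok.cmd)

# Building it from a single value (for example, only a message string)
# fails inside __init__ because 'cmd' is missing.
message = None
try:
    subprocess.CalledProcessError("only a message")
except TypeError as exc:
    message = str(exc)
    print(message)
```

If that reading is right, the TypeError would be hiding an underlying CalledProcessError raised by a failed child process during the import.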

Hi @Forsworns thank you for reporting. I'm trying to recreate on my end and received the following error:

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

To better help us troubleshoot this issue can you please provide your exact setup?

Yep, you first have to make sure you can use transformers with the native Python interpreter. For me, I use PyTorch.

But I can't provide an exact requirements.txt, since I don't use a virtual env, so the list below may include some packages unrelated to transformers:

https://gist.github.com/Forsworns/0a80a60cb8c362bb36a9ea4de4398370#file-requirements-txt

PyTorch was installed with pip3 install torch torchvision torchaudio on my Mac. Note that there is no CUDA support installed; I don't know whether the JEP compatibility problem gets worse on a platform with CUDA.

Hi @Forsworns - based on feedback from the Jep developers we just pushed a PR (#61) to switch Ghidrathon from using sub-interpreters to shared interpreters. This should enable the use of (C)Python modules that do not support sub-interpreters. Could you test out the referenced PR to see if your issue is resolved?

The latest release fixes this issue in my local tests. Please reopen if you continue to experience issues with the latest release.

@mike-hunhoff Thanks! Sorry for the late response, the last comment was buried in my mailbox.

  • Do we still need the XML configuration file to tell Ghidrathon which packages should be treated as shared?

  • I imported another name from the package

from transformers import pipeline

and got the following error, which seems to be a problem with my platform:

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1086, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/pipelines/__init__.py", line 60, in <module>
    from .document_question_answering import DocumentQuestionAnsweringPipeline
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/pipelines/document_question_answering.py", line 29, in <module>
    from .question_answering import select_starts_ends
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/pipelines/question_answering.py", line 8, in <module>
    from ..data import SquadExample, SquadFeatures, squad_convert_examples_to_features
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/data/__init__.py", line 26, in <module>
    from .metrics import glue_compute_metrics, xnli_compute_metrics
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/data/metrics/__init__.py", line 19, in <module>
    from scipy.stats import pearsonr, spearmanr
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/scipy/stats/__init__.py", line 608, in <module>
    from ._stats_py import *
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/scipy/stats/_stats_py.py", line 37, in <module>
    from numpy.testing import suppress_warnings
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/numpy/testing/__init__.py", line 11, in <module>
    from ._private.utils import *
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 64, in <module>
    _tags = list(sys_tags())
            ^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/packaging/tags.py", line 536, in sys_tags
    yield from cpython_tags(warn=warn)
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/packaging/tags.py", line 211, in cpython_tags
    platforms = list(platforms or platform_tags())
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/packaging/tags.py", line 399, in mac_platforms
    version_str = subprocess.run(
                  ^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/java', '-sS', '-c', 'import platform; print(platform.mac_ver()[0])']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/kyrie/.ghidra/.ghidra_10.4_DEV/Extensions/Ghidrathon-2.2.0/data/python/jepeval.py", line 66, in jepeval
    more_input_needed = _jepeval(line)
                        ^^^^^^^^^^^^^^
  File "/Users/kyrie/.ghidra/.ghidra_10.4_DEV/Extensions/Ghidrathon-2.2.0/data/python/jepeval.py", line 49, in _jepeval
    exec(compile(line, "<string>", "single"), globals(), globals())
  File "<string>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 1229, in _handle_fromlist
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1076, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1088, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.pipelines because of the following error (look up to see its traceback):
Command '['/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/java', '-sS', '-c', 'import platform; print(platform.mac_ver()[0])']' returned non-zero exit status 1.

@mike-hunhoff it seems it should call python -sS -c 'import platform; print(platform.mac_ver()[0])' instead of java -sS -c 'import platform; print(platform.mac_ver()[0])' in the error above, which is rather confusing ...

@Forsworns oh this is an interesting one. Based on the output that you provided I tracked the offending lines of code to packaging.tags: https://github.com/pypa/packaging/blob/3030822b0b06a43fbbb2710da7b0846d1bebd2ba/src/packaging/tags.py#L399-L410

The code calls subprocess.run where the target executable is pulled from sys.executable. sys.executable is set correctly in my Linux test environment:

>>> import sys
>>> sys.executable
'/home/spring/dragon/.env/bin/python3'

Can you run the code listed above from your Ghidrathon interpreter? I'm guessing sys.executable is set to the Java executable used to run Ghidra in your Mac environment...
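A slightly fuller check, as a sketch: the snippet below prints sys.executable and then runs the same kind of child-process query with it, modeled on the command shown in the error output. The helper name query_mac_version is hypothetical and this is not the packaging library's own code. On a non-Mac system the query succeeds but returns an empty version string.

```python
import subprocess
import sys


def query_mac_version(executable):
    """Run the platform.mac_ver() query in a child process.

    Returns (True, output) on success, or (False, error text) when the
    executable cannot be started or exits with a non-zero status.
    """
    try:
        result = subprocess.run(
            [executable, "-sS", "-c",
             "import platform; print(platform.mac_ver()[0])"],
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except (subprocess.CalledProcessError, OSError) as exc:
        return False, str(exc)
    return True, result.stdout.strip()


print("sys.executable =", sys.executable)
print(query_mac_version(sys.executable))
```

If sys.executable points at the Java binary, the first element of the result should be False, matching the CalledProcessError reported above.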

It appears the Jep developers know of this issue: https://github.com/ninia/jep/blob/056ce9907f5ecbf2364df1ec55755404b2e8a947/commands/test.py#L73-L76 which they avoid by setting the environment variable PYTHONEXECUTABLE. Ghidrathon probably needs to set this as well, or manually configure sys.executable during the init phase when running in a Mac environment. Let's open a new issue to track this work.
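Until that is tracked, here is a rough sketch of the second option (configuring sys.executable by hand). The helper names are hypothetical and this is not Ghidrathon's actual fix. It assumes sys.prefix points at the intended Python installation and that the interpreter lives under its bin directory, which may not hold for every setup.

```python
import os
import sys


def looks_like_python(path):
    """Return True if the executable's file name starts with 'python'."""
    return os.path.basename(path or "").lower().startswith("python")


def resolve_python_executable(current, prefix):
    """Pick a usable interpreter path (hypothetical helper).

    Keeps `current` when it already names a Python binary; otherwise
    probes bin/python3 and bin/python under `prefix` (an assumed layout)
    and returns the first executable file found, or None.
    """
    if looks_like_python(current):
        return current
    for name in ("python3", "python"):
        candidate = os.path.join(prefix, "bin", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def fix_sys_executable():
    """Overwrite sys.executable only when a better candidate is found."""
    resolved = resolve_python_executable(sys.executable, sys.prefix)
    if resolved is not None:
        sys.executable = resolved
    return sys.executable
```

Calling fix_sys_executable() from the interpreter window before importing transformers would be one way to try it. When nothing suitable is found, sys.executable is left untouched.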