AmenRa/retriv

[BUG] Segmentation fault (core dumped)

celsofranssa opened this issue · 1 comment

First of all, thank you for this excellent library.

Describe the bug

Building TDF matrix: 100%|███████████████████████████████████████████████| 13905/13905 [00:34<00:00, 408.07it/s]
Building inverted index: 100%|███████████████████████████████████████| 148864/148864 [00:10<00:00, 14750.18it/s]
Batch search:   0%|                                                                   | 0/13905 [00:00<?, ?it/s]
Segmentation fault      (core dumped)

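I am getting a segmentation fault (core dumped) when calling bsearch on the Sparse Retriever. The crash happens as soon as the batch search starts (0/13905 queries processed), right after the inverted index has been built.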

Current environment
  • CUDA:
    - GPU:
      - NVIDIA GeForce RTX 3090
    - available: True
    - version: 12.1

  • Packages:
    - absl-py: 2.0.0
    - accelerate: 0.24.1
    - aiohttp: 3.8.6
    - aiosignal: 1.3.1
    - alembic: 1.12.1
    - antlr4-python3-runtime: 4.9.3
    - appdirs: 1.4.4
    - async-timeout: 4.0.3
    - attrs: 23.1.0
    - autofaiss: 2.15.8
    - beautifulsoup4: 4.12.2
    - bleach: 6.1.0
    - cachetools: 5.3.2
    - cbor: 1.0.0
    - cbor2: 5.5.1
    - certifi: 2023.7.22
    - charset-normalizer: 3.3.2
    - click: 8.1.7
    - colorlog: 6.7.0
    - contourpy: 1.2.0
    - cramjam: 2.7.0
    - cycler: 0.12.1
    - dill: 0.3.7
    - docker-pycreds: 0.4.0
    - embedding-reader: 1.5.1
    - faiss-cpu: 1.7.4
    - fastparquet: 2023.10.1
    - filelock: 3.13.1
    - fire: 0.4.0
    - fonttools: 4.44.0
    - frozenlist: 1.4.0
    - fsspec: 2023.10.0
    - gitdb: 4.0.11
    - gitpython: 3.1.40
    - google-auth: 2.23.4
    - google-auth-oauthlib: 1.1.0
    - greenlet: 3.0.1
    - grpcio: 1.59.2
    - huggingface-hub: 0.17.3
    - hydra-core: 1.3.2
    - idna: 3.4
    - ijson: 3.2.3
    - indxr: 0.1.5
    - inscriptis: 2.3.2
    - ir-datasets: 0.5.5
    - jinja2: 3.1.2
    - joblib: 1.3.2
    - kaggle: 1.5.16
    - keybert: 0.8.3
    - kiwisolver: 1.4.5
    - krovetzstemmer: 0.8
    - lightning-utilities: 0.9.0
    - llvmlite: 0.41.1
    - lxml: 4.9.3
    - lz4: 4.3.2
    - mako: 1.3.0
    - markdown: 3.5.1
    - markdown-it-py: 3.0.0
    - markupsafe: 2.1.3
    - matplotlib: 3.8.1
    - mdurl: 0.1.2
    - mpmath: 1.3.0
    - multidict: 6.0.4
    - multipipe: 0.1.0
    - multiprocess: 0.70.15
    - networkx: 3.2.1
    - nltk: 3.8.1
    - nmslib: 2.1.1
    - numba: 0.58.1
    - numpy: 1.26.1
    - nvidia-cublas-cu12: 12.1.3.1
    - nvidia-cuda-cupti-cu12: 12.1.105
    - nvidia-cuda-nvrtc-cu12: 12.1.105
    - nvidia-cuda-runtime-cu12: 12.1.105
    - nvidia-cudnn-cu12: 8.9.2.26
    - nvidia-cufft-cu12: 11.0.2.54
    - nvidia-curand-cu12: 10.3.2.106
    - nvidia-cusolver-cu12: 11.4.5.107
    - nvidia-cusparse-cu12: 12.1.0.106
    - nvidia-nccl-cu12: 2.18.1
    - nvidia-nvjitlink-cu12: 12.3.52
    - nvidia-nvtx-cu12: 12.1.105
    - oauthlib: 3.2.2
    - omegaconf: 2.3.0
    - oneliner-utils: 0.1.2
    - optuna: 3.4.0
    - orjson: 3.9.10
    - packaging: 23.2
    - pandas: 1.5.3
    - pillow: 10.1.0
    - pip: 23.3.1
    - protobuf: 4.23.4
    - psutil: 5.9.6
    - pyarrow: 12.0.1
    - pyasn1: 0.5.0
    - pyasn1-modules: 0.3.0
    - pyautocorpus: 0.1.12
    - pybind11: 2.6.1
    - pygments: 2.16.1
    - pyparsing: 3.1.1
    - pystemmer: 2.0.1
    - python-dateutil: 2.8.2
    - python-slugify: 8.0.1
    - pytorch-lightning: 2.1.1
    - pytorch-metric-learning: 2.3.0
    - pytz: 2023.3.post1
    - pyyaml: 6.0.1
    - ranx: 0.3.18
    - regex: 2023.10.3
    - requests: 2.31.0
    - requests-oauthlib: 1.3.1
    - retriv: 0.2.3
    - rich: 13.6.0
    - rsa: 4.9
    - safetensors: 0.4.0
    - scikit-learn: 1.3.2
    - scipy: 1.11.3
    - seaborn: 0.13.0
    - sentence-transformers: 2.2.2
    - sentencepiece: 0.1.99
    - sentry-sdk: 1.39.1
    - setproctitle: 1.3.3
    - setuptools: 68.2.2
    - six: 1.16.0
    - smmap: 5.0.1
    - soupsieve: 2.5
    - sqlalchemy: 2.0.23
    - sympy: 1.12
    - tabulate: 0.9.0
    - tensorboard: 2.15.1
    - tensorboard-data-server: 0.7.2
    - termcolor: 2.3.0
    - text-unidecode: 1.3
    - threadpoolctl: 3.2.0
    - tokenizers: 0.14.1
    - torch: 2.1.0
    - torchaudio: 2.1.0
    - torchmetrics: 1.2.0
    - torchvision: 0.16.0
    - tqdm: 4.66.1
    - transformers: 4.35.0
    - trec-car-tools: 2.6
    - triton: 2.1.0
    - typing-extensions: 4.8.0
    - unidecode: 1.3.7
    - unlzw3: 0.2.2
    - urllib3: 2.0.7
    - wandb: 0.16.1
    - warc3-wet: 0.2.3
    - warc3-wet-clueweb09: 0.2.5
    - webencodings: 0.5.1
    - werkzeug: 3.0.1
    - wheel: 0.41.2
    - yarl: 1.9.2
    - zlib-state: 0.1.6

  • System:
    - OS: Linux
    - architecture:
      - 64bit
      - ELF
    - processor: x86_64
    - python: 3.10.13
    - release: 5.15.0-88-generic
    - version: #98~20.04.1-Ubuntu SMP Mon Oct 9 16:43:45 UTC 2023

I ran into this issue before. In my experiments, the cause was that the query was too long.
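
If overly long queries are also the cause here, one possible workaround is to cap the query length before calling bsearch. The following is only a sketch under assumptions: the helper `truncate_queries` and the cap `MAX_QUERY_TOKENS` are hypothetical names that are not part of retriv, the value 64 is an arbitrary placeholder rather than a known safe limit, and the queries are assumed to be dicts with "id" and "text" keys.

```python
from typing import Dict, List

# Hypothetical cap: not a documented retriv limit, tune it for your data.
MAX_QUERY_TOKENS = 64


def truncate_queries(
    queries: List[Dict[str, str]], max_tokens: int = MAX_QUERY_TOKENS
) -> List[Dict[str, str]]:
    """Return copies of the queries whose text is capped at max_tokens
    whitespace-separated tokens. The input list is left unmodified."""
    truncated = []
    for query in queries:
        tokens = query["text"].split()
        truncated.append({**query, "text": " ".join(tokens[:max_tokens])})
    return truncated


queries = [
    {"id": "q_1", "text": "short query"},
    {"id": "q_2", "text": "word " * 500},  # artificially long query
]
short_queries = truncate_queries(queries, max_tokens=64)

# Assumption: `retriever` is an already indexed SparseRetriever.
# results = retriever.bsearch(queries=short_queries, cutoff=100)
```

Truncating on whitespace is a rough approximation, since the retriever applies its own tokenization, stemming and stop-word removal afterwards. If the crash disappears with a small cap, raising the cap step by step should show roughly at what query length the problem starts.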