StatCan/aaw

llama-index: a data framework for your LLM application


Hello, I am a PhD student working on a project on the DAS space.

Describe the solution you'd like

I would like to know whether the llama-index package could be checked for possible vulnerabilities. I was not able to find this package on the JFrog page.

Additional context

In case it helps, the following link leads to the documentation: https://docs.llamaindex.ai/en/stable/

Hi @P13tr092

Which namespace are you working from?

Can you provide us with more details on your use case for this package?

Hello Souheil,
I think the namespace could be 4pointwayback. In case I am wrong, where can I find the namespace?
Regarding LlamaIndex, this package serves as a powerful framework for working with Large Language Models. In my project, I will utilize this framework to explore and analyze documents through the Retrieval-Augmented Generation (RAG) process.
Let me know whether you need more details.

Hello, I'm currently looking into the issue. A vulnerability was identified in one of the dependencies of llama-index, which is why pip times out when trying to install it. I will try to find a workaround and let you know.

Hello @jacek-dudek,
Thank you for your help!

Some initial discovery:

Tried to manually install the llama-index package using pip. Some dependencies are not getting downloaded onto the notebook. I identified these two packages: requests-toolbelt and typing-inspect.

pip output:
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='jfrog-platform-artifactory.jfrog-system', port=8081): Read timed out. (read timeout=300.0)")': /artifactory/api/pypi/pypi-remote/packages/packages/65/f3/107a22063bf27bdccf2024833d3445f4eea42b2e598abfbd46f6a63b6cb0/typing_inspect-0.9.0-py3-none-any.whl

ERROR: Could not install packages due to an OSError: HTTPConnectionPool(host='jfrog-platform-artifactory.jfrog-system', port=8081): Max retries exceeded with url: /artifactory/api/pypi/pypi-remote/packages/packages/65/f3/107a22063bf27bdccf2024833d3445f4eea42b2e598abfbd46f6a63b6cb0/typing_inspect-0.9.0-py3-none-any.whl (Caused by ReadTimeoutError("HTTPConnectionPool(host='jfrog-platform-artifactory.jfrog-system', port=8081): Read timed out. (read timeout=300.0)"))

Looked into the JFrog REST APIs to see whether I can query the JFrog server directly, and I can. Both packages appear to be present on the server. Here's the JFrog output:

curl http://jfrog-platform-artifactory.jfrog-system:8081/artifactory/api/search/artifact?name=requests-toolbelt
{
"results" : [ {
"uri" : "https://jfrog.aaw.cloud.statcan.ca/artifactory/api/storage/conda-forge-remote-cache/noarch/requests-toolbelt-0.9.1-py_0.tar.bz2"
}, {
"uri" : "https://jfrog.aaw.cloud.statcan.ca/artifactory/api/storage/conda-forge-remote-cache/noarch/requests-toolbelt-0.10.1-pyhd8ed1ab_0.tar.bz2"
}, {
"uri" : "https://jfrog.aaw.cloud.statcan.ca/artifactory/api/storage/conda-forge-remote-cache/noarch/requests-toolbelt-1.0.0-pyhd8ed1ab_0.conda"
}, {
"uri" : "https://jfrog.aaw.cloud.statcan.ca/artifactory/api/storage/pypi-remote-cache/.pypi/requests-toolbelt.html"
} ]
}
curl http://jfrog-platform-artifactory.jfrog-system:8081/artifactory/api/search/artifact?name=typing-inspect
{
"results" : [ {
"uri" : "https://jfrog.aaw.cloud.statcan.ca/artifactory/api/storage/pypi-remote-cache/.pypi/typing-inspect.html"
} ]
}
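
The same search can be scripted so that the results are easier to compare across packages. The following is only a sketch: the base URL and the `/api/search/artifact?name=` endpoint are taken from the curl commands above, while the helper names (`search_url`, `group_by_repo`, `search`) are hypothetical, and the request itself would only work from inside the cluster.

```python
import json
import urllib.parse
import urllib.request

# Internal Artifactory base URL, copied from the curl commands above.
ARTIFACTORY = "http://jfrog-platform-artifactory.jfrog-system:8081/artifactory"


def search_url(name, base=ARTIFACTORY):
    """Build the artifact search URL for a package name."""
    return f"{base}/api/search/artifact?" + urllib.parse.urlencode({"name": name})


def group_by_repo(response):
    """Group result URIs by the repository segment that follows /api/storage/."""
    repos = {}
    for item in response.get("results", []):
        _, _, tail = item["uri"].partition("/api/storage/")
        repo, _, path = tail.partition("/")
        repos.setdefault(repo, []).append(path)
    return repos


def search(name, timeout=30):
    """Run the search and group the hits (hypothetical helper, needs cluster access)."""
    with urllib.request.urlopen(search_url(name), timeout=timeout) as resp:
        return group_by_repo(json.load(resp))
```

Applied to the typing-inspect response above, `group_by_repo` would list only the `.pypi/typing-inspect.html` index page under `pypi-remote-cache`, with no wheel file among the results.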

Trying to install the latest versions of either of these two packages still errors out. The earliest thing that appears to go wrong is something related to cache entries.

pip -vv install typing-inspect==0.9.0
. . .
Looking up "http://jfrog-platform-artifactory.jfrog-system:8081/artifactory/api/pypi/pypi-remote/packages/packages/65/f3/107a22063bf27bdccf2024833d3445f4eea42b2e598abfbd46f6a63b6cb0/typing_inspect-0.9.0-py3-none-any.whl" in the cache
No cache entry available
No cache entry available
Incremented Retry for (url='/artifactory/api/pypi/pypi-remote/packages/packages/65/f3/107a22063bf27bdccf2024833d3445f4eea42b2e598abfbd46f6a63b6cb0/typing_inspect-0.9.0-py3-none-any.whl'): Retry(total=4, connect=None, read=None, redirect=None, status=None)
. . .
Will keep investigating.

@P13tr092 can I ask how critical the llama-index and spacy packages are to your work?

If they are blockers/critical then I will look at whitelisting while we work on eval/mitigation.

Hi @Souheil-Yazji ,
These two packages are essential to my research project.

Okay, sorry about the wait. We'll look into whitelisting this asap.

Great! Thank you for your help @Souheil-Yazji

cc @jacek-dudek

Work Around for Non-pro-b Notebooks:

conda deactivate ## exit base env
conda create -n <your-env-name> python=3.11 numpy=1.24.4 ## I believe your requirements.txt file would work, but you have to pin the Python version since we are on 3.12.7, which is not compatible with numpy<2.x.x
conda activate <your-env-name>
conda install llama-index
conda install spacy=3.7.5 && python -m spacy download en_core_web_sm

This will all work for a pro-b notebook except for python -m spacy download en_core_web_sm. Let me know if that is needed, as we can locally host those files.

One thing I noticed is the usage of pip in the notebook you're using. Honestly, conda does such a better job of resolving deps and giving accurate info that I default to it, even though it's not as lightweight.

Your requirements file seems to be good, except that the Python version doesn't seem to be pinned. Otherwise you can build that env and activate it, then change the notebook kernel to use that env instead :)

This should resolve all your problems.
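
To confirm that the notebook kernel is actually running in the new environment, a quick check along these lines may help. It is a minimal sketch, not part of the workaround itself: `env_report` is a hypothetical helper, and the import names `llama_index`, `spacy` and `numpy` are assumed to match the packages installed above.

```python
import importlib.util
import sys


def env_report(packages=("llama_index", "spacy", "numpy")):
    """Return the interpreter version, env prefix, and which packages are importable."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "prefix": sys.prefix,  # should point at the new conda env, not base
        "packages": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


if __name__ == "__main__":
    print(env_report())
```

If `prefix` still points at the base environment or `python` is not the pinned version, the kernel is probably not using the new env yet.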

@P13tr092 let me know if this worked for you :)

Confirmed that Souheil's conda workaround works on my notebook. (Only the python version needed to be pinned when creating the new environment actually.) If you don't insist on that specific version of spacy then you can also get a successful install of both llama-index and spacy with the default python version (3.12).

You'll need to pass an environment variable storing an API key from your OpenAI account (OPENAI_API_KEY) when running your applications. I don't have a paid subscription so I'm not getting any output, but the llama-index sample code seems to work otherwise.
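
One way to fail early when the key is missing is to check for the variable before running any llama-index code. This is just a sketch: `require_api_key` is a hypothetical helper, and only the variable name OPENAI_API_KEY comes from the comment above.

```python
import os


def require_api_key(var="OPENAI_API_KEY", env=None):
    """Return the API key from the environment, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set; export it before starting the notebook or script."
        )
    return key
```

Calling `require_api_key()` at the top of a notebook gives a clear error message up front if the variable was never exported.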

Closing for now. If the issue occurs again, please reopen.