[BUG] AutoRetriever raises an error on Windows OS.
histmeisah opened this issue · 1 comment
Required prerequisites
- I have read the documentation https://camel-ai.github.io/camel/camel.html.
- I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
- Consider asking first in a Discussion.
What version of camel are you using?
0.1.6.4
System information
Windows 11
Python 3.10
camel 0.1.6.4
Problem description
Bug Report: Invalid Collection Name Generation in AutoRetriever on Windows OS
Demo
Run this demo on a local Windows PC: https://colab.research.google.com/drive/1qs5zqQ3LrTTaPqa6ykShklmKps8fycmY?usp=sharing
Description
The AutoRetriever class in the CAMEL library generates invalid collection names when processing URLs. This leads to a WinError 123 (The filename, directory name, or volume label syntax is incorrect) when trying to create or access the vector storage.
Steps to Reproduce
- Initialize an AutoRetriever instance.
- Call the run_vector_retriever method with a list of URLs as the contents parameter.
- The method fails when trying to create a Qdrant collection with an invalid name.
Expected Behavior
The _collection_name_generator method should create a valid collection name for any input, including URLs with special characters.
Actual Behavior
The method creates invalid collection names for some URLs, causing the QdrantStorage initialization to fail with a WinError 123.
Error Message
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'local_data\\collection\\![](https:'
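To see why Windows rejects this path, it can help to check the last path component against the characters Windows reserves in file and directory names. The snippet below is only an illustrative sketch: the helper name find_reserved_chars is hypothetical and not part of CAMEL, and the reserved set is the commonly documented one for Windows file names.

```python
from typing import List

# Characters that Windows does not allow in file or directory names.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')


def find_reserved_chars(name: str) -> List[str]:
    """Return the reserved characters found in `name`, in order of first appearance."""
    seen: List[str] = []
    for ch in name:
        if ch in WINDOWS_RESERVED_CHARS and ch not in seen:
            seen.append(ch)
    return seen


# Last path component taken from the error message above.
bad_component = "![](https:"
print(find_reserved_chars(bad_component))  # prints [':']
```

The colon left over from the URL scheme is enough to make the directory name invalid on Windows, which matches the reported WinError 123.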
Proposed Solution
Modify the _collection_name_generator method in the AutoRetriever class to ensure it always produces a valid collection name:
    # Module-level imports this method relies on
    import os
    import re
    from pathlib import Path
    from urllib.parse import urlparse

    def _collection_name_generator(self, content: str) -> str:
        parsed_url = urlparse(content)
        is_url = all([parsed_url.scheme, parsed_url.netloc])
        if is_url:
            # Use a stricter character replacement for URLs
            collection_name = re.sub(
                r'[^a-zA-Z0-9]+',
                '_',
                parsed_url.netloc + parsed_url.path
            )
        elif os.path.exists(content):
            collection_name = re.sub(r'[^a-zA-Z0-9]+', '_', Path(content).stem)
        else:
            collection_name = re.sub(r'[^a-zA-Z0-9]+', '_', content[:30])
        # Ensure the name starts with a letter
        collection_name = re.sub(r'^[^a-zA-Z]+', '', collection_name)
        # Remove leading and trailing underscores
        collection_name = collection_name.strip('_')
        # Use a default name if empty
        if not collection_name:
            collection_name = 'default_collection'
        # Limit length
        return collection_name[:30]
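To try the idea outside the class, here is a minimal standalone sketch of the same sanitizing steps. The function name sanitize_collection_name and the max_len parameter are hypothetical, not CAMEL API. It also differs slightly from the proposal above: it truncates before stripping underscores, so a cut at the length limit cannot leave a trailing underscore.

```python
import os
import re
from pathlib import Path
from urllib.parse import urlparse


def sanitize_collection_name(content: str, max_len: int = 30) -> str:
    """Hypothetical standalone variant of the proposed name generator."""
    parsed = urlparse(content)
    if parsed.scheme and parsed.netloc:
        raw = parsed.netloc + parsed.path  # drop scheme, query and fragment
    elif os.path.exists(content):
        raw = Path(content).stem  # file name without its extension
    else:
        raw = content[:max_len]
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^a-zA-Z0-9]+", "_", raw)
    # Make sure the name starts with a letter.
    name = re.sub(r"^[^a-zA-Z]+", "", name)
    # Truncate first, then strip, so no underscore is left at either end.
    name = name[:max_len].strip("_")
    return name or "default_collection"


for sample in ["https://www.camel-ai.org/", "![](https://example.com/img.png)", "!!!"]:
    print(repr(sample), "->", sanitize_collection_name(sample))
```

Each result contains only ASCII letters, digits and underscores, so it is safe to use as a directory name on Windows as well as a collection name.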
Additionally, add error handling in the run_vector_retriever method:
    # Inside run_vector_retriever, in the loop over the contents
    for content in contents:
        try:
            collection_name = self._collection_name_generator(content)
            print(f"Generated collection name: {collection_name}")  # For debugging
            vector_storage_instance = self._initialize_vector_storage(collection_name)
            # ... rest of the method
        except Exception as e:
            print(f"Error processing content: {content}")
            print(f"Error details: {e}")
            continue  # Skip this content and continue with the next
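A possible refinement of the error handling above is to log failures and return them to the caller instead of printing, so skipped inputs are not lost silently. The sketch below is illustrative only: process_contents and the handler callback are hypothetical names standing in for the per-item work done in run_vector_retriever, and it catches OSError specifically, since that is the exception reported here.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger(__name__)


def process_contents(
    contents: Iterable[str],
    handler: Callable[[str], None],
) -> List[Tuple[str, OSError]]:
    """Run `handler` on each item and collect the items that raise OSError."""
    failures: List[Tuple[str, OSError]] = []
    for content in contents:
        try:
            handler(content)
        except OSError as exc:
            # Record the failure and move on to the next item.
            logger.warning("Skipping %r: %s", content, exc)
            failures.append((content, exc))
    return failures


def fake_handler(content: str) -> None:
    """Stand-in for storage setup: reject names that contain a colon."""
    if ":" in content:
        raise OSError(f"invalid name: {content}")


failed = process_contents(["good_name", "![](https:"], fake_handler)
print([item for item, _ in failed])  # prints ['![](https:']
```

Returning the failures lets the caller decide whether a partially processed batch is acceptable.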
Hey @histmeisah, thanks for the issue! The bug has been fixed in #872. You can try it out in version 0.1.6.7+.
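To check whether an installed version already includes the fix, a small version comparison along these lines may help. This is a rough sketch: the helper names are hypothetical, and it assumes the package is installed under the distribution name camel-ai.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

FIXED_IN = (0, 1, 6, 7)  # first version reported above to contain the fix


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.1.6.4' into (0, 1, 6, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed: str) -> bool:
    """Return True if `installed` is at least the fixed version."""
    return parse_version(installed) >= FIXED_IN


try:
    # Assumption: the distribution name is "camel-ai".
    installed = version("camel-ai")
    print(installed, "includes the fix" if has_fix(installed) else "needs an upgrade")
except PackageNotFoundError:
    print("camel-ai is not installed")
```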