Error in poetry poe run-digital-data-etl
The following error occurred when running the command poetry poe run-digital-data-etl from the command line.
(LLM-Engineers-Handbook) PS C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook> poetry poe run-digital-data-etl
Poe => poetry run python -m tools.run --run-etl --no-cache --etl-config-filename digital_data_etl_maxime_labonne.yaml
2024-11-13 21:05:06.001 | INFO | llm_engineering.settings:load_settings:94 - Loading settings from the ZenML secret store.
Your ZenML client version (0.67.0) does not match the server version (0.68.1). This version mismatch might lead to errors or unexpected behavior.
To disable this warning message, set the environment variable ZENML_DISABLE_CLIENT_SERVER_MISMATCH_WARNING=True
2024-11-13 21:05:08.831 | WARNING | llm_engineering.settings:load_settings:99 - Failed to load settings from the ZenML secret store. Defaulting to loading the settings from the '.env' file.
2024-11-13 21:05:08.929 | INFO | llm_engineering.infrastructure.db.mongo:__new__:20 - Connection to MongoDB with URI successful: mongodb://llm_engineering:llm_engineering@127.0.0.1:27017
PyTorch version 2.4.0 available.
2024-11-13 21:05:12.004 | INFO | llm_engineering.infrastructure.db.qdrant:__new__:29 - Connection to Qdrant DB with URI successful: localhost:6333
Chromedriver is already installed.
USER_AGENT environment variable not set, consider setting it to identify your requests.
sagemaker.config INFO - Not applying SDK defaults from location: C:\ProgramData\sagemaker\sagemaker\config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: C:\Users\jefer\AppData\Local\sagemaker\sagemaker\config.yaml
Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
Initiating a new run for the pipeline: digital_data_etl.
Not including stack component settings with key orchestrator.sagemaker.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in _run_module_as_main:198 │
│ in _run_code:88 │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\tools\run.py:200 in <module> │
│ │
│ 197 │
│ 198 │
│ 199 if __name__ == "__main__": │
│ ❱ 200 │ main() │
│ 201 │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\click\core.p │
│ y:1130 in __call__ │
│ │
│ 1127 │ │
│ 1128 │ def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any: │
│ 1129 │ │ """Alias for :meth:`main`.""" │
│ ❱ 1130 │ │ return self.main(*args, **kwargs) │
│ 1131 │
│ 1132 │
│ 1133 class Command(BaseCommand): │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\click\core.p │
│ y:1055 in main │
│ │
│ 1052 │ │ try: │
│ 1053 │ │ │ try: │
│ 1054 │ │ │ │ with self.make_context(prog_name, args, **extra) as ctx: │
│ ❱ 1055 │ │ │ │ │ rv = self.invoke(ctx) │
│ 1056 │ │ │ │ │ if not standalone_mode: │
│ 1057 │ │ │ │ │ │ return rv │
│ 1058 │ │ │ │ │ # it's not safe to `ctx.exit(rv)` here! │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\click\core.p │
│ y:1404 in invoke │
│ │
│ 1401 │ │ │ echo(style(message, fg="red"), err=True) │
│ 1402 │ │ │
│ 1403 │ │ if self.callback is not None: │
│ ❱ 1404 │ │ │ return ctx.invoke(self.callback, **ctx.params) │
│ 1405 │ │
│ 1406 │ def shell_complete(self, ctx: Context, incomplete: str) -> t.List["CompletionItem"]: │
│ 1407 │ │ """Return a list of completions for the incomplete value. Looks │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\click\core.p │
│ y:760 in invoke │
│ │
│ 757 │ │ │
│ 758 │ │ with augment_usage_errors(__self): │
│ 759 │ │ │ with ctx: │
│ ❱ 760 │ │ │ │ return __callback(*args, **kwargs) │
│ 761 │ │
│ 762 │ def forward( │
│ 763 │ │ __self, __cmd: "Command", *args: t.Any, **kwargs: t.Any # noqa: B902 │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\tools\run.py:159 in main │
│ │
│ 156 │ │ pipeline_args["config_path"] = root_dir / "configs" / etl_config_filename │
│ 157 │ │ assert pipeline_args["config_path"].exists(), f"Config file not found: {pipeline │
│ 158 │ │ pipeline_args["run_name"] = f"digital_data_etl_run_{dt.now().strftime('%Y_%m_%d_ │
│ ❱ 159 │ │ digital_data_etl.with_options(**pipeline_args)(**run_args_etl) │
│ 160 │ │
│ 161 │ if run_export_artifact_to_json: │
│ 162 │ │ run_args_etl = {} │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\zenml\new\pi │
│ pelines\pipeline.py:1386 in __call__ │
│ │
│ 1383 │ │ │ return self.entrypoint(*args, **kwargs) │
│ 1384 │ │ │
│ 1385 │ │ self.prepare(*args, **kwargs) │
│ ❱ 1386 │ │ return self._run(**self._run_args) │
│ 1387 │ │
│ 1388 │ def _call_entrypoint(self, *args: Any, **kwargs: Any) -> None: │
│ 1389 │ │ """Calls the pipeline entrypoint function with the given arguments. │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\zenml\new\pi │
│ pelines\pipeline.py:748 in _run │
│ │
│ 745 │ │ │ │ code_path=code_path, │
│ 746 │ │ │ │ **deployment.model_dump(), │
│ 747 │ │ │ ) │
│ ❱ 748 │ │ │ deployment_model = Client().zen_store.create_deployment( │
│ 749 │ │ │ │ deployment=deployment_request │
│ 750 │ │ │ ) │
│ 751 │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\zenml\zen_st │
│ ores\rest_zen_store.py:1544 in create_deployment │
│ │
│ 1541 │ │ Returns: │
│ 1542 │ │ │ The newly created deployment. │
│ 1543 │ │ """ │
│ ❱ 1544 │ │ return self._create_workspace_scoped_resource( │
│ 1545 │ │ │ resource=deployment, │
│ 1546 │ │ │ route=PIPELINE_DEPLOYMENTS, │
│ 1547 │ │ │ response_model=PipelineDeploymentResponse, │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\zenml\zen_st │
│ ores\rest_zen_store.py:4362 in _create_workspace_scoped_resource │
│ │
│ 4359 │ │ Returns: │
│ 4360 │ │ │ The created resource. │
│ 4361 │ │ """ │
│ ❱ 4362 │ │ return self._create_resource( │
│ 4363 │ │ │ resource=resource, │
│ 4364 │ │ │ response_model=response_model, │
│ 4365 │ │ │ route=f"{WORKSPACES}/{str(resource.workspace)}{route}", │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\zenml\zen_st │
│ ores\rest_zen_store.py:4341 in _create_resource │
│ │
│ 4338 │ │ """ │
│ 4339 │ │ response_body = self.post(f"{route}", body=resource, params=params) │
│ 4340 │ │ │
│ ❱ 4341 │ │ return response_model.model_validate(response_body) │
│ 4342 │ │
│ 4343 │ def _create_workspace_scoped_resource( │
│ 4344 │ │ self, │
│ │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\pydantic\mai │
│ n.py:568 in model_validate │
│ │
│ 565 │ │ """ │
│ 566 │ │ # `__tracebackhide__` tells pytest and some other tools to omit this function fr │
│ 567 │ │ __tracebackhide__ = True │
│ ❱ 568 │ │ return cls.__pydantic_validator__.validate_python( │
│ 569 │ │ │ obj, strict=strict, from_attributes=from_attributes, context=context │
│ 570 │ │ ) │
│ 571 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValidationError: 2 validation errors for PipelineDeploymentResponse
metadata.step_configurations.get_or_create_user.config.outputs.user.artifact_config
Extra inputs are not permitted [type=extra_forbidden, input_value=None, input_type=NoneType]
For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden
metadata.step_configurations.crawl_links.config.outputs.crawled_links.artifact_config
Extra inputs are not permitted [type=extra_forbidden, input_value=None, input_type=NoneType]
For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden
Error: Sequence aborted after failed subtask 'run-digital-data-etl-maxime'
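One detail from the output above that may be relevant: the ZenML client (0.67.0) and server (0.68.1) versions do not match, and the warning itself notes that such a mismatch can lead to errors or unexpected behavior, which would be consistent with the PipelineDeploymentResponse validation failure. A minimal sketch for double-checking which client version the active environment actually imports (the server version is taken from the warning; aligning the two is an assumption about the fix, not a confirmed one):

```python
# Minimal sketch: print the ZenML client version imported by the active
# virtual environment. The server version (0.68.1) is taken from the
# mismatch warning in the log above.
import zenml

print("ZenML client version:", zenml.__version__)
# If this prints 0.67.0 while the server reports 0.68.1, reinstalling the
# environment so that client and server versions match is a reasonable
# first step (assumption, not a confirmed fix).
```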
Python: 3.11.8
System: Windows
OS version: 10.0.22631
Release name: 10
Architecture: AMD64
Full version: Windows-10-10.0.22631-SP0
Package Version
------------------ ---------
alembic 1.8.1
annotated-types 0.7.0
asttokens 2.4.1
bcrypt 4.0.1
certifi 2024.8.30
charset-normalizer 3.4.0
click 8.1.3
cloudpickle 2.2.1
colorama 0.4.6
comm 0.2.1
debugpy 1.8.0
decorator 5.1.1
distro 1.9.0
docker 7.1.0
executing 2.0.1
gitdb 4.0.11
GitPython 3.1.43
greenlet 3.1.1
idna 3.10
ipykernel 6.29.0
ipython 8.20.0
ipywidgets 8.1.5
jedi 0.19.1
jupyter_client 8.6.0
jupyter_core 5.7.1
jupyterlab_widgets 3.0.13
Mako 1.3.6
markdown-it-py 3.0.0
MarkupSafe 3.0.2
matplotlib-inline 0.1.6
mdurl 0.1.2
mysqlclient 2.2.0
nest-asyncio 1.6.0
packaging 24.2
parso 0.8.3
passlib 1.7.4
pip 24.0
platformdirs 4.1.0
prompt-toolkit 3.0.43
psutil 5.9.8
pure-eval 0.2.2
pydantic 2.8.2
pydantic_core 2.20.1
pydantic-settings 2.6.1
Pygments 2.17.2
PyMySQL 1.1.1
python-dateutil 2.8.2
python-dotenv 1.0.1
pywin32 306
PyYAML 6.0.2
pyzmq 25.1.2
requests 2.32.3
rich 13.9.4
setuptools 65.5.0
six 1.16.0
smmap 5.0.1
SQLAlchemy 2.0.35
SQLAlchemy-Utils 0.41.2
sqlmodel 0.0.18
stack-data 0.6.3
tornado 6.4
traitlets 5.14.1
typing_extensions 4.12.2
urllib3 2.2.3
wcwidth 0.2.13
widgetsnbextension 4.0.13
zenml 0.68.1
Did you run 'poetry poe local-infrastructure-up' before attempting to run 'poetry poe run-digital-data-etl'?
Are you able to see a local instance of MongoDB running on your system?
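For reference, here is a minimal sketch of how one might verify that a local MongoDB instance is reachable on the endpoint shown in the log above (127.0.0.1:27017); this assumes pymongo is importable in the project environment, which the Mongo connector in llm_engineering presumably already requires:

```python
# Minimal sketch: check whether a MongoDB server answers on the local
# endpoint from the log (127.0.0.1:27017). The "ping" command does not
# require authentication, so this only tests reachability.
from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

client = MongoClient("mongodb://127.0.0.1:27017", serverSelectionTimeoutMS=3000)
try:
    client.admin.command("ping")
    print("MongoDB is up and responding on 127.0.0.1:27017")
except ServerSelectionTimeoutError:
    print("No local MongoDB reachable; run 'poetry poe local-infrastructure-up' first")
```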
I also got an error at this step, albeit a different one:
OperationFailure: Authentication failed., full error: {'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName':
'AuthenticationFailed'}
Error: Sequence aborted after failed subtask 'run-digital-data-etl-maxime'
This happens when I run with the default MongoDB connection string after running 'poetry poe local-infrastructure-up'. The MongoDB container appears to be running:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d056ea7e97a1 qdrant/qdrant:latest "./entrypoint.sh" 3 days ago Up 52 minutes 0.0.0.0:6333-6334->6333-6334/tcp llm_engineering_qdrant
61ec2b940bfb mongo:latest "docker-entrypoint.s…" 3 days ago Up 52 minutes 0.0.0.0:27017->27017/tcp llm_engineering_mongo
If I change the MongoDB connection string in .env to connect to a cloud instance, the command works.
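One hedged way to narrow the authentication failure down is to test the credentials that appear in the log above (llm_engineering / llm_engineering) directly against the local container. If an unauthenticated client can list databases while the authenticated one fails, the data volume was probably initialized before the credentials were configured (the official mongo image only creates the root user on the first start against an empty data directory), and recreating the volume would be the next thing to try. This is a diagnostic sketch, not a confirmed fix:

```python
# Diagnostic sketch (assumption: the local container is expected to accept
# the credentials shown in the log, llm_engineering / llm_engineering).
from pymongo import MongoClient
from pymongo.errors import OperationFailure

AUTH_URI = "mongodb://llm_engineering:llm_engineering@127.0.0.1:27017"
NO_AUTH_URI = "mongodb://127.0.0.1:27017"

try:
    MongoClient(AUTH_URI, serverSelectionTimeoutMS=3000).admin.command("ping")
    print("Credentials accepted by the local instance")
except OperationFailure as exc:
    print(f"Authentication failed against the local instance: {exc}")

try:
    # list_database_names() requires authentication when auth is enabled.
    MongoClient(NO_AUTH_URI, serverSelectionTimeoutMS=3000).list_database_names()
    print("Unauthenticated access works: auth is off / the expected user may not exist")
except OperationFailure:
    print("Unauthenticated access rejected: auth is enabled on this instance")
```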