databricks/lilac

Error Computing Embedding with Parquet File in Lilac v0.1.24 on Windows 11


Greetings,

I am trying to speed up my project by moving from my long list (~200,000) of JSON files to the .parquet file that Lilac created from this project (data-0000-of-00001.parquet).

I successfully loaded a new project from this .parquet file (using a subset of the data, 10,000 rows).

However, when trying to compute an embedding on the text, I am running into this error:

[Screenshot: syntax_error]

I am wondering:
(a) Might this be a Windows-specific syntax error?
I tried changing directories in PowerShell using a mixture of forward slashes and double backslashes and it worked fine, so this part of the error message is throwing me off a little.
[Screenshot]
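For anyone trying to rule out the path separators themselves, here is a minimal sketch of how mixed slashes can be compared in Python. It uses only the standard-library `pathlib`. The directory `C:/Users/me/lilac_project` is a made-up placeholder, and only the file name comes from this report. It does not reproduce the Lilac error. It only shows that, as a plain Windows path, the mixed form and the all-backslash form refer to the same location, which may not hold if the path is later embedded in some other string syntax.

```python
from pathlib import PureWindowsPath

# Hypothetical project directory; only the file name comes from the report above.
MIXED = "C:/Users/me\\lilac_project/data-0000-of-00001.parquet"
BACKSLASHES = r"C:\Users\me\lilac_project\data-0000-of-00001.parquet"


def same_windows_path(a: str, b: str) -> bool:
    """Return True if two strings name the same path under Windows path rules."""
    return PureWindowsPath(a) == PureWindowsPath(b)


def to_forward_slashes(path: str) -> str:
    """Rewrite a Windows path so that it uses forward slashes only."""
    return PureWindowsPath(path).as_posix()


if __name__ == "__main__":
    # Both separator styles are treated as the same path.
    print(same_windows_path(MIXED, BACKSLASHES))
    # A forward-slash-only form avoids backslash escaping in other contexts.
    print(to_forward_slashes(MIXED))
```

If the two forms compare equal here but the error persists, the separators alone are probably not the cause.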

(b) Is the key I'm seeing in the screenshot specific to this parquet file, in that it refers to key-value pairs from the other Lilac project? (Perhaps I wasn't supposed to load this parquet file into another project?)

OS: Windows 11
Lilac Version: 0.1.24

I have the same situation when computing the clusters...