Generating Postgres Schema fails
damianr13 opened this issue · 3 comments
When I use the `generate_postgres_schema` method as explained in the README, I receive the following error from the API:
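For reference, the call looked roughly like this (a sketch following the README setup; the credentials and table names are placeholders from my environment):

```python
from defog import Defog

# Placeholder credentials; the real values come from my environment
defog = Defog(
    api_key="my-api-key",
    db_type="postgres",
    db_creds={
        "host": "localhost",
        "port": 5432,
        "database": "mydb",
        "user": "postgres",
        "password": "postgres",
    },
)

# Export the schema for my tables and send it to the API
defog.generate_postgres_schema(tables=["brand_product", "retailer_product"])
```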
```
Traceback (most recent call last):
  File "/home/rishabhsriv/fsd-research/similar-vector-app/./app.py", line 204, in get_postgres_schema_gsheets
    temp_df = pd.read_csv(StringIO(csv))
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 488, in _read
    return parser.read(nrows)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1047, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 223, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 801, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 857, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1925, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 7, saw 5
```
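For what it's worth, the same pandas error is easy to reproduce when a CSV field contains an unquoted comma, which my ARRAY-typed columns might be producing (a minimal sketch, not necessarily the exact cause here):

```python
from io import StringIO

import pandas as pd

# The unquoted comma in "ARRAY[text, text]" splits the row into 4 fields
# while the header only declares 3, raising the same ParserError
csv = (
    "table_name,column_name,data_type\n"
    "users,id,integer\n"
    "users,tags,ARRAY[text, text]\n"
)
pd.read_csv(StringIO(csv))
# pandas.errors.ParserError: Error tokenizing data.
# C error: Expected 3 fields in line 3, saw 4
```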
I simply called the function, without modifying the schema objects sent to the API.

Some particulars of my database:
- I exported 16 tables, all linked to each other with foreign keys
- I have some user-defined columns
- I am using a lot of different Postgres types, such as JSON, JSONB, ARRAY, DATE, etc.

P.S.: It's the same schema I sent over email yesterday.
Thank you for the bug report @damianr13! We're working on fixing this. The error occurs because the generated schema contains more words than our system was tested with. We have some ideas for how to fix this, and will do that in the next 2 days!
We have partially fixed this issue – full fix coming in another 2 hours or so.
Hi there! I think the issue is fixed. We are able to generate queries from your schema now, though the quality might not be very high because we have not made any changes to the auto-generated metadata. Please give it a shot and let us know! :)
```python
>>> r = requests.post(url, json={
...     "question": "what are 10 products that are available and have the highest average ratings?",
...     "api_key": api_key,
...     "db_type": "postgres"
... })
>>> print(r.json()['query_generated'])
SELECT bp.name, bp.url, AVG(rp.review_average) AS avg_rating
FROM brand_product bp
JOIN retailer_product rp ON bp.id = rp.brand_product_id
WHERE rp.availability = 'available'
GROUP BY bp.name, bp.url
ORDER BY avg_rating DESC
LIMIT 10;
```
Query quality will improve significantly with better metadata definitions. We will also make further improvements in the near future, such as the ability to define what a table is for.
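In the meantime, enriching the auto-generated schema with your own column descriptions before re-uploading it should help. A rough sketch of what that could look like (the file name and the `column_description` field here are illustrative, not a final API):

```python
import pandas as pd

# Hypothetical example: add a human-written description to one column of
# the auto-generated schema CSV, then save it for re-upload
schema = pd.read_csv("postgres_schema.csv")
mask = (schema["table_name"] == "retailer_product") & (
    schema["column_name"] == "availability"
)
schema.loc[mask, "column_description"] = (
    "Stock status; the value 'available' means the product can be purchased"
)
schema.to_csv("postgres_schema.csv", index=False)
```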
Thanks for the bug report again! We'd love to get your feedback, and will fix any issues as soon as we can.