Generating Postgres Schema fails
damianr13 opened this issue · 3 comments
When I use the `generate_postgres_schema` method as explained in the README, I receive the following error from the API:
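For reference, the call looked roughly like this (a sketch following the README setup; the credentials and table names are placeholders from my environment):

```python
from defog import Defog

# Placeholder credentials; the real values come from my environment
defog = Defog(
    api_key="my-api-key",
    db_type="postgres",
    db_creds={
        "host": "localhost",
        "port": 5432,
        "database": "mydb",
        "user": "postgres",
        "password": "postgres",
    },
)

# Export the schema for my tables and send it to the API
defog.generate_postgres_schema(tables=["brand_product", "retailer_product"])
```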
```
Traceback (most recent call last):
  File "/home/rishabhsriv/fsd-research/similar-vector-app/./app.py", line 204, in get_postgres_schema_gsheets
    temp_df = pd.read_csv(StringIO(csv))
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 488, in _read
    return parser.read(nrows)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1047, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/rishabhsriv/.local/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 223, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 801, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 857, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1925, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 7, saw 5
```
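For what it's worth, the same pandas error is easy to reproduce when a CSV field contains an unquoted comma, which my ARRAY-typed columns might be producing (a minimal sketch, not necessarily the exact cause here):

```python
from io import StringIO

import pandas as pd

# The unquoted comma in "ARRAY[text, text]" splits the row into 4 fields
# while the header only declares 3, raising the same ParserError
csv = (
    "table_name,column_name,data_type\n"
    "users,id,integer\n"
    "users,tags,ARRAY[text, text]\n"
)
pd.read_csv(StringIO(csv))
# pandas.errors.ParserError: Error tokenizing data.
# C error: Expected 3 fields in line 3, saw 4
```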
I simply called the function, without modifying the schema objects sent to the API.

Some particulars of my database:
- I exported 16 tables, all linked to each other with foreign keys
- I have some user-defined columns
- I am using a lot of different Postgres types, such as JSON, JSONB, ARRAY, DATE, etc.

P.S.: It's the same schema I sent over email yesterday.
Thank you for the bug report @damianr13! We're working on fixing this. The error occurs because the generated schema contains more words than our system was tested with. We have some ideas for how to fix this, and will do that in the next 2 days!
We have partially fixed this issue – full fix coming in another 2 hours or so.
Hi there! I think the issue is fixed. We are able to generate queries from your schema now, though the quality might not be very high because we have not made any changes to the auto-generated metadata. Please give it a shot and let us know! :)
```python
>>> r = requests.post(url, json={
...     "question": "what are 10 products that are available and have the highest average ratings?",
...     "api_key": api_key,
...     "db_type": "postgres"
... })
>>> print(r.json()['query_generated'])
SELECT bp.name, bp.url, AVG(rp.review_average) AS avg_rating
FROM brand_product bp
JOIN retailer_product rp ON bp.id = rp.brand_product_id
WHERE rp.availability = 'available'
GROUP BY bp.name, bp.url
ORDER BY avg_rating DESC
LIMIT 10;
```
Query quality will improve significantly with better metadata definitions. We will also make further improvements in the near future, such as the ability to define what a table is for.
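In the meantime, enriching the auto-generated schema with your own column descriptions before re-uploading it should help. A rough sketch of what that could look like (the file name and the `column_description` field here are illustrative, not a final API):

```python
import pandas as pd

# Hypothetical example: add a human-written description to one column of
# the auto-generated schema CSV, then save it for re-upload
schema = pd.read_csv("postgres_schema.csv")
mask = (schema["table_name"] == "retailer_product") & (
    schema["column_name"] == "availability"
)
schema.loc[mask, "column_description"] = (
    "Stock status; the value 'available' means the product can be purchased"
)
schema.to_csv("postgres_schema.csv", index=False)
```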
Thanks for the bug report again! We'd love to get your feedback, and will fix any issues as soon as we can.