taoyds/sparc

Annotation Issues [Please report any annotation errors here, thanks!]

taoyds opened this issue · 5 comments

Hi,

Even though our group spent a lot of time and effort on creating the SParC dataset, some annotation errors certainly remain. We would appreciate it if you reported your findings here. Our group will do its best to correct the errors in our next release.

Thanks for your interest!

Best,
Tao

Hi Tao, I encountered an error in the SQL evaluation script. When the predicted SQL contains a column whose name includes brackets, the evaluation script raises an exception. I guess the script splits the SQL on '(' or ')', which may be the cause of the exception. (Sorry for not looking at the script in detail.)

To reproduce, use the following SQL:

SELECT performance.official_ratings_(millions) FROM performance

The column name is Official_ratings_(millions) (you can find it in tables.json). Hope this helps :)
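
To see how many columns could trigger this, one option is to scan the schema file for column names containing parentheses. The sketch below is only an illustration: it assumes tables.json is a list of database entries with a "db_id" key and a "column_names_original" list of [table_index, column_name] pairs, and the helper name columns_with_parens is hypothetical, not part of the evaluation script.

```python
import json


def columns_with_parens(tables):
    """Return (db_id, column_name) pairs whose column name contains '(' or ')'."""
    hits = []
    for db in tables:
        # Assumed layout: "column_names_original" holds [table_index, name] pairs.
        for _table_idx, name in db.get("column_names_original", []):
            if "(" in name or ")" in name:
                hits.append((db.get("db_id"), name))
    return hits


# Usage, assuming tables.json sits in the working directory:
# with open("tables.json") as f:
#     for db_id, column in columns_with_parens(json.load(f)):
#         print(db_id, column)
```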

Hi, Tao.
In the following examples from train.json:
NLQ:"How about the problems on which closure is authorized by him?"
SQL:

SELECT  FROM staff AS T1 JOIN problems AS T2 ON T1.staff_id = T2.closure_authorised_by_staff_id WHERE T1.staff_first_name = \"Rylan\" AND T1.staff_last_name = \"Homenick\"

NLQ:"List records of customers living in city Lockmanfurt."
SQL:

SELECT  FROM Customers AS T1 JOIN Addresses AS T2 ON T1.customer_address_id = T2.address_id WHERE T2.city = \"Lockmanfurt\"

The SELECT clause is empty.
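
Other queries with the same problem could be located with a quick scan. This is a minimal sketch, assuming train.json is a list of entries whose "interaction" list holds turns with a "query" string (the layout visible in the examples quoted in this thread). The function name find_empty_selects is hypothetical.

```python
import json
import re

# Matches "SELECT" followed directly by "FROM" with only whitespace in between.
EMPTY_SELECT = re.compile(r"^\s*SELECT\s+FROM\b", re.IGNORECASE)


def find_empty_selects(entries):
    """Return (entry_index, turn_index, query) for queries with nothing after SELECT."""
    hits = []
    for i, entry in enumerate(entries):
        for j, turn in enumerate(entry.get("interaction", [])):
            query = turn.get("query", "")
            if EMPTY_SELECT.match(query):
                hits.append((i, j, query))
    return hits


# Usage, assuming train.json sits in the working directory:
# with open("train.json") as f:
#     for hit in find_empty_selects(json.load(f)):
#         print(hit)
```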

Confused about order_by
The example in dev.json:

NLQ: ['Which', 'breed', 'codes', 'are', 'the', 'most', 'popular', 'two', '?']
SQL:

SELECT breed_code, count(*) FROM Dogs GROUP BY breed_code LIMIT 2

Why not use the following form?

SELECT breed_code FROM Dogs GROUP BY breed_code ORDER BY count(*) DESC LIMIT 2

There are other examples like this one; I will not list them all.
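
The difference between the two forms can be checked on a toy table. The sketch below uses Python's built-in sqlite3 module with made-up rows (breed codes A, B, C are invented for illustration and are not taken from the SParC database). Without ORDER BY, the two rows kept by LIMIT depend on whatever order the engine happens to produce, so no particular output is claimed for the first query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dogs (dog_id INTEGER PRIMARY KEY, breed_code TEXT)")
# Invented data: B appears 3 times, C twice, A once.
conn.executemany(
    "INSERT INTO Dogs (breed_code) VALUES (?)",
    [("A",), ("B",), ("B",), ("B",), ("C",), ("C",)],
)

# Form from dev.json: no ORDER BY, so the two returned groups are arbitrary.
unordered = conn.execute(
    "SELECT breed_code, count(*) FROM Dogs GROUP BY breed_code LIMIT 2"
).fetchall()

# Suggested form: rank groups by size before applying LIMIT.
ordered = conn.execute(
    "SELECT breed_code FROM Dogs GROUP BY breed_code ORDER BY count(*) DESC LIMIT 2"
).fetchall()

print("without ORDER BY:", unordered)
print("with ORDER BY:", ordered)
```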

In dev.json:

"utterance": "Order the dog ages in descending order.",
"query": "SELECT age FROM Dogs ORDER BY age",
"sql": {
    "orderBy": [
        "asc",
        [
            [
                0,
                [
                    0,
                    26,
                    false
                ],
                null
            ]
        ]
    ],
...

Based on my understanding of the context, the utterance should be:
"utterance": "Order the dog ages in ascending order."
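
Similar mismatches between the wording and the parsed sort direction could be flagged automatically. This is a rough heuristic sketch, assuming each turn is a dict with an "utterance" string and a "sql" dict whose "orderBy" value is either empty or a list starting with "asc" or "desc", as in the snippet quoted above. The function name order_direction_mismatch is hypothetical, and a keyword check like this will miss paraphrases such as "from oldest to youngest".

```python
def order_direction_mismatch(turn):
    """Return True if the utterance names one sort direction and the SQL uses the other."""
    order_by = turn.get("sql", {}).get("orderBy") or []
    if not order_by:
        return False  # no ORDER BY clause, nothing to compare
    direction = order_by[0]
    text = turn.get("utterance", "").lower()
    says_asc = "ascending" in text
    says_desc = "descending" in text
    if direction == "asc":
        return says_desc and not says_asc
    if direction == "desc":
        return says_asc and not says_desc
    return False
```

Applied to the turn quoted above ("descending" in the utterance, "asc" in orderBy), the check returns True.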

In train.json, there are four empty interactions, as follows:

{
    "database_id": "college_3",
    "interaction": [],
    "final": {
        "query": "SELECT CName FROM COURSE WHERE Credits  =  1",
        "utterance": "What are the names of courses with 1 credit?"
    }
}

{
    "database_id": "game_1",
    "interaction": [],
    "final": {
        "query": "SELECT count(*) FROM Video_games WHERE gtype  =  \"Massively multiplayer online game\"",
        "utterance": "Count the number of video games with Massively multiplayer online game type ."
    }
}

{
    "database_id": "game_1",
    "interaction": [],
    "final": {
        "query": "SELECT gname FROM Plays_games AS T1 JOIN Video_games AS T2 ON T1.gameid  =  T2.gameid GROUP BY T1.gameid HAVING sum(hours_played)  >=  1000",
        "utterance": "What are the names of all the games that have been played for at least 1000 hours?"
    }
}

{
    "database_id": "game_1",
    "interaction": [],
    "final": {
        "query": "SELECT Gname FROM Plays_games AS T1 JOIN Video_games AS T2 ON T1.gameid  =  T2.gameid JOIN Student AS T3 ON T3.Stuid  =  T1.Stuid WHERE T3.Lname  =  \"Smith\" AND T3.Fname  =  \"Linda\"",
        "utterance": "What are the names of all games played by Linda Smith?"
    }
}
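
For anyone who wants to filter these entries out before training, a minimal sketch follows. It assumes train.json is a list of objects shaped like the four above ("database_id", "interaction", "final"). The function name find_empty_interactions is hypothetical.

```python
import json


def find_empty_interactions(entries):
    """Return (index, database_id, final utterance) for entries with no interaction turns."""
    return [
        (i, entry.get("database_id"), entry.get("final", {}).get("utterance"))
        for i, entry in enumerate(entries)
        if not entry.get("interaction")
    ]


# Usage, assuming train.json sits in the working directory:
# with open("train.json") as f:
#     entries = json.load(f)
# for hit in find_empty_interactions(entries):
#     print(hit)
# cleaned = [e for e in entries if e.get("interaction")]
```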