anthonydb/practical-sql

Chapter 4: COPY us_counties_2010...

CactusKurt opened this issue · 22 comments

COPY us_counties_2010
FROM 'C:\YourDirectory\us_counties_2010.csv'
WITH (FORMAT CSV, HEADER);

will give the error
ERROR: invalid byte sequence for encoding "UTF8": 0xf1 0x61 0x20 0x41
CONTEXT: COPY us_counties_2010, line 1804
SQL state: 22021

This is because line 1804 contains a tilde in the NAME column "Dona Ana County"

Hi, @CactusKurt --

Would you let me know:

  • Which OS you're using
  • How you installed PostgreSQL
  • Whether you ran the query in pgAdmin or from another GUI or the command line.

Thanks!

Running Windows 10 w/ latest updates
Installed PostgreSQL (64 bit) from the EnterpriseDB (EDB) site (Version 3.0) download
I ran the query in pgAdmin

Using a fresh download of PostgreSQL 10.4 from EnterpriseDB on Windows 10, I'm not able to replicate this. I'm glad to investigate further, though. If you would please:

  • Launch pgAdmin and left click on the analysis database you are using for the book exercises (assuming you're following the book and named the DB that way).
  • Select "Properties"
  • Click the "Definition" tab
  • Drop a screen grab here of the contents of that tab.

Thanks.

Deleting the tilde over the "n" character in "Doña Ana County" solves the problem. Could the problem arise due to the encoding? Here is the screen grab of the "Definition" tab...

pgadmin_analysis

Oh, it's definitely an encoding issue. The only (frustrating) question is why it's working fine over here but causing trouble on your system. Let me investigate further, and thanks for the report.

Your settings in that screengrab match mine, btw.

@CactusKurt Just to rule things out:

  • In pgAdmin, would you please run the following commands one at a time and let me know the value they each return:
    SHOW client_encoding;
    SHOW server_encoding;
  • Also, did you by any chance open the file us_counties_2010.csv in a text editor or Excel or other application and save it before the import?

Thanks ... I'm interested in getting to the bottom of this one :-)

@CactusKurt If you'd email me at practicalsqlbook@gmail.com, we can continue diagnosing as I want to ask some questions that are more appropriate in email.

SHOW client_encoding;
returned UNICODE

SHOW server_encoding:
returned UTF8

When I used a fresh copy of us_counties_2010, I copy/pasted/saved it in a text editor (Notepad) before using COPY. I'm guessing that is causing the problem.

Hi,

The best way to work with the data files for the book is to either git clone this repository to your local machine, or download the contents of the repo using the download option at the repo main screen.

All the files in this repo are encoded UTF-8. (Confirmed that yesterday).) Opening the .sql files in a text editor to copy/paste the queries into pgAdmin is fine. However, if you open a CSV file in a text editor and re-save it, it's possible you will change the encoding and encounter import issues as the database (using the install in the book) is expecting UTF-8.

Encodings are a tricky business and take some time to wrap your head around. I still have much to learn, myself. My sense is that you've learned some good lessons here about the behavior of Excel and text files that will serve you well going forward.

Thanks for reading the book and please open more issues if you encounter anything unusual. Feel free to email me at practicalsqlbook@gmail.com as well.

Using PlusTable as my GUI I am receiving the same error.

Error:
ERROR: extra data after last expected column CONTEXT: COPY us_counties_2010, line 2: "Autauga County,AL,050,3,6,01,001,1539582278,25775735,54571,22135,+32.5363818,-086.6444901,54571,5370..."
Other Info:

  • MacOS Mojave 10.14.1
  • Installed Postgres via the instructions in the book, specifically the Postgres app

Also, I cloned the repo locally so it shouldn't be an issue with the file. Any thoughts or recommendations? Thanks in advance.

Hi, @tfaieta --

The error you posted is different than the original one in this issue, which turned out to be a character encoding problem. Usually, the error message you posted occurs when the CSV file has more columns than the table you're importing into.

Can you post your entire COPY statement here?

@tfaieta Hi, were you able to resolve this? Please let me know. Thanks.

Great, glad you got that sorted. Going to close this.

Using PlusTable as my GUI I am receiving the same error.

Error:
ERROR: extra data after last expected column CONTEXT: COPY us_counties_2010, line 2: "Autauga County,AL,050,3,6,01,001,1539582278,25775735,54571,22135,+32.5363818,-086.6444901,54571,5370..."
Other Info:

  • MacOS Mojave 10.14.1
  • Installed Postgres via the instructions in the book, specifically the Postgres app

Also, I cloned the repo locally so it shouldn't be an issue with the file. Any thoughts or recommendations? Thanks in advance.

Using PlusTable as my GUI I am receiving the same error.

Error:
ERROR: extra data after last expected column CONTEXT: COPY us_counties_2010, line 2: "Autauga County,AL,050,3,6,01,001,1539582278,25775735,54571,22135,+32.5363818,-086.6444901,54571,5370..."
Other Info:

  • MacOS Mojave 10.14.1
  • Installed Postgres via the instructions in the book, specifically the Postgres app

Also, I cloned the repo locally so it shouldn't be an issue with the file. Any thoughts or recommendations? Thanks in advance.

Hello! I am this issue as well. Any help is greatly appreciated. Thanks
ERROR: extra data after last expected column
CONTEXT: COPY us_counties_2010, line 2: "Autauga County,AL,50,3,6,1,1,1539582278,25775735,54571,22135,32.5363818,-86.6444901,54571,53702,4285..."

@Lunasanz By any chance, did you open the CSV file with another program, such as Excel?

yeah I have same problem but this my error msg
ERROR: ERREUR: la relation « us_counties_2010 » n'existe pas
LINE 1: SELECT * FROM us_counties_2010;

@brahim024 That message indicates you did not create the table using the code in Listing 4-2. See:

https://github.com/anthonydb/practical-sql/blob/master/Chapter_04/Chapter_04.sql#L20

Thank you @anthonydb I was lost

Good morning .
Please I just got this
ERROR: invalid input syntax for type bigint: "1.05076E+11"
CONTEXT: COPY us_counties_2010, line 72, column area_land: "1.05076E+11".

@naboatng Most likely, you opened the CSV file with Excel or another spreadsheet application, which reformatted the numbers to scientific notation. Download a fresh copy of the CSV file, do not open it with a spreadsheet, and try the import again.