day 04-06 encoding problem on file open
aseelye opened this issue · 6 comments
The CSV download contains some non-ASCII characters. Jupyter doesn't seem to mind, but if you run this in IDLE or another CPython environment, it fails while reading the file, starting at character 535. This can be avoided by adding `encoding='utf-8'` to the `open()` call.
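To make the failure concrete, here is a minimal sketch, assuming the file is UTF-8 and that the failing environment decodes with a narrower default codec (`ascii` stands in for that unknown default here; that part is an assumption). The sample row is made up; it ends with a non-breaking space (U+00A0), the same character the snippet further down strips with `.replace('\xa0', '')`. The helper names `write_sample` and `read_with` are hypothetical.

```python
import os
import tempfile

# Made-up CSV content; the title ends with a non-breaking space (U+00A0),
# which UTF-8 stores as the two bytes 0xC2 0xA0.
SAMPLE = 'director_name,movie_title\nDirector A,First Film\u00a0\n'


def write_sample(path):
    # Write the sample as UTF-8 bytes, like the downloaded file.
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(SAMPLE)


def read_with(path, encoding):
    """Return the file's text decoded with `encoding`, or None if decoding fails."""
    try:
        with open(path, encoding=encoding, newline='') as f:
            return f.read()
    except UnicodeDecodeError as exc:
        print(f'{encoding}: {exc}')
        return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.csv')
    write_sample(path)
    strict = read_with(path, 'ascii')  # fails on byte 0xC2
    ok = read_with(path, 'utf-8')      # decodes cleanly
```

Passing `encoding='utf-8'` explicitly is what makes the second read independent of whatever default the environment would otherwise pick.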
Thank you @aseelye! @bbelderbos can you have a look at fixing this?
Thanks @aseelye, I am not able to reproduce this with Python 3.6.1, 2.7, or IDLE, but I'll add it anyway so we fix this bug for you.
Here is the code I worked with. It gives me a len of 2395, and it also works for me without the encoding argument:
```python
import csv
from collections import defaultdict, namedtuple

Movie = namedtuple('Movie', 'title year score')
movies_csv = 'movies.csv'


def get_movies_by_director(data=movies_csv):
    """Extracts all movies from csv and stores them in a dictionary
       where keys are directors, and values are lists of movies (named tuples)"""
    directors = defaultdict(list)
    # Explicit encoding so decoding doesn't depend on the platform's locale;
    # newline='' is what the csv module docs recommend for files given to csv readers.
    with open(data, newline='', encoding='utf-8') as f:
        for line in csv.DictReader(f):
            try:
                director = line['director_name']
                # Strip the non-breaking spaces (\xa0) found in the titles
                movie = line['movie_title'].replace('\xa0', '')
                year = int(line['title_year'])
                score = float(line['imdb_score'])
            except ValueError:
                # Skip rows with a missing or malformed year or score
                continue
            m = Movie(title=movie, year=year, score=score)
            directors[director].append(m)
    return directors


directors = get_movies_by_director()
print(len(directors))
```
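To check the parsing logic without `movies.csv` (and without any file encoding in play), one option is a variant that takes an already-open text stream. This is a sketch: `parse_movies` and the sample rows are hypothetical, using the same column names as the snippet above. The second row has an empty year, so `int('')` raises `ValueError` and the row is skipped.

```python
import csv
import io
from collections import defaultdict, namedtuple

Movie = namedtuple('Movie', 'title year score')

# Made-up rows with the same headers as movies.csv; titles end in '\xa0'.
SAMPLE_CSV = (
    'director_name,movie_title,title_year,imdb_score\n'
    'Director A,First Film\u00a0,2010,7.5\n'
    'Director A,Unreleased Film\u00a0,,6.0\n'
    'Director B,Second Film\u00a0,2003,8.1\n'
)


def parse_movies(f):
    """Same parsing logic as get_movies_by_director, but takes any text stream."""
    directors = defaultdict(list)
    for line in csv.DictReader(f):
        try:
            director = line['director_name']
            movie = line['movie_title'].replace('\xa0', '')
            year = int(line['title_year'])  # empty year -> ValueError -> skipped
            score = float(line['imdb_score'])
        except ValueError:
            continue
        directors[director].append(Movie(title=movie, year=year, score=score))
    return directors


directors = parse_movies(io.StringIO(SAMPLE_CSV, newline=''))
print(len(directors))
```

Separating "open and decode the file" from "parse the rows" keeps the encoding decision in exactly one place, the `open()` call.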
Done: 3bedaa9
I ran the notebook locally (Python 3) to confirm the fix.
Thanks. This is a weird one. I'm running 3.6.4 on a Mac. I'm seeing now that if I run it from IDLE, it fails, but if I run it from the terminal, it works. I've checked both with `sys.getdefaultencoding()` and they both return utf-8. I have no idea why IDLE is failing, but either way, thank you.
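One likely explanation: on Python 3, `sys.getdefaultencoding()` is always `'utf-8'` and does not control `open()`. When `encoding=` is omitted, `open()` uses `locale.getpreferredencoding(False)`, which depends on the process's locale environment. An app launched from the Dock or Finder may not inherit the `LANG`/`LC_*` variables your terminal shell sets, so IDLE can end up with a different default than the terminal. That is an assumption about this particular setup, not something confirmed in this thread. A small sketch (`report_encodings` is a hypothetical helper) to compare the two environments:

```python
import locale
import os
import sys
import tempfile


def report_encodings():
    """Collect the encodings relevant to open() in the current interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'probe.txt')
        # Open WITHOUT encoding= to see which codec open() picks on its own.
        with open(path, 'w') as f:
            open_default = f.encoding
    return {
        # Default for str.encode()/bytes.decode(); always 'utf-8' on Python 3,
        # and it does not affect open().
        'sys.getdefaultencoding()': sys.getdefaultencoding(),
        # What open() falls back to when encoding= is omitted (locale dependent).
        'locale.getpreferredencoding(False)': locale.getpreferredencoding(False),
        'open() default': open_default,
    }


for name, value in report_encodings().items():
    print(f'{name:36} {value}')
```

Running it once in IDLE and once in the terminal should show whether the last two values differ between them. Passing `encoding='utf-8'` explicitly, as in the fix, sidesteps the question entirely.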