Unicode issue
danyboy666 opened this issue · 12 comments
I have a little snippet of code here but the thing is I don't know how to fix the output my script is giving me.
#!/usr/bin/env python
# encoding:utf-8
import tvdb_api
t = tvdb_api.Tvdb(interactive = 'True', language = 'fr')
season24 = t['Les grands reportages'][24].search("")
print season24
Output:
user@homeseedbox:~/script$ python tvdb_episode_renamer.py
TVDB Search Results:
1 -> Grands reportages [fr] # http://thetvdb.com/?tab=series&id=297490&lid=17 (default)
2 -> Les grands reportages [fr] # http://thetvdb.com/?tab=series&id=226591&lid=17
3 -> Les grands reportages - Exploration [fr] # http://thetvdb.com/?tab=series&id=291259&lid=17
4 -> Les grands reportages - Personnalités [fr] # http://thetvdb.com/?tab=series&id=290829&lid=17
5 -> Les grands reportages - Les films IMAX [fr] # http://thetvdb.com/?tab=series&id=295231&lid=17
6 -> 20 ans de grands reportages [fr] # http://thetvdb.com/?tab=series&id=295277&lid=17
Enter choice (first number, return for default, 'all', ? for help): 2
[<Episode 24x01 - u'Game Fever'>, <Episode 24x02 - u"La course vers l'intelligence artificielle">, <Episode 24x03 - u'La Cor\xe9e de mon p\xe8re'>, <Episode 24x04 - u'Unit\xe9 9 - documentaire 2 : Les IPL'>, <Episode 24x05 - u'Immortalit\xe9, derni\xe8re fronti\xe8re'>, <Episode 24x06 - u"L'inde aujourd'hui">, <Episode 24x07 - u'Bye'>, <Episode 24x08 - u"L'abus des jeux vid\xe9os nuit \xe0 la sant\xe9">, <Episode 24x09 - u'De Sotchi \xe0 Pyeonchang'>, <Episode 24x10 - u'Sonia Benezra : Le meilleur est \xe0 venir'>, <Episode 24x11 - u"L'imam et son discours">, <Episode 24x12 - u'Qu\xe9bec, un an apr\xe8s'>, <Episode 24x13 - u'Trump: la culture du racisme en politique am\xe9ricaine'>, <Episode 24x14 - u'\xc0 contre-mar\xe9e'>, <Episode 24x15 - u"Charles et Mariane, jusqu'au dernier tour de piste">]
Any result containing accented characters (é and è) is displayed with its Unicode escape instead. I tried a few ways to encode and decode the output string, but Python is not my strong suit.
print season24.encode('utf8') / print season24.decode('utf8') ?
@homeseedbox:~/script$ python tvdb_episode_renamer.py
TVDB Search Results:
1 -> Grands reportages [fr] # http://thetvdb.com/?tab=series&id=297490&lid=17 (default)
2 -> Les grands reportages [fr] # http://thetvdb.com/?tab=series&id=226591&lid=17
3 -> Les grands reportages - Exploration [fr] # http://thetvdb.com/?tab=series&id=291259&lid=17
4 -> Les grands reportages - Personnalités [fr] # http://thetvdb.com/?tab=series&id=290829&lid=17
5 -> Les grands reportages - Les films IMAX [fr] # http://thetvdb.com/?tab=series&id=295231&lid=17
6 -> 20 ans de grands reportages [fr] # http://thetvdb.com/?tab=series&id=295277&lid=17
Enter choice (first number, return for default, 'all', ? for help): 2
Traceback (most recent call last):
  File "tvdb_episode_renamer.py", line 19, in <module>
    print season24.encode('utf8')
AttributeError: 'list' object has no attribute 'encode'
same with decode
I tried this:
`import tvdb_api
t = tvdb_api.Tvdb(interactive = 'True', language = 'fr')
season24 = t["Les grands reportages"][24].search('')
s = unicode(season24).encode('utf-8')
print s
`
and I get the exact same result as initial post.
I think since the code is next to nothing I can deal with this outside with a bash script and clean up the result output.
You are printing a list of Episode instances, which just shows the "repr" debug-string version of each episode (denoted by the < >). That is also where the u'...' prefixes and the \xe9 escapes come from.
What you want to do is loop over the result of the search call, then print something like ep['episodeName'] etc. (maybe calling encode('utf-8') on that string).
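To make the difference concrete, here is a minimal sketch that needs neither tvdb_api nor a network connection. FakeEpisode is a hypothetical stand-in (not the library's real class) that only imitates the dictionary-like access and the debug repr. It is written so that it should run under both Python 2 and Python 3; the \xe9 escapes in the repr only show up on Python 2.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


class FakeEpisode(dict):
    """Hypothetical stand-in for an episode: a dict with a debug repr."""

    def __repr__(self):
        # %r gives the debug form of the title; on Python 2 that is
        # the escaped u'La Cor\xe9e ...' style seen in the output above.
        return "<FakeEpisode - %r>" % (self["episodeName"],)


results = [FakeEpisode(episodeName=u"La Cor\u00e9e de mon p\u00e8re")]

# Converting a list to a string calls repr() on every element,
# so the debug strings end up in the output.
as_list = str(results)

# Indexing each element returns the actual text value instead.
titles = [ep["episodeName"] for ep in results]

print(as_list)
for title in titles:
    print(title)
```

The same reasoning explains why unicode(season24).encode('utf-8') changed nothing: the list is still converted through the repr of its elements before any encoding happens.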
`#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tvdb_api
t = tvdb_api.Tvdb(interactive = 'True', language = 'fr')
episode = t["Les grands reportages"].search("")
for x in episode: print x['episodename']
`
output:
3 janvier 1995
6 janvier 1995
7 janvier 1995
8 janvier 1995
9 janvier 1995
...
Faut en parler (5) - Nos animaux de la honte
Johnny Hallyday : la France Rock'n Roll (1)
Johnny Hallyday : la France Rock'n Roll (2)
Vietnam (9) - L'affrontement
Le mythe de Napoléon au Canada Français
Faut en parler - Société de performance
Marathon de l'intégration
Louis-José Houde: petit précis du comique
Le Canada: une histoire populaire, la suite, 1991-2015 (1)
Le Canada: une histoire populaire, la suite, 1991-2015 (2)
Sting l'éléctron libre
Game Fever
La course vers l'intelligence artificielle
La Corée de mon père
Unité 9 - documentaire 2 : Les IPL
Immortalité, dernière frontière
L'inde aujourd'hui
Bye
L'abus des jeux vidéos nuit à la santé
De Sotchi à Pyeonchang
Sonia Benezra : Le meilleur est à venir
L'imam et son discours
Québec, un an après
Trump: la culture du racisme en politique américaine
À contre-marée
Charles et Mariane, jusqu'au dernier tour de piste
no unicode issue, User error :)
Is there a way to loop and print multiple values at once?
Kinda something like this
for x in episode: print x['episodename', 'seasonnumber', 'episodenumber']
For reference here's what I came up with:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
import sys
import tvdb_api
t = tvdb_api.Tvdb(interactive = 'True', language = 'fr')
#t = tvdb_api.Tvdb(language = 'fr')
result = t['Les grands reportages'].search("")
#result = t[226591].search("")
x = 0
i = 0
UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)
for i in result:
    en = result[x]['episodeName']
    se = result[x]['airedSeason']
    ep = result[x]['airedEpisodeNumber']
    da = result[x]['firstAired']
    # print "Title: %s" %en
    # print "S%02dE%02d" %(se, ep)
    # print "S%02dE%02d - %s" % (se, ep, en)
    print "S%02dE%02d - %s - Aired the: %s" % (se, ep, en, da)
    x = x + 1
This prints out every episode from every season of that series in the format SXXEXX - Title.
It's crude, but it does what I need with no Unicode issue. Is there any way to also return the aired dates?
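For anyone wondering what the UTF8Writer wrapper in that script does, here is a small sketch. It wraps an in-memory io.BytesIO buffer instead of sys.stdout (that swap is an assumption made purely so the encoded bytes can be inspected), and the sample title is taken from the output above.

```python
# -*- coding: utf-8 -*-
import codecs
import io

# io.BytesIO stands in for sys.stdout here so the bytes can be examined.
raw = io.BytesIO()
writer = codecs.getwriter("utf8")(raw)

# The writer accepts text and encodes it to UTF-8 on the way out.
writer.write(u"S24E03 - La Cor\u00e9e de mon p\u00e8re\n")

encoded = raw.getvalue()
print(repr(encoded))
```

Each accented character comes out as a two-byte UTF-8 sequence, which is what a UTF-8 terminal expects to receive.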
got it
da = result[x]['firstAired']
I guess this can be closed. Thanks, dbr.
Cool, glad you got it working!
for x in episode: print x['episodename', 'seasonnumber', 'episodenumber']
The episode object acts like a dictionary, and dictionaries can be used with the string formatting syntax like so:
mydict = {'a': 1, 'b': 'example'}
print "%(a)d: %(b)s" % mydict
..you can neaten the code up a little:
import tvdb_api
t = tvdb_api.Tvdb(interactive = 'True', language = 'fr')
result = t['Les grands reportages'].search("")
for r in result:
    formatted = "S%(airedSeason)02dE%(airedEpisodeNumber)02d - %(episodeName)s - Aired the: %(firstAired)s" % r
    print formatted.encode("utf-8")
Pretty similar - I think the biggest improvement is just using the for r in result: loop to iterate over the results instead of the counter
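If a running position is still wanted next to each entry, enumerate() can supply it without a hand-maintained counter. The sketch below uses plain dicts as hypothetical sample data (two titles borrowed from the output earlier in the thread) rather than a real search result, so it runs without tvdb_api.

```python
# -*- coding: utf-8 -*-
# Plain dicts stand in for the search results (hypothetical sample data).
result = [
    {"airedSeason": 24, "airedEpisodeNumber": 1, "episodeName": u"Game Fever"},
    {"airedSeason": 24, "airedEpisodeNumber": 7, "episodeName": u"Bye"},
]

lines = []
# enumerate() yields (position, item) pairs, counting from 1 here.
for index, r in enumerate(result, 1):
    # Mapping-style formatting pulls the values out of the dict by key.
    label = u"S%(airedSeason)02dE%(airedEpisodeNumber)02d - %(episodeName)s" % r
    lines.append(u"%d. %s" % (index, label))

for line in lines:
    print(line)
```

The label is built in two steps because a single format string cannot mix positional and mapping-style placeholders.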
You are right, this is way neater. I'm only starting out with Python and still have a long way to go. The logic is the same in every language, but Python is challenging in its own way.
Anyway, I hope you don't mind me using your snippet? It's for local use only. I do not intend to distribute this; it's for parsing and comparing a ripped episode. I intend to automate the whole process. The goal is to rip an episode, mux it, look up the real episode number on tvdb, retag it with the proper number and title, generate an nfo and post the rls on bin. This was pretty much the only part I was missing. I'm glad I finally started Python too :).