dbr/tvdb_api

Unicode issue

danyboy666 opened this issue · 12 comments

I have a small snippet of code here, but I don't know how to fix the output my script is giving me.

#!/usr/bin/env python
# encoding:utf-8

import tvdb_api

t = tvdb_api.Tvdb(interactive=True, language='fr')
season24 = t['Les grands reportages'][24].search("")

print season24

Output:

user@homeseedbox:~/script$ python tvdb_episode_renamer.py
TVDB Search Results:
1 -> Grands reportages [fr] # http://thetvdb.com/?tab=series&id=297490&lid=17 (default)
2 -> Les grands reportages [fr] # http://thetvdb.com/?tab=series&id=226591&lid=17
3 -> Les grands reportages - Exploration [fr] # http://thetvdb.com/?tab=series&id=291259&lid=17
4 -> Les grands reportages - Personnalités [fr] # http://thetvdb.com/?tab=series&id=290829&lid=17
5 -> Les grands reportages - Les films IMAX [fr] # http://thetvdb.com/?tab=series&id=295231&lid=17
6 -> 20 ans de grands reportages [fr] # http://thetvdb.com/?tab=series&id=295277&lid=17
Enter choice (first number, return for default, 'all', ? for help): 2
[<Episode 24x01 - u'Game Fever'>, <Episode 24x02 - u"La course vers l'intelligence artificielle">, <Episode 24x03 - u'La Cor\xe9e de mon p\xe8re'>, <Episode 24x04 - u'Unit\xe9 9 - documentaire 2 : Les IPL'>, <Episode 24x05 - u'Immortalit\xe9, derni\xe8re fronti\xe8re'>, <Episode 24x06 - u"L'inde aujourd'hui">, <Episode 24x07 - u'Bye'>, <Episode 24x08 - u"L'abus des jeux vid\xe9os nuit \xe0 la sant\xe9">, <Episode 24x09 - u'De Sotchi \xe0 Pyeonchang'>, <Episode 24x10 - u'Sonia Benezra : Le meilleur est \xe0 venir'>, <Episode 24x11 - u"L'imam et son discours">, <Episode 24x12 - u'Qu\xe9bec, un an apr\xe8s'>, <Episode 24x13 - u'Trump: la culture du racisme en politique am\xe9ricaine'>, <Episode 24x14 - u'\xc0 contre-mar\xe9e'>, <Episode 24x15 - u"Charles et Mariane, jusqu'au dernier tour de piste">]

Any result containing accented characters (such as é and è) displays their Unicode escape values instead. I tried a few ways to encode and decode the output string, but Python is not my strong suit.

print season24.encode('utf8') / print season24.decode('utf8') ?

@homeseedbox:~/script$ python tvdb_episode_renamer.py
TVDB Search Results:
1 -> Grands reportages [fr] # http://thetvdb.com/?tab=series&id=297490&lid=17 (default)
2 -> Les grands reportages [fr] # http://thetvdb.com/?tab=series&id=226591&lid=17
3 -> Les grands reportages - Exploration [fr] # http://thetvdb.com/?tab=series&id=291259&lid=17
4 -> Les grands reportages - Personnalités [fr] # http://thetvdb.com/?tab=series&id=290829&lid=17
5 -> Les grands reportages - Les films IMAX [fr] # http://thetvdb.com/?tab=series&id=295231&lid=17
6 -> 20 ans de grands reportages [fr] # http://thetvdb.com/?tab=series&id=295277&lid=17
Enter choice (first number, return for default, 'all', ? for help): 2
Traceback (most recent call last):
  File "tvdb_episode_renamer.py", line 19, in <module>
    print season24.encode('utf8')
AttributeError: 'list' object has no attribute 'encode'

same with decode

I tried this:

import tvdb_api

t = tvdb_api.Tvdb(interactive=True, language='fr')
season24 = t["Les grands reportages"][24].search('')
s = unicode(season24).encode('utf-8')
print s

and I get exactly the same result as in the initial post.

Since the code is next to nothing, I think I can deal with this externally, using a bash script to clean up the output.

dbr commented

You are printing a list of Episode instances, so each one shows up as its "repr" debug string (denoted by the < >), and in Python 2 that form escapes non-ASCII characters.

What you want to do is loop over the result of the search call, then print something like ep['episodeName'] for each episode (maybe calling encode('utf-8') on that string).
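To illustrate the difference between printing the list and printing a field, here is a minimal sketch. It does not use tvdb_api: `FakeEpisode` is a hypothetical stand-in class, not the library's Episode. It is written for Python 3, where `repr()` no longer escapes accented characters, so the sketch assumes `ascii()` is a fair imitation of the Python 2 output seen in this thread.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for tvdb_api's Episode; not the library's real class.
class FakeEpisode(dict):
    def __repr__(self):
        # ascii() escapes non-ASCII characters, mimicking Python 2's repr().
        return "<Episode - %s>" % ascii(self["episodeName"])


episodes = [FakeEpisode(episodeName=u"La Corée de mon père")]

# str() of a list uses repr() for every element, so the debug form appears.
as_list = str(episodes)
# Looking up one field gives the plain text, accents intact.
as_field = episodes[0]["episodeName"]

print(as_list)
print(as_field)
```

This is also why wrapping the whole list in unicode(...) changes nothing: the conversion of a list still goes through the repr of each element.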

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import tvdb_api

t = tvdb_api.Tvdb(interactive=True, language='fr')
episode = t["Les grands reportages"].search("")
for x in episode:
    print x['episodename']

output:

3 janvier 1995
6 janvier 1995
7 janvier 1995
8 janvier 1995
9 janvier 1995
...
Faut en parler (5) - Nos animaux de la honte
Johnny Hallyday : la France Rock'n Roll (1)
Johnny Hallyday : la France Rock'n Roll (2)
Vietnam (9) - L'affrontement
Le mythe de Napoléon au Canada Français
Faut en parler - Société de performance
Marathon de l'intégration
Louis-José Houde: petit précis du comique
Le Canada: une histoire populaire, la suite, 1991-2015 (1)
Le Canada: une histoire populaire, la suite, 1991-2015 (2)
Sting l'éléctron libre
Game Fever
La course vers l'intelligence artificielle
La Corée de mon père
Unité 9 - documentaire 2 : Les IPL
Immortalité, dernière frontière
L'inde aujourd'hui
Bye
L'abus des jeux vidéos nuit à la santé
De Sotchi à Pyeonchang
Sonia Benezra : Le meilleur est à venir
L'imam et son discours
Québec, un an après
Trump: la culture du racisme en politique américaine
À contre-marée
Charles et Mariane, jusqu'au dernier tour de piste

No Unicode issue, user error :)

Is there a way to loop and print multiple values at once?

Something like this:

for x in episode: print x['episodename', 'seasonnumber', 'episodenumber']
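A subscript with several comma-separated names, as in the line above, is treated by Python as a lookup of one tuple key, not three separate lookups. One possible way to fetch several fields in a single call is `operator.itemgetter` from the standard library. The sketch below is an illustration only: it runs on plain dicts standing in for episodes, and it assumes the camelCase key names used elsewhere in this thread. It is written for Python 3.

```python
from operator import itemgetter

# Plain dicts standing in for the episode objects returned by search();
# the key names follow the ones used elsewhere in this thread.
episodes = [
    {"episodeName": "Game Fever", "airedSeason": 24, "airedEpisodeNumber": 1},
    {"episodeName": "Bye", "airedSeason": 24, "airedEpisodeNumber": 7},
]

# itemgetter with several keys returns a tuple of the matching values.
pick = itemgetter("episodeName", "airedSeason", "airedEpisodeNumber")

rows = [pick(ep) for ep in episodes]
for name, season, number in rows:
    print("S%02dE%02d - %s" % (season, number, name))
```

Since `itemgetter` only performs ordinary item lookups, it should work with any dictionary-like object, though that has not been checked against tvdb_api here.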

For reference, here's what I came up with:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import codecs
import sys
import tvdb_api

t = tvdb_api.Tvdb(interactive=True, language='fr')
#t = tvdb_api.Tvdb(language = 'fr')
result = t['Les grands reportages'].search("")
#result = t[226591].search("")

x = 0
i = 0

UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)

for i in result:
        en = result[x]['episodeName']
        se = result[x]['airedSeason']
        ep = result[x]['airedEpisodeNumber']
        da = result[x]['firstAired']

#        print "Title: %s" %en
#        print "S%02dE%02d" %(se, ep)
#        print "S%02dE%02d - %s" % (se, ep, en)
        print "S%02dE%02d - %s - Aired the: %s" % (se, ep, en, da)
        x = x + 1

This prints out every episode from every season of that series in the format SXXEXX - Title.

It's crude, but it does what I need with no Unicode issue. Is there any way to also return the aired dates?

Got it:

da = result[x]['firstAired']

I guess this can be closed. Thanks, dbr.

dbr commented

Cool, glad you got it working!

for x in episode: print x['episodename', 'seasonnumber', 'episodenumber']

Since the episode object acts like a dictionary, and dictionaries can be used in the string formatting syntax like so:

mydict = {'a': 1, 'b': 'example'}
print "%(a)d: %(b)s" % mydict

..you can neaten the code up a little:

import tvdb_api

t = tvdb_api.Tvdb(interactive=True, language='fr')
result = t['Les grands reportages'].search("")

for r in result:
    formatted = "S%(airedSeason)02dE%(airedEpisodeNumber)02d - %(episodeName)s - Aired the: %(firstAired)s" % r
    print formatted.encode("utf-8")

Pretty similar - I think the biggest improvement is just using the for r in result: loop to iterate over the results directly instead of indexing with a counter.
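For anyone trying the same mapping-based formatting without network access, here is a small self-contained sketch. It reuses the format string from the snippet above, but the results are plain dicts standing in for what search() returns, and the aired date is a placeholder, not the real air date. It is written for Python 3, where the explicit encode step is not needed.

```python
# -*- coding: utf-8 -*-
# Format string taken from the snippet above; every key must exist in the mapping.
TEMPLATE = (u"S%(airedSeason)02dE%(airedEpisodeNumber)02d - "
            u"%(episodeName)s - Aired the: %(firstAired)s")

# Plain dicts standing in for search() results.
# The date is a placeholder, not the real air date.
results = [
    {"airedSeason": 24, "airedEpisodeNumber": 3,
     "episodeName": u"La Corée de mon père", "firstAired": "1970-01-01"},
]

lines = [TEMPLATE % r for r in results]
for line in lines:
    # On Python 3, print() encodes text for the terminal itself;
    # calling .encode("utf-8") here would print a bytes literal instead.
    print(line)
```

Note that the `%02d` fields require integers, so a result with a missing season or episode number would raise a TypeError with this template.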

You are right, this is way neater. I'm only starting out with Python and still have a long way to go. The logic is the same in every language, but Python is challenging in its own way.

Anyway, I hope you don't mind me using your snippet. It's for local use only; I do not intend to distribute it. It's for parsing and comparing a ripped episode, and I intend to automate the whole process. The goal is to rip an episode, mux it, look up the real episode number on TVDB, retag it with the proper number and title, generate an NFO, and post the release on bin. This was pretty much the only part I was missing. I'm glad I finally started Python too :).