teeli/urltitle

Broken URLs

teeli opened this issue · 27 comments

teeli commented

Please report any URLs that aren't working properly (either causing errors on your bot's partyline or just not showing titles correctly) here.

Make sure you include the URL in question, any errors you might see, and your configuration (any relevant software versions, e.g. eggdrop version, Tcl version, Tcl extension versions).

eggdrop 1.80
Tcl library: /home/eggie/tcl85/lib/tcl8.5
Tcl version: 8.5.19 (header version 8.5.19)
TLS support is enabled.
TLS library: OpenSSL 1.0.2g 1 Mar 2016

https://twitter.com/Breaking911/status/842624423358291968

[05:09:32] Tcl error [UrlTitle::handler]: can't read "meta(Content-Type)": no such element in array

teeli commented

@OmkAR2013 that should be fixed in the latest version.

All the previously broken URLs are working for me now. It's great! Everything works except Twitter https links.

https://twitter.com/Reuters
https://twitter.com/i/moments/842395226299760641

There's no error displayed in the bot log, so I'm not sure what's happening.
I'm using the newest urltitle.tcl.

Any suggestions? teeli, what setup do you have for your working bot?

CONFIG ->

I am mOOpeY, running eggdrop v1.8.1+RC2: 1 user (mem: 100k).
Configured with: '--with-tcllib=/home/moopey/local/lib/libtcl8.6.so' '--with-tclinc=/home/moopey/local/include/tcl.h' '--enable-tls'
OS: Linux 4.4.0-66-generic
Process ID: 37832 (parent 1)
Tcl library: /home/moopey/local/lib/tcl8.6
Tcl version: 8.6.6 (header version 8.6.6)
Tcl is threaded.
TLS support is enabled.
TLS library: OpenSSL 1.0.2g 1 Mar 2016
IPv6 support is enabled.

tDOM - a XML/DOM/XPath/XSLT implementation for Tcl
(Version 0.8.4)

tcltls-1.7.11.tar.gz
tcllib_1_18.tar.gz
tcl8.6.6-src.tar.gz
eggdrop-1.8.1rc2.tar.gz

We're seeing something similar with random URLs now (no title comes back for the Google links below).

07:57:19 <@knofte> https://casinojakten.se
07:57:21 <@servant> Title: Freespin och Bäst Bonus från de Bästa Casinon!! | casinojakten.se
07:57:27 <@knofte> https://www.sunet.se
07:57:29 <@servant> Title: SUNET | Datakommunikation & infrastruktur för forskning och utbildning
07:57:41 <@knofte> http://www.google.com
07:57:46 <@knofte> https://www.google.com
07:57:50 <@knofte> https://google.com
07:57:56 <@knofte> http://google.com
...

ii libtcl8.6:amd64 8.6.5+dfsg-2 amd64 Tcl (the Tool Command Language) v8.6 - run-time library files
ii tcl 8.6.0+9 amd64 Tool Command Language (default version) - shell
ii tcl-dev:amd64 8.6.0+9 amd64 Tool Command Language (default version) - development files
ii tcl-tls 1.6.7+dfsg-1 amd64 TLS OpenSSL extension to Tcl
ii tcl8.6 8.6.5+dfsg-2 amd64 Tcl (the Tool Command Language) v8.6 - shell
ii tcl8.6-dev:amd64 8.6.5+dfsg-2 amd64 Tcl (the Tool Command Language) v8.6 - development files
ii tcl8.6-tdbc 1.0.3-1 amd64 Tcl Database Connectivity
ii tcl8.6-tdbc-sqlite3 1.0.3-1 all Tcl Database Connectivity
ii tcllib 1.17-dfsg-1 all Standard Tcl Library

Is it fixed yet?

teeli commented

There should now be better support for HTTP(S) redirects and case-insensitive HTTP headers. Google, Twitter etc. should work.
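
For anyone who hit the `can't read "meta(Content-Type)"` error earlier in the thread: HTTP header names are case-insensitive, so a server may send `content-type` rather than `Content-Type`, and a direct array lookup on the exact spelling then fails. Below is a minimal sketch of a case-insensitive lookup. It assumes the headers arrive as a flat key/value list (the shape Tcl's `http::meta` returns). `headerValue` is a hypothetical helper name for illustration, not necessarily what urltitle.tcl does.

```tcl
# Hypothetical helper: look up a header by name, ignoring case.
# meta is a flat key/value list, e.g. the result of [http::meta $token].
proc headerValue {meta name {default ""}} {
    foreach {key value} $meta {
        if {[string equal -nocase $key $name]} {
            return $value
        }
    }
    # Header not present at all: return the fallback instead of raising an error
    return $default
}

# Usage: works regardless of how the server spelled the header
set meta {content-type {text/html; charset=utf-8} Server nginx}
puts [headerValue $meta Content-Type]
```

Returning a default instead of raising an error also covers responses that carry no Content-Type header at all.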

Hi @teeli,

I have a new issue with this URL:
http://blog.dilbert.com/post/164297628606/how-to-know-youre-in-a-mass-hysteria-bubble

On the partyline, I see this:

Tcl error [UrlTitle::handler]: invalid command name ""

So I added putlog statements everywhere, and this line seems to be the culprit:

set title [[$root selectNodes {//head/title/text()}] data]

Any idea?

I get the same on that URL:
Tcl error [UrlTitle::handler]: invalid command name ""

teeli commented

Apparently XPath fails to find the title on that page. I'm not sure why; I suspect it could be because of invalid HTML structure (a stray doctype).

I should probably add some error checking and maybe a regex fallback (if that helps; I need to test).
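
Some background on the error message: when `selectNodes` matches nothing it returns an empty list, and `[[...] data]` then tries to run the empty string as a command, which produces `invalid command name ""`. Below is a minimal sketch of the error checking and regex fallback described above. It assumes tDOM may or may not be installed, and `extractTitle` is a hypothetical name. It is not the code that ended up in urltitle.tcl.

```tcl
# Hypothetical helper: return the page title, or "" if none can be found.
proc extractTitle {html} {
    set title ""
    # Load tDOM if it is installed; if not, the parse below simply fails
    catch {package require tdom}
    if {![catch {dom parse -html $html} doc]} {
        catch {
            set nodes [[$doc documentElement] selectNodes {//head/title/text()}]
            # Only call a node method when XPath actually matched something
            if {[llength $nodes] > 0} {
                set title [[lindex $nodes 0] data]
            }
        }
        $doc delete
    }
    # Regex fallback: first <title> element, text only
    if {$title eq ""} {
        regexp -nocase {<title[^>]*>([^<]*)</title>} $html -> title
    }
    return [string trim $title]
}

puts [extractTitle {<html><head><title> Example page </title></head><body></body></html>}]
```

The pattern uses only greedy quantifiers and `[^<]*` on purpose: Tcl regular expressions do not mix greedy and non-greedy quantifiers the way Perl-style ones do, so a bounded character class is the safer way to stop at the first closing tag.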

teeli commented

I've uploaded a new version that should fix that issue.

Fixed indeed. Well done. Your TCL-fu is admirable.

Since IMDb updated their site, there is a problem with urltitle:

21:37:09 <~lollko> https://www.imdb.com/title/tt1025100/
21:37:11 <&rss> Title: TryIMDbProFree

Is it possible to fix?

teeli commented

Looks like there's an inline SVG element on the page that has a <title> tag. I need to check whether it's possible to exclude those.

For reference

...
<svg width="175px" height="30px" viewBox="0 0 172 29" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<title>TryIMDbProFree</title>
<g id="tryIMDbProFree" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<rect id="tryIMDbProFreeButton" stroke="#A88734" fill="#F1C241" x="1" y="1" width="170" height="28" rx="3"></rect>
<text id="tryIMDbProFreeText">
<tspan x="33" y="19">Try IMDbPro Free</tspan>
...

teeli commented

I've uploaded a new version that fixes the issue with title tags outside <head> when using regex parsing instead of tdom.

Should fix the issue with the IMDB link above.
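
For reference, one way to keep a regex from picking up `<title>` tags outside `<head>` (such as the inline SVG one above) is to cut the document at the closing `</head>` before searching. This is a minimal sketch under that assumption. `headTitle` is a hypothetical name and the sample page is made up. The actual fix in urltitle.tcl may work differently.

```tcl
# Hypothetical helper: find <title> only in the part of the page before </head>.
proc headTitle {html} {
    # Cut the document at the first </head>, if there is one
    if {[regexp -nocase -indices {</head\s*>} $html loc]} {
        set html [string range $html 0 [expr {[lindex $loc 0] - 1}]]
    }
    set title ""
    regexp -nocase {<title[^>]*>([^<]*)</title>} $html -> title
    return [string trim $title]
}

# Made-up page with a real title in <head> and an SVG title in <body>
set page {<html><head><title>Example Movie</title></head><body><svg><title>TryIMDbProFree</title></svg></body></html>}
puts [headTitle $page]
```

A page that omits the `</head>` tag is still searched in full, so this narrows the problem rather than eliminating it.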

Working fine :) Thanks for your work.

Great work, thanks. Most links work fine but BBC News articles don't work for me. :(
(and yet BBC Sport links work fine)

[10:11:59] Connection to https://www.bbc.co.uk/news/uk-england-south-yorkshire-47623303/ failed
[10:11:59] Error: Missing host part: /news/uk-england-south-yorkshire-47623303
[10:12:07] Connection to http://www.bbc.co.uk/news/uk-england-south-yorkshire-47623303/ failed
[10:12:07] Error: Missing host part: /news/uk-england-south-yorkshire-47623303
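
A guess at what the log above shows, not confirmed anywhere in this thread: the path in the error has lost its trailing slash, which would fit a redirect whose `Location` header is a relative path that then gets fetched as if it were a full URL. If that is the cause, the redirect target needs to be resolved against the URL that was requested. A minimal sketch, assuming that diagnosis. `resolveRedirect` is a hypothetical helper name, and `..` segments are not handled.

```tcl
# Hypothetical helper: turn a Location header value into an absolute URL.
proc resolveRedirect {base location} {
    # Already absolute (scheme://...): use as-is
    if {[regexp -nocase {^[a-z][a-z0-9+.-]*://} $location]} {
        return $location
    }
    # Split the requested URL into scheme, host and path
    if {![regexp -nocase {^([a-z][a-z0-9+.-]*)://([^/?#]+)([^?#]*)} $base -> scheme host path]} {
        error "base URL has no host part: $base"
    }
    # Scheme-relative redirect: //host/path
    if {[string range $location 0 1] eq "//"} {
        return "${scheme}:$location"
    }
    # Host-relative redirect: /path
    if {[string index $location 0] eq "/"} {
        return "${scheme}://${host}$location"
    }
    # Path-relative redirect: replace the last segment of the base path
    set dir [string range $path 0 [string last "/" $path]]
    if {$dir eq ""} { set dir "/" }
    return "${scheme}://${host}${dir}$location"
}

puts [resolveRedirect https://www.bbc.co.uk/news/uk-england-south-yorkshire-47623303/ /news/uk-england-south-yorkshire-47623303]
```

tcllib also ships a `uri` package that may be a better fit than hand-rolled parsing if it is already installed.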

Yo, YouTube changed earlier this year (afaik), which created a problem with urltitle. The same happened to youtube-dl:
Lamieur/youtube-dl@5eabe9c

For example:
Error: HTTP/1.1 429 Too Many Requests (https://www.youtube.com/watch?v=JImcvtJzIK8)

Some say forcing IPv4 for the lookup could help, but I had no success with curl -I -4, unfortunately.

It'd be great to get YT-titles fixed again :)

teeli commented

I'll take a look and try to figure that out, but it'll probably be a more complex fix and might take more time than usual. Looks like it's blocked by the YouTube servers at the request level instead of being just a parsing error in the script.

Yeah, it seems like the title is only loaded after a redirect has been made. Quite an annoying feature. :)

There is a youtube-api.tcl available that uses the YouTube API; perhaps that could give some hints.
(I could not find a reliable link for it, though.)

Not sure if it's me or not, but any page from reuters.com comes back with a blank title.

https://www.reuters.com/article/us-china-aviation-comac-insight/chinas-bid-to-challenge-boeing-and-airbus-falters-idUSKBN1Z905N
Title:

Same thing here, version 0.11.

https://www.bbc.com/news/world-us-canada-51483541 - Nothing happens, no errors in console either.

Twitter broke some weeks ago; nothing happens on those links. For example:
https://twitter.com/ttnyhetsbyran/status/1279837369605160960?s=20
Tcl library: /usr/share/tcltk/tcl8.6
Tcl version: 8.6.9 (header version 8.6.9)
Tcl is threaded.
TLS support is enabled.
TLS library: OpenSSL 1.1.1d 10 Sep 2019

EDIT: Other sources tell me Twitter needs the API to work, so perhaps this is not an easy fix. Better to use a Twitter-specific script.

Hi fellas,

I tried some YouTube links, but urltitle shows me:

22:23:40 <~lollko> https://www.youtube.com/watch?v=-tDiXMeEWzw
22:23:42 <&rss> Title: YouTube

Maybe YouTube redesigned their site?

Here is my "conf" from the eggdrop:

22:37:15 <rss> Tcl library: /usr/share/tcl8.5
22:37:15 <rss> Tcl version: 8.5.13 (header version 8.5.13)
22:37:15 <rss> Tcl is threaded.
22:37:15 <rss> TLS support is enabled.
22:37:15 <rss> TLS library: OpenSSL 1.0.2k-fips  26 Jan 2017

Hi @teeli

Twitter links haven't been working for quite a while (no output at all). Can you have a look?

Thanks!

x.com (aka Twitter) links are still not working, @teeli.
Is anyone else fixing this?

Example: https://x.com/Space_Station/status/1807824547309093239
Bot response: Title: x.com