wummel/linkchecker

Error when checking specific link


I'm using linkchecker 9.3 on Ubuntu 17.10, and I get this error reliably with the URL below (output from a run with -Dall):

linkchecker -Dall http://mythologica.fr/grec/heracles0.htm 
DEBUG 2018-04-18 10:39:32,091 MainThread Python 2.7.14 (default, Sep 23 2017, 22:06:14) 
[GCC 7.2.0] on linux2
DEBUG 2018-04-18 10:39:32,091 MainThread reading configuration from ['/home/mholmes/.linkchecker/linkcheckerrc']
INFO 2018-04-18 10:39:32,094 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
DEBUG 2018-04-18 10:39:32,100 MainThread configuration: [('aborttimeout', 300),
 ('allowedschemes', []),
 ('authentication', []),
 ('blacklist', {}),
 ('checkextern', False),
 ('cookiefile', None),
 ('csv', {}),
 ('debugmemory', False),
 ('dot', {}),
 ('enabledplugins', []),
 ('externlinks', []),
 ('fileoutput', []),
 ('gml', {}),
 ('gxml', {}),
 ('html', {}),
 ('ignorewarnings', []),
 ('internlinks', []),
 ('localwebroot', None),
 ('logger', 'TextLogger'),
 ('loginextrafields', {}),
 ('loginpasswordfield', 'password'),
 ('loginurl', None),
 ('loginuserfield', 'login'),
 ('maxfilesizedownload', 5242880),
 ('maxfilesizeparse', 1048576),
 ('maxhttpredirects', 10),
 ('maxnumurls', None),
 ('maxrequestspersecond', 10),
 ('maxrunseconds', None),
 ('nntpserver', None),
 ('none', {}),
 ('output', 'text'),
 ('pluginfolders', []),
 ('proxy', {}),
 ('quiet', False),
 ('recursionlevel', -1),
 ('sitemap', {}),
 ('sql', {}),
 ('sslverify', True),
 ('status', True),
 ('status_wait_seconds', 5),
 ('text', {}),
 ('threads', 10),
 ('timeout', 60),
 ('trace', False),
 ('useragent',
  u'Mozilla/5.0 (compatible; LinkChecker/9.3; +http://wummel.github.io/linkchecker/)'),
 ('verbose', False),
 ('warnings', True),
 ('xml', {})]
DEBUG 2018-04-18 10:39:32,100 MainThread HttpUrl handles url http://mythologica.fr/grec/heracles0.htm
DEBUG 2018-04-18 10:39:32,100 MainThread checking syntax
DEBUG 2018-04-18 10:39:32,101 MainThread Add intern pattern u'^https?://(www\\.|)mythologica\\.fr\\/grec'
DEBUG 2018-04-18 10:39:32,101 MainThread Link pattern u'^https?://(www\\.|)mythologica\\.fr\\/grec' strict=False
DEBUG 2018-04-18 10:39:32,101 MainThread queueing http://mythologica.fr/grec/heracles0.htm
LinkChecker 9.3              Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html

Start checking at 2018-04-18 10:39:32-007
DEBUG 2018-04-18 10:39:32,104 CheckThread-http://mythologica.fr/grec/heracles0.htm Checking http link
base_url=u'http://mythologica.fr/grec/heracles0.htm'
parent_url=None
base_ref=None
recursion_level=0
url_connection=None
line=0
column=0
page=0
name=u''
anchor=u''
cache_url=http://mythologica.fr/grec/heracles0.htm
DEBUG 2018-04-18 10:39:32,104 CheckThread-http://mythologica.fr/grec/heracles0.htm checking connection
 1 thread active,     0 links queued,    0 links in   0 URLs checked, runtime 1 seconds
DEBUG 2018-04-18 10:39:33,111 CheckThread-http://mythologica.fr/grec/heracles0.htm u'http://mythologica.fr/robots.txt' parse lines
DEBUG 2018-04-18 10:39:33,111 CheckThread-http://mythologica.fr/grec/heracles0.htm Parsed rules:
User-agent: *
Allow: /
DEBUG 2018-04-18 10:39:33,112 CheckThread-http://mythologica.fr/grec/heracles0.htm u'http://mythologica.fr/robots.txt' check allowance for:
  user agent: u'Mozilla/5.0 (compatible; LinkChecker/9.3; +http://wummel.github.io/linkchecker/)'
  url: u'http://mythologica.fr/grec/heracles0.htm' ...
DEBUG 2018-04-18 10:39:33,112 CheckThread-http://mythologica.fr/grec/heracles0.htm /grec/heracles0.htm Allow: / True
DEBUG 2018-04-18 10:39:33,112 CheckThread-http://mythologica.fr/grec/heracles0.htm  ... rule line Allow: /
DEBUG 2018-04-18 10:39:33,113 CheckThread-http://mythologica.fr/grec/heracles0.htm Prepare request with {'headers': {}, 'url': u'http://mythologica.fr/grec/heracles0.htm', 'method': 'GET'}
DEBUG 2018-04-18 10:39:33,114 CheckThread-http://mythologica.fr/grec/heracles0.htm Send request with {'verify': False, 'timeout': 60, 'stream': True, 'allow_redirects': False}
DEBUG 2018-04-18 10:39:33,284 CheckThread-http://mythologica.fr/grec/heracles0.htm follow all redirections
DEBUG 2018-04-18 10:39:33,456 CheckThread-http://mythologica.fr/grec/heracles0.htm Redirected to u'https://mythologica.fr/grec/heracles0.htm'
DEBUG 2018-04-18 10:39:33,456 CheckThread-http://mythologica.fr/grec/heracles0.htm Intern URL u'https://mythologica.fr/grec/heracles0.htm'
DEBUG 2018-04-18 10:39:33,456 CheckThread-http://mythologica.fr/grec/heracles0.htm task_done https://mythologica.fr/grec/heracles0.htm


********** Oops, I did it again. *************

You have found an internal error in LinkChecker. Please write a bug report
at https://github.com/wummel/linkchecker/issues
and include the following information:
- the URL or file you are testing
- the system information below

When using the commandline client:
- your commandline arguments and any custom configuration files.
- the output of a debug run with option "-Dall"

Not disclosing some of the information above due to privacy reasons is ok.
I will try to help you nonetheless, but you have to give me something
I can work with ;) .

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 104, in check_url
    line: self.check_url_data(url_data)
    locals:
      self = <local> <Checker(CheckThread-http://mythologica.fr/grec/heracles0.htm, started 140603819489024)>
      self.check_url_data = <local> <bound method Checker.check_url_data of <Checker(CheckThread-http://mythologica.fr/grec/heracles0.htm, started 140603819489024)>>
      url_data = <local> <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 120, in check_url_data
    line: check_url(url_data, self.logger)
    locals:
      check_url = <global> <function check_url at 0x7fe0e55c9ed8>
      url_data = <local> <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>
      self = <local> <Checker(CheckThread-http://mythologica.fr/grec/heracles0.htm, started 140603819489024)>
      self.logger = <local> <linkcheck.director.logger.Logger object at 0x7fe0e4ea7550>
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 52, in check_url
    line: url_data.check()
    locals:
      url_data = <local> <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>
      url_data.check = <local> <bound method HttpUrl.check of <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 424, in check
    line: self.local_check()
    locals:
      self = <local> <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>
      self.local_check = <local> <bound method HttpUrl.local_check of <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 442, in local_check
    line: self.check_connection()
    locals:
      self = <local> <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>
      self.check_connection = <local> <bound method HttpUrl.check_connection of <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 137, in check_connection
    line: self.follow_redirections(request)
    locals:
      self = <local> <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>
      self.follow_redirections = <local> <bound method HttpUrl.follow_redirections of <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>>
      request = <local> <PreparedRequest [GET]>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 263, in follow_redirections
    line: self._add_ssl_info()
    locals:
      self = <local> <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>
      self._add_ssl_info = <local> <bound method HttpUrl._add_ssl_info of <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 193, in _add_ssl_info
    line: sock = self._get_ssl_sock()
    locals:
      sock = <not found>
      self = <local> <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>
      self._get_ssl_sock = <local> <bound method HttpUrl._get_ssl_sock of <https link, base_url=u'http://mythologica.fr/grec/heracles0.htm', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://mythologica.fr/grec/heracles0.htm>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 184, in _get_ssl_sock
    line: if raw_connection.sock is None:
    locals:
      raw_connection = <local> None
      raw_connection.sock = <local> !AttributeError: 'NoneType' object has no attribute 'sock'
      None = <builtin> None
AttributeError: 'NoneType' object has no attribute 'sock'
System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.14 (default, Sep 23 2017, 22:06:14) 
[GCC 7.2.0] on linux2
Requests: 2.18.1
Qt: 4.8.7 / PyQt: 4.11.4
Modules: Sqlite, Gconf
Local time: 2018-04-18 10:39:33-007
sys.argv: ['/usr/bin/linkchecker', '-Dall', 'http://mythologica.fr/grec/heracles0.htm']

LANGUAGE = 'en_CA:en'
LANG = 'en_CA.UTF-8'
Default locale: ('en', 'UTF-8')

Statistics:
Downloaded: 0B.
No statistics available since no URLs were checked.

That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
 Stopped checking at 2018-04-18 10:39:33-007 (1 seconds)
******** LinkChecker internal error, over and out ********
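
Reading the traceback: right after the redirect to https, `_get_ssl_sock` accesses `raw_connection.sock` while `raw_connection` is `None`, which raises the `AttributeError`. Below is a minimal sketch of a defensive guard for that situation. It is not the actual linkchecker code or fix; the function names `get_ssl_sock` and `add_ssl_info`, the `info` list, and the stub class are hypothetical and exist only to illustrate the idea of treating a missing raw connection as "no SSL info available" instead of crashing.

```python
class StubRawConnection(object):
    """Hypothetical stand-in for the low-level connection object."""

    def __init__(self, sock=None):
        self.sock = sock


def get_ssl_sock(raw_connection):
    """Return the connection's socket, or None if there is no connection."""
    if raw_connection is None:
        # This is the case from the traceback: nothing to inspect.
        return None
    return raw_connection.sock


def add_ssl_info(raw_connection, info):
    """Append a note about SSL availability; return True if a socket exists."""
    sock = get_ssl_sock(raw_connection)
    if sock is None:
        info.append("SSL information unavailable: no socket")
        return False
    info.append("SSL socket available")
    return True


if __name__ == "__main__":
    notes = []
    add_ssl_info(None, notes)  # the failing case from this report
    add_ssl_info(StubRawConnection(sock="dummy-socket"), notes)
    for note in notes:
        print(note)
```

With a guard like this, the check of the redirected URL could continue and simply skip the SSL details. Whether skipping is the right behaviour for linkchecker is a design decision for the maintainers.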

Thank you for the issue report. Sadly, this project is dead; a new team now maintains it at https://github.com/linkcheck/linkchecker
For more details, please see #708.
Please also close this issue and report it afresh on the new repo: https://github.com/linkcheck/linkchecker/issues

Done!