torproject/stem

Retrieve extra-info descriptors via control port

juga0 opened this issue · 3 comments

juga0 commented

Hi @atagar,

Related to #91, I realized that it's currently possible to obtain extra-info descriptors over HTTP using stem's DescriptorDownloader.

For example:

```python
from stem import DirPort
from stem.directory import DIRECTORY_AUTHORITIES
from stem.descriptor.remote import DescriptorDownloader

# Exclude maatuska, because it seems to reject these queries.
endpoints = [
    DirPort(authority.address, authority.dir_port)
    for authority in DIRECTORY_AUTHORITIES.values() if authority.address != '171.25.193.9'
]
downloader = DescriptorDownloader(use_mirrors=True)
extrainfo_descriptors = downloader.get_extrainfo_descriptors(
    fingerprints=["C51701910DEA998F4F51287A71B7E94449A3E9BC"], endpoints=endpoints
).run()
```

Apart from maatuska rejecting these queries, it seems that all dirauths currently reject any query during the voting period, so I had to add code to the script above that sleeps until the voting period has finished.

With that code, I also see log messages like this:

```
log.py:174 - log - Unable to download descriptors from 'http://193.23.244.244:80/tor/extra/fp/0409605E8343562A790FFE846CB3BD7C04429F70' (2 retries remaining): Failed to download from http://193.23.244.244:80/tor/extra/fp/0409605E8343562A790FFE846CB3BD7C04429F70 (HTTPError): Directory busy, try again later
```
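A minimal sketch of the "sleep until the voting period has finished" workaround mentioned above. The window boundaries are parameters because the exact voting schedule is an assumption here (check the authorities' actual timing in dir-spec before relying on specific minutes), and both function names are hypothetical, not part of stem.

```python
import datetime
import time


def seconds_until_window_ends(now, start_minute, end_minute):
  """Seconds until the hourly window [start_minute, end_minute) ends, or 0 if
  'now' is outside it. An end_minute below start_minute wraps past the hour."""

  position = now.minute * 60 + now.second  # seconds into the current hour
  start, end = start_minute * 60, end_minute * 60

  if start_minute <= end_minute:
    inside = start <= position < end
    remaining = end - position
  else:  # the window wraps past the top of the hour, e.g. :55 to :05
    inside = position >= start or position < end
    remaining = (end - position) % 3600

  return remaining if inside else 0


def wait_until_window_ends(start_minute, end_minute):
  """Block until the (assumed) busy window is over, measured in UTC."""

  now = datetime.datetime.now(datetime.timezone.utc)
  time.sleep(seconds_until_window_ends(now, start_minute, end_minute))
```

For instance, if you assume queries are refused from :50 until the top of the hour, calling `wait_until_window_ends(50, 60)` before `downloader.get_extrainfo_descriptors(...)` would hold the request back until then.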

Does it actually retry 3 times? And does it use the same dirauth/mirror or a different one?

According to https://gitweb.torproject.org/torspec.git/tree/control-spec.txt#n645, it's also possible to obtain these descriptors using the control port (when the client has configured DownloadExtraInfo, as sbws will do).

I checked that I can actually do that:

```python
>>> controller.get_info("extra-info/digest/9E7AD7DFD0F573F5EDD7E5FB9196B083B0733D83")
'extra-info ThereIsNoCloud C51701910DEA998F4F51287A71B7E94449A3E9BC\nidentity-ed25519\n[...]
```
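Since the lookup is by digest, it can be worth checking which relay the answer belongs to. An extra-info descriptor starts with an `extra-info <nickname> <fingerprint>` line, as in the output above. A minimal sketch; `parse_extrainfo_header` is a hypothetical helper, not part of stem:

```python
def parse_extrainfo_header(content):
  """Return (nickname, fingerprint) from the 'extra-info' line that starts an
  extra-info descriptor, raising ValueError if the text doesn't start with one."""

  fields = content.split('\n', 1)[0].split()

  if len(fields) != 3 or fields[0] != 'extra-info':
    raise ValueError('Not an extra-info descriptor: %r' % content[:80])

  return fields[1], fields[2]
```

Applied to the GETINFO result above, this should give back `('ThereIsNoCloud', 'C51701910DEA998F4F51287A71B7E94449A3E9BC')`, which can then be compared against the fingerprint you expected.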

I think it'd be useful to implement something like controller.get_extrainfo_descriptor(), similar to how get_server_descriptor() is implemented, even though there doesn't seem to be any event that would tell us a new extra-info descriptor is available.

Is this something you would implement or accept a patch for?

Thanks.

atagar commented

> Does it actually retry 3 times? And does it use the same dirauth/mirror or a different one?

Yes, your code will make three attempts with a random authority each time.
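To make the counting concrete: the log message counts *retries*, so "2 retries remaining" on the first failure means three attempts in total. In stem this is controlled by the query's `retries` argument, which defaults to 2. The sketch below is a simplified model of that behavior (one initial attempt plus `retries` more, each against a randomly chosen endpoint), not stem's actual implementation; `fetch_with_retries` and the `fetch` callable are hypothetical.

```python
import random


def fetch_with_retries(endpoints, fetch, retries=2, rng=random):
  """Simplified model: make one attempt plus 'retries' more, picking a random
  endpoint each time, and return the first successful result."""

  failures = []

  for _ in range(retries + 1):
    endpoint = rng.choice(endpoints)

    try:
      return fetch(endpoint)
    except Exception as exc:
      failures.append((endpoint, exc))

  raise RuntimeError('All %i attempts failed: %s' % (len(failures), failures))
```

Because each attempt draws independently, two attempts can land on the same authority; a "Directory busy" answer is not remembered between attempts in this model.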

> it's also possible to obtain these descriptors using the control port (when the client has configured DownloadExtraInfo, as sbws will do)

I advise against running a tor process just to download descriptors. If you have your heart set on this, please see the other torrc arguments you might want.

Have you tried downloading using a DescriptorDownloader with ORPorts? DirAuths throttle and sometimes break their DirPort as a DoS mitigation measure but they can't block their ORPort as cavalierly without also breaking tor.
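A minimal sketch of the original script switched over to ORPorts, assuming stem 1.7 or later (for `stem.ORPort` endpoints and `stem.directory.Authority.from_cache()`). `select_authorities` and `download_extrainfo_via_orports` are hypothetical names; the stem imports sit inside the download function so the selection helper can be exercised on its own.

```python
def select_authorities(authorities, exclude_nicknames=()):
  """Keep authorities that have an ORPort and aren't excluded by nickname."""

  return [
    authority for authority in authorities
    if authority.or_port and authority.nickname not in exclude_nicknames
  ]


def download_extrainfo_via_orports(fingerprints, exclude_nicknames=()):
  """Download extra-info descriptors for the given relay fingerprints through
  the directory authorities' ORPorts rather than their DirPorts."""

  # Imported here so select_authorities() works without stem installed.
  import stem
  import stem.directory
  from stem.descriptor.remote import DescriptorDownloader

  authorities = select_authorities(stem.directory.Authority.from_cache().values(), exclude_nicknames)
  endpoints = [stem.ORPort(authority.address, authority.or_port) for authority in authorities]

  # use_mirrors is left off: with explicit endpoints the mirror list goes unused.
  downloader = DescriptorDownloader()
  query = downloader.get_extrainfo_descriptors(fingerprints=fingerprints, endpoints=endpoints)

  return list(query.run())
```

Usage would look like `download_extrainfo_via_orports(['C51701910DEA998F4F51287A71B7E94449A3E9BC'])`, optionally with `exclude_nicknames=('maatuska',)` if an authority keeps refusing.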

> Is this something you would implement or accept a patch for?

I'm not interested in adding a get_extrainfo_descriptor() method to the Controller. Actually, I'd like to deprecate the descriptor methods it has. However, if you provide a patch with tests I'll merge it.

Please note that the control port does not let you query extrainfo descriptors directly by fingerprint or nickname. Instead, it requires the extrainfo digest, which is part of the server descriptor.
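For reference, in a raw server descriptor that digest is the first value of the `extra-info-digest` line (a hex SHA-1, optionally followed by a SHA-256), and stem exposes it as the `extra_info_digest` attribute of server descriptors. A minimal sketch that turns the raw line into the GETINFO key; `extrainfo_getinfo_key` is a hypothetical helper:

```python
def extrainfo_getinfo_key(server_descriptor_text):
  """Return the GETINFO key for a relay's extra-info descriptor, taken from
  the 'extra-info-digest' line of its raw server descriptor, or None."""

  for line in server_descriptor_text.splitlines():
    fields = line.split()

    # Format: extra-info-digest <hex sha1> [<base64 sha256>]
    if len(fields) >= 2 and fields[0] == 'extra-info-digest':
      return 'extra-info/digest/%s' % fields[1]

  return None
```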

If you decide to implement this, I'd suggest accepting both a 'relay' and an 'extrainfo_digest' argument...

```python
from typing import Any, Optional

import stem.descriptor.extrainfo_descriptor
from stem.control import UNDEFINED

# Sketch of a Controller method (hence 'self').
def get_extrainfo_descriptor(self, relay: Optional[str] = None, extrainfo_digest: Optional[str] = None, default: Any = UNDEFINED) -> stem.descriptor.extrainfo_descriptor.RelayExtraInfoDescriptor:
  if not relay and not extrainfo_digest:
    raise ValueError('Extrainfo descriptors must be queried by a fingerprint, nickname, or extrainfo digest')
  elif relay and extrainfo_digest:
    raise ValueError('You cannot query by both a fingerprint and extrainfo digest')

  if relay:
    server_desc = self.get_server_descriptor(relay)
    extrainfo_digest = server_desc.extra_info_digest

  # etc...
```
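One way the rest could be filled in, written as a standalone function that takes an existing controller rather than as a Controller method. This is a hypothetical sketch, not stem code: it assumes the controller offers `get_server_descriptor()` and `get_info()` as stem's Controller does, and leaves parsing to a caller-supplied `parse` callable (for instance stem's `RelayExtraInfoDescriptor`; check whether your stem version expects str or bytes).

```python
from typing import Any, Callable, Optional


def get_extrainfo_descriptor(controller, relay: Optional[str] = None, extrainfo_digest: Optional[str] = None, parse: Callable[[str], Any] = str) -> Any:
  """Fetch a relay's extra-info descriptor through the control port.

  The control port only looks up extra-info descriptors by digest, so a relay
  fingerprint or nickname is first resolved through its server descriptor."""

  if not relay and not extrainfo_digest:
    raise ValueError('Extrainfo descriptors must be queried by a fingerprint, nickname, or extrainfo digest')
  elif relay and extrainfo_digest:
    raise ValueError('You cannot query by both a fingerprint and extrainfo digest')

  if relay:
    extrainfo_digest = controller.get_server_descriptor(relay).extra_info_digest

    if not extrainfo_digest:
      raise ValueError("%s's server descriptor lacks an extra-info digest" % relay)

  return parse(controller.get_info('extra-info/digest/%s' % extrainfo_digest))
```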

Concerning your script, two quick thoughts...

> `downloader = DescriptorDownloader(use_mirrors=True)`

I'm pretty sure 'use_mirrors' isn't doing anything aside from making your script slower. It makes you download the list of mirrors, but they're then ignored because you provide explicit endpoints.

> `authority.address != '171.25.193.9'`

The Directory class has a nickname attribute, so this could be `authority.nickname != 'maatuska'`.

juga0 commented

[snip]

> it's also possible to obtain these descriptors using the control port (when the client has configured DownloadExtraInfo, as sbws will do)

> I advise against running a tor process just to download descriptors. If you have your heart set on this, please see the other torrc arguments you might want.

It's not just any tor process; it's the sbws tor client. And yes, sbws already sets all those options. See also Mike's comment at https://gitlab.torproject.org/tpo/network-health/helper-scripts/-/issues/8#note_2728816

> Have you tried downloading using a DescriptorDownloader with ORPorts? DirAuths throttle and sometimes break their DirPort as a DoS mitigation measure but they can't block their ORPort as cavalierly without also breaking tor.

Yes, I tried it right after I wrote this, and it does fail less.

> Is this something you would implement or accept a patch for?

> I'm not interested in adding a get_extrainfo_descriptor() method to the Controller. Actually, I'd like to deprecate the descriptor methods it has. However, if you provide a patch with tests I'll merge it.

You mean the controller's get_server_descriptor()? If I may ask, why?

> Please note that the control port does not let you query extrainfo descriptors directly by fingerprint or nickname. Instead, it requires the extrainfo digest, which is part of the server descriptor.

Yes, we're aware of that.

> If you decide to implement this, I'd suggest accepting both a 'relay' and an 'extrainfo_digest' argument...
> [snip]

Thanks for the tips on implementing it!

> Concerning your script, two quick thoughts...

> `downloader = DescriptorDownloader(use_mirrors=True)`

> I'm pretty sure 'use_mirrors' isn't doing anything aside from making your script slower. It makes you download the list of mirrors, but they're then ignored because you provide explicit endpoints.

I see, OK. In any case, the downloader is now only initialized once, at the beginning.

> `authority.address != '171.25.193.9'`

> The Directory class has a nickname attribute, so this could be `authority.nickname != 'maatuska'`.

Yup, though it doesn't really matter for this goal...

Thanks a lot for the detailed answer!

atagar commented

> You mean the controller's get_server_descriptor()? If I may ask, why?

Because the DescriptorDownloader is simpler to use and sidesteps confusion from tor's caching behavior.

Descriptor inquiries are usually something simple like 'what relays are in the latest consensus?'. The DescriptorDownloader answers this, whereas the tor process answers the question 'what relays has the tor process cached that are still valid?'.

The only advantage of the Controller's methods is that they don't burden authorities or mirrors at all. That's a good enough reason that I don't plan to deprecate the methods right now (especially since Nyx needs them), but as a general matter I advise against their usage.

Feel free to reopen if you have any other questions.