christgau/wsdd

Windows does not show wsdd host after reboot

votdev opened this issue · 22 comments

Windows does not show the wsdd host after reboot. If you restart wsdd, then it is shown in Windows Explorer again.

See https://forum.openmediavault.org/index.php/Thread/26324-Windows-10-Build-1809-und-OMV-Shares-sind-in-der-Netzwerkumgebung-verf%C3%BCgbar/?postID=199653#post199653

Thanks for reporting that issue and including wsdd in OMV. Nice to see that other people out there benefit from that tiny little script.

I'm a little confused by the forum posts. What is the actual problem? The wsdd host does not show up when

  1. the wsdd host is rebooted and the Windows clients remain online, or
  2. the Windows client is rebooted and it no longer lists the wsdd host in the Network view.

Sorry, I'll quickly do this in German; I hope I can get my point across better that way:

So: the wsdd.py daemon runs on the OMV server. The Windows 10 PC is rebooted, and then Windows 10 does not find OMV. OMV is only found on the network again after wsdd.py has been restarted, and only as long as you don't reboot the Windows 10 PC, because then it stops working again.

So, in short: the service is running, a new PC comes online, and that PC does not get OMV's name announced to it.

I made a log of this:

The PC 192.168.0.8 with Windows 10 was online but could not find OMV, so the service was restarted at 22:44:50 and the PC is served immediately (22:44:51,864:wsdd INFO) - afterwards the PC was rebooted at 22:54 and is no longer served, so OMV is no longer shown on the network either - only when you restart wsdd.py is OMV shown on the network again...

wsdd.log

I hope it is clearer now :)

(I'd like to continue the discussion in English, so that non German speakers who may have the same issue can follow our thoughts.)

From your description, it's problem 2 from the above list that you are facing. In addition, you had manually (re)started wsdd before the Windows client was rebooted.

This is pretty interesting because I have some Linux boxes using wsdd in my network which run 24/7, and Windows clients that are only booted when needed. When the Windows machines were powered on, I saw the wsdd machines in Explorer's Network view every time I looked at it, even immediately after logging in to Windows.

From the posted log, there is also no indication that wsdd has stopped working (in the sense of being stuck). It is handling several Hello, Probe and Resolve messages, so there is still network traffic being processed. However, the essential HTTP request is missing for whatever reason. The response to that request is what makes a host appear in the Network view.

Could you please reproduce the scenario (start wsdd, reboot the Windows client) but use the -vv option for wsdd so we can see the message contents? Note that this output contains IP addresses and maybe hostnames (not sure, ATM).

I reproduced it with -vv; logfile attached.
wsdd2.log

FYI: before OMV integrated your Python version of wsdd, I used wsdd2 (1.8) for months without this problem - don't know if that information helps, but there can never be too much info...

Further information: I'm using 3 PCs with Windows 10 - 2 on the latest Fast Ring build (18353, and since 26.03. 18362) and 1 on the latest stable build; all of them show the same behavior.

I was able to reproduce the issue by using IPv4 only, like in the provided log. In addition, the issue also manifests when forcing an update in Explorer's Network view using the F5 shortcut or the refresh button.

It appears that UDP datagrams (probes) do not arrive at, or are not handled by, wsdd after the initial announcement (done by the Hello message). I'll dig into that... Stay tuned.

Great to hear that you've found the issue.

Nice to hear - if you need a tester, just feel free to contact me :)

Nice to hear - if you need a tester, just feel free to contact me :)

Thanks for your offer, I'll get back to that ;-) The problem appears to be that the receiving multicast socket is not properly "assigned" to the desired interface. For multi-homed devices like yours, this implies that you don't get the required traffic. I was not able to solve the issue with an MWE last night, but I could reproduce it. I'll look at this in the next days. A slightly reversed problem, i.e. sending on the wrong interface of a multi-homed device, was already an issue in the past (see #3).

D'oh! Stupid me! I was able to reproduce the issue only because my firewall rules were wrong. Please check yours. You must allow multicast, i.e. UDP, traffic to the address 239.255.255.250 on port 3702 in order to receive the queries from the Windows machines. The code for binding/joining the multicast group on the interface for receiving data appears to be correct.
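
For reference, joining the group on a given interface in Python looks roughly like this (a minimal sketch with example values, not the actual wsdd.py code):

import socket
import struct

WSD_MCAST_GRP = '239.255.255.250'
WSD_UDP_PORT = 3702
IFACE_ADDR = '192.168.0.2'  # example: address of the interface that should receive the queries

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind((WSD_MCAST_GRP, WSD_UDP_PORT))
# ip_mreq: the group to join plus the local interface that joins it
mreq = struct.pack('4s4s', socket.inet_aton(WSD_MCAST_GRP), socket.inet_aton(IFACE_ADDR))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)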

OMV doesn't provide an active firewall right out of the box...

And I have checked it: there are no rules in the firewall tab. Also, if there were a firewall, why does it work once the PC is on and you restart wsdd.py?

And as I said, it works with wsdd2 without any problems - and wsdd2 uses the same ports, AFAIK?!?

OMV doesn't provide an active firewall right out of the box...

Whoa. OK, but that's a different story.

And I have checked it: there are no rules in the firewall tab. Also, if there were a firewall, why does it work once the PC is on and you restart wsdd.py?

Conforming to what the spec says, wsdd sends a "Hello" message on startup using multicast (outgoing traffic). This message includes the host's IP address, which is a mechanism to speed up the discovery process. Using that IP, a client like Windows can directly connect to the specified HTTP/TCP server and perform the final step of the discovery. There is then no need for Windows to send multicast messages.
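
For illustration, such a Hello message looks roughly like this (a sketch along the lines of the WS-Discovery schema used elsewhere in this thread; the UUIDs, address and port are made-up examples, and the AppSequence header is omitted). The host's IP is carried in the XAddrs element:

<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof">
<soap:Header>
  <wsa:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</wsa:To>
  <wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Hello</wsa:Action>
  <wsa:MessageID>urn:uuid:...</wsa:MessageID>
</soap:Header>
<soap:Body>
  <wsd:Hello>
    <wsa:EndpointReference><wsa:Address>urn:uuid:the-device-uuid</wsa:Address></wsa:EndpointReference>
    <wsd:Types>wsdp:Device</wsd:Types>
    <wsd:XAddrs>http://192.168.0.2:5357/the-device-uuid</wsd:XAddrs>
    <wsd:MetadataVersion>1</wsd:MetadataVersion>
  </wsd:Hello>
</soap:Body>
</soap:Envelope>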

Those steps worked in my setup as they do in yours. In my environment, however, the firewall blocked incoming multicast traffic. Therefore, the "Probe" messages from Windows, which are sent when you refresh the Network view, did not arrive. In such a case Windows thinks the previously seen device is gone and removes it from the view, which is IMO what you observe on your system as well.

My suggestion is that you give socat a try to check if multicast traffic can be received by an application. To do so, kill wsdd or similar applications (like wsdd2) if running and issue the following command:

socat UDP4-RECVFROM:3702,ip-add-membership=239.255.255.250:IP:IF_IDX STDIO

Please replace IP with the IP address of the host and IF_IDX with the number that is printed before the interface name in the output of ip addr. You can also use the interface name (like eth0) instead of its index. When socat is running, open the Network view on your Windows client and force a refresh. If multicast packets can be received, then socat prints out an XML document. If nothing happens, then there is another networking issue that is likely unrelated to wsdd or socat.
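
For example, if the host's address is 192.168.0.2 on the interface with index 2 (example values), the command would read:

socat UDP4-RECVFROM:3702,ip-add-membership=239.255.255.250:192.168.0.2:2 STDIO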

And as I said, it works with wsdd2 without any problems - and wsdd2 uses the same ports, AFAIK?!?

Yes, they do. wsdd2, however, does not use interfaces created by Docker or tunnel interfaces. Those are present on your box and wsdd tries to use them. Maybe there is a technical implication I am not aware of that prevents receiving the multicast traffic. You can try to avoid those interfaces by starting wsdd with -i eth0.

So, I have done a little further research...

First of all: yes, you're right, if we add -i anydevice it works - but I don't know if this is a practical workaround...

I have done that socat port sniffing and yes, I do receive an XML document - which is no wonder to me, because wsdd.py logs that response, so why should the port not work?!?

Furthermore, I have done a tcpdump on that port and I have seen that there is also traffic on this port from my PC, even after a reboot.

So if I have to guess, I'd bet your wsdd.py isn't answering on the right network?

Is it possible that wsdd doesn't answer on the same LAN port on which the request was received (maybe you can force this)?

wsdd_port.log

First of all: yes, you're right, if we add -i anydevice it works - but I don't know if this is a practical workaround...

Technically, it is the same as what wsdd2 does by ignoring docker and tunnel interfaces...

I have done that socat port sniffing and yes, I do receive an XML document - which is no wonder to me, because wsdd.py logs that response, so why should the port not work?!?

It's not a response that should be logged. It has to be a request from a Windows machine you just rebooted or forced a Network view update on. More precisely, the message must be a WSD Probe. It has to look like this (the Message ID will be different on your systems):

<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof">
<soap:Header>
  <wsa:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</wsa:To>
  <wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Probe</wsa:Action> 
  <wsa:MessageID>urn:uuid:2f71b6d3-f287-4937-bf24-20032b57250f</wsa:MessageID></soap:Header>
<soap:Body>
  <wsd:Probe><wsd:Types>wsdp:Device</wsd:Types></wsd:Probe>
</soap:Body>
</soap:Envelope>

So if I have to guess, I'd bet your wsdd.py isn't answering on the right network?
Is it possible that wsdd doesn't answer on the same LAN port on which the request was received (maybe you can force this)?

Then you might see a response on another interface. Indeed, there was an issue related to that problem (#3), but that one is resolved.

Could you please apply the following patch to wsdd.py and reproduce the issue using wsdd without a specific interface but with the -vv option? The log output is what I need again. Moreover, a full network traffic capture, i.e. a pcap file, on port 3702 (UDP) on every interface would help a lot. In addition, could you provide the output of netstat -gn and ip route list all after wsdd has been started?

diff --git a/src/wsdd.py b/src/wsdd.py
index 2984421..be436a0 100755
--- a/src/wsdd.py
+++ b/src/wsdd.py
@@ -394,6 +394,8 @@ def wsd_handle_message(data, interface):
     action = header.find('./wsa:Action', namespaces).text
     body = tree.find('./soap:Body', namespaces)
 
+    if interface:
+        logger.debug('got message on {}'.format(interface.address))
     logger.info('handling WSD {0} type message ({1})'.format(action, msg_id))
     logger.debug('incoming message content is {0}'.format(data))
     if action == WSD_PROBE:
@@ -652,6 +654,9 @@ def send_outstanding_messages(block=False):
         addr = send_queue[-1][2]
         msg = send_queue[-1][3]
         try:
+            logger.debug('sending via {}: {}'.format(
+                interface.interface, msg
+            ))
             interface.send_socket.sendto(msg, addr)
         except Exception as e:
             logger.error('error while sending packet on {}: {}'.format(

Here are the files you wished for; hopefully everything you need is in there!

wsddpy_debug.zip

Thanks for those files. They gave me some more insights and I was able to reproduce the issue with an almost identical setup.

TL;DR

There is no technical issue in the implementation. Select the interface for the real local network and everything works as expected.

Details

docker0 is a network bridge. As it appears to me (I'm not a Docker expert), it is configured by default to do what a bridge does: take traffic from one side and put it on the other one.

On the other hand, wsdd does what it has been told: bind to all interfaces, including docker0, and use the IP address configured for each of those interfaces as the address of the service provider it should "announce" on that particular interface.

Now, if Windows has been rebooted or a refresh of the Network view is forced, it sends a multicast "Resolve" message (after "Probe", but that does not matter here). That single message is received on the real interface of the wsdd host and by its Docker bridge as well. Since wsdd has bound sockets on these two interfaces (and the tunnel), the message is received twice - in whatever order. Based on the interface the message arrived on, wsdd constructs a "ResolveMatch" response that includes the interface's IP in the response's body. Consequently, two "ResolveMatch" messages are created with different IPs in them. They also leave from different interfaces - from the software perspective. However, they end up being sent on the same physical NIC and, thus, two messages arrive at the Windows host: one that includes the correct IP (192...; the one handled by the socket that is responsible for eth0) and a wrong one with an address of the Docker network (172...).
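
To illustrate (a made-up Python sketch, not the actual wsdd.py code; the addresses are examples):

# One socket per bound interface; each response advertises the address of
# the interface the Resolve request came in on.
interfaces = {'eth0': '192.168.0.2', 'docker0': '172.17.0.1'}

def resolve_match(addr):
    return '<wsd:XAddrs>http://{}:5357/the-device-uuid</wsd:XAddrs>'.format(addr)

# A single multicast Resolve is seen by both sockets, so two responses with
# different addresses leave through the same physical NIC:
for name, addr in interfaces.items():
    print(name, '->', resolve_match(addr))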

From that point, I don't know what Windows exactly does, but I assume the following is going on: The "ResolveMatch" response with the internal Docker IP arrives at the Windows host first. This was the case in my experiments with Wireshark running on Windows. That message was sent by wsdd using unicast, and thus Windows can compare the IP-layer sender address with the IP address provided in the "ResolveMatch" message body. A smart implementation may now detect a mismatch between those two and suspect that something wrong is going on. It may then ignore further messages from that endpoint, which is identified by a UUID. Subsequently arriving "ResolveMatch" messages with a correct IP are therefore ignored, and consequently the host no longer appears in Windows.

This is kind of a NAT problem. The application layer (the WSD messages) contains IP addresses of the private Docker network and exposes them to the public, although they are not reachable from the outside. Reminds me of SIP.

As mentioned in a previous comment, wsdd2 does not have this problem because it always ignores docker interfaces. While this works, I'm not going to hardcode such a workaround into wsdd. It may work in some use-cases but not in others.

FYI: it is NOT!!!!!!! the docker device! (I have reproduced it with an identical setup with Docker but without OpenVPN - and there wsdd.py works as it is supposed to.)

It is the tun0 device, as you can see in the logs! And tun0 is OpenVPN...

There was also a similar problem in wsdd2 and they FIXED it, they did not work around it with an option!

Your -i option is somewhat cosmetic but doesn't fix the real problem, and can you tell me what people with a multi-homed setup should do? I don't think -i is supposed to work with multiple NICs, or does it?

Nothing would be simpler than to exclude tun and docker devices, like wsdd2 did over 2 years ago...

kochinc/wsdd2@43f2e65#diff-86ce8c879f76ba1a816e9fa64709deab

FYI: it is NOT!!!!!!! the docker device! (I have reproduced it with an identical setup with Docker but without OpenVPN - and there wsdd.py works as it is supposed to.)

Well, in my test case it appears to be the docker interface. As I wrote, the IP that was sent in the ResolveMatch response was the one of the docker interface. In your log (from the ZIP file), the ResolveMatch messages contain the IP of the tunnel interface (see 00:39:14.402 and .404), not of the Docker one. That is in fact different, but the symptom appears to be identical: the constructed message is sent over the physical NIC but contains addresses (10.8.0.x) that are not reachable by the (Windows) host that receives those messages. As a consequence, the wsdd machine does not show up in Explorer.

It is the tun0 device, as you can see in the logs! And tun0 is OpenVPN...

In my setup I created an IPIP tunnel.

There was also a similar problem in wsdd2 and they FIXED it, they did not work around it with an option!

Well, they chose to ignore interfaces with certain prefixes in their names. Now, what if a user wants exactly the opposite? What if they want to expose their host on an interface that (unfortunately) matches such an "ignore prefix"? There is no way for the user to achieve this except by changing the code and - in the case of wsdd2 - recompiling it. In addition, a user may create tunnel devices with other names, which may break the functionality as well. How would one exclude those?

The fix used in wsdd2 may match the purposes of the NetGear guys, and I'm totally fine with it if it matches their specific targets/products. Personally, I don't want to include a hardcoded exclusion that may solve problems in some situations when I am not sure it doesn't break the usage in other situations.

If including such a solution (ignoring certain interfaces) is acceptable, you may ask downstream, which I think is @votdev here, to go in that direction.

BTW: The log message in wsdd2 states that

A user had OpenVPN installed, and the virtual network interfaces
created by OpenVPN caused wsdd2 to crash due to some unexpected
attributes.

They fixed a crash that seems to be caused by technical circumstances, not by behavior like devices not showing up in Explorer.

Your -i option is somewhat cosmetic but doesn't fix the real problem

What is the "real" problem from your perspective? From my point of view, wsdd behaves exactly as it should, as I already explained in my previous comment. wsdd receives a "Resolve" message/request on a certain interface it has bound to, because the user wanted it to bind to that interface by omitting options. The "ResolveMatch" message contains the IP of the interface the request was received on, which absolutely makes sense. Subsequent requests are ignored because they are duplicates, which is what the standard requires you to do.

I fully understand that wsdd, compared to wsdd2, does not work out of the box in your environment. ATM, my opinion is that a tool should not try to guess in an unreliable way (and string matching is IMO unreliable) what may not be good for its execution.

can you tell me what people with a multi-homed setup should do?

On devices without tunnel or docker interfaces, there are no problems. You can run wsdd with no options and it works fine. You may also specify interface(s) to bind to, and wsdd announces the host only on those (see #3 (comment), e.g.). However, the user has to know what the best solution in her network setup is. The user may want to bind to certain interfaces, and wsdd allows them to do so.

I think an addition to the "Known Issues" section of the README may be beneficial for users of tunnel/bridge devices, so they do not fall into the discussed trap. Or would an additional option to ignore certain interfaces be an acceptable solution for you? Then you could start wsdd like wsdd -I docker0 -I tun0 or wsdd -i !docker0 -i !tun0 to ignore these interfaces.

I don't think -i is supposed to work with multiple NICs, or does it?

It does. You can specify multiple interfaces with wsdd -i eth0 -i eth1, e.g. The README is a little vague on that point. I'll clarify that in an update.

Nothing would be simpler than to exclude tun and docker devices, like wsdd2 did over 2 years ago...

kochinc/wsdd2@43f2e65#diff-86ce8c879f76ba1a816e9fa64709deab

It's beyond question that applying the fix from wsdd2 would be simple, but I think I made clear that including a solution for a problem that does not really exist, and which may introduce complications for other users, is not my intention.

What about simply adding an iptables rule to block outgoing packets on the TUN interface? That should fix your problem.

If including such a solution (ignoring certain interfaces) is acceptable, you may ask downstream, which I think is @votdev here, to go in that direction.

No, I will not modify wsdd downstream; I hope to get my improvements into Debian upstream to get rid of maintaining 3rd-party packages in OMV.

What about simply adding an iptables rule to block outgoing packets on the TUN interface? That should fix your problem.

No, not really. The problem is that the Resolve requests are received by the tunnel (or docker) interface, and the corresponding packets are handed over to wsdd from that interface first. Blocking the response won't improve the situation. What should help instead is blocking the requests coming from the Windows hosts (UDP, multicast destination, port 3702) on the tunnel/docker interface. So it's the incoming traffic from outside the wsdd machine that should be blocked, IMHO. In principle you are right that the firewall would help to circumvent the issue. Nevertheless, as already noted, using -i ... solves the problem as well.
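
For example, with iptables that could look like this (a sketch, assuming tun0 is the tunnel interface; adjust the interface name to your setup):

iptables -A INPUT -i tun0 -p udp -d 239.255.255.250 --dport 3702 -j DROP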

If including such a solution (ignoring certain interfaces) is acceptable, you may ask downstream, which I think is @votdev here, to go in that direction.

No, I will not modify wsdd downstream; I hope to get my improvements into Debian upstream to get rid of maintaining 3rd-party packages in OMV.

That's a clear statement. Thanks for sharing your opinion.

Sorry for butting in on a closed issue :)

I agree with christgau. The problem is in the network setup and the assumption that software services should handle the same data on multiple interfaces.

I'd think that the routing setup is wrong if the source IP doesn't match the interface. Either block erroneous source IP addresses or NAT them to create valid traffic.

D'oh! Stupid me! I was able to reproduce the issue only because my firewall rules were wrong. Please check yours. You must allow multicast, i.e. UDP, traffic to the address 239.255.255.250 on port 3702 in order to receive the queries from the Windows machines. The code for binding/joining the multicast group on the interface for receiving data appears to be correct.

Do you have an iptables and/or ufw rule for that, please? That would be awesome!
Thanks in advance

I did a bit of research myself, and this configuration removes all UFW BLOCK messages in dmesg on Ubuntu Focal:
ufw allow 5357/tcp
ufw allow 3702/tcp
ufw allow 3702/udp
ufw allow in proto udp to 224.0.0.0/4
ufw allow in proto udp to 239.0.0.0/8

Add this to the UFW before rules to allow multicast IGMP, in /etc/ufw/before.rules before the COMMIT line:
-A ufw-before-input -p igmp -d 224.0.0.0/4 -j ACCEPT
-A ufw-before-input -p igmp -d 239.0.0.0/8 -j ACCEPT

Hopefully it's OK to add this to a closed issue, but it might be useful to other people.