zmap/zmap

Vague Message on Fatal Error


I'd like to preface this by pointing out that there is no option to create a new discussion for this repository, so I have created an issue instead. Sorry about this.

Describe the bug

 0:00 0%; send: 1457 0 p/s (75.2 Kp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
 0:01 5%; send: 1328260 1.33 Mp/s (1.30 Mp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
 0:02 10%; send: 2753073 1.42 Mp/s (1.36 Mp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
 0:03 15%; send: 4197669 1.44 Mp/s (1.39 Mp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
 0:04 20%; send: 5635421 1.44 Mp/s (1.40 Mp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
 0:05 25% (15s left); send: 7060121 1.42 Mp/s (1.41 Mp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
 0:06 30% (14s left); send: 8500475 1.44 Mp/s (1.41 Mp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
 0:07 35% (13s left); send: 9943598 1.44 Mp/s (1.42 Mp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
 0:08 40% (12s left); send: 11385592 1.44 Mp/s (1.42 Mp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
 0:09 46% (11s left); send: 12812476 1.43 Mp/s (1.42 Mp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
 0:10 51% (10s left); send: 14263446 1.45 Mp/s (1.42 Mp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
 0:11 56% (9s left); send: 15708112 1.44 Mp/s (1.43 Mp/s avg); recv: 0 0 p/s (0 p/s avg); drops: 0 p/s (0 p/s avg); hitrate: 0.00%
Mar 11 22:16:29.417 [FATAL] monitor: maxiumum number of sendto failures (1) exceeded

ZMap sends out millions of packets and reports no drops, yet receives zero responses, and eventually crashes with the fatal error above.

CLI Arguments

sudo zmap -p 80 --sender-threads=30 --bandwidth=1G 185.0.0.0/8 > /dev/null

Example Target IP

Any large subnet, e.g. a /8.

Expected behavior
I am expecting a more verbose error message to better understand what is causing the issue. Googling "sendto failures" has not been very useful for debugging. I understand that I can simply throttle the bandwidth further to mitigate the issue, but I would also like to see if there are alternatives.

Environment:

  • OS: Ubuntu 22.04
  • Version: from package manager

Additional context
Are there any resources to better understand what tunables I can configure, inside or outside of the zmap arguments, that lead to higher performance? I'm willing to upgrade my system as necessary but am having a hard time finding the bottlenecks.
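
For example, I've been eyeing kernel-side knobs like these, though I'm honestly not sure which of them, if any, actually matter for ZMap (eth0 below is just a placeholder for my interface):

# socket send-buffer limits
sysctl net.core.wmem_max net.core.wmem_default

# per-interface TX queue length and error/drop counters
ip -s link show dev eth0

# NIC ring buffer sizes
ethtool -g eth0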

Hey @dbomma! Really appreciate you reporting this; there are only so many environments/situations we can test, so it's actually really helpful to get these reports.

First off, can you make sure you're signed into GitHub before looking at Discussions? You should have access if you're signed in and can see the ZMap repo, but let me know if not and I can check whether there's a setting we need to configure.

Let me look into reproducing. In the meantime, can you provide some info:

  • Host machine specs: CPU cores, RAM
  • Network bandwidth: what's your speedtest upload speed?

Offhand, I'd try lowering your send bandwidth significantly (just FYI, adding slow start to ZMap is on our roadmap), and then re-increase it as long as you don't see a drop in hit rate. So try -B 1M, then increase to -B 100M, etc., on a 10-second scan to find your ideal send rate without packet drops.
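
Something like this rough sketch is what I have in mind (assuming your zmap build supports --max-runtime; if not, just Ctrl-C after ~10 seconds):

# step through send rates on short scans and watch the reported hitrate
for bw in 1M 10M 100M 500M 1G; do
  echo "=== bandwidth: $bw ==="
  sudo zmap -p 80 -B "$bw" --max-runtime=10 185.0.0.0/8 > /dev/null
done

The status lines (send rate, hitrate) go to stderr, so they stay visible even with stdout redirected to /dev/null.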

Of course, this assumes the issue is in your network infrastructure, but it should help eliminate that as a cause.

Hi @phillip-stephens, thanks for the quick response! I've attached the specs you requested. I'm actually running this on a c6a.8xlarge AWS EC2 instance, in case you'd like to reproduce on the exact same hardware.

CPU info:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  32
  On-line CPU(s) list:   0-31
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7R13 Processor
    CPU family:          25
    Model:               1
    Thread(s) per core:  2
    Core(s) per socket:  16

Ram:

64 GB

Network Bandwidth (average of 3 runs using speedtest-cli):

Download: 2600 Mbit/s 
Upload: 1800 Mbit/s

This does seem like enough to handle a 1G send rate, unless there are other parts of the network infrastructure in the way that I am unaware of. Is there any sort of checklist for narrowing down exactly where the bottleneck is? I think this could make a great addition to the wiki if not.

Edit: If there are any tests I can run on my end as well to help you all figure out this issue, please let me know.

As for the discussions, I think it may be a setting that needs to be configured. There is no "New Discussion" button available for me.

Can you try running with -T 8 or -T 16? I think we're sending packets faster than the sending library can keep up with, so it runs into errors, and we quit once we've hit too many of them. The message could definitely be improved; I just want to see if this fixes your issue!
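
i.e., keeping everything else from your original command:

sudo zmap -p 80 -T 8 --bandwidth=1G 185.0.0.0/8 > /dev/null

(-T is the short form of --sender-threads.)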

With the new changes coming in release 4.1, this shouldn't manifest this way again. When I tried to reproduce it, the user now gets a somewhat better error message about "No buffer space available".
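
For anyone hitting this on an older build who wants to confirm it's the same failure, a trace along these lines should show it (just a sketch; depending on version/platform ZMap may use sendto or sendmmsg under the hood):

sudo strace -f -e trace=sendto,sendmmsg -o /tmp/zmap.trace zmap -p 80 -B 1G 185.0.0.0/8 > /dev/null
grep -c ENOBUFS /tmp/zmap.trace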

I'll document tuning thread count and bandwidth constraints in the wiki, so users can maximize performance on their hardware without reducing hit rate or encountering errors.