There are two situations that IMHO need to be corrected.
First, the HTTP protocol parser copies the data from the "Host:" header without validating it.
Traffic analysis showed that some clients add a space character after the server name.
The second situation is more interesting.
In the TLS protocol, the "SNI" field can contain text with line feed characters.

I think not everyone is ready to handle a hostname with control characters.
What's the best way to deal with control characters in a hostname? Remove or replace?
It is possible to fix both problems by making changes to the ndpi_hostname_sni_set() function.

I agree that we should fix that. We could do something similar to what we already do for DNS: first "normalize" the name, then save it as metadata.

@vel21ripn, out of curiosity, could you share the flow triggering this output:

        1       TCP <-> [proto: 91/TLS][IP: 0/Unknown][Encrypted][Confidence: DPI][DPI packets: 5][cat: Web/5][9 pkts/2712 bytes <-> 11 pkts/14917 bytes][Goodput ratio: 81/96][0.57 sec][Hostname/SNI:  like gecko) chrome/ safari/537.36
accept: /

It seems suspicious to have that kind of string in the SNI, because it looks like an HTTP User-Agent header.
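
To make "suspicious" concrete, here is a minimal sketch of a check for such names. It is not nDPI code: `hostname_looks_suspicious()` is a hypothetical helper, and the rule it applies (any space, ASCII control character, DEL, or non-ASCII byte) is only an assumption about what should count as suspicious.

```c
#include <stddef.h>

/* Hypothetical helper, not part of nDPI.
 * Returns 1 if the buffer contains a space, an ASCII control character,
 * DEL, or a non-ASCII byte; returns 0 otherwise. */
static int hostname_looks_suspicious(const char *name, size_t len) {
  size_t i;

  for (i = 0; i < len; i++) {
    unsigned char c = (unsigned char)name[i];

    /* 0x00..0x20 covers control characters and space, 0x7f and above covers DEL and non-ASCII */
    if (c <= 0x20 || c >= 0x7f)
      return 1;
  }
  return 0;
}
```

With this rule, a plain name such as "example.com" passes, while the SNI from the flow above is flagged because of the spaces and the line feed.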

The question is how to normalize the name: remove invalid characters or replace them. If replaced, then with what?
Spaces at the end of a line can definitely be removed.
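
For the trailing-space case, a minimal sketch could look like the following. `trim_trailing_ws()` is a hypothetical helper rather than an existing nDPI function, and treating tab, CR and LF the same as a space is my assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of nDPI.
 * Strips trailing spaces, tabs, CR and LF from a NUL-terminated
 * hostname in place and returns the new length. */
static size_t trim_trailing_ws(char *name) {
  size_t len = strlen(name);

  while (len > 0 &&
         (name[len - 1] == ' ' || name[len - 1] == '\t' ||
          name[len - 1] == '\r' || name[len - 1] == '\n')) {
    name[--len] = '\0';
  }
  return len;
}
```

Only the tail of the string is touched, so a space in the middle of the name is left for the normalization step to handle.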

This traffic is not unique. DigitalOcean hosts several such servers. They all use TCP port 440. All clients send HTTP headers in place of the SNI, though not always the same ones.
It looks very much like a VPN.

In DNS we replace invalid characters:

if(character is not valid) {
  if (ndpi_isprint(character) == 0) {
    return '?';   /* not printable */
  } else {
    return '_';   /* printable, but not allowed in a name */
  }
}

I think that we should keep the same logic
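
Applied to a hostname, the same logic could be sketched as below. This is not the code from nDPI: all function names are hypothetical, the set of "valid" characters (ASCII letters, digits, '-', '.' and '_') is an assumption, and a plain ASCII range check stands in for ndpi_isprint().

```c
#include <stddef.h>

/* Assumption: "valid" means ASCII letters, digits, '-', '.' and '_'. */
static int is_valid_host_char(unsigned char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_';
}

/* Stand-in for ndpi_isprint(): printable ASCII only. */
static int is_printable_ascii(unsigned char c) {
  return c >= 0x20 && c <= 0x7e;
}

/* Same rule as the DNS snippet: keep valid characters, map printable
 * invalid ones to '_' and non-printable ones to '?'. */
static char normalize_host_char(unsigned char c) {
  if (is_valid_host_char(c))
    return (char)c;
  return is_printable_ascii(c) ? '_' : '?';
}

/* Hypothetical helper: copies up to dst_size - 1 normalized bytes from
 * src into dst, NUL-terminates dst and returns the number of bytes written. */
static size_t normalize_hostname(char *dst, size_t dst_size,
                                 const char *src, size_t src_len) {
  size_t i, n;

  if (dst_size == 0)
    return 0;

  n = (src_len < dst_size - 1) ? src_len : dst_size - 1;
  for (i = 0; i < n; i++)
    dst[i] = normalize_host_char((unsigned char)src[i]);
  dst[n] = '\0';
  return n;
}
```

Replacing instead of removing keeps the length and the character positions of the original name, so the stored metadata still shows where the odd bytes were.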

Could you share a pcap?

Excellent logic for replacing invalid characters!

Traffic examples

Done in 4543385 and f352e4f