web-shield: A Shell repository from DPsystems

|\     /|(  ____ \(  ___ \      (  ____ \|\     /|\__   __/(  ____ \( \      (  __  \
| )   ( || (    \/| (   ) )     | (    \/| )   ( |   ) (   | (    \/| (      | (  \  )
| | _ | || (__    | (__/ /_____ | (_____ | (___) |   | |   | (__    | |      | |   ) |
| |( )| ||  __)   |  __ ((_____)(_____  )|  ___  |   | |   |  __)   | |      | |   | |
| || || || (      | (  \ \            ) || (   ) |   | |   | (      | |      | |   ) |
| () () || (____/\| )___) )     /\____) || )   ( |___) (___| (____/\| (____/\| (__/  )
(_______)(_______/|/ \___/      \_______)|/     \|\_______/(_______/(_______/(______/

Your first line of WEB defense against Internet bots, hacks and probes.

This is an IP-based blacklist of known sources for unsavory activity.  If you run a
legit web site that has no business serving bad actors, this will come in very handy.

If you run a web site where you don't want automated scripts that aren't serving YOUR
purposes, this will come in very handy.

If you don't need people using VPNs on your site (the lion's share of VPN activity is
attempts to hack/probe web sites), this will come in very handy.

Yea, yea, people say blacklists don't work.  They're wrong.  They do work.  Better than
most other systems. You can't hack what you can't reach, and if you're not doing any
web business with Chinese or Russian cloud providers, or some Central American hosting
company that's checking for WordPress vulnerabilities, or dozens of web spiders that have
no respect for robots.txt and are secretly copying your web site, or a cheap Eastern
European VPS company that accepts anonymous clients and crypto payments...

...why TF should they have access?

THAT'S why you should use Web-Shield.

A great standalone filter, or a complement to the wonderful active
firewall Fail2Ban that will make F2B even more efficient.

Also see our sister product: Login-Shield (https://github.com/DPsystems/Login-Shield)

by Dark Phiber, 2019-2022 - dolson803@@gmail.com

# WHAT?
# =====

Web-Shield is a set of scripts that implements a traffic filter 
of your web server ports.

You may ask yourself, why and how should I bother filtering traffic
to my web sites?  I want as much traffic as possible, right?

Well, no, not really.  There is a lot of traffic that just wastes
bandwidth, exploits your system and content, and actively tries
to find vulnerabilities and ways into your system.  You do NOT
want that.

If you want true, organic, real users visiting your web server, 
and not web scrapers, system probe scripts, and other automated
hacking scripts that do everything from register fake users to 
exploit forms for spamming, this is the system for you.

We also have identified certain hostile web spiders that you can
block. These are either private corporate data collection systems
or search engine bots that disregard robots.txt designations
and collect data that will likely never be of benefit to you.

Our blacklist is intended to be a specifically targeted IPv4-based blacklist
of major groups of Internet locations that are notorious for housing
the lion's share of nefarious HOSTING OPERATIONS that 99.9% of the time
have NO BUSINESS WHATSOEVER crawling your web site.

In addition to this, there are some optional blacklists that are a
little more aggressive in filtering common spiders and user IP space
that is filled with fraud and activity that is unlikely to be of benefit.

This system can be used by itself or (ideally) in association with
more precise anti-hacking systems like Login-Shield and Fail2Ban.
With this large net in place, Fail2Ban needs fewer resources because
it only has to deal with the mostly local attacks from IP space you
might not want to ban wholesale.

Types of IP space that are in the Web-Shield blacklist:

1.	Mainly overseas hosting/cloud space - NOT user IP space
2.	VPNs and commonly used TOR exit nodes that are cloud hosted
3.	A small group of foreign user IP space that typically isn't
        legitimate web traffic.  

The general rule for Web-Shield is:  

 If the traffic is coming from another server, and is not a real
 person, we don't want it.  The exceptions are spiders for common
 search engines, and important services like Let's Encrypt, which
 needs to handshake with your server to verify and deploy SSL
 certificates.

 We also avoid any of the large cloud providers like AWS and
 Cloudflare.  It's too hit-and-miss messing with them, and we
 don't want legitimate automated systems to be blocked.

 We DO block a number of low-end, high-volume US hosting
 providers, such as Colocrossing and Digital Ocean.  This space is
 notorious for having compromised computers that try to hack and
 probe other systems.

If servers at any of these mass-market cloud providers need to
reach your server, we recommend that you whitelist those
specific IPs.
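
As a rough illustration of whitelisting a single address, here is a minimal sketch.  It only PRINTS the iptables command so you can review it before running anything.  The function name `gen_allow_rule`, the port list, and the sample address (a documentation-only IP) are all made up for this example and are not part of the Web-Shield scripts, which may handle whitelisting differently.

```shell
#!/bin/sh
# Hypothetical helper, not part of Web-Shield: print an iptables command
# that accepts web traffic from one trusted IPv4 address.
# Assumption: the web ports are 80 and 443.
ALLOW_PORTS="80,443"

gen_allow_rule() {
    # -I INPUT 1 inserts the rule at the top of the INPUT chain, so it is
    # evaluated before any DROP rules that appear later in the chain.
    printf 'iptables -I INPUT 1 -s %s -p tcp -m multiport --dports %s -j ACCEPT\n' \
        "$1" "$ALLOW_PORTS"
}

# Example: 203.0.113.10 is a documentation-only address (RFC 5737).
gen_allow_rule 203.0.113.10
```

Because iptables stops at the first matching rule, an ACCEPT placed ahead of the blacklist rules lets that one address through while the rest of its range stays blocked.  To apply the printed command you would run it as root once you are happy with it.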


However, the Web-Shield blacklist specifically includes VPN PROVIDERS!  

If you want to stop people who are using VPNs from accessing your
web site, Web-Shield can do that.  If you don't have a specific 
need to entertain VPN-based traffic we recommend you block it.

In our analysis, the vast majority of hack/attack vectors are
coming from VPN space.  


# WHY?
# ====

Every time a site is compromised, there's a chance lists of usernames
and passwords are leaked.  Hackers will take these lists and try to
find other systems that use these same credentials. If they can gain
access, they can completely ruin your day (or year).  They will often
try to log in to e-mail clients, FTP accounts, SSH services, etc.
These system probes are now becoming even more sophisticated, and able
to recognize Fail2Ban trigger conditions and work around them.  Our
system stops approximately 90% of the attacks on most servers.

# HOW?
# ====

Web-shield is a very small set of IPTABLES rules that is designed
to block access primarily to web ports.
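
To give a feel for what "a small set of iptables rules" can look like, here is a minimal sketch that turns a list of CIDR ranges into DROP rules for the web ports.  It is an illustration only, not the code shipped in this repository: the function name `gen_block_rules`, the one-range-per-line list format, and the port list are assumptions, and the sample ranges are documentation-only addresses.  The script prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual Web-Shield scripts: print (do not run)
# iptables commands that drop traffic to the web ports from each CIDR range.
# Assumptions: ports 80 and 443; blacklist has one CIDR range per line,
# with optional '#' comments and blank lines.
WEB_PORTS="80,443"

gen_block_rules() {
    while IFS= read -r line; do
        # Strip comments, then leading and trailing whitespace.
        cidr=$(printf '%s\n' "$line" |
            sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
        # Skip lines that are empty after stripping.
        [ -n "$cidr" ] || continue
        printf 'iptables -A INPUT -s %s -p tcp -m multiport --dports %s -j DROP\n' \
            "$cidr" "$WEB_PORTS"
    done
}

# Example input using documentation-only ranges (RFC 5737).
printf '%s\n' '# sample list' '192.0.2.0/24' '198.51.100.0/24  # example' |
    gen_block_rules
```

Each printed line appends one rule to the INPUT chain that drops TCP traffic to ports 80 and 443 from the given range.  Review the output first, then run it as root if it looks right.  For very large lists, one rule per range gets slow, so a set-based approach such as ipset may be a better fit.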

## See the file INSTALL for installation instructions

## See the file VERSION for version and developer notes

## See the files CHANGELOG or VERSION for information on changes and program versions and developer notes

# ADDITIONAL NOTES
# ================

Specific systems Web-Shield is blocking.  Feel free to add your thoughts as to whether or not this is appropriate:

* Petalbot - a very aggressive Asian web crawler

* Awario.com - a commercial service that lets companies know if their products are being talked about
               on social media - presumably so they can do damage control

* semrush.com - another commercial service and private "business intelligence service" that probably
                only benefits other companies and not yours

* censys.com - another private, corporate audit of web space, presumably to "manage attack space",
               supposedly a "white hat database of vulnerable sites" which obviously makes a great
               target for black hat ops, so we don't want to be in their system.