SYNAPSE is a
- low-interaction
- server
- dynamic
honeypot acting as a Linux OS terminal. It is written entirely in Python. Instead of relying on a real terminal, SYNAPSE uses generative AI (currently the GPT-4o model) to answer with realistic terminal output, as if the user were connected to a real Linux OS over SSH. It currently simulates two services:
- SSH Server
- MySQL Server.
Generative AI is used here to generate responses to the commands issued in both the simulated Linux terminal and the fake MySQL service, leveraging prompt engineering techniques. The shelLM project was used as a starting point for the SYNAPSE code.
The SYNAPSE-to-MITRE extension automatically maps the logs collected by SYNAPSE to attacks in the MITRE ATT&CK database using machine learning. In more detail, an MLP classifier has been trained for this purpose. The training dataset is the one proposed by cti-to-mitre-with-nlp, re-created using the (currently) latest version of the MITRE ATT&CK database (enterprise-attack-15.1). Generative AI is used here both to decide whether an attack happened and to generate a brief sentence summarizing the attack, if any.
The GeoLite2 database is used to obtain geolocation information about the connected IP address. VirusTotal APIs are used to get the client IP address reputation, alongside the other data extracted by the honeypot. Also, for each log of commands, the reputations of the IPs and domains entered by the user/attacker (e.g. a ping to a certain IP address or a wget from a certain domain) are fetched. All this information is provided to the AI as additional factors for deciding whether an attack happened.
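Before any reputation lookup can happen, the IPs and domains have to be pulled out of the logged commands. The following is a minimal sketch of that step, not the project's actual code: the function name `extract_indicators` and the `NETWORK_COMMANDS` set are hypothetical, and the heuristic is deliberately simple (for example, the value of an option such as `-O out.sh` would be misread as a host name).

```python
import ipaddress
import shlex
from urllib.parse import urlsplit

# Hypothetical list of commands whose arguments are treated as network targets.
NETWORK_COMMANDS = {"ping", "wget", "curl", "nslookup", "dig"}


def extract_indicators(command):
    """Return the IP addresses and domain names targeted by one shell command."""
    found = {"ips": [], "domains": []}
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes: fall back to whitespace splitting
        tokens = command.split()
    if not tokens or tokens[0] not in NETWORK_COMMANDS:
        return found
    for token in tokens[1:]:
        if token.startswith("-"):
            continue  # skip option flags such as -c or -O
        if "://" in token:
            try:
                host = urlsplit(token).hostname or ""
            except ValueError:  # malformed URL
                continue
        else:
            host = token.split("/")[0].lower()
        try:
            ipaddress.ip_address(host)
            bucket = "ips"
        except ValueError:
            if "." not in host:
                continue  # plain words and numbers are not host names
            bucket = "domains"
        if host not in found[bucket]:
            found[bucket].append(host)
    return found
```

Each returned IP or domain could then be passed to whatever reputation service is in use; the lookup itself is left out here because it depends on the API key and client library.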
Among its features, SYNAPSE supports multiple sessions for the same user. Each IP address has its own simulated file system, which is kept for each subsequent session. Different users never see modifications made by others. File system files and directories, together with MySQL databases and tables, are populated creatively (dynamically) by generative AI.
With the aim of a comparative evaluation, a static equivalent of SYNAPSE has been implemented: DENDRITE.
-
Clone this repository
git clone https://github.com/eneagizzarelli/SYNAPSE.git
-
Enter the project folder and install requirements
pip install -r requirements.txt
-
Create a .env file (in my configuration under /home/enea/.env) and add your OpenAI and VirusTotal API keys
OPENAI_API_KEY='YOUR KEY'
VIRUSTOTAL_API_KEY='YOUR KEY'
Note 1: in my configuration, the SYNAPSE project folder has been cloned under the specific path /home/enea/SYNAPSE. Every script/source file in this project refers to other scripts/source files using the above absolute path as a base path. If you plan to use an alternative configuration, such as a different location or user, remember to change the paths and to replace enea everywhere.
-
Copy the configSYNAPSE.sh script from the scripts/ folder to a location outside the SYNAPSE directory and, after assigning the necessary permissions, run it
chmod +x configSYNAPSE.sh
./configSYNAPSE.sh
This will complete the configuration of SYNAPSE, creating the necessary folders, downloading the GeoLite2 database and assigning ownership and permissions to user enea (or the one you chose).
-
Modify your /etc/ssh/sshd_config file in order to run the startSYNAPSE.sh script (after assigning the necessary permissions) and to disable many SSH parameters (not handled by the code) whenever user enea (or the one you chose) connects to your machine using SSH:
Match User enea
    ForceCommand /home/enea/SYNAPSE/scripts/startSYNAPSE.sh
    X11Forwarding no
    AllowTcpForwarding no
    AllowAgentForwarding no
    PermitTunnel no
    PermitOpen none
Note 2: if you are hosting the code on a VM like AWS EC2 and you want to allow password authentication, remember to change your /etc/ssh/sshd_config.d/50-cloud-init.conf file, setting PasswordAuthentication yes (60-cloudimg-settings.conf for Oracle Cloud Infrastructure).
- Restart your SSH service
systemctl restart sshd
With the above configuration, the SYNAPSE "fake" terminal runs instead of the real one whenever user enea (or the one you chose) connects to your SSH server.
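To check that the .env file created during installation is well formed, a small parser can be handy. This is a sketch under the assumption that the file holds one KEY='value' pair per line; the helper name `parse_env` is hypothetical, and how SYNAPSE itself loads the keys is not shown in this README.

```python
def parse_env(text):
    """Parse KEY='value' lines into a dict, ignoring blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values
```

For example, `parse_env(open("/home/enea/.env").read())` should return a dict containing both `OPENAI_API_KEY` and `VIRUSTOTAL_API_KEY` if the file was written as described in the installation steps.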
While SYNAPSE is running, many classification files will be created in the logs directory. Those files have names in the format IPaddr_classification_history_NUM.txt and contain the history of commands that the user with IP address IPaddr issued in session number NUM. The SYNAPSE-to-MITRE extension operates on those files. After assigning the necessary permissions, executing the script ./startSYNAPSE-to-MITRE.sh will automatically convert classification files into attack files containing the corresponding MITRE ATT&CK object content, if the AI decides that an attack happened.
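The file name format described above can be parsed mechanically, which is useful when inspecting the logs directory by hand. The sketch below only illustrates the naming convention; the helper names `parse_classification_name` and `group_sessions` are hypothetical and are not taken from the SYNAPSE code.

```python
import re
from collections import defaultdict

# IPaddr_classification_history_NUM.txt, as described in the text.
NAME_RE = re.compile(r"^(?P<ip>.+)_classification_history_(?P<num>\d+)\.txt$")


def parse_classification_name(filename):
    """Return (ip, session_number), or None if the name does not match."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    return match.group("ip"), int(match.group("num"))


def group_sessions(filenames):
    """Map each IP address to the sorted list of its session numbers."""
    sessions = defaultdict(list)
    for name in filenames:
        parsed = parse_classification_name(name)
        if parsed is not None:
            ip, num = parsed
            sessions[ip].append(num)
    return {ip: sorted(nums) for ip, nums in sessions.items()}
```

Passing the output of `os.listdir` on the logs directory to `group_sessions` gives a quick overview of how many sessions each IP address has opened.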
If you plan to rebuild the dataset from scratch, you can run the startDatasetBuild.sh script. You'll need to replace the capec or enterprise-attack databases in the SYNAPSE-to-MITRE/data folder with the versions you prefer (you can download them from the repositories linked in the acknowledgments section below). Make sure to leave file and folder names unchanged. Finally, the model can be trained on the newly generated dataset using the startModelTraining.sh script.
Note 3: if you experience an error like Resource SOMETHING not found and, further on, >>> nltk.download('SOMETHING') when using SYNAPSE-to-MITRE, please try the following command: python3 -m nltk.downloader SOMETHING. It should happen only for resources punkt and wordnet.
The startSystemLogsAnalysis.sh script can be executed to perform a basic AI-assisted analysis of Linux OS logs. auth.log, kern.log and syslog are given as a prompt to generative AI, which returns a report describing what happened in the system.
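As an illustration of how the three logs might be combined into a single prompt, here is a minimal sketch. It is an assumption about the general approach, not the script's actual logic: the function names, the prompt wording and the `max_lines` truncation are all hypothetical, and the call to the model is left out.

```python
LOG_NAMES = ("auth.log", "kern.log", "syslog")


def tail_lines(text, max_lines):
    """Keep only the last max_lines lines of a log."""
    if max_lines <= 0:
        return []
    return text.splitlines()[-max_lines:]


def build_log_prompt(logs, max_lines=200):
    """Build one prompt from a mapping of log name to log content."""
    sections = []
    for name in LOG_NAMES:
        if name not in logs:
            continue  # a missing log is simply left out of the prompt
        excerpt = "\n".join(tail_lines(logs[name], max_lines))
        sections.append(f"### {name}\n{excerpt}")
    header = "Summarise what happened on this Linux system based on the logs below."
    return header + "\n\n" + "\n\n".join(sections)
```

Truncating each log to its most recent lines is one way to keep the prompt within the model's context window; on Debian-like systems the three files are usually found under /var/log.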
Some experiments, or use cases, have been carried out on SYNAPSE to exercise its features:
-
AI vs SYNAPSE - basic interaction: SYNAPSE dynamically generates the content of the file system and MySQL service, while an additional AI interacts with it and navigates through the various files, folders, databases, tables and so on. Everything is automated, with the new AI replacing user interaction.
-
AI vs SYNAPSE - attacker interaction: SYNAPSE dynamically generates the content of the file system and MySQL service, while an additional AI tries to attack and corrupt it with N different attack strategies, where N can be customized. Everything is automated, with the new AI replacing user interaction. After the execution of this script, which stops autonomously when the N-th attack ends, the SYNAPSE-to-MITRE extension can be run to map the attacks performed by the AI to the MITRE ATT&CK database.
The code implemented to perform the above tests can be found under the use_cases/ folder and can be run by simply typing (e.g. if we want to execute the attacker interaction):
python3 AI_vs_SYNAPSE_attacker_interaction.py
Distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See LICENSE for more information.
Enea Gizzarelli - eneagizzarelli2000@gmail.com
LinkedIn - https://linkedin.com/in/eneagizzarelli