SYNAPSE: A Python repository from eneagizzarelli

SYNAPSE: SYNthetic AI Pot for Security Enhancement

Table of Contents

About the project
Installation
Usage
Use cases
License
Contacts
Other projects
Acknowledgments

About the project

SYNAPSE is a

low-interaction
server
dynamic

honeypot acting as a Linux OS terminal. It is entirely written in Python. Instead of relying on a real terminal, SYNAPSE works with generative AI (currently GPT-4o model) to answer with realistic terminal outputs, as if the user was connecting to a real Linux OS using SSH. It currently implements the simulation of two services:

SSH Server
MySQL Server.

Generative AI, in this context, will be used to generate responses to issued commands both for the simulated Linux terminal and for the fake MySQL service, leveraging prompt engineering techniques. shelLM project was used as a starting point to implement SYNAPSE code.

SYNAPSE-to-MITRE extension automatically maps logs collected by SYNAPSE into attacks of the MITRE ATT&CK database, leveraging machine learning technologies. More in detail, a MLP classifier has been trained to achieve the desired behaviour. The dataset used to train the model is the one proposed by cti-to-mitre-with-nlp, re-created using the (currently) last version of the MITRE ATT&CK database (enterprise-attack-15.1). Generative AI, in this context, will be used both for deciding if an attack happened or not, and to generate a brief sentence summing up the eventual attack.

GeoLite2 database is used to obtain geolocation information about the connected IP address. VirusTotal APIs are used to get client IP address reputation among the other data extracted by the honeypot. Also, for each log of commands, reputations of IPs and domains entered by the user/attacker (e.g. ping to a certain IP address or wget from a certain domain) are fetched. All these information will be provided to the AI as additional factors to decide whether an attack happened or not.

Among its features, SYNAPSE supports multiple sessions for the same user. Each IP address will have its own simulated file system for each subsequent session. Different users will never see modifications done by others. File system file and directories together with MySQL databases and tables will be populated creatively (dinamically) by generative AI.

With the aim of a comparative evaluation, a static equivalent of SYNAPSE has been implemented: DENDRITE.

(back to top)

Installation

Clone this repository

git clone https://github.com/eneagizzarelli/SYNAPSE.git

Enter the project folder and install requirements
```
pip install -r requirements.txt
```
Create a .env file (in my configuration under /home/enea/.env) and add your OpenAI and VirusTotal API keys
```
OPENAI_API_KEY='YOUR KEY'
VIRUSTOTAL_API_KEY='YOUR KEY'
```

Note 1: in my configuration, SYNAPSE project folder has been cloned under the specific path /home/enea/SYNAPSE. Every script/source file in this project refers to other scripts/source file using the above absolute path as a base path. If you plan to use an alternative configuration, like different location or user, remember to change the paths and to replace enea everywhere.

Copy configSYNAPSE.sh script from scripts/ folder outside the SYNAPSE directory and, after assigning the necessary permissions, run it
```
chmod +x configSYNAPSE.sh
./configSYNAPSE.sh
```
This will complete the configuration of SYNAPSE, creating the necessary folders, downloading GeoLite2 database and assigning ownership and permissions to user enea (or the one you specifically decided).
Modify your /etc/ssh/sshd_config file in order to run startSYNAPSE.sh script (after assigning the necessary permissions) and to disable many SSH parameters (not handled by the code) whenever user enea (or the one you specifically decided) connects to your machine using SSH:
```
Match User enea
   ForceCommand /home/enea/SYNAPSE/scripts/startSYNAPSE.sh
   X11Forwarding no
   AllowTcpForwarding no
   AllowAgentForwarding no
   PermitTunnel no
   PermitOpen none
```

Note 2: if you are hosting the code on a VM like AWS EC2 and you want to allow password authentication, remember to change your /etc/ssh/sshd_config.d/50-cloud-init.conf file setting PasswordAuthentication yes (60-cloudimg-settings.conf for Oracle Cloud Infrastructure).

Restart your SSH service
```
systemctl restart sshd
```

(back to top)

Usage

Adopting the aforementioned configuration will run SYNAPSE "fake" terminal instead of the real one whenever user enea (or the one you specifically decided) connects to your SSH server.

While SYNAPSE is running, many classification files will be created in the logs directory. Those files will have a name format like IPaddr_classification_history_NUM.txt, and will contain the history of commands the user with IP address IPaddr issued on its session number NUM. Over those files SYNAPSE-to-MITRE extension will operate. After assigning the necessary permissions, executing the script ./startSYNAPSE-to-MITRE.sh will automatically convert classification files into attack files containing the corresponding MITRE ATT&CK object content, if AI thinks the attack happened.

If you plan to rebuild the dataset from scratch, the startDatasetBuild.sh script can be run. You'll need to replace capec or enterprise-attack databases in the SYNAPSE-to-MITRE/data folder with the versions you prefer (you can download them from the repositories linked in the below acknowledgments section). Make sure to leave file and folder names unchanged. In the end, the model can be trained with the newly generated dataset using the startModelTraining.sh script.

Note 3: if you experience an error like Resource SOMETHING not found and, further on, >>> nltk.download('SOMETHING') when using SYNAPSE-to-MITRE, please try the following command: python3 -m nltk.downloader SOMETHING. It should happen only for resources punkt and wordnet.

startSystemLogsAnalysis.sh script can be executed to perform a basic analysis of Linux OS logs leveraging AI. auth.log, kern.log and syslog will be given as a prompt to generative AI, obtaining as a result a report describing what happened in the system.

(back to top)

Use cases

Some experiments, or use cases, have been carried out over SYNAPSE to stress its functionalities:

AI vs SYNAPSE - basic interaction: SYNAPSE is dinamically generating the content of file system and MySQL service, an additional AI interacts with it and navigates through the various file, folders, databases, tables and so on. Everything is automated, with the new AI replacing user-interaction.
AI vs SYNAPSE - attacker interaction: SYNAPSE is dinamically generating the content of file system and MySQL service, an additional AI tries to attack and corrupt it with N different attack strategies, where N can be customized. Everything is automated, with the new AI replacing user-interaction. After the execution of this script, that will stop autonomously when the N-th attack ends, SYNAPSE-to-MITRE extension can be run to map the attacks performed by the AI to the MITRE ATT&CK database.

The code implemented to perform the above tests can be found under use_cases/ folder and can be run by simply typing (e.g. if we want to execute the attacker interaction):

python3 AI_vs_SYNAPSE_attacker_interaction.py

(back to top)

License

Distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See LICENSE for more information.

(back to top)

Contacts

Enea Gizzarelli - eneagizzarelli2000@gmail.com

LinkedIn - https://linkedin.com/in/eneagizzarelli

(back to top)

Other projects

DENDRITE: https://github.com/eneagizzarelli/DENDRITE

(back to top)

Acknowledgments

(back to top)