
Analyzes half-closed TCP connections using generalized linear models, regression trees, and random forest classification models.


TCP_Connection_Analysis

Analyzes TCP connections to find factors that contribute to half-closed connections, defined here as connections in which a TCP reset (RST) is sent after a TCP FIN/ACK. This matters because TCP resets can be forged and used in network attacks.
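As a rough illustration of the response variable, the sketch below counts resets that arrive after a FIN/ACK within each connection. It is not the project's cleaning script: the `packets` table, its column names (`stream`, `fin`, `ack`, `rst`), and the helper `count_rst_after_finack()` are hypothetical, and the rows are made up and assumed to be in time order within each stream.

```r
# Hypothetical packet table: one row per packet, in time order within each stream.
# The flag columns are 0/1 indicators; none of these names come from the project.
packets <- data.frame(
  stream = c(0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2),
  fin    = c(0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0),
  ack    = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0),
  rst    = c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1)
)

# Count RST packets seen once a FIN/ACK has appeared in the same stream.
count_rst_after_finack <- function(fin, ack, rst) {
  # cumsum() > 0 is TRUE from the first FIN/ACK row onwards
  finack_seen <- cumsum(fin == 1 & ack == 1) > 0
  sum(rst == 1 & finack_seen)
}

streams <- split(packets, packets$stream)
rst_count <- vapply(
  streams,
  function(p) count_rst_after_finack(p$fin, p$ack, p$rst),
  integer(1)
)

# One row per connection: the count, and a presence indicator derived from it.
connections <- data.frame(
  stream = as.numeric(names(rst_count)),
  rst_after_finack = unname(rst_count),
  any_rst_after_finack = as.integer(rst_count > 0)
)
print(connections)
```

The count column corresponds to what the count models describe, and the 0/1 column to what the presence models predict. A reset that precedes the FIN/ACK (the first packet of stream 2 above) is deliberately not counted.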

I worked on this project with two classmates. My individual contribution was primarily the TCP_data_cleaning.R and poisson_models.R scripts. These two scripts assemble the relevant features from data collected with Wireshark into tabular format and fit several GLMs to analyze relevant predictors. Logistic regression models are used to predict the presence (zero versus at least one) of TCP resets after a FIN/ACK. The assumptions of the Poisson models were violated because of zero inflation and overdispersion in the response variable, so zero-inflated negative binomial models are used to model the actual count of TCP resets after a FIN/ACK.
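The modelling steps can be sketched with base R alone. The example below uses simulated data, not the project's data; the predictors `duration` and `n_packets` and every coefficient are assumptions chosen only to produce zero-inflated, overdispersed counts. It fits a logistic model for presence and a Poisson model for the count, then runs two common diagnostics: the Pearson dispersion statistic and a comparison of observed versus Poisson-expected zeros.

```r
set.seed(1)
n <- 500

# Hypothetical per-connection predictors (illustrative only)
duration  <- rexp(n, rate = 1)
n_packets <- rpois(n, lambda = 20)

# Simulated response: many structural zeros plus negative binomial counts
structural_zero <- rbinom(n, 1, 0.6)
mu <- exp(0.2 + 0.5 * duration)
rst_count <- ifelse(structural_zero == 1, 0, rnbinom(n, size = 0.8, mu = mu))

conn <- data.frame(
  rst_count,
  rst_present = as.integer(rst_count > 0),
  duration,
  n_packets
)

# Presence of at least one reset: logistic regression
logit_fit <- glm(rst_present ~ duration + n_packets,
                 family = binomial, data = conn)

# Count of resets: Poisson regression, used here as the baseline to check
pois_fit <- glm(rst_count ~ duration + n_packets,
                family = poisson, data = conn)

# Pearson dispersion statistic: values well above 1 suggest overdispersion
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)

# Zero inflation check: observed share of zeros vs. share the Poisson fit expects
observed_zeros <- mean(conn$rst_count == 0)
expected_zeros <- mean(dpois(0, lambda = fitted(pois_fit)))

print(c(dispersion = dispersion,
        observed_zeros = observed_zeros,
        expected_zeros = expected_zeros))
```

When both checks fail in this way, a zero-inflated negative binomial model is a natural next step. Fitting one needs an add-on package, for example `zeroinfl(..., dist = "negbin")` from pscl; which package the original scripts use is not stated here, so that step is left out of the sketch.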

Findings from this analysis are compared to feature importance plots generated from the regression trees and random forest classification models, which my partners wrote. The regression trees modelled the count, and the random forest model predicted the presence, of TCP resets after a FIN/ACK. These models were much better at handling multicollinearity among some of the features and at modelling non-linearities, specifically feature interactions. However, the negative binomial models gave meaningful interpretations for the predictors where linearity was satisfied.
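To show why a tree can pick up an interaction that a linear predictor misses, here is a minimal sketch using the rpart package. This is an assumption for illustration: the package my partners used is not stated here, and the data and feature names (`duration`, `window`, `noise`) are simulated and hypothetical. The count is high only when two features are both large, which is a pure interaction effect.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(2)
n <- 1000

# Hypothetical features; `noise` has no effect on the response
duration <- runif(n)
window   <- runif(n)
noise    <- runif(n)

# Count is elevated only when BOTH duration and window are large
lambda <- ifelse(duration > 0.5 & window > 0.5, 6, 0.5)
rst_count <- rpois(n, lambda)

conn <- data.frame(rst_count, duration, window, noise)

# Regression tree on the count
tree_fit <- rpart(rst_count ~ duration + window + noise,
                  data = conn, method = "anova")

# Named vector of importance scores, sorted from most to least important
importance <- tree_fit$variable.importance
print(importance)
```

The two interacting features should dominate the importance scores, while `noise` contributes little or nothing. A GLM would need an explicit `duration:window` term to capture the same structure, which is the trade-off against the interpretability of the negative binomial coefficients.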

This project was done with two of my peers for our Computer Networks class at Skidmore.