Hassan Saad

Security Threats Consultant at SecureNetworks, Egypt

Artificial Intelligence & Machine Learning In Cyber Security

Session Title | Threat Hunting using Machine Learning

This session shows how machine learning algorithms can be used in threat hunting to distinguish malicious network traffic from normal traffic.

The approach is divided into three phases:

1- Data Processing: we take network traffic, whether malicious or not, from a PCAP file or in real time from network sensors, extract all the HTTP headers from it, convert these headers into datasets, and split them into 90% training data and 10% testing data.

2- Training Phase: we take the normal and malicious training data and apply the Naïve Bayes algorithm to them (we implemented Naïve Bayes from scratch so that it fits our training model exactly and to increase the success rate).
We then generate a text file that contains all the calculations of the training model (probability of normal HTTP headers, probability of malicious HTTP headers, total number of normal words, total number of malicious words, unique words, probability of the normal class, and probability of the malicious class). This file and its calculations are used in the testing phase.

3- Testing Phase: we load the training model from the previously mentioned text file and test the testing data against it; for each test case, the algorithm predicts which class it belongs to. These calculations and predictions are performed on the remaining test data that we set aside in the first phase. This data is not included in the training model and its types are not given to the algorithm, so it is unknown to the model.
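
The three phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of the tool itself: the toy headers, the function names, the whitespace tokenization, the add-one smoothing, and the JSON layout of the model text are all hypothetical choices made for the example.

```python
import json
import math
import random
from collections import Counter

# Hypothetical toy dataset of (label, HTTP header text) pairs. Real input
# would be extracted from PCAP files or network sensors.
SAMPLES = [
    ("normal", f"{req} HTTP/1.1 Host: example.com User-Agent: Mozilla/5.0")
    for req in ("GET /index.html", "GET /about.html", "POST /login",
                "GET /images/logo.png", "GET /contact.html")
] + [
    ("malicious", f"{req} HTTP/1.1 Host: 10.0.0.5 User-Agent: sqlmap")
    for req in ("GET /shell.php?cmd=whoami", "GET /admin.php", "POST /upload.php",
                "GET /../../etc/passwd", "GET /phpinfo.php")
]
LABELS = ("normal", "malicious")

def tokenize(headers):
    # Assumption: words are lowercased, whitespace-separated tokens.
    return headers.lower().split()

def split_dataset(samples, train_fraction=0.9, seed=0):
    # Phase 1: shuffle, then keep 90% for training and 10% for testing.
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train(samples):
    # Phase 2: collect the per-class statistics the model file would hold.
    word_counts = {label: Counter() for label in LABELS}
    class_counts = Counter()
    for label, headers in samples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(headers))
    vocabulary = set().union(*word_counts.values())
    total = sum(class_counts.values())
    return {
        "word_counts": word_counts,
        "total_words": {c: sum(word_counts[c].values()) for c in LABELS},
        "unique_words": len(vocabulary),
        "priors": {c: class_counts[c] / total for c in LABELS},
    }

def dump_model(model):
    # Text form of the model (JSON is an assumption, not the tool's format).
    return json.dumps(model, indent=2)

def parse_model(text):
    return json.loads(text)

def classify(model, headers):
    # Phase 3: pick the class with the highest log-probability.
    words = tokenize(headers)
    scores = {}
    for label, prior in model["priors"].items():
        if prior == 0:
            continue  # class never seen during training
        denominator = model["total_words"][label] + model["unique_words"]
        score = math.log(prior)
        for word in words:
            count = model["word_counts"][label].get(word, 0)
            score += math.log((count + 1) / denominator)  # add-one smoothing
        scores[label] = score
    return max(scores, key=scores.get)

if __name__ == "__main__":
    train_set, test_set = split_dataset(SAMPLES)
    model_text = dump_model(train(train_set))  # this text would go to a file
    model = parse_model(model_text)
    for true_label, headers in test_set:
        print(true_label, "->", classify(model, headers))
```

Working with log-probabilities avoids numeric underflow when a request contains many words, and the smoothing keeps a word that was never seen in one class from forcing that class's probability to zero.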

Based on the Naïve Bayes machine learning algorithm, the training model can successfully predict whether the test data is malicious or not without knowing its type in advance.

Over time, new training data can be continuously added to the training model to improve its efficacy and success rate.
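
One way to picture this incremental update: if the model keeps raw counts rather than only the final probabilities, a new labelled sample can be folded in by adding to the counts and recomputing the derived values. The sketch below assumes such a count-based layout; the function names and model structure are hypothetical and are not taken from the tool.

```python
from collections import Counter

def empty_model():
    # Raw counts only; probabilities are derived on demand.
    return {
        "samples": Counter(),
        "words": {"normal": Counter(), "malicious": Counter()},
    }

def update(model, label, headers):
    # Fold one new labelled sample into the existing counts.
    model["samples"][label] += 1
    model["words"][label].update(headers.lower().split())
    return model

def summary(model):
    # Recompute the derived statistics after any number of updates.
    total_samples = sum(model["samples"].values())
    if total_samples == 0:
        raise ValueError("the model has no training data yet")
    vocabulary = set().union(*model["words"].values())
    return {
        "priors": {c: model["samples"][c] / total_samples for c in model["words"]},
        "total_words": {c: sum(model["words"][c].values()) for c in model["words"]},
        "unique_words": len(vocabulary),
    }

if __name__ == "__main__":
    model = empty_model()
    update(model, "normal", "GET / HTTP/1.1")
    update(model, "malicious", "GET /shell.php HTTP/1.1")
    print(summary(model))
    # Later, a newly labelled malicious request is added without retraining.
    update(model, "malicious", "POST /upload.php HTTP/1.1")
    print(summary(model))
```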

Tool implementing the approach described above: https://github.com/hassan0x/Chimera

I am a Security Threats Consultant at SecureNetworks and Founder at NineHackers, with more than 5 years of experience in cybersecurity, especially offensive security, specializing in penetration testing, red teaming attack simulation, and developing security tools.