A few weeks ago, Wallarm launched a hackathon to create a machine learning / AI model that detects attacks among normal web requests. The competition was run on Kaggle as an InClass competition.
In this competition, Kagglers were asked to develop models that identify injections among neutral input vectors using neural networks or other ML techniques. Wallarm has open-sourced one of the TensorFlow-based models solving this problem and made it available to the competitors as a reference architecture along with the dataset.
“I have been in information security for about seven years, but I am also interested in the field of machine learning. I have built a few models trained on well-known data sets, such as KDD, CICIDS2017, and UNSW-NB15. What I wanted to do was try my hand at real use cases, and what Wallarm offered was just that,” said Anton Sychev, one of the competition leaders.
Fifteen teams competed in the hackathon and produced excellent results. In fact, we were faced with an unusual situation: two competitors produced the same top result, so we have two first-place winners, and we will split the first- and second-place prize money equally between them.
So here are our winners:
Slava Bakunin — First place
Evgeny Kovalev — First place
Artem Sychev — Third place
Alex Golovko, CTO of Wallarm, comments: “Machine learning is a key factor in the Wallarm architecture, and we believe its role in cybersecurity overall will continue to increase. We planned this hackathon to promote the use of machine learning in web security. I am happy to see how the community has responded. Some of the NLP and recurrent network models offered by the participants are really interesting.”
Here is what our winners had to say about their models and implementations:
I was intrigued by the problem presented by the competition. I decided that in this kind of machine learning project, the sequence of symbols is less important than the characters themselves and their combinations, as well as general information about the request, such as its length or the number of digits in the payload. My model analyzes requests not as a sequence but as a matrix of TF-IDF values for frequently used words, bi-grams, and characters. That said, I think that the most interesting characteristic of my model is that it has yielded excellent results.
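The bag-of-features idea described above can be illustrated with a minimal sketch. This is not the winner's code: the tokenization rules, the smoothed IDF formula, and the names `tokens`, `fit_idf`, and `featurize` are assumptions made for illustration, and the three sample requests are invented.

```python
import math
import re
from collections import Counter


def tokens(request):
    """Split a request into words, word bi-grams, and single characters."""
    words = re.findall(r"\w+", request.lower())
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    # Prefix characters so they never collide with one-letter words.
    chars = [f"char:{c}" for c in request if not c.isspace()]
    return words + bigrams + chars


def fit_idf(requests):
    """Smoothed inverse document frequency for every term seen in the corpus."""
    n = len(requests)
    doc_freq = Counter()
    for request in requests:
        doc_freq.update(set(tokens(request)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}


def featurize(request, idf):
    """Sparse TF-IDF weights plus two request-level features (hypothetical names)."""
    counts = Counter(tokens(request))
    total = sum(counts.values()) or 1
    vec = {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}
    vec["meta:length"] = float(len(request))
    vec["meta:digits"] = float(sum(ch.isdigit() for ch in request))
    return vec


# Invented sample requests, used only to fit the IDF table.
corpus = ["id=42&name=alice", "id=1' OR '1'='1", "q=hello world"]
idf = fit_idf(corpus)
features = featurize("id=1' OR '1'='1", idf)
```

The resulting dictionary ignores token order entirely, which matches the stated intuition that character combinations and request-level statistics matter more than the sequence. A real pipeline would feed such vectors to a classifier of the participant's choosing.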
Although the task is NLP, convolutional layers seemed to improve the model performance significantly, so my final model included them along with recurrent layers.
Feature engineering helped me too. Including some simple features in the concatenation of the max/average pooling and attention outputs, before the dense layers, gave a nice boost in quality, and different features gave different results.
Also, I had an interesting experience with stacking. I included about 10 different models but didn’t think it worked, since I didn’t see any movement on the public leaderboard. I was happy to see, though, that my final result of 0.99990 AUC-ROC took me to first place.
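For readers unfamiliar with the AUC-ROC score quoted above, the following is a minimal sketch of how the metric can be computed from labels and model scores. It uses the rank-sum formulation, in which the value is the probability that a randomly chosen attack is scored higher than a randomly chosen benign request, with ties counted as one half. The function name `auc_roc` and the sample data are assumptions for illustration, not the competition's evaluation code.

```python
def auc_roc(labels, scores):
    """AUC-ROC via average ranks; labels are 0 (benign) or 1 (attack)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the block of items that share the same score.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")

    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Invented example: two benign requests, two attacks.
example_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because the metric depends only on the ordering of scores, a score close to 1.0 leaves very little room for improvement, which is one plausible reason an ensemble might show no visible movement on a leaderboard.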
I tried applying text analysis and recognition (NLP) algorithms, specifically sentiment analysis, to estimate the probability that a web request is malicious.
Our judges also reviewed the models for elegance and applicability to production environments. In addition to the third-place prize, a special prize of one Ethereum coin was awarded to Artem Sychev.
Based on the success of this first experience, we are considering supporting other competitions and/or hackathons. Current possibilities include:
- Best project incorporating ML-based malicious intent detection
- Best Struts exploit
- Best meta-model
- Best exploit written in FAST DSL
We would appreciate feedback from the community. Please leave us comments and additional competition ideas.