Graphs are excellent for visual learners. However, sometimes you just need to “show the numbers.” Exploratory Data Analysis (EDA) using statistics offers a compact summary of your data, providing a clear narrative without requiring extensive coding.
While it may lack the allure of machine learning models that predict, classify, or “talk,” data exploration is a crucial step that rewards the time you invest in it. By analyzing your data using graphical and mathematical techniques, you can significantly enhance feature engineering, model selection, and model tuning.
We have reviewed several classes of data, but one of the most important—and challenging—types remains: network traffic data. Its significance stems from the fact that many attacks originate from network-based vectors.
Data collection, preparation, and cleaning can take up to 80–90% of the time in an ML project, so it is not surprising that we need to cover them across multiple blog posts. In the first part, we talked mainly about theory: how we collect data from our systems and how we store it.
You have probably heard this saying many times before: “Garbage in, garbage out.” It matters in machine learning projects, and even more so in ML cybersecurity projects, because a model can only be as good as the data you feed it.
At the beginning of our journey into ML and cybersecurity, we need to lay the foundations for a development environment that fits our needs. This is the tale of three different environments: Google Colaboratory (Colab), a Jupyter Server launched from your terminal, and VS Code.
Machine Learning (ML) and cybersecurity: is this a match made in heaven or just another tech hype? It is certainly a combination worth exploring, both for the security professional who wants to solve problems in a non-traditional way and for the data scientist who wants to work in a high-impact area.
The third part of the telemetry stack introduction covers the basic concepts of an alerting engine and how to implement them with Prometheus Alertmanager. You can read this post in Introduction to a Telemetry Stack - Part 3.
The third part of 'Intro to Pandas' discusses Exploratory Data Analysis (EDA) of black-box pcap data using the Pandas library. You can read this post in Intro to Pandas (Part 3) - Forecasting the network.
The second part of 'Intro to Pandas' discusses Exploratory Data Analysis (EDA) of black-box pcap data using the Pandas library. You can read this post in Intro to Pandas (Part 2) - Exploratory data analysis for network traffic.