Feature engineering: art and science

Abstract

Feature engineering is the process of extracting and creating characteristics (features) from your data. It is often described as one of the most important parts of the Machine Learning (ML) process, because the effectiveness of a model depends on it. A classification model relies on robust features to find the decision boundaries that separate the data into different classes. A forecasting model needs good features to predict the future. A language model depends heavily on a good way of translating words, sentences, and contexts into features. In this talk, I analyze the art and science of creating good features that tell the story of your data and lead to robust models. I split features into numerical and categorical and show several examples of how to encode, convert, and summarize them to derive new features. I explain embeddings and how they contribute to the “magic” of language models such as ChatGPT and Llama. The talk revolves around practical examples from public cybersecurity datasets, and I share a notebook so that attendees can actively participate in creating features. Finally, I discuss the art of creating features based on expertise in a specific domain. The goal of this presentation is to demonstrate what makes ML models “tick” and enables them to perform tasks such as predicting the future, discovering bad actors, or generating interesting answers to your questions. Moreover, we delve into “featuristic” trends in feature engineering to inspire creativity and innovation within the community of data enthusiasts.
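As a rough illustration of the encode, convert, and summarize steps mentioned above, here is a minimal Python sketch. It is not the talk's notebook: the connection records are hypothetical toy data, and the helper names (one_hot, log_bytes, summarize_by_source) are made up for this example. It one-hot encodes a categorical feature (protocol), applies a log(1 + x) conversion to a skewed numerical feature (bytes), and summarizes records per source IP into new aggregate features.

```python
import math
from collections import defaultdict

# Hypothetical toy connection records (not taken from the talk's datasets):
# (source IP, protocol, bytes transferred)
records = [
    ("10.0.0.1", "tcp", 1200),
    ("10.0.0.1", "udp", 80),
    ("10.0.0.2", "tcp", 50000),
    ("10.0.0.2", "icmp", 64),
    ("10.0.0.2", "tcp", 3000),
]

def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 indicator features."""
    return [1 if value == c else 0 for c in categories]

def log_bytes(n):
    """Convert a skewed numerical feature with log(1 + x) to compress its range."""
    return math.log1p(n)

def summarize_by_source(rows):
    """Aggregate per-source features: connection count, mean bytes, distinct protocols."""
    groups = defaultdict(list)
    for src, proto, nbytes in rows:
        groups[src].append((proto, nbytes))
    summary = {}
    for src, items in groups.items():
        sizes = [b for _, b in items]
        summary[src] = {
            "n_connections": len(items),
            "mean_bytes": sum(sizes) / len(sizes),
            "n_protocols": len({p for p, _ in items}),
        }
    return summary

# Fix a category order so every record maps to the same indicator columns.
protocols = sorted({proto for _, proto, _ in records})
# Per-record feature vector: one-hot protocol followed by log-scaled bytes.
encoded = [one_hot(proto, protocols) + [round(log_bytes(b), 3)] for _, proto, b in records]

print(protocols)
print(encoded[0])
print(summarize_by_source(records))
```

Per-source summaries such as connection count or number of distinct protocols are one common way to turn raw events into behavioral features; which aggregates are useful depends on the dataset and on domain expertise.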

Date
Mar 14, 2024, 5:30 PM to 7:30 PM
Location
Charleston, SC
Xenia Mountrouidou
Senior Security Researcher

Xenia Mountrouidou is a Senior Security Researcher at Cyber adAPT with versatile experience in academia and industry. She has over 10 years of research experience in network security, machine learning, and data analytics for computer networks. She enjoys writing Python scripts to automate boring tasks, finding interesting patterns with machine learning algorithms, and researching novel intrusion detection techniques. Her research interests include network security, the Internet of Things, intrusion detection, and machine learning.