Feature engineering art and science

Name: Feature engineering art and science
Start: 2024-03-14T17:30:00Z
End: 2024-03-14T19:30:00Z
Location: Charleston, SC

Abstract

Feature engineering is the process of extracting and creating characteristics (features) from your data. It is often described as one of the most important parts of the Machine Learning (ML) process, because the effectiveness of a model depends on it. A classification model relies on robust features to find the decision boundaries that separate the data into different classes. A forecasting model needs good features to predict the future. A language model depends heavily on a good way of translating words, sentences, and contexts into features. In this talk, I analyze the art and science of creating good features that tell the story of your data and lead to robust models. I split features into numerical and categorical and show several examples of how to encode, convert, and summarize them to derive new features. I explain embeddings and how they contribute to the “magic” of language models such as ChatGPT and Llama. The talk revolves around practical examples from public cybersecurity datasets, and I share a notebook so that attendees can actively participate in creating features. Finally, I discuss the art of creating features based on expertise in a specific field. The goal of this presentation is to demonstrate what makes ML models “tick” and perform tasks such as predicting the future, discovering bad actors, or generating interesting answers to your questions. Moreover, we delve into “featuristic” trends in feature engineering to inspire creativity and innovation within the community of data enthusiasts.
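The abstract mentions encoding categorical features and converting and summarizing numerical ones. As a minimal sketch of what that can look like, assuming a handful of made-up network-flow records (the record layout, IP addresses, and the `one_hot` helper are hypothetical and not taken from the talk or its notebook), the Python below one-hot encodes the protocol, log-scales the byte count, and aggregates per-source summary features using only the standard library.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical network-flow records, made up for illustration:
# (source IP, protocol, destination port, bytes transferred)
flows = [
    ("10.0.0.1", "tcp", 443, 1200),
    ("10.0.0.1", "tcp", 22, 300),
    ("10.0.0.2", "udp", 53, 80),
    ("10.0.0.1", "udp", 53, 90),
    ("10.0.0.3", "icmp", 0, 64),
]


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed category list."""
    return [1 if value == c else 0 for c in categories]


# Fixed, sorted category list so every flow gets the same column order.
protocols = sorted({proto for _, proto, _, _ in flows})

# Per-flow features: one-hot protocol plus a log-scaled byte count
# (log1p compresses heavy-tailed values and handles zero safely).
per_flow = [
    one_hot(proto, protocols) + [math.log1p(nbytes)]
    for _, proto, _, nbytes in flows
]

# Group flows by source IP to derive summary (aggregate) features.
by_source = defaultdict(list)
for src, _, dport, nbytes in flows:
    by_source[src].append((dport, nbytes))

summary = {
    src: {
        "flow_count": len(rows),
        "mean_bytes": mean(nbytes for _, nbytes in rows),
        "distinct_ports": len({dport for dport, _ in rows}),
    }
    for src, rows in by_source.items()
}

for src, feats in summary.items():
    print(src, feats)
```

The log transform keeps very large transfers from dominating the scale, and per-source aggregates such as the number of distinct destination ports are the kind of domain-driven features that can hint at scanning behavior. A real pipeline would more likely use pandas or scikit-learn for the same steps.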

Date

Mar 14, 2024 5:30 PM – Mar 14, 2024 7:30 PM

Event

Data Science Meetup, March 2024

Xenia Mountrouidou

Senior Security Researcher

Xenia Mountrouidou is a Senior Security Researcher at Cyber adAPT with versatile experience in academia and industry. She has over 10 years of research experience in network security, machine learning, and data analytics for computer networks. She enjoys researching novel intrusion detection techniques, finding interesting patterns with machine learning algorithms, and writing Python scripts to automate boring tasks. Her research interests revolve around network security, the Internet of Things, intrusion detection, and machine learning.