Abstract
Feature engineering is the process of extracting and creating informative characteristics (features) from your data. It is often described as one of the most important parts of the Machine Learning (ML) process, because the effectiveness of a model depends on it. A classification model relies on robust features to find the decision boundaries that separate the data into different groups. A forecasting model needs good features to predict the future. A language model depends heavily on a good way of translating words, sentences, and contexts into features. In this talk, I analyze the art and science of creating good features that tell the story of your data and lead to robust models. I split features into numerical and categorical and show several examples of how to encode, convert, and summarize them to derive new features. I explain embeddings and how they contribute to the “magic” of language models such as ChatGPT and Llama. The talk revolves around practical examples from public datasets in the area of cybersecurity. I present and share a notebook so that attendees can actively participate in the process of creating features. Finally, I discuss the art of creating features based on expertise in a specific field. The goal of this presentation is to demonstrate what makes ML models “tick” and perform tasks such as predicting the future, discovering bad actors, or generating interesting answers to your questions. I also explore “featuristic” trends in feature engineering to inspire creativity and innovation within the community of data enthusiasts.
Date
Mar 14, 2024 5:30 PM — Mar 24, 2023 7:30 PM
Event
Location