Y-aware Feature Engineering with High Cardinality Features (Part 2 of 4)

Summary: This is the second in a 4-part series where Anders Larson and Shea Parkes discuss predictive analytics with high cardinality features. In this episode they focus on y-aware feature engineering. Y-aware feature engineering is all about carefully bleeding information from your training response back into your engineered features without grossly misrepresenting your ability to generalize to new data.

Feb 01, 2023

20 minutes to listen

Topics: Modeling techniques; Regression analysis; Data mining; Credibility theory

Most real-world data available for predictive modeling is not purely numeric. There are often columns (features) of categorical data, e.g., product or customer identifiers, or zip codes. A categorical column with many unique values is called a high cardinality feature. High cardinality features can carry a lot of strong signal, but they can also be tricky to work with.
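To make the idea of y-aware feature engineering concrete, here is a minimal sketch, assuming out-of-fold target encoding as the technique (the episode may cover other approaches). Each row's category is replaced by a smoothed mean of the response computed only from rows in other folds, so no row's own response leaks into its own feature. The function names, the `prior_weight` shrinkage parameter, and the toy zip code data are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def smoothed_means(categories, y, prior_weight):
    """Per-category mean response, shrunk toward the overall mean.

    prior_weight (hypothetical name) acts like that many pseudo-observations
    at the overall mean, so sparse categories are pulled toward it.
    """
    overall = sum(y) / len(y)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, value in zip(categories, y):
        sums[cat] += value
        counts[cat] += 1
    means = {
        cat: (sums[cat] + prior_weight * overall) / (counts[cat] + prior_weight)
        for cat in counts
    }
    return means, overall


def out_of_fold_encode(categories, y, n_folds=5, prior_weight=10.0):
    """Encode each row using only the responses from the other folds."""
    n = len(y)
    if n_folds < 2 or n < n_folds:
        raise ValueError("need at least 2 folds and at least one row per fold")
    encoded = [0.0] * n
    for fold in range(n_folds):
        # Deterministic fold assignment for the sketch; real data would
        # usually be shuffled before folds are assigned.
        fit_rows = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        means, overall = smoothed_means(
            [categories[i] for i in fit_rows],
            [y[i] for i in fit_rows],
            prior_weight,
        )
        for i in held_out:
            # Categories unseen in the fitting rows fall back to the overall mean.
            encoded[i] = means.get(categories[i], overall)
    return encoded


# Toy data: a zip code column and a binary response.
zip_codes = ["46204", "46204", "46204", "60601", "60601", "98101"]
response = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
encoded = out_of_fold_encode(zip_codes, response, n_folds=3, prior_weight=2.0)
```

In this toy data the zip code "98101" appears only once. A naive per-category mean would encode that row with its own response (1.0), which overstates how well the feature generalizes. The out-of-fold version never sees that row's response when encoding it and falls back to the overall mean of the other folds.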
