Feature engineering takes place within the machine learning pipeline and consists of transforming and enriching data to build more effective predictive models. It may involve creating new features or adjusting existing ones so they work well with machine learning algorithms. Below is an abridged exploration:
Domain Knowledge: Expert knowledge can guide the process, sometimes by suggesting new or external features worth engineering. For example, in financial services, economic knowledge can help identify the factors that influence client behavior.
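As a minimal sketch of a domain-informed feature, suppose a hypothetical lending dataset with monthly_debt and monthly_income columns (both names and values are illustrative assumptions); a debt-to-income ratio, a standard credit-risk signal, can be derived from them:

```python
import pandas as pd

# Hypothetical loan data; column names and values are illustrative.
df = pd.DataFrame({
    "monthly_debt": [500, 1200, 300],
    "monthly_income": [4000, 3500, 2500],
})

# Domain-informed feature: debt-to-income ratio, a credit-risk
# signal that neither raw column captures on its own.
df["debt_to_income"] = df["monthly_debt"] / df["monthly_income"]
print(df)
```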
Interaction Features: Sometimes the interaction between input variables is more predictive than the variables themselves. For example, combining latitude and longitude into a single location identifier for further geographic analysis can enhance predictive power.
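A minimal sketch of that latitude/longitude combination, assuming illustrative coordinates and an arbitrary grid resolution:

```python
import pandas as pd

# Illustrative coordinates; the grid resolution is an assumption.
df = pd.DataFrame({
    "latitude": [40.71, 34.05, 41.88],
    "longitude": [-74.01, -118.24, -87.63],
})

# Combine both coordinates into a coarse grid-cell identifier, so a
# model can learn effects tied to a location rather than to latitude
# or longitude independently.
df["grid_cell"] = (
    (df["latitude"] * 10).round().astype(int).astype(str)
    + "_"
    + (df["longitude"] * 10).round().astype(int).astype(str)
)
print(df)
```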
Polynomial Features: New features are created from the powers of existing features and the interactions between them, which helps when linear relationships are not expressive enough.
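One common way to do this is scikit-learn's PolynomialFeatures; a minimal sketch with two illustrative samples:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0], [1.0, 5.0]])  # two illustrative samples

# Degree-2 expansion: adds x0^2, x0*x1, x1^2 alongside the originals.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print(poly.get_feature_names_out())  # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
print(X_poly)
```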
Bins and Binning: When continuous data is transformed into categorical bins, the trends a model learns often become clearer.
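A minimal sketch using pandas' cut, with bin edges and labels chosen purely for illustration:

```python
import pandas as pd

# Illustrative ages; the bin edges and labels are assumptions.
ages = pd.Series([5, 17, 25, 42, 68, 81])
age_group = pd.cut(
    ages,
    bins=[0, 18, 35, 60, 120],
    labels=["child", "young_adult", "adult", "senior"],
)
print(age_group)
```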
Normalization and Standardization: Rescaling features, for instance so that each has a mean of 0 and a variance of 1, puts them on a comparable scale and keeps features with large numeric ranges from dominating the prediction.
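A minimal sketch using scikit-learn's StandardScaler on two illustrative features with very different scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (illustrative values).
X = np.array([[1.0, 1000.0], [2.0, 1500.0], [3.0, 2000.0]])

scaler = StandardScaler()  # standardizes each column to mean 0, variance 1
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```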
Missing Values: Left unhandled, these can spoil a prediction, although the fact that a value is missing can itself be an informative signal. Techniques range from filling in the median value to predictive imputation.
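A minimal sketch of median imputation with scikit-learn's SimpleImputer, on illustrative data with gaps encoded as NaN:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative data; NaN marks the missing entries.
X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])

# Median imputation: each NaN is replaced by its column's median.
imputer = SimpleImputer(strategy="median")
X_filled = imputer.fit_transform(X)
print(X_filled)
```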
Categorical Encoding: Most algorithms cannot consume categorical data directly, so encoding converts it to a numeric format. These processes are partly scientific and partly intuitive, and they often require extensive trial and error.
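A minimal sketch of one-hot encoding with pandas' get_dummies; the city column and its values are illustrative:

```python
import pandas as pd

# Illustrative categorical column.
df = pd.DataFrame({"city": ["NYC", "LA", "NYC", "Chicago"]})

# One-hot encoding: one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["city"], prefix="city")
print(encoded)
```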
The main objective is to make the data better serve the model's task of uncovering relationships and patterns.