Best Practices for Feature Engineering


Feature engineering, the process of creating new input features for machine learning, is one of the most effective ways to improve predictive models.

Coming up with features is difficult, time-consuming, requires expert knowledge. “Applied machine learning” is basically feature engineering. ~ Andrew Ng

Through feature engineering, you can isolate key information, highlight patterns, and bring in domain expertise.

Unsurprisingly, it can be easy to get stuck because feature engineering is so open-ended.

In this guide, we’ll discuss 20 best practices and heuristics that will help you navigate feature engineering.

What is Feature Engineering?

Feature engineering is an informal topic, and there are many possible definitions. The machine learning workflow is fluid and iterative, so there’s no one “right answer.”

We explain our approach in more detail in our free 7-day email crash course.

In a nutshell, we define feature engineering as creating new features from your existing ones to improve model performance.

A typical data science process might look like this:

  1. Project Scoping / Data Collection
  2. Exploratory Analysis
  3. Data Cleaning
  4. Feature Engineering
  5. Model Training (including cross-validation to tune hyper-parameters)
  6. Project Delivery / Insights

What is Not Feature Engineering?

That means there are certain steps, such as initial data collection, data cleaning, scaling or normalization, and feature selection, that we do not consider to be feature engineering.

Again, this is simply our categorization. Reasonable data scientists may disagree, and that’s perfectly fine.

With those disclaimers out of the way, let’s dive into the best practices and heuristics!

Indicator Variables

The first type of feature engineering involves using indicator variables to isolate key information.

Now, some of you may be wondering, “shouldn’t a good algorithm learn the key information on its own?”

Well, not always. It depends on the amount of data you have and the strength of competing signals. You can help your algorithm “focus” on what’s important by highlighting it beforehand.
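For example, here's a minimal pandas sketch (the dataset, columns, and thresholds are made up for illustration) of indicator variables built from a single threshold and from a combination of conditions:

```python
import pandas as pd

# Hypothetical real-estate dataset
df = pd.DataFrame({
    'beds':  [2, 3, 4, 1],
    'baths': [1, 2, 3, 1],
    'price': [180000, 250000, 420000, 90000],
})

# Indicator from a threshold: homes listed under a popular price point
df['below_200k'] = (df['price'] < 200000).astype(int)

# Indicator from multiple features: a "starter home" flag
df['starter_home'] = ((df['beds'] <= 2) & (df['baths'] <= 1)).astype(int)

print(df)
```

The model could, in principle, learn these cutoffs itself, but an explicit flag makes the signal available directly, which helps most when data is limited.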

Interaction Features

The next type of feature engineering involves highlighting interactions between two or more features.

Have you ever heard the phrase, "the whole is greater than the sum of its parts?" Well, some features can be combined to provide more information than they would as individuals.

Specifically, look for opportunities to take the sum, difference, product, or quotient of multiple features.

*Note: We don’t recommend using an automated loop to create interactions for all your features. This leads to “feature explosion.”*
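Instead, pick a few interactions by hand. As a sketch (the columns and ratios here are hypothetical, chosen because each combined feature has a clear interpretation):

```python
import pandas as pd

# Hypothetical online-advertising dataset
df = pd.DataFrame({
    'clicks':      [120, 300, 45],
    'impressions': [2400, 9000, 500],
    'spend':       [60.0, 210.0, 15.0],
})

# Quotient: click-through rate carries more signal than clicks or impressions alone
df['ctr'] = df['clicks'] / df['impressions']

# Quotient: cost per click combines spend and clicks into one meaningful number
df['cost_per_click'] = df['spend'] / df['clicks']

print(df)
```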

Feature Representation

This next type of feature engineering is simple yet impactful. It’s called feature representation.

Your data won’t always come in the ideal format. You should consider if you’d gain information by representing the same feature in a different way.
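As an illustrative sketch (hypothetical columns), two common re-representations are extracting simpler parts from a raw timestamp and grouping sparse categorical classes:

```python
import pandas as pd

# Hypothetical transactions dataset with a raw timestamp column
df = pd.DataFrame({
    'purchase_time': pd.to_datetime([
        '2018-01-05 09:30', '2018-01-06 22:15', '2018-01-07 14:00'
    ]),
    'browser': ['Chrome', 'Obscure Browser X', 'Chrome'],
})

# Re-represent the timestamp as simpler, more informative parts
df['purchase_hour'] = df['purchase_time'].dt.hour
df['purchase_dayofweek'] = df['purchase_time'].dt.dayofweek
df['is_weekend'] = (df['purchase_dayofweek'] >= 5).astype(int)

# Group sparse classes into a single "Other" category
common = ['Chrome', 'Firefox', 'Safari']
df['browser_grouped'] = df['browser'].where(df['browser'].isin(common), 'Other')

print(df)
```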

External Data

An underused type of feature engineering is bringing in external data. This can lead to some of the biggest breakthroughs in performance.

For example, one way quantitative hedge funds perform research is by layering together different streams of financial data.

Many machine learning problems can benefit from bringing in external data. Time series data, external APIs, geocoded location data, and other sources of the same data are all common examples.
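As a rough sketch (both tables below are made up), layering an external data stream onto your own data usually comes down to a merge on a shared key:

```python
import pandas as pd

# Hypothetical internal sales data
sales = pd.DataFrame({
    'month': ['2018-01', '2018-02', '2018-03'],
    'units_sold': [140, 155, 172],
})

# Hypothetical external data, e.g. a public economic indicator by month
external = pd.DataFrame({
    'month': ['2018-01', '2018-02', '2018-03'],
    'consumer_confidence': [96.3, 97.1, 95.8],
})

# Join the external stream onto the internal data as new features
merged = sales.merge(external, on='month', how='left')
print(merged)
```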

Error Analysis (Post-Modeling)

The final type of feature engineering we’ll cover falls under a process called error analysis. This is performed after training your first model.

Error analysis is a broad term that refers to analyzing the misclassified or high error observations from your model and deciding on your next steps for improvement.

Possible next steps include collecting more data, splitting the problem apart, or engineering new features that address the errors. To use error analysis for feature engineering, you’ll need to understand why your model missed its mark.

Here’s how: start with the observations that have the largest errors, segment them to look for shared patterns (by class, by key feature values, or with unsupervised clustering), and ask colleagues or domain experts whether those observations have something in common that your current features don’t capture.
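As a minimal sketch (the data, features, and model here are placeholders), one way to surface the observations worth studying is to sort a validation set by absolute error:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Hypothetical dataset: predict price (in $1000s) from two features
df = pd.DataFrame({
    'sqft':  [800, 950, 1100, 1300, 1500, 1700, 2000, 2400],
    'beds':  [2,   2,   3,    3,    3,    4,    4,    5],
    'price': [150, 170, 210,  250,  280,  310,  380,  460],
})

X = df[['sqft', 'beds']]
y = df['price']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Attach absolute errors to the validation observations and inspect the worst ones
errors = (y_val - model.predict(X_val)).abs()
worst = X_val.assign(actual=y_val, abs_error=errors).sort_values('abs_error', ascending=False)
print(worst.head())
```

The rows at the top of this table are the ones to examine for missing information that a new feature could capture.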

Conclusion

As you can see, there are many possibilities for feature engineering. We’ve covered 20 best practices and heuristics, but they are by no means exhaustive!

Remember these general guidelines as you start to experiment on your own:

Good features to engineer…

Finally, don’t worry if this feels overwhelming right now! You’ll naturally get better at feature engineering through practice and experience.

In fact, if this is your first exposure to some of these tactics, we highly recommend picking up a dataset and solidifying what you’ve learned. There are plenty of other resources out there that can help you on your journey.

Have any questions about feature engineering? Did we miss one of your favorite heuristics? Let us know in the comments!

Source: https://elitedatascience.com/feature-engineering-best-practices
