[PYTHON] Feature Engineering for Machine Learning with Google Colaboratory, Part 4: Interaction Features

Introduction

This article explains interaction features. It is mainly based on the book "Feature Engineering for Machine Learning". Please check it out if you are interested.

What is an interaction feature?

An interaction feature is a new feature created by multiplying two or more existing features together. When exactly two features are combined, the result is called a **pairwise interaction feature**. If the features are binary, the product is equivalent to a logical AND. For example, given region and age group as features, multiplying "lives in Tokyo" by "in their 20s" yields a new feature, "in their 20s and living in Tokyo", which may describe the target variable better than either feature alone.
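The Tokyo example can be sketched in a few lines. This is a minimal illustration only: the column names `lives_in_tokyo`, `age_20s`, and `tokyo_and_20s` and the four rows of data are hypothetical and do not come from the original article.

```python
import pandas as pd

# Hypothetical toy data: two binary (0/1) indicator features.
df = pd.DataFrame({
    "lives_in_tokyo": [1, 1, 0, 0],
    "age_20s":        [1, 0, 1, 0],
})

# For 0/1 features, the product is 1 only when both features are 1,
# which is exactly the logical AND.
df["tokyo_and_20s"] = df["lives_in_tokyo"] * df["age_20s"]

print(df)
```

Only the first row, where both indicators are 1, gets a 1 in the new column.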

However, interaction features have drawbacks: the number of features grows quickly (the number of pairs grows quadratically with the number of input features), so training cost increases, and many of the generated features turn out to be unnecessary. Both problems can be mitigated by feature selection.
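To get a feel for how quickly the feature count grows, the following sketch asks scikit-learn's `PolynomialFeatures` how many output columns a degree-2 expansion would produce. The helper name `count_degree2_features` is made up for this illustration, and the all-zero input is only a placeholder that supplies the number of columns.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def count_degree2_features(n_features, interaction_only=False):
    """Return the number of columns a degree-2 expansion produces."""
    # The values do not matter here; only the number of columns does.
    X = np.zeros((1, n_features))
    poly = PolynomialFeatures(
        degree=2, include_bias=False, interaction_only=interaction_only
    )
    return poly.fit(X).n_output_features_


for n in (2, 10, 100):
    with_squares = count_degree2_features(n)
    pairs_only = count_degree2_features(n, interaction_only=True)
    print(f"{n} inputs: {with_squares} outputs with squares, {pairs_only} without")
```

With 100 input features the expansion already exceeds 5,000 columns, which is why feature selection is usually applied afterwards.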

For example, suppose you have the following feature data.

image.png

Creating interaction features from this data produces the following data set.

image.png

Below is sample code that generates interaction features.

import numpy as np
import pandas as pd
import sklearn.preprocessing as preproc

# Fix the random seed for reproducibility
np.random.seed(100)

data_array1 = []
for i in range(1, 100):
  s = np.random.randint(0, i * 10, 10)
  data_array1.extend(s)


# Fix the random seed (a different seed, so B differs from A)
np.random.seed(20)

data_array2 = []
for i in range(1, 100):
  s = np.random.randint(0, i * 10, 10)
  data_array2.extend(s)

data = pd.DataFrame({'A': data_array1, 'B': data_array2})

# Interaction features: the default degree-2 expansion yields A, B, A^2, A*B, B^2
data2 = pd.DataFrame(preproc.PolynomialFeatures(include_bias=False).fit_transform(data))
# Setting interaction_only=True drops the squared terms (A^2, B^2) and keeps only A, B, A*B.
# data2 = pd.DataFrame(preproc.PolynomialFeatures(include_bias=False, interaction_only=True).fit_transform(data))

Conclusion

I plan to post videos about IT topics on YouTube. If you found this article useful, please like the videos and subscribe to the channel; it motivates me to keep updating both YouTube and Qiita. YouTube: https://www.youtube.com/channel/UCywlrxt0nEdJGYtDBPW-peg Twitter: https://twitter.com/tatelabo
