Introduction to AI creation with Python! Part 2 I tried to predict the house price in Boston with a neural network

About this article

Using "regression", one of the basic tasks in deep learning, I would like to write a program that predicts house prices. I will write from a beginner's point of view as much as possible. The previous article (Part 1) covered classification.

What is regression?

Regression is the task of predicting numerical values from feature data. This time we will create a program to predict house prices, but the same approach can also be applied to predicting price movements of stocks and FX (foreign exchange trading).

Import

The imports are below. This time we also import pandas to inspect the data.

By the way, pandas is convenient to work with, but it is generally slower than NumPy for raw numerical work. A common division of labor is NumPy for training, and pandas for visual confirmation and data preprocessing.

from tensorflow.keras.datasets import boston_housing
from tensorflow.keras.layers import Activation, Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Downloading the house price data

This time, we will use a dataset called boston_housing, which ships with Keras, to predict house prices. boston_housing contains feature information for homes in the Boston, USA area together with the correct labels (the house prices, in units of $1,000). The feature information (hereinafter referred to as explanatory variables) includes things such as the crime rate and accessibility of the area.

Naturally, if the explanatory variables contain irrelevant information, the prediction accuracy will be poor. For example, if you added the number of nearby pachinko parlors as an explanatory variable, only pachinko fans would see any value in it, so it would get in the way of the prediction. We have to use variables that matter to everyone.

The most difficult part of regression is choosing these explanatory variables. This time it is easy, because they are already included in the downloaded data.

The download code is below. The downloaded explanatory variables and correct labels are already split into training and verification sets: (train_data, train_labels) is for training and (test_data, test_labels) is for verification. This part is the same as in the classification example of the previous article.

(train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()

Checking the shapes of the arrays shows that there are 404 training samples and 102 verification samples, each with 13 explanatory variables. That is far fewer than in the classification example of the previous article, few enough to make me unsure whether the prices can really be predicted.
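
The shapes can be read from the shape attribute of each array. The sketch below uses zero-filled stand-in arrays with the sizes quoted above so that it runs without downloading anything; in the notebook you would pass the arrays returned by boston_housing.load_data() instead. The helper name summarize_shapes and the demo_ variable names are made up for this illustration.

```python
import numpy as np

def summarize_shapes(**arrays):
    """Return a {name: shape} dict for the given NumPy arrays (hypothetical helper)."""
    return {name: arr.shape for name, arr in arrays.items()}

# Stand-in arrays (assumption): same sizes as the article reports, filled with zeros.
demo_train_data = np.zeros((404, 13))
demo_train_labels = np.zeros(404)
demo_test_data = np.zeros((102, 13))
demo_test_labels = np.zeros(102)

shapes = summarize_shapes(
    train_data=demo_train_data,
    train_labels=demo_train_labels,
    test_data=demo_test_data,
    test_labels=demo_test_labels,
)
for name, shape in shapes.items():
    print(name, shape)
```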

Data preprocessing

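Next is the simplest and most important task, data preprocessing. The house price data is not a time series, so the order of the samples carries no meaning. For data like this, it is safer to shuffle the samples before training, especially because validation_split (used later) takes the last rows of the training data as validation data.

Use np.random.random() to generate random numbers, then np.argsort() to turn them into a random ordering of indices, and reorder the data and labels with that ordering.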

order = np.argsort(np.random.random(train_labels.shape))
train_data = train_data[order]
train_labels = train_labels[order]
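
To see that this kind of shuffle keeps each row paired with its own label, here is a small sketch on toy data. The toy arrays and the seeded generator are assumptions made only for this illustration; the indexing pattern is the same as in the code above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the demo is repeatable

# Toy data: row i is [i, i] and its label is 10 * i, so pairs are easy to check.
data = np.array([[i, i] for i in range(5)], dtype=float)
labels = np.arange(5) * 10.0

# Same idea as in the article: argsort of random numbers gives a random ordering.
order = np.argsort(rng.random(labels.shape))
shuffled_data = data[order]
shuffled_labels = labels[order]

# Every row still sits next to its own label after shuffling.
pairs_intact = bool(np.all(shuffled_data[:, 0] * 10.0 == shuffled_labels))
print(order, pairs_intact)
```

np.random.permutation(len(labels)) would produce a random ordering directly and could be used in place of the argsort step.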

Next, normalize the explanatory variables that determine the house price. Here, normalization means rescaling each explanatory variable so that it has a mean of 0 and a variance of 1 (this is also called standardization).

In regression, the prediction can be dominated by explanatory variables that happen to have large numerical values. Normalizing in this way is generally considered good practice.

Normalization is done by subtracting the mean from the data you want to normalize and dividing by the standard deviation. Note that the test data is normalized with the mean and standard deviation of the training data. The code is below.

mean = train_data.mean(axis=0)
std = train_data.std(axis=0)
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std

Use pandas to make sure the explanatory variables are normalized.

#Checking the data after preprocessing the dataset
column_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',  'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
df = pd.DataFrame(train_data, columns=column_names)
df.head()
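
Looking at the first rows only gives a rough impression; the column means and standard deviations can also be checked directly. The following is a minimal sketch on made-up data with two features on very different scales (an assumption, not the Boston data). It also shows that the test set, scaled with the training statistics, ends up close to but not exactly at mean 0 and standard deviation 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up features on very different scales (assumption, not the Boston data).
train = rng.normal(loc=[5.0, 300.0], scale=[2.0, 80.0], size=(200, 2))
test = rng.normal(loc=[5.0, 300.0], scale=[2.0, 80.0], size=(50, 2))

# Statistics come from the training data only.
mean = train.mean(axis=0)
std = train.std(axis=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training mean and std

print("train mean:", train_scaled.mean(axis=0).round(3))
print("train std: ", train_scaled.std(axis=0).round(3))
print("test mean: ", test_scaled.mean(axis=0).round(3))
print("test std:  ", test_scaled.std(axis=0).round(3))
```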

Creating a model

Create the neural network model. This time, we will stack three fully connected layers.

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(13,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(1))

model.compile(loss='mse', optimizer=Adam(lr=0.001), metrics=['mae']) #compile

I will explain each line. First, model = Sequential() creates a sequential model.

model.add(Dense(64, activation='relu', input_shape=(13,))) is the first layer, which receives the input. Dense: fully connected, Number of units: 64, Activation function: ReLU, Number of explanatory variables (inputs): 13

model.add(Dense(64, activation='relu')) is a hidden layer. Dense: fully connected, Number of units: 64, Activation function: ReLU

model.add(Dense(1)) is the output layer. Since we are predicting a single number, it has only one unit (one output) and no activation function.
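
As a rough sanity check on the size of this network, the number of trainable parameters can be counted by hand: each fully connected layer has one weight per input-unit pair plus one bias per unit. The helper dense_params below is a made-up name for this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# 13 inputs -> 64 units -> 64 units -> 1 output, as in the model above.
layer_sizes = [13, 64, 64, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [896, 4160, 65] 5121
```

Calling model.summary() on the model should report the same per-layer counts.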

Compiling the model

Compile with model.compile(loss='mse', optimizer=Adam(lr=0.001), metrics=['mae']). loss='mse' sets the loss function, which measures the error between the predicted values and the actual values. Here we use mse (mean squared error), which is commonly used for regression.

optimizer=Adam(lr=0.001) sets Adam as the optimization function that reduces the error, with a learning rate of 0.001. In recent Keras versions this argument is spelled learning_rate rather than lr.

metrics=['mae'] sets mae (mean absolute error) as the evaluation function used to assess the performance of the model.
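
To make the two abbreviations concrete, here is a small hand-rolled sketch of mse and mae on three made-up values (the numbers are illustrative only). Keras computes these internally; the functions below are not its implementation.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up prices in units of $1,000 (assumption for illustration).
y_true = [22.0, 30.0, 15.0]
y_pred = [20.0, 33.0, 15.0]

print(mse(y_true, y_pred))  # (4 + 9 + 0) / 3
print(mae(y_true, y_pred))  # (2 + 3 + 0) / 3
```

Because mse squares each difference, large misses are penalized more heavily, while mae stays in the same unit as the labels and is easier to read.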

Learning

This time I would like to train using EarlyStopping. With EarlyStopping, training stops automatically if no improvement is seen for the specified number of epochs. This time, training stops if val_loss does not improve for 30 epochs (patience=30).

For training, set the maximum number of epochs to 500, hold out 20% of the training data as validation data with validation_split=0.2, and enable EarlyStopping with callbacks=[early_stop].

#Prepare for Early Stopping
early_stop = EarlyStopping(monitor='val_loss', patience=30)

#Execution of learning
history = model.fit(train_data, train_labels, epochs=500, 
    validation_split=0.2, callbacks=[early_stop])
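
The patience idea can be sketched in a few lines of plain Python. This is a simplified illustration, not Keras's actual implementation (it ignores options such as min_delta), and the function name stopping_epoch and the loss values are made up.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: the best value is reached at epoch 2.
losses = [5.0, 4.0, 3.0, 3.1, 3.2, 3.3]

print(stopping_epoch(losses, patience=2))  # 4
print(stopping_epoch(losses, patience=5))  # None
```

A larger patience tolerates longer plateaus at the cost of extra epochs after the best one.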

Here is how to read the training log. loss is the error (mse) on the training data. mae is the mean absolute error on the training data. val_loss is the error (mse) on the validation data. val_mae is the mean absolute error on the validation data. For all four, the closer the value is to 0, the better the result.

I set the number of epochs to 500, but training most likely stopped earlier because no further improvement was seen along the way.

Display learning results in a graph

Plot history.history, which stores the per-epoch training results, with matplotlib.

plt.plot(history.history['mae'], label='train mae')
plt.plot(history.history['val_mae'], label='val mae')
plt.xlabel('epoch')
plt.ylabel('mae [1000$]')
plt.legend(loc='best')
plt.ylim([0,5])
plt.show()

Evaluation of learning

Evaluate the model on the test data with model.evaluate.

test_loss, test_mae = model.evaluate(test_data, test_labels)
print('loss:{:.3f}\nmae: {:.3f}'.format(test_loss, test_mae))

The result is a little worse than on the training data, but roughly the same. It is impressive that such close numbers can be obtained from only about 400 samples. Perhaps the explanatory variables are well chosen.

Inference

Finally, let's output the predictions and check them. Display the correct labels first, then run inference. Since the output of inference is two-dimensional, convert it to one dimension with flatten().

#Display correct label
print(np.round(test_labels[0:10]))

#Display of inferred price
test_predictions = model.predict(test_data[0:10]).flatten()
print(np.round(test_predictions))

The predictions appear to be close to the correct labels.
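
Comparing two printed arrays by eye gets tedious, so the per-sample error can also be computed. The sketch below uses placeholder numbers (an assumption, not real model output) shaped like the result of model.predict, and shows what flatten() does to that shape.

```python
import numpy as np

# Placeholder numbers (assumption): NOT real model output.
labels = np.array([20.0, 25.0, 30.0])
raw_predictions = np.array([[21.0], [24.0], [33.0]])  # shape (3, 1), like model.predict output

predictions = raw_predictions.flatten()  # shape (3,)
abs_errors = np.abs(predictions - labels)

for label, pred, err in zip(labels, predictions, abs_errors):
    print(f"label={label:.0f}  predicted={pred:.0f}  abs error={err:.0f}")
print("mean abs error:", abs_errors.mean())
```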

Properties priced below this prediction may be selling for less than the market price. However, they may also be cheap for reasons that the explanatory variables cannot express (such as a rumored ghost). It is risky to buy or sell based on this prediction alone, but I think it can serve as a useful reference.
