[PYTHON] Shape DataFrame data with PyTorch into a form a neural network can train on

Overview

When you train or validate a neural network, the data has to be arranged in the form (data group, label), that is, the input features paired with their labels. Datasets such as MNIST and CIFAR10 already come in this form, but when you use your own data you have to build it yourself. It took me some time to work out how, so I am recording it here.

The basic method has already been answered in English, so if you read English, please see the following thread: Convert Pandas dataframe to PyTorch tensor?

This article adds context and explains things in a little more detail, so read only the parts you need.

Separate your data into data and labels

Pull the label from the DataFrame (label column name: target)

train_label = torch.tensor(train['target'].values)

Source: Convert Pandas dataframe to PyTorch tensor?

Pull the data group (every column except the label) from the labeled DataFrame

train_data = torch.tensor(train.drop('target', axis=1).values)

Source: Convert Pandas dataframe to PyTorch tensor?
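To make the two steps above concrete, here is a minimal sketch on a small toy DataFrame. The column names feat_a and feat_b and all values are made up for illustration; only the target column name comes from this article.

```python
import pandas as pd
import torch

# Hypothetical toy DataFrame; only the 'target' column name comes from the article.
train = pd.DataFrame({
    "feat_a": [0.1, 0.2, 0.3, 0.4],
    "feat_b": [1.0, 2.0, 3.0, 4.0],
    "target": [0, 1, 0, 1],
})

# Labels: one value per row.
train_label = torch.tensor(train["target"].values)

# Data group: every column except the label.
train_data = torch.tensor(train.drop("target", axis=1).values)

print(train_label.shape, train_label.dtype)
print(train_data.shape, train_data.dtype)
```

Both tensors share the same first dimension (the number of rows). Note that decimal columns in pandas are float64 by default, so the data tensor is float64 as well, while PyTorch layers use float32 parameters by default; a cast is usually needed before training.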

However, this can result in the following error:

Error


TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.

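As the message says, an np.ndarray whose dtype is object cannot be converted to a torch.tensor. NumPy falls back to the object dtype when the values cannot be stored as a single numeric type, for example when the DataFrame contains string columns or a mixture of types. The data therefore has to be cast to one of the supported types listed in the error. For example, let's unify it to float32. Rewriting the line as follows works, provided every value can be interpreted as a number (np is NumPy, imported with import numpy as np).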

train_data = torch.tensor(train.drop('target', axis=1).values.astype(np.float32))

Source: Convert Pandas dataframe to PyTorch tensor?

Use the data type that suits your data. For the available types, see the article "NumPy data type dtype list and conversion by astype (cast)".
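The failure and the fix can be reproduced with a small sketch. The DataFrame below is hypothetical: it assumes one numeric column was read in as strings, which is a common way to end up with an object array.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical DataFrame in which feat_a holds numbers stored as strings.
X = pd.DataFrame({
    "feat_a": ["0.1", "0.2", "0.3"],
    "feat_b": [1, 2, 3],
    "target": [0, 1, 0],
})

features = X.drop("target", axis=1)

# Inspect the dtypes first: a string column next to a numeric one
# forces the combined NumPy array to dtype object.
print(features.dtypes)
print(features.values.dtype)

# Converting the object array directly fails.
try:
    torch.tensor(features.values)
    conversion_failed = False
except TypeError as err:
    conversion_failed = True
    print("conversion failed:", err)

# Casting to a single numeric dtype first makes the conversion work.
train_data = torch.tensor(features.values.astype(np.float32))
print(train_data.dtype, train_data.shape)
```

If a column contains values that are not numeric at all (for example category names), astype raises an error instead, and that column has to be encoded or dropped before the cast.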

Convert a DataFrame that contains only the data group to a tensor

Alternatively, if you have an unlabeled DataFrame (X_train) from the start, convert it to an np.array and then to a tensor. Here 'f' is NumPy's shorthand for float32.

train_data = torch.tensor(np.array(X_train.astype('f')))
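As a quick check of the line above, the following sketch runs it on a hypothetical X_train (the column names and values are made up) and compares the result with an equivalent spelling that uses DataFrame.to_numpy with an explicit dtype.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical unlabeled DataFrame with one integer and one float column.
X_train = pd.DataFrame({"feat_a": [1, 2, 3], "feat_b": [0.5, 1.5, 2.5]})

# The article's form: cast with the 'f' (float32) shorthand, then convert.
train_data = torch.tensor(np.array(X_train.astype("f")))

# An equivalent form that names the dtype explicitly.
alt_data = torch.tensor(X_train.to_numpy(dtype=np.float32))

print(train_data.dtype, train_data.shape)
print(torch.equal(train_data, alt_data))
```

Which spelling to use is a matter of taste; the explicit np.float32 is arguably easier to read than the one-letter code.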

Format data into (data group, label)

from torch.utils import data
train_tensor = data.TensorDataset(train_data, train_label) 

The torch.utils.data.TensorDataset class takes tensors as arguments and bundles them into a dataset by indexing each tensor along its first dimension. Naturally, you get an error if the number of data rows and the number of labels do not match. Reference: PyTorch official tutorial
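The behaviour described above can be seen in a short sketch. The tensors here are made-up stand-ins for train_data and train_label, and the DataLoader step is an addition that the article does not cover; it is shown only because it is the usual next step before training.

```python
import torch
from torch.utils import data

# Made-up stand-ins: 6 samples with 2 features each, and 6 labels.
train_data = torch.arange(12, dtype=torch.float32).reshape(6, 2)
train_label = torch.tensor([0, 1, 0, 1, 0, 1])

train_tensor = data.TensorDataset(train_data, train_label)

# Indexing returns one (data, label) pair taken along the first dimension.
x0, y0 = train_tensor[0]
print(len(train_tensor), x0, y0)

# A DataLoader (not part of the article) serves the pairs in mini-batches.
loader = data.DataLoader(train_tensor, batch_size=4, shuffle=False)
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]
print(batch_shapes)

# Mismatched first dimensions are rejected when the dataset is built.
try:
    data.TensorDataset(train_data, train_label[:5])
    mismatch_raised = False
except AssertionError:
    mismatch_raised = True
print("size mismatch rejected:", mismatch_raised)
```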

That's all.
