[PYTHON] Data set generation

So far I have only studied with datasets that were already provided, so this time I decided to create one myself. I had collected images of men and women, so I built a dataset from them!

Execution environment

macOS, Python 3.6 (Anaconda), VSCode

How to turn an image into a dataset

I referred to the following article when creating the dataset.

[How to create a dataset from an original image](https://intellectual-curiosity.tokyo/2019/07/02/%E3%82%AA%E3%83%AA%E3%82%B8%E3%83%8A%E3%83%AB%E3%81%AE%E7%94%BB%E5%83%8F%E3%81%8B%E3%82%89%E3%83%87%E3%83%BC%E3%82%BF%E3%82%BB%E3%83%83%E3%83%88%E3%82%92%E4%BD%9C%E6%88%90%E3%81%99%E3%82%8B%E6%96%B9/)

This time I used four image folders: two of men and two of women.

The flow for creating the dataset, in words:

① Take the files out of each image folder one by one and convert them to grayscale.

② While doing so, record the index number of the folder each file came from.

③ Resize the grayscale image to the size you want.

④ **Store the results in a list as [image, index number of the folder containing the image].**
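The four steps above can be sketched roughly as follows. This is only a minimal sketch, not the code from the referenced article: the names `build_training_data` and `load_gray_resized`, the parameters, and the default size of 50 are all assumptions made for illustration. The default loader assumes OpenCV is installed and uses `cv2.imread` with `cv2.IMREAD_GRAYSCALE` followed by `cv2.resize`; it is passed in as a parameter so that the folder-walking logic can be tried without OpenCV.

```python
import os


def load_gray_resized(path, img_size):
    """Default loader (assumes OpenCV is installed): steps 1 and 3."""
    import cv2  # imported lazily so the rest of the sketch runs without OpenCV

    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # returns None on failure
    if img is None:
        raise ValueError(f"could not read image: {path}")
    return cv2.resize(img, (img_size, img_size))


def build_training_data(data_dir, categories, img_size=50, loader=load_gray_resized):
    """Return a list of [image, folder index] pairs."""
    training_data = []
    for class_num, category in enumerate(categories):  # step 2: folder index
        folder = os.path.join(data_dir, category)
        for file_name in sorted(os.listdir(folder)):
            path = os.path.join(folder, file_name)
            try:
                image = loader(path, img_size)  # steps 1 and 3
            except Exception:
                continue  # skip files that cannot be read as images
            training_data.append([image, class_num])  # step 4
    return training_data
```

With four folders, a call such as `build_training_data("images", ["man_1", "man_2", "woman_1", "woman_2"])` would label them 0 to 3. The directory and folder names here are placeholders.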

Where I got stuck

When I wanted to retrieve the values stored in step ④, I was surprised: I had not known that writing two variables in a for statement unpacks each [image, index number] pair, so the image and the label can be retrieved separately.

for feature, label in training_data:
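As a small illustration of this unpacking, here is a sketch with placeholder data. The strings stand in for image arrays, and the list names `features` and `labels` are assumptions for the example.

```python
# Placeholder data: each entry is [image, label]; strings stand in for image arrays.
training_data = [["img_a", 0], ["img_b", 1], ["img_c", 0]]

features = []
labels = []
for feature, label in training_data:  # each pair is unpacked into two variables
    features.append(feature)
    labels.append(label)

print(features)  # ['img_a', 'img_b', 'img_c']
print(labels)    # [0, 1, 0]
```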

From here on, I will leave notes on the functions I used as a memo.

os.listdir(A)

Returns a list of the names of the files and subdirectories in directory A.
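A minimal sketch of `os.listdir`, using a temporary directory so it can be run anywhere. The file names are placeholders. Note that the order of the returned names is not guaranteed, so they are sorted here.

```python
import os
import tempfile

# Create a temporary directory with two empty files so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    for name in ("cat.jpg", "dog.jpg"):
        open(os.path.join(folder, name), "w").close()

    # os.listdir returns names only (no directory part), in arbitrary order.
    file_names = sorted(os.listdir(folder))

print(file_names)  # ['cat.jpg', 'dog.jpg']
```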

os.path.join(A, B)

Generates a path that joins A and B with the path separator of the operating system.
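Because `os.listdir` returns file names without the directory part, `os.path.join` is typically used to rebuild the full path before reading a file. A small sketch, with placeholder folder and file names:

```python
import os

folder = os.path.join("images", "men")   # placeholder folder names
path = os.path.join(folder, "001.jpg")   # placeholder file name

print(path)  # images/men/001.jpg on macOS and Linux
```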

cv2.imread('image data')

Reads an image file and returns it as a NumPy array. If the file cannot be read, it returns None instead of raising an error.

enumerate(Variable name)

Used in a for statement, it yields each item as an **(index number, element)** pair.
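For example, `enumerate` can assign an index number to each folder name, which is how the label for each folder can be obtained. The folder names and the dictionary `index_of` below are placeholders for illustration.

```python
categories = ["man_1", "man_2", "woman_1", "woman_2"]  # placeholder folder names

index_of = {}
for index, name in enumerate(categories):  # index starts at 0
    index_of[name] = index
    print(index, name)
```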

try:
    ...  # processing that may raise an error
except Exception:
    pass  # ignore the error and continue

**Put the processing in try and write pass in except = even if the processing in try raises an error, it is ignored and execution continues.**
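A small runnable sketch of this pattern. The function name `to_int_or_skip` is made up for the example, and it catches only `ValueError` rather than every exception, which is usually safer because unexpected errors are not silently hidden.

```python
def to_int_or_skip(values):
    """Convert each value to int, skipping the ones that cannot be converted."""
    converted = []
    for value in values:
        try:
            converted.append(int(value))  # may raise ValueError
        except ValueError:
            pass  # ignore this value and move on to the next one
    return converted


print(to_int_or_skip(["1", "x", "3"]))  # [1, 3]
```

In the dataset code, the same idea lets the loop skip files that cannot be read as images.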

Finally, matplotlib could not display Japanese text. By following the article "Japanese display of matplotlib on Mac", I was able to get Japanese text to render.

import matplotlib as mpl
mpl.rcParams['font.family'] = 'AppleGothic'

With this setting, Japanese text is displayed correctly!

Future issues

This time, I really wanted to crop out only the faces of the men and women, but I could not figure out how, so for the time being I created the dataset from the images as they were.

Next, I would like to look into this again and build a dataset that includes a step for cropping out only the face.

Summary

For the time being, I checked that the dataset was separated correctly, and it worked. Building it myself showed me how a dataset is actually put together, which impressed me, and there was a lot to learn along the way.
