This is the fourth post in the series.
So far I have built an NN (part 2) that outputs the mean and standard deviation of given random numerical data, and an NN (part 3) that outputs the three parameters used to generate given normal-distribution waveform data.
This time I will build a convolutional neural network (CNN) that takes a two-dimensional image of a drawn circle and outputs the four parameters used to draw it: the x and y coordinates of the circle's center, the radius, and the pen line width.
The image data was created in Objective-C.
First, creating the image data. I draw a circle in a 50 x 50 pixel image, read back the pixel data, and write it to a file together with the four parameters used to draw the circle. Each record consists of 2504 comma-separated values: the first 2500 are pixel values from 0 to 1, and the remaining 4 are the x coordinate of the circle's center, the y coordinate, the radius, and the line width. Records are separated by a newline (\n).
I used the NSImage class in Objective-C to create the data.
4-001.c
//Image size is 50 pixels x 50 pixels
//Random numbers determine the center coordinates, radius, and line thickness.
//Radius is 5 to 25
//The center coordinates allow the circle to be included in the image.
//The line thickness is 0.2 to 5
#import <Cocoa/Cocoa.h>

uint8_t *pixelDataFromImage(NSImage *image); // prototype of the function used in main()

int main(int argc, const char * argv[]) {
    srand((unsigned int)time(NULL)); // seed the random number generator
    // Open the destination file (note: fopen() does not expand "~", so use a concrete path)
    char *fileName = "imageLearningData.txt";
    FILE *fp = fopen(fileName, "w");
    // Create 50,000 images
    for (int mmm = 0; mmm < 50000; mmm++) {
        @autoreleasepool { // drain autoreleased objects on every iteration
            double radius = (double)rand()/RAND_MAX*20 + 5;
            double x = (double)rand()/RAND_MAX*(50 - radius * 2) + radius;
            double y = (double)rand()/RAND_MAX*(50 - radius * 2) + radius;
            double lineWidth = (double)rand()/RAND_MAX*4.8 + 0.2;
            [NSBezierPath setDefaultLineWidth:lineWidth];
            NSImage *image = [[NSImage alloc] initWithSize:NSMakeSize(50, 50)];
            NSBezierPath *bezierPath = [NSBezierPath bezierPath];
            // Draw the circle
            [image lockFocus];
            [bezierPath appendBezierPathWithOvalInRect:NSMakeRect(x - radius, y - radius, radius * 2, radius * 2)];
            [bezierPath stroke];
            [image unlockFocus];
            uint8_t *pixels = pixelDataFromImage(image);
            // Write out the pixel data
            NSSize size = [image size];
            uint32_t width = (uint32_t)size.width;
            uint32_t height = (uint32_t)size.height;
            for (int iii = 0; iii < height; iii++) {
                for (int kkk = 0; kkk < width; kkk++) {
                    // Average the R, G, B channels (4 components per pixel) into one value from 0 to 1
                    double value = 0;
                    value += pixels[(width * iii + kkk)*4    ]/255.0;
                    value += pixels[(width * iii + kkk)*4 + 1]/255.0;
                    value += pixels[(width * iii + kkk)*4 + 2]/255.0;
                    value /= 3;
                    // To keep the output file small, write "1" instead of "1.000000" for white pixels
                    if (value == 1) {
                        fprintf(fp, "%d,", 1);
                    } else {
                        fprintf(fp, "%f,", value);
                    }
                }
            }
            fprintf(fp, "%f,%f,%f,%f\n", x, y, radius, lineWidth);
            free(pixels);
        }
    }
    fclose(fp);
    return 0;
}
The function pixelDataFromImage used in the code above is adapted from code by @shimacpyon. Thank you, shimacpyon.
4-002.c
uint8_t *pixelDataFromImage(NSImage *image) {
    /* Create an NSBitmapImageRep from the image's TIFF data */
    NSBitmapImageRep *bitmapRep = [NSBitmapImageRep imageRepWithData:[image TIFFRepresentation]];
    /* Remove the alpha channel, since we are saving as JPEG */
    [bitmapRep setAlpha:NO];
    /* Compression quality for storage */
    float quality = 1.0;
    /* Build the properties dictionary */
    NSDictionary *properties = [NSDictionary dictionaryWithObject:[NSNumber numberWithFloat:quality] forKey:NSImageCompressionFactor];
    /* Create the JPEG data */
    NSData *data = [bitmapRep representationUsingType:NSJPEGFileType properties:properties];
    // Create an NSImage again from the NSData
    NSImage *newImage = [[NSImage alloc] initWithData:data];
    if (newImage != nil) {
        NSSize size = [newImage size];
        uint32_t width = (uint32_t)size.width, height = (uint32_t)size.height, components = 4;
        uint8_t *pixels = (uint8_t *)malloc(width * height * components); // each component is 0 to 255
        if (pixels) {
            CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
            CGContextRef bitmapContext = CGBitmapContextCreate(pixels, width, height, 8, components * width, colorSpace, kCGImageAlphaPremultipliedLast);
            NSRect rect = NSMakeRect(0, 0, width, height);
            // CGImageForProposedRect:context:hints: expects an NSGraphicsContext, not a graphicsPort
            NSGraphicsContext *graphicsContext = [NSGraphicsContext currentContext];
            CGImageRef cgImage = [newImage CGImageForProposedRect:&rect context:graphicsContext hints:nil];
            CGContextDrawImage(bitmapContext, NSRectToCGRect(rect), cgImage);
            CGContextRelease(bitmapContext);
            CGColorSpaceRelease(colorSpace);
            return pixels;
        }
    }
    return NULL;
}
I could not extract pixel data directly from the NSImage, so I obtained JPEG data (NSData) from it, created a new NSImage from that data, and extracted the pixel data from the new image. There may be a smarter way, but I'll move on.
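As a quick sanity check on the generated file: each line should contain exactly 2504 comma-separated values. Here is a minimal sketch (assuming the file sits at ./imageLearningData.txt, as in the loading code below):

# Verify that every record has 2504 comma-separated fields
with open('./imageLearningData.txt') as f:
    for i, line in enumerate(f):
        n = len(line.rstrip('\n').split(','))
        assert n == 2504, 'line %d has %d fields' % (i, n)
print('format OK')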
Divide the created training data (2504 values x 50,000 rows) into four arrays: training inputs (d_training_x), training labels (d_training_y), evaluation inputs (d_test_x), and evaluation labels (d_test_y).
4-003.py
import numpy as np

d = np.loadtxt('./imageLearningData.txt', delimiter=',')
# -4: / :-4 select the last four columns / everything before them
d_training_x = d[:40000,:-4]
d_training_y = d[:40000,-4:]
d_test_x = d[40000:,:-4]
d_test_y = d[40000:,-4:]
# Reshape the inputs to (samples, height, width, channels) for the CNN
d_training_x = d_training_x.reshape(40000,50,50,1)
d_test_x = d_test_x.reshape(10000,50,50,1)
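To confirm that the reshape produced sensible images, it may help to display one sample. This quick check is not part of the original pipeline (note that the bitmap's row order may flip the image vertically relative to the drawing coordinates):

import matplotlib.pyplot as plt

# Show the first training image with its four parameters in the title
plt.imshow(d_training_x[0].reshape(50, 50), cmap='gray')
plt.title('x=%.1f, y=%.1f, r=%.1f, w=%.2f' % tuple(d_training_y[0]))
plt.show()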
Next, design the CNN. Since the training data are two-dimensional images, a convolutional neural network is used. I designed the architecture below largely by intuition.
4-004.py
import keras
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.pooling import MaxPool2D
from keras.optimizers import Adam
from keras.layers.core import Dense, Activation, Dropout, Flatten

# Model definition
model = Sequential()
model.add(Conv2D(32,5,input_shape=(50,50,1)))  # 32 filters, 5x5 kernel -> 46x46x32
model.add(Activation('tanh'))
model.add(Conv2D(32,3))                        # 32 filters, 3x3 kernel -> 44x44x32
model.add(Activation('relu'))
model.add(MaxPool2D(pool_size=(2,2)))          # -> 22x22x32
model.add(Conv2D(64,3))                        # 64 filters, 3x3 kernel -> 20x20x64
model.add(Activation('relu'))
model.add(MaxPool2D(pool_size=(2,2)))          # -> 10x10x64
model.add(Flatten())                           # -> 6400
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(4, activation='linear'))       # the four circle parameters
adam = Adam(lr=1e-4)
model.compile(optimizer=adam, loss='mean_squared_error', metrics=["accuracy"])
model.summary()
model.summary() reports 6,722,916 parameters. It looks like this will take some time.
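As a cross-check, that figure can be reproduced by hand. With no padding, the feature maps shrink from 50x50 to 46x46 (5x5 kernel), then 44x44, 22x22 after pooling, 20x20, and 10x10, so Flatten yields 10 x 10 x 64 = 6400 units:

# Conv2D params: (kernel_h * kernel_w * in_channels + 1 bias) * filters
# Dense params:  (inputs + 1 bias) * units
conv1 = (5*5*1  + 1) * 32       #      832
conv2 = (3*3*32 + 1) * 32       #    9,248
conv3 = (3*3*32 + 1) * 64       #   18,496
fc1   = (6400 + 1) * 1024       # 6,554,624
fc2   = (1024 + 1) * 128        #  131,200
fc3   = (128  + 1) * 64         #    8,256
out   = (64   + 1) * 4          #      260
print(conv1 + conv2 + conv3 + fc1 + fc2 + fc3 + out)  # -> 6722916

Now, let's start training.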
4-005.py
batch_size = 128 # 128 samples per mini-batch
epochs = 20
history = model.fit(d_training_x, d_training_y,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(d_test_x, d_test_y))
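Since training takes a while, it may be worth saving the trained model right after fitting so it can be reloaded without retraining. A sketch (the file name circle_cnn.h5 is just an example):

# Save architecture + weights to an HDF5 file; reload with load_model later
model.save('circle_cnn.h5')
# from keras.models import load_model
# model = load_model('circle_cnn.h5')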
It took 107 seconds per epoch. Let's graph the training progress: loss is the loss computed on the training data, and val_loss is the loss computed on the evaluation data.
4-006.py
#Drawing a graph
import matplotlib.pyplot as plt
plt.plot(history.history['loss'],label="loss")
plt.plot(history.history['val_loss'],label="val_loss")
plt.legend() #Show legend
plt.title("Can CNN learn to predict 4 parameters used to draw a circle?")
plt.xlabel("epoch")
plt.ylabel("Loss")
plt.show()
It looks like the network has learned well.
How accurate are its predictions? Let's feed the first 200 samples of the evaluation data into the trained CNN.
4-007.py
inp = d_test_x[:200,:]
out = d_test_y[:200,:]
pred = model.predict(inp, batch_size=1)
#Make a graph.
plt.title("Can NN deduce circle parameters?")
plt.scatter(out[:,0], pred[:,0],label = "x",marker='.', s=20,alpha=0.7)
plt.scatter(out[:,1], pred[:,1],label = "y",marker='.', s=20,color="green",alpha=0.7)
plt.scatter(out[:,2], pred[:,2],label = "r",marker='.', s=20,color="red",alpha=0.7)
plt.scatter(out[:,3], pred[:,3],label = "line width",marker='.', s=20,color="black",alpha=0.7)
plt.legend(fontsize=14) #Show legend
plt.xlabel("expected value")
plt.ylabel("prediction")
#The x = y line is omitted because it makes the plot hard to read
#x = np.arange(-1, 41, 0.01)
#y = x
#plt.plot(x, y,color="black")
plt.show()
The horizontal axis is the parameter value used when creating the circle data, and the vertical axis is the value the CNN output from the image data.
The closer the points lie to the $x = y$ line running from the lower left to the upper right, the better the output.
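To put a number on how close the points are to that line, one can also compute the mean absolute error of each parameter over the same 200 samples (a quick sketch using the out and pred arrays above):

# Mean absolute error per parameter, in the order [x, y, r, line width]
mae = np.mean(np.abs(pred - out), axis=0)
print('MAE (x, y, r, line width):', mae)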
It's not perfect, but the network seems to have learned quite a lot. It might do a little better with changes to the network configuration and so on.
It is now possible to take an image and output where the circle is and how large it is, as the parameters used to draw it.
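As a usage sketch, extracting the parameters for a single image looks like this (using the first test sample; any 50 x 50 grayscale image shaped the same way would do):

# Predict the four drawing parameters for one 50x50 image
img = d_test_x[0:1]  # shape (1, 50, 50, 1)
x, y, r, w = model.predict(img)[0]
print('center=(%.1f, %.1f), radius=%.1f, line width=%.2f' % (x, y, r, w))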
That's it for part 4 of the series!
Series: 1st Preparation, 2nd Mean and Standard Deviation, 3rd Normal Distribution, 4th Circles