This is the fourth post in the series.
So far I have built an NN (part 2) that outputs the mean and standard deviation of given random numerical data, and an NN (part 3) that outputs the three parameters used to generate given normal-distribution waveform data.
This time I will build a convolutional neural network (CNN) that takes a two-dimensional image of a drawn circle and outputs the four parameters used to draw it: the x and y coordinates of the circle's center, the radius, and the pen line width.
The image data was created in Objective-C.
First, creating the image data. I draw a circle in a 50 x 50 pixel image, read back the pixel data, and write it to a file together with the four parameters used to draw the circle. Each record consists of 2504 comma-separated values: the first 2500 are pixel values from 0 to 1, and the remaining 4 are the x coordinate of the circle's center, the y coordinate, the radius, and the line width. Records are separated by a newline (\n).
I used the NSImage class in Objective-C to create the data.
4-001.c
//Image size is 50 pixels x 50 pixels
//Random numbers determine the center coordinates, radius, and line thickness.
//Radius is 5 to 25
//The center coordinates allow the circle to be included in the image.
//The line thickness is 0.2 to 5
#import <Cocoa/Cocoa.h>

uint8_t *pixelDataFromImage(NSImage *image); // prototype of the function used in main()

int main(int argc, const char * argv[]) {
    srand((unsigned int)time(NULL)); // seed the random number generator
    // Open the destination file (note: fopen() does not expand "~", so use a concrete path)
    char *fileName = "imageLearningData.txt";
    FILE *fp = fopen(fileName, "w");
    // Create 50,000 images
    for (int mmm = 0; mmm < 50000; mmm++) {
        @autoreleasepool { // drain autoreleased objects on every iteration
            double radius = (double)rand()/RAND_MAX*20 + 5;
            double x = (double)rand()/RAND_MAX*(50 - radius * 2) + radius;
            double y = (double)rand()/RAND_MAX*(50 - radius * 2) + radius;
            double lineWidth = (double)rand()/RAND_MAX*4.8 + 0.2;
            [NSBezierPath setDefaultLineWidth:lineWidth];
            NSImage *image = [[NSImage alloc] initWithSize:NSMakeSize(50, 50)];
            NSBezierPath *bezierPath = [NSBezierPath bezierPath];
            // Draw the circle
            [image lockFocus];
            [bezierPath appendBezierPathWithOvalInRect:NSMakeRect(x - radius, y - radius, radius * 2, radius * 2)];
            [bezierPath stroke];
            [image unlockFocus];
            uint8_t *pixels = pixelDataFromImage(image);
            // Write out the pixel data
            NSSize size = [image size];
            uint32_t width = (uint32_t)size.width;
            uint32_t height = (uint32_t)size.height;
            for (int iii = 0; iii < height; iii++) {
                for (int kkk = 0; kkk < width; kkk++) {
                    // Average the R, G, B channels (4 components per pixel) into one value from 0 to 1
                    double value = 0;
                    value += pixels[(width * iii + kkk)*4    ]/255.0;
                    value += pixels[(width * iii + kkk)*4 + 1]/255.0;
                    value += pixels[(width * iii + kkk)*4 + 2]/255.0;
                    value /= 3;
                    // To keep the output file small, write "1" instead of "1.000000" for white pixels
                    if (value == 1) {
                        fprintf(fp, "%d,", 1);
                    } else {
                        fprintf(fp, "%f,", value);
                    }
                }
            }
            fprintf(fp, "%f,%f,%f,%f\n", x, y, radius, lineWidth);
            free(pixels);
        }
    }
    fclose(fp);
    return 0;
}
The function pixelDataFromImage used in the code above is adapted from code by @shimacpyon. Thank you, shimacpyon.
4-002.c
uint8_t *pixelDataFromImage(NSImage *image) {
    /* Create an NSBitmapImageRep from the image's TIFF data */
    NSBitmapImageRep *bitmapRep = [NSBitmapImageRep imageRepWithData:[image TIFFRepresentation]];
    /* Remove the alpha channel, since we are saving as JPEG */
    [bitmapRep setAlpha:NO];
    /* Compression quality for storage */
    float quality = 1.0;
    /* Build the properties dictionary */
    NSDictionary *properties = [NSDictionary dictionaryWithObject:[NSNumber numberWithFloat:quality] forKey:NSImageCompressionFactor];
    /* Create the JPEG data */
    NSData *data = [bitmapRep representationUsingType:NSJPEGFileType properties:properties];
    // Create an NSImage again from the NSData
    NSImage *newImage = [[NSImage alloc] initWithData:data];
    if (newImage != nil) {
        NSSize size = [newImage size];
        uint32_t width = (uint32_t)size.width, height = (uint32_t)size.height, components = 4;
        uint8_t *pixels = (uint8_t *)malloc(width * height * components); // each component is 0 to 255
        if (pixels) {
            CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
            CGContextRef bitmapContext = CGBitmapContextCreate(pixels, width, height, 8, components * width, colorSpace, kCGImageAlphaPremultipliedLast);
            NSRect rect = NSMakeRect(0, 0, width, height);
            // CGImageForProposedRect:context:hints: expects an NSGraphicsContext, not a graphicsPort
            NSGraphicsContext *graphicsContext = [NSGraphicsContext currentContext];
            CGImageRef cgImage = [newImage CGImageForProposedRect:&rect context:graphicsContext hints:nil];
            CGContextDrawImage(bitmapContext, NSRectToCGRect(rect), cgImage);
            CGContextRelease(bitmapContext);
            CGColorSpaceRelease(colorSpace);
            return pixels;
        }
    }
    return NULL;
}
I could not extract pixel data directly from the NSImage, so I obtained JPEG data (NSData) from it, created a new NSImage from that data, and extracted the pixel data from the new image. There may be a smarter way, but I'll move on.
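As a quick sanity check on the generated file: each line should contain exactly 2504 comma-separated values. Here is a minimal sketch (assuming the file sits at ./imageLearningData.txt, as in the loading code below):

# Verify that every record has 2504 comma-separated fields
with open('./imageLearningData.txt') as f:
    for i, line in enumerate(f):
        n = len(line.rstrip('\n').split(','))
        assert n == 2504, 'line %d has %d fields' % (i, n)
print('format OK')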
Divide the created training data (2504 values x 50,000 rows) into four arrays: training inputs (d_training_x), training labels (d_training_y), evaluation inputs (d_test_x), and evaluation labels (d_test_y).
4-003.py
import numpy as np

d = np.loadtxt('./imageLearningData.txt', delimiter=',')
# -4: / :-4 select the last four columns / everything before them
d_training_x = d[:40000,:-4]
d_training_y = d[:40000,-4:]
d_test_x = d[40000:,:-4]
d_test_y = d[40000:,-4:]
# Reshape the inputs to (samples, height, width, channels) for the CNN
d_training_x = d_training_x.reshape(40000,50,50,1)
d_test_x = d_test_x.reshape(10000,50,50,1)
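To confirm that the reshape produced sensible images, it may help to display one sample. This quick check is not part of the original pipeline (note that the bitmap's row order may flip the image vertically relative to the drawing coordinates):

import matplotlib.pyplot as plt

# Show the first training image with its four parameters in the title
plt.imshow(d_training_x[0].reshape(50, 50), cmap='gray')
plt.title('x=%.1f, y=%.1f, r=%.1f, w=%.2f' % tuple(d_training_y[0]))
plt.show()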
Next, design the CNN. Since the training data are two-dimensional images, a convolutional neural network is used. I designed the architecture below largely by intuition.
4-004.py
import keras
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.pooling import MaxPool2D
from keras.optimizers import Adam
from keras.layers.core import Dense, Activation, Dropout, Flatten

# Model definition
model = Sequential()
model.add(Conv2D(32,5,input_shape=(50,50,1)))  # 32 filters, 5x5 kernel -> 46x46x32
model.add(Activation('tanh'))
model.add(Conv2D(32,3))                        # 32 filters, 3x3 kernel -> 44x44x32
model.add(Activation('relu'))
model.add(MaxPool2D(pool_size=(2,2)))          # -> 22x22x32
model.add(Conv2D(64,3))                        # 64 filters, 3x3 kernel -> 20x20x64
model.add(Activation('relu'))
model.add(MaxPool2D(pool_size=(2,2)))          # -> 10x10x64
model.add(Flatten())                           # -> 6400
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(4, activation='linear'))       # the four circle parameters
adam = Adam(lr=1e-4)
model.compile(optimizer=adam, loss='mean_squared_error', metrics=["accuracy"])
model.summary()
model.summary() reports 6,722,916 parameters. It looks like this will take some time.
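As a cross-check, that figure can be reproduced by hand. With no padding, the feature maps shrink from 50x50 to 46x46 (5x5 kernel), then 44x44, 22x22 after pooling, 20x20, and 10x10, so Flatten yields 10 x 10 x 64 = 6400 units:

# Conv2D params: (kernel_h * kernel_w * in_channels + 1 bias) * filters
# Dense params:  (inputs + 1 bias) * units
conv1 = (5*5*1  + 1) * 32       #      832
conv2 = (3*3*32 + 1) * 32       #    9,248
conv3 = (3*3*32 + 1) * 64       #   18,496
fc1   = (6400 + 1) * 1024       # 6,554,624
fc2   = (1024 + 1) * 128        #  131,200
fc3   = (128  + 1) * 64         #    8,256
out   = (64   + 1) * 4          #      260
print(conv1 + conv2 + conv3 + fc1 + fc2 + fc3 + out)  # -> 6722916

Now, let's start training.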
4-005.py
batch_size = 128 # 128 samples per mini-batch
epochs = 20
history = model.fit(d_training_x, d_training_y,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(d_test_x, d_test_y))
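Since training takes a while, it may be worth saving the trained model right after fitting so it can be reloaded without retraining. A sketch (the file name circle_cnn.h5 is just an example):

# Save architecture + weights to an HDF5 file; reload with load_model later
model.save('circle_cnn.h5')
# from keras.models import load_model
# model = load_model('circle_cnn.h5')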
It took 107 seconds per epoch. Let's graph the training progress: loss is the loss computed on the training data, and val_loss is the loss computed on the evaluation data.
4-006.py
#Drawing a graph
import matplotlib.pyplot as plt
plt.plot(history.history['loss'],label="loss")
plt.plot(history.history['val_loss'],label="val_loss")
plt.legend() #Show legend
plt.title("Can CNN learn to predict 4 parameters used to draw a circle?")
plt.xlabel("epoch")
plt.ylabel("Loss")
plt.show()
It looks like the network has learned well.
How accurate are its predictions? Let's feed the first 200 samples of the evaluation data into the trained CNN.
4-007.py
inp = d_test_x[:200,:]
out = d_test_y[:200,:]
pred = model.predict(inp, batch_size=1)
#Make a graph.
plt.title("Can NN deduce circle parameters?")
plt.scatter(out[:,0], pred[:,0],label = "x",marker='.', s=20,alpha=0.7)
plt.scatter(out[:,1], pred[:,1],label = "y",marker='.', s=20,color="green",alpha=0.7)
plt.scatter(out[:,2], pred[:,2],label = "r",marker='.', s=20,color="red",alpha=0.7)
plt.scatter(out[:,3], pred[:,3],label = "line width",marker='.', s=20,color="black",alpha=0.7)
plt.legend(fontsize=14) #Show legend
plt.xlabel("expected value")
plt.ylabel("prediction")
#The x = y line is omitted because it makes the plot hard to read
#x = np.arange(-1, 41, 0.01)
#y = x
#plt.plot(x, y,color="black")
plt.show()
The horizontal axis is the parameter value used when creating the circle data, and the vertical axis is the value the CNN output from the image data.
The closer the points lie to the $x = y$ line running from the lower left to the upper right, the better the output.
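To put a number on how close the points are to that line, one can also compute the mean absolute error of each parameter over the same 200 samples (a quick sketch using the out and pred arrays above):

# Mean absolute error per parameter, in the order [x, y, r, line width]
mae = np.mean(np.abs(pred - out), axis=0)
print('MAE (x, y, r, line width):', mae)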
It's not perfect, but the network seems to have learned quite a lot. It might do a little better with changes to the network configuration and so on.
It is now possible to take an image and output where the circle is and how large it is, as the parameters used to draw it.
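As a usage sketch, extracting the parameters for a single image looks like this (using the first test sample; any 50 x 50 grayscale image shaped the same way would do):

# Predict the four drawing parameters for one 50x50 image
img = d_test_x[0:1]  # shape (1, 50, 50, 1)
x, y, r, w = model.predict(img)[0]
print('center=(%.1f, %.1f), radius=%.1f, line width=%.2f' % (x, y, r, w))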
That's it for part 4 of the series!
Series: 1st Preparation, 2nd Mean and Standard Deviation, 3rd Normal Distribution, 4th Circles