[PYTHON] Generate physically sturdy shapes with GAN and print with a 3D printer

0. The nice thing about this article

- You can see the whole flow of printing data generated by deep learning on a 3D printer. (In "10. Experimental code", **all of the code is uploaded to GitHub**; if you set up the environment with `git clone`, reproducing the results should be easy. How to set up the environment is described there.)
- It doubles as a study of strength of materials (I'm writing while studying it myself...)
- You can see how TensorFlow 2.0 tensor manipulation can be applied to strength of materials.

1. Overview

- DCGAN (Radford et al., 2016) [^DCGAN] is used, with strength information added to the loss function to generate high-strength digits.
- Strength can be manipulated while preserving the concept of the data.
- Strength can be raised or lowered through the design of the loss and its parameters.
- Discovered a trade-off between FID (Heusel et al., 2017) [^FID] (how well the diversity and shapes of the original data are maintained) and the strength of the generated images.
- Found potential applications to medical images and potential improvements to GAN itself.
- Printed the generated cross sections with a 3D printer.
- Named it Danmen-GAN ("danmen" is Japanese for cross section; it's hard to refer to it without a name).

(The article is long, but about three quarters of it is images or bonus material, so it should read quickly.)

2. Background

I recently got a 3D printer and wondered whether I could do something combining **3D printing × deep learning**, so I decided to give it a try.

The basic theory of strength of materials apparently hasn't changed in the last 100 years, so I wanted to approach it from the deep learning side, which has a much higher degree of freedom.

You may enjoy guessing which of the digits 0-9 turns out to be the sturdiest as you read. One digit is by far the strongest.

(* I'm working through a strength-of-materials textbook (Japan Society of Mechanical Engineers, 2007) [^JSME] as I write, so I may say something inaccurate; if you spot a mistake, I'll correct it. I couldn't find a similar study within the scope above, but if one exists, please let me know.)

[^JSME]: JSME Text Series: Strength of Materials, Japan Society of Mechanical Engineers, 2007.

3. Theory

If you're not interested, feel free to skip this section entirely.

3.1 Moment of inertia of area

To begin with, the everyday word "strength" covers multiple indicators. I don't know exactly how materials specialists use the word, but we, the general public, use "strength" in many senses.

Mohs hardness, Vickers hardness, yield stress (tensile strength), and so on. Among these, this article deals with **the moment of inertia of area**, which depends only on the shape of the cross section, not on the material. It is a measure of how hard a member is to bend.

When a force is applied, most things bend slightly, in proportion to the force, at an invisible level; bend them too far and they break, never to return. Conversely, a member with a stronger cross section is harder to break.

For example, a limp sheet of newspaper can't support anything, but simply rolling it up makes it reasonably stiff. (Though the effective thickness of the paper also plays a part.)

In Chapter 5, p. 63 of [^JSME], the Japan Society of Mechanical Engineers defines a beam as follows: "When a slender rod receives a lateral load that causes bending in a plane containing the axis of the rod, such a rod is called a beam."

↑ Cross sections shaped like MNIST [1] digits, the relationship between the moment of inertia of area and orientation, and an image of the 3D print

If you fix both ends of an elongated rod whose cross section is the "2" in this image, it becomes a "beam." With respect to the arrows in the figure, the moment of inertia of area against a vertical force is $I_x$, and against a horizontal force is $I_y$.

It's a little confusing that $x$ and $y$ are transposed relative to the usual image-processing axes, but Wikipedia [^wikipedia] does the same, so this is probably the standard convention.

Next, we will explain how to calculate the moment of inertia of area.

The moment of inertia of area $I_x$ against a vertical force can be calculated as

I_x = \int_A y^2 dA \tag{1}

where $A$ is the cross-sectional area and $y$ is the vertical distance from the neutral axis (the axis through the centroid of the cross section).

Similarly, the moment of inertia of area $I_y$ against a horizontal force is

I_y = \int_A x^2 dA \tag{2}

where $x$ is the horizontal distance from the neutral axis.

There is also an index for the cross section's strength against "torsion": the polar moment of inertia of area, which by the perpendicular-axis theorem is

I_r = I_x + I_y \tag{3}

Since it comes almost for free, this strength is also included in the experiments. "Torsion" is easy to picture as the rotational force applied when wringing out a rag.

All three of these indicators have units of m^4, because a squared distance [m^2] is integrated over an area [m^2].
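To make the definitions concrete, here is a minimal NumPy sketch of Eqs. (1)-(3) for a pixel image, treating each pixel value in $[0, 1]$ as the fraction of that pixel covered by material. The function name and details are my own; the article's actual TensorFlow implementation is in the Appendix.

```python
import numpy as np

def moments_of_area(img):
    """Discrete approximation of Eqs. (1)-(3) for a 2D pixel image.

    img: array of shape (h, w) with values in [0, 1]; each value is
    treated as the covered area dA of that pixel. Assumes img.sum() > 0.
    """
    h, w = img.shape
    area = img.sum()
    # Pixel-center coordinates along each axis.
    ys = np.arange(h) + 0.5
    xs = np.arange(w) + 0.5
    # The neutral axes pass through the centroid of the cross section.
    cy = (img * ys[:, None]).sum() / area
    cx = (img * xs[None, :]).sum() / area
    # I_x = integral of y^2 dA, I_y = integral of x^2 dA, I_r = I_x + I_y.
    I_x = (img * (ys[:, None] - cy) ** 2).sum()
    I_y = (img * (xs[None, :] - cx) ** 2).sum()
    return I_x, I_y, I_x + I_y
```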

Specifically, for a simple rectangular cross section, you can see intuitively that

- $I_x$ is proportional to (thickness cubed) $\times$ (width)
- $I_y$ is proportional to (thickness) $\times$ (width cubed)
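For reference, the exact textbook formulas for a solid rectangle of width $b$ and height (thickness) $h$, about axes through its centroid, are:

I_x = \frac{b h^{3}}{12}, \qquad I_y = \frac{h b^{3}}{12}

so doubling the thickness multiplies $I_x$ by eight while only doubling $I_y$.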

So what do these values look like for MNIST-style cross sections?

First, for intuition, let's visualize the moment of inertia of area of an image whose pixel values are all 1.0 (the maximum).

(figure: the all-ones image with heatmaps of $\Delta I_x$ and $\Delta I_y$; $I_x = 1.000$ (MAX), $I_y = 1.000$ (MAX))
Here $\Delta I_x$ is each pixel's contribution to the moment of inertia of area $I_x$, and $\Delta I_y$ is each pixel's contribution to $I_y$; summing them up gives $I_x$ and $I_y$.

Also, in this article we drop the metric units and normalize so that this filled square has the maximum moment of inertia of area, $1.0$.

Let's display a similar figure with some data from MNIST.

(For pixel values between 0.0 and 1.0, we approximate the covered area as proportional to the pixel value: a value of 0.5 means half the pixel has no cross section, and 0.1 means 90% is missing. * A different approximation is used at final printing time.)

(figure: a "0" from MNIST with $I_x = 0.112$, $I_y = 0.093$, and a "1" with $I_x = 0.060$, $I_y = 0.036$, each with heatmaps of $\Delta I_x$ and $\Delta I_y$)

"0" is high for both $ I_x $ and $ I_y $, but you can see that the strength in the vertical direction is particularly strong. On the contrary, "1" is low for both $ I_x $ and $ I_y $, and there is almost no lateral strength in particular.

In the moment of inertia of area, material far from the neutral axis contributes strongly, while material close to it contributes little. Seen in this light, the fact that most structures in the world are hollow (soft in the middle) owes a great deal to the moment of inertia of area.

Accordingly, typical cross sections that achieve high strength while saving material and weight are hollow circles and rectangles, and H-/I-beams (strong along one axis only). This time there are no constraints such as material or weight, so the toughest possible shape is the **filled square**, with every pixel value at the maximum.

So how can we raise strength while keeping the shapes of the digits?

3.2 Generative Adversarial Nets (Goodfellow et al., 2014)[2]

A basic Generative Adversarial Network (GAN) consists of two neural networks: a Generator, which outputs a distribution close to the data, and a Discriminator, which judges whether a given input is real data or something the Generator produced.

The more accurately the Discriminator judges, the more backpropagation adjusts the Generator's parameters, letting the Generator produce an ever more data-like distribution. Conversely, inaccurate judgments penalize the Discriminator itself, whose parameters are then adjusted to judge more accurately.

This comes from the formulation by Goodfellow et al. [^GAN] in Eq. (4):

\min_{G} \max_{D} V(D, G)=\mathbb{E}_{\boldsymbol{x} \sim p_{\text{data}}(\boldsymbol{x})}[\log D(\boldsymbol{x})]+\mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}(\boldsymbol{z})}[\log (1-D(G(\boldsymbol{z})))] \tag{4}

In a nutshell: applied to images of digits, the Generator learns to produce digit-like images, and if training succeeds it can even generate digits that don't exist in the dataset.

Many refinements of GAN theory and training have since been devised, but this work builds on the classic DCGAN [^DCGAN], which, in a nutshell, applies the basic GAN framework to convolutional neural networks.

3.3 Danmen-GAN

With the basics covered, let's write down the logic for creating sturdy digits.

In ordinary GAN training, the Generator's loss is Eq. (5): the Discriminator receives what the Generator produced from input noise, and the loss grows with the rate at which the Discriminator answers correctly.

\mathcal{L}_{G} = \mathbb{E} [\log (1 - D(G(z)))] \tag{5}

With this alone, however, training just produces random digits. So we incorporate a function $S(\cdot)$ that computes the moment of inertia of area from the cross-sectional shape of an input image, and define a new loss $\mathcal{L}_{S}$ that penalizes the Generator more, the lower the moment of inertia of area of the generated cross section.

\mathcal{L}_{S} = \mathbb{E} \left[\|1 - S(G(z))\|_{2}\right] \tag{6}

Further, if we split $S(\cdot)$ into the vertical axis $x$, the horizontal axis $y$, and the torsional axis $r$, and weight them with parameters $\alpha$, $\beta$, and $\gamma$, Eq. (6) becomes Eq. (7).

\mathcal{L}_{S} = \alpha \cdot \mathbb{E}\left[\|1 - S_x(G(z))\|_{2}\right] + \beta \cdot \mathbb{E}\left[\|1 - S_y(G(z))\|_{2}\right] + \gamma \cdot \mathbb{E}\left[\|1 - S_r(G(z))\|_{2}\right] \tag{7}

Finally, adding Eq. (7) to the Generator-side loss of the GAN gives Eq. (8), and we are done.

\mathcal{L}_{All} = \mathcal{L}_{G} + \mathcal{L}_{S} \tag{8}

The idea is that with Eq. (8) as the Generator's objective, the moment of inertia of area can be raised while the cross section keeps the shape of a digit.
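As a concrete illustration, here is a minimal TensorFlow 2.0 sketch of Eq. (8) in the common non-saturating form. The function name, the use of squared error for the $\|1 - S(\cdot)\|_2$ terms, and the weight values are my assumptions; the exact loss is in the repository.

```python
import tensorflow as tf

# Assumes the Discriminator outputs probabilities (sigmoid), not logits.
bce = tf.keras.losses.BinaryCrossentropy()

def generator_loss(d_fake, I_x, I_y, I_r, alpha=25.0, beta=0.0, gamma=0.0):
    """Sketch of Eq. (8) = adversarial loss (5) + strength loss (7).

    d_fake: Discriminator outputs for generated images, shape (batch, 1).
    I_x, I_y, I_r: normalized moments of inertia in [0, 1], shape (batch,).
    alpha/beta/gamma values here are illustrative only.
    """
    # L_G: push the Discriminator toward answering "real" (1) on fakes.
    l_g = bce(tf.ones_like(d_fake), d_fake)
    # L_S: penalize cross sections whose strength falls short of the max 1.0.
    l_s = (alpha * tf.reduce_mean(tf.square(1.0 - I_x))
           + beta * tf.reduce_mean(tf.square(1.0 - I_y))
           + gamma * tf.reduce_mean(tf.square(1.0 - I_r)))
    return l_g + l_s
```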

4. Data analysis

The dataset is MNIST [^MNIST]. We only need source data for generation, so only the training split is used.

First, let's measure how strong the contents of the dataset actually are.

4.1 Global strength

We computed the mean moment of inertia of area over the whole dataset for each axis, $\mathbb{E}[I_x]$, $\mathbb{E}[I_y]$, $\mathbb{E}[I_r]$, together with the corresponding standard deviations $\sigma_{Ix}$, $\sigma_{Iy}$, $\sigma_{Ir}$.

| $E[I_x]$ ($\sigma_{Ix}$) | $E[I_y]$ ($\sigma_{Iy}$) | $E[I_r]$ ($\sigma_{Ir}$) |
| --- | --- | --- |
| 0.088 (0.032) | 0.063 (0.034) | 0.076 (0.031) |

The table shows that the moment of inertia of area varies considerably across the data, and that vertical strength is greater than horizontal strength overall.
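These statistics can be reproduced approximately with the `moments_of_area` sketch from Section 3.1, normalized by the all-ones maximum. This is my sketch, not the article's pipeline (which uses the TF graph in the Appendix), so small numerical differences are expected.

```python
import numpy as np
import tensorflow as tf

# Load the MNIST training split and scale pixel values to [0, 1].
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
imgs = x_train.astype(np.float32) / 255.0

# Normalize by the all-ones square so the maximum is 1.0, as in Section 3.1.
max_I = moments_of_area(np.ones((28, 28), dtype=np.float32))

# Shape (60000, 3): columns are normalized I_x, I_y, I_r.
I = np.array([[v / m for v, m in zip(moments_of_area(im), max_I)]
              for im in imgs])

for name, col in zip(["I_x", "I_y", "I_r"], I.T):
    print(f"E[{name}] = {col.mean():.3f} (sigma = {col.std():.3f})")
```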

4.2 Strength for each number

Next, since shapes are biased per digit, we summarize the moment-of-inertia statistics for each digit, shown as a box plot plus a table. No outlier processing is applied.

| Number $n$ | $E[I_{xn}]$ ($\sigma_{Ixn}$) | $E[I_{yn}]$ ($\sigma_{Iyn}$) | $E[I_{rn}]$ ($\sigma_{Irn}$) |
| --- | --- | --- | --- |
| 0 | 0.121 (0.029) | 0.110 (0.034) | 0.116 (0.030) |
| 1 | 0.052 (0.015) | 0.020 (0.014) | 0.036 (0.013) |
| 2 | 0.107 (0.031) | 0.078 (0.027) | 0.093 (0.027) |
| 3 | 0.106 (0.028) | 0.066 (0.026) | 0.086 (0.026) |
| 4 | 0.064 (0.018) | 0.063 (0.026) | 0.064 (0.021) |
| 5 | 0.093 (0.031) | 0.065 (0.024) | 0.079 (0.026) |
| 6 | 0.083 (0.022) | 0.065 (0.028) | 0.074 (0.024) |
| 7 | 0.079 (0.021) | 0.053 (0.022) | 0.066 (0.020) |
| 8 | 0.105 (0.027) | 0.067 (0.027) | 0.086 (0.026) |
| 9 | 0.074 (0.019) | 0.054 (0.024) | 0.064 (0.021) |

From these, we can see the following.

- Basically, "0" is strong in both $I_x$ and $I_y$.
- The $I_y$ of "1" is overwhelmingly low.
- On a finer look, "2", "3", and "8" are also reasonably strong.
- The averages show a per-digit bias in the moment of inertia of area, but (except for "1") the per-image variation is large, so with enough effort another digit might beat "0".

4.3 Strongest and weakest

I looked up the strongest and the weakest digits with `argmax` / `argmin`.
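Continuing the hypothetical arrays from the Section 4.1 sketch, this is just:

```python
# I[:, 0] is the normalized I_x column from the Section 4.1 sketch.
strongest = np.argmax(I[:, 0])  # index of the strongest cross section
weakest = np.argmin(I[:, 0])    # index of the weakest
print(y_train[strongest], y_train[weakest])
```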

The strongest is naturally "0", but

allmax.png

Isn't that stroke too wide...? For the record, $I_x = 0.259$, $I_y = 0.244$, $I_r = 0.251$: the top in every category. Whoever wrote this can be proud.

Next, I will introduce the weakest of $ I_x $.

minIxa.png

It's "2", but it's crushed. I think it is also the most crushed number in terms of calculation. Looks weak. ($ I_x = 0.013 $)

And the weakest of $ I_y $

minIy.png

It's too thin; push it from the side and it will snap. ($I_y = 0.0015$, shown with an extra decimal place because the value is so small)

Finally, the weakest of $ I_r $

minIx.png

At first glance it resembles the previous "1", but it is slightly tilted. ($I_r = 0.010$) Machine learning datasets contain some real oddities.

5. Experiment

Let's experiment.

We built the model in TensorFlow 2.0 and trained the GAN with the Eq. (8) loss applied to the Generator. The model details are in the Appendix as a bonus, and all of the code is on GitHub.

- All experiments run on Colaboratory (K80)
- Computation time is roughly one hour per training run (mostly spent on FID)
- Batch size: 50
- Number of epochs: 20 (24,000 iterations in total)
- FID compared on 5,000 images
- No data augmentation

Version etc.

5.1 Various parameters and generation results

5.1.1 Danmen-GAN

"Minimum FID" is the lowest FID reached during training (smaller FID is better). FID is computed between the generated images and the training data.

GAN-I_x-to-1.0 alpha (2).png

The graph shows how $I_x$, FID, and $\frac{I_x}{FID}$ change as $\alpha$ varies ($\beta = \gamma = 0$). It makes clear that we gain moment of inertia of area at the cost of FID.

Changes in the output images ($\beta = \gamma = 0$). The further down the table, the stronger the vertical strength.

| $\alpha$ | Output (final epoch) | Minimum FID | max $I_x$ | max $I_y$ | max $I_r$ |
| --- | --- | --- | --- | --- | --- |
| $\alpha=0$ (normal GAN) | _vanillaGAN_24000_image.png | 36.0 | 0.109 | 0.090 | 0.083 |
| 0.1 | _I_x01_24000_image.png | 36.9 | 0.106 | 0.082 | 0.095 |
| 1.0 | _I_x1_24000_image.png | 32.8 | 0.103 | 0.077 | 0.090 |
| 5.0 | _I_x5_24000_image.png | 59.5 | 0.126 | 0.097 | 0.111 |
| 10.0 | _I_x10_24000_image.png | 69.4 | 0.145 | 0.116 | 0.130 |
| 25.0 | _I_x25_24000_image.png | 96.0 | 0.193 | 0.160 | 0.176 |
| 50.0 | _I_x50_24000_image.png | 135.8 | 0.249 | 0.212 | 0.230 |
| 75.0 | _I_x+75_24000_image.png | 180.4 | 0.317 | 0.278 | 0.297 |
| 100.0 | _I_x100_24000_image.png | 208.7 | 0.374 | 0.354 | 0.364 |

After all "0" is strong. I think it's because of the bias in the dataset.

Even without the strength loss, cross sections with a moment of inertia roughly 20% above the dataset-average $I_x$ are generated, so a normal GAN's outputs appear biased toward strength to begin with. This may come from basic GAN behavior, such as a tendency toward noisy outputs.

At around $\alpha = 50.0$, the model seems to generate, on average, strength at the level of the original data's outliers, so I suspect this could be applied to **data augmentation that uses a GAN to generate rare data**. For example, among medical diagnostic images there are nuclear-magnetic-resonance T1/T2-weighted images, and in T1 images tumors apparently appear somewhat white [^T1], so a loss that raises pixel values could produce generated images with an extremely large tumor. The properties of the brain differ between the outside (lateral sulcus, cerebral cortex) and the inside (hippocampus, corpus callosum) of the cross section, so the disease tendency should presumably change accordingly. In this way, GAN data augmentation with a loss matched to the target disease might generate images with shapes close to that disease. (I know nothing about medicine, so I may be talking nonsense...)

Examples for $I_y$ and $I_r$

| Parameters | Output (final epoch) | Minimum FID | max $I_x$ | max $I_y$ | max $I_r$ |
| --- | --- | --- | --- | --- | --- |
| $\beta=25.0,\ \alpha=\gamma=0$ | _I_y+25_24000_image.png | 122.2 | 0.180 | 0.178 | 0.179 |
| $\beta=75.0,\ \alpha=\gamma=0$ | _I_y+75_24000_image.png | 160.8 | 0.267 | 0.284 | 0.275 |
| $\gamma=25.0,\ \alpha=\beta=0$ | _I_r+25_24000_image.png | 113.2 | 0.181 | 0.165 | 0.173 |
| $\gamma=75.0,\ \alpha=\beta=0$ | _I_r+75_24000_image.png | 170.5 | 0.285 | 0.284 | 0.285 |

These also show strong "0"s. Other digits still appear properly, so this does not seem to be mode collapse.

5.1.2 Weak numbers

Applying Danmen-GAN's idea in reverse, we can target a moment of inertia of area of 0.0 and lower the strength, as expressed in Eq. (9):

\mathcal{L}_{S} = \mathbb{E} \left[\| S(G(z))\|_{2}\right] \tag{9}
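In code this is just a flipped target in the strength term; a sketch in the style of the hypothetical `generator_loss` from Section 3.3:

```python
import tensorflow as tf

def strength_loss_weak(I_x, I_y, I_r, alpha=25.0, beta=0.0, gamma=0.0):
    # Eq. (9): penalize strength itself instead of its distance from 1.0.
    return (alpha * tf.reduce_mean(tf.square(I_x))
            + beta * tf.reduce_mean(tf.square(I_y))
            + gamma * tf.reduce_mean(tf.square(I_r)))
```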

GAN-I_x-to-0.0 alpha (2).png

This graph shows how $I_x$, FID, and $\frac{1}{I_x \times FID}$ change as $\alpha$ varies ($\beta = \gamma = 0$). Lowering the moment of inertia of area also costs FID: moving away from the original data distribution hurts the same way whether strength goes up or down.

Changes in the output images ($\beta = \gamma = 0$)

| $\alpha$ | Output (final epoch) | Minimum FID | min $I_x$ | min $I_y$ | min $I_r$ |
| --- | --- | --- | --- | --- | --- |
| 0.1 | _I_x-01_24000_image.png | 38.6 | 0.093 | 0.058 | 0.076 |
| 1.0 | _I_x-1_24000_image.png | 40.5 | 0.091 | 0.058 | 0.073 |
| 5.0 | _I_x-5_24000_image.png | 35.3 | 0.084 | 0.050 | 0.067 |
| 10.0 | _I_x-10_24000_image.png | 36.4 | 0.086 | 0.053 | 0.070 |
| 25.0 | _I_x-25_24000_image.png | 30.0 | 0.069 | 0.042 | 0.056 |
| 50.0 | _I_x-50_24000_image.png | 41.5 | 0.062 | 0.033 | 0.048 |
| 100.0 | _I_x-100_24000_image.png | 48.6 | 0.055 | 0.026 | 0.040 |
| 500.0 | _I_x-500_24000_image.png | 112.4 | 0.043 | 0.013 | 0.028 |

As expected, many "1"s, which have low strength to begin with, tend to appear, and they look even more slender than usual.

The parameter scale differs from the strengthening case: the loss seems to have less effect here even when applied strongly (probably because of the proportion of lit pixels).

Examples for $I_y$ and $I_r$

| Parameters | Output (final epoch) | Minimum FID | min $I_x$ | min $I_y$ | min $I_r$ |
| --- | --- | --- | --- | --- | --- |
| $\beta=25.0,\ \alpha=\gamma=0$ | _I_y+25_24000_image.png | 32.0 | 0.079 | 0.047 | 0.063 |
| $\beta=500.0,\ \alpha=\gamma=0$ | _I_y-500_24000_image.png | 136.3 | 0.049 | 0.015 | 0.032 |
| $\gamma=25.0,\ \alpha=\beta=0$ | _I_r+25_24000_image.png | 30.2 | 0.074 | 0.047 | 0.061 |
| $\gamma=500.0,\ \alpha=\beta=0$ | _I_r-500_24000_image.png | 139.7 | 0.046 | 0.013 | 0.030 |

This is only a possibility, but I suspect that adding a loss that slightly lowers the strength **suppresses noise and improves FID**. The moment-of-inertia calculation inherently applies a strong loss near the image edges, and CNNs do questionable things like padding at image edges, so the two may mesh well. The polar moment $I_r$ in particular may be effective, since it affects the entire image border.

This may be just a coincidence, so I'll personally dig deeper.

5.2 Moment of inertia of area vs FID

We checked how much the rise in FID is suppressed relative to the gain in the moment of inertia of area; $\frac{I_x}{FID}$ is the largest value reached during training. FID explodes in response to even a slight increase in the moment of inertia of area, so rather than gaining strength cheaply, we seem to be paying by deviating from the original data distribution.

| $\alpha$ | $\frac{I_x}{FID}$ |
| --- | --- |
| 0.0 | 0.00250 |
| 0.1 | 0.00250 |
| 1.0 | 0.00277 |
| 5.0 | 0.00207 |
| 10.0 | 0.00220 |
| 25.0 | 0.00180 |
| 50.0 | 0.00150 |
| 75.0 | 0.00130 |
| 100.0 | 0.00120 |
| ones | 0.00257 |

"ones" is the cross section with every pixel at 1.0; its FID was 394.5. By this metric, simply outputting all 1.0s gives better $\frac{I_x}{FID}$ performance than most trained settings. Conversely, building a layer that computes FID and using $\frac{I_x}{FID}$ itself as a loss might yield interestingly different results. (I did manage to build an FID layer [^fidlayer], but the matrix square root became a computational bottleneck, so I abandoned it. In principle it is possible.)

In the end we succeeded in generating cross sections stronger than the strongest in the dataset, but we have not yet confirmed generating one weaker than the weakest (though a stronger loss might manage it).

6. Printing with a 3D printer

Having come this far, let's print the results on a 3D printer and check their strength.

I used a model called the Ender-3, which goes for 20,000-30,000 yen on mail-order sites.

6.1 Run TensorFlow 2.0 in Blender and automatically generate numeric polygons

On the Blender side, automatic generation is performed like this. The image below is the first "0" in the MNIST data.

1.png

A cube is generated automatically for each relevant pixel with `add_cube`. (The method is quite crude, so 3D modelers would probably be upset with me.)
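For illustration, a rough `bpy` sketch of the per-pixel cube idea. The function name, threshold, and sizing are my inventions; the article's actual script is `Blender/blender_mesh_generator.py` in the repository.

```python
import bpy
import numpy as np

def add_pixel_cubes(img, size=1.0, depth=60.0):
    """Add one elongated cube per lit pixel of an (h, w) image in [0, 1]."""
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            if img[i, j] > 0.5:  # simple threshold, for this sketch only
                bpy.ops.mesh.primitive_cube_add(
                    size=size,
                    location=(j * size, (h - i) * size, 0.0))
                # Stretch the new (active) cube along z to form the beam.
                bpy.context.object.scale[2] = depth / size
```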

The image below is an "8"-like digit generated by Danmen-GAN with a penalty of $\alpha = 75$. A module called `bpy` exposes Blender's API to Python, so Python inside Blender loads the trained TensorFlow model and generates the mesh on the fly. Very convenient.

2.png

As a heuristic for handling ambiguous pixels: values below 0.25 become 0, values between 0.25 and 0.75 are randomly perforated down to 3/4 of their area (slightly perforated to reduce strength), and values of 0.75 and above become 1.0.
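A sketch of that heuristic at pixel granularity. The article perforates within each pixel's area, so the mid-range handling here is a simplification of mine:

```python
import numpy as np

def quantize_pixels(img, rng=None):
    """Threshold heuristic: <0.25 -> 0, [0.25, 0.75) -> perforated, >=0.75 -> 1."""
    if rng is None:
        rng = np.random.default_rng()
    out = np.where(img >= 0.75, 1.0, 0.0)
    mid = (img >= 0.25) & (img < 0.75)
    # Approximate "3/4 of the area" by keeping mid pixels with probability 3/4.
    out[mid] = (rng.random(mid.sum()) < 0.75).astype(np.float64)
    return out
```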

Parts that are not related to strength are removed manually. (To improve printing efficiency)

3.png

The following is the cross section after preparation.

Under the digit I added a fragment-like pedestal for fixing the member. This was made by inverting the digit's pixels (`1.0 - img`), rounding up with `np.ceil()`, shrinking with scikit-image (`skimage.morphology.binary_erosion`), adjusting, and assembling with a `for` loop.
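The named steps, as a sketch (the final "adjusting" and the `for`-loop assembly are unspecified in the article, so they are omitted here):

```python
import numpy as np
from skimage.morphology import binary_erosion

def pedestal_mask(img):
    """Mask of where pedestal material may go, per the steps above."""
    inverted = np.ceil(1.0 - img)                 # 1.0 wherever the digit is absent
    mask = binary_erosion(inverted.astype(bool))  # shrink away from the digit
    return mask.astype(np.float32)
```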

Below is a screenshot of Cura, software that generates G-code (nozzle-control commands) for 3D printers.

Originally the model was 60 mm long, but the estimated printing time was 3-4 hours, so I scaled it down by 40% before printing. (Cura can do basic model operations such as scaling, so you don't have to worry about scale in Blender.)

6.2 Print result

The left one is "0" and the right one is "8". (Only at this point did I notice that the "8"'s pedestal is split into left and right parts.)

IMG_4978.JPG

IMG_1209.JPG

IMG_6556.JPG

(I noticed that "8" became stable when I turned it over.)

That's how they came out.

6.3 Endurance

Let's also run an endurance test. The pieces didn't budge at all under half-hearted loading, so in the end, having no other means, I put my own weight on them.

"0" in the dataset

ezgif-2-632e2eb25ad3-compressor.gif

Generated numbers

ezgif-2-aaaf95b89279-compressor.gif

Both broke... (I regret that loading both at the same time might have shown which was stronger. For what it's worth, the GAN output is generated from a fixed random seed, so it is easy to reproduce.)

Checking the fracture surfaces, both had broken along the 3D printer's filament layer lines.

7. Conclusion

Conclusion

- Succeeded in improving theoretical cross-sectional performance by adding the moment of inertia of area to the GAN loss.
- Applying the same principle in reverse makes it possible to design low-strength cross sections.
- Whether the moment of inertia of area is raised or lowered, there is a trade-off with FID.
- Confirmed that the results can be printed on a 3D printer via Blender.

New hypothesis obtained in the experiment

- By aiming at rare data with the moment-of-inertia loss, this could serve as a data-augmentation GAN for Few-Shot Learning (anyone want to work on this together?)
- Hypothesized that a slight strength-lowering loss improves FID (likewise).

8. Closing thoughts

This time I used MNIST for clarity and to keep GAN training manageable, but in theory the method should apply to other data.

Also, since the outputs are biased, conditioning with a Conditional GAN or similar would be needed to obtain specific digits while manipulating strength. As a further application, it should in principle be possible to feed strength information into the conditioning input of a Conditional GAN and generate cross sections matching a requested strength.

Finally, by extending this with FEM (the finite element method) to compute structural force balances and derive a loss, the approach could also apply to three-dimensional structures. (Please let me know if this already exists.)

I think it would also pair well with the "StackGAN font alchemy" [^nardtree] that nardtree was doing.

9. Acknowledgments

Mr. Shichiya (@sitiya78) kindly gave me the 3D printer through my Amazon wish list. This section exists thanks to him. I cannot thank him enough.

10. Experimental code

Everything is here: https://github.com/p-geon/DanmenGAN

- Code for Blender: https://github.com/p-geon/DanmenGAN/blob/master/Blender/blender_mesh_generator.py
- Danmen-GAN body: https://github.com/p-geon/DanmenGAN/blob/master/Colaboratory/DanmenGAN.ipynb
- Statistics calculation: https://github.com/p-geon/DanmenGAN/tree/master/calcstats
- Images, weights, learning curves, scores, etc.: https://github.com/p-geon/DanmenGAN/tree/master/ExperimentalResults

Appendix

Bonus

A. About the layer structure of TensorFlow

This appendix describes the TensorFlow 2.0 model structure: roughly three parts, the Generator, the Discriminator, and the combined Generator & Discriminator. For enthusiasts.

A-1. Generator

(Click the image to see the details.)

The Generator breaks down roughly into: a graph that generates the image (Generator), a normalization graph (Normalize), a graph that computes the density (Density), a graph that computes the moment of inertia of area $I_x$ (Ix), one that computes $I_y$ (Iy), and one that computes $I_r$ (Ir).

Generator: from the image-generation graph to the normalization graph

Below is the Generator code from image generation through normalization.

The basics are the same as normal GAN. Also, there is a lot of information about GAN on the net, so I will omit it here.

`smoa` is an instance of a class that computes the moment of inertia of area; everything from the density calculation to the moment-of-inertia calculation happens inside it.

```python
import numpy as np
import tensorflow as tf


def build_generator(params, smoa):
    # Noise
    z = z_in = tf.keras.layers.Input(shape=(params.NOISE_DIM, ), name="noise")

    # (NOISE_DIM, ) -> (1024, )
    x = tf.keras.layers.Dense(1024)(z)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)
    x = tf.keras.layers.BatchNormalization(momentum=0.8)(x)

    # (1024, ) -> (7*7*64, ) -> (7, 7, 64)
    x = tf.keras.layers.Dense(7*7*64)(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)
    x = tf.keras.layers.BatchNormalization(momentum=0.8)(x)
    x = tf.keras.layers.Reshape(target_shape=(7, 7, 64))(x)

    # (7, 7, 64) -> (14, 14, 32)
    x = tf.keras.layers.Conv2DTranspose(32, kernel_size=(5, 5)
        , padding='same', strides=(2, 2), use_bias=False, activation=None)(x)
    x = tf.keras.layers.BatchNormalization(momentum=0.8)(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)

    # (14, 14, 32) -> (28, 28, 1)
    x = tf.keras.layers.Conv2DTranspose(1, kernel_size=(5, 5)
        , padding='same', strides=(2, 2), use_bias=False, activation=None)(x)
    img = tf.math.tanh(x)
    y = tf.keras.layers.Lambda(lambda x: x, name="generated_image")(img)  # img is reused below, so alias the output as y

    """
Calculation of moment of inertia of area(It becomes a graph like ResNet)
    """
    # range: [-1.0, 1.0] -> [0.0, 1.0]
    img = (img + 1.0)/2.0
    I_x, I_y, I_r = smoa.calc_second_moment_of_area(img)

    return tf.keras.Model(inputs=z_in, outputs=[y, I_x, I_y, I_r])
```

Generator: from the density-calculation graph to the moment-of-inertia graph

The following is a graph construction method that obtains the moment of inertia of area only by tensor calculation.

First, precompute the constants used in the graph and store them as class variables with `tf.constant()`.

We use `self.arange_x`, `self.arange_y`, `self.distance_matrix_x`, `self.distance_matrix_y`, `self.norm_I_x`, and `self.norm_I_y`.

The variables are:

- `self.arange_x` / `self.arange_y`: simple coordinate vectors
- `self.distance_matrix_x` / `self.distance_matrix_y`: tensors representing distance from the axis
- `self.norm_I_x` / `self.norm_I_y`: the maximum moment of inertia of area (a scalar) used for normalization

```python
class SecondMomentOfArea:
    def __init__(self, img_shape=(28, 28)):
        # Pixel-center coordinates along each axis (these definitions were
        # missing from the original snippet; reconstructed so that the
        # references below resolve).
        arange_x = np.asarray([0.5 + d for d in range(img_shape[1])])
        arange_y = np.asarray([0.5 + d for d in range(img_shape[0])])

        distance_matrix_x = np.tile(arange_x, (img_shape[0], 1))
        distance_matrix_y = distance_matrix_x.T
        """
Normalization matrix
        """
        matrix_for_norm_I_x = np.tile(np.abs(arange_y - img_shape[0]/2.0), (img_shape[1], 1)).T
        norm_I_x = np.sum(matrix_for_norm_I_x)

        matrix_for_norm_I_y = np.tile(np.abs(arange_x - img_shape[1]/2.0), (img_shape[0], 1)).T
        norm_I_y = np.sum(matrix_for_norm_I_y)

        """
        to TFconstant
        """
        self.arange_x = tf.constant(arange_x, dtype=tf.float32) # (28, )
        self.arange_y = tf.constant(arange_y, dtype=tf.float32) # (28,)
        self.distance_matrix_x = tf.constant(distance_matrix_x[np.newaxis, :, :, np.newaxis], dtype=tf.float32) # (1, 28, 28, 1)
        self.distance_matrix_y = tf.constant(distance_matrix_y[np.newaxis, :, :, np.newaxis], dtype=tf.float32) #(1, 28, 28, 1)
        self.norm_I_x = tf.constant(norm_I_x, dtype=tf.float32) #()
        self.norm_I_y = tf.constant(norm_I_y, dtype=tf.float32) # ()
```

Normalizing distance_matrix and slicing out `[0, :, :, 0]` gives figures like the following.

(figures: heatmaps of `distance_matrix_x` and `distance_matrix_y`)

Here is the continuation of the class above.

To compute the moment of inertia of area, we first need the centroid (neutral axis) of the cross section. To get the neutral axis, we first compute the density (sum of all pixels / number of pixels in the image).

First, element-wise multiply the distance_matrix above by the image's pixel values to get moments. Next, correct the moments by the density; when the moments are balanced, the neutral axis sits at the center of the image.

After finding the neutral axis, create a tensor representing the distance to the neutral axis, in the order subtraction → absolute value → reshape → tile → add axis.

Then element-wise multiply the distance tensor by the image, take the sum, and normalize to complete the moment-of-inertia calculation.

For $I_r$, we add $I_x$ and $I_y$ per the definition and divide so that the maximum is 1.0.

`tf.keras.layers.Lambda(lambda x: x)(...)` does nothing, but is written in to make the graph easier to read (and to name the outputs).

```python
    def calc_second_moment_of_area(self, img): # (None, 28, 28, 1)
        """
Calculation of neutral axis
        """
        density = (tf.reduce_sum(img, axis=[1, 2], keepdims=True)/(img.shape[1]*img.shape[2]))
        # (1, 28, 28, 1) x (None, 28, 28, 1) -> (None, 28, 28, 1)
        x_moment = tf.math.divide_no_nan(tf.math.multiply(self.distance_matrix_x, img), density)
        y_moment = tf.math.divide_no_nan(tf.math.multiply(self.distance_matrix_y, img), density)

        # (None, 28, 28, 1) -> (None, )
        neutral_axis_x = tf.math.reduce_mean(x_moment, axis=[1, 2])
        neutral_axis_y = tf.math.reduce_mean(y_moment, axis=[1, 2])

        """
Moment of inertia of area(Vertical)
        I_x = ∫_A y^2 dA
        """
        # sub: (None, 28, ) - (None, ) -> abs: (None, 28)
        dy = tf.math.abs(self.arange_y - neutral_axis_y)
        # (None, 28) -> (None, 1, 28)
        dy = tf.reshape(dy, shape=[-1, img.shape[1], 1])
        # (None, 1, 28) -> (None, 28, 28)
        matrix_x = tf.tile(dy, multiples=[1, 1, img.shape[2]])
        # (None, 28, 28) -> (None, 28, 28, 1)
        matrix_x = tf.expand_dims(matrix_x, 3)
        # (None, 28, 28, 1)x(None, 28, 28, 1) -> (None, 28, 28, 1) -> (None,)
        I_x = tf.math.reduce_sum(tf.math.multiply(matrix_x, img), axis=[1, 2])/self.norm_I_x

        """
Moment of inertia of area(side)
        I_y = ∫_A x^2 dA
        """
        # sub: (None, 28, ) - (None, ) -> abs: (None, 28)
        dx = tf.math.abs(self.arange_x - neutral_axis_x)
        # (None, 28) -> (None, 28, 1)
        dx = tf.reshape(dx, shape=[-1, 1, img.shape[2]])
        # (None, 1, 28) -> (None, 28, 28)
        matrix_y = tf.tile(dx, multiples=[1, img.shape[1], 1])
        # (None, 28, 28) -> (None, 28, 28, 1)
        matrix_y = tf.expand_dims(matrix_y, 3)
        # (None, 28, 28, 1)x(None, 28, 28, 1) -> (None, 28, 28, 1) -> (None,)
        I_y = tf.math.reduce_sum(tf.math.multiply(matrix_y, img), axis=[1, 2])/self.norm_I_y
        """
Moment of inertia of area(2 for normalization.Divide by 0)
        """
        I_r = (I_x + I_y)/2.0
        """
        Lambda
        """
        I_x = tf.keras.layers.Lambda(lambda x: x, name="I_x")(I_x)
        I_y = tf.keras.layers.Lambda(lambda x: x, name="I_y")(I_y)
        I_r = tf.keras.layers.Lambda(lambda x: x, name="I_z")(I_r)

        return I_x, I_y, I_r
```

When generating on the Blender side there is no need to compute the moment of inertia of area, so any function that outputs three `(None,)` tensors will do; I used `tf.reduce_sum(img)`.

A-2. Discriminator

The Discriminator is no different from a regular GAN's; it's classic DCGAN style.

A-3. Generator & Discriminator

We also build a graph that combines the Generator and Discriminator to train the GAN.

The input is only the noise `z`, and the outputs are the Discriminator's predicted probability `p` together with `I_x`, `I_y`, and `I_r`.

All three moments of inertia are computed, and the coefficients are adjusted when the loss is applied, so the same graph can train a normal GAN or strengthen the moment of inertia of area. To weaken it instead, change the target value of $I$ from 1.0 to 0.0.
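As an illustration of that wiring, here is a Keras sketch of the combined model. The helper name, optimizer settings, and the use of `mse` against a constant target are my assumptions, not necessarily how the repository does it.

```python
import tensorflow as tf

def build_combined(generator, discriminator, alpha=25.0, beta=0.0, gamma=0.0):
    """Sketch: Generator + frozen Discriminator as one trainable graph.

    Assumes generator maps z -> [image, I_x, I_y, I_r] as in build_generator().
    """
    discriminator.trainable = False  # only the Generator trains here
    z = tf.keras.layers.Input(shape=generator.input_shape[1:])
    img, I_x, I_y, I_r = generator(z)
    p = discriminator(img)
    combined = tf.keras.Model(inputs=z, outputs=[p, I_x, I_y, I_r])
    combined.compile(
        optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5),
        loss=["binary_crossentropy", "mse", "mse", "mse"],
        # 1.0 on the adversarial term; alpha/beta/gamma on the strength terms.
        loss_weights=[1.0, alpha, beta, gamma])
    return combined

# Train toward I = 1.0 to strengthen, or I = 0.0 to weaken:
# combined.train_on_batch(z_batch, [real_labels, target, target, target])
```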

Reference material

Mainly notes I wrote along the way while making this:

--When using tf.print (), the contents of the tensor cannot be displayed in f-string: https://qiita.com/HyperPigeon/items/007c5adca9a4e78bc6d1 --First aid when Nan appears in tf.linalg.sqrtm () of TensorFlow 2.0 Frechet Inception Distance (FID) calculation: https://qiita.com/HyperPigeon/items/f3f20f480269e2594724 --AttributeError:'dict' object has no attribute'name' when using tf.keras.utils.plot_model () in TensorFlow 2.0 and its solution: https://qiita.com/HyperPigeon/items/fb22b555e76b52b3d688 --Solution when Colaboratory (Jupyter Notebook) session crashes with tensorflow_addons (tfa.image.rotate): https://qiita.com/HyperPigeon/items/94831b8a9af75527b67b --Dimensions (meters, etc.) notation in Blender 2.8 and later: https://qiita.com/HyperPigeon/items/c5d2ec3264e8fd14d167 --Install TensorFlow 2.0 (CPU) with Blender 2.8.2, HelloWorld (Windows10): https://qiita.com/HyperPigeon/items/e6c37dc143039b75d0e4


  1. Yann LeCun and Corinna Cortes. MNIST handwritten digit database, 2010. ↩︎

  2. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Networks. In NIPS, 2014. ↩︎