Deep learning can deliver extremely high accuracy in specific fields, and its range of applications is steadily expanding. However, deep learning also has weaknesses. One of the biggest is that we cannot tell what its judgments are based on. Deep learning is good at learning the features in the data by itself during training. Because of this, it is sometimes said that "people no longer need to extract features", but conversely, it is up to the network to decide which features to extract. As the name implies, the extracted features are latent in the weights of the deep network, and it is extremely difficult to pull out the "something" that has been learned in a form humans can understand.

Exercise: From the figure above, which visualizes the network's weights, answer on what basis this network judges a cat to be a cat (time limit: 3 minutes). *Image from CS231n Visualizations (.io/neural-networks-3/)*
"It must have learned something, but we can't tell what it has learned."
This is the interpretability problem in deep learning. However, research on this point has progressed in recent years, and several methods have been proposed for clarifying the basis of a judgment. In this article, I would like to introduce these methods.
Some people take the view that "as long as accuracy is high, that's fine", so I will first introduce the merits of having explanatory power, with some examples.
The following is a study that visualizes the pixels that serve as important clues when making predictions in image classification.
SmoothGrad: removing noise by adding noise
On the left is the input image, and the black-and-white images on the right visualize the parts that were important for the judgment (the upper one is an existing method, the lower one is the proposed method). Applied to a model for pathological diagnosis, this would make it possible to show which part of an image, such as an X-ray, the model focused on. It is hard to accept being told "You are ill. I don't know why, but the probability is high." Sharing the process is an important step toward being convinced by a decision. Deep learning has been very successful in the imaging field, and these techniques are an effective way to understand its decision-making process.
Next, I would like to introduce a study that not only detects whether a post on social media (SNS) indicates a crisis situation (such as suicide), but also provides the basis for that detection.
Detecting and Explaining Crisis
In this research, a technique called Attention, which is often used in natural language processing, is used to calculate how important each word in a post is to the judgment and to visualize it. Because there are so many posts on social media, the aim is to prioritize (triage) the high-risk ones. (Incidentally, Koko (//itskoko.com/), which jointly conducted this research, is a venture from the MIT Media Lab that provides a counseling feature for personal troubles through messenger apps. Coming from a research institution, it is characterized by its efforts to apply the latest research results in its service.)
There are also studies that apply this Attention technique to image captioning, to find out which part of the image the model is looking at when it generates the caption.

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
All of these studies can be said to assume that people act (do their work) based on the judgments of a machine learning model. I expect machine learning to be used in more and more fields, and I think such mechanisms are indispensable for humans and machine learning models to work together. In Europe this point is already under consideration, including the idea that [devices equipped with so-called AI should be able to present their decision process when the user requests it](https://arxiv.org/abs/1606.08813).
It is also significant for verifying the behavior of machine learning models. This is because, in recent years, methods for "hacking" trained models have been reported. Known as adversarial attacks, they can cause misrecognition by adding to the input image a tiny amount of noise that cannot be noticed at a glance. In the figure below, a panda is mistaken for a gibbon after noise is added to the input image.
Intriguing properties of neural networks
In addition, a study has shown that road signs can be made to be misrecognized. This is a rather frightening example when you imagine a stop sign being misrecognized during autonomous driving. NIPS, a top conference in machine learning, [is also actively examining such attacks and defenses against them in a competition format](https://www.kaggle.com/c/nips-2017-non-targeted-adversarial-attack).
If you understand the basis of a machine learning model's judgments, you may be able to notice that the basis is "unusual" when such an attack occurs. Moreover, even without an attack, accuracy may degrade simply because the tendencies in the data (such as user behavior) change. In that case too, it is easier to respond if you know which features the model relied on. In other words, interpreting the basis of judgment is also important for maintaining the models we use.
From here, we will look at how to actually "understand" the basis for deep learning's judgments. However, it is not clear what "understanding" means, so let us first give some definitions.
First, there are two perspectives on understanding a model: understanding the mechanism inside it, and understanding its behavior.
Since many different kinds of models are used in practice, it is difficult to understand the internal mechanism of each one. Therefore, we focus here on "understanding behavior". Specifically, starting from a state in which a trained model is given, we say we "understand" it if we can "explain" what output is obtained for what input. In terms of a vending machine, it is enough to understand which drink comes out when you press which button; you do not need to know that pressing the button sends a certain signal to the machinery inside.
Now, since an "explanation" is given to humans, it must be given in a "representation" that humans can understand. However, being able to "represent" something does not mean you can explain it. Even a human-recognizable representation, such as the image visualizing the network's weights shown earlier, does not necessarily have explanatory power. The difference between a representation and an explanation is this: an "explanation" is not just a "representation" such as a visualization or a written description; it only becomes an "explanation" when it "represents" the "features that contribute to the network's output".
In other words, explanation involves two steps: calculating how much each input affects the output, and expressing that in a form humans can understand. The methods introduced below each devise ways to handle these two points.
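From here, I will introduce methods for actually explaining a network's output. The methods can be broadly divided as follows: creating an input that maximizes the output, analyzing sensitivity to the input, tracing the path back from the output to the input, estimating the tendency of the output from various inputs, estimating from the amount of change needed to alter the judgment, restricting the model so that it does not make judgments that are hard to understand, and incorporating the point of attention into the model.
The last of these, incorporating the point of attention into the model (Attention), falls outside the black-box perspective because it requires modifying the network itself, but it is a very effective method, so I introduce it here as well.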
For a network that handles a classification problem, the output is the classification probability for each category. If you can find an input for which the classification probability of some category is very high, that input is what the network considers a "typical example" of that category, and identifying it helps in estimating the basis for judgment.
The following is from a study published by Google in 2012. For a network that learned image features through unsupervised learning (an autoencoder), it shows the data the network judged to be most human-like (top) and an image synthesized so that the network judges it to be most human-like (bottom).
Building high-level features using large scale unsupervised learning
The problem here is to find the input $x^*$ that maximizes the probability $p$ of being classified into class $c$ (written $\omega_c$), which is basically the following optimization over the input:

$$x^* = \arg\max_x \log p(\omega_c \mid x)$$
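As a rough illustration of this search, the sketch below runs gradient ascent on the input of a toy linear softmax classifier. The classifier, its random weights `W`, the step size, and the small L2 penalty `lam` (added only to keep $x$ bounded) are all assumptions made for the example and are not taken from any of the studies cited here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def activation_maximization(W, b, c, steps=300, lr=0.05, lam=0.01):
    """Gradient ascent on x for log p(class c | x) - lam * ||x||^2."""
    x = np.zeros(W.shape[1])
    for _ in range(steps):
        p = softmax(W @ x + b)
        # d/dx log p_c = W[c] - sum_k p_k W[k]; the last term is the L2 penalty
        grad = W[c] - p @ W - 2.0 * lam * x
        x += lr * grad
    return x

# Hypothetical 3-class classifier over 5-dimensional inputs
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = np.zeros(3)

x_star = activation_maximization(W, b, c=1)
p_star = softmax(W @ x_star + b)   # class probabilities at the optimized input
```

Starting from $x = 0$, where every class has probability 1/3, the optimized input pushes the probability of the chosen class upward. With a real image classifier the same loop would be run with automatic differentiation instead of the hand-written gradient.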
However, left as is, this tends to produce a mysterious image that maximizes the output but cannot be understood by humans (a state similar to overfitting), so there is also a variant that adds a constraint to keep the result closer to actual inputs:

$$x^* = \arg\max_x \log p(\omega_c \mid x) + \log p(x)$$

This means adding $p(x)$, that is, the probability that the data actually occurs. There is a study that visualizes image features using a similar constraint.
Understanding Deep Image Representations by Inverting Them
However, if the input data is high-dimensional, it becomes very difficult to estimate this $p(x)$ accurately. Therefore, there is also a method that combines this with a generative model such as a VAE or GAN. Here, $x^*$ is generated from a suitable vector $z$ through the generator $g$:

$$z^* = \arg\max_z \log p(\omega_c \mid g(z)), \qquad x^* = g(z^*)$$

In other words, it takes the form of generating, from $z$, an image that is classified into the class $\omega_c$. When it comes to image generation, GANs come to mind, and there is GAN research with an equivalent constraint, namely that the generated image be classified into the same class as the real one ([Conditional Image Synthesis With Auxiliary Classifier GANs](https://arxiv.org/abs/1610.09585)).

In addition, the goal of a GAN and the goal of understanding the basis of judgment are slightly different (simply put, for the latter it is better for the result to be somewhat abstracted rather than lifelike). Therefore, in the above formula
Incidentally, Deep Dream, which was a hot topic for a while, uses this method. If you optimize the input image (or random noise) so as to maximize the network's output, you get images like those. This article explains the visualization of image features by Activation Maximization in detail (the images below are also taken from that article).

In addition, as a sequel to that article, an article has been published that includes a demo in which you can interactively explore the basis of judgment on the page. It goes one step further and attempts to identify "where" in the network contributes to maximizing the judgment for each class.
The Building Blocks of Interpretability
If changing an input feature has a large effect on the output, that feature can be considered important. In other words, importance can be assessed by examining which inputs' changes the network is sensitive to.
This can be done by examining the gradient, since a rate of change is simply a derivative. Neural networks are trained with gradients in the first place, so this fits well with existing optimization machinery. The sensitivity to each element $x_i$ of the input $x$ can be calculated easily as follows, where $f$ denotes the network's output:

$$R_i(x) = \left( \frac{\partial f}{\partial x_i} \right)^2$$
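A minimal sketch of this sensitivity calculation is shown below, taking the squared partial derivative of the output with respect to each input element as the score. The gradient is approximated with central differences so that the example needs no deep learning framework, and `toy_model` is a hypothetical stand-in for a trained network.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference approximation of df/dx for a scalar-valued f."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad

def sensitivity(f, x):
    """Sensitivity score per input element: squared partial derivative."""
    return numerical_gradient(f, x) ** 2

def toy_model(x):
    # Hypothetical "network": depends strongly on x[0], weakly on x[1], not at all on x[2]
    return np.tanh(3.0 * x[0] - 0.5 * x[1])

x = np.array([0.1, 0.2, 0.3])
R = sensitivity(toy_model, x)
```

Here `R[0]` is the largest score, `R[1]` is smaller, and `R[2]` is zero because the toy model ignores that input. In practice the gradient would come from the framework's automatic differentiation rather than finite differences.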
SmoothGrad, introduced at the beginning, is a refinement that makes this result look cleaner: because the gradient is too sensitive to noise, it deliberately creates multiple samples with noise added to the input and averages the results.
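The averaging idea can be sketched as follows. The toy gradient function, which adds a high-frequency wiggle to a constant trend `W_TRUE`, as well as the sample count and noise level, are assumptions chosen to make the effect visible; they are not values from the SmoothGrad paper.

```python
import numpy as np

W_TRUE = np.array([1.0, -2.0, 0.5])

def toy_gradient(x):
    # Gradient of f(x) = W_TRUE . x + 0.05 * sum(sin(50 x)):
    # a smooth trend plus a rapidly oscillating component
    return W_TRUE + 2.5 * np.cos(50.0 * x)

def smoothgrad(grad_fn, x, n_samples=200, noise_std=0.1, seed=0):
    """Average the gradient over noisy copies of the input."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        noisy_x = x + rng.normal(scale=noise_std, size=x.shape)
        total += grad_fn(noisy_x)
    return total / n_samples

x = np.zeros(3)
raw = toy_gradient(x)                  # dominated by the oscillating term
smooth = smoothgrad(toy_gradient, x)   # oscillation largely averaged out
```

At this input the raw gradient is off from the underlying trend by 2.5 in every coordinate, while the averaged gradient stays much closer to `W_TRUE`.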

However, what this tells us is which parts, when changed, make the error larger or smaller. For example, it tells us where to change the image to make it look more like a car, but not why it was judged to be a car in the first place. You need to keep this in mind when using it.
The next idea is that tracing backward from the output to the input might yield meaningful results.
The following study proposes a method in which the input is propagated forward to a certain layer, everything other than the part you want to examine is set to 0, and the result is then back-propagated, so that the input contributing to that part can be computed in reverse.
Striving for Simplicity: The All Convolutional Net
During this backpropagation, in order to remove negative values that would dampen the activation, positions where the value becomes negative in either the forward or the backward pass are set to 0 before propagating (in effect, a nonlinearity equivalent to ReLU is applied). The study above names this "guided backpropagation", and with it succeeded in visualizing the important parts of the input, as shown below.
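The difference between ordinary backpropagation through a ReLU and the guided variant can be sketched with two small functions. The function names and the toy arrays are hypothetical; a real implementation would hook this rule into every ReLU layer of the framework in use.

```python
import numpy as np

def relu_backward(x, grad_out):
    """Ordinary backprop: pass the gradient where the forward input was positive."""
    return grad_out * (x > 0)

def guided_relu_backward(x, grad_out):
    """Guided backprop: additionally drop gradients that are negative."""
    return grad_out * (x > 0) * (grad_out > 0)

x = np.array([1.0, -1.0, 2.0, -2.0])         # inputs to the ReLU in the forward pass
grad_out = np.array([0.5, 0.5, -0.5, -0.5])  # gradient arriving from the layer above

plain = relu_backward(x, grad_out)
guided = guided_relu_backward(x, grad_out)
```

Only the first position survives in `guided`, because it is the only one that is positive in both the forward and the backward pass; `plain` also keeps the negative gradient at the third position.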

A method has also been proposed that back-propagates the gradient starting from the label of interest, since we often want to know only which parts contributed to the classification into a particular class.

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
This method calculates, from the gradients, how much each feature map contributes to the classification, and weights the feature maps by those contributions. This yields a heatmap-like output (the Grad-CAM panel at the bottom of the second column from the left in the figure above). However, because the contribution is calculated per feature map, it cannot provide pixel-level contributions the way the guided backpropagation described above does. The paper therefore suggests using the two together (Guided Grad-CAM). A detailed explanation of Grad-CAM and the results of actually applying it are presented in the following blog post.
Layer-wise relevance propagation (LRP) is a technique that propagates relevance backward layer by layer until it reaches the input. The idea is that the total contribution of the inputs to the output is the same at every layer, and only its distribution changes as it is propagated.
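The core Grad-CAM computation can be sketched in a few lines once the feature maps of a convolutional layer and the gradients of the target class score with respect to them are available. Both arrays are supplied by hand here as hypothetical values; obtaining them from a real network depends on the framework.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """feature_maps, gradients: arrays of shape (K, H, W).

    gradients holds d(target class score) / d(feature_maps).
    """
    # One weight per feature map: the spatial average of its gradient
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps, shape (H, W)
    cam = np.tensordot(weights, feature_maps, axes=1)
    # Keep only the regions with a positive influence on the class
    return np.maximum(cam, 0.0)

# Two hypothetical 2x2 feature maps: one fires top-left, the other bottom-right
feature_maps = np.array([[[1.0, 0.0], [0.0, 0.0]],
                         [[0.0, 0.0], [0.0, 1.0]]])
# The first map supports the class (positive gradient), the second opposes it
gradients = np.stack([np.ones((2, 2)), -np.ones((2, 2))])

heatmap = grad_cam(feature_maps, gradients)
```

The heatmap highlights only the top-left cell. Its resolution is that of the feature map, not of the input image, which is why it cannot give pixel-level contributions by itself.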
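The conservation idea can be checked on a tiny network. The sketch below uses the basic z-rule with a small stabilizer on a bias-free two-layer ReLU network with random weights; this is a simplifying assumption for illustration and not the specific set of rules used in the LRP papers.

```python
import numpy as np

def lrp_linear(a, W, R_out, eps=1e-9):
    """Redistribute relevance R_out (one value per output unit) onto the inputs a.

    Each input j receives a share of R_out[k] proportional to a[j] * W[j, k].
    """
    z = a @ W                                   # pre-activations, shape (n_out,)
    z = z + eps * np.where(z >= 0, 1.0, -1.0)   # avoid division by zero
    s = R_out / z
    return a * (W @ s)

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, size=4)   # hypothetical input
W1 = rng.normal(size=(4, 3))        # input -> hidden weights (no bias)
w2 = rng.normal(size=3)             # hidden -> output weights (no bias)

h = np.maximum(x @ W1, 0.0)         # ReLU hidden layer
y = h @ w2                          # scalar network output

R_hidden = lrp_linear(h, w2[:, None], np.array([y]))
R_input = lrp_linear(x, W1, R_hidden)
```

Both `R_hidden.sum()` and `R_input.sum()` equal the output `y` up to the stabilizer, so the relevance is only redistributed from layer to layer and never created or lost.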

How to calculate this contribution (relevance) is then set as follows (for the derivation, see Explaining nonlinear classification decisions with deep Taylor decomposition, /pii/S0031320316303582?via%3Dihub).

For ordinary propagation, where the activation is ReLU, the rule at the top of the table applies; when the range of values is bounded (0 to 255), as in an input layer that receives an image, the rule in the middle of the table (Pixel intensities = closed interval) applies. A tutorial for actually implementing this is also available, so you can try it out.
Tutorial: Implementing Layer-Wise Relevance Propagation
A demo is also available on this site, where you can see the LRP results for images you draw yourself.

A method called PatternNet / PatternAttribution has been proposed as an improved version of these methods.
Learning how to explain neural networks: PatternNet and PatternAttribution
This paper argues that the basis of judgment cannot be understood simply by analyzing the "weights" in the network. To show this, first split the input $x$ into $s$, the part that contributes to the final output $y$, and $d$, everything else (that is, noise), and write $x = s + d$. If we take the output of a simple network to be $w^T x = y$, the weight $w$ can be thought of as filtering $d$ out of $x$ and extracting the $s$ that contributes to $y$. The role of $w$ is then to cancel the noise $d$, so the direction of the vector $w$ depends on $d$. As a result, no matter how closely you examine $w$, you will not find $s$. In the figure below, the black arrow is $w$ and the yellow arrow is $d$, and you can see that $w$ points in a direction that cancels $d$. The direction of $w$ depends on $d$ and has nothing to do with the $s$ you want to find.

Here, $y = w^T x = w^T (s + d)$, and since $d$ should be canceled by $w$, we have $w^T d = 0$ and $w^T s = y$. Also, since $y$ and $d$ should be uncorrelated, $cov[y, d] = 0$, so $cov[x, y]$ equals $cov[s, y]$. If the function that extracts $s$ from $x$ is $S(x)$, then $cov[x, y] = cov[s, y] = cov[S(x), y]$. This can be used to find $S(x)$.
First, in the linear case, only a linear $y$ can be produced, so $s$ should also be linear in $y$. In other words, the following holds for some vector $a$:

$$S(x) = a w^T x = a y$$

Substituting this into the $cov$ relationship above gives

$$cov[x, y] = cov[S(x), y] = cov[a y, y] = a \, cov[y, y] = a \sigma_y^2$$

and from here,

$$a = \frac{cov[x, y]}{\sigma_y^2}$$

follows. However, this is only the linear case; ReLU, which is commonly used for images, is a nonlinear function that does not propagate values in the negative direction. In that case, $s$ and $d$ must be considered separately as $s_+$, $d_+$ for the positive direction and $s_-$, $d_-$ for the negative direction. The derivation is omitted here, but replacing the back-propagated values in DeConvNet / Guided Backpropagation and in LRP with $s_+$ and $s_-$ gives enhanced versions of each, which are named PatternNet and PatternAttribution, respectively.
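The linear case can be checked numerically with synthetic data. In the sketch below, the signal direction `a_true`, the distractor direction `d_dir`, and the sample size are assumptions chosen for illustration; the weight `w` is picked by hand so that it cancels the distractor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
a_true = np.array([1.0, 0.0])    # direction of the signal s
d_dir = np.array([1.0, 1.0])     # direction of the distractor d

y_latent = rng.normal(size=n)    # the quantity the model should output
eps = rng.normal(size=n)         # distractor strength, independent of the signal
X = np.outer(y_latent, a_true) + np.outer(eps, d_dir)   # x = s + d

w = np.array([1.0, -1.0])        # w^T d_dir = 0 and w^T a_true = 1
y = X @ w                        # recovers y_latent

def estimate_signal_direction(X, y):
    """a = cov[x, y] / var[y], estimated from samples."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (yc @ yc)

a_hat = estimate_signal_direction(X, y)
```

The weight `w` points in the direction `(1, -1)`, which is determined by the distractor, whereas the covariance-based estimate `a_hat` comes out close to the true signal direction `(1, 0)`.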

In addition, how well $S(x)$ extracts the signal can be measured using the following evaluation index $\rho$:

$$\rho(S) = 1 - \max_v \, corr\left(w^T x, \; v^T (x - S(x))\right)$$

If $S(x)$ extracts $s$ completely, only the residue $d$ remains in $x - S(x)$, which has almost no correlation with $w^T x = y$. As a result, the value of $\rho$ becomes high. In this way, the basis of judgment can be estimated more accurately, and the ability to estimate it can itself be measured.
The next idea is to probe the model's behavior using variously transformed inputs rather than a single input. This is LIME (Local Interpretable Model-Agnostic Explanations), proposed in "Why Should I Trust You?": Explaining the Predictions of Any Classifier.
The technique first generates inputs in which the original image is split into several parts and only some of them are kept. These are fed to the trained model to obtain its judgments, which yields pairs of input and model judgment. These pairs are then used to train a simpler, more explainable model prepared separately from the main model.
Introduction to Local Interpretable Model-Agnostic Explanations (LIME)
The features that matter can then be read off from the trained simple model. Because the technique is so simple, it can be applied to any model. The following article gives a more detailed explanation and the results of actually using it.
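The procedure can be sketched as follows: randomly switch parts of the input off, query the black box, and fit a proximity-weighted linear model to the results. This is a simplified illustration rather than the implementation in the `lime` package; the kernel, the sample count, and the `black_box` function are all assumptions.

```python
import numpy as np

def lime_explain(predict_fn, x, n_samples=500, kernel_width=0.75, seed=0):
    """Fit a weighted linear surrogate around x and return one weight per part."""
    rng = np.random.default_rng(seed)
    d = x.size
    masks = rng.integers(0, 2, size=(n_samples, d))   # 1 = keep the part, 0 = blank it
    masks[0] = 1                                       # include the unperturbed input
    preds = np.array([predict_fn(x * m) for m in masks])
    # Samples that keep more of the original input count as closer to it
    distance = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Weighted least squares with an intercept column
    A = np.hstack([masks, np.ones((n_samples, 1))])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, preds * sw[:, 0], rcond=None)
    return coef[:-1]

def black_box(x):
    # Hypothetical trained model: uses parts 0 and 1, ignores parts 2 and 3
    return 1.0 / (1.0 + np.exp(-(4.0 * x[0] - 2.0 * x[1])))

x = np.ones(4)
importance = lime_explain(black_box, x)
```

The surrogate assigns a positive weight to part 0, a negative weight to part 1, and weights close to zero to the parts the black box ignores. Since only inputs and outputs of `predict_fn` are used, nothing about the model's internals is needed.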
Interpret the prediction result of machine learning with LIME
Understanding Black-box Predictions via Influence Functions is a study that formulates how such changes in the input data affect the output. It formulates both the effect of each training example and the effect of perturbing a training example, which makes it possible to identify samples that contribute to the model's judgments or degrade their accuracy, and to derive changes that affect the model's judgments.
There is also research that turns the LIME-style mechanism of identifying what is important by masking parts of the image into a machine learning model of its own (it learns how to mask the input to an image recognition model). This makes it possible to visualize the masked part, which is the important part.
Interpretable Explanations of Black Boxes by Meaningful Perturbation
Suppose the result is that A is OK and B is NG. How can B be made OK, or brought closer to it? If you know that, you can understand what lies between OK and NG, that is, the judgment criterion. The following paper focuses on this point.
Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking
This method assumes ensemble learning with decision trees. For each decision tree, it calculates the amount of change needed to turn an NG input $x$ into OK, and then picks the smallest of these, thereby finding the change that achieves OK at minimum cost. The paper applies this to ad delivery, attempting to turn "unsellable" ads into "sellable" ones (it could also be used to set appropriate ad fees according to the cost of the change).
This is a reverse way of thinking: do not let the model make judgments you cannot understand in the first place. The task below is learning coffee shop recommendations. Normally a nearby coffee shop is the better recommendation, but when a model is actually trained, a lack of data can lead it to make wrong predictions, such as recommending a shop even though it is far away (left figure). To prevent this, the trend is constrained to be "monotonic" (right figure).
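A much simplified sketch of this search is given below. It assumes the root-to-leaf paths that end in an OK leaf have already been extracted from the trees, and it skips the step of checking that the tweaked input flips the prediction of the whole ensemble. The path format, the `margin` parameter, and the example thresholds are hypothetical.

```python
def tweak_to_path(x, path, margin=0.1):
    """Minimally change x so that it satisfies every condition on the path.

    path is a list of (feature_index, op, threshold) with op "<=" or ">".
    """
    x_new = list(x)
    for feat, op, thr in path:
        if op == "<=" and x_new[feat] > thr:
            x_new[feat] = thr - margin
        elif op == ">" and x_new[feat] <= thr:
            x_new[feat] = thr + margin
    return x_new

def cost(x, x_new):
    """Euclidean distance between the original and the tweaked input."""
    return sum((a - b) ** 2 for a, b in zip(x, x_new)) ** 0.5

def cheapest_tweak(x, ok_paths, margin=0.1):
    """Among all OK leaves, return the tweaked input closest to x."""
    candidates = [tweak_to_path(x, p, margin) for p in ok_paths]
    return min(candidates, key=lambda c: cost(x, c))

# Hypothetical OK paths collected from two trees
ok_paths = [
    [(0, ">", 3.0)],                    # tree 1: feature 0 must exceed 3.0
    [(1, "<=", 4.5), (0, ">", 0.5)],    # tree 2: feature 1 at most 4.5 and feature 0 above 0.5
]
x_ng = [1.0, 5.0]
x_ok = cheapest_tweak(x_ng, ok_paths)
```

For this input, satisfying the first path would require moving feature 0 by 2.1, while the second path only needs feature 1 lowered by 0.6, so the second tweak is chosen. The features that had to change, and by how much, indicate where the judgment criterion lies.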
TensorFlow Lattice: Flexibility Empowered by Prior Knowledge
Implementation: tensorflow / lattice
The concept is to control the overall trend while still making use of the model's expressiveness. This allows finer tendencies to be captured than would be possible by fitting a linear model or the like. Ideally more data would be collected, but there are many cases where that is not possible. Imposing constraints carries the risk that the model cannot handle situations it should handle, but in exchange you avoid the risk of unpredictable behavior. It is a trade-off.
Finally, I will introduce a method that builds into the model a mechanism for indicating which parts of the input data it attends to. This technique is called Attention. It probably first appeared in translation models, as a mechanism for learning which words in the source sentence to focus on when outputting each translated word. In the figure below, the x-axis is the source sentence and the y-axis is the translated sentence, and it shows which source words are referred to when each word of the translation is output.
Neural Machine Translation by Jointly Learning to Align and Translate
The basic idea of Attention is to use not only the immediately preceding hidden state but also past hidden states when producing an output, distributing weight over them according to how important each is at that moment.
Effective Approaches to Attention-based Neural Machine Translation
Given the hidden state at time $t$ as $h_t$ and the past hidden states as $\overline{h_s}$, the attention over them can be defined as follows:

$$a_t(s) = \frac{\exp\left(score(h_t, \overline{h_s})\right)}{\sum_{s'} \exp\left(score(h_t, \overline{h_{s'}})\right)}$$

There are several variations of score, such as simply taking the inner product or preparing weights dedicated to Attention.

Then, this attention is multiplied in to create the vector (context) $c_t$ used for the output at time $t$. It may look like something difficult is going on, but it is roughly equivalent to taking a weighted average:

$$c_t = \sum_s a_t(s) \, \overline{h_s}$$

This is the basic mechanism of Attention. From $a_t$, you can tell which part of the sequence is being attended to at that moment.
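The mechanism amounts to a softmax over scores followed by a weighted average, as the sketch below shows. It covers the inner-product score and a score with a dedicated weight matrix `W_a`; the hidden states used in the example are hypothetical toy values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(h_t, h_bar, W_a=None):
    """h_t: current hidden state, shape (d,). h_bar: past hidden states, shape (S, d).

    Returns the attention weights a_t, shape (S,), and the context c_t, shape (d,).
    """
    if W_a is None:
        scores = h_bar @ h_t              # inner-product score
    else:
        scores = h_bar @ (W_a.T @ h_t)    # score h_t^T W_a h_bar_s with learned weights
    a_t = softmax(scores)                 # normalize scores into weights summing to 1
    c_t = a_t @ h_bar                     # weighted average of the past hidden states
    return a_t, c_t

h_bar = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
h_t = np.array([2.0, 0.0])
a_t, c_t = attention(h_t, h_bar)
```

With these values the first and third past states receive equal, high weights and the second receives the lowest, which is exactly the kind of information that gets visualized as an attention map.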
So far we have introduced various methods of "explanation", but is there a way to verify their "explanatory power"? There are two points of view: the consistency of the explanation and the correctness of the explanation.
Consistency can be checked by looking at how the explanation changes when, for example, the input is an image and you shift it gradually. The figure below compares, for each method, how the explanation ($R(x)$) changes as an image of the digit 2 is shifted to the right (you can see that LRP, on the far right, is affected least by the shift).
Methods for Interpreting and Understanding Deep Neural Networks
For the correctness of an explanation, there is a technique called pixel flipping. It gradually removes the pixels (or words, if the data is natural language) that the explanation marks as important and measures how much the accuracy changes. The figure below shows how classification accuracy drops as the important pixels are removed (again, you can see that with LRP the drop is steepest, indicating high explanatory power).
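Pixel flipping can be sketched as a loop that removes input elements in order of claimed relevance and records the model's score after each removal. The linear `model`, the baseline value of 0, and the two relevance vectors being compared are assumptions for illustration.

```python
import numpy as np

def pixel_flipping_curve(predict_fn, x, relevance, baseline=0.0):
    """Remove elements from most to least relevant, recording the score each time."""
    order = np.argsort(-relevance)
    x_cur = x.astype(float).copy()
    scores = [predict_fn(x_cur)]
    for i in order:
        x_cur[i] = baseline
        scores.append(predict_fn(x_cur))
    return np.array(scores)

w = np.array([3.0, 2.0, 1.0, 0.5])

def model(v):
    # Hypothetical model: a plain linear score
    return float(w @ v)

x = np.ones(4)
good = pixel_flipping_curve(model, x, relevance=w * x)     # faithful explanation
bad = pixel_flipping_curve(model, x, relevance=-(w * x))   # deliberately reversed ranking
```

The faithful explanation gives the curve 6.5, 3.5, 1.5, 0.5, 0.0, while the reversed one gives 6.5, 6.0, 5.0, 3.0, 0.0. The faster the score falls, the better the explanation has identified what the model actually relies on.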
Methods for Interpreting and Understanding Deep Neural Networks
By using these methods, it is possible to verify the "explanatory power" of the method used for explanation.
Estimating the basis of a judgment made inside a black box basically looks only at inputs and outputs. This means setting aside what happens inside, so whatever the network inside, as long as the inputs and outputs match, the estimated basis for the judgment should be the same.
What becomes possible with the techniques introduced so far for explaining a DNN's judgments? Since DNNs are very good at representation learning (feature extraction), it may be possible to gain new insights by having a DNN learn representations of something whose features people do not yet understand well, and then having it explain them.
The following visualizes the relationships between atoms that contribute to the formation of a molecular structure, using a model trained on molecular structures (reading the underlying paper requires considerable knowledge of chemistry, so this description may be somewhat inaccurate).
Quantum-Chemical Insights from Deep Tensor Neural Networks
AlphaGo has surpassed humans at Go, but if AlphaGo had a high degree of explainability, people could have gained more knowledge about Go. There are many research fields that people have not yet fully understood, and I think the power of deep learning will be a great help in trying to elucidate them.
There are many unsolved issues not only in the natural sciences but also in the social sciences. It would be an exaggeration to say this can answer why people keep fighting, but in recent years data on terrorist attacks has been compiled.
Such data may provide clues to effective countermeasures. As an aside, Columbia University, which carried out the research introduced at the beginning on detecting crisis situations from posts on social media, is also conducting research on detecting signs of gang conflict from social media. The data annotation for this appears to have been done in cooperation with social workers.
Automatically Processing Tweets from Gang-Involved Youth: Towards Detecting Loss and Aggression
In Japan, where earthquakes are frequent, applying this to disaster response is a promising direction. There is a dataset called ABCDdataset, consisting of satellite images from before and after the Great East Japan Earthquake, annotated with what remained safe after the tsunami and what did not.

To receive insurance payouts and compensation after a building is damaged, the damage must be assessed, but in a large-scale disaster this takes time, and as a result it takes a long time for the money to reach the victims. In such cases, it would be a great help to be able to show not only whether there is damage but also its degree and the grounds for that assessment.
In this way, there are many unsolved problems not only in science but also in society, and it is difficult for humans alone to discover the "features" needed to solve them. Under these circumstances, giving "explanatory power" to machine learning, and especially to deep learning, which is particularly strong at representation learning, may provide openings for solving them.
I think methods for providing these "explanations" will become even more important for solving more problems in the future.