This article is the day-5 entry of the "Money Forward Advent Calendar 2015". Sorry for being late!

- An introduction to vectorization in machine learning and how to write it

- Machine learning beginners
- People with no background in matrices, or who haven't touched them in a while

- Explanations of matrix basics such as dot products and transposes

Intro

With IT giants such as Google, Microsoft, and IBM open-sourcing their machine learning systems, I feel the excitement in this field is accelerating rapidly. In my opinion (or perhaps my hope), the third AI boom has taken root before another winter, and we are now at the stage where services are consciously built to "make machine learning a natural part of the product + a differentiator." Don't you think?

So, for the last few months I have devoted my study time exclusively to taking MOOCs and reading the MLP series.

What I felt while doing that matches something written in "The shortest route for working programmers who have avoided mathematics to start studying machine learning": "At the very least, if you are not used to handling matrices and vectors, it is easy to end up manipulating matrices in a course with no idea what you are actually doing" (paraphrased quote). I specialized in control engineering and mechatronics at university, so I had some background, but I still struggled (my excuse is a four-year blank). Since I had a hard time myself, it must be tough for people in the same position, or with no background at all! So, from the perspective of someone slightly ahead, I would like to write about machine learning using vectors and matrices, with a worked example.

It is simply because execution (training) speed is faster. Without matrices you would use a for-loop to process each training sample, but when n > 1000 the training speed drops dramatically. This is because interpreted languages such as Octave and Python incur overhead on every loop iteration. Therefore, training with matrix operations is recommended over for-loops.
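As a rough illustration (this is not the author's MNIST benchmark; the data and sizes here are made up), the speed gap shows up even in a tiny NumPy experiment:

```
# Compare a per-element for-loop against a single matrix-vector product
# for computing z = X * theta. Data is random and purely illustrative.
import time
import numpy as np

rng = np.random.default_rng(0)
m, n = 10000, 100
X = rng.standard_normal((m, n))
theta = rng.standard_normal(n)

# for-loop version: one inner product per sample, element by element
t0 = time.perf_counter()
z_loop = np.empty(m)
for i in range(m):
    z_loop[i] = sum(theta[j] * X[i, j] for j in range(n))
loop_time = time.perf_counter() - t0

# vectorized version: a single matrix-vector product
t0 = time.perf_counter()
z_vec = X @ theta
vec_time = time.perf_counter() - t0

print(f"loop: {loop_time:.3f}s  vectorized: {vec_time:.5f}s")
```

On a typical machine the vectorized version is orders of magnitude faster, because the work is delegated to optimized native code instead of the interpreter.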

For reference, here is the performance I measured when training an MLP (Multilayer Perceptron) model on MNIST data (28x28 pixels), implemented in both a matrix version and a for-loop version.

- Training dataset: 14,000 samples
- Features: 784 (28 × 28 pixels)
- Hidden layer units: 100
- Epochs: 100
- Execution environment
- Matrix version training time: 38.1 sec
- For-loop version training time: roughly 10 min (rough estimate from experience)
- Honestly, the for-loop implementation may contain mistakes, but debugging it is difficult, so I have not verified it.

Consider simple logistic regression. This time, let's vectorize the computation of z and of the gradient.

Each element of the vector z, before it is passed into the activation function, can be computed as follows. This is first implemented with a for statement, but the goal is to express it in matrix form.
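A for-statement version of this computation might look like the following sketch (the data and variable names are illustrative, not the author's original code):

```
# For-loop version: z[i] is the inner product of sample x^(i) and theta.
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # m=3 samples, n=2 features
theta = np.array([0.5, -1.0])

m, n = X.shape
z = np.zeros(m)
for i in range(m):
    for j in range(n):
        z[i] += theta[j] * X[i, j]

print(z)  # → [-1.5 -2.5 -3.5], same as X @ theta
```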

In other words, we want to compute z in the following form with just one command.

- x -> input sample (feature vector)
- theta -> parameter vector (one parameter per feature)
- m -> number of samples
- n -> number of features (dimensions)

Each of the expressions above can be transformed as follows. Please just accept that **this is how it works**.

- For details, please look up "inner product", but for the inner product of two vectors, whichever operand comes first, the result is the same element-wise sum of products (theta0 * x0 + theta1 * x1 + …).
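In symbols, this "order does not matter" rule for the inner product is:

```
\theta \cdot x = x \cdot \theta
= \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n
= \sum_{j=0}^{n} \theta_j x_j
```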

As mentioned above, each element of z is the inner product of the vector x and the vector theta. To express this with x and theta without a for statement, create a matrix X whose rows are the (transposed) vectors x, as shown below.

Then you can create a simple formula like the one below.
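Reconstructed from the description above, the stacked matrix and the resulting one-command formula are:

```
X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}
\qquad
z = X\theta = \begin{bmatrix} \theta^T x^{(1)} \\ \theta^T x^{(2)} \\ \vdots \\ \theta^T x^{(m)} \end{bmatrix}
```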

If you implement this in Octave / Python, it takes the neat shape below.

`octave.m`

```
z = X * theta;
```

`python.py`

```
# using numpy (note the argument order, matching z = X * theta)
import numpy as np

z = np.dot(X, theta)
```

To find the optimal parameters, we want the partial derivative of the cost function with respect to each parameter to be 0 (or below a threshold), so we need to implement the partial derivative formula.

When this is vectorized

Now, to transform this, you can first use the following rule. Please just accept that **this is how it works**.

The matrix formed by lining up the vectors x is the transpose of the X introduced in the vectorization of z, so:

Therefore, the vectorized partial derivative of the cost function can be written as shown below.
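Reconstructed in LaTeX, the element-wise partial derivative and its vectorized form are:

```
\frac{\partial J(\theta)}{\partial \theta_j}
= \frac{1}{m}\sum_{i=1}^{m}\left(h^{(i)} - y^{(i)}\right)x_j^{(i)}
\qquad\Longrightarrow\qquad
\nabla_\theta J(\theta) = \frac{1}{m}\,X^T\,(h - y)
```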

In code, it can be written concisely as follows.

`octave.m`

```
h = activate_function(z)
grad = 1 / m * (X'*(h-y))
```

`python.py`

```
import numpy as np

h = activate_function(z)
grad = 1.0/m * np.dot(X.T, h - y)  # 1.0/m avoids integer division on Python 2
```
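Putting the pieces together, here is a minimal end-to-end sketch of the vectorized gradient, cross-checked against a for-loop version (the data is random and illustrative, and the sigmoid is assumed as the activation function):

```
# Vectorized logistic-regression gradient vs. a for-loop version.
# Data is random and purely illustrative; sigmoid assumed as activation.
import numpy as np

def activate_function(z):
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid

rng = np.random.default_rng(42)
m, n = 200, 5
X = rng.standard_normal((m, n))
y = (rng.random(m) < 0.5).astype(float)
theta = np.zeros(n)

z = X @ theta                        # vectorized z
h = activate_function(z)             # predictions
grad = (1.0 / m) * X.T @ (h - y)     # vectorized gradient

# cross-check against the for-loop version
grad_loop = np.zeros(n)
for j in range(n):
    for i in range(m):
        grad_loop[j] += (h[i] - y[i]) * X[i, j]
grad_loop /= m

print(np.allclose(grad, grad_loop))  # → True
```

The two computations agree; the vectorized one is both shorter and much faster as m grows.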

That concludes the explanation of vectorization. I hope it serves as a support for beginners' hearts. That said, I can't be fully confident there are no mistakes, so I would love to hear your opinions and comments.

- Deep learning
- Data Scientist Training Reader / Introduction to Machine Learning
- An introduction to machine learning for IT engineers
- Building Machine Learning Systems with Python
- Professor Andrew Ng's Machine Learning
- Washington State University Machine Learning

- I want to know where beginners stumble
- I wish Kobito could render LaTeX (urgently)
- Maybe Qiita can't render matrices either...?
- I had no choice but to paste screenshots with Gyazo, but there must be a better way
- People who write textbooks are amazing
- My heart nearly broke while writing this
