Introduction

This article is a personal memo by the author, a programming beginner, on getting started with machine learning.

I will share the information I found helpful, from an amateur's point of view.

Sites that were helpful for learning machine learning

Machine learning is popular right now and I wanted to try it too, so I am starting from the very beginning.

What is machine learning?

First, I would like to introduce a service that allows you to easily learn what machine learning is.

First, Aidemy

Free membership registration is required, but you can learn the basics of machine learning through videos. There isn't much free content, but it may suit people who want a quick overview. There is also a beginner's Python course, so even people who have never programmed may find it easy to get started.

https://aidemy.net/

Second, codexa

https://www.codexa.net/

Similarly, free membership registration is required, but you can also study the foundations machine learning relies on, such as linear algebra and statistics, for free. (This helped me enormously.)

What I did for hands-on practice

After taking in the material above, I wanted to actually build something as output.

This time I will build a sample **"recommendation feature"**, like the one you see on Amazon.

I followed the article below. It is written carefully for beginners and was very easy to understand.

https://www.codexa.net/collaborative-filtering-k-nearest-neighbor/
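
To give a feel for the idea behind this kind of recommendation, here is a minimal sketch of item-based collaborative filtering with nearest neighbors. It is not the tutorial's code: the toy ratings, the item names, and the helper functions `cosine_similarity` and `recommend` are all made up for illustration, and it uses only the standard library.

```python
import math

# Hypothetical toy ratings: {item: {user: rating}}. Not data from the tutorial.
ratings = {
    "anime_a": {"u1": 9, "u2": 8, "u3": 7},
    "anime_b": {"u1": 8, "u2": 9, "u3": 6},
    "anime_c": {"u1": 2, "u4": 9},
    "anime_d": {"u3": 8, "u4": 3},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors (dicts of user -> rating)."""
    shared_users = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared_users)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target, ratings, k=2):
    """Return the k items whose rating vectors are most similar to the target's."""
    scores = [
        (other, cosine_similarity(ratings[target], vector))
        for other, vector in ratings.items()
        if other != target
    ]
    # Highest similarity first: these are the "nearest neighbors" of the target.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


neighbors = recommend("anime_a", ratings, k=2)
for title, score in neighbors:
    print(f"{title}: {score:.3f}")
```

Items rated similarly by the same users end up close to each other, so the nearest neighbors of a title you like become its recommendations.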

Two pitfalls I got stuck on

You can follow the article above almost verbatim, but I got stuck in a couple of places, so I'll note them down.

- (To begin with) autocomplete-python does not work in Atom
  I set up my environment with Atom as the editor, but the autocomplete-python package I installed at that time did not work... After some googling, I learned that it fails unless the version of the file named grammarX.X.txt (X.X being a Python version) under "C:\Users\username\.atom\packages\autocomplete-python\lib\jedi\parser" matches the version of Python in my environment. (There is a Stack Overflow post about this, which I used as a reference.)
- The MemoryError trap
  I followed the tutorial article, and although things went well at the beginning, I then got **MemoryError: Unable to allocate…**

Because machine learning handles large amounts of data, this kind of error seems to come with the territory. I tried three approaches:

- Use dask
  dask appears to be a library for handling data too large to fit in memory. The MemoryError this time came from the pivot step. I tried by trial and error to write it so that only that step would be processed in a distributed way, but I could not get it to work and gave up... (I would love for someone to show me how...)

- Manually release memory with gc.collect()
  This is a basic measure, and it probably made little difference.

- Reduce the amount of data
  Since this was just a tutorial exercise, I took the quick way out here. Specifically, the tutorial has a step that "drops data whose members (the number of users belonging to the anime's group) is 10,000 or less", and I simply raised that number little by little.
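
To see why raising the threshold helps, here is a rough back-of-the-envelope sketch. It assumes the pivot produces a dense item-by-user table with one 8-byte float per cell. The member counts, the user count, and the helpers `dense_pivot_bytes` and `items_above` are hypothetical and are not taken from the tutorial's dataset.

```python
def dense_pivot_bytes(n_items, n_users, bytes_per_cell=8):
    """Approximate size of a dense item x user table (one float64 per cell)."""
    return n_items * n_users * bytes_per_cell


def items_above(members_by_item, min_members):
    """Keep only items with more than min_members members (drop the rest)."""
    return {title: m for title, m in members_by_item.items() if m > min_members}


# Hypothetical member counts per title (made-up numbers).
members_by_item = {
    "anime_a": 250_000,
    "anime_b": 40_000,
    "anime_c": 12_000,
    "anime_d": 8_000,
}
n_users = 70_000  # assumed number of distinct users (columns of the pivot)

sizes = {}
for threshold in (10_000, 30_000, 100_000):
    kept = items_above(members_by_item, threshold)
    sizes[threshold] = dense_pivot_bytes(len(kept), n_users)
    print(f"threshold {threshold}: {len(kept)} items, {sizes[threshold]} bytes")

# At a more realistic (still hypothetical) scale, the table gets large quickly:
large_table = dense_pivot_bytes(10_000, 70_000)
print(f"10,000 items x 70,000 users: {large_table / 1e9} GB")
```

Every row removed by the threshold saves one cell per user, so the table shrinks in proportion to the number of titles that remain.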

Execution result

With the workaround above, I managed to finish the tutorial for now.

I tried it with my personal favorite, Shakugan no Shana, and the recommended lineup made sense to me.

Summary

As a first step into machine learning, I am personally satisfied: I learned how to use the basic functions and how to think about and process data. I will keep working toward building something that can be used in practice.

[Python] When an amateur starts machine learning