Hello. This is Hayashi @ Ienter.

In the previous post, I introduced regression analysis with the Python data analysis library "scikit-learn".

At that time, I installed a Python distribution called Anaconda. This time, let's use Pandas and Seaborn, a data visualization library, to analyze US automobile stocks.

Reading stock price data

First, import the basic libraries for analysis in Jupyter Notebook.

Also prepare the datetime module for handling dates and times, and DataReader for reading data from external sources.

As an example, let's write code that reads the past year of data for "[General Motors](https://ja.wikipedia.org/wiki/General Motors)" from Yahoo. The ticker symbol for General Motors is "GM".

Now, let's display the first five rows.
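The original code cell is not preserved here, so the following is only a minimal sketch of how such a read might look. The helper names `one_year_window` and `load_prices` are hypothetical, and the sketch assumes the third-party pandas-datareader package is installed. Whether its `"yahoo"` source responds depends on the package version and on the remote service, so treat that part as an assumption.

```python
from datetime import datetime, timedelta


def one_year_window(end=None):
    """Return (start, end) datetimes covering the 365 days up to `end`."""
    end = end or datetime.now()
    start = end - timedelta(days=365)
    return start, end


def load_prices(symbol, start, end, source="yahoo"):
    """Fetch daily prices for one ticker symbol (hypothetical helper).

    Assumes pandas-datareader is installed and the data source is
    reachable; neither is guaranteed.
    """
    # Imported lazily so the date helper works without the package.
    from pandas_datareader import data as web

    return web.DataReader(symbol, source, start, end)


start, end = one_year_window()
# In a notebook with network access, one might then run:
# GM = load_prices("GM", start, end)
```

The date window is computed separately from the download so that it can be checked without any network access.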
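As a rough illustration of this step, the sketch below calls `head()` on a small stand-in table. The numbers are made up for the example and are not real General Motors prices; only the column names follow the ones described in this post.

```python
import pandas as pd

# Made-up values for illustration only; NOT real GM prices.
dates = pd.date_range("2016-01-04", periods=7, freq="B")  # business days
prices = pd.DataFrame(
    {
        "Open": [30.0, 30.2, 30.1, 29.8, 29.9, 30.3, 30.4],
        "High": [30.5, 30.6, 30.4, 30.1, 30.4, 30.8, 30.9],
        "Low": [29.7, 29.9, 29.6, 29.5, 29.7, 30.1, 30.2],
        "Close": [30.2, 30.1, 29.8, 29.9, 30.3, 30.4, 30.7],
        "Volume": [1000, 1200, 900, 1100, 1300, 1250, 1400],
        "Adj Close": [30.2, 30.1, 29.8, 29.9, 30.3, 30.4, 30.7],
    },
    index=dates,
)

# head() returns the first five rows by default.
top5 = prices.head()
print(top5)
```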

The meaning of each column is as follows.

- Open: Opening price
- High: High price
- Low: Low price
- Close: Closing price
- Volume: Trading volume (number of shares traded per day)
- Adj Close: Adjusted closing price ([What is adjusted closing price](http://www.yahoo-help.jp/app/answers/detail/p/546/a_id/45316/~/ What is adjusted closing price))

Visualization of changes in stock price data

Let's plot how the closing price changes over time. For the closing price, we use the adjusted closing price in the "Adj Close" column.

Day-to-day price changes are a key indicator of the risk of investing in a stock. The series of daily percentage changes can be calculated with the pct_change method of Series.
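A minimal sketch of such a plot follows. It uses a synthetic random walk in place of the downloaded data, so the curve itself means nothing. It also assumes matplotlib is installed, since pandas relies on it for plotting.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a year of prices (a random walk), NOT real data.
rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-04", periods=250, freq="B")
steps = rng.normal(loc=0.0, scale=0.3, size=len(dates))
prices = pd.DataFrame({"Adj Close": 30.0 + steps.cumsum()}, index=dates)

# Line chart of the adjusted closing price over time (needs matplotlib).
ax = prices["Adj Close"].plot(legend=True, figsize=(10, 4))
ax.set_ylabel("Adj Close")
```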
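To make the calculation concrete, here is a small sketch with hand-picked prices (invented for the example) so that the result is easy to verify by eye.

```python
import pandas as pd

# Invented prices chosen so the percentage changes are easy to check.
adj_close = pd.Series(
    [100.0, 110.0, 99.0, 99.0],
    index=pd.date_range("2016-01-04", periods=4, freq="B"),
    name="Adj Close",
)

# pct_change() computes (today - yesterday) / yesterday for each row.
# The first row has no previous day, so its value is NaN.
daily_change = adj_close.pct_change()
print(daily_change)
```

Here the second day is up 10%, the third is down 10%, and the fourth is unchanged. On real data, calling `daily_change.plot()` would show how the daily changes move over time.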

Stock price correlation comparison between companies

Earlier, I focused on the stock price of General Motors and analyzed it. Next, let's look at the correlation of stock prices among companies in the same industry.

We will use "Seaborn" to visualize the correlation. You can install it by entering the following command on the command line.

pip install seaborn

Import the module.

This time, we will look at the correlation among the following five automobile manufacturers.

- General Motors (ticker symbol "GM")
- Ford Motor (ticker symbol "F")
- Toyota (ticker symbol "TM")
- Honda (ticker symbol "HMC")
- Tesla Motors (ticker symbol "TSLA")

Get the closing price data of the above 5 companies.

From the closing price data of these companies, we calculate the daily percentage changes.

Let's try plotting them.

The relationship is hard to see from this plot...
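As a sketch of this step, the table below holds one closing-price column per company and applies `pct_change()` to all columns at once. The prices are invented for illustration. With real data, each column would presumably hold that company's "Adj Close" series.

```python
import pandas as pd

# Invented closing prices, one column per ticker symbol; NOT real data.
closing = pd.DataFrame(
    {
        "GM": [30.0, 30.6, 30.3],
        "F": [12.0, 12.3, 12.3],
        "TM": [100.0, 101.0, 99.0],
        "HMC": [28.0, 28.0, 28.7],
        "TSLA": [200.0, 190.0, 209.0],
    },
    index=pd.date_range("2016-01-04", periods=3, freq="B"),
)

# DataFrame.pct_change() works column by column. The first row is all
# NaN (no previous day), so it is dropped here.
changes = closing.pct_change().dropna()
print(changes)
```

Keeping all companies in one DataFrame means later steps, such as computing correlations, can operate on the whole table in a single call.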

Now let's visualize it using Seaborn's pairplot function.

As for reading the strength of correlation from the graph: the more tightly the points cluster along a straight line, the stronger the correlation. (Reference: [Correlation coefficient](https://ja.wikipedia.org/wiki/Correlation coefficient))

From that point of view, the pair of US companies "GM (General Motors)" and "F (Ford Motor)" and the pair of Japanese companies "TM (Toyota)" and "HMC (Honda)" appear to be relatively strongly correlated.

Conversely, a latecomer electric vehicle company such as "TSLA (Tesla Motors)" appears to have little correlation with the other companies.

In addition, let's use Seaborn's heatmap to make the correlation easier to grasp. The correlation coefficient between each pair of companies is expressed by the depth of color, which makes the pattern easier to take in visually.
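One way this might look is sketched below, again assuming seaborn and matplotlib are installed. The input is independent random noise standing in for the daily changes, so the off-diagonal values say nothing about the real companies. The color map and value range are choices made for this example.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in for the daily changes of five tickers; NOT real data.
rng = np.random.default_rng(2)
changes = pd.DataFrame(
    rng.normal(0.0, 0.01, size=(120, 5)),
    columns=["GM", "F", "TM", "HMC", "TSLA"],
)

# Pairwise Pearson correlation coefficients (5 x 5, ones on the diagonal).
corr = changes.corr()

# Color encodes the coefficient; annot=True prints the value in each cell.
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the color scale comparable between runs, because a correlation coefficient always lies in that range.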

That's all for this story!

[PYTHON] [Data analysis] Let's analyze US automobile stocks
