[PYTHON] [Stock price analysis] Learn pandas with Nikkei 225 (004: Change read data to Nikkei 225)

This continues from the last article, which got as far as creating a candlestick chart.

I had thought that artificial data, being easy to analyze, would be better to start with than real data I could not yet interpret. However, the [graph created last time](https://qiita.com/waka_taka/items/ab2f3b8fc6475d1c1a51#%E5%AE%9F%E8%A1%8C%E7%B5%90%E6%9E%9C-2) looked so unrealistic that I lost some of my motivation.

Looking back, it was rather silly to make the data myself and then lose motivation because of it.

So, I would like to analyze the actual Nikkei 225 trends (January 4, 2016 to November 8, 2019).

For now, this article keeps the previous program as it is, changes only the data it reads, and then looks into the details of the program.

The program so far (repost)

Study_Code.py


import pandas as pd
import logging
# Imports below were added in "[Stock price analysis] Learning pandas with fictitious data (003)"
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc

# Specify the log format
# %(asctime)s   : Human-readable time at which the LogRecord was created
# %(funcName)s  : Name of the function containing the logging call
# %(levelname)s : Text logging level of the message
# %(lineno)d    : Source line number where the logging call was issued
# %(message)s   : The logged message, computed as msg % args
formatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

# Logger settings (INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Handler settings (output file / log level / log format)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)

logger.addHandler(handler)

# Read the CSV file (NikkeiAverage.csv), specifying its character encoding
dframe = pd.read_csv('NikkeiAverage.csv', encoding='SJIS',
	header=1, sep='\t')

#Convert to date type
dframe['date'] = pd.to_datetime(dframe['date'])
#Specify date column as index
dframe = dframe.set_index('date')

# Strip the thousands separators and convert every price column to numbers
dframe =  dframe.apply(lambda x: x.str.replace(',','')).astype(np.float32)

# Output the whole DataFrame via the logger
logger.info(dframe)
# Output the column names
logger.info(dframe.columns)
#Output only open and close prices
logger.info(dframe[['Open price','closing price']])
#Checking the index
logger.info(dframe.index)
#Type confirmation
logger.info(dframe.dtypes)


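# Create data for plotting
# NOTE: candlestick_ohlc expects (date, open, high, low, close); the order below is (date, open, close, high, close)
ohlc = zip(mdates.date2num(dframe.index), dframe['Open price'], dframe['closing price'], dframe['High price'], dframe['closing price'])
# In Python 3 this logs only the repr of the zip object, not its contents
logger.info(ohlc)

# Create the canvas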
fig = plt.figure()

#Format the X-axis
ax = plt.subplot()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y/%m/%d'))

#Draw a candlestick chart
candlestick_ohlc(ax, ohlc, width=0.7, colorup='g', colordown='r')

#Save the image
plt.savefig('Candle_Chart.png')

Execution result (graph after reading the Nikkei 225 data)

As you would expect, the graph now looks like a proper chart (although the appearance of the graph itself still needs fixing...).

Candle_Chart.png

About the code that builds the plot data

I wrote the following line without much thought and do not fully understand it, so I will take it apart piece by piece and check it.

Confirm_Code.py


ohlc = zip(mdates.date2num(dframe.index), dframe['Open price'], dframe['closing price'], dframe['High price'], dframe['closing price'])

Check the contents of dframe.index and mdates.date2num(dframe.index)

First, strip out the parts that are not needed for this check, which leaves the following code.

Study_Code.py


import pandas as pd
import logging
# Imports below were added in "[Stock price analysis] Learning pandas with fictitious data (003)"
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc

# Specify the log format
# %(asctime)s   : Human-readable time at which the LogRecord was created
# %(funcName)s  : Name of the function containing the logging call
# %(levelname)s : Text logging level of the message
# %(lineno)d    : Source line number where the logging call was issued
# %(message)s   : The logged message, computed as msg % args
formatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

# Logger settings (INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Handler settings (output file / log level / log format)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)

logger.addHandler(handler)

# Read the CSV file (NikkeiAverage.csv), specifying its character encoding
dframe = pd.read_csv('NikkeiAverage.csv', encoding='SJIS',
	header=1, sep='\t')

#Convert to date type
dframe['date'] = pd.to_datetime(dframe['date'])
#Specify date column as index
dframe = dframe.set_index('date')

# Strip the thousands separators and convert every price column to numbers
dframe =  dframe.apply(lambda x: x.str.replace(',','')).astype(np.float32)

#Creating data for plotting
#ohlc = zip(mdates.date2num(dframe.index), dframe['Open price'], dframe['closing price'], \
#	dframe['High price'], dframe['closing price'])

# Check the contents of dframe.index
logger.info(dframe.index)

Execution result

As you would expect, dframe.index holds the index data, which here is the dates.

info_log


2019-11-11 23:27:00,953:<module>:INFO:46:
DatetimeIndex(['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07',
               '2016-01-08', '2016-01-12', '2016-01-13', '2016-01-14',
               '2016-01-15', '2016-01-18',
               ...
               '2019-10-25', '2019-10-28', '2019-10-29', '2019-10-30',
               '2019-10-31', '2019-11-01', '2019-11-05', '2019-11-06',
               '2019-11-07', '2019-11-08'],
              dtype='datetime64[ns]', name='date', length=942, freq=None)

This is as expected.

Next, the contents of mdates.date2num(dframe.index) were the following numbers.

info_log


2019-11-11 23:31:04,163:<module>:INFO:47:
[735967. 735968. 735969. 735970. 735971. 735975. 735976. 735977. 735978.
 735981. 735982. 735983. 735984. 735985. 735988. 735989. 735990. 735991.
(Omitted)
 737349. 737350. 737353. 737355. 737356. 737357. 737360. 737361. 737362.
 737363. 737364. 737368. 737369. 737370. 737371.]

Does this mean the following?

  1. '2016-01-04' is converted to the number 735967
  2. '2016-01-05' is converted to the number 735968
  3. '2016-01-06' is converted to the number 735969
     ︙
  4. '2019-11-07' is converted to the number 737370
  5. '2019-11-08' is converted to the number 737371

I'm not good at datetime handling in Python... (I'm weak at file I/O too, and date processing is no better...)
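
One way to check this reading is with the standard library alone. The sketch below assumes that the matplotlib version used for the log above counts days the same way as `datetime.date.toordinal()` (0001-01-01 is day 1), which is what the logged values suggest. matplotlib 3.3 and later default to a 1970-01-01 epoch, so `date2num` returns different numbers there.

```python
from datetime import date

# Proleptic Gregorian ordinals: 0001-01-01 is day 1.
first_day = date(2016, 1, 4).toordinal()
last_day = date(2019, 11, 8).toordinal()

print(first_day)  # 735967, the first value in the log
print(last_day)   # 737371, the last value in the log

# Going the other way recovers the date from the number.
second_day = date.fromordinal(735968)
print(second_day)  # 2016-01-05
```

So each number is simply a day count, and consecutive trading days that are one calendar day apart differ by exactly 1.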

Check the contents of the ohlc object

The ohlc object should simply hold tuples of the date and the price values (candlestick_ohlc expects date, open price, high price, low price, close price), but I will check it just in case, since I rarely use the zip function in my own programs.
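
As a reminder of what zip does, here is a minimal sketch with made-up values (the lists are illustrative and not taken from NikkeiAverage.csv). It shows that zip combines the n-th elements of each sequence into one tuple, and that in Python 3 the result is a one-shot iterator.

```python
# Hypothetical sample values, for illustration only.
dates = [1.0, 2.0, 3.0]
opens = [100.0, 101.0, 102.0]
closes = [101.5, 100.5, 103.0]

rows = zip(dates, opens, closes)

# Materialize the iterator to see the tuples it yields.
first_pass = list(rows)
print(first_pass)

# A zip object can be consumed only once: a second pass yields nothing.
second_pass = list(rows)
print(second_pass)
```

This matters when checking ohlc with a for loop: once the loop has walked through the zip object it is empty, so it would have to be rebuilt (or converted to a list first) before being passed on to candlestick_ohlc.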

Confirmation code

Study_Code.py


import pandas as pd
import logging
# Imports below were added in "[Stock price analysis] Learning pandas with fictitious data (003)"
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc

#Logger settings(INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

#Handler settings(Change output file/Log level settings/Log format settings)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)

logger.addHandler(handler)

# Read the CSV file (NikkeiAverage.csv), specifying its character encoding
dframe = pd.read_csv('NikkeiAverage.csv', encoding='SJIS',
	header=1, sep='\t')

#Convert to date type
dframe['date'] = pd.to_datetime(dframe['date'])
#Specify date column as index
dframe = dframe.set_index('date')

# Strip the thousands separators and convert every price column to numbers
dframe =  dframe.apply(lambda x: x.str.replace(',','')).astype(np.float32)

# Create data for plotting (the order here is date, open, close, high, close)
ohlc = zip(mdates.date2num(dframe.index), dframe['Open price'], dframe['closing price'], \
	dframe['High price'], dframe['closing price'])

# Check the contents of ohlc, one tuple per log entry
for output_data in ohlc :
	logger.info(output_data)

Execution result

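Each tuple came out exactly as the code builds it (date, open price, closing price, high price, closing price), so I was satisfied for the time being. Note, however, that candlestick_ohlc expects the order date, open, high, low, close, so the arguments to zip still need to be reordered before the chart is correct.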

info_log


2019-11-11 23:48:26,636:<module>:INFO:47:
(735967.0, 18818.580078125, 18450.98046875, 18951.119140625, 18450.98046875)
(735968.0, 18398.759765625, 18374.0, 18547.380859375, 18374.0)
(735969.0, 18410.5703125, 18191.3203125, 18469.380859375, 18191.3203125)

(Omitted)

(737369.0, 23343.509765625, 23303.8203125, 23352.560546875, 23303.8203125)
(737370.0, 23283.140625, 23330.3203125, 23336.0, 23330.3203125)
(737371.0, 23550.0390625, 23391.869140625, 23591.08984375, 23391.869140625)
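
The long fractional parts in this log most likely come from the conversion to np.float32 rather than from the CSV itself. The following is a minimal sketch using only the standard library, assuming the source values are two-decimal figures such as 18818.58 and 18450.98; the helper name `to_float32` is made up for this illustration.

```python
import struct

def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

rounded_open = to_float32(18818.58)
rounded_close = to_float32(18450.98)

print(rounded_open)   # 18818.580078125
print(rounded_close)  # 18450.98046875
```

A 32-bit float has a 24-bit significand, so values in this range can only be stored in steps of about 0.002. Passing np.float64 to astype instead would keep more digits, at the cost of more memory.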

Draw some simple candlestick charts and check the behavior

I've read through the matplotlib sample page, but I haven't used the **candlestick_ohlc function** much, so I'll try it with a few samples to check how it behaves.

November 11, 2019, 23:54: still writing...
