[PYTHON] [Stock price analysis] Learning pandas on the Nikkei average (005: Grouping by year / month-confirmation of statistical information)

From the continuation of the last time (up to the candlestick chart (pseudonym))

Last time, I thought about editing the candlestick chart with various changes, but I wanted to summarize the functions of groupby before that, so I will write this article first.

Program up to the last time (repost)

Study_Code.py


import pandas as pd
import logging
#[Stock price analysis] Learning pandas with fictitious data(003)Add more
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc

#Specifying the log format
# %(asctime)s :A human-readable representation of the time the LogRecord was generated.
# %(funcName)s :The name of the function that contains the logging call
# %(levelname)s :Character logging level for messages
# %(lineno)d :Source line number where the logging call was issued
# %(message)s : msg %Log message requested as args
fomatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

#Logger settings(INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

#Handler settings(Change output file/Log level settings/Log format settings)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(fomatter)

logger.addHandler(handler)

#CSV file(SampleStock01.csv)Specify the character code of
dframe = pd.read_csv('NikkeiAverage.csv', encoding='SJIS', \
	header=1, sep='\t')

#Convert to date type
dframe['date'] = pd.to_datetime(dframe['date'])
#Specify date column as index
dframe = dframe.set_index('date')

#Convert open to close prices to numbers
dframe =  dframe.apply(lambda x: x.str.replace(',','')).astype(np.float32)

#Change to use logger
logger.info(dframe)
#Output index
logger.info(dframe.columns)
#Output only open and close prices
logger.info(dframe[['Open price','closing price']])
#Checking the index
logger.info(dframe.index)
#Type confirmation
logger.info(dframe.dtypes)


#Creating data for plotting
ohlc = zip(mdates.date2num(dframe.index), dframe['Open price'], dframe['closing price'], dframe['High price'], dframe['closing price'])
logger.info(ohlc)

#Creating a campus
fig = plt.figure()

#Format the X-axis
ax = plt.subplot()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y/%m/%d'))

#Draw a candlestick chart
candlestick_ohlc(ax, ohlc, width=0.7, colorup='g', colordown='r')

#Save the image
plt.savefig('Candle_Chart.png')

Group by year / year

The program up to the last time handled data for 4 years from 2016 to 2019 at once, but since the index was made date type, this time

  1. 2016 data

  2. 2017 data

  3. 2018 data

  4. 2019 data And every year again

  5. Data for January 2016

  6. Data for February 2016

  7. Data for March 2016 ︙

  8. Data for October 2019

  9. Data for November 2019

I will try to group by year and month.

In addition, since the article is a little confusing, I would like to describe only the points from this article and describe the entire program in the last chapter.

Extract year / year / month information of confirmation index

First, check the year information of the index with the following source code.

Conf_Code.py


logger.info(dframe.index.year)

Execution result

info_log


2019-11-12 21:40:26,133:<module>:INFO:42:
Int64Index([2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
            ...
            2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019],
           dtype='int64', name='date', length=942)

Next, let's check the date information of the index.

Conf_Code.py


logger.info([dframe.index.year, dframe.index.month])

Execution result

info_log


2019-11-12 22:12:26,052:<module>:INFO:42:
[Int64Index([2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
            ...
            2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019],
           dtype='int64', name='date', length=942), Int64Index([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
            ...
            10, 10, 10, 10, 10, 11, 11, 11, 11, 11],
           dtype='int64', name='date', length=942)]

Grouping by year

Conf_Code.py


for Conf_DF in dframe.groupby([dframe.index.year]) :
	logger.info(Conf_DF)

Execution result

info_log


2019-11-12 21:49:34,031:<module>:INFO:44:
(2016,Open price High price Low price Close price
date
2016-01-04  18818.580078  18951.119141  18394.429688  18450.980469
2016-01-05  18398.759766  18547.380859  18327.519531  18374.000000
2016-01-06  18410.570312  18469.380859  18064.300781  18191.320312
2016-01-07  18139.769531  18172.039062  17767.339844  17767.339844
2016-01-08  17562.230469  17975.310547  17509.640625  17697.960938
...                  ...           ...           ...           ...
2016-12-26  19394.410156  19432.480469  19385.939453  19396.640625
2016-12-27  19353.429688  19478.580078  19352.060547  19403.060547
2016-12-28  19392.109375  19442.130859  19364.730469  19401.720703
2016-12-29  19301.039062  19301.039062  19092.220703  19145.140625
2016-12-30  18997.679688  19176.810547  18991.589844  19114.369141

[245 rows x 4 columns])
2019-11-12 21:49:34,051:<module>:INFO:44:
(2017,Open price High price Low price Close price
date
2017-01-04  19298.679688  19594.160156  19277.929688  19594.160156
2017-01-05  19602.099609  19615.400391  19473.279297  19520.689453
2017-01-06  19393.550781  19472.369141  19354.439453  19454.330078
2017-01-10  19414.830078  19484.900391  19255.349609  19301.439453
2017-01-11  19358.640625  19402.169922  19325.460938  19364.669922
...                  ...           ...           ...           ...
2017-12-25  22909.410156  22948.830078  22870.189453  22939.179688
2017-12-26  22922.949219  22950.150391  22877.630859  22892.689453
2017-12-27  22854.390625  22936.160156  22854.390625  22911.210938
2017-12-28  22912.050781  22954.449219  22736.429688  22783.980469
2017-12-29  22831.490234  22881.210938  22753.199219  22764.939453

[247 rows x 4 columns])
2019-11-12 21:49:34,069:<module>:INFO:44:
(2018,Open price High price Low price Close price
date
2018-01-04  23073.730469  23506.330078  23065.199219  23506.330078
2018-01-05  23643.000000  23730.470703  23520.519531  23714.529297
2018-01-09  23948.970703  23952.609375  23789.029297  23849.990234
2018-01-10  23832.810547  23864.759766  23755.449219  23788.199219
2018-01-11  23656.390625  23734.970703  23601.839844  23710.429688
...                  ...           ...           ...           ...
2018-12-21  20310.500000  20334.730469  20006.669922  20166.189453
2018-12-25  19785.429688  19785.429688  19117.960938  19155.740234
2018-12-26  19302.589844  19530.349609  18948.580078  19327.060547
2018-12-27  19706.189453  20211.570312  19701.759766  20077.619141
2018-12-28  19957.880859  20084.380859  19900.039062  20014.769531

[245 rows x 4 columns])
2019-11-12 21:49:34,088:<module>:INFO:44:
(2019,Open price High price Low price Close price
date
2019-01-04  19655.130859  19692.580078  19241.369141  19561.960938
2019-01-07  19944.609375  20266.220703  19920.800781  20038.970703
2019-01-08  20224.669922  20347.919922  20106.359375  20204.039062
2019-01-09  20366.300781  20494.349609  20331.199219  20427.060547
2019-01-10  20270.880859  20345.919922  20101.929688  20163.800781
...                  ...           ...           ...           ...
2019-11-01  22730.490234  22852.720703  22705.599609  22850.769531
2019-11-05  23118.789062  23328.519531  23090.939453  23251.990234
2019-11-06  23343.509766  23352.560547  23246.570312  23303.820312
2019-11-07  23283.140625  23336.000000  23253.320312  23330.320312
2019-11-08  23550.039062  23591.089844  23313.410156  23391.869141

[205 rows x 4 columns])

If you group by ** dframe.groupby ([dframe.index.year]) **

  1. 2016 is [245 rows x 4 columns]
  2. 2017 is [247 rows x 4 columns]
  3. 2018 is [245 rows x 4 columns]
  4. 2019 is [205 rows x 4 columns]

I was able to extract the data normally.

Before grouping, it was [942 rows x 4 columns], so you can see the changes. I'm just a little worried The data of the read source is 2019/11/8 23,550.04 23,591.09 23,313.41 23,391.87 The value stored in the data frame is 2019-11-08 23550.039062 23591.089844 23313.410156 23391.869141 It was (all other days), so probably

dframe =  dframe.apply(lambda x: x.str.replace(',','')).astype(np.float32)

It seems that ** the second decimal place and the following are not converted correctly </ font> ** when converted with.

To be honest, as of November 12, 2019, I don't feel the need to worry about numbers after the decimal point in stock price analysis, so I will ignore it, but I feel that I will suffer from the round-off error problem when doing scientific calculations ...

Grouping by year

Conf_Code.py


for Conf_DF in dframe.groupby([dframe.index.year, dframe.index.month]) :
	logger.info(Conf_DF)

Execution result

info_log


(Omission)
2019-11-12 22:05:00,120:<module>:INFO:45:
((2019, 11),Open price High price Low price Close price
date
2019-11-01  22730.490234  22852.720703  22705.599609  22850.769531
2019-11-05  23118.789062  23328.519531  23090.939453  23251.990234
2019-11-06  23343.509766  23352.560547  23246.570312  23303.820312
2019-11-07  23283.140625  23336.000000  23253.320312  23330.320312
2019-11-08  23550.039062  23591.089844  23313.410156  23391.869141)

There is still garbage in the second decimal place, but ...

Check monthly stats

Since we were able to divide each month in the previous chapter, let's check the statistical information for each month.

Conf_Code.py


def Output_Describe(temp_DF) :
	logger.info(temp_DF.index)
	logger.info(temp_DF.describe())

dframe.groupby([dframe.index.year, dframe.index.month]).apply(Output_Describe)

Execution result

info_log


(Omission)
2019-11-12 22:25:51,012:Output_Describe:INFO:43:
DatetimeIndex(['2019-10-01', '2019-10-02', '2019-10-03', '2019-10-04',
               '2019-10-07', '2019-10-08', '2019-10-09', '2019-10-10',
               '2019-10-11', '2019-10-15', '2019-10-16', '2019-10-17',
               '2019-10-18', '2019-10-21', '2019-10-23', '2019-10-24',
               '2019-10-25', '2019-10-28', '2019-10-29', '2019-10-30',
               '2019-10-31'],
              dtype='datetime64[ns]', name='date', freq=None)
2019-11-12 22:25:51,043:Output_Describe:INFO:44:
Open price High price Low price Close price
count     21.000000     21.000000     21.000000     21.000000
mean   22173.896484  22250.916016  22117.458984  22197.476562
std      610.297974    598.321411    619.635559    591.679626
min    21316.179688  21410.199219  21276.009766  21341.740234
25%    21494.480469  21629.240234  21483.179688  21587.779297
50%    22451.150391  22522.390625  22424.919922  22451.859375
75%    22725.439453  22780.990234  22704.330078  22750.599609
max    22953.169922  23008.429688  22935.349609  22974.130859
(Omission)

If you check the closing price column based on the data of October 2019,

--The number of data is for 21 days --The average price of the Nikkei average in October 2019 (confusing) is 22197.476562 yen at the closing price. --Dispersion is 591.679626 --The cheapest price (0% point) is 21341.740234 yen --The price of 25% points in the price range of October 2019 is 21587.779297 yen. --The price of 25% points in the price range of October 2019 is 22451.859375 yen --The price of 25% points in the price range of October 2019 is 22750.599609 yen --The highest price (100% point) is 22974.130859 yen

Can be confirmed.

I'm writing (I'm sorry for the half-finished articles lined up. I'll summarize it to the end, but if I don't write it when I remember it, I'll forget it soon, so ...)

What does dispersion mean? What happens if this is visualized in a graph? What day was the cheapest in October 2019? Etc. I would like to describe it in the following chapters.

Recommended Posts

[Stock price analysis] Learning pandas on the Nikkei average (005: Grouping by year / month-confirmation of statistical information)
Plot the Nikkei Stock Average with pandas
[Stock price analysis] Learning pandas with fictitious data (002: Log output)
Stock price plummeted with "new corona"? I tried to get the Nikkei Stock Average by web scraping
[Stock price analysis] Learn pandas with Nikkei 225 (004: Change read data to Nikkei 225)
[Stock price analysis] Learning pandas with fictitious data (001: environment preparation-file reading)
[Stock price analysis] Learning pandas with fictitious data (003: Type organization-candlestick chart)
Compare nighttime and daytime returns on the Nikkei Stock Average with python
Pandas of the beginner, by the beginner, for the beginner [Python]