[PYTHON] [Stock price analysis] Learning pandas with fictitious data (003: Type organization-candlestick chart)

Continued from last time (log output)

Now that I have made a log output function, I would like to continue studying pandas.

Outputting the column index and specific columns

Up to last time, I was only importing

import pandas as pd

but now I have also imported Series, DataFrame and numpy. As pandas practice, I will also add code that outputs the column index and specific columns.

Success_case01.py


import pandas as pd
import logging
# Added in part 003
from pandas import Series, DataFrame
import numpy as np

# Specifying the log format
# %(asctime)s  : Human-readable time when the LogRecord was created
# %(funcName)s : Name of the function containing the logging call
# %(levelname)s: Text logging level of the message
# %(lineno)d   : Source line number where the logging call was issued
# %(message)s  : The logged message, computed as msg % args
formatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

# Logger settings (INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Handler settings (output file / log level / log format)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)

logger.addHandler(handler)

# Read the CSV file (SampleStock01_t1.csv), specifying its character encoding
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
	header=1, sep='\t')

# Log the whole DataFrame
logger.info(dframe)
# Output the column labels (an Index object)
logger.info(dframe.columns)
# Output only the opening and closing prices
logger.info(dframe[['Open price','closing price']])

Output result of dframe.columns

The column labels are recorded as an Index object, as follows.

info_log.log


2019-11-11 20:04:13,275:<module>:INFO:33:
Index(['date', 'Open price', 'High price', 'Low price', 'closing price'], dtype='object')

Output result of dframe[['Open price','closing price']]

I was able to extract only the opening and closing price data without any problems.

info_log.log


2019-11-11 20:04:13,290:<module>:INFO:35:
Open price closing price
0     9,934  10,000
1    10,062  10,015
2     9,961  10,007
3     9,946   9,968
4     9,812   9,932
..      ...     ...
937  13,956  14,928
938  13,893  14,968
939  14,003  15,047
940  14,180  15,041
941  14,076  15,041

[942 rows x 2 columns]
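To try these steps without the original SampleStock01_t1.csv, here is a minimal sketch that reads a tiny tab-separated table from an in-memory string. The title line and the price rows are made up for illustration; the title line is only my assumption about why the original file needs `header=1`, which skips the first line and takes the column names from the second one. Double brackets select a list of columns and return a DataFrame.

```python
import io

import pandas as pd

# Hypothetical stand-in for SampleStock01_t1.csv: a title line, then a
# tab-separated header row, then rows with comma thousands separators.
SAMPLE_TSV = (
    "Sample stock (fictitious)\n"
    "date\tOpen price\tHigh price\tLow price\tclosing price\n"
    "2016/01/04\t9,934\t10,050\t9,900\t10,000\n"
    "2016/01/05\t10,062\t10,100\t9,990\t10,015\n"
)

# header=1: skip the title line and use the second line as column names
dframe = pd.read_csv(io.StringIO(SAMPLE_TSV), header=1, sep="\t")

print(dframe.columns.tolist())
# ['date', 'Open price', 'High price', 'Low price', 'closing price']

# A list of column names inside [] returns a DataFrame with just those columns
open_close = dframe[["Open price", "closing price"]]
print(open_close)
```

Note that the prices are still text such as '9,934' at this point, which is why the type conversion later in this article is needed.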

Converting the date column to datetime type and using it as the index

The key points are just the following two lines.

Point_Code.py


#Convert to date type
dframe['date'] = pd.to_datetime(dframe['date'])
#Specify date column as index
dframe = dframe.set_index('date')
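Here is a small self-contained sketch of what these two lines do, using a made-up two-row DataFrame in place of the CSV data. Before the conversion the frame has the default RangeIndex; afterwards the date values live in a DatetimeIndex named `date`, and `set_index` removes `date` from the ordinary columns.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the CSV data (dates as strings)
dframe = pd.DataFrame(
    {
        "date": ["2016/01/04", "2016/01/05"],
        "Open price": ["9,934", "10,062"],
    }
)
print(type(dframe.index).__name__)  # RangeIndex

# Parse the strings into datetime values, then move the column into the index
dframe["date"] = pd.to_datetime(dframe["date"])
dframe = dframe.set_index("date")

print(type(dframe.index).__name__)  # DatetimeIndex
print(dframe.index.name)            # date
print(dframe.columns.tolist())      # ['Open price']
```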

For reference, here is the whole code including the debug output. (This is where a blog post is handy: a printed reference book couldn't include something this redundant.)

Success_case02.py


import pandas as pd
import logging
# Added in part 003
from pandas import Series, DataFrame
import numpy as np

# Specifying the log format
# %(asctime)s  : Human-readable time when the LogRecord was created
# %(funcName)s : Name of the function containing the logging call
# %(levelname)s: Text logging level of the message
# %(lineno)d   : Source line number where the logging call was issued
# %(message)s  : The logged message, computed as msg % args
formatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

# Logger settings (INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Handler settings (output file / log level / log format)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)

logger.addHandler(handler)

# Read the CSV file (SampleStock01_t1.csv), specifying its character encoding
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
	header=1, sep='\t')

# Convert to datetime type
dframe['date'] = pd.to_datetime(dframe['date'])
# Use the date column as the index
dframe = dframe.set_index('date')

# Log the whole DataFrame
logger.info(dframe)
# Output the column labels (an Index object)
logger.info(dframe.columns)
# Output only the opening and closing prices
logger.info(dframe[['Open price','closing price']])
# Check the index
logger.info(dframe.index)
# Check the column types
logger.info(dframe.dtypes)

Execution result

The index information is displayed as follows.

info_log.log


2019-11-11 20:31:44,825:<module>:INFO:44:
DatetimeIndex(['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07',
               '2016-01-08', '2016-01-12', '2016-01-13', '2016-01-14',
               '2016-01-15', '2016-01-18',
               ...
               '2019-10-25', '2019-10-28', '2019-10-29', '2019-10-30',
               '2019-10-31', '2019-11-01', '2019-11-05', '2019-11-06',
               '2019-11-07', '2019-11-08'],
              dtype='datetime64[ns]', name='date', length=942, freq=None)

The other columns (opening, high, low and closing prices), however, are stored as object type, as shown below. Since they are strings, numerical calculations are not possible, so the next section converts them to float32.

info_log.log


2019-11-11 20:38:35,216:<module>:INFO:44:
Open price       object
High price       object
Low price        object
closing price    object
dtype: object
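Why object columns block calculations can be seen in a tiny sketch with made-up values: for strings, `+` means concatenation, so `sum()` glues the text together instead of adding the prices, and multiplying by a float raises a TypeError.

```python
import pandas as pd

# Made-up prices stored as text, like the object columns above
open_price = pd.Series(["9,934", "10,062"], dtype=object)
print(open_price.dtype)  # object

# "+" on strings concatenates, so sum() joins the text instead of adding
print(open_price.sum())  # 9,93410,062

# Real arithmetic with a float fails outright
try:
    open_price * 1.1
except TypeError as err:
    print("TypeError:", err)
```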

Reference information (index before type conversion)

I forgot to mention it, but before **pd.to_datetime(dframe['date'])** and **set_index('date')** are applied, the index is displayed as follows.

info_log.log


2019-11-11 20:36:22,326:<module>:INFO:37:
RangeIndex(start=0, stop=942, step=1)

Converting comma-separated number strings to numerical values

Point_Code.py


# Strip the thousands separators from every column, then cast to float32
dframe = dframe.apply(lambda x: x.str.replace(',', '', regex=False)).astype(np.float32)

I won't post the full code this time because it's too verbose.
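The `apply` line runs the lambda once per column: `.str.replace` strips the commas and `astype` converts the remaining text to floating point. The sketch below shows this on a made-up two-column frame, with `regex=False` making the literal replacement explicit. As an alternative that I have not tried on the original file, `pd.read_csv` also accepts a `thousands=','` parameter that removes the separators while parsing.

```python
import io

import numpy as np
import pandas as pd

# Made-up text columns with comma thousands separators
dframe = pd.DataFrame(
    {"Open price": ["9,934", "10,062"], "closing price": ["10,000", "10,015"]}
)

# Remove "," from every column (treated literally), then cast to float32
numeric = dframe.apply(lambda col: col.str.replace(",", "", regex=False)).astype(np.float32)
print(numeric.dtypes)

# Alternative: let read_csv drop the separators itself while parsing
raw = "Open price\tclosing price\n9,934\t10,000\n10,062\t10,015\n"
parsed = pd.read_csv(io.StringIO(raw), sep="\t", thousands=",")
print(parsed.dtypes)  # both columns become int64
```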

Execution result

The columns are now numeric, as shown below, so from here on they can be used in calculations and graphs.

info_log.log


2019-11-11 20:53:35,326:<module>:INFO:46:
Open price       float32
High price       float32
Low price        float32
closing price    float32
dtype: object

Reference information

If you skip the comma removal here and simply write

fail_Code01.py


dframe = dframe.astype(np.float64)

a ValueError is raised at the very first opening price.

ValueError: could not convert string to float: '9,934'

This time there was no problem because I prepared the data myself, but unknown data may contain strings such as "abcde" instead of numbers with commas. This is an easy place to get stuck if you don't add error handling.

In a place like this I would like to output a proper log with **logger.exception()** or similar, but as of November 11, 2019, I don't have that skill yet, so I will leave it as a future task.
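As one possible approach (a sketch, not something I have verified against real data), the conversion can be wrapped in `try`/`except` so that `logger.exception()` writes the traceback to the log, or `pd.to_numeric(..., errors='coerce')` can turn bad entries into NaN so the offending rows can be logged. The `"abcde"` value below is made up to mimic the case described above, and `logging.basicConfig` is used instead of the article's FileHandler only to keep the example short.

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Made-up price column in which one entry is not a number at all
prices = pd.Series(["9,934", "abcde", "10,062"])
cleaned = prices.str.replace(",", "", regex=False)

# Option 1: the conversion still fails, but the traceback goes to the log
try:
    values = cleaned.astype("float32")
except ValueError:
    logger.exception("could not convert prices to float")

# Option 2: unparseable entries become NaN, and we log where they were
values = pd.to_numeric(cleaned, errors="coerce")
bad_rows = prices[values.isna()]
if not bad_rows.empty:
    logger.warning("non-numeric prices found:\n%s", bad_rows)
```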

Graphing stock price information (candlestick chart)

Advance preparation

Package installation for making candlestick charts

command prompt


pip install https://github.com/matplotlib/mpl_finance/archive/master.zip

Code changes

Point_Code.py


(Omitted)

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc

(Omitted)

# Create the plotting data: one (date, open, high, low, close) tuple per day
ohlc = list(zip(mdates.date2num(dframe.index), dframe['Open price'], dframe['High price'], dframe['Low price'], dframe['closing price']))
logger.info(ohlc)

# Create the canvas (figure)
fig = plt.figure()

# Format the X-axis dates
ax = plt.subplot()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y/%m/%d'))

# Draw the candlestick chart
candlestick_ohlc(ax, ohlc, width=0.7, colorup='g', colordown='r')

# Save the image
plt.savefig('Candle_Chart.png')
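`candlestick_ohlc` draws each candle from a (time, open, high, low, close) sequence, so the column order passed to `zip` matters. The pandas-only sketch below uses made-up numbers and leaves the time element as a Timestamp instead of converting it with `mdates.date2num`, to avoid a matplotlib dependency. It builds the tuples in that order and adds a simple consistency check, since a high below the open/close or a low above them is one way fictitious data can produce odd-looking candles.

```python
import pandas as pd

# Made-up prices; column names follow the ones used in this article
dframe = pd.DataFrame(
    {
        "Open price": [9934.0, 10062.0],
        "High price": [10050.0, 10100.0],
        "Low price": [9900.0, 9990.0],
        "closing price": [10000.0, 10015.0],
    },
    index=pd.DatetimeIndex(["2016-01-04", "2016-01-05"], name="date"),
)

# One (time, open, high, low, close) tuple per day, in exactly that order
cols = ["Open price", "High price", "Low price", "closing price"]
ohlc = list(zip(dframe.index, *(dframe[c] for c in cols)))
print(ohlc[0])

# The high should be the top of each candle and the low the bottom
body = dframe[["Open price", "closing price"]]
ok = (dframe["High price"] >= body.max(axis=1)) & (dframe["Low price"] <= body.min(axis=1))
print(ok.all())  # True for this sample
```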

Execution result

Candle_Chart.png

I made a candlestick chart, but the data I prepared was so terrible that I didn't feel like going any further...

From the next article on, I would like to prepare somewhat better data, organize the code, make better use of pandas' functions, and improve the graph.

Complete code as of this article

Study_Code.py


import pandas as pd
import logging
# Added in part 003
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc

# Specifying the log format
# %(asctime)s  : Human-readable time when the LogRecord was created
# %(funcName)s : Name of the function containing the logging call
# %(levelname)s: Text logging level of the message
# %(lineno)d   : Source line number where the logging call was issued
# %(message)s  : The logged message, computed as msg % args
formatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

# Logger settings (INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Handler settings (output file / log level / log format)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)

logger.addHandler(handler)

# Read the CSV file (SampleStock01_t1.csv), specifying its character encoding
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
	header=1, sep='\t')

# Convert to datetime type
dframe['date'] = pd.to_datetime(dframe['date'])
# Use the date column as the index
dframe = dframe.set_index('date')

# Convert the price columns (open through close) to numbers
dframe = dframe.apply(lambda x: x.str.replace(',', '', regex=False)).astype(np.float32)

# Log the whole DataFrame
logger.info(dframe)
# Output the column labels (an Index object)
logger.info(dframe.columns)
# Output only the opening and closing prices
logger.info(dframe[['Open price','closing price']])
# Check the index
logger.info(dframe.index)
# Check the column types
logger.info(dframe.dtypes)


# Create the plotting data: one (date, open, high, low, close) tuple per day
ohlc = list(zip(mdates.date2num(dframe.index), dframe['Open price'], dframe['High price'], dframe['Low price'], dframe['closing price']))
logger.info(ohlc)

# Create the canvas (figure)
fig = plt.figure()

# Format the X-axis dates
ax = plt.subplot()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y/%m/%d'))

# Draw the candlestick chart
candlestick_ohlc(ax, ohlc, width=0.7, colorup='g', colordown='r')

# Save the image
plt.savefig('Candle_Chart.png')

Finally

Again, the data I prepared was a bit too terrible, so next time I would like to prepare different data for the article.
