Continuing from last time (log output)

I made a log output function, so I would like to continue studying pandas.

Outputting the index and specific columns

Until last time, I was importing only

import pandas as pd

but this time I have also imported Series, DataFrame and numpy. As pandas practice, I will add processing that outputs the column labels and specific columns.

`Success_case01.py`


import pandas as pd
import logging
#Added in [Stock price analysis] Learning pandas with fictitious data (003)
from pandas import Series, DataFrame
import numpy as np

#Specifying the log format
# %(asctime)s :A human-readable representation of the time the LogRecord was generated.
# %(funcName)s :The name of the function that contains the logging call
# %(levelname)s :Text logging level for the message
# %(lineno)d :Source line number where the logging call was issued
# %(message)s :The logged message, computed as msg % args
fomatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

#Logger settings(INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

#Handler settings(Change output file/Log level settings/Log format settings)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(fomatter)

logger.addHandler(handler)

#Read the CSV file (SampleStock01_t1.csv), specifying its character encoding
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
	header=1, sep='\t')

#Changed to output through the logger
logger.info(dframe)
#Output the column labels (an Index object)
logger.info(dframe.columns)
#Output only open and close prices
logger.info(dframe[['Open price','closing price']])

Output result of dframe.columns

The column labels are recorded as an Index object, as follows.

`info_log.log`


2019-11-11 20:04:13,275:<module>:INFO:33:
Index(['date', 'Open price', 'High price', 'Low price', 'closing price'], dtype='object')

Output result of dframe[['Open price','closing price']]

I was able to extract only the opening and closing price data without any problems.

`info_log.log`


2019-11-11 20:04:13,290:<module>:INFO:35:
Open price closing price
0     9,934  10,000
1    10,062  10,015
2     9,961  10,007
3     9,946   9,968
4     9,812   9,932
..      ...     ...
937  13,956  14,928
938  13,893  14,968
939  14,003  15,047
940  14,180  15,041
941  14,076  15,041

[942 rows x 2 columns]
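
The column selection used above can be tried on a tiny stand-in table. This is only a sketch: the name `sample` and its two rows are made up for illustration and are not the article's CSV. It shows that a list of labels inside the brackets returns a DataFrame, while a single label returns a Series.

```python
import pandas as pd

# Tiny stand-in for the article's data (hypothetical values, same column names)
sample = pd.DataFrame({
    'date': ['2016-01-04', '2016-01-05'],
    'Open price': ['9,934', '10,062'],
    'High price': ['10,055', '10,092'],
    'Low price': ['9,933', '9,942'],
    'closing price': ['10,000', '10,015'],
})

# A list of labels inside [] returns a DataFrame holding just those columns
open_close = sample[['Open price', 'closing price']]

# A single label (no inner list) returns a Series instead
open_only = sample['Open price']

# sample.columns is itself an Index object holding the column labels
column_labels = list(sample.columns)
```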

Convert the date column to a date type and set it as the index

The key points are just the following two lines.

`Point_Code.py`


#Convert to date type
dframe['date'] = pd.to_datetime(dframe['date'])
#Specify date column as index
dframe = dframe.set_index('date')
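
To see what these two lines change, here is a minimal sketch on a made-up three-row table (`sample` and its values are assumptions for illustration). It records the index before and after the conversion and then slices rows by date strings, which is one thing a DatetimeIndex makes possible.

```python
import pandas as pd

# Hypothetical stand-in data; the real article reads these from a CSV file
sample = pd.DataFrame({
    'date': ['2016-01-04', '2016-01-05', '2016-01-06'],
    'closing price': ['10,000', '10,015', '10,007'],
})
index_before = sample.index  # default RangeIndex: 0, 1, 2

# Convert the strings to datetime64 values, then use that column as the index
sample['date'] = pd.to_datetime(sample['date'])
sample = sample.set_index('date')
index_after = sample.index  # DatetimeIndex named 'date'

# With a sorted DatetimeIndex, rows can be sliced by date strings (both ends inclusive)
later_rows = sample.loc['2016-01-05':'2016-01-06']
```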

Just in case, the whole code including the debug output is as follows. (This is where a blog post is handy: a printed reference book has no room for such redundant listings.)

`Success_case02.py`


import pandas as pd
import logging
#Added in [Stock price analysis] Learning pandas with fictitious data (003)
from pandas import Series, DataFrame
import numpy as np

#Specifying the log format
# %(asctime)s :A human-readable representation of the time the LogRecord was generated.
# %(funcName)s :The name of the function that contains the logging call
# %(levelname)s :Text logging level for the message
# %(lineno)d :Source line number where the logging call was issued
# %(message)s :The logged message, computed as msg % args
fomatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

#Logger settings(INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

#Handler settings(Change output file/Log level settings/Log format settings)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(fomatter)

logger.addHandler(handler)

#Read the CSV file (SampleStock01_t1.csv), specifying its character encoding
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
	header=1, sep='\t')

#Convert to date type
dframe['date'] = pd.to_datetime(dframe['date'])
#Specify date column as index
dframe = dframe.set_index('date')

#Changed to output through the logger
logger.info(dframe)
#Output the column labels (an Index object)
logger.info(dframe.columns)
#Output only open and close prices
logger.info(dframe[['Open price','closing price']])
#Checking the index
logger.info(dframe.index)
#Type confirmation
logger.info(dframe.dtypes)

Execution result

The index information is displayed as follows.

`info_log.log`


2019-11-11 20:31:44,825:<module>:INFO:44:
DatetimeIndex(['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07',
               '2016-01-08', '2016-01-12', '2016-01-13', '2016-01-14',
               '2016-01-15', '2016-01-18',
               ...
               '2019-10-25', '2019-10-28', '2019-10-29', '2019-10-30',
               '2019-10-31', '2019-11-01', '2019-11-05', '2019-11-06',
               '2019-11-07', '2019-11-08'],
              dtype='datetime64[ns]', name='date', length=942, freq=None)

In addition, the opening, high, low, and closing prices are stored as object type, as shown below, so numerical calculation is not possible. We will convert them to float32 in the next section.

`info_log.log`


2019-11-11 20:38:35,216:<module>:INFO:44:
Open price object
High price object
Low price object
closing price object
dtype: object

Reference information (index before type conversion)

I forgot to mention it, but before the conversion (**pd.to_datetime(dframe['date'])** followed by set_index), the index is displayed as follows.

`info_log.log`


2019-11-11 20:36:22,326:<module>:INFO:37:
RangeIndex(start=0, stop=942, step=1)

Convert strings containing commas to numerical values

`Point_Code.py`


#Remove the thousands separators from every column, then cast to float32
dframe = dframe.apply(lambda x: x.str.replace(',', '')).astype(np.float32)

I won't post the full code this time because it's too verbose.
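
As a small check of that one-liner, the sketch below applies it to a made-up two-row table (`sample`, `tsv_text` and their values are assumptions for illustration). It also shows one possible alternative, the `thousands` parameter of `pd.read_csv`, which strips the separators while reading; whether that suits the article's actual SJIS file has not been checked here.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the price columns, stored as strings with commas
sample = pd.DataFrame({
    'Open price': ['9,934', '10,062'],
    'closing price': ['10,000', '10,015'],
})

# Same approach as the article: strip commas column by column, then cast
converted = sample.apply(lambda col: col.str.replace(',', '')).astype(np.float32)

# Alternative sketch: let read_csv drop the thousands separator while parsing
tsv_text = (
    "date\tOpen price\tclosing price\n"
    "2016-01-04\t9,934\t10,000\n"
    "2016-01-05\t10,062\t10,015\n"
)
parsed = pd.read_csv(io.StringIO(tsv_text), sep='\t', thousands=',')
```

With the second approach the columns arrive as integers, so a later `astype` call is only needed if float32 is specifically wanted.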

Execution result

The columns are now a numeric type, as shown below, so from here on they can be used in calculations and graphs.

`info_log.log`


2019-11-11 20:53:35,326:<module>:INFO:46:
Open price float32
High price float32
Low price float32
closing price float32
dtype: object

Reference information

If, at this point, you take the shortcut of writing only

`fail_Code01.py`


dframe =  dframe.astype(np.float64)

a ValueError is raised at the very first opening price.

ValueError: could not convert string to float: '9,934'

This time there was no problem because I prepared the data myself. When analyzing unknown data, however, a column may contain a string such as "abcde" rather than a number with commas, so this is a point where it is easy to get stuck without error handling.
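
One possible way to cope with such unexpected strings is `pd.to_numeric` with `errors='coerce'`, which turns anything unparseable into NaN instead of raising. The sketch below is not the article's code; the Series `raw` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical column containing one value that is not a number at all
raw = pd.Series(['9,934', 'abcde', '10,062'], name='Open price')

# Strip the thousands separators first, as in the article
cleaned = raw.str.replace(',', '')

# errors='coerce' replaces unparseable values with NaN instead of raising ValueError
numbers = pd.to_numeric(cleaned, errors='coerce')

# The original strings that failed to convert, for later inspection or logging
bad_rows = raw[numbers.isna()]
```

Keeping `bad_rows` makes it possible to report exactly which inputs were rejected rather than silently dropping them.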

In a place like this, I would like to output a neat log with **logger.exception()** or similar, but as of November 11, 2019, I do not have that skill, so I will leave it as a future task.
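
For reference, a minimal sketch of what that could look like is shown here. `logger.exception()` is part of the standard logging module: called inside an `except` block, it logs at ERROR level and appends the traceback. The logger name `conversion_demo` and the helper `to_float_or_none` are hypothetical names introduced only for this example.

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger('conversion_demo')  # hypothetical logger name


def to_float_or_none(frame):
    """Return frame cast to float32, or None if the cast fails."""
    try:
        return frame.astype(np.float32)
    except ValueError:
        # Logs the message at ERROR level together with the current traceback
        logger.exception('Could not convert the price columns to float')
        return None


# '9,934' still contains a comma, so the cast fails and the error is logged
failed = to_float_or_none(pd.DataFrame({'Open price': ['9,934', '10,062']}))

# Without commas the same call succeeds
succeeded = to_float_or_none(pd.DataFrame({'Open price': ['9934', '10062']}))
```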

Graphing stock price information (candlestick chart)

Advance preparation

Package installation for making candlestick charts

`command prompt`


pip install https://github.com/matplotlib/mpl_finance/archive/master.zip

Code changes

`Point_Code.py`


(Omitted)

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc

(Omitted)

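#Creating data for plotting (candlestick_ohlc expects time, open, high, low, close)
#list() so that the rows can be logged and then drawn
ohlc = list(zip(mdates.date2num(dframe.index), dframe['Open price'], dframe['High price'], dframe['Low price'], dframe['closing price']))
logger.info(ohlc)

#Creating a canvas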
fig = plt.figure()

#Format the X-axis
ax = plt.subplot()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y/%m/%d'))

#Draw a candlestick chart
candlestick_ohlc(ax, ohlc, width=0.7, colorup='g', colordown='r')

#Save the image
plt.savefig('Candle_Chart.png')
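
Because `candlestick_ohlc` reads each row as (time, open, high, low, close), the column order matters when building the plotting data. The sketch below uses pandas only and a made-up two-row table (`sample` and its values are assumptions for illustration); in the article's code the timestamps would additionally be converted with `mdates.date2num` before drawing.

```python
import pandas as pd

# Hypothetical numeric stand-in data, already indexed by date
sample = pd.DataFrame(
    {
        'Open price': [9934.0, 10062.0],
        'High price': [10055.0, 10092.0],
        'Low price': [9933.0, 9942.0],
        'closing price': [10000.0, 10015.0],
    },
    index=pd.DatetimeIndex(['2016-01-04', '2016-01-05'], name='date'),
)

# Select the columns explicitly in open, high, low, close order
ordered = sample[['Open price', 'High price', 'Low price', 'closing price']]

# itertuples(name=None) yields plain tuples: (date, open, high, low, close)
rows = list(ordered.itertuples(name=None))

# Basic sanity check: the high should never be below the low on any row
high_not_below_low = all(high >= low for _, _, high, low, _ in rows)
```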

Execution result

I made a candlestick chart, but the data I prepared was so poor that I lost the motivation to take it any further...

From the next article, I would like to organize the code, make use of the functions of pandas, and improve the graph while preparing somewhat better data.

Whole code at the time of this article

`Study_Code.py`


import pandas as pd
import logging
#Added in [Stock price analysis] Learning pandas with fictitious data (003)
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc

#Specifying the log format
# %(asctime)s :A human-readable representation of the time the LogRecord was generated.
# %(funcName)s :The name of the function that contains the logging call
# %(levelname)s :Text logging level for the message
# %(lineno)d :Source line number where the logging call was issued
# %(message)s :The logged message, computed as msg % args
fomatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

#Logger settings(INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

#Handler settings(Change output file/Log level settings/Log format settings)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(fomatter)

logger.addHandler(handler)

#Read the CSV file (SampleStock01_t1.csv), specifying its character encoding
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
	header=1, sep='\t')

#Convert to date type
dframe['date'] = pd.to_datetime(dframe['date'])
#Specify date column as index
dframe = dframe.set_index('date')

#Convert the opening through closing prices to numbers
dframe = dframe.apply(lambda x: x.str.replace(',', '')).astype(np.float32)

#Changed to output through the logger
logger.info(dframe)
#Output the column labels (an Index object)
logger.info(dframe.columns)
#Output only open and close prices
logger.info(dframe[['Open price','closing price']])
#Checking the index
logger.info(dframe.index)
#Type confirmation
logger.info(dframe.dtypes)


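#Creating data for plotting (candlestick_ohlc expects time, open, high, low, close)
#list() so that the rows can be logged and then drawn
ohlc = list(zip(mdates.date2num(dframe.index), dframe['Open price'], dframe['High price'], dframe['Low price'], dframe['closing price']))
logger.info(ohlc)

#Creating a canvas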
fig = plt.figure()

#Format the X-axis
ax = plt.subplot()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y/%m/%d'))

#Draw a candlestick chart
candlestick_ohlc(ax, ohlc, width=0.7, colorup='g', colordown='r')

#Save the image
plt.savefig('Candle_Chart.png')

Finally

Again, the data I prepared was rather poor, so from next time I would like to prepare different data before writing the article.

[PYTHON] [Stock price analysis] Learning pandas with fictitious data (003: Type organization-candlestick chart)
