[PYTHON] [Numpy / pandas / matplotlib Exercise 01]

Introduction

Python is convenient and easy to pick up, but I quickly lose the ability to use it as soon as I stop studying, so I am writing this as a memo that also serves as a review.

Advance preparation

Create a virtual environment for learning.

command prompt


python -m venv study01
.\study01\Scripts\activate

command prompt


python -m pip install --upgrade pip
pip install matplotlib
pip install pandas
pip install numpy
pip install japanize-matplotlib

In particular, I make a habit of installing japanize-matplotlib so that Japanese text can be used with matplotlib.

command prompt


pip install japanize-matplotlib
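
To confirm that the packages above actually landed in the virtual environment, something like the following can be run. This is only a minimal sketch: the helper name `installed_versions` is my own and not part of any library, and it relies on the standard-library `importlib.metadata` module (Python 3.8 or later).

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["numpy", "pandas", "matplotlib", "japanize-matplotlib"]
    for name, version in installed_versions(wanted).items():
        print(name, version if version is not None else "NOT INSTALLED")
```

Running it inside the activated study01 environment lists each package with its version, or NOT INSTALLED if the pip step was skipped.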

Preparing a Python template

Everyone has their own way of writing programs, but I try to base mine on the following template.

sample.py


import logging
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import japanize_matplotlib	### [Japanese support]

#Specifying the log format
# %(asctime)s :Human-readable time at which the LogRecord was created
# %(funcName)s :Name of the function that contains the logging call
# %(levelname)s :Text logging level for the message
# %(lineno)d :Source line number where the logging call was issued
# %(message)s :The logged message, computed as msg % args
fomatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

#Logger settings(INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

#Handler settings(Change output file/Log level settings/Log format settings)
handler = logging.FileHandler('handler_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(fomatter)

logger.addHandler(handler)

def SampleFunc() :
	try :
		logger.info("Hello World")

	#Exception handling (logger.exception also records the traceback)
	except Exception :
		logger.exception("Exception occurred")

if __name__ == '__main__' :
	
	#Call SampleFunc
	SampleFunc()

To be honest, I don't fully understand functions, exception handling, or loggers yet. But I can't learn what I don't use, and I am still studying, so I will deliberately use this template as the base for my programs. In the future I also want to push myself to use classes and to split the code across files.

Execution result

2019-11-19 23:00:52,298:SampleFunc:INFO:28:
Hello World
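
To see what each field of the format string contributes without opening handler_log.log, the same kind of format can be attached to an in-memory stream. This is a minimal sketch: the logger name `format_demo` and the function `sample_func` are made up for illustration, and `asctime` is left out so that the output does not change from run to run.

```python
import io
import logging

# Same fields as the template, minus asctime so the output is reproducible
formatter = logging.Formatter('%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

stream = io.StringIO()                      # in-memory stand-in for the log file
handler = logging.StreamHandler(stream)
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)

demo_logger = logging.getLogger("format_demo")   # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False               # keep the demo out of the root logger


def sample_func():
    demo_logger.info("Hello World")
    demo_logger.debug("not recorded: DEBUG is below the INFO level")


sample_func()
log_lines = stream.getvalue().splitlines()
print(log_lines)
```

The first line of the output holds the function name, level, and line number; the message follows on its own line because of the `\n` in the format string. The DEBUG call leaves no trace because the logger level is INFO.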

numpy exercise

This time the goal is to use numpy to calculate the average $\displaystyle \overline{x} = \frac{1}{n}\sum_{i=1}^{n}x_{i}$.

The full source of the program is listed at the end of this article; only the key points are shown along the way.

Generation of ndarray

Because the calculations will produce decimal values, I use the double-precision floating-point type (float64) and do not worry about memory consumption for now.

sample.py


sample_data = [40, 6, 56, 13, 91, 7, 11, 4, 88, 66]
sample_array = np.array(sample_data, dtype=np.float64)
logger.info(sample_array)
logger.info(type(sample_array))
logger.info(sample_array.dtype)

Execution result

2019-11-19 23:17:46,839:make_ndarray:INFO:30:
[40.  6. 56. 13. 91.  7. 11.  4. 88. 66.]
2019-11-19 23:17:46,839:make_ndarray:INFO:31:
<class 'numpy.ndarray'>
2019-11-19 23:17:46,839:make_ndarray:INFO:32:
float64
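
The effect of the dtype argument can be checked by building the same array with and without it. A small sketch: without dtype, NumPy infers an integer type from the list, and since the exact width depends on the platform, only its kind is printed here.

```python
import numpy as np

sample_data = [40, 6, 56, 13, 91, 7, 11, 4, 88, 66]

int_array = np.array(sample_data)                      # dtype inferred from the integers
float_array = np.array(sample_data, dtype=np.float64)  # dtype given explicitly

print(int_array.dtype.kind)     # 'i' means a signed integer type
print(float_array.dtype)        # float64
print(float_array.itemsize)     # bytes per element

# An existing array can also be converted afterwards
converted = int_array.astype(np.float64)
print(np.array_equal(converted, float_array))
```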

Total calculation

First, let's calculate the total in a primitive way. (I know that ndarray has the basic statistical function sum, but the primitive method is useful in a pinch, so I deliberately write it out here.)

sample.py


sum_data = 0.
for single_val in sample_data :
	sum_data += single_val

logger.info(sum_data)
#Operation check using sum function
logger.info(sample_array.sum())

Execution result

2019-11-19 23:25:29,815:make_ndarray:INFO:35:
382.0
2019-11-19 23:25:29,815:make_ndarray:INFO:36:
382.0
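
The loop, Python's built-in sum, and the ndarray method should all agree. The short sketch below compares them and also shows the `axis` argument of `sum` on a 2-row reshape of the same data (the reshape is my own addition for illustration, not something the article's program does).

```python
import numpy as np

sample_data = [40, 6, 56, 13, 91, 7, 11, 4, 88, 66]
sample_array = np.array(sample_data, dtype=np.float64)

# Primitive loop, as in the article
loop_total = 0.
for single_val in sample_data:
    loop_total += single_val

builtin_total = sum(sample_data)        # Python built-in, returns an int here
numpy_total = sample_array.sum()        # ndarray method, returns np.float64

print(loop_total, builtin_total, numpy_total)

# With a 2-D array, axis selects the direction of the sum
two_rows = sample_array.reshape(2, 5)
row_totals = two_rows.sum(axis=1)       # one total per row
print(row_totals)
```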

This calculation is the $\displaystyle \sum_{i=1}^{n}x_{i}$ part of the formula, so next I will find the average.

Average calculation

sample.py


ave_data = 0.
ave_data = sum_data / len(sample_array)

logger.info(ave_data)
#Operation check using the basic function of ndarray
logger.info(sample_array.mean())

Execution result

2019-11-19 23:31:56,746:make_ndarray:INFO:38:
38.2
2019-11-19 23:31:56,746:make_ndarray:INFO:40:
38.2
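
Written out with the sample data ($n = 10$), the formula gives the same value as the log output:

```latex
\overline{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}
             = \frac{40 + 6 + 56 + 13 + 91 + 7 + 11 + 4 + 88 + 66}{10}
             = \frac{382}{10}
             = 38.2
```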

Graph drawing

Now that the total and average calculations have been confirmed in the previous sections, I will show how to plot them.

Simple plot

sample.py


import logging
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import japanize_matplotlib	### [Japanese support]

#Specifying the log format
# %(asctime)s :Human-readable time at which the LogRecord was created
# %(funcName)s :Name of the function that contains the logging call
# %(levelname)s :Text logging level for the message
# %(lineno)d :Source line number where the logging call was issued
# %(message)s :The logged message, computed as msg % args
fomatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

#Logger settings(INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

#Handler settings(Change output file/Log level settings/Log format settings)
handler = logging.FileHandler('handler_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(fomatter)

logger.addHandler(handler)

def make_ndarray() :
	try :
		sample_data = [40, 6, 56, 13, 91, 7, 11, 4, 88, 66]
		sample_array = np.array(sample_data, dtype=np.float64)

		sum_data = 0.
		for single_val in sample_data :
			sum_data += single_val

		ave_data = 0.
		ave_data = sum_data / len(sample_array)

		#Operation check using the basic function of ndarray
		#logger.info(sample_array.mean())

		make_graph(sample_data)

	#Exception handling
	except (KeyError, ValueError) as err:
		logger.exception('Error make_ndarray: %s', err)

def make_graph(sample_data) :
	try :
		x_axis_arr = np.linspace(1, 10, 10)

		fig, ax = plt.subplots(1, 1, figsize=(6, 4))

		ax.scatter(x_axis_arr, sample_data)

		fig.suptitle('TEST', fontweight="bold", fontsize = 12)

		plt.savefig('TEST.png')

		plt.close()


	#Exception handling
	except (KeyError, ValueError) as err:
		logger.exception('Error make_graph: %s', err)
	
if __name__ == '__main__' :
	
	# make_ndarray call
	make_ndarray()

Execution result

TEST.png

Adding X-axis and Y-axis labels

sample.py


ax.set_xlabel("Sample data")
ax.set_ylabel("Sample value")

fig.suptitle('TEST(X axis/Added Y-axis label)', fontweight="bold", fontsize = 12)

Execution result

TEST.png

Add annotation

sample.py


#Limit the range of graph display
ax.set_xlim([0,11])
ax.set_ylim([0,110])

#Put an annotation at the specified position
for x_data, y_data in zip(x_axis_arr, sample_data) :
	# logger.info(str(x_data) + ', ' + str(y_data))
	ax.annotate('(' + str(x_data) + ', ' + str(y_data) + ')', \
		xy = (x_data, y_data+3), size = 8, color = "red")

Execution result

TEST.png
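
Because np.linspace returns floating-point values, str(x_data) in the annotation produces labels such as (1.0, 40). If integer-looking labels are preferred, one option is to format the values explicitly. This is a sketch under that assumption: the helper name `point_label` is hypothetical, and the program in this article does not do this.

```python
import numpy as np

sample_data = [40, 6, 56, 13, 91, 7, 11, 4, 88, 66]
x_axis_arr = np.linspace(1, 10, 10)     # 1.0, 2.0, ..., 10.0 (float64)


def point_label(x_val, y_val):
    """Build an annotation label with the x value shown as an integer."""
    return f"({int(x_val)}, {y_val})"


# Labels as the article builds them, and the integer-formatted variant
original_labels = ['(' + str(x) + ', ' + str(y) + ')'
                   for x, y in zip(x_axis_arr, sample_data)]
integer_labels = [point_label(x, y) for x, y in zip(x_axis_arr, sample_data)]

print(original_labels[0])
print(integer_labels[0])
```

Passing `point_label(x_data, y_data)` as the first argument of ax.annotate would then give the shorter labels.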

Add average graph

sample.py


#Draw the line for the average value
ave_data_xplot = np.arange(0, 12)
ave_data_yplot = np.full(12, ave_data)
ax.plot(ave_data_xplot, ave_data_yplot, color = "green")

#Add an annotation
#Set arrow properties
arrow_dict = dict(arrowstyle = "->", color = "mediumblue")

#Text box properties
# fc:facecolor, ec:edgecolor
text_dict = dict(boxstyle = "round",
	 fc = "white", ec = "mediumblue")

ax.annotate("The average value is " + str(ave_data), \
	xy = (9, ave_data), xytext = (9.5, ave_data+5), \
	size = 8, color = "red", \
	bbox = text_dict, arrowprops = arrow_dict)

Execution result

TEST.png
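
As an alternative to building the x and y arrays by hand, matplotlib's Axes.axhline draws a horizontal line across the whole axes at a given y value. This is a minimal sketch and not the approach used in this article; the output file name `axhline_demo.png` and the choice of the `Agg` backend (so that it runs without a display) are my assumptions.

```python
import matplotlib
matplotlib.use("Agg")                   # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

sample_data = [40, 6, 56, 13, 91, 7, 11, 4, 88, 66]
ave_data = np.mean(sample_data)
x_axis_arr = np.linspace(1, 10, 10)

fig, ax = plt.subplots(1, 1, figsize=(6, 4))
ax.scatter(x_axis_arr, sample_data)

# Horizontal line at the average; it spans the x range automatically
ave_line = ax.axhline(y=ave_data, color="green")

ax.set_xlim([0, 11])
ax.set_ylim([0, 110])
fig.savefig("axhline_demo.png")
plt.close(fig)
```

The line follows the axes even if set_xlim is changed later, so the 0 to 12 range used for the hand-built arrays does not need to be kept in sync.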

Adding the cumulative total graph (adding a second axis)

sample.py


		#Addition of second axis
		ax2 = ax.twinx()
		#Calculation of cumulative sum
		sample_data_csum = np.cumsum(sample_data)
		ax2.bar(x_axis_arr, sample_data_csum, color = "blue", alpha = 0.2)
		#Limit the range of graph display
		ax2.set_xlim([0,11])
		ax2.set_ylim([0,400])
		ax2.set_ylabel("Cumulative sum of sample data")

Execution result

TEST.png
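
np.cumsum returns the running total at each position, so its last element equals the overall total (382). That is why a Y range up to 400 is enough for the second axis. A short sketch that prints the values the bars are drawn from:

```python
import numpy as np

sample_data = [40, 6, 56, 13, 91, 7, 11, 4, 88, 66]

sample_data_csum = np.cumsum(sample_data)
print(sample_data_csum.tolist())

# The last running total is the same as the plain sum
print(sample_data_csum[-1] == np.sum(sample_data))
```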

Full program for this article

sample.py


import logging
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import japanize_matplotlib	### [Japanese support]

#Specifying the log format
# %(asctime)s :Human-readable time at which the LogRecord was created
# %(funcName)s :Name of the function that contains the logging call
# %(levelname)s :Text logging level for the message
# %(lineno)d :Source line number where the logging call was issued
# %(message)s :The logged message, computed as msg % args
fomatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

#Logger settings(INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

#Handler settings(Change output file/Log level settings/Log format settings)
handler = logging.FileHandler('handler_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(fomatter)

logger.addHandler(handler)

def make_ndarray() :
	try :
		sample_data = [40, 6, 56, 13, 91, 7, 11, 4, 88, 66]
		sample_array = np.array(sample_data, dtype=np.float64)

		sum_data = 0.
		for single_val in sample_data :
			sum_data += single_val

		ave_data = 0.
		ave_data = sum_data / len(sample_array)

		#Operation check using the basic function of ndarray
		#logger.info(sample_array.mean())

		make_graph(sample_data, ave_data)

	#Exception handling
	except (KeyError, ValueError) as err:
		logger.exception('Error make_ndarray: %s', err)

def make_graph(sample_data, ave_data) :
	try :
		x_axis_arr = np.linspace(1, 10, 10)

		fig, ax = plt.subplots(1, 1, figsize=(6, 4))

		ax.scatter(x_axis_arr, sample_data)
		ax.set_xlabel("Sample data")
		ax.set_ylabel("Sample value")

		#Put an annotation at the specified position
		for x_data, y_data in zip(x_axis_arr, sample_data) :
			# logger.info(str(x_data) + ', ' + str(y_data))
			ax.annotate('(' + str(x_data) + ', ' + str(y_data) + ')', \
				xy = (x_data, y_data+3), size = 8, color = "red")

		fig.suptitle('TEST(X axis/Added Y-axis label)', fontweight="bold", fontsize = 12)

		ax.scatter(x_axis_arr, sample_data, color = "blue")
		
		#Draw the line for the average value
		ave_data_xplot = np.arange(0, 12)
		ave_data_yplot = np.full(12, ave_data)
		ax.plot(ave_data_xplot, ave_data_yplot, color = "green")
		
		#Add an annotation
		#Set arrow properties
		arrow_dict = dict(arrowstyle = "->", color = "mediumblue")

		#Text box properties
		# fc:facecolor, ec:edgecolor
		text_dict = dict(boxstyle = "round",
			 fc = "white", ec = "mediumblue")
		
		ax.annotate("The average value is " + str(ave_data), \
			xy = (9, ave_data), xytext = (9.5, ave_data+5), \
			size = 8, color = "red", \
			bbox = text_dict, arrowprops = arrow_dict)

		#Limit the range of graph display
		ax.set_xlim([0,11])
		ax.set_ylim([0,110])

		#Addition of second axis
		ax2 = ax.twinx()
		#Calculation of cumulative sum
		sample_data_csum = np.cumsum(sample_data)
		ax2.bar(x_axis_arr, sample_data_csum, color = "blue", alpha = 0.2)
		#Limit the range of graph display
		ax2.set_xlim([0,11])
		ax2.set_ylim([0,400])
		ax2.set_ylabel("Cumulative sum of sample data")

		plt.savefig('TEST.png')

		plt.close()


	#Exception handling
	except (KeyError, ValueError) as err:
		logger.exception('Error make_graph: %s', err)
	
if __name__ == '__main__' :
	
	# make_ndarray call
	make_ndarray()

Conclusion

The exception handling isn't working well yet. Also, would the code become easier to read if the graph-drawing part were split into a separate file or turned into a class? It might, so I would like to try that as I keep studying.
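
One possible reason the exception handling feels like it isn't working: `except (KeyError, ValueError)` only catches those two types, so anything else (for example a TypeError) passes straight through without being logged. Below is a minimal sketch of that behaviour, with hypothetical names `narrow_handler`, `broad_handler`, and `exception_demo`; this is a guess at the cause, not something confirmed for the program above.

```python
import io
import logging

stream = io.StringIO()
demo_logger = logging.getLogger("exception_demo")   # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.propagate = False


def narrow_handler():
    try:
        return 1 + "a"                  # raises TypeError
    except (KeyError, ValueError) as err:
        demo_logger.exception('Error narrow_handler: %s', err)


def broad_handler():
    try:
        return 1 + "a"                  # raises TypeError
    except Exception as err:
        # logger.exception logs at ERROR level and appends the traceback
        demo_logger.exception('Error broad_handler: %s', err)


try:
    narrow_handler()
    escaped = False
except TypeError:
    escaped = True                      # TypeError was not caught inside narrow_handler

broad_handler()
logged = stream.getvalue()
print(escaped)
print(logged)
```

Only broad_handler leaves a log entry, including the traceback. Whether to catch Exception broadly or list specific types is a design choice; the narrow form is fine as long as the listed types match what the code can actually raise.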
