[PYTHON] Coloring points according to the distance from the regression curve

Purpose

-- Change the color and opacity of the plotted points according to their distance from the straight line or curve obtained by regression analysis.
-- Visualize the distribution of those distances in other figures.

Code

example.py


import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()  #Apply seaborn's default style; I always use it because the figures look nicer

amp = 100    #amplitude
frequency = 0.02     #frequency
offset = 1000       #offset

t = np.linspace(0,100,1000)
y_ = amp*np.sin(2*np.pi*frequency*t)+offset  #Theoretical value = regression curve (sine wave)
y = np.random.poisson(y_)    #Observed value = sine wave with Poisson error

###Main###
dis = abs( y_ - y )/ y_.max()   #Scale the difference between the theoretical and observed values to roughly 0.0~1.0
color_list=[ [1-9*i,0,i*9,i*5] for i in dis ]   #Specify each color as [R, G, B, alpha]

f = plt.figure(figsize = (12,6))
f.add_subplot(121)
plt.scatter(t,y_)
plt.xlabel('t')
plt.ylabel('y')

f.add_subplot(122)
plt.scatter(t,y,color = color_list)    
plt.xlabel('t')
plt.ylabel('y')

Result

image.png

Commentary

First, regarding the data: passing the sine wave value at each time to np.random.poisson(y_) makes **y** a sine wave with an error that follows a Poisson distribution with mean **y_**. Therefore, a large value has a large error, and a small value has a small error.

The main point, the gradation, uses a for statement (a list comprehension) to specify the color of each point. In matplotlib, a color can be specified as [R, G, B, alpha] = [r, g, b, a], where alpha is the opacity and r, g, b, a each take values from 0.0 to 1.0. Using this mechanism, simple arithmetic on the distance makes the color lighter at short distances and darker at long distances. In this example, points are light red at short distances and dark blue at long distances.
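The statement that larger values carry larger errors follows from the Poisson distribution having a variance equal to its mean, so the standard deviation grows as the square root of the mean. The following is a minimal sketch for checking this numerically. The helper name `poisson_spread`, the sample size, and the seed are assumptions made for illustration and are not part of the original code.

```python
import numpy as np

def poisson_spread(mean, n_samples=100_000, seed=0):
    """Return the sample standard deviation of Poisson draws with the given mean.

    Hypothetical helper for illustration; not part of the original example.
    """
    rng = np.random.default_rng(seed)  # fixed seed only for reproducibility
    return rng.poisson(mean, size=n_samples).std()

# Two mean levels matching the trough and crest of the sine wave above
for mean in (900, 1100):
    # The sample spread should be close to sqrt(mean)
    print(mean, poisson_spread(mean), np.sqrt(mean))
```

For the sine wave used here the mean only varies between 900 and 1100, so the difference in spread between trough and crest is modest.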

Important points

-- Each RGBA component must lie between 0.0 and 1.0, so the coefficients must be adjusted to keep the values within this range.
-- Naturally, most points lie near the regression curve (or regression line), so unless nearby points are made quite faint, they overlap and it becomes impossible to tell whether the apparent color depth comes from the density of points or from the distance.
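Because the noise is random, an occasional point can lie far enough from the curve that an expression such as 1-9*i leaves the 0.0 to 1.0 range, and matplotlib rejects RGBA values outside that range. One way to guard against this, sketched below, is to clip the components after computing them. The function name `distance_to_rgba` and its parameters are hypothetical; the default coefficients 9 and 5 are taken from the first example.

```python
import numpy as np

def distance_to_rgba(dis, rgb_scale=9.0, alpha_scale=5.0):
    """Map normalized distances to RGBA rows clipped to the 0.0-1.0 range.

    Hypothetical helper for illustration; not part of the original example.
    """
    dis = np.asarray(dis, dtype=float)
    rgba = np.column_stack([
        1.0 - rgb_scale * dis,   # red fades out as the distance grows
        np.zeros_like(dis),      # green is unused
        rgb_scale * dis,         # blue grows with the distance
        alpha_scale * dis,       # opacity grows with the distance
    ])
    # Force every component into the range matplotlib accepts
    return np.clip(rgba, 0.0, 1.0)

colors = distance_to_rgba([0.0, 0.05, 0.5])
print(colors)
```

The returned array has one RGBA row per point and can be passed as `color=` to `plt.scatter` in place of `color_list`. Clipping saturates the color of distant points instead of rescaling it, so the coefficients still determine how quickly the gradation changes.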

Application

With this color specification, the distance from the regression curve can also be visualized in other figures. Consider a case where the previous code is changed slightly so that the error increases as time goes by (the code is given in the "Modified code" section). Plotting **y** against **t** (left) and **y** against **y_** (right) gives the following figures.

image.png

From the figure on the left, the increase in distance (error) over time can be confirmed at a glance. From the figure on the right, it can be seen that the distance between **y** and **y_** correlates with **y_**. In this way, changing the plotting axes can be expected to deepen the understanding of the data. The more parameters that determine the observed value **y**, the more perspectives become available.
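As a rough numerical complement to the two figures, the relationship between the distance and each axis variable can be summarized with Pearson's correlation coefficient. The sketch below recreates the modified data under stated assumptions: it uses `np.random.default_rng` with a fixed seed instead of the global `np.random.poisson`, so the exact numbers will differ from the figures. Pearson's r only captures linear trends, so it may understate a relationship that is periodic in **y_**.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Same setup as the modified example: the Poisson mean drifts upward by 2.6*t
t = np.linspace(0, 100, 1000)
y_ = 100 * np.sin(2 * np.pi * 0.02 * t) + 1000
y = rng.poisson(y_ + 2.6 * t)

# Normalized distance between the theoretical and observed values
dis = np.abs(y_ - y) / y_.max()

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r_t = np.corrcoef(t, dis)[0, 1]
r_y = np.corrcoef(y_, dis)[0, 1]
print(f"r(t, dis)  = {r_t:.2f}")
print(f"r(y_, dis) = {r_y:.2f}")
```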

Modified code

example2.py



import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

amp=100   #amplitude
frequency=0.02 #frequency
offset=1000     #offset
t=np.linspace(0,100,1000)
y_=amp*np.sin(2*np.pi*frequency*t)+offset

y=np.random.poisson(y_+2.6*t)   ###Changed line: the error grows with t###

dis=abs(y_-y)/y_.max()
color_list=[[1-3*i,0,3*i,i] for i in dis]    ###Changed line: smaller coefficients for the larger distances###

f=plt.figure(figsize=(12,6))
f.add_subplot(121)
plt.scatter(t,y,color=color_list)    
plt.xlabel('t')
plt.ylabel('y')

f.add_subplot(122)
plt.scatter(y_,y,color=color_list)
plt.xlabel('y_')
plt.ylabel('y')

Additions

Comparison of time **t** and theoretical value **y_**

image.png

The deeper color at the turning points of the curve is probably due to the high density of points there.
