[PYTHON] Interpolate 2D data with scipy.interpolate.griddata

What I want to do

How to use scipy.interpolate.griddata

Suppose you have x and y coordinates and a value at each of those coordinates.

example.py


print(sample_df)
# >>>
#      X   Y  value
# 0    0   0     11
# 1    0   2     17
# 2    0   4     13
# 3    0   6     12
# 4    0   8     26
# ...
# 32  10   4      8
# 33  10   6     35
# 34  10   8     30
# 35  10  10     17

Plot of the x, y coordinates only [Figure: xy.png]

example.py


import matplotlib.pyplot as plt
import pandas as pd

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(sample_df['X'], sample_df['Y'])
plt.show()

Use scipy.interpolate.griddata to fill in values at the coordinates between the known points. Then check the result on a contour map.

Interpolated data [Figure: result.png]

Environment

macOS, Python 3.8.1

matplotlib 3.3.2 numpy 1.19.2 pandas 1.1.3 scipy 1.6.0

pip install matplotlib numpy pandas scipy

Details

Sample data

Use the following as sample data.

sample_data.py


import itertools

import numpy as np
import pandas as pd

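# Sample data: a 6 x 6 grid with coordinates 0, 2, ..., 10 on each axis
x_coord_range = list(range(0, 11, 2))
y_coord_range = list(range(0, 11, 2))

xy_coord = list(itertools.product(x_coord_range, y_coord_range))
# Random integers from 1 to 49 (not seeded, so the values differ on every run)
values = np.random.randint(1, 50, len(xy_coord))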

sample_df = pd.DataFrame()
sample_df['X'] = [xy[0] for xy in xy_coord]
sample_df['Y'] = [xy[1] for xy in xy_coord]
sample_df['value'] = values

print(sample_df)
# >>>
#      X   Y  value
# 0    0   0     11
# 1    0   2     17
# 2    0   4     13
# 3    0   6     12
# 4    0   8     26
# ...
# 32  10   4      8
# 33  10   6     35
# 34  10   8     30
# 35  10  10     17
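np.random.randint is not seeded, so sample_df changes on every run. If you want reproducible data, the sketch below builds the same 6 x 6 layout. It uses np.meshgrid instead of itertools.product and draws the values from a seeded generator. Using np.random.default_rng and the seed value 0 are my own assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd

# Same coordinates as sample_data.py: 0, 2, ..., 10 on each axis.
coords = np.arange(0, 11, 2)

# indexing='ij' makes X vary slowest, matching the row order
# produced by itertools.product(x_coord_range, y_coord_range).
xx, yy = np.meshgrid(coords, coords, indexing='ij')

# Seeded generator (seed 0 is an arbitrary choice), so every run gives the same values.
rng = np.random.default_rng(0)

sample_df = pd.DataFrame({
    'X': xx.ravel(),
    'Y': yy.ravel(),
    'value': rng.integers(1, 50, xx.size),  # integers from 1 to 49, like randint(1, 50)
})

print(sample_df.head())
```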

Code

The code does not assume anything about the contents of sample_df (for example, its coordinate range).

import itertools  # used only to create the sample data

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.interpolate import griddata

# Sample data creation omitted (see sample_data.py above)

# Get the data range
x_min, x_max = sample_df['X'].min(), sample_df['X'].max()
y_min, y_max = sample_df['Y'].min(), sample_df['Y'].max()

# Create arrays of new coordinates covering that range (101 points -> step 0.1 for 0..10)
new_x_coord = np.linspace(x_min, x_max, 101)
new_y_coord = np.linspace(y_min, y_max, 101)

# Create the x, y grid arrays
xx, yy = np.meshgrid(new_x_coord, new_y_coord)

# Known x, y coordinates and their values
knew_xy_coord = sample_df[['X', 'Y']].values
knew_values = sample_df['value'].values

# Interpolate between the known coordinates; method='nearest', 'linear' or 'cubic'
result = griddata(points=knew_xy_coord, values=knew_values, xi=(xx, yy), method='cubic')

# Display the graph
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_aspect('equal', adjustable='box')
ax.contourf(xx, yy, result, cmap='jet')
plt.show()
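If you reuse these steps, you can wrap them in a function. The sketch below is one way to do that. The name interpolate_to_grid, its parameters, and the small DataFrame in the usage example are hypothetical, made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import griddata


def interpolate_to_grid(df, x_col='X', y_col='Y', value_col='value', num=101, method='cubic'):
    """Hypothetical helper: interpolate the scattered values in df onto a num x num grid."""
    # Regular grid spanning the data range, as in the code above.
    new_x = np.linspace(df[x_col].min(), df[x_col].max(), num)
    new_y = np.linspace(df[y_col].min(), df[y_col].max(), num)
    xx, yy = np.meshgrid(new_x, new_y)

    # Known coordinates as an (n, 2) array and their values as an (n,) array.
    points = df[[x_col, y_col]].to_numpy()
    values = df[value_col].to_numpy()

    result = griddata(points=points, values=values, xi=(xx, yy), method=method)
    return xx, yy, result


# Usage with a small hypothetical DataFrame: four corners plus the center.
df = pd.DataFrame({
    'X': [0, 0, 10, 10, 5],
    'Y': [0, 10, 0, 10, 5],
    'value': [1.0, 2.0, 3.0, 4.0, 2.5],
})
xx, yy, result = interpolate_to_grid(df, method='linear')
print(result.shape)
```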

Get data range, create new coordinates

# Get the data range
x_min, x_max = sample_df['X'].min(), sample_df['X'].max()
# >>> here x_min = 0, x_max = 10
y_min, y_max = sample_df['Y'].min(), sample_df['Y'].max()
# >>> here y_min = 0, y_max = 10

# Create arrays of new coordinates covering that range
new_x_coord = np.linspace(x_min, x_max, 101)
# >>> [ 0.   0.1  0.2 ... 10. ]
new_y_coord = np.linspace(y_min, y_max, 101)
# >>> [ 0.   0.1  0.2 ... 10. ]

Get the minimum and maximum of x and y, and create the coordinates for the new two-dimensional x, y array. The interpolated values go into an array laid out like the table below (rows are y, columns are x). new_x_coord and new_y_coord are the coordinates of that array.

y \ x   0     0.1   0.2   ...   10
0       val   val   val   ...   val
0.1     val   val   val   ...   val
0.2     val   val   val   ...   val
...     ...   ...   ...   ...   ...
10      val   val   val   ...   val
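This sketch checks two points made above. First, 101 points over 0 to 10 give a step of 0.1. Second, the arrays from np.meshgrid follow the table layout, with row i at y = new_y_coord[i] and column j at x = new_x_coord[j]. The index values i = 2, j = 5 are arbitrary.

```python
import numpy as np

# 101 points over [0, 10] means 100 intervals, so the step is 10 / 100 = 0.1.
# (100 points would mean 99 intervals and a step of about 0.101.)
new_x_coord = np.linspace(0, 10, 101)
new_y_coord = np.linspace(0, 10, 101)
print(np.diff(new_x_coord)[:3])

xx, yy = np.meshgrid(new_x_coord, new_y_coord)

# With the default indexing='xy', row i holds y = new_y_coord[i] and
# column j holds x = new_x_coord[j], matching the table layout above.
i, j = 2, 5
print(xx[i, j] == new_x_coord[j], yy[i, j] == new_y_coord[i])  # True True

# The shape is (len(y), len(x)); a non-square example makes this visible.
a, b = np.meshgrid(np.arange(3), np.arange(2))
print(a.shape)  # (2, 3)
```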

np.meshgrid

# Create the x, y grid arrays
xx, yy = np.meshgrid(new_x_coord, new_y_coord)

This article explains it clearly: How to use the numpy.meshgrid function to generate a grid sequence from array elements (DeepAge)

In brief, np.meshgrid creates arrays like these:

example_meshgrid.py


import numpy as np

x = np.array([1, 2, 3])  # call these [x1, x2, x3]
y = np.array([1, 2, 3])  # call these [y1, y2, y3]
xx, yy = np.meshgrid(x, y)

print(xx)
# [[1 2 3]
#  [1 2 3]
#  [1 2 3]]
# i.e.
# [[x1, x2, x3],
#  [x1, x2, x3],
#  [x1, x2, x3]]

print(yy)
# [[1 1 1]
#  [2 2 2]
#  [3 3 3]]
# i.e.
# [[y1, y1, y1],
#  [y2, y2, y2],
#  [y3, y3, y3]]

# Overlaying xx and yy element by element gives the coordinates of each grid point:
# [[(1, 1), (2, 1), (3, 1)],
#  [(1, 2), (2, 2), (3, 2)],
#  [(1, 3), (2, 3), (3, 3)]]
# =
# [[(x1, y1), (x2, y1), (x3, y1)],
#  [(x1, y2), (x2, y2), (x3, y2)],
#  [(x1, y3), (x2, y3), (x3, y3)]]

I hope that makes the idea clear.
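The overlaying of xx and yy described in the comments above can be done directly with np.stack. This is a minimal sketch using the same 3 x 3 example. Flattening the result gives an (N, 2) array of coordinate pairs, the same layout as the known points passed to griddata. According to the SciPy reference, griddata's xi also accepts an array of that shape.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3])
xx, yy = np.meshgrid(x, y)

# Stack along a new last axis: pairs[i, j] is the (x, y) coordinate of grid point (i, j).
pairs = np.stack((xx, yy), axis=-1)
print(pairs[0])  # first row of points: (1, 1), (2, 1), (3, 1)

# Flatten to an (N, 2) array of coordinate pairs, one row per grid point.
flat = pairs.reshape(-1, 2)
print(flat[:4])
```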

griddata

# Known coordinates and their values
knew_xy_coord = sample_df[['X', 'Y']].values
# >>>[[ 0  0]
#     [ 0  2]
#     ...
#     [10  8]
#     [10 10]]

knew_values = sample_df['value'].values
# >>> [14 32  4 35..., 9]

# Interpolate between the known coordinates; method='nearest', 'linear' or 'cubic'
result = griddata(points=knew_xy_coord, values=knew_values, xi=(xx, yy), method='cubic')
# >>> [[43.         40.16511073 37.34503184 ... 15.18228356 13.55766151 12.]
#      ...
#      [32.         30.91846813 29.83829943 ...  7.6922715   4.85443981 2. ]]

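The interpolated two-dimensional array is stored in result. The sample values are random, so the numbers shown here come from a different run than the sample_df output above.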
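A quick sanity check can confirm that the interpolation passes through the known points. If you evaluate griddata at the known coordinates themselves, every method should return the known values, up to floating-point error. This sketch uses small hypothetical data (a 3 x 3 grid with values 1 to 9), not sample_df.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical known data: a 3 x 3 grid of points with arbitrary values 1..9.
known_xy = np.array([[x, y] for x in (0, 2, 4) for y in (0, 2, 4)], dtype=float)
known_values = np.arange(1, 10, dtype=float)

checks = {}
for method in ('nearest', 'linear', 'cubic'):
    # Evaluate the interpolant at the known points themselves.
    back = griddata(known_xy, known_values, known_xy, method=method)
    checks[method] = np.allclose(back, known_values)
print(checks)
```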

The arguments of scipy.interpolate.griddata are:

- points: the known coordinates. type = np.ndarray
- values: the value at each coordinate in points. type = np.ndarray
- xi: the x, y grid arrays to interpolate onto. type = tuple (np.ndarray, np.ndarray)
- method: the interpolation type. type = str, 'nearest', 'linear' or 'cubic'
- fill_value: the value used for coordinates that could not be interpolated (points outside the convex hull of points). type = float. The default is nan; it has no effect with 'nearest'.

If points and values have different lengths, an error is raised.
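The effect of fill_value is easiest to see with a query point outside the convex hull of the known points. The four corner points and their values below are hypothetical. The values were chosen to lie on the plane 1 + 0.5x + y, so the linear result inside the square is easy to check by hand.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical known points: corners of the square [0, 2] x [0, 2].
points = np.array([[0, 0], [2, 0], [0, 2], [2, 2]], dtype=float)
values = np.array([1.0, 2.0, 3.0, 4.0])  # lies on the plane 1 + 0.5x + y

# One query inside the square and one outside it (outside the convex hull).
xi = np.array([[0.5, 0.5], [5.0, 5.0]])

linear_default = griddata(points, values, xi, method='linear')  # outside -> nan
linear_filled = griddata(points, values, xi, method='linear', fill_value=-1.0)  # outside -> -1.0
nearest = griddata(points, values, xi, method='nearest')  # fill_value has no effect
print(linear_default, linear_filled, nearest)
```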

Interpolation types

- nearest: nearest-neighbor interpolation (uses the value of the known point closest to the point being interpolated)
- linear: linear interpolation (piecewise linear over a triangulation of the known points)
- cubic: for 2D data, piecewise cubic interpolation that produces a smooth, continuously differentiable surface

[Figure: result2.png]
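The three methods can also be compared numerically. This sketch uses hypothetical planar data, f(x, y) = 2x + 3y sampled on a 5 x 5 grid, with query points chosen by hand inside the grid. Nearest can only return values that already exist in the data. Linear reproduces planar data exactly. Cubic is printed for comparison.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical known data: a 5 x 5 grid sampled from f(x, y) = 2x + 3y.
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
points = np.column_stack((gx.ravel(), gy.ravel()))
values = 2 * points[:, 0] + 3 * points[:, 1]

# Query points strictly inside the convex hull of the known points.
xi = np.array([[0.4, 0.3], [1.25, 2.75], [3.9, 0.1]])
exact = 2 * xi[:, 0] + 3 * xi[:, 1]

results = {m: griddata(points, values, xi, method=m) for m in ('nearest', 'linear', 'cubic')}
for m, r in results.items():
    print(m, r, 'exact:', exact)
```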

Graph display

python


fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_aspect('equal', adjustable='box')
ax.contourf(xx, yy, result, cmap='jet')
plt.show()

Because result is a two-dimensional array tied to x, y grid coordinates, ax.contourf is used here instead of ax.imshow. Pass the x grid array, the y grid array, and the 2D array to display, in that order. cmap sets the color map.

ax.set_aspect('equal', adjustable='box') makes the x and y axes use the same scale.
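One possible extension of the plot overlays the known points on the contour map and adds a color bar. This sketch uses an arbitrary smooth test function on the same 6 x 6 coordinate layout instead of sample_df. The choices of levels=20, the scatter overlay, and the color bar are my own additions.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import griddata

# Hypothetical known data on the same 6 x 6 layout as the sample data.
coords = np.arange(0, 11, 2)
kx, ky = np.meshgrid(coords, coords)
points = np.column_stack((kx.ravel(), ky.ravel()))
values = np.sin(points[:, 0] / 3) + np.cos(points[:, 1] / 4)  # arbitrary smooth test function

xx, yy = np.meshgrid(np.linspace(0, 10, 101), np.linspace(0, 10, 101))
result = griddata(points, values, (xx, yy), method='cubic')

fig, ax = plt.subplots()
ax.set_aspect('equal', adjustable='box')
cs = ax.contourf(xx, yy, result, levels=20, cmap='jet')  # filled contours of the interpolated grid
ax.scatter(points[:, 0], points[:, 1], c='k', s=10)  # known points drawn on top
fig.colorbar(cs, ax=ax, label='value')  # the color bar gets its own axes
plt.show()
```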

References

scipy.interpolate.griddata — SciPy v1.6.0 Reference Guide

How to use the numpy.meshgrid function to generate a grid sequence from array elements (DeepAge)
