[PYTHON] Split data by threshold

Split data sorted in ascending order at multiple thresholds into (number of thresholds + 1) groups.
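As a point of reference, here is the same split written with the standard library's `bisect.bisect_left`. This is my own minimal sketch, not the code below, and `split_sorted` is a hypothetical helper name. It assumes both the data and the thresholds are sorted in ascending order. `bisect_left` returns the first index whose element is >= the threshold, so an element equal to a threshold lands in the later group. That matches the `< th` comparison used below.

```python
from bisect import bisect_left


def split_sorted(data, ths):
    # Assumes data and ths are both sorted ascending (not checked here).
    # bisect_left gives the first index whose element is >= th.
    cuts = [bisect_left(data, th) for th in ths]
    bounds = [0, *cuts, len(data)]
    # One slice per pair of consecutive bounds: len(ths) + 1 slices in total.
    return [data[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


print([list(p) for p in split_sorted(range(1, 11), [3, 6])])
# [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
```

Each threshold costs a binary search rather than a linear scan. The slices keep the input's type, so a `range` gives `range` objects and a list gives lists.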

python3


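import numpy as np, pandas as pd

def partition(attr, ths, tgt=None):
    # attr: data sorted in ascending order (range, list, ndarray, Series, DataFrame)
    # ths:  thresholds in ascending order
    # tgt:  values compared with the thresholds; for a DataFrame, the column name
    if tgt is None:
        tgt = attr
    elif isinstance(attr, pd.DataFrame):
        tgt = attr[tgt]
    key = np.asarray(tgt)  # positional access, independent of any pandas index
    po = 0
    for th in ths:
        pr = po
        # advance to the first element >= th, stopping at the end of the data
        while po < len(key) and key[po] < th:
            po += 1
        yield attr[pr:po]  # slice the original data, not only the key column
    yield attr[po:]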
# Earlier version
# (needs: from itertools import takewhile; from more_itertools import ilen)
# def partition(arr, ths, tgt=None):
#     if tgt is None:
#         tgt = arr
#     elif isinstance(arr, pd.DataFrame):
#         tgt = arr[tgt]
#     r = []
#     pr = po = 0
#     for th in ths:
#         po = ilen(takewhile(lambda i: i < th, tgt[pr:]))+pr
#         r.append(arr[pr:po])
#         pr = po
#     r.append(arr[po:])
#     return r

from IPython.display import display
for i in partition(range(1,11), [3,6]):
    display(i)
for i in partition(np.arange(1,11), [3,6]):
    display(i)
for i in partition(pd.Series(np.arange(1,11)), [3,6]):
    display(i)
for i in partition(pd.DataFrame(np.arange(1,11)), [3,6], 0):
    display(i)
>>>
range(1, 3)
range(3, 6)
range(6, 11)

array([1, 2])
array([3, 4, 5])
array([ 6,  7,  8,  9, 10])

0    1
1    2
dtype: int32
2    3
3    4
4    5
dtype: int32
5     6
6     7
7     8
8     9
9    10
dtype: int32
   0
0  1
1  2
   0
2  3
3  4
4  5
    0
5   6
6   7
7   8
8   9
9  10
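If the DataFrame is not guaranteed to be sorted, a different technique is to bin the rows with `pd.cut` and select each bin with a boolean mask. This does not slice the data. It is a sketch under my own assumptions, and `partition_by_cut` is a hypothetical name. Using `right=False` makes every bin closed on the left, [th_i, th_{i+1}), so an element equal to a threshold again goes to the later group. Unlike the generator above, this returns a list. It also keeps the original row order inside each group.

```python
import numpy as np
import pandas as pd


def partition_by_cut(df, ths, col):
    # Hypothetical helper: bins are [-inf, th1), [th1, th2), ..., [thk, inf).
    # ths must be strictly increasing (pd.cut requires monotonic bins).
    bins = [-np.inf, *ths, np.inf]
    # labels=False returns the integer bin index (0 .. len(ths)) for each row.
    codes = pd.cut(df[col], bins=bins, right=False, labels=False)
    # A mask per bin keeps empty groups, so there are always len(ths) + 1 parts.
    return [df[codes == i] for i in range(len(ths) + 1)]


df = pd.DataFrame({"x": [5, 1, 7, 3]})
for part in partition_by_cut(df, [3, 6], "x"):
    print(part["x"].tolist())
# [1]
# [5, 3]
# [7]
```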

The commented-out version was written with reference to "NumPy to find the position above the threshold value - Qiita", but for small inputs a plain while loop was faster, so I replaced it.
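Here is one vectorized NumPy alternative. It is not necessarily the method used in the referenced article. It uses `np.searchsorted` to find all split points with a binary search, then uses `np.split` to cut the array at those points. This is a sketch assuming ascending data and thresholds, and `partition_np` is a hypothetical name. I have not benchmarked it against the while loop. The per-call overhead may dominate for tiny inputs, which is consistent with the observation above, while the binary search should matter more as the data grows.

```python
import numpy as np


def partition_np(arr, ths):
    arr = np.asarray(arr)
    # side="left": index of the first element >= th (arr must be ascending)
    idx = np.searchsorted(arr, ths, side="left")
    # np.split cuts before each index, giving len(ths) + 1 pieces (some may be empty)
    return np.split(arr, idx)


for part in partition_np(np.arange(1, 11), [3, 6]):
    print(part)
# [1 2]
# [3 4 5]
# [ 6  7  8  9 10]
```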

That's all.
