We have released an extension that allows you to define xarray data like a Python data class.

TL;DR

xarray is a Python package used in data analysis like NumPy and pandas as a tool for handling data with metadata (axis labels, etc.) attached to a multidimensional array, but various data As I deal with xarray (DataArray), I feel the following.

--I want an array generation function that specifies the axes (dimensions) and type (dtype) of a multidimensional array. --Similarly, I want to specify the axis, type, and default value for the metadata (coordinates). --I want to generate a special array like ʻones () `of NumPy that satisfies the above two. --I want to define data-specific processing (method)

There are many ways to achieve these, but are the data classes (dataclasses) that appeared in the standard library from Python 3.7 the same problem? I noticed that I solved it with a simple writing style. Therefore, we have released a package "xarray-custom" for creating a user-defined DataArray class in the same way as a data class (you can install it with pip).

from xarray_custom import ctype, dataarrayclass

@dataarrayclass(accessor='img')
class Image:
    """DataArray class to represent images."""

    dims = 'x', 'y'
    dtype = float
    x: ctype('x', int) = 0
    y: ctype('y', int) = 0

    def normalize(self):
        return self / self.max()

Below, I will explain how this code works, touching on Python data classes and DataArary.

Python's dataclass

First, Python data classes are a feature (class decorator) that makes it easy to create user-defined data structures.

from dataclasses import dataclass

@dataclass
class Coordinates:
    x: float
    y: float

    @classmethod
    def origin(cls):
        return cls(0.0, 0.0)

    def norm(self):
        return (self.x ** 2 + self.y **2) ** 0.5

If you define a class like this,

coord = Coordinates(x=3, y=4)

It creates a class that stores data like (Originally, it is necessary to implement various special methods such as __init__ ()). I think the following are possible advantages of defining a data class in this way.

--The value to be stored can have a name, type (type hint only), and default value. --Special data generation (ʻorigin () in the above example) can be realized by class method --Data-specific processing (norm ()` in the above example) can be realized by instance method

xarray's DataArray

Next, let's look at the data structure of xarray. A DataArray takes a data structure consisting of a NumPy multidimensional array (data), axes (dimensions; a type of metadata), and metadata (coordinates). The following example intends to represent the data in a DataArray of a monochromatic image consisting of two axes of xy.

from xarray import DataArray

image = DataArray(data=[[0, 1], [2, 3]], dims=('x', 'y'),
                  coords={'x': [0, 1], 'y': [0, 1]})
print(image)

# <xarray.DataArray (x: 2, y: 2)>
# array([[0., 1.],
#        [2., 3.]])
# Coordinates:
#   * x        (x) int64 0 1
#   * y        (y) int64 0 1

Since DataArray is a class, a common way to define a user-defined DataArray is to create a new class with DataArray as a subclass. However, if you hard code the definition in __init__ () etc., there is a problem that it cannot be reused. In addition, xarray and pandas do not actively recommend subclassing in the first place, and it is better to implement user-defined processing etc. via a special object called accessor.

One standard solution to this problem is to subclass Dataset and/or DataArray to add domain specific functionality. However, inheritance is not very robust. It’s easy to inadvertently use internal APIs when subclassing, which means that your code may break when xarray upgrades. Furthermore, many builtin methods will only return native xarray objects. (Extending xarray - xarray 0.15.0 documentation)

xarray's dataarrayclass

It's finally the main subject. In order to realize the request mentioned at the beginning, considering the circumstances of xarray (DataArray),

-Provides a method to generate an array without hard-coding the (meta) data definition. --Provide user-defined processing via accessors —— Make these simple to write

I realized that I needed something. So, in xarray-custom, I decided to get used to the Python data class and implement it by ** dynamically modifying the user-defined DataArray class with a class decorator (dataarrayclass) **. Using xarray-custom, the above monochromatic image example can be defined as follows.

from xarray_custom import ctype, dataarrayclass

@dataarrayclass(accessor='img')
class Image:
    """DataArray class to represent images."""

    dims = 'x', 'y'
    dtype = float
    x: ctype('x', int) = 0
    y: ctype('y', int) = 0

    def normalize(self):
        return self / self.max()

By knowing the data class, you might have somehow understood what you're doing with this code without explanation. The points here are the following three points.

--Specify the data axis ('x','y') and type (float) with class variables --Specify the name, axis, type, and default value of the metadata including the axis in a specially typed class variable. --User-defined processing (method) is automatically moved to the accessor (ʻimg`).

Where ctype is a function to generate a special type (class) that defines the metadata. Now let's actually generate a user-defined array.

image = Image([[0, 1], [2, 3]], x=[0, 1], y=[0, 1])
print(image)

# <xarray.DataArray (x: 2, y: 2)>
# array([[0., 1.],
#        [2., 3.]])
# Coordinates:
#   * x        (x) int64 0 1
#   * y        (y) int64 0 1

You can see that the axes and metadata are predefined, so it's much simpler to write than the DataArray example above. It differs from a data class in that the type information is actually used to determine the type of the array. In the above example, the list of integers is type converted to float in the DataArray. If (implicit) type conversion is not possible, ValueError will be thrown. You can also leave the type unspecified (in which case it will accept objects of any type).

Instance methods via an accessor

According to the policy of xarray, the instance method defined in the class (normalize () in the above example) can be executed via the accessor. Running without an accessor will throw ʻAttribute Error`.

normalized = image.img.normalize()
print(normalized)

# <xarray.DataArray (x: 2, y: 2)>
# array([[0.        , 0.33333333],
#        [0.66666667, 1.        ]])
# Coordinates:
#   * x        (x) int64 0 1
#   * y        (y) int64 0 1

Special functions as class methods

As a special array generation, zeros () ・ ʻones () ・ ʻempty ()full (), which is familiar with NumPy, is automatically added as a class method. zeros_like () etc. are defined in the top-level function of xarray, so let's use that.

uniform = Image.ones((2, 3))
print(uniform)

# <xarray.DataArray (x: 2, y: 3)>
# array([[1., 1., 1.],
#        [1., 1., 1.]])
# Coordinates:
#   * x        (x) int64 0 0
#   * y        (y) int64 0 0 0

Misc

So far, isn't the dataarray class also subclassing the DataArray after all? Some people may have thought. However, the DataArray that is actually generated is a real DataArray instance. Conversely, in the above example it is not a ʻImage` instance.

print(type(image)) # xarray.core.dataarray.DataArray

This is because the dataarray class internally generates__new__ ()instead of__ init__ (), and__new__ ()returns a DataArray. Therefore, it should be noted that class variables etc. cannot be accessed like an instance created from an ordinary class.

I have the impression that xarray has fewer articles and is less well known than NumPy and pandas, but it is definitely useful for tasks dealing with multidimensional arrays, so I would like to use it more and more. xarray-custom has few functions in the early stages of development, but I hope to contribute to the community as much as possible through the development and articles of these extensions.

References

Recommended Posts

We have released an extension that allows you to define xarray data like a Python data class.
Created a service that allows you to search J League data
A memo that allows you to change Pineapple's Python environment with pyenv
We have released a Python module that generates a regional mesh for Japan
Introduction of "scikit-mobility", a library that allows you to easily analyze human flow data with Python (Part 1)
Create a plugin that allows you to search Sublime Text 3 tabs in Python
A learning roadmap that allows you to develop and publish services from scratch with Python
A Python script that allows you to check the status of the server from your browser
How to import a file anywhere you like in Python
A python script that converts Oracle Database data to csv
[Road to Python Intermediate] Call a class instance like a function with __call__
How to write a Python class
Python patterns that have been released to the world and have been scrutinized later
[Python] I made a decorator that doesn't seem to have any use.
Have Alexa run Python to give you a sense of the future
[Python3] Code that can be used when you want to cut out an image in a specific size
We have released LINE BOT, which allows you to search for Hotels and Bars based on location information.
[Python3] Code that can be used when you want to change the extension of an image at once
[Python] How to make a class iterable
I tried to create a class that can easily serialize Json in Python
What do you like about how to convert an array (list) to a string?
I didn't have to write a decorator in the class Thank you contextmanager
Introducing the book "Creating a profitable AI with Python" that allows you to learn machine learning in the shortest course
A memo to generate a dynamic variable of class from dictionary data (dict) that has only standard type data in Python3