[PYTHON] Device driver (NumPy compatible) for programs and hardware that run in user space on Linux to share memory

Introduction

udmabuf (github.com/ikwzm/udmabuf), introduced in the previous post "Device driver for programs and hardware running in user space on Linux to share memory" (http://qiita.com/ikwzm/items/cc1bb33ff43a491440ea), can now be used from NumPy (Python's numerical computing library). Specifically, the buffer area that udmabuf allocates in the kernel is mapped with NumPy's memmap and can then be handled in the same way as an ndarray. This article explains how to do that.

udmabuf update

Unfortunately, udmabuf version 0.5.0 (2016/4/24) and earlier fail with the following error.

shell# python
Python 2.7.9 (default, Aug 13 2016, 17:56:53)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> m = np.memmap('/dev/udmabuf0', dtype=np.uint8, mode='r+', shape=(100))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/core/memmap.py", line 217, in __new__
    fid.seek(0, 2)
IOError: [Errno 29] Illegal seek

This is because udmabuf did not support lseek. NumPy's memmap apparently calls seek(0, 2) to get the file size, but I had forgotten to implement lseek() in udmabuf. udmabuf version 0.6.0 (2017/1/29) supports lseek, so if you want to use udmabuf with NumPy, please use udmabuf version 0.6.0 or later.
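To illustrate what the failing call does: seek(0, 2) moves the file position to the end of the file, and the resulting offset is the size in bytes. The following is only a minimal sketch. It uses an ordinary temporary file as a stand-in for /dev/udmabuf0 (an assumption, so that it runs without the driver), and `file_size_by_seek` is a hypothetical helper name, not part of NumPy or udmabuf.

```python
import os
import tempfile

def file_size_by_seek(path):
    """Return the size of `path` in bytes by seeking to its end."""
    with open(path, 'rb') as fid:
        # whence=2 (os.SEEK_END) positions relative to the end of the file.
        # A device file whose driver lacks lseek support fails here,
        # as in the "Illegal seek" traceback above.
        fid.seek(0, os.SEEK_END)
        return fid.tell()

# Stand-in for the device file (assumption): a 4096-byte temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\0' * 4096)
    stand_in_path = tmp.name

size = file_size_by_seek(stand_in_path)
print("size by seek: %d" % size)
os.remove(stand_in_path)
```

With udmabuf 0.6.0 or later, the same seek on the device file should succeed, which is what allows memmap to proceed.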

Example of use

Install udmabuf

First, install udmabuf. For the specific procedure, refer to "Device driver for programs and hardware running in user space on Linux to share memory" (http://qiita.com/ikwzm/items/cc1bb33ff43a491440ea).

The following example loads udmabuf.ko directly with insmod. In doing so, an 8 MByte buffer is allocated as udmabuf0.

shell# insmod udmabuf.ko udmabuf0=8388608
[34654.590294] alloc_contig_range: [1f100, 1f900) PFNs busy
[34654.596154] alloc_contig_range: [1f200, 1fa00) PFNs busy
[34654.622746] udmabuf udmabuf0: driver installed
[34654.627153] udmabuf udmabuf0: major number   = 237
[34654.631889] udmabuf udmabuf0: minor number   = 0
[34654.636685] udmabuf udmabuf0: phys address   = 0x1f300000
[34654.642002] udmabuf udmabuf0: buffer size    = 8388608

udmabuf_test.py

Here is a script that runs a simple test with Python + NumPy.

udmabuf_test.py


import numpy as np
import time

class Udmabuf:
    """A simple udmabuf class"""

    def __init__(self, name):
        self.name        = name
        self.device_name = '/dev/%s'               % self.name
        self.class_path  = '/sys/class/udmabuf/%s' % self.name
        # The driver reports the buffer size in bytes through sysfs.
        with open(self.class_path + '/size') as f:
            self.buf_size = int(f.readline())

    def memmap(self, dtype, shape):
        # Map the kernel buffer into this process as a NumPy array.
        self.item_size = np.dtype(dtype).itemsize
        self.mem_map   = np.memmap(self.device_name, dtype=dtype, mode='r+', shape=shape)

def test_1(a):
    for i in range (0,9):
        a *= 0
        a += 0x31

if __name__ == '__main__':
    udmabuf      = Udmabuf('udmabuf0')
    test_dtype   = np.uint8
    test_size    = int(udmabuf.buf_size/(np.dtype(test_dtype).itemsize))
    udmabuf.memmap(dtype=test_dtype, shape=(test_size))
    comparison   = np.zeros(test_size, dtype=test_dtype)
    print ("test_size  : %d" % test_size)
    start        = time.time()
    test_1(udmabuf.mem_map)
    elapsed_time = time.time() - start
    print ("udmabuf0   : elapsed_time:{0}".format(elapsed_time) + "[sec]")
    start        = time.time()
    test_1(comparison)
    elapsed_time = time.time() - start
    print ("comparison : elapsed_time:{0}".format(elapsed_time) + "[sec]")
    if np.array_equal(udmabuf.mem_map, comparison):
        print ("udmabuf0 == comparison : OK")
    else:
        print ("udmabuf0 != comparison : NG")
    

In the above script, the buffer area allocated in the kernel by udmabuf is made available to Python through numpy.memmap. Objects created by numpy.memmap support the same operations as numpy.ndarray. The script repeats a *= 0 and a += 0x31 nine times (range(0,9) yields nine iterations) and measures the execution time.
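The same pattern can be tried without the driver. The following is a minimal sketch, not the script above: it assumes an ordinary zero-filled temporary file as a stand-in for /dev/udmabuf0, and uses np.uint32 to show how the element count follows from the buffer size and the item size.

```python
import os
import tempfile
import numpy as np

# Stand-in for /dev/udmabuf0 (assumption): a zero-filled 1024-byte file.
buf_size = 1024
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\0' * buf_size)
    stand_in_path = tmp.name

# Number of elements of the chosen dtype that fit in the buffer.
dtype = np.uint32
count = buf_size // np.dtype(dtype).itemsize

# Map the file; numpy.memmap is a subclass of numpy.ndarray.
m = np.memmap(stand_in_path, dtype=dtype, mode='r+', shape=(count,))
m *= 0
m += 0x31
is_ndarray = isinstance(m, np.ndarray)
total = int(m.sum())
m.flush()   # write the changes back to the underlying file
del m       # release the mapping before removing the file

# Read the file back to confirm that the writes reached it.
readback = np.fromfile(stand_in_path, dtype=dtype)
os.remove(stand_in_path)
print("count=%d is_ndarray=%s total=%d" % (count, is_ndarray, total))
```

In-place operators such as *= and += are used so that the results are written into the mapped buffer rather than into a newly allocated array.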

Execution result

When I ran the script in the previous section, I got the following result.

shell# python udmabuf_test.py
test_size  : 8388608
udmabuf0   : elapsed_time:1.53304982185[sec]
comparison : elapsed_time:1.536673069[sec]
udmabuf0 == comparison : OK

The execution time for the operations on udmabuf0 (the buffer area allocated in the kernel) and the execution time for the same operations on an ordinary ndarray (comparison) were almost the same. This suggests that the CPU cache is working effectively for udmabuf0 as well.

After running this script, I checked the contents of udmabuf0.

shell# dd if=/dev/udmabuf0 of=udmabuf0.bin bs=8388608
1+0 records in
1+0 records out
8388608 bytes (8.4 MB) copied, 0.151531 s, 55.4 MB/s
shell# 
shell# od -t x1 udmabuf0.bin
0000000 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
*
40000000

I was able to confirm that the result remained in the buffer after the script finished. Just to be sure, I also checked that it can be read back from NumPy.

shell# python
Python 2.7.9 (default, Aug 13 2016, 17:56:53)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.memmap('/dev/udmabuf0', dtype=np.uint8, mode='r+', shape=(8388608))
>>> a
memmap([49, 49, 49, ..., 49, 49, 49], dtype=uint8)
>>> a.itemsize
1
>>> a.size
8388608
>>>
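This kind of check can also be automated rather than inspected by eye. The following is a minimal sketch, assuming a dump file like the udmabuf0.bin produced by dd above. Here a small temporary file filled with 0x31 stands in for it, and `all_bytes_equal` is a hypothetical helper name.

```python
import os
import tempfile
import numpy as np

def all_bytes_equal(path, value):
    """Return True if every byte in the file at `path` equals `value`."""
    data = np.fromfile(path, dtype=np.uint8)
    return bool(np.all(data == value))

# Stand-in for udmabuf0.bin (assumption): 4096 bytes of 0x31.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x31' * 4096)
    dump_path = tmp.name

ok = all_bytes_equal(dump_path, 0x31)
ng = all_bytes_equal(dump_path, 0x00)
os.remove(dump_path)
print("all 0x31: %s" % ok)
```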

Summary

You can now use Python's NumPy to manipulate the buffer area that udmabuf allocates in the kernel. This makes it possible to exchange data with the PL (Programmable Logic) part of an FPGA directly from Python, without using code or libraries written in C.

If the PL part of the FPGA can handle the NumPy data format, linking the CPU (Python + NumPy) with an FPGA accelerator should become a little easier than it is now.
