I want to call the CUDA Driver API from Python.
Building native code with pybind11 or cffi is a hassle, so I want to do it in pure Python, using only the standard library. (Installing the CUDA SDK for CI builds is also a hassle.)
Use ctypes.
https://docs.python.org/3/library/ctypes.html
This targets the CUDA Driver API, which is available whenever the NVIDIA driver is installed. The CUDA Runtime API is not covered, because it requires installing the CUDA SDK or the runtime, which is a hassle.
The kernels to run are assumed to be in PTX format. Driving CUDA from Python lets you, for example, treat PTX code as a template, rewrite it dynamically on the Python side (setting the optimal parameters for the target GPU), and then compile it.
Notes on converting CUDA code to NVPTX with Clang: https://qiita.com/syoyo/items/4e60543aded0210fde49
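The template idea can be sketched with nothing but the standard library. This is a minimal illustration, not the author's actual kernel: the PTX body, the placeholder names `$sm_target` and `$scale`, and the helpers `f32_literal` and `render_ptx` are all made up for this example, and the PTX has not been validated against any particular driver.

```python
import struct
from string import Template

# Hypothetical PTX template. $sm_target and $scale are placeholders invented
# for this sketch; string.Template uses "$", so PTX's "%" registers are safe.
PTX_TEMPLATE = Template("""\
.version 7.0
.target $sm_target
.address_size 64

.visible .entry scale_kernel(.param .u64 data)
{
    .reg .f32 %f<3>;
    .reg .b64 %rd<3>;
    ld.param.u64 %rd1, [data];
    cvta.to.global.u64 %rd2, %rd1;
    ld.global.f32 %f1, [%rd2];
    mul.f32 %f2, %f1, $scale;
    st.global.f32 [%rd2], %f2;
    ret;
}
""")


def f32_literal(value: float) -> str:
    """Format a Python float as PTX's exact single-precision hex form (0fXXXXXXXX)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return f"0f{bits:08X}"


def render_ptx(sm_target: str, scale: float) -> str:
    """Fill the template in; substitute() raises KeyError if a placeholder is missing."""
    return PTX_TEMPLATE.substitute(sm_target=sm_target, scale=f32_literal(scale))


if __name__ == "__main__":
    print(render_ptx("sm_70", 2.0))
```

The rendered string is what would then be handed to the driver for compilation; the target and the constant could be chosen at run time from whatever the driver reports about the GPU.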
Windows
C:\Windows\System32\nvcuda.dll
Normally nvcuda.dll is all you need.
(It is installed automatically along with the NVIDIA driver, so if it cannot be loaded, the PC most likely has no NVIDIA GPU, e.g. only Intel integrated/Xe graphics or an AMD GPU.)
Linux (Ubuntu)
On Ubuntu, libcuda.so is located under /usr/lib/x86_64-linux-gnu/.
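The Windows and Linux locations above can be folded into one loader. This is a rough sketch under a few assumptions: the helper names `cuda_driver_candidates` and `load_cuda_driver` are invented here, `libcuda.so.1` is tried first because that is the name the driver package normally installs, and `WinDLL` is used on Windows because the driver headers declare the API as `__stdcall` there (on 64-bit Windows `CDLL` behaves the same).

```python
import ctypes
import sys


def cuda_driver_candidates(platform: str = sys.platform) -> list:
    """Return library names/paths to try, in order, for the given platform."""
    if platform.startswith("win"):
        return ["nvcuda.dll"]  # found via C:\Windows\System32
    return [
        "libcuda.so.1",
        "libcuda.so",
        "/usr/lib/x86_64-linux-gnu/libcuda.so",
    ]


def load_cuda_driver():
    """Return a ctypes handle to the CUDA driver library, or None if it is absent."""
    # ctypes.WinDLL only exists on Windows, so only touch it there.
    loader = ctypes.WinDLL if sys.platform.startswith("win") else ctypes.CDLL
    for name in cuda_driver_candidates():
        try:
            return loader(name)
        except OSError:
            continue  # not found under this name; try the next candidate
    return None


if __name__ == "__main__":
    cu = load_cuda_driver()
    print("CUDA driver found" if cu is not None else "no CUDA driver on this machine")
```

Returning `None` instead of raising makes it easy to fall back to a CPU path on machines without an NVIDIA GPU.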
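Load the library, initialize the driver, and query its version:
from ctypes import *

# Load the driver library that ships with the NVIDIA driver.
cu = cdll.LoadLibrary("/usr/lib/x86_64-linux-gnu/libcuda.so")
print(cu)

# cuInit must be called before any other Driver API function.
ret = cu.cuInit(0)
assert ret == 0 # CUDA_SUCCESS

# cuDriverGetVersion takes an int* out-parameter, so pass the c_int by reference.
ver = c_int()
ret = cu.cuDriverGetVersion(byref(ver))
assert ret == 0 # CUDA_SUCCESS
print("CUDA version", ver) # ver.value gives the plain Python int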
CUDA version c_int(11020)
Voila!
Pointer arguments are passed with byref.
From here on, you can call the rest of the API in the same way!
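As one example of calling further APIs, the sketch below enumerates the GPUs and reads their names with `cuDeviceGetCount`, `cuDeviceGet`, and `cuDeviceGetName`. The helper names `check` and `list_devices` are invented for illustration, and the library name in the `__main__` block is an assumption to adjust for your system.

```python
import ctypes
import sys

CUDA_SUCCESS = 0


def check(result: int, name: str) -> None:
    """Raise if a Driver API call did not return CUDA_SUCCESS."""
    if result != CUDA_SUCCESS:
        raise RuntimeError(f"{name} failed with CUresult {result}")


def list_devices(cu) -> list:
    """Return the names of all CUDA devices visible through the loaded driver."""
    check(cu.cuInit(0), "cuInit")

    count = ctypes.c_int()
    check(cu.cuDeviceGetCount(ctypes.byref(count)), "cuDeviceGetCount")

    names = []
    for ordinal in range(count.value):
        device = ctypes.c_int()  # CUdevice is a plain int handle
        check(cu.cuDeviceGet(ctypes.byref(device), ordinal), "cuDeviceGet")

        # cuDeviceGetName(char *name, int len, CUdevice dev) fills a caller buffer.
        buf = ctypes.create_string_buffer(256)
        check(cu.cuDeviceGetName(buf, len(buf), device), "cuDeviceGetName")
        names.append(buf.value.decode())
    return names


if __name__ == "__main__":
    # Assumed library names; see the Windows and Linux sections for the locations.
    lib_name = "nvcuda.dll" if sys.platform.startswith("win") else "libcuda.so.1"
    try:
        print(list_devices(ctypes.CDLL(lib_name)))
    except (OSError, RuntimeError) as exc:
        print("CUDA driver not usable:", exc)
```

Routing every call through `check` avoids repeating `assert ret == 0` and reports which call failed.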
PTX Compiler API
Notes on the PTX Compiler API: https://qiita.com/syoyo/items/cfaf0f7dd20b67cc734e
It ships only as a static library, so you first have to wrap it in a DLL/shared library of your own. Once that is done, if the driver's built-in PTX compilation does not meet your needs, you can also compile PTX on the client side from pure Python!
Runtime API
Sometimes you want to use a library built on top of the Runtime API, such as cuSPARSE.
Most of the DLL/.so files are redistributable (see the EULA):
https://docs.nvidia.com/cuda/eula/index.html
so, if necessary, you can bundle them with your own app (e.g. when installing the CUDA SDK on every CI run is too much trouble).
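Loading a bundled library works the same way as loading the driver. The sketch below assumes a hypothetical `vendor` directory next to the script and a hypothetical file name `libcudart.so.12`; the real file name depends on the CUDA version you redistribute. `cudaRuntimeGetVersion` is used only as a simple smoke test, and the helper names are invented here.

```python
import ctypes
import os


def load_bundled(lib_dir: str, filename: str):
    """Load a redistributed library from lib_dir, or return None if it is not there."""
    path = os.path.join(lib_dir, filename)
    if not os.path.isfile(path):
        return None
    return ctypes.CDLL(path)


def runtime_version(cudart) -> int:
    """Query the CUDA runtime version via cudaRuntimeGetVersion(int *)."""
    ver = ctypes.c_int()
    status = cudart.cudaRuntimeGetVersion(ctypes.byref(ver))
    if status != 0:  # 0 is cudaSuccess
        raise RuntimeError(f"cudaRuntimeGetVersion failed with cudaError {status}")
    return ver.value


if __name__ == "__main__":
    # Hypothetical layout: <script dir>/vendor/libcudart.so.12
    here = os.path.dirname(os.path.abspath(__file__))
    cudart = load_bundled(os.path.join(here, "vendor"), "libcudart.so.12")
    if cudart is None:
        print("bundled CUDA runtime not found")
    else:
        print("CUDA runtime version", runtime_version(cudart))
```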
TODO