[LINUX] About virtiofs

This article is a brief introduction to virtiofs as of December 2019.

Overview

Official site: https://virtio-fs.gitlab.io/

virtiofs is a new file system for sharing directories between a host and its guest VM(s), developed by Red Hat engineers.

The main use case is using virtiofs as the root file system of lightweight VMs (such as kata-container). This has advantages such as a shorter boot time, because unnecessary file copies into the guest are avoided. Another use case is hiding file system details from guests. Since the guest cannot see the details of the file system behind the shared directory, it does not have to deal with, for example, the IP address or security settings of a network file system.

Network file systems such as NFS and 9pfs already exist as ways to share directories between hosts and guests. However, they rely on a network stack or network protocol and are not optimized for virtual environments, where the host and guest communicate on the same machine. Also, the semantics of a network file system often differ from those of a local file system, which can affect the behavior of (guest) applications.

To address these problems, virtiofs aims to be a file system that (1) has high IO performance and (2) provides guests with the same semantics as a local file system. To achieve this, it is being developed on top of a (partially extended) FUSE protocol, which is independent of the network stack and close to the Linux VFS interface [^fuse].

[^fuse]: FUSE also has the advantage of a long track record.

In a normal FUSE file system, a file system daemon running in user space receives FUSE requests from the kernel and processes them. In virtiofs, the daemon resides on the host (in user space), receives FUSE requests from the guest, and interacts with the host's file system as needed. The guest and the daemon communicate over virtio using vhost-user, the same mechanism used by DPDK and SPDK [^vhost].

[^vhost]: Roughly speaking, with vhost-user, control processing such as initialization goes through qemu, but data is exchanged without going through qemu by using virtqueues created in shared memory. This lets the guest and a process in host user space (here, the virtiofs daemon) interact directly. For the background of vhost-user, [this series of articles](https://www.redhat.com/ja/blog/virtio-networking-first-series-finale-and-plans-2020?source=bloglisting&f%5B0%5D=post_tags%3ANetworking), for example, is helpful.

For more on virtiofs, see the official design document and the main developer's KVM Forum 2019 slides (/TmvA/virtio-fs-a-shared-file-system-for-virtual-machines-stefan-hajnoczi-red-hat?iframe=no).

Status

Using virtiofs requires support in both Linux and qemu. At the moment (December 2019), development of the basic functions is almost complete, and the kernel part has been merged in Linux 5.4 (the DAX function described later is not included yet). On the qemu side, the vhost-user-fs-pci code was merged in 4.2, but the daemon code is still under review. If you want to actually run it on Linux + qemu, refer to the instructions on the official site.
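As a rough sketch of what running it directly on qemu involves, the helper below prints (it does not run) the qemu options that attach a vhost-user-fs-pci device. The option values follow the instructions on the official site as far as I understand them. The function name is made up, and the socket path, tag and memory size are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch only: print the qemu options related to virtiofs, one per line.
#   $1: UNIX socket the virtiofs daemon listens on (vhost-user control channel)
#   $2: tag the guest uses to refer to the share when mounting
#   $3: guest memory size (e.g. 4G)
virtiofs_qemu_args() {
    socket_path=$1
    tag=$2
    mem_size=$3
    # Guest RAM must come from a shareable memory backend (share=on),
    # because the daemon accesses the virtqueues through shared memory.
    printf '%s\n' \
        "-m $mem_size" \
        "-object memory-backend-file,id=mem,size=$mem_size,mem-path=/dev/shm,share=on" \
        "-numa node,memdev=mem" \
        "-chardev socket,id=char0,path=$socket_path" \
        "-device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=$tag"
}

# Example (placeholder values):
#   virtiofs_qemu_args /tmp/vhostqemu myfs 4G
```

The daemon has to be listening on the same socket before qemu starts. Inside the guest, the share would then be mounted with something like `mount -t virtiofs myfs /mnt`, where `myfs` is the tag passed above.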

In contrast, kata-container already supports virtiofs: although the support is experimental, virtiofs can be used from v1.7. It is easy to get working, so the next section explains how.

How to use with kata-container

For now, the easiest way to try virtiofs is to use kata-container. Let's confirm that the shared directory is actually used as the rootfs of the container.

Installation and container launch

First, install the latest version of kata-container using kata-deploy (the files are placed under /opt/kata):

# docker run --runtime=runc -v /opt/kata:/opt/kata -v /var/run/dbus:/var/run/dbus -v /run/systemd:/run/systemd -v /etc/docker:/etc/docker -it katadocker/kata-deploy kata-deploy-docker install

After that, just specify kata-qemu-virtiofs as the docker runtime and virtiofs will be used:

# docker run --runtime=kata-qemu-virtiofs -it busybox

If you check the mounts inside the container, you can see that virtiofs is used for the root file system [^1]:

[^1]: If you use kata-qemu as the runtime, you can see that the traditional 9pfs is used instead.

(In container)
/ # mount -t virtio_fs    (note: in the upstream version the type name is virtiofs, not virtio_fs)
kataShared on / type virtio_fs (rw,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,dax)
kataShared on /etc/resolv.conf type virtio_fs (rw,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,dax)
kataShared on /etc/hostname type virtio_fs (rw,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,dax)
kataShared on /etc/hosts type virtio_fs (rw,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,dax)

Some config files also show up as separate mounts because they are bind-mounted.
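To check this from a script rather than by eye, one minimal sketch is a helper that reads `mount` output on stdin and prints the file system type of a given mount point. The function name `mount_fstype` is my own, and it assumes the `SOURCE on MOUNTPOINT type FSTYPE (OPTIONS)` line format shown above, with no spaces in the source or mount point.

```shell
#!/bin/sh
# Print the file system type of mount point $1, reading `mount` output on stdin.
# If the mount point appears more than once, the last (topmost) entry wins.
# Prints nothing if the mount point is not found.
mount_fstype() {
    awk -v mp="$1" '
        $2 == "on" && $3 == mp && $4 == "type" { fstype = $5 }
        END { if (fstype != "") print fstype }
    '
}

# Example (inside the container):
#   mount | mount_fstype /
```

With the kata-qemu-virtiofs runtime, the output above suggests this would print `virtio_fs` for `/`.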

If you check the processes on the host, you can see that the virtiofs daemon (virtiofsd) is running:

(In host)
$ pgrep -a virtiofsd
13154 /opt/kata/bin/virtiofsd --fd=3 -o source=/run/kata-containers/shared/sandboxes/<container ID> -o cache=always --syslog -o no_posix_lock -f

The directory specified for source here will be the shared directory used by virtiofs.
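If you want to pick the shared directory out of the daemon's command line automatically, a small sketch like the following may help. It reads `pgrep -a virtiofsd` output on stdin and prints each `source=` value. The function name is hypothetical, and it assumes `source=` is passed as its own `-o` argument (as in the output above) and that the path contains no spaces.

```shell
#!/bin/sh
# Print the value of every "source=..." argument found on stdin.
# Each whitespace-separated word is put on its own line, then only the words
# starting with "source=" are kept, with that prefix removed.
virtiofsd_source_dirs() {
    tr ' ' '\n' | sed -n 's/^source=//p'
}

# Example (on the host):
#   pgrep -a virtiofsd | virtiofsd_source_dirs
```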

Check shared directory

Make sure that the host and guest actually share the directory.

First, let's take a look at the directory specified as source on the host:

(In host)
# ls /run/kata-containers/shared/sandboxes/<container ID>
<container ID>
<container ID>-resolv.conf
<container ID>-hosts
<container ID>-hostname

# ls /run/kata-containers/shared/sandboxes/<container ID>/<container ID>
rootfs

# ls /run/kata-containers/shared/sandboxes/<container ID>/<container ID>/rootfs
bin  dev  etc  home  mnt  proc  root  sys  tmp  usr  var

You can see some config files directly under the shared directory, and the rootfs under a further directory named after the container ID [^2].

[^2]: The guest does not see the top level of the shared directory, probably because pivot_root is used.

Since the contents of this directory are shared with the guest, a file created in the guest should be readable from the host. Let's confirm this:

(In container)
/ # echo abc > XXX 
/ # ls
XXX   bin   dev   etc   home  mnt   proc  root  sys   tmp   usr   var
(In host)
# ls /run/kata-containers/shared/sandboxes/<container ID>/<container ID>/rootfs
XXX   bin   dev   etc   home  mnt   proc  root  sys   tmp   usr   var
# cat /run/kata-containers/shared/sandboxes/<container ID>/<container ID>/rootfs/XXX
abc

Conversely, a file created on the host can be read from the guest.
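For the host-to-guest direction, a minimal sketch could look like this. The helper name `put_marker` and the file name in the example are hypothetical. It writes a small marker file into the rootfs directory inside the shared directory and refuses to overwrite an existing file, since this is the container's live root file system.

```shell
#!/bin/sh
# Host side: create a marker file in the container's rootfs.
#   $1: rootfs directory inside the shared directory
#   $2: file name to create
#   $3: content to write
# Returns non-zero without writing anything if the file already exists.
put_marker() {
    if [ -e "$1/$2" ]; then
        echo "refusing to overwrite $1/$2" >&2
        return 1
    fi
    printf '%s\n' "$3" > "$1/$2"
}

# Example (on the host, as root):
#   put_marker "/run/kata-containers/shared/sandboxes/<container ID>/<container ID>/rootfs" YYY from-host
# Then, in the container:
#   cat /YYY
```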

About DAX

One of the goals of virtiofs is high IO performance, and DAX is one of the features for achieving it. DAX stands for Direct Access and is a term often used in the context of non-volatile memory [^dax]. However, virtiofs DAX has nothing to do with actual non-volatile memory. It means that guests access host memory without going through the guest's page cache; in other words, (multiple) guests and the host share the host's memory (page cache). Performance improves because no guest-host communication is needed when the data is already in memory. Sharing the page cache also has the advantages that data changes are immediately visible to other guests and the host (similar to local file semantics) and that memory usage is reduced.

[^dax]: Accessing the device directly, without going through the page cache.

Here is a brief explanation of how DAX works. To use virtio, you need to add a virtio device when starting qemu, and the guest sees this device as a PCI device [^vhost-user-fs]. A PCI device has control registers called BARs that indicate the device's memory regions. The DAX function is realized by mapping (mmap) file data into the memory region visible through such a BAR, which the guest then accesses directly [^aaa]. Of course, the size of this region is limited, so virtiofs manages which data is mapped or unmapped at which location in the BAR space (called the DAX window). FUSE protocol messages have also been added for this purpose, to request that a region be mapped or unmapped.

[^vhost-user-fs]: The guest recognizes that virtiofs is available when a device called vhost-user-fs-pci is added to qemu's boot options.

[^aaa]: Note that this region looks like a non-volatile memory device to the guest, so the kernel uses its DAX code paths.
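To make the bookkeeping of the DAX window a little more concrete, here is a toy calculation. It is not the actual kernel logic. It assumes the window is managed in fixed-size chunks and computes which chunk-aligned byte range of a file would have to be mapped to serve an access at a given offset. The function name is made up, the chunk size is a parameter, and the 2 MiB in the example is an assumption for illustration.

```shell
#!/bin/sh
# Toy model: print "START END" (inclusive byte offsets) of the chunk-aligned
# file range that contains byte offset $1, for a chunk size of $2 bytes.
dax_chunk_range() {
    offset=$1
    chunk=$2
    # Integer division rounds down, so this is the start of the containing chunk.
    start=$(( offset / chunk * chunk ))
    end=$(( start + chunk - 1 ))
    echo "$start $end"
}

# Example with an assumed 2 MiB chunk size:
#   dax_chunk_range 5000000 2097152
```

Because the window holds only a limited number of such chunks, a real implementation also has to decide which existing mapping to drop when the window is full. That policy is outside the scope of this sketch.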

Note that the virtiofs merged into kernel 5.4 does not yet include DAX support, so if you want to try this feature you can either build the development branch yourself or use kata-container (the kata-container version of virtiofs already includes it).
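One way to tell whether a given mount is using DAX is to look for the `dax` option in the `mount` output, as it appears in the kata-container output earlier in this article. The sketch below does that check. The function name is hypothetical, and whether other versions report the option in exactly the same way is an assumption.

```shell
#!/bin/sh
# Succeed (exit status 0) if mount point $1 has mount option $2.
# Reads `mount` output on stdin, in the form:
#   SOURCE on MOUNTPOINT type FSTYPE (opt1,opt2,...)
mount_has_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == "on" && $3 == mp && $4 == "type" {
            opts = $6
            gsub(/[()]/, "", opts)        # strip the surrounding parentheses
            n = split(opts, list, ",")    # split the option list on commas
            found = 0
            for (i = 1; i <= n; i++)
                if (list[i] == opt) found = 1
        }
        END { if (found) exit 0; exit 1 }
    '
}

# Example (inside the container):
#   mount | mount_has_option / dax && echo "DAX is enabled on /"
```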

Conclusion

The virtiofs daemon (virtiofsd) that will be merged into qemu is implemented in C, but other implementations are of course possible. In fact, a daemon has also been implemented in Rust (crosvm) [^crosvm]. Also, although I couldn't find much detailed information, some people already seem to be considering how to combine virtiofs with SPDK [^snia]. Now that the code has been merged upstream, I expect more virtiofs use cases to appear next year.
