[LINUX] Getting Started with CPU Steal Time

CPU steal time is widely misunderstood. It is often described as a metric showing how much of the CPU that should have gone to your VM was stolen by other VMs on the same host, or as evidence that CPU contention is occurring. Virtualized environments do tend to overcommit CPU resources depending on their configuration, but you cannot conclude that anything was "stolen" just by looking at CPU steal time.

CPU steal time counts the time during which a virtual machine tried to use more CPU than it had been allocated. As the comment in kernel/sched/cputime.c suggests, it would be more accurately called **involuntary wait time** than steal time.

/proc/stat

First, let's see where the steal time metric %st, shown by top(1) and vmstat(8), comes from. These tools simply read the counters in /proc/stat and compute the percentage from them. On the kernel side, the relevant part is the following.

fs/proc/stat.c (linux/stat.c at master · torvalds/linux · GitHub):

		steal		+= cpustat[CPUTIME_STEAL];

This simply reads the value of cpustat[CPUTIME_STEAL]. So where is cpustat[CPUTIME_STEAL] accounted? Tracing it leads to the scheduler's CPU time accounting code.
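To make the user-space side concrete, here is a minimal sketch of how a steal percentage could be derived from two readings of the aggregate "cpu" line in /proc/stat. The names `cpu_sample`, `parse_cpu_line`, `read_cpu_sample`, and `steal_percent` are made up for this illustration, and the calculation (steal delta divided by the delta of the first eight counters) is an assumption about how such tools work in general, not a copy of the top or vmstat source.

```c
#include <stdio.h>
#include <string.h>

/* First eight cumulative counters of the aggregate "cpu" line, in ticks. */
struct cpu_sample {
	unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse the aggregate "cpu " line; returns 0 on success, -1 otherwise. */
int parse_cpu_line(const char *line, struct cpu_sample *s)
{
	/* Reject per-CPU lines such as "cpu0 ..." and unrelated lines. */
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;
	if (sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
		   &s->user, &s->nice, &s->system, &s->idle,
		   &s->iowait, &s->irq, &s->softirq, &s->steal) != 8)
		return -1;
	return 0;
}

/* Read one sample from the first line of a stat file such as /proc/stat. */
int read_cpu_sample(const char *path, struct cpu_sample *s)
{
	char line[512];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = parse_cpu_line(line, s);
	fclose(f);
	return ret;
}

/* Sum of all counted states for one sample. */
unsigned long long total_ticks(const struct cpu_sample *s)
{
	return s->user + s->nice + s->system + s->idle +
	       s->iowait + s->irq + s->softirq + s->steal;
}

/* Steal share of all ticks that elapsed between two samples, in percent. */
double steal_percent(const struct cpu_sample *prev, const struct cpu_sample *cur)
{
	unsigned long long total = total_ticks(cur) - total_ticks(prev);

	if (total == 0)
		return 0.0;
	return 100.0 * (double)(cur->steal - prev->steal) / (double)total;
}
```

A caller would take one sample with `read_cpu_sample("/proc/stat", &a)`, wait for an interval, take a second sample, and pass both to `steal_percent`. Because the counters are cumulative, only the difference between two samples is meaningful.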

VM CPU time

Steal time is accounted in kernel/sched/cputime.c. An excerpt:

linux/cputime.c at master · torvalds/linux · GitHub:

/*
 * Account for involuntary wait time.
 * @cputime: the CPU time spent in involuntary wait
 */

void account_steal_time(u64 cputime)
{
	u64 *cpustat = kcpustat_this_cpu->cpustat;
	cpustat[CPUTIME_STEAL] += cputime;
}
...
/*
 * When a guest is interrupted for a longer amount of time, missed clock
 * ticks are not redelivered later. Due to that, this function may on
 * occasion account more time than the calling functions think elapsed.
 */
static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
	if (static_key_false(&paravirt_steal_enabled)) {
		u64 steal;

		steal = paravirt_steal_clock(smp_processor_id());
		steal -= this_rq()->prev_steal_time;
		steal = min(steal, maxtime);
		account_steal_time(steal);
		this_rq()->prev_steal_time += steal;

		return steal;
	}
#endif
	return 0;
}

You can see that `account_steal_time()` merely accumulates the measured steal time. The actual value is obtained from `paravirt_steal_clock()`, which is called in `steal_account_process_time()`.
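The delta-and-clamp step in `steal_account_process_time()` can be tried out in user space. The sketch below is a simplified stand-in, not kernel code: `steal_state` and `account_steal_delta` are hypothetical names, and the cumulative steal clock is passed in as a plain argument instead of being read from the hypervisor.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for the per-runqueue bookkeeping. */
struct steal_state {
	uint64_t prev_steal_time; /* steal already accounted, in ns */
	uint64_t accounted;       /* plays the role of cpustat[CPUTIME_STEAL] */
};

/*
 * steal_clock is the cumulative value reported by the hypervisor;
 * maxtime is how much time the caller believes has elapsed.
 * Returns the amount accounted by this call.
 */
uint64_t account_steal_delta(struct steal_state *st, uint64_t steal_clock,
			     uint64_t maxtime)
{
	/* New steal time since the last accounted point. */
	uint64_t steal = steal_clock - st->prev_steal_time;

	/* Never account more than the caller thinks has passed. */
	if (steal > maxtime)
		steal = maxtime;

	st->accounted += steal;
	/* Advance only by what was accounted, so any excess carries over. */
	st->prev_steal_time += steal;
	return steal;
}
```

Because `prev_steal_time` advances only by the clamped amount, steal time that exceeds `maxtime` in one call is not lost; it is picked up by later calls.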

Note that whether the kernel runs as a Xen guest or a KVM guest, you usually don't have to worry about the #ifdef CONFIG_PARAVIRT: guest kernels are typically built with CONFIG_PARAVIRT=y, regardless of HVM or PV, so this code is compiled in. This seems rather natural, since even HVM guests often use paravirtual features. Incidentally, CONFIG_PARAVIRT=y is just an option that enables the paravirtualization code, whether or not the kernel ends up running under a hypervisor. The help text from Kconfig is excerpted below.

linux/Kconfig at master · torvalds/linux · GitHub:

config PARAVIRT
	bool "Enable paravirtualization code"
	---help---
	  This changes the kernel so it can modify itself when it is run
	  under a hypervisor, potentially improving performance significantly
	  over full virtualization.  However, when run without a hypervisor
	  the kernel is theoretically slower and slightly larger.

Back to the main thread: it was paravirt_steal_clock() that actually supplied the steal time. Following it into [linux/paravirt.h at master · torvalds/linux · GitHub](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/paravirt.h#L34-L37), you can see that it is abstracted as pv_ops.time.steal_clock. Below is that part:

static inline u64 paravirt_steal_clock(int cpu)
{
	return PVOP_CALL1(u64, time.steal_clock, cpu);
}

The macro may look confusing, but expanding it turns time.steal_clock into pv_ops.time.steal_clock. In other words, the real implementation is whatever function is registered as pv_ops.time.steal_clock.
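The idea behind this abstraction is an ops table of function pointers that each hypervisor backend fills in at boot. The sketch below is a deliberately reduced model: every name in it (`ops_table`, `demo_ops`, `demo_steal_clock`, and so on) is hypothetical, and the real PVOP_CALL1 macro does more than the plain indirect call shown here.

```c
#include <stdint.h>

/* Hypothetical miniature of an ops table; the real one has many more members. */
struct time_ops {
	uint64_t (*steal_clock)(int cpu);
};

struct ops_table {
	struct time_ops time;
};

/* Default used when no hypervisor backend has registered itself. */
uint64_t default_steal_clock(int cpu)
{
	(void)cpu;
	return 0;
}

/* Made-up backend standing in for a hypervisor-specific implementation. */
uint64_t demo_steal_clock(int cpu)
{
	return 1000u * (uint64_t)(cpu + 1);
}

/* The table starts out pointing at the default implementation. */
struct ops_table demo_ops = {
	.time = { .steal_clock = default_steal_clock },
};

/* Callers go through the table and never name a backend directly. */
uint64_t demo_paravirt_steal_clock(int cpu)
{
	return demo_ops.time.steal_clock(cpu);
}
```

A backend registers itself with a single assignment such as `demo_ops.time.steal_clock = demo_steal_clock;`, after which every caller of `demo_paravirt_steal_clock()` reaches the new implementation without any change on the calling side.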

Now let's look at what is registered as pv_ops.time.steal_clock for Xen and KVM, respectively.

For Xen

For Xen, it is registered in drivers/xen/time.c.

linux/time.c at master · torvalds/linux · GitHub:

	pv_ops.time.steal_clock = xen_steal_clock;

The implementation is:

u64 xen_steal_clock(int cpu)
{
	struct vcpu_runstate_info state;

	xen_get_runstate_snapshot_cpu(&state, cpu);
	return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline];
}

In the case of Xen, steal time is derived from the state of the VCPU. Note that xen_get_runstate_snapshot_cpu() takes a snapshot of the VCPU state at that moment. This function is defined in the same source file, so take a look there if you're curious. The meanings of RUNSTATE_runnable and RUNSTATE_offline can be found in `include/xen/interface/vcpu.h`.

linux/vcpu.h at master · torvalds/linux · GitHub:

/* VCPU is currently running on a physical CPU. */
#define RUNSTATE_running  0

/* VCPU is runnable, but not currently scheduled on any physical CPU. */
#define RUNSTATE_runnable 1

/* VCPU is blocked (a.k.a. idle). It is therefore not runnable. */
#define RUNSTATE_blocked  2

/*
 * VCPU is not runnable, but it is not blocked.
 * This is a 'catch all' state for things like hotplug and pauses by the
 * system administrator (or for critical sections in the hypervisor).
 * RUNSTATE_blocked dominates this state (it is the preferred state).
 */
#define RUNSTATE_offline  3

As you can see from the above, neither RUNSTATE_runnable nor RUNSTATE_offline says anything about resources being taken by someone else. If the application running on the virtual machine has the resources it needs, the VCPU will not be in the `runnable` state; it will be `running`, `blocked`, or `offline`. In Xen you can borrow spare resources, but you cannot steal resources from other VMs. If a virtual machine's VCPU stays in the `runnable` state for a long time and the steal time shown by `top`/`vmstat` is high, it means that the virtual machine is requesting more CPU resources than it is allowed to use. On the other hand, if steal time suddenly jumps, something may have gone wrong.
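The calculation in xen_steal_clock() is just a sum over two entries of a per-state time array. The sketch below reproduces that arithmetic in user space with hypothetical names (`demo_runstate`, `demo_steal_ns`, `demo_steal_share`); the state indices follow the values quoted above, and treating the times as nanoseconds is an assumption of this example.

```c
#include <stdint.h>

/* Indices matching the RUNSTATE_* values quoted above. */
enum {
	DEMO_RUNNING  = 0,
	DEMO_RUNNABLE = 1,
	DEMO_BLOCKED  = 2,
	DEMO_OFFLINE  = 3,
	DEMO_NSTATES  = 4
};

/* Hypothetical stand-in for the per-state time array of a runstate snapshot. */
struct demo_runstate {
	uint64_t time[DEMO_NSTATES]; /* cumulative time per state, assumed ns */
};

/* Steal time as computed for Xen: runnable plus offline. */
uint64_t demo_steal_ns(const struct demo_runstate *rs)
{
	return rs->time[DEMO_RUNNABLE] + rs->time[DEMO_OFFLINE];
}

/* Steal as a percentage of the time spent in all four states. */
double demo_steal_share(const struct demo_runstate *rs)
{
	uint64_t total = 0;

	for (int i = 0; i < DEMO_NSTATES; i++)
		total += rs->time[i];
	if (total == 0)
		return 0.0;
	return 100.0 * (double)demo_steal_ns(rs) / (double)total;
}
```

Time spent in the blocked state never enters the sum, which matches the point that an idle VCPU does not accumulate steal time.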

For KVM

For KVM, the steal value is read from a per-CPU structure that the guest shares with the hypervisor by registering it through an MSR. You can find the code in `arch/x86/kernel/kvm.c`.

linux/kvm.c at master · torvalds/linux · GitHub:

static void __init kvm_guest_init(void)
{
	int i;
	paravirt_ops_setup();
...
	if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
		has_steal_clock = 1;
		pv_ops.time.steal_clock = kvm_steal_clock;
	}

linux/kvm.c at master · torvalds/linux · GitHub:

static u64 kvm_steal_clock(int cpu)
{
	u64 steal;
	struct kvm_steal_time *src;
	int version;

	src = &per_cpu(steal_time, cpu);
	do {
		version = src->version;
		virt_rmb();
		steal = src->steal;
		virt_rmb();
	} while ((version & 1) || (version != src->version));

	return steal;
}

It just reads the steal field of the per-CPU steal_time structure, which the guest registers with the hypervisor through the MSR (Model Specific Register), and retries if the version is odd or changed during the read. The code that provides the values is probably the part at [linux/x86.c at master · torvalds/linux · GitHub](https://github.com/torvalds/linux/blob/master/arch/x86/kvm/x86.c#L2651-L2694), but I have not followed it here. Fortunately, the documentation describes this definition.

https://www.kernel.org/doc/Documentation/virtual/kvm/msr.txt

MSR_KVM_STEAL_TIME: 0x4b564d03 ... steal: the amount of time in which this vCPU did not run, in nanoseconds. Time during which the vcpu is idle, will not be reported as steal time.

In other words, it is the time during which the VCPU was not running, excluding idle time. This is similar to Xen.
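The retry loop in kvm_steal_clock() follows a versioned-read pattern: an odd version means an update is in progress, and a version that changed during the read means the value may be torn. Below is a minimal user-space sketch of that pattern using C11 atomics. The structure and function names are hypothetical, the writer stands in for the hypervisor, and sequentially consistent atomics replace the kernel's virt_rmb() barriers, so this illustrates the idea rather than the kernel's exact memory-ordering scheme.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared record; the real structure has additional fields. */
struct demo_steal_time {
	_Atomic uint32_t version; /* odd while an update is in progress */
	_Atomic uint64_t steal;   /* cumulative steal time, in ns */
};

/* Writer side (the hypervisor's role in this sketch). */
void demo_publish(struct demo_steal_time *st, uint64_t steal)
{
	atomic_fetch_add(&st->version, 1); /* odd: update in progress */
	atomic_store(&st->steal, steal);
	atomic_fetch_add(&st->version, 1); /* even: update complete */
}

/* Reader side: retry until the version is even and unchanged. */
uint64_t demo_read_steal(struct demo_steal_time *st)
{
	uint32_t version;
	uint64_t steal;

	do {
		version = atomic_load(&st->version);
		steal = atomic_load(&st->steal);
	} while ((version & 1) || version != atomic_load(&st->version));

	return steal;
}
```

The reader never takes a lock, so the writer is never blocked by a guest that is slow to read; the cost is that the reader may occasionally loop.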

Note that steal time is provided by the hypervisor in both Xen and KVM, as we have seen above. However, whether this value is reported correctly seems to differ between cloud service providers and environments. GCE, for example, apparently did not report steal time. I learned this from Paul R. Nash (@paulrnash on Twitter), formerly a Product Director for GCE at Google (he now appears to be at Microsoft Azure), via Matthew S. Wilson (@msw on Twitter), a VP and Distinguished Engineer at AWS.

I've heard that GCE uses a KVM-based hypervisor, so perhaps MSR_KVM_STEAL_TIME is not enabled there. I don't know the details.

Summary

In general, the fact that CPU steal time is being counted does not by itself indicate a problem on the underlying host. Steal time also includes time the hypervisor and VMM spend acting on behalf of the guest, and essentially it shows how much time the VM wanted to run beyond the allowance it was given.

Check the guest and host metrics carefully and choose an appropriately sized VM. CPU steal time is one indicator for choosing the right virtual machine. When measuring the CPU consumption a task requires, it is important to be able to subtract the time spent waiting for a physical CPU, and CPU steal time helps with that.

When CPU steal time increases, first read the documentation for your virtualization environment, then analyze the metrics provided by the host and check whether the value is reasonable.
