Automatically bind swap device to numa node
If the system has more than one swap device and the swap devices have node information, we can use this information to decide which swap device to use in get_swap_pages() to get better performance.
How to use this feature
Each swap device has a priority, which decides the order in which it is used. To make use of automatic binding, there is no need to manipulate the priority settings of swap devices. For example, on a 2-node machine, assume two swap devices, swapA and swapB, are going to be swapped on, with swapA attached to node 0 and swapB attached to node 1. Simply swap them on by doing::
    # swapon /dev/swapA
    # swapon /dev/swapB
Then node 0 will use the two swap devices in the order of swapA then swapB and node 1 will use the two swap devices in the order of swapB then swapA. Note that the order of them being swapped on doesn't matter.
A more complex example on a 4-node machine: assume 6 swap devices are going to be swapped on. swapA and swapB are attached to node 0, swapC is attached to node 1, swapD and swapE are attached to node 2, and swapF is attached to node 3. The way to swap them on is the same as above::
    # swapon /dev/swapA
    # swapon /dev/swapB
    # swapon /dev/swapC
    # swapon /dev/swapD
    # swapon /dev/swapE
    # swapon /dev/swapF
Then node 0 will use them in the order of::
    swapA/swapB -> swapC -> swapD -> swapE -> swapF
swapA and swapB will be used in a round robin mode before any other swap device.
node 1 will use them in the order of::
    swapC -> swapA -> swapB -> swapD -> swapE -> swapF
node 2 will use them in the order of::
    swapD/swapE -> swapA -> swapB -> swapC -> swapF
Similarly, swapD and swapE will be used in a round robin mode before any other swap devices.
node 3 will use them in the order of::
    swapF -> swapA -> swapB -> swapC -> swapD -> swapE
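The four orderings above appear to follow one rule: devices attached to the requesting node come first and share a round-robin slot, and the remaining devices follow in the order they were swapped on. A minimal Python sketch of that rule, as a model rather than kernel code (the `node_order` helper and the `DEVICES` table are illustrative names, and every device is assumed to have a system-assigned priority):

```python
# Model of the 4-node example: (device, attached node), listed in swapon order.
DEVICES = [
    ("swapA", 0), ("swapB", 0), ("swapC", 1),
    ("swapD", 2), ("swapE", 2), ("swapF", 3),
]

def node_order(devices, node):
    """Return the order in which `node` tries the devices, as a string."""
    local = [name for name, dev_node in devices if dev_node == node]
    remote = [name for name, dev_node in devices if dev_node != node]
    # Local devices share one slot (round robin); remote ones are tried one by one.
    steps = (["/".join(local)] if local else []) + remote
    return " -> ".join(steps)

for node in range(4):
    print(f"node {node}: {node_order(DEVICES, node)}")
```

Running it prints one line per node, which can be compared against the four listings above.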
The current code uses a priority-based list, swap_avail_list, to decide which swap device to use; if multiple swap devices share the same priority, they are used round robin. This change replaces the single global swap_avail_list with a per-numa-node list, i.e. each numa node sees its own priority-based list of available swap devices. A swap device's priority can be promoted on its matching node's swap_avail_list.
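The round-robin behaviour among devices of equal priority can be pictured as taking the head of a priority-sorted list and requeueing it behind its equal-priority peers. The sketch below is a simplified Python model of that idea, not the kernel's plist code; `pick_and_requeue` and the sample keys are illustrative assumptions.

```python
def pick_and_requeue(avail):
    """Pick the head of a list sorted by key (low key = tried first),
    then move it behind the other entries that share its key."""
    key, name = avail.pop(0)
    # Find the first entry with a larger key; the picked one goes just before it.
    pos = next((i for i, (k, _) in enumerate(avail) if k > key), len(avail))
    avail.insert(pos, (key, name))
    return name

# Two devices share key 1; a third has key 2 and is only a fallback.
avail = [(1, "swapA"), (1, "swapB"), (2, "swapC")]
picks = [pick_and_requeue(avail) for _ in range(4)]
print(picks)  # ['swapA', 'swapB', 'swapA', 'swapB']
```

In this model swapC is reached only once swapA and swapB are no longer on the list.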
A swap device's priority is currently set as follows: the user can set a value >= 0, or the system will pick one, starting from -1 and going downwards.
The priority value in the swap_avail_list is the negated value of the swap device's priority, because a plist is sorted from low to high. The new policy doesn't change the semantics for priority >= 0. For system-assigned priorities, the previous scheme of starting from -1 and going downwards now starts from -2 and goes downwards, and -1 is reserved as the promoted value.
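To make the numbering concrete, the sketch below models how priorities might be assigned under the new policy and how negation turns them into list keys. It is an illustration in Python, not kernel code; `assign_priorities` is a hypothetical helper, and `None` stands for a device swapped on without an explicit priority.

```python
def assign_priorities(requested):
    """Map each requested priority (int >= 0, or None for 'let the system pick')
    to the priority the device ends up with under the new policy."""
    next_auto = -2  # -1 is reserved as the promoted value
    result = []
    for prio in requested:
        if prio is not None and prio >= 0:
            result.append(prio)        # user-set values are kept unchanged
        else:
            result.append(next_auto)   # system picks -2, -3, -4, ...
            next_auto -= 1
    return result

prios = assign_priorities([None, 5, None, None])
keys = [-p for p in prios]  # the list is sorted from low to high, so negate
print(prios)  # [-2, 5, -3, -4]
print(keys)   # [2, -5, 3, 4]
```

Sorting by key puts the device with user priority 5 first (key -5), followed by the system-assigned devices in the order they were swapped on.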
So if multiple swap devices are attached to the same node, they will all be promoted to priority -1 on that node's plist and will be used round robin before any other swap devices.
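Promotion can be modelled by giving a device list key 1 (priority -1) on its own node's list and its ordinary negated priority on every other node's list. The sketch below assumes, based on the statement that the semantics for priority >= 0 are unchanged, that only devices with system-assigned priorities are promoted; `node_keys` and the sample data are illustrative, not kernel identifiers.

```python
PROMOTED_PRIO = -1  # reserved value described in the text above

def node_keys(devices, node):
    """Return (key, name) pairs for one node's list, sorted low key first.
    `devices` holds (name, priority, attached_node) tuples."""
    entries = []
    for name, prio, dev_node in devices:
        if prio < 0 and dev_node == node:
            prio = PROMOTED_PRIO       # promote on the matching node only
        entries.append((-prio, name))  # negate: the list sorts low to high
    return sorted(entries)

# swapA and swapB sit on node 0, swapC on node 1; priorities are system-assigned.
devices = [("swapA", -2, 0), ("swapB", -3, 0), ("swapC", -4, 1)]
print(node_keys(devices, 0))  # [(1, 'swapA'), (1, 'swapB'), (4, 'swapC')]
print(node_keys(devices, 1))  # [(1, 'swapC'), (2, 'swapA'), (3, 'swapB')]
```

On node 0 both local devices end up with the same key, which is what lets them be used round robin ahead of swapC.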
This text was originally part of the Linux kernel source code, so it is treated as licensed under GPLv2.