[LINUX] Unaligned Memory Accesses (1/2)

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/process/unaligned-memory-access.rst

Unaligned Memory Accesses

Linux runs on a wide variety of architectures which have varying behaviour when it comes to memory access. This document presents some details about unaligned accesses, why you need to write code that doesn't cause them, and how to write such code!

Linux runs on different architectures that behave differently with respect to memory access. This document will give you more details about unaligned access, why we have to write code to prevent it from happening, and how to write it!

The definition of an unaligned access

Unaligned memory accesses occur when you try to read N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N != 0). For example, reading 4 bytes of data from address 0x10004 is fine, but reading 4 bytes of data from address 0x10005 would be an unaligned memory access.

An unaligned memory access occurs when you try to read N bytes of data starting from an address that is not divisible by N (that is, addr % N != 0). For example, reading 4 bytes of data from address 0x10004 is fine, but reading 4 bytes of data from address 0x10005 is an unaligned memory access.

The above may seem a little vague, as memory access can happen in different ways. The context here is at the machine code level: certain instructions read or write a number of bytes to or from memory (e.g. movb, movw, movl in x86 assembly). As will become clear, it is relatively easy to spot C statements which will compile to multiple-byte memory access instructions, namely when dealing with types such as u16, u32 and u64.

The above may seem a bit vague, since memory access can happen in different ways. The context here is the machine code level: certain instructions read or write a number of bytes from or to memory (for example movb, movw, movl in x86 assembly). As will become clear, it is relatively easy to spot C statements that compile to multi-byte memory access instructions: they are the ones that deal with types such as u16, u32 and u64.
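The divisibility test from the definition can be written down directly. The following is a minimal userspace sketch, not kernel code; the helper name is_naturally_aligned is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns true when an access of
 * n bytes starting at addr is aligned, i.e. addr % n == 0.
 * n must be non-zero.
 */
static bool is_naturally_aligned(uintptr_t addr, size_t n)
{
	return addr % n == 0;
}
```

With the addresses from the example, is_naturally_aligned(0x10004, 4) holds, while is_naturally_aligned(0x10005, 4) does not.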

Natural alignment

The rule mentioned above forms what we refer to as natural alignment: When accessing N bytes of memory, the base memory address must be evenly divisible by N, i.e. addr % N == 0.

The rule above defines what is called natural alignment: when accessing N bytes of memory, the base memory address must be divisible by N, that is, addr % N == 0.

When writing code, assume the target architecture has natural alignment requirements.

When writing code, assume that the target architecture has a natural alignment requirement.

In reality, only a few architectures require natural alignment on all sizes of memory access. However, we must consider ALL supported architectures; writing code that satisfies natural alignment requirements is the easiest way to achieve full portability.

In fact, only a few architectures require natural alignment for memory accesses of all sizes. However, all supported architectures must be considered, and writing code that meets the natural alignment requirements is the easiest way to achieve full portability.

Why unaligned access is bad

The effects of performing an unaligned memory access vary from architecture to architecture. It would be easy to write a whole document on the differences here; a summary of the common scenarios is presented below:

The effect of performing an unaligned memory access depends on the architecture. The differences could easily fill a whole document; an overview of the common scenarios is given below.

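  • Some architectures are able to perform unaligned memory accesses transparently, but there is usually a significant performance cost.
  • Some architectures raise processor exceptions when unaligned accesses happen. The exception handler is able to correct the unaligned access, at significant cost to performance.
  • Some architectures raise processor exceptions when unaligned accesses happen, but the exceptions do not contain enough information for the unaligned access to be corrected.
  • Some architectures are not capable of unaligned memory access, but will silently perform a different memory access to the one that was requested, resulting in a subtle code bug that is hard to detect!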

  • Some architectures can perform unaligned memory accesses transparently, but usually at a significant performance cost.
  • Some architectures raise a processor exception when an unaligned access occurs. The exception handler can correct the unaligned access, but at a very high cost to performance.
  • Some architectures raise a processor exception when an unaligned access occurs, but the exception does not contain enough information to correct the access.
  • Some architectures are not capable of unaligned memory access. They silently perform a different memory access from the one requested, causing bugs that are difficult to detect.

It should be obvious from the above that if your code causes unaligned memory accesses to happen, your code will not work correctly on certain platforms and will cause performance problems on others.

As should be clear from the above, if code causes unaligned memory accesses, it will not work correctly on certain platforms and will cause performance problems on others.

Code that does not cause unaligned access

At first, the concepts above may seem a little hard to relate to actual coding practice. After all, you don't have a great deal of control over memory addresses of certain variables, etc.

At first, it may seem difficult to relate the above concepts to actual coding practice. After all, you have little control over the memory address of a particular variable.

Fortunately things are not too complex, as in most cases, the compiler ensures that things will work for you. For example, take the following structure::

Fortunately, in most cases things are not that complicated, because the compiler ensures that things work for you. For example, take the following structure::

	struct foo {
		u16 field1;
		u32 field2;
		u8 field3;
	};

Let us assume that an instance of the above structure resides in memory starting at address 0x10000. With a basic level of understanding, it would not be unreasonable to expect that accessing field2 would cause an unaligned access. You'd be expecting field2 to be located at offset 2 bytes into the structure, i.e. address 0x10002, but that address is not evenly divisible by 4 (remember, we're reading a 4 byte value here).

Suppose an instance of the above structure resides in memory starting at address 0x10000. With a basic level of understanding, it would not be unreasonable to expect an access to field2 to be unaligned: you would expect field2 to sit at offset 2 bytes into the structure, i.e. at address 0x10002, which is not divisible by 4 (note that we are reading a 4-byte value here).

Fortunately, the compiler understands the alignment constraints, so in the above case it would insert 2 bytes of padding in between field1 and field2. Therefore, for standard structure types you can always rely on the compiler to pad structures so that accesses to fields are suitably aligned (assuming you do not cast the field to a type of different length).

Fortunately, the compiler understands the alignment constraints, so in this case it inserts 2 bytes of padding between field1 and field2. As a result, for standard structure types you can rely on the compiler to pad the structure so that accesses to fields are suitably aligned (assuming you do not cast the field to a type of a different length).
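The padding described above can be observed with offsetof. This is a minimal userspace sketch: the u8/u16/u32 typedefs are stand-ins for the kernel types, the helper name foo_padding_before_field2 is made up for illustration, and the 2-byte result assumes an ABI where u32 requires 4-byte alignment (x86-64, for example).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's u8/u16/u32 types (assumption). */
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;

struct foo {
	u16 field1;
	u32 field2;
	u8 field3;
};

/* The compiler always places field2 at an offset suitable for its type. */
static_assert(offsetof(struct foo, field2) % _Alignof(u32) == 0,
	      "field2 is suitably aligned");

/* Hypothetical helper: bytes of padding between field1 and field2. */
static size_t foo_padding_before_field2(void)
{
	return offsetof(struct foo, field2) - sizeof(u16);
}
```

Under the stated assumption, field2 ends up at offset 4 instead of 2, so foo_padding_before_field2() reports the 2 bytes of padding mentioned in the text.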

Similarly, you can also rely on the compiler to align variables and function parameters to a naturally aligned scheme, based on the size of the type of the variable.

Similarly, you can rely on the compiler to align variables and function parameters to a naturally aligned scheme, based on the size of the variable's type.

At this point, it should be clear that accessing a single byte (u8 or char) will never cause an unaligned access, because all memory addresses are evenly divisible by one.

At this point, it should be clear that accessing a single byte (u8 or char) never causes an unaligned access, because every memory address is divisible by 1.

On a related topic, with the above considerations in mind you may observe that you could reorder the fields in the structure in order to place fields where padding would otherwise be inserted, and hence reduce the overall resident memory size of structure instances. The optimal layout of the above example is::

On a related topic, with the above considerations in mind, you may notice that by reordering the fields of a structure you can place fields where padding would otherwise be inserted, and so reduce the resident memory size of structure instances. The optimal layout for the above example is::

	struct foo {
		u32 field2;
		u16 field1;
		u8 field3;
	};

For a natural alignment scheme, the compiler would only have to add a single byte of padding at the end of the structure. This padding is added in order to satisfy alignment constraints for arrays of these structures.

In a natural alignment scheme, the compiler only has to add a single byte of padding at the end of the structure. This padding is added so that arrays of this structure satisfy the alignment constraints.

(Translator's note: u32 + u16 + u8 is 4 + 2 + 1 = 7 bytes. Without padding, the next element of an array would start at a misaligned address, so 1 byte is added at the end to make the whole structure 8 bytes.)
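The effect of reordering can be checked with sizeof. The sketch below is userspace code with stdint types standing in for the kernel ones; the struct names foo_original and foo_reordered and the helper bytes_saved_by_reordering are made up for illustration, and the sizes of 12 and 8 bytes assume an ABI where uint32_t requires 4-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order from the first example. */
struct foo_original {
	uint16_t field1;
	uint32_t field2;
	uint8_t field3;
};

/* Fields ordered from largest to smallest. */
struct foo_reordered {
	uint32_t field2;
	uint16_t field1;
	uint8_t field3;
};

/*
 * sizeof is always a multiple of the structure's alignment, which is
 * what keeps every element of an array of these structures aligned.
 */
static_assert(sizeof(struct foo_reordered) %
	      _Alignof(struct foo_reordered) == 0,
	      "array elements stay aligned");

/* Hypothetical helper: memory saved per instance by reordering. */
static size_t bytes_saved_by_reordering(void)
{
	return sizeof(struct foo_original) - sizeof(struct foo_reordered);
}
```

Under the stated assumption, the original layout occupies 12 bytes (2 bytes of padding after field1 and 3 at the end), the reordered layout occupies 8, and the helper reports a saving of 4 bytes per instance.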

Another point worth mentioning is the use of __attribute__((packed)) on a structure type. This GCC-specific attribute tells the compiler never to insert any padding within structures, useful when you want to use a C struct to represent some data that comes in a fixed arrangement 'off the wire'.

Another point to mention is the use of __attribute__((packed)) on structure types. This GCC-specific attribute tells the compiler not to insert any padding inside the structure. This is useful when a C structure is used to represent data that arrives in a fixed layout "off the wire".

You might be inclined to believe that usage of this attribute can easily lead to unaligned accesses when accessing fields that do not satisfy architectural alignment requirements. However, again, the compiler is aware of the alignment constraints and will generate extra instructions to perform the memory access in a way that does not cause unaligned access.

You might think that using this attribute easily leads to unaligned accesses when accessing fields that do not satisfy the architectural alignment requirements. But, again, the compiler is aware of the alignment constraints and generates additional instructions to perform the memory access in a way that does not cause an unaligned access.

Of course, the extra instructions obviously cause a loss in performance compared to the non-packed case, so the packed attribute should only be used when avoiding structure padding is of importance.

Of course, additional instructions cause a performance loss compared to non-packed cases. Therefore, use the packed attribute only when it is important to avoid struct padding.
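A packed layout of the example structure might look as follows. This is a sketch that relies on the GCC/Clang __attribute__((packed)) extension; the struct name wire_hdr and the accessor wire_hdr_field2 are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire-format header with no padding anywhere. */
struct wire_hdr {
	uint16_t field1;
	uint32_t field2;	/* at offset 2, not naturally aligned */
	uint8_t field3;
} __attribute__((packed));

static_assert(sizeof(struct wire_hdr) == 7,
	      "packed struct has no padding");

/*
 * Reading the field through the packed struct type tells the compiler
 * that the field may be misaligned, so it emits an access sequence
 * that is safe on the target architecture.
 */
static uint32_t wire_hdr_field2(const struct wire_hdr *hdr)
{
	return hdr->field2;
}
```

The access goes through the packed structure type on purpose: that is where the compiler learns the field may be misaligned and where the extra instructions mentioned above come from.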


The original text is part of the Linux kernel source code, so this article is treated as GPLv2 (to the best of my understanding).

https://www.kernel.org/doc/html/latest/index.html

Licensing documentation

The following describes the license of the Linux kernel source code (GPLv2), how to properly mark the license of individual files in the source tree, as well as links to the full license text.

https://www.kernel.org/doc/html/latest/process/license-rules.html#kernel-licensing
