https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/process/unaligned-memory-access.rst
Unaligned Memory Accesses
Code that causes unaligned access
With the above in mind, let's move onto a real life example of a function that can cause an unaligned memory access. The following function taken from include/linux/etherdevice.h is an optimized routine to compare two ethernet MAC addresses for equality::
With that in mind, let's move on to a real-life example of a function that can cause an unaligned memory access. The following function, taken from include/linux/etherdevice.h, is an optimized routine that compares two Ethernet MAC addresses for equality.
    bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
    {
    #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
        u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
                   ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));

        return fold == 0;
    #else
        const u16 *a = (const u16 *)addr1;
        const u16 *b = (const u16 *)addr2;

        return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
    #endif
    }
In the above function, when the hardware has efficient unaligned access capability, there is no issue with this code. But when the hardware isn't able to access memory on arbitrary boundaries, the reference to a[0] causes 2 bytes (16 bits) to be read from memory starting at address addr1.
In the above function, if the hardware supports efficient unaligned access, this code has no problem. However, if the hardware cannot access memory on arbitrary boundaries, the reference to a[0] reads 2 bytes (16 bits) from memory starting at address addr1.
Think about what would happen if addr1 was an odd address such as 0x10003 (Hint: it'd be an unaligned access.)
Think about what happens if addr1 is an odd address such as 0x10003 (hint: it would be an unaligned access).
Despite the potential unaligned access problems with the above function, it is included in the kernel anyway but is understood to only work normally on 16-bit-aligned addresses. It is up to the caller to ensure this alignment or not use this function at all. This alignment-unsafe function is still useful as it is a decent optimization for the cases when you can ensure alignment, which is true almost all of the time in ethernet networking context.
Despite the potential unaligned access problem in the above function, it is included in the kernel anyway, on the understanding that it only works correctly on 16-bit-aligned addresses. It is up to the caller to ensure this alignment or not to use the function at all. This alignment-unsafe function is still useful because it is a good optimization when alignment can be guaranteed, which is almost always the case in Ethernet networking contexts.
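For a caller that cannot guarantee 16-bit alignment, one possible alternative is a byte-wise comparison. The following is a minimal userspace sketch, not the kernel's code: the names mac_equal_any_alignment and SKETCH_MAC_LEN are hypothetical, and standard C types stand in for the kernel's u8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAC_LEN 6	/* a MAC address is 6 bytes long */

/*
 * Hypothetical helper: compares two MAC addresses with memcmp(), so
 * neither pointer needs any particular alignment.
 */
static bool mac_equal_any_alignment(const uint8_t *addr1, const uint8_t *addr2)
{
	assert(addr1 != NULL && addr2 != NULL);
	return memcmp(addr1, addr2, SKETCH_MAC_LEN) == 0;
}
```

This gives up the word-sized XOR trick of the optimized routine in exchange for working at any address, including odd ones such as 0x10003.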
Here is another example of some code that could cause unaligned accesses::
The following is another example of code that could cause unaligned accesses.
    void myfunc(u8 *data, u32 value)
    {
        [...]
        *((u32 *) data) = cpu_to_le32(value);
        [...]
    }
This code will cause unaligned accesses every time the data parameter points to an address that is not evenly divisible by 4.
This code causes an unaligned access every time the data parameter points to an address that is not evenly divisible by 4.
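Whether a given address would trigger the problem in either example comes down to a divisibility test. The following userspace sketch shows that test; the helper names is_aligned and ptr_is_aligned are hypothetical and are not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if addr is a multiple of alignment.
 * alignment must be a power of two, so the test reduces to a mask.
 */
static bool is_aligned(uintptr_t addr, uintptr_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (addr & (alignment - 1)) == 0;
}

/* Hypothetical convenience wrapper for pointers. */
static bool ptr_is_aligned(const void *p, uintptr_t alignment)
{
	return is_aligned((uintptr_t)p, alignment);
}
```

With these helpers, is_aligned(0x10003, 2) is false, which is the odd-address case from the first example, and any data pointer for which ptr_is_aligned(data, 4) is false would make the u32 store in myfunc() unaligned.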
In summary, the 2 main scenarios where you may run into unaligned access problems involve:
In summary, the two main scenarios where unaligned access problems can occur are:
- Casting variables to types of different lengths
- Pointer arithmetic followed by access to at least 2 bytes of data
- Casting a variable to a type of a different length
- Doing pointer arithmetic and then accessing at least 2 bytes of data
Avoiding unaligned accesses
The easiest way to avoid unaligned access is to use the get_unaligned() and put_unaligned() macros provided by the <asm/unaligned.h> header file.
The easiest way to avoid unaligned access is to use the get_unaligned() and put_unaligned() macros provided by the <asm/unaligned.h> header file.
Going back to an earlier example of code that potentially causes unaligned access::
Let's return to an example of code that potentially causes unaligned access.
    void myfunc(u8 *data, u32 value)
    {
        [...]
        *((u32 *) data) = cpu_to_le32(value);
        [...]
    }
To avoid the unaligned memory access, you would rewrite it as follows::
To avoid the unaligned memory access, rewrite it as follows:
    void myfunc(u8 *data, u32 value)
    {
        [...]
        value = cpu_to_le32(value);
        put_unaligned(value, (u32 *) data);
        [...]
    }
The get_unaligned() macro works similarly. Assuming 'data' is a pointer to memory and you wish to avoid unaligned access, its usage is as follows::
The get_unaligned() macro works similarly. Assuming data is a pointer to memory and you want to avoid unaligned access, it is used as follows:
    u32 value = get_unaligned((u32 *) data);
These macros work for memory accesses of any length (not just 32 bits as in the examples above). Be aware that when compared to standard access of aligned memory, using these macros to access unaligned memory can be costly in terms of performance.
These macros work for memory accesses of any length (not just 32 bits as in the examples). Be aware that using these macros to access unaligned memory can be costly in terms of performance compared to a normal access to aligned memory.
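To make the cost and the "any length" point concrete, here is a userspace sketch of what an alignment-safe little-endian store and load might look like when built from single-byte accesses. The sketch_* names are hypothetical stand-ins written for this illustration; they are not the kernel macros and say nothing about how the kernel implements them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store value at p in little-endian order, one byte at a time. */
static void sketch_put_le32(uint32_t value, uint8_t *p)
{
	assert(p != NULL);
	p[0] = (uint8_t)(value);
	p[1] = (uint8_t)(value >> 8);
	p[2] = (uint8_t)(value >> 16);
	p[3] = (uint8_t)(value >> 24);
}

/* Load a little-endian 32-bit value from p, one byte at a time. */
static uint32_t sketch_get_le32(const uint8_t *p)
{
	assert(p != NULL);
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* The same pattern for a 16-bit value. */
static void sketch_put_le16(uint16_t value, uint8_t *p)
{
	assert(p != NULL);
	p[0] = (uint8_t)(value);
	p[1] = (uint8_t)(value >> 8);
}

static uint16_t sketch_get_le16(const uint8_t *p)
{
	assert(p != NULL);
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

Each 32-bit access becomes four byte accesses plus shifts, which hints at why an alignment-safe access can cost more than a plain aligned load or store.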
If use of such macros is not convenient, another option is to use memcpy(), where the source or destination (or both) are of type u8* or unsigned char*. Due to the byte-wise nature of this operation, unaligned accesses are avoided.
If you find it inconvenient to use these macros, you can use memcpy() as an alternative, where the source or destination (or both) is of type u8 * or unsigned char *. Because the operation is byte-wise, unaligned accesses are avoided.
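The memcpy() approach can be sketched in userspace as follows. The function names are hypothetical, standard C types stand in for u8 and u32, and the cpu_to_le32() conversion from the earlier example is left out, so the value is stored in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of myfunc(): data may have any alignment. */
static void myfunc_memcpy(uint8_t *data, uint32_t value)
{
	assert(data != NULL);
	memcpy(data, &value, sizeof(value));
}

/* Hypothetical counterpart that reads the value back. */
static uint32_t read_u32_memcpy(const uint8_t *data)
{
	uint32_t value;

	assert(data != NULL);
	memcpy(&value, data, sizeof(value));
	return value;
}
```

The copy goes through a properly aligned local variable, so the only access to the possibly unaligned address is the one made inside memcpy().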
Alignment vs. Networking
On architectures that require aligned loads, networking requires that the IP header is aligned on a four-byte boundary to optimise the IP stack. For regular ethernet hardware, the constant NET_IP_ALIGN is used. On most architectures this constant has the value 2 because the normal ethernet header is 14 bytes long, so in order to get proper alignment one needs to DMA to an address which can be expressed as 4*n + 2. One notable exception here is powerpc which defines NET_IP_ALIGN to 0 because DMA to unaligned addresses can be very expensive and dwarf the cost of unaligned loads.
On architectures that require aligned loads, networking requires that the IP header be aligned on a 4-byte boundary to optimize the IP stack. For regular Ethernet hardware, the constant NET_IP_ALIGN is used. On most architectures this constant has the value 2 because the normal Ethernet header is 14 bytes long, so to get proper alignment you need to DMA to an address that can be expressed as 4*n + 2. One notable exception is powerpc, which defines NET_IP_ALIGN as 0 because DMA to unaligned addresses can be very expensive and can dwarf the cost of unaligned loads.
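The 4*n + 2 arithmetic can be checked with a small userspace sketch. The SKETCH_* constants mirror the values given in the text (a 14-byte Ethernet header and a padding of 2), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN     14u	/* length of a normal Ethernet header */
#define SKETCH_NET_IP_ALIGN 2u	/* value the text gives for most architectures */

/*
 * Offset of the IP header from the start of the buffer when the frame
 * is placed pad bytes into the buffer.
 */
static uintptr_t ip_header_offset(uintptr_t pad)
{
	return pad + SKETCH_ETH_HLEN;
}

/* buf_start must itself be 4-byte aligned (an address of the form 4*n). */
static int ip_header_is_aligned(uintptr_t buf_start, uintptr_t pad)
{
	assert(buf_start % 4 == 0);
	return (buf_start + ip_header_offset(pad)) % 4 == 0;
}
```

With a padding of 2 the IP header lands at offset 16, a multiple of 4. With a padding of 0 it lands at offset 14, which is only 2-byte aligned.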
For some ethernet hardware that cannot DMA to unaligned addresses like 4*n+2 or non-ethernet hardware, this can be a problem, and it is then required to copy the incoming frame into an aligned buffer. Because this is unnecessary on architectures that can do unaligned accesses, the code can be made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so::
This can be a problem for some Ethernet hardware that cannot DMA to unaligned addresses such as 4*n + 2, and for non-Ethernet hardware. In that case the incoming frame has to be copied into an aligned buffer. Because this is unnecessary on architectures that can perform unaligned accesses, you can make the code depend on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS as follows:
    #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
        skb = original skb
    #else
        skb = copy skb
    #endif
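The "copy" branch is pseudocode. As a rough userspace illustration of copying a frame into an aligned buffer (not the kernel's skb handling), the copy could look like the sketch below; copy_frame_aligned and SKETCH_PAD are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAD 2u	/* padding so a 14-byte header ends on a 4-byte boundary */

/*
 * Copies len bytes of frame into dst, starting SKETCH_PAD bytes in.
 * dst must be 4-byte aligned and have room for len + SKETCH_PAD bytes.
 * Returns a pointer to the start of the copied frame.
 */
static uint8_t *copy_frame_aligned(uint8_t *dst, const uint8_t *frame, size_t len)
{
	assert(dst != NULL && frame != NULL);
	assert(((uintptr_t)dst & 3u) == 0);
	memcpy(dst + SKETCH_PAD, frame, len);
	return dst + SKETCH_PAD;
}
```

After the copy, the byte 14 positions past the returned pointer, where the IP header would begin, sits on a 4-byte boundary.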
This text was originally part of the Linux kernel source code, so it is treated as licensed under GPLv2 (which, as I understand it, is how it should be treated).
https://www.kernel.org/doc/html/latest/index.html
Licensing documentation
The following describes the license of the Linux kernel source code (GPLv2), how to properly mark the license of individual files in the source tree, as well as links to the full license text.
https://www.kernel.org/doc/html/latest/process/license-rules.html#kernel-licensing