InfoScale is supported on RHEL on AWS and offers two main advantages. Indeed, in the on-premises world many companies already deploy InfoScale to enjoy the following benefits:
** 1. Unique storage management features reduce downtime and effort during storage maintenance (EBS, in the case of AWS) **
** 2. Unique clustering features improve availability by covering application- and OS-level failures **
However, there is a deep-rooted concern: "Managing I/O performance in the cloud is already harder than on-premises, so won't introducing InfoScale on AWS make it even more complicated?" To dispel that anxiety, this article publishes I/O performance test results for typical configurations of InfoScale 7.4.2 for RHEL built on AWS. For each configuration, we also discuss whether introducing InfoScale for storage management and clustering makes sense, and we describe simple tuning to improve I/O performance.
** Configuration 1 Single node configuration **
This configuration is the baseline for this document. When InfoScale is installed on a non-clustered instance on AWS to manage block storage, the file system can be expanded when block storage is grown or added with no service outage, or with a much shorter one. (For more information, see [here](https://www.veritas.com/support/en_US/doc/InfoScale7.4.1_RHEL_on_AWS_FSS_storage_maintenance) and [here](https://www.veritas.com/support/en_US/doc/InfoScale7.4.1_Win_on_AWS_VVR_storage_maintenance).) Use this configuration as the benchmark for the I/O performance achievable when InfoScale is installed on a non-clustered instance on AWS. Comparing how far the other four configurations fall short of the I/O performance of this configuration also makes clear what to watch out for, performance-wise, in each cluster configuration.
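As an aside, the online expansion mentioned above is performed with standard VxVM/VxFS commands. The following is a minimal sketch, not the procedure from the linked documents: the disk group name "datadg", the volume name "datavol", and the +10g growth amount are placeholder assumptions for illustration.

```bash
# Make the resized or newly attached EBS device visible to VxVM
vxdisk scandisks

# Grow the VxVM volume and the mounted VxFS file system on it online
# ("datadg", "datavol", and +10g are placeholder values)
/etc/vx/bin/vxresize -g datadg datavol +10g
```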
** Configuration 2 Shared disk cluster configuration **
By examining how much I/O performance degrades relative to Configuration 1, you can gauge the overhead of clustering itself. In both Configuration 1 and Configuration 2, a single instance still has dedicated access to a single EBS volume, so in theory only the clustering overhead should be visible. Please refer to here for how to build this configuration.
** Configuration 3 Cluster configuration within an AZ using FSS (heartbeat: one TCP link) **
By examining how much I/O performance degrades relative to Configuration 2, you can gauge the overhead of the virtual mirror created by FSS (Flexible Storage Sharing). The only difference between Configuration 2 and Configuration 3 is whether the virtual mirror is used, so in theory only the mirror's overhead should be visible. Please refer to here for how to build this configuration.
** Configuration 4 Cluster configuration within an AZ using FSS (heartbeat: two UDP links) **
By comparing I/O performance with Configuration 3, you can see how the design of the heartbeat LAN that carries the FSS virtual mirror affects I/O performance. The difference between Configuration 3 and Configuration 4 is whether the heartbeat LAN is a single TCP/IP link or two UDP/IP links. If the I/O load exceeds what a single AWS NIC can handle, Configuration 4, which can spread the load across two links, should come out ahead; otherwise Configuration 3, which has less overhead, should win. Please refer to here for how to build this configuration.
** Configuration 5 Cluster configuration across AZs using FSS (heartbeat: two UDP links) **
By comparing I/O performance with Configuration 4, you can see how network latency between AZs affects I/O performance. The only difference between Configuration 4 and Configuration 5 is whether the cluster spans AZs. The performance of the FSS virtual mirror, especially its IOPS, depends on the latency of the heartbeat network it uses, so Configuration 4 can be expected to be superior. Please refer to here for how to build this configuration.
** Type of EBS to use ** The EBS volumes used for this verification are "gp2". gp2 has a "burst bucket" mechanism that sustains a fixed IOPS level as long as burst credits remain. In general, EBS IOPS is proportional to capacity (for gp2, the baseline is 3 IOPS per GiB, with a minimum of 100 IOPS), so a small volume would not normally deliver enough IOPS, but gp2 bursting avoids this drawback. Be careful, however, not to exhaust the burst credits. For details on gp2 and the burst bucket, refer to the AWS documentation.
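One way to keep an eye on the remaining burst credits is the EBS "BurstBalance" CloudWatch metric. The following is a minimal sketch using the AWS CLI; the volume ID and the one-hour window are placeholder assumptions.

```bash
# Check the remaining gp2 burst credits (as a percentage) for the last hour.
# vol-0123456789abcdef0 is a placeholder volume ID.
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS \
  --metric-name BurstBalance \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average
```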
** Machine environment ** The environment used for this verification is as follows.
・OS: RHEL 7.7
・InfoScale: 7.4.2
・Instance type: t2.large (2 cores, 8 Gbyte memory)
・EBS: gp2 standard SSD, 5 Gbyte
In the comparison of the configurations, the trend was almost the same for all nine of the following workload types. In this article, we introduce and discuss the results for the most typical one, "I/O size 4 Kbyte, Read/Write ratio 5:5, random access" (an equivalent fio job is sketched after the list).
・I/O size 4 Kbyte, Read/Write ratio 10:0, random access
・I/O size 4 Kbyte, Read/Write ratio 5:5, random access
・I/O size 4 Kbyte, Read/Write ratio 0:10, random access
・I/O size 4 Kbyte, Read/Write ratio 10:0, sequential
・I/O size 4 Kbyte, Read/Write ratio 5:5, sequential
・I/O size 4 Kbyte, Read/Write ratio 0:10, sequential
・I/O size 64 Kbyte, Read/Write ratio 10:0, sequential
・I/O size 64 Kbyte, Read/Write ratio 5:5, sequential
・I/O size 64 Kbyte, Read/Write ratio 0:10, sequential
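The measurements in this article were taken with IOMETER. Purely as an illustration, and not the job the tests actually used, a comparable "4 Kbyte, Read/Write 5:5, random access" workload could be expressed with fio as follows; the target file, size, and runtime are placeholder assumptions.

```bash
# Illustrative fio job approximating "I/O size 4 Kbyte, Read/Write ratio 5:5, random access".
# /mnt/vxfs/testfile, the 1 GB size, and the 60-second runtime are placeholder values.
fio --name=randrw-4k \
    --filename=/mnt/vxfs/testfile \
    --size=1g \
    --bs=4k \
    --rw=randrw \
    --rwmixread=50 \
    --ioengine=libaio \
    --direct=1 \
    --runtime=60 \
    --time_based
```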
** IOPS **
As for IOPS, as expected, the single-node configuration and the shared disk cluster configuration produced almost identical results; the slight edge for the shared disk cluster configuration appears to be within the margin of error. Both delivered roughly 2,000 IOPS. Considering that the IOPS sustained by the burst bucket is 3,000, and taking into account the load applied by IOMETER, this is the expected result. The two configurations that build an FSS virtual mirror within the AZ achieved about half the IOPS of the first two configurations, which can be attributed to the overhead of the virtual mirror. Finally, in the configuration that mirrors with FSS across AZs, performance degraded further because of the heartbeat latency introduced by spanning AZs.
** Throughput **
In terms of throughput, the comparison looks exactly the same as for IOPS, because throughput is simply IOPS multiplied by I/O size. As for the absolute value, focusing on the single-node configuration for example, the IOPS on the previous page was 2,000; multiplying by the 4 Kbyte I/O size gives about 8 Mbyte/s, which matches the graph above.
** CPU usage **
Regarding CPU usage, the comparison looks completely different from IOPS and throughput, but this too is as expected. The single-node and shared disk configurations, which do nothing special in the I/O path, showed very low CPU usage. The remaining three configurations, which use FSS virtual mirroring, showed an overhead of at most about 2%. Although this stands out in the graph, 2% is negligible for the system as a whole. The CPU usage is slightly higher in the configuration that uses TCP/IP because of the various checks performed in the network communication that synchronizes the virtual mirror.
** Single node configuration ** When InfoScale is installed on a single-node (non-clustered) RHEL instance on AWS to improve storage manageability, the CPU overhead of introducing InfoScale is within the margin of error, and I/O performance is not much different from the native environment.
** Shared disk type cluster configuration (cluster within an AZ) ** The I/O performance and CPU overhead of the shared disk type cluster configuration using InfoScale were not much different from those of the non-clustered configuration. Therefore, when considering a cluster configuration within an AZ, a shared disk type cluster is recommended from a performance standpoint.
** Cluster configuration across AZs ** All three patterns using FSS virtual mirrors showed inferior I/O performance and higher CPU overhead compared with the single-node and shared disk configurations. Therefore, if virtual mirroring is not required (that is, for a cluster within an AZ), a virtual mirror type cluster is not recommended from a performance standpoint. However, because a shared disk configuration is not available for a cluster that spans AZs, virtual mirrors are a viable option there. Replication could also be used instead of virtual mirrors, but for the sake of clarity it is not covered in this document. In short, for a cluster that spans AZs, tuning is required to improve the I/O performance of a virtual mirror cluster. Tuning is discussed in detail in the next chapter.
Heartbeat tuning is an effective way to improve the I/O performance of virtual mirror clusters, because a virtual mirror cluster synchronizes the local disks of the individual cluster nodes over the heartbeat links. There are three ways to tune the heartbeat.
** Improved performance by expanding the MTU size ** When local disks are synchronized over the heartbeat, a larger MTU size means data is transferred in larger units, which improves transfer efficiency. Note, however, that if the write size is small, transfer efficiency can actually get worse. This tuning is effective in the following cases (a configuration sketch follows the comparison below).
The following is a performance comparison with I/O size 4 Kbyte, Read/Write ratio 5:5, and sequential access, in a cluster within an AZ that uses TCP/IP for the heartbeat. The following 4 patterns were compared.
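As a concrete illustration of the MTU expansion, the sketch below assumes the heartbeat NIC is eth1 and that the instance and VPC support jumbo frames (AWS ENA interfaces allow an MTU of up to 9001); the device name and the ifcfg file name are placeholders for your environment.

```bash
# Raise the MTU of the heartbeat NIC (device name eth1 is an assumption)
ip link set dev eth1 mtu 9001

# Verify the new MTU
ip link show dev eth1

# Persist the setting across reboots on RHEL 7 (file name depends on the interface)
echo 'MTU=9001' >> /etc/sysconfig/network-scripts/ifcfg-eth1
```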
** Performance improvement by expanding the RHEL communication buffer size ** When local disks are synchronized over the heartbeat, a larger RHEL communication (socket) buffer size improves data transfer efficiency and therefore performance. The buffer size, however, must not exceed the amount of available memory. This tuning is effective in the following cases (a configuration sketch follows the comparison below).
The following is a performance comparison with I/O size 64 Kbyte, Read/Write ratio 5:5, and sequential access, in a cluster within an AZ that uses UDP/IP for the heartbeat. The following 4 patterns were compared.
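As an illustration, the socket buffer limits are governed by standard kernel parameters such as net.core.rmem_max and net.core.wmem_max. The 16 Mbyte value below is an assumption for illustration only, not a value taken from the article's tests.

```bash
# Raise the maximum socket receive/send buffer sizes (16 Mbyte here is an illustrative value)
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216

# Persist the settings across reboots
cat >> /etc/sysctl.conf <<'EOF'
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
EOF
sysctl -p
```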
** Performance improvement by raising the LLT flow control threshold ** InfoScale's heartbeat uses its own protocol, called LLT. LLT performs flow control to prevent a large amount of data from accumulating in the send queue. Flow control has thresholds: when the amount of queued data exceeds the highwater value, no more data is accepted into the queue until the backlog falls below the lowwater value. As a result, when data transfers occur in bursts, flow control reduces transfer efficiency. Conversely, raising the flow control threshold (the highwater value) improves transfer efficiency and performance. The queue, however, must not be allowed to exceed the amount of available memory. This tuning is effective in the following cases (a configuration sketch follows the comparison below).
The following is a performance comparison with I/O size 4 Kbyte, Read/Write ratio 5:5, and sequential access, in a cluster that spans AZs and uses UDP/IP for the heartbeat. The following 4 patterns were compared.
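The LLT flow control thresholds are normally inspected and adjusted with the lltconfig command. The sketch below is only an assumption-laden illustration: the tunable names, the value 2000, and the llttab syntax should all be verified against the Veritas white paper and the lltconfig documentation for your InfoScale version.

```bash
# Display the current LLT flow control settings (output format varies by InfoScale version)
lltconfig -F query

# Raise the highwater threshold at runtime
# (the value 2000 is an illustrative assumption; verify the tunable name against Veritas docs)
lltconfig -F highwater:2000

# To make the change persistent, an equivalent "set-flow" line can be added to /etc/llttab, e.g.:
#   set-flow highwater:2000
```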
In addition, Veritas has published a white paper on overall performance testing, including details on the tuning methods described here. Please refer to here!
How was it? We hope this article and the white paper introduced in it have eased your concerns about introducing InfoScale on RHEL on AWS. Please look forward to more on InfoScale on AWS in the future!
When submitting an inquiry about this article, please be sure to enter the #GWC tag in the "Comment / Communication" field. Your information will be managed in accordance with Veritas' privacy policy.
[Summary article] Links to all Veritas Technologies articles Thank you.