Performance issues during serialization operations related to the use of LocalDateTime

This article describes serialization performance issues that Alibaba engineers encountered when using the LocalDateTime and Instant time classes.

By Lv Renqi

Performance issues

While stress-testing a new version of Apache Dubbo, I found an issue related to the attributes of a Transfer Object (TO) class. Changing an attribute from Date to LocalDateTime reduced throughput from 50,000 to 20,000 and increased response time from 9 ms to 90 ms.

Of these changes, the one that concerned us most was the response time. Response time is in many ways the cornerstone of good performance figures, because other performance indicators are meaningful only once a certain response time level is guaranteed. In stress tests, Gigabit Per Second (GPS) and Transaction Per Second (TPS) figures are acceptable only if the target response time is met; purely theoretical numbers are meaningless. In cloud computing, every bit of response time matters. A 0.1 ms increase in the response time of an underlying service means a 10% increase in overall cost.

Latency is the Achilles heel of a system with remote users. Packet delay increases by 1 millisecond for every 100 km. The latency between Hangzhou and Shanghai is about 5 ms, and the latency between Shanghai and Shenzhen is naturally higher because of the considerably greater distance. The direct result of latency is a longer response time, which worsens the overall user experience and increases costs.

If a request modifies records in the same row in different units, the cost is very high even when consistency can be guaranteed. The High Speed Service Framework (HSF) is a distributed RPC service framework widely used at Alibaba. If a service that accesses a remote HSF service or another remote database more than 10 times per request is itself called by another service, the latency accumulates immediately, resulting in a snowball effect.

The importance and universality of time in Java

Time handling is everywhere in computer science. Without a rigorous notion of time, 99.99% of applications would be meaningless and impractical. This is especially true of the time-oriented custom processing found in most cloud monitoring systems today.

Before Java Development Kit 8 (JDK 8), java.util.Date was used to describe dates and times, and java.util.Calendar was used for time-related calculations. JDK 8 introduced more convenient time classes such as Instant, LocalDateTime, OffsetDateTime, and ZonedDateTime. In general, these classes have made time processing more convenient.

Instant stores a timestamp in Coordinated Universal Time (UTC) and provides a machine-facing, internal representation of time. It is suitable for database storage, business logic, data exchange, and serialization scenarios. LocalDateTime, OffsetDateTime, and ZonedDateTime are human-facing: OffsetDateTime and ZonedDateTime carry a UTC offset or a time zone with its daylight saving rules, while LocalDateTime carries neither and is interpreted in whatever zone the context supplies. They represent time as users enter and read it, so the same moment is shown with different values to different users. For example, the shipping time of an order is displayed to the buyer and the seller in their respective local times. You can think of these three classes as outward-facing tools rather than internal working parts of your application.

In short, Instant is suited to back-end services and databases, while LocalDateTime and its cohort are suited to front-end services and display. The two are theoretically interchangeable, but in practice they serve different functions. The international business team has a wealth of experience and ideas in this regard.
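
As a minimal sketch of this split, the example below stores one Instant and applies a time zone only when rendering it for a viewer. The class name ShippingTimeDisplay, the render method, the sample timestamp, and the two zones are assumptions made for illustration and do not come from the original benchmark.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: renders one stored Instant for viewers in different time zones.
class ShippingTimeDisplay {
    private static final DateTimeFormatter DISPLAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    // The stored value is the Instant; the zone is applied only at display time.
    static String render(Instant shippedAt, ZoneId viewerZone) {
        return shippedAt.atZone(viewerZone).format(DISPLAY);
    }

    public static void main(String[] args) {
        Instant shippedAt = Instant.parse("2020-01-01T00:00:00Z");
        System.out.println(render(shippedAt, ZoneId.of("Asia/Shanghai")));
        System.out.println(render(shippedAt, ZoneId.of("America/Los_Angeles")));
    }
}
```

With this sample value, the Shanghai viewer sees 2020-01-01 08:00 and the Los Angeles viewer sees 2019-12-31 16:00, although both strings describe the same instant.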

Date and Instant are often used to integrate Dubbo with Alibaba's internal High Speed Service Framework (HSF).

Reproduction of performance problems

To get an accurate picture of what is behind the performance problem described above, we tried to reproduce it. Before that, let's look at the performance benefits of Instant through a brief demo: a common scenario of formatting the current time, first with Date and then with Instant.

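    import java.text.SimpleDateFormat;
    import java.time.Instant;
    import java.time.ZoneId;
    import java.time.format.DateTimeFormatter;
    import java.util.Date;
    import org.openjdk.jmh.annotations.*;

    // Members of the JMH benchmark class (DateBenchmark in the results).
    // Note: "hh" is the 12-hour clock field; "HH" gives 24-hour output.

    // Formats the current time with the pre-JDK 8 API.
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public String date_format() {
        Date date = new Date();
        return new SimpleDateFormat("yyyyMMddhhmmss").format(date);
    }

    // Formats the current time with the JDK 8 API.
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public String instant_format() {
        return Instant.now().atZone(ZoneId.systemDefault()).format(DateTimeFormatter.ofPattern(
                "yyyyMMddhhmmss"));
    }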

We then ran the stress test locally for 30 seconds with four concurrent threads. The results are as follows.

    Benchmark                            Mode  Cnt        Score   Error  Units
    DateBenchmark.date_format           thrpt       4101298.589          ops/s
    DateBenchmark.instant_format        thrpt       6816922.578          ops/s

From these results, we can conclude that Instant has an advantage in formatting performance. Instant also performs well in other operations, such as date and time addition and subtraction.
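
For reference, the kind of addition these operations involve might look like the following sketch. It only contrasts the two APIs and is not a benchmark; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper contrasting pre-JDK 8 and JDK 8 time arithmetic.
class TimeArithmetic {
    // Pre-JDK 8 style: mutate a Calendar, then read the Date back.
    static Date plusHoursLegacy(Date start, int hours) {
        Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        calendar.setTime(start);
        calendar.add(Calendar.HOUR_OF_DAY, hours);
        return calendar.getTime();
    }

    // JDK 8 style: Instant is immutable, so plus() returns a new value.
    static Instant plusHours(Instant start, long hours) {
        return start.plus(Duration.ofHours(hours));
    }
}
```

Both methods move the same point on the timeline forward; the Instant version needs no intermediate mutable object.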

Instant pitfalls during serialization operations

Next, to reproduce the problem described above, we ran stress tests on Java's built-in serialization and on Hessian (the version optimized for Taobao) to observe how performance changes during serialization and deserialization.

Hessian is the default serialization scheme for HSF 2.2 and Dubbo:

    // dateSerializer, instantSerializer, and localDateTimeSerializer are
    // Hessian serializer helpers whose definitions are not shown in this article.

    // Round trip of a Date through Hessian.
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public Date date_Hessian() throws Exception {
        Date date = new Date();
        byte[] bytes = dateSerializer.serialize(date);
        return dateSerializer.deserialize(bytes);
    }

    // Round trip of an Instant through Hessian.
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public Instant instant_Hessian() throws Exception {
        Instant instant = Instant.now();
        byte[] bytes = instantSerializer.serialize(instant);
        return instantSerializer.deserialize(bytes);
    }

    // Round trip of a LocalDateTime through Hessian.
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public LocalDateTime localDate_Hessian() throws Exception {
        LocalDateTime date = LocalDateTime.now();
        byte[] bytes = localDateTimeSerializer.serialize(date);
        return localDateTimeSerializer.deserialize(bytes);
    }

The results are as follows. With the Hessian protocol, throughput dropped sharply for Instant and LocalDateTime: it is roughly 100 times lower than with Date. Upon further investigation, we found that the serialized byte stream of a Date is 6 bytes, while that of a LocalDateTime is 256 bytes, which also increases the network bandwidth cost of transmission. Java's built-in serialization shows a slight drop, but no substantial difference.

    Benchmark                         Mode  Cnt        Score   Error  Units
    DateBenchmark.date_Hessian       thrpt       2084363.861          ops/s
    DateBenchmark.localDate_Hessian  thrpt         17827.662          ops/s
    DateBenchmark.instant_Hessian    thrpt         22492.539          ops/s
    DateBenchmark.instant_Java       thrpt       1484884.452          ops/s
    DateBenchmark.date_Java          thrpt       1500580.192          ops/s
    DateBenchmark.localDate_Java     thrpt       1389041.578          ops/s
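
One way to check serialized sizes yourself is to serialize each value and count the bytes. The sketch below uses the JDK's built-in serialization as a stand-in because it needs no extra dependency; it does not use Hessian, so its byte counts will differ from the Hessian figures quoted above. SerializedSizeProbe and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.time.Instant;
import java.time.LocalDateTime;
import java.util.Date;

// Hypothetical helper: JDK built-in serialization as a stand-in, not Hessian.
class SerializedSizeProbe {
    // Serializes one value into a byte array.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the value back so the round trip can be verified.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        Serializable[] samples = { new Date(), Instant.now(), LocalDateTime.now() };
        for (Serializable sample : samples) {
            byte[] bytes = serialize(sample);
            System.out.println(sample.getClass().getSimpleName() + ": " + bytes.length + " bytes");
        }
    }
}
```

Swapping the two helper methods for a Hessian-based serializer would give the sizes relevant to HSF and Dubbo; that wiring is left out here because it depends on the Hessian version in use.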

Problem analysis

Our analysis is as follows. First, Date is one of the eight primitive types in Hessian object serialization.

Second, Instant has to go through Class.forName for both serialization and deserialization, which causes throughput to drop sharply and response time to rise. Therefore, Date has the advantage.

Final thoughts

You can upgrade and optimize Hessian by implementing com.alibaba.com.caucho.hessian.io.Serializer for classes such as Instant through an extension and registering it with SerializerFactory, which solves the problem discussed in this article. However, this introduces compatibility issues with earlier and later versions, which is a serious problem. Alibaba's fairly complex dependencies make such an upgrade impractical. Given this, the only recommendation we can make is to use Date as the preferred time attribute for TO classes.

Technically, the HSF RPC protocol is a session layer protocol, and version recognition is handled at that layer. However, the presentation layer for service data is implemented by a self-describing serialization framework such as Hessian, which lacks version recognition. That makes an upgrade very difficult.
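
The recommendation to keep Date as the time attribute of TO classes can be sketched as follows, assuming a hypothetical OrderTO class: the only serialized time field stays a java.util.Date, and conversions to the JDK 8 types happen in helper methods at the edges. Date holds millisecond precision, so any finer precision in an Instant is dropped by Date.from.

```java
import java.io.Serializable;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Date;

// Hypothetical transfer object: the only serialized time field is a java.util.Date.
class OrderTO implements Serializable {
    private static final long serialVersionUID = 1L;

    private Date shippedAt;

    public Date getShippedAt() {
        return shippedAt;
    }

    public void setShippedAt(Date shippedAt) {
        this.shippedAt = shippedAt;
    }

    // Helper for business code: accept an Instant, store it as a Date.
    public void shipAt(Instant instant) {
        this.shippedAt = Date.from(instant);
    }

    // Helper for business code: read the stored Date back as an Instant.
    public Instant shippedAtInstant() {
        return shippedAt.toInstant();
    }

    // Helper for display code: view the stored Date in the viewer's zone.
    public LocalDateTime shippedAtLocal(ZoneId zone) {
        return LocalDateTime.ofInstant(shippedAt.toInstant(), zone);
    }
}
```

Business code can then work with Instant or LocalDateTime while the wire format carries only the Date field.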
