Step-by-step understanding of O/R mapping

Introduction

What is O/R mapping?

O/R mapping is, in a nutshell, a way of manipulating relational database records as ordinary objects in an object-oriented programming language. Rather than giving a more detailed definition, it is easier to look at actual code. Below are two examples: one using the low-level JDBC API, and one using JPA, a typical high-level O/R mapping framework.

public List<Issue> findByProjectId(long projectId) {
  String query = "select id, title, description from issue where project_id = ?";
  try (PreparedStatement ps = connection.prepareStatement(query)) {
    ps.setLong(1, projectId);
    List<Issue> issues = new ArrayList<>();
    try (ResultSet rs = ps.executeQuery()) {
      while (rs.next()) {
        Issue issue = new Issue();
        issue.setId(rs.getLong("id"));
        issue.setTitle(rs.getString("title"));
        issue.setDescription(rs.getString("description"));
        issues.add(issue);
      }
    }
    return issues;
  } catch (SQLException e) {
    throw new RuntimeException(e);
  }
}
public List<Issue> findByProjectId(long projectId) {
  String query = "select i from Issue i where i.project.id = ?1";
  List<Issue> issues = entityManager.createQuery(query, Issue.class)
      .setParameter(1, projectId).getResultList();
  return issues;
}

The difference is clear when you compare the two. The latter, which uses a high-level O/R mapping framework, eliminates the boilerplate and expresses the intent more clearly.

Misunderstandings about O/R mapping

In the example above, O/R mapping looks attractive, yet there are plenty of complaints about it. Some are radical, rejecting O/R mapping altogether; others reject only the high-level O/R mapping frameworks and prefer simpler alternatives. In this climate, actively promoting the use of high-level O/R mapping frameworks seems to be a minority position.

There are two main reasons why O/R mapping is disliked. The first is that high-level O/R mapping frameworks do not seem to do what the programmer asks. Many people who have used such frameworks have struggled with performance problems because the SQL executed under the hood was not what they intended. Underlying this seems to be a misunderstanding of the basic mechanisms of O/R mapping. The second reason is that high-level O/R mapping is often applied to unsuitable projects. As we will see in more detail later, using a high-level O/R mapping framework where prerequisites such as control over the schema are not met is close to suicide. Underlying this seems to be a misunderstanding of the criteria for choosing an O/R mapping approach.

Summary of this article

In this article, to clear up these misunderstandings, I will show that what is lumped together as "O/R mapping" actually spans multiple levels. By looking at the approaches step by step, from low level to high level, you should come to understand what problem the basic mechanism at each level emerged to solve, and by what criteria each level's approach should be chosen.

5 levels

In this article, I'll explain O/R mapping in five levels:

- Level 1: Low-level API
- Level 2: Abstraction of pre-processing and post-processing
- Level 3: Mapping queries to simple objects
- Level 4: Mapping queries to relation-navigable objects
- Level 5: Table-object mapping

These levels are defined purely for the convenience of explanation; the functionality of actual Java O/R mapping frameworks overlaps multiple levels. Also, for brevity, the discussion is limited to read operations and omits writes (insert/update/delete).

Since the subject matter touches on various real Java frameworks, this article can also be read as a brief introduction to them. It is not intended as an exhaustive explanation of their features, however, so if you want to know more, please refer to the official documentation linked from the text.

Level 1: Low-level API

First, let's look at data access using the JDBC API built into the JDK as-is. As an example, here is the code shown at the beginning of this article again.

public List<Issue> findByProjectId(long projectId) {
  String query = "select id, title, description from issue where project_id = ?";
  try (PreparedStatement ps = connection.prepareStatement(query)) {
    ps.setLong(1, projectId);
    List<Issue> issues = new ArrayList<>();
    try (ResultSet rs = ps.executeQuery()) {
      while (rs.next()) {
        Issue issue = new Issue();
        issue.setId(rs.getLong("id"));
        issue.setTitle(rs.getString("title"));
        issue.setDescription(rs.getString("description"));
        issues.add(issue);
      }
    }
    return issues;
  } catch (SQLException e) {
    throw new RuntimeException(e);
  }
}

The running example is a simple issue tracker application. The code above retrieves issues by project ID.

The definition of the `issue` table looks like this:

create table issue
(
   id bigint primary key,
   project_id bigint,
   title varchar (100),
   description text
);

How to Use

What you need to do to use this level of approach is:

- Specify the query string
- Specify query parameters
- Execute the query
- Loop over the query results
- Map each record into an object
- Manage resources
- Handle low-level exceptions

Challenges

The problem is the sheer amount of boilerplate. For the writer, the code is far too long for something that merely executes a query and fetches the results. For the reader, the intent is buried in incidental code and hard to follow.

The risk around resource management also cannot be overlooked. Forgetting the close handling, which is covered by try-with-resources in the example above, causes a resource leak.

Selection criteria

As of 2019, there are few situations in production code where this level should be adopted. You might reach for it only if you are extremely performance conscious, or if for some reason your use of frameworks is restricted. Even in those cases, it is easy to build something equivalent to Level 2, described next, on your own.

Level 2: Abstraction of pre-processing and post-processing

Of the Level 1 boilerplate, the parts related to pre-processing and post-processing can be abstracted relatively easily. The following is an example using Jdbi.

public List<Issue> findByProjectId(long projectId) {
  String query = "select id, title, description from issue where project_id = ?";
  List<Issue> issues = handle.createQuery(query).bind(0, projectId)
      .map((rs, ctx) -> {
        Issue issue = new Issue();
        issue.setId(rs.getLong("id"));
        issue.setTitle(rs.getString("title"));
        issue.setDescription(rs.getString("description"));
        return issue;
      }).list();
  return issues;
}

How to Use

What you need to do to use this level of approach is listed below; it is clearly less than at Level 1.

- Specify the query string
- Specify query parameters
- Execute the query
- Map each record into an object

Challenges

Although pre-processing and post-processing are abstracted away, mapping records into objects is still tedious. The code above is only an example with a handful of columns, but in a real project this boilerplate mapping is needed for many columns.

Typical Java framework

There is no framework dedicated to this level, but frameworks with Level 3 functionality, such as Jdbi and [Spring JdbcTemplate](https://docs.spring.io/spring-framework/docs/current/spring-framework-reference/data-access.html#jdbc-JdbcTemplate), also provide Level 2 functionality.

Also, as mentioned in the Level 1 "Selection criteria", it is easy to build a framework of this level yourself. It makes a good practice exercise for getting familiar with lambda expressions; a minimal sketch follows below.
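
For illustration, here is a minimal sketch of such a hand-rolled helper, assuming the same connection field and Issue class as the Level 1 example. The RowMapper interface and the query method are made up for this sketch, not part of any library.

// A hand-rolled Level 2 helper (illustrative sketch, not a real library).
// Pre-processing and post-processing (statement creation, parameter binding,
// resource management, exception translation) are centralized here; only the
// query string and the row-mapping lambda vary per call.
@FunctionalInterface
interface RowMapper<T> {
  T map(ResultSet rs) throws SQLException;
}

public <T> List<T> query(String sql, RowMapper<T> mapper, Object... params) {
  try (PreparedStatement ps = connection.prepareStatement(sql)) {
    for (int i = 0; i < params.length; i++) {
      ps.setObject(i + 1, params[i]); // bind parameters positionally
    }
    List<T> results = new ArrayList<>();
    try (ResultSet rs = ps.executeQuery()) {
      while (rs.next()) {
        results.add(mapper.map(rs)); // delegate only the row-to-object mapping
      }
    }
    return results;
  } catch (SQLException e) {
    throw new RuntimeException(e); // translate the checked exception
  }
}

// With the helper, the Level 1 example shrinks to a query, parameters, and a lambda.
public List<Issue> findByProjectId(long projectId) {
  return query("select id, title, description from issue where project_id = ?",
      rs -> {
        Issue issue = new Issue();
        issue.setId(rs.getLong("id"));
        issue.setTitle(rs.getString("title"));
        issue.setDescription(rs.getString("description"));
        return issue;
      }, projectId);
}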

Selection criteria

There are not many situations where this level should be adopted. As with Level 1, it is an option when performance is extremely important or when the use of frameworks is restricted for some reason.

Also, if the structure of the records and the objects differs significantly and you need to hand-write flexible mapping logic, you may deliberately stay at this level instead of moving to Level 3.

Level 3: Mapping queries to simple objects

Level 3 automates the record-to-object mapping that was written by hand at Level 2. The following example uses a different Jdbi API from the one used at Level 2.

public List<Issue> findByProjectId(long projectId) {
  handle.registerRowMapper(BeanMapper.factory(Issue.class));
  String query = "select id, title, description from issue where project_id = ?";
  List<Issue> issues = handle.createQuery(query).bind(0, projectId)
      .mapTo(Issue.class).list();
  return issues;
}

How to Use

What you need to do to use this level of approach is:

- Specify the query string
- Specify query parameters
- Execute the query

Challenges

At first glance this level may seem good enough, but the lack of relation navigation between objects is a serious flaw. Real applications consist of multiple tables. For example, the simple issue tracker used in this article would have, in addition to the `issue` table, a many-to-one related `project` table and a one-to-many related `comment` table. From an object-oriented point of view, it is natural to want to access such data as related objects through methods like `Issue#getProject()` and `Issue#getComments()`. However, that kind of relation navigation cannot be achieved with this level of approach.

With this level of technique, you can only obtain a single object or a flat list of objects (a two-dimensional, table-like structure). If you want data access equivalent to relation navigation, you either write separate queries and merge the results yourself (as sketched below), or force everything into one object produced by a JOIN.
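
For illustration, here is a minimal sketch of the "separate queries, merge by hand" option using the same Jdbi handle as the Level 3 example. It assumes registered mappers for Issue and Comment, and that, hypothetically, Comment carries an issueId property and Issue a comments property; neither is part of the classes shown so far.

// Merging two Level 3 queries by hand (illustrative sketch).
// Assumes row mappers are registered for Issue and Comment, and that
// Comment#getIssueId() and Issue#setComments() exist (hypothetical additions).
public List<Issue> findByProjectIdWithComments(long projectId) {
  List<Issue> issues = handle
      .createQuery("select id, title, description from issue where project_id = ?")
      .bind(0, projectId).mapTo(Issue.class).list();
  if (issues.isEmpty()) {
    return issues;
  }

  List<Long> issueIds = issues.stream().map(Issue::getId).collect(Collectors.toList());
  List<Comment> comments = handle
      .createQuery("select id, issue_id, description from comment"
          + " where issue_id in (<issueIds>)")
      .bindList("issueIds", issueIds).mapTo(Comment.class).list();

  // Group the comments by issue id and attach them to their parent issues by hand.
  Map<Long, List<Comment>> byIssueId = comments.stream()
      .collect(Collectors.groupingBy(Comment::getIssueId));
  issues.forEach(i -> i.setComments(byIssueId.getOrDefault(i.getId(), new ArrayList<>())));
  return issues;
}

Each additional relation needs another query and another merge step, which is exactly the bookkeeping that the Level 4-5 approaches take over.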

Under these constraints, it is hopeless to realize an architecture that assumes a rich domain model, such as Domain-Driven Design. As a result, ad-hoc models driven by the needs of individual screens proliferate, and the application becomes screen-centric rather than domain-centric. It is not that you cannot map the objects obtained at this level into a rich domain model yourself, but it is usually cheaper to simply learn the Level 4-5 approaches than to go to that trouble.

In addition, relation navigation is almost indispensable for UIs, such as OOUIs, that offer users free-form interaction. A model without relation navigation can ultimately get in the way of building an easy-to-use UI.

Typical Java framework

Typical frameworks at this level are Jdbi and [Spring JdbcTemplate](https://docs.spring.io/spring-framework/docs/current/spring-framework-reference/data-access.html#jdbc-JdbcTemplate); a JdbcTemplate sketch follows below. Strictly speaking, both of them can handle data access equivalent to Level 4 if you try hard enough (although, as examples of mapping to-many relationships with the Jdbi SQL Object API show, it gets complicated). There are also many other options, such as sql2o and the nostalgic Commons DbUtils.
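
For reference, the same Level 3 query written with Spring JdbcTemplate might look like the following minimal sketch, assuming a configured jdbcTemplate field.

// Level 3 with Spring JdbcTemplate (sketch, assuming a configured jdbcTemplate field).
// BeanPropertyRowMapper maps result set columns to the Issue properties by name.
public List<Issue> findByProjectId(long projectId) {
  String query = "select id, title, description from issue where project_id = ?";
  return jdbcTemplate.query(query, new Object[] { projectId },
      BeanPropertyRowMapper.newInstance(Issue.class));
}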

Also, Doma, which has gained a certain following as a simple O/R mapping framework, supports only up to Level 3 for read operations. It is stated as a design principle that it [deliberately does not support relation navigation](https://doma.readthedocs.io/en/stable/faq/#does-doma-map-database-relationships-such-as-one-to-one-and-one-to-many-to-java-objects).

Selection criteria

There should be situations where this level is sufficient, such as an application with a standard UI or a batch job with simple data access. Let's just hope it does not turn into the all-too-common project where the actual requirements turn out to be neither standard nor simple, contrary to the initial assumption.

Level 4: Mapping queries to relation-navigable objects

Let's look at an example using MyBatis to implement the relation navigation that could not be achieved at Level 3.

First, the definitions of the `project` and `comment` tables associated with the `issue` table are as follows.

create table project
(
   id bigint primary key,
   name varchar (100)
);
create table comment
(
   id bigint primary key,
   issue_id bigint,
   description text
);

Next, the Java classes to be mapped are shown below.

@Data
public class Issue {
	private long id;
	private Project project;
	private List<Comment> comments;
	private String title;
	private String description;
}
@Data
public class Project {
	private long id;
	private String name;
}
@Data
public class Comment {
	private long id;
	private String description;
}

In addition, write the configuration that maps these. Here the XML-based configuration is used (note that MyBatis also has an [annotation-based configuration](https://mybatis.org/mybatis-3/ja/java-api.html); a minimal sketch appears after the usage example below).

<resultMap id="issueResult" type="Issue" autoMapping="true">
  <id property="id" column="id" />
  <association property="project" column="project_id"
    select="Project.find" />
  <collection property="comments" column="id"
    select="Comment.findByIssueId" />
</resultMap>
<select id="findByProjectId" parameterType="long"
  resultMap="issueResult">
  <![CDATA[
    select
      id,
      project_id,
      title,
      description
    from
      issue
    where
      project_id = #{projectId}
  ]]>
</select>
<resultMap id="projectResult" type="Project"
  autoMapping="true">
  <id property="id" column="id" />
</resultMap>
<select id="find" parameterType="long"
  resultMap="projectResult">
  <![CDATA[
    select
      id,
      name
    from
      project
    where
      id = #{id}
  ]]>
</select>
<resultMap id="commentResult" type="Comment"
  autoMapping="true">
  <id property="id" column="id" />
</resultMap>
<select id="findByIssueId" parameterType="long"
  resultMap="commentResult">
  <![CDATA[
    select
      id,
      description
    from
      comment
    where
      issue_id = #{issueId}
  ]]>
</select>

Finally, perform data access based on the above settings.

public List<Issue> findByProjectId(long projectId) {
  return sqlSession.selectList("Issue.findByProjectId", projectId);
}
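
As a side note, the annotation-based configuration mentioned earlier can express a similar mapping. The following is a minimal sketch; the IssueMapper interface and the names referenced by the nested selects are made up for illustration and would need to match your actual mapper namespaces.

// Annotation-based MyBatis mapping (illustrative sketch mirroring the XML resultMap).
// The nested selects reference hypothetical ProjectMapper/CommentMapper methods.
public interface IssueMapper {
  @Select("select id, project_id, title, description from issue"
      + " where project_id = #{projectId}")
  @Results({
      @Result(property = "id", column = "id", id = true),
      @Result(property = "project", column = "project_id",
          one = @One(select = "ProjectMapper.find")),
      @Result(property = "comments", column = "id",
          many = @Many(select = "CommentMapper.findByIssueId"))
  })
  List<Issue> findByProjectId(@Param("projectId") long projectId);
}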

How to Use

What you need to do to use this level of approach is:

- Specify the query string
  - In the example above, it is specified in the XML configuration.
- Set up the mapping between query results and objects
  - In the example above, it is specified in the XML configuration.
- Specify query parameters
- Execute the query

Challenges

Relation navigation is now possible, but as the example shows, the configuration is not exactly easy. Furthermore, as the variety of read queries grows, or when insert/update/delete processing is needed, SQL has to be written by hand each time. As described later, code generation can mitigate this, but its effect is limited.

There is also a weakness in abstracting routine processing. Routine operations that Level 5 approaches abstract easily, such as automatically populating audit columns (created at, updated at, created by, updated by, ...) or optimistic locking with a version number, have to be written out in SQL by hand each time with this approach.

Furthermore, a problem shared by all the levels so far is that as long as SQL is written by hand, there is a dependency on a specific DBMS. It is difficult to satisfy every requirement with portable, standard-compliant SQL. You rarely notice this problem in day-to-day development, but when a large-scale overhaul such as a system replacement comes up, the seriousness of the situation becomes apparent all at once.

Note that the N+1 problem, which runs rampant at Level 5, can also occur at this level. However, since this level's approach executes exactly the SQL you wrote, the responsibility lies with you rather than the framework, and it is clear how to deal with it. For example, the N+1 problem actually occurs in the example above, but it can be eliminated by replacing the SQL with a JOIN-based query as shown below.

<resultMap id="issueResultWithProjectAndComments"
  type="Issue" autoMapping="true">
  <id property="id" column="id" />
  <association property="project" columnPrefix="p_"
    resultMap="Project.projectResult" />
  <collection property="comments" columnPrefix="c_"
    resultMap="Comment.commentResult" />
</resultMap>
<select id="findByProjectIdWithProjectAndComments"
  parameterType="long" resultMap="issueResultWithProjectAndComments">
  <![CDATA[
    select
      i.id as id,
      i.project_id as project_id,
      i.title as title,
      i.description as description,
      p.id as p_id,
      p.name as p_name,
      c.id as c_id,
      c.description as c_description
    from
      issue i
      inner join project p on i.project_id = p.id
      left outer join comment c on i.id = c.issue_id
    where
      project_id = #{projectId}
  ]]>
</select>

Ancillary mechanisms

This level of approach typically comes with ancillary mechanisms such as dynamic SQL construction and automatic generation of mapping configuration and simple CRUD SQL from the table definitions (the countermeasure mentioned under "Challenges" above).

Typical Java framework

A typical framework at this level is MyBatis, used in the example. Thanks to its long history, it has a full set of the ancillary mechanisms mentioned above.

Also, the [Native Query](https://docs.jboss.org/hibernate/orm/5.4/userguide/html_single/Hibernate_User_Guide.html#sql) part of JPA, which appears at Level 5, can be regarded as a Level 4 technique. However, compared to MyBatis, there are differences in how easy the basic features are to use and in the support for the ancillary mechanisms mentioned above.

Selection criteria

Database "shackles" can be a reason to choose a Level 4 approach: for example, a legacy schema that cannot be modified, data structures that do not match the UI requirements, or a requirement to reuse existing SQL.

Also, when aiming for a rich domain model as in Domain-Driven Design, a Level 4 approach, which is more flexible than Level 5 in how mappings can be configured, is a viable option.

In addition, considering performance requirements and team members' skills, a Level 4 approach may be chosen as a low-risk, low-return alternative that avoids the performance risks of Level 5.

Level 5: Table-object mapping

At Level 4, SQL was still written by hand each time, but in a typical application the structures of the tables and the objects are similar in many respects, so if they can be mapped well, that work can be saved. Below, let's look at an example using Hibernate ORM, which implements the JPA standard.

The database schema is the same as for the previous levels. The Java classes to be mapped are the same as at Level 4, with annotations for the mapping configuration added.

@Data
@Entity
public class Issue {
	@Id
	private long id;
	@ManyToOne
	@JoinColumn(name = "project_id")
	private Project project;
	@OneToMany
	@JoinColumn(name = "issue_id")
	private List<Comment> comments;
	private String title;
	private String description;
}
@Data
@Entity
public class Project {
	@Id
	private long id;
	private String name;
}
@Data
@Entity
public class Comment {
	@Id
	private long id;
	private String description;
}

Perform data access based on the above.

public List<Issue> findByProjectId(long projectId) {
  String query = "select i from Issue i where i.project.id = ?1";
  List<Issue> issues = entityManager.createQuery(query, Issue.class)
      .setParameter(1, projectId).getResultList();
  return issues;
}

How to Use

What you need to do to use this level of approach is:

- Set up the table-object mapping
  - In the example above, it is specified with annotations.
  - It can also be specified in XML configuration, although that is rarely done these days.
- Specify the query string
  - In the example above, it is written not in DBMS-specific SQL but in the more abstract JPQL.
  - In the example above, it is given as a string in the Java code when the query is executed; there is also the Named Query style, specified with annotations (see the sketch after this list).
- Specify query parameters
- Execute the query
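
For reference, the Named Query variant mentioned in the list above might look like the following minimal sketch; the query name is illustrative.

// Named Query variant (sketch). The @NamedQuery annotation sits on the entity;
// the query itself is still JPQL, just defined up front under a name.
@Data
@Entity
@NamedQuery(name = "Issue.findByProjectId",
    query = "select i from Issue i where i.project.id = :projectId")
public class Issue {
  // ... fields and mapping annotations as shown above ...
}

public List<Issue> findByProjectId(long projectId) {
  return entityManager.createNamedQuery("Issue.findByProjectId", Issue.class)
      .setParameter("projectId", projectId)
      .getResultList();
}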

Challenges

The major challenge of this level of approach is performance degradation due to unintentionally issued SQL, the most typical case being the N+1 problem: for the N records returned by a single query against the main table, the related table is queried N more times. In terms of the issue tracker used in this article, in the data access for an issue list screen, after the single query against the `issue` table is executed, each call to `Issue#getComments()` queries the `comment` table, once for each of the N `issue` records obtained by the first query. The sketch below shows where these extra queries come from.
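
The following sketch is illustrative only; it assumes the default lazy fetching of the @OneToMany comments relation shown earlier.

// Where the N+1 pattern arises (sketch). The first query returns N issues;
// each getComments() call on a lazily fetched relation then triggers one more
// query against the comment table, for N additional queries in total.
List<Issue> issues = entityManager
    .createQuery("select i from Issue i where i.project.id = ?1", Issue.class)
    .setParameter(1, projectId)
    .getResultList();                // 1 query against issue

for (Issue issue : issues) {
  issue.getComments().size();        // 1 query against comment per issue
}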

One of the main countermeasures against the N+1 problem in JPA is the FETCH JOIN. For example, the N+1 problem actually occurs in the example above, but it can be suppressed by replacing the query with one that uses FETCH JOIN, as shown below.

public List<Issue> findByProjectIdWithProjectAndComments(long projectId) {
  String query = "select distinct i from Issue i join fetch i.project"
      + " left join fetch i.comments where i.project.id = ?1";
  List<Issue> issues = entityManager.createQuery(query, Issue.class)
      .setParameter(1, projectId).getResultList();
  return issues;
}

The SQL query that is actually generated looks like this:

select
    distinct issue0_.id as id1_1_0_,
    project1_.id as id1_2_1_,
    comments2_.id as id1_0_2_,
    issue0_.description as descript2_1_0_,
    issue0_.project_id as project_4_1_0_,
    issue0_.title as title3_1_0_,
    project1_.name as name2_2_1_,
    comments2_.description as descript2_0_2_,
    comments2_.issue_id as issue_id3_0_0__,
    comments2_.id as id1_0_0__
from
    issue issue0_
inner join
    project project1_
        on issue0_.project_id=project1_.id
left outer join
    comment comments2_
        on issue0_.id=comments2_.issue_id
where
    issue0_.project_id=?

In addition to FETCH JOIN, you can use @Fetch(FetchMode.SUBSELECT) or an Entity Graph; a sketch of the Entity Graph approach follows below. In some cases, fine-grained control over relations with @Where and [@Filter](https://docs.jboss.org/hibernate/orm/5.4/userguide/html_single/Hibernate_User_Guide.html#pc-filter) is also required.
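
The following is a minimal sketch of the Entity Graph option, using the standard javax.persistence.fetchgraph hint; it assumes the same entityManager and projectId as the running example.

// Entity Graph (sketch): declare which relations to fetch for this query
// and pass the graph as a fetch graph hint instead of rewriting the JPQL.
EntityGraph<Issue> graph = entityManager.createEntityGraph(Issue.class);
graph.addAttributeNodes("project", "comments");

List<Issue> issues = entityManager
    .createQuery("select i from Issue i where i.project.id = ?1", Issue.class)
    .setParameter(1, projectId)
    .setHint("javax.persistence.fetchgraph", graph)
    .getResultList();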

There are various performance countermeasures as described above, but the not-insignificant initial learning cost of mastering them is one of the major hurdles to adopting Level 5. In many settings that cost is probably not paid consciously up front, and the resulting performance problems are vaguely blamed on the framework. If an analysis tool such as Hypersistence Optimizer can be used, the initial learning cost can be expected to come down somewhat.

There is also the problem of gaps in the JPA standard. For example, the behavior of the Entity Graph mentioned above has some implementation-dependent aspects, and @Where and [@Filter](https://docs.jboss.org/hibernate/orm/5.4/userguide/html_single/Hibernate_User_Guide.html#pc-filter) are non-standard, Hibernate-specific features.

Furthermore, as a fundamental constraint, the benefits cannot be expected in a project where the structural similarity between tables and objects, the premise of Level 5, is low for requirements reasons. In such cases the Level 3-4 approaches, which let you write queries freely, are more appropriate.

Ancillary mechanisms

This level of approach is accompanied by the following mechanisms:

- Lifecycle management
  - The framework detects changes to objects and performs the appropriate persistence processing according to their state.
  - It often behaves differently from what the programmer intended, and is one of the sources of O/R mapping's bad reputation.
  - There are also frameworks, such as EBean, that deliberately leave centralized lifecycle management out of their feature set.
- Advanced caching
  - In addition to the first-level cache used for lifecycle management, a second-level cache that can integrate with general-purpose cache libraries is available.
  - See Hibernate's cache documentation for an example.
- Type-safe queries
  - Because queries written as strings cannot be checked for errors at compile time, a mechanism for writing queries in Java code is provided as an alternative.
  - The JPA standard provides the [Criteria API](https://docs.jboss.org/hibernate/orm/5.4/userguide/html_single/Hibernate_User_Guide.html#criteria), but it is incredibly hard to use (a minimal sketch follows after this list).
  - Querydsl is a non-standard but easier-to-use extension.
  - jOOQ is a framework specialized in type-safe queries.
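
For reference, the standard Criteria API version of the running example is sketched below; its verbosity compared to the JPQL string is part of why the extensions above exist. Plain attribute name strings are used here, although the generated static metamodel can make even those type safe.

// Criteria API (sketch): the query is assembled from Java objects instead of a string.
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Issue> cq = cb.createQuery(Issue.class);
Root<Issue> issue = cq.from(Issue.class);
cq.select(issue)
  .where(cb.equal(issue.get("project").get("id"), projectId));
List<Issue> issues = entityManager.createQuery(cq).getResultList();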

Typical Java framework

A typical framework at this level is the JPA standard. As JPA implementations, in addition to Hibernate ORM used in this article, there is also [EclipseLink](https://www.eclipse.org/eclipselink/). Which implementation to choose will usually depend on which one is the default for your application server or higher-level framework.

In addition, there are alternatives such as EBean and jOOQ, mentioned under "Ancillary mechanisms", and [Reladomo](https://github.com/goldmansachs/reladomo). Given that they are all non-standard, and that jOOQ and Reladomo in particular are quite idiosyncratic, you should be cautious when choosing them.

Selection criteria

As mentioned under "Challenges", this level of approach has strict prerequisites regarding the initial learning cost and the schema. If the prerequisites are met, high productivity gains can be expected; if not, it can turn into a waste of effort.

Also, if some constraint forces standards compliance, you can skip Levels 3-4 and go with Level 5 JPA alone (as mentioned above, its Native Query feature covers the Level 4 equivalent).

How to choose the best level of approach for your project

Now, which level of approach should be chosen in which situation? This overlaps somewhat with the "Selection criteria" sections of each level, but let's take a quick look back.

First, it is rare to start at Level 1-2. The first decision is usually whether Level 3 is sufficient or whether Level 4-5 is needed.

If your project is not very complex, a Level 3 approach will suffice. However, there is no single easy-to-read indicator of complexity; a comprehensive judgment has to be made based on UI characteristics, data access characteristics, project size, team composition, and so on.

If your project is complex, choose between Levels 4 and 5. A Level 4 approach is probably appropriate if the schema is legacy, the mapping requires some ingenuity, or low uncertainty is important; otherwise a Level 5 approach is appropriate.

It should be noted that the immaturity of today's Level 5 approaches complicates the decision. With improvements such as making lifecycle management, with its high initial learning cost, optional, simplifying the controls for relation navigation and folding them into the standard, and allowing them to be combined with the rich Level 4 techniques as needed, there should be more opportunities to choose Level 5.

(Digression) From what level does it count as "O/R mapping"?

It is not obvious from which level an approach should be called "O/R mapping". There would be no objection to calling the Level 5 approach O/R mapping, and Level 4 approaches can probably be called O/R mapping "in the broad sense" with little opposition. For Level 3 approaches, however, the picture is mixed: some frameworks describe themselves as something other than O/R mapping (Jdbi, for example, describes itself as not an ORM), while others, like Doma, have at times called themselves O/R mappers. Furthermore, even with approaches that belong to Levels 1-2, the interface of the data access component (a Repository, say) can technically be made to look the same as Levels 3-5, that is, like O/R mapping when viewed from the outside.

There may be several positions on which levels count as "O/R mapping", as listed below. Debating which position is correct does not seem very productive, so I will not go into it.

- The position that treats it as a design approach
  - Even a Level 1 implementation counts, as long as the interface of the data access component meets the conditions.
- The position that treats it as an implementation technology
  - Only Levels 4-5, i.e. frameworks that can map object graphs, are included.
  - Or, even more narrowly, only Level 5, which hides native SQL.

In conclusion

As described above, I have divided O/R mapping into five levels and shown that each level has its own rationale and its own criteria for use. I hope this article helps reduce misunderstandings about O/R mapping and unfortunate mismatches when these technologies are selected in the field.
