Blaze-Persistence: Use Modern SQL like native JPA

Guest Post

This is a guest post by Christian Beikov, creator of Blaze-Persistence.

Markus recently published a video in which he talks about Java & SQL, and that lead to a little discussion with me on Twitter. The conclusion Markus often came to is that one has to revert to writing plain SQL for some of the more advanced use cases because of limitations that JPA imposes. I agree that JPA could use an update to support some of the more advanced features of DBMS like set operations, CTEs and recursive CTEs, but that will take some time. Adding such features to a JPA implementation like Hibernate is a first step, but one has to convince a broad audience of people about the benefits. Sometimes people can’t imagine how an ORM could make use of these more advanced DBMS concepts. To avoid having to convince people, I started developing Blaze-Persistence, a query builder API that lives on top of JPA – it implements support for these advanced concepts with the JPA model.

In the following article I will discuss some of the problems and limitations Markus discovered and provide alternate solutions that work with the JPA model by using Blaze-Persistence.

Pattern #1 - Data listing

Part 1: List data

Markus mentioned that due to the querying of entities, the select clause of the generated SQL lists all columns defined in the entity. This is a waste of resources though, as oftentimes it’s not even necessary to fetch all of the columns as only a few of them are really used.

Lukas Eder wrote a nice article about that. He calls it “Unnecessary, mandatory work”, and I think it’s quite a good fit for a name!

Fetching data that isn’t needed increases the memory usage and network traffic, but the worst part is that it might hinder the usage of certain database indexes which are essential for performance. Just like when using plain SQL, you can write a scalar JPQL query and list all the columns/attributes that you want to fetch in the select clause. JPQL even offers a constructor syntax that, although being limited to flat object structures, allows creating objects from the scalar result. In the Spring Data JPA world one can use Spring Data Projections, which allow one to define the desired result structure as an interface. The getters in the interface map to attributes of an entity. It’s even possible to use the Spring Expression Language (SpEL) within a @Value annotation to model some more complex mappings. Unfortunately the problem of “unnecessary, mandatory work” is not solved with Spring Data Projections! When using SpEL or nested projections, the projection engine falls back to fetching full entities and just provides a “wrapper” around the entity. At this point, one normally has the option to revert to writing JPQL/SQL and building DTOs from a flat result list, accept the bad performance or use a solution like Blaze-Persistence Entity-Views.

Blaze-Persistence Entity-Views can be thought of as Spring Data Projections on steroids. Behind the scenes, it will always produce the JPQL/SQL that fetches only the data that was requested through getter definitions. It supports mapping collections, nested projections (also in collections), correlate unrelated data and much more. On top of that, Blaze-Persistence Entity-Views are validated with the JPA model during boot, so there are no runtime surprises involving wrong mapping expressions or type incompatibilities. Thanks to the Blaze-Persistence Spring-Data integration, the switch from Spring Data Projections to Entity-Views requires only two things: setting up Blaze-Persistence and annotating the projections with @EntityView(TheEntity.class).

The example Markus presented in his video was roughly this:

interface ProjB {
  Integer getA1();
}

interface Proj {
  Integer getA1();
  ProjB getB();
}

When using this projection in a Spring-Data repository, one can observe that the resulting SQL query selects all attributes rather than just what the projections define. On the contrary, we can use Entity-Views like this:

@EntityView(Entity2.class)
interface ProjB {
  Integer getA1();
}

@EntityView(Entity.class)
interface Proj {
  Integer getA1();

  @Mapping("child")
  ProjB getB();
}

The result is exactly the query that one would expect:

SELECT e.a1, b.a1
  FROM Entity e
  JOIN e.child b

Part 2: Derived data

Since Spring Data Projections are more or less limited to simple 1:1 mappings of entity attributes to projection attributes, there are limits to how far one can go with that. There is no support for aggregate functions and deep paths, for example, so one would have to revert to writing JPQL in such cases and wire up a constructor of a hand-rolled DTO class. With Blaze-Persistence Entity-Views, one can use JPQL.next expressions for attribute mappings, which is a very powerful superset of JPQL expressions. Thanks to the automatic group by generation of Blaze-Persistence, which groups by all non-aggregate expressions that appear in the select, order by and having clause, one can stop thinking about explicitly grouping entirely and write projections like this:

@EntityView(Entity.class)
interface Proj {
  Integer getA1();

  @Mapping("SUM(child.a1)")
  Integer getSum1();

  @Mapping("SUM(child.a1) FILTER (WHERE a2 = 1)")
  Integer getSum2();
}

You probably noticed the filter clause in the JPQL.next mapping expression, which is like the filter clause the SQL standard defines for aggregate functions. Markus mentioned in his video that JPQL neither supports the filter clause nor the common workaround, which involves using a case when expression that is only partially true. He is right in the sense that the JPQL grammar defines that only path expressions are allowed as argument for aggregate functions, but every JPA implementation out there supports expressions, so using case when here will work regardless what JPA provider you use. The Blaze-Persistence JPQL.next expression syntax supports many advanced SQL features, like the filter clause for aggregate functions, but it also has support for window functions like the lag function. Here is an example that shows how window functions could be used within Blaze-Persistence Entity-Views:

@EntityView(Entity.class)
interface Proj {
  Integer getA1();

  @Mapping("SUM(child.a1)")
  Integer getSum1();

  @Mapping("LAG(SUM(child.a1)) OVER (ORDER BY child.a1)")
  Integer getSum2();
}

The result for sum2 is the sum value from the previous element, i.e. the one lagging behind according to the order as specified in the over clause. You will again notice that the JPQL.next query and the resulting SQL query look just as one would expect:

SELECT e.a1
     , SUM(b.a1)
     , LAG(SUM(b.a1)) OVER (ORDER BY b.a1)
  FROM Entity e
  JOIN e.child b
 GROUP BY e.a1

This allows you to make use of advanced database features while still staying in the realm of the JPA model. Note that the support for window functions and the filter clause was added to Blaze-Persistence in a recent release of the 1.5 series.

Part 3: Closed cycle for ORM

In his video, Markus mentions that entities are extremely useful for what he calls the “load-change-store cycle”. So if you load an entity graph, apply changes on it and then flush/store it back to the database, Markus notes that an ORM can then play to its strengths. I partly agree with him on that point, because an ORM like JPA is mostly focused on working with managed entities that are flushed back when a transaction is finished. To me it seems like this way of thinking – that ORMs are only or mostly about managed entities – is a quite popular mental model, but let’s step back for one second and try to understand what OR (Object-Relation) mapping really is.

In the realm of SQL, we think in terms of relations which essentially are bags of tuples, whereas in object-oriented languages like Java we think in terms of collections of objects. OR (Object-Relation) mapping is exactly what the name implies: a way to describe a mapping between relations and classes/objects. ORMs like JPA allow mapping associations and columns to fields or properties of a class. When we load data from relations, we can use that mapping to construct objects. Moreover, we can also go the other way round since these mappings are bidirectional, so it is actually possible to map object state back to relations again. In fact, JPA only provides annotations to model bidirectional mappings, and it seems to me that many developers think this is the only possible way to do OR mapping.

The “load-change-store cycle” which Markus described is exactly what these bidirectional mappings are for. Entities are great for such use cases because of their bidirectional mapping. Although being good for one thing, he rightfully questions whether entities are also appropriate when breaking this cycle, i.e. only load data in order to present it. In my opinion, entities are not the right tool for simple data listing for multiple reasons:

Lazy loading issues in the representation lead to LazyInitializationException or unexpected queries, maybe even N + 1 queries
Entities usually expose too much state rather than what is really necessary for a use case
Bidirectional mapping is sometimes not powerful enough to efficiently model what a representation needs

Developers tend to just use SQL when they don’t know how to proceed further with their ORM or when entities get in their way, which in principle is totally fine. The only problem with that is that they now have to overcome the object-relational mismatch by doing the object mapping manually. For simple and flat object structures this is pretty easy, but the more complex the object structure is, the more painful the manual object mapping becomes. This brings us back to ORMs, because at the end of the day we usually have to map these flat result tuples to an object graph and this is one of the main use cases ORMs are made for.

These are the main reasons for reverting to SQL that I observed:

The need for complex unidirectional mappings, i.e. aggregations, window functions
The need for a special SQL feature to implement something more efficiently, i.e. recursive CTEs
Performance issues with entity queries because of the need for deeply nested data, i.e. multiple fetch joins

It doesn’t have to be this way! Developers should be able to use unidirectional OR-mapping even when they need advanced features. This is where Blaze-Persistence Entity-Views come in. Entity-Views can be used to model interfaces/classes with unidirectional mappings. The Entity-View technology isn’t just another ORM – it works on top of the JPA model rather than on the SQL relational model, which allows for more natural mappings as well as the reuse of well defined association mappings of entities.

Blaze-Persistence Entity-Views work on top of the Blaze-Persistence Core query builder API, which has support for most of the advanced SQL features that developers need. By using Entity-Views, one can benefit from the rich object mapping capabilities and still be able to write queries with the needed SQL features.

Pattern #2: Search

Part 1: Fetching tree structures

In the video, Markus also presented how a tree-like structure can be fetched. First he presents a naive approach that loads the tree by triggering lazy loading while traversing the tree, which is not very efficient because of the amount of queries that are needed for lazy loading to traverse the whole tree. Next he presents how the @NamedNativeQuery and @SqlResultSetMapping annotations can be used to improve this situation by loading entities with a native query that uses recursive CTEs. This is a very common approach for making use of native SQL with JPA and works quite well, but the biggest problem is that the query is static. If the query needs dynamic conditions/joins based on user input, one usually has to revert to writing SQL manually or using a SQL query builder and mapping the tuples back to objects.

Another option is to use the Blaze-Persistence query builder, which has first-class support for CTEs, recursive CTEs and many other more advanced SQL features. It allows developers to work within the JPA model. The result type of a CTE is modeled as a special entity type. The recursive CTE example that Markus showed can be easily modeled with Blaze-Persistence.

First we need an entity for the table that models the tree through a parent association.

@Entity
class MyEntity {
  @Id
  Long id;

  @ManyToOne
  MyEntity parent;

  // other attributes …
}

We also need to model the result type of the CTE as a special @CTE entity.

@CTE
@Entity
class MyCte {
  @Id
  Long id;
}

Marking the entity with the @CTE annotation essentially tells the JPA provider that the entity does not map to a table. The query that Markus presented started at a row with a specific id and traversed down to the leafs. Recursive CTEs are always split into two parts, the initial query and the recursive query. This means that querying the row with the specific id is the initial query. The recursive query part then takes the rows from previous results and queries the children of these nodes, and that then become the results for the next run. This repeats until no new data is produced.

We start off by creating a query builder where we expect the MyCte type as result type.

CriteriaBuilder<MyCte> builder =
  criteriaBuilderFactory.create(entityManager, MyCte.class);

Next we start with the initial query part of the recursive query for the type MyCte:

builder.withRecursive(MyCte.class)
       .from(MyEntity.class, "e")
       .bind("id").select("e.id")
       .where("id").eq(idValue)
       …

The interesting part here is the binding of select expressions to entity attributes. A CTE has a type, in this case the type MyCte, which has attributes. In SQL, the columns of the CTE relation are listed after the CTE name. Each attribute/column for a CTE must be bound. In SQL, the binding happens positionally, i.e. the first select item is bound to the first column in the column list. Since the attributes for the CTE are not explicitly listed in the query definition, the attributes have to be bound to select items by calling the bind method, which is followed by select.

We then switch to the recursive query part by using the union all set operation.

       …
       .unionAll()
       .from(MyEntity.class, "e")
       .innerJoinOn(MyCte.class, "cte")
         .on("e.parent.id").eqExpression("cte.id")
       .end()
       .bind("id").select("e.id")
.end()

Just as Markus did in his SQL example, we also join the main entity to the CTE entity based on the parent association and bind the id attribute again. Finally we end the recursive query by calling end and then continue by using the CTE entity in the from clause of the main query.

builder.from(MyCte.class, "myCte")
       .getResultList();

The final result is a list of MyCte entity objects, but we could also load the MyEntity objects like this:

builder.from(MyEntity.class, "myEntity")
       .where("myEntity.id").in()
                            .from(MyCte.class, "cte")
                            .select("cte.id")
       .end()
       .getResultList();

I hope the features of Blaze-Persistence help people write better queries and be more productive thanks to object-relational mapping. With Blaze-Persistence, you stay within the realm of the JPA model and do not have to sacrifice query performance or readability due to the necessity to use inferior querying techniques.

About the Author

Christian Beikov is the creator of Blaze-Persistence and part of the Hibernate team. He improves ORM technology to enable the use of all kinds of fancy advanced SQL features. With Entity-Views he tries to make it easy for developers, to do the right thing and define proper types for projections and domain driven design approaches with JPA.

Blaze-Persistence: Use Modern SQL like native JPA

Guest Post

Pattern #1 - Data listing

Part 1: List data

Part 2: Derived data

Part 3: Closed cycle for ORM

Pattern #2: Search

Part 1: Fetching tree structures

Links

About the Author

Can I use…

News

The Standard

Concepts

Features

Use Cases

Connect with Markus Winand