Hibernate is a pretty cool ORM framework with lots of features and is probably one of the frameworks, that are used most frequently in Java applications that use a relational database. With the criteria API and the hibernate query language (HQL) it offers both a programmatic and a more SQL-like approach to access data. But unfortunately, both have their limitations and are not as powerful as SQL. In this article, I will show an approach to work around one of this limitations I stumbled upon.
The limitation that I’m talking about is the fact, that “HQL subqueries can occur only in the select or where clauses” as the HQL documentation states. In my case, this prevented me from doing some performance optimization on a SELECT statement: To get the information I needed, I had to JOIN nearly the full data model which lead in some cases to a statement runtime of more than 900 seconds – which is clearly beyond any transaction timeout.
To explain the problem in the data model and to show the solution, here is a small example: Say you have five database tables STUDENT, COURSE, FACULTY, PROFESSOR and STUDENT_TO_COURSE whilst the latter one stores the association between the student and the course and some additional information like the year in which the course was taken and the grade the student got. The question we want to answer now is, if a specific student attended a course at a specific professor. So this is a pretty simple example and can probably easily be solved using some JOINs, but say you have found out, that these JOINs are the bottleneck in your query and you want to optimize it.
In my particular case, I wanted to optimize the query using a sub-query in the JOIN part of my SQL, which would look something like this
SELECT s.NAME, p.NAME FROM STUDENT s JOIN ( SELECT DISTINCT s2c.STUDENT_ID AS STUDENT_ID, f.ID AS FACULTY_ID FROM STUDENT_TO_COURSE s2c JOIN COURSE c on c.ID=s2c.COURSE_ID JOIN FACULTY f on f.ID=c.FACULTY_ID ) AS s2f on s.ID = s2f.STUDENT_ID JOIN FACULTY ON f on f.ID = s2f.FACULTY_ID JOIN PROFESSOR p on p.FACULTY_ID = f.ID WHERE s.ID = :studentId AND p.ID = :profId
The sub-select in this query would get us a distinct mapping from a student to a faculty which would get us a better performance if students attend lots of courses which potentially belong to different faculties. But the problem here is, that it is not possible to perform a JOIN on a sub-select in Hibernate, neither with HQL nor with the criteria API.
One possible solution is to create a view in the database which is based on the sub-query of the above SQL statement.
CREATE VIEW STUDENT_TO_FACULTY (STUDENT_ID, FACULTY_ID) AS ( SELECT DISTINCT s2c.STUDENT_ID AS STUDENT_ID, f.ID AS FACULTY_ID FROM STUDENT_TO_COURSE s2c JOIN COURSE c on c.ID=s2c.COURSE_ID JOIN FACULTY f on f.ID=c.FACULTY_ID );
In this case, the view holds only redundant information and therefore using a view has the advantage that the application does not have to bother about updating an additional, real table. In contrast to multiple JOINs, it is possible that the database performs some further optimizations or finds a better execution plan, which depends on the database you use.
Luckily Hibernate actually does not care about, whether an entity or – like in this case – the JOIN table between two entities is a real table in the database or a view. Therefore the hibernate mapping can basically be done like with any other entity or join table.
<hibernate-mapping package="de.bitinsomnia.db"> <class name="Student" table="STUDENT"> […] <set name="faculties" access="field" table="STUDENT_TO_FACULTY" inverse="true"> <key column="STUDENT_ID" /> <many-to-many column="FACULTY_ID" class="Faculty" /> </set> […] </class> <class name="Faculty" table="FACULTY"> […] <set name="students" access="field" table="STUDENT_TO_FACULTY" inverse="true"> <key column="FACULTY_ID" /> <many-to-many column="STUDENT_ID" class="Student" /> </set> […] </class> </hibernate-mapping>
There is one speciality with the above mapping that I want to mention: the inverse=”true” . The inverse attribute in HBM files is one of the most confusing ones, as other people before me pointed out. For me, I found it easy to remember that “inverse” has the same meaning as “someone else cares about this”.
As you can see, both sides of the association between Faculty and Student have inverse=”true” which would produce stale data in a real table: Neither when a Student nor a Faculty entity gets deleted, the association table is updated; this would lead to exceptions in your application, if you have foreign key constraints, or would leave your database table with invalid association rows, if you don’t have foreign keys.
But as we are using a view, this is the only way to handle this: If you omit one of the inverse=”true”, Hibernate would try to delete rows from the view which isn’t possible. So essentially you tell Hibernate for both sides of the association, that “someone else cares about this”.
Besides this, the mapping is pretty straight forward and can easily be transformed to an annotation-based mapping.
Using a view instead of a sub-select is obviously something, you can’t do constantly for small optimizations, but it is ok if you can’t find other solutions. Of course you could instead create a normal database table and maintain the information for the sub-select in there, but for one thing these are redundant information you don’t want in your normalized database model, and for another thing you have to add code to your application to maintain the table, which could lead to significant effort, depending on your application and functionality. But of course, querying a normal table instead of a view has a better performance. Finally, it is a trade-off between performance, effort and easiness.