What’s New in MariaDB 10.3


Let me start with an announcement: From now on, MariaDB is being treated as a distinct database on modern-sql.com.

The reason for the inclusion in my club of major SQL databases0 is simple: Although MariaDB was originally described as a “branch of MySQL that is, on the user level, compatible with the main version”1, both versions have diverged considerably in the past few years. At first, the differences were mostly limited to operative aspects (including some legal aspects). Over the last two years, even the SQL dialects started to diverge notably. Treating MariaDB as a distinct product is the unavoidable consequence for me.

Furthermore, MariaDB’s popularity is still growing2 and it seems that the MariaDB team is finally embracing the SQL standard. I must actually say that “they now embrace modern SQL standards”—not the SQL-92 standard that’s been overhauled six times.3

The release of MariaDB 10.3 demonstrates this in an impressive way. Read on to see what I mean.

Contents:

  1. System-Versioned Tables
  2. The Values Clause
  3. Sequence Generators
  4. Percentile_disc and Percentile_cont
  5. Intersect and Except
  6. Two-Phase Processing of Update’s Set Clause
  7. Self-Referencing Update and Delete
  8. Off Topic: Limit in Group_Concat
  9. Event Note: Modern SQL Training in Vienna

System-Versioned Tables

Have you ever had the requirement to keep the old data when using update or delete? Pretty much every business application needs that—e.g. for customer master data. System-versioned tables is one of the standard SQL features that get’s it done.4

The SQL standard uses closed-open intervals5—stored in “from” and “to” timestamp columns—to denote the period in which each row is valid. The “system” in system-versioned tables means that those time stamps are automatically maintained by the database whenever you write to such a table. System versioning is meant to be transparent for the application.

The syntax to create system-versioned tables looks like this:

CREATE TABLE … (
  [ regular columns and constraints ]
  
, valid_from TIMESTAMP(6) GENERATED ALWAYS AS ROW START NOT NULL
, valid_to   TIMESTAMP(6) GENERATED ALWAYS AS ROW END   NOT NULL

, PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
) WITH SYSTEM VERSIONING

As you can see, it explicitly adds two columns to hold the validity time stamps. These are basically regular columns, and are visible to the user.6 That is the only required change that is not transparent to select and data-modifying statements. Next, the time stamps are logically combined into a period called system_time. Finally, the with system versioning clause enables the magic.

On my Own Behalf

I make my living from training, other SQL related services and selling my book. Learn more at https://winand.at/.

Of course you can also alter existing tables to add system versioning. If you already have the validity stored as an closed-open interval, you can use it right away.

Once enabled, system versioning maintains the columns in the system_time period automatically. That means that delete doesn’t actually remove anything. It just sets the end of the validity of the affected rows to the transaction time.7 Update does the same, but it also adds new rows with the modified data and the transaction time as the start of validity. Once again, update is basically delete and insert.

Note

The SQL standard does not specify how system-versioned tables store the data physically, nor does it define data retention mechanisms.

MariaDB keeps the old data in the same table by default. However, partitioning capabilities have been enhanced so you can physically separate the current data from the old one.

When you select from a system-versioned table you’ll just get the current data. To access old data, the syntax of the from clause was extended:

FROM <table> FOR SYSTEM_TIME [ AS OF <ts>
                             | BETWEEN <ts> AND <ts>
                             | FROM <ts> TO <ts> ]
          [ [AS] <new name> ]

The new for system_time clause immediately follows the table name in the from clause—i.e. a new table (AS <new name>) follows at the very end.

Of course the as of clause delivers the data as of the specified time. Between and from take two time stamps. The difference between them is that between included the upper bound while from excludes it. The lower bound is exclusive for both of them.8

Further reading:

The Values Clause

The values clause is probably the most basic SQL feature at all. Most people know about it from its use in the insert statement. However, the values clause is actually a concept in its own right that’s also useful outside of insert.

The values clause is the standard SQL approach to select without from. In addition to that, the values clause has the big advantage that it can easily create multiple rows in one go:

VALUES (<columns of row 1>)
     , (<columns of row 2>)
     ,  …

There is no need to union all multiple select statements.

In comparison to select without from, the values clause has a drawback too: it cannot assign names to its columns. You need to use from clause column renaming or the with clause for that.

SELECT *
  FROM ( VALUES ('a', 'b') ) t(a,b)

Unfortunately, MariaDB 10.3 does not support the from clause renaming shown above. It remains the with clause, which is somewhat bulky for this case.

Another problem with the values clause is its lack of support in many databases. Even though MariaDB 10.3 now supports the values clause, select without from still has better support among the tested databases.

Further reading:

Sequence Generators

“A sequence generator is a mechanism for generating successive exact numeric values, one at a time.”9 They are similar to identity columns, but they are not tied to an insert operation on a specific table so that they can be used for arbitrary purposes. The next value for <sequence name> expression is used to obtain a new value.

Naturally, sequence generator can be used in a stand-alone values statement to fetch a new value.

VALUES (NEXT VALUE FOR <sequence name>)

Further reading:

Percentile_disc and Percentile_cont

The percentile_disc and percentile_cont functions are used to get a percentile value—e.g. the median—from an ordered set of values.

In standard SQL these functions require the within group clause and optionally accept an over clause. However, in MariaDB 10.3 the over clause is also required.

PRECENTILE_DISC (0.5) WITHIN GROUP (ORDER BY x) OVER …

The difference between these two functions is how they cope with a case in which the specified percentile falls in between two rows. For example, when the middle row—holding the median value—out of four rows is needed.

00.250.50.7511234PERCENTILE_CONT(0.5)PERCENTILE_DISC(0.5)

Percentile_disc always returns a discrete value from the input data—even if the specified percentile falls between two rows. In that case, it returns the first one with respect to the specified order by clause. Percentile_cont, on the other hand, performs a weighted linear interpolation between the two values of the adjacent rows.

MariaDB 10.3 also added the proprietary median(x) function, which is a different syntax for percentile_disc(0.5) within group (order by x).

Further reading:

Intersect and Except

Intersect and except are table operators similar to union. Instead of concatenating two tables, they produce the intersection and the difference, respectively.

Further reading:

Two-Phase Processing of Update’s Set Clause

This is best explained by example. Have a close look at this query.

UPDATE …
   SET c1 = c2
     , c2 = c1

Note that both columns appear on both sides of the assignment operator (=)—i.e. both columns are read and changed in the same statement.

In this case, the SQL standard requires that all read operations are effectively completed before the first writing. You can think of it as a read-only phase in which all right-hand side expressions are evaluated, followed by a write-only phase, which actually stores the results. If this rule is obeyed, the statement above exchanges the values in the c1 and c2 columns.

Nonetheless the default behavior in MariaDB (including 10.3) as well as in MySQL is different. They execute the set clauses in the order they appear in the statement. Thus the result of the above query is that both columns contain the value previously stored in c2.

The new SIMULTANEOUS_ASSIGNMENT mode introduced with MariaDB 10.3 activates standard-conforming handling of this case.

SET sql_mode = (SELECT CONCAT( @@sql_mode
                             , ',SIMULTANEOUS_ASSIGNMENT'
                             )
               )

Further reading:

Self-Referencing Update and Delete

This is very similar to the previous topic. The main difference is that previous issue deals with cyclic references between columns of the same row, whereas this issue is about cyclic references between different rows of the same table.

The problem can appear when a data-modifying statement has a query—e.g. inside exists or in10—that fetches data from the very same table that the statement is writing to. Again, there could be a cyclic reference, this time between rows.

Consider the simplest possible example:

INSERT INTO t1
SELECT *
  FROM t1

The question is basically whether the select can already see the rows that are just inserted. Well, the obvious answer is “no”—otherwise it would end up as infinite loop.

The problem can also appear with the other data-modifying statements: update, delete and merge. However, the problem can only occur if these statements contain a query that accesses the target table again—otherwise it cannot access different rows from the same table—cyclic references cannot emerge.

The SQL standard solves the problem in the same way as for the set clause discussed above. It requires the execution to be effectively split into read-only and write-only phases. This time, the standard doesn’t strictly require this processing because it is an optional feature (F781, “Self-referencing operations”). Databases not supporting this feature should yield an error when you try to execute a self-referencing statement.

Indeed, MySQL and MariaDB do produce an error.

You can't specify target table 't1' for update in FROM clause

Note that this message always says “update” even when the statement was a delete. With MariaDB 10.0, the message was rephrased:

Table 't1' is specified twice, both as a target for 'UPDATE' and as a separate source for data

MariaDB 10.3 can execute self-referencing update and delete statements.11

Off Topic: Limit in Group_Concat

Group_concat is not a standard SQL feature. The standard function for that is listagg. However, I though I’d include this enhancement here nevertheless because listagg offers a different solution for the same problem—how to prevent exceeding the length limitation of the results’s data type?

The SQL standard’s listagg has the on overflow clause that can stop adding further values before exceeding the limit. MariaDB’s group_concat can now take an optional limit clause to put an upper bound on the number of concatenated values.

Further reading:

Event Note: Modern SQL Training in Vienna

If you have read this far, you are probably interested in learning more about all the modern SQL goodness. If so, have a look at my upcoming training in Vienna (Sep 17-21).

It’s a full week long and covers all the recurring issues that I have observed in my training and consulting assignments over the years. That includes indexing in detail, basics you might be afraid to ask about, and modern SQL features such as window functions and recursions in detail.

Check it out now ;)

About the Author

Photo of Markus Winand

Markus Winand teaches efficient SQL—inhouse and online. He minimizes the development time using modern SQL and optimizes the runtime with smart indexing. His book entitled SQL Performance Explained has become standard reading.

Buy his Book on Amazon

Cover of “SQL Performance Explained”: Squirrel running on grass

The essence of SQL tuning in 200 pages

Buy on Amazon
(paperback only)

Paperback and PDF also available at Markus’ store.

Hire Markus

Markus offers SQL training and consulting for developers working at companies of any size.
Learn more »

Footnotes

0

I frequently claim that my material covers “all major SQL databases”. For example, on the cover of my book SQL Performance Explained.

1

Originally appearing on askmonty.org (now on the Wayback Machine).

2

In the last year, MariaDB has climbed from rank 11 to rank 9 in the RDBMS ranking of db-engines.com.

3

SQL-92 was obsoleted by SQL:1999, SQL:2003, SQL:2006, SQL:2008, SQL:2011 and the—as of writing—current standard SQL:2016.

4

The other one is “application-time period tables”.

5

The lower bound (valid from) is inclusive, the upper bound (valid to) is exclusive. Mathematically written: [from, to).

6

Select * returns these columns. Insert must skip these columns or use default for them. To mitigate these obstacles, some databases allow hiding columns from the user. In MariaDB, the are called invisible columns.

7

Transaction time is immutable within a transaction so all changes in one transaction look like they were done simultaneously.

8

This is different to the between predicate, which is inclusive on both ends!

9

SQL:2016, Part 2, §4.27.1

10

Merge can have that problem without a subquery as it can directly access a second table.

11

It seems that some variants of insert (e.g. containing aggregates on the target table) can still fail with this message.

“modern SQL” by Markus Winand is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.
Legal | Contact | NO WARRANTY | Trademarks | Privacy and GDPR | CC-BY-NC-ND 3.0 license