The Apache Directory project has a new Chairman!

by Emmanuel Lécharny

Posted on Monday January 21, 2019 at 11:45AM in General

Hi all,

This project has a tradition of rotating the PMC Chair about every 2 years, to ensure continuity while at the same time preventing potential contributor burn-out.

Recently the PMC has undertaken the necessary actions for this and has found contributor (and PMC member) Shawn McKinney willing and ready to take the gavel from Stefan Seelmann to head our great project. The Board of the ASF, in its meeting on Jan 16th 2019 (see [1]), ratified the rotation.

Please join me in thanking Stefan Seelmann for heading the project in 2017 and 2018 and welcoming Shawn McKinney as the new PMC Chair and Vice President of the Apache Directory project.


Best regards,

Pierre Smits

Mavibot Transactions

by Emmanuel Lécharny

Posted on Monday September 18, 2017 at 02:45PM in Technology

So here is the blog post I promised 6 months ago... It took a bit of time, due to a family expansion that sucked up pretty much all our free time :-) Let's start by stating the obvious!

What is a transaction?

It's simply 'a unit of work': either it fully completes, or it never happened. This is critical to databases, as it ensures the database is always in a consistent state. But it's more than that. A transaction system must provide the following properties (aka ACID):


Atomicity: You can't split any of the tasks in a transaction. The transaction is always seen as a whole.


Consistency: The database must be as valid after the transaction as it was before. It's mainly about guaranteeing that implied updates (like triggers, constraints, etc.) are done within the transaction.


Isolation: Here, we use a restricted version of isolation: read committed. That means we guarantee a user will always read consistent data, and will never see 'dirty' data (i.e. data being written during a transaction). However, once a transaction is committed, a user doing a second read may not see the same data he got with a previous read. This has a deep impact on how the user should manage the results: there is no guarantee that data obtained with a read is still valid a few seconds later. We don't lock the database on reads.


Durability: This seems trivial: once committed, the transaction leaves the database in a stable state, forever. It's mainly about guaranteeing that the database survives a crash.

Mavibot transaction implementation

So far, we can assume that, being an MVCC system, Mavibot already covers all those aspects:
  • Atomicity: Updates are either done or not; in any case, a new version is not seen until it's fully committed.
  • Consistency: You can't modify an existing version. An update does not change anything in the database, it just adds a new version.
  • Isolation: A reader only ever sees a single version.
  • Durability: Once a new version is added, it remains visible to the readers using it for as long as they need the data.
The only part we have to take care of is the way we make a new version available to readers : at some point, every new reader should use the new version. In order to make that possible, we just have to switch one single reference to the current version.
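As a toy illustration of that single-reference switch, here is a minimal sketch; the `Version` class, the method names and the use of `AtomicReference` are assumptions made for the example, not Mavibot's actual code:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: readers pick up the current version once from a single reference;
// the writer publishes a new version with one atomic swap, so a reader sees
// either the old version or the new one, never a mix of both.
class VersionSwitch {
    static final class Version {
        final long revision;

        Version( long revision ) {
            this.revision = revision;
        }
    }

    private final AtomicReference<Version> current = new AtomicReference<>( new Version( 0 ) );

    // A reader captures the version once and uses it for its whole lifetime
    Version beginRead() {
        return current.get();
    }

    // The writer makes the new version visible in a single step
    void publish( Version newVersion ) {
        current.set( newVersion );
    }
}
```

Existing readers keep working on the version they captured; only new readers see the published one.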

Mavibot B-trees

At this point, I have to come back to how B-trees are managed in Mavibot.

A B-tree consists of three separate elements:

  • B-tree info: this data structure stores all the characteristics of the B-tree (its name, the serializers being used, etc.). It's immutable.
  • B-tree header: the starting point from which the B-tree can be pulled from disk. It points to the root page and holds some extra information about the B-tree (number of elements, revision, page ID, etc.)
  • B-tree root page: the top-level page; this is where data are stored (in it and in all the children Nodes and Leaves)
Basically, we always use the B-tree header as a reference when reading or updating a B-tree.
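To make those three elements concrete, here is an illustrative sketch; the class and field names are assumptions made for the example, not Mavibot's actual types:

```java
// Sketch of the structures described above (names are illustrative only)
class BTreeInfo {
    // Immutable characteristics of the B-tree
    final String name;
    final String keySerializer;    // stand-ins for the real serializer types
    final String valueSerializer;

    BTreeInfo( String name, String keySerializer, String valueSerializer ) {
        this.name = name;
        this.keySerializer = keySerializer;
        this.valueSerializer = valueSerializer;
    }
}

class BTreeHeader {
    // One header per revision: the entry point for reading that revision
    final long revision;
    final long nbElems;
    final long rootPageOffset;   // where the root page sits on disk

    BTreeHeader( long revision, long nbElems, long rootPageOffset ) {
        this.revision = revision;
        this.nbElems = nbElems;
        this.rootPageOffset = rootPageOffset;
    }
}
```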

Now, we have special B-trees that are used to manage Mavibot :

  • B-tree of B-trees (aka BoB): This special B-tree contains a reference to all the existing B-trees in Mavibot. It's critical, as we need to know about the managed B-trees when we start Mavibot. Every B-tree update will result in this B-tree being updated.
  • Copied Pages B-tree (aka CPB): This other special B-tree stores the list of page IDs that have been copied when an update is applied. It is used by the Pages Reclaimer (a process that moves old pages to the free-page list; think of it as a Garbage Collector, because that is exactly what it is). It's updated after each update.
  • Free-page list: a chained list of pages that can be reused. The Pages Reclaimer updates this list, and the writer pulls pages from it, only allocating new pages at the end of the file when no free pages remain.
I won't go any further here, I just wanted to expose those management elements for a better understanding of what will follow.

So what's the point of transactions in Mavibot?

Well, we could live without them, but it would be very inefficient. There are two problems :
  • performance: Flushing the modified pages every time we update a B-tree means we also flush the management B-trees just as many times.
  • cross-B-tree transactions: This is the critical piece. Users may need to ensure ACID properties across multiple B-trees.
This is where adding a higher level of transaction is required and good to have. Let's see what that means in more detail.


The following two schemas show what happens when we update one single tree 5 times (from the very beginning, with an empty B-tree, up to the point where we have added 4 elements to it).

For the purpose of this demonstration, I'm using pages that can contain only 2 elements, but we would get the exact same result with bigger pages (except that it would be way more dreadful to draw :-)

In this schema, each page has a reference (its offset on disk), noted Ox, where 'x' is an integer number.


Pretty complicated...

What if we gather updates within a single transaction?

Here is what we get if 3 of the previous updates are done within one single transaction :


As we can see, we have far fewer pages being created, mainly in the BoB and CPB B-trees. For the record, we avoid flushing 15 pages (out of 36), a 40% improvement.

Cross-B-tree transaction

Is it realistic to gather many operations on many B-trees into a single transaction ? It feels like we may lose some data if the system crashes in the middle of a transaction...

Actually, this is badly needed : it's quite rare that we update only one B-tree. Typically, when using Mavibot in ApacheDS, we want to commit a transaction only when the full LDAP update is completed, and that involves many B-trees :

  • Master table
  • Rdn table (potentially many times, as many as we have RDNs in the DN)
  • Present index
  • ObjectClass index
  • entryCSN index
  • Alias index
  • One-Alias index
  • Sub-Alias index
  • AdministrativeRole index
  • Add all the user defined indexes...
We do want all these indexes to be updated all at once; otherwise, if we have a crash when only a subset of them are committed to disk and the rest aren't, we end up with an inconsistent database.

So we want a transaction that covers all those B-trees. A side benefit is that if any of those B-trees gets updated more than once (like the RDN index), we save some disk accesses. We also save a lot of disk accesses because the BoB and CPB B-trees get updated only at the very end.
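The principle can be mimicked with a toy model: changes to several trees are staged privately and only become visible together at commit time. This is a sketch of the idea using plain `TreeMap`s, not Mavibot's page-level mechanism:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy cross-tree transaction: each touched tree is copied on first write,
// all updates go to the copies, and commit() swaps every copy in at once.
// Nothing is visible to readers of 'trees' before the commit.
class CrossTreeTxn {
    private final Map<String, TreeMap<Long, String>> trees;
    private final Map<String, TreeMap<Long, String>> staged = new HashMap<>();

    CrossTreeTxn( Map<String, TreeMap<Long, String>> trees ) {
        this.trees = trees;
    }

    void insert( String tree, long key, String value ) {
        // Copy-on-first-touch, then update the private copy
        staged.computeIfAbsent( tree, t -> new TreeMap<>( trees.get( t ) ) ).put( key, value );
    }

    void commit() {
        // All staged trees become the current version together
        staged.forEach( trees::put );
    }
    // An abort would simply discard 'staged': nothing was ever visible
}
```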

Side benefits

We can go further, and let the user decide that a transaction should be committed only after a certain number of updates have been applied. This is typically what you want when pushing a lot of modifications in a batch.

The drawback is that you may lose everything from the point you started the transaction if your system crashes before the commit, and it eats some memory, as we have to keep the pending modifications in memory until the transaction is committed.

It's a balance: speed vs. safety.
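A sketch of that trade-off: a hypothetical batching helper that commits every `batchSize` updates, plus a final commit for the remainder. It only counts commits, but shows where the commit boundaries fall:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of batched commits. A crash between two commits loses
// every pending (uncommitted) update: that is the price of the speed-up.
class BatchCommitter {
    private final int batchSize;
    private int pending = 0;
    final List<Integer> commits = new ArrayList<>(); // size of each commit

    BatchCommitter( int batchSize ) {
        this.batchSize = batchSize;
    }

    void update() {
        if ( ++pending == batchSize ) {
            commit();
        }
    }

    void commit() {
        if ( pending > 0 ) {
            commits.add( pending );
            pending = 0;
        }
    }
}
```

Applying 10 updates with a batch size of 4, then a final commit, yields three commits of 4, 4 and 2 updates.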

How it all works

As soon as we start a transaction, we create a copy of the existing B-trees (well, not all of them, just the parts that are to be protected!) so that existing readers aren't impacted by the ongoing modifications.

Then, for every modification done in a B-tree, we copy the modified pages in memory and update them. If a page has already been copied - and is in memory - we simply reuse the copy. We keep a map of all the in-memory pages for that purpose, and every new copy is put into this map.

When we are done with the modifications, we just have to flush all the pages in this map to disk: we are done.

Well, not completely... We also have to update the BoB and the CPB B-trees, as usual. The point is that those updates can be done at the very end. Last but not least, we can move the unused pages into the free-pages list.

That is the theory, the implementation is a bit complex, and I won't expose it here...
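Still, the copy-on-write bookkeeping described above can be sketched in a few lines; this is a toy model with hypothetical names, not Mavibot's implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the first time a page is touched within a transaction, it is
// copied into an in-memory map keyed by its disk offset; later writes to
// the same page reuse that copy. At commit, every copied page is flushed.
class WriteTxnSketch {
    static final class Page {
        final long offset;                         // position on disk
        final Map<Long, String> data = new HashMap<>();

        Page( long offset ) {
            this.offset = offset;
        }
    }

    private final Map<Long, Page> copiedPages = new HashMap<>();

    Page pageForWrite( Page onDisk ) {
        return copiedPages.computeIfAbsent( onDisk.offset, o -> {
            Page copy = new Page( o );
            copy.data.putAll( onDisk.data );       // copy-on-write
            return copy;
        } );
    }

    int pagesToFlush() {
        return copiedPages.size();                 // flushed in one go at commit
    }
}
```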

At this point, a code sample could help:

    // Create the Mavibot database
    RecordManager recordManager = new RecordManager( "MyData.bin" );

    // Create a B-tree: we need to start a transaction...
    try ( WriteTransaction transaction = recordManager.beginWriteTransaction() )
    {
        recordManager.addBTree( transaction, "test", LongSerializer.INSTANCE, StringSerializer.INSTANCE, true );
    }

    // Add some data in the created B-tree
    try ( WriteTransaction writeTxn = recordManager.beginWriteTransaction() )
    {
        BTree btree = writeTxn.getBTree( "test" );

        btree.insert( writeTxn, 1L, "1" );
        btree.insert( writeTxn, 4L, "4" );
        btree.insert( writeTxn, 2L, "2" );
    }

    // Now, read the B-tree
    try ( Transaction readTxn = recordManager.beginReadTransaction() )
    {
        BTree btree = readTxn.getBTree( "test" );
        TupleCursor cursor = btree.browse( readTxn );

        while ( cursor.hasNext() )
        {
            System.out.println( cursor.next() );
        }
    }

This will print "<1,1>", "<2,2>" and "<4,4>".

A few remarks:

  • Transactions implement Closeable, which makes it possible to use them in try-with-resources statements. The automatically called close() method calls the transaction's commit() method.
  • We can see that we have two transaction flavors: read and write. A read transaction just gets a context in which it applies, and this context is simply the latest committed version. The idea is that reads won't be impacted by any forthcoming updates, as they work in their own context.
  • In any case, you have to pull the B-tree you want to update or read from the RecordManager, which manages the whole system.


This is just a high-level description of the transaction system we use in Mavibot; it does not go into details, nor does it give all the keys needed to use Mavibot. The whole idea is to give a sense of what we are working on, and why we need transactions in Mavibot.

We may change a thing here and there, typically some method names (so don't expect the code to be perfectly representative of the released version). Anyway, it's going to be pretty close!

In the next blog post, I'll talk in more detail about the management B-trees (BoB, CPB) and free-page management (i.e. the Pages Reclaimer). There are quite interesting aspects to discuss when it comes to reclaiming pages :-)

Let's meet B-trees

by Emmanuel Lécharny

Posted on Monday February 27, 2017 at 06:19PM in Technology

Render unto Caesar

B-trees were invented in the 1970s by Rudolf Bayer and Edward McCreight. They were both working at Boeing™ and wanted to design a better version of binary trees for addressing large amounts of data stored on disk. The B in B-tree is said to come from an aggregation of the three Bs in Bayer, Boeing and Binary-tree.

They wanted to improve browsing efficiency. The idea was to store values sequentially on the disk so as to limit the number of seek operations. Aggregating consecutive values in pages so that a single read returns a few of them is faster than fetching them one by one in different places on a disk.


What is a B-tree?

Let's first define some of the terminology in use:

  • Page: either a Node or a Leaf

  • Node: a Node is a page in a B-tree that has children

  • Leaf: a Leaf is a page in a B-tree that has no children

  • Root page: this is the top-level page. It can be a Node - in which case the B-tree has many levels - or it can be a Leaf - in which case all the values are held in this leaf.

And also a few properties/constraints of a B-tree:

  • A Page can contain up to N values

  • A B-tree page always contains at least N/2 values

  • The only page that can contain fewer than N/2 values is the root page, which can contain zero values (meaning the B-tree is empty), or 1 to N values

Let's continue with the definition: it's a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. Thanks Wikipedia :-)

  • Self-balancing: that means the height of the tree is automatically adjusted on each update so that it stays evenly balanced. It allows accessing any data in an equal number of operations (well, 'equal' has to be taken with caution: we are talking about the number of nodes we have to read before fetching the data). For instance, if you have a B-tree that has 5 levels of Nodes, then fetching an element from this B-tree will 'cost' 5 node reads.

  • Sorted: the data are stored in a specific order, the lowest value being on the left of the B-tree and the highest value on the right of the B-tree. That also means you have to provide a comparator in order to be able to sort those values.

  • Search: it's the operation of going down the B-tree to find a specific value. There are various possibilities here but you always start from the top of the B-tree and go down the Nodes to the value you are looking for.

  • Sequential access: this is where it starts to be interesting. A B-tree allows you to browse the values, and to get them one by one in a very efficient way. Since those values are sorted, fetching the next value is a very simple operation. Typically, this is something you can't do with a Map

  • Insertions and deletions: It's always possible to add or remove values from a B-tree. This will potentially reorganize the B-tree structure, but usually not that much. Most of the time, one single Leaf will be modified, but sometimes many Nodes will be modified up to the root Node.

  • In logarithmic time: That is the key. In big O notation, it reads as O(log n). For instance, if each Node/Leaf can hold up to 16 values, a B-tree storing 1,000,000 values will have a height of at most 7 and at least 5 (it depends on how full the Nodes/Leaves are). If each Node/Leaf contains 8 values, a 7-level B-tree can hold up to 8^7 values, i.e. 2,097,152 values. If each Node/Leaf contains 16 values, a 5-level B-tree can hold up to 16^5 values, i.e. 1,048,576 values. If we expect each Node/Leaf to be 2/3 full, i.e. containing about 10 values, then a 6-level B-tree can hold 1,000,000 values (10^6). That is pure math, nothing new under the sun!

That pretty much defines what a B-tree is.
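The capacity figures quoted above are easy to check numerically: a tree of height h whose pages hold n values can address up to n^h values. A small helper (names are mine, for illustration):

```java
// Computes n^h: the maximum number of values a tree of height 'height'
// can hold when each page contains 'valuesPerPage' values.
class BTreeMath {
    static long capacity( long valuesPerPage, int height ) {
        long capacity = 1;

        for ( int i = 0; i < height; i++ ) {
            capacity *= valuesPerPage;
        }

        return capacity;
    }
}
```

capacity(8, 7), capacity(16, 5) and capacity(10, 6) give 2,097,152, 1,048,576 and 1,000,000 respectively, matching the figures above.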

Why have B-trees been invented?

So the whole idea was to speed up operations made on a binary tree. In a binary tree, values are stored so that each value has a left and a right child. Searching in a binary tree is all about traversing nodes, down to the value you are looking for. The biggest issue is that binary trees aren't necessarily balanced: for any set of values, you may get any kind of distribution, from a perfectly balanced binary tree, where at any point of the tree the height is equal to log2(N), to a degenerate tree with N levels.

Actually, balanced binary trees require fewer comparisons, because the number of comparisons needed to retrieve a value is equal to the number of levels: if the binary tree is balanced, this is log2(nb values). In a B-tree, as each Page holds up to N values, the number of pages you fetch is logN(nb values), but for each fetched page you also have to do about log2(nb values in the page) comparisons. So if N is 16, with 1,000,000 values stored and each page 2/3 full on average, you will do (nb levels) * log2(nb values per page) comparisons. That is 6 * 4 = 24 comparisons - where 4 stands for log2(10 values per page on average) - instead of 20 (which is log2(1,000,000)).
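The comparison-count estimate above can be reproduced with a couple of helper functions (illustrative names, rounding log2 up):

```java
// Rough comparison counts: a B-tree does a binary search inside each page
// it visits, a balanced binary tree does one comparison per level.
class ComparisonCount {
    static long log2Ceil( double n ) {
        return (long) Math.ceil( Math.log( n ) / Math.log( 2 ) );
    }

    // B-tree: one binary search per visited page
    static long btree( int levels, int valuesPerPage ) {
        return levels * log2Ceil( valuesPerPage );
    }

    // Balanced binary tree: one comparison per level
    static long balancedBinaryTree( long nbValues ) {
        return log2Ceil( nbValues );
    }
}
```

With 6 levels and 10 values per page on average, the B-tree does 24 comparisons; a balanced binary tree over 1,000,000 values does 20.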

Ok, so B-trees are less efficient than balanced binary trees in terms of comparisons; that's a fact. Why then should we use a B-tree? Simply because we don't process data in memory only: we usually fetch them from disk. And that is the key: fetching something from a disk is a costly operation (even if you are using an SSD). Back when B-trees were invented, data were read from sequential media: a tape or a magnetic disk. The probability that we would fetch many consecutive values was extremely high, thus storing many values per page was a great idea.

Mavibot and ApacheDS

Mavibot was designed as a replacement for JDBM, for the reasons I explained in my previous post. Let's now see what its requirements are.

We needed a B-tree implementation in Java, under an AL 2.0 license, with support for cross-B-tree transactions. The solution was to implement a Multiversion Concurrency Control model, aka MVCC. What are the characteristics of such a system?

  • No locks: no need for those since reads are working on immutable data, and we have only one single writer, which creates a new version;

  • Consistency: since data are immutable once written, a reader will always have access to them;

  • Crash resistance: As the written data are immutable, they will always be available in their initial state. A new version will be visible only when it has been completely written on disk, or will not be visible. In case of a hard crash, the system will start with the latest visible version, thus in a stable state;

  • Immediate availability on restart: in the event of a crash, readers will always see the latest committed version, regardless of potential updates that might have been interrupted by the crash. In the worst-case scenario, the data being updated will be lost and the associated pages will remain unreachable;

Those characteristics come at a cost:

  • One single writer: as we want cross-B-tree transactions, this seriously limits the update throughput;

  • Version visibility: a version remains visible until no reader uses it anymore. This can last quite a long time, so we may have to keep a lot of data on disk;

  • Disk space: the simple fact that we keep many versions alive consumes more disk space than a normal system would;

  • Data relevance: since a reader is stuck with a given version, it cannot work on an updated version unless it starts a new read;

  • Complex cleanup: in order to discard 'dead' data (i.e. data for versions no longer in use), we need to implement a system known as a Garbage Collector, which can be complex and costly;

The third point (disk space) is important. In a B-tree, for any update, we have to copy each Node we traverse on the way down to the leaf containing the data, and also copy this leaf. For a B-tree containing 1 million elements with pages holding up to 256 elements, any update will require 3 levels - 2 Nodes and a Leaf - to be copied. That means for 1000 updates, we will eat 3 * (page size) * 1000 bytes, i.e. 12 MB for 4096-byte pages. Even worse, we also have to keep some side data, which will double this size. With a standard B-tree, we would just use what is needed to hold those updates in the B-tree, i.e. something like a few KB.
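The write-amplification arithmetic above is straightforward to verify:

```java
// Each update copies one page per level of the tree, so the bytes written
// for a batch of updates is levels * pageSize * nbUpdates.
class CopyCost {
    static long copiedBytes( int levels, int pageSize, int nbUpdates ) {
        return (long) levels * pageSize * nbUpdates;
    }
}
```

copiedBytes(3, 4096, 1000) gives 12,288,000 bytes, i.e. roughly the 12 MB quoted above.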

Now, if we keep all the versions, we might end up with a really huge database. By the way, this is what happens when you use git, except that git does not copy intermediate Nodes :-)

As I said, we needed Mavibot for Apache Directory Server, for which those drawbacks weren't critical: LDAP favors reads over writes, so having only one writer is a non-issue. Data relevance is not a big issue either: since a phone number does not change that frequently, it's not really a problem that the number a reader has grabbed is no longer current when the reader wants to use it. In the worst-case scenario the reader will ask for it again, and get the new one back. Now, for passwords, it might sound a bit more critical, but actually the time frame during which a password is read and used is really short, so when you change it, it's very unlikely to impact any reader. If it does, it's not really a problem, because any logged-in application was authenticated with the old password, and the administrator should have logged everyone out *before* allowing the password to be changed.

That being said, the most important feature for ApacheDS is consistency, which Mavibot provides. If you are to write a bank application, then this is not the right database for you...

How does it work?

The B-tree we use in such a system is a bit special: it is a specific flavor of B-tree, because we decided not to store data in the Nodes.

Typically, we also cannot chain leaves together (in some B-tree implementations, chaining leaves increases browsing speed, because there is no need to go up to the parent Nodes to fetch the next values; it also limits the number of locks needed to update the B-tree).


Let's see what's going on when we inject some values in a B-tree. Here, we will insert {E, A, D, I, K} in that order. The very first version will be:


Here, the root page is a Leaf, and contains one value, E. Then we insert the value A, which is added to the root page, and we create a copy of this root page for that purpose:


The insertion of the third value, D, can't be done in the root page, because it's already full. We have to create a parent Node which refers to two leaves, containing the three values:

As you can see, we have created two leaves and copied the root-page. Let's insert a fourth value, I:

Each new addition will copy some of the pages from the previous versions, but may keep some references to previously created pages. Here, version 4 has split the right leaf into 2 new pages, but we have kept the left leaf intact. The parent node is also copied. Now, a reader using version 3 will still see only three values (A, D and E) while a new reader will see 4 values ( A, D, E and I). In the process we have created 3 new pages.

Insertion is actually done bottom-up: we add the new value in the leaf, and if the leaf is already full, we create a new leaf, spread the data between the 2 leaves, and go back to the parent to add a new reference to the created leaves. Of course, this has to propagate up to the root page, and if the current root page is full, then we have to split it again. This is what happens when we insert a fifth value (K) in the previous B-tree:


This time, the B-tree has 3 levels. Version 5 would look like this to any new reader:


It goes on and on like this. In real life, we won't store only 2 values in each leaf: that would be way too expensive. But it's easier when drawing what happens :-)
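The bottom-up leaf split described above can be modeled with a toy order-2 page; this is an illustration of the splitting step only, not Mavibot's code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy leaf split: a full leaf receiving one more value is split into two
// halves; the parent then gains a reference keyed on the right half's
// first value (possibly splitting itself, up to the root).
class LeafSplit {
    static List<List<Character>> split( List<Character> fullLeaf, char newValue ) {
        List<Character> all = new ArrayList<>( fullLeaf );

        // Insert the new value at its sorted position
        int pos = 0;
        while ( pos < all.size() && all.get( pos ) < newValue ) {
            pos++;
        }
        all.add( pos, newValue );

        // Spread the values over two new leaves
        int mid = all.size() / 2;
        List<List<Character>> halves = new ArrayList<>();
        halves.add( new ArrayList<>( all.subList( 0, mid ) ) );
        halves.add( new ArrayList<>( all.subList( mid, all.size() ) ) );

        return halves;
    }
}
```

Splitting the full leaf {E, I} on the insertion of K yields the two leaves {E} and {I, K}.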

At some point in time, when no reader is pointing to a version, we can reclaim the unused pages (to be explained in another blog post).

I haven't talked about deletion. Let's see what happens when we delete the value E, for instance:


This time, we have just created a new root page, pointing to existing previous pages. Note that you can still browse the tree and get all the data for this version: {A, D, I, K}.

One side note: you can see that if we were to link leaves to each other, creating a new version would lead us to copy the whole B-tree, not only the new or modified pages.


The most frequent operation is browsing. We may want to find a specific value, or to read many values starting from a specific part of the B-tree. Typically, if the values are dates, then reading all the values after a given date is a matter of finding the starting date and moving on toward the end of the B-tree.

However it gets a bit more complicated in the absence of links between leaves.

Fetching a value

First let's see what happens when we try to fetch a value. We always start from the root Page. Here, let's say we want to retrieve the value I from version 5. I is not present in the root page, so we will go down the tree on the right side. The I key is present in the node at the second level, so we move again down to the right page, which is a leaf, and we can now get the data.

The rules are simple:

  • When in a Node, if the key is found, then go down to the right page

  • If the key is not found, go to the page on the left of the smallest key greater than the searched key, except if the searched key is higher than the highest present key: in this case, go to the right page of the last present key

  • When the page is a leaf, if the key is not present, then we have no such key in the B-tree
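These descent rules fit in a tiny helper; for a Node holding sorted keys, it returns the index of the child page to follow (names are mine, for illustration):

```java
// Toy version of the descent rules above. Keys are sorted; child i sits on
// the left of keys[i], and child keys.length is the rightmost child.
class SearchStep {
    static int childIndex( char[] keys, char searched ) {
        int i = 0;

        // Skip every key lower than or equal to the searched key:
        // an equal key means "go down to the right page"
        while ( i < keys.length && searched >= keys[i] ) {
            i++;
        }

        return i;
    }
}
```

For a Node holding the single key E, searching A follows the left child (index 0), while searching E or I follows the right child (index 1).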

The following image shows how we walk the tree to get the value we are looking for:


Browsing many values

Now, we can decide to browse the B-tree values, either from the beginning or from a given value. This is slightly more complex, because we can't easily move from one leaf to another: we have to go up to the parents to do that. The following image shows such a browsing path:


As we can see, we start from the bottom-left, get back to the parent, and down to the second leaf, back to the root page and down to the leaf containing E, up to the parent and down to the last leaf. Browsing is not a simple operation !
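This up-and-down walk can be sketched with an explicit stack of parent positions; the `Node`/`Leaf` records below are a toy model of a cursor, not Mavibot's classes:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// In-order browsing without leaf links: the cursor keeps a stack of
// (node, child index) positions, and climbs back up to a parent whenever
// a leaf is exhausted, then down into the next child.
class TreeCursor {
    interface Page {}
    record Leaf( List<Character> values ) implements Page {}
    record Node( List<Page> children ) implements Page {}

    static List<Character> browse( Page root ) {
        List<Character> out = new ArrayList<>();
        Deque<Map.Entry<Node, Integer>> stack = new ArrayDeque<>();
        Page current = root;

        while ( true ) {
            if ( current instanceof Leaf leaf ) {
                out.addAll( leaf.values() );

                // Climb until some parent still has an unvisited child
                current = null;
                while ( !stack.isEmpty() ) {
                    Map.Entry<Node, Integer> top = stack.pop();
                    int next = top.getValue() + 1;
                    if ( next < top.getKey().children().size() ) {
                        stack.push( Map.entry( top.getKey(), next ) );
                        current = top.getKey().children().get( next );
                        break;
                    }
                }
                if ( current == null ) {
                    return out;   // back above the root: browsing is done
                }
            } else {
                Node node = ( Node ) current;
                stack.push( Map.entry( node, 0 ) );
                current = node.children().get( 0 );
            }
        }
    }
}
```

Browsing the version 5 tree from the drawings ({A, D}, {E} under one node, {I, K} under another) yields A, D, E, I, K in order.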

One important thing to note here, though, is that every reader starts from the root page, and that the intermediate nodes are accessed many times. That gives us a clue about which pages need to stay in memory, but that will be discussed later on.


We are done with this blog post, which detailed the basic operations on a MVCC B-tree. In the next post, I will explore transactions.

Apache Mavibot history

by Emmanuel Lécharny

Posted on Friday February 17, 2017 at 12:23AM in Technology


First of all, let me introduce Apache Mavibot: it's an MVCC B+ tree library in Java, under an AL 2.0 license (MVCC stands for Multi-Version Concurrency Control). The whole idea is to have a B-tree implementation that never crashes and does not use locks to protect the data against concurrent access (well … while reading).

The B+ tree is a variant of a B-tree, where values are only stored in the leaves, not in internal nodes.

Ok. Good. You don't know much about Mavibot after this introduction, so I'll dig a bit deeper in this post. Let's start with the original idea.

Apache Directory, CouchDB, and some other databases...

Back in 2009, I was attending the Apache Conference in Oakland. I had been working on the Apache Directory project for a bit more than four and a half years. Apache Directory is an LDAP server written in Java, and we chose to store data in B-trees. There was a very limited choice back then, and the library we used - and still use as of today - was JDBM, a Java avatar of GDBM.

JDBM is written in Java, implements B+ trees, and has (experimental) transaction support, but it has one big drawback: it's not a cross-B-tree transaction system. And that does not fit our requirements in LDAP.

An alternative could have been Berkeley DB™, which released a Java edition of its database, but its license was incompatible with the AL 2.0 license. Moreover, Berkeley DB was bought by Oracle in 2006, so it was simply not an option.

What’s wrong with using JDBM?

In LDAP, an update operation impacts one single entry, but this entry uses many AttributeTypes, which can be indexed. In other words, an update will impact as many B-trees as we have indexes (and a bit more). In order to guarantee that an entry update is consistent, we must be sure that either all or none of the indexes have been flushed to disk: otherwise we might end up with an inconsistent database, where some indexes are up to date while others aren't.


Even worse, in the event of a crash, we might simply not be able to restart the server because the database gets corrupted (and sadly, we are experiencing this problem today...).

So in Oakland, I went to the Apache CouchDB presentation (sadly, the slides are no longer available), and was struck by the idea behind their database: MVCC. Crucially, when you start to use the database at a given revision, you always see everything associated with this revision, up to the point you are done. Sounds like Git or Subversion… actually, it's pretty much the same mechanism.

Being able to process read operations on a specific version of the database guarantees that no update will ever corrupt the data being processed. And every time we want to access the database, the very first thing we do is select the latest available version: this is all we will see during the operation processing. Perfect when you don't really care about having the freshest view of the stored data at any time, which is the case in LDAP.

But Apache CouchDB was written in Erlang :/ Anyway, the discussion we had with the Directory team was really about moving to a MVCC database.


Transactions are another big missing feature in LDAP. This is not something that was in the air back then: it was specified only one year later. Of course, the original specifications said that every operation is atomic, but there is no requirement for multiple operations to be atomic (and we often need to update two entries in LDAP, and to guarantee that those two operations are either both completed, or rolled back). Think about user/group management...

Alex Karasulu always had in mind that we needed a transactional database in Apache Directory, too. And his point was proven correct when, years later, we faced the first database corruptions. It's a bit sad that we ignored this aspect for so long :/

Anyway, we needed (a) transactions and (b) a rock solid database that could resist any type of crash.


For some time, we tried to mitigate the consistency problems by adding tons of locks. As we weren't able to protect the database against concurrent reads and writes, we made them exclusive (i.e. when a write is processed, no read can be processed). This was slightly better, but it came at a huge cost: a major slowdown when writes were done. It was also not good enough: long-lasting searches were just killing us, as there was no way to guarantee that an entry for which we had a reference would still be present in the database when we needed to fetch it. In such cases, we simply ended up discarding the entry. Last but not least, a crash in the middle of an update operation would leave the database in a potentially inconsistent state, which could make it impossible to start again (this was somewhat mitigated by adding a 'repair' mode later, but that is just a horrible hack).

Mavibot first steps

So we needed something better, which turned out to be Mavibot. We started working on Mavibot in June 2012 (Jun 13 00:04:10 2012, exactly).

The funny thing is that OpenLDAP started to work on the exact same kind of database one year earlier (LMDB) - even if the discussion about the need for such a database had started in 2009. Parallel discussions, parallel developments; we have always shared a lot!

The very first released version of Mavibot was out one year later, in June 2013, followed by 7 other versions (all of them milestones). At some point, in 2.0.0-M13, we added a Mavibot partition in ApacheDS (and it was using a SNAPSHOT!!! Mavibot 1.0.0-M1 was used in ApacheDS 2.0.0-M15). This was 'good enough' compared to JDBM ;-), but it didn't offer all we wanted to add: typically, we didn't have transaction support.

So why isn’t Mavibot the Apache Directory Server backend of choice today?

Well, we don't have cross-B-tree transactions, so we are pretty much in the same situation as with JDBM (except that it's faster, and we also have a bulk-loader for Mavibot). Adding cross-B-tree transactions is no piece of cake, and it requires some thinking. Sadly, it arrived at a moment when the team had less time to work on it (new jobs, family, you name it).

So in 2017 the effort was rebooted, and we do expect to have a working version soon enough!

I'll blog later about various technical aspects of Apache Mavibot, so stay tuned!