VoltDB - in memory, ACID compliant, partitioned, SQL database

superjared · on May 20, 2010

Early adopter signup gives access to the source, which is licensed under GPLv3. However, there is a binding Beta agreement which states that I may not redistribute the code/application.

The VoltDB software is being published as open source software under the Gnu Public License V3. In other words, as Early Release program members, you now have access to the source files for VoltDB and the development tools for browsing the source code and the current issues lists.

Please note, however, that the Beta agreement is still in effect until VoltDB is officially available. So we ask that you not redistribute the source or binaries beyond your own use at this time.

Is this legally enforceable?

lsb · on May 20, 2010

Section 7:

All other non-permissive additional terms are considered “further restrictions” within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying.

I don't think it's enforceable, but IANAL.

Scriptor · on May 20, 2010

What prevents them from just using the parts of the GPL they want?

koenigdavidmj · on May 20, 2010

>Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

mbreese · on May 21, 2010

Somewhat ironically, the reason for this is that the FSF holds the copyright to the GPL, so only they can change it. Well, you could probably change it, but you couldn't call it the GPL anymore... and it would probably be copyright infringement.

I wonder what would happen if the GPL itself was licensed under the GPL... I suspect it would start raining turtles. :)

koenigdavidmj · on May 21, 2010

According to their FAQ you can change it, but you have to change the name and remove the preamble entirely.

stcredzero · on May 21, 2010

The GPL was devised as a way of turning Copyright against itself.

silas · on May 20, 2010

http://www.gnu.org/licenses/gpl-faq.html#DoesTheGPLAllowNDA

carbocation · on May 20, 2010

From that FAQ: "If someone asks you to sign an NDA for receiving GPL-covered software copyrighted by the FSF, please inform us immediately by writing to license-violation@fsf.org."

jhugg · on May 20, 2010

Also from that FAQ: "If the violation involves GPL-covered code that has some other copyright holder, please inform that copyright holder, just as you would for any other kind of violation of the GPL."

FYI: VoltDB owns the copyright to the code.

dedward · on May 21, 2010

Right... so it's probably perfectly binding. If they hold the copyright outright, they can create conundrums like conflicting licenses. If it incorporates the work of others, though, it's tricky.

khafra · on May 20, 2010

Looks to me like it's carefully worded to be a non-binding NDA--they "ask" that you not redistribute the source, but they don't provide the code solely upon receipt of that promise, or threaten any penalties for it.

As far as I know, it's both legal and meaningless to ask for anything you like.

sad · on May 20, 2010

Legally, probably not. They aren't demanding that you not redistribute, just asking. And asking politely. They are banking on the good nature of people. Good for them! And us.

pinko · on May 20, 2010

Exactly. The Condor Project did this for many years and, miraculously, every one of the hundreds of users to whom we gave the source respected the request and the source never leaked. (It is now fully open.)

cabalamat · on May 20, 2010

If they don't want people to redistribute the code, why on earth are they GPL'ing it?

pinko · on June 14, 2010

I can't speak for them, but in our case we had major users who required open-source (mostly European govt. projects).

All our developers wanted to just post the source online and be done with it, but our boss had security concerns he wanted us to address first -- so in the meantime we gave people who needed it GPL'ed copies, and asked them politely not to redistribute until we were ready.

jfager · on May 21, 2010

Probably not, but if people start disrespecting their wishes, and the code is all their own (that is, does not include other parties' GPL'd code), then they can simply change the license and get the same effect.

xtacy · on May 20, 2010

I don't understand how they could avoid locks. They say that each replica runs independently and hence that avoids locking; what about write updates to the same object from different clients? That would need locking of some form at some level.

They should also explicitly discuss the failure models under which they guarantee Durability. If you're scaling to "web-scale" and running this DB in a data centre, a single failure could wipe out a whole rack of machines. Coordinated failures are not uncommon in data centres. What about those?

marcua · on May 20, 2010

You avoid locks by serializing transactions at any site. Since you're not waiting on disk (in memory DB) and each partition runs on its own block of memory and has its own cpu/thread, you simply don't let two transactions on the same partition run concurrently.

See http://cs-www.cs.yale.edu/homes/dna/papers/hstore-cc.pdf for cases where you want to run two transactions in the same location concurrently in h-store (the academic precursor to VoltDB).
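A minimal Python sketch of the serial-execution idea described above, assuming one worker thread per partition that pulls transactions from a FIFO queue and runs each one to completion. `PartitionEngine`, `submit` and `increment` are hypothetical names for illustration, not VoltDB's API.

```python
import queue
import threading
from typing import Callable, Dict


class PartitionEngine:
    """Hypothetical sketch: one thread owns one partition's in-memory data."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}   # this partition's rows, mutated only by the worker
        self._inbox = queue.Queue()      # FIFO of (transaction, result slot) pairs
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, txn: Callable[[Dict[str, int]], object]) -> queue.Queue:
        """Enqueue a transaction; the returned one-slot queue receives its result."""
        slot = queue.Queue(maxsize=1)
        self._inbox.put((txn, slot))
        return slot

    def _run(self) -> None:
        while True:
            txn, slot = self._inbox.get()
            # Each transaction runs to completion before the next one starts,
            # so nothing interleaves and no row locks or latches are needed.
            try:
                slot.put(txn(self.data))
            except Exception as exc:     # hand failures back instead of killing the worker
                slot.put(exc)


def increment(data: Dict[str, int]) -> int:
    data["counter"] = data.get("counter", 0) + 1
    return data["counter"]


engine = PartitionEngine()
pending = [engine.submit(increment) for _ in range(1000)]
results = [slot.get() for slot in pending]
print(results[-1])  # 1000
```

The read-modify-write in `increment` would race under free-running threads; here it is safe only because the single worker serializes every transaction on the partition.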

xtacy · on May 20, 2010

I see; what about cases where transactions involve objects on multiple servers? I guess I am confusing server-specific locks with client-specific locks.

cx01 · on May 20, 2010

In that case there will need to be some kind of locking, which is probably the reason the whitepaper recommends avoiding transactions that span multiple servers: "For multi-partition transactions, one engine distributes and coordinates work plans for the other engines. VoltDB assumes that an application designer can construct a partitioning/cloning scheme and a transaction design that makes a large majority of the transactions local to a single virtual node. "

stcredzero · on May 21, 2010

I had an idea for handling this situation that exploited the replication functionality. Basically, you prepare for a transaction by migrating the primary copy of all the objects involved to the same node. One would have to avoid deadlocks by sorting and serializing transaction-prep migration blocks. So transactions never span multiple servers, but you get this by possibly dramatically slowing down the time it takes to prepare for such transactions. (With the idea that the application designer is encouraged to avoid such transactions.)
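The "migrate primaries first, in sorted order" proposal can be sketched as plain lock ordering. This is a hypothetical illustration of stcredzero's idea, not a VoltDB feature: `Cluster` and `run_local` are invented names, and "migration" is simulated by reassigning a dictionary entry.

```python
import threading
from typing import Callable, Dict, Iterable


class Cluster:
    """Hypothetical sketch: primaries move to one node before a transaction runs."""

    def __init__(self, primaries: Dict[str, int]) -> None:
        self.primary = dict(primaries)  # object id -> node holding the primary copy
        self._locks = {obj: threading.Lock() for obj in primaries}

    def run_local(self, objects: Iterable[str], node: int, txn: Callable[[], object]) -> object:
        ordered = sorted(set(objects))
        # Every caller requests locks in increasing key order, so no caller ever
        # waits for a lower-ordered lock while holding a higher one; that rules
        # out a wait-for cycle, i.e. a deadlock.
        for obj in ordered:
            self._locks[obj].acquire()
        try:
            for obj in ordered:
                self.primary[obj] = node  # simulated migration of the primary copy
            return txn()                  # now effectively a single-node transaction
        finally:
            for obj in reversed(ordered):
                self._locks[obj].release()


cluster = Cluster({"a": 0, "b": 1, "c": 2})


def worker(objs, node):
    for _ in range(1000):
        cluster.run_local(objs, node, lambda: None)


# The two callers list the shared objects in opposite orders.
t1 = threading.Thread(target=worker, args=(["a", "b"], 1))
t2 = threading.Thread(target=worker, args=(["b", "a"], 2))
t1.start()
t2.start()
t1.join(timeout=10)
t2.join(timeout=10)
print(t1.is_alive() or t2.is_alive())  # False: both finished, no deadlock
```

As cx01 notes, holding these locks for every transaction is its own coordination cost; the sketch only shows why sorting prevents deadlock.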

cx01 · on May 21, 2010

If you do this for every transaction, you'll have more overhead than you'd have if you had just performed distributed consensus. But in principle you're right. It makes sense to put all the primary copies that are often used together on the same node. The problem is that it's hard to decide the ideal placement strategy. VoltDB puts this responsibility into the hands of the developers.

stcredzero · on May 24, 2010

The point is to a) not have to implement distributed consensus and b) require the user to design the database for localized transactions if performance is desired.

cx01 · on May 20, 2010

It seems that it's single-threaded, so all writes will be serialized, hence no locking.

itistoday · on May 20, 2010

From their whitepaper:

"Conventional databases experience disk and user stalls within transactions. Rather than let the CPU be idle during the stalls, those DBMSs interleave SQL execution from multiple transactions during the waits so the CPU is always busy. This is what requires much of the complex latching and locking overhead.

VoltDB doesn’t experience user stalls (since transactions happen within stored procedures) or disk stalls (because VoltDB processes data in main memory). Therefore, it is able to eliminate the overhead associated with multi-threading (latching) and locking. Each VoltDB execution engine is single-threaded and contains a queue of transaction requests, which it executes sequentially—and exclusively—against its data. Elimination of stalls and the need for locking and latching overhead and allows typical VoltDB SQL operations to complete in microseconds.

For single-partition transactions, each VoltDB engine operates autonomously. For multi-partition transactions, one engine distributes and coordinates work plans for the other engines. VoltDB assumes that an application designer can construct a partitioning/cloning scheme and a transaction design that makes a large majority of the transactions local to a single virtual node. Many common applications such as order-fulfillment, software as a service (SaaS), Web 2.0 and trading systems have this property."

http://www.voltdb.com/_pdf/VoltDBTechnicalOverviewWhitePaper...
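To make the single- vs multi-partition distinction concrete, here is a minimal sketch assuming rows are hash-partitioned on a key column. The CRC32-modulo hash, `NUM_PARTITIONS` and the function names are assumptions for illustration; this is not VoltDB's actual hashing scheme.

```python
import zlib
from typing import Iterable, Set

NUM_PARTITIONS = 8  # hypothetical cluster size


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partitioning-key value to a partition (stable across processes)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(keys: Iterable[str], num_partitions: int = NUM_PARTITIONS) -> Set[int]:
    """Set of partitions a transaction touching these keys must visit."""
    return {partition_for(k, num_partitions) for k in keys}


def is_single_partition(keys: Iterable[str], num_partitions: int = NUM_PARTITIONS) -> bool:
    """True if one engine can run the transaction alone, with no coordinator."""
    return len(route(keys, num_partitions)) == 1


# Every row of one customer's order shares the key "cust-42", so one partition.
print(is_single_partition(["cust-42", "cust-42", "cust-42"]))  # True
```

Designing the schema so that most transactions pass `is_single_partition` is what the whitepaper means by making "a large majority of the transactions local to a single virtual node"; anything else needs the coordinating engine.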

stingraycharles · on May 20, 2010

That still doesn't completely answer the question of how it avoids race conditions and the like without locking. Assume I'm executing multiple long-running transactions, each hitting and modifying a lot (millions) of different objects. How are race conditions avoided in this case? One engine that distributes and coordinates work plans for the other engines sounds nice, but does this essentially mean that only one of these transactions is able to run at a time, even when they could run concurrently with locking?

wmf · on May 20, 2010

VoltDB is designed for short transactions that each touch one row, so you should expect pretty poor performance for long-running transactions.

stcredzero · on May 21, 2010

It's Prevayler all over again!

http://www.prevayler.org/

(Not really. Prevayler uses a log.)
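For contrast, a minimal sketch of the journaling approach Prevayler takes: state lives in memory, each command is appended to a log before it is applied, and a restart rebuilds state by replaying the log. The `PrevalentMap` class, the JSON-lines journal and the command format are assumptions for illustration (Prevayler itself is a Java library); real systems typically also take periodic snapshots so the journal can be truncated.

```python
import json
import os
import tempfile
from typing import Dict


class PrevalentMap:
    """Hypothetical sketch: in-memory state plus an append-only command journal."""

    def __init__(self, journal_path: str) -> None:
        self.journal_path = journal_path
        self.state: Dict[str, int] = {}
        if os.path.exists(journal_path):
            with open(journal_path, encoding="utf-8") as f:
                for line in f:               # replay every logged command on startup
                    self._apply(json.loads(line))

    def _apply(self, cmd: dict) -> None:
        if cmd["op"] == "set":
            self.state[cmd["key"]] = cmd["value"]
        elif cmd["op"] == "add":
            self.state[cmd["key"]] = self.state.get(cmd["key"], 0) + cmd["value"]

    def execute(self, cmd: dict) -> None:
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            os.fsync(f.fileno())             # on disk before the change is applied
        self._apply(cmd)


path = os.path.join(tempfile.mkdtemp(), "journal.log")
db = PrevalentMap(path)
db.execute({"op": "set", "key": "balance", "value": 100})
db.execute({"op": "add", "key": "balance", "value": -30})
restarted = PrevalentMap(path)  # simulate a reboot: memory is gone, the journal is not
print(restarted.state)  # {'balance': 70}
```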

adamilardi · on May 20, 2010

SQL doesn't scale, didn't they get the memo? lol

braindead_in · on May 20, 2010

I guess in-memory means that it never writes data to disk. So what happens when the machine reboots?

cx01 · on May 20, 2010

From the FAQ: "Durability: VoltDB provides both replication of partitions (known as K-safety) and periodic database snapshots to ensure the availability of the data."

jws · on May 20, 2010

I expect there is a persistent store of some sort, you'll have to register to find out. I'm also unable to find a license without registering.

It does appear from the FAQ that you build your transactions as Java stored procedures. Constraining a generic SQL database that way will make it much easier to distribute and scale.

_wiv7 · on May 20, 2010

Their FAQ states: Durability: VoltDB provides both replication of partitions (known as K-safety) and periodic database snapshots to ensure the availability of the data.

I suppose it depends on your definition of "D". Seems like this might be acceptable if you had different replicas on different power sources, etc.
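Replicas on different power sources can be sketched as failure-domain-aware placement. This assumes K-safety means K+1 copies of each partition (so any K node failures are tolerated), and additionally puts each copy in a distinct rack or power domain, which addresses the rack-wide failures raised earlier in the thread. `place_replicas` and `survives` are hypothetical helpers, not VoltDB's placement algorithm.

```python
from typing import Dict, List, Set


def place_replicas(num_partitions: int, k: int,
                   domains: Dict[str, List[str]]) -> Dict[int, List[str]]:
    """Assign k+1 copies of each partition to nodes in k+1 distinct failure domains.

    `domains` maps a failure domain (rack, power feed) to the node names in it.
    """
    names = sorted(domains)
    if k + 1 > len(names):
        raise ValueError("need at least k+1 failure domains")
    placement = {}
    for p in range(num_partitions):
        # k+1 consecutive domains (mod len) are distinct because k+1 <= len(names).
        chosen = [names[(p + i) % len(names)] for i in range(k + 1)]
        placement[p] = [domains[d][p % len(domains[d])] for d in chosen]
    return placement


def survives(placement: Dict[int, List[str]], failed_nodes: Set[str]) -> bool:
    """True if every partition still has at least one live copy."""
    return all(any(n not in failed_nodes for n in nodes) for nodes in placement.values())


domains = {"rack-a": ["a1", "a2"], "rack-b": ["b1", "b2"], "rack-c": ["c1", "c2"]}
plan = place_replicas(num_partitions=6, k=1, domains=domains)
print(survives(plan, set(domains["rack-b"])))  # True: every partition has a copy elsewhere

# Naive placement with both copies in one rack loses data when that rack fails.
same_rack = {p: ["a1", "a2"] for p in range(6)}
print(survives(same_rack, set(domains["rack-a"])))  # False
```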

tlack · on May 20, 2010

In-memory DBs can take a lot of excellent shortcuts, but I have to wonder if their usability is limited. After all, I'd think most interesting data sets quickly end up eclipsing the available RAM in most off-the-shelf/affordable servers.

cx01 · on May 20, 2010

Keep in mind that VoltDB targets the OLTP market, where datasets tend to be not that large. If you build a small cluster of 50 nodes with 256GB RAM each, you have about 4.3TB of storage (with a replication factor of 3).
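The capacity figure works out as raw RAM divided by the number of copies. A small sketch of that arithmetic (the function name is hypothetical; it uses decimal TB and ignores headroom for the OS, indexes and working memory):

```python
def usable_capacity_tb(nodes: int, ram_gb_per_node: int, replication: int) -> float:
    """Unique data that fits in RAM when every row is stored `replication` times."""
    raw_gb = nodes * ram_gb_per_node     # 50 * 256 = 12,800 GB of raw RAM
    return raw_gb / replication / 1000   # decimal TB (1 TB = 1000 GB)


print(round(usable_capacity_tb(50, 256, 3), 1))  # 4.3
```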

st3fan · on May 20, 2010

Have to fill in a form to see a whitepaper. Yeah whatever.

Pahalial · on May 20, 2010

They're just gauging interest and making sure to get emails they can push adoption to later, it's a classic sales technique. Don't hate.

Regardless, the links they send are open:

VoltDB product overview: http://www.voltdb.com/_pdf/VoltDBOverview.pdf

VoltDB Technical Architecture white paper: http://www.voltdb.com/_pdf/VoltDBTechnicalOverviewWhitePaper...

jbellis · on May 20, 2010

It's based on the work discussed in http://cs-www.cs.yale.edu/homes/dna/papers/hstore-cc.pdf

ck2 · on May 20, 2010

So, how fast does it run ALTER?

sovande · on May 20, 2010

GPLv3, wonder what their business model is.