Berkeley DB Architecture - NoSQL Before NoSQL Was Cool

antirez · on Feb 20, 2012

Berkeley DB was not SQL, and according to my own classification of NoSQL (that is, at least one of the two must be true: [1] a different data model compared to SQL, [2] a different tradeoff in CAP compared to traditional DBs) Berkeley DB should be classified as a NoSQL database.

However, I think it was not a NoSQL database for an important reason: it was only embedded in other programs, and never took the form of a networked server. Not only did this raise the barrier to entry, it also meant Berkeley DB never became an "object" in your infrastructure that you could use in different ways, in competition with other databases.

So the Berkeley DB creators did not start the NoSQL era a long time earlier because they missed the importance of what they were doing, thinking that their DB was limited to something you could bind to programs that did not need all the power of an SQL database. At least this is what emerges from their choices. And the implication, in my opinion, is that they too considered relational databases the only "real DBs".

So there was no real competition with traditional DBs, and it was not a NoSQL DB.

EDIT: I understand this is a mostly personal point of view, and not an objective critique, but I wanted to share it with HN nevertheless.

damian2000 · on Feb 21, 2012

And the reason it didn't run as a networked server was probably to avoid network traffic, which was obviously a lot slower (and a bottleneck) back when it was created. Also, it would be trivial to expose its functionality as a networked server, if that's what you needed.
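To give a rough idea of how small such a wrapper can be, here is a minimal sketch in Python. It is not Berkeley DB code: the standard library's `dbm.dumb` stands in for the embedded store, and the names `make_handler` and `start_server` and the URL scheme (the path is the key) are assumptions made up for this illustration. A real server would also need authentication, concurrency control and error handling.

```python
import dbm.dumb
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(db):
    """Build a request handler bound to an already-open embedded store."""

    class KVHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # The URL path (minus the leading slash) is used as the key.
            key = self.path.lstrip("/").encode()
            if key in db:
                body = db[key]
                self.send_response(200)
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_response(404)
                self.send_header("Content-Length", "0")
                self.end_headers()

        def do_PUT(self):
            # The request body is stored as the value for the key in the path.
            length = int(self.headers.get("Content-Length", 0))
            db[self.path.lstrip("/").encode()] = self.rfile.read(length)
            self.send_response(204)
            self.end_headers()

        def log_message(self, *args):
            pass  # keep the example quiet

    return KVHandler


def start_server(db, host="127.0.0.1", port=0):
    """Serve db on a background thread; port=0 lets the OS pick a free port."""
    server = HTTPServer((host, port), make_handler(db))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Opening a store with `dbm.dumb.open(path, "c")`, passing it to `start_server`, and then issuing a PUT followed by a GET on the same path round-trips a value over HTTP. `HTTPServer` handles one request at a time, which sidesteps thread-safety questions in the underlying store at the cost of throughput.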

rbranson · on Feb 20, 2012

NoSQL is a time period characterized by a set of ideas, not a category. The data stores that came out of NoSQL were about using more specialized tools and rejecting the dogma that all data persistence should be built on an RDBMS.

Besides its ability to be embedded, BerkeleyDB doesn't really provide any advantages over an RDBMS, other than raw simplicity. It's unlikely anyone looking at data persistence solutions would consider BDB alongside an RDBMS; they solve different problems.

I think BDB is a useful and well-designed tool, but it's not NoSQL.

luser001 · on Feb 20, 2012

I'd like to hear comparisons between Leveldb and Berkeley DB.

I'm using LDB in my most-recent project. I wrote a trivial HTTP wrapper around it using Mongoose to make it a "server". :)

I like the snapshotting feature, the support for transactions, the built-in support for accessing the db from multiple threads, and that it maintains keys in sorted order (this is great for timestamped keys like I have).
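To show why sorted keys matter for timestamped data, here is a small Python sketch. It is a toy in-memory stand-in, not LevelDB's API: `SortedStore`, `scan` and `ts_key` are hypothetical names invented for the example. The point is only that when keys are kept in order, "everything between two timestamps" becomes a contiguous range scan.

```python
import bisect


class SortedStore:
    """Toy in-memory stand-in for an ordered key-value store (not LevelDB's API)."""

    def __init__(self):
        self._keys = []    # kept sorted at all times
        self._values = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start: bytes, end: bytes):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


def ts_key(epoch_seconds: int) -> bytes:
    """Zero-padded decimal, so that byte order matches numeric order."""
    return b"%020d" % epoch_seconds
```

The zero padding in `ts_key` is the important design choice: without it, the key for 1000 would sort before the key for 200 in a bytewise comparison.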

I looked at BDB a while back, and IIRC it doesn't have snapshotting and sorted keys. Unsure about threads.

I use the snapshotting feature to dump a backup copy of the database.

The BDB API seemed significantly more complex than Leveldb's. IIRC, the native API is C.

I like C++ and like that Leveldb's native API is C++. It uses the standard library string (which allows embedding null characters).

btbuilder · on Feb 21, 2012

Berkeley DB has many features and many modes. This, IMHO, is part of the reason it has a steep learning curve (and why others have had a bad experience with it).

It is easy to programmatically configure it incorrectly. For example: not using transactions when you should, failing to deal with deadlocks, or not cleaning up the locks of failed threads or processes.
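As an illustration of what "dealing with deadlocks" usually means in application code, here is a hedged Python sketch of a retry loop. `DeadlockError` and `run_with_retry` are hypothetical names, not part of any Berkeley DB binding; the sketch assumes the callable passed in begins a transaction, does its work, commits, and aborts before raising when the store picks it as the deadlock victim.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a store's 'deadlock detected' error."""


def run_with_retry(txn_body, max_attempts=5, base_delay=0.01):
    """Call txn_body(); if it reports a deadlock, back off and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide
            # Randomized exponential backoff, so that competing threads
            # do not all retry at the same moment.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())
```

The failure mode described above is the absence of such a loop: the deadlock error is treated as fatal, or ignored, instead of being treated as a signal to abort and retry the whole transaction.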

Some of the mature features it has are:

Page-level locking for concurrency via threading and multi-process.

Snapshot isolation for MVCC - you can get a high isolation level without read locks.

Nested transactions.

B-tree indexes.

Two-phase commit.

You can also throw all this out the window and use an in-memory database or non-durable
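To make the snapshot isolation point concrete, here is a toy Python sketch of multi-versioning. It is not how Berkeley DB implements it (this sketch versions individual keys and is single-threaded); `MVCCStore` and `Snapshot` are hypothetical names. It only shows the idea that a reader pinned to a version keeps seeing consistent data without taking read locks, because writers add new versions instead of overwriting old ones.

```python
class MVCCStore:
    """Toy multi-version store: readers pin a version, writers append new ones."""

    def __init__(self):
        self._version = 0
        self._history = {}  # key -> list of (commit_version, value), oldest first

    def put(self, key, value):
        self._version += 1
        self._history.setdefault(key, []).append((self._version, value))

    def snapshot(self):
        """Return a read-only view frozen at the current version."""
        return Snapshot(self, self._version)


class Snapshot:
    def __init__(self, store, version):
        self._store = store
        self._version = version

    def get(self, key, default=None):
        # Newest value committed at or before the snapshot's version.
        for version, value in reversed(self._store._history.get(key, [])):
            if version <= self._version:
                return value
        return default
```

A snapshot taken before a write keeps returning the old value afterwards, while a fresh snapshot sees the new one. The cost is that old versions must be kept around until no snapshot needs them, which this sketch does not attempt to garbage-collect.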

The newer versions even have replication.

BDB is a hard beast to tame properly, but is definitely fully featured, and mature.

The Oracle paper on the BDB-backed SQLite makes for an interesting read.

teraflop · on Feb 21, 2012

> I looked at BDB a while back, and IIRC it doesn't have ... sorted keys.

I'm not familiar enough with Berkeley DB to comment on the rest, but it definitely supports B-tree indexes, which store keys in sorted order.

tete · on Feb 20, 2012

One simply has to mention Tokyo/Kyoto Cabinet/Tyrant.

http://fallabs.com/

With all the NoSQL hype Redis and MongoDB get, I think these are too often overlooked. What's really nice is that you can use them in embedded (cabinet) or server (tyrant) "mode". What's also great about them is that they provide a lot of flexibility, since they are not just one database. It's also great that they have official bindings that work. They have tons of features, and sometimes an embedded database is really interesting when you don't want (or need) to care about the protocol you send it over.

pak · on Feb 21, 2012

I just found these two little jewels and have been using them to provide fast access to a haystack of images (GB to TB size, each image about 1KB). I am curious to hear what others think about moving from Tokyo to Kyoto--the website says Kyoto is recommended, but there is a real dearth of information about Kyoto (at least in English) out on the web, compared with the scattered breadcrumbs that you can sort of piece together for information on Tokyo. Most of those breadcrumbs are unfortunately silly benchmark articles instead of real experiences.

Also, I already did have one Tokyo Cabinet Hashtable go corrupt on me--which caused certain operations to lock up the CPU indefinitely. Hmm, that didn't ease my conscience. However, it was completely recoverable with "tchmgr optimize". It'd be nice to google for more war stories, but like I said... breadcrumbs so far.

premchai21 · on Feb 20, 2012

Do Tokyo Cabinet and Kyoto Cabinet still use totally unprotected mmap? Last I checked, truncating the DB file in the middle of an operation would take down the entire process with SIGBUS, and I assume a disk error would do something similar. This seems like it'd be bad for fault tolerance. Or are storage devices just that reliable nowadays, or does enough else break along with them that they're considered a critical component anyway?

hcles · on Feb 21, 2012

My experience with their B+ Tree database is that when your database size hits a GB or more, closing the database can take 10 minutes or more (depending on the writes queued), and opening a database that was not closed correctly rebuilds the entire database.

I've not found any such limitations with LevelDB.

orp · on Feb 20, 2012

I've used BDB a lot over the last 10 years, embedded inside a C++ server, and I have to say I've been disappointed in its multi-threaded scalability.

Running a single threaded access pattern can easily get you 20K plus reads/sec, but if you try to run more threads the throughput per thread just goes down, up to a point where more threads actually slow you down.

Make sure to run extensive benchmarks if you consider using it in a multi-threaded application.
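For anyone wanting to run that kind of benchmark, here is a minimal Python sketch of the harness shape: run the same read function from 1, 2, 4, ... threads and compare total operations per second. `measure_throughput` and `read_fn` are hypothetical names, and `read_fn` is whatever performs one lookup against your store. Note that Python's own global interpreter lock limits scaling for pure-Python work, so use this as a template rather than as evidence about BDB itself.

```python
import threading
import time


def measure_throughput(read_fn, num_threads, ops_per_thread):
    """Return total operations per second with num_threads concurrent callers."""
    barrier = threading.Barrier(num_threads + 1)

    def worker():
        barrier.wait()  # start all workers together
        for i in range(ops_per_thread):
            read_fn(i)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    barrier.wait()  # release the workers, then start the clock
    started = time.perf_counter()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - started
    return num_threads * ops_per_thread / elapsed
```

If the number returned stops growing, or shrinks, as `num_threads` goes up, you are seeing the contention pattern described above.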

I've never tried running BDB in a multi-process architecture, so I have no idea how it'd behave when used that way.

rbranson · on Feb 20, 2012

BDB isn't really designed for scalable performance. It's for good performance and support for concurrent, ACID transactions. Tens of thousands of reads per second is reasonable for the vast majority of embedded applications, which often live on the desktop or mobile devices.

mcbain · on Feb 21, 2012

Coincidentally, Keith Bostic (co-creator of BDB) announced his new project in the last few weeks: http://wiredtiger.com/

I don't know all that much about it, but given the background of those involved, should be worth keeping an eye on.

DanielRibeiro · on Feb 20, 2012

Direct links http://www.aosabook.org/en/bdb.html , http://news.ycombinator.com/item?id=3607914

alatkins · on Feb 20, 2012

Then there's Gelernter's Linda [1] and its tuple space model [2], the granddaddy of non-relational data stores:

[1] http://en.wikipedia.org/wiki/Linda_(coordination_language)

[2] http://en.wikipedia.org/wiki/Tuple_space

jrydberg · on Feb 20, 2012

Some sane design lessons in the original material: http://www.aosabook.org/en/bdb.html

zandorg · on Feb 20, 2012

Is there an alternative to Berkeley DB which has a hash database file on disk? Rather than in RAM? The problem is I use BDB 1.86 on Windows, because it's free, but the DB size is limited to 2GB. I want to use an alternative which is just as fast and pretty much the same interface.

emidln · on Feb 20, 2012

This is designed to be very compatible with BDB: http://fallabs.com/tokyocabinet/ and is open source.

This builds on the ideas from BDB and Tokyo Cabinet with a newer codebase: http://fallabs.com/kyotocabinet/

Both have network server implementations available (see links from their pages).

luser001 · on Feb 20, 2012

Leveldb? https://leveldb.googlecode.com/svn/trunk/doc/index.html

Dunno about performance numbers vs BDB etc.

diego · on Feb 21, 2012

Check out Krati, from LinkedIn.

http://sna-projects.com/krati/

Maro · on Feb 20, 2012

Friends don't let friends use BerkeleyDB.

http://www.google.com/search?hl=en&q=bdb+corruption

cjensen · on Feb 21, 2012

My first exposure to BerkeleyDB was when Subversion was brand new. There was a Perl script which converted from CVS to SVN and used BerkeleyDB during the conversion. I had to manually replace BerkeleyDB with a hand-written key/value store in the filesystem to make the script work.

I used Subversion since pre-1.0. The only problems I've ever had with Subversion were caused by BerkeleyDB failing to be sufficiently robust. Since Subversion eliminated BDB use, I've never had a problem with it.

BerkeleyDB is dead to me.