FerretDB: open-source MongoDB alternative

deepsun · on April 12, 2023

At one company we had one-off requests to get some statistics from our mongodb.

Writing the proper queries proved pretty tricky whenever GROUP BY/JOINs were involved, so I used online converters from SQL to mongo queries.

But then I realized that PostgreSQL has a nice JSONB type (it supports indices on subfields). So I put all the mongo data into tables with a single JSONB column (or two columns, id+data, if you prefer).

Turned out it was much faster for running analytical queries :)

"document" is just a row

"collection" is just a table
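A minimal sketch of that layout, for illustration only. It uses Python's built-in sqlite3 module and SQLite's JSON functions as a stand-in so that it runs without a server; that substitution is an assumption, not the setup described above. The `orders` table, the field names, and the sample data are hypothetical. In PostgreSQL the `data` column would be JSONB and the field access would be written `data->>'customer'`.

```python
import json
import sqlite3

# Hypothetical "collection": each document becomes one row (id + data).
docs = [
    {"_id": 1, "customer": "alice", "total": 30},
    {"_id": 2, "customer": "bob", "total": 5},
    {"_id": 3, "customer": "alice", "total": 12},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO orders (id, data) VALUES (?, ?)",
    [(d["_id"], json.dumps(d)) for d in docs],
)

# A one-off GROUP BY over a field inside the document.
# Rough PostgreSQL/JSONB equivalent (not run here):
#   SELECT data->>'customer', sum((data->>'total')::numeric)
#   FROM orders GROUP BY 1 ORDER BY 1;
rows = conn.execute(
    """
    SELECT json_extract(data, '$.customer') AS customer,
           SUM(json_extract(data, '$.total')) AS total
    FROM orders
    GROUP BY customer
    ORDER BY customer
    """
).fetchall()
print(rows)
```

The point of the sketch is only that the document stays intact in one column while ordinary SQL grouping works on its fields.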

softfalcon · on April 12, 2023

I would recommend against modeling too many relations in a document DB. It can work, but it gets bogged down very quickly for the reasons you've stated.

My off-the-cuff suggestion is that document databases are not built to model normalized relational data. If you try to do so, you bring a whole new level of pain upon yourself. It is do-able, but it is hard and annoying.

I know this from extensive personal experience dealing with a large, highly relational Mongo database.

I am very glad you found a solution within Postgres that works for you, and your mapping of document to row and collection to table is very apt!

If possible, would you care to tell me what the size of said documents (rows) and collections (tables) are in your solution? I am curious if in another life, we might have built our tech stack on Postgres instead of MongoDB and been much happier for doing so.

deepsun · on April 13, 2023

We had everything de-normalized, as you pointed out.

Still, every now and then we had some one-off questions that required using Mongo for something it's not designed for. Migrating to postgres+JSONB made it easy to do both.

phamilton · on April 13, 2023

We've been on a big "Use postgres for everything" path. Mostly migrating dynamodb, elasticsearch, and redis to postgres.

The number of times we've had to do some one-off query that was orthogonal to production usage is higher than I predicted. There were so many times in the past where we just didn't fix something because it was too difficult to backfill a document store. We just hacked around it.

Now with postgres we fix the root of the problem instead of hacking around it. And we do so with a level of confidence we've never had with document stores.

dfragnito · on April 13, 2023

This is one of the reasons we created https://schemafreesql.com. We wanted SQL with some of the document store (NOSQL) features.

tasubotadas · on April 12, 2023

Joins only make sense in analytical context as a tool to get some additional data into your report and almost never as domain modeling concept. People should pretty much hardly ever use RDBMS for their domain data... but since everyone learns RDBMS as their first DB, we have these horrible ORM frameworks on every corner.

unusualmonkey · on April 13, 2023

Curious why you think this? I've found Relational databases pretty powerful, and learned mongo first.

tasubotadas · on April 13, 2023

They are powerful. But often their power is unnecessary and counterproductive for domain modeling.

Where they are really good is reporting. Analysts love RDBMS. When I am in an analyst role, I love them too. As an engineer I find them redundant.

More about domain modeling

https://dev.tasubo.com/2022/07/crash-course-domain-driven-de...

unusualmonkey · on April 16, 2023

1) That article says nothing against using an RDBMS in my quick perusal, in fact it suggests MySQL!

2) In practice having the ability to run reports on your data can be super important and useful.

I'm still confused at the point you're trying to make.

tracker1 · on April 12, 2023

Yeah, I've kind of railed against ORMs for a while now... I'd just as soon use a simple data mapper (like Dapper for C#) or be really explicit in a scripted language with template queries.

skatanski · on April 12, 2023

If there were JOINs it may be a sign MongoDB was not the best DB for you. It means the model was relational and a relational DB would be a better fit. MongoDB lookups are discouraged in general, and especially in analytical workloads, which cover a lot of data. MongoDB is better for scenarios where everything you need is already in the document.

mgfist · on April 12, 2023

It may also mean poor schema design. You can absolutely model relational data in MongoDB without relying on joins. If you want to normalize your data don't pick Mongo.

eropple · on April 12, 2023

If you aren't normalizing, how are you ensuring that you avoid anomalies?

solatic · on April 13, 2023

Proper NoSQL design has a different perspective, it asks you "well, so what if you have an anomaly?". One example could be movies, with actors, producers, genres, etc. that could all be in separate tables in a relational database, or each movie could be a document in a document database. Now let's imagine that an actor changes their name a few years after the movie is released. Is it important to go back and change the name of the actor in each of the movies? Maybe, maybe not. Certainly you can't change the credits inside the movie itself. Maybe it's sufficient to just have a link to the actor's page, where the actor's up-to-date name is. Maybe it's insufficient if the actor will get upset that their name wasn't updated across their old movies.

You choose relational models when anomalies are unacceptable and non-relational models when anomalies are acceptable.
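The movie example can be sketched in a few lines of plain Python, as an illustration rather than a real data model. The documents, the `actor_id` field, and both helper functions are hypothetical. The actor's name is copied into each movie document, so a rename either leaves stale copies (anomaly accepted) or requires a backfill over every document (anomaly not accepted).

```python
# Denormalized: the actor's name is copied into every movie document.
movies = [
    {"title": "Movie A", "actors": [{"actor_id": 7, "name": "Old Name"}]},
    {"title": "Movie B", "actors": [{"actor_id": 7, "name": "Old Name"}]},
]
# The actor's own page, the one place that is always kept current.
actors = {7: {"name": "Old Name"}}


def rename_actor(actor_id, new_name, backfill=False):
    """Update the actor page; optionally rewrite every embedded copy."""
    actors[actor_id]["name"] = new_name
    if backfill:
        for movie in movies:
            for credit in movie["actors"]:
                if credit["actor_id"] == actor_id:
                    credit["name"] = new_name


def stale_credits(actor_id):
    """Titles whose embedded name no longer matches the actor page."""
    current = actors[actor_id]["name"]
    return [
        movie["title"]
        for movie in movies
        for credit in movie["actors"]
        if credit["actor_id"] == actor_id and credit["name"] != current
    ]


rename_actor(7, "New Name")                 # accept the anomaly
stale_before = stale_credits(7)             # both movies still show the old name
rename_actor(7, "New Name", backfill=True)  # reject the anomaly: fix every copy
stale_after = stale_credits(7)
```

In a normalized relational schema the name would live in one row, so the backfill step would not exist.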

eropple · on April 13, 2023

That makes sense. I tend to go away from non-relational models for this reason, but it's definitely a matter of risk management.

victorbjorklund · on April 13, 2023

Thanks! That is a very useful example.

saghm · on April 12, 2023

Not sure if this is exactly what you're referring to, but my understanding is that picking the "right" schema for a document database to ensure that you don't end up with slower queries like mentioned elsewhere in the thread tends to benefit from thinking at a somewhat lower level of granularity than you would probably need to with a relational database. Instead of just identifying "one to one", "one to many", "many to many", it's useful to ask questions like "how 'many' is many"; as a simple example, if the "many" in "one to many" is on the order of 10, maybe it makes sense to embed them in an array rather than use a separate collection for them. It can also help to start from thinking about the types of queries you might want and then designing the schema based on them rather than starting by deciding on the schema and then having the queries be based on that; if you're going to want certain data to be accessed at the same time, you're probably going to find some way to store it together.
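To make the "how many is many" question concrete, here is a small sketch of the two shapes in plain Python. The post, tag, and comment documents are hypothetical, and the threshold for when to embed is a judgment call, not a rule.

```python
# "One to few": embed the children directly in the parent document.
post_embedded = {
    "_id": "post-1",
    "title": "Hello",
    "tags": ["postgres", "mongodb"],  # a handful of items: read with the post
}

# "One to many, unbounded": keep children in their own collection and
# store a reference back to the parent.
posts = {"post-1": {"title": "Hello"}}
comments = [
    {"_id": "c1", "post_id": "post-1", "text": "first"},
    {"_id": "c2", "post_id": "post-1", "text": "second"},
    {"_id": "c3", "post_id": "post-2", "text": "elsewhere"},
]


def comments_for(post_id):
    """The extra lookup that the embedded shape would not need."""
    return [c["text"] for c in comments if c["post_id"] == post_id]


post_comments = comments_for("post-1")
```

Starting from the queries, as suggested above, means asking whether the tags or comments are always wanted together with the post; if they are, the embedded shape saves the second lookup.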

skatanski · on April 12, 2023

Agreed, you can have reference fields to other collections and filter by them or query related data. But I wouldn't say it's designed for relational workloads. Similarly I wouldn't use MongoDB for graph queries, even though it has an operator for just that.

mgfist · on April 16, 2023

> agreed, you can have reference fields to other collections and filter by them or query related data.

Yes it has that, but that's an anti-pattern. What I'm referring to is schema design. In Mongo you denormalize and store relational data in the same document as references. In SQL a record might be split up into 3 tables; in Mongo that should all live in the same document. If you can't embed and need lookups, you should avoid Mongo and other NoSQL DBs.

forgetfulness · on April 12, 2023

Should it be used as a database in the way that Postgres would be used as a database? It seems pretty unfit for that role, with its past durability issues and that writes have to be tailored to how the data will be consumed, due to lack of ad-hoc queries.

As something layered on top of the actual source of truth it may be reasonable, writing a materialized version of a costly query that's read very often, sure, but that's an optimization and it'd be competing with Redis for what matters there.

afiori · on April 14, 2023

MongoDB has pipeline queries https://www.mongodb.com/docs/manual/core/aggregation-pipelin... that allow for quite ad-hoc queries.
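For readers who have not seen one, this is roughly what such a pipeline looks like as Python data, next to a plain-Python computation of the same grouping. `$match`, `$group`, and `$sum` are real aggregation stages and operators; the `orders` documents and field names are hypothetical, and the function below is ordinary Python, not MongoDB.

```python
from collections import defaultdict

# SQL analogue: SELECT customer, SUM(amount) FROM orders
#               WHERE status = 'paid' GROUP BY customer
# With a driver such as PyMongo, this list would be passed to
# Collection.aggregate(); no database is contacted in this sketch.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
]

orders = [
    {"customer": "alice", "status": "paid", "amount": 30},
    {"customer": "bob", "status": "paid", "amount": 5},
    {"customer": "alice", "status": "refunded", "amount": 99},
    {"customer": "alice", "status": "paid", "amount": 12},
]


def run_equivalent(docs):
    """Compute in plain Python what the pipeline above describes."""
    totals = defaultdict(int)
    for doc in docs:
        if doc["status"] == "paid":                   # the $match stage
            totals[doc["customer"]] += doc["amount"]  # the $group/$sum stage
    # Sorted here only to make the output deterministic.
    return [{"_id": key, "total": value} for key, value in sorted(totals.items())]


result = run_equivalent(orders)
```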

deepsun · on April 13, 2023

Same thoughts. I see mongo as a key-value store (string key -> string value), with a json wrapper over the values.

deepsun · on April 13, 2023

No, our app didn't use JOINs for the main work. But, as I wrote, every now and then we had some one-off requests that required either JOIN or GROUP BY or both.

With postgres+JSONB we could do both at the same time on live data.

phendrenad2 · on April 12, 2023

MongoDB really only makes sense when you have an extremely large set of documents and you don't need to do this kind of statistics.

deepsun · on April 13, 2023

Yep, although for large datasets I'd prefer a multi-master architecture, depending on what is needed, for example Blob storage or Cassandra for key-value, and BigQuery or Clickhouse for analytics.

We also thought we wouldn't need it, but turned out sometimes we wanted to run some stats.

nextaccountic · on April 14, 2023

Since FerretDB is built on top of Postgres, could it be an extension? So that one can mix some FerretDB tables and some general SQL tables in the same database (with proper foreign key constraints between them etc)

peterfarkas · on April 14, 2023

Yes, building an extension is something we are looking at, and I am glad to hear that you would be interested in this kind of mixed use case. Would definitely make things interesting, and more performant.

nextaccountic · on April 15, 2023

Pretty exciting!

What about optionally validating some columns with jsonschema? Perhaps using https://github.com/supabase/pg_jsonschema - is using other postgres extensions supported in FerretDB? (if not, maybe it's feasible to incorporate the code of pg_jsonschema in FerretDB?)

aleksi · on April 15, 2023

Yeah, that's planned: https://github.com/FerretDB/FerretDB/issues/77

packetlost · on April 12, 2023

The only problem I have with PostgreSQL's JSONB type is it strictly implements the JSON standard, so things like nulls and integers aren't supported. Depending on your data, this could be a real problem.

aleksi · on April 12, 2023

FWIW, JSONB supports JSON nulls, and all numbers are encoded as `numeric` type, storing integers and float64 values precisely (well, as much as possible for float64 values).

But your comment is correct for other data types like date-times and binary strings where some form of encoding/decoding is needed.

packetlost · on April 13, 2023

Yeah, I was misremembering. Our issues were related to NaNs, which we ended up encoding as nulls. I don't think we've had issues with dates yet.
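A minimal sketch of the encoding step discussed in this subthread, using only the Python standard library. The helper `to_json_safe` and the sample document are hypothetical. Mapping NaN to null follows the approach described above; treating Infinity the same way, ISO 8601 strings for date-times, and base64 for binary are assumptions chosen for the example.

```python
import base64
import json
import math
from datetime import datetime, timezone


def to_json_safe(value):
    """Recursively convert values that plain JSON cannot represent."""
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return None  # NaN/Infinity are not valid JSON numbers
    if isinstance(value, datetime):
        return value.isoformat()  # date-times become ISO 8601 strings
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode("ascii")  # binary becomes text
    if isinstance(value, dict):
        return {key: to_json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(item) for item in value]
    return value


doc = {
    "reading": float("nan"),
    "missing": None,
    "count": 3,
    "at": datetime(2023, 4, 12, tzinfo=timezone.utc),
    "blob": b"\x00\x01",
}

# allow_nan=False makes json.dumps raise ValueError instead of emitting NaN.
encoded = json.dumps(to_json_safe(doc), allow_nan=False)
decoded = json.loads(encoded)
```

Note that the NaN-to-null mapping is lossy: after decoding, `reading` and `missing` are indistinguishable.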

udp · on April 12, 2023

The json standard has nulls

dang · on April 12, 2023

FerretDB: A truly open-source MongoDB alternative - https://news.ycombinator.com/item?id=29448906 - Dec 2021 (110 comments)

MangoDB has a new name - https://news.ycombinator.com/item?id=29407987 - Dec 2021 (5 comments)

Open source MongoDB drop-in replacement, built on top of Postgres - https://news.ycombinator.com/item?id=29096331 - Nov 2021 (2 comments)

MangoDB: An open-source MongoDB alternative - https://news.ycombinator.com/item?id=29071623 - Nov 2021 (200 comments)

Show HN: MangoDB - https://news.ycombinator.com/item?id=4139723 - June 2012 (2 comments)

yawnxyz · on April 13, 2023

Kind of cool that MangoDB started out as a joke and ended up as a real, solid project

ahachete · on April 12, 2023

I applaud this effort.

Many years ago I founded a project called ToroDB [1]. ToroDB had a vision very similar to that of FerretDB's: help MongoDB users feel at home on Postgres. This has far reaching implications, like allowing MongoDB applications to run without MongoDB (this is what FerretDB is essentially and what "ToroDB Server" was meant to be) or to replicate data from MongoDB to Postgres to improve the performance of analytical queries by several orders of magnitude (that was "ToroDB Stampede").

ToroDB ended up being discontinued. Timing was not right. At the time, NoSQL was exploding, and users "didn't want to look back to SQL" --until they learned the notable advantages, but it was a time consuming and hard job. Today, there's a much higher acceptance of SQL and most recognize that data querying in many cases goes through, or is significantly helped, by SQL.

I wish FerretDB a successful road, and that it reaches where ToroDB didn't at the time. Good luck and congratulations on the 1.0 launch!

[1]: https://torodb.com

nacs · on April 12, 2023

Looks like they mentioned your DB in a (probably SEO-bait) blog post here:

https://blog.ferretdb.io/5-database-alternatives-mongodb-202...

peterfarkas · on April 12, 2023

FerretDB maintainer here. We consulted the ToroDB team before we started FerretDB, as we wanted to understand why they couldn't conquer the world. They were way ahead of their time. And what a cool logo!

satvikpendem · on April 12, 2023

As well, Postgres now has great support for jsonb and querying that data, so you can effectively use Postgres as if it were NoSQL, if you really wanted to.

kbd · on April 12, 2023

Ok, old man yelling at clouds moment finally coming for me. Now that we've been through the document-database heyday and are out the other end, what have we learned about where document databases are a good fit?

At the time I looked at them like a fad. "These script kiddies want to write javascript and ignore schemas. Let's see how well that works out for them." As expected, most of what I ever hear is regret.

Today, MongoDB has grown out of the reliability issues it had in the past, and Postgres has json features for the occasional times it's useful to store some loosely structured data along with otherwise relational data. Question is, what applications is a document-first database good for, outside of prototyping?

Edit: and to make sure I understand, FerretDB is a layer reimplementing MongoDB on top of Postgres and its json features?

peterfarkas · on April 12, 2023

Yes, FerretDB is a layer which implements the MongoDB wire protocol on top of Postgres. Right now we are using JSONB, but this affects performance and we need to depart from this strategy in the long run. We have an article which explains the concept [1].
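As a rough illustration of the general idea (a query arrives in MongoDB form and is answered from a JSONB column), here is a toy translator in Python. It is not FerretDB's implementation, which is not shown in this thread. The function name, the `data` column, the `users` table, and the psycopg-style `%s` placeholders are all assumptions, and only flat equality on string values is handled.

```python
def filter_to_sql(table, mongo_filter):
    """Translate a flat equality filter into SQL over a JSONB column.

    Only top-level equality on string values is handled; anything else raises.
    """
    if not table.isidentifier():
        raise ValueError("unexpected table name")
    clauses, params = [], []
    for field, expected in mongo_filter.items():
        if field.startswith("$") or not isinstance(expected, str):
            raise NotImplementedError(f"unsupported filter on {field!r}")
        # data->>'field' extracts the field as text; both the field name
        # and the expected value are passed as query parameters.
        clauses.append("data->>%s = %s")
        params.extend([field, expected])
    where = " AND ".join(clauses) or "TRUE"
    return f"SELECT data FROM {table} WHERE {where}", params


# Hypothetical MongoDB-style query: find({"city": "Berlin", "plan": "pro"})
sql, params = filter_to_sql("users", {"city": "Berlin", "plan": "pro"})
```

Anything beyond this trivial case (operators, nested fields, non-string types) is where the translation gets hard, which is consistent with the performance concern mentioned above.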

I wouldn't go into the document vs. relational argument, all arguments for and against would have merit. There are valid use cases for document databases (take e-commerce, for example), and we should not discount the fact that using a relational database is just more complicated. Using vanilla Postgres for a MongoDB use case will not be feasible for someone whose focus is, let's say, mobile application development. There is a reason behind MongoDB's popularity - it just provides a great developer experience. This is what we are aiming to recreate on top of Postgres.

[1]: https://blog.ferretdb.io/pjson-how-to-store-bson-in-jsonb/

yawnxyz · on April 13, 2023

if we use Ferret for prototyping and eventually land on a schema, would it be possible to convert the FerretDB rows into more structured postgres? Would be so cool if it could just analyze and create a schema for you, you double check it, and it just works.

I think being able to convert back and forth would make it so worthwhile!

aleksi · on April 13, 2023

Yeah, we want to do that: https://github.com/FerretDB/FerretDB/issues/226

Zababa · on April 12, 2023

I'm curious about what makes document databases a good fit for e-commerce. I've never worked in that industry.

tylerchurch · on April 13, 2023

Two things off the top of my head:

1. Denormalized data is often a boon. One of your key use cases is "give me all the information about this Product so I can show it to the user." Forget joining a ton of tables, just do 1 read of 1 product and go on your way.

2. A certain amount of more freeform data is expected. Different product categories will have different sets of information. T-shirts and drinkware both have sizes, but these sizes have nothing to do with each other. You can model all this in a traditional SQL database, but you have to stop and think really hard about it, and potentially end up with a plethora of tables. In a document database, it's much easier to just add the data and go on your way.

Zababa · on April 14, 2023

That makes a lot of sense, thanks!

CodesInChaos · on April 13, 2023

> we need to depart from this strategy in the long run

What's the alternative you want to move to?

aleksi · on April 13, 2023

Probably a custom PostgreSQL extension for BSON support.

The biggest issue right now is that MongoDB compares and sorts values differently than PostgreSQL/jsonb. For that reason, we have to do a lot of filtering on the FerretDB side, and that can't be great for performance. Pushing more work to the database side should make FerretDB perform much better.
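One concrete difference is the order of values of different types. The sketch below is simplified to four scalar JSON types; the rankings follow MongoDB's documented BSON comparison order and PostgreSQL's documented jsonb btree ordering as far as those types go. BSON-only types, arrays, objects, and same-type comparisons are left out, and the helper names are hypothetical.

```python
# Lowest to highest, scalar JSON types only.
MONGO_ORDER = ["null", "number", "string", "bool"]  # BSON comparison order
JSONB_ORDER = ["null", "string", "number", "bool"]  # jsonb btree ordering


def kind(value):
    """Classify a Python value as one of the four scalar JSON types."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must be checked before int
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def sort_by_type(values, order):
    """Sort mixed-type values by their type rank only."""
    return sorted(values, key=lambda value: order.index(kind(value)))


values = [True, "a", 2, None]
mongo_sorted = sort_by_type(values, MONGO_ORDER)
jsonb_sorted = sort_by_type(values, JSONB_ORDER)
```

Because numbers and strings swap places, an `ORDER BY` on the jsonb value cannot simply be passed through for a field that holds mixed types, so part of the work has to happen outside the database.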

aranchelk · on April 12, 2023

> Question is, what applications is a document-first database good for, outside of prototyping?

Some criteria I’d use:

* Data is already naturally segregated and won’t be shared/joined much during normal usage

* App already has transactions modeled in the application layer (or I guess if consistency really doesn’t matter)

* App would benefit from being geographically distributed.

* App is written in a language with a strong type system, has high code quality, test coverage, etc.

In my case, my company makes a distributed project management application. All changes to projects are canonically ordered and applied in the application layer.

Data is stored in Cloudflare Durable Objects and R2. Hard to classify DOs but they’re a lot closer to a document store than an SQL Db.

There are some nice benefits that would be hard to replicate with a traditional SQL Db setup.

mitjam · on April 12, 2023

A use case I have for FerretDB is migrating existing apps off MongoDB without needing to change the code. I find it funny that you could call these now „legacy“ apps.

peterfarkas · on April 12, 2023

Exactly. Let's say a major company has a few applications which require MongoDB, and the rest of their applications are running on Postgres.

With FerretDB, they can migrate the app off MongoDB as you said, and keep Postgres only. Therefore they don't need to maintain internal knowledge on how to run MongoDB, or pay for MongoDB to run it for them (in which case they are not in control of their data, because it is all under MongoDB's account...).

This is one real world user example.

ocdtrekkie · on April 13, 2023

Yeah, anxiously waiting for FerretDB to support Meteor over here.

peterfarkas · on April 13, 2023

FerretDB maintainer here: we are working on it. We are already getting reports of successful migrations with Meteor apps. [1]

Would you mind sharing the specific Meteor app you are looking to get supported?

MeteorJS is among the most requested applications/frameworks our users are looking to use with FerretDB.

[1]: https://twitter.com/CowboyCaramel/status/1646089964126347264

ocdtrekkie · on April 13, 2023

I'm a contributor to the Sandstorm.io project, and Mongo continues to present a bit of a pickle for us. We can arguably write our way out of one upgrade, but upgrading to a later Mongo just punts the issue again. We'd much rather just leave Mongo behind.

CodesInChaos · on April 13, 2023

What kind of problems does mongo cause for sandstorm that postgres does not?

ocdtrekkie · on April 13, 2023

The core of the issue is Mongo does not seem intended to be upgraded reliably without intervention. Sandstorm is running on thousands of servers where the admins aren't equipped to handle Mongo upgrade issues, as well as within some Sandstorm apps which also use Mongo inside containers not intended to be user serviceable.

One of the issues we hit is here: https://github.com/meteor/meteor/issues/11666 in which if you happened to have a Mongo database over eight years old (many Sandstorm servers have been deployed for that long!), you needed manual intervention to correct it, even if you had done intermediate version updates in between.

Meteor patched around this issue... but after dropping support for several releases of Mongo. So we essentially need to build our own automation which understands and can export old Mongo databases, and then import new Mongo databases, while shipping a Meteor app that can only run on one or the other, which has to auto-update smoothly, and recover from failure like if there isn't hard drive space to handle the process.

And then we also need to implement that within app sandboxes which can also arbitrarily terminate so that also has to recover well and we need to ship the logic to do this with every Mongo-backed app package until the end of time.

kiney · on April 12, 2023

Insert <look at me, I am the legacy app now> meme here

tracker1 · on April 12, 2023

Looks that way to me... I really liked a LOT about MongoDB, but I don't think they had a good story for administration of scale + redundancy. RethinkDB, Cassandra and others have a ring + redundancy model which I think is easier to deal with, whereas Mongo, at least, was limited to replication or sharding and was a serious pain to deal with from the admin side imo.

Understanding the advantages and shortfalls is sometimes harder too. Mongo having multiple secondary indexes is pretty nice as well. Haven't dug into FerretDB, my first question is if it will run over the top of CockroachDB, as that's how I would probably want it configured. Then, I'm not sure if I wouldn't just use the JSONB surface in PostgreSQL/CockroachDB directly.

I think it just really depends on how/what you need to accomplish.

quasilyte · on April 12, 2023

Congrats on v1.0! I remember the times when it was called MangoDB. :D

aleksi · on April 12, 2023

Yeah, those were easier times for sure :)

audioheavy · on April 12, 2023

This sounds like an interesting hybrid implementation, and I understand its motivation. However, if distributed writes at low latency with the highest consistency guarantees are essential, combining those two layers (Mongo and PostgreSQL) isn't worth the complexity. It looks like you can administer it with both Mongo and PostgreSQL tools, but do you want to? Clusters, partitioning, replication? That's a lot of heavy lifting.

To fully disclose, I have a biased view on this given that I work for a (closed source) serverless, no-ops DB provider (Fauna) that implements a distributed transaction engine that is natively document-relational and doesn't compromise on relational (ACID, transactional) guarantees. Although not directly comparable to an OSS DB, the market expects software and services to abstract as much complexity as possible. It is hard to imagine how such a Mongo/Postgres hybrid could handle hyper-scale apps that require highly consistent distributed writes.

peterfarkas · on April 12, 2023

Thanks for the feedback. Any managed Postgres-compatible database which offers autoscaling works with FerretDB. You don't have to manage the Postgres side if it is already managed [1].

One example would be Yugabyte which is not yet officially supported, but was tested by Yugabyte with FerretDB. Same goes for CockroachDB, which was also tested to some extent. Neon would work, too.

On the other hand, FerretDB may not be a solution for all possible use cases out there, but no database is.

[1]: https://dev.to/aws-heroes/ferretdb-yugabytedb-on-kubernetes-...

https://dzone.com/articles/migrating-mongodb-collections-to-...

https://dzone.com/articles/using-cockroachdb-as-a-backend-fo...

https://dzone.com/articles/experimenting-with-unique-constra...

https://dzone.com/articles/cockroachdb-multiregion-abstracti...

singeezie · on April 12, 2023

If FerretDB truly does provide ad-hoc queries, indexing, aggregation, and real-time data processing, making it well-suited for a wide range of applications, then sure, it’s a great alternative. I can see all types of web apps, even enterprise systems, working well with it.

peterfarkas · on April 12, 2023

FerretDB maintainer here.

We already offer basic support for aggregation and indexing, and we are adding as many features as we can, see our roadmap [1].

We are mainly building on our users' experience with running FerretDB and adding features to our roadmap accordingly. We are not aiming to implement the entire feature set of MongoDB, of course, but the majority of MongoDB workloads are not utilizing the full feature set, either.

[1]: https://github.com/orgs/FerretDB/projects/2/views/1

jzelinskie · on April 12, 2023

Congrats on the launch! Your GitHub Project board looks nice and clean. Do you have any posts or code for how your bot is managing issues there?

aleksi · on April 12, 2023

That’s easy, really. There was a person behind ferretdb-bot account – me. :) I still maintain our projects mostly manually.

That being said, we do have some automation in place. The public part is there: https://github.com/FerretDB/github-actions We are planning to do more there, open source the other part, and then blog about it.

brightball · on April 12, 2023

I’m still waiting for the NoSQL crowd to realize that the best use case was always handled by CouchDB.

yawnxyz · on April 13, 2023

I started picking up CouchDB and PouchDB and the DX is fantastic. Always wondered why there’s never been any hype or more examples around this stack?

brightball · on April 13, 2023

Mongo got popular at the same time and people cared more about top end benchmarks than anything else.

yawnxyz · on April 13, 2023

Apparently npm runs couchdb under the hood. There’s your answer to the “does it scale” question

manishsharan · on April 12, 2023

This looks great. I had been using the Document Layer for FoundationDB, but Apple has abandoned the Document Layer, so this looks promising.

richieartoul · on April 12, 2023

Tigris (https://github.com/tigrisdata/tigris) is one of the supported FerretDB backends and Tigris is backed by FoundationDB, so you can still have the Mongo interface with the reliability and scaling of FDB if that's what you're looking for.

0xbadcafebee · on April 13, 2023

But is it web scale?

exabrial · on April 13, 2023

Somewhere there's an actual implementation that copies everything to /dev/null, with the justification that if you're using mongodb, you probably don't care about your data anyway. Can't seem to find it.

aleksi · on April 13, 2023

I think you mean https://github.com/dcramer/mangodb

jchannon · on April 12, 2023

If it can run a replica set on an apple m1 which mongo can’t currently do then great!

vrtx0 · on April 12, 2023

Er, why can’t you run a MongoDB replica set on your M1?

I mean, I wouldn’t recommend running a whole ReplicaSet on a single host in production (it defeats the purpose), but for testing, I’ve run a sharded cluster w/ 3 replicas per shard…

Happy to try and help!

hartator · on April 13, 2023

Can you send me an email? Julien at/ serpapi.com

mathattack · on April 13, 2023

Probably a huge cost savings to turn off all those MongoDB nodes.

chunsj · on April 13, 2023

I don’t think it matters that much whether a license is OSI compliant or not. What matters more is whether it is a Free Software license, preferably something like GPL3.

bitwize · on April 13, 2023

Sigh. Again.

"Free software" is not coterminous with copyleft. It includes copyleft and non copyleft free licenses like BSD. Even Stallman would happily call BSD- or MIT-licensed stuff free software.

The OSD was carefully chosen to incorporate what was already known as free software.

tristan957 · on April 12, 2023

One thing that I didn't see discussed was why FerretDB uses Postgres as a backend and not a storage engine. Perhaps I am just missing something though.

PeterZaitsev · on April 12, 2023

Because PostgreSQL is a fantastic Open Source database engine, very popular, with a lot of operational experience and tooling support.

This is the Open Source way of doing things: having a project focus on as narrow a problem as possible (first) and leverage as many existing components as possible.

tristan957 · on April 12, 2023

Ok, that makes sense. I guess this project is targeting a compatible frontend for multiple backends. When I saw MongoDB alternative, I thought they were building both sides of the coin.

rmangi · on April 12, 2023

Cramming an API for one product on top of another is absolutely not “the open source way”. I’d expect a much better explanation for this Frankenstein’s monster. This seems like it would work but never scale.

themerone · on April 13, 2023

Just a few of the things you need to build if you don't do it this way. 1. A storage layer, 2. A network protocol with clients and servers, 3. A replication system, 4. an authentication system. 5. Administrative tools.

If you roll everything yourself, you pretty much have to reinvent everything Postgres has except the SQL and relational aspects.

djha-skin · on April 13, 2023

When MongoDB came out it was sort of the anti-SQL database, so it is interesting that this flavor of the same kind of database is backed by postgreSQL.

coding123 · on April 13, 2023

How compatible is it? Supposedly the mongo drivers can connect to it. Can you connect the Spark mongo connector to it?

vorticalbox · on April 12, 2023

Does ferret support transactions and document write locking?

peterfarkas · on April 12, 2023

Our development roadmap [1] is based on the needs of the users and the software we are focusing on building compatibility with.

Interestingly, transactions are not commonly used and therefore it is not high on our priority list, it may take months until we get to it. Nevertheless, if enough users would request it, we would reconsider. The same goes for any other item on the roadmap, please check it out.

[1]: https://github.com/orgs/FerretDB/projects/2

hartator · on April 13, 2023

Any benchmarks against regular MongoDB?

josevalerio · on April 12, 2023

Another day on the orange site where people don't understand that you can 100% have relationships in a NoSQL database - just don't try to do it the same way you do in a "relational" one or you're going to have a bad time. Storing everything in one collection is critical.

Here is an example using multi key indexes with Rick Houlihan from re:Invent 2022: https://youtu.be/eEENrNKxCdw?t=1131

that_guy_iain · on April 12, 2023

I’m confused, to me it seems that MongoDB is also open-source. Their new license is basically just a better agpl. I only point this out because the blog post makes a point of saying it’s not open source.

peterfarkas · on April 12, 2023

By definition, SSPL is not an open-source license [1].

One could argue that the SSPL brings justice to open-source, as cloud providers will not be able to monetize a project without giving anything back. SSPL forces them to pay a license fee.

On the other hand, this also creates a vendor lock-in situation for the user, simply because not all service providers will be able to negotiate a deal with the developer of the SSPL-licensed product. This limits choice, limits competition between providers, among other things.

This license fee may also be increased at the sole discretion of the developer, and the increased fee will be paid by the user in the end. This all doesn't sound open-source to me at all.

[1]: https://blog.opensource.org/the-sspl-is-not-an-open-source-l...

vrtx0 · on April 12, 2023

MongoDB’s source code is still freely available. It’s still actively developed in the open on GitHub. Unless you’re offering MongoDB as a service, it’s just as “open source” as ever.

If you want to offer MongoDB as a service, you can still do so free of charge, as long as the service infrastructure is also made openly available, right? And if you don’t want to make the source available, you can purchase a license and do so, right?

Furthermore, if you’re using MongoDB, be it self-hosted, with a vendor, or even some proprietary database that implements the wire protocol for compatibility, you’re likely using MongoDB-developed clients/drivers, which are Apache 2.0 (“OSI-approved open source”).

So it seems like the only way to be locked into a vendor is if you’re using MongoDB drivers to connect with a 3rd party database that doesn’t fully implement all functionality of MongoDB in a compatible way… Right?

I could be wrong, but as someone who contributed to MongoDB as an open source project, and was later hired by MongoDB based on said contributions, it kinda hurts to see the OSI’s “MongoDB isn’t open source anymore” campaign work so well.

That said, I sincerely wish the team behind this project all the best!

My complaints aren’t against anyone in the open source communities I’ve known and loved. Just this self-important legal organization that acts like it controls (and even gets to define) open source software.

P.S. I left MongoDB in 2015 due to a neurological disability, but it was one of the highlights of my career, with so many kind and brilliant people. But it’s also been a while, and my brain doesn’t work so well these days, so please correct me if I got anything wrong!

zokier · on April 12, 2023

> If you want to offer MongoDB as a service, you can still do so free of charge, as long as the service infrastructure is also made openly available, right?

The license text is worded such that it is basically impossible to comply with.

> OSI’s “MongoDB isn’t open source anymore” campaign work so well.

It is hardly OSI's campaign. Pretty much all major organizations involved in FOSS licensing have rejected the SSPL. For example, this is Fedora's stance:

> Fedora considers the Server Side Public License (v1) to be a Non-Free license. It is the belief of Fedora that the SSPL is intentionally crafted to be aggressively discriminatory towards a specific class of users. Additionally, it seems clear that the intent of the license author is to cause Fear, Uncertainty, and Doubt towards commercial users of software under that license. To consider the SSPL to be "Free" or "Open Source" causes that shadow to be cast across all other licenses in the FOSS ecosystem, even though none of them carry that risk.

ahachete · on April 12, 2023

> "Unless you’re offering MongoDB as a service, it’s just as “open source” as ever."

It's not. It's source available, and that's still proprietary.

The reason why the SSPL is not Open Source is pretty simple: it doesn't convey the four essential freedoms of free software [1]. That's it, there is essentially no more to it: it imposes usage restrictions, and that goes against the spirit.

Whether it is "except this or that" or "AGPLv3 but with this additional clause" is irrelevant: even the tiniest change can cause significant differences, and this is the case here: it removes the freedom to run the program as you want, as it imposes restrictions on some use cases. And these restrictions go beyond the realm of the software itself.

In contrast, Open Source copyleft licenses, like AGPLv3, never go beyond the software itself. They provide guarantees that modified versions of the software also remain available to users of the modified software (forward-carrying guarantees) but do not add a requirement to also provide other unrelated software under the same license (which is essentially a nice way of saying "simply don't do this", turning de facto into a usage restriction).

[1]: https://www.gnu.org/philosophy/free-sw.en.html#four-freedoms

PeterZaitsev · on April 12, 2023

"You can do so as soon as you open source infrastructure" is impractical and misleading. It is very likely that part of the infrastructure will be commercially licensed, so even if one wanted to open source it, it would not be possible.

MongoDB specifically switched from an Open Source license to the SSPL to create a monopoly in the DBaaS space. IT is a business decision, so let's not pretend here.

If you look at real Open Source software, it is created for cooperation and innovation together, not monopoly.

vrtx0 · on April 12, 2023

Er, if you develop the infrastructure to host MongoDB, you should absolutely be able to open-source that infrastructure. I mean, before MongoDB, I wrote a cluster management system for virtualized software security and hypervisor research, and all of it was either open source or something I wrote…

Also, if you bought something closed-source to sell MongoDB as a service, why isn’t it realistic to buy a license?

Your suggestion of a monopoly in the DBaaS space seems to preclude the existence of other databases… Or am I misunderstanding?

I’m not sure what you mean by “IT is a business decision” — could you elaborate?

-edit- P.S. I’m trying to be supportive here; not trying to take anything away from what you’ve built with FerretDB! Honestly, there’s so much room for innovation in this domain, and it’s nice to see new projects…

zokier · on April 12, 2023

> Also, if you bought something closed-source to sell MongoDB as a service, why isn’t it realistic to buy a license?

Let's say you are running your infra on any cloud provider. Do you think it's realistic to get them to hand out their source code?

vrtx0 · on April 13, 2023

The cloud provider wouldn’t have to, if you are the one running your infra. The SSPL restrictions only apply to businesses that offer MongoDB as a service.

In fact, you can build a similar service and offer it within your organization (and subsidiaries), and you still don’t have to release anything. The license only applies to companies like Amazon if they offer MongoDB as their own service (DocumentDB).

I know that’s a bit tangential, but hope that helps a bit?

that_guy_iain · on April 12, 2023

Personally, that blog post is just arrogance from OSI. I read it. It didn’t make clear which freedoms were being removed. They are still able to run cloud hosting just release their custom code?

And OSI thinking they alone get to decide what is and what is not open source is arrogant.

peterfarkas · on April 12, 2023

While I do respect your opinion, I disagree.

I think that arrogance is when a single vendor tries to single-handedly redefine open-source to fit their business needs better. Not just a license, but the definition itself.

A quote from the MongoDB CEO: "MongoDB was built by MongoDB. There was no prior art. We didn't open source it for help; we open sourced it as a freemium strategy". [1]

Whether the OSI was arrogant or not, I really don't want this person to define open source.

[1]: https://techmonitor.ai/leadership/strategy/mongodb-ceo-inter...

that_guy_iain · on April 12, 2023

I am not really sure what your issue is. What, you think open source as freemium isn’t a model?

If it wasn’t for copy-left licenses that forced companies to release code that they used to create their products we wouldn’t have Linux in its current state.

In a world where many products are cloud-based, it seems fair and within the current model of open source to force code sharing.

aleksi · on April 12, 2023

> They are still able to run cloud hosting just release their custom code?

The SSPL license reads:

> […] you must make the Service Source Code available via network download to everyone at no charge, under the terms of this License.

> “Service Source Code” means the Corresponding Source for the Program or the modified version, and the Corresponding Source for all programs that you use to make the Program or modified version available as a service, including, without limitation, management software, user interfaces, application program interfaces, automation software, monitoring software, backup software, storage software and hosting software, all such that a user could run an instance of the service using the Service Source Code you make available.

You just can't comply with those terms if you don't have access to the source code of your storage software, for example.

(but I'm not a lawyer, of course)

> And OSI thinking they alone get to decide what is and what is not open source is arrogant.

OSI invented the term “open source”: https://web.archive.org/web/20021001164015/http://www.openso... I think they are in a position to define what it means.

vrtx0 · on April 12, 2023

No, the term “open source” was in use long before the OSI, and it was in popular/hacker culture in the 1980s. UNIVAC used it for a major system in the 1950s. [1] I used it in 1993 (I wrote a small BBS).

The OSI looks and sounds like an authority on open source software, but their entire strategy is legal, political and quasi-philosophical. I get how easy it is to be misled by them though; they’re good at spinning things and rewriting history.

[1]: https://en.m.wikipedia.org/wiki/History_of_free_and_open-sou...

PeterZaitsev · on April 12, 2023

There are 3 "competing" Open Source and Free Software definitions: from OSI, the Free Software Foundation and Debian. MongoDB does not match any of them and, most importantly, does not match the spirit of the Open Source Software Movement.

vrtx0 · on April 12, 2023

I could name more, but let me clarify something. I’m not a fan of the SSPL, but I get why it was necessary.

It was rough seeing huge cloud providers profit off open source projects without giving anything back. When they offered competing hosting services with no value added (well, past “integrated billing”), no contributions or innovation, and drove their new customers to the documentation and libraries of the companies backing these projects, they crossed a huge line.

And it’s not just MongoDB. Or Elastic. Just look at all the “services” AWS offers, and note how many AWS actually invented or even contributed to…

Monopolistic practices forced a lot of companies to either shut down, or find a way to survive. I’m glad MongoDB decided to use the SSPL instead of shut down like so many others. I’m glad they’ve continued to thrive.

Changing to the SSPL isn’t ideal, but it only impacts people who want to sell hosted versions of the software (not users, self-hosted or otherwise). For those infinitesimal few selling hosted versions of the software, it doesn’t even stop them from doing what they want — it just stopped the monopolies from destroying something a lot of people dedicated a lot of effort to... That seems like a pretty amazing feat to me, given the reality...

I wish the OSI wasn’t so successful painting users of the SSPL as somehow betraying the open source community. And I wish the SSPL wasn’t necessary. But until there’s a better option, I’m ok with the SSPL…

Again, I say this with all due respect, and this is just my opinion. Corrections and new perspectives welcome!

PeterZaitsev · on April 13, 2023

Well, in my opinion this is exactly how Open Source is supposed to work: you get the benefit of collaboration on writing the code and of a broad community promoting your software, and you give more value to your customers because they have a choice of vendors rather than single-vendor lock-in. But since you give up on having a monopoly, it is quite possible someone else will make more money than you on your product.

You mention Elastic - do not forget it was built on top of Lucene, capturing most of the value in that project.

It DOES very much impact users, because users increasingly want a DBaaS experience, and if the only one you can get is through MongoDB or a MongoDB-authorized partnership, it is really no different from proprietary software.

In any case, I agree that for certain users SSPL is just as good as Open Source; the same, however, can be said about proprietary software: some who just "buy a subscription" do not care.

that_guy_iain · on April 12, 2023

Please state in what way they don’t? This is what I’m confused about. AGPL is open source but SSPL isn’t? They seem to be aimed at solving the same thing, which is cloud/server-based code modifications.

peterfarkas · on April 12, 2023

SSPL goes way beyond AGPL as it contains an additional clause called "Offering the Program as a Service". It is not defined at all what constitutes providing MongoDB as a service. How many layers are needed to abstract MongoDB in a way that it doesn't trigger this clause? There are no answers for that in the SSPL. It comes with an enormous amount of risk compared to AGPL. There is a good article from Dor Laor at ScyllaDB which explains this in detail [1].

Moreover, cloud providers are not limited to AWS, Azure and GCP. Smaller providers whom we talked to are not able to negotiate licensing terms with MongoDB the same way as how AWS could. For this reason, these providers are not able to provide MongoDB as a service. Yes, it's great for MongoDB that they were stopped from providing MongoDB for free, but now they can't provide the service at all. This limits competition and choices, and that is never in the favor of users.

[1]: https://www.scylladb.com/2018/10/22/the-dark-side-of-mongodb...

that_guy_iain · on April 12, 2023

AGPL is basically the same thing, except far more reaching. AGPL, to my understanding, makes it a requirement if you allow people to use the software via network levels. “Program as a Service” seems a lot more limited in scope.

All new licenses come with risk, since how they are interpreted can only be settled by court decisions.

aleksi · on April 13, 2023

> AGPL, to my understanding, makes it a requirement if you allow people to use the software via network levels.

AGPL requirements do not trigger if you don't modify the source code. The relevant text is in section 13:

> […] if you modify the Program, your modified version must prominently offer all users […]