At one company we had one-off requests to get some statistics from our MongoDB.
Writing the proper queries proved pretty tricky whenever GROUP BY/JOINs were involved, so I used online converters from SQL to Mongo queries.
But then I realized that PostgreSQL has a nice JSONB type (it supports indexes on subfields). So I put all the Mongo data into tables with a single JSONB column (or two columns, id+data, if you prefer).
It turned out that analytical queries ran much faster that way :)
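For anyone curious what the id+data layout looks like in practice, here is a minimal sketch. It uses Python's built-in sqlite3 module and SQLite's json_extract purely as a stand-in so that it runs without a server; in PostgreSQL the data column would be JSONB and the field access would be written as data->>'country' rather than json_extract. The table names, field names, and sample documents are all made up for illustration.

```python
import json
import sqlite3

# Hypothetical documents, as they might have been exported from two collections.
orders = [
    {"_id": 1, "customer_id": "c1", "status": "paid", "total": 30},
    {"_id": 2, "customer_id": "c1", "status": "paid", "total": 20},
    {"_id": 3, "customer_id": "c2", "status": "refunded", "total": 15},
]
customers = [
    {"_id": "c1", "country": "DE"},
    {"_id": "c2", "country": "FR"},
]

conn = sqlite3.connect(":memory:")
for table, docs in (("orders", orders), ("customers", customers)):
    # One table per collection, one row per document: id + data.
    conn.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY, data TEXT NOT NULL)")
    conn.executemany(
        f"INSERT INTO {table} (id, data) VALUES (?, ?)",
        [(str(d["_id"]), json.dumps(d)) for d in docs],
    )

# JOIN + GROUP BY across the two "collections" in a single ad-hoc query.
rows = conn.execute(
    """
    SELECT json_extract(c.data, '$.country') AS country,
           COUNT(*) AS n_orders,
           SUM(json_extract(o.data, '$.total')) AS revenue
    FROM orders AS o
    JOIN customers AS c
      ON c.id = json_extract(o.data, '$.customer_id')
    GROUP BY country
    ORDER BY country
    """
).fetchall()
print(rows)
```

The application can keep reading and writing whole documents by id, while one-off reports get ordinary SQL.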
I would recommend against modeling too many relations in a document DB. It can work, but it gets bogged down very quickly for the reasons you've stated.
My off-the-cuff suggestion is that document databases are not built to model normalized relational data. If you try to do so, you bring a whole new level of pain upon yourself. It is do-able, but it is hard and annoying.
I know this from extensive personal experience dealing with a large, highly relational Mongo database.
I am very glad you found a solution within Postgres that works for you, and your mapping of document to row and collection to table is very apt!
If possible, would you care to tell me what the size of said documents (rows) and collections (tables) are in your solution? I am curious if in another life, we might have built our tech stack on Postgres instead of MongoDB and been much happier for doing so.
We had everything de-normalized, as you pointed out.
However, every now and then we had some one-off questions that required using Mongo for something it's not designed for. Migrating to Postgres+JSONB made it easy to do both.
We've been on a big "Use postgres for everything" path. Mostly migrating dynamodb, elasticsearch, and redis to postgres.
The number of times we've had to do some one-off query that was orthogonal to production usage is higher than I predicted. There were so many times in the past where we just didn't fix something because it was too difficult to backfill a document store. We just hacked around it.
Now with postgres we fix the root of the problem instead of hacking around it. And we do so with a level of confidence we've never had with document stores.
Joins only make sense in an analytical context, as a tool to get some additional data into your report, and almost never as a domain modeling concept. People should hardly ever use an RDBMS for their domain data... but since everyone learns an RDBMS as their first DB, we have these horrible ORM frameworks on every corner.
They are powerful. But often their power is unnecessary and counterproductive for domain modeling.
Where they are really good is reporting. Analysts love RDBMSs. When I am in an analyst role, I love them too. As an engineer I find them redundant.
Yeah, I've kind of railed against ORMs for a while now... I'd just as soon use a simple data mapper (like Dapper for C#) or be really explicit in a scripted language with template queries.
If there were JOINs, it may be a sign MongoDB was not the best DB for you. It means the model was relational and a relational DB would be a better fit. MongoDB lookups are discouraged in general, and especially in analytical workloads, which cover a lot of data. MongoDB is better for scenarios where everything you need is already in the document.
It may also mean poor schema design. You can absolutely model relational data in MongoDB without relying on joins. If you want to normalize your data don't pick Mongo.
Proper NoSQL design takes a different perspective: it asks you, "well, so what if you have an anomaly?" One example could be movies, with actors, producers, genres, etc. that could all be in separate tables in a relational database, or each movie could be a document in a document database. Now let's imagine that an actor changes their name a few years after the movie is released. Is it important to go back and change the name of the actor in each of the movies? Maybe, maybe not. Certainly you can't change the credits inside the movie itself. Maybe it's sufficient to just have a link to the actor's page, where the actor's up-to-date name is. Maybe it's insufficient if the actor will get upset that their name wasn't updated across their old movies.
You choose relational models when anomalies are unacceptable and non-relational models when anomalies are acceptable.
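To make the movie example concrete, here is a small sketch in plain Python (dicts standing in for tables and documents; the titles and names are invented). It only illustrates the update anomaly itself, not how any particular database behaves.

```python
# Normalized ("relational") shape: movies reference actors by id.
actors = {"a1": {"name": "Old Name"}}
movies_normalized = [
    {"title": "Movie One", "actor_ids": ["a1"]},
    {"title": "Movie Two", "actor_ids": ["a1"]},
]

# Denormalized ("document") shape: each movie embeds a copy of the name.
movies_embedded = [
    {"title": m["title"], "cast": [actors[a]["name"] for a in m["actor_ids"]]}
    for m in movies_normalized
]

# The actor changes their name: a single write in the normalized shape.
actors["a1"]["name"] = "New Name"


def cast_normalized(movie):
    """Resolve names at read time (the 'join')."""
    return [actors[a]["name"] for a in movie["actor_ids"]]


normalized_view = [cast_normalized(m) for m in movies_normalized]
embedded_view = [m["cast"] for m in movies_embedded]

print(normalized_view)  # every movie sees the new name
print(embedded_view)    # every embedded copy still holds the old name
```

Whether the stale copies are a bug or simply "the credits as released" is exactly the judgment call described above.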
Not sure if this is exactly what you're referring to, but my understanding is that picking the "right" schema for a document database to ensure that you don't end up with slower queries like mentioned elsewhere in the thread tends to benefit from thinking at a somewhat lower level of granularity than you would probably need to with a relational database. Instead of just identifying "one to one", "one to many", "many to many", it's useful to ask questions like "how 'many' is many"; as a simple example, if the "many" in "one to many" is on the order of 10, maybe it makes sense to embed them in an array rather than use a separate collection for them. It can also help to start from thinking about the types of queries you might want and then designing the schema based on them rather than starting by deciding on the schema and then having the queries be based on that; if you're going to want certain data to be accessed at the same time, you're probably going to find some way to store it together.
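A rough sketch of the two shapes being weighed here, again with plain Python dicts standing in for collections (the post/comment data is hypothetical). When the "many" side is small and always read together with its parent, embedding turns the page load into one lookup; the referenced shape needs a second step.

```python
# Embedded: a post with a small, bounded number of comments.
posts_embedded = {
    "p1": {
        "title": "Hello",
        "comments": [
            {"by": "ann", "text": "Nice"},
            {"by": "bob", "text": "+1"},
        ],
    },
}

# Referenced: comments live in their own collection and point at the post.
posts = {"p1": {"title": "Hello"}}
comments = [
    {"post_id": "p1", "by": "ann", "text": "Nice"},
    {"post_id": "p1", "by": "bob", "text": "+1"},
    {"post_id": "p2", "by": "cat", "text": "Other post"},
]


def page_embedded(post_id):
    """One lookup returns everything the page needs."""
    return posts_embedded[post_id]


def page_referenced(post_id):
    """Two steps: fetch the post, then find its comments."""
    post = dict(posts[post_id])
    post["comments"] = [
        {"by": c["by"], "text": c["text"]}
        for c in comments
        if c["post_id"] == post_id
    ]
    return post
```

Both functions return the same page; the difference is how much work each read does, which is why the expected size of the "many" side and the queries you plan to run should drive the choice.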
agreed, you can have reference fields to other collections and filter by them or query related data. But I wouldn't say it's designed for relational workloads. Similarly, I wouldn't use MongoDB for graph queries, even though it has an operator for just that.
> agreed, you can have reference fields to other collections and filter by them or query related data.
Yes, it has that, but that's an anti-pattern. What I'm referring to is schema design. In Mongo you denormalize and store relational data in the same document as references. In SQL a record might be split up into 3 tables; in Mongo that should all live in the same document. If you can't embed and need lookups, you should avoid Mongo and other NoSQL DBs.
Should it be used as a database in the way that Postgres would be used as a database? It seems pretty unfit for that role, with its past durability issues and that writes have to be tailored to how the data will be consumed, due to lack of ad-hoc queries.
As something layered on top of the actual source of truth it may be reasonable, writing a materialized version of a costly query that's read very often, sure, but that's an optimization and it'd be competing with Redis for what matters there.
No, our app didn't use JOINs for the main work. But, as I wrote, every now and then we had some one-off requests that required either JOIN or GROUP BY or both.
With postgres+JSONB we could do both at the same time on live data.
Yep, although for large datasets I'd prefer a multi-master architecture, depending on what is needed: for example, blob storage or Cassandra for key-value, and BigQuery or ClickHouse for analytics.
We also thought we wouldn't need it, but it turned out that sometimes we wanted to run some stats.
Since FerretDB is built on top of Postgres, could it be an extension? So that one can mix some FerretDB tables and some general SQL tables in the same database (with proper foreign key constraints between them etc)
Yes, building an extension is something we are looking at, and I am glad to hear that you would be interested in this kind of mixed use case. Would definitely make things interesting, and more performant.
What about optionally validating some columns with jsonschema? Perhaps using https://github.com/supabase/pg_jsonschema - is using other postgres extensions supported in FerretDB? (if not, maybe it's feasible to incorporate the code of pg_jsonschema in FerretDB?)
The only problem I have with PostgreSQL's JSONB type is it strictly implements the JSON standard, so things like nulls and integers aren't supported. Depending on your data, this could be a real problem.
FWIW, JSONB supports JSON nulls, and all numbers are encoded as `numeric` type, storing integers and float64 values precisely (well, as much as possible for float64 values).
But your comment is correct for other data types, like date-times and binary strings, where some form of encoding/decoding is needed.
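A quick sketch of the distinction, using Python's standard json module as a stand-in for what any JSON-based storage has to deal with (this is not PostgreSQL code, and the ISO 8601 / Base64 conventions below are just one common choice, not something JSONB prescribes). Nulls and integers survive a round trip unchanged; date-times and bytes need an agreed encoding.

```python
import base64
import json
from datetime import datetime, timezone

doc = {
    "missing": None,            # JSON null
    "count": 9007199254740993,  # 2**53 + 1, not exactly representable as float64
    "ratio": 0.5,
}
round_tripped = json.loads(json.dumps(doc))

# Types with no JSON representation must be encoded by convention.
created = datetime(2023, 4, 11, 12, 0, tzinfo=timezone.utc)
payload = b"\x00\xffraw"
encoded = json.dumps(
    {
        "created": created.isoformat(),                        # ISO 8601 string
        "payload": base64.b64encode(payload).decode("ascii"),  # Base64 string
    }
)

# The reader has to know the convention to get the original values back.
decoded = json.loads(encoded)
created_back = datetime.fromisoformat(decoded["created"])
payload_back = base64.b64decode(decoded["payload"])
```

The catch is that the database itself only sees strings for the last two fields, so comparisons and indexes on them work on the encoded form unless you cast explicitly.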
Many years ago I founded a project called ToroDB [1]. ToroDB had a vision very similar to that of FerretDB's: help MongoDB users feel at home on Postgres. This has far reaching implications, like allowing MongoDB applications to run without MongoDB (this is what FerretDB is essentially and what "ToroDB Server" was meant to be) or to replicate data from MongoDB to Postgres to improve the performance of analytical queries by several orders of magnitude (that was "ToroDB Stampede").
ToroDB ended up being discontinued. Timing was not right. At the time, NoSQL was exploding, and users "didn't want to look back to SQL" --until they learned the notable advantages, but it was a time consuming and hard job. Today, there's a much higher acceptance of SQL and most recognize that data querying in many cases goes through, or is significantly helped, by SQL.
I wish FerretDB a successful road and reach to where ToroDB didn't reach at the time. Good luck and congratulations on the 1.0 launch!
FerretDB maintainer here. We consulted the ToroDB team before we started FerretDB, as we wanted to understand why they couldn't conquer the world. They were way ahead of their time. And what a cool logo!
As well, Postgres now has great support for jsonb and querying that data, so you can effectively use Postgres as if it were NoSQL, if you really wanted to.
Ok, old man yelling at clouds moment finally coming for me. Now that we've been through the document-database heyday and are out the other end, what have we learned about where document databases are a good fit?
At the time I looked at them like a fad. "These script kiddies want to write javascript and ignore schemas. Let's see how well that works out for them." As expected, most of what I ever hear is regret.
Today, MongoDB has grown out of the reliability issues it had in the past, and Postgres has json features for the occasional times it's useful to store some loosely structured data along with otherwise relational data. Question is, what applications is a document-first database good for, outside of prototyping?
Edit: and to make sure I understand, FerretDB is a layer reimplementing MongoDB on top of Postgres and its json features?
Yes, FerretDB is a layer which implements the MongoDB wire protocol on top of Postgres. Right now we are using JSONB, but this affects performance and we need to depart from this strategy in the long run.
We have an article which explains the concept [1].
I wouldn't go into the document vs. relational argument; arguments for and against all have merit. There are valid use cases for document databases (take e-commerce, for example), and we should not discount the fact that using a relational database is just more complicated. Using vanilla Postgres for a MongoDB use case will not be feasible for someone whose focus is, let's say, mobile application development. There is a reason behind MongoDB's popularity: it just provides a great developer experience. This is what we are aiming to recreate on top of Postgres.
If we use Ferret for prototyping and eventually land on a schema, would it be possible to convert the FerretDB rows into more structured Postgres? Would be so cool if it could just analyze and create a schema for you, you double check it, and it just works.
I think being able to convert back and forth would make it so worthwhile!
1. Denormalized data is often a boon. One of your key use cases is "give me all the information about this Product so I can show it to the user." Forget joining a ton of tables, just do 1 read of 1 product and go on your way.
2. A certain amount of more freeform data is expected. Different product categories will have different sets of information. T-shirts and drinkware both have sizes, but these sizes have nothing to do with each other. You can model all this in a traditional SQL database, but you have to stop and think really hard about it, and potentially end up with a plethora of tables. In a document database, it's much easier to just add the data and go on your way.
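As a tiny illustration of both points, here is a sketch with a Python dict standing in for a product collection (product ids and attribute names are invented). Each product carries whatever attributes make sense for its category, and rendering a product page is a single read.

```python
products = {
    "tshirt-1": {
        "name": "Logo Tee",
        "price": 20,
        "attributes": {"size": "M", "color": "black"},
    },
    "mug-1": {
        "name": "Travel Mug",
        "price": 12,
        # "Size" means something entirely different for drinkware.
        "attributes": {"size_ml": 350, "dishwasher_safe": True},
    },
}


def product_page(product_id):
    """A single read returns everything needed to render the product."""
    return products[product_id]


print(product_page("mug-1"))
```

In a normalized SQL schema the same data would typically need either per-category tables or a generic attribute table, plus the joins to put a product back together.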
Probably a custom PostgreSQL extension for BSON support.
The biggest issue right now is that MongoDB compares and sorts values differently than PostgreSQL/jsonb. For that reason, we have to do a lot of filtering on the FerretDB side, and that can't be great for performance. Pushing more work to the database side should make FerretDB perform much better.
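To give a feel for the mismatch, here is a simplified sketch that sorts the same mixed-type values by type bracket only. The two orderings are my reading of the MongoDB and PostgreSQL documentation, restricted to a few JSON-representable types (arrays are left out because MongoDB treats them specially when sorting); treat them as an assumption to verify, and note that this is not FerretDB's code.

```python
def type_name(value):
    # bool must be checked before int: in Python, bool is a subclass of int.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Type brackets from lowest to highest (assumed, see the note above).
MONGO_ORDER = ["null", "number", "string", "object", "boolean"]
JSONB_ORDER = ["null", "string", "number", "boolean", "object"]


def sort_by_type(values, order):
    """Sort values of different types by their position in the given bracket order."""
    return sorted(values, key=lambda v: order.index(type_name(v)))


values = [True, "a", {"k": 1}, 1, None]
mongo_sorted = sort_by_type(values, MONGO_ORDER)
jsonb_sorted = sort_by_type(values, JSONB_ORDER)

print(mongo_sorted)
print(jsonb_sorted)
```

Because the two results disagree, an ORDER BY or a range comparison on a jsonb column cannot simply be passed through when the field holds mixed types, which is why that work currently ends up in the proxy layer.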
> Question is, what applications is a document-first database good for, outside of prototyping?
Some criteria I’d use:
* Data is already naturally segregated and won’t be shared/joined much during normal usage
* App already has transactions modeled in the application layer (or I guess if consistency really doesn’t matter)
* App would benefit from being geographically distributed.
* App is written in a language with a strong type system, has high code quality, test coverage, etc.
In my case, my company makes a distributed project management application. All changes to projects are canonically ordered and applied in the application layer.
Data is stored in Cloudflare Durable Objects and R2. Hard to classify DOs but they’re a lot closer to a document store than an SQL Db.
There are some nice benefits that would be hard to replicate with a traditional SQL Db setup.
A use case I have for FerretDB is migrating existing apps off MongoDB without needing to change the code. I find it funny that you could call these now „legacy“ apps.
Exactly. Let's say a major company has a few applications which require MongoDB, and the rest of their applications are running on Postgres.
With FerretDB, they can migrate the app off MongoDB as you said, and keep Postgres only. Therefore they don't need to maintain internal knowledge on how to run MongoDB, or pay for MongoDB to run it for them (in which case they are not in control of their data, because it is all under MongoDB's account...).
I'm a contributor to the Sandstorm.io project, and Mongo continues to present a bit of a pickle for us. We can arguably write our way out of one upgrade, but upgrading to a later Mongo just punts the issue again. We'd much rather just leave Mongo behind.
The core of the issue is that Mongo does not seem intended to be upgraded reliably without intervention. Sandstorm is running on thousands of servers where the admins aren't equipped to handle Mongo upgrade issues, as well as within some Sandstorm apps which also use Mongo inside containers not intended to be user-serviceable.
One of the issues we hit is here: https://github.com/meteor/meteor/issues/11666 in which if you happened to have a Mongo database over eight years old (many Sandstorm servers have been deployed for that long!), you needed manual intervention to correct it, even if you had done intermediate version updates in between.
Meteor patched around this issue... but after dropping support for several releases of Mongo. So we essentially need to build our own automation which understands and can export old Mongo databases, and then import new Mongo databases, while shipping a Meteor app that can only run on one or the other, which has to auto-update smoothly, and recover from failure like if there isn't hard drive space to handle the process.
And then we also need to implement that within app sandboxes which can also arbitrarily terminate so that also has to recover well and we need to ship the logic to do this with every Mongo-backed app package until the end of time.
Looks that way to me... I really liked a LOT about MongoDB, but I don't think they had a good story for administration of scale + redundancy. RethinkDB, Cassandra and others have a ring + redundancy model which I think is easier to deal with, whereas Mongo, at least, was limited to replication or sharding, and was a serious pain to deal with from the admin side imo.
Understanding the advantages and shortfalls is sometimes harder too. Mongo having multiple secondary indexes is pretty nice as well. Haven't dug into FerretDB, my first question is if it will run over the top of CockroachDB, as that's how I would probably want it configured. Then, I'm not sure if I wouldn't just use the JSONB surface in PostgreSQL/CockroachDB directly.
I think it just really depends on how/what you need to accomplish.
This sounds like an interesting hybrid implementation, and I understand its motivation. However, if distributed writes at low latency with the highest consistency guarantees are essential, combining those two layers (Mongo and PostgreSQL) isn't worth the complexity. It looks like you can administer it with both Mongo and PostgreSQL tools, but do you want to? Clusters, partitioning, replication? That's a lot of heavy lifting.
To fully disclose, I have a biased view on this given that I work for a (closed source) serverless, no-ops DB provider (Fauna) that implements a distributed transaction engine that is natively document-relational and doesn't compromise on relational (ACID, transactional) guarantees. Although that is not directly comparable to an OSS DB, the market expects software and services to abstract as much complexity as possible. It is hard to imagine how such a Mongo/Postgres hybrid could handle hyper-scale apps that require highly consistent distributed writes.
Thanks for the feedback. Any managed Postgres-compatible database which offers autoscaling works with FerretDB. You don't have to manage the Postgres side if it is already managed [1].
One example would be Yugabyte which is not yet officially supported, but was tested by Yugabyte with FerretDB. Same goes for CockroachDB, which was also tested to some extent. Neon would work, too.
On the other hand, FerretDB may not be a solution for all possible use cases out there, but no database is.
If FerretDB truly does provide ad-hoc queries, indexing, aggregation, and real-time data processing, making it well suited for a wide range of applications, then sure, it’s a great alternative. I can see all types of web apps, even enterprise systems, working well with it.
We already offer basic support for aggregation and indexing, and we are adding as many features as we can, see our roadmap [1].
We are mainly building on our users' experience with running FerretDB and adding features to our roadmap accordingly.
We are not aiming to implement the entire feature set of MongoDB, of course, but the majority of MongoDB workloads are not utilizing the full feature set, either.
That’s easy, really. There was a person behind the ferretdb-bot account: me. :) I still maintain our projects mostly manually.
That being said, we do have some automation in place. The public part is there: https://github.com/FerretDB/github-actions We are planning to do more there, open source the other part, and then blog about it.
This looks great. I had been using the Document Layer for FoundationDB, but Apple has abandoned it, so this looks promising.
Tigris (https://github.com/tigrisdata/tigris) is one of the supported FerretDB backends, and Tigris is backed by FoundationDB, so you can still have the Mongo interface with the reliability and scaling of FDB if that's what you're looking for.
Somewhere there's an actual implementation that copies everything to /dev/null, with the justification that if you're using mongodb, you probably don't care about your data anyway. Can't seem to find it.
Er, why can’t you run a MongoDB replica set on your M1?
I mean, I wouldn’t recommend running a ReplicaSet on the same host on any production host (it defeats the purpose), but for testing, I’ve run a sharded cluster w/ 3 replicas per shard…
I don’t think it matters much whether a license is OSI-compliant or not. What matters more is whether it is a Free Software license, preferably one like GPLv3.
"Free software" is not coterminous with copyleft. It includes copyleft and non copyleft free licenses like BSD. Even Stallman would happily call BSD- or MIT-licensed stuff free software.
The OSD was carefully chosen to incorporate what was already known as free software.
One thing that I didn't see discussed was why FerretDB uses Postgres as a backend and not a storage engine. Perhaps I am just missing something though.
Because PostgreSQL is a fantastic open source database engine, very popular, with a lot of operational experience and tooling support.
This is the open source way of doing things: having a project focus on as narrow a problem as possible (first) and leverage as many existing components as possible.
Ok, that makes sense. I guess this project is targeting a compatible frontend for multiple backends. When I saw MongoDB alternative, I thought they were building both sides of the coin.
Cramming an API for one product on top of another is absolutely not “the open source way”. I’d expect a much better explanation for this Frankenstein’s monster. This seems like it would work but never scale.
Just a few of the things you need to build if you don't do it this way: 1. a storage layer, 2. a network protocol with clients and servers, 3. a replication system, 4. an authentication system, 5. administrative tools.
If you roll everything yourself, you pretty much have to reinvent everything Postgres has except the SQL and relational aspects.
When MongoDB came out it was sort of the anti-SQL database, so it is interesting that this flavor of the same kind of database is backed by PostgreSQL.
Our development roadmap [1] is based on the needs of the users and the software we are focusing on building compatibility with.
Interestingly, transactions are not commonly used, so they are not high on our priority list; it may take months until we get to them. Nevertheless, if enough users request them, we would reconsider. The same goes for any other item on the roadmap, so please check it out.
Another day on the orange site where people don't understand that you can 100% have relationships in a NoSQL database - just don't try to do it the same way you do in a "relational" one or you're going to have a bad time. Storing everything in one collection is critical.
I’m confused, to me it seems that MongoDB is also open-source. Their new license is basically just a better agpl. I only point this out because the blog post makes a point of saying it’s not open source.
By definition, SSPL is not an open-source license [1].
One could argue that the SSPL brings justice to open-source, as cloud providers will not be able to monetize a project without giving anything back. SSPL forces them to pay a license fee.
On the other hand, this also creates a vendor lock-in situation for the user, simply because not all service providers will be able to negotiate a deal with the developer of the SSPL-licensed product. This limits choice, limits competition between providers, among other things.
This license fee may also be increased at the sole discretion of the developer, and the increased fee will be paid by the user in the end. This all doesn't sound open-source to me at all.
MongoDB’s source code is still freely available. It’s still actively developed in the open on GitHub. Unless you’re offering MongoDB as a service, it’s just as “open source” as ever.
If you want to offer MongoDB as a service, you can still do so free of charge, as long as the service infrastructure is also made openly available, right? And if you don’t want to make the source available, you can purchase a license and do so, right?
Furthermore, if you’re using MongoDB, be it self-hosted, with a vendor, or even some proprietary database that implements the wire protocol for compatibility, you’re likely using MongoDB-developed clients/drivers, which are Apache 2.0 (“OSI-approved open source”).
So it seems like the only way to be locked into a vendor is if you’re using MongoDB drivers to connect with a 3rd party database that doesn’t fully implement all functionality of MongoDB in a compatible way… Right?
I could be wrong, but as someone who contributed to MongoDB as an open source project, and was later hired by MongoDB based on said contributions, it kinda hurts to see the OSI’s “MongoDB isn’t open source anymore” campaign work so well.
That said, I sincerely wish the team behind this project all the best!
My complaints aren’t against anyone in the open source communities I’ve known and loved. Just this self-important legal organization that acts like it controls (and even gets to define) open source software.
P.S. I left MongoDB in 2015 due to a neurological disability, but it was one of the highlights of my career, with so many kind and brilliant people. But it’s also been a while, and my brain doesn’t work so well these days, so please correct me if I got anything wrong!
> If you want to offer MongoDB as a service, you can still do so free of charge, as long as the service infrastructure is also made openly available, right?
The license text is worded such that it is basically impossible to comply with.
> OSI’s “MongoDB isn’t open source anymore” campaign work so well.
It is hardly OSI's campaign. Pretty much all major organizations involved in FOSS licensing have rejected the SSPL. For example, this is Fedora's stance:
> Fedora considers the Server Side Public License (v1) to be a Non-Free license. It is the belief of Fedora that the SSPL is intentionally crafted to be aggressively discriminatory towards a specific class of users. Additionally, it seems clear that the intent of the license author is to cause Fear, Uncertainty, and Doubt towards commercial users of software under that license. To consider the SSPL to be "Free" or "Open Source" causes that shadow to be cast across all other licenses in the FOSS ecosystem, even though none of them carry that risk.
> "Unless you’re offering MongoDB as a service, it’s just as “open source” as ever."
It's not. It's source available, and that's still proprietary.
The reason why the SSPL is not Open Source is pretty simple: it doesn't convey the four essential freedoms of free software [1]. That's it, there is essentially no more to it: it imposes usage restrictions, and that goes against the spirit.
Whether it is "except this or that" or "AGPLv3 but with this additional clause" is irrelevant: even the tiniest change can cause significant differences, and this is the case: it removes the freedom to run the program as you want, as it imposes restrictions to some use cases. And these restrictions go beyond the realm of the software itself.
In contrast, Open Source copyleft licenses, like AGPLv3, never go beyond the software itself. They provide guarantees that modified versions also remain available to users of the modified software (forward-carrying guarantees) but do not add a requirement to also provide other, unrelated software under the same license (which is essentially a nice way of saying "simply don't do this", turning de facto into a usage restriction).
"You can do so as soon as you open source infrastructure" is impractical and misleading. It is very likely part of infrastructure will be commercially licensed so even if one would want to open source it, it would not be possible.
MongoDB specifically switched from an open source license to the SSPL to create a monopoly in the DBaaS space. IT is a business decision, so let's not pretend here.
If you look at Real Open Source software it is created for cooperation and innovation together, not monopoly.
Er, if you develop the infrastructure to host MongoDB, you should absolutely be able to open-source that infrastructure. I mean, before MongoDB, I wrote a cluster management system for virtualized software security and hypervisor research, and all of it was either open source or something I wrote…
Also, if you bought something closed-source to sell MongoDB as a service, why isn’t it realistic to buy a license?
Your suggestion of a monopoly in the DBaaS space seems to preclude the existence of other databases… Or am I misunderstanding?
I’m not sure what you mean by “IT is a business decision” — could you elaborate?
-edit-
P.S. I’m trying to be supportive here; not trying to take anything away from what you’ve built with FerretDB! Honestly, there’s so much room for innovation in this domain, and it’s nice to see new projects…
The cloud provider wouldn’t have to, if you are the one running your infra. The SSPL restrictions only apply to businesses that offer MongoDB as a service.
In fact, you can build a similar service and offer it within your organization (and subsidiaries), and you still don’t have to release anything. The license only applies to companies like Amazon if they offer MongoDB as their own service (DocumentDB).
I know that’s a bit tangential, but hope that helps a bit?
Personally, that blog post is just arrogance from OSI. I read it. It didn’t make clear which freedoms were being removed. They are still able to run cloud hosting just release their custom code?
And OSI thinking they alone get to decide what is and what is not open source is arrogant.
I think that arrogance is when a single vendor tries to single-handedly redefine open-source to fit their business needs better. Not just a license, but the definition itself.
A quote from the MongoDB CEO: "MongoDB was built by MongoDB. There was no prior art. We didn't open source it for help; we open sourced it as a freemium strategy". [1]
Whether the OSI was arrogant or not, I really don't want this person to define opensource.
I am not really sure what your issue is. What, you think open source as freemium isn’t a model?
If it wasn’t for copy-left licenses that forced companies to release code that they used to create their products we wouldn’t have Linux in its current state.
In a world where many products are cloud-based, it seems fair and within the current model of open source to force code sharing.
> They are still able to run cloud hosting just release their custom code?
The SSPL license reads:
> […] you must make the Service Source Code available via network download to everyone at no charge, under the terms of this License.
> “Service Source Code” means the Corresponding Source for the Program or the modified version, and the Corresponding Source for all programs that you use to make the Program or modified version available as a service, including, without limitation, management software, user interfaces, application program interfaces, automation software, monitoring software, backup software, storage software and hosting software, all such that a user could run an instance of the service using the Service Source Code you make available.
You just can't comply with those terms if you don't have access to the source code of your storage software, for example.
(but I'm not a lawyer, of course)
> And OSI thinking they alone get to decide what is and what is not open source is arrogant.
No, the term “open source” was in use long before the OSI, and it was in popular/hacker culture in the 1980s. UNIVAC used it for a major system in the 1950s. [1] I used it in 1993 (I wrote a small BBS).
The OSI looks and sounds like an authority on open source software, but their entire strategy is legal, political and quasi-philosophical. I get how easy it is to be misled by them, though; they’re good at spinning things and rewriting history.
There are 3 "competing" Open Source and Free Software definitions - from OSI, Free Software Foundation and Debian. MongoDB does not match any of them and most importantly does not match the spirit of Open Source Software Movement.
I could name more, but let me clarify something. I’m not a fan of the SSPL, but I get why it was necessary.
It was rough seeing huge cloud providers profit off open source projects without giving anything back. When they offered competing hosting services with no value added (well, past “integrated billing”), no contributions or innovation, and drove their new customers to the documentation and libraries of the companies backing these projects, they crossed a huge line.
And it’s not just MongoDB. Or Elastic. Just look at all the “services” AWS offers, and note how many AWS actually invented or even contributed to…
Monopolistic practices forced a lot of companies to either shut down, or find a way to survive. I’m glad MongoDB decided to use the SSPL instead of shutting down like so many others. I’m glad they’ve continued to thrive.
Changing to the SSPL isn’t ideal, but it only impacts people who want to sell hosted versions of the software (not users, self-hosted or otherwise). For those infinitesimal few selling hosted versions of the software, it doesn’t even stop them from doing what they want — it just stopped the monopolies from destroying something a lot of people dedicated a lot of effort to... That seems like a pretty amazing feat to me, given the reality...
I wish the OSI wasn’t so successful painting users of the SSPL as somehow betraying the open source community. And I wish the SSPL wasn’t necessary. But until there is a better option, I’m ok with the SSPL…
Again, I say this with all due respect, and this is just my opinion. Corrections and new perspectives welcome!
Well, in my opinion this is exactly how Open Source is supposed to work: you get the benefit of a broad community collaborating on the code and promoting your software, and you give more value to your customers because they have a choice of vendors rather than single-vendor lock-in. But since you give up the monopoly, it is quite possible that someone else will make more money than you on your product.
You mention Elastic - do not forget it was built on top of Lucene, capturing most of the value in that project.
It DOES very much impact users, because users increasingly want a DBaaS experience, and if the only one you can get is through MongoDB or a MongoDB-authorized partner, it is really no different from proprietary software.
In any case, I agree that for certain users SSPL is just as good as Open Source. The same, however, can be said about proprietary software: some who just "buy a subscription" do not care.
Please state in what way they don’t? This is what I’m confused about. AGPL is open source but SSPL isn’t? They seem to be aimed at solving the same thing, which is cloud/server based code modifications.
SSPL goes way beyond AGPL as it contains an additional clause called "Offering the Program as a Service". It is not defined at all what constitutes providing MongoDB as a service. How many layers are needed to abstract MongoDB in a way that it doesn't trigger this clause? There are no answers for that in the SSPL. It comes with an enormous amount of risk compared to AGPL. There is a good article from Dor Laor at ScyllaDB which explains this in detail [1].
Moreover, cloud providers are not limited to AWS, Azure and GCP. The smaller providers we talked to are not able to negotiate licensing terms with MongoDB the way AWS could. For this reason, these providers are not able to provide MongoDB as a service. Yes, it's great for MongoDB that they were stopped from providing MongoDB for free, but now they can't provide the service at all. This limits competition and choice, and that is never in users' favor.
AGPL is basically the same thing, except far more reaching. AGPL, to my understanding, makes it a requirement if you allow people to use the software via network levels. “Program as a Service” seems a lot more limited in scope.
All new licenses come with risk, since their meaning can only be settled by court decisions.
> AGPL, to my understanding, makes it a requirement if you allow people to use the software via network levels.
AGPL requirements do not trigger if you don't modify the source code. The relevant text is in section 13:
> […] if you modify the Program, your modified version must prominently offer all users […]
See also [1], [2], [3].
> “Program as a Service” seems a lot more limited in scope.
AGPL scope:
> The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work.
Basically, the software itself and build scripts.
SSPL scope:
> “Service Source Code” means the Corresponding Source for the Program or the modified version, and the Corresponding Source for all programs that you use to make the Program or modified version available as a service, including, without limitation, management software, user interfaces, application program interfaces, automation software, monitoring software, backup software, storage software and hosting software, all such that a user could run an instance of the service using the Service Source Code you make available.
As noted in another thread, that's not only beyond the scope of the software itself – it is beyond the scope you _can_ relicense and arguably beyond the scope of the copyright terms themselves.
> As noted in another thread, that's not only beyond the scope of the software itself – it is beyond the scope you _can_ relicense and arguably beyond the scope of the copyright terms themselves.
This is not true. People who are saying this would have said this about GPL when it first came out.
The nature of software development and usage has changed massively since the GPL was created. If the GPL were created today, it would most likely look like the SSPL. The fact that the GPL states a "derivative work" must be released under a similar license shows me that the intent was: if you build anything where this is the core, you must share that.
Well, AGPL did not allow a monopoly to form in practice: Compose.io, ObjectRocket and others could offer alternatives to MongoDB Atlas without any permission.
This was a benefit for users, but obviously not something a classical corporation would enjoy.
It was a benefit for classical corporations, not users. These companies can still create alternatives. It’s just that they’re forced to share how they did it. It’s that forcing of sharing which is at the heart of the GPL.
SSPL license text doesn't set any limits on the scope for source release. Verbatim quote:
> “Service Source Code” means the Corresponding Source for the Program or the modified version, and the Corresponding Source for all programs that you use to make the Program or modified version available as a service, including, without limitation, management software, user interfaces, application program interfaces, automation software, monitoring software, backup software, storage software and hosting software, all such that a user could run an instance of the service using the Service Source Code you make available.
Basically it is saying you need to release the source of every tiny bit in your stack and then some.
I think “including, without limitation, […]” applies to the breadth of components, not depth, right? I mean, I’m not a lawyer, but that seems to be what syntactic context and logic indicate… no?
If you disagree, could you indicate the relevant text?
Your interpretation ignores the sentence “all such that a user could run an instance of the service using the service source code”. That just means they need to be able to run the service. Not have the exact same stack.
Let's say you decided to run MongoDB as a service, and you are fine with releasing Service Source Code under SSPL. The problem is that you can't. You don't own the source code for your NAS and can't relicense it. Nothing in the text says, "relicense all you can, and then you are fine". Clause 13 puts requirements on software that is completely unrelated in terms of copyright.
> I am confused. If you use a custom storage system then you would have the code for it. No?
Nothing in the license text says that the storage system has to be custom.
> The idea you need to release code for the os, file system, network routers, etc is absurd.
I agree with you that this is absurd. But that's what the license text says.
I suggest you read section 13, not MongoDB's FAQs or other explanations of how it works. The actual legal text is very different from what you seem to think it is.
Er, I could be wrong, but I think you’re missing the scope set in the first sentence:
> If you make the functionality of the Program or a modified version available to third parties as a service, you must make the… SNIP …programs that you use to make the Program or modified version available as a service…
I’m eliding for clarity, but the NAS doesn’t make the program available as a service. The code that accesses the file system on the NAS to offer your service? You’d probably need to release the code that calls fread/fwrite/NtFileX in your infrastructure code.
I get that it sounds vague and everything, but the FAQ also clarifies none of this applies unless you’re competing and targeting third parties. If you’re one of the few companies who want to do that, your legal team can formalize the line of demarcation.
Apologies, I hate defending the SSPL, but I can’t think of any better way to stop the monopolistic and EEE practices against open source projects. If anyone has a better solution to protect the freedoms of open-source developers, please, please publish it!
No, the full list is in the license, but it’s only those components that are used to provide MongoDB as a service to end users. Doesn’t penetrate OS abstractions AFAICT, but I’d check the license and/or consult an expert if starting a relevant business!
FWIW, just realized it doesn’t even apply if deployed internally (within an organization and/or subsidiaries)…
Writing the proper queries proved pretty tricky whenever GROUP BY/JOINs were involved, so I used online converters between SQL -> mongo query.
But then I realized that PostgreSQL has a nice JSONB type (supports indices for subfields). So I put all the mongo data into tables with a single JSONB column (or two columns id+data, if you prefer).
Turned out analytical queries ran much faster that way :)
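For anyone who wants to see the shape of that id+data layout, here is a rough sketch. Big caveat: it uses SQLite's JSON functions from Python's standard library as a stand-in so it runs without a server, and the `orders` table and the `country`/`amount` fields are made up for illustration. In Postgres the column would be JSONB, the extraction would be `data->>'country'`, and an index on a subfield would look like `CREATE INDEX ON orders ((data->>'country'))`.

```python
import json
import sqlite3

# Hypothetical documents, roughly what an export of a Mongo collection might hold.
docs = [
    {"_id": "a1", "country": "DE", "amount": 10},
    {"_id": "a2", "country": "DE", "amount": 5},
    {"_id": "a3", "country": "US", "amount": 7},
]

# SQLite stand-in for Postgres: one row per document, two columns (id + data).
# In Postgres, `data` would be declared as JSONB instead of TEXT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, data TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO orders (id, data) VALUES (?, ?)",
    [(d["_id"], json.dumps(d)) for d in docs],
)

# A plain GROUP BY over a JSON subfield.
# Postgres equivalent: SELECT data->>'country', SUM((data->>'amount')::numeric) ...
rows = conn.execute(
    """
    SELECT json_extract(data, '$.country') AS country,
           SUM(json_extract(data, '$.amount')) AS total
    FROM orders
    GROUP BY country
    ORDER BY country
    """
).fetchall()

print(rows)
```

The point is just that once each document is a row, the one-off GROUP BY questions become ordinary SQL instead of an aggregation pipeline.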
"document" is just row
"collection" is just table