> Relational databases are great, but they're almost an optimization -- they're way more useful after the problem set has been well defined and you have a much better sense of data access patterns and how you should lay out your tables and so on. But a lot of times that information isn't obvious upfront, and mongo is great in that context.
I think it's exactly the other way around. I prefer to lay out tables in a way that reflects the meaning and the relationship in the data. Only later, if there is a bottleneck somewhere, I might add a cache (i.e. de-normalize) to better fit the specific data access patterns that were causing trouble.
I think if your understanding of the domain is complete enough that you can map out a reasonably coherent picture of the relationships in the data, then you're probably right.
I think the advantage the parent is talking about is when you're exploring a domain you don't understand as deeply. Schema-less data storage can be helpful in that exploration, as it allows you to dive in and get messy without much upfront consideration. Then afterwards you can step back with what you've learned/seen/experienced and build out your conceptual model of the problem domain.
Thanks for spelling that out, I see that it can be a useful exploration tool.
For me though, the act of thinking about a problem in terms of data and relationships helps a lot in exploring and understanding what I'm dealing with. Even more so in an agile setting, where things can and will be changed often - the schema is no exception, it can be changed. No need to cast in stone the first schema you came up with.
But that's just my preferred way of approaching the problem :-)
I share your preference, but I am always curious how others do things. IMO the datastore is the most important piece of a CRUD app, because it is the foundation that everything hangs off of. So from my point of view tracking changes to its structure is extremely important in avoiding major headaches. How do developers manage this without a schema definition? Mongo's popularity has always made me second guess my assumptions about how important I think this is.
See, I disagree. As a developer, the core of a system for me is not the datastore but the code. I have my domain model in code, e.g. User.java, and it gets stored transparently and correctly to the database using the ORM every time I save it. The ORM will never compromise the integrity of the database by, say, trying to store a Date in the wrong format.
So you have to ask yourself: what is the schema getting me? It doesn't help with validation or business rules enforcement, since I do that in code anyway. And it doesn't help with integrity, since in the case of Java my data is strongly typed, so there's no chance of storing the wrong types.
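A rough sketch of the idea (here in JavaScript with Mongoose rather than Java, but the shape is the same; names invented, local MongoDB assumed):

    // The domain model lives in code, and the mapper guarantees the
    // types on the way into the datastore.
    const mongoose = require('mongoose');
    mongoose.connect('mongodb://localhost/demo');

    const User = mongoose.model('User', new mongoose.Schema({
      name:      { type: String, required: true },
      createdAt: { type: Date, default: Date.now }  // always a real Date
    }));

    // The mapper casts and validates before anything hits the database.
    new User({ name: 'Alice' }).save()
      .then(() => console.log('saved, with correct types'));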
Thanks for sharing your point of view. I'm still not convinced that it's much easier to change schema with MongoDB. Do you mean before there is any production data that needs to be kept indefinitely? If that's so, then I get it. But if you have some production data and your data model changes, you have two options:
1) Don't migrate any existing data. That means you must forever handle the old form of data whenever you read it from the db (see the sketch after these options).
2) Actually migrate and transform existing data in the database to the new data model. In which case, it seems easier to do in an RDBMS, because schema changes are at least properly transactional and you actually know what exact schema you're migrating from.
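To illustrate option 1, this is what "forever handling the old form" tends to look like at every read site (sketch, hypothetical fields):

    // Hypothetical example: "name" used to be a plain string, then
    // became an object { first, last }. Without a migration, every
    // reader has to cope with both shapes, forever.
    function readUserName(doc) {
      if (typeof doc.name === 'string') {            // old-form document
        const [first, ...rest] = doc.name.split(' ');
        return { first: first, last: rest.join(' ') };
      }
      return doc.name;                               // new-form document
    }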
Additionally, with the relational database, you simply don't have to make as many decisions about how to store your data early on because your data model is less dictated by your usage patterns (because you have joins and other tools at your disposal). In my eyes, that's a big advantage that relational databases have, even for prototyping.
You're putting far too much importance on the DB, when really, it's an implementation detail. It has nothing to do with your app, really, and letting your app be a slave to a schema is an anti-pattern. In the end, all a schema really is, is a fancier linter.
This is one of the advantages of Mongo. No schema, no longer an issue.
I'm fascinated by this perspective because for any business, it's the data that is treasured more than anything else. For example, as a bank I can afford to lose all the app code but I cannot afford to lose a record of my customers and their balances. Therefore, I would never see data as being inferior to my app. The app can be rebuilt from logic; data not so easily.
In my perspective I don't want my app to even touch the schema. That is not the app's job.
It also means that if a dev decides the user object no longer needs the DOB field, that field will be discarded. Even scarier, what precisely happens in those situations varies from implementation to implementation. Someone who is handling the database directly will think many, many times before deleting any db column. Even then, he will take a backup. I don't see the same discipline among developers when dealing with the same data, just abstracted programmatically via an object.
I would certainly be happier using a schemaless database with a strongly typed language.
I guess it comes down to where you want your dynamism: in the app code, or in the persistence layer. Using a highly dynamic language with a schemaless database feels very unsafe to me. Similarly, using a strongly typed language with a relational DB is sometimes a pain.
I wonder, when one makes a large change to one's app code in the "strongly typed language + schemaless db" scenario, what happens to the existing data that was persisted before the change, which is now in a format that is incompatible with the new model code?
I'm used to writing migrations that change the schema and update the existing data if needed, all inside a transaction.
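With Postgres, for instance, that looks roughly like this (sketch with node-postgres; table and column names invented):

    // The schema change and the data backfill commit or roll back
    // as one unit, so readers never see a half-migrated state.
    const { Client } = require('pg');

    async function migrate() {
      const client = new Client();  // connection params from PG* env vars
      await client.connect();
      try {
        await client.query('BEGIN');
        await client.query('ALTER TABLE users ADD COLUMN full_name text');
        await client.query(
          "UPDATE users SET full_name = first_name || ' ' || last_name");
        await client.query('COMMIT');
      } catch (err) {
        await client.query('ROLLBACK');
        throw err;
      } finally {
        await client.end();
      }
    }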
I just had an aha moment, thank you for that. It looks to me like the trade-off is deciding where to put business logic based on what needs to access it. The setup you describe sounds great as long as all access to it is through your ORM. I usually assume access from multiple applications is going to happen at some point.
We solve this by creating a schema definition for mongo collections, from within the app layer. Now we have easy version control where we want it, not in sqldump files like we used to. That was painful.
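A sketch of what that looks like (field names invented): the definition is just a module in the repo, so changes to it show up in ordinary commits, diffs, and code review.

    module.exports = {
      collection: 'users',
      version: 3,                  // bumped on every schema change
      fields: {
        name: { type: 'string', required: true },
        dob:  { type: 'date' },
        tags: { type: 'array' }
      }
    };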
Most full stack frameworks have a way to create migrations. When the schema changes over time, you just get a mess with Mongo; that's just my observation. It's far easier to manage change with db migrations.
You'd be right except that we also have the ability to create data migrations based on schema changes over time. I'm glad it's not built into MongoDB or else we wouldn't have the (fairly ideal) solution we built to work with it today.
During early prototyping, when I'm still figuring out the problem and how the user thinks, that's when I try not to worry too much about a schema. For me a data model is kinda seductive; of course it's necessary when you know what you're doing, but it draws my attention away from the messy uncontrollable users.
IMO you can't generalize every problem domain with a schema-first approach.
I developed a real-time analytics solution where the system was supposed to receive unstructured web data. Each client of the system could send any type of data and generate an unlimited number of reports with an unlimited number of fields, all of it in real time. When I say real time, I literally mean all processing was supposed to be done in under a second. On top of that, the volume of data was hundreds of GBs every day.
No RDBMS would have modeled this problem as efficiently as MongoDB did.
For most web-related things like analytics and CMS systems, or when you need high performance and horizontal scalability, it's hard to beat document DBs like MongoDB.
I found it to really suck at horizontal scalability and high performance. We are paying $3k per month for Mongo hosting because it's such a pain to manage, and it performs so poorly at just 150 GB of data that we need 3 shards, which I find incredibly ridiculous. I would pick Postgres tables with JSON fields over Mongo any day of the week.
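For comparison, the Postgres-with-JSON approach is roughly this (sketch, hypothetical table, using the json operators in newer Postgres):

    const { Client } = require('pg');

    async function demo() {
      const client = new Client();
      await client.connect();
      // schemaless-ish documents inside an ordinary Postgres table
      await client.query(
        'CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, doc json)');
      await client.query('INSERT INTO events (doc) VALUES ($1)',
                         [JSON.stringify({ type: 'click', page: '/home' })]);
      const res = await client.query(
        "SELECT doc->>'page' AS page FROM events WHERE doc->>'type' = $1",
        ['click']);
      console.log(res.rows);   // [ { page: '/home' } ]
      await client.end();
    }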
Since most companies now are doing Agile development, there isn't the big upfront design process where the data model is clearly understood at the beginning. Instead you have this situation where the schema is continually evolving week by week, and that is why schemaless systems can be appealing. It isn't about performance.
I would argue if you don't know how your data is going to be used then you should use the most flexible option - a normalised relational database.
This gives you the most flexibility when querying, whereas with a denormalised database you need to know how you're going to query it ahead of time. Unless you want to abandon any semblance of performance.
IMO, this is the worst argument. There are multiple schema evolution tools for SQL, there's nothing stopping your team from changing the schema every week - plus, it's not hard, certainly less hard than having to maintain code that deals with multiple document schemas at once.
Rails-style migrations (did they invent them? I have no idea. I currently use Sequel's implementation) allow you to change the schema as often as you want. I often write half a dozen of them over the course of a feature branch. The rest of the team merges them in, and when they boot the app they get a message saying they have migrations to run. It's always been supremely reliable, even after complex merges. It gets more involved when you have multiple interacting DBs, but what doesn't?
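Sequel is Ruby, but the same shape exists elsewhere; here's a sketch of one such migration file using Knex in JavaScript (names invented):

    // Each migration is a small, ordered, reversible step checked
    // into the repo alongside the code that needs it.
    exports.up = (knex) =>
      knex.schema.alterTable('users', (t) => {
        t.string('email').notNullable().defaultTo('');
      });

    exports.down = (knex) =>
      knex.schema.alterTable('users', (t) => {
        t.dropColumn('email');
      });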
You have to write sensible migrations of course, or you can get into trouble. This is a feature.
Obviously wide-ranging schema changes may require you to rewrite a lot of your app code, but I don't see how that's different for a schemaless database.
My bigger worry is that every "schemaless" app still has a schema, just one that isn't well defined -- similarly to how an app with a big mess of flags and boolean logic still has a state machine representation; it just isn't codified, or even finite or deterministic, for all the developers know.
The point is that if you're using an ORM and have domain classes, then it's an unnecessary and annoying extra step. You have to make changes in two places rather than just one. Most people using MongoDB I know are seasoned enterprise Java developers, and we have used many schema management tools over the last decade. It is a giant relief to finally be rid of them.
IMO, it sounds like the wrong remedy for the right diagnostic. I would never throw away the database because I'm duplicating information between ORM and domain classes. This seems more related to the particular constraints imposed by your codebase/architecture than the database.
Right now I'm writing a REST API that will be consumed by web and mobile apps. It would be impractical to duplicate validation across all codebases. Rather, I'm leveraging the database checks to get form validation on all clients for free. The application layer is thin; adding a field amounts to adding one line and running a command.
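A sketch of the pattern (hypothetical table; assumes something like CREATE TABLE users (email text NOT NULL, age int CHECK (age >= 0))): let the database validate, and translate constraint violations into API errors.

    async function createUser(client, body) {  // client: a connected pg.Client
      try {
        await client.query('INSERT INTO users (email, age) VALUES ($1, $2)',
                           [body.email, body.age]);
        return { status: 201 };
      } catch (err) {
        // Postgres SQLSTATEs: 23502 not_null_violation, 23514 check_violation
        if (err.code === '23502' || err.code === '23514') {
          return { status: 400, error: err.message };
        }
        throw err;
      }
    }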
I believe it boils down to which component rules the system: application or data layer.
I agree, but this is something else. The parent was talking about evolving schema. You're talking about what is effectively unstructured data. In this case, the main concern is being able to store the data first, and figuring out how to deal with it later, at the application layer, after you've added intelligence to deal with the new schema(s).
The point is that rapid prototyping and rigid database hierarchies are diametrically opposite.
If you can maintain a flexible SQL database, that's great. However, my experience has always been that the 'normalised databases are good' crowd are either a) DBAs trying to remain relevant, or b) people who have never actually done that in a project; because it's not flexible and dynamic, it's performant.
It depends on your problem domain; and servers are so ridiculously overspec'd these days (a $20/month Linode is a 2GB-of-RAM machine) that performance optimisation is far less important than rapidly developing functionality.
Anyway, NoSQL or SQL, you'll still have migration issues if you change the way your application consumes data.
If you have an array of category strings in a document and then you decide you prefer categories to be a dictionary with title keys, you still need to migrate your old data. NoSQL or SQL, same thing.
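A sketch of exactly that migration with the Node MongoDB driver (collection and field names hypothetical): turn ["a", "b"] into { a: { title: "a" }, b: { title: "b" } } for every old-form document.

    const { MongoClient } = require('mongodb');

    async function migrateCategories(db) {
      const posts = db.collection('posts');
      const cursor = posts.find({ categories: { $type: 4 } }); // 4 = array
      for await (const doc of cursor) {
        const asDict = {};
        for (const title of doc.categories) asDict[title] = { title: title };
        await posts.updateOne({ _id: doc._id },
                              { $set: { categories: asDict } });
      }
    }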
I think what made MongoDB interesting in the first place is the use of JSON for storing documents, and the aggregation framework.
Then you realize simple SQL queries are almost impossible to write as aggregations, so you end up writing map/reduce code, which is slow and, God forbid, uses JavaScript.
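To give a flavour: even a basic grouped sum ends up looking like this once you fall back to map/reduce (sketch, hypothetical collection):

    // What would be "SELECT status, SUM(total) FROM orders GROUP BY
    // status" in SQL, as mapReduce running JavaScript inside the server.
    db.collection('orders').mapReduce(
      function () { emit(this.status, this.total); },        // map
      function (key, values) { return Array.sum(values); },  // reduce
      { out: { inline: 1 } }
    );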
At first you think it solves the impedance mismatch problem, then you realize MongoDB has its own datatypes and you still need an ORM anyway, because at some point you need to implement "single table" inheritance since your code is OO.
Now, about performance: it's good, yes, but only if you use "maybe" writes, i.e. fire-and-forget writes that don't wait for acknowledgement.
Now in my opinion, CouchDB does far less, but the little it does, it does better than MongoDB. Curious about Couchbase though.
The only reason I'd use Mongo is when using Node.js, 'cause frankly Mongoose is the most mature ORM on the platform, and that's quite a bad reason.
> most companies now are doing Agile development ... you have this situation where the schema is continually evolving week by week
I haven't seen that level of churn in any of the agile places that I have worked and it is not an inevitable consequence of agile working. If you don't want schema churn, then don't do that.
I just finished the Commonwealth Saga + The Void Trilogy by Peter F. Hamilton, loved it. Both set in the same universe, the first two in the relatively near future (about 300 years from now), the trilogy in the far future (about 1500 years from now).
The trilogy can get a bit abstract, verging on the fantasy genre at times, but in the end the author manages to give an explanation that "makes sense" for everything, which is the thing I really like about his stories. Highly entertaining.
It's a bit of a dubious explanation of The Void, to be honest (although the "Think of it as an 8-dimensional onion" kind of makes up for it), and the ending is a little weak, but overall I enjoyed the series a lot. The alternating of space opera/fantasy chapters is also done pretty well, and keeps things interesting.
The pollution comes IMHO not only from having the timestamps in the commit (and to fix that the idea of using git-notes that came up in a sibling comment seems to be a perfect fit), but more from trying to put too much in a single commit.
This could easily be fixed by using a feature branch: do nice self-contained and easy-to-read commits in the branch, then attach the time-logging and billing information to the merge commit (using git-notes).
I wholeheartedly agree and share the frustration. This only detracts from web apps in favor of native apps.
At this point a solution (or a way to mitigate the problem) would be something like PhoneGap. You still have to endure the pain of deploying the app (app stores, customers that don't upgrade, etc.), but at least you only have to support the browser that PhoneGap integrates (WebKit, if I recall).
I find Javascript depressing. If it had a few more years to develop, it could have been a really great language. I understand why it wound up that way, but, sigh.
If you want to see a much better language with the same general design, look at Lua. It's made for scripting C programs rather than web pages, and it has had over a decade longer to mature.
Lua has been able to make major, backwards-compatibility-breaking changes to improve its design in ways Javascript hasn't. Where Javascript has "The Good Parts" and incrementally improving implementations, Lua has been able to fix things and evolve.
Lua (without JIT) is also one of the fastest non-JIT, non-natively-compiled languages there is, and LuaJIT is one of the fastest JIT implementations around. Now, I'm not sure how it compares to the popular JS JIT engines, but from what I've read, it's very hard to beat LuaJIT for performance.
I've talked and written recently about why "just pick[ing] up Lua or Python" was not an option, but there are other strong reasons that was not in the cards.
Think about what mid-1990s Lua and Python were like, how much they needed to change in incompatible ways. The web browsers would never have tolerated that -- you'd get a fly-in-amber-from-1995-or-1996 version of Python or Lua, forced into a standards body such as Ecma (then ECMA), and then evolved slowly _a la_ ECMA-262 editions 2, 3, and (much later, after the big ES4 fight) edition 5.
Interoperation is hard; the extant C Python and Lua runtimes were OS-dependent and as full of security holes as JS in early browsers, if not more so. And yet these languages and others such as Perl were also destined to evolve rapidly in their virtuously-cycling open source communities, including server-side Linux and the BSDs (also games, especially in Lua's case -- Python too, but note the forking if not fly-in-amber effects: Stackless Python in Eve Online, e.g.).
JS, in contrast, after stagnation, has emerged with new, often rapidly interoperating, de-facto (getters, setters, array extras, JSON) and now de-jure (ES5) standards, the latter a detailed spec that far surpasses C and Scheme, say, in level of detail (for interop -- C and Scheme favor optimizing compiler writers and underspecify on purpose, e.g. order of evaluation).
The other languages you cite have been defined normatively over most or all of their evolving lives entirely by what their C implementations do. Code as spec, single source implementations do not cut it on the web, what with multiple competing open- and closed-source browsers.
You have to compare Lua as of 1995 (version 2) to JavaScript as of 1995, since its development, at least inside the browser, would have been arrested the same way JavaScript's was. Lua 2 certainly had a much better implementation than the first JavaScript, but I would not call it a better language (it didn't have JavaScript's annoying quirks, but it also didn't have closures, for example). 2000s Lua (5.0 and 5.1) is quite different from its earlier incarnations.
I'm mostly thinking about how many of the problems highlighted in "Javascript: The Good Parts" have been fixed in Lua, while Javascript can't be fixed. Not a matter of design and taste, but outright bugs.
That's what I try to do, which turns it into a crippled Ruby. And there's no getting around it silently doing the wrong thing if I ever forget a "var" or "===".
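The two classics, for anyone who hasn't hit them:

    function count() {
      total = 0;           // no "var": this silently creates a global
      return total + 1;
    }

    ''  == 0;    // true  -- coercion kicks in
    '0' == 0;    // true
    ''  == '0';  // false -- so == isn't even transitive
    ''  === 0;   // false -- what you actually meant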
For what it's worth, there are tools that can help you avoid the bad parts. JSLint won't let you forget a "var" or "===" ... and if you use CoffeeScript, it's not possible to forget "var" or "===", because there aren't any.
I'm not saying that there aren't better programming languages than Javascript. I'm just saying that there is a subset of Javascript which is really expressive and elegant.
That elegant, expressive subset is basically Lua. It's been able to jettison most cruft over the years.
I'm not a web developer, but every time I read/use Javascript, it feels like a broken fork of my favorite language. Javascript could have been that good, too. I like where Eich was going with it, but the browser wars etc. meant that shipping an early version made the most business sense, and design errors (which would have shaken out) got frozen in the spec.
It's not the case that the Web dooms us forever to use JS as it was in 1995.
That is simply false on a number of JS-specific points, but more generally: the Web's deployed-browsers-with-enough-market-share-to-matter intersection semantics moves over the years. It does not stay still or grow only compatibly. Bad old forms (plugins, especially, but also things like spacer GIFs used in pre-modern table layouts, not to mention old JS versions) die off.
So, cheer up! JS can't be Lua, but it doesn't need to be. Its job is to be the best it can be according to its species and genus, its body plan. Which may be related to Lua's, but which was not and will never be the same as Lua's, because JS and Lua live in quite different ecosystems.
A concrete example: Lua has coroutines now, but JS is unlikely to get them in the same way, interoperably. Some VMs would have a hard time implementing them, especially where the continuation spans a native method activation.
This is a case where Lua's single-source implementation shines, but that's just not going to happen on the web, in variously-implemented, open- and closed-source (the latter including clean-room, for fear of patents) browsers. So, we're aiming for Pythonic generators at most.
If we go further than generators, I'll be surprised. Pleased too, don't get me wrong. However I doubt it will happen, because some on the committee do not want deeper continuations than generators, since greater than one frame of depth breaks local security-hat-wearing reasoning about what is invariant in JS's run-to-completion, apparently single-threaded execution model.
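For reference, a sketch of the single-frame generator shape in question, in the proposed function* syntax: yield can suspend only the generator's own frame, never a deeper native activation.

    function* fib() {
      let [a, b] = [0, 1];
      while (true) {
        yield a;
        [a, b] = [b, a + b];
      }
    }

    const it = fib();
    console.log(it.next().value, it.next().value, it.next().value); // 0 1 1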