I've been doing ORM in Java since Hibernate was new, and it has always sucked. One of the selling points, which is now understood to be garbage, is that you can use different databases. But no one uses different databases. Another selling point, "you don't need to know SQL", is also garbage. Every non-trivial long-lived application will require tweaks to individual queries at the string level. The proper way to build a data layer is one query at a time, as a string, with string interpolation. The closer you are to raw JDBC the better.
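To make that concrete, a minimal sketch of the one-query-at-a-time style (table and names invented for illustration; note that values still go through bound parameters - interpolation is for query structure, not data):

    import java.sql.*;
    import java.util.*;

    // Hypothetical data layer: one hand-written query per method, mapped by hand.
    public class CustomerQueries {
        public record Customer(long id, String name) {}

        public static List<Customer> findByRegion(Connection conn, String region)
                throws SQLException {
            // The SQL is a plain string: tweak it at the string level when needed.
            String sql = "select id, name from customer where region = ? order by name";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, region); // values go in as bound parameters
                try (ResultSet rs = ps.executeQuery()) {
                    List<Customer> result = new ArrayList<>();
                    while (rs.next()) {
                        result.add(new Customer(rs.getLong("id"), rs.getString("name")));
                    }
                    return result;
                }
            }
        }
    }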
Oh yeah, another bad reason for ORM: to support the "Domain Model". Which is always, and I mean always, devoid of any logic. The so-called "anemic domain model" anti-pattern. How many man-hours have been wasted on ORM, XML, annotations, debugging generated SQL, and so on? It makes me cry.
Couldn't agree with this more. The way I like to put it is "ORMs make the easy parts slightly easier, but they make the hard parts really hard". But I don't care that much about improving the easy parts, because they're already easy! Yet I can remember plenty of times when ORMs made my job a million times more difficult and all I was doing was fighting with the ORM itself (and usually at the worst time, e.g. when a query reached enough scale that it "fell over").
Query builders aren't exactly the same thing, but I love this article, https://gajus.medium.com/stop-using-knex-js-and-earn-30-bf41..., and it's made me a huge fan of Slonik and similar technologies. It does 90% of what I want a query builder or ORM to do (e.g. automatic binding of results to objects, strong type checking, etc.), but it lets me just write normal SQL. I also never understood this idea that SQL is some mystically difficult language. It has warts from being old, sure, but I'd rather learn SQL than whatever custom-flavor-of-ORM-query-language some ORM tool created.
> but I'd rather learn SQL than whatever custom-flavor-of-ORM-query-language some ORM tool created
This point is worth emphasizing!
When your application carries non-trivial business logic, it will create non-trivial DB access patterns.
The developer will either use SQL or some other language that, pragmatically speaking, translates to SQL. Either way, they will end up learning some way of expressing DB queries.
I'd rather use a standard: existing familiarity, easier-to-find docs and support, and probably fewer bugs, etc.
Another reason to use standard MySQL? ChatGPT is really good at writing, and modifying, SQL queries and the code that injects input into them.
You don't really have to write SQL anymore; you just tell ChatGPT what you want changed, and it works most of the time. The times it doesn't, it gets you most of the way there.
The problem is the ORM requires you to know its language, the underlying SQL for that database, as well as how the ORM maps to that underlying language.
It’s the worst kind of leaky abstraction, and like 3x the conceptual overhead. But it’s better because … something something OOP.
> It’s the worst kind of leaky abstraction, and like 3x the conceptual overhead.
I really think you hit the nail on the head with this sentence. For example, another commenter said "But the same SQL fanboys who believe every developer ought to memorize tri-state truth tables..."
Yes, I think it's fine to argue that the boolean logic with respect to NULL in SQL is a pain in the ass. But ORMs certainly don't hide that from you! The way DBs treat NULLs is a fundamental part of nearly all RDBMSes - if you don't understand it, you're (eventually) going to have a bad time. All ORMs really do is make it harder to understand when you may need to take special NULL logic into account.
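To make the gotcha concrete (a sketch; the orders table is invented, and IS DISTINCT FROM is standard SQL that not every engine supports):

    // status <> ? evaluates to UNKNOWN for rows where status is NULL, and
    // WHERE keeps only TRUE rows -- so NULL-status rows silently disappear.
    String naive = "select count(*) from orders where status <> ?";

    // NULL-aware comparison: IS DISTINCT FROM treats NULL as an ordinary,
    // comparable value, so those rows are counted again.
    String nullAware = "select count(*) from orders where status is distinct from ?";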
OOP isn't an anti-pattern; OOP as a _silver bullet_ is an anti-pattern. Almost all of my programs end up, to some degree, multi-paradigm: OOP, functional, and procedural.
I call that post-OOP. The outcome might even be very, very similar to what you might have implemented as an OOP believer, but the assessment has flipped: instead of asking yourself "did I use enough OOPisms?" you ask yourself "could it be better with less?"
I'm not gonna argue against OOP in general here, but in the context of ORMs, some of the most important characteristics of OOP (encapsulation and inheritance, especially) are the source of many anti-patterns.
Keeping methods that operate on data together with the data is the biggest problem with the Active Record pattern, and it's good that some new ORMs (like Ecto) are inspiring others to try other alternatives. Those alternatives invariably eschew the exaggerated use of method/data coupling and code-sharing via inheritance/mixins, among other things. They are either functional or procedural.
Not to mention that a lot of OOP-heavy codebases are already replacing a lot of OOP-heavy ORM code with procedural (Services) or functional-inspired ("Railway Oriented Programming", Trailblazer operations).
Sure, there are still classes being used. But I don't think it's fair to call "just using classes" OOP when they're used as structs, especially in languages that make class usage mandatory.
I never said otherwise. “Lovely” and “the ORM is free of issues” are orthogonal. The issues are why there’s a lot of different alternative ways of writing domain logic in Rails.
I agree. We'd be better off with data structures and functions that work on those data structures, and packages/modules/namespaces (with public/private exposing) to do encapsulation.
OOP comes with the banana-gorilla-jungle problem (as Joe Armstrong put it: you wanted a banana, but what you got was a gorilla holding the banana and the entire jungle), which I consider the fundamental issue with it.
This stems from the fact that SQL itself is a very leaky abstraction: each RDBMS implementation is different, and the common part is shallow. There is no way to optimize a query in an RDBMS-agnostic way.
Actually, ORMs are worse than a leaky abstraction: they border on cheating! They actually abstract the easy, boring part away, but they run with their tail between their legs as soon as the issue becomes interesting.
> They actually abstract the easy, boring part away, but they run with their tail between their legs as soon as the issue becomes interesting.
This is the biggest offense, IMO. It's not just that ORMs are a leaky abstraction, it's that they're just a bad abstraction--leaky or not. We don't really need an abstraction to make easy things easier, and we certainly don't need/want an abstraction that makes hard things even harder!
There's a famous quote that always comes to mind when I encounter stuff like this. I've seen it attributed to several people, so I'm not positive, but I think it was from Larry Wall: "Make easy things easy and hard things possible."
The analogy I use is that SQL is like a two-wheeled motorbike - fast, nimble, elegant. Yes, it may be dangerous. Yes, you may need to wear a helmet. Yes, it cannot carry too much load. But it gets you very far, very quickly, if you have skill!
Now, the ORM sees these deficiencies and presents you with... two motorbikes welded together side by side with some iron bars. They call it a "car". Four wheels are better than two, right? Two engines can push more load, right? No helmet needed, right?
But does it really become a "car", or does it just have most of the deficiencies of the motorbike, plus some spectacular new ones? And no benefits of a car whatsoever?
Granted, I haven't dived deep into Django, but from what I learned scratching the surface, it's slow, bloated, utterly un-Pythonic, and I just don't understand how it got so popular. Yeah, it's the most batteries-included web app framework for Python, but it just sucks to use.
Yep, fair fix. My favourite bit of Django metabollocks is when they mix in an override for __new__ on models that won't let you instantiate an instance if the Django "app" the model or any of its dependencies belong to isn't in django.conf.settings.INSTALLED_APPS.
I'm sure they had a good reason for doing this at some point, but I'm just trying to write a unit test for something that consumes a FooModel, no DB access needed, just a small test that doesn't involve starting up all the Django bits, but Django says no, best I can do is creating an SQLite DB and creating all your tables on it.
I have toyed with doing my own metabollocks hackery to circumvent this, but it just makes everything even more fragile.
(Oh, the Django test-runner won't discover tests that aren't subclasses of a Django test class, so no writing a unittest.TestCase for you! You might need Django in it!)
> I'd rather learn SQL than whatever custom-flavor-of-ORM-query-language some ORM tool created.
Me too! That's why I use an ORM that lets you write queries in SQL, like... Hibernate.
This narrative seems to come up frequently. People make (good) arguments against certain features of ORMs and then conclude it's an argument against ORMs in general. The boundary between databases and codebases is very broad. But ultimately you work with objects in your code, and relational tables in your database, so there's going to be some sort of Object Relational Mapper in between.
> Me too! That's why I use an ORM that lets you write queries in SQL, like... Hibernate.
Yeah, this criticism of ORMs strikes me as off-base. I feel like any ORM worth its salt gives you the tools to write raw SQL as needed, which can also be combined with the query builder the ORM uses.
I use Rails/ActiveRecord and have very few complaints. When I need to do something simple, the convenience methods work great. When I need a complex query, I just write it.
There's a learning curve here, where you have to understand when to "break glass" for writing queries directly (either for better performance reasons or because an ORM lacks the expressiveness required).
I think a lot of ORM hate comes from people who never got deep enough to really hit the sweet spot. They (correctly) note that the ORM abstractions aren't suitable for every query, but (IMO) they disregard the benefits that they provide for most queries. If something like 90% of my queries can be generated by an ORM, I'm feeling pretty good about that - it's a lot more maintainable.
> I think a lot of ORM hate comes from people who never got deep enough to really hit the sweet spot.
This. And also, the feel of using Rails/ActiveRecord resonates with me in a way that no other ORM I've used does. It's not just the ORM but the tooling around it esp. wrt generating migrations, scaffolding and so on.
> There's a learning curve here, where you have to understand when to "break glass" for writing queries directly.
Yep. ORM is a tool. The craftsperson needs to acquire skill both in using the tool and in knowing when to reach for another tool.
Maybe some people feel ORM is never the right tool. Who can argue with that? I'm curious to see how they do it. Maybe I'll learn something.
> ultimately you work with objects in your code, and relational tables in your database
Yes, you can map a record in a DB table to a class, and as long as you are happy with retrieving records (and related records) and mapping them to classes, an ORM will be all you need. But SQL is not limited to the columns of a table: you can join, define new columns in your select statement, group, use aggregate functions, etc., and then you quickly get into territory that the ORM doesn't understand anymore. So the point that ORMs take away a lot of the flexibility that SQL gives you still stands...
This is not an objection to ORMs, except perhaps very simplistic ones. The ORMs I'm familiar with let you map queries to arbitrary objects. I use CTEs, joins, groups, aggregations, custom columns, etc in SQL queries (with Hibernate) all the time.
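For example, a JPQL constructor expression maps a join plus aggregate straight onto an arbitrary carrier class (a sketch - the entities and the em EntityManager here are invented for illustration):

    // A plain carrier class in package com.example, not a mapped entity.
    public record CustomerOrderStats(String name, long orderCount) {}

    List<CustomerOrderStats> stats = em.createQuery(
            "select new com.example.CustomerOrderStats(c.name, count(o)) " +
            "from Customer c left join c.orders o " +
            "group by c.name " +
            "having count(o) >= :min",
            CustomerOrderStats.class)
        .setParameter("min", 10L)
        .getResultList();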
But... at that point what does Hibernate even do for you? You can just use a lighter query wrapper, or even roll your own thin wrapper around JDBC directly. The latter is a bit more work, but it might be worth it depending on the project, since it gives you absolute control.
I'll take a query wrapper that's designed so I supply queries directly and it does the work from there (binding to objects) over using Hibernate that way, any time.
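To give a sense of how small such a wrapper can be, a hand-rolled sketch (everything here is invented; and yes, as noted elsewhere in the thread, these things tend to grow):

    import java.sql.*;
    import java.util.*;

    // A minimal 'query wrapper': you supply the SQL, it binds parameters
    // and maps rows to objects via a caller-supplied function.
    public final class Db {
        @FunctionalInterface
        public interface RowMapper<T> { T map(ResultSet rs) throws SQLException; }

        private final Connection conn;
        public Db(Connection conn) { this.conn = conn; }

        public <T> List<T> query(String sql, RowMapper<T> mapper, Object... params)
                throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < params.length; i++) {
                    ps.setObject(i + 1, params[i]); // JDBC parameters are 1-based
                }
                try (ResultSet rs = ps.executeQuery()) {
                    List<T> out = new ArrayList<>();
                    while (rs.next()) out.add(mapper.map(rs));
                    return out;
                }
            }
        }
    }

    // Usage: the SQL stays front and center; binding is one lambda away.
    // List<User> users = db.query("select id, name from users where age > ?",
    //         rs -> new User(rs.getLong("id"), rs.getString("name")), 30);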
Having seen quite a lot of 'roll your own' solutions for wrapping queries: they were almost all unstable, inefficient messes that have a tendency to pick up lots of ad-hoc functionality. I'm not particularly fond of Hibernate but at least it's well tested and efficient for what it does and does things in a standard way so you can mostly look up stuff in the manual instead of guessing your way through a homebuilt framework.
> depending on the project, since it gives you absolute control
Absolute control to hunt down your own bugs, instead of using something that is an industry standard, where any competent programmer jumping into your project can be up to speed with it instantly.
Very few developers are competent with Hibernate, many fewer than with SQL. The write-behind cache nature of Hibernate makes it very complex: the order of operations in the app won't match the order of operations in the DB.
Even an "industry standard" framework can get in the way. Given the constraints of executing efficient queries; handling errors and timeouts on the application's terms; and enforcing concurrency and transaction handling policies even a "good" ORM can be more a problem than a solution.
All of those will be worse off with hand-made “solutions”. Especially since you are not forced to use the ORM for everything: use it for like 90% of your generic DB usage, and use its built-in escape hatches for the rest, like native queries, or go straight to the DB driver if needed.
I agree that ORMs are bad, but disagree in that I think SQL is also bad--it has this bizarre natural-ish language syntax that is inconsistent and often difficult to reason about, with a huge number of keywords (including many that are pure noise). I would welcome a simple, modern alternative to SQL. I think the fact that people keep building and using ORMs is evidence that SQL is difficult to use.
In general, one of the things I hate most about SQL is that "the order of clauses is wrong" (it should really start with the FROM clause and end with SELECT, see https://news.ycombinator.com/item?id=21150606), and PRQL fixes that.
And if that's the biggest perceived problem with SQL, then SQL isn't going anywhere.
It can't just be a little better than SQL, especially in terms of syntactic aesthetics; it has to be substantially and provably better than SQL to the point where companies are willing to pay for retraining, folks are willing to buy books and classes about it, and database engine developers have to consider it worthwhile to implement.
I don't see PRQL coming close to reaching that bar.
But at that point, it’s no longer SQL, it’s something better. That seems like a very nice point in favor of ClickHouse! But then again, ClickHouse is not an RDBMS, right?
I always hated ORM and saw it as an anti-pattern since I first used it many years ago for these exact reasons.
I remember back then, devs would look down on me for voicing my critiques about it.
There are many tools and techniques which I am forced to use today as a dev which are also anti-patterns.
The unfortunate reality of our industry is that most of the popular tools which software devs are using suck and are encouraging anti-patterns. The thing about the tech industry is that there are a small number of well-connected developers and ex-developers who call all the shots in this industry and they suck.
Or maybe you haven't taken the time to learn those tools and techniques and this is why you think they are anti-patterns. The people who set the trends in the development industry are extremely smart.
Take for example ORMs - people already know SQL and are often too lazy or unmotivated to learn their ORM of choice in depth. So for them the ORM is confusing and SQL just works. But an ORM eliminates a ton of code that needs to be written and maintained: boilerplate code to read a row from the DB and map it to an object or a map, boilerplate code to do DB inserts and updates, optionally boilerplate code to track changes. Any decent ORM allows one to bypass its mechanisms and, in the edge cases, write SQL directly.
It's essentially the same argument people have made countless times in the past - e.g. why learn a higher-level language? They never work right, and I can do anything in C.
That's usually a good comeback, since people who don't like something tend to avoid using it. But because of the way the tech industry works, it's not possible to avoid these tools, so I actually became an expert at them even though I hate them. I've used some of them daily for years, to the point that I'm basically an expert at them. They suck. They're over-engineered (they introduce more problems than they solve), have too many dependencies, slow down development speed, etc...
Anyway, these days I know to shut my mouth and cash my cheques... Software development work pays by the hour after all.
Of course everyone is free to have their own opinion, but for good reasons the opinions of the people who create new ORMs, frameworks, languages, and OSs, and who lead big complex projects at big corps, are generally valued more than the opinions of those who see dev work as nothing more than a way to make a buck.
And yes, ORMs and other tools used in the software industry often look over-engineered if one doesn't want to understand in depth the problems they are solving, but just wants to quickly solve a business problem.
> The people who set the trends in the development industry are extremely smart.
Sure. But, why do the trends keep changing, then? Why don't these smart people just get it right once and for all?
It's because, frankly, the people who set the trends are often wrong. E.g., Java-style OOP with deep single-inheritance chains is not even considered good style IN JAVA anymore.
They're just celebrities, and we're (as a group) just as susceptible to group-think and cargo-culting as people who try every fad diet because some celebrity did it.
Well still, one could argue that if (hypothetically) scientists had been smarter than they were, there would have been less back-and-forth. There may have been less collective effort wasted on scientific dead-ends. It would have been a good thing.
I think where the comparison fails is that science is far more tolerant of alternative theories than coding is. Coding is almost a mono-culture. The choice of tools and acceptable approaches is really very limited and there isn't much tolerance for alternatives; this is especially problematic given the fact that none of these tools and techniques are backed by math. Coding tools and techniques are backed entirely by subjective opinions concerning productivity across a highly inconsistent range of environments and these opinions are riddled with biases and conflicting business interests.
Programming ideology is built on flaky foundations (vague, inconsistent and biased beliefs about efficiency and maintainability) and yet there is intolerance towards alternatives. This is at odds with science which is generally built on far stronger foundations supported by mathematics and yet is more tolerant of alternative theories.
That counter argument sounds reasonable at a superficial level, but let me explain why it's really not the same thing at all.
For context, first you said: "Or maybe you haven't taken the time to learn those tools and techniques and this is why you think they are anti-patterns. The people who set the trends in the development industry are extremely smart."
This is pretty clearly an appeal to authority fallacy. It directly implies that calling a currently-popular programming technique or tool an anti-pattern is evidence of a lack of understanding (not taking the time to learn them). It's not assessing whether there is good evidence or arguments about the value of said tools and techniques, and asserts that the techniques are (at least very likely) good because "extremely smart" people said so.
Then, I asked: "But, why do the trends keep changing, then? Why don't these smart people just get it right once and for all?"
Your rebuttal, to my honest reading, is that hard science theories also change over time and those theories are also constructed by "extremely smart" people; therefore my incredulity over software engineering trend-setters should be no higher than my incredulity over basic science theories.
Your argument fails for multiple reasons.
First and foremost, scientific theories only change in the presence of empirical or logical (by which I mean hard or formal logic, not "logic" like we use when arguing politics at Thanksgiving) evidence. Software engineering is not science (and real computer science has nothing to say about programming practices out in the real world), and has almost no means by which it can test any hypothesis with the same rigor as the basic sciences. At BEST, you can do surveys and case studies of software defects in some very fuzzy, statistical, sense, which would put it much closer to the social sciences like psychology, than hard sciences like physics. But, even those kinds of studies are exceedingly rare. The VAST majority of the time, someone just makes a good sounding argument for or against a technique and we either buy it or we don't.
You're fooling yourself if you think there's any comparison between real science and software engineering.
Next, you're actually somewhat making my point for me. If even basic scientific theories evolve over time, then why would you be so confident that the "extremely smart" people who push for certain programming techniques are right about them, and that the skeptics among us just don't understand them? Surely you're admitting that the "extremely smart" people are wrong sometimes. Now, to prove a physics theory wrong requires a LOT of work, data, peer review, etc. Since these smart programming trend-setters did none of that for their preferred technique in the first place, it really won't take nearly the same effort to "prove" them wrong--they are in a MUCH shakier position than a scientific theory.
So, yeah, I stand by my claim that the programming trend-setters are mostly just cult leaders. They have no objective foundation on which to build their ideas, unlike science, so it's much more reasonable to question them than to question a currently-accepted scientific theory.
Competency is a function of the human and not the tools.
People who create piles of garbage will do so regardless of what you give them.
Let's not pretend ORM made poor behavior more manageable. It was just a different style mess to clean up.
You give them TDD, they'll create bad broken tests. You give them linters, they'll put in arcane and asinine rules. You give them git and they'll have branching that looks like the Hapsburg family tree with commit comments like "fix".
You aren't going to fix behavior with tools. It takes time, discipline, probably a bunch of therapy...
This argument has been presented forever to excuse bad tools, and it is a strawman. Of course good tools don't fix bad behavior. The argument in favor of good tools is that competent people with good tools are insanely more productive than competent people with bad tools.
I don't agree because a large part of competency is choosing the right tool for the right job.
It's like giving Novak Djokovic a hammer instead of a racket and still expecting him to win tennis tournaments.
I bet he invests a lot of thought and effort into selecting his racket, his shoes, his clothes, his food, his coach, his sponsors, his medications, his strategies... Without the ability to choose his tools and techniques, he would be nothing.
... And yet that's exactly what we expect from developers.
If you're advocating for tool fetishizing and rewriting things by chasing hype cycles, then that is part of the problem.
The world is awash in half baked tools that are extremely popular until people somehow collectively realize it's a waste of time.
The frenzied mania of mobs shifting from framework to framework every 6 months is like the world of top40 radio brought to programming; relentlessly promoted crap that just disappears silently to be replaced by more irritating crap with a different version of the same problems because there's always a new generation of teenaged programmers who think they've fixed everything.
The people who follow and fall for it are part of the trainwreck programmers I'm talking about.
Really I want to stop the burnout and churn of the industry. I strongly believe the decrease in quality is because senior people tend to exit. We're 45 years into the microcomputer revolution and 65 or so years into career programming yet whenever I go to a startup, conference or meetup, the "in industry < 5 years" outnumber the old by a significant margin (except for sysadmin stuff).
It's an intrinsic cultural disposition that reinforces itself and we're going to continue to be stuck in this interregnum until we can somehow unmoor ourselves, collectively, from it.
I think a cessation of tool fetishizing and substitution with tool skepticism will help.
There's a bunch of them. I'm opinionated toward physical, real-world, actual, concrete, versioned implementations, not theoretical amorphous relationships existing in pure thought-stuff, which is how a lot of these projects' "documentation" is written - as if the code lives in abstraction, like some improvised jazz scatting.
That practice is pure cult bullshit.
But yes, plenty of ORMs are fine, as long as it's a convenience. It has to empower the user, not just set up blockades in order to adhere to some ideological purity. That's, once again, cult nonsense.
It had nothing to do with competency. It was just comparatively more sucky to maintain regardless of competency. If the coders were low quality, it sucked more than low-quality ORM-based code. If they were high quality, it sucked more than high-quality ORM-based code.
Thank you for saying this. I don't consider myself some kind of 10x dev. Hell, most days I feel like a 0.1x dev. But, sometimes I can't help but feel like "everyone else" is doing it wrong.
The cargo cult has been wrong many times in the past, yet if you go against today's cargo cult, they act like you're insane. Do we not learn our lessons? Or at least accept that maybe today's "best practices" won't always be considered as such?
Let's keep in mind that if any of us had said anything about OOP being overrated in the 90s and 00s, we'd have been laughed out of the room. It was "obvious" that OOP was the best way to write reliable, maintainable, projects. Now, it's almost the opposite: it's super cool to write a blog post shitting on OOP (while almost never actually defining it...).
Likewise, it was "obvious" that statically typed programming languages were just tedious and getting in our way, so let's write everything in Python, Perl, and JavaScript. Except that now, most people seem to be in the opposite camp: it's all about Rust, Swift, TypeScript, etc.
I hate ORMs. Every single one I've tried feels like a bad abstraction and doesn't really do what I want.
For added, unrelated, controversy, I think Java's checked exceptions are a good language feature, and I absolutely don't care that the creator of C# did an interview 12 years ago declaring them bad. I'm vindicated every time I see someone praise Rust's Result type, or when I see someone write a TypeScript or Kotlin library that implements a Result/Either type. We're moving back toward statically declaring how a function may fail and trying to force the caller to deal with it. It's exactly the same as checked exceptions.
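To see the parallel concretely, compare a checked exception with a hand-rolled Result type (a sketch; Java has no standard Result, so the sealed interface below is invented):

    // Both signatures declare the failure mode, and the compiler forces callers
    // to confront it: catch/declare for the exception, exhaustive switch over
    // the sealed Result.
    sealed interface Result<T> permits Ok, Err {}
    record Ok<T>(T value) implements Result<T> {}
    record Err<T>(String reason) implements Result<T> {}

    class InsufficientFundsException extends Exception {}

    class Account {
        private int balance = 50;

        // Checked exception style: callers must catch or declare it.
        int withdraw(int amount) throws InsufficientFundsException {
            if (amount > balance) throw new InsufficientFundsException();
            return balance -= amount;
        }

        // Result style: callers must unpack it before touching the value.
        Result<Integer> withdrawR(int amount) {
            if (amount > balance) return new Err<>("insufficient funds");
            return new Ok<>(balance -= amount);
        }
    }

    // Caller side (record patterns, Java 21+):
    // switch (account.withdrawR(100)) {
    //     case Ok<Integer>(Integer v) -> System.out.println("balance " + v);
    //     case Err<Integer>(String why) -> System.out.println("failed: " + why);
    // }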
I have also seen some laughably bad and inefficient code written in non-functional languages just for the sake of writing in functional style, when it does absolutely nothing to actually increase the quality of the code/project. FP will absolutely be ridiculed in ten years, the same way OOP is being ridiculed today.
In fact, there are many smart people, and relatively few of them have their manager's ear, while management also listens equally to a bunch more not-smart people pointing to the popular smart people, saying "these people are smart", ignoring the merit of the position of other smart dissenters.
In short, it's all a big train of reasoning based almost entirely on the "appeal to authority" fallacy.
if it works and it's stupid... it's not stupid. If companies around the world can get stuff done with "anti-patterns" then, by definition, they don't "suck".
I'm all for doing better and trying to improve my "craft" but I'm not going to look down on decisions that work.
For a technology to not work at all it has to be really, really shitty. Anti-pattern means a pattern that is bad. If the tech didn't work at all, it wouldn't be a pattern at all.
You could write your website in handcrafted assembly and it would work, but it would still be really stupid.
I for one am looking forward to the advent of a frontend framework where every endpoint would be handled by a (possibly) handcrafted WebAssembly module.
You can drive a nail with a brick if you try hard enough. Does that make it not stupid to go looking for a brick while there's a hammer right there in your toolbag?
Nope. If it works and it's stupid, it's still stupid.
Just because companies can get stuff done in dumb ways doesn't mean those ways aren't dumb, it means that by brute force, sunk cost and obstinacy they can make things happen. There are any number of technical and non-technical dumb things in any large organization. You definitely can't assume something isn't dumb just because a company does it.
No one is looking down on ORMs because they work; people are looking down on them because they don't work well enough to justify their downsides in most use cases. That's why people call it an anti-pattern. Yes, you can get it to work, but in general it's going to make things worse, not better.
Now are some things that people call "anti-patterns" actually the right thing in certain circumstances? Absolutely. Just because something sucks in one situation doesn't mean it isn't the right choice in another.
> Nope. If it works and it's stupid, it's still stupid.
Successful businesses with successful and profitable products say otherwise.
The fact that stuff gets done means that it's not dumb, at a base level. You may not like it - and a lot of people don't, because it removes the need for overengineered, oversmart, and other "solutions" - when sometimes the best solution is literally the "stupid" solution.
> No one is looking down on ORMs because they work; people are looking down on them because they don't work well enough to justify their downsides in most use cases.
A decade of experience in a few companies I've worked at and every company I interview with says otherwise. EF is top notch and removes a large swath of "boilerplate" code and works more than well enough.
I've had much better experiences with something like EF than I have with stored procedures and triggers and other "better" solutions.
> Now are some things that people call "anti-patterns" actually the right thing in certain circumstances? Absolutely.
See? You even admit that "anti-patterns" are the right patterns "in certain circumstances". The only thing you disagree with is how common those circumstances are, and for the vast majority of circumstances? ORMs are a lot better than "custom" layers underneath.
Nothing worse than creating a custom way of doing something that's already done. Inserting your own bugs. etc.
Smart programmers don't solve problems that are already solved. I wouldn't create a PDF parser from scratch (outside of learning exercises). I wouldn't create a Word document parser. XML parser. etc...
Why would I create a data layer management package when I have better things to do with my time that actually adds value to a company?
I guess you have never had to come in and solve a bad situation that has arisen specifically due to the bad short term decisions someone made in the past.
Not all tech debt is bad, as long as it is recognized when being created and gets accounted for in near future planning. That way it is a deliberate decision taken, after weighing pros and cons, knowing it will need to be dealt with at some point.
The worst of these is when those making the decisions don't know as much as they think they do, forgo weighing pros and cons, and paint themselves into a corner by accident.
Those kinds of situations can literally kill a company (and have).
There are ORMs with good pros/cons ratios out there. Python especially has good ones:
- Django comes with a clunky and poor performing one, but it's very well integrated, super practical, productive and has nice ergonomics.
- Peewee gives you an ORM in a small package, which makes writing those little programs a joy when you don't need anything fancy, but feel lazy.
- SQLAlchemy requires a lot more investment, but is very flexible, generates clean SQL and has extremely correct behavior. It also exposes a lower level query builder for when you don't want the OOP paradigm and wish to express idiomatic SQL behavior, but abstracted away with Python.
This becomes a standard engineering decision, where you analyze the ROI.
I absolutely hated having to deal with SQLAlchemy. It took me days to get through the ugly documentation to a level where I could more or less competently use it, and for what? The code is harder to understand than just writing SQL and the performance is worse. I have yet to learn of a single advantage of ORMs that don't boil down to developers not wanting to deal with SQL.
Currently I have to use SQLAlchemy on an existing project, and while I am not necessarily disagreeing with you on ORMs or performance, I just wanted to add that ChatGPT has been able to reduce my time fighting with their documentation by what I estimate to be 90+%. To be fair, if I had written pure SQL, I might not have needed ChatGPT at all, but I also would have had to learn Python's database connector calls for the umpteenth time.
Yeah, the productivity and performance with the Django ORM is amazing. I have trouble using any other web framework now...
The thing people are missing is relations. The most important aspect of your application is how data is related to each other. A good ORM makes that easy.
> Every non-trivial long-lived application will require tweaks to individual queries at the string level. The proper way to build a data layer is one query at a time, as a string, with string interpolation. The closer you are to raw JDBC the better.
This is bullshit, IME. I've been doing this for over 10 years, and there's not a single one of those "requires a tweak at the string level" queries that couldn't be handled better by actually spending 5 minutes reading the Hibernate documentation and doing what it says to do in that situation. But the same SQL fanboys who believe every developer ought to memorize tri-state truth tables and COBOL-style function names can somehow never be bothered to do that.
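For instance: the classic N+1 select problem, which people love to "fix" with hand-tuned SQL strings, has a documented one-line fix (a sketch; the Author/Book entities and the em EntityManager are invented):

    // N+1: one query for authors, then one more per author the first time
    // its lazy books collection is touched.
    List<Author> authors = em.createQuery("select a from Author a", Author.class)
            .getResultList();

    // The documented fix: fetch the association in the same query.
    List<Author> authorsWithBooks = em.createQuery(
            "select distinct a from Author a left join fetch a.books",
            Author.class)
        .getResultList();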
> there's not a single one of those "requires a tweak at the string level" queries that couldn't be handled better by actually spending 5 minutes reading the Hibernate documentation
So instead of learning and using widely applicable SQL skills you learn and use Hibernate specific skills? And that’s supposed to be a positive?
Once we're talking about this kind of detailed tuning those "widely applicable SQL skills" are usually tied to a single database vendor.
Using any tool requires a certain investment in learning the tool. If you don't want to learn an immensely popular and widely used library that's in pretty high demand career-wise, well, that's your lookout, but it's a crazy double standard to claim that Hibernate sucks compared to handwritten SQL when you spent weeks tuning your SQL but wouldn't spend hours tuning your Hibernate.
> If you don't want to learn an immensely popular and widely used library that's in pretty high demand career-wise, well, that's your lookout
You’re making the argument against yourself. Even including different dialects an in depth knowledge of SQL is far more valuable than knowledge of Hibernate.
Over a decade ago now I used NHibernate, the .NET version of Hibernate on a project. Since that project I’ve written code in JavaScript (Node), Python, Rust, played around with a little Go and Kotlin.
In all those years the only database engine I’ve used seriously is Postgres. Knowledge of Postgres has been invaluable throughout my career. I can’t remember anything about NHibernate. It’s not even specific to NHibernate either, I had to use ORMs in other situations (like Sequelize on Node) and I haven’t retained any of that either.
> Over a decade ago now I used NHibernate, the .NET version of Hibernate on a project. Since that project I’ve written code in JavaScript (Node), Python, Rust, played around with a little Go and Kotlin.
> In all those years the only database engine I’ve used seriously is Postgres.
Interesting, I would think changing databases happens a lot more often than changing languages for most developers. Over the last decade I've worked on systems that used MySQL, Postgres, Oracle, H2, and Sybase, but they've all been on the JVM and almost all used Hibernate (one used EclipseLink, but a lot of the knowledge transferred over - even if something isn't called exactly the same thing, they have the same kind of capability structured in a similar way; the differences are much smaller than the differences between databases IME. Indeed, even when having to use e.g. the Django ORM occasionally, a lot of stuff transferred over). Spending time learning to use an ORM well was absolutely a good investment, far more useful than a bunch of SQL details would've been.
Like, if a company has a Ruby/MySQL stack, they're going to hire someone whose experience was Ruby/Postgres over someone whose experience was Python/MySQL. So if you're a Ruby dev then you open more doors for yourself by learning Ruby's standard ORM in depth than you do by learning a specific database in depth.
> I can’t remember anything about NHibernate. It’s not even specific to NHibernate either, I had to use ORMs in other situations (like Sequelize on Node) and I haven’t retained any of that either.
I suspect that reflects more on what you paid attention to and were interested in using rather than anything fundamental about how easy or hard ORMs are to learn.
> a lot of the knowledge transferred over - even if something isn't called exactly the same thing, they have the same kind of capability structured in a similar way
At a certain point this is getting beyond parody. That statement sounds exactly like it’s discussing SQL!
> I suspect that reflects more on what you paid attention to and were interested in using rather than anything fundamental about how easy or hard ORMs are to learn.
I'm not saying anything about anything being easy or difficult to learn. I'm talking about what's transferable. In my experience ORM knowledge is far less transferable than SQL knowledge.
I think it’s safe to say we’re going to have to agree to disagree here.
> At a certain point this is getting beyond parody. That statement sounds exactly like it’s discussing SQL!
Nope. The underlying relational model is great. Knowing what tables are like, how and why indexes work (and when they don't), transaction isolation levels, all that stuff is extremely useful transferrable knowledge (and will help you out whether you're using an ORM or not). Knowing the three or four different bizarre pseudo-COBOL variants that different database vendors use to express something like "pull rows recursively from this table" or "format this string using these two numbers and this date" is stamp collecting at best.
Disagree. All too often, JPA/Hibernate dumbs the relational model down into record-at-a-time processing, with navigational queries reminiscent of pre-relational network databases, and with counting and joins implemented by unnecessary fetching and client-side processing rather than by aggregate queries and joins. Furthermore, it brings fuzzy locking semantics (locking as a side effect of lazy/eager fetching) and artifacts such as a pattern of unnecessary "id" columns due to missing or asymmetrical, idiosyncratic handling of compound keys, plus extra magic for "many-to-many connectives" (the wording alone is non-relational), lack of attention to decimal arithmetic/overflow with money values, idiosyncratic date/time conversions, mappings of enums into ordinals, etc. etc.
Not my experience at all. The support for aggregate queries and joins is very good if you actually use it; likewise the support for decimals, date/time and enums (though I say that from a position of thinking a lot of database date/time handling is wrong, so take that as you will). I'm not a fan of lazy fetching but if it's something you want then the locking semantics are a natural consequence of that. Many-to-many connectives are indeed non-relational, but again if they're something you want (and again I mostly avoid them) then they work the way that they have to work.
Changing databases on an existing application doesn't happen, but I've worked across teams within a single company for ~8 years and in that time I've used:
1. SQL Server
2. MySQL
3. Neo4j (very briefly)
4. PostgreSQL
5. DynamoDB
Each of those was for a different set of applications, and none of those applications changed database; but the point is, sometimes an engineer will be made to use a variety of databases in their career, even within the same company (although you could also chalk this up to a particularly laissez-faire style of tech direction).
In our case the company was strictly "Oracle-only", but one team did a quick startup-style implementation in Rails and used MySQL. No one forced them to migrate to Oracle; the company just hired DBAs that know MySQL.
Changing the database for an existing system happens very rarely, but changing the ORM or language is even rarer. So learning the ORM well is just as rewarding as learning the database well, IME - and more so once you take into account changing jobs, since you're more likely to take another job using the same ORM but a different database than vice versa.
I've used many different databases. They're all very similar I've not had trouble adjusting. Every ORM I've used has required more time to learn. I'm not against ORMs but I've yet to see a situation where one was a net benefit. Aspects of ORMs are great but going all in tends to add significant complexity.
I'd argue that means it's doing its job quite successfully! The point of using an ORM is so you don't have to also be a DBA on top of writing code. If I have to remember a bunch of implementation details and program weirdly because of the leaky abstraction that is an ORM, then the ORM has failed. If I grab all the rows from the database and then WHERE-clause them in my code, instead of querying the database with a WHERE clause, and then wonder why my program's slow, that's on me. But if the ORM makes it straightforward not to, so I don't need to grok the database and can focus on the other problems, then I'd consider it a success.
What's there to remember? With NHibernate you basically use LINQ, so if you remember LINQ you remember NHibernate. The only extra bits are the setup and mappers, which are sort of trivial. The pain points come around because NHibernate decides to lazy load everything or ends up doing crazy joins. But overall I found NHibernate not too bad; it certainly was fast doing a lot of basic CRUD-type stuff.
However, these days I use Dapper.NET and SQL. Dapper gives a nice mapper for parameterized queries and maps results to types. I think it's a nice middle ground.
That's the same debate as whether to hand-craft XML files or have a library do it for you. You'd better know in detail how the format works, but going through a library will cover most common issues you wouldn't want to deal with by hand.
Except that an XML file that passes the spec and contains the required data is as good as it will ever get. The world's foremost XML-typing genius is not going to improve on that.
On the other hand, with hibernate, you get mediocre, inefficient SQL, and could produce much better results by focusing your energies on learning the database instead of learning hibernate.
On Hibernate: there is a common approach of having the pure ORM handle 99% of the queries and using a lower-level query-building tool for the 1% that needs it.
The query to fetch your user profile should be mediocre, properly cached, and fully definable in the ORM. And most of your queries should be straight and obvious joins between tables with no crazy-pants handling.
Something in your basic data structure that throws off an ORM would probably also throw off your new employees looking at the code for the first time, and could be fixed for readability and maintenance.
A thought exercise: XML entities can be self-referencing; have you ever thought about how you validate a document that has recursive entity definitions?
That's one quirk that comes first to mind, but the XML format definition is probably at least a few hundred pages, and I'm hopeful there's enough in there to fuel anyone's worst nightmares.
It's kinda interesting in itself that XML is seen as a plain and boring data format. I wouldn't wish being on the receiving end of an XML-typing genius' documents on anyone.
The XML specification does contain a schema: the DTD. This is why it is 37 pages; without the DTD part it would be at most a half of that.
Other schemas are not merely popular, they are more powerful. In DTD the type of the element depends only on its own name. In XSD the type depends potentially on the context: the type of 'xxx' in 'aaa/xxx' and 'bbb/xxx' may be different. And in Relax NG we can define different types, for example, for the first 'xxx' and for the subsequent 'xxx's. These extensions make the validation somewhat more elaborate, but still linear, as we remain within the complexity of a regular expression. These are formal validators; then there is Schematron, which seems to be more like a business-rule kind of validator that also has its uses.
I think I was seeing all the XSLT and XQuery and all the kitchen sink directly bound to XML, but those are just meta tools that don't bear directly on the language.
No. I see that as a reflection of the reality of the database; if your database contains rows that violate your domain invariants, what would you expect to happen?
It may not be a bad idea to fail fast by having the ORM call the constructor (the same way Jackson does when parsing JSON).
Broken invariants may propagate and cross system boundaries, making things much worse (I have seen a case when a $200M transaction was rolled back in 19 systems because data was not properly validated at entry - it was a crazy day for everyone in production support).
That comparison would only really make sense if people were advocating writing SQLite storage files by hand.
In this situation SQL is the library. It’s the interface that allows you to query and write the underlying data while knowing nothing about the underlying format. An ORM is just a library sitting on top of the library you already have. There’s just as much to learn, it’s just a higher level of abstraction (until it isn’t because you need something low level).
I think that "requires a tweak at the string level" is not precisely true/correct, but the overall idea is true. For any non-trivial, long-lived, application you will at least have to inspect the ORM-generated SQL string for some query and you'll have to either adjust your ORM calls (e.g., add even more magical Hibernate annotations) or use an "escape hatch" to write the query by hand.
The larger point isn't really that you can't accomplish everything with the ORM, IMO, but rather that the conventional ORM value proposition of "you don't have to write/understand SQL" is totally false. You will have to understand SQL and you will have to fiddle with your ORM to generate the SQL you want at some point. But, if you're going to end up fiddling with your ORM and inspecting the generated SQL anyway, then why not just start with something closer to the SQL in the first place?
Phrased differently, we have to learn SQL no matter what--and you won't convince me otherwise--so why should I have to learn SQL, and Hibernate's APIs and gotchas when I do Java, and Laravel's Eloquent APIs when I do PHP, and Entity Framework when I do C#, etc, etc.
I interpret parent's point as doing the SQL level tuning through the ORM's dedicated mechanisms, and not through a raw string.
Any decent ORM has extra mechanisms to tweak the resulting query and inject optimizations that it can't arrive at automatically; kicking the ORM away every time there's some tweak to do is counterproductive.
> Another selling point, "you don't need to know SQL"
It has never ever been a selling point by anyone with an ounce of brain.
ORMs are made for OLTP workloads, not OLAP (one of the creators of Hibernate has said so as well, but I won’t look it up now), and their primary utility is to save you from writing long and error-prone insert and update queries, plus they automatically map an SQL row to an object. That’s it.
You will reimplement plenty of parts of it if you go the raw way, so it is, again, not a zero-sum game.
> Every project I've been involved with that got saddled with an ORM was originally justified "so junior developers don't need to know SQL".
The people who made "not knowing SQL" the main reason to use an ORM should not be making the decision to use an ORM. They can only use it incorrectly.
If there are ORMs that document "not needing to know SQL" as a key selling point of their ORM, they should remove those sentences because it's not.
As someone with 15 years experience using ORMs, you absolutely do need to know how your database works to be effective at using an ORM well. Once you do, and you understand how your ORM works, I still think the benefits of a well-integrated ORM into your codebase make it worthwhile. Generally the code becomes easier to understand, modify and reuse if you're using an industry-standard ORM. Like with most good libraries or frameworks, the number of use cases the maintainers encounter usually means you'll find something pre-prepared when you come across a new problem.
To be fair, in our DAL we use Entity Framework (which is kind of just an ORM++), and a good 95% of the queries we do are just straight-up LINQ. The last 5% which use raw SQL are critical paths we found either directly through testing, or as our customers' systems gradually got slower as the database grew and needed optimisation.
Eventually the Juniors are expected to learn SQL, but until such a time as they do they can still make useful contributions to the code base.
Honestly, where are they even getting junior developers that don't know some basic amount of SQL or can't learn it in an afternoon? It's such a non-issue. What a great way to ensure juniors never gain proper SQL experience/expertise - and they'll be the senior developers later, which just propagates the problem.
Sorry to hear that you encountered such teams. It says a lot about their professionalism and experience. There are teams which do not use this kind of argumentation.
I wish. I have it justified to me as "our SQL is not good" as the reason to use ORMs in my workplace. The result is code that primarily centers around the idea of fetching objects into memory and flushing them back to the database. JOINs are not used, you do lookup chains in Java instead. And if you need to update a field, you fetch the entire object into memory (a SELECT *), update one field and flush the whole thing back to the database. That's what an ORM gets you. A plainly wrong way of thinking about how to manipulate data for developers.
It's not that they don't know SQL. They know SQL. The problem is, the only SQL they know is fetching objects by primary key/some other condition because that's the only thing an ORM does well. The SQL table could very well be a dumb key-value store because that's all that their code treats the SQL database like.
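Concretely, the two styles look like this (a sketch with invented entities; even JPA has a bulk-update form, it's just never reached for):

    // The pattern described above: SELECT * ..., mutate one field in memory,
    // write the whole entity back at flush time (by default, every column).
    User u = em.find(User.class, userId);
    u.setLastLogin(Instant.now());
    // flush/commit issues the UPDATE

    // What the task actually needs: one targeted statement.
    em.createQuery("update User u set u.lastLogin = :now where u.id = :id")
        .setParameter("now", Instant.now())
        .setParameter("id", userId)
        .executeUpdate();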
Charitably, they probably meant the process by which the parameterized query string itself is built, in the case where you need a dynamic query structure.
I swear to god I'm going to write a SQL library some day that doesn't take strings as arguments. Although LINQ kinda already does what I'm thinking about, so maybe I just wait for someone to port it.
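On the JVM something like this already exists: jOOQ composes queries from generated, typed objects instead of strings (a sketch; the USERS table class would be generated from your schema, and ctx is a jOOQ DSLContext):

    // Column names and types are checked at compile time; no SQL strings.
    List<String> names = ctx.select(USERS.NAME)
            .from(USERS)
            .where(USERS.AGE.gt(30))
            .orderBy(USERS.NAME.asc())
            .fetch(USERS.NAME);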
"With the new SqlQuery method, you can now write your own SQL queries and map the result to any type that you want.
This is useful for tasks that are not directly related to the day-to-day business of your application. For example, this can be used to import data, migrate data, or to retrieve data from a legacy system."
Sooo it’s dapper with string interpolation override to make queries parameterized so you don’t need to manually put the parameters on as 2nd arguments?
> Sooo it’s dapper with string interpolation override to make queries parameterized so you don’t need to manually put the parameters on as 2nd arguments?
Pretty much. C# has some really cool innovations around string interpolation which prove very useful for logging, SQL queries, etc.
You can create your own string interpolation handler, which is what the new SQL query does and what several log libraries do. So basically you can use interpolation safely and remain assured that you do not introduce SQL injection errors. Also, this way string interpolation does not prevent query plan caching, as normal string interpolation would do.
You can do it in typescript/es6 using tagged template literals. I have done it before and it was awesome. Seriously wish more languages supported custom variable interpolation for it.
Details. But sure. In truth the best way to do the data layer is to use stored procs with a generated binding at the application layer. This is absolutely safe from injection, and is wicked fast as well.
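The application-layer half of that is thin - hand-written, it's just JDBC's CallableStatement, which is what a generator would emit per procedure (a sketch; the procedure and its parameters are invented):

    import java.math.BigDecimal;
    import java.sql.*;

    // Calling a stored procedure via the JDBC escape syntax. Inputs travel
    // as bound parameters, never spliced into the SQL text.
    long createOrder(Connection conn, long customerId, BigDecimal amount)
            throws SQLException {
        try (CallableStatement cs = conn.prepareCall("{call create_order(?, ?, ?)}")) {
            cs.setLong(1, customerId);                 // IN
            cs.setBigDecimal(2, amount);               // IN
            cs.registerOutParameter(3, Types.BIGINT);  // OUT: new order id
            cs.execute();
            return cs.getLong(3);
        }
    }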
Having been on the team that inherited a project full on stored procedures (more than once): no thank you, not ever again.
Opaque, difficult to debug, high risk to invoke, difficult to version and version-control properly. I’m sure they could be done “less worse”, but I doubt they can be done well.
Thanks for echoing my pain. I am paying for the bad decisions taken by a bloke who only knew SQL and used that hammer to bang on every nail he could find. Every time someone asks why the result of the stored proc is the way it is, I need to manually run the steps one by one until I find the point where an expected row failed a condition and dropped off. And have extra fun wrapping those statements with declarations in case you are heavily using declared variables in those statements. To top it off, if you are troubleshooting in an environment where you don't have insert/update permissions, your day just became amazing. Fuck you to those who use stored procs without strong justification (like high-volume data processing on the database side, where it totally makes sense).
By version control, I assume you mean the inability to use different versions simultaneously without hacks, unlike a library built using Java where you get to pick and choose the version you want to use. Because if you mean version control of stored proc sources, that would be just like any normal git asset.
I like stored procedures, or I guess to be more specific functions in postgres, but they have to be done in a really specific way:
1) It's all or nothing. If there are functions, then they'd better feel like a complete high-level API: enforcing the consistency of data, perfectly handling locking, etc. If there's tons of complex SQL in the app and then also some functions, then it's better to have no functions at all.
2) They need to have a consistent interface, such as always returning an error flag, error message if applicable. Functions must never surprise the caller. It either succeeds and the caller commits, or fail and the caller rolls back and knows why.
3) No monkey business with triggers. They can maintain updated_at columns, or maybe create some audit trails, but anything beyond that will make behavior surprising to callers and that is the worst.
As for version control it needs to be maintained as a separate application, which the server depends on having certain versions of. Even if you don't use functions you have to worry about schema versions and you can't always run two versions at the same time or roll back schema changes easily as a practical matter.
Holy crap yes. This thread has triggered a long lost memory from over a decade ago where everything, absolutely everything was done through stored procedure. Such a wild, wild waste of time.
> - Are maintainable by a team. "Oh, because that seemed faster at the time."
> - Are unit tested: eventually we end up creating at least structs or objects anyway, and then that needs to be the same everywhere, and then the abstraction is wrong because "everything should just be functional like SQL" until we need to decide what you called "the_initializer2".
> - Can make it very easy to create maintainable test fixtures which raise exceptions when the schema has changed but the test data hasn't.
ORMs may help catch incomplete migrations; for example, when not every reference to a renamed column has been updated, or worse, when foreign key relations have changed in breaking ways.
django_rest_assured generates many tests for the models (SQL)/views/API views in a very DRY way from ORM schema and view/controller/route registrations, but is not as fast as e.g. FastAPI w/ SQLalchemy or django-ninja.
> Opaque, difficult to debug, high risk to invoke, difficult to version and version-control properly.
Traditionally, maybe. With databases like Neon (PostgreSQL) and Planetscale (MySQL) supporting easy branching / snapshots, it's at least become a lot nicer.
That might be true, but that’s like saying “now that my mechanic can repair my car for free, it’s way less worry to replace my radiator by removing it with a chainsaw”.
The solution to stored procs being awful isn’t making a whole branch just to see what it does, it’s fixing the problems with stored procedures at the root level.
Though I'm on the fence with stored procs and have seen complicated messes that make them depressingly unfun to work with, I've also had very good experiences with systems based around stored procs, used in a non-abusive, non-dogmatic way. The usual answer I'm going to invoke here is that it depends on how the tool is used. Any tool, methodology, or philosophy can be borked in various ways, and that experience traumatizes people to the point that they prefer to move on. Of course some tools are clearly worse or less useful than others, but you can find some consensus if you know where to look.
I (softly) disagree with stored procs being definitively the "best way to do the data layer." Stored procs are an extremely powerful tool to have in your belt, but lord have mercy do they have their own tradeoffs.
What I've found is that devs tend to be able to write "OKish" SQL, but fall over when trying to optimize it, so the DB side can then take over and write optimized stored procs, which can have a separate cadence of code updates and version control compared to the backend team.
Stored procedures often increase CPU load of the DB instances, deploys can be more challenging, and security is only better if one takes care how the procedure is authorized or delegated.
They can save some network round trips, and unify interfaces if there are a lot of different stacks sharing one data store. Though it's not universally better.
> Stored procedures often increase CPU load of the DB instances
This is of course true if you're doing a bunch of computational work in your stored procedure (as opposed to just doing pure CRUD stuff) but I'm struggling to think of a real world situation where this might be the case.
Can you name some examples you've encountered? (I'm not doubting you, I'm just curious)
I did some complex expiration calculations in a stored function, it cut down on round trips and kept the app simpler. Though it did cause some modestly higher CPU load.
I've also seen materialized views (MVs) cause CPU spikes where there is a lot of SELECT massaging going on. Even without the MVs themselves, certain functions like REGEXP stuff can impact performance. The language of the PL matters too, if you've got choices like Python.
Ah! Thanks. I'm not sure why my brain wasn't making the connection.
Materialized views are almost criminally underused. I feel most people don't know about or understand them. They can be a very effective "caching layer" for one thing. I have used them to great effect.
A lot of times I see people pulling things out of the database and caching them in Redis. When in fact a materialized view would accomplish the same thing with less effort.
> Materialized views are almost criminally underused. I feel most people don't know about or understand them. They can be a very effective "caching layer" for one thing. I have used them to great effect.
They are great if the frequency of changes isn't very high and the underlying tables are read-heavy and running it directly against the tables would be expensive (e.g. complex joins).
> A lot of times I see people pulling things out of the database and caching them in Redis. When in fact a materialized view would accomplish the same thing with less effort.
Typically Redis is used in medium to high volume of many smallish but repetitive queries where you want to store some key bits of information (not entire records at which point going to database might be simpler) that are sought often and you don't want to hit the database again and again for them - a bit different from the materialized view scenario.
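A sketch of the caching-layer use, with PostgreSQL syntax via plain JDBC (the order_totals view is made up): compute the expensive aggregation once, then refresh it on whatever staleness budget you can afford.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    class OrderTotalsCache {
        // One-time setup: precompute an expensive aggregation.
        void create(Connection conn) throws SQLException {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE MATERIALIZED VIEW order_totals AS "
                         + "SELECT customer_id, sum(total) AS lifetime_total "
                         + "FROM orders GROUP BY customer_id");
                // Unique index is required for REFRESH ... CONCURRENTLY.
                st.execute("CREATE UNIQUE INDEX ON order_totals (customer_id)");
            }
        }

        // Run periodically, e.g. from a scheduled job; CONCURRENTLY avoids
        // blocking readers (PostgreSQL-specific).
        void refresh(Connection conn) throws SQLException {
            try (Statement st = conn.createStatement()) {
                st.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals");
            }
        }
    }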
False. I've developed enormous systems fully using stored procs for data layer and it's absolutely not the "best way to do the data layer". Next you'll be saying triggers are a good idea.
Yes, they can be fast, and they're controlled by the DBA; that's why you'd choose them. Injection is not an issue in modern data access layers; that's a 1999 problem.
If your default stance in 2023 is that you should start with stored procedures, you are doing something very wrong and you are a risk to the survival of your business.
For CRUD stuff, if you don't care about execution plan caching and other optimizations, sure, querying directly could save you some time. For chunks of code that involve transactions or processing large data directly on the server, I'd reach for stored procedures without thinking too much.
It's not like stored procedures are inherently faster than normal queries, right?
They're doing the same amount of "work" with regards to finding/creating/updating/deleting rows.
But you (potentially) avoid shuffling all of that data back and forth between your DB server and your app.
This can be an orders-of-magnitude benefit if we are describing a multistep process that would involve lots of round trips, and/or involves a lot of rows that would have to be piped over the network from the DB server to the app.
Suppose that when I create an order, I want to perform some other actions. I want to update inventory levels, calculate the user's new rewards balance, blah blah blah. I could do all of that in a single stored procedure without bouncing all of that data back and forth between DB and client in a multistep process. That could matter a lot in terms of scalability, because now maybe I only have to hold that transaction lock for 20ms instead of 200ms while I make all of those round trips.
There are a lot of obvious downsides to using stored procedures, but they can be very effective as well. I would not use them as a default choice for most things but they can be a valuable optimization.
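Roughly, over JDBC (the create_order procedure and the step names are hypothetical):

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.SQLException;

    class Checkout {
        // Instead of insertOrder(...), decrementInventory(...) and
        // updateRewards(...) as three separate round trips inside one open
        // transaction, a single procedure does all three steps server-side,
        // so the locks are held for one network hop instead of three.
        void placeOrder(Connection conn, long userId, String payload)
                throws SQLException {
            try (CallableStatement cs = conn.prepareCall("{call create_order(?, ?)}")) {
                cs.setLong(1, userId);
                cs.setString(2, payload);
                cs.execute();
            }
        }
    }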
For example, you can look up millions of rows, then manipulate some data, aggregate some other data, and in the end return a result set, without shuffling data back and forth between client and server.
you can do that equally well in a stored procedure and a single query
like, you can write a query which does all of these transforms in sequence, and returns the final result set
the data that goes between client and server is only that final result set, it's not like the client receives each intermediate step's results and sends them back again?
If you’re going to mix multiple queries with procedural logic — eg running query A vs B depending on whatever conditions based on query C, then a stored proc saves you the round trips versus doing it in your app code. That’s all he’s saying.
It seems you don't have a lot of experience or understanding regarding stored procedures.
Obviously if you just take a single query and turn it into a stored procedure then yes, the round-trip cost is the same. This seems to be where your knowledge ends. Perhaps we can expand that a bit.
Let's look at a more involved example with procedural logic. This would be many, many round trips.
I'm not exactly endorsing that example. Personally, I would almost never choose to put so much of my application logic into a stored procedure, at least not as a first choice. This is just an example of what's possible and not something I am endorsing as a general purpose best practice.
With that caveat in mind, what's shown there is going to be pretty performant compared to a bunch of round trips. Especially if you consider something like that might need to be wrapped in a transaction that is going to block other operations.
Also, while you may be balking at that primitive T-SQL, remember that you can write stored procedures in modern languages like Python.
good lord man, the condescension is so thick and rich, it's like i'm reading an eclair, and the eclair is insulting me based on its own misunderstanding of the topic of conversation
That'll happen when you're publicly and confidently wrong. If you scroll up, you'll find any number of posts where folks pointed out your misconceptions in a more kindly fashion. When we factor in the fact that you're wrong (as opposed to my post which is correct, informative, and cites examples) I think many would say your incorrect assertions are a lot ruder and less HN-worthy.
Sometimes folks know more about a given thing than you do. That is okay. The goal is to learn. I am sure you know more than I do about zillions of things. In fact, that is why I come here. People here know things.
haha, man, the absolute _chutzpah_ you need to make (incorrect) accusations like this, even as an anonymous person on the internet, is really breathtaking
i hope you reflect on this interaction at some point
You seem to be firm in your objectively wrong belief that:
> a stored proc is just a query saved on the db server, nothing more
Absolutely not. They can contain procedural logic as well. You can do a wide range of things in a stored proc that are far beyond what can be done with a query. Again.... I provided some links with examples. You don't need to believe me.
i hope you reflect on this interaction at some point
If you're just writing a single access app for a DB I agree it doesn't really matter much. Where SPs really help is when you're maintaining a DB that multiple projects from multiple teams access and you can present them a single high-level API to use, so that Accounting and Fulfillment aren't reinventing the same wheels over and over. So it's more about stack management than anything else.
My first thought is to scream in horror as Bobby Tables says hi to "string interpolation". Complaining about evil ORMs and then just asking to get taken to the cleaners...
I really feel like a good chunk of the discussions around ORM/not ORM and such are because the core method of interacting with our databases sucks.
Passing strings around is an awful interface, and maybe if we explored the space to fix that, a good chunk of these subsequent problems would be obviated.
Man, you know what you can really just mostly ignore when you’re using an ORM?
Injection
If I'm building an app or an API, I will ORM till I die (as long as it's Django). If I need anything so much as a GROUP BY, though, I will drop right into SQL or create an aggregate table that my ORM can just do a basic SELECT from.
If you’re just worried about injection, you just need to use bound parameters instead of string interpolation. Boom, avoided at the database driver level even with plain sql. I admit though that some cases of string interpolation can be harder to catch in your code than when using ORMs.
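In JDBC terms the difference is one line, assuming a Connection conn and a user-supplied String email (table and column invented):

    // Vulnerable: the value becomes part of the SQL text itself.
    //   "SELECT * FROM users WHERE email = '" + email + "'"

    // Safe: the statement text is fixed; the driver ships the value separately.
    try (PreparedStatement ps =
             conn.prepareStatement("SELECT * FROM users WHERE email = ?")) {
        ps.setString(1, email);
        try (ResultSet rs = ps.executeQuery()) {
            // ...
        }
    }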
Humans make mistakes and we can't rely on them not to make mistakes in software development, yes. But preventing SQL injection is not really an "oopsie". It's a fairly easy thing to spot and prevent, and none of the code I've ever reviewed had a SQL injection bug in it, because string interpolation looks completely different from parametrized queries.
I have no problems with ORMs, but you make it sound like it's as easy for a SQL injection bug to go into production code as a memory bug in C. You'd have to actively be fucking up at that point, and if a Junior dev does it, it's a 5 minute talk and they'll never do it again.
I didn't say you had to be magic about it in the way of "git gud", no.
One can, for example, hook up a static analyzer to your GitHub and configure it to complain if there are any SQL calls that can't be automatically verified as free from interpolation of potentially unsafe data, unless those dynamic calls are marked with a comment annotation as having been individually verified by a human as definitely not interpolating potentially unsafe data. I suspect such an analyzer may already exist as a SaaS offering.
If nobody lies about those annotations, a static analyzer can handle the common cases. If you have a human faking such annotations, whichever motivation causes them to do that would create other security issues even with the ORM.
And the analyzer could even handle some kinds of dynamic sql depending on how aware it is of your programming language and type system and what markings you’re willing to apply to functions and/or data with potentially unsafe and/or safe origins. Being vague only in the service of generality here since the specifics vary so much by company.
If there is no SaaS service for this, maybe one ought to exist.
Who is making those claims? Really? I've used Hibernate probably about as long as you have, and to me: it's fine. Is it perfect? Of course not, but if I have to make the trade-off between Hibernate (or nowadays Spring Data) or rolling my own abstraction to interact with the database, for any kind of non-trivial application, I would not roll my own.
When you roll your own, you know what you end up with: still an ORM!
And like with many things: date and time zone math, encryption, writing databases, and ORM, if you can pull in a specialized tool, you can focus on your core business, and don't have to become an expert in a field that's not your core business.
It’s frustrating to see “not needing to know SQL” as supposed reason for using an ORM. If you don’t have a decent foundational knowledge of SQL and relational DBs, of course you are going to have a bad time with ORMs.
ORMs can help you to write less SQL, and less boilerplate for getting data in and out of the database. But pretty soon you’re going to need to debug something, and without a solid understanding of DBs and SQL, you’re going to be hopelessly lost.
This year happens to be the first year when I worked with SQL enough to say that I know my ass from a hole in the ground in the language, and my job title is Senior Backend Developer and I have 11 years in tech and 5 as a developer. My journey with SQL is basically the GIF of Colin Farrell running in circles avoiding the violent American tourist in In Bruges; I have some good qualities as a developer but diligence in learning SQL is not one of them.
I learned SQL many many years ago using Microsoft Access. It has (or at least had, I don’t know how it’s changed) a great visual query builder which would generate and run the SQL for you.
Seeing what SQL it generated, and the data it produced from the diagram I had created, was really helpful in learning interactively. Any tool that does this (well enough) should help a lot.
Most of our projects are mysql, and in those we don't use any db-specific features. Those all use in-memory sqlite for tests. Only in one of our projects do we use jsonb with postgres, that one does use postgres for the tests as well. Though, that project probably didn't need it - we're just sort of stuffing semi-regular data from another system into it. It would have worked just as well as a text field that was parsed as-needed, since as far as I remember we don't query it directly, only query based on other columns in that table.
Yes, that looks like a docker-compose config. But, it doesn't really matter. The point is that you could run MySQL in a way such that its storage (the actual data files) are on a virtual filesystem running in RAM.
It's a nice way to do it, IMO, because I prefer testing against a real DB instance in my projects anyway (rather than pretending that a unit test against a fake 'FooRepository' interface proves anything at all).
I come from Java and moved to nodejs for backend four years ago. I now use sequelize as orm. Most of your pain is due to hibernate itself. It may be the only really enterprise level orm but it is a pain in the ass.
Sequelize is much simpler, and writing code with it is a joy compared to Hibernate.
When you say nobody uses different databases, that's true 99% of the time, but I worked with a company that developed a tool that had to be deployed inside the customer's infrastructure. These types of customers (financial sector) force the DB choice on you, since they have highly paid DB support teams, so the tool had to support multiple DBs.
A year ago I had to develop a big Java application without an ORM (CTO's choice): I didn't remember how tedious, error-prone and slow development is without an ORM!!! Never do it again!
I think the best approach is to use an ORM for common CRUD tasks and add specific SQL queries when things get a little bit complicated.
Hibernate is not that bad. The problem with ORMs in the Java web space is lots of people used them as an 'extreme DRY' solution to link database 'objects' to the front-end which breaks separation of concerns and forces you to use the same model or resort to ugly hacks for further mapping.
In a properly separated application it works fine.
> A year ago I had to develop a big Java application without an ORM (CTO's choice): I didn't remember how tedious, error-prone and slow development is without an ORM!!! Never do it again!
To be fair, JDBC is an awful, awful API. In a sane language, with a sane SQL API/library, there's really no reason you shouldn't be able to just pull out a statically typed tuple (with proper handling of NULL, unlike JDBC) from a query result in one line.
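For comparison, this is roughly what it looks like with JDBI on the JVM; JDBI is real, but treat the details as a sketch (the User record and connection URL are assumptions):

    import java.util.List;
    import org.jdbi.v3.core.Jdbi;
    import org.jdbi.v3.core.mapper.reflect.ConstructorMapper;

    public class Example {
        public record User(long id, String name, String email) {}

        public static void main(String[] args) {
            Jdbi jdbi = Jdbi.create("jdbc:postgresql://localhost/app");
            jdbi.registerRowMapper(ConstructorMapper.factory(User.class));

            // One expression: SQL in, statically typed objects out.
            List<User> active = jdbi.withHandle(h ->
                    h.createQuery("SELECT id, name, email FROM users WHERE active = :active")
                     .bind("active", true)
                     .mapTo(User.class)
                     .list());
        }
    }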
I don’t get how people jump from “it doesn’t work sometimes” to “never use it”.
You can write raw sql when the ORM fails. But for many queries the ORM does work, so why not use it?
You don't always know before-hand when, where, or how it's going to "fail". SQL doesn't fail. Sure, *I*, the programmer, could use either incorrectly, but that's not what I mean- I mean that the ORM library is always a bad abstraction that either hides the features of the DB you're actually using, and/or has gotchas and workarounds.
If you use both, now you have to deal with integrating them both into your code in some semi-consistent way. Most (some? just me?) of us would balk if someone said they included two different HTTP request libraries into their project because one of them was more convenient most of the time, but the second one was the only one that worked for some of the requests they have to make.
ORMs are an investment. They're always complex and leaky abstractions that require a significant amount of time to learn and wrestle with. If they only serve to make the easy queries a little bit easier, but you still need SQL for the actually-hard stuff, then it's hard to imagine a scenario where that investment is actually worth it.
> If you use both, now you have to deal with integrating them both into your code in some semi-consistent way
I have no idea what you mean by this. What does it matter to my business logic whether a repository function got data from an ORM or from Raw SQL? In either case, I would return the same kind of data.
The edge of your application, where side-effectful stuff like DB queries sits, is exactly where you don't need to do anything consistently. You're always fetching random stuff. At some point, you need to turn that into a format that your code understands. That doesn't change if you decide to only use raw SQL.
I apologize for not being clear, but it was somewhat intentional because I didn't want to get dragged down into arguing about a specific ORM if I made too concrete of an example.
What I'm thinking of is situations where you might have multiple queries/statements that need to be done together within a transaction. Depending on exactly what your ORM's API is like, it may or may not be easy to run a piece of ORM-using code with a piece of Raw-SQL-using code inside the same transaction at that "application edge".
It can also be especially tricky if the ORM in question is one of those real heavyweight solutions that does caching, "dirty" checks for updates, etc. You have to be careful if you ever mix Raw-SQL functions with ORM functions in a use-case.
So, those are the kinds of things I mean. This probably isn't much of an issue for more lightweight, simple, ORMs, but it's an issue I've run into with at least one of the big, heavyweight, ones.
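A sketch of the kind of care I mean, using Hibernate-style APIs (the Order entity and inventory table are invented): the native statement bypasses the session, so pending ORM changes have to be flushed first, and any entity state cached for the touched rows is stale afterwards.

    // Assumes an open Hibernate Session `session` and a mapped Order entity.
    org.hibernate.Transaction tx = session.beginTransaction();
    try {
        Order order = session.get(Order.class, orderId);  // ORM-managed
        order.setStatus("SHIPPED");

        // Push pending entity changes to the DB before raw SQL touches the
        // same tables, or the statement may act on stale data.
        session.flush();

        session.createNativeQuery(
                    "UPDATE inventory SET qty = qty - 1 WHERE sku = :sku")
               .setParameter("sku", order.getSku())
               .executeUpdate();  // invisible to the session's first-level cache

        tx.commit();
    } catch (RuntimeException e) {
        tx.rollback();
        throw e;
    }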
This is why I moved to jOOQ years ago. Nice lean abstraction over SQL, to the point where it's just a typesafe SQL builder and also does the mapping to domain objects if you want. Abstracts and polyfills SQL capabilities just enough for you to be able to port your queries between DBs if desired.
I'm a big fan of jOOQ. I've been able to implement the most complicated SQL expressions in jOOQ. There is one particular use case that it cannot handle, though my memory fails me as to what it was. There is an open issue on GitHub for it that's been open for years. Lukas, the creator of jOOQ, has not been able to figure out a way to implement it. But no worries, it's still my favorite way to access my DB.
I've only had a brief stint in Java in my career, but I got to learn about and use jOOQ, and I think it's such a fantastic option in this space. I'm still a diehard SQLAlchemy fan, and I'd use it in Python-land. For Go, I think sqlc is a decent option, but it's no jOOQ. I'd love jOOQ for Go.
jOOQ is far and away the best library in this class for any ecosystem.
It's so good you would need to think long and hard not to choose the JVM for a SQL heavy application because it's just that damn good and most other library requirements are relatively interchangeable.
I dislike ORMs, but good ORMs are significantly better than none at all. They act as a fairly simple layer between the database and the application, and are great for inexperienced engineers to get started with an experienced engineer helping out occasionally for performance hints.
Furthermore they are a good adapter in general. Converting the way the world looks to a DB and the way the world looks to a web-app is always a time consuming effort.
I would have to say that overall ORMs have saved me more hours than they wasted over the years, with Arel in Ruby being pretty much 80% time savings minimum. Sure there are occasions where I just drop down to SQL, but most of the time it is just a win. It is the perfect library: make simple things simple, and complex things possible. Which includes AMAZING debug logging in Rails 7. Like game-changer style.
I've used Hibernate in the past, and I feel like it probably saved me 20% of the time, but required me to learn a complex system. It did have some nice features like caching and such.
The ORM itself matters, I strongly do not believe ORMs are an anti-pattern.
As far as the original selling points... meh
- use any database, never worth even thinking about this. If you switch DBs you will have issues.
- you don't need to know sql. True, but eventually you will need to learn it. But I've seen Jr. engineers get far and use this as a learning experience. I think thinking of the problem from both the Sr. level and the Jr. level is important. I happen to have a TON of db experience and am often the guy debugging sql, even if I'm on the front-end team, but many people simply aren't there yet.
Edit: To everyone saying "just let me write normal sql". I want to challenge you to think of how to extract knowledge of relationships and query patterns into a codebase without constantly repeating the same damn joins. SQL is a bad language, but it is the one we have. And so ORMs solve a lot of SQL's weaknesses. They are _terrible_ for writing custom reports, but are excellent for making structured backends. In the end, that knowledge abstraction you'd make if you wanted to abstract the knowledge of data relationships into a codebase ends up being... an ORM.
We switched database providers and our ORM saved us a ton of effort. Sure we ran into issues due to differences between the DBs, but every single insert, update and JPQL query in our app ported over without issue. The few places where we used native SQL, on the other hand, needed to be tweaked and rewritten. If we were making heavy use of native SQL and stored procedures, it would have been an order of magnitude more time consuming.
I used to be of the same opinion that swapping DB providers isn’t worth worrying about, because you’re screwed regardless. But our company was bought, and forced to make the change, and the ORM made a huge difference.
> One of the selling points, which is now understood to be garbage, is that you can use different databases.
It was a major selling point back in the day. You can say that's now legacy, but it was definitely a thing pre-cloud / pre-SaaS.
Lots of software used ORM to offer multi-database support, which was required when you sell a license and users installed it on-premise. Some organizations strictly only allowed a certain brand of database.
You couldn't spin up a random database flavor in AWS, Azure or GCP. There were in-house DBAs and you were stuck with what they supported.
Lord, I can't say enough how glad I am those days are over. Even with ORMs, the RDBMSes of the early/mid 00s had plenty of differences, so it was a colossal undertaking getting apps to run on multiple DBs.
Sales team: We've got a huge potential client, but they require DB2.
Eng team: <spends 6 months getting shit to work on the insanity that is DB2>
Sales team: Damn, after all that, the potential client went with another vendor.
Eng team: Please pour gasoline on us, we'll light the match.
I have always thought that an ORM with overrides could be the answer. At the call site, the objects to be extracted/inserted are defined. The implementation checks whether there is a DBA-written query and executes that, or tries to generate something. A fancy DAO with auto-generation, if you will.
If you want any sort of database maintainability, you just cannot have queries concatenated from strings scattered around the code, especially in environments with code hot loading. Otherwise, database migrations quickly start requiring shims for the old interface. So ORMs/DAOs are absolutely necessary in any larger application just to maintain (hehe) maintainability.
At this point why not have the abstraction layer auto generate queries? DBA time is much better spent optimizing those "few" queries that do matter for performance than writing thousands of straightforward CRUD queries.
> Another selling point which is "you don't need to know SQL", is also garbage
On one hand, programmers do not need to know SQL beyond basic data extraction techniques. A DBA is going to be better at that job than they are anyway, if only because DBAs have access to (and actually look at) performance metrics. On the other hand, an auto-generator is going to be worse than a DBA too, therefore "you don't need to know SQL" is garbage. We already have to fight the SQL engine; now we have an ORM layer to fight on top.
Worst mistake I ever made, besides of course the night when we left a fully loaded automatic rifle on the roof of our van, was a move from Tomcat + JDBC to JBoss + Hibernate. It went from fast (JDBC) and stable (Tomcat) to unstable (JBoss) and slow (Hibernate).
It was that period in my life with involuntary dealings with guns and ammo, called the military. I was commander of a security detail and we often went about in our region with our gear and live ammo.
This particular night we had our last stop at about 4AM and went back to base. At base it appeared the UZI belonging to the driver was missing. Headlines like "Boy finds military weapon and empties it at the gas station" are a certain ticket to jail time, so I made sure my concerns resonated with the team. Luckily the driver remembered leaving the UZI on the roof, so we went back to exactly the last spot and spent a good hour searching that neighbourhood.
I think it was then about 6AM and it was getting light. We were getting desperate and I called another commander to assemble his team to help searching. At that point I figured why not replay the whole event with another gun? And so it happened: when that fell off the roof, it landed just a few meters from the UZI.
As a tool for mapping query results to objects and doing CRUD, Hibernate is fine. Certainly it is much better than writing your own Hibernate on JDBC. Which you do eventually, because everyone gets sick of typing the same boilerplate over and over.
I think you hit a few key points though - don't try to hide the database flavor and don't use JPQL (just write native SQL).
I do. And I use Hibernate to abstract over MySQL and Oracle, since I want to keep clients that use both.
It takes a little more work than that, but it makes it possible.
Hibernate is not the technology I love the most, but it does some non-trivial work, like managing the unit of work, graph caching and mapping, that can be useful. It is also highly prone to being used wrong by devs.
You might not need hibernate, sometimes it might help. It should probably be used less than what you see in the wild, though.
> The proper way to build a data layer is one query at a time, as a string, with string interpolation. The closer you are to raw JDBC the better.
Or use a query builder if a good one exists for your language, e.g. jOOQ for the JVM. After suffering through years of Hibernate I could not have been happier.
Counterpoint: We are in the process of switching from Oracle to Postgres, and the ORM is definitely helping. It's otherwise a pain. We've literally written a query in SQL and asked ChatGPT to give us the equivalent in the ORM (then checked it was right, of course).
Haha, that's funny. I wouldn't trust ChatGPT, but that's something I may change my mind about in the future, of course, and I would give it a go too if I had to go anywhere near Oracle. To me it's quite unappealing, and I'm sure the kind of project it lends itself to is also 'fun'.
For this use case, I don't have to trust ChatGPT, since I can just read the SQL that the ORM outputs. The project isn't terrible, it's just old. Oracle could very well have been the right choice 25+ years ago.
Big +1 to this. I’ve been using JDBI lately and could not be happier. It is not an ORM, only a lightweight mapper. And it supports all modern libraries and paradigms (Vavr, Immutables, immutable “value classes” …)
I used to have a similar opinion, but today I'm not so much against it. There was a project where I did not have access to the "official" database (Oracle) and was forced to develop against MySQL instead. Granted, it was a reasonably simple application, but still, most of the queries _just_worked_ on both MySQL and Oracle.
I would also argue that for querying you don't need ORM (might be better off with something like Jooq), but for complex updates, ORM can be a godsend.
> is that you can use different databases. But no-one uses different databases.
The main selling point to me was that you could write a lot of wicked fast integration tests on top of H2 and exercise your ORM. But that was already at a point where "the testing pyramid" was a diagram that some of your coworkers had seen and were trying to figure out how to explain to everybody else.
So my desire to write fast backend integration tests never lined up with my job responsibilities + opinions on relative test counts.
> The main selling point to me was that you could write a lot of wicked fast integration tests on top of H2 and exercise your ORM
Nowadays in the age of containers you can spin up the database you are using in production easily in a few seconds. Unless you are stuck with using one of the old commercial monsters like Oracle this is not really worth doing anymore.
> Nowadays in the age of containers you can spin up the database you are using in production easily in a few seconds.
Is that a database for ants? At $dayjob “a few seconds” isn’t even enough to send a dump over the network. Which is a moot point anyway because there’s absolutely no way random j dev will get access to all the PII that’s in the production db.
I think the parent pretty clearly meant that you can spin up an instance of the same kind of database used in production. E.g., if you use MySQL in prod, you can spin up a local MySQL docker instance for tests rather than an in-memory not-MySQL option like H2 or SQLite.
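For example, with Testcontainers on the JVM it's a few lines (real library, but treat the specifics as a sketch):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.testcontainers.containers.MySQLContainer;
    import org.testcontainers.utility.DockerImageName;

    public class MySqlIT {
        public static void main(String[] args) throws Exception {
            // Disposable real MySQL in Docker, gone when the block exits.
            try (MySQLContainer<?> mysql =
                     new MySQLContainer<>(DockerImageName.parse("mysql:8.0"))) {
                mysql.start();
                try (Connection conn = DriverManager.getConnection(
                        mysql.getJdbcUrl(), mysql.getUsername(), mysql.getPassword())) {
                    // run migrations + tests against the same engine as prod
                }
            }
        }
    }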
ORMs have other selling points. Primarily, it looks like code and works with tooling designed for code.
So changing a column name or type, for example, is pretty easy to refactor, which may not be true for SQL queries and may take significant testing.
Additionally, ORMs allow you to easily hook into things (post-commit, etc) that is often useful, and to define custom things, like JSON serialization with custom types, without a massive amount of work.
>So changing a column name or type, for example, is pretty easy to refactor, which may not be true for SQL queries and may take significant testing.
You can also get that with raw SQL in the right ecosystem. I once developed a simple application using hasql and hasql-th (in Haskell). Very close to raw SQL strings, but with type checking at compile time. Feels like you're driving a Lambo.
I agree with this sentiment, been working against ORMs just as long.
ORMs universally suck in all languages and all projects I've worked on (Java, C#, Python).
You lose productivity, lose performance, and lose visibility into what your application code is actually doing vs. what it should be doing.
I eventually learned how every ORM was implemented differently and spit out different queries even for the simplest select statements, and tweaking them is a wild trip around the ORM's limitations, reading decompiled ORM code. As an aside, Java ORMs present the distinct displeasure of poorly thought out, incompatible, opaque annotations.
You gain ORM knowledge that does not apply across different ORMs, and it subverts the SQL knowledge and insights that are immensely useful as a developer.
When my queries are simple, I don't see any reason to use it.
When my queries are complex, I see strong reasons to not use it.
I actively work to remove or not use ORMs in any project I've worked with. The hardest person on the team to convince is usually the newbie or the manager/lead who never has written a single SQL query.
I'll often use different databases, since it allows me to test everything but the database via unit testing, and I've moved projects from one backend to another with relative ease thanks to it.
But I abstract it out to a class filled with database functions that create specific transactions around whatever application I'm working on needs to do. Not via an ORM.
I'm not sure if you're speaking specifically about Hibernate, but if you are making the argument against ORMS generally, I feel like there are some counterarguments to the points you mention:
> Every non-trivial long-lived application will require tweaks to individual queries at the string level
I suspect that the difference in expectations here is about how many of the queries will need that, and whether the number of less sensitive queries is large enough that it's worth having less boilerplate for them even if some other queries do need to be able to drop into something lower-level. People will quibble about where this boundary is, and maybe the real disagreement is about where to draw the line between "trivial" and "non-trivial" applications, but I don't think it's quite as obvious a conclusion that ORMs aren't useful in the general case as it sounds like you're arguing.
> The proper way to build a data layer is one query at a time, as a string, with string interpolation
This also isn't obvious to me. I can understand the advantages it provides, but it only lets you move the problem of conversion to native types in your language to outside the query layer, not eliminate it. I think there's a solid argument that splitting up the problem that way is better, but again, it seems more nuanced to me than there being one obvious answer.
> The closer you are to raw JDBC the better
How are you sure that it's impossible to provide any sort of useful abstraction here? There are plenty of other cases where using the lowest-level language possible isn't the obvious correct choice all the time; it doesn't sound like you're writing your code in raw assembly either, so there's obviously some utility in using higher-level abstractions.
> One of the selling points, which is now understood to be garbage, is that you can use different databases.
Of course there are many cases where it fails, but years ago we /successfully/ used the Django ORM to write test fixtures that used an in-memory SQLite DB for quick tests, with a MySQL production DB.
It is not automatic and not free, but it is absolutely not garbage.
If you sell on premise self-hosted software, sometimes you actually need it. A big client will say, your software uses only PostgreSQL but our admin team can only support MySQL. Can your software use MySQL instead of PostgreSQL.
Saying "yes" can help you close a contract worth hundreds of thousands of dollars.
> The older I get the more I like having having separate data and functions.
Am I missing something or is this just OO but with func(obj) instead of obj.func()? Like the data is your instance but you have to track the types and remember what functions work on what. The coupling is just implicit now.
There is an old joke somewhere about some monk and a programmer who is looking for enlightenment and circling through OOP, realizing it has always been FP, which has always been actors, which has always been OOP..
Yes, but when you think about it some more, that means a lot.
There is no such thing as inheritance; you route your data through different functions. There is no such thing as private data; it's all public by default.
You can still have strict types I guess, to make sure you are passing the right kind of data into the right function. But most of my experience coding this way is js and lua.
I only code like this for my home projects, I have no idea what it would be like working in a team of 10 programmers all rushing to jam new features into a giant code base.
> There is no such thing as private data, its all public by default
Not in C: put the struct declaration in the header and the definition in an implementation source file.
Callers of the implementation functions can pass the struct around without seeing any of the fields. Only functions in the implementation can access the fields, which results in all access to the fields going through the implementation only.
Cleaner and safer than letting the caller see everything while only accessing some things.
Coupling is only implicit if you use a dynamically typed language.
Also, in languages with UFCS func(obj, something, else) can be expressed as obj.func(something, else). The calling syntax can thus remain the same even without defining the function within some arbitrary object.
In the rare case where you need runtime polymorphism, it is nice. I also think people overuse it (hello C#).
If you don't need it for polymorphism, it's only syntactic sugar anyway, and it gets a heck of a lot confusing (IMHO) when people think it does something else.
Like, how does it make code cleaner when you put the sine function into a class?
And while this might be an extreme example, you can have a function which operates on two data elements. But because you're doing "OO", you arbitrarily put the function into one class. Everything else would be a code smell.
I don't think this is what OO was originally about, but what happened there doesn't speak well for the average developer...
If you mean static methods inside classes, I don’t think those are any different than a namespace. They should be thought as such (e.g. java’s Math::sin, not sure how it’s done in C#)
I mean both. Non-static, non-virtual functions are essentially just normal functions where you don't see the "this" pointer. And you call ob.foo() instead of foo(ob).
Of course it's better syntax and often convenient. But many people think there is more to it ("I'm doing object-oriented programming"), while it is semantically the same.
PS: I get that for namespaces it's not that bad. But still, why not use a namespace? Now you use classes for two completely unrelated things. Congrats Java and C#, your programming language needs one keyword less.
That's just the "Turing-equivalency" argument; of course, in the end we just jump to different points in the code. That doesn't make OOP nonexistent.
> How many man-hours have been wasted on ORM, XML, annotations, debugging generated SQL, and so on? It makes me cry.
Agreed. All my greatest successes with ORMs have been ripping them out of projects I've inherited. The results have been more maintainable code that's faster and less resource intensive.
Every non-trivial project that uses an ORM will, at one point, fetch the whole database.
Either by someone accidentally creating mappings that do it without noticing or someone not understanding that there's a whole-ass database in there and they just fetch everything local and filter with a for-loop.
I have been coding on Java since v1 and wrote basic ORMs before Hibernate was a thing (yes, in EJB world). There were pretty big and complex projects in fintech and healthcare, but I have never encountered the situation you describe. Maybe I was lucky, but logically you cannot say “every non-trivial project”.
Some years ago, Entity Framework followed navigational properties (foreign keys) by default, which quickly escalates. The scenario could be a user table referencing an organisation table, so that when querying for an org, you'd get every user.
I don't think EF ever did that. However, if you used automatic lazy loading and you somehow inadvertently referenced the collection, then yes, it would lazy load the related entities.
This must be a mistake that you make once as a junior and then never repeat. When you declare a field with a collection, wouldn't it be natural to ask how it works?
Definitely! The problem, though, is that this issue creeps up on you over time, and maybe only in production, because the performance is directly correlated with the amount of data in the database.
But I guess issues like these are what transition you from junior developer into whatever comes next.
> The proper way to build a data layer is one query at a time, as a string, with string interpolation.
That seems like a false dilemma? Yes, ORM is garbage. But that doesn't mean you can't represent SQL in your host language by, e.g., at least its AST, or something else more sophisticated than strings.
If your host language has a rich enough type system, you can even type your tables in a way that's understood by both SQL and your host language. (But that's not too easy. Even mighty Haskell struggles a bit to give JOIN a proper generic type.)
I've done many database migrations from db2 to MySQL. It's a thing that happens quite a lot. ORMs and Java's usage of database access layers over the underlying sql drivers vastly simplified the process.
Sounds like you must have misused Hibernate if you had an "anaemic domain model". The only reason to use Hibernate is to have a full domain model and then let Hibernate map it to the DB. I recently worked for a company that was using Hibernate in the way you mention, and I was surprised, as I had never seen it used that way in the ~10 years I used it.
I'm not trying to defend Hibernate, but if used correctly it can be a good tool for a good number of scenarios.
There are tons of scenarios where this is simply not true and an ORM is a huge time saver.
You could be building internal tooling that needs to be flexible enough to work with different databases. You could work for a massive multinational consulting firm like Accenture or professional services division at a company like Microsoft and build applications that any customer with a general relational database should be able to use.
This is an honest question. Have you worked on such systems, and which ORMs have you used with multiple databases?
I ask this because (of course) every database has different features, and some of the ORM stuff might work differently depending on what the underlying RDBMS actually is.
For example, many ORMs have an API where updating a table row will also return the new, updated, entity. In my experience, there's usually no alternative (first class) API to update without returning the updated entity. With PostgreSQL, that's not a big deal- you've mostly just wasted a few bytes of memory allocating an object in your application that you didn't need. With MySQL, though, there's no functionality to get the row(s) that were just inserted or updated as part of the same query, so the ORM always ends up doing two queries when the underlying RDBMS is MySQL; now you're doing twice as many real database queries as you actually want/need.
Another example is pretty much all of SQLite. Lots of ORMs "support" SQLite, but SQLite is sufficiently different from your MySQL, PostgreSQL, MSSQL, etc, that the documentation often ends up listing plenty of "gotchas" and caveats for what things actually work with SQLite, specifically.
There are other small things, but those are the main ones I remember encountering frequently. If you're working on one of these systems, how much can you actually get away with pretending the underlying RDBMS doesn't matter?
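To make the MySQL example concrete in plain JDBC (made-up users table, and assuming conn, newName and id are in scope): on PostgreSQL the update-and-return-the-entity API can be one statement; on MySQL it's necessarily two.

    // PostgreSQL: one round trip; the updated row rides back on the UPDATE.
    try (PreparedStatement ps = conn.prepareStatement(
            "UPDATE users SET name = ? WHERE id = ? RETURNING id, name")) {
        ps.setString(1, newName);
        ps.setLong(2, id);
        try (ResultSet rs = ps.executeQuery()) { /* the updated entity */ }
    }

    // MySQL has no UPDATE ... RETURNING, so the same ORM API costs two queries:
    try (PreparedStatement upd = conn.prepareStatement(
            "UPDATE users SET name = ? WHERE id = ?")) {
        upd.setString(1, newName);
        upd.setLong(2, id);
        upd.executeUpdate();
    }
    try (PreparedStatement sel = conn.prepareStatement(
            "SELECT id, name FROM users WHERE id = ?")) {
        sel.setLong(1, id);
        try (ResultSet rs = sel.executeQuery()) { /* re-read the entity */ }
    }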
Yes I’ve typically used Sequelize. There are tons of small differences between databases that add up to a big and annoying waste of time. Quick example: https://www.w3schools.com/sql/sql_unique.asp
I agree ORMs are not a silver bullet and I’ve had my share of pain with Sequelize, most recent being broken migrations specifically with MSSQL when it involved dropping a column with a Unique constraint.
But all in all, it’s still been a big convenience. Part of my career has been in consulting and there, you notice lots of companies are trying to solve similar problems or automate similar business processes.
For example, contract approvals that integrate Dropbox, DocuSign, Salesforce, and Slack. Something like Zapier may not cut it, so you develop a custom app, but it requires storing state and audit records in a database. The app has to work with different databases to cater to different clients' requirements, and the application itself is relatively simple. ORMs are great to use in this case.
Looking back, I actually quite liked it (myBatis): you had conditionals and the ability to build queries dynamically (including snippets, doing loops, etc.), while still writing mostly SQL with a bit of XML DSL around it, which didn't suck as much as one might imagine. The only problem was that there was still some boilerplate to write, which I wasn't the biggest fan of.
Hibernate always felt like walking across a bridge that might collapse at any moment (one eager fetch away from killing the performance, or having some obscure issue related to the entity mappings), however I liked tooling that let you point towards your database and get a local set of entities mapped automatically, even though codegen also used to have some issues occasionally (e.g. date types).
That said, there's also projects like jOOQ which had a more code centric approach, although I recall it being slightly awkward to use in practice: https://www.jooq.org/ (and the autocomplete killed the performance in some IDEs because of all the possible method signatures)
More recently, when working on a Java project, I opted for JDBI3, which felt reasonably close to what you're describing, at the expense of not being able to build dynamic queries as easily as it was with myBatis: https://jdbi.org/
I don't think there's a silver bullet out there, everything from lightweight ORMs, to heavy ORMs like Hibernate, or even writing pure SQL has drawbacks. You just have to make the tradeoffs that will see you being successful in your particular project.
Can't vouch enough for MyBatis, using it since forever and it never let me down, or produced any sort of frustration.
It strikes the perfect balance: at the code level, you still work with domain objects and the DB access sublimates away into single lines of code (select, update, ...), but they are backed by handcrafted queries that can leverage the full potential of the DB. What MyBatis does is map from an arbitrary result set to your domain object, and it's _really smart_ about it (it knows how to convert most data types and perform an arbitrary level of recursion/nesting of objects, or return collections). For all those things that it can't possibly do (like intermediary JSON conversions or things like that), you have a handy TypeHandler abstraction, which you can tuck away in a dedicated package for autodiscovery, and it doesn't pollute your code or queries.
Also, you pay only 3% over raw JDBC in the average case.
If you love MyBatis please donate: it can't go away.
Within the same installation, that's true. Within different installations, not necessarily. GotoSocial uses `bun` as its ORM which allows you to use the same code for a Postgres, MySQL, SQLite or MSSQL backend[1]. Which is extremely handy if you're writing something for other people to use in a system you (obviously) don't control.
[1] Although I believe GTS only officially supports PG or SQLite.
This mirrors my opinion about ORM quite well, but it's missing an important angle that might help understanding why ORM made it that far in the first place:
The big selling point, in hindsight, was running document-style persistence on relational. Today we happily dump JSON blobs in there and we accept that we'd have to deal with the consequences if it ever came to querying on some oddball property (chances are the consequences won't even be so bad), but back then it was always relational first, and relational everything. All that busywork of spreading things like the change history of the user's favorite ice cream out to glorious relationality, stuff that will never ever be accessed outside its natural tree structure ("the document")? ORMs were excellent at that.
Now that we have arrived at something that I'd consider close enough to consensus that some data is fine to keep in document blobs (instead of spreading it out), the value proposition of ORMs is much, much smaller than it used to be.
"But no-one uses different databases." Not true. A client wanted an on-premises instance of our product and specified SQL Server while we had only used MySQL before. Due to the ORM there were only tiny changes required to be up and running very quickly.
Thanks man, I really feel the same when what should be 2 joins becomes an all-day hellscape of inspecting ORM-emitted SQL, to debug why its blizzard of exists-subqueries is absolutely jacked and why fixing that breaks other code depending on such bugs.
As a side note, I actually do use an ORM to support multiple databases with LLDAP, since it's a self-hosted application that can be used with either SQLite (for the most lightweight setup) or PG/MySQL for a more HA-friendly setup.
> Every non-trivial long-lived application will require tweaks to individual queries at the string level.
It's possible we could have LLMs tweak queries to perfection given ORM-style input. Obviously ORMs of the past don't do this, but it's not impossible that we could do it in the future.
Code like that is an anti-pattern because it fetches all of db.users and then does the filtering in Python, which is horrible. But an LLM could optimize that code into a single SQL query that does the filtering in the database.
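The original example was Python-flavored; the same shape as a hedged Java sketch (User, fetchAll and the schema are all invented for illustration) looks like this:

    // Anti-pattern: drag every row over the wire, then filter in the app.
    List<User> adults = fetchAll(conn, "SELECT * FROM users").stream()
            .filter(u -> u.age() >= 18)   // runs after all rows were transferred
            .toList();

    // What you actually want: push the predicate into the database.
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT * FROM users WHERE age >= ?")) {
        ps.setInt(1, 18);
        try (ResultSet rs = ps.executeQuery()) {
            // only the matching rows ever leave the database
        }
    }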
You don't need an LLM to be able to write SQL though?! Aside from attempting to shove LLM into every conversation, I'm failing to see its relevance here.
You still have to learn the ORM syntax and quirks as well, since they may be neither intuitive nor consistent. Why not just focus learning efforts on the (more) global skill of SQL queries instead of the leaky abstraction that is ORMs generating SQL queries?
I don't think this is a good example of where ORMs fall apart. If you have devs doing the first thing you have issues.
ORMs fall apart when they either do unexpected things, or "expected" but often terrible things, like lazy-loading data that causes a bagillion queries.
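The classic shape of that, in Hibernate terms (the Order entity with a lazily-mapped lineItems collection is assumed):

    // N+1: one query for the orders, plus one more per order the moment
    // the lazy collection is touched inside the loop.
    List<Order> orders = session.createQuery("from Order", Order.class)
                                .getResultList();
    int total = 0;
    for (Order o : orders) {
        total += o.getLineItems().size();   // surprise query on every iteration
    }

    // One fix: fetch the association up front in a single query
    // (older Hibernate versions may want "select distinct o" here).
    List<Order> fetched = session
            .createQuery("from Order o join fetch o.lineItems", Order.class)
            .getResultList();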
I've never worked in a place that didn't use different database. Maybe we're not talking about the same thing, but how are you developing new things if you don't have separate databases? Are you making changes directly on your production database while you're creating new things? How do you test things?
Do you keep everything in the same database, or do you have multiple and then collect data from different APIs?
Not that you need an ORM to have several or indeed different databases. I'd argue that it doesn't even make it that much easier.
They mean database agnostic - that you can easily/magically migrate your application from a DBMS to another and the ORM will handle it for you. That usually works only for trivial databases and does not include data migration.
It's unlikely that you would be using different DB vendors... not different databases for different environments, which you absolutely should be doing... but PostgreSQL in one and MSSQL in another is super unlikely.
> The proper way to build a data layer is one query at a time, as a string, with string interpolation. The closer you are to raw JDBC the better.
This fits in my head a lot better than an ORM would but I'm a database guy so my brain is already wired different. How do you handle CRUD here? Like if I have a dozen columns in a table and I want to update two of them is that one bespoke query? Or do you always update the full record? Or do you do some dynamic SQL with the skeleton of an update?
This feels like a basic question but it's still one I don't grasp all these years later.
I always make sure to update only those columns that have changed, if I know them in advance. This is to minimize my data getting polluted with bad data should a bug creep into the application logic.
That said, there are often times when it's difficult to know in advance. For instance, a user profile update, which is a web form with like 10 fields, all of which can be edited. Even if you can figure out the exact fields that have changed, writing that bespoke query will be intractable, as it will lead to a combinatorial explosion of update queries.
> This is to minimize my data getting polluted with bad data should a bug creep in application logic.
Yeah makes sense, seems like begging for a race condition. In Postgres it also generates vacuum churn.
So if you do know that I updated a few fields how does that work? Do you run a few single-field updates in a single transaction or do you generate dynamic SQL?
In the case I described, I just update all the fields. Anything else is just way too tedious and bug prone. I hear that ORMs come in handy in such cases, but then they have their own set of bigger problems, as others have pointed out here.
The good part, though, is that such update-all-the-fields cases are very few. In 90-95% of updates I know exactly which fields are getting updated and for what reasons.
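For what it's worth, the dynamic-SQL option asked about upthread can stay injection-safe; here's a hedged sketch (the users table and the caller-enforced column whitelist are assumptions): build one parameterized UPDATE from whichever columns changed, so there's no combinatorial explosion of bespoke queries and no rewriting of unchanged columns.

    import java.sql.*;
    import java.util.*;

    class UserUpdates {
        // `changed` maps column name -> new value. Keys must come from a
        // fixed whitelist, never from user input, since they are spliced
        // into the SQL text; the values are always bound as parameters.
        static int update(Connection conn, long id, Map<String, Object> changed)
                throws SQLException {
            if (changed.isEmpty()) return 0;
            StringBuilder sql = new StringBuilder("UPDATE users SET ");
            List<Object> params = new ArrayList<>();
            for (Map.Entry<String, Object> e : changed.entrySet()) {
                if (!params.isEmpty()) sql.append(", ");
                sql.append(e.getKey()).append(" = ?");
                params.add(e.getValue());
            }
            sql.append(" WHERE id = ?");
            params.add(id);
            try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
                for (int i = 0; i < params.size(); i++) {
                    ps.setObject(i + 1, params.get(i));
                }
                return ps.executeUpdate();
            }
        }
    }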
My favorite thing about Hibernate is that it’s sophisticated enough that it has its own query language, which is very, very similar but not identical to SQL. Makes me want to take a drink just thinking about it.
What's your opinion on primarily SQL based toolkits which mainly serve to serialize / deserialize results? For example
Android Room (you define DAOs with methods, and annotate each method with SQL query. `Room` takes care of deserializing results into your domain types etc..).
Or Sqlc in Go: you write SQL with annotations, and sqlc will generate Go code corresponding to them.
There's also libraries like JDBI which lend to less structured usage while still avoiding most boilerplate.
I am optimistic that we are converging on the good parts of the ORMs without the baggage that hibernate brings.
> What's your opinion on primarily SQL based toolkits which mainly serve to serialize / deserialize results?
Without `crud` or some other interface, I would probably have yeeted Go into the sea for all my db-related work because `sql` is horrible to work with.
I've used ORMs quite a bit and I agree with you, but the advantage for me has been getting the database objects into your application, and to a lesser degree good ORMs providing a schema for other code to integrate with.
It's not super popular but I've been using Scala's Quill quite a bit lately. I don't think it's an ORM exactly, it's more like LINQ and automatic mapping between the database and case classes, but it's really nice and does everything I want from an ORM.
I 100% agree. My go-to is «Hibernate makes simple things easier and hard things harder».
JDBC can be a little bare-bones so I prefer something like JDBI.
Years ago, I worked on a project that used an ORM and multiple databases. An ORM made sense at that point, as setting up and running the databases properly was an exercise in self-restraint at the best of times. Nowadays, with RDS and/or Docker, there's no excuse.
That said, ORMs offer one killer feature: code gen/serialisation.
> One of the selling points, which is now understood to be garbage, is that you can use different databases
Yup, and even if the ORM supports multiple databases, once you have committed to one database it's almost impossible to switch to another, due to the different conventions used for data mapping.
Like most 10x technologies (ORMs, AWS, high-level languages), the goal should be to gain the advantage they provide early on, but with a plan to move toward technologies with better long-term outlooks (ideally home-grown alternatives).
Well, in ORM's defense, it auto-generates the "data transfer objects" from the schema for you, so you don't have to. You can save minutes of typing in exchange for years of painful maintenance.
I used to agree 100% with this sentiment; dissatisfaction with the ORMs available at the time (early days of Doctrine in PHP) drove me to write my own. It turned out to be an amazing exercise in why ORMs are hard.
Anyway, a few years later I was in a position to start fresh with a new project, so I thought to myself: great, let's try to do things right this time. I went all the way in the other direction: raw SQL everywhere, with a great SQL analyzer lib (https://github.com/ivank/potygen) that would strictly type all the queries and format them with prettier. That kinda plugged all the possible disadvantages of raw query usage and was a breeze to work with … for me.
What I learned was that ORMs have other purposes. They kinda force you to think about the data model (even while giving you fewer tools to do so), and with the amount of docs and tutorials out there, they allow even junior members of the team to feel confident about building the system. I'm pretty used to SQL, and thinking in it and its abstractions is easy for me, but it's a skill a lot of modern devs have not acquired, what with all of our document DBs and ORMs. It was really hard on them to switch from thinking in objects, and the few ways ORMs let you link them, to thinking in tables and the vast number of operations and dependencies you can build with them. Indexable JSON fields, views, CTEs, window functions, all that on top of the usual relational theory … it was quite a lot to learn.
And the thing is, while you can solve a lot of problems with raw SQL, ORMs usually have plugins and extensions that solve the common ones: things like soft delete, i18n, logging and audit, etc. It's easy, even if it's far from simple. With raw SQL you have to deal with all that yourself, and while it can be done, and done cleanly, it still requires intuition about performance characteristics that a lot of new devs just don't possess yet. You need to be an SQL expert to solve those problems in a reasonable manner, while a mid-level dev could easily string along a few plugins and call it a day. Would it have great performance? Probably not. Would it hold some future pitfalls because they did not understand the underlying SQL? Absolutely! But hey, it will work, at least for a while. And to be fair, they would easily make those mistakes with raw SQL as well, but with far fewer resources to understand why it failed, because ORMs fail in predictable ways and there are usually tons of relevant blog posts about how to fix them.
It just allows for a better learning curve: learn a bit, build, fail, learn more, fix, repeat. Whereas raw SQL requires a big upfront “learn” cost, while still going through the “fail” step more often than not.
Now I’m trying out an FP query builder / ORM, Elixir’s Ecto, with the hope that it gives me the best of both worlds … time will tell.
I feel these complaints will become a non-issue once we can write the optimal SQL and ask AI to convert it to ORM code, or just ask AI to write optimal ORM code given known data/performance considerations.
More languages are getting destructuring semantics built into them, so I think this problem is slowly taking care of itself. You write a query that renames everything to your struct's names, or you write a mapping that does it, then let the programming language do its thing instead of a third-party library doing it.
Essentially your code for talking to the database starts to look more and more like the code that deals with JSON responses, which all just look like quasi-idiomatic code in the programming language.
In any sufficiently complex application, something like a query builder that allows different parts of the application to work together to lazily compose queries (vs combining/filtering/sorting the data in memory) will be created at some point.
ORM is a tool that can make this easier, it's also a tool that can make it easier to shoot yourself in the foot (ie by making it easy to create N+1 queries without knowing). Like all tools, there are tradeoffs that need to be accounted for together with the actual use case to make a decision. Operating on SQL statement strings is not something I'd recommend in any case.
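For readers who haven't hit it, the N+1 shape looks roughly like this (the repository methods are hypothetical):

// 1 query for the parents, then 1 more per parent, often hidden by lazy loading
List<Author> authors = authorRepo.findAll();
for (Author author : authors) {
    author.setPosts(postRepo.findByAuthorId(author.getId()));
}
// vs. a single round trip:
// SELECT ... FROM authors a LEFT JOIN posts p ON p.author_id = a.id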
Different people have wildly different experiences with ORMs as they used them for wildly different levels of integration and tasks, and in wildly different languages with different features that make ORMs more or less useful, yet there's always someone willing to go out of their way and ignore all that and make absolutist statements about how it's good or bad.
We should just learn to recognize it for what it is, someone that's being controversial for attention, and move on. Let's save our attention for the ORM article and discussion that starts out along the lines of "ORMs can provide benefit, but it's important to recognize where, and not let the problems of their use outweigh their benefits. Here's what I've found."
> Different people have wildly different experiences with ORMs as they used them for wildly different levels of integration and tasks, and in wildly different languages with different features that make ORMs more or less useful, yet there's always someone willing to go out of their way and ignore all that and make absolutist statements about how it's good or bad.
I don't have a horse in this race; I couldn't care less if you do or don't use an ORM. But, and maybe this is the cynic in me, there are practices in software development that absolutely, 100%, for certain, have no good reason to exist, and are perpetuated in part by this belief that there must have been a good reason for them (Chesterton's fence and all that).
Null terminated C strings are a prime example. There is absolutely no good reason, other than the fact that the authors may have wanted to save 3 bytes, that C strings should be null terminated. Fortran was created in 1954, and passed the length of the string with the string itself. How many countless bugs and CVEs have arisen from errors in handling null terminated C strings (one example[0])? And for what? To save 3 bytes, or just because of a decision made on the authors' whim?
Likewise, decisions made at Javascript's inception have burdened it for its entire life. Decisions that were made on a whim, like implicitly converting numbers to strings sometimes, and sometimes implicitly converting strings to numbers! (Tell me what `10 + "10"` is and what `10 - "10"` is without using the console.) And the million and one ways to define something that's undefined.
Anyways, when somebody tells me there's absolutely no good reason for a development practice to exist, sometimes that is the absolute truth. And I would rather have more people throwing away these crummy practices that lead to unnecessary headaches (or at least questioning them) than people continuing to laud the practice and perpetuate it ad infinitum.
> Null terminated C strings are a prime example. There is absolutely no good reason, other than the fact that the authors may have wanted to save 3 bytes
Or, you know, they wanted to make the language track assembly as closely as possible (which was possible back at that time, when processors were much simpler and instructions weren't constantly reordered), and if you've ever written assembly, you know you're not really working at a string level, you're working at a byte and character level. Sized strings are more complex than null terminated ones in that you either need to set a max size, or you waste multiple bytes per string (which actually mattered on many systems C was used on when it was developed), or you have to use masks on those early bytes to determine whether the next byte is part of the string or a continuation of the size.
And honestly, when you're working on systems where RAM (and maybe storage) is measured in kilobytes or less and speed in kilohertz, the extra code, the extra time to process them, and the extra space to store sized strings make it a lot less of an obvious choice.
Was C a good choice for the time and where it was used? Possibly. Is it a good choice these days? Probably not, without a bunch of extra utils and compiler guards to beat back the worst problems. I don't blame C as much for that as I do the people who continue to use it without extra safeguards.
What was that you said about Chesterton's fence? That's one of those terms you should be careful about throwing around when supporting an absolutist position...
I could almost buy your argument, except for the fact that Fortran, created in 1954, when systems like the IBM 650 had a maximum of 35KB of memory[0] (which I'm assuming included program memory), still included the size of the string with the string as a convention.
But that's just me guessing. There's no reason for us to do that when Dennis Ritchie wrote down the reason for this:
>> This change was made partially to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.[1]
So this was a change made primarily for convenience. And if the limitation of 255 characters was really a huge blocker, they could easily have created a spec like UTF-8 to allow variable-length encoding depending on the size of the string, which, funnily enough, Ken Thompson, who also worked with Ritchie, later did invent. You mentioned that processing time would have been an issue, but C strings require you to walk the entire string to determine its length, and Ritchie notes that as an additional tradeoff of this convention.
But that wasn't done. And I can't blame Ritchie for that either, because he didn't think this language would become what it is today! Later on in the paper he alludes to this:
>> C has become successful to an extent far surpassing any early expectations[1]
All throughout the paper you can see him referring to decisions that were made out of convenience, and not because he had done extensive analysis to determine whether the tradeoff for the convenience was worth it:
>> Two ideas are most characteristic of C among languages of its class: the relationship between arrays and pointers, and the way in which declaration syntax mimics expression syntax...In both cases, historical accidents or mistakes have exacerbated their difficulty.
>> C treats strings as arrays of characters conventionally terminated by a marker...and as a result the language is simpler to describe and to translate than one incorporating the string as a unique data type.
All that to say, yes of course there were reasons that decisions were made the way they were. But, and this is what I've noticed more and more in programming communities, these decisions are often made with little to no analysis, usually out of subjective preference or to make the implementor's life a tad easier. So yes, I think it's right to call out a lot of "best practices", because history has shown that programmers really don't put much thought into their decisions. And then you end up with gurus proclaiming that a decision made out of convenience was actually the best decision available, and that we should never change the way we do things because clearly this is the right way.
> I could almost buy your argument, except for the fact that Fortran
Fortran was not created for the same purpose. Fortran existing as an invalidation of C's choices is like Java existing being an invalidation of C++'s choices. There's a reason Fortran and Java are not common choices to write an OS kernel in, while C and C++ are (or were). There's a reason C and C++ aren't often used for web development, but interpreted languages are. Different design choices fit different niches better or worse.
> So this was a change made primarily for convenience.
Convenience can mean a lot of things, and in this context, absent contrary evidence, I interpret that statement to be entirely in line with what I said above. It was inconvenient to have a more complex type to deal with, for multiple reasons. I'm not sure why you would think otherwise; it's not like I said it could not work the other way, just that there were considerations that made the choice less obvious than it is in today's world.
> You mentioned that processing time would have been an issue, but C strings require you to walk the entire string to determine its length, and Ritchie notes that as an additional tradeoff of this convention.
Yes, a tradeoff. If what you're doing with strings most of the time is parsing them, knowing the length ahead of time may be of little benefit, since you're going to step through them character by character anyway. For many operations, knowing the length ahead of time is irrelevant.
> decisions that were made out of convenience, and not because he had done extensive analysis
I'm not sure anyone is arguing they used extensive analysis. I'm certainly not. But when the reality you live and work in is that you routinely run up against real hardware constraints, that's bound to affect your ad-hoc reasoning about what choice to make when you don't do extensive analysis.
> All that to say, yes of course there were reasons that decisions were made the way they were.
Given that this started because you wanted to support absolutist statements with "there is absolutely no good reason, other than the fact that the authors may have wanted to save 3 bytes, that C strings should be null terminated", and your prime example has now been walked back to the fact that yes, there were considerations beyond that, including making it a simple language to describe, I think you've proved my original point.
Who is to say that C's simple description and ease of implementation on additional architectures wasn't a major factor in its success and spread? And yes, while we've been paying the price for that for quite a while now, it may also have allowed for a level of software portability that really helped advance computers beyond where they would currently be otherwise.
I don't like C all that much and I don't use it for anything, but I'm also willing to note that it must have done quite a few things right to get to where it did. I'm not willing to call out any aspect of it as completely without merit while still acknowledging the immense benefit the language brought as a whole, because at that point we're getting into conjecture about alternate histories.
> Decisions that were made on a whim, like implicitly converting numbers to strings sometimes, and sometimes implicitly converting strings to numbers!
I see this as a problem that's horribly inflated by those who don't use JS on a daily basis.
Practitioners of the language largely don't care, because in actual code you rarely see cases where it would matter.
Now that we have template strings it's even less relevant.
I use TypeScript on a daily basis for my job. It was an entire language created to make up for JS's shortcomings. There's no horribly inflated reasoning going on when most of the industry has decided the best idea is to just throw away the language and use a different one that transpiles to it.
Isn't that the whole point of this discussion? We're talking about conventions programmers treat as absolutes that are detrimental to the maintenance and security of the programs.
So I don't see how this has much bearing on the ultimate point I'm making, which is that yes, conventions can be bad haha.
> In any sufficiently complex application, something like a query builder that allows different parts of the application to work together to lazily compose queries
I don't see this as a given and I don't accept it in my own applications. If you need data, go to the data access layer. If it doesn't provide you what you need, build a new repository / provider / whatever your pattern is.
Whether it's an ORM or a home-built query compositor or whatever, one thing I know from experience is that once your application is "sufficiently complex" that you start to (incorrectly) believe you need this, your application has become too complex to use it reliably.
You absolutely will be mistakenly evaluating these builders in the wrong layers, iterating results without realising you're generating N+1 queries, etc.
> I don't see this as a given and I don't accept it in my own applications. If you need data, go to the data access layer. If it doesn't provide you what you need, build a new repository / provider / whatever your pattern is.
Whichever layer you put it in, you need a way to compose two query fragments. Otherwise you have to write N*M queries manually instead of N+M query fragments.
Either you use an ORM to help you with this or you don't, with all the usual tradeoffs about using a library or not. But you still have to solve the problem. (Or you do a lot of tedious copy-paste work - with all the usual tradeoffs of that)
sure, sometimes, rarely -- these are exceptions, not rules
in general, it should not be possible for user input to produce arbitrarily complex queries against your database
each input element in an HTML form should map to a well-defined parameter of a SQL query builder, like, you shouldn't be dynamically composing sub-queries based on the value of a text field, the value should add a where or join or whatever other clause to the single well-defined query
sometimes this isn't possible but these should be super rare exceptions
I prefer using something like Rails or Django to build 10 fully working CRUD interfaces with well-defined yet dynamic filters in a day instead of spending two weeks needlessly writing the equivalent code by hand.
You’ve never actually built a real-world implementation, have you?
You’re going to have parameters that are compound. You’re going to end up filtering on objects 3 relations removed, or dealing with nasty syncing of normalization. You’ll have endpoints with generic relations, like file uploads, where the parent isn’t a foreign key.
It’s going to be a mess. They will NOT always be simple to write.
> it's pretty rare for queries to be dynamically composed from arbitrary sub-queries
I'm talking static, not dynamic. You still need to compose two pieces together into a single query, and you can either use an ORM to help with that or not.
> the interface between the application and the DB is actually a string! it's not an abstract data type, it doesn't benefit from being modeled by types
No it isn't. You can't send an arbitrary string to the database and expect it to work. At the very least you benefit from having an interface that's structured enough to tell you whether your parentheses are balanced and your quotes are matched rather than having to figure that out at runtime.
> when your app queries the db, the query is not composed from several pieces, it is well-defined in the relevant method
> this is a single query, not multiple
And when you want to query for multiple related things together, the whole point of having a relational database? For different purposes you need different views on your data, and those views are generally constructed out of a bunch of shared fragments; you can either figure out a way to share them, or copy-paste them everywhere you use them.
> the db accepts a string and parses it to an AST, it does not accept a typed value
> this means the interface is the string
The DB accepts a structured query, not a string. It might be represented as a string on the wire, but if that was what mattered then we'd use byte arrays for all our variables since everything's a byte array at runtime.
> unbalanced parens and whatever other invalid syntax is obviously caught by tests
i'm not sure what you're thinking about when you say "multiple related things"
every "view" on your DB should be modeled as a separate function
every possible "thing" that's input to a function which queries the database should be transformed into a part of the query string by that function
> The DB accepts a structured query, not a string. It might be represented as a string on the wire, but if that was what mattered then we'd use byte arrays for all our variables since everything's a byte array at runtime.
...no
the API literally receives a string and passes it directly to the DB's query parser
if the DB accepted structured queries, then the API would rely on something like protobuf to parse raw bytes to native types -- it doesn't
like `echo "SELECT * FROM whatever;" | psql` does not parse the `SELECT * FROM whatever;` string into a structured type, it sends the string directly to the database
> every "view" on your DB should be modeled as a separate function
OK, and when a significant amount of what those functions do is shared, how do you share it? (E.g. imagine we're building, IDK, some kind of CMS, and in one view we have authors (based on some criteria) and posts by those authors, and in another we have tags and posts under those tags, and in another we have date ranges and posts in that date range. How do you share the "fetching posts" bit of SQL between those three different (parameterized) queries?)
> if the DB accepted structured queries, then the API would rely on something like protobuf to parse raw bytes to native types -- it doesn't
> like `echo "SELECT * FROM whatever;" | psql` does not parse the `SELECT * FROM whatever;` string into a structured type, it sends the string directly to the database
In both cases the parsing happens on the server, not the client. `echo abcde | psql` and `curl -d abcde http://my-protobuf-service/` are both doing the same kind of thing: passing an unstructured string to a server which will fail to parse it and give some kind of error. Both are equally useless.
> How do you share the "fetching posts" bit of SQL between those three different (parameterized) queries?)
your application has a fetch posts method, that method takes input including (optional) author(s), tag(s), etc., it builds a query that includes WHERE clauses for every provided parameter
the code that converts an author to a WHERE clause can be a function, the point is it outputs a string, or something that is input to a builder and results in a string
i'm not sure what a "fetching posts bit of SQL" is, a query selects specific rows, qualified by where clauses that filter the result set, joins that modify it, etc.
> the code that converts an author to a WHERE clause can be a function, the point is it outputs a string, or something that is input to a builder and results in a string
So you do string->string processing, and you explicitly won't treat the queries you're generating as structured values? Enjoy your rampant SQL injection vulnerabilities.
creating a query string that's parameterized on input usually means you model those input parameters as `?` or `$1` or whatever, and provide them explicitly as part of the query
Ok so now your WHERE clauses are no longer strings, you have to have some structured representation of them that knows which parameters go with which clauses, and something that understands the structure of your queries enough to line up the parameters when you stitch together different WHERE clauses to make your full query - exactly the kind of thing you were saying was unnecessary.
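That structured representation can admittedly be tiny. A sketch of the kind of value under discussion, pairing each piece of SQL with its own bind parameters (assumes Java records; names are illustrative):

import java.util.ArrayList;
import java.util.List;

// a fragment carries its SQL text plus the bind values for its placeholders,
// so composing fragments cannot misalign parameters
record Fragment(String sql, List<Object> params) {
    static Fragment of(String sql, Object... params) {
        return new Fragment(sql, List.of(params));
    }
    Fragment and(Fragment other) {
        List<Object> all = new ArrayList<>(params);
        all.addAll(other.params());
        return new Fragment(sql() + " AND " + other.sql(), List.copyOf(all));
    }
}

// usage: stitch clauses together, then bind params positionally
Fragment where = Fragment.of("author_id = ?", authorId)
        .and(Fragment.of("published_at >= ?", cutoff));
String query = "SELECT * FROM posts WHERE " + where.sql();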
Say you create a nice little utility function func1 that makes a call to the db, and you write func2 that also makes a db query. If you have a func3 that needs func1 and func2, conceptually your options are:
* Accept that this will just be two trips to the DB.
* Write a new function func12 that writes a query to return all the needed data in one query and use that instead.
* Have your tool be able to automatically compose the queries.
If you go with the second option, you have to do that for every combination of functions that end up being used together, which is multiplicative in general.
CTEs and views are both a bit bigger than the parts that you would normally want to share and reuse, and they're also not very well standardised. Plus no-one really agrees on how dynamic tables should be, so you end up with the Excel maintainability problem of no real separation between code and data.
Following this advice too closely and your application logic starts leaking into your data layer.
One simple example is when you need to make atomic changes to two different types of entities. In the data layer, this is usually trivial — just run two queries within a transaction — but you need to expose a function that boils down to `updateBothAandB`. Rinse and repeat enough times and your data layer is a soup of business logic.
Exactly, in my opinion, separating your business logic from your data access layer completely is futile. You will need to either have business logic in your data access layer or fine grained query controls (when to add a filter clause, when to execute a query, what should go into a transaction, etc) in your business logic for performance reasons. This is OK, it's incidental complexity.
There are patterns to resolve this. Depending how you structure your DAL, your data access classes can provide transaction handles. It's a data access layer, you don't need to abstract away the fact that you're dealing with a database.
I mean, yeah, but the patterns all look like the thing you said you don’t accept in your own applications: “something like a query builder that allows different parts of the application to work together to lazily compose queries”.
An ORM is a specific kind of data layer. It is general-purpose, and as the name suggests, it is an Object-Relational Mapper whose primary purpose is to map relations to objects and vice-versa.
So no, bespoke data access layer code is not an ORM.
Why are a data access layer and an ORM mutually exclusive? An ORM is just an abstraction over your database. You might find use in it as a tool to access your database but contain the use of the ORM to within your data access layer.
Mind sharing an example of a large(ish) app that doesn't make use of an ORM? Last time this topic came up, I went looking for one (admittedly not too hard) and I came up empty handed.
> In any sufficiently complex application, something like a query builder that allows different parts of the application to work together to lazily compose queries (vs combining/filtering/sorting the data in memory) will be created at some point.
Welcome to LINQ, introduced in 2007 with .NET Framework 3.5.
LINQ isn’t exactly an ORM. You create strongly typed LINQ statements that are turned into expression trees that are then turned into a query by a query provider.
Being able to run queries over data without having to implement the various boilerplate is excellent.
Unfortunately basic mistakes can result in so many of those N+1 type queries if you're not careful. It can also result in reading entire table(s), possibly inside those nested N+1s.
It gets expensive when it's a remote DB server and/or dealing with a lot of records.
Profiling tools are essential.
It's why I'm not a fan of LINQ (and Entity Framework) for talking to DB severs without very strict controls.
Many times I've been called in to deal with a "slow" server, only to find out the issue is someone chained together a bunch of calls through EF and we have hundreds of thousands of queries that can be replaced by one or two.
> Unfortunately basic mistakes can result in so many of those N+1 type queries if you're not careful. It can also result in reading entire table(s), possibly inside those nested N+1s.
I don't know which "basic mistake" would do this. Maybe when using Lazy loading Proxies?
We use EF Core and it is quite easy to control eager loading or load-on-demand on a per-case basis.
Would you consider SQL an abstraction over DynamoDB? ElasticSearch’s native query language? Apache Presto to query files? Mongo?
All of those can use SQL as a worse query language than their native counterparts.
You can also create very bad SQL if you don’t know the underlying engine. For instance if you try to write SQL for a columnar database like you would for a traditional database, you are in for a world of hurt.
You seem to be talking about databases whose native language is not SQL. In those cases, yes, SQL is definitely an abstraction. There's some layer transforming it into (say) Elastic queries or Dynamo queries.
I would argue that when talking about writing SQL, most folks are talking about applications whose query planner speaks some dialect of SQL: MS SQL Server, MariaDB, MySQL, Oracle, SQLite.
In those cases, it's not really an abstraction, any more than writing assembly is an abstraction over (say) Intel's CPU microcode.
The query planner takes your SQL and turns it into something else, sure, but you can't generally do that yourself. It's the lowest layer of abstraction that's reasonably available.
IIRC, LINQ is exactly a query builder and not an ORM. IMO what truly makes an ORM is when object fields have data binding against the database (updating a field triggers an update in the DB).
It's kinda gross, because the developer needs to decide which side is the authoritative source of truth; the programming model makes you feel like you can trust your code (probably the wrong choice), and all the footguns around distributed state kick in (possibly even worse if you have a frontend with two-way data binding and data structures that outlive an HTTP request in your backend).
LINQ is a query builder and the .Net runtime creates expression trees. Entity Framework translates the expression tree to SQL and maps data back to objects.
Linq is really just a high level abstraction for querying any data that can be queried. EF adds some more abstractions mostly related to mapping C# objects to database tables. Then it adds database specific query providers (implementations) that combine the (abstract) linq query and the mapping to produce sql.
You can really use linq to query anything (APIs, for example), including databases without using EF. It's just not very common because building a custom query provider is a lot of work.
LINQ is just an interface, you need some sort of an implementation below it (be it the unmaintained LINQ-to-SQL that AFAIK nobody should be using anymore, or the latest EF rewrite).
It's also the first thing I recommend to anyone claiming that concatenating SQL strings together is a good use of your time in the 21st century. Makes writing things like complex HTML tables with configurable columns and tons of optional filters so much faster and more maintainable.
In my experience, across a wide variety of applications built by a number of different teams, LINQ has almost always been an anti-pattern. Whenever you inject a leaky abstraction (as Spolsky would say), things start to go awry. The crimes of grotesque inefficiency I've seen because the magical LINQ is there to shield devs from proper data tiers.
Absolutely, there are times where programmers try to do way too much in LINQ and don't realize that the query generated to back their chain of LINQ operations is a monstrosity of thousands of lines of SQL that will grind the server to a halt. Or the abstraction leaks badly because the programmer used something EF doesn't know how to translate to SQL, so it ends up loading a much larger dataset and completing the rest of the filtering in memory.
As developers get more acclimated to the .NET ecosystem they usually pick up a good spidey sense for what query "shapes" are appropriate to express in LINQ and which are asking for trouble.
However, for every one of those awful LINQ queries in a codebase, I think there are 40 to 50 run-of-the-mill queries that make it worthwhile. The biggest win that it delivers over a simpler, "stringier" ORM is the ability to return structured data. I have no problem writing SQL, but processing the results back out of flat rows into objects gets extremely annoying.
Say I want to load some Foos with their Bars. If I use straight SQL with a row mapper, I define a type to represent each FooBar row, load a List<FooBarRow> into memory from my simple left join, then I'll have to do a GroupBy in memory on that List to get the actual structure that I want: a list of Foos where each one has its own list of Bars. Notice that I also have to handle the empty-list case specially.
var fooBarRows = await connection.QueryAsync<FooBarRow>("select f.FooId, f.Name as FooName, b.BarId, b.Name as BarName from Foos f left join Bars b on b.FooId = f.FooId where whatever");
// re-shape result set manually from flat query results
var foos = fooBarRows
.GroupBy(fbr => fbr.FooId)
.Select(g => new Foo
{
FooId = g.Key,
FooName = g.First().FooName,
Bars = g.First().BarId == null ? new List<Bar>() : g.Select(b => new Bar { BarId = b.BarId.Value, BarName = b.BarName }).ToList()
});
In EF it's just:
var foos = await db.Foos.Where(whatever).Include(f => f.Bars).ToListAsync();
Saving that clutter -- both the code and the otherwise useless intermediate FooBarRow type -- really adds up because the apps I write have a zillion queries very much like this.
So I'll take the occasional shitty query that has to be tracked down and optimized in exchange.
>I have no problem writing SQL, but processing the results back out of flat rows into objects gets extremely annoying.
I'm pretty sure Dapper lets you do this fairly easily. You've got a couple options - you can either provide a mapping function, or you can have your query/sprocs return multiple result sets. Then you can assemble the result sets into the object structure.
It's not quite as magical as LINQ, but it's also not quite as annoying and fraught with so much marshalling code as your example.
When they say it forces client evaluation, what they mean is that any LINQ operations you run after the ToListAsync() will run on the client side - since it's just a normal list in memory at that point and no longer has any knowledge of the database. So if you want to make sure some code will execute client-side, you should ToList or AsEnumerable your query first, then do your additional client-side operations on that list.
The IQueryable operations composed before to the ToListAsync() will run as SQL on the server. In the case of my example it will look pretty similar in structure to the string SQL I wrote before it: a straightforward left join with a where clause. Performance will be similar too. It will not load the entire universe of Foos and Bars and then filter them down on the client.
One of the habits I have developed from working with EF for the past 10 years is to be explicit and as local as possible about forcing the query to evaluate. It lets me pretty reliably predict what the SQL is going to look like.
You can return IQueryables from methods and pass them around through many layers of your program, adding complexity to them as you go, and lazily getting the results at the last possible moment, five layers up the call stack from where the query first started.
It seems at first like that would be good, because that way the maximum amount of logic will run on the DB server, and letting the server do stuff is better, right? But that's also where you open yourself up to being very surprised about what your SQL looks like by the time it executes, with multiple layers of your program each tacking complexity onto the query. It can get ugly. Also you can have problems if you're hanging onto IQueryables that you still haven't executed after you've Dispose()d your DbContext. For this reason I try not to let IQueryables travel very far through my program before forcing evaluation.
It's sad that in programming world, something is either anti-pattern and you must not use it at all, or something is so awesome that you need to use it everywhere.
LINQ in general is a very bad indicator if it isn't very restricted. I don't think that's a crazy claim whatsoever: when LINQ appears littered through code, it usually means there has been perilously little data access planning, so instead of planning out and centralizing concise, planned, secure, optimized data access avenues, people just pretend that LINQ being available everywhere is a solution. Every LINQ-heavy solution I've had the misfortune of using has always, with zero exceptions, been a performance nightmare. And instead of one big function you can just optimize, it is death by a trillion cuts.
I get that there are big LINQ advocates. It's convenient. I'm sure there are projects where it is used spectacularly. I've never seen such a project.
The first time I heard that claim was when I had to start working with Go. I was telling someone how much I miss LINQ/functional programming, and he said exactly the same thing: “ugh, LINQ is why .NET is so slow”. I was perplexed. Do you even know what LINQ is? How does it make .NET slow? And to the best of my knowledge they were comparing JIT startup time to Go’s AOT time. Smh.
This comment makes me question if you really have any experience with LINQ. If you had, you would not make a comment where you seem to think that LINQ is used only for data access.
LINQ is (at the most basic level) list comprehensions done better. It is functional programming for the imperative C# programmers. It has the potential to remove most/all loops and make the code more readable in the process.
>This comment makes me question if you really have any experience with LINQ
The classic fallback.
>If you had, you would not make a comment where you seem to think that LINQ is used only for data access.
List comprehension is data access. Accessing sets in memory is data access.
>It has the potential to remove most/all loops and make the code more readable in the process.
It has the potential, and almost the certainty, of allowing one to thoroughly shoot themselves in the foot. Without the "magic" of LINQ, the grotesqueness of many data access patterns (which, as previously mentioned, includes in-memory structures; it's pretty bizarre that anyone actually in this field thinks this only applies to databases, or that only DBs are "data") would lead one to rethink them.
LINQ is almost always a bad indicator. It is actually a fantastic thing in one off scripts and hack type code, but when it appears in production code, I would say 90%+ of the time it is absolutely gross, but it hides how gross it actually is.
> LINQ is almost always a bad indicator. It is actually a fantastic thing in one off scripts and hack type code, but when it appears in production code, I would say 90%+ of the time it is absolutely gross, but it hides how gross it actually is.
Yeah, you definitely have no idea what LINQ, or a list, even is.
It is just slightly slower than e.g. for loops, so unless this is a hot path it is basically not relevant, and meanwhile it improves the readability of the code.
Good points! As someone who uses both ORM and raw SQL (I actually really like SQL as language) I sympathize with both sides of this debate. I prefer the readability of ORM in my models when developing, but closely monitor and optimize for N+1 and other inefficiencies, when needed. "When needed" is never cut-and-dry, but I try not to optimize too early. I actually tend to start new projects from the database and will scaffold with factories, seeders, and raw SQL queries to get a sense of the data model prior to coding.
ORMs such as Eloquent in Laravel also have some nice methods to resolve N+1 and perform lazy loading, but it's always a tradeoff.
There is no other way — people who don’t know one should not touch the other. Otherwise they are either juniors, or crazies who just like to complain that “the plane is a bad vehicle because I can’t just sit inside and land it properly without years of training”.
There are libraries that flip the concept of an ORM on its head. Instead of a library that allows lazy query composition, you write your queries and the library generates the code for that query at compile time. It’s a much better model in my opinion.
E.g. you could write a query like this:
getUser:
SELECT * FROM users WHERE id = ?
And the library would generate a class like:
class GetUserQuery {
  static getUser(id: string): GetUserQueryResult
}
We had a query builder in our app and ended up stripping it out in favour of raw SQL and string interpolation (for dynamic queries, not for passing in data). We found the raw SQL was much more readable.
We still used the library for inserts and updates.
IME, the accidental inefficiencies aren't a huge problem. What I hate about some ORMs is that they want to be considered the foundation of a system or, worse, multiple systems, rather than just a tool for implementing pesky infrastructure details. LINQ-to-SQL and Entity Framework feel like they want to be at the center of everything, though they don't have to be used that way. Something like Dapper, on the other hand, seems like it just wants to get your app the data it requires, through classes the application defines, and then get out of your way.
In short, I don't think ORMs should generate class libraries to be referenced by applications; they should help you get exactly what you need, when you need it, nothing more, nothing less, and no one except the person that maintains that code should ever need to care how the data got there. If a column gets added to a table that has nothing to do with a specific application, that application shouldn't need to be updated.
I've seen them help with things like dirty checking, that's a lot of work to take on properly. But in the end I think relying on them for significant load and scale is naive.
> In any sufficiently complex application, something like a query builder that allows different parts of the application to work together to lazily compose queries (vs combining/filtering/sorting the data in memory) will be created at some point.
Right, but is an ORM a good way to achieve this? I would posit no. The problem, correct me if I’m wrong, is one of an in memory representation of working data and a data access layer (how to get/update the data). An ORM provides a framework for both of these, but in my opinion it’s very inefficient to build, maintain, and optimise. You don’t need to turn to raw SQL as the alternative data access layer, but having one that is easier to customise is better in my opinion.
Not at all an expert, probably nonsense, please correct me.
SQL is an industry standard well understood by legions of senior developers and DBAs alike. SQL skills learned are repeatable, widely useful and build on past experiences.
Debugging and optimizing SQL queries is a well understood art, barring some rare DB specific optimizations (which ORMs don't even touch)
Getting expert SQL advice and support when you need is very practical and the results show.
Putting an ORM as the middle man generating your queries, negates all those benefits for some pithy premature laziness.
Operating on raw SQL is something I'd strongly recommend.
I would challenge your assertion that "any sufficiently complex application" will require lazily composed queries. I have worked on quite complex applications that had no such need. Well encapsulated string construction in a Data Access Object pattern worked just fine.
If you find yourself needing to lazily compose queries, then by all means, reach for a query builder. But I would encourage you to first examine whether you've split code into services that really should live together and whether that code should live closer to the data layer before you build a whole query builder.
And don't reach for a query builder until you need it.
In a sufficiently complex application, the database lives in another service that exposes a fixed set of queries through an API and certainly does not let you compose arbitrary queries.
How do you do anything with that? Like do you just accept that if some part of your app needs the data from query1 and query2 you make two trips to the database?
You file a ticket with the database team so they can add your fixed query to the service API. After a few rounds of exchanging messages and meetings it might be added.
If it's a third-party API (say, a weather forecast or market data) the answers will range from "no" to "yes, and it will cost you X".
the line between an ORM and a query builder is a very blurry one
your query builder isn't just string based, to avoid a lot of potential bugs which are easy to introduce and miss in tests? It also has a simple way to (de)serialize rows from/to POD structs? Now you already have a thin ORM.
The N+1 problem feels a bit like y2k in my experience; would definitely be a problem if smart folks didn’t come up with clever solutions, but since they did it’s not nearly the issue it’s made out to be for the average implementer.
That has not been my experience. Then again, I’ve written tens of thousands of sql queries over the decades, so sql is as easy as breathing for me. No need for any sort of query builder.
Query builders are generally not intended for developers to build "static" queries, but to build dynamic queries programmatically while being safe (eg. from SQL injection) based on dynamic data, often with user inputs, the likes from ASTs from parsed user input in query fields.
Query Builders are a subject I am thinking about at the moment. The question is whether or not OO is the right model.
JooQ in Java is a DSL for writing SQL in Java, and it works amazingly well. It's not guaranteed that every JooQ statement compiles to valid SQL, but I have had pretty good experiences writing crazy complex queries using things like string_agg in pgsql. Here the relational model is primary and the representation as Java objects is secondary; given that the SQL is persistent and the Java objects ephemeral, this makes sense.
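A taste of that DSL for the unfamiliar (a hedged sketch using jOOQ's plain-SQL `field`/`table` helpers; the generated-code API would give you typed references instead):

import static org.jooq.impl.DSL.*;
import org.jooq.DSLContext;
import org.jooq.SQLDialect;

DSLContext ctx = using(connection, SQLDialect.POSTGRES);
var titlesByAuthor = ctx
        .select(field("author"), field("string_agg(title, ', ')"))
        .from(table("posts"))
        .groupBy(field("author"))
        .fetch();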
My RSS reader uses ArangoDB and Python, and I’d like to stop a bit and develop a query builder inspired by OWL class axioms, which, if you look at them the right way, look like building blocks for a Scratch-like SQL query builder. I want both ordinary queries (browse my favorites; say, an article with the keyword “niantic” that appeared on Tildes) and rules (don’t show already-read articles from the Guardian that have “- live” in the title).
> In any sufficiently complex application, something like a query builder that allows different parts of the application to work together to lazily compose queries (vs combining/filtering/sorting the data in memory) will be created at some point.
Then your devs have way too much time on their hands. Find them some actual problems to solve.
My dislike of ORMs mainly stems from the tendency to treat modern SQL engines as glorified dumb bit buckets.
Where a CTE or LATERAL join or RETURNING clause would simplify processing immensely or (better yet) remove the possibility of inconsistent data making its way into the data set, ORMs are largely limited to simplistic mappings from basic tables and views to predefined object definitions. Even worse when the ORM is creating the tables.
SQL is at its heart a transformation language as well as a data extraction tool. ORMs largely ignore these facets to the point where most developers don't even realize anything exists in SQL beyond the basic INSERT/SELECT/UPDATE/DELETE.
It's like owning a full working tool shed but hiring someone to hand you just the one hammer, screwdriver, and hacksaw and convincing you it's enough. Folks go their whole careers without knowing they had a full size table saw, router, sander, and array of wedges just a few meters away.
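To make one of those concrete: with Postgres, a RETURNING clause lets plain JDBC insert a row and read back the generated key in one statement and one round trip (a hedged sketch; table and column names are illustrative):

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.SQLException;

static long insertOrder(Connection conn, long customerId, BigDecimal total)
        throws SQLException {
    try (var ps = conn.prepareStatement(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?) RETURNING id")) {
        ps.setLong(1, customerId);
        ps.setBigDecimal(2, total);
        try (var rs = ps.executeQuery()) { // RETURNING makes the INSERT yield a row
            rs.next();
            return rs.getLong("id");
        }
    }
}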
Most ORM frameworks have the ability to execute raw SQL, so using an ORM does not preclude you from using these features. ORMs tend to put you in the mindset of using simple CRUD statements most of the time - but I think that’s probably for the best, and these more advanced features should be used sparingly.
Not only does each ORM have a different API and access pattern to do this, once you hit this point, you're managing the object model ALONG WITH custom SQL for migrations.
There is also non-trivial danger in letting your ORM decide what data types your database schema should use. Got UUIDs? The ORM will store them as strings. Need start and stop timestamps? A range type with an exclusion constraint may be the right tool.
What you appear to consider "advanced features" are what some others of us consider "perfectly normal." Let's be honest: most devs consider knowing the difference between a LEFT JOIN and an INNER JOIN to be an advanced topic.
Devs will wax poetic about the marginal benefits of monads in limited scenarios, but throw up their hands in defeat when you mention a window function for a dashboard. They'd rather calculate it in the app layer with all the added latency that implies.
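The dashboard case is a good example: a running balance per account is a few lines of SQL, computed where the data lives instead of in a loop in the app layer (a hedged example; the schema is illustrative):

String sql = """
    SELECT account_id,
           created_at,
           amount,
           SUM(amount) OVER (PARTITION BY account_id
                             ORDER BY created_at) AS running_total
    FROM transactions
    ORDER BY account_id, created_at
    """;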
> Not only does each ORM have a different API and access pattern to do this
I think it is to be expected that different ORM frameworks would have differing APIs and access patterns. Similar to how different RDBMS's often have differences in the syntaxes and features that they support. It's not a big deal to look up the correct syntax in my opinion.
> you're managing the object model
Which means in the worst case that for some portion of your app you're basically back to where you were if you weren't using an ORM. Although I've found that JPA, for example, plays pretty nicely with native SQL queries (caching gets more difficult, but I think that's to be expected when moving logic out of the app and into the database regardless of whether you are using an ORM or not).
> ALONG WITH custom SQL for migrations
Personally I've always used custom SQL for all migrations. While ORMs often come with a tool to automatically generate a schema, I've never worked on a project that actually used this tool in production.
> There is also non-trivial danger in letting your ORM decide what data types your database schema should use. Got UUIDs? The ORM will store them as strings.
Some data types can be a bit tricky, but any competent ORM should at the very least have hooks to override the type on a field or implement your own custom handlers. [1]
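For example, in JPA the override is a one-line annotation (a hedged sketch; note that recent Hibernate versions map java.util.UUID to a native uuid column on Postgres by default):

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import java.util.UUID;

@Entity
public class Document {
    @Id
    @Column(columnDefinition = "uuid") // a native uuid column, not varchar(36)
    private UUID id;
}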
> What you appear to consider "advanced features" are what some others of us consider "perfectly normal." Let's be honest: most devs consider knowing the difference between a LEFT JOIN and an INNER JOIN to be an advanced topic.
I said "more advanced features" - as in more advanced than basic SELECT/INSERT/UPDATE statements. I don't think any feature that you mentioned is particularly challenging to understand. And literally every entry-level developer I've ever hired has easily been able to explain the difference between LEFT and INNER joins in an interview.
I'm lucky enough to work on projects usually lacking any kind of database. But when I do use one, I make sure to use a tiny subset of DB features. Simply because I consider "code" to be the part of the world I have any control over and "database" to be the dangerous and out-of-reach part where errors show up later (such as in slower/larger integration tests). Any logic that happens in a DB happens outside of low-level testing reach too, and that scares me. I'll probably never even use a select-from-select query... I'll happily leave most of the tools in the shed.
> Simply because I consider "code" to be the part of the world I have any control over and "database" to be the dangerous and out-of-reach part where errors show up later
Sounds more like a personal phobia than a substantive critique of databases, especially ACID databases. You can test any query outside of a huge integration test by… running the query in a smaller, more targeted test suite. This can be done for performance checks, conformance, sanity, etc. The thing is, compared to most other programming languages, SQL is largely side effect free. As long as no one is dropping tables or indexes, the query you ran today will return the same result structure you run tomorrow. Obviously changes in the volume of data will affect performance, but that's true no matter what data store you use.
"But what about testing writes," you ask? Same deal. Start a transaction, run the INSERT/UPDATE/DELETE and then ROLLBACK. You get your performance timing, your ability to detect errors, and no permanent changes to your dataset.
Data storage and persistence is hard. It doesn't become easier using NoSQL or avoiding data altogether. But if data makes you personally uncomfortable, perhaps it's best you keep your distance. It greatly limits your career choices, but that's ultimately your choice.
There are tons of antipatterns out there. Often the production database is completely different from anything I can or want to set up on my local machine (e.g. it's a cluster of database type A, while my local dev env is a single node of type B).
> But if data makes you personally uncomfortable
I wouldn't say uncomfortable, simply bored. Not because I think databases are bad tech; they are awesome. But we still haven't solved the impedance mismatch between programs and data, so any time you need to work with them, you spend 10% of your time on business logic and 90% on deployments, migrations, ORM and CRUD. Which is just something I'm lucky enough to be able to avoid. I don't like NoSQL either (just like I don't like dynamic typing). I don't think anyone can "avoid data" in programming, but my way of staying sane has been to avoid the web.
If you have fewer than ten thousand simultaneous users, this is not you. After ten million, you're not using an ORM in front of a bare relational database either.
Scale at high numbers adheres to no rules or off-the-shelf tools.
I wish. I'm constantly having to reinvent materialized views on engines that don't have good support for the concept (MS SQL, SQLite), because my users want to sort/filter on a calculated column, meaning that I have to revert to "dumb bucket of bits" for record counts in seven figures and user counts in the dozens.
Relational algebra is good. I like the idea of normalized data.
I hope to be able to use it one day instead of SQL.
So… use one of the engines with better support for materialized views?
This sentiment is like saying, "Java doesn't support traits, so I hate all programming languages."
If an engine doesn't support a feature you need, use a different engine. If your workplace does not allow this, that's an issue you have with your workplace, not the tools.
I've worked on a few projects in the past that "didn't need all that fancy database stuff". It's frustrating to deal with the consequences. You spend most of your time building elaborate workarounds to re-invent basic data integrity safeguards. Roll your own database might be a fun learning experience, but not for production apps.
SQL may be a language for all those things, but unfortunately it's horribly bad at it. Wilfully obtuse syntax. Incredibly poor compositionality. No testability worth the name. The deployment model is a half-baked pile of perl scripts, and that's if you're lucky.
So yeah, I use SQL like https://xkcd.com/783/ . I'm sure you have a bunch of cool data analysis tools in there, but please just shut up, give me the contents of my table, and let me process it in a real programming language where I have map/reduce/filter as regular composable functions that I can test and reuse, with a syntax that doesn't make my eyes bleed.
Give you the contents of your table, so you can process it in the app tier with map/filter/reduce?!
Ah, the hubris of devs who honestly believe they can whip out a solution in a day that beats a dedicated army of developers singularly focused on the task of large data management, storage, and serialization. And almost always forgetting that serialization off of disk and over a network isn't free.
There's a reason why SQL is going on 50 years when most technologies are lucky to remain dominant past 10 in this industry. And if you think it's just because of inertia or lack of imagination, you're deluded. SQL isn't perfect (nothing is), but as a DSL for set theory, it does a damn good job, even 50 years later. Far better than any map/filter/reduce whipped up yet again by someone who doesn't fully understand the scope of the problems being solved.
It's doubly troubling when you can't grok that SELECT = map, WHERE = filter, and GROUP BY + aggregator = reduce. I sincerely hope you aren't avoiding JOIN by loading both tables ahead of time.
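For anyone who wants the correspondence spelled out, here it is in Python terms (toy data, purely illustrative):

    rows = [("books", 12.0), ("books", 30.0), ("toys", 5.0)]

    # SELECT category, SUM(price) FROM rows WHERE price > 6 GROUP BY category
    kept = [r for r in rows if r[1] > 6]           # WHERE  = filter
    totals = {}                                    # GROUP BY + SUM = reduce
    for category, price in kept:
        totals[category] = totals.get(category, 0.0) + price
    print(totals)                                  # {'books': 42.0}; the SELECTed projection is the map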
> Ah, the hubris of devs who honestly believe they can whip out a solution in a day that beats a dedicated army of developers singularly focused on the task of large data management, storage, and serialization.
A "dedicated army of developers" who've been "going on 50 years" but whose flagship solutions are still single-point-of-failure, still noncompositional, still untestable, still have bizarre and incomprehensible performance characteristics. Yeah, no, I'm going to do stuff in regular application code, thanks.
You pay a huge cost in moving those bits over the network though. How do you even deal with tables that don't fit in memory or the lack of indexes for tables where sequential scanning is too slow?
I mean at that point you're getting into real big data stuff. Move the code to the data rather than moving the data to the code; have some way to stream in the data in its native format rather than having to read it in; do pre-aggregation and indexing as the data is written (but not in a way that's blocking your OLTP). Which, yes, is stuff that SQL RDBMSes do for you up to a point, but they do it in invisible and unmanageable ways; IME you're better off doing it explicitly and visibly.
In seriousness SQL databases can be good for ad-hoc exploratory queries, so having your data processing pipeline spit out a read-only SQL database dump on a regular schedule is worthwhile, but using that for anything more than prototyping is a mistake.
I am not particularly opinionated on the matter and I think I can appreciate both sides of the argument.
SQL is somewhat analogous to C. It is entirely legitimate to look at its archaic usability features and wonder if we could not do better and at the same time it has hit a sweet spot of adaptation to the domain that ensured its longevity.
Something that might shake things up a bit is graph databases / query languages. We can think of them as a generalization of the SQL universe and an opportunity to modernize it.
Yeah, if your devs are loading entire tables into code and using filter on them ... I haven't personally seen that, but oh god.
Not that it isn't valid in some cases. It is. If, for example, you're already using the whole table in your logic and just need to filter it to pull out certain items for a specific need, then sure, it's good, absolutely!
SQL is a declarative language. If you ask it for an address, it'll give you an address. Not sure what you're trying to say with that xkcd.
SQL is a real language. It is testable and reusable. It's a DSL, so the syntax isn't something you have to look at most of the time unless you're a DBA. That said, not everyone hates the syntax.
You're probably losing efficiency in your quest for functional paradigm perfection.
We use integration tests to test our SQL queries. Basically fire up our persistence layer and a database instance together, call some create/update methods, and assert that the expected results are returned. Writing tests is pretty straightforward - the tests proved incredibly valuable recently when we migrated to a different RDBMS provider.
But yeah I guess we don't really have a good way to write automated tests for migrations that run outside of our application code.
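The rollback-per-test pattern can be sketched like this with pytest and sqlite3 (schema and names invented; in practice we wire up our persistence layer rather than raw connections):

    import sqlite3
    import pytest

    @pytest.fixture
    def db():
        conn = sqlite3.connect("test.db")   # hypothetical test DB with a users table
        yield conn
        conn.rollback()                     # every test leaves the data untouched
        conn.close()

    def test_create_user(db):
        db.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
        row = db.execute("SELECT name FROM users WHERE name = ?", ("alice",)).fetchone()
        assert row == ("alice",)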
DDL is like code. DML is not. Tools like Sqitch are quite adept at testing DDL.
DML is declarative. Like HTML. SQL is a DSL for set theory. Do you test your HTML like the rest of your code too?
But to be clear, Sqitch and pg_tap (off the top of my head) are both effective testing tools for Postgres databases. MS SQL and Oracle both have extensive testing frameworks available.
Sure. HTML doesn't impact the app as much as SQL does. Missing records in a SQL query can have a huge impact on many systems. A missing table row (HTML) doesn't usually matter that much.
> SQL may be a language for all those things, but unfortunately it's horribly bad at it.
Any piece of software used incorrectly will seem "horribly bad".
> Wilfully obtuse syntax
This is an opinion.
> Incredibly poor compositionality.
Another opinion.
> No testability worth the name
I would argue that "try it, observe side effects and rollback" as a language feature is more testable than many/most programming languages.
> The deployment model is a half-baked pile of perl scripts, and that's if you're lucky.
>> Any piece of software used incorrectly will seem "horribly bad".
I appreciate that you hate it. I don't even think you should care or learn it if you don't want to. But I think positioning things you don't understand or _want to use_ as "horrible" or "bad" or whatever is acting in bad faith. You're allowed to say that you don't like something and don't want to learn how to use it, without suggesting it's the tool's fault.
> So yeah, I use SQL like https://xkcd.com/783/ . I'm sure you have a bunch of cool data analysis tools in there, but please just shut up, give me the contents of my table, and let me process it in a real programming language where I have map/reduce/filter as regular composable functions that I can test and reuse, with a syntax that doesn't make my eyes bleed.
This comment directly demonstrates "I don't understand, I don't want to, just let me do whatever I want however I want", which is fine, but again, not the tool's fault you don't want to use it.
Quoting again because it's also relevant to this last comment:
>> Any piece of software used incorrectly will seem "horribly bad".
At some point there is an objective underlying reality. I've worked at a lot of places and never seen SQL done well; even the people who were happy with SQL had no automated testing, no compositionality to speak of, and a crappy deployment model (and would usually acknowledge all this! They just somehow didn't mind). If one person uses a tool badly that's maybe a problem for that person, but if everyone uses the tool badly that's a problem with the tool.
“Everyone” in this context means “the places one individual person is aware of”, and when compared to _actual_ “everyone”, is mostly irrelevant though. The things you’re complaining about are solved problems, you just don’t care/refuse to acknowledge those solutions, it seems. Which is fine! Like I said, you can just say you don’t like something without trying to smear it.
I do care, but the "solutions" are always vaporware. It's like when I complain about C++ memory unsafety and someone says "it's fine, under the new standard you can just use a safe subset of C++" and I say "ok, where's the definition of this subset and how can I tell whether a library follows it or not?" and they say "uhhh....". At some point you stop asking.
This is the second time you’ve said “it’s bad!” And then used a strange hand wavy analogy (in the first post, an xkcd comic) as an out. Again, if you don’t understand something and don’t want to, just say it, don’t paint it as bad.
Enjoy your vastly inefficient and expensive map/reduce configurations if that’s what you like.
You want to be concrete? OK, literally every real-world SQL setup I've seen, when not piggybacking off the application-level infrastructure:
- Has no way to deploy a specific historical version of the SQL (e.g. deploy this git tag)
- Has no automated checking that the SQL behaves as expected (e.g. this query with this test data should produce this output; if it doesn't, the git tag will not be created). Even checking that the SQL is syntactically valid is rare.
- Has no reuse of expressions smaller than a view/table
- Has not even basic type checking, e.g. something that alerts if you are combining an expression that might produce null with an expression that does not handle null well
- Has not even basic library/dependency management
- Has no practical structured values (since none of the infrastructure that would make using ad-hoc temporary tables safe exists). CTEs are an improvement but still thoroughly noncompositional since you can only put them in one place.
Let me guess, "these are solved problems" but no details of what you do to solve them, because none of the solutions actually work.
> - Has no way to deploy a specific historical version of the SQL (e.g. deploy this git tag)
Can you clarify this? Do you mean the version of a specific schema/query?
> - Has no automated checking that the SQL behaves as expected (e.g. this query with this test data should produce this output; if it doesn't, the git tag will not be created). Even checking that the SQL is syntactically valid is rare.
This is part of the purpose of transactions [0]
> - Has no reuse of expressions smaller than a view/table
Incorrect [1][2][3]
> - Has not even basic type checking, e.g. something that alerts if you are combining an expression that might produce null with an expression that does not handle null well
Incorrect [4] [5]
> - Has not even basic library/dependency management
Incorrect [6]
> - Has no practical structured values (since none of the infrastructure that would make using ad-hoc temporary tables safe exists). CTEs are an improvement but still thoroughly noncompositional since you can only put them in one place.
Can you clarify? How is a transaction + CTE/subquery not sufficient?
> Let me guess, "these are solved problems" but no details of what you do to solve them, because none of the solutions actually work.
Replace "none of the solutions actually work" with "I don't know how to use them", and yes you're correct.
I'm happy to answer more of your questions in good faith, but I'm not really interested in debating a person with their head in the sand. Please let me know if I can help you, if you're actually interested. But also, as I've said, it's also fine to not want/not like this type of stuff. Different strokes and all that.
> Can you clarify this? Do you mean the version of a specific schema/query?
I mean being able to version a system implemented in "advanced SQL" the same way I'd version one implemented in an applications language. Make some changes, keep them in VCS, at some point decide to do a release and deploy. A few days later, discover some issue with the changed logic, roll back to the previous release of my "code" (without affecting the data, so not just restoring a database backup).
> This is part of the purpose of transactions [0]
Right, but the actual workflow tooling around how to use them for this is missing, and for whatever reason the culture that would build and standardise it seems to be missing too. Like, in most ecosystems there's a standardised test-edit cycle that everyone understands; you make your changes and then you run npm test or cargo test or whatever, and you get some assurance that your changes were correct, and that same tooling is also in control of your release workflow and will prevent or at least warn you if you try to do a release where your tests are failing.
> Incorrect [1][2][3]
Postgresql functions with composite values (and more generally the fact that composite values exist at all) sound like exactly what I was looking for, so that would be a big improvement if I could use them (although for the record they're not a standard SQL feature; MySQL functions can only return a scalar value, so there's a major missing middle between functions and views). But even then they have the same problem as temporary tables/views of existing "globally", in the place you'd expect data rather than code to be, and deployment/use tooling being inadequate to use them safely (or at least widely perceived as such). Like, everywhere I've worked has had a de facto rule of "no DDL on the production database except for deliberate long-term changes to the schema that have gone through a review process", and I don't think that's unreasonable (maybe you do?).
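(For reference, the feature in question looks roughly like this; my sketch via psycopg2, with table and column names invented:)

    import psycopg2

    conn = psycopg2.connect("dbname=app")   # hypothetical DB with an author table
    with conn, conn.cursor() as cur:
        # a function over the table's row type, callable like a derived column
        cur.execute("""
            CREATE FUNCTION display_name(a author) RETURNS text AS $$
                SELECT a.first_name || ' ' || a.last_name
            $$ LANGUAGE sql
        """)
        cur.execute("SELECT display_name(author.*) FROM author")
        print(cur.fetchall())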
> Incorrect [4] [5]
Those functions exist, I'm talking about having tooling that can tell you where you need to use them. Even in something like Python you have linters that will catch basic mistakes, and they're integrated into the workflow tooling so that you won't accidentally release without running them.
> Incorrect [6]
A list of libraries != library/dependency management.
> Can you clarify? How is a transaction + CTE/subquery not sufficient?
If I want to pull out an expression that appears in two arbitrary points in my query, and use it as a CTE, that requires a lot more thought than it would in most languages, because it goes differently depending on whether it was in the SELECT or the WHERE or the GROUP BY or.... The grammar is just somehow less consistent than most programming languages, and the scoping is more confusing, I think because it's kind of lexically backwards (like, you have to declare things to be able to use them, but a lot of the time the declaration goes after the usage. But not always!).
> I mean being able to version a system implemented in "advanced SQL" the same way I'd version one implemented in an applications language. Make some changes, keep them in VCS, at some point decide to do a release and deploy.
It's tough to follow these posts in part because SQL is a programming language used to interact with a database. It seems like you want to treat the SQL you write synonymously with the system you're writing it against, which is...I don't know...confusing? SQL is declarative, you say make it so and the underlying engine makes it so. I've had trouble parsing whether you dislike _writing SQL to interact with a database_ or _the way RDBMS systems manage data_. The only "logic" that exists in most databases is surrounding constraints, which, if there's a bug in your constraint, you tell your database to update its schema, and it does so. You're free to store your SQL queries however you'd like, just like any other programming language.
> A few days later, discover some issue with the changed logic, roll back to the previous release of my "code" (without affecting the data, so not just restoring a database backup).
As is the nature of this conversation, this is just inherently not how these systems are intended to work. Asking for "new data but old shape" is legitimately ridiculous.
> Right, but the actual workflow tooling around how to use them for this is missing, and for whatever reason the culture that would build and standardise it seems to be missing too.
I guess I'm confused as to what "workflow tooling" you're looking for? Every RDBMS supports multiple databases and schemas, which enables you to create and run test configurations with test data in real-world environments. A transaction enables you to test real queries on real data without risk.
> Like, in most ecosystems there's a standardised test-edit cycle that everyone understands; you make your changes and then you run npm test or cargo test or whatever, and you get some assurance that your changes were correct, and that same tooling is also in control of your release workflow and will prevent or at least warn you if you try to do a release where your tests are failing.
Sure, and in database systems this is the same. Craft a query out of a transaction, when it does what you want, commit it (either via a transaction or to normal source control). Everything you're asking for literally already exists, exactly how you're asking for it.
> Those functions exist, I'm talking about having tooling that can tell you where you need to use them. Even in something like Python you have linters that will catch basic mistakes, and they're integrated into the workflow tooling so that you won't accidentally release without running them.
The more I read what you write, the more it seems like you think you're unable to write SQL in a file and execute it against a database? Which...you can do...it happens all the time. I didn't bring it up because it's so obvious and common I thought there was some other misunderstanding.
> A list of libraries != library/dependency management.
I'm not sure what else you want? Your RDBMS of choice maintains a list similar to a package.json containing dependencies and their versions, and you can interact with either that list directly, or with RDBMS-specific commands similar to "npm install react". Again, what you're asking for literally exists exactly how you're asking for it.
> If I want to pull out an expression that appears in two arbitrary points in my query, and use it as a CTE, that requires a lot more thought than it would in most languages, because it goes differently depending on whether it was in the SELECT or the WHERE or the GROUP BY or....
If you open a java file, are you allowed to write arbitrary code wherever you want? No, that would be ridiculous. Enforcing some shape or structure on source is common and expected.
> The grammar is just somehow less consistent than most programming languages, and the scoping is more confusing, I think because it's kind of lexically backwards (like, you have to declare things to be able to use them, but a lot of the time the declaration goes after the usage. But not always!).
I'm happy to meet you in the middle (really, one millimeter from where I currently stand) and acknowledge that it can be at times frustrating to modify a SELECT portion of a query prior to getting to the FROM clause, which reduces the ability for editors to assist in typeahead (because as you've said, you haven't gotten to the FROM yet).
> The only "logic" that exists in most databases is surrounding constraints, which, if there's a bug in your constraint, you tell your database to update its schema, and it does so.
Well, complex operations on data - even if those operations are just selection and aggregation - require logic. So either that logic lives in SQL, or it lives in application code.
> I guess I'm confused as to what "workflow tooling" you're looking for? Every RDBMS supports multiple databases and schemas, which enables you to create and run test configurations with test data in real-world environments. A transaction enables you to test real queries on real data without risk.
So where is the equivalent of npm/cargo/maven? And where is the unit testing framework? Transactions are low-level functionality that you could build this tooling on top of - but as far as I can see no-one has, or at least not to the extent that it's standardized and accepted in the community. Where in RDBMS-land can I check out an existing project, make a small edit to one of the queries, and then run that project's tests to confirm I haven't broken it? A few projects have some support tools for doing this, but they're inevitably ad-hoc and unpolished.
> The more I read what you write, the more it seems like you think you're unable to write SQL in a file and execute it against a database? Which...you can do...it happens all the time.
I want a project with a bit more structure than a single file. And I want to share and reuse pieces between multiple projects rather than writing everything from scratch every time. Again, that's the low-level functionality, but where is the workflow tooling that actually builds on that to let you do day-to-day things in a standardised way?
> I'm not sure what else you want? Your RDBMS of choice maintains a list similar to a package.json containing dependencies and their versions, and you can interact with either that list directly, or with RDBMS-specific commands similar to "npm install react".
What? Where? You linked to a list for postgresql that literally has 12 packages available, total (I'm pretty sure I've published more packages than that in Maven central myself).
> If you open a java file, are you allowed to write arbitrary code wherever you want?
Not quite, but I can select any subexpression of an expression almost anywhere and pull it out into either a local variable or a function - usually by doing nothing more than hitting a key combo in my IDE. I haven't found anything like that for SQL.
To be fair, a lot of those tools have that one weird caveat you need to know, and are also strongly dependent on your dialect.
For example there's an aggregate function in DB/2, LISTAGG, that joins strings... the caveat being that if they get too long, the query blows up. It's in OracleSQL too, which has a syntax where you can tell it to truncate the string.
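From memory, the Oracle 12.2+ overflow clause looks like this (names invented; check your dialect's docs before trusting it):

    # truncate instead of blowing up when the aggregated string gets too long
    listagg_query = """
        SELECT dept,
               LISTAGG(name, ',' ON OVERFLOW TRUNCATE '...' WITH COUNT)
                 WITHIN GROUP (ORDER BY name) AS names
        FROM employees
        GROUP BY dept
    """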
SELECT, INSERT, UPDATE, DELETE work more or less the same whether you're on MariaDB, Postgres, DB/2, OracleSQL, etc.
As opposed to the dizzying array of ORMs that all have multiple caveats you need to know and are utterly incompatible with each other in fundamental ways?
Folks go on about how they're "stuck" with a single dialect of SQL but completely ignore how utterly impossible it is to switch ORMs without a full app rewrite. And ORMs are all language-specific, so if you have two different app clients in different languages, you can't always share even basic API access patterns.
Got a Java Spring app alongside a Django app? Good luck!
ActiveRecord has always felt great to me: yes there are some corner cases to deal w/ but it hits the 80/20 spot, lets you kick out to SQL when you need to, adds a lot of excellent life-cycle related hooks that clean up domain logic, and is pretty easy to reason about
pragmatic ORMs that don't try to hide everything behind a "bundle" or introduce their own query language are great, it's the ones that try to totally "solve" the ORM problem that are hard to work with
Yes, I can’t imagine going back to doing big CRUD apps without AR; readability and maintenance would take a hit. The catch is you need good prior knowledge of SQL to use it correctly. Problems happen when people don’t learn SQL and then don’t understand the queries AR is generating, or its limits.
Some ORMs are awful to work with, but that doesn't mean the genre is a bad idea. I've loved working with SQLAlchemy over the years because it doesn't try to re-invent the whole idea of SQL. Instead, it gives you a convenient layer for building queries, passing them around, etc., while it concentrates on getting the details right so you don't have to futz around with them.
My only real ORM experience is with SQLAlchemy (and a tiny bit of trying to learn ecto) and based on this thread I have to ask are all the other ORMs just terrible?
I've been learning some rust lately and writing out boilerplate code to handle simple CRUD is painstaking and it feels like a waste of my time.
SQLAlchemy's ORM can get in the way a bit for more complex queries (or ones you are trying to optimize), but you can always just drop down and write the SQL for those specifically.
From what I’ve seen yes. My first ORM was flask-sqlalchemy. Migrating a database was literally just updating an object and running a command. No other ORM has been this seamless.
As with anything, there are situations where it works well and where it doesn't, and the tech can be easily abused or misused. ORMs (in e.g. dotnet Entity Framework), particularly when working database-first, can be a great asset.
If you are working in DBs with hundreds of tables, all set up with proper foreign key relationships, an ORM can help navigate a complex schema and provide compile-time sense checking when things change. Embedded SQL is a double-edged sword, with perceived performance benefits but very hard-to-maintain, hard-coded, verbose SQL, particularly for joins. I'm talking about when there are hundreds of tables, not when you have 5-10 tables.
I've no ambition to convince anyone one way or the other, but don't believe everything you read on the internet about best practices, especially since people may be talking about something quite different to what you're looking at. Some people will be pretty shocking developers regardless of which tech is being used.
A point I'm missing is: without using an ORM (or however else you're calling your database interface) you will eventually end up writing and maintaining one yourself, at least for CRUD stuff, and it will be filled with weird bugs, edge cases and security holes.
> A common complaint about ORMs is that they break two of the SOLID rules. If you aren’t familiar with SOLID, it is an acronym for the principles that are taught in college classes about software design.
And that right here is the problem. Academia is teaching people stuff that makes sense in an academic environment where time, maintainability, money and effort aren't a concern (as long as you have grant money, but that's not relevant here) - and that leads to a very nasty collision when fresh graduates enter the workforce. They're used to shaving yaks and reinventing wheels all day long in the quest for a "perfect" solution, and if the company doesn't have (enough and qualified) senior/lead developers to catch that, you end up with ... very interesting products, budget overruns and/or dumpster fires. Or a combination of all of them.
ETA: There's something that can go even worse: when stuff designed by and for academics enters public availability. The best/worst example for that is OpenStack... enough knobs and twists to support the specialized environment of CERN and probably hundreds of universities worldwide, with components modularized to an insane degree, but almost impossible to get started with because it's so byzantine.
This entire thread is filled with people complaining about ORM not matching vendor promises which is true.
* It won't let you swap a database with no effort
* It won't remove the need to learn and understand the SQL it generates
* It won't make schemas seamless
We use it since it's a standard abstraction that we would need to replicate otherwise. It makes caching the database access possible which is remarkably difficult to do at scale for manual SQL.
It comes with standardized tools and best practices that make issues like n+1 irrelevant.
The main problems with ORMs are:
* Misaligned expectations
* Bad ORMs
All you need to do to appreciate ORM is open a project written without it and try to change ANYTHING in that project... ORMs are amazing!
Most programming languages have long since gone towards OOP, so it is only natural for people to think of their database in the same way (which, coincidentally, is the same reason why MongoDB got so popular).
The key thing is: most applications won't ever reach the scale where it's a better investment to ditch the ORM and most of your data model than to increase the "instance-class" parameter of your RDS database. So many projects fail because they think from the beginning what life would be as a unicorn and burn all their money on an insanely complex tech stack that could have been S3 buckets for the frontend, a couple of EC2 servers for the backend and RDS for the database. The really basic stuff can get you really far without exposing you to vendor-lock in or yak shavings.
> most applications won't ever reach the scale where it's a better investment to ditch the ORM
Aside from working in a large application that used ORM (and had many, many problems because of it), my personal experience with using them - specifically Hibernate - to develop an application from scratch was based on a relatively simple application to parse and help reconcile bank statements. This was never going to be an application that required "scale".
What I found was that, relative to just using SQL, the ORM version of my application was much more complex, required significantly more knowledge of the tools, and produced inferior code.
While I was able to get an MVP up and running slightly faster with the ORM, the productivity benefits diminished - and rapidly became negative - while the performance of the application for even a single user was very poor. Eventually, what should have taken milliseconds in SQL was literally taking seconds in the ORM. Not only did it generate stupid queries, each query had much greater latency. And what I was doing was extremely simple.
In addition to the poor performance and greater complexity, ORMs - like all dependencies - are not cost-free. There can be a significant learning curve just to get started with an ORM like Hibernate, and that learning curve then needs to apply to all developers who touch the code base.
In our larger application, we found that Hibernate specifically had so many edge cases - not to mention its own SQL syntax - that the overhead of using it came to dominate the development time for new features. The need to create multiple objects (database entities AND entity objects) that could get out of sync, problems with cache consistency, latency, HQL... the list of gotchas and principle violations (DRY, KISS, SRP) was endless.
These days I simply refuse to use ORMs, and tend to do any complex database work in the database itself, i.e. with plpgsql. I find working directly with SQL to be insanely productive, and the resulting code to be in the order of 10x - 100x faster than what I used to be able to achieve in Java using ORMs.
> The really basic stuff can get you really far without exposing you to vendor-lock in or yak shavings
In my experience, vendor lock-in and yak-shaving are exactly what you get with ORMs.
I've seen a few departments and multiple companies struggle or fail because a super-brilliant guy (or guys) was chasing perfection. No useful work could get done, or it took 20-100x longer than a quick-and-dirty approach.
Perhaps the worst was when a team of highly skilled HTML designers was forced to learn functional programming.
Sure the back end wizard to configure some setting was a marvel of modern engineering and elegant design. It just took 10 people a year to build.
We heard so much negativity about SQLAlchemy that we decided to make our own lightweight query builder instead. 8 months later, and we're using SQLAlchemy...
The SQL language is wildly complex. It is seriously not trivial to make a fully featured query builder. Forget the actual 'relational mapping' part.
The third time I have to figure out whether to write obj.setCreatedAt(rs.getTimestamp("created_at", calendar)) or obj.setCreatedAt(rs.getLong("created_at")) I'm going to write code to look at the schema and the setter and figure it out automatically. And that’s where new ORMs come from.
> A point I'm missing is: without using an ORM (or however else you're calling your database interface) you will eventually end up writing and maintaining one yourself, at least for CRUD stuff, and it will be filled with weird bugs, edge cases and security holes.
The middle ground I've found is to use code generation for the CRUD boilerplate with a framework that makes it easy to do free-form SQL or easy query building. It gets rid of all the unnecessary contortions for advanced queries while still giving you a nice API for all the basic stuff because you get helpers instead of abstract objects. An example would be sqlboiler for Go.
IMO SQL is one case where you should "just program the damn thing".
There's no point at all in learning some SQL wrapper or ORM library. Learn and write SQL directly.
Database access is a fundamental thing you'll be doing for your entire career probably. Best to learn the actual technology directly.
This is true for certain other technologies too - CSS for example - don't learn a CSS library - learn and use CSS. In fact that's true really for almost all aspects of front end development - don't use a "forms library", program the damn forms APIs in the browser.
Also my personal experience is that SQL is much easier to learn than certain ORMs anyway.
Indeed, my introduction to databases was extracting stuff using Perl's DBI and getting good at pushing the work out to the DB. When I was introduced to ORMs it was instantly clear that they were a game changer for MVC development.
I've transitioned a few legacy (large) Rails applications through Arel and often did not understand why we were putting the work in, other than to make it easier on new/future developers who did not understand SQL. In pretty much all of them we ended up having to keep some non-Arel queries -- you know, the ones where `STRAIGHT_JOIN` cuts 5 minutes off a query!
> Learn to write SQL directly...[it's] a fundamental thing you'll be doing for your entire career probably
SQL has been called "the eternal language". I can say that across many target development platforms from mainframe, to desktops, to client/server, to web, and across all the languages used on them, SQL has been a common thread through it all (acknowledging they all have had their own idiom). Completely agree any developer is a stronger developer if they are well familiar with the data layer and what's going on down there. Classic question is - if you're going to spend your time learning the peculiarities of an ORM, why not just spend the time invested in learning SQL? The compound interest on your SQL experience will likely be worth more in ten to fifteen years than some platform or language specific ORM. I know there are reasons to learn and use an ORM but speaking in broad terms here. Avoidance of having to learn SQL shouldn't be the sole reason one is biased towards an ORM.
Isn't it more a case of "learn the fundamentals before learning the abstraction"
so you know what you are "paying for" (unless it's "zero-cost" abstraction)?
I'd say it's more of use the fundamentals. Learning them is, hopefully[0], table stakes. But even devs who know how to wield SQL, will be tempted to wrap it into some kind of ORM (either COTS or a DIY one), or a set of structures and functions, forming a flat and generic "data access layer" - all because that's how they've been taught the code should be structured. This is the pressure they need to resist.
That said, I myself haven't found the perfect balance. If you're going to honestly embrace the database, and follow a data-first approach, chances are you'll end up with an application that does most of its business logic on the database side.
On the one hand, it makes perfect sense - if your software is really about data transformation, there's no good reason to pull data from the DB, just so you can reshape it by hand using inferior tools not fit for purpose (i.e. most code you normally write), and then put it back into DB.
On the other hand, it feels wrong. Can't exactly say why, but somehow, having my app be a PostgreSQL database serving its own REST API, feels too direct, even if it literally does 100% of what I need. Closest I can come to explaining this is, RDBMS doesn't give me enough control - maybe I want the REST API to work slightly differently, or have more specific security needs, etc. But then, do I really need those alterations? Are they worth all the complexity and overhead that comes from writing software the "ordinary way", where RDBMS is only a dumb data store?
I think you are right, but databases are complicated enough where people can have different interpretations of what the fundamentals means.
For example, is being able to write the basic SQL query enough, or do you need to know what the difference between a hash join and nested loop is?
Without a pretty deep understanding, I think it can be hard to pick out faults with the ORM, and there is always the chance the ORM might avoid making a mistake you do yourself.
Done that, been doing that since 1992 (using Pro*Pascal, CS130 - it's a wonder I didn't immediately quit and take up llama farming) but currently I am using what would be classed as an ORM (`crud` by `azer`) for its marshalling to/from Go types because Go's stdlib `sql` is bare bones and you essentially have to write your own object (un-)marshalling and Good Lord, it is tedious.
I’m a big fan of the “middle ground” micro-ORMs like Dapper or Diesel.
Write your queries in SQL, but you get to have your strong typing and automatic result set to objects mappings. Productivity combined with the full-featured “real” query language.
Also, I’ve become a fan of using the object mappers only for read-only queries. Updates occur via the command pattern — stored procedures. These can be mapped through to look like native methods.
Absolutely. Dapper is easily the most pleasurable data access solution I have worked with. Developers need to stop pretending SQL doesn't exist and just embrace it.
Personally I like not having to either write my own model functions which just map back to SQL, or write boilerplate PHP to do parameterized queries correctly. Eloquent abstracts enough of that crap out of my way that I can focus on the logic.
Sincerely, someone rewriting an 'old' PHP app that is just arrays and queries into something using models and Eloquent. What a game changer.
Eloquent is one of the worst ORMs I've worked with. It works until it doesn't; there is too much magic, and it mixes SQL/Entity in the same class. Since then I use Doctrine and/or the repository pattern, which makes the code more extensible and testable; it's also trivial to do your own raw SQL request and map it to your entity if you don't want to rely on an ORM.
This thread only underscores the total mindshare dominance of ORMs. The implicit alternative to ORMs in almost every comment is raw sql in string literals in app code.
For 6 years now, we've had tools that let you use sql as a language, then generate the wrapper exposing your query to your app as a method. (queryfirst, pgtyped, pugsql, sqlc) This is (or could be) a paradigm shift. It's clearly, demonstrably a superior way of working, but it exists on the fringes because ORMs (and this unchanging discussion) have sucked all the oxygen.
SQL is a language, an extremely high-level one at that. Trying to stitch it together using a lower-level language is a prime example of "if you have a hammer, every problem looks like a nail".
An illustration. Let's say I'm coding in assembly (low-level) and I interact with a large system. The system's docs say that the input can only be formulated in JavaScript (higher-level than assembly). So there's the choice: I can either import an arcane library that allows me to stitch something together in pure assembly, without ever committing an xxx.js file to the repo. Or, I can express myself in .js files, using the power of a high-level language.
100% this! After using sqlc for a project, I don't think I can ever go back. Having a single source of truth for the SQL queries being utilised also makes it easy to optimize queries (e.g. create new indexes) since the queries themselves are the thing you end up writing and maintaining.
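For the curious, the pugsql flavor of this idea looks roughly like so (paths and names invented; check the project docs for the exact annotation syntax):

    # queries/find_user.sql contains:
    #   -- :name find_user :one
    #   SELECT id, name FROM users WHERE id = :id

    import pugsql

    queries = pugsql.module("queries/")   # each annotated .sql file becomes a method
    queries.connect("sqlite:///app.db")
    user = queries.find_user(id=42)       # plain SQL, exposed to the app as a function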
I always disliked ActiveRecord, and always really liked arel, which is what it uses to construct its queries. Abstracting over raw sql strings is very useful, but pretending that relations are objects is convoluted.
NeXT’s Enterprise Objects Framework supported this 25+ years ago. (And TopLink probably did too.) This is how it handled mapping between stored procedures and local entities.
The wikipedia page says exactly the opposite. "EOF abstracts the process of interacting with a relational database by mapping database rows to Java or Objective-C objects. This largely relieves developers from writing low-level SQL code." https://en.wikipedia.org/wiki/Enterprise_Objects_Framework
What I said is not mutually exclusive with that. I said that’s how EOF maps stored procedures to local entities. EOF also maps tables to entities and supports queries via its own qualifier (predicate) syntax. It’s not exclusive to one or the other.
DataGrip in PyCharm seems to be on a similar trip. The SQL stays in string literals, but DataGrip looks inside the string literal to validate and propose auto-completion.
It's not an antipattern and never will be. Like many solutions, there are scenarios in which using an ORM is the simplest thing to do.
Also there are many different implementations of ORMs each with different features (Martin Fowler has a list from his Patterns of Enterprise Application Architecture: https://www.martinfowler.com/eaaCatalog/index.html). For example in the JVM world, jOOQ and Hibernate provide very different solutions.
* jOOQ provides a table data gateway (when used without records)
* Hibernate: Unit of Work, lazy loading, multiple inheritance strategies, Query Object, versioning, lifecycle events, and much more. A lot of the grief about Hibernate comes mainly from people who never bothered to RTFM.
Having said the above, if one needs flexibility and high performance (with complex queries), using vanilla SQL is invariably the best way.
I may be in the minority here (based off of other comments), but using raw SQL commands has been more performant, easier to write, easier to understand, easier to learn, and better for building complex queries/commands than anything an ORM has ever provided.
Updating some records in a CTE and using the results in a subsequent SELECT? It may take a minute or so on Stack Overflow, but significantly less than it would with SQLAlchemy.
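For example, in Postgres (tables invented), the whole thing is one statement:

    # a data-modifying CTE: UPDATE, then SELECT over what changed
    archive_and_count = """
        WITH archived AS (
            UPDATE posts SET status = 'archived'
            WHERE created_at < now() - interval '1 year'
            RETURNING author_id
        )
        SELECT author_id, count(*) FROM archived GROUP BY author_id
    """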
Want to make a composite primary key? I have no idea how you would do it with most ORMs, but it’s relatively straightforward with SQL.
A lot of times the stuff you want to do won’t even be available in your ORM, but the person lobbying for using it always pleads “it’s okay, it still allows you to make raw queries! We have an escape hatch!” Then when you start writing raw queries, your PRs get rejected: “Please stick to the ORM”.
I don’t understand people’s love of ORMs instead of simply learning SQL, it seems to bring on myriad headaches for nearly zero benefit.
I don't use an ORM, but depending on the complexity of the SQL statement I might build it using SQLAlchemy Core. I hate dealing with all these rules about the right way of escaping SQL column names with ", ' and [. I find SQLAlchemy Core to be the right balance: I am still in charge of the transaction management around the statement, but rely on a library to build the textual SQL statement; without it I would probably have to reinvent some parts of SQLAlchemy.
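For instance, a sketch with the lightweight Core constructs (table shape invented):

    import sqlalchemy as sa

    # no ORM models, just a description of the table for the query builder
    users = sa.table("users", sa.column("id"), sa.column("name"))
    stmt = sa.select(users.c.name).where(users.c.id == 42)
    print(stmt)   # renders roughly: SELECT users.name FROM users WHERE users.id = :id_1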
> I don’t understand people’s love of ORMs instead of simply learning SQL
> Want to make a composite primary key? I have no idea how you would do it with most ORMs,
Since you mentioned SQLAlchemy:
    # assuming the usual setup: import sqlalchemy as sa; Base = declarative_base()
    class MyTable(Base):
        __tablename__ = "my_table"
        col1 = sa.Column(sa.Integer, primary_key=True)   # two primary_key columns together
        col2 = sa.Column(sa.Integer, primary_key=True)   # form a composite primary key
ORMs aren't an alternative to learning SQL. What they give you on top:
* Type safety on all SQL operations.
* All the benefits of a query builder.
* Reverse relationships that read better in code `parent.children`.
* External file management that is generic, works on all your tables the same way, and supports multiple simultaneous back-ends making migrations easy.
* Object de/serialization and transformations letting you use native types like datetime.
* Typed/parsed JSON columns, I use Pydantic for this.
* Handling single/multi table inheritance for you and giving you type safety.
There are two kinds of ORMs. One is based on the domain models and will either generate tables and columns or have some kind of mapping from domain model -> database.
The second type does the inverse approach and generates domain models from an existing database.
I vastly prefer the second kind, as the database is, and should be, the source of truth: you generate types from the existing database, as well as from queries (you write SQL and it generates type-safe wrappers for your language).
The first approach has been done quite a lot and is very error-prone; there will be drift between your domain model and the database, which can lead to subtle errors.
The second approach is very flexible, and you can still write SQL as you would have otherwise done, but you can integrate it easily in the code since the type-safe wrappers are generated automatically. Also, your domain model (database -> model) is always up to date.
I’ve used both but never experienced the issues you describe about the first approach, though your description is pretty vague so maybe I’m not understanding what you’re saying.
My biggest fear with the first approach is that I make a change to my model which in turns causes some sort of unwanted change to my production database. It’s never happened to me, but I worry about it enough to steer away.
My favorite ORM is still ActiveRecord, which falls somewhere in the middle of those two schemas. You define the models (tables), but the columns/types are inferred from the database structure.
ORMs are all problematic, but some are less problematic than others. Some years ago I started writing more Elixir with Phoenix and Ecto, and I quickly realized just how much better Ecto is when compared to ActiveRecord. I wrote about it back then [1] but the gist of it is: SQL > Query Builders with some ORM elements > ORM qua ORM, and the less you blur the boundary between your application and the database the better.
You say this in your post (and allude to it here), so not trying to be pedantic, but explicitly surfacing it here for other folks that don't click through: Ecto isn't an ORM.
I love Ecto. In my opinion, Phoenix has a lot of strengths, but Ecto is the superpower in that ecosystem.
That being said, at least _some_ of the brand of "problems" mentioned in the post can exist in the Ecto world. I've seen plenty of folks new to Ecto fall into N+1 queries using `preload` naively, so I'd argue it doesn't completely erase that blurred boundary. I also personally much prefer the level that Ecto sits at and think the tradeoffs are totally worth it.
I never understood the rant on ORMs. From my point of view, the main problem they solve is getting rid of overinflated data access objects. In a big project with a complex data structure in the database you can get hundreds of tables, and you end up with data access objects with more than a hundred methods each.
ORMs elegantly get rid of most of this stuff and you end up with code like user.getOrders().with("item") or something like that. The downside is that it becomes a bit harder to profile if some offender method is doing hundreds of thousands of queries against the database, but it's still possible with a little more work.
There are situations when you need something more complex than "SELECT ... JOIN ... WHERE ...". For example you might want to force the server to use some particular index, or you use subqueries or CTEs. In that case you can use native query functionality, where you pass a raw SQL query to the ORM and it just populates data objects with the results of that raw SQL. Most ORMs have such functionality. But it's very rare that it's needed; usually you have only a few such occasions per service, if it's complicated enough.
Right, ORM is great for most scale and complexity out there. Lots of folks who can't stand it have been bitten by it being used for heavier analytical use cases.
Example: a dev writes a loop that calls a method that runs "user.PutOrders(list) when user.Name != Fred", and your database crashes because every row is scanned by that query, and it's running that query in a loop. Often it's really difficult to show that this query caused the issue.
ORM makes it way easier to do that, but the other things it makes easier still make it worth using. Except for rare scenarios unique to specific ORMs themselves, the problems attributed to the ORM are usually just bad (and poorly tested) code.
I love the Django ORM. It's concise, generates good queries and just works.
However, there's a danger - for complex applications, you want neatly separated domain objects and data layer. Django makes it really easy to make your model your domain object, and at some point that starts to hurt.
But a large part of any app is just objects backed 1-on-1 by a database table, and CRUD actions on them. It's fine.
Just watch out for those parts where the separation is needed, and separate them.
People will try to reinvent the same thing over-and-over - mostly because of missing core knowledge. So the cycle will go this way: discard ORM and use native SQL queries. Make your own management of what has changed and what not. Make your own (object) relationship/collection management. Make your own caches of loaded objects. And - the most non-obvious but the real deal-breaker: deal with complex updates where MANY objects are involved. Then they will come and start praising ORM. Because they learned the core knowledge...
The problem is that there are different kinds of ORMs.
One where the model is created, and either the tables are generated from the model or some mapping exists. The other kind takes the database as the source of truth and generates type-safe wrappers and models from SQL queries.
I believe most people only have experience with the first kind which has some problems but never really experienced the second kind which is very near to the database.
A problem that I haven’t seen a good solution to is that most (all?) ORMs are also caches. And once you step outside the object box, cache invalidation is incredibly hard. Taking an example from this post:
    await postRepository
      .createQueryBuilder()
      .update(Post)
      .set({ status: 'archived' })
      .where("authorId IN (SELECT id FROM author WHERE company = :company)", { company: 'Hooli' })
      .execute();
What happens to all the pre-existing Post objects when you do this?
(Maybe some ORMs try to get this right. It’s a very hard problem. Certainly the ORMs I’ve used don’t, but I admit I haven’t used very many.)
ActiveRecord approach: objects aren't kept in a cache, so calling update like that won't update other objects in memory. Each time you call select you get a different object.
Hibernate approach: objects are kept in a cache, and running update will update existing objects in memory, because each select returns the same object.
The only one I have experience with is SQLAlchemy (1.x). You certainly can do a query-builder update and get current subsequent data out of ORM objects, but it's really easy to mess up, at least in my experience.
Coincidentally I had to solve this problem at work a few weeks ago. I don't know how (or if) other ORMs deal with this, but for TypeORM it was relatively painless to implement optimistic concurrency[0] along with transactions at function-scope level which all but eliminates this issue.
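The same idea sketched in SQLAlchemy terms, for those not on TypeORM (model invented; TypeORM's version-column feature is analogous):

    import sqlalchemy as sa
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Post(Base):
        __tablename__ = "post"
        id = sa.Column(sa.Integer, primary_key=True)
        status = sa.Column(sa.String)
        version = sa.Column(sa.Integer, nullable=False)
        # every UPDATE becomes "... WHERE id = :id AND version = :expected";
        # zero matched rows raises StaleDataError instead of silently clobbering
        __mapper_args__ = {"version_id_col": version}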
Also, most ORMs make it hard or ignore the possibility of using database transactions, savepoints, partial or full rollbacks, etc. Which might be fine for trivial queries, low write load and unimportant data. But for anything moderately important or complex, transactions are essential.
They do? Every ORM I've used supports explicit transactions and rollbacks. EF, NHibernate, GORM, SQLAlchemy. What major ORMs deny you this functionality?
It isn't that there is no such thing as "a way to open/close transactions". It is that transactions usually break all the supposed ORM benefits like composability. Often, you cannot even method-chain transactions like foo.search().begin().insert().insert().commit(), you have to do it in multiple lines. Even worse, rollback handling is often an exception, breaking control flow of your application.
And most only support basic transactions, not savepoints.
"My big complex application is big and complex! It must be my (orm|monlith|OO|dynamic typing|database)!"
No, it is just a big complex app is complex. You can shuffle the complexity around, but you can't magic it away. You can change a pattern here or a tool there and make some subset easier to work on, but you've probably just made some other subset harder to work on.
> Let's me use SQLite on my dev machine but something else on my server.
This is my favorite feature too and forces me into better SQL hygiene because sqlite is just weird enough that supporting two databases keeps me on the beaten path.
My experience is that ORMs are like training wheels for fledgling teams/developers. They're a fantastic way to make sure everyone is following the same pattern/rules and that changes will more-or-less work over time without too much headache. If you have a team with developers who are actually skilled in SQL and use things like Dapper when writing code, you will actively detest the notion that the raw SQL should be abstracted away.
I also believe that you are more likely to completely go off the rails if constrained at domain modeling time into an ORM-compatible schema. The real world usually has some cranky things like circular dependencies and self-joins. Modeling these effectively with an ORM can be a nightmare (in my experience). If you have access to SQL, determining how to manage a circular dependency is much more obvious & direct. The effect is that you are free to model your domain in such a manner that it more closely resembles actual reality. In turn, this makes it easier to talk about and write the SQL.
There is 100% a virtuous cycle here. The more you force yourself to endure the SQL, the better every aspect around it will become. You will feel the fire of a bad schema the moment you see it if you have been spending thousands of hours troubleshooting queries over bad schemas. Knowing when an arrangement of tables is "smelly" before it holds any data can save you so much frustration. I think a lot of people who hide from SQL aren't hiding from SQL. They're hiding from someone else's horrible interpretation of a problem domain.
I still don't get it, none of what you mention is impossible or even difficult in many ORMs (at least none of the ones I've used), and using ORMs != being unskilled in SQL. I've had many cases where I start with the SQL, optimize it and then convert it to ORM to take advantage of not having hardcoded columns in my code, for example. I've also done the reverse, converting ORM syntax into raw SQL (still using the ORM), and I've also written ORM-free apps. In my experience, using an ORM makes collaboration much easier, especially if it helps take advantage of statically typed languages.
I think there is this obsession among engineers with signaling skills/experience, where anything that looks like it might make things easier for less experienced developers must be bad or the result of some useless trend. However, we don't need to litter our app with SQL literals to show how skilled we are. I was part of a team once where one of the developers made it a point to do this because ORMs are for losers, and surprise, most of the bugs were things like changing column names and not updating related queries, plus insufficient testing because "we're moving fast!". Even when it did not cause bugs, isn't it a lot more effort than having one place you need to change?
I think we're hired to build products and solve problems, our egos should really get out of our way.
> I still don't get it, none of what you mention is impossible or even difficult in many ORMs
You may not have had to deal with cycles in data or other unusual views over the object graph that were not originally intended. Again, it's not that you can't solve the problem using an ORM; it's that it's an opinionated way to solve a problem that constrains your design choices in such a way that you now have less fidelity at domain modeling time, which may result in yucky SQL down the road. ORMs effectively remove SQL capabilities from your tool belt. Yes - there are always hackarounds, but for most teams these "soft" limits are effectively hard ones and lead to bad career/product outcomes.
> using ORMs != being unskilled in SQL
Not necessarily, but I've personally never seen a developer go from writing CTEs in Dapper back to writing POCOs and using Entity Framework.
> I think we're hired to build products and solve problems, our egos should really get out of our way.
This is literally what the "stop using ORMs" crowd has been trying to sell as well. SQL has been the answer for half a century now. If you actually managed to kill your ego, you'd try to fit everything in that box as tightly as possible.
idk, this sounds a lot like don't use an IDE, use vim/notepad because IDEs can be opinionated. Maybe we've had different experiences, but I just think a lot of these arguments are unreasonably absolutist. Granted, some IDEs are terrible, but there's a sweet spot and many older IDEs have learned from it. I think ORMs have matured in this direction as well.
The “automagic” nature of ORM frameworks makes it easy for developers to inadvertently implement “N+1 selects” access patterns. Most ORM frameworks will also offer a number of ways to mitigate this issue, though, and if you are going to use an ORM you must thoroughly read the documentation to understand how it works and what options are available for accessing data.
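The classic shape of the problem, sketched in SQLAlchemy terms (User/Post models and an open session assumed, not shown):

    from sqlalchemy.orm import selectinload

    # N+1: one query for the users, then one more per user as .posts lazy-loads
    for user in session.query(User).all():
        print(len(user.posts))

    # one documented mitigation: fetch the association up front
    for user in session.query(User).options(selectinload(User.posts)):
        print(len(user.posts))    # two queries total, however many users there are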
That being said, the worst codebase I ever worked on was created by a team who felt that ORMs were “evil”. It was littered with N+1 select issues from developers trying to reuse snippets of code without thinking through the performance implications.
So I’d say it really comes down to the skill/experience of the engineers to use tools correctly rather than what tools you use.
The issue of pattern/anti-pattern is, I think, redundant. It's a little bit like calling something an "ism" to shut down discussion.
The problem is one of use case: is using an ORM the best decision for my use case? I've seen both sides of that decision play out, and success and failure for each of those decisions. I've built ORMs in support of those decisions (1/10 would not recommend).
If the problem you have in front of you is a simple CRUD use case with Fat Client then it makes a tonne of sense to use an ORM. As other people mention, you may find yourself with a non-trivial problem created by the ORM; there are well defined approaches to those problems.
But, if the model you are dealing with is inherently complex, and there are a good number of business rules working across objects, consider not using an ORM and going down an abstraction layer so you have greater control over the system. It's more code; you may see some duplicate query logic here and there; IDE support isn't as good. But you get control.
The big issue I have when talking with engineers is an understanding of the trade offs and taking a business-rules/domain model first approach. Often it's just "I know SQLAlchemy so that's what I'm using".
One additional point of note: it is also very easy to break encapsulation with an ORM, meaning logic gets scattered between controllers, views, models, services, etc. This is a personal bugbear of mine. A bad tradesperson blames their tools and all that, but it's something I see a lot, where engineers treat the objects as Records and manipulate state outside of the object boundary.
I love ORMs. I recently wrote a little script to scrape a website and put some data in a database. I used Sequelize and it's excellent. I don't want to write an
INSERT INTO blah (A, B, C, D, E, F, G, ...) VALUES (a, b, c, d, f, e, g, ...)
statement where, whoops, I transposed two values! Will SQL throw a cryptic error because the types don't match, or will it silently accept my awful query because the types do match? Perhaps I'll find out after a frustrating night of querying data that doesn't seem to make sense.
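For comparison, the same insert through something like Sequelize is keyed by name rather than position. A sketch, with Blah and its columns made up for the example:

    import { Sequelize, DataTypes } from "sequelize";

    const sequelize = new Sequelize("sqlite::memory:");
    const Blah = sequelize.define("Blah", {
      a: DataTypes.INTEGER,
      b: DataTypes.INTEGER,
      e: DataTypes.STRING,
      f: DataTypes.DATE,
    });

    // named fields: swapping e and f here is a visible mistake
    // (and a type mismatch), not a silently accepted positional swap
    await Blah.create({ a: 1, b: 2, e: "hello", f: new Date() });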
"But post-it, why don't you write a loop that builds a query string?"
I could! But if I'm already abstracting away the misery that is SQL, I may as well use a library.
"But post-it, with an ORM I can't do <X thing to speed up my query>!"
I'm never going to need that. 99% of people using SQL are never going to need it. They're going to INSERT < 100 million things and then they're going to do a bunch of SELECTS and some JOINs and they won't care if the query takes 150 ms or 50 ms. And if they do, they'll optimize and write that one thing in raw SQL.
Modern databases offer incredibly useful and powerful functionality that abstraction layers typically don't implement.
Postgres has so many useful features and functions that ORMs just don't usually surface - that's not what ORMs are about.
So it's not just about "learning SQL", it's about learning the database itself and what it has to offer.
Search engines, JSON in/JSON out, features like SKIP LOCKED for queueing, arrays, the list goes on and on, and if your view of the database is through an abstraction library you'll likely never even know what's in there.
So don't just learn SQL, learn the database itself.
No, good ORMs are awesome. As their name says, they primarily map objects and object trees to relational database tables, both ways. They provide a standardized and type-safe way to do so.
They often include a query builder that generates SQL. An important feature for an ORM is that you can inspect the generated SQL and that you are also able to fall back to raw SQL queries. Some ORMs even let you mix SQL into their generated queries.
Generating code or generating database schemas from code is also a common feature, including migrations. It may satisfy your need, or it may not. So use it, or don't use it. It should be easy to change the strategy at any point.
The big mistakes only happen if the developers think they don't have to know or care about SQL anymore. That's just wrong, because an ORM is just a tool to work with a SQL database. Like the Tesla Autopilot, it helps you drive in many situations, but you still need to know how to drive, supervise it all the time and correct it in many situations.
It took me a long time to realise this but I discovered the only thing I really needed from an ORM was a type safe query builder. Essentially a drop in replacement for trivial SQL queries, which is most of them.
This article sums up the difference between old and new quite nicely. Prisma is fantastic for rapid development and type safety but doesn't require you to completely change the way you write code as it is still basically just a query generator, unlike traditional ORMs which use hidden magic to try to do everything for you with surprising consequences.
I don’t think you can call Prisma a “query builder”…
It introduces its own global DSL for describing relationships between tables, and then you query that. So if you need a relationship for any query, it’s now present in the model for every query. That doesn’t feel like a query builder to me.
A query builder should still look like the SQL if you blur your eyes. Prisma does not.
A different view on this is that relationships are a property of the data, not of the query. So it makes sense to represent them in a single place instead of repeating them for every query. Now, if multiple teams have differing understandings of what relationships are present in the data, they can go ahead and use separate Prisma schemas.
i feel like i'm old enough to see every new generation of people come in talking about how ORM is the devil lol. balance in everything, i maintain that ORM for 90% of the people out there is the right solution (basic or complex CRUD apps).
is it the most efficient? a big no, but at most people's scale it's fine and nothing that a caching layer can't solve if you really need it. you can always drive down to the raw SQL as well.
i can't tell you how many times in my career i thought i was a purist for writing raw queries or even just using a query builder and ended up creating some frankenstein "ORM" in the end. if your app has any sort of complexity involving relationships you are in for a bad time trying to cobble this shit together yourself.
My favorite ORM is TypeORM.
A client insisted we use that, mostly because they were already using it.
Somehow, a few left joins led to ridiculous (talking tens of gigabytes of ram when the whole database itself was at most a hundred megabytes) memory leaks, blowing up the instances.
Turns out, it is a known issue [0], and you’re better off joining by hand.
I won’t complain though. I got a nice blog post out of it.
Now, this won't be specific to ORMs, and any dependency could blow up your service by leaking all over the place. But so far it's been the only one to do this to me.
You say "known issue", and add a link to it, but that's not a link to a bug, it's a bug to a comment from someone that has implemented a workaround to some sort of problem. Then it's not clear if it's really a TypeORM bug or not (if it was, there would be clear aim from the maintainers to fix it).
It is a link to a specific comment on a GitHub issue [0] titled "Excessive memory usage when loading models with relations" where multiple users chime in and report facing the same issue.
Someone honestly trying to refute my comment in good faith could have looked around and easily understood this.
They would then have found another comment [1], mentioning a pull request [2] supposedly fixing this exact issue, with the explicit aim to fix "performance issues".
If that was not a bug, then I don’t know what is.
But then, I understand that this can be a lot to ask.
Personally, I believe the only valid criticism that could stand here would be that, not having bothered to check whether it still was an issue, I perhaps should have used the past tense, out of precaution, rather than assume the bug still hadn't been fixed two to three years after being initially reported. And I shall try to remember that.
> This is mostly false. ORMs are far more efficient than most programmers believe. However, ORMs encourage poor practices because of how easy it is to rely on host language logic (i.e., JavaScript or Ruby) to combine data.
The ORM encourages this, unfortunately.
> Now don’t get me wrong, ORMs are not as efficient as raw SQL queries. They are often a bit more inefficient, and in some choice cases, very inefficient.
Maybe a bit pedantic, but this and what's above contradict each other.
The problem is essentially this: you shouldn't use the abstraction until you understand what you're abstracting. But that's not how we write software typically.
It's more that ORMs are often perceived to be far slower on average, but in reality, they are not that much slower most of the time, and only far slower in choice scenarios.
But yes, agreed on your comment on general abstraction.
If the performance hit is seen as the overhead of the actual object-oriented pattern (assigning fields to objects, caching, etc.), then yeah, it's not that much slower; after all, the allocations would have to happen at some point. Patterns like the DataMapper pattern present an alternate path that gives ergonomic advantages while not hiding the hard parts of dealing with state AND being incredibly performant.
But the real performance cost of ORMs has always been what typically gets left on the table: well-written queries and schema. Poorly written schemas inspire poorly written queries, which in turn complete the ouroboros by inspiring poorly written indexes. And ORMs like ActiveRecord encourage this. One easy example is the Paranoia gem as an add-on to ActiveRecord, which encourages the misguided pattern of keeping old data around in your RDBMS. There are many more examples.
I personally am thankful for ORM in Laravel's Eloquent.
I have requirements that involve constantly changing queries, which, if done with raw SQL, can drastically increase the maintenance effort.
I say this because I am also maintaining a legacy project which uses raw SQL, and it takes a tremendous effort to add even a single feature that I know could be done in minutes with an ORM.
Overall, I'm with the ORM gang, however, I also respect the other view and I know well enough the freedom they have with using raw sql.
But I guess it's fine for me to be constrained with ORM, especially with Laravel's Eloquent ORM. It greatly improves my productivity working as a software engineer, and I also get to know more edge cases of raw SQL by using ORM.
Imho, ORMs make the easy queries easier (terser) but simply do not cut it for complex queries. So you need some solution for that too.
You could drop down to SQL-as-string queries, but then you lose all type safety on the queries. We use jOOQ (sort of LINQ on the JVM), which brings significant type safety to the database interaction layer.
Would I use an ORM if I had to start over? No.
Would I use a db + jOOQ if I had to start over? In many cases, yes.
To me they are not so much an anti-pattern in themselves, but they usually come with many anti-patterns, like coupling validation to entities to caching to ...
jOOQ was one of the first times in my career where I (a junior, at the time) went to bat and convinced my company to shell out for a commercially licensed library. Our application was basically nothing but complex queries; getting our ORM, Hibernate 3 back then, to do anything sane & performantly was a nightmare.
Those licenses were money well spent for developer productivity alone, but honestly they've probably paid for themselves multiple times over just in terms of "energy usage at runtime" cost.
It's not always ORMs, but it's a pattern I see over-and-over again, somebody that doesn't know SQL tries to abstract away all the uncomfortable (to them) parts of SQL. It always ends up as an unmaintainable and non-performant disaster. _It's relations all the way down._ If you're going to fight SQL at every turn, you're better off using something other than SQL to store your data. (I still die a little inside every time I open up a SPROC that's just a giant cursor. Won't somebody _please_ think of the sets?)
To me, the article doesn't make very compelling observations or arguments.
There are some interesting things to note about object-relational mapping:
* They map data to objects (so they are really tied to OO languages)
* They generally trade control for convenience
* Over the maturity of projects that use them, the convenience gets eroded, due to the emergence of system issues that require control to be wielded.
Honestly, if you are debating the use of an ORM, I would ask why. What is it that you are trying to get out of introducing an opaque abstraction between you and your data? It better be worth it, now and in the future, because it will probably take a good amount of effort to move away from it later.
The trade of control for convenience is usually a good trade at first (during prototyping and early stages of feature development). Later, there is usually an inflection point where it becomes more of a hindrance than a help. The trouble is, at that point, migration away can be too risky/costly to justify.
Finally, I have noticed that, generally, mature, well-functioning code bases tend to move away from traditional OO-patterns. Specifically, code and data tend to separate as code bases grow and age because it is easier to reason about. This tends to diminish the value of ORM further.
Overall, I've moved away from ORM on the whole, in favor of query -> struct mapping.
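In case "query -> struct mapping" sounds abstract: it can be as little as declaring the row shape next to the query. A sketch with node-postgres in TypeScript; the table and aliases are invented for illustration:

    import { Pool } from "pg";

    interface UserRow {
      userId: number;
      name: string;
    }

    const pool = new Pool();

    // the query *is* the mapping: column aliases line up with the struct
    const { rows } = await pool.query<UserRow>(
      'SELECT id AS "userId", name FROM users'
    );
    // rows: UserRow[] - plain data, no identity map, no hidden state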
My prior/current experiences with most ORMs are that developers feel like they need to know less about how a relational database functions than they should because they can focus on the abstraction. This leads to schema designs that aren't ideal or not understanding where it is efficient to perform an action (should this be filtered in the database or application, when does a query need to be optimized, where is caching happening, indexes?).
Great for initial productivity, but without monitoring, can run into some serious issues down the line. I've seen lambda expressions hitting ORM interfaces where it wasn't immediately obvious how much data was being grabbed, cached, and pulled back. Then when the database was moved just a few milliseconds further away (network hops) from the application server, it caused significant issues because of the number of round trips happening magnified that small latency increase.
This is ORM specific, but not all of them are great at handling transaction blocks. In such cases dropping into a database stored procedure may be ideal, but then again you are no longer using the ORM and having to take your object and do mapping into a different structure (the stored procedure).
My personal preference is for just enough quality of life tools around the database access. In Java land, jdbi for quality of life plus flywaydb for migration (written in raw SQL) is my current preference. jdbi fixes many basic mapping/binding issues JDBC doesn't handle and flywaydb provides a good foundation for keeping environments in a consistent state.
It never was. Using ORM made the pool of candidates to work on a codebase bigger, and it made applications better on average (not because of some ultimate performance or security metric but because it evens out all the horrible hacks).
In a vacuum, on a super specialised team or in an academic setting it might be something one could write such an article about, but the average CRUD-app-programmer tends to not have the bandwidth or specialisation to spend effort on manual data wrangling. The same applies the other way as well: as soon as saving object-level details or fetching related data becomes something that requires thinking, plenty of junior and medior engineers end up forcing loads and saves by hand. Not because it's the best way or the only thing they could do, but because it works, it delivers the features and makes the company money, and everyone else also working on the code understands what is happening. In a way, 'bad' code that works and can be worked with is not all that 'bad'...
If you're using JS or Python, the SQL call just returns an array of objects/records. No muss. No fuss.
If you're using something like Sqlx in Rust, the SQL is not just checked for validity, it seamlessly assigns the result set to the exact type you define for that use case.
ORMs make basic CRUD fast and easy while making everything else massively more complex. If all you ever need is CRUD, I guess that's alright. Many of us work with projects that far exceed it though. And for that, you need to know SQL and not just some half-baked SQL punch through the ORM deigns to afford to you.
You can use cqrs to just build simple reporting views that orms get and set and for your real concurrency boundaries build aggregates that hydrate their state from all the events that have happened to it. Handles complexity to the nines and gets your reports and screens wicked fast at any scale you can imagine. You won't need to join much or optimize queries. Maybe figure out a few non obvious indexes.
My main issue with ORMs is that they tend to treat the persistence of data as an afterthought rather than as the really important thing.
The data is often much more significant than the app, in two ways: the value of the data outlives the lifetime of the app, and often the data has connections or uses beyond the app.
ORMs tend to result in an app-centric, rather than data-centric, view of data persistence.
Now, you don't have to use them like that - but even if you design the schema and mappings yourself, you still have the problem of the ORM assuming it solely owns access to the data - and if you turn off that assumption, performance is often terrible.
If I had an app with modest performance requirements, where the data and the app were one - then that's probably the ORM sweet spot.
But if the data is important, has multiple uses ( and so you might want multiple views in terms of objects ), and will likely outlast the application then I'd stay well clear.
SQL is already a powerful abstraction - learn it instead.
One thing I really like is using Entity Framework inside of LINQPad (C#) for ad-hoc sysadmin-style queries.
They are much, much easier to write; and because most sysadmin-style queries aren't performance bound, it hits a great sweet spot where the ergonomics of an ORM are a win.
I'd really like a universal ORM, though. I'd like to be able to write queries in a SQL-like language and have my classes automatically generated for me, no matter what programming language I'm using. I'd like to write joins in a SQL-like language and have the data structures automatically populated, no matter what programming language I'm using.
I think this exposes the real flaw in ORMs and SQL: ORMs are too tied to the programming language, and SQL is too tied to its own domain. For example, if I could write a set of joins that returned structured JSON, and a JSON schema, perhaps some of the ORMs could be replaced by a simple JSON parser?
> I'd like to be able to write queries in a SQL-like language
You can do this using Entity Framework Core.
    var query = from obj in context.Objects
                join objChild in context.ObjectChildren
                    on obj.Id equals objChild.ObjId
                where obj.Name == "SomeName"
                select new { obj, objChild };

    var res = await query.ToListAsync();
This will give you a list of anonymous objects. But you can also just have the query create non-anonymous objects.
I really think this depends on the situation.
If you have an object with foreign key relations to a lot of other objects, then using lambda syntax is most likely preferred:

    var res = context.Objects.Include(x => x.ChildObjects);

But if you have to do joins, then query syntax is definitely much clearer. For example, the join above in lambda syntax looks like this:
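Something along these lines (a sketch; the Id/ObjId property names are carried over from the query-syntax example above):

    var res = await context.Objects
        .Join(context.ObjectChildren,
              obj => obj.Id,
              objChild => objChild.ObjId,
              (obj, objChild) => new { obj, objChild })
        .Where(x => x.obj.Name == "SomeName")
        .ToListAsync();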
Comparing a JavaScript ORM to a C# ORM like Entity Framework feels like comparing C macros to Lisp macros.
You can't really have a reasonable conversation about ORMs without defining the feature set your ORM has. If an ORM doesn't apply type safety to SQL queries and translate typed code expressions to equivalent SQL expressions, it's not performing two of the key functions I associate with ORMs. If I didn't take these features for granted, I would probably question how useful ORMs are too!
I think the reason it's so hard to have a conversation about ORMs in general is that we're often lumping radically different systems together and then making a blanket judgement. Again, it's like working with C macros, getting annoyed with them, and then judging all macro systems to be bad.
I feel like if your query ends up being quite large in complexity then maybe it is a time to create a view or stored procedure depending on the use-case and have ORM call that one instead.
To me ORM is quite useful to have some data classes that I can use with it to automatically handle various generic operations and maybe some relationships. Many ORM frameworks will also auto-create tables with the relationship constraints, provided you setup your data classes correctly.
The moment you start chaining too many ORM operations together is the moment I want to abstract them away into a view/procedure. It will be easier to debug that view/procedure than to inspect what your ORM has generated and figure out how to fix it.
I am interested to hear if someone has other insight into how to utilize ORM frameworks properly.
Unless you're on Java. Just need three columns out of twenty? That's too bad. The JPA spec mandates you pull all the columns in that table.
But fundamentally on any programming language, all popular ORMs treat the database like it's MySQL 5.7—the bare minimum in modern functionality. It's like owning an Audi but being told you have to drive it like a 1971 Chevy Vega—a lowest common denominator.
I think that if you want the most out of a RDBMS, then ORM is never the way to go. The problem being that objects and database rows are not the same thing. Not even a little bit. If you squeeze that square peg through that round hole, you shave off 80% of the point in doing so. Object Relational Mapping is basically a contradiction.
If you are only using the subset where the cost of using an ORM is negligible, you're probably not really using the RDBMS to its full extent and maybe you should consider a document oriented database instead. Relational databases are stupidly powerful if you let them do their thing. Kind of a pity to reduce them to a k/v store.
That said, there's totally a segment of CRUD:y non-performance-critical applications where ORM is a non-terrible choice.
They have operations that objects do not, such as joins and projections. It's a completely different paradigm for dealing with data.
You can take an "object", pick parts of it, take a few parts of another related "object", perform some manipulations on those selected "fields", and view that grab bag as though it were a whole new "object". There is really nothing like that in OOP.
In short, you are very very unlikely to ever change your database.
DBs are full of useful features that an ORM hides.
Instead, I now pick a DB that has the features I want and then use the hell out of the features.
I remember a pre-ORM time of so many manually created db-side configs, stored procs, views, triggers, etc. that a DBA would have to manage. I think these unmanageable messes were one of the catalysts for getting people on board with ORMs.
To avoid going back that far, I make sure all config is part of code: managed by migrations, or otherwise checked into a repo and managed by a CI/CD pipeline. Now I get to use a full-featured DB again, and actually use all its features, and things don't turn into unmanageable messes.
Whether the code is handwritten or not does not really matter when you are looking at the generated execution plans and so on for troubleshooting.
I worked with several systems not using ORMs that still had the N+1 problem. And if a schema generator is also used, it won't forget to add indexes.
My main issue with ORMs is that schema changes become more and more scary over time. Another is that many libs come with a bit of magic: a seemingly innocent change can move things from SQL to the client or the other way around, ending up with large performance impacts.
EDIT: And probably 100% of all the security related issues I have fixed in SQL would have been avoided with an ORM framework.
I can’t speak for all ORMs but I’ve found over the last couple years that I really like Mongoose. It’s proven to consistently be one of the least troublesome parts of our stack, and once you get to know the ins and outs, it provides a ton of useful things that are organized in a very logical way. Custom query helpers in particular were a revelation for me. Being able to reliably compose different pieces of a query to standardize how you access things across different parts of an application is really cool and incredibly useful.
I didn’t choose to use Mongo, but having Mongoose has made it mostly pleasant, at least after taking the time to really understand the library.
You don't have to listen to what other people "consider harmful" or think are "anti-patterns" or "clean code". You don't Gotta Catch 'em All the design patterns or figure out if coffeescript is the flavour of the week or plug in event-sourcing because it was on HN yesterday. You can just solve the problem you have. Nobody will know. I don't know why programming is so vulnerable to these memes, false consensus, and blog-fad-driven architecture.
I have some styles I like and I use them. Some of them aren't compatible with each other so I don't use them all at the same time. I don't wait for John Carmack or Bob Martin to tell me if it's okay. You shouldn't either.
On the front-end you have a new javascript framework every 15 minutes, on the backend you have this. But I'm with you, I'll use what works for me until it no longer works for me
Unfortunately, this sentiment can rarely be expressed before someone has come full circle. An apprentice cannot have that insight. A journeyman is too busy DRYing things up and peppering their code with patterns. It's only once you've felt the pains inflicted upon you by all those "one true ways", when you've survived the betrayals, that you can finally accept that there are, in fact, no solutions, only trade-offs. Congratulations, you've become a veteran.
Despite all the good souls trying to turn programming into a discipline with sharp rules and well defined constraints, it remains a craft. Part prickly, part gooey. As far as I can tell, if it has tenets they sum up to four, in that order: code has to be correct, bug-free, reasonably efficient (i.e. enough for the problem at hand), and maintainable. The last two bits lie in overlapping shades of grey. You either learn to be comfortable with it or you will yourself to make black and white observations, which you turn into rules and patterns (or anti-patterns, as the case may be). Then you write an article or a book about them.
Very well said. It wasn’t always this way, but it has been long enough now that I don’t see it changing while I’m still around. I’d do anything else if I could earn even half of what I make.
So, in summary, ORMs are only bad when you use the ORM parts and not when you mostly ignore the ORM parts and only primarily use the query generator and basic abstraction parts.
Well yes, that's because whenever things get hairy, you're no longer dealing with an ORM, you're dealing with the query generator. You don't have to translate your complex graph queries into their relational variants by manipulating the output of an ORM.
And I have to agree that query generators have a hell of a lot more value and are way less likely to cause trouble than ORMs.
So yes, ORMs are no longer an anti-pattern, because most modern ORMs aren't really ORMs anymore, and when they are, they either swallow the performance hit and the resulting issues when trying to compensate, or they don't encourage you to use the ORM parts except in very specific cases, and instead steer you toward the far more powerful query generator parts and all the random mild abstraction parts.
The question left is, is it better to just learn SQL including the specific variant of SQL you're targeting, or is it better to instead learn a specific query generator and its database specific parts?
Addendum:
Really, at this point I don't think it matters what people use, the number of people who actually know how to design a database is insignificant compared to the people who are tasked with designing them. I am now all but convinced that the reason ORMs are popular is because they make it slightly quicker and easier to deal with a fucked up database design than having to do the same with pure SQL or a query generator.
The thing I always disliked about ActiveRecord ORMs is that you need to learn "their way" of writing SQL too much, because everything beyond the "Getting Started" tutorial is too complicated for the built-in query-building features.
The thing I like about ActiveRecord ORMs is the way you can iterate over a data set. The thing I dislike about how this works in practise a lot of the time is that it takes up quite a bit of memory because the relationships are built and held as objects.
Another thing I like about ActiveRecord ORMs is that I can have classes that have functionality, but get their attributes from the database. The thing I dislike about ActiveRecord ORMs is having to create a class for each of my database entities even if I don't need them (and/or having classes automatically created for them, still just as bad if you ask me).
I created PluSQL, a non-ActiveRecord ORM because I prefer to write SQL most of the time, but want to minimise the amount of boilerplate, and also because I like iterating using objects but don't like holding the entire object tree in memory and only want to create classes to represent a database entity when I need one:
The thing I always disliked about ActiveRecord ORMs is
that you need to learn "their way" of writing SQL too much
Yeah. I see people straining to write complex queries in their ORM and I think it's just insane. At some point you're going to have to understand and debug the generated SQL anyway.
I think Rails' ActiveRecord implementation is pretty ideal because one of the explicit goals was to get out of the developer's way and allow them to write raw SQL when needed. I don't know how well other ActiveRecord implementations fare with this.
I created PluSQL, a non-ActiveRecord ORM because I prefer to write SQL
Cool! I'm not a PHP dev but it looks neat.
Even though I just defended ActiveRecord, I kind of miss the days of thinner data mappers / ORMs.
It’s not an abstraction, it’s an indirection, and that is why so many people get frustrated with using them.
Stick to the conventions and patterns and you’re happy: you don’t have to bother yourself with making thousands of decisions about schemas, relations, indexes, constraints, naming conventions for all of these things, patterns and so on.
They wouldn’t be so frustrating if they were a proper abstraction: if they had a denotational meaning and simple, elegant theorems could be established about them. But they don't have that: you get a layer of indirection between a parser and some structures in your code. You get an evaluator that builds strings from some definitions and expressions. Nothing about them is formalized, and all of it is convention and hand-waving. So the way each library does it is different, and how it works is left as an exercise to the reader.
So is it an anti-pattern? I think I agree with the article: it depends. If your data follows common, repeatable patterns then it might be best to take advantage of an ORM to generate and manage a bunch of code. They’re not bad-by-default.
But I’ve been around since the 90s and people have been writing ORMs, swearing off ORMs, and writing new ORMs in waves. It’s never going to end I suspect until someone formalizes it and builds a proper abstraction by giving a denotational meaning and proving theorems (which some folks have been working on by using category theory as the underlying formalism).
Until then I tend to avoid them myself as I find that many applications I work on have access patterns and structures that don’t fit the mould.
The abstract concept of ORM is fine, unfortunately the particular ORMs we use suffer from pretending all objects and their relationships exist in RAM, there's a single instance of every entity and it's always up to date. An abstraction that isn't merely wrong, but falls apart in your face at every opportunity.
This problem is not unique to mapping objects to SQL databases, but mapping objects to anything remote at all, say a GraphQL or a REST API.
OOP as we presently interpret it is an inherently synchronous, reference (or handle, rather) based paradigm, that only works when all your state is local and in RAM.
This is why distributed object protocols keep failing. They'll keep failing until OOP programming reorients to use values for messages and makes references explicit, so their impact is seen and felt (and it's especially seen and felt when you reference an entity on another machine halfway around the world, in terms of lag, fragility, eventual consistency and everything).
We see hints of this with value types in Swift and .NET, unfortunately I'd say rather rudimentary so far. But it's coming. The ideal example of such a set up is Erlang. A language that Joe Armstrong has called "probably the only OOP language in the world". A statement Alan Kay agrees with (source: https://www.quora.com/What-does-Alan-Kay-think-about-Joe-Arm... ).
Maybe I'm mistaken, but I think someone who is already using a relational database through an ORM can learn enough to make powerful queries in an afternoon.
Let's see what we need:
- SELECT a,b,c FROM table WHERE condition : 1 hour
- INNER JOIN, LEFT JOIN : 1 hour
- GROUP BY, HAVING, aggregation functions: 1 hour
- subqueries, EXISTS : 1 hour
- recursive queries : 1.5 hours
- end of the afternoon
You'll probably need to practice a little after that intense afternoon, but I do think that after learning just the five points above, you can become the most knowledgeable person about SQL in your team.
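If "recursive queries" sounds like the scary item on that list, it mostly isn't. A sketch of walking a tree of arbitrary depth in one round trip, using node-postgres and a hypothetical categories(id, parent_id, name) table:

    import { Client } from "pg";

    const client = new Client();
    await client.connect();

    const { rows } = await client.query(`
      WITH RECURSIVE tree AS (
        SELECT id, parent_id, name, 1 AS depth
          FROM categories
         WHERE parent_id IS NULL            -- anchor: the roots
        UNION ALL
        SELECT c.id, c.parent_id, c.name, t.depth + 1
          FROM categories c
          JOIN tree t ON c.parent_id = t.id -- recurse into the children
      )
      SELECT * FROM tree ORDER BY depth
    `);

    await client.end();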
You originally said "everything you can do with it" rather than just the basics though. Those goal posts moved pretty far.
I agree with learning most of your examples in an afternoon, but really knowing SQL takes a hell of a lot longer than that. In fact even the most SQL shy ORM preferring devs I've worked with still know those basics.
Where do you work? That must be a great place! Most devs I know, even talented ones, have never done a recursive query, and a lot of them don't even know it's possible.
There is a common problem: some fundamental technical problem is complex and has a steep learning curve. In comes the ORM or library or plugin or ... These things are all layers on top of the fundamental technology that simplify the underlying thing, or that provide a layer hiding the complexity of multiple implementations of a technology. So you adopt it. And you get running. Then you run into a problem with the layer. It does not implement a fundamental part. Or it has a bug, or the implementation you need is not covered.
At this point you try to fix the ORM, or adapt to it, or work around it somehow. Because you still don't know the underlying technology.
I'm not saying this is a mistake. Sometimes the ORM works and sometimes it does not. It depends on how fundamental this technology is to your product and a host of other issues. BUT! It is always worthwhile to learn and understand the fundamental technology. SQL is not that hard. It is worth understanding. Even if you want to and can use an ORM, not knowing SQL will hurt you. And knowing an ORM is just an extra thing to learn.
I actually like SQL. I find it hard, a different way to think, but fun. I use React a lot. It helps except when it doesn't - and then being able to work in html, css and POJ saves the day. Even though I never code in assembly language (ever!), knowing how registers, CPUs and instructions work makes me a better coder.
So when faced with the choice of a simpler way to do something that provides a layer over a fundamental technology, I am now extremely skeptical. Sometimes it works out, but most often it is not a good choice for me.
I can see that people have already heaped plenty of fuel on the fire, but..
The amount of time I have wasted babysitting Microsoft's EntityFramework stuff :-/. That is, stackoverflow-googling 'why does EF suddenly behave this way, and how do I get it to stop?'.
In my most recent adventure, it had suddenly started to fight with some of our foreign-key delete-cascades. That is, even though EF had itself generated migrations that told SQL-Server to use delete-cascade, this would catch EF off guard, and it would suddenly scream 'there used to be rows in this table but now they are gone!'
Stackoverflowing suggested to bypass it by specifying an alternate delete-cascade policy (presumably one where db was allowed to cascade and client would pretend everything was OK.)
That .. sort of worked, but now our unit-tests were failing, because they didn't like that new cascade-rule..
So now we have conditional compiles, where our unit tests compile with the rule set one way, and production code sets it the other way.. I have no idea what our tests now claim about our code, relative to production..
Another favorite of mine: EF will sometimes generate constraints on your tables or columns that can't really work. But the generated migrations come with drop-reversers that don't work (i.e. if you try to execute them, you get SQL errors). BUT, EF also generates random names for these constraints, so you can't just drop them by hand - you don't know what they are called, and they will have different random names on different db instances! So you (that is, I) end up having to write convoluted stored procedures that ask the db 'what kinds of constraints do you have that satisfy these requirements, and what are they called?', and THEN you can remove them in an order that makes sense (instead of the autogenerated nonsense that won't execute..).

Oh, and EF's code for handling its own migration table used to be buggy; they used to have a column they later got rid of, and some of their code would fail on this column either being or not being there.

My final gripe (for today) is that their migration table doesn't include a timestamp for when each migration was executed. This has caused me headaches, because sometimes bugs in client data would depend on whether a migration had happened early or late, since that determines against which version of your db model the client data was processed. sigh.
1. inability to represent arbitrary query results. Results get mapped back into 'domain-objects'. Now you've removed yourself from the extreme expressiveness of the SELECT statement as well as GROUP BY, pivot tables etc. etc.
The best you can hope for is domain objects that are only partially initialized with values.
2. lazy loading of attributes - certainly configurable, but if you rely on SELECT statements being generated while you are navigating a 'persistent' object, you will run into unexpected performance issues.
To solve those, you then start to eagerly load domain data for certain access patterns. More annotations, more configurations and more ... stuff - where a nice SELECT query would have sufficed to get the data you need.
3. Troubleshooting is a pain. Especially if you rely on dynamically building queries. Have fun getting the actual SQL that was run and then translate back and forth between SQL and the query builder syntax to optimize it - if you can even optimize it.
4. Doesn't treat data as data. First of all data coming from SQL queries is stale the moment it arrives in your objects. Trying to pretend otherwise and having an object graph that pretends to be the living counterpart of the state in your DB is foolishness.
5. No/limited control over writes/updates.
The moment you rely on your ORM to create UPDATEs and INSERTS to reflect the state of your object graph back into the DB, good luck with your chosen locking strategy and good luck optimizing those. Pessimistic locking is the best worst you can hope for.
Stay away from ORMs (as in Hibernate and its ilk) and treat data as data. Use structs/records.
> 1. inability to represent arbitrary query results....
You can use either raw SQL or HQL with Hibernate.
> 2. lazy loading of attributes - certainly configurable, but if you rely on SELECT statements being generated while you are navigating a 'persistent' object, you will run into unexpected performance issues....
Some ORMs can detect and actually solve N+1 query problems. And, as with SQL, you can just specify what you want to eager- or lazy-load. If you don't like that, you can always use SQL with your ORM.
> 3. Troubleshooting is a pain. Especially if you rely on dynamically building queries.
No it's not.. Hibernate can check your queries for syntactical problems, which helps testing. The query-builder is just that... a query builder. If you really need to optimize certain things and need some vendor-specific SQL comments to give query hints, you can fall back to using SQL, and STILL use all the benefits of the ORM.
> 4. Doesn't treat data as data. First of all data coming from SQL queries is stale the moment it arrives in your objects....
The whole point is that it maps your relational model to objects. But if you don't like that, you can still use the framework to retrieve dataframes / rows. It's not foolish to think about an object model.
> 5. No/limited control over writes/updates....
Again, you can use the ORM to update, and either skip or use model validations. You can run update queries, and if you really want, you can just use SQL.
Your whole post screams of someone who's comfortable with SQL (a good thing, I am too; most developers are not), but who has played with some ORM for like 2 hours, got frustrated with the functionality, and didn't take the time to actually read up on the docs.
The other type of person with this stance is a front-end developer on some huge site, which needs to squeeze all the performance out of the CPU, where the requests are mostly read-only, and don't really do anything else.. For example showing some posts on your Facebook homepage. The *only* downside of an ORM in that case is some cpu and allocation overhead of mapping your data to objects. In that case, use rust instead of php ;)
1) I'm querying JSONb for that from postgresql and then the conversion to Ruby types is trivial.
2) Parameterized queries and you don't have SQL injection
3) The migration tool is a script of a few lines (bash in my case). It has a "migrations" table; it reads a list of migration files in a migration dir, checks which ones don't exist in the migrations table, and runs them in order. A migration file is just a .sql file
My whole setup is a few lines of code. A very small library covering the first two, a small script for the third one.
I get it, but I want to use SQL. It's a lingua franca and has a lot of power for writing migrations that I'd rather use than inventing my own.
That said, it's awesome to hear from someone using jsonb with migrations. Most places I've seen using jsonb push responsibility for handling different schemas onto the application, which <shudder>.
I think I expressed myself in a confusing manner.
I write _SQL_ for migrations and nothing else. What I meant was that in order to _run_ the migrations it's enough to have a simple bash script of a few lines (which basically calls psql, ls and feeds the sql migration files in order).
I don't use jsonb with migrations. My migration files look like
migrations/1645698230_create_orders.sql
create table orders (.... );
migrations/164595623_setup_some_trigger.sql
create trigger...
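The same runner, sketched in TypeScript rather than bash just to show how little is involved (assuming node-postgres and the migrations-table convention described above):

    import { readdirSync, readFileSync } from "node:fs";
    import { Client } from "pg";

    const client = new Client();
    await client.connect();

    // the bookkeeping table: one row per applied migration file
    await client.query(
      "CREATE TABLE IF NOT EXISTS migrations (name text PRIMARY KEY)"
    );
    const done = new Set(
      (await client.query("SELECT name FROM migrations")).rows.map(r => r.name)
    );

    // timestamp-prefixed filenames sort into application order
    for (const file of readdirSync("migrations").sort()) {
      if (done.has(file)) continue;
      await client.query(readFileSync(`migrations/${file}`, "utf8"));
      await client.query("INSERT INTO migrations (name) VALUES ($1)", [file]);
    }

    await client.end();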
--------
In response to "Elimination of boilerplate to load data from a table into a type"
Unrelatedly to migrations, when working with postgres, I use postgres' json capabilities to get nice nested structures that I can convert to any other language easily (or even send to the browser if serving a webpage). Instead of tabular responses, I get json structures _directly_ from postgresql.
Very often I just write SQL queries that return a single row with a single json/jsonb field that has the whole JSON response to be served to a web endpoint. Basically I write most if not all my webservice in SQL. And I just have a very thin layer of a server language to work as a proxy between a frontend application and my database application (written in SQL).
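A sketch of the shape of that, with invented users/orders tables; the server-language layer really is just passing rows[0].body through:

    import { Client } from "pg";

    const client = new Client();
    await client.connect();

    // one row, one json column: postgres builds the whole response body
    const { rows } = await client.query(
      `SELECT json_build_object(
                'id',     u.id,
                'name',   u.name,
                'orders', coalesce(
                  (SELECT json_agg(json_build_object('id', o.id, 'total', o.total))
                     FROM orders o
                    WHERE o.user_id = u.id),
                  '[]'::json)
              ) AS body
         FROM users u
        WHERE u.id = $1`,
      [42]
    );
    // rows[0].body is the nested structure served to the endpoint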
ORMs are like any category of tool. Some are good. Some are bad. Either way, you need to master one (in addition to SQL) to get the most out of it.
These days, my approach is to start with some repository interface/abstraction that exposes well-defined access patterns/operations and models. Then I hand-code the implementation with just normal SQL. You can get really far with this. Eventually, if it starts feeling unwieldy, I'll look at reimplementing it with an ORM. This rarely happens; in fact, it's only really happened twice that I can think of.
That said, if I joined a team that was set on using a particular ORM and they had experience with it, I wouldn't think twice. The only thing that is a deal breaker for me is using any ORM feature like Hibernate's auto flush insanity (something that figures out which objects you've dirtied and automatically generates updates and inserts when you close the session). It will be very difficult to ever untangle that if you need to. Far better to explicitly have to persist things than rely on some auto-magic like that.
I don’t really understand the N+1 issue. Most ORMs I’ve used make it very easy to generate the exact query you’d write without one, with the added benefit that you define the relationships at the model level. This helps a lot when you need to make changes like renaming a field, without having to worry about propagating those changes through every query and migration in your app. When you build a statement with an ORM, you can also output it and tweak it to be what you would write as raw SQL. You can look at generated migrations and tweak your models until the migrations are efficient, etc. If you’re generating N+1 queries, I don’t know that you would do better with raw SQL. Besides, most ORMs today let you switch between three levels: default ORM, query builder, and raw SQL. You also get to choose whether it converts the results into your models or just returns raw results. Maybe I’m missing something, but so far it seems like the problem with ORMs lies in not understanding when and how to use them.
Someone should tell me how they think relations map to objects.
At runtime, you submit a query like "SELECT foo, bar FROM baz". At compile time, you make an object that can hold foo and bar. This, unfortunately, is backwards. Runtime happens after compile time, but runtime is when you know what objects you would have needed to compile to understand the result.
I have a feeling that ORMs have most of their popularity in dynamic languages where there is no type system. You get whatever the database thinks you should get today, and defensively code around that (or crash hard when a query starts returning an integer instead of a string).
I feel like the maximum amount of syntax sugar you can put around a database is something like sqlx's StructScan. If you have a query that returns "foo int32, bar string", then sure, an API like `var result struct{ Foo int32; Bar string }; row.StructScan(&result)` seems fair to me. Anything more than that, I think you're just creating problems for yourself.
P.S. I have no idea what the link has to do with ORMs.
I had an irrational hatred of ORMs for a long time until I used them for a while.
They are great for CRUD applications, they simplify a lot of your application boilerplate.
Yes, you still should know and learn SQL, and know how your database works, and yes for any kind of complex query you will need to reach for raw queries, but that's fine.
We are definitely going to look back on this era and wonder why we stuck to archaic tech for so long.
It's clear that the database needs to be more integrated in our applications.
SQL is not a good fit for most apps. The reporting/analytics features are great, but that's not what most of our apps are doing these days. This kind of stuff is moved off into other databases most of the time anyway. The core idea of the "relational model" is not a good fit for most apps. Interactive user interfaces are about listening to changes in single models...we don't really care about "relations". Relations are forced upon us to allow automatically optimizing query execution in the database. But this optimization doesn't take into account what other queries we run, so we have to handle caching ourselves, and completely breaks reactivity. People fight against the query planner all the time.
I think the solution is to have control of the query plan.
Without a database, people write code that is similar to what a database is doing, often without realizing it.
Take: `users.map(user => ({user, posts: postsByUserId[user.id]}))`. We have created an index, and we're doing an apparent hash join.
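Spelled out, with hand-rolled types to make the two halves visible (a sketch):

    interface User { id: number }
    interface Post { id: number; userId: number }
    declare const users: User[];
    declare const posts: Post[];

    // the "index": posts keyed by user id, built in one pass
    const postsByUserId: Record<number, Post[]> = {};
    for (const post of posts) {
      (postsByUserId[post.userId] ??= []).push(post);
    }

    // the "hash join": probe the index once per user
    const joined = users.map(user => ({
      user,
      posts: postsByUserId[user.id] ?? [],
    }));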
Now this usually turns into a tangled mess because of all these adhoc hashmaps we create and pass around. You will see this kind of spaghetti all through modern web apps. The more of these adhoc transformations you have, the more complexity you have because we don't have good tools to visually trace these transformations throughout your app. Your app becomes hard to change. But it also gives us a lot more flexibility when it comes to caching query results.
What would be nice is if our sql dbs were broken into layers, and we had access to the query planner so we can write reactive and cacheable queries. And also that our sql dbs could be run in the same memory space as our app code...which means running in a web browser.
If you want to persist and query data structures, there are LOTS of non-relational databases that do that (e.g. Mongo).
The "relational model being not a good fit for most apps" is a 40 year old argument, and was wrong then and wrong now, it's like saying "mathematics is not a good fit for X". In practice, fighting the query planner abstraction via query hints can be needed, but that's a pretty minor pain IMO. Inevitably for any long lived app, future requirements lead to unpredictable patterns of access or update occur and this is where the relational model has always shined.
Frankly, I am cynical. ORMs produce bad queries, but the SQL queries I've seen in some code bases are also nothing to write home about.
If the choice is between terrible spaghetti-string-concatenation-manual-sql-from-hell and sometimes-inefficient-but-fixable-when-needed ORM code, I prefer that people stay with their ORMS.
In the world of Java enterprise applications it is not an anti-pattern but a best practice. It is not just encouraged but often mandated by Enterprise Architects. In my experience of 18 years, at least, it attracts developers who are good at neither SQL nor Java, and results in stunningly crappy applications.
It's common to isolate all database calls to a specific part of the code. Usually each potential database operation has its own function, such as "update_user", or "get_spending_report", etc. This is good.
However, I think it's an anti-pattern to use both an ORM and still isolate your database code. It's like a double abstraction where only one is needed. You have to write a bunch of CRUD functions, and you have to use the ORM inside those functions. ORMs are complex but can be convenient to use; if you're isolating all the database calls, though, the convenience of the ORM is wasted inside those half-baked CRUD functions. You end up getting the worst of both worlds.
So I say either (1) use raw SQL isolated inside CRUD functions, or (2) use an ORM and use the ORM abstraction freely throughout your code.
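A sketch of option (1), using node-postgres; the function names echo the examples above, and the tables/columns are invented:

    import { Pool } from "pg";

    const pool = new Pool();

    // raw SQL, parameterized, isolated behind one well-named function
    export async function updateUser(id: number, name: string): Promise<void> {
      await pool.query("UPDATE users SET name = $1 WHERE id = $2", [name, id]);
    }

    export async function getSpendingReport(userId: number) {
      const { rows } = await pool.query(
        `SELECT date_trunc('month', created_at) AS month, sum(amount) AS total
           FROM purchases
          WHERE user_id = $1
          GROUP BY 1
          ORDER BY 1`,
        [userId]
      );
      return rows;
    }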
Seems like most people are commenting in the context of web apps but I find ORMs are most useful on the client side (mobile/desktop apps) where objects can be long lived and you want to take advantage of ORM specific patterns like identity maps (https://martinfowler.com/eaaCatalog/identityMap.html) so you don't have multiple conflicting versions of an object in memory.
If you're writing a web app and just pulling data from a database and dumping it to HTML or JSON then yeah it matters way less whether those database rows get materialized into objects before getting rendered to output, because you're throwing the objects out at the end of the request anyway.
Martin Fowler's Patterns of Enterprise Application Architecture has a section called "Data Source Architectural Patterns" with some data access patterns. It doesn't explicitly call out "ORMs" but they tend to combine multiple patterns. It also has a section on "Query Objects" which cover some of the method -> SQL magic ORMs do. It also lists pros and cons of these patterns
As with many things, ORMs come with pros and cons. Imo they can be especially productive for simple CRUD applications and prototyping but may quickly increase complexity with object or data models that use more complex query patterns.
As a fun exercise, I suggest trying to build a mapper that gets a URL with the `type1/id1/type2/id2/type3` pattern and spits out a set of objects of type3 that have a parent of type2 and id2, under a type1 with id1. Now do that with arbitrary types and depth.
With a good ORM, this is so trivial to write that, most of the time, you don’t even need to: you can rely on decorations or any similar mechanism. I have codebases that have barely any code devoted to implementing that, and that will happily build the query on the fly for any type in the model it’s allowed to use.
Some problems fit ORMs really well. Some don’t, of course, and it’s fine - knowing when to use a tool and when not to is key with any tool.
I believe the underlying reason one uses an ORM is the implicit mapping from tables to models/types and the nearly hidden exposure to the database.
There is, I believe, a positive gain with the use of a slim ORM which at the same time exposes an API for querying with real SQL.
Dapper/Linq/Fluentmigrator is my favorite stack for interacting with a database from system code. It's slim, fast and very near the database.
Whatever the arguments are around SQL and hiding it, I believe that hiding it is the real anti-pattern. If you must interact with a database in your system, make the use of SQL as absolutely easy as possible. It's a great and immensely powerful language which no ORM can substitute for in a nice manner.
SQLAlchemy is one of my favorite libraries of all time. I actively choose Python for projects because I get to use it. If you’ve never used this ORM you don’t know what you’re missing. And I say this as someone who for years actively campaigned against the pattern.
ORMs are fine so long as they are used exactly what they are intended for, which is to make your job easier. As soon as the ORM [inevitably] gets in the way, just bypass it and write straight SQL.
What drives me nuts is when I work with engineers who insist that we do all database queries through the ORM, no exceptions, just in case (TM) we someday switch from MySQL to PostgreSQL or whatever. That day has never come anywhere I've worked. It's generally a fantasy.
If you're going to invest in a tool, invest in the database, not the ORM. You're more likely to save time writing complex queries in SQL than in the highly unlikely circumstance you need to suddenly switch database technologies.
Yup. And I love when there’s a well-built escape hatch for this in whatever ORM you use.
In Django you can just write your own SQL for a query and then say, “trust me, this query will populate this structure.” So the moment you measure performance and it’s a problem, you can just opt to DIY without having to rewrite much of anything.
And of course if you need to go further, go nuts. There’s nothing locking you in.
Overall, I think the article is pretty good. ORMs fall under the same category as garbage collection or try/catch exceptions: it is a tool. Tools solve a lot of common cases, but that doesn't relieve the developer from understanding what is actually happening under the hood.
* Garbage collection solves pointer problems, but the developer has to know what is being allocated and when to prevent churning memory.
* Try/catch exceptions decouple where exceptions can occur from where they can be handled, but the developer still has to determine whether all of the exceptions are actually handled.
* An ORM solves relating objects to data persistence, but the developer still has to know which relations weren't loaded by the original query once the objects are used.
> Active Record struggles with this, and that’s why we refactored our billing
> subscription query. Whenever we got results we didn’t expect, we’d have to
> inspect the rendered SQL query, rerun it, and then translate the SQL error
> into Active Record changes. This back-and-forth process undermined the
> original purpose of using Active Record, which was to avoid interfacing
> directly with the SQL database.
This is where I disagree. ActiveRecord was meant to map objects you have to your relational database, hence Object Relational Mapper. It is not a reporting interface. But it is reasonable to get around this. One way is to get a lot more specific in building the query with `arel`. Alternatively, open the raw connection and execute SQL with `ActiveRecord::Base.connection.execute`.
The disconnect here is believing that ActiveRecord solves billing/reporting SQL statements. It does not. It solves mapping objects. If your access pattern doesn't match (reports, N+1, GraphQL) then you have to understand the tool well enough to create your own mapping solution (custom SQL, preloading/scoping, data loader).
I remember when Java was considered really, really slow. The guy used a code example to prove a point. I rewrote the code to stop relying on the GC allocating/freeing data structures over and over and sped the program up by 400% on that alone.
I don’t mind query builders. They help you fall into the pit of success and reuse useful queries. The thing I don’t like about ORMs is how you get back a graph of objects, and you can mutate any of them and call `.save()` on any of them. They pretend to be a live representation of what's in the database, but they are a proxy. And so when saving, the ORM has to sync the alterations you made to your objects with what it believes is in the database. It treats database tables as just a bag of attributes.
I prefer to work with values and be explicit about what’s been read and what needs updating. If you need optimistic concurrency control be explicit about that too.
In my view, ORMs replace some types of bugs (SQL injection, wrong table/column names) with different types of bugs (inefficient queries, hard-to-debug behavior).
One of the things I really do hate about ORMs is that they usually try to be DB-agnostic, thus preventing you from actually leveraging powers of the underlying database. And that they always end up being really complex.
On the other hand, I hate string manipulation (that you sometimes need to do if you do some types of queries) and in general that by dropping to SQL, you are switching to a different language, that kind of sucks (nobody sane actually enjoys SQL syntax, I think).
I use TypeORM for Node.js, and it's extremely frustrating and littered with bugs, but even then it still saves me time vs. using SQL for everything. Automatic table synchronizations are a huge time saver. Even if one doesn't use an ORM, I think one will eventually end up rolling their own, so why waste that time? Anyway, ORMs have query builders, so one can always write SQL queries when necessary. I just wish TypeORM's query builder were knex, because knex is far superior to TypeORM's bug-ridden query builder.
Anyone who eschews an ORM just ends up writing their own half-baked, badly documented, ad-hoc ORM. "Just use SQL" is not an alternative, since you still need to map the result into a data structure.
> But after some meditation on the matter, we’ve reasoned that ORMs do not suck. They are simply an abstraction with a quintessential pro and con—they abstract away some visibility and occasionally incur some performance hits. That’s about it.
This isn't an insight. Yes, everything is a tradeoff, but that doesn't mean all tradeoffs are equal.
The higher the ratio of abstracted-away complexity to the size of the interface, the better the abstraction. For ORMs, this ratio is low: they abstract away very little and present a huge interface.
ORM isn't the anti-pattern, but rather the current way we do OOP. Object orientation is a pervasive, powerful paradigm, but our current approach only works for in-process, highly interlinked, synchronous objects.
We need to move to OOP more similar to Erlang. Message based, where the messages are values, not references to references of references, which is the case with Java-style objects (you work with an object via a handle, which is in fact a reference to the object, allowing mutable shared state).
I purely use EF Core for managing migrations easily. Almost all my queries end up complicated enough, or need atomicity, that I write them as raw SQL; the automatic object mapping is just a nice bonus.
It's amazing how many SQL Injection vulnerabilities I see in brand new code. At least with an ORM, this is abstracted away unless you try extremely hard to create such a vulnerability.
Parameterized queries are a thing in every popular driver in every popular language (except Lua). It's been quite a while since I saw a tutorial using string concatenation.
Thank heaven those dark days of horrible PHP/MySQL SQL injection tutorials everywhere came to a close.
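For the record, the safe pattern is one line of difference from the unsafe one; a sqlite3 example (the same placeholder binding exists in every mainstream driver):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))

hostile = "x'; DROP TABLE users; --"  # classic injection payload
# The driver binds the value; it is never spliced into the SQL text.
rows = conn.execute("SELECT id FROM users WHERE email = ?", (hostile,)).fetchall()
print(rows)  # [] -- and the users table is still intact
```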
For the record, I've seen SQL injection attacks in ORMs too, though far more rarely than the "100 separate queries to render this web page" insanity that avoiding SQL knowledge inevitably brings.
Heh, I remember lots of "100 separate queries to render this web page" insanity with doing some CMS work ages ago (actually 200+ was sometimes common). Only thing was it didn't have an ORM either :)
Someone will probably correctly guess which one it was...
Part of the madness is domain modeling. One imagines their software’s domain model includes things like accounts and persons, and tries to build those as objects.
This is a lie. The domain has been modeled in the database already with constraints. The domain of the backend is not to manipulate persons or accounts but rather to do things requiring secrets, like call the database.
As such the actual meaningful structures are things like SQL clients, transactions, JSON decoders, situation specific POCOs and secrets.
What really needs to exist and doesn’t is languages with first class support for direct relational queries in the language. We have functional, procedural, and OO programming languages but no relational programming languages.
The fact that a database is this weird thing bolted onto your code with a bunch of glue is asinine. The data is the whole point. It should be in the center with code around it not off to the side. The database should be the runtime with the data as the center of the universe.
Sometimes, you need dynamic code gen and you don't want to do it manually.
…but mostly, it’s a bad idea. You can’t easily debug the generated code, you can’t easily tweak the behaviour, and writing the templates for it is hard.
Mostly you can replace it with a hand-written or pre-generated ("compiles to SQL") solution… but I’d like to point out that you can’t always, and if you do need dynamic SQL, please use an ORM; don’t template your own half-baked solution.
The anti-pattern is not the tools but how the tools are used. ORMs are abused a lot by people who don't necessarily understand databases very well. This abuse leads to overly complicated table structures, based on an illusion the ORM tries to keep up: that internal domain models map 1 to 1 to the database. This is basically self-inflicted pain. If you've never heard of the object-relational impedance mismatch, that's basically the problem with modern object-oriented application architectures and trying to shoehorn them into a database.
Complex table structures have all sorts of issues related to complex database migrations and operational overhead related to that, additional complexity in the form of lots of joins, a need for indices to support those, all sorts of cascading behavior when things are updated or deleted, transactionality issues, and so on. That complexity tends to leak back into applications.
If you know what you are doing, you can avoid all of that of course. But there are a lot of examples of people using ORMs where things get a bit messy.
I've fixed a fair few projects where hibernate usage got out of hand a bit. These are not fun projects to join usually. Performance issues, flaky behavior, wonky test failures (and slow tests), etc.
Usually a good way out is to cut down on the number of tables. My mantra is that if you aren't querying on it, maybe it can just be a json blob instead of a lot of silly tables and columns that need to be joined together. I like document stores. And you can easily adapt sql databases to build one that has nice transactional behavior.
A few years ago I consulted for a company to help them build a search engine. They had completely over-engineered their data model, spent months obsessing over their domain model, and there were 20 or so tables with all the usual issues related to that (hard to extend, issues with complex transactions, stupidly complicated joins, etc.). Basically, I sat their tech lead down and went over the requirements:
- get a thing by id
- do some CRUD on things
- put all the things in the search index (for each thing do X)
- use the search index to find things
That's it. The whole point of the system was to act as a single source of truth for the search index.
So, I told them: that sounds like you need 1 table with a json blob to represent your thing. We had a few other columns for things like timestamps, userIds, etc. That vastly simplified the application architecture. We could add features without requiring a lot of database migrations. Etc. All good things.
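For illustration, a minimal sketch of that one-table shape in Python on SQLite (assuming its bundled JSON1 functions; column names are made up):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE things (
    id         TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    doc        TEXT NOT NULL  -- the JSON blob; everything else lives in here
)""")
thing = {"name": "widget", "tags": ["red", "small"]}
conn.execute("INSERT INTO things VALUES (?, ?, datetime('now'), ?)",
             ("t1", "u1", json.dumps(thing)))

# Reach into the blob only where a query actually needs to:
row = conn.execute("SELECT json_extract(doc, '$.name') FROM things "
                   "WHERE id = ?", ("t1",)).fetchone()
print(row)  # ('widget',)
```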
Isn't the root issue that the relational model is superior for the job? Everyone has been using it since Codd wrote his famous paper in 1970. (I'm not that old btw^^)
And if you read only the first two pages, you see that ORMs (and also NoSQL) have some of the problems he describes in systems used prior (!!) to 1970.
I think 90% of users must not understand the relational model, because it's impossible that the other models are the right tool for the job so often.
ORMs will always be bad. They take control away and lock you into a system that fewer people use than the underlying tech it hides.
SQL is not hard, and in today's world most people should be using indexed documents vs. trying to do inner and outer joins to save some bytes. And in the cases where all the advanced SQL stuff matters, you probably aren't going to want to use an ORM anyway.
They’re still valuable in larger enterprises where applications are maintained for decades without rewriting. I’ve seen db vendors get switched more than once in contexts like this. Code gen tools are nice, saves a ton of boilerplate. Everyman type programmers still get a lot of mileage out of them. Majority of queries HQL handles fine and you can always drop into SQL to handle the ugly ones.
I'm always confused that ORMs don't standardize on a syntax where N+1 queries are literally illegal in all cases. With `forEach` and callbacks and everything, you could easily have a syntax where you can't await until a `run()` and where everything is curried into a single round trip. It lets you write "standard" JavaScript but disallows N+1 loops, always.
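A rough sketch of that idea in Python (not any real ORM's API): relation access only enqueues keys, and a single `run()` resolves the whole batch in one IN query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (post_id INTEGER, body TEXT);
INSERT INTO comments VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

class CommentLoader:
    """Collects post ids; no SQL runs until run() is called."""
    def __init__(self, conn):
        self.conn, self.pending = conn, []

    def for_post(self, post_id):
        self.pending.append(post_id)  # just record the key

    def run(self):
        marks = ",".join("?" * len(self.pending))
        rows = self.conn.execute(
            f"SELECT post_id, body FROM comments WHERE post_id IN ({marks})",
            self.pending).fetchall()
        out = {pid: [] for pid in self.pending}
        for pid, body in rows:
            out[pid].append(body)
        return out  # one round trip, however many posts were requested

loader = CommentLoader(conn)
loader.for_post(1)
loader.for_post(2)
print(loader.run())  # {1: ['a', 'b'], 2: ['c']}
```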
Ruby Sequel (http://sequel.jeremyevans.net/) is the only library where you can combine classic ORM model-based usage with a more raw query-builder "just get me all the data into plain objects" mode. You'll never need anything else again in your career.
Are you all discussing web development? I assumed that most web developers use ORMs, so I'm genuinely surprised by the number of people here who are against them.
I agree that abstraction can be leaky, and it's true that you often end up writing some SQL-like code.
However, isn't writing raw SQL potentially dangerous, considering aspects like security, access control etc?
I'm imagining a build tool that takes raw SQL statements, validates them, and produces TypeScript code from them. So e.g. `const data = query("select * from table_a where pk = x")` would be a promise of the appropriate type. I currently use Prisma, and it is kind of good but can definitely be leaky and is rough around the edges. I like writing raw SQL statements, but making them type-safe is annoying.
I only have experience with EF from C# + LINQ, and I was very, very happy using it. The query language is insanely good. Just having statically typed queries was a total godsend, plus ad-hoc projections which lead to optimized SQL. It literally has saved me days, maybe even weeks of work. Sure, sometimes you need to fall back on raw SQL, but that was less than 1% of cases for me.
I love the DynamoDB mapper model of the ORM. It only handles object serialization and deserialization but everything else you write code as if you are working with the native DynamoDB API. I want an RDBMS ORM that does only that - translate a ResultSet to a Java POJO and convert a Java POJO to an SQL insert/update statement. I would gladly handle everything myself.
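That "mapping only" layer is small enough to sketch; a hypothetical Python version with dataclasses standing in for POJOs:

```python
import sqlite3
from dataclasses import dataclass, fields

@dataclass
class User:
    id: int
    email: str

def rows_to(cls, cursor):
    """Map result columns onto dataclass fields by name; nothing more."""
    names = [d[0] for d in cursor.description]
    idx = [names.index(f.name) for f in fields(cls)]
    return [cls(*[row[i] for i in idx]) for row in cursor.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")

# You write the SQL; the mapper only does deserialization.
users = rows_to(User, conn.execute("SELECT id, email FROM users"))
print(users)  # [User(id=1, email='a@example.com')]
```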
I don't agree that ORM is an anti-pattern. I've been using Java for many years, and Hibernate for many years. It's awesome and an incredible productivity booster.
Though ORMs require extremely high qualification, that I would admit. They're definitely not a tool for inexperienced developers. SQL is much easier to use and brings fewer surprises.
when you write code to build a query in an ORM in an interpreted language like Ruby or Python, does it typically go out of its way to actually build the query once? or does the query get "re-built" (or cached?) every time you call the function that contains your query-building code?
I've been wondering about this and other things while working on my own mini-ORM thing for SQLite. I have it so queries are built once, at compile-time, type-checked, and stored in read-only memory in the executable. But I'm going down the route of having the user manually write SQL for everything that isn't super basic, instead of making a sort of mini-DSL for query-building. There's a lot of prior art in this space, and while I've used both Rails and Django before, forever ago, I've been kind of going into this all blindly, and so far it's been going great!
Why bother caching queries that are pretty quick to generate? Most caching around ORMs focuses on caching the results of the query since they require more resources to generate.
Well, if you're using an interpreted language without any sort of build step, then I suppose that's all you can do. But if your backend is written in a compiled language with compile-time execution functionality, it seems useful to do this stuff at compile time instead of at run-time, likely multiple times per request. That's a lot of unnecessarily repeated string-building, without any benefit.
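In an interpreted language the closest analogue is building the query text once at import time and binding only parameters per call; a trivial sketch (most ORMs instead re-run the builder per call, though several cache the generated SQL internally):

```python
# Assembled exactly once, when the module is imported.
_COLUMNS = ("id", "email", "created_at")
USER_BY_EMAIL = f"SELECT {', '.join(_COLUMNS)} FROM users WHERE email = ?"

def find_user(conn, email):
    # Per-call work is just parameter binding, no string building.
    return conn.execute(USER_BY_EMAIL, (email,)).fetchone()
```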
Love how the tech community keeps veering from “ORMs are awesome!” (rails era) and “blockchains are the future” (2016) to “they suck!”
No, ORMs don’t suck. We used them for over a decade in our projects. There is no downside because you need middleware that writes your SQL queries for you anyway, so the app layer can reason about sharding and other data, and to avoid SQL injection. So you may as well make it object-oriented. Just make sure your ORM can handle primary keys with more than one column.
What sucks is using relational databases when you could be using graph databases. Because graph databases don’t waste memory duplicating primary and foreign keys across tables, and don’t waste time doing O(log n) lookups in an index for each row, 99% of the time you can directly get the ENTIRE array of Foos that are related to a given Moo in essentially O(1). https://neo4j.com/blog/data-modeling-basics/
I would have used them but MySQL was ubiquitous so we went with RDBMS. Many people chose RDBMS because they were mature and very widespread. But graph databases have been catching up.
Better visibility into what's happening again seems like the biggest win. If we could have a UML diagram of the ORM's model, and then see the retrieval access patterns flowing across that model, the risks/downsides of ORMs would be much reduced. Making clear what's happening is, as usual, a win.
Custom SQL or optimizing bad SQL can sometimes be very difficult to do without completely breaking open the ORM.
The promises of ORMs are typically not kept.
I have always found that a thin wrapper around JDBC or some other SQL driver is sufficient to avoid the code explosion of reading from/writing to SQL databases.
EF Core is my favorite ORM. Some Java ones are alright, and JavaScript ORMs all suck because of the lack of RTTI. I try to keep my webapps as cruddy as possible and do complicated things in application logic with transactions for atomicity. I don't like relying on SQL.
Regarding the topic of ORM being an anti-pattern, I can also highly recommend this methodology by Hettie Dombrovskaya and Boris Novikov: https://github.com/hettie-d/NORM/
Been using ORM for a decade. It’s been amazing for my use case. Absolutely amazing. Tools are tools. Blanket statements about tools say a lot more about who’s saying it than about the tool.
Don’t be afraid to conclude a given popular tool is wrong for your needs, or an unpopular one is right.
But most implementations have issues of varying degrees of severity. And the important part is understanding these issues and deciding whether or not they are worth it compared to the benefit a _specific_ ORM might provide.
Yes. What the author is describing with the fancy TypeScript query builder is strictly worse than plain SQL. You still have to understand how SQL works, but on top of that, you also have to understand this weird DSL.
The obvious solution is to use neither ORMs nor raw SQL generated with various helpers in your language, but to open up the DB engine to execute low-level bytecode. Think: DB=GPU, SQL=GLSL, and what actually executes is compiled code.
All the absolutism I read in the comments speaks volumes about the problem not being in ORMs (which are a tool that is very useful given the right scenario) but existing between chair and keyboard.
I'd rather have an ORM than have to deal with some of the garbage code I've seen over the years (especially around many-to-many tables or relations in general) to handle the deserialization from the DB "manually, hand-crafted, artisanal, more maintainable", especially for very basic CRUD or almost-CRUD scenarios where a decent ORM pulls most of the project with little hassle.
Hand crafted queries have their places just like ORMs.
We have LLMs which are pretty good at SQL now; this seems like a good use case.
Rather than using ORMs, have an AI tool generate the SQL query that joins the xyz objects, and generate a test for it. You get to fine-tune your SQL and still make queries quickly.
In Symfony, the ORM layer (Doctrine) has a normal relational database in the back. Performance can take a hit, but you can always build custom SQL for faster queries over larger sets. Or am I missing the point?
If we really wish to dig deeper, you only need a general-purpose language to host an HTTP server, and perhaps an HTTP client to connect to other external services.
Everything else, including business logic, can be done in the RDBMS.
“At lago we…” Hmmm… as a forced user of Lago’s product at work, I’m sorry, but I stopped reading at that point. Terrible software, terrible support. Enjoy your ORM hellscape; I’ve flipped the bozo bit.
I think ORMs are fine for small queries. For example: User.find_by(email: "xyz"). But as the queries get more complex ORMs fall apart and simply using raw SQL is better imo.
There are ORMs (like Doctrine) that don't break SRP.
I think ORMs have a bad reputation because of the ActiveRecord pattern. Not every ORM has to implement that pattern.
ORM doesn't seem like an anti-pattern in my experience; sometimes they are very useful for certain applications, other times they aren't. This applies to all tools.
I use ORMs to create custom line of business apps, mostly CRUD-ish and it works fine. The ORMs I've worked with have a way to write your own queries if the need arises.
ORMs' major selling point is solving the N+1 queries problem.
They can extract the tree of objects very well.
But for fast batch processing they have many drawbacks.
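SQLAlchemy's `selectinload` is one concrete example of that tree extraction: two queries total instead of 1+N. A sketch, assuming SQLAlchemy 2.0-style declarative models (illustrative names):

```python
from sqlalchemy import ForeignKey, create_engine, select
from sqlalchemy.orm import (DeclarativeBase, Mapped, Session, mapped_column,
                            relationship, selectinload)

class Base(DeclarativeBase):
    pass

class Post(Base):
    __tablename__ = "posts"
    id: Mapped[int] = mapped_column(primary_key=True)
    comments: Mapped[list["Comment"]] = relationship()

class Comment(Base):
    __tablename__ = "comments"
    id: Mapped[int] = mapped_column(primary_key=True)
    post_id: Mapped[int] = mapped_column(ForeignKey("posts.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # One query for the posts, one IN-query for all their comments.
    posts = session.scalars(
        select(Post).options(selectinload(Post.comments))
    ).all()
```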
It's interesting that in general I hear a lot of hate for ORMs, but the Rails devs I know all seem to like ActiveRecord and can't imagine working without it.
ORM is a crutch. One of the things that has always separated good developers from great developers, is that the great developers are masters of database querying.
In that case, I'm happy to be 'just' a good developer who has used an ORM to create a product that runs an entire company, and has been doing so reliably for 12 years and counting now. OTOH, I took over a role years ago where the 'great developer' before me left behind masses of crazily complicated and hard to understand SQL queries.
As long as you can deliver reliable and maintainable software in a relatively timely manner, it doesn't matter how it's done, and clients sure don't know or care either.
ORM, Python GIL, need for k8s, mongo is webscale - pick your summertime bikeshedding topic instead of going outside and re-evaluating your life friends!
I think this is how an ORM for SQL databases should be. Unfortunately I don't use Node on the backend (Rails, Django, Phoenix), but I wish there were ORMs like that for those frameworks.
TLDR: write SQL with interpolated parameters, get the results parsed into objects.
> ORMs produce objects. (Duh! That’s what the O stands for.) Objects are like directed graphs—nodes that point to other nodes but not necessarily to each other. Conversely, database relational tables contain data that are always linked bidirectionally via shared keys, aka an undirected graph.
> The second issue is that ORMs sometimes make multiple roundtrips to a database by looping through a one-to-many or many-to-many relationship. This is known as the N+1 problem (1 original query + N subqueries). For instance, the following Prisma query will make a new database request for every single comment!
I'll throw in another: combinatorial blowout from cross-joins. If you inner-join collections directly, you end up with every possible permutation of the collection values. This is M x N, or even M x N x O x P ...
Another similar problem is the idea of having "dictionary" tables with surrogate keys/enums/etc., where you put the keys in other tables but have a frontend show the display value. The display text is really part of the dictionary object, not the referencing object. And while you can manually write a query which pulls these back and hydrates them, it's not automatic, and unless you write something by hand it's not performant: you are individually loading N different rows with N queries!
This problem is real, but it's actually not a problem with ORMs but rather a problem with JDBC-style/row-oriented transfer paradigms. It's an impedance mismatch with the way we query the database, not with using ORMs.
What programmers intuitively want is to pull over some pool of objects from a given query. And that doesn't necessarily map cleanly onto a single SQL query - in fact they may be different types of objects in different tables etc! And you can't just inner join and pull them back in a single row - because then you get combinatorial blowout if you have more than one collection.
Obviously you need to define what pool of objects you want, because like the dictionary example, you could potentially keep hydrating deeper and deeper into the structure. You have the query string, but you also need to have some "hydration" string. And it turns out JSON:API has already defined a nice little standard for how to define what parts of the objects you want hydrated.
So what shakes out of this is "JOPT" - Java Object Pool Transfer. You can probably build it on top of JDBC, just be aware that it probably involves multiple queries, one for each type of object. The ORM loads all objects of all types, puts them into a Map<keytype,MyObj> for each type of object in the pool, and then assembles the object graph using the foreign keys.
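A toy sketch of that pool assembly in Python (dicts standing in for the Map<keytype,MyObj>; schema invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
INSERT INTO posts VALUES (1, 'hello'), (2, 'world');
INSERT INTO comments VALUES (10, 1, 'first'), (11, 1, 'second'), (12, 2, 'hi');
""")

# One query per object type, each loaded into a per-type pool keyed on pkey.
posts = {pk: {"id": pk, "title": t, "comments": []}
         for pk, t in conn.execute("SELECT id, title FROM posts")}
comments = {pk: {"id": pk, "post_id": fk, "body": b}
            for pk, fk, b in conn.execute("SELECT id, post_id, body FROM comments")}

# Assemble the object graph through the foreign keys, in memory.
for c in comments.values():
    posts[c["post_id"]]["comments"].append(c)

print(len(posts[1]["comments"]))  # 2 -- no M x N row blowout
```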
You can do the data pulls relatively efficiently with temporary tables containing the pkeys that you want for that object type, which you populate for each query and then inner join against the tables. "SELECT * FROM table INNER JOIN tempTbl ON table.pkey = tempTbl.pkey" will run much more efficiently than "SELECT * FROM table WHERE pkey IN (:1, :2, :3)", because every time you want to pull a different number of objects you get a different literal SQL string, which blows up the query planner.
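Sketched with sqlite3 for brevity (the planner argument matters more on bigger engines): the statement text stays identical no matter how many keys you pull.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE things (pkey INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO things VALUES (?, ?)",
                 [(i, f"payload-{i}") for i in range(100)])

wanted = [3, 17, 42, 99]
conn.execute("CREATE TEMP TABLE want (pkey INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO want VALUES (?)", [(k,) for k in wanted])

# Same literal SQL every time; only the temp table's contents change.
rows = conn.execute(
    "SELECT t.* FROM things t INNER JOIN want w ON t.pkey = w.pkey"
).fetchall()
conn.execute("DROP TABLE want")
print(len(rows))  # 4
```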
It might also be desirable to have some better SQL support for maintaining these pools as a cached result rather than having to run a separate query for each type. You could pull back the primary result and have the pools stored as an aggregation column, then turn it around and pull the other object types, but it's just clunky in general, SQL is not expressive in the ways that is ideal for object pools. It would be nice to have a "cursor" that can traverse across multiple object types, or cache the included results between multiple queries in a transaction, etc.
--
The other thing is that the "collections" model is fundamentally bad in SQL too. Really this is the fundamental problem that originally drove practical adoption of NoSQL, I think. At the end of the day everyone's data is relational, the examples of "non-relational" applications ("netflix movie reviews"/IMDB isn't relational, really?) are always extremely contrived and obviously one feature-request away from being relational, that's been bullshit since the start. But it's obviously desirable to just put the IDs of the collection members into a text field/etc and that's why people like NoSQL - collections are literally so painful in SQL that people just choose to not use SQL.
With JSONB being very advanced on Postgres you can probably do this quite easily, or you could use arrays, etc (I think arrays might be built on the text?). JSONB can have foreign keys, indexes, etc, and TOAST can store up to 5GB of text (or something like that) per item if you need it to. Practically speaking, I think you could build a very successful application without using a "join table" or a one-to-many schema, and the object transfer protocol will just pull those uuids for you. Postgres can dynamically un-aggregate text fields back into rows for inner join/etc too, if you want.
Alternatively, you can map the one-to-many or join-table into an aggregation query that turns the actual collection table into a string_agg() function or similar.
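Roughly, that aggregation rewrite looks like this (Postgres-flavored SQL in a Python string; table names invented):

```python
# Collapse the child rows into one aggregated column per parent, then let a
# thin mapper split comment_ids and batch-fetch those objects separately.
COLLECTION_AS_COLUMN = """
SELECT p.id,
       p.title,
       string_agg(c.id::text, ',') AS comment_ids
FROM posts p
LEFT JOIN comments c ON c.post_id = p.id
GROUP BY p.id, p.title
"""
```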
A fully general model might require these to be implemented via recursive CTEs that aggregate all of the collection object uuids into a pool for the ORM to pull back, because the relations you're filtering might be many layers deep.
But these technical details are the type of dumb technical query building that ORMs are great at. Yes, you have to come up with the code that unrolls a collection text column into a couple rows, does the WHERE query, and aggregates back out, but that's just annoying to write, not technically difficult, and ORMs don't care about annoying code.
--
Anyway that's what I think people want from ORMs. They want to write a query in HQL or an object-based/programmatic query builder, and just pull back a pool of objects that's hydrated according to the include-string. People don't actually care about the relations and it's actively tedious to manage that aspect of ORMs because there's so many ways for it to bite you without you doing anything that's obviously wrong. It works, especially on toy problems, it just runs like shit, or pulls back M x N permutations of collection members, etc. Having to janitor that is the biggest headache of ORMs and honestly SQL in general.
It's functionally insane that we still interact with ORMs in SQL-like and row-style syntax. It's an object persistence layer, the tables are an implementation detail, the entire point is decoupling that but instead we just end up with even more complex logic to make the ORM behave. That is the incongruity that I think people are searching for here. Stop making people manually build the parts of the query that handle the administrivia to hydrate the object, and just make the querying and object mapping work properly, fix the combinatorial blowouts, etc. And do it without falling back to hammering out object requests one at a time like lazy loading tends to be.
If you're pulling back records one by one, or using the dynamic/lazy-loading function for relations between objects - yeah that's gonna be super slow and you're gonna hate it. In my experience that's the #1 reason people don't like ORMs, they're not running one query, they're running 1000 queries and hammering the database hard. The lazy-loading functionality might as well not exist and tbh the default should probably be to have that getter throw a "you fucked up the query, this object hasn't been joined!" exception unless it's specifically configured to allow lazy-loading.
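For what it's worth, SQLAlchemy ships exactly that guard: `raiseload()` turns any non-eager-loaded relationship access into an error instead of a silent lazy query. A fragment, reusing the illustrative Post/Comment models from the sketch above:

```python
from sqlalchemy import select
from sqlalchemy.orm import raiseload, selectinload

stmt = select(Post).options(
    selectinload(Post.comments),  # explicitly loaded: allowed
    raiseload("*"),               # anything else raises InvalidRequestError
)
```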
I am going to play the devil's advocate in this sea of ORM-haters.
First off, I am an avid EF Core developer. But I feel like I have a pretty good grasp on when I should let the database do the stuff the database is good at (cascading deletes). And I have experienced times where I had to use a stored procedure because EF Core was slow and would lock up a table or two.
This article shows an example of an ORM system that makes use of raw SQL to construct its final queries. I can't be the only one who thinks that is weird. Why use an ORM that doesn't abstract from the database? That's half the point of an ORM. And I think that is the key point of ORM... abstraction.
Some of the advantages I can think of when it comes to ORMs:
* ORM speeds up development time. Now you don't have to write long queries. In EF Core, for example, we just use LINQ with either method or query syntax. Query syntax already looks a lot like SQL.
* More tools. EF Core Migrations is great. Now I can spin up an MSSQL container, quickly create a bunch of tables, then go ahead and run integration tests.
* Because the database is abstracted, I can now run unit tests on an in-memory database with whatever data I want, and I am not dependent on queries being compatible only with MSSQL.
* If anyone has ever worked with ASP (not ASP.NET, just ASP): legacy technologies appear regardless of age. When writing queries you might just try to convert a bool to a text string. Works fine on your computer; shits the bed on the server, because the server's language is something other than English. Arguably this could have been prevented if ASP weren't just so ASP. But an ORM could also have prevented it.
* I don't have to worry about the dialect of the database I communicate with, because the ORM takes care of that. This also includes the formatting of my DateTime variables.
* Error handling next. I can specify in EF Core how many characters my column can hold, and do error handling accordingly if that limit is exceeded. This is the power of the code-first approach, which I believe is a strong approach to designing your database schema.
* A last small advantage is that I can combine it with GraphQL, and then I won't have to make a new endpoint for every new "edge-case" request.
Disadvantages I can think of:
* Slow. Naturally, when you add an abstraction layer, it becomes a bit slower than raw SQL. So if you work with a lot of data at once, you have to rethink your approach. And adding GraphQL into the mix is another abstraction layer that is just going to make it slower.
* Implementing the database schema is definitely faster using a DB management system than with a code-first approach.
* Merging in EF Core is for me still a lost cause. You first have to fetch all the rows to see which ones match, and then update those while inserting the rest. You can probably pay your way out of this using BulkExtensions, at the fantastic price of $1000/year. Currently I solve it with a temp table and a stored procedure.
* Duplicate and unnecessary operations were apparent until EF Core 7.0. Before, when I had to delete or update an entity, I would first have to fetch it from the database in order to track it before I could do either. That is two DB round trips that can now be reduced to one in EF Core 7.0, but before that it was an absolute pain.
* Similarly to the previous point: adding entities to the database is slow and can result in deadlocks if you have a lot of inserts happening rapidly. I've circumvented this using EFCore.BulkExtensions, which does an optimized insert for each data provider. But using a third-party library, even a free one, is a pain... especially since the repo owner has now gone the way of paid licensing.
* Complex queries such as PARTITION BY and pivots are not possible in EF, and probably never will be. But with regard to PARTITION BY, I think it should be implemented at some point, mostly because I have seen it make a query many times faster.
Summary
ORMs are great, especially in REST services where your queries are small. But don't use ORMs when you have to handle a lot of data at once.