The article relies on major features of Oracle in order to "bust" the myth of the unmaintainable stored procedure, which leads me to believe that these are big band-aids over what is still a difficult problem: maintaining database changes and getting developers to put database changes into version control. Unlike the author, I am not fortunate enough to have Oracle, and am instead using MSSQL for our database with 500+ stored procedures. Here's my experience:
Even the first "myth" that was "busted" had me cringing, putting all the stored procedures into individual files and using a source control system to manage those. The person I replaced had done the same thing with 500+ stored procedures and still had never bothered to actually update the files. He just updated the stored procedures on the development server. Partly because he was a massive fuckup, but also because it was "out of band" management.
I ended up moving the code out of SourceAnywhere (an all-around terrible version control system and company) and placing the code into GitHub, and cleaning up the database so that it could actually be dumped to an SQL file and then loaded from that file as an NUnit test, run by the CI server that I set up.
Even with these huge strides forward, it has taken me months to clean up and fix these problems while still adding new features to the existing code and database. The experience has left me strongly avoiding stored procedures: the low quality of code that crap coders can get away with in them, plus the lack of testability, meant I ended up having to fix everything with sweat, tears, and blood.
Hang on a second... You use MS SQL Server and don't follow the practices he describes in this article? Why on earth not?
SQL Server is just plain awesome at this stuff. Schema is all scriptable (it even prompts you to let it script itself if you mess with things in the modeling tools). That means that you get automagic change scripts that tuck nicely into source control and are just waiting to be sucked in by your automated build and deploy system of choice.
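For anyone who hasn't seen this workflow, here's roughly what one of those change scripts looks like. The table and file name are made up; the point is that every schema change is just a T-SQL file that lives in source control and can be re-run by the deploy system:

    -- Hypothetical change script, e.g. 0042_add_orders_status.sql (T-SQL).
    -- Checked into source control and applied by the automated deploy;
    -- guarded so it can be re-run against an environment that already has it.
    IF NOT EXISTS (SELECT 1 FROM sys.columns
                   WHERE object_id = OBJECT_ID(N'dbo.Orders')
                     AND name = N'Status')
    BEGIN
        ALTER TABLE dbo.Orders
            ADD Status NVARCHAR(20) NOT NULL
                CONSTRAINT DF_Orders_Status DEFAULT (N'NEW');
    END
    GO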
Procs and everything else are all just scripts, which get their own files. So just like you have individual files for all 500 of your classes, you'll have 500 .sql files for those procedures. It's not difficult in any way.
It sounds like you ran into problems because you inherited a system from people who went out of their way to mess things up. Messed up things are hard to work with, regardless of what technology they live in. Try not to blame the database for the fact that you can build ugly things with it.
I think one needs to be careful of the reasoning that goes something like "X makes it easy for people to do things poorly, which makes my job harder, and therefore X is bad." I've heard the same argument against PHP. "PHP is so easy to use that anyone can hack together crap, which is difficult to maintain. Therefore, PHP is bad."
I love this quote:
UNIX was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. - D. Gwyn
"PHP is so easy to use that anyone can hack together crap, which is difficult to maintain. Therefore, PHP is bad."
I know you are attempting to highlight a weakness in my argument by pointing out how generic my reservations are, but by the same token, I have never been lucky enough to see high-quality PHP code. I don't think PHP encourages best practices, just like stored procedures. Either that, or the PHP community is just so large that even if the percentage of bad coders is constant across languages, I've simply run into a lot more of them there. Who knows?
While I have no doubt that there are some real superstars that can make beautiful PHP code and write wonderful stored procedures, there is still the 99% of programmers who can't and end up making unmaintainable nightmares.
I think that is the wrong argument, though. Designing a language that tries to stop you from doing bad things gives you something like Java instead, where doing anything is cumbersome (and it still doesn't succeed in preventing bad code from being written).
There is a lot of bad PHP code because there are a lot of people who start hacking using PHP. However, any language somebody first starts hacking in will result in ugly code (you should have seen my Scheme code when I first started with that in college! :).
Unfortunately, the idea of "teaching languages" doesn't really work, because people want to continue using what they've already learned.
(BTW, I don't think you deserved downvotes. You are continuing the conversation, after all, not trolling. I really think we need to make downvoting more painful.)
"(BTW, I don't think you deserved downvotes. You are continuing the conversation, after all, not trolling. I really think we need to make downvoting more painful.)"
You could use the same thing to argue that business logic shouldn't be in PHP, or that it is hard to version control, because my predecessor put all of his functions in one giant PHP file and frequently updated websites live on the server without committing those changes to SVN.
Anything can be misused. Whether or not we should use it should depend less on the potential for misuse and more on whether it can be used correctly (and in a multi-developer environment, whether or not you can enforce correct usage in some way, either through technical or social means).
That's like committing code to your application server and not putting it into version control. This should be attributed fully to the crappy developer and not to the software running on the application server or in this case database server.
If you use SQL Server, then you absolutely need to be using the RedGate SQL tools, which include SQL Source Control; that will make versioning your 500 SPs very very very easy.
Great points. If you exercise solid controls around your database operations, you "can" have the same level of version control as you do on code outside of the RDBMS.
But "can" isn't the same as "will" -- in practice, there are practical issues that make this painful.
The other point is that using advanced Oracle/DB2 features to accomplish things means that you need to keep some RDBMS high priests around. That means lots of money.
We built a version control tool in-house to help with this problem. It would examine the database and generate files (one per object) on disk which could be committed into version control. The tool has a -merge mode that examines scripts in files against the objects in the database and spits out sql commands that modify the DB to match the versioned files.
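For the extraction half, the standard data dictionary already gets you most of the way there. A minimal sketch (assuming an Oracle database, and not the parent's actual tool) that pulls the DDL an external script could then split into one file per object:

    -- SQL*Plus / Oracle sketch: dump object definitions for versioning.
    -- Illustrative only; the in-house tool described above surely does more.
    SET LONG 1000000 PAGESIZE 0
    SELECT dbms_metadata.get_ddl(REPLACE(object_type, ' ', '_'), object_name)
    FROM   user_objects
    WHERE  object_type IN ('PROCEDURE', 'FUNCTION', 'PACKAGE', 'PACKAGE BODY')
    ORDER  BY object_type, object_name;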
That sounds like a bit of a bandaid fix for a history of bad practice. Why not, instead of forensically trying to rebuild scripts for changes, just save your change scripts as you change things?
You end up with the equivalent set of scripts in source control, but it seems like they'd be a lot more readable (they won't be all lumped on top of each other and out of order, etc.). Is there another advantage to what you're doing that makes it worth deviating from the "best practice" way of doing things? Is, for example, the -merge functionality something that you use on a regular basis to roll backward?
Why not, instead of forensically trying to rebuild scripts for changes, just save your change scripts as you change things?
Well, DB objects were put into version control much later than other source files, so I wanted a way to guarantee that what's there in source control matches what we actually have in the database server.
Your suggestion is basically that developers maintain the discipline of having changes in the files reflect what they run in the DB, right? I think it's significantly cleaner to have the database as the single source of truth and take everything from there.
but it seems like they'd be a lot more readable
Didn't quite get that part. The generated scripts are saved one per file and reflect the name of the table/procedure/etc. that each file contains.
Isn't that basically what the 'best practice' way of doing it is?
Merging is just to take the latest set of source control and bring your DB up to date with it.
How do you handle version control? I'm certainly interested in a better way to do it.
I work with PL/SQL daily and I believe several points in the article are... debatable, if not misleading.
1. Yes, you can store your packages in SVN or whatnot, but two developers cannot work on the same package on the same database at the same time without stomping on each other's work.
2. Managing database changes: yes, you can analyze explicit dependencies for a given object, except that EXECUTE IMMEDIATE statements are not tracked. Nor, of course, are any queries done by your java webapps or whatnot.
3. "And PL/SQL has object oriented features too." We've moved away from most of the object oriented features because either we encountered bugs (in 10g) or because performance was abysmal.
4. EXECUTE IMMEDIATE -- I love the idea of execute immediate, but in practice, it has problems. Performance tends to be poor (queries often need to be re-parsed) and as noted above, they break dependency tracking. If over-leveraged, they can also result in highly unreadable code. (See the sketch after this list.)
5. Unit testing -- I've looked at several PL/SQL unit testing packages and they've always seemed to me to be fairly half baked. Unit testing procedures with side effects is also often inherently difficult because they rely on database state. I'm not against it, but it certainly isn't as easy as using jUnit. That said, if anyone has successfully used a PL/SQL unit testing package, I'd be curious to hear it.
6. Portability -- Irrelevant if you're using Oracle-specific features (which, you probably should if you're using Oracle).
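To make points 2 and 4 concrete, here's a hedged PL/SQL sketch (table and column names are made up): the dynamic statement never shows up in ALL_DEPENDENCIES for the procedure, and skipping the bind variable forces a fresh parse for every distinct literal:

    -- PL/SQL sketch with hypothetical names.
    CREATE OR REPLACE PROCEDURE touch_customer(p_id IN NUMBER) AS
    BEGIN
      -- Invisible to dependency tracking; the bind (:1) at least lets Oracle
      -- reuse one shared cursor instead of re-parsing per call.
      EXECUTE IMMEDIATE
        'UPDATE customers SET last_seen = SYSDATE WHERE customer_id = :1'
        USING p_id;
      -- The common worse variant concatenates the literal and defeats cursor
      -- sharing entirely:
      --   EXECUTE IMMEDIATE 'UPDATE customers SET last_seen = SYSDATE'
      --                  || ' WHERE customer_id = ' || p_id;
    END touch_customer;
    /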
None of this is to say that I'm against stored procedures. On the contrary, they provide some level of encapsulation, and I've generally found PL/SQL to be fairly transparent to understand. If I'm doing some complicated joins and/or several sequential DML statements with exception handling in a transaction, I'd much rather do it in PL/SQL than from an external language. I just think the article paints a picture that's a little too rosy.
The problem with #1 is cultural. For whatever reason, DB developers don't seem to have caught on to the "run a local development instance on my machine; deploy to production" workflow. I worked at a company that did a lot of Oracle PL/SQL, and pretty much all of their developers shared the same "development" server.
To be fair, commercial DBMSes are quite expensive. Setting aside one additional multiprocessor license for every single developer on your team is... prohibitive.
Relating to #1: isn't that also true for say Rails/Django/PHP code? Two developers would stomp all over each other trying to modify the same piece of code on a production server.
Relating to #5: doesn't your objection to unit-testing code that has side-effects also apply to say Rails/Django/PHP code that also modifies state in a data store? I suppose you could argue that if you are using Rails/Django/PHP, you could mock your entire database access layer (maybe through your ORM), and that this would be very difficult inside PL/SQL. However, mocking the entire data access layer is pretty expensive, in any server-side environment if you want to get some actual testing done (as in nuances of resolving foreign key constraint issues, or how locking of rows/tables must work for your code to run concurrently and deadlock-free).
I am not arguing for or against PL/SQL. Just wondering why you think those points might have different obstacles in different environments.
"Two developers would stomp all over each other trying to modify the same piece of code on a production server."
Yes, but it's trivial to create both developer instances and an integ instance of Rails/Django/PHP so you don't end up trying to both modify code on the production server. Doing that with the db, especially if you want to test against "production scale" data sets, is at the very least going to be expensive.
Are you suggesting having one production DB => multiple virtual machines with application code? If so, I suppose that might work for some read-only workloads.
If you need to bring up a "production scale" server for a developer to go along with their virtual machine with application code, then you have to do it no matter what environment the application code runs in. I imagine modifying data using experimental code on a production system is not really desirable no matter what the code is written in.
For #5 you usually have some database setup/teardown stage, where you create (or destroy) new testing database and setup all the stuff you need. This is pretty hard to do with pl/sql.
Yes, that's what I usually end up doing: having a DB dump that I can load back in at the beginning of the test, then drop it at the end. I can see how a purely stored procedure test harness would have a hard time with this. However, if you just wrap your unit tests in a shell script, you should be able to do this.
1. You can say the same about database schema changes. A lot of the bumping into one another would come from your sproc/udf trying to handle too many things at once. Which is typically a bad design decision.
This article reads very much like an article written for Oracle, and that's not even my biggest gripe with it. I have written my fair share of SP's in my very short career and I will say this.
Pros: Stored Procedures tend to be fast if you know what you are doing. Stored Procedures are a good way to abstract away the "relational" differences between OO and SQL.
Cons: Stored Procedures are a good way to write a bunch of horrible to maintain code. Most developers spend their time writing front-middle end code and don't write SQL that well so they write horrible SPs. Even worse is someone who really understands SQL and writes almost the entire application in a SP and writes awful front-end code. If I had a dollar for every time I opened up a 2k+ line SP with nested queries, queries written as strings and 'eval'd or an awful number of joins I'd probably be arrested for suspicion of dealing drugs.
In almost every company there is a small group of people who call themselves "DBAs" and believe in writing unmaintainable code for speed, and who rally against those using ORMs or anything that basically removes them from the equation.
It's crazy the amount of pain I've witnessed and felt when I suddenly realize I have no clue what an SP is doing or why it's written a certain way, and then have to sit down next to these DBAs so they can "explain" it to me (mostly fix it, because they have no time to explain).
At my last job we finally got the okay to try out Entity Framework (the .NET ORM), but the DBA literally rallied against it for weeks, showing how much faster SPs are. We finally came to an agreement where we would use Entity with autogenerated SPs as a backend and fall back to hand-written SPs on slow calls. I'm still not sure why he was given a say, but it is what it is and I don't work there anymore.
Unfortunately this is common, but as a "DBA" I can tell you right now: it's inconsistent for a developer to sneer "databases don't scale" and yet advocate the use of an ORM that generates bad, bad SQL! When speed matters, you have to get the compute and the data as close as possible. You can slurp it all into the application and work on it there, or you can just do the computation in the database, which is actually easier, and more efficient.
FWIW I work on a trading system that happily handles thousands of commits/sec, in Oracle, along with complex calculations of exposure and the like.
I agree there is a point when optimization is required. At the time, we were building a new product that had rapidly changing requirements and few users, so you can see my frustration when days of man-hours were spent on this database.
It had also been shown, in several projects before and in the current one, that writing these custom SPs and using autogenerated SPs added huge amounts of complexity and development time to our application.
Which again was still being built and had no need to be a high performance machine. In fact none of the products that the company served up needed to be high performance. These guys were smart and bored and built huge monolithic systems full of complicated approaches to simple problems.
Edit: Furthermore, nobody wanted to work with this guy on the team. He was an "expert" on everything and ran every single conversation about anything, including areas he had no experience in, into the ground, because nobody else's opinions were valid since he clearly knew everything. /rant
Well, if you were a PL/SQL dev, you'd be making the same complaint about the complexity of J2EE. I've no experience of working on systems where performance wasn't an issue :-)
Ermm, that's nonsense. You can move the compute into the database (stored procedures) or you can move the data to the compute (caching - and deal with cache invalidation yourself). There is no situation ever in which you would deliberately seek to make them as far apart as possible.
Yes, it often doesn't scale, because the database is an extremely limited resource that generally doesn't scale horizontally.
> or you can move the data to the compute
Which is what I said, which allows you to scale computation horizontally.
> There is no situation ever
You just named one, however I see where you're misinterpreting me or I you; when I said do the opposite, I meant move the computing away from the database.
Not necessarily nonsense: figuratively, you can (and in many cases should!) keep as much of the data as possible as far away from your code as possible, using divide and conquer so the code only has to work on a fraction of it.
But anyway, that's what indexes are for in RDBMS, so you still have a point, even figuratively :)
I disagree with this article. SQL is a decent (though crusty) language when used specifically for the purpose of interacting with relational databases. The fact that most database vendors have hacked mutually incompatible proprietary extensions on top of it for doing other tasks doesn't make it a good choice for them. Why would you use SQL over a language that has been specifically designed for code reuse, integration and interacting with multiple systems (rather than just the database)?
> Myth #1: Stored procedures can't be version controlled
Of course they can, but unlike normal code which runs/builds directly from the file-system, a sproc tends to be written directly on the database server (since you won't get all those fancy refactoring/debugging tools unless you do so). Copying it to the fs before committing your changes is a step that people often forget, or one that requires yet more tools.
> Myth #2: Managing the impact of changes in the database is hard
> Myth #3: Database tools lack modern IDE features
His examples of basic refactoring tools hardly compares to something like ReSharper.
> Myth #5: Code in the database can’t be properly encapsulated and reused, you need an object-oriented language for that
They can be encapsulated, but it's a pain. Even just getting data from one stored procedure to another can involve hackery like opening a connection to the database from inside the calling procedure. See this link: http://www.sommarskog.se/share_data.html - none of these will work in even close to all situations.
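For instance, one of the workarounds from that page is INSERT ... EXEC, which looks roughly like this (procedure and column names are made up); the caller has to know the exact shape of the inner result set, and it can't be nested:

    -- T-SQL sketch with hypothetical names.
    CREATE TABLE #recent_orders (OrderId INT, CustomerId INT, Total MONEY);

    INSERT INTO #recent_orders (OrderId, CustomerId, Total)
    EXEC dbo.GetRecentOrders @Days = 7;   -- stage the inner proc's result set

    SELECT CustomerId, SUM(Total) AS WeekTotal
    FROM #recent_orders
    GROUP BY CustomerId;

    DROP TABLE #recent_orders;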
a sproc tends to be written directly on the database server ... Copying it to the fs before committing your changes is a step that people often forget
No they don't. At least not more than once.
Changing something on your dev database and forgetting to save your change scripts and check them in is analogous to changing html directly on the production web server and forgetting to save it locally and check it in to source control. It's a very bad idea, and nobody would ever actually do that. But if you did do it, your changes would evaporate during the next deploy, and after a few minutes of venting to the sky you'd learn to never ever do that again.
Except my IDE doesn't require me to be editing while connected to the server to get all the useful refactoring/etc. tools. It only requires being attached to debug.
Admittedly in most SQL editors, you can open up the file, then connect the IDE to the database. But it's hardly seamless (particularly if you have multiple databases), and certainly isn't like normal development where the filesystem is exactly what's running, and is the only user visible copy. The tools just all feel extremely immature compared to any IDE I've used.
And to clarify, I'm talking about the dev SQL server, not developing in production.
Right, but we're not talking about IDEs. We're talking about SQL.
No matter what you use to modify your database, be it a modeling tool, the table editor that comes with the database, a fancy VS.NET plugin, or a mysql command prompt, you're still producing SQL change scripts. Those change scripts always need to get saved out to a file and checked into source control.
This is especially true if you're developing on a local database with multiple stages between you and production. You need scripts for everything so that your build can push things out to the next environment down the chain and eventually live.
If you're not doing that, I can't imagine the pain you're introducing for yourself.
I completely agree, technology is a lot like time & money. Throw enough of it at any problem and you'll likely solve it.
When dealing with scale problems, the biggest asset is flexibility as you will likely need more than one kind of "database" and your logic layer will be the glue that holds it all together.
Stored procedures are simply too focused a tool and often tied to a particular database to offer any comprehensive solutions.
The fear of SQL, stored procedures, and proper database constraints, the love of ORMs and the embrace of NoSQL exist because a large portion of influential programmers just don't bother to understand relational databases and SQL. It's not in fashion, so it becomes a necessary evil the prima donna rock star will try to avoid at any cost, just like a real rock star might never bother to learn to read music, or to sing for that matter. Obviously DB theory should be regarded as just as important as understanding the latest and greatest language feature or framework.
I don't think that's true at all. I suspect that the most ardent noSQL advocates are actually considerably more versed in rdbms specifics than the average dev.
Not really. The NoSQL movement is based on the assumption that the limitations of MySQL apply to all databases. Their ideas of what an RDBMS is are 15-20 years out of date - really.
It's more based on the assumption that the limitations of MySQL should apply to all databases.
I've worked 2 places with Oracle stored procedures. At both places, they were a giant nonperformant mess of spaghetti that could never be cleaned up because changing things == breaking things. And since all of the application logic had been written as transactional SQL with lots of joins over normalized tables, it was gridlock heaven inside the DB and adding more cores to the single machine barely even helped.
Well, it's a subjective judgment, and the anecdotes I've seen/heard are 90% in one direction, with the other 10% coming from people who make their living dealing with PL/SQL.
Good apocryphal quote, it's impossible to convince someone of something if their job depends on it not being true.
Part of NoSQL is about scalability using commodity hardware and without spending tons of money on licenses. Sure Oracle can scale, but people want those features for free.
Paraphrasing Zawinski, it depends what your time is worth. Remember in business, the concepts of "cheap" and "expensive" simply don't exist. A thing is "worth the money" or "not".
I suspect that most "noSQL" advocates have a poor understanding of the difference between relational/structured data and unstructured/sparse data. Virtually all of the "noSQL" software is incapable of managing complex relational data, as far as I can tell. Why no one points this out is baffling to me.
"noSQL" is useful for for application layer - where caching and short-term storage are important; but most "noSQL" advocates I've seen seem to think that it should replace an RDBMS, when that couldn't be further from the truth. "noSQL" is useless as a structured backing store where RDBMS are used today, especially for large systems.
The problem I have with that statement is that you're dismissing everybody who have a legit reason to use NoSQL databases.
Show me an autosharding solution for an open source SQL database that doesn't involve paying tons of money. Until then I'm not convinced I should ditch MongoDB.
Yes, came here to say that as well. I hear people saying 'Oracle can scale as well'; that could be true, or not. I don't know: I don't have a lot of experience with Oracle on large scale projects (I did a lot of J2EE projects using Oracle, but those really could've just as well worked on SQLite, and I mean that... Complete overkill.). I know my database theory and I don't understand how Oracle can scale like a NoSQL database while preserving their RDBMS nature; as I understand that's not possible right?
So how is there no room for NoSQL databases? I understand that most sites running NoSQL at the moment could just as well (or even better) run on Postgresql, even with 'biggish data', but for 'enormous' data and near-real-time analysis of it, isn't NoSQL a proper solution over 'traditional' RDBMSs? Not trolling; really curious what people have to say.
Do I understand it correctly that Oracle 'Exadata' just uses fiber cables and such to treat the entire rack as 'one node', or does it work differently? How does Oracle scale? Any information about that (tech information, not opinions/interpretations of reality as OP posted)?
Just go with both. MongoDB can't make your data live & evolve the way an RDBMS can. You can't inject intelligence into your noSQL database, but you can sure extract data, a lot of it, a ton of it, with as many concurrent clients as you wish.
If you can't find a use for a relational model in any project, then I'm not sure what to say. But if you can, maybe try a postgres & Mongo combo for the best of both worlds?
I work at a company whose entire business was written in stored procedures and it has been a massive source of complexity and pain for us. Here are my issues with it:
- Most web languages have a wealth of tools for testing, introspection and instrumentation that don't exist or are hacks in PL/SQL. Unit tests, basic logging, aspects that log function execution time, posting to error tracking & app metrics services, etc.
- PL/SQL is not expressive or concise. String processing, in particular, is a nightmare. This means more code, buggier code & thus reduced developer happiness.
- At scale, developers need to be parsimonious with joins in the data layer, otherwise partitioning is impossible. This is technically possible in PL/SQL, but runs completely against the grain of SQL, so I can't imagine this actually happening in the real world. This is probably my company's biggest source of technical debt.
- SQLPlus doesn't return a non-zero exit code when certain errors happen while installing PL/SQL, so release scripts have to install and then assert that the user_errors view is empty. This is just ugly.
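For what it's worth, a common shape for that workaround (a sketch with assumed file names, not the poster's actual release script) is to have SQL*Plus exit on errors and then fail explicitly if anything compiled invalid:

    -- SQL*Plus / PL/SQL sketch.
    WHENEVER SQLERROR EXIT FAILURE
    WHENEVER OSERROR  EXIT FAILURE

    @@install_packages.sql            -- hypothetical script being released

    -- Compilation errors don't trip SQLERROR, so check the dictionary too.
    DECLARE
      v_errors NUMBER;
    BEGIN
      SELECT COUNT(*) INTO v_errors FROM user_errors;
      IF v_errors > 0 THEN
        RAISE_APPLICATION_ERROR(-20001, v_errors || ' compilation error(s) found');
      END IF;
    END;
    /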
Number 3 is where the real money is. We had success with multiple schemas, using views and materialized views to partition everything we could, but the good, the bad and the ugly in PLSQL all come from the wonders of joining, so it's always a very hard decision to make, with high risk and high reward.
Number 2 for me is the big one. PL/SQL (and any other SQL dialect I've worked with) is an exceptionally poor choice of languages for implementing business logic in.
The idea of the database storing the entire business logic, i.e. the relational data and arbitrarily complex computations on it, rather than just storing the data and performing relatively straightforward retrievals, is an interesting indirect victory for the old "deductive database" idea. That was one of Datalog's main conceptual claims over the SQL databases of the day: that databases shouldn't just regurgitate rows and joins/filters of rows, but should perform, within-DB, whatever computational logic needs to be applied to the stored data to produce the information that the application needs. It was also an angle that SQL defenders used to use to attack deductive databases, for putting arbitrary logic in the DB where it allegedly didn't belong. It's interesting to see that the SQL world has adopted the idea after all, though in a bit more manual/procedural way.
It's expensive to scale by adding more DB servers. It's cheap to scale by adding more webservers or 'middle tier' servers. You can't just completely ignore the scaling argument by saying "it's no problem, just spend money."
Lots of the other points are fairly hand-wavy. Stored procedures can be version controlled... if you write some code to do it yourself, or buy RedGate's tool for that. Still, I've never worked in a shop that did as good a job at version controlling their SQL as they did their code. It's harder. Stored procedures can be refactored... but I'm not sure what tool is being used, and I don't want to have to evaluate 20 different tools to find the good one. I know I currently have better tools for writing compiled code than I do for writing database sprocs.
There are many valid reasons for doing as little as possible in the DB layer, most of which are listed here, and I just don't see this article as successfully debunking them.
At Loopt we used MSSQL from the beginning. Things didn't start this way (because I was young and foolish), but in the end:
* Each developer had their own local instance of the DB(s) for development.
* All schema was kept under source control (now using RedGate SQL Source Control, a fantastic product, and earlier as simple sql scripts in Hg/Svn and database projects in Visual Studio).
* Unit tests of the DB were done in code (C#). They set up their own initial state, ran against the local DB instance, and cleaned up after themselves by rolling back transactions.
* Commits in source control triggered automatic SQL updates in dev using RedGate SQL Compare (though for a time we used the SQL tooling in Visual Studio, also command line scriptable, for this task).
* Debugging performance was easy with performance monitor, set statistics (profile|io|time) on, and built in management views. Right click -> "Show what's expensive" kind of easy.
* Debugging functionality was easy with step-in line by line debugging from the C# code calling it.
* In many cases 5-10 round trips were avoided by keeping data local. Lots of network bandwidth was saved as well by only returning final results.
* Used correctly, stored procedures encourage correct handling of parameters and can increase security -- Web server code didn't have read or write access to the raw data. It was constrained to calling the stored procedures (no way to dump all password hashes, for example). Think of it as another layer of defense.
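A minimal T-SQL sketch of that last point (role, table and procedure names are made up): the web tier's role only ever gets EXECUTE, and reaches the data through the proc via ownership chaining:

    -- T-SQL sketch with hypothetical names.
    CREATE ROLE web_app;
    DENY SELECT, INSERT, UPDATE, DELETE ON dbo.Users TO web_app;
    GRANT EXECUTE ON dbo.CheckUserCredentials TO web_app;
    -- The proc reads dbo.Users on the caller's behalf (same owner, so the
    -- ownership chain is unbroken), but the web tier has no direct path to
    -- dump the password hashes.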
We tried to keep most of the complexity out of the DB because T-SQL is a terrible language. You'd have to be insane to write anything you didn't have to in it. To get optimal performance we occasionally had to do complex optimizations (table hints, indexed views, etc.), but all of that was still far easier than proper cache invalidation. Loopt ran with no cache because the DB worked just fine when tuned correctly.
Just in the last few weeks, I have started writing PL/SQL for the first time in a long while. Coming straight off a 2 year period of working almost entirely in Python, I expected to begin hating life immediately... but it just hasn't been the case.
In a world of lightweight HTTP services, micro-frameworks, and high-quality ORMs, making the 'right' architectural decisions seems downright simple -- or at least, it seems easier to get everyone involved to agree on how things should fit together.
This article unfortunately doesn't address many of the things that concern me the most about stored procedures: Mainly my problem is that writing "business logic" (Can we stop using that phrase? It smacks of managerial buzzwordism.) in SQL is painful at best and downright unmaintainable at worst.
I've recently been tasked with rewriting some of our scheduled processing code from what it is now (some very badly-written C#) to a collection of stored procedures and functions. The biggest issue is that the code does a lot of string processing, much of it very unusual. T-SQL seems to lack all of the facilities I require to actually do that job, and my boss won't give me even the leeway to write new .NET code for the database, which would make my job not only easier, but the project more maintainable - I think because a previous programmer poisoned the water with his abysmally bad code. (The company now fears all .NET code. It's that bad.)
I think the matter of maintainability outweighs all of the "myths" this article "busts," but the article addresses none of that.
From an architectural standpoint I've always found that separating concerns is "always a good thing". Getting the database right is a difficult enough task without adding in extra complexity of the logic layer.
My recent experience has actually pushed into the more extreme conclusion that even the database task itself is too complex for a single tool, so we use different kinds of databases to accomplish different aspects of the same db task.
If you add the logic layer to this picture, it becomes apparent that the most valuable asset one has is flexibility; keep your tools simple and focused.
Some of their arguments seem debatable, but my biggest issue with stored procedures with Oracle is purely economic - It is MUCH cheaper to scale the application layer than to scale the database layer. Therefore, if you want to scale cost effectively, and you want to use Oracle, you should put as much logic in the application layer as possible.
The article says 'if you have millions of users, you should be able to afford decent hardware.' - that is misleading, because the cost of the hardware is miniscule compared to the cost of the Oracle licenses you would need.
I thought this whole stored procedure debate was settled years ago. I had the impression that everyone agrees that stored procedures are useful for queries inherently procedural in nature (so that they are very difficult or impossible to express in the functional style of SQL queries) because they save the roundtrips between the database server and the application server. For CRUD queries, they are just overcomplicating things with minimal, if any, benefits.
I had to do a double take and check the posting date when I saw the 'can be version controlled' part, complete with a _CVS_ screenshot. Wow.
My take: Stored procedures are a valid tool that belongs in everyone's toolset. I dislike them for one reason only: I'd rather define as much as possible in _one_ language. Having a codebase that is mixed between languages can be worth the tradeoff, but I'd like to avoid it where possible.
What about the idea that it is still way too easy to mess up 'for' loops in pl/sql? You have to track your loop control variables manually, and it is not uncommon to have nested loops that span several printed pages.
What about the fact that implementing a hash table is an advanced topic in pl/sql? You get dictionary-style data tables that can only be indexed by binary integer.
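For anyone who hasn't seen them, those 'dictionary-style' tables are PL/SQL associative arrays; a minimal sketch (newer Oracle releases also accept VARCHAR2 keys):

    -- PL/SQL sketch.
    DECLARE
      TYPE order_count_t IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
      counts order_count_t;
    BEGIN
      counts(1001) := 3;                       -- key 1001 -> value 3
      counts(2002) := 7;
      IF counts.EXISTS(1001) THEN
        DBMS_OUTPUT.PUT_LINE('orders for 1001: ' || counts(1001));
      END IF;
    END;
    /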
Encapsulation is nice and everything... but, oh yeah, the entire database is visible to all the code all the time. What does data encapsulation even mean in that environment?
What about memory management? What do you do when your code is using too much ram? It seems much less clear than in java.
What about the fact that I have never been able to measure a significant runtime difference between pl/sql and raw jdbc code? Usually the cost of inserting rows into a heavily indexed table outweighs all other factors.
I'm not opposed to running code in the db but in practice everything feels like it is 10 years behind the state of the art.
> If your business logic is not in the database, it is only a recommendation.
Absolutely fantastic.
This is brilliant timing actually, as I've been wondering lately why stored procedures are so maligned as they do seem to have a lot of advantages. As well as being informative and well-written this post is entertaining, I enjoyed it a lot :)
I don't agree with a lot of the things said in the article. For one, stored procedures are not just files, they are a part of your DB schema. Your SQL code (whether PL/SQL, or T-SQL, or whatever) is actually executable code that modifies that schema. The schema also includes tables, types, functions and so on.
If you want to do meaningful version control for your DB, you need to do version control for all of those things too. It's not impossible, but it is very difficult, because databases also have data. If you want to add a column to your table, for example, you can't just change the code you originally used to create it. Instead, you need to write new code that modifies the existing schema by adding the column.
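A tiny sketch of what that means in practice (hypothetical table): the original CREATE stays in history, and the change ships as a separate migration script that also has to deal with the rows that already exist:

    -- V1, already deployed, can no longer simply be edited:
    --   CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(100));
    -- V2 migration, a new script checked in alongside it:
    ALTER TABLE customers ADD email VARCHAR(255);
    UPDATE customers SET email = 'unknown' WHERE email IS NULL;  -- backfill existing rows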
I created a customer service app years ago and decided to put most of the database logic into stored procedures. The front end was Cold Fusion and I figured they would get rid of Cold Fusion way before Oracle (which they had recently adopted company-wide).
Stored procs can actually improve the separation of the data and logic layers, because they help to encapsulate the data layer from the logic layer. You don't want the logic layer to know too much about the internals of the data layer. Stored procs give you a clean, abstracted interface.
Also, I wouldn't be too much of a purist about the data/logic distinction. Sometimes doing a little processing on the data in a stored proc before returning it makes sense. On the other hand, I would hesitate to put very complex logic in a stored proc. So as usual, there are going to be grey areas. Do what makes sense.
SPs are just fine if you're building enterprise apps used by a few hundred people... your environment is fixed and probably a handful of developers will ever touch the code.
Could you imagine writing a huge open source app using SPs? Not everyone has the same toolset, and pretty much most of them won't have access to Oracle's debuggers/version control/etc.
Also, the argument of "If you have a few million users, you can afford to scale your database up" is complete bullshit. If I were the marketing department for Oracle, this would be the sort of seed idea I'd be trying to place in order to sell more licenses.
any tool/technique can be abused/misused, some more easily than others. unless you are developing for fun, you need to take the ability to misuse it (and the ramifications) into consideration.
productivity and maintainability IMHO should be the most important decision makers when choosing what to use. what tools allow me to be the most productive as a developer? how easy is it for others I work with to change what I've developed? optimize later.
there are very successful solutions developed in a variety of ways. let's not forget that. arguing over what is best is missing the point.
I wish there were some links to some of the tools used in the screenshots, particularly the one showing package dependencies and the autocomplete editor.
I use an older version of Oracle SQL Developer as we aren't allowed to install the latest and I usually turn the autocomplete stuff off. Half the time it's awesome, half the time it takes several seconds, during which the IDE is unusable. This is most likely due to me being remote to the database which is about 1700 miles away.
As the former lead dev on a rather large Oracle-based architecture, I feel obliged to chime in.
Some of this article is extremely true, a couple things are utter bullshit though.
I'll go with the bullshit first.
1) source control
Source control is mandatory, so you have to do it anyway.
But for any Oracle architecture where there is actual PLSQL work beyond funky table functions, meaning triggers, jobs and their ilk, you need to keep around a large schema with enough consistent data, especially with all the issues one can have with the way Oracle optimizes the execution plan of its queries depending on the statistics of each table. You need this information when you develop, so you need amounts of data that approach the order of magnitude of the production system, which is potentially huge.
It's simply impossible for every developer to enjoy their own instance, unless each dev has his own server, and even then: there's far too much maintenance to be made on an active Oracle database, and once a certain scale in the schema has been reached it's practically impossible to maintain an acceptable level of performance on a frigging desktop, especially when your devs are doing more than just PL/SQL, but, also, you know, work on the other tiers.
So in the end, you have one dev instance, two at most, and you keep versioning of the code just for looks, because unless your system is trivial (only reversible transactions), any procedure, job or trigger that is executed on a database will massively change it forever. So if your business logic lives at least partly in PL/SQL code, older versions of the sources are only kept as a trace, not for proper versioning.
Besides, there are so many encoding issues and your database is so critical that you can't even risk anyone committing a file rogue-style, because if the sql was edited with textpad, notepad, notepad++, created with touch, right click + new, saved-as from toad or came from a copy-paste of an e-mail sent from an ipad, the encoding will be different and will mess up tons of stuff. Things like accents in code comments can be interpreted badly and "eat" the following line feed character, which in turn comments out the next line, which can lead to ultimate data corruption if it concerns a piece of code that is invoked by a trigger.
This is a risk that you don't, ever, want to take. So your PL/SQL SVN server is only for the two or three people who are entitled to actually write the patch. Don't leave anything to chance here. The rest of the devs follow the chain of command: write something, test it on the dev server, crash it and eventually ruin it, then send their script by email or dropbox or whatever to the devs in charge of the patch, who then use SVN to version it properly.
2) Unit testing
Unit testing is cool and all, but in practice all it does is essentially compile scripts. You can only unit-test functions if they are isolated enough, and isolation is all relative when the code is in the database. Let's unit test a select table function: first you disable triggers on this table, then you insert test data, then you do your test, then you delete the inserted data, then you reactivate the triggers. Happy? No! Why? Because it will take me a week to write all the test case inserts & deletes!
I'm exaggerating here, but this is what happens in real cases. Cases where the unit test is very hard, much, much harder to design than the code it's supposed to test, not because of bad logic or whatever, but because in the end it's mostly data that's required, and sometimes a simple select unit test requires a metric ton of one-line table inserts and deletes, and there can be a factorial quantity of test cases for tables with more than a couple dozen columns.
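Roughly what that ceremony looks like as a PL/SQL sketch (table, column and function names are all made up); even this toy case is mostly scaffolding rather than assertion:

    -- PL/SQL sketch with hypothetical names.
    BEGIN
      EXECUTE IMMEDIATE 'ALTER TABLE wagons DISABLE ALL TRIGGERS';   -- setup
      INSERT INTO wagons (wagon_id, train_id, priority) VALUES (1, 10, 'HIGH');
      INSERT INTO wagons (wagon_id, train_id, priority) VALUES (2, 10, 'LOW');
      -- ...plus however many more rows a realistic case needs...

      DECLARE
        v_count NUMBER;
      BEGIN
        SELECT COUNT(*) INTO v_count
        FROM   TABLE(get_delayed_wagons(p_train_id => 10));          -- code under test
        IF v_count <> 1 THEN
          RAISE_APPLICATION_ERROR(-20001, 'expected 1 delayed wagon, got ' || v_count);
        END IF;
      END;

      ROLLBACK;                                                       -- undo test data
      EXECUTE IMMEDIATE 'ALTER TABLE wagons ENABLE ALL TRIGGERS';     -- restore triggers
    END;
    /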
"Just peer review" is the sensible and agile way to do it. TDD is not.
The rest of the article is mostly spot on. I loved the power of PLSQL for this particular system (railway undertaking system). I lack distance still, but I have no idea how we would have handled it had the customer said "no oracle, use [insert NoSQL tech of your choice]". The kind of calculations that were required was: "okay this train here is 5mins late and that train there is 6mins late, we have 6 thousands concurrent wagons circulating in the system that need to be rerouted before any train, anywhere, leaves its station, just warn me asap if this incurs any delay in the deliveries of high priority merchandise within the week, but if possible, fix it automatically yourself".
Even the first "myth" that was "busted" had me cringing, putting all the stored procedures into individual files and using a source control system to manage those. The person I replaced had done the same thing with 500+ stored procedures and still had never bothered to actually update the files. He just updated the stored procedures on the development server. Partly because he was a massive fuckup, but also because it was "out of band" management.
I ended up moving the code out of SourceAnywhere (an all around terrible version control system and company) and placing the code into Github, and cleaning up the database so that it could actually be dumped to an SQL file and then loaded from that file as an NUnit test, run by the CI server that I set up.
Even with these huge strides forward, it has taken me months to clean up and fix these problems, while still adding new features to the existing code and database. The experience has left me feeling that the low quality of code that crap coders can get away with in stored procedures, and the lack of testability still leads me to strongly avoid stored procedures because I ended up having to fix them with sweat, tears, and blood.