I'm here to declare my love for SQLite, a database that doesn't ask me to choose a root password (wtf, everybody hates passwords) and an application user password (double wtf), doesn't open a socket (be it Unix or TCP), doesn't install a daemon somewhere in the system, doesn't need root to start, doesn't ask me to create a new dba user on my system, doesn't require me to understand how the root daemon starter hands control over to the dba user when something goes wrong at startup, and generally doesn't bother me with stuff that has really nothing to do with storing and querying data.
I don't run it in production only because I would have a hard time convincing everybody to do it, or because of my own cargo-culted fears (depending on the project). But one day I will.
I very rarely end up in a position where I have to decide between Postgres and SQLite; they have very different use cases.
So in short... if anyone manages an SQL injection, they have access to all tables and all schemas of any app that can reach your database. Privilege separation is a very good idea. Not opening the database file directly is a very good idea too.
Just to be clear, I'm not saying every application needs it. If you actually use SQLite as a replacement for local files, that's great. But for a multi-user app containing sensitive data - "root password (wtf, everybody hates passwords)" is so wrong. It reminds me of "chmod 777 everything" - wtf, everybody hates dealing with permissions.
The gp point that sqlite gets rid of all the "stuff that has really nothing to do with storing and querying data" still stands.
SQLite eliminates almost all of the management overhead that other databases force on you, so it's trivial to have one database per application. Indeed, I have never seen anyone bother with a multi-application SQLite db, which could also cause serious performance problems. So that renders your SQL injection point moot.
As for permissions, it's better to let the OS handle that than some half-baked database permissions system. Root accounts/passwords are a wtf in a production environment because they destroy auditability. If you really want permissions, it is trivial to create a system user account for the application and assign the correct file permissions, or even isolate particularly sensitive information in its own file. Combine this with AppArmor to further lock down individual applications.
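To make the file-permissions angle concrete, here's a rough sketch; the path and the user name are made up, and the chown/useradd steps need root so they're only shown in comments:

```shell
# Lock an SQLite database file down to its owning account.
# (Path and user names here are illustrative.)
DB=/tmp/myapp-demo/app.db
mkdir -p "$(dirname "$DB")"
touch "$DB"

# Owner read/write only: no other local user can open the file.
chmod 600 "$DB"

# In a real deployment you would also run the app as a dedicated
# system user and give it the file (both require root), e.g.:
#   useradd --system --no-create-home myapp
#   chown myapp:myapp "$DB"

stat -c '%a' "$DB"   # prints 600
```

That plus an AppArmor profile on the application binary gets you most of what a coarse database grant system would.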
All of this is overkill, but so is a full blown RDBMS in many cases. If you are starting out building a new application, just target sqlite until you actually need something bigger. Odds are, YAGNI.
> Sqlite eliminates almost all of the management overhead that other databases force on you
Multi-user support is what forces that overhead on you.
It's just that SQLite consciously aims to be a single-user system.
> As for permissions, its better to let the OS handle that than some half-baked database permissions system
This is nonsensical. There isn't an isomorphic mapping from the Unix permissions model to SQL's permissions model, particularly when you start introducing stuff like row-level permissions.
If you write an application which uses a DB backend, and you require multi-user capability, you can either delegate some of that to the database, or roll your own. In which case, half-baked permissions systems are back! Especially if you can't isolate database connections (because it renders any app-code permissions system moot if you allow 3rd party code execution, such as plugins).
All that overhead is there for a reason. Those reasons are irrelevant in the single-user case. But they are not irrelevant in the multi-user case.
SQLite's classic use case is where SQL-like access to data would be convenient but a full RDBMS would be tremendously inconvenient, e.g. embedded systems. People seem to be citing inappropriate environments for SQLite as reasons to say it is of no value. You get the same sort of silliness from NoSQL zealots.
> As for permissions, its better to let the OS handle that than some half-baked database permissions system.
Sorry, but that doesn't make sense. For example: if you want your application to be able to add new rows but not delete them (for audit purposes), you cannot do that with OS permissions. Only database-layer permissions make it possible.
There are many other examples of such use-cases. You simply cannot use OS/apparmor/selinux/whatever external to the database to grant/revoke per-table access.
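Funnily enough, SQLite can give you exactly the append-only behavior at the library layer, via its authorizer hook. A minimal sketch in Python (table name invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY, entry TEXT)")

# Deny DELETE at prepare time, allow everything else: an append-only
# table enforced by the database library rather than the OS.
def no_deletes(action, arg1, arg2, db_name, trigger):
    if action == sqlite3.SQLITE_DELETE:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK

conn.set_authorizer(no_deletes)

conn.execute("INSERT INTO audit_log (entry) VALUES ('user logged in')")  # allowed

try:
    conn.execute("DELETE FROM audit_log")
except sqlite3.DatabaseError as e:
    print("blocked:", e)  # the statement is rejected before it runs
```

It's per-connection rather than per-account, of course, which is the grandparent's point in a nutshell: the owning application has to be the one enforcing it.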
> Root accounts/passwords are a wtf in a production environment because they destroy auditability.
On that side, however - why would anyone know or have access to your root password? Applications should never have it. Servers should never have it (in any form other than the hash). Only people should have it.
You could, as you mentioned, set up a new user for each database. At a certain point, the permissions systems in a traditional RDBMS may be easier to manage.
What would actually get rid of SQL injection is the database API: removing the general-purpose query(string) function and forcing some structure on the parameters.
First a little breakage to get the easy injections out, then progressively forcing the whole host-language mapping of the query, including the result schema. People will scream about the debugging, but they will gain a little in exchange (like projections in a variable, WHERE-clause re-use, etc.).
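Bind parameters are essentially that structure already; a sketch with Python's sqlite3 (table and values invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Dangerous: building SQL by string interpolation lets input rewrite the query.
#   conn.execute("SELECT * FROM users WHERE name = '%s'" % name)

# Structured: the value is bound as data and never parsed as SQL.
hostile = "x'; DROP TABLE users; --"
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))

row = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchone()
print(row[0] == hostile)  # True: stored verbatim, table intact
```

Going further than this - mapping the whole query and its result schema into the host language - is what typed query builders do, at the debugging cost mentioned above.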
It's a single user database, hence it doesn't need multiple users. It's great for a single user application (e.g. Firefox) but not at all good for multi-user, multi-session applications.
I don't intend to say it's no good, btw. It's fantastic! But you have to use it where it is meant to be used. You aren't comparing apples with apples, I'm afraid.
If you are using SQLite to host a production-scale app, you are doing it wrong.
If you want a database for your mobile app or a web app, use a full-fledged database like MySQL or Postgres.
SQLite is for embedded storage applications, especially in the backend - where the resource utilization footprint is supposed to be minimal, and where you run into situations where you are reimplementing SQL-like DSLs to query and write to files. Or say you just wish to maintain some structured form of inventory.
The best example I can give you is a web browser. There is a requirement to store a good deal of structured information, in a mechanism that requires little resource utilization and doesn't require additional software like daemons or a server, complicated deploy-and-maintain scenarios, or even a strict configuration.
If you look at this use case, you will see tons of backend applications that would be better off using a tool like SQLite than playing with huge XMLs and endlessly rewriting parts of SQL badly.
Most of this is not true for PostgreSQL either. The only things PostgreSQL needs to run are a daemon (it does not actually have to be daemonized) plus a Unix socket; passwords, user accounts, root privileges, etc. are all optional.
Sure, SQLite (which I love) is easier to get running, but do not exaggerate the issues of traditional RDBMSs.
PostgreSQL 9.3, which will be released in a couple of weeks, also fixes the cases where you may need to use sysctl to increase shared memory limits.
My favorite thing about SQLite is the business model. My understanding is the product is free, but they charge for their test suite. If you build hardware that is going to use SQLite, then you should pay for the test suite, and that funds the future development. It is a self-reinforcing business model: revenue from a rock-solid project pays to make it even more rock solid.
I don't know how much the test suite costs by itself, but it is included in SQLite Consortium membership, which is a typical open-source support-style business model. For $75k, you get almost a month of SQLite developer time and tons of support (e.g. their home phone numbers). Not a bad deal for companies that want deep SQLite integration or customized versions that fit on tight embedded systems.
Given that SQLite is intended as a replacement for fopen(), I'm not sure how many normal files an application uses (or should use) are (or should be) encrypted. I cannot think of a lot of use cases.
At work we need to because we are distributing map data that was licensed to use under the condition that we don't ship the data unencrypted to the clients, however nonsensical that is, given that you have to distribute the encryption key the same way.
But for use cases like Lightroom which stores its catalog in an SQLite database or Far Manager which stores the config there I very much doubt they should be encrypted by default.
> Given that SQLite is intended as a replacement for fopen(), I'm not sure how many normal files an application uses (or should use) are (or should be) encrypted. I cannot think of a lot of use cases.
SQLite has been under development for thirteen years and has hundreds of releases. I don't think the "feature X? you're doing it wrong, we're just a simple replacement for fopen()!" refrain holds a lot of water at this point.
FTR when you work in the financial industry, all kinds of files and databases get encrypted. I agree that encrypting a file or database is not extremely secure since the decryption key is sure to be nearby, but it's still an important need, there are security audits and all kinds of things that call for encryption when available.
That said, it's the financial industry. If the encryption feature is offered as a paid one, that's not so terrible.
Back in the sands of time, Sun also did this with Java. IIRC the associated license was a great cause of pain for open implementations of the JVM, since without passing the suite you couldn't call yourself Java, and with passing the suite came (insert some horrible restriction here that I forget).
Without passing the suite, you couldn't get the patent licences, so Sun could put you out of business if they chose to. Sun used the test suite to define what was and wasn't "Java" (you had to pass the whole test suite or you couldn't ship product) & thereby prevented anyone else from using the core language in interesting ways that Sun didn't approve of.
Google did an end-run around these restrictions by using the Dalvik virtual machine instead of the JVM where the crucial patents lurked, so Android didn't have to implement the whole of the Java library before they shipped.
SQLite is one of the most important pieces of software you never see. I've used it in projects from many years back to recently and you know what's awesome about it? It Just Works.
Unless you want to use it with multiple users and connections, in which case it stops working. It's good for embedding data storage in clients and testing.
Until 2010, I ran a forum on SQLite with 600 users/day and around 1000 posts/day on a single OpenBSD machine with 2GB of RAM and a 7200RPM HDD. We moved to Postgres when we passed 1600 posts/day.
There was some minor slowdown noticeable around 6AM - 10AM, but besides that, we never had major issues. The bottleneck was usually the network, not the DB.
You can run something like that with no noticeable performance issues on a classic 120MHz Pentium and flat files (I'm speaking from experience of running a USENET news server handling 30,000 groups for an ISP on a machine like that, which was shared with lots of other functions), so if the bottleneck had been anything other than the network, it'd have been shocking.
I thought SQLite locked the entire DB when doing an INSERT. That seems like it would slow a forum app considerably. Though I guess at 1600 INSERTs per day - roughly one per minute, assuming an even distribution throughout the day - you won't have that many lock collisions.
The sqlite locking model can eat 1600 inserts before breakfast with only minor lock contention. The whole file lock doesn't take effect until the moment the write is ready to go to disk.
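A sketch of the two knobs that matter for this kind of write load - WAL mode (available since SQLite 3.7.0) so a writer doesn't block readers, and batching inserts into one transaction so 1600 rows cost one sync instead of 1600 (schema invented):

```python
import sqlite3, tempfile, os

# WAL needs a real file; an in-memory db won't do.
path = os.path.join(tempfile.mkdtemp(), "forum.db")
conn = sqlite3.connect(path)

# WAL keeps readers running while a write is in progress; the classic
# rollback journal takes the whole-file lock instead.
conn.execute("PRAGMA journal_mode=WAL")

conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")

# One transaction = one sync to disk for all 1600 rows.
with conn:
    conn.executemany(
        "INSERT INTO posts (body) VALUES (?)",
        (("post %d" % i,) for i in range(1600)),
    )

print(conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0])  # 1600
```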
Writes were generally quite fast so the next write rarely got queued. Also we ran it single-threaded. When we moved to Postgres, that also gave us the opportunity to safely switch to phpfpm without worrying about the lock issue.
The nice thing about Postgres as opposed to SQLite is that we finally got "searching" as a feature. ;)
For comparison, 8 years ago I wrote a queuing system that did either in-memory queues or wrote to SQLite for durability. With a little bit of tuning, the SQLite queues could handle at the very least hundreds of thousands of messages a day - we never pushed them to their limit.
We always limited transactions for UPDATES and INSERTs that involved more than one table.
It was quite the learning experience, since you do have to think of different ways of working with data. You learn very quickly the difference between what you'd like to store and what you actually need to store to get a particular piece of functionality. But that turned out to be a boon in the end, because we got a much simpler forum as a result. Fewer bells and whistles meant users focused on actual discussion and not ancillary, shiny bits and bobs.
But on a machine with 2GB of RAM running a forum that small, pretty much none of those selects should ever need to hit disk - it should all pretty much be in the buffer cache, unless there were lots of full text searches of the entire posting history.
EDIT: And that is without any app specific caching.
Put simply, you thought wrong. SQLite is one of the most-used pieces of software out there, absolutely including production software. http://www.sqlite.org/mostdeployed.html
Please clarify "for development purpose only"? SQLite is used in a lot of production software you rely upon daily, and is one of the most robust, useful pieces of code going.
Now it's an embedded database, which is why leokun's comment is out of place. However you can absolutely drop a queue in front of it, for instance, and serve many "simultaneous" users for a purpose such as a web forum.
I don't think you were downvoted because you were wrong. I like to think the HN community isn't that shallow.
Downvoting is usually due to posts containing no new information, excessive snark, a flippant attitude, etc. Users and moderators are extra touchy about avoiding the Reddit ambiance here. Basically, if it doesn't contribute to the discussion, it may be downvoted. That said, I didn't downvote you, because what you said allowed me to add to the discussion.
As of version 3.8.0, the SQLite library consists of approximately 84.3 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 1084 times as much test code and test scripts - 91452.5 KSLOC.
"Query planner are what make SQL database engines so amazingly useful and powerful. (This is true of all SQL database engines, not just SQLite.)"
No it isn't. I have used SQL databases with absolutely horrible query optimizers that totally ruined any performance expectations. I have blogged about that; I was very frustrated and disappointed with the results.
Which is faster: id>100, or simply walking the database with a series of selects (select ... id=101, select ... id=102, ...)? It turns out the latter method is much faster. The engineers then suggested I could optimize the query by using the structure select ... id=101 UNION select ... id=102. That's simply insane. However the query is written, the query optimizer/planner should make sure all of these queries are served by a similar background process. Of course the walk is impossible to optimize, because it's a series of individual queries. But it's just crazy that select id>100 is slower than that!
Yes, you might have guessed that the SQL database I'm referring to isn't SQLite.
And finally, I love SQLite and use it for most of my (work & hobby) projects as local database.
Just curious, what database is that? If the index on id is a B+ tree rather than a hashtable, it's very unlikely id > 100 would be slower. You are issuing one query with it vs. N queries with the 2nd approach.
Were there a lot of qualified rows for id > 100? If there're a million rows qualified for id > 100, which way is faster?
The optimiser might not be using an index in the best way. Sometimes you have to guide the optimiser, and it may not have enough statistical data to find the best plan. One way of doing this is to rewrite the query. In this case, if I had to hazard a guess, the developers suggested you break the WHERE clause down from OR conditions into a UNION ALL, which may influence the optimizer to choose a better plan.
Adding statistics or an index may help. Just understand that if your data distribution changes radically the index or rewritten query may no longer be valid.
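At least inspecting the chosen plan is cheap. In SQLite, EXPLAIN QUERY PLAN shows whether the id > 100 case gets the index range scan you'd hope for; a sketch (table invented, assuming id is the INTEGER PRIMARY KEY):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO t (id) VALUES (?)", ((i,) for i in range(1, 201)))

# EXPLAIN QUERY PLAN reports the chosen strategy without running the query.
for row in conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE id > 100"):
    print(row)
# Expect a single range SEARCH on the rowid/primary key, not a full
# SCAN - and certainly not 100 separate point lookups.
```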
I love query planners and optimisers. However, here's where I get a bit bemused: as it states in the article, a query planner must often go on incomplete information. Therefore you have to know the underlying assumptions built into the planner.
This is where explain/execution plans are great. Unfortunately, even these aren't completely documented. SQL Server is not bad, and Postgres is not only well documented but you can read the source. Oracle is often a basket case - there are literally thousands of undocumented parameters you can set, not to mention underlying data views that also haven't been documented. In fact, there have been whole books trying to unravel the Oracle CBO, and even the most thorough of them admit that Oracle fixes optimiser bugs in ways that can impact query plans.
In some cases, you must implement hints, because there is no way of catering for all queries under all loads based on all possible data distributions. Sure, you can add a variety of different types of indexes, histograms and other statistics to try to influence the execution plan, but at the end of the day sometimes you simply have to add a hint to force a particular sort of join, etc.
I guess my main point as I muse on query optimizers/planners is that none of them are perfect. It occurs to me that one of the best so far is actually Postgres, but alas - they don't allow for hints.
What's my main point here? While the ideal for database queries is a declarative syntax ("give me all rows of product x that are over $100") without having to worry about the mechanics of retrieving the data, that's just not entirely possible. You really need to give developers and database folks the ability to nail down the actual mechanics of data manipulation.
The write-up of the NGQP seems (at least superficially?) weird: they say the reason for the rewrite was poor performance on a particular artificial benchmark (an 8-way join... how often do people join 8 tables in practice?). At the same time, I was surprised to learn that the only join algorithm SQLite supports is nested loops. I would think they would benefit more from adding hash and merge joins than from rewriting the QP from scratch.
If I were to judge from my current experience, I'd say quite often - more often than queries with fewer than 8 joins. In the stored procedure I'm working on now, there are many queries with up to 20.
People in this thread say SQLite is fine for apps with lots of concurrent writes (people advocate running forums and other multi-user apps on it), but that hasn't been my experience. I wrote a blog[1] in Python that uses an SQLite database, and on each post view a view counter was incremented[2] using a simple UPDATE ... views = views + 1 on the post row. Removing this update increased page speeds hugely, from ~10 requests a second to ~200 or so. Maybe I was doing it wrong, but I can't see how.
I'm not familiar with the ORM here, but is it really just an increment ("update posts set ctr = ctr + 1 where id = 9") or a select, add one in code, and then an update? Also bear in mind that one operation in an explicit transaction like this is quite slow, at least prior to WAL.
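For comparison, the single-statement form looks like this in plain Python/sqlite3 (schema invented). If the ORM instead does a select, an increment in code, and then an update, each view costs two statements and a longer-held write lock:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, views INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO posts (id) VALUES (9)")

# Single-statement increment: no read-modify-write race, and the write
# lock is held only for the duration of one UPDATE plus its commit.
def bump(post_id):
    with conn:  # commit immediately; don't hold the transaction open
        conn.execute("UPDATE posts SET views = views + 1 WHERE id = ?", (post_id,))

for _ in range(10):
    bump(9)

print(conn.execute("SELECT views FROM posts WHERE id = 9").fetchone()[0])  # 10
```

Even so, each commit is a sync to disk, which is plausibly where the ~10 req/s ceiling came from on a pre-WAL setup.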
Ah, the beta presupposition:
SQLite version 3.8.0 is actually one of the most heavily tested SQLite releases ever. Thousands and thousands of beta copies have been downloaded, and presumably tested...
I really wish there was something like SQLite to replace MS Access - actually, I think what I really want is something to replace MS Excel that isn't MS Access. The ability to have a self-contained, file-based database that can be moved around and shared, but that can also hook up to external data sets, has a lot of use cases in enterprises. But it would require a good query view, the ability to create forms and reports, and a sane SQL dialect.
SQLite version 3.7.15.2 2013-01-09 11:53:05
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table a(id integer);
sqlite> create table b(id integer);
sqlite> create table c(id integer);
sqlite> select a.id, b.id, c.id from a
...> left outer join (b join c on b.id=c.id) on b.id=a.id;
Error: no such column: b.id
SQLite is now the only major database I'm aware of that still can't handle this syntax (PostgreSQL, MySQL, Oracle, SQL Server, Firebird, etc. have no problem with it; I'm also well aware of how to use a SELECT subquery as a workaround).
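For reference, the subquery workaround looks something like this - the parenthesized join becomes a derived table, which works even on versions where the bare form fails:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a(id INTEGER);
    CREATE TABLE b(id INTEGER);
    CREATE TABLE c(id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1);
    INSERT INTO c VALUES (1);
""")

# (b JOIN c ON ...) rewritten as a derived table with aliased columns.
rows = conn.execute("""
    SELECT a.id, bc.b_id, bc.c_id
    FROM a
    LEFT OUTER JOIN (SELECT b.id AS b_id, c.id AS c_id
                     FROM b JOIN c ON b.id = c.id) AS bc
      ON bc.b_id = a.id
    ORDER BY a.id
""").fetchall()
print(rows)  # [(1, 1, 1), (2, None, None)]
```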
Supporting some crude form of stored procedure would be nice too (you can already add SQL functions, why not procedures?), so that a 3rd party can provide semi-native ALTER support (don't worry, I don't want to put ALTER into SQLite itself, I understand it requires rewriting of table data, I'd just like there to be a standard implementation of that approach somewhere).
Is there something about SQLite that prevents Postgres from replacing it completely (a kind of dynamic Postgres that starts up and shuts down like SQLite)? With MySQL, Postgres, and SQLite, the open source db industry gets diluted. Some kind of merger would be great.
edit: to the posters below, I'm fully aware SQLite is embedded; I use it every day for Android. I'm just asking if there's actually a technical reason why SQLite and Postgres (and MySQL) couldn't be merged into a db that can be either embedded or run as a server. This would reduce the duplication of effort in the open source db community.
SQLite isn't a replacement for postgresql (or vice versa). SQLite is a replacement for fopen.
Among its many advantages in that situation: a database is contained in a single file with minimal overhead, the source is shipped as one .c file and encouraged to be statically linked, and the code size is relatively small (compared to something like postgresql).
Why exactly do you need stored procedures in sqlite? You can just write code. Are you using it in a scenario where you can't write code to interact with sqlite?
Why would you want to use PostgreSQL in the contexts that SQLite is used?
SQLite is fantastic when you need a data store for mobile or embedded use, you'd like ad-hoc queries over your data with the option of adding indexes to speed up frequently used ones, you want transactional update to the data, and multi-user access is not a problem.
It's basically a much better replacement for the hash table on a disk approach of DBM libraries.
PostgreSQL would just be total overkill for this kind of scenario.
I just would like the simplicity of use of sqlite on my server. No socket, no connection, no driver that has to match the server, no config file in /etc, no user in /etc/passwd, no cruft.
The separation between client and server that brings all the 'cruft' makes it a lot simpler to have high-performance concurrent operations - a major advantage of a full DBMS over SQLite.
> I just would like the simplicity of use of sqlite on my server.
The simplicity comes largely from not supporting multiple user accounts (authn/authz are left to the owning application), and secondarily (I think) from not being designed to support large numbers of connected processes. If your server's needs fit that profile, then you can use SQLite and benefit from its simplicity. But if - as is often the case - they don't, you can't, and it's not because no one has merged SQLite with PostgreSQL; it's because there is inherent complexity that comes with serving certain use cases.
If postgresql could have a single file, embedded mode, then it could replace sqlite.
The advantages of such a design would be:
1. All the optimisations and transaction guarantees of a full-featured DB
2. All the conveniences of a single file DB
3. Better utilisation of open-source resources as the OP points out.
On the other hand, the postgresql system might be bigger (in terms of code size and heap usage) than sqlite. So it may not be feasible to use in all embedded applications.
I think an embedded mode is more realistic than a single file, since the single-user mode used by e.g. pg_upgrade is already quite close to that. Some features are still missing for wide use, though.
I guess when you want a pure Java solution. H2 is good in that it's just a jar file. You pre-configure and package everything up in one bundle. There's no setup; things just work when you copy the bundle over.