The slides recommend UUIDs as a primary key ("just use UUIDs. seriously"). I took a look at the manual and there isn't much of a Why (as well as 4 different versions of UUID to choose from). Anyone have a write-up that explores UUIDs vs integer primary keys? I was only able to find people asking permission ("I have X with a UUID, can I use it as a pk?") rather than people talking about why you would prefer UUID as the default.
Sidenote: Literally any button press you could reasonably expect to move to another slide works. PgDown, Right, Down, Space. Stop shitting up a decent submission to whine about it.
The biggest reason is that your IDs become universally unique - across shards, across database recoveries, rollbacks and session problems, you name it. These are all the kinds of things that can happen over the lifespan of a dataset.
In Postgres, UUIDs are stored in 128 bits, so the overhead is negligible in almost all use cases. If you can save yourself a day or two, or three, or a week of work just by switching your mental default from an integer to a UUID, that's a pretty big win.
I see some very good points have already been made, but I think UUIDs are also good for at least three other reasons:
1. Offline / sync - you can have out-of-sync or offline clients (e.g. mobile apps) inserting new data without worrying about pk mismatches or collisions.
2. Security - it's obviously a bit of security through obscurity, but not being able to guess the key of an object can be very useful sometimes. It shouldn't be relied on as a first line of defence, but it gives you a naturally impossible-to-guess reference. If you have an image store or any other objects that are individually accessible but that you don't want crawlable in bulk, UUID is your friend.
3. Silly mistakes - If you pass an object id, you really want to know it can only be for one object. Maybe the front-end is sending an id of one object, but the backend is working on another (shit happens). When you use sequential ids, you increase the chance of weird stuff going on. With UUIDs you simply can't use the wrong id, even by mistake.
I've found UUID and other kinds of random primary keys to be very bad for performance once the data set no longer fits into memory. When that happens, the working set of the primary key index becomes that of the entire index, so you end up with a lot of random disk I/O. Unless you have SSDs, your database performance then goes down the drain.
Of course you can also use a UUID that's less random, e.g. something with an incrementing first part and a small random last part.
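For example, a minimal sketch of that approach in Postgres (this assumes the pgcrypto extension for gen_random_bytes; the function name and the time/random split are illustrative, and the result isn't a spec-compliant v1/v4 UUID, just a sortable 128-bit key):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- First 12 hex digits come from the epoch in milliseconds, the remaining 20
-- from random bytes, so new keys sort roughly by insert time but stay unique.
CREATE OR REPLACE FUNCTION gen_time_prefixed_uuid() RETURNS uuid AS $$
  SELECT (lpad(to_hex((extract(epoch FROM clock_timestamp()) * 1000)::bigint), 12, '0')
          || encode(gen_random_bytes(10), 'hex'))::uuid;
$$ LANGUAGE sql VOLATILE;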
If you want UUIDs to double as a nonce (I use them this way), then you will need to use v4 UUIDs, and yes, they are basically impossible to index efficiently because they are uniformly distributed across a very large range of possibilities.
If you want incrementing UUIDs, use v1 UUIDs, which are based on a MAC address + a timestamp. These change in a predictable way and are more indexable.
Using UUIDs (or anything other than sequentially increasing numbers) for primary keys is a good idea because, when you have a high number of concurrent inserts and a sequentially increasing primary key (implemented via a b-tree index), all the inserts into the index land at the end, in the same place, creating a hotspot. If the key is a random number (like a UUID), the index entries land in different places, avoiding the hotspot.
Yes, but with many threads modifying the same structures, you end up doing a lot of locking.
So your system would not be memory-bandwidth constrained: not because it has many threads using roughly the same memory blocks, but because the locking means it has one thread at a time working on those memory blocks.
You will find that UUIDs are often used as primary keys in distributed systems. By their nature you can generate them independently without worrying about collision and without any coordination overhead (e.g. querying a central authority).
Sort was a design requirement, but not a reason not to use UUIDs, which are trivially sorted by date (in fact, Cassandra not only does this, but in many contexts actually renders UUIDs as timestamps). So, while that rules out various other implementations, the issue with UUIDs was just the size.
What they ended up with was pretty much UUID but for a more limited deployment: the real issue is just that UUID is extreme overkill, designed to allow every individual user to generate unique identifiers (something that sounds cool, but in practice is worthless as you can't trust the clients).
Correct. However, that is both the generally fair assumption when dealing with use cases that actually care about UUIDs, and is the fair thing to compare Snowflake against, as Snowflake is pretty much a 64-bit limited-entropy implementation of the general concept of a v1 UUID.
I was under the impression that the reason they went with a home-baked id generator was to be able to encode the logical shard id into the id. I think UUID v1 encodes info about the hardware (MAC address), but their data changes physical hardware as they split it up onto new shards.
Where the data ends up is not important to its identifier: the reason to have a "logical shard ID" is to provide for sequence and timestamp uniqueness.
The way to think about the problem is that at the granularity of your timestamp you lose the ability to uniquely generate identifiers across multiple nodes (on a single node, this is handled with the sequence number), so you need some kind of identifier for the instance of the generator itself. With UUIDs, a relatively immense number of bits is spent storing the computer's MAC address; but if you know you only have a smaller-scale deployment and are able/willing to centrally plan identifiers for the deployed nodes, you can get by with something akin to this "shard ID".
(BTW, note that a v1 UUID need not be generated with a true MAC: the specification notes both that you can just buy node addresses for a fairly small cost, and that you can use fake addresses that are marked as such by setting the multicast bit. The result is that if you want to use UUID v1 off-the-shelf with "shard IDs" you are more than welcome to do so, not just in the "analogously-equivalent" sense but in the "to the letter of the spec" sense. You can find more information in Section 4.5 of the UUID specification.)
Actually, they also noted that it makes mapping easier. By including the logical shard ID in the ID, they don't need to keep a giant index of IDs-to-shards to figure out which machine an ID lives on. Just a tiny mapping of logical to physical shards, which every app server instance can cache in memory.
One less moving part at the expense of wedding yourself to a prepartitioned logical shard scheme, I guess. (I wonder how painful it would be to rebucket data into a different logical shard should the need arise...)
The Instagram guys said they didn't actually bench using a UUID + created_at column when they spoke at SFPUG, they just used their approach because it was recommended to them and seemed to work. That said, it is a pretty awesome robust mechanism, it's just arguably complex and not proven to be necessary even at their scale.
This allows far greater flexibility with regard to sharding and scale. Bigint with a function generated nextval will sustain up to 9223372036854775807. If you need more than that, your next option is Numeric, but I don't think we'll exhaust bigint any time soon.
You still need to correctly merge the records for aggregate reporting.
If you use UUIDs, this is easy. If you used SERIAL, you now need a scheme to intelligently merge records. Not hard when everything is in a single table. A bit tougher when you have a wholly or partly normalised schema, because now you need to come up with correct renumbering schemes for every single table which has another table holding a foreign key into it.
I can tell you from personal experience that this is not fun.
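To make that concrete, the merge you end up writing with SERIAL keys looks roughly like this (a sketch; the table names are illustrative, and it assumes any foreign keys on the staging copy are deferred or re-added afterwards):

BEGIN;
-- shift the incoming ids past the existing range, in the parent table
-- and in every table holding a foreign key into it
-- (order_items' own serial id would need the same treatment)
UPDATE staging.orders SET id = id + 1000000;
UPDATE staging.order_items SET order_id = order_id + 1000000;
INSERT INTO orders SELECT * FROM staging.orders;
INSERT INTO order_items SELECT * FROM staging.order_items;
-- and fix the sequence so new inserts don't collide
SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));
COMMIT;

With UUID keys the rows can just be inserted as-is.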
v1 UUIDs are "guaranteed" to be unique, modulo people fiddling with your MAC addresses. v4 UUIDs are random keys and the odds of collision are so infinitesimal that it's worthless to fuss about it. v4s also have the nice property that they can double as nonces, but the sucky problem that they are not that great for indexes.
Then there's stuff like replaying logs reliably on multiple machines, blah blah blah. Every instance of 186360 is indistinguishable from every other instance of 186360. So you will need additional logic to keep them straight.
But take this uuid:
69c8e822-83b0-11e2-a6a5-001f5bfffe14
The odds that this has been created anywhere else, at any point, are as good as zero. You're more likely to have an integer key collision due to cosmic radiation flipping some bits.
So no special merging / field-source tracking logic is required.
I can't be too sure of the specifics, but our db admins decided against UUIDs after a few days debating and lots of coffee.
We aren't using serial, but instead using functions to generate our own nextvals with the server IDs, epoch etc..., kinda the same way as Instagram (though, obviously, our databases are nowhere near the same size). It came down to how much logic we were comfortable implementing in the schema from the start to avoid continuing complexity in the applications themselves.
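For anyone curious, the Instagram-style generator is roughly the following (a sketch based on their published write-up; the custom epoch, the hard-coded shard id and the sequence name here are illustrative):

CREATE SEQUENCE table_id_seq;

CREATE OR REPLACE FUNCTION next_shard_id(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;  -- custom epoch in milliseconds
    seq_id bigint;
    now_millis bigint;
    shard_id int := 5;                  -- this node's logical shard id
BEGIN
    SELECT nextval('table_id_seq') % 1024 INTO seq_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    -- 41 bits of time, 13 bits of shard id, 10 bits of per-shard sequence
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE plpgsql;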
A natural key as a primary key is terrible practice because I've never even once seen a natural key that wasn't tied to business logic. Logic that potentially can change.
Don't do it, you only make it worse for yourself if you pick a natural key for a PK!
One note: Postgres is well over 18 years old; Postgres using SQL is 18 years old. Postgres had its own query language (POSTQUEL), which was replaced with SQL in Postgres95, which in turn became PostgreSQL.
PostgreSQL evolved from the Ingres project at the University of California, Berkeley. In 1982, the project leader, Michael Stonebraker, left Berkeley to make a proprietary version of Ingres. He returned to Berkeley in 1985 and started a post-Ingres project.
Starting in 1986, the team published a number of papers describing the basis of the system, and by 1988 had a prototype version. The team released version 1 to a small number of users in June 1989.
> caveat: WITH expressions are optimization boundaries
This is a pretty big caveat and one of the rare areas in which Postgres does worse than other database systems. In SQL Server, a non-recursive common table expression is treated by the optimizer similarly to a macro: you can break up a complicated query with unions and group-bys into easier-to-read CTEs and be confident that the optimizer will piece them all together into a query whose execution plan is equal to the original one.
But with Postgres you can't, because of the optimization boundaries. Its optimizer will attempt to make each CTE as individually efficient as possible, which can lead to a much worse execution plan overall. You can use views instead, which, in contrast to CTEs, the Postgres optimizer can see through.
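For instance, wrapping the join in a view (the names here are just illustrative) gets inlined into the outer query, so the planner can push a LIMIT or WHERE from the caller down into it, which a WITH clause currently prevents:

CREATE VIEW user_images AS
  SELECT users.*, images.path
  FROM users
  JOIN images ON images.id = users.image_id;

SELECT * FROM user_images LIMIT 1;  -- planned as one query, limit included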
Are there more details about this? As a regular CTE user in both Postgres and MSSQL who was unaware of this difference, I would like to know where I may be shooting performance to hell.
> A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)
In other words, something like:
with x as (select * from users) select * from x limit 1
will be executed identically to
select * from users limit 1
However, a query like this:
with x as (
  select * from users
  join images on images.id = users.image_id
)
select * from x limit 1
will incur a huge overhead compared to:
select * from users
join images on images.id = users.image_id
limit 1
In my test, the "WITH" query had a cost of 97840 and touched 8881 buffers, compared to a cost of 132 and 41 buffers for the second query.
In other words, the planner pretty much considers the "WITH" subquery separately from the outer query, and plans them separately.
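If you want to see this on your own schema, comparing the two plans with buffer statistics makes the boundary obvious (the tables here just mirror the example above; your costs will differ):

explain (analyze, buffers)
with x as (
  select * from users
  join images on images.id = users.image_id
)
select * from x limit 1;

explain (analyze, buffers)
select * from users
join images on images.id = users.image_id
limit 1;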
Some keyboards on mobile have arrow keys. I know Swype does and so does swiftkey.
edit: couldn't get the keyboard to pop up on Android, but putting it into landscape mode worked for swiping between slides (on the stock browser with Android 4.2 [not Chrome]).
Some of it is in the standard, some of it is PG specific. MySQL is pretty terrible and doesn't support any of it that I'm aware of. I expect Oracle has their own versions of much of this stuff, but I've made it a hobby in my life to not learn Oracle.
Oracle has subquery factoring (the "with" clause) and recursion; native arrays, hash maps, collections, nested tables, and other complex data types; remote database links; queues (similar to listen/notify); regex querying; attribute-level timezone support; extremely powerful window functions; text search; tons of built-in crypto (if you really insist); and most of the related things on this list.
The biggest killers (vs. Postgres): no simple helpers (generate_series, for example, is achievable with simple, efficient recursion, but you have to write the whole statement); a truly antiquated client program (sqlplus); no built-in json, though you do have associative arrays/hash maps and nested tables, so you can achieve exactly the same thing, and it's indexable*; and no range types (meh imo).
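(The generate_series workaround in Oracle is either a recursive WITH or the old CONNECT BY LEVEL trick, for what it's worth, e.g. SELECT LEVEL AS n FROM dual CONNECT BY LEVEL <= 100;)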
I think I've had Oracle blow up on multiple AS statements, so that's one place where Postgres would come out well in front.
Basically WITH-AS lets you declare temporary named subqueries. That is, a pretty stock-standard programming technique -- give a name to a fiddly expression to simplify subsequent code. Very useful.
Oracle has lots of annoyances though. No SERIAL or IDENTITY type. Having to write the same boilerplate sequence/insert trigger/update trigger over and over and over is very tedious and a recipe for subtle errors when you forget one of them or mess it up.
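The boilerplate in question looks roughly like this on the Oracle side (a sketch; the names are illustrative, and newer Oracle versions let you assign :NEW.id directly):

CREATE SEQUENCE t_seq;

CREATE OR REPLACE TRIGGER t_before_insert
BEFORE INSERT ON t
FOR EACH ROW
BEGIN
  SELECT t_seq.NEXTVAL INTO :NEW.id FROM dual;
END;
/

versus the Postgres one-liner:

CREATE TABLE t (id serial PRIMARY KEY, name text);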
No boolean type in SQL. I don't even know what to say about that.
"No boolean type in SQL. I don't even know what to say about that."
Tom ties himself into unbelievable knots trying to defend this. It's the purest form of technical Stockholm Syndrome I've ever seen from a putatively intelligent person.
It's worth noting that using some of these features bump you up into a different pricing tier. And last time I checked the usual way to achieve a recursive query was with the non-standard "CONNECT BY" syntax. I have found rlwrap does a lot to mitigate the pain of sqlplus, but I still use it (and Oracle) as little as I can possibly get away with.
I actually wind up using both. But SQL Developer is pretty heavy and I prefer using psql to a GUI tool with Postgres, so I wind up in sqlplus a lot (when dealing with Oracle, which is seldom).
WITH and window functions are portable (and both awesome - they will change the way you write your SQL for the better). I'm not sure if WITH RECURSIVE is portable - my recollection is that oracle and db2 at least don't need/use the RECURSIVE qualifier for recursive ctes: they just work it out for you.
Time intervals are also in the SQL standard as I recall (although not in the format listed here - postgres supports standard syntax too though). Not sure how fully supported it is cross-database, though.
Yes, the bash highlighter is missing and the showoff handler kicks in instead, resulting in a 500. Thanks to the dev mode you get a complete stack trace with code preview :)
I admit that was a stupid thing to miss, it works fine, sorry. The Javascript error persists though, not sure if that's affecting anything, but there's nothing noticeable. Works fine for me on Android (default browser JB) too.
In case someone is interested, I've created a ruby gem that uses hstore as backing store of multi-language text fields for Rails models. Contributions are welcome -> https://github.com/firebaseco/multilang-hstore
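For those who haven't used it, hstore is just a key/value column type, so the gem's multi-language use case looks roughly like this on the SQL side (a sketch; the table and column names are illustrative):

CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE posts (
  id serial PRIMARY KEY,
  title hstore
);

INSERT INTO posts (title) VALUES ('en=>"Hello", es=>"Hola"');
SELECT title -> 'es' FROM posts;  -- 'Hola'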
The slides recommend using UUIDs (and not "numbers") as the primary key:
CREATE TABLE t (
  uuid uuid PRIMARY KEY
    DEFAULT uuid_generate_v4(),
  name text);
I understand that rows are stored physically in primary key order. Any idea if the "uuid-ossp" module, used to generate UUID primary keys, ensures that new rows are written sequentially on the disk after existing rows (which helps for both read- and write-locality)?
Postgres does not store keys in primary key order, actually, but in write order. This is an artifact of the MVCC model. There is a CLUSTER command that would do what you describe above, but it is (effectively) useless since you can't really run it without taking the database offline while the table gets rewritten.
pg_repack works around this without taking the table offline. Roughly:
1. Create a new, empty table T' with the same structure as T.
2. Add a trigger on T that will clone any operations to T'.
3. Copy all rows from T to T' in clustering order.
4. Atomically switch the tables so that T' becomes T.
Unlike CLUSTER, the only lock required is in step 4, when it needs an exclusive lock on the table; it cannot do the switch until all current transactions are done with T (you can specify a timeout). The only other drawback is that it does not support TRUNCATE, CREATE INDEX, or some ALTER TABLE operations being run while it's busy in step 3.
Also, unlike PostgreSQL's own CLUSTER command, it does not need an index to order by.
We run pg_repack regularly at night to keep our databases clustered. It works really well, though I really wish Postgres could get native online clustering. SQL Server has had automatic clustering for many years, as has DB2, and Oracle has index-ordered tables.
I agree, although by native clustering I meant that the Postgres engine itself should support it, much like incremental vacuum was eventually built in.
pg_repack is a fork of pg_reorg (which has not been maintained since 2011), by the way, and has become the de facto replacement. No idea if the code is in need of cleanup or not, though.
The indexes, though, both benefit (locality) and suffer from (lock contention) such random identifiers.
That having been said, many data sets work with UUIDs just fine, and the fact that such sets are easier to concatenate and compare is very convenient.
Unless you actually need some sequential semantics to your records or are redlining a system where the locality of the index really helps, I prefer the non-coordinated assignment of IDs overall, as pvh points out in his slides.
Also, as other posters mention, uuid has sequential forms that can obviate some of this and reduce it to a 128-bit number... but the common form of a good-quality random one is really quite good enough until the indexes start to get quite big, and this path is most useful if one remains uninterested in partitioning, which may be ill-advised. Having to renumber (aka 'renaming', a form of one of the "two hard problems" in computer science...) everything is a big barrier to partitioning, so if one can afford to avoid renumbering, I think avoiding it is a good idea.
Do NOT use any slide deck framework that doesn't have a solid export function - jpgs, pdfs, whatever.
The videographer who will be editing your video will not be able to deal with your esoteric format - and screengrabbing every slide is horrible busy work.
I used to be really, really into MySQL, but since getting into Postgres I've come around to preferring it. They're both good... but anecdotally Postgres is more developer-friendly and generally easier to deal with in production. Postgres also comes with features that blow MySQL away in terms of flexibility and letting you do truly cool stuff. One example: advanced query optimization with partial or reverse indexes.
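For the curious, those look like this (a sketch with illustrative table/column names): a partial index only indexes rows matching a predicate, and an expression index on reverse() is the usual way to get fast "ends with" lookups:

-- only index the rows the hot query actually touches
CREATE INDEX orders_pending_idx ON orders (created_at) WHERE status = 'pending';

-- index on an expression; text_pattern_ops lets LIKE prefix searches use it
CREATE INDEX users_email_rev_idx ON users (reverse(email) text_pattern_ops);
SELECT * FROM users WHERE reverse(email) LIKE reverse('@example.com') || '%';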
That's my biggest concern about making the switch. I've gotten used to the performance characteristics of MySQL and some of its quirks. Some small side projects on Heroku have been pushing me to learn more Postgres, and perhaps I'll use it for something more important soon.
Really, really great stuff, thank you for posting. I had no idea about a lot of these. And you note that the source is still pretty clean after all these years.
Can you just specify the order in your query? Why do you care how the physical order is stored? Or do you even care about the physical order? Would an "apparent" logical order be sufficient?
If you're using PostgreSQL strictly as a programming data store, it's fine. Your app will certainly abstract away the ordering of the columns.
It's just nice, though, if you often find yourself working on raw tables using one of the many admin tools. Sometimes you just need it to work like Excel to be productive. MySQL makes all of that trivial. PostgreSQL obviously keeps track of the order you created the columns in, so being able to edit this ordering seems trivial in my mind.
It just seems odd that there certainly is some demand for it, yet nobody's picked up the slack in 18 years. Obviously there are workarounds. Why don't they include the workarounds as scripts with the distribution, etc.?
Maybe I will try to implement something and submit a patch. No experience in database dev though :(
Like I said though, it's not a deal breaker. I'm just used to open source projects over-implementing features, not under-implementing them. Especially after 18 years.
So you only care about the logical ordering of the columns. It seems trivial then to add a column mapping layer in the metadata to map the user-defined apparent order to the physical order so that select * would return the order you want.
In relational algebra the ordering of the rows and the ordering of the columns are undefined. It's up to the database to arrange it in the best way possible. The ordering of rows and columns is specified when queried.
> Its just nice though, if you often find yourself working on raw tables using one of the many Admin tools. Sometimes you just need it to work like Excel to be productive.
Clearly this is a bug in the admin tool? I seem to recall MS SQL allows you to "drag columns around" -- obviously doing absolutely nothing to the database -- it's just a view. After all rows are just relational tuples -- they have no ordering.
True, a lot of the tooling for PostgreSQL is in its infancy when compared to something like MySQL. This problem definitely could be abstracted away by the tool. But because PostgreSQL doesn't support it natively, a side effect has been that many of the popular tools, comparable to ones in MySQL, don't have this feature either.
Anyone have any good suggestions for a web frontend to PostgreSQL that supports reordering of columns as easily as MySQL? phpPgAdmin doesn't seem to compare to phpMyAdmin.
Admittedly I probably should be using a more robust OS-native PostgreSQL admin tool, and not a web frontend. But alas, all of this is solved with MySQL quite nicely. I would like to see PostgreSQL catch up.
Most databases are, by default, row stores. Moving columns around on disk kinda sucks in such situations, and there's no need to when you can specify results in any order by naming fields.
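If the goal is just a friendlier select * / Excel-like ordering, the usual workaround is a view (column names here are illustrative):

CREATE VIEW people_reordered AS
  SELECT last_name, first_name, id, created_at
  FROM people;

SELECT * FROM people_reordered;  -- columns come back in the view's order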
I know, I know. It's a minor gripe. But think about how trivial something like this would be to implement. And with the power of an open source project as big as Postgres, it just seems odd nobody's had a weekend to knock it out. Here we are discussing all these cool advanced features, and it doesn't even have something as simple as this :P
I'll certainly try, if I thought they'd ever accept a patch from a total newb.
PostgreSQL obviously keeps track of the column order, as far as the order you created them in goes. So being able to edit this order doesn't seem like that tall an order. It would be a pretty convenient feature that many other DBs do in fact have.
It's just a nicety that PostgreSQL should have if it's hoping to win over people who are familiar with MySQL, which does sorta seem like its goal these days.
Trivial? The columns are in creation order because that's how they are laid out on disk within each row. So to be able to move columns around, you either have to rewrite the whole table (needing a long-lasting exclusive lock, which most of the schema-altering operations in Postgres do not require) or you introduce another abstraction to keep track of the order, which you then have to use for every query to get the correct position within the row you just read.
This always means more complexity and a performance overhead, and frankly provides next to zero value because columns are named in queries anyway (or the order becomes meaningless when joining).
I'm going to guess that MySQL can do this largely as a side-effect of swappable engines.
There is, however, a lot to be said for close coupling between query planning, on-disk format, indexing and so forth.
edit: as for patching, the PostgreSQL source is some of the best I've ever read. Carefully organised, fastidiously documented, immaculately consistent to coding standards. It's a delight.
This was a really good presentation, I've worked with postgres and enterprisedb on and off for many years but I still managed to learn a few new things.
It looks like dozens, maybe hundreds, of people all had problems figuring out how to see the next slide.
Do the math: that's, cumulatively, probably an hour or two of life wasted, perhaps more.
I expect webpages to have a mouse based interface, not a keyboard based interface and I'm not alone. It's not churlish of us to complain about our wasted time.
Interesting! And, as a postgresql lover, thanks for posting.
These are clearly slides, meant to be fullscreen on a projector. Up/Down, Left/Right, PgUp/PgDown all work, much like they would in PowerPoint. I'm a little baffled by this thread's hostility towards the lack of a friendlier design.
I'm on my iPhone and it's totally unviewable. It looks like I can see maybe 20% of a slide, with parts of that covered up by black bars. I can't resize or re-zoom.
Not sure if you're trolling, but just in case you're not: because any key that has semantic meaning to the data sooner or later is proven to be neither unique, always present nor constant.
Yeah, true. A model can be valid for some subset of data for some fixed period of time. The real failure is when the model isn't valid even for the defined period. Skill at articulating the limitations and implicit assumptions of any model is far too rare.
In case of something that's not a primary key, you can just get rid of it in the future. While technically it is possible to change the PK, it's way too much pain compared to having a synthetic one.