> Now, all of this might have been fine had Beyond Trust not written a feature which allowed users to directly, programmatically interact with psql (the postgres command line interface).
That's the buried lede.
Yes, there was a vulnerability in psql... but that's so much less a problem than the huge gaping hole of allowing users to directly interact with psql.
No DB can be safe if you are turning untrusted user commands into psql executions. It'd be like giving untrusted users ssh access and then complaining when they find a privilege elevation exploit.
Let's be clear: Beyond Trust is not a company that wrote a database-backed web app and made the all-too-common mistake of writing insecure code that tickled a bug in the database that allowed privilege escalation. Beyond Trust is a company whose entire contribution is adding a security layer to prevent privilege escalation, and their solution here was to bypass Postgres's standard functionality and use this weird `psql` hack instead.
They had one job, and they failed at it. This amateur-level mistake should sink the entire company.
Didn't happen when Crowdstrike broke all their customers.
The problem is that getting information security right is a matter of process control, which everyone hates, and so CEOs are absolute suckers for being sold a product which magically "adds on" security. This is like trying to buy "anti-lead-paint" rather than actually remove all your existing lead paint.
There is an interesting project https://github.com/Abstrct/Schemaverse where the whole game takes place entirely within a postgres database that the players connect directly to.
The author discovered quite a few... well, let's just call them policy errors in postgres, but had a hard time filing bug reports, mainly because the response was usually an incredulous "why on earth would you even do that in the first place?"
But the author has fun with this. There is a trophy in the game that you can only get by putting your name in the trophy table.
The only reason BeyondTrust implemented it that way was that these weren't supposed to be untrusted user commands. They sanitized the data, so it should have been fine. The unfortunate problem was that the sanitizer didn't sanitize.
Systems are built on a set of expectations. Undermine the expectations and you undermine the system.
> They sanitized the data, so it should have been fine.
This is a rookie, 101-level approach to SQL injection defense.
It's dumb for exactly the same reason this is dumb:
"SELECT * FROM foo WHERE bar=" + sanitize(userInput)
The correct way to do something like this will always be parameterized input, which looks something like this:
"SELECT * FROM foo WHERE bar=?"
bindParameter(1, userInput);
Why? Because the postgres protocol splits out the command and the data for the command in a way that can't be injected. Something that should be viewed as impossible when data and command are merged into one string.
If this company wanted to build dynamic queries, then the only correct way to do that is to limit input to known-valid values, i.e. "isValidColumnName(userInput)" before sending the request. And even then, you'd not use psql to do it.
You simply can't use a generalized sanitizer and expect good results.
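For the dynamic-identifier case, an allow-list check can be as small as this sketch (the column set and helper name are made up for illustration; parameter binding can't cover identifiers, so the identifier itself must come from a fixed set of known-good values):

```python
# hypothetical allow-list for dynamic identifiers: the user's input is
# never interpolated unless it is literally one of these known values
ALLOWED_COLUMNS = {"id", "name", "created_at"}

def is_valid_column_name(user_input: str) -> bool:
    return user_input in ALLOWED_COLUMNS

assert is_valid_column_name("name")
assert not is_valid_column_name('name"; DROP TABLE foo; --')
```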
It's been fairly effective at making them realize the fundamental mistake they're making. Quoting the key part:
> The only code that knows what characters are dangerous is the code that’s outputting in a given context.
> So the better approach is to store whatever name the user enters verbatim, and then have the template system HTML-escape when outputting HTML, or properly escape JSON when outputting JSON and JavaScript.
> And of course use your SQL engine’s parameterized query features so it properly escapes variables when building SQL
> and then have the template system HTML-escape when outputting HTML, or properly escape JSON when outputting JSON and JavaScript
Or, stop using stringly-typed template systems, and treat the data as what it is: a structured language with a well-defined grammar.
One of these days I need to write an article titled "Don't play with escaping strings. Serialize output." The core idea being: "escaping your output" still looks too much like "sanitizing input"[0], and one tiny mistake is all it takes to give an attacker the ability to inject arbitrary code into the page (or give an unlucky user the ability to brick the page for themselves). So instead of working in "string space", work in whatever the semantics of your output are, and treat the string form as a serialization problem. In the case of HTML, that means constructing the tree of tags as a data structure, and then serializing it. Then, bugs in the serializer notwithstanding, the whole class of injection problems disappears - you can't do "<h1>$text</h1>" -> "<h1></h1><script .... </h1>" when your "template" is made of data structures like [:h1, $text], because $text can't possibly alter the structure here. Etc.
In some sense, "Don't escape, serialize instead" is the complement of "Parse, don't validate".
(See also: make invalid states unrepresentable.)
--
[0] - Who ever sanitizes input? I've only ever seen the kind of sanitization the article describes happen on the output, within string-gluing templates.
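A minimal sketch of the tree-of-tags idea above in Python (the node shape is invented for illustration; real element-tree libraries work on the same principle):

```python
from html import escape

def serialize(node) -> str:
    """Render a [tag, *children] tree to HTML; bare strings are leaf text."""
    if isinstance(node, str):
        return escape(node)  # escaping happens in exactly one place
    tag, *children = node
    inner = "".join(serialize(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"

# untrusted text can't alter the structure, only appear inside it
page = ["h1", "</h1><script>alert(1)</script>"]
assert serialize(page) == (
    "<h1>&lt;/h1&gt;&lt;script&gt;alert(1)&lt;/script&gt;</h1>"
)
```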
The more you work in software the more you should realize the developers writing security-critical software (in this case the one writing that sanitizer) are often/usually as clueless as you are. The solution? Hard to say.
But... you have to see things that way, or else literally everything becomes a security concern.
Extra whitespace before a semicolon? I don't see how it can be exploited, but with the mindset you imply, I have to treat it as a security concern. But removing the whitespace is also a security concern.
Yes, general computers are fundamentally unsafe. We should always think about threat models, vulnerabilities, blast radii, defense in depth.
What we should never do is dismiss something as a non-concern because we don't know how it could be a problem. Especially when someone is trying to point out something we're doing is extensively documented as a security concern. In that case it would be quite obtuse to claim in a public discussion that the person pointing it out is wrong because you don't understand the issue, and yet I have lived through that.
It’s very hard to argue with someone who asserts ‘what I don’t know can’t hurt me’, because usually they’ll refuse to know anything that will hurt them. Like that there are things they don’t know, that can hurt them.
But who says that means your statement can't be injected? It would have to be true that the handling for EXECUTE statements is bug-free. Maybe it is bug-free. Or maybe it isn't. Maybe I can figure out just the right username to cause your prepared statement to have some undesirable side effects.
That wouldn't be SQL injection in the sense of putting hostile values into an SQL query in order to form a different SQL query, but it would be SQL injection in the sense of putting hostile values into an SQL query in order to accomplish nefarious goals via the database. The only real qualm I'd have about calling it "SQL injection" is that it wouldn't be portable across different database implementations; it would be more accurately described as "PostgreSQL injection".
If we feel entitled to assume that PostgreSQL's provided functionality to interpret strings that are provided to EXECUTE statements has no bugs, why aren't we also entitled to assume that PostgreSQL's provided functionality to interpret strings that are provided to the string escaper has no bugs? I don't really see the conceptual difference.
The postgres escape function actually worked fine before this "CVE". It was documented as escaping something for use as part of a postgres query.
BeyondTrust used it as input to the 'psql' tool, which is an interactive tool you're not really supposed to programmatically invoke, and the documentation for the postgres escape function didn't say it escaped input for psql. Even though postgres was fine calling it a CVE and fixing it, I think this is 100% on BeyondTrust for assuming that escaping a string for a postgres query meant it was safe for psql.
If BeyondTrust had just used it as part of a postgres query string, the escape function would have been sufficient.
.... and that's also exactly the reason that using parameterized queries is better. With parameterized queries, the escaping and the query parsing are done in the same place, so there's no chance of confusion for the programming language's string library to get in the way, or for the psql tool's input parsing to re-interpret and alter the escaped string before sending it over the wire.
> If BeyondTrust had just used it as part of a postgres query string, the escape function would have been sufficient.
That's completely false. The following (pseudocode, because working with C strings is verbose and beside the point), which has nothing to do with psql, should be fine:
PQexec(conn, "SELECT * FROM user WHERE nickname = '" + PQescapeString(user_input) + "';")
but thanks to the vulnerable PQescapeString(), the following user_input
"\xc0'; DROP TABLE user"
would fuck it up. That's just the failed escape function leading to a classic SQL injection. Using psql makes it worse because psql can execute additional non-SQL commands, but this escape function is not "fine" at all with or without psql.
> With parameterized queries, the escaping and the query parsing are done in the same place
Again, wrong. For parametrized queries, params don't go through serialization because they don't need to hit the parser, there's no "escaping" whatsoever.
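The failure mode described above is easy to simulate. This toy escaper (not PostgreSQL's actual code) trusts the multibyte length declared by a UTF-8 lead byte without validating the bytes that follow, which is the essence of the reported bug:

```python
def naive_escape(data: bytes) -> bytes:
    """Toy escaper that doubles quotes but blindly trusts UTF-8 lead bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b >= 0xC0:
            # lead byte claims a multi-byte sequence; copy it wholesale
            # without checking that the trailing bytes are continuations
            n = 2 if b < 0xE0 else 3 if b < 0xF0 else 4
            out += data[i:i + n]
            i += n
        elif b == 0x27:  # ASCII single quote gets doubled
            out += b"''"
            i += 1
        else:
            out.append(b)
            i += 1
    return bytes(out)

payload = b"\xc0'; DROP TABLE user; --"
escaped = naive_escape(payload)
# the quote rode along inside the claimed two-byte sequence, unescaped
assert b"''" not in escaped
assert b"'; DROP TABLE user" in escaped
```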
> Because of how PostgreSQL string escaping routines handle invalid UTF-8 characters, in combination with how invalid byte sequences within the invalid UTF-8 characters are processed by psql
If you just pass it to postgres over a normal query, postgres will reject an invalid byte sequence in the query with an error, refuse to even parse the query, and thus you won't get a SQL injection. It's just that psql didn't hard-error on invalid utf-8, even though postgres did.
That's why the escape function was suitable for postgres, both the escape function and postgres's query parser assume invalid byte sequences are, you know, invalid.
> For parametrized queries, params don't go through serialization because they don't need to hit the parser, there's no "escaping" whatsoever.
You're right of course, I used imprecise language that everyone understands, and you're choosing to read critically in order to be combative.
Sure, the blog post didn't mention PQexec would reject it, so I assumed it would be accepted. Turns out it's a narrowly dodged bullet, I'm wrong on that. But having to chain two vulnerabilities together to own the system doesn't make either vulnerability less of a vulnerability. The escape function was wrong, period, another level of defense helped in this case, but "the documentation for the postgres escape function didn't say it escaped input for psql" is a bullshit excuse (it definitely didn't achieve the documented goal of "escaping special characters so that they cannot cause any harm"), putting "CVE" in quotes and blaming it all on the user is wrong.
> I used imprecise language that everyone understands, and you're choosing to read critically in order to be combative.
No, your "imprecise language" is a fundamental and quite dangerous misunderstanding that could easily lead to more vulnerabilities like this one ("PQexecParams = PQexec + PQescapeString, amiright? I'll just use the latter"). Maybe you didn't misunderstand yourself, maybe you did, but it's 100% misleading for readers not familiar with db internals.
> The postgres escape function actually worked fine before this "CVE". It was documented as escaping something for use as part of a postgres query.
> BeyondTrust used it as input to the 'psql' tool, which is an interactive tool you're not really supposed to programmatically invoke, and the documentation for the postgres escape function didn't say it escaped input for psql.
But this is the documentation for psql:
> psql is a terminal-based front-end to PostgreSQL. It enables you to type in queries interactively, issue them to PostgreSQL, and see the query results. Alternatively, input can be from a file or from command line arguments. In addition, psql provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.
We can learn two things from this:
1. You are definitely supposed to be able to invoke psql programmatically, or else the suggestion to "write scripts and automate a wide variety of tasks" would make no sense.
2. The input to psql is documented as a "query", and it seems fine to assume that a "query" for psql is the same thing as a "postgres query".
Then you, or someone else, fix the underlying library.
This being Postgres, that process was likely completed decades ago.
Anticipating the next question, "but what if it is still unsafe?", the answer is that there is no fundamental reason why this can't be safe code. Security exploits like this require a cooperation between the receiving code and the incoming data, and it is in fact perfectly possible for the receiving code to be completely safe. There is no such thing as data just so amazingly hackerish that it can blast through all protections. There has to be a hole, and this is among the best-tested and most scrutinized code paths in the world.
There's some nuance here. Even the most battle-tested system, with distinct slots for executable code and primitive values, might have a way in which an untrusted primitive input can overrun a buffer, or be split in an unsafe way, and cause unexpected behavior. But there's a vast difference in attack surface between that, and "just give us a string, don't worry we'll sanitize it on our end."
It's all about defense in depth. Even a system that's tested all the way through with Coq or similar is still at the mercy of bugs in the specification or in underlying system libraries. But intentional API design can make it materially less likely that a security issue will arise, and that's worth a heck of a lot.
> But there's a vast difference in attack surface between that, and "just give us a string, don't worry we'll sanitize it on our end."
If you have a type system that distinguishes between sanitized and unsanitized strings then it's not a very big difference in attack surface.
The main difference between the two methods is the risk that you can forget to sanitize. But that's not what happened here, so calling them dumb for having that risk is not a useful way to analyze the problem.
Parameters are not an extra layer of defense. For anything other than forgetting to sanitize, parameters are a sidegrade, not defense in depth.
That is not what's happening. You can't bind parameters in psql.
What BeyondTrust did was in fact what I said: constructing the entire query as one string and sending that on to psql. The sanitization method failed, but that's not what you should use anyway when dealing with user input.
If you are using a postgres client, then the message breakdown to postgres when you bind looks something like this (not the actual message-stream format, just the gist of what it ends up looking like)
SELECT * FROM foo WHERE bar=$1
BIND 1 userInput
ENDBIND
There isn't the same opportunity to create malformed data that can cause an injection attack in the Postgres message stream: the bound value never passes through the query parser at all, so there are far fewer things that need to be escaped to build the full message. (Skimming the protocol, the actual Bind message length-prefixes each parameter value rather than relying on a terminator, which sidesteps escaping entirely, at the cost of making streamed values awkward.)
These yahoos took a different route to running a command than what everyone does and should do, and they were bitten by it.
EDIT: Actually, yes, you can bind parameters with psql. However, it's there mostly as a way to test postgres and not something users are expected to use.
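The same separation can be demonstrated with any client that supports parameter binding. Here's a sketch using Python's stdlib sqlite3 (a different database than the one under discussion, but the bind principle is identical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (bar TEXT)")
conn.execute("INSERT INTO foo VALUES (?)", ("hello",))

# the hostile value travels as a bound parameter, never spliced
# into the SQL text, so it can't terminate the statement
hostile = "x'; DROP TABLE foo; --"
conn.execute("INSERT INTO foo VALUES (?)", (hostile,))

rows = [row[0] for row in conn.execute("SELECT bar FROM foo")]
assert hostile in rows   # stored verbatim, quote and all
assert len(rows) == 2    # the table is intact; nothing was dropped
```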
Just a guess, but this looks like a politically powerful dev culture overriding cybersecurity culture: demanding, and thus getting, an exception from management in the name of 'productivity' and 'being agile.'
I don't think we appreciate how much of a wild west things are, with the incredible mix of hugely complex and powerful tools trivially available to developers and the ethos of "move fast, break things."
Especially as corporate sees devs like they see salesmen (big moneymakers who deserve perks, exceptions) and top-down security culture as a cost center.
The other buried ledes are that postgres allows emojis (not sure if that's intended, but it works) and that you can just run system commands and scripts directly from the postgres CLI. I imagine a lot of eyes are going to be on new hardening guidelines for postgres now.
I also imagine the first high-performance, enterprise-friendly drop-in db written in something like Rust is one day going to be a big deal.
Hey now, a large portion of developers are seen as cost-centers too! Not everybody has the skill of flattering managers into approving greenfield projects, and then transferring away before they break horribly. :p
> Especially as corporate sees devs like they see salesmen…
You’re onto something here. People perceive the world through the lens of their education and environment. Sales, legal, finance, are all easy constructs for a business leader to view the rest of the world through. The secret of the game isn’t to have the best tech or to code the most, it’s to “outsell” your competing business unit.
> Beyond Trust did their due diligence by properly calling a sanitization method on the user's string input before using it in a PostgreSQL query.
This is not due diligence. In-band signaling of user-controlled data has been proven bad for security, and this is not the first time "escaping" user-controlled data for SQL has been done incorrectly.
Yep, it's prepared statements or bust. But the long tail of legacy code, examples, documentation, that uses escaping is gonna take a while to get through.
One of the nice things about modern ORMs like SQLAlchemy 2 is that it forces you to use prepared statements even when calling raw queries.
Fun fact: MySQL (and I'm sure many other databases?) lets you pass in values directly as hexadecimal strings.
I've avoided SQL injection since ancient times not by escaping strings, but by transforming any given input via "0x" + bin2hex(value) and plopping that into the query.
No quotes needed, no code buried deep in included libraries needed, handles any kind of data possible, and also no funny business sneaking in based on how you may have mangled the input.
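For the curious, the transformation is tiny. Here's a Python rendition of the PHP `"0x" . bin2hex($value)` idiom (the helper name is made up):

```python
def hex_literal(value: str) -> str:
    # only [0-9a-f] can appear in the output, so there is nothing
    # left for a quoting or escaping layer to misinterpret
    return "0x" + value.encode("utf-8").hex()

assert hex_literal("O'Brien") == "0x4f27427269656e"
```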
Cool idea. Genuinely. But it does not in any way guarantee that an injection attack like used here won’t work - unless it’s maintained as hex through the whole pipeline. In this case the (malformed) Unicode was sent to a command line call - if your hex text needed to be parsed to be understood on the command line, then your security plan would have failed.
Totally agree on my tip not being a silver bullet for all situations, just wanted to pass it along in case somebody finds themselves needing to sanitize input for queries rather than constructing prepared statements.
I was a bit perplexed by the final destination of command line for data and/or queries -- seems like an odd choice when they could've just interfaced directly with the database like a civilized human hahaha
- why a side-effect-free function on the database can be used to get lateral access to the whole database instance
- why you need to validate strings on the database itself and not on the client anyway, and heck, why there's no type-safe way of doing it
- why would you want to execute shell commands from the database itself
- Even if there's a real use case for executing commands like that why is it enabled by default on a regular connection to the database without you specifying a THIS_IS_REALLY_DANGEROUS_BUT_I_PINKY_PROMISE_I_KNOW_WHAT_IM_DOING flag to the connection handshake.
It's not always PHP, but there are some quirks that get shrugged off in PHP that make me really concerned about the reliability of projects coded in it.
They mentioned PAM module so maybe the sql injection just allowed bypassing the authorization of a system that was using the PAM module. Like it’s in the realm of possibility that a PAM module that wanted to validate a user against credentials stored in a pg database might shell out to the psql command to do this. Though, the whole thing is very questionable.
What account were they authenticating with when attaching to psql?
If you have the connection string why does psql even matter, couldn’t you use any client? Or is this a case of your input being forwarded to a running, already authenticated, psql instance?
And finally, why do we need unicode support for schema? I assume it’s because the schema is itself data?
Your questions are programming language agnostic-- where did your PHP angst come in? And are there specific things in PHP that are problematic and avoidable by using a different Turing complete language?
PHP has grown up, but in its wild youth it was notorious for such gems as mysql_escape_string vs mysql_real_escape_string, rather than proper parameterization.
It's not so much about Turing as it is libraries and patterns
After all, as I understand it this very issue was caused by escaping SQL rather than parameterizing it
To me, "sanitizing inputs" implies a transformation of the data into a string that can be "safely" evaluated as code which hopefully yields the input data. Instead you should be able to just mark a piece of the code as data, that will never be tokenized or parsed or anything, just dropped directly into a buffer.
"Prepared statements" sounds EXACTLY like what I was thinking! I don't understand why people would ever use anything else.
Ah, I see! It's a cool idea, but .. let's try to be maximally obtuse and pedantic today. I'm a developer and it's HN after all.
[4]tree is also code that yields data. At the end of the day some kind of parser needs to decide what to do with your data and [ ] is just another way of escaping special characters. In this case it escapes entire strings instead of individual characters. It's your special way of sanitizing the input.
Questions: Who is responsible for the number? What is this number: bytes, "characters", runes? What happens if the number is wrong? (If you expose this number to external factors of any kind you get a special, interesting new breed of SQL injection.)
In practice you'd probably do something like:
my_special_superduper_safety_syntax_preprocessor("SELECT * FROM users WHERE username=$$$", "peter")
Which will yield something like:
"SELECT * FROM users WHERE username=[5]peter"
.. so you don't have to calculate the number. If we're doing this, why not just go for:
exec("SELECT * FROM users WHERE username=?", "peter")
.. and be done with it.
> I don't understand why people would ever use anything else.
Yes, I agree. Usually it's some interesting combination of laziness and ignorance.
Yes, I almost certainly would write that wrapper to make it impossible for me to pair the wrong length with a string. And yes, it technically is still code, but at least the bytes are not being "looked at" in a way where they might get executed in some way. Again, a "Prepared Statement" is what I was hoping would exist (also for perf reasons) so I'm happy to know it does.
If I were writing it, I would exclusively use the "Prepared Statements" technique, and for people typing in SQL queries by hand, the sole string construct would be as I described.
I'm a Ziguana, so in my design, the number would be "bytes". You would have to "calculate" the number of bytes in languages like Swift or JavaScript. But still, overall it's a better idea to me than turning ' into \' and many other convoluted transformations that are often incorrect and are also just throwing away CPU cycles for no reason whatsoever.
In any case, I would still be semi-offended if I learned that "Prepared Statements" transformed the data in any way whatsoever. In the compromise solution where SQL has a string construct, I want the SQL tokenizer to do this:
    if (cur_token.kind == .data) {
        // copy data out
        @memcpy(
            some_buffer,
            cur[0..cur_token.value],
        );
        // skip over the whole thing
        // before generating next token
        cur = cur[cur_token.value..];
    }
Even better if the data is not sent alongside code at all.
A while back I stumbled on a way to crash my company’s Jira server simply by sending an email to it containing an emoji. Makes me wonder if that could have been abused by malicious parties if they knew the email address we used to forward new support issues to Jira.
> Working with Unicode is anything but elegant, but that’s another story.
Yeah I hear ya. IMO this is really important though and I don’t think there’s much of a story if this isn’t part of it. The design is clever but the resulting usability (and error proneness) leaves much to be desired. Not knowing what’s in there, like a closed envelope, increases user complexity significantly.
You do know what’s inside a Unicode string though. They’re not hard to parse. Very easy in fact.
The biggest problem with Unicode parsing is that you don't know how long a Unicode string is without parsing it, which often leads to parsing it twice. But we'd have this problem even with Unicode fixed to 2 or 4 bytes (like UTF-16 and UTF-32), because not all code points are standalone printable characters (e.g. combining accents).
…or you allow for every possible combination of characters, accents, etc. at a fixed number of bytes, and then suddenly you have a 6- or 8-byte character set that breaks backwards compatibility with ASCII and multiplies global network bandwidth usage by six (despite that mostly being empty space), while putting a greater burden on font developers to cater for every subtle variation of glyphs.
Sure things seem messy now, but I honestly think utf-8 is the least worst solution to the problem.
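The "how long is it" point above is easy to see in Python, where byte length and code point count disagree even for a single user-perceived character:

```python
s = "e\u0301"  # 'e' followed by a combining acute accent
assert len(s.encode("utf-8")) == 3  # bytes on the wire
assert len(s) == 2                  # Unicode code points
# ...and a text renderer displays the pair as a single glyph
```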
I haven't deep-dived Unicode enough to know, and probably someone somewhere has already made it, but I often wonder if we can do compression more efficiently by abusing Unicode somehow, especially for plain text.
> Out of compliance, the US Treasury posted this notice to US lawmakers, breaking the news that a “China state-sponsored Advanced Persistent Threat (APT) actor” had breached their systems.
And halfway through implementing it, I realized that these combining characters will attach to any letter, and it would be easy to layer crappy steganography on top of your crappy Caesar cipher.
Probably nothing new, but I had fun.
D̊o̝ n̝o̊ť f̔o͞r̊g̝e͞t̍ t̊h̠e̗ milk
import string

def combine_list():
    l = []
    for c in range(0x0300, 0x0370):
        if c != 0x034f:  # combining grapheme joiner, skip it
            l.append(chr(c))
            #print(str(c), 'A' + chr(c))
    return l

class codebook_a():
    def __init__(self):
        self.cb = dict(zip(string.printable, combine_list()))
        self.cb[''] = ''
        self.d = {}
        for c in self.cb:
            self.d[self.cb[c]] = c

    def encode(self, message, fake):
        if len(message) > len(fake):
            raise ValueError('fake must be longer than message')
        ml = list(message)
        fl = list(fake)
        encode = []
        while fl:
            c = fl.pop(0)
            if c != ' ':
                if ml:
                    m = ml.pop(0)
                else:
                    m = ''
                encode.append(c + self.cb[m])
            else:
                encode.append(c)
        return ''.join(encode)

    def decode(self, message):
        plain = []
        for c in message:
            p = self.d.get(c)
            if p is not None:
                plain.append(p)
        return ''.join(plain)

def cli(args):
    cb = codebook_a()
    # check for coded text
    if len(args) == 0:
        print('message fake : two args')
        print('a message with no combining characters (diacritics) will be coded onto fake')
        print('one with combining characters will have the real message extracted')
        return 1
    m = args[0]
    for c in m:
        if cb.d.get(c):
            print('decode')
            print(cb.decode(m))
            return 0
    print('encode')
    print(cb.encode(m, args[1]))

if __name__ == '__main__':
    import sys
    cli(sys.argv[1:])
UTF-8 encodes a unicode codepoint into 1, 2, 3, or 4 bytes. Assuming that you have a valid UTF-8 encoding of a codepoint, then the first byte tells you how many bytes are in the encoding. 0-127 inclusive means one byte, 192-223 means 2, 224-239 means 3, and 240-247 means 4. If the first byte is 0xC0 (192), then the sequence is two bytes long. However, not every 2-byte sequence that starts with 0xC0 is valid UTF-8. The uppermost bits of the second byte must be `10` in a valid 2-byte UTF-8 sequence. 0x27 does not meet that criteria, so `0xC0 0x27` is not valid UTF-8. If your escape function operates at the level of unicode codepoints but doesn't actually verify that they're valid, you end up copying a single quote into your "escaped" buffer that downstream parts of the code will hit.
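A sketch of the validation step the escaper skipped, using the byte ranges from the comment above:

```python
def declared_len(lead: int):
    """Sequence length a UTF-8 lead byte claims, or None if it can't lead."""
    if lead <= 0x7F:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return None  # 0x80-0xBF are continuation bytes, never lead bytes

def is_valid_sequence(seq: bytes) -> bool:
    n = declared_len(seq[0])
    if n is None or len(seq) != n:
        return False
    # every trailing byte must look like 0b10xxxxxx
    return all(b & 0xC0 == 0x80 for b in seq[1:])

assert declared_len(0xC0) == 2          # the lead byte claims two bytes...
assert not is_valid_sequence(b"\xc0'")  # ...but 0x27 isn't a continuation
```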
The funny part is that having no Unicode support in this part of the code and treating the data as ASCII (plus mystery bytes) would have worked correctly.
A PHP app called a Postgres library function to "escape strings" for use in Postgres, and that called a function to get a utf8 string length, but the function was bullshit:
> The PQescapeStringInternal method doesn’t actually validate that the string it is parsing with pg_utf_mblen is valid Unicode. So, instead, it just takes the length of 2, and grabs the next byte.
So the bug was a shoddy function in a generic open-source library which was probably never properly tested or fuzzed, and which ended up letting attackers move laterally through the database. And this is one reason you want full test coverage: tiny stupid functions matter.
(Another fix for this is to enforce at the boundaries of every function that the input data has been "blessed" or sanitized by some other function whose purpose is just to validate that the data is what it's supposed to be. That would have to happen before escaping, and every function that uses that data would need to confirm that it got blessed. Basically you want a home-rolled strong-typing system with types (or data classes?) for all your data. But that's a lot of work, I don't expect many would do that for most apps)
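That "blessed at the boundary" idea can be approximated with a wrapper type. A rough Python sketch (names invented for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidText:
    """Proof-carrying wrapper: constructing it validates the bytes once."""
    text: str

    @classmethod
    def from_bytes(cls, raw: bytes) -> "ValidText":
        return cls(raw.decode("utf-8"))  # raises on invalid sequences

def escape_for_sql(value: ValidText) -> str:
    # downstream code only accepts the blessed type, so it can rely
    # on the input being well-formed text
    return value.text.replace("'", "''")

assert escape_for_sql(ValidText.from_bytes(b"O'Brien")) == "O''Brien"

try:
    ValidText.from_bytes(b"\xc0'; DROP TABLE user")
    rejected = False
except UnicodeDecodeError:
    rejected = True
assert rejected  # invalid UTF-8 never makes it past the boundary
```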