
It managed to insert invalid Unicode into a SQLite database, causing a subsequent SELECT to fail. That's at least a DoS attack.


Yes, but only if you're decoding user input as UTF-7, which would be insane.


What if you were scraping a webpage and it reported its encoding as UTF-7?


...which is, in fact, exactly how this bug was exposed.
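
For the scraping case, one defensive option is to normalize whatever encoding the page declares and fall back to a default unless it is on an allowlist. This is only a minimal sketch: the `ALLOWED_ENCODINGS` set and the `safe_encoding` helper are hypothetical names for illustration, not part of any scraping library.

```python
import codecs

# Hypothetical allowlist, written in Python's canonical codec names.
ALLOWED_ENCODINGS = {"utf-8", "iso8859-1", "cp1252"}


def safe_encoding(declared: str, fallback: str = "utf-8") -> str:
    """Return the canonical codec name if it is allowed, else the fallback."""
    try:
        # codecs.lookup resolves aliases such as "utf8" or "UTF-7".
        name = codecs.lookup(declared).name
    except LookupError:
        # Unknown codec name: do not trust it.
        return fallback
    return name if name in ALLOWED_ENCODINGS else fallback


print(safe_encoding("UTF-7"))  # falls back, since utf-7 is not allowed
print(safe_encoding("utf8"))
```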


Modern browsers no longer support UTF-7, after a number of XSS attacks that relied on inserting UTF-7 encoded script elements, which then caused the document to be sniffed as UTF-7.

The only place UTF-7 is still widely used is in email clients.


> only if you're decoding user input as UTF-7

Hmm, may I ask what prevents utf8 from producing U+DEADBEEF? Or something remotely like that?

Edit:

'\xfb\x9b\xbb\xaf'.decode('utf8')

UnicodeDecodeError: 'utf8' codec can't decode byte 0xfb in position 0: invalid start byte


When UTF-8 was first defined, they didn't know how big the Unicode range was going to be, so they defined it as a 1-6 byte encoding that could encode any 31-bit codepoint.

When Unicode was deemed to end at U+10FFFF (because that's the largest value that UTF-16 can encode), UTF-8 was revised to be a 1-4 byte encoding that ends in the same place.

Python clearly implements UTF-8 in a way that uses at most four bytes per codepoint (why support five- and six-byte sequences if they'll never be used?). I think what we're seeing in '\xfb\x9b\xbb\xaf' is the first four bytes of a five-byte sequence, since 0xfb is a five-byte lead byte in the original scheme.
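
To make that concrete, here is a rough Python 3 sketch. The helper `utf8_length_from_lead` is a made-up name for illustration; it classifies a lead byte under the original 1-6 byte scheme, and the rest checks how the current decoder treats the same bytes.

```python
def utf8_length_from_lead(byte: int) -> int:
    """Sequence length implied by a lead byte in the original 1-6 byte
    scheme, or 0 if the byte cannot start a sequence."""
    if byte < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if byte < 0xC0:
        return 0  # 10xxxxxx: continuation byte
    if byte < 0xE0:
        return 2  # 110xxxxx
    if byte < 0xF0:
        return 3  # 1110xxxx
    if byte < 0xF8:
        return 4  # 11110xxx
    if byte < 0xFC:
        return 5  # 111110xx
    if byte < 0xFE:
        return 6  # 1111110x
    return 0      # 0xFE and 0xFF were never valid


data = b"\xfb\x9b\xbb\xaf"
lead_length = utf8_length_from_lead(data[0])

# The modern decoder rejects the lead byte outright.
try:
    data.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# The highest code point, U+10FFFF, fits in four bytes.
max_encoded = chr(0x10FFFF).encode("utf-8")

print(lead_length, decoded_ok, len(max_encoded))
```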


It's a bug in the UTF-7 decoder: it yields an invalid codepoint (outside the Unicode codespace) that isn't checked anywhere.
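
The kind of range check that was missing could look something like the sketch below. It is not the actual fix applied to the decoder, just an illustration; `is_valid_scalar_value` is a hypothetical helper name.

```python
MAX_CODE_POINT = 0x10FFFF  # upper end of the Unicode codespace


def is_valid_scalar_value(cp: int) -> bool:
    """True if cp is inside the codespace and not a surrogate."""
    if not 0 <= cp <= MAX_CODE_POINT:
        return False
    # U+D800..U+DFFF are reserved for UTF-16 surrogate pairs.
    return not 0xD800 <= cp <= 0xDFFF


print(is_valid_scalar_value(0xDEADBEEF))
print(is_valid_scalar_value(0x10FFFF))
```

A decoder that ran a check like this before emitting each code point would raise an error on 0xDEADBEEF rather than pass it along.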



