
That's what I mean, looking at the bytes is the only way to know.

How would you solve homograph/glyph attacks though? One idea is yet another encoding where there are no homoglyphs, only whitelisted diacritic sequences, and there is never more than one way to assemble the same character ("ó" vs. "o"+"´"). So tough potatoes for Cyrillic "o": it's forced to use its nearest equivalent, 0x6f Latin "o" in the ASCII set.



> How would you solve homograph/glyph attacks though? One idea is yet another encoding where there are no homoglyphs

First, modern operating systems (should?) already provide APIs to canonicalize Unicode text, i.e. normalize it so that equivalent sequences compare equal.
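
To make the canonicalization point concrete, here is a minimal sketch using Python's standard `unicodedata` module instead of an OS-level API. The helper name `canonical_equal` is made up for illustration. It shows that normalization handles the "ó" vs. "o" + combining acute case, but does nothing about Cyrillic "о" vs. Latin "o", since those are distinct characters and not two spellings of the same one.

```python
import unicodedata


def canonical_equal(a: str, b: str) -> bool:
    """Return True if a and b are equal after NFC normalization.

    Hypothetical helper name, not part of any library.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00f3"   # "ó" as a single code point
decomposed = "o\u0301"   # "o" followed by COMBINING ACUTE ACCENT
cyrillic_o = "\u043e"    # CYRILLIC SMALL LETTER O, looks like Latin "o"

# Different code point sequences, same character once normalized.
same_accented = canonical_equal(precomposed, decomposed)

# Normalization does not fold homoglyphs across scripts.
same_o = canonical_equal("o", cyrillic_o)
```

Under these assumptions `same_accented` is true while `same_o` is false, which is why normalization alone does not settle the homograph question.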

Second, perhaps an additional API needs to be created, one which reports similarities between characters and is intended for use by an intelligence (artificial or otherwise...).
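
A rough sketch of what such a similarity API might look like, loosely in the spirit of the "skeleton" idea from Unicode's security mechanisms (UTS #39). Everything here is an assumption for illustration: the function names are invented, and the lookup table is a tiny hand-picked sample, not the real Unicode confusables data.

```python
import unicodedata

# Illustrative sample only: a handful of look-alikes mapped to a Latin
# prototype. A real implementation would load the full confusables data.
CONFUSABLE_TO_PROTOTYPE = {
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(text: str) -> str:
    """Decompose, map known look-alikes to a prototype, then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(CONFUSABLE_TO_PROTOTYPE.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFC", mapped)


def looks_alike(a: str, b: str) -> bool:
    """True if the two strings collapse to the same skeleton."""
    return skeleton(a) == skeleton(b)


def suspicious_chars(text: str) -> list[tuple[int, str, str]]:
    """List (index, character, Unicode name) for each known look-alike."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in CONFUSABLE_TO_PROTOTYPE
    ]


spoofed = "p\u0430yp\u0430l"  # "paypal" with Cyrillic "а" in place of Latin "a"
```

With this table, `looks_alike("paypal", spoofed)` holds even though the byte sequences differ, and `suspicious_chars` names the offending characters so a human (or a program) can decide what to do about them. The strings are left untouched; the API only advises.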



