Ah so that's how we're supposed to Google for tweets now. "𝕏 something something".


𝕏 (U+1D54F) decomposes[0] to X (U+0058), meaning that if you search for 𝕏 your search string will likely be automatically converted to the equivalent X.

Of course, while a search engine could conceivably index and search the entire Unicode codespace internet-wide, such a task would likely be somewhat unrealistic and provide only limited upside.

[0] https://en.wikipedia.org/wiki/Unicode_equivalence
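The decomposition mentioned above can be inspected directly with Python's standard `unicodedata` module, which exposes the raw mapping from the Unicode Character Database. A minimal sketch; the `describe` helper is hypothetical and not something from the thread:

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, bool]:
    """Return (name, raw decomposition, is_compatibility) for a single code point."""
    decomp = unicodedata.decomposition(ch)
    # Compatibility mappings carry a formatting tag such as "<font>";
    # canonical mappings are just a list of hex code points.
    return unicodedata.name(ch), decomp, decomp.startswith("<")


print(describe("\U0001D54F"))  # ('MATHEMATICAL DOUBLE-STRUCK CAPITAL X', '<font> 0058', True)
print(describe("\u00E9"))      # ('LATIN SMALL LETTER E WITH ACUTE', '0065 0301', False)
```

The `<font>` tag marks the mapping for 𝕏 as a compatibility decomposition rather than a canonical one, unlike the plain canonical mapping for é.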


The essential operation here is not decomposition, but compatibility normalization. Both NFKC and NFKD result in a regular “X”.
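The distinction between canonical and compatibility normalization can be checked with Python's standard `unicodedata.normalize`. In this sketch (the `normalize_all` helper is a hypothetical name), only the K forms turn 𝕏 into a plain "X", and applying NFKC to a whole query folds it the same way:

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_all(text: str) -> dict[str, str]:
    """Apply each of the four Unicode normalization forms to `text`."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


for form, result in normalize_all("\U0001D54F").items():
    # Only the compatibility (K) forms replace U+1D54F with ASCII "X".
    print(form, result == "X")
# NFC False
# NFD False
# NFKC True
# NFKD True

# A search front end that applies NFKC to the query would see plain ASCII.
print(unicodedata.normalize("NFKC", "\U0001D54F something something"))  # X something something
```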


It’s already got to search 1000s of Chinese characters, a few more doesn’t make a big difference.


How do you decide which characters to index? The current Unicode release (15.0) includes 149,186 individual characters. I suppose you can probably ignore U+237C (Right Angle with Downwards Zigzag Arrow) seeing as nobody seems to know what it denotes.[0][1]

[0] https://news.ycombinator.com/item?id=31012865

[1] https://ionathan.ch/2022/04/09/angzarr.html


Most search engines for languages like English index words rather than characters, so decisions about which characters to index are made as part of deciding which words to index.

Search engines for CJK languages do tend to work at the character level, so a search for “Sona” on a certain site run by (I think) Chinese people will turn up results for “Persona”.
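The word-level versus character-level difference can be sketched with two toy inverted indexes: one keyed on whitespace-separated words, and one keyed on overlapping character bigrams, a technique commonly used for CJK text. This is an illustration under those assumptions, not how any particular site works; the document titles and helper names are made up:

```python
from collections import defaultdict


def bigrams(text: str) -> list[str]:
    """Overlapping two-character substrings of the lowercased text."""
    t = text.lower()
    return [t[i:i + 2] for i in range(len(t) - 1)]


def build_indexes(docs: dict[int, str]):
    """Build a word-level and a character-bigram inverted index over `docs`."""
    words, grams = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        for w in text.lower().split():
            words[w].add(doc_id)
        for g in bigrams(text):
            grams[g].add(doc_id)
    return words, grams


def search(index, terms: list[str]) -> set[int]:
    """Return the documents containing every term (empty set if no terms)."""
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "Persona 5 Royal review", 2: "Sonic the Hedgehog"}
words, grams = build_indexes(docs)
print(search(words, "Sona".lower().split()))  # set(): no document contains the word "sona"
print(search(grams, bigrams("Sona")))         # {1}: "so", "on", "na" all occur in "persona"
```

Intersecting bigram postings can also produce false positives when the bigrams occur in a document but not next to each other; real engines usually store positions and verify adjacency.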

I was involved with an A.I. startup where we had lots of meetings about what to do with all the strange Unicode characters, and right now on Mastodon there is a lot of concern that screen readers will choke on 𝐮𝐧𝐢𝐜𝐨𝐝𝐞 𝐛𝐨𝐥𝐝 𝐜𝐡𝐚𝐫𝐚𝐜𝐭𝐞𝐫𝐬, even though it doesn’t seem that difficult to squash them down to ordinary characters or to treat them exactly as <b>unicode bold characters</b>.
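Both options can be sketched with Python's `unicodedata`: NFKC squashes the styled letters down to ordinary ones, and a name-based check can instead map each styled run to an HTML `<b>` element. The name-prefix heuristic and all helper names here are assumptions for illustration; the check does not cover other styled families such as "MATHEMATICAL SANS-SERIF BOLD", and it closes the bold run at every space:

```python
import html
import unicodedata


def is_math_bold(ch: str) -> bool:
    """Heuristic: treat code points named 'MATHEMATICAL BOLD ...' as bold styling."""
    return unicodedata.name(ch, "").startswith("MATHEMATICAL BOLD ")


def to_plain(text: str) -> str:
    """Squash styled letters to ordinary characters via NFKC."""
    return unicodedata.normalize("NFKC", text)


def to_html(text: str) -> str:
    """Replace runs of math-bold letters with ordinary letters wrapped in <b>."""
    out, in_bold = [], False
    for ch in text:
        bold = is_math_bold(ch)
        if bold != in_bold:
            out.append("<b>" if bold else "</b>")
            in_bold = bold
        # Only the styled characters are normalized; everything is HTML-escaped.
        out.append(html.escape(to_plain(ch) if bold else ch))
    if in_bold:
        out.append("</b>")
    return "".join(out)


def make_bold(s: str) -> str:
    """Hypothetical test helper: map ASCII a-z to MATHEMATICAL BOLD SMALL letters."""
    return "".join(chr(0x1D41A + ord(c) - ord("a")) if "a" <= c <= "z" else c for c in s)


sample = make_bold("unicode bold") + " & plain"
print(to_plain(sample))  # unicode bold & plain
print(to_html(sample))   # <b>unicode</b> <b>bold</b> &amp; plain
```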


That is Ellis.


I presume x.com or site:x.com will do the trick.


No, you're supposed to go to X and search on there. And consume ads.



