Ah so that's how we're supposed to Google for tweets now. "𝕏 something something...

voxadam · on July 24, 2023

𝕏 (U+1D54F) decomposes[0] to X (U+0058), meaning that if you search for 𝕏, your search string will likely be automatically converted to the equivalent X.

Of course, while a search engine could conceivably index and search the entire Unicode codespace internet-wide, doing so would likely be impractical and offer only limited upside.

[0] https://en.wikipedia.org/wiki/Unicode_equivalence

layer8 · on July 24, 2023

The essential operation here is not decomposition, but compatibility normalization. Both NFKC and NFKD result in a regular “X”.
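To make the distinction concrete, here is a minimal sketch using Python's standard `unicodedata` module. It only illustrates how the four normalization forms treat U+1D54F; it says nothing about what any particular search engine actually does.

```python
import unicodedata

double_struck_x = "\U0001D54F"  # 𝕏 MATHEMATICAL DOUBLE-STRUCK CAPITAL X

# The character carries a compatibility ("<font>") mapping, not a canonical one.
print(unicodedata.decomposition(double_struck_x))

# Canonical forms (NFC, NFD) leave it alone; compatibility forms (NFKC, NFKD)
# fold it to a plain Latin capital X.
results = {}
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, double_struck_x)
    results[form] = normalized
    print(form, [f"U+{ord(ch):04X}" for ch in normalized])
```

So a pipeline that applies only canonical normalization would still treat 𝕏 and X as different characters.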

PaulHoule · on July 24, 2023

It already has to search thousands of Chinese characters; a few more don’t make a big difference.

voxadam · on July 24, 2023

How do you decide which characters to index? The current Unicode release (15.0) includes 149,186 individual characters. I suppose you can probably ignore U+237C (Right Angle with Downwards Zigzag Arrow) seeing as nobody seems to know what it denotes.[0][1]

[0] https://news.ycombinator.com/item?id=31012865

[1] https://ionathan.ch/2022/04/09/angzarr.html

PaulHoule · on July 24, 2023

Most search engines for languages like English index words rather than characters, so the choice of which characters to index is made as part of deciding which words to index.

Search engines for CJK languages do tend to work at the character level, so a search for “Sona” on a certain site run by (I think) Chinese people will turn up results for “Persona”.
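A toy sketch of the difference, assuming whitespace tokenization for the word-level case and character bigrams for the character-level case. The function names are made up for illustration, and real engines are far more involved (this bigram check ignores position, for example):

```python
def word_tokens(text: str) -> set[str]:
    """Word-level indexing: split on whitespace, case-folded."""
    return set(text.lower().split())


def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Character-level indexing: every run of n consecutive characters."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def word_match(query: str, document: str) -> bool:
    # Every query word must appear as a whole word in the document.
    return word_tokens(query) <= word_tokens(document)


def ngram_match(query: str, document: str, n: int = 2) -> bool:
    # Every query n-gram must appear somewhere in the document.
    return char_ngrams(query, n) <= char_ngrams(document, n)


print(word_match("Sona", "Persona"))   # whole-word lookup misses
print(ngram_match("Sona", "Persona"))  # character-level lookup hits
```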

I was involved with an A.I. startup where we had lots of meetings about what to do about all the strange Unicode characters. Right now on Mastodon there is a lot of concern that screen readers will choke on 𝐮𝐧𝐢𝐜𝐨𝐝𝐞 𝐛𝐨𝐥𝐝 𝐜𝐡𝐚𝐫𝐚𝐜𝐭𝐞𝐫𝐬, even though it doesn’t seem that difficult to squash them down to ordinary characters or treat them exactly as <b>unicode bold characters</b>.
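One way the squashing could look, as a rough sketch: NFKC normalization from Python's standard library maps the mathematical bold letters back to ASCII. `to_math_bold` and `squash_styled` are hypothetical helpers written for this example. Note that NFKC also folds other compatibility characters (ligatures, full-width forms and so on), which may or may not be wanted.

```python
import unicodedata

BOLD_SMALL_A = 0x1D41A  # 𝐚 MATHEMATICAL BOLD SMALL A


def to_math_bold(text: str) -> str:
    """Hypothetical helper: restyle ASCII a-z as mathematical bold letters."""
    return "".join(
        chr(BOLD_SMALL_A + ord(ch) - ord("a")) if "a" <= ch <= "z" else ch
        for ch in text
    )


def squash_styled(text: str) -> str:
    """Fold compatibility-styled characters back to their plain equivalents."""
    return unicodedata.normalize("NFKC", text)


styled = to_math_bold("unicode bold characters")
plain = squash_styled(styled)
print(styled)
print(plain)
```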

NoGravitas · on July 24, 2023

That is Ellis.

calderknight · on July 24, 2023

I presume x.com or site:x.com will do the trick.

b800h · on July 24, 2023

No, you're supposed to go to X and search on there. And consume ads.