Hi, the tutorial writer here. The first few paragraphs in the tut would give you a sense of what Ash does, but thank you for the comment that its role in comparison with Phoenix should be spelt out at some point. Ash helps provide a thoughtful underpinning to structure your data, and Phoenix / LiveView presents it to the world. If you are a poetry person:
I whole-heartedly agree. I am a native speaker, and "fluent" in jyutping, yet I have such a hard time with Yale.
One service I'm going to build is a mapping tool between {R1, R2, ...Rn} and {G1, G2, ...Gn}, where each R is a romanization method and each G is a y/z-variant of a glyph. (These, for the most part, already exist inside packages I built for building the font, and just need a UI to expose them to the world.) It would sure save me lots of time trying to read Matthews-Yip...
I was thinking the same thing. Perhaps creating an API around PyCantonese?
My thought is that if there's a common data format for a Cantonese sentence with jyutping/yale/traditional + translation(s), the user could then pick what to display.
It could then also be worked into games/learning exercises. Placeholders could be made with a number of options so users could learn how to slot different adjectives into sentences, for example.
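A minimal sketch of what such a common format could look like, assuming a plain Python dataclass. The class name `Sentence`, the `{adj}` placeholder convention, and the sample romanizations are all illustrative assumptions, not an existing PyCantonese structure:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sentence:
    """One sentence (or slot option) carried in several parallel layers."""
    traditional: str
    jyutping: str
    yale: str
    translations: Dict[str, str] = field(default_factory=dict)  # language code -> text

    def display(self, *layers: str) -> List[str]:
        """Return only the layers the user picked, in the order requested."""
        out = []
        for layer in layers:
            if layer in ("traditional", "jyutping", "yale"):
                out.append(getattr(self, layer))
            else:
                out.append(self.translations[layer])  # e.g. "en"
        return out

    def fill(self, slot: str, option: "Sentence") -> "Sentence":
        """Replace a placeholder such as {adj} in every layer at once."""
        marker = "{" + slot + "}"
        return Sentence(
            self.traditional.replace(marker, option.traditional),
            self.jyutping.replace(marker, option.jyutping),
            self.yale.replace(marker, option.yale),
            {
                lang: text.replace(marker, option.translations.get(lang, marker))
                for lang, text in self.translations.items()
            },
        )


# Illustrative data only; a real data set would come from a vetted source.
template = Sentence("我好{adj}", "ngo5 hou2 {adj}", "ngóh hóu {adj}", {"en": "I am very {adj}"})
happy = Sentence("開心", "hoi1 sam1", "hōi sām", {"en": "happy"})
filled = template.fill("adj", happy)
print(filled.display("traditional", "jyutping", "en"))
```

Keeping every layer in one record means a learning exercise can swap options in and out without re-deriving the romanization each time.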
(I have the same username on Reddit, by the way. Sorry I never got to test it out for you!)
This is all off the beaten path, so I suspect the answer is that no one knows. Font tables have a limit of 65k characters, but this ceiling can be busted in wacky ways using multiple lookups, useExtension... Practically, font building tools / operations crash (mysteriously), stall (mysteriously), or slow to a crawl (indistinguishable from stalling), and the Cantonese Font just about pushes the limit.
Hello. Font's author here. You and Jeff are correct in guessing this is (ab)using ligatures maximally :) To satisfy your curiosity, we can go deeper.
----
Conceptually it is simple:
1. assign a default (most likely) sound for each character,
2. loop through contexts, extracting words (char-combos) where the sound is different from the default ("alt-word")
3. create SVGs + font-paths (fallback for incompatible systems) for every char and every alt-word
4. assign a ligature to substitute each char-sequence that forms the alt-word (e.g., "when 乾 隆 appear adjacently, replace with `uniF1234`", the codepoint for the alt-word 乾隆)
It is not perfect, but I didn't expect this to work so well, and was stunned when the testers reported high accuracy. I had always believed that bespoke computation with word segmentation (backed by a frequency-annotated library of some 1M entries) and a large data bank (100k+ words) was necessary.
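The four steps above could be sketched roughly as follows. This is not the font's actual build code: the function names, the `uniXXXX` glyph naming, the starting PUA codepoint, and the sample readings are assumptions for illustration, and the SVG generation in step 3 is left out entirely. The output lines use the ligature substitution syntax of OpenType feature files.

```python
from typing import Dict, List


def find_alt_words(defaults: Dict[str, str], lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Step 2: keep only the words where some character departs from its default sound."""
    alt = {}
    for word, sounds in lexicon.items():
        if len(word) != len(sounds):
            continue  # skip malformed entries
        if any(defaults.get(ch) != s for ch, s in zip(word, sounds)):
            alt[word] = sounds
    return alt


def glyph_name(codepoint: int) -> str:
    """Assumed naming scheme: uniXXXX for BMP codepoints, uXXXXX beyond."""
    return f"uni{codepoint:04X}" if codepoint <= 0xFFFF else f"u{codepoint:05X}"


def ligature_rules(alt_words: Dict[str, List[str]], pua_start: int = 0xF000) -> List[str]:
    """Step 4: one ligature substitution per alt-word, each mapped to a PUA glyph."""
    # Longer words first, so a longer context is listed ahead of a shorter one.
    ordered = sorted(alt_words, key=lambda w: (-len(w), w))
    rules = []
    for offset, word in enumerate(ordered):
        components = " ".join(glyph_name(ord(ch)) for ch in word)
        rules.append(f"sub {components} by {glyph_name(pua_start + offset)};")
    return rules


# Step 1: default (most likely) sound per character. Illustrative readings only.
defaults = {"乾": "gon1", "隆": "lung4", "香": "hoeng1", "港": "gong2"}
lexicon = {
    "乾隆": ["kin4", "lung4"],    # 乾 departs from its default, so this is an alt-word
    "香港": ["hoeng1", "gong2"],  # all defaults, so no ligature is needed
}

alt = find_alt_words(defaults, lexicon)
rules = ligature_rules(alt)
for rule in rules:
    print(rule)
```

Only words that disagree with the per-character defaults cost a ligature, which is what keeps the rule count manageable.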
----
Practically it was a horrific, tedious, mind-numbing, gawd-awful set of "why doesn't this work":
1. SVG automation that works for 10^3 breaks with 10^5
2. what worked for Latin breaks for Unicode
3. what worked for Unicode breaks for PUA
4. what worked for monochrome breaks for color
5. what worked for single glyphs breaks for ligatures
6. what?! The assignments in the database are wrong??
7. [...]
As I was trying to coerce the system to do what it wasn't designed to do, many of these breaks are undocumented, pretty mysterious to solve, and some steps just got manually gritted through. (And each of the 15k+ glyphs got gritted through about five times.)
Technically yes, but the general public probably doesn't have a concept of zero-width space.
(For everyone else wondering what ackfoobar is proposing: take the phrase (if you don't read Chinese, just treat the characters as shapes) 香港地少人多, which, properly segmented, is 香港.地少.人多. The font treats this incorrectly: because "香港地" is a commonly used fragment and the 地 in that fragment has a special sound, parsing as 香港地.少.人多 gives a mistaken sound for 地.
Ackfoobar is absolutely correct that we can coerce the correct reading by going 香港[ ]地少人多 --- where the [ ] is an invisible spacer. My contention is that most users don't know how to do that in their favorite word processor.
Someone is probably thinking, could you add "香港地少" as a fragment? A purist would say it's not pretty, but I'm a pragmatist, so I did apply many of these patches. Whether to do this or not relies on some acumen as a native speaker, and there were hundreds of these decisions made. This language knowledge would be necessary if someone were to do Mandarin (or Thai, or ...))
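To make the spacer trick concrete, here is a small sketch that models the font's ligature matching as greedy, left-to-right, longest-match lookup. That model is a simplifying assumption rather than a description of any particular text shaper, and the fragment list and function name are made up for illustration. The invisible spacer is the zero-width space, U+200B.

```python
from typing import List, Set

ZWSP = "\u200b"  # zero-width space: invisible, but it interrupts a character sequence


def segment(text: str, fragments: Set[str]) -> List[str]:
    """Greedy longest-match segmentation; single characters are the fallback."""
    longest = max(map(len, fragments), default=1)
    out, i = [], 0
    while i < len(text):
        if text[i] == ZWSP:
            i += 1  # the spacer itself renders as nothing
            continue
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in fragments:
                out.append(piece)
                i += size
                break
    return out


fragments = {"香港地", "香港", "地少", "人多"}

plain = segment("香港地少人多", fragments)
spaced = segment("香港" + ZWSP + "地少人多", fragments)
print(".".join(plain))
print(".".join(spaced))
```

Without the spacer the greedy match grabs 香港地 first; with it, the sequence is interrupted after 香港 and 地少 gets matched as its own fragment.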
This is an awesome piece of work - congratulations!
I notice you're using OpenType-SVG here; have you investigated whether it would be possible to implement this using COLRv1 (which would potentially result in a lighter-weight font, I suspect, and eventually wider support)? Or are there technical limitations in COLRv1 that make it impossible?
Color fonts really haven't converged on a standard, and their adoption is slow. OpenType-SVG was accepted 10 years ago, yet it was implemented in FreeType only one year ago, and it hasn't even trickled down to most Linux distros (nor is it usable on Windows). I don't see COLRv1 in Win/Mac/Linux until 2026 at the earliest.
But I did try to make it into COLRv1 (as well as COLR/CPAL). The only tools that build COLRv1 right now are the tools from the Google Fonts team; I remember them stalling for hours before reporting completion, yet the output was broken (I can't remember how it was broken).
I personally would love to see a COLR/CPAL version, and have some ideas on how that could happen. But I probably should be working on some revenue-generating product instead ;)
The history of digital fonts added a great deal of complexity to font formats, and without him writing such a concise yet comprehensive guide, I would have been stuck for even longer.
Phoenix rise from Ash.