Sure, but all of those negatives are also true of the thing you’d use in place of a private-use codepoint: an inline-reflowed image.
Private-use codepoints at least have the advantage over inline images of being “opaquely” copy-and-paste-able into other documents, machine-readable, etc. Any system that works in terms of Unicode text will pass the private-use codepoints along in the stream, where it might strip higher-level, out-of-band features like images.
As such, private-use codepoints are the analogous feature to the .notdef glyph in fonts, but for machine semantics rather than for human comprehension. In both cases, the “reader” (human or machine) gets something it knows is there and knows is valid, even though it doesn’t recognize it; something that can be opaquely preserved and passed along, and potentially “made legible” through a different lens than the reader’s own.
One place I could see this being useful is in displaying unique, not-yet-formalized emoji in chat systems. Copy-and-pasting such “text” out of the system would just get you opaque PUA codepoints; but if you emailed that “text” to somebody, and they then copy-and-pasted it back into the chat system, they’d see the same emoji you saw originally. It’s like a public URL representing a private document that you have to be logged into the relevant system to “access.”
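A minimal sketch of that round-trip, assuming a chat system that assigns its unformalized emoji to codepoints it picked in the BMP Private-Use Area (the codepoint, the emoji name, and the render function are all made up for illustration):

    CUSTOM_EMOJI = {"party_blob": "\uE042"}   # hypothetical private assignment in U+E000–U+F8FF
    REVERSE = {cp: name for name, cp in CUSTOM_EMOJI.items()}

    def render(text: str) -> str:
        # Inside the chat system, known private codepoints get swapped for the custom
        # emoji (here its :name:, standing in for the image); everywhere else they're
        # just opaque PUA characters that travel along with the text.
        return "".join(f":{REVERSE[ch]}:" if ch in REVERSE else ch for ch in text)

    message = "see you there " + CUSTOM_EMOJI["party_blob"]
    pasted_back_in = message                  # e-mailed out, copied into a doc, pasted back...
    print(render(pasted_back_in))             # -> "see you there :party_blob:"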
—————
The real negative of the Private-Use Area codepoints, from a conservationist/archivist perspective, is that unlike HTML images, which each have a distinct—if opaque—URL, the Unicode Private-Use Area is quite small (about 137,000 codepoints across its three blocks), and so prone to collisions in usage.
If the Consortium had instead come up with a stringing scheme, where any private-use glyph is actually formed from a sequence of private-use combining codepoints [sort of like the regional-indicator pairs that combine into flags], encoding e.g. a full UUID, then organizations could generate private, non-colliding codepoints using e.g. UUIDv4, with no need for registration. They could then rely on the assumption that such codepoints only carry semantics under their private system (and under any other system that wants to be explicitly compatible with it), rather than those codepoints potentially having other, incompatible meanings in systems that just happen to reuse them, as happens today.
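To make that concrete, here’s a rough Python sketch of what such a scheme could look like. The “intro” codepoint and the byte range in the plane-15 PUA are my own invention for illustration; Unicode defines no such thing:

    import uuid

    INTRO = 0xF0000       # hypothetical "start of private glyph" codepoint (plane-15 PUA)
    BYTE_BASE = 0xF0100   # hypothetical range: BYTE_BASE + n encodes the byte value n

    def encode_private_glyph(u: uuid.UUID) -> str:
        # One "glyph" = intro codepoint + 16 combining codepoints carrying the UUID's bytes.
        return chr(INTRO) + "".join(chr(BYTE_BASE + b) for b in u.bytes)

    def decode_private_glyph(s: str) -> uuid.UUID:
        assert ord(s[0]) == INTRO
        return uuid.UUID(bytes=bytes(ord(ch) - BYTE_BASE for ch in s[1:17]))

    glyph_id = uuid.uuid4()                  # no registry needed; collisions are vanishingly unlikely
    glyph = encode_private_glyph(glyph_id)   # a 17-codepoint sequence standing in for one "character"
    assert decode_private_glyph(glyph) == glyph_id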
Interestingly, such Private-Use UUID codepoint-sequences could then later be “adopted” into Unicode through a formal process. People who had created documents using such meta-codepoints could register them with the Consortium, which would 1. create “official” codepoints for those same semantics, and 2. ship a regularly-updated database file mapping meta-codepoints to their later officially-registered codepoints. One pass of Unicode normalization would then use that database to replace private-use UUID codepoint-sequences with their registered official codepoints.
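And a sketch of what that normalization pass might look like, reusing the hypothetical encoding above; the database contents here are, of course, invented:

    import uuid

    INTRO, BYTE_BASE = 0xF0000, 0xF0100   # same hypothetical scheme as above

    # Imagined Consortium-shipped database: registered private UUIDs -> official codepoints.
    REGISTERED = {
        uuid.UUID("9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d"): "\U0001F600",
    }

    def normalize(text: str) -> str:
        out, i = [], 0
        while i < len(text):
            if ord(text[i]) == INTRO and i + 16 < len(text):
                u = uuid.UUID(bytes=bytes(ord(c) - BYTE_BASE for c in text[i + 1:i + 17]))
                out.append(REGISTERED.get(u, text[i:i + 17]))   # swap in the official codepoint if adopted
                i += 17
            else:
                out.append(text[i])
                i += 1
        return "".join(out)

Unregistered sequences pass through untouched, exactly as unrecognized codepoints do today.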
Basically, this would take what happened as a series of one-off events when legacy codepages were absorbed into Unicode, and turn it into a continuous, fine-grained, ongoing process that anyone can take advantage of.
> Sure, but all of those negatives are also true of the thing you’d use in place of a private-use codepoint: an inline-reflowed image.
Not true: your inline image should have alt text, e.g. if ⅌ didn’t exist then you’d use an image of that shape with alt="per". If the image doesn’t load, it’ll be replaced by the word “per”, and screen readers will read it as “per” or “graphic per” or similar (I believe JAWS adds that “graphic” prefix; I’m not sure whether you can convince it not to with careful ARIA attributes—or even if you can, whether you should; these things are a bit dangerous to fiddle with).
Alternatively you might use inline SVG, which gets you vector goodness and can reliably (rather than only possibly) be presented to screen readers as the word “per”.
Another fancy trick is to use ligatures to replace entire words: have your own web font substitute the sequence “ per ” with “ ⅌ ”.