I don't think this communicates the _incredible_ amount of creative and technical work required not only to simply fill out over 100,000 characters in the UTF-8 space but to make them stylistically consistent.
Work, shmurk! Their set of Akkadian glyphs looks incomplete, and they don't even have Sumerian or Rongorongo. These guys need to stop sleeping on the job!
Just FYI, this is not sarcasm, it's ironic humor. Not the same thing. Sarcasm is "the use of irony to mock or convey contempt" which is obviously not what you meant.
I think it's sarcasm, actually? The target of my wit was not the mad typographers of Google, but rather the HN commenters who like to sprinkle comment threads on awe-inspiring subjects with dismissive pedantry. Sometimes it feels as though the CDC could announce that a universal cure for cancer has passed its phase 3 clinical trials, and the median HN response would be: "does this page really need to load 170kb worth of CSS? And why does it degrade so badly when I run NoScript?"
Or like the way that some people, when confronted with the awesome ineffable magnitude of my sense of humour...
Gosh, there is a lot of depth to language. I like lisper's note because there are a lot of people whose first language is not English and may not know the difference; I like your rejoinder that it was sarcasm against another audience, but was ironic humor to Google. (We can write to multiple audiences at once, no?)
And now here's a font complete enough that perhaps I can find the right glyph to punctuate the idea. We need something with a bit more nuance than the interrobang, I think.
Just to be clear, I quite liked lisper's note as well. I'm a language nerd myself, and always appreciate having more opportunities to cheekily overload some entendres.
Sarcasm derives from the ancient Greek meaning 'tearing of the flesh'. Sarcasm is intended to wound the recipient. Ironic humor is also a poke, but only tickles. I suggest a new word: 'gargalism' (from gargalesis, the Greek term for tickling).
Thank you for introducing this notion to me! I struggle with accusations of "sarcasm" when I am not intending to "mock or convey contempt", merely expressing an idea as its more accessible inverse!
Well, you did this by sort of mocking (imaginary / hypothetical) haters or nitpickers, so I'd argue it does fall under sarcasm category. Sarcasm doesn't always have to be directed at the person you're addressing, or even an existing person.
> Just FYI, this is not sarcasm, it's ironic humor. Not the same thing. Sarcasm is "the use of irony to mock or convey contempt" which is obviously not what you meant.
Just FYI, you were humorless in your interpretation of his comment. Humorless is "unable to see humor in things when most others do."
Call me old fashioned, but I think it's important to know what words actually mean, and to use the words that mean what you actually intend to convey. To quote Thomas Stoppard, "If there is any point in using language at all, it is that a word is taken to stand for a particular fact or idea and not for other facts or ideas." (If you're not familiar with that quote, it's worth looking up and reading in context.)
"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean—neither more nor less." "The question is," said Alice, "whether you can make words mean so many different things." "The question is," said Humpty Dumpty, "which is to be master—that's all."
-- LEWIS CARROLL (Charles L. Dodgson), Through the Looking-Glass, chapter 6, p. 205 (1934). First published in 1872.
Yes I think what corrupts this ideal is the extent to which words come to represent feelings and associations instead of facts and ideas.
Like how "literally" has come to represent the feeling of emphasis, and the association with the experience of one-upping other emphatic adjectives like "extremely."
Actually not. If you google the debate you'll find there's a history of using the word this way that goes back a very, very long time in a variety of literature.
I have to admit I thought the same as you until someone pointed this out.
I find it weird that this fight is only over "literally" (which doesn't actually literally mean non-metaphorically, but rather "to do with letters") but not "really", "truly" and "actually" and other such words which do literally mean "this is true and real".
Yes, it does. There is no "right" in language use beyond communication with the target audience. What people understand a word to mean is all there is.
That is a very counter-productive and shallow view of language. Language is always changing — semantics are no exception. Words don't mean the same today as they did 100 years ago, and even then they didn't mean the same as they did 200 years ago, and even then... You get the drift...
It's arrogant to think that the language that we're speaking right now is the pinnacle of linguistic evolution and that it's only downhill from here.
Clearly we aren't at the pinnacle of linguistic evolution. Ancient Greek and Latin were far more evolved than modern Indo-European languages. Indo-European languages have been going downhill for the past 2000 years or so ;-)
All this is true. Nonetheless, if you go too far towards the other extreme and apply Humpty-Dumpty's theory of language ("When I say a word it means exactly what I want it to mean") you won't be able to communicate at all.
I hear that argument a lot, but I rarely see a case where someone notes "wrong" usage — like you did — where there's actually any ambiguity. Take the complaints about "literally" being used as an intensifier: I doubt anyone is confused about whether the speaker actually died of embarrassment or just used it for emphasis.
Language has so much redundancy that a slightly different understanding of a single word in a sentence is rarely of any consequence.
Again, everything you say is true (which, BTW, is why I prefaced my comment with "Just FYI..."). But the sarcasm/ironic-humor confusion is very prevalent even among native speakers, and in a different set of circumstances it could cause confusion or worse, because the implied stance of sarcasm is the exact opposite of ironic humor's. Also, HN is a public forum, and comments are read not only by the people they respond to but by lurkers as well, some of whom may not be native speakers and who might therefore appreciate having some of the subtleties of the language pointed out to them.
I hear you, but does Stoppard's affinity for style and precision relate to off-topic pedantry? I'm not trying to diminish the importance of correctness. I actually do share your appreciation for these things. Maybe we can make a distinction between literature and singling out one person publicly.
Btw, Stoppard also said "I was always looking for the entertainer in myself ... [but] it's really about human beings".
> Call me old fashioned, but I think it's important to know what words actually mean [...] To quote Thomas Stoppard
It's also important, when pretentiously namedropping a writer and bragging about how you actually read them (all to win a pointless internet argument about a throwaway 'sarcasm' tag) to actually know that writer's name.
This isn't an argument. If I used a word to mean the opposite of what it actually means I'd want someone to point it out. Just like if I get an author's name wrong I'd want someone to point it out. (Thanks for pointing it out.)
He didn't use "sarcasm" to mean the opposite of what it actually means.
Because his post could be read as sincere (unjustified) criticism of Google, he used the well-known internet convention of "/sarcasm" to avoid an unnecessary, off-topic sub-thread stemming from his post. How's that for irony.
Maybe they used AI to style the fonts? Train a neural network on a particular style (thickness, density, curvature), then use heuristics to generate new glyphs based on other fonts, and have the network classify them. Let the bots do all the work. Might be an interesting article if they actually did that.
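A toy version of the classification half might look like this (pure hypothesis: the font file names, the crude two-feature "style" vector, and the classifier are all placeholders; a real pipeline would need far richer features than ink density):

    # Hypothetical sketch: learn a crude "style" signature from two fonts
    # and score candidate glyphs against it. Not anything Google described.
    import numpy as np
    from PIL import Image, ImageDraw, ImageFont
    from sklearn.linear_model import LogisticRegression

    def glyph_features(font_path, chars, size=64):
        # Render each glyph, then reduce it to two features:
        # ink density within the bounding box, and bbox aspect ratio.
        font = ImageFont.truetype(font_path, size)
        feats = []
        for ch in chars:
            img = Image.new("L", (size * 2, size * 2), 0)
            ImageDraw.Draw(img).text((size // 2, size // 2), ch, font=font, fill=255)
            a = np.asarray(img, dtype=float) / 255.0
            ys, xs = np.nonzero(a > 0.5)
            if len(xs) == 0:                       # nothing rendered ("tofu")
                feats.append([0.0, 0.0])
                continue
            w = xs.max() - xs.min() + 1
            h = ys.max() - ys.min() + 1
            feats.append([a.sum() / (w * h), h / w])
        return np.array(feats)

    chars = "abcdefghijklmnopqrstuvwxyz"
    X = np.vstack([glyph_features("NotoSans-Regular.ttf", chars),   # placeholder paths
                   glyph_features("NotoSerif-Regular.ttf", chars)])
    y = np.array([0] * len(chars) + [1] * len(chars))               # 0 = sans, 1 = serif
    clf = LogisticRegression().fit(X, y)

    # How "sans-like" is a candidate glyph from some other font?
    print(clf.predict_proba(glyph_features("Candidate-Regular.ttf", "ß")))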
Actually that was the first thing that came to my mind: what a ginormous job it must have been, with a team hunkering down and working with multiple experts from around the world for many unsung months (maybe years). However, I personally believe that Google not explicitly shouting about how hard this was makes it all the more commendable. Quiet, amazing work, not shouted from the rooftops by PR/marketing people. Honest, modest brilliance. Like all the best things.
So in a nutshell, typefaces which literally span every single defined character?
Are there special optimizations implemented for different use cases as well, e.g. screen v. print and sub-varieties of each? Ten years ago with Vista, Microsoft Typography (https://www.microsoft.com/en-us/Typography/default.aspx) put out a family of typefaces--Cambria, Calibri, Consolas, etc.--which were optimized specifically for sub-pixel rendering on LCD screens while maintaining on-paper legibility. I'd be cool with Noto not having any such optimization in mind given that the stated objective appears to be to include every defined character, but I do wonder if it should happen eventually.
...or maybe not, who knows. Pixel densities now have approached ludicrous territories. It might just no longer matter at least when we're talking about optimizing for screens.
At the time I wrote to one of the team members who worked there (maybe a Program Manager - I forget exactly) and they sent me a beautiful book showing off each of the fonts with a description of why the font was made, how it achieves the goals, and some info on the font's designer.
I still have it at home - really impressed me with the attention to detail and the self-styled publicity.
> "Google’s open-source Noto font family provides a beautiful and consistent digital type for every symbol in the Unicode standard, covering more than 800 languages and 110,000 characters."
Text is more easily consumed by Google services (Search, Now, etc.) than, e.g., graphical representations, and information is more likely to be stored in text if there are fonts that support presenting the information people want to communicate. Having a font family covering all world languages (and covering the full gamut of Unicode characters) increases the scope of information that can be effectively communicated via text.
Google has a deceptively simple incentive to make everything better: better fonts, better browsers, better phones, etc, all this drives increased web adoption, hence increased advertising revenues.
PS: I love Noto specifically for its Unicode support. I use it on my blog. It's also the default "serif" font on Chrome for Android (but not Chrome for Linux).
I regret to inform you that Noto is licensed under the SIL Open Font License, which is FSF-approved, and essentially ensures that it will be available to all in perpetuity.
You aren't really contradicting that they have incentive to improve everything. Incentive is not the same as action. You're effectively arguing with clouds for being in the sky.
Then that makes "incentive" a rather useless word, because everyone who wants to make the world a better place also has the incentive to make everything better. Why single out Google if it's also true of Microsoft, Apple, and me?
Well Google has an incentive to make everything better, but they can't literally work on everything, so they have to pick and choose.
For example, Google has repeatedly explained that Reader was killed because "usage has declined".¹ So this validates my point: they put resources into developing a product, but it doesn't get enough traction, so they kill it and redirect their resources to other projects that will hopefully be more successful and provide more users and ad revenue to Google.
That example does not validate your point because it supports multiple hypotheses, including my hypothesis that Google is only interested in improving things which generate sufficient profit.
That's not your '"everything"' but my '"everything", except where it doesn't.'
In other words, they pick and choose.
FWIW, I also have incentive to make everything better, because I want the world to be a better place. But I too must pick and choose.
No, there is an incentive to work on everything. Given bandwidth constraints, there isn't sufficient incentive to work on everything at once. The fact that they have to pick and choose does not negate the fact that they have more incentive to improve everything than pretty much any company in history (with the exception of Facebook, perhaps).
I'm honestly not sure I understand how your evidence supports your point.
They did, indeed, have incentive to make everything better.
That they later stopped supporting a thing does not contradict this, because they had the incentive to make reader in the first place.
Stopping support is, as someone said, not about incentive, but about later action.
The fact that they cannot focus on everything does not in fact, mean they have less incentive. It just means they can only do a limited set of the stuff they have incentive to do at any point in time.
That does not change the incentive itself.
If you want to argue "they only have incentive to do what is profitable in the first place", you would have had to argue "they could have made reader, but chose not to" or something similar.
(IE they did not have sufficient incentive)
TL;DR your argument is misplaced. Google has incentive to make everything better. They only have resources to long term focus on things that make profit.
Then why single out Google? By that argument, I also have incentive to make everything better, because I want a better world. So, I imagine, do you.
DuPont promised "Better Living Through Chemistry" - didn't they also have an incentive to make everything (made of chemicals) better?
Is there something different about Google's incentive which doesn't apply to Microsoft, Apple, eBay, GitHub, ... or the Sierra Club or WWF for that matter?
"Is there something different about Google's incentive which doesn't apply to Microsoft, Apple, eBay, GitHub, ... or the Sierra Club or WWF for that matter?
"
Nope ;)
Be civil. Don't say things you wouldn't say in a face-to-face conversation. Avoid gratuitous negativity.
When disagreeing, please reply to the argument instead of calling names. E.g. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."
Aside from the obvious benefit to having a Web that's less broken for everybody, one of my previous projects at Google was implementing support for multiple scripts in a mapping product. I literally don't know how I could have pulled this off without Noto: all other alternatives would have involved some combination of subpar quality, incomplete coverage, incompatible appearance or near-impossible license requirements.
(Standard disclaimer: I work at Google, but these views are my own, not Google's.)
Well, they can track people across sites that use Google Fonts, so maybe having a larger set of characters and encouraging more use gives them a farther tracking reach.
For the lazy, a snippet indicating that no such tracking occurs:
> What does using the Google Fonts API mean for the privacy of my users?
> The Google Fonts API is designed to limit the collection, storage, and use of end-user data to what is needed to serve fonts efficiently.
> Use of Google Fonts is unauthenticated. No cookies are sent by website visitors to the Google Fonts API. Requests to the Google Fonts API are made to resource-specific domains, such as fonts.googleapis.com or fonts.gstatic.com, so that your requests for fonts are separate from and do not contain any credentials you send to google.com while using other Google services that are authenticated, such as Gmail.
> In order to serve fonts quickly and efficiently with the fewest requests, responses are cached by the browser to minimize round-trips to our servers.
Google also said that their new messaging app would be end-to-end encrypted by default, and then quickly changed their mind when business reasons prevailed. Business reasons will always prevail. But even beyond that:
> For the lazy, a snippet indicating that no such tracking occurs: Requests to the Google Fonts API are made to resource-specific domains, such as fonts.googleapis.com or fonts.gstatic.com, so that your requests for fonts are separate from and do not contain any credentials you send to google.com while using other Google services that are authenticated, such as Gmail.
That's not what the snippet says. It is a technical statement, not a privacy statement. It doesn't mean they can't track you, it means it's a tiny bit harder to track you. Nothing in there says "we don't cross-reference these data" or "these data aren't used for tracking purposes" or anything else. Just that your Google account information isn't sent to the font servers.
More relevantly, the Google Fonts privacy policy links to the general Google privacy policy, which doesn't have any special "if you're using Google Fonts we collect a whole lot less than this" subsection. They might be using the data. They might not be. They might not be using the data today but decide to start using it tomorrow.
But nothing requires you to link directly to the font on their CDN. The sources, the whole build pipeline, everything is available and you can host it all by yourself.
The font they created has nothing to do with tracking you, it just so happens that if hosted on Google Fonts they can also do that. But that's a feature of Google Fonts, not Noto.
Pure speculation, but I would guess the vast majority of people do NOT self-host. I use their CDN as it's faster and much easier to set up, just link in your CSS or whatever.
The SIL Open Font License restricts subsetting, so you do need to be careful when doing this (the license treats a subset as a modified version, and modified versions must be renamed). Google has gotten waivers to allow them to alter the fonts without changing their names.
For it to be a valid point, it has to be a sensible strategy or position for Google to take. Do you actually think that under the list of goals and objectives for this project there was a bullet point that was like "* Better user tracking across the web". There are loads of reasons why it makes sense for Google to do this project ahead of user tracking. It's possible - anything is possible really - but does it make sense?
Even if this was remotely close to their goal, it still doesn't make sense. They already have Google Analytics on basically every website in existence. Investing millions of dollars in an extremely robust font to increase web tracking by 0.0000000001% seems like a terrible investment.
You're saying that Google, THE tracking company, is possibly sitting on this huge pipeline of data from IPs connecting to their CDN downloading fonts and they're just not doing anything with it? I would have a very hard time believing this.
Let me elaborate. I offer two situations and you can choose which one of them is more plausible.
a) What should we do next? Our cash is burning! Well, let's use countless human hours to make a beautiful font to make the world a better place!
b) How can we reach traffic that might currently not be visible to us? Well, maybe we should design a really beautiful font that everyone likes and host it on our servers? Sounds like a plan, let's do it!
c) We're making products supporting an unprecedented number of locales and there's no font family out there that covers them all and still looks consistent, so we're going to have to make it ourselves. Hey, if we open it up then there will be even more international content out there to put AdSense on!
Although it's not always perfect, Google takes internationalization seriously. Its websites are used all over the world. Having a nice font that supports every language simplifies things.
I18n on Android is seriously awful. For example, Android 5 shipped with an improperly escaped string in the French Canadian localization strings that would cause the phone to reboot in a loop if it was connected to a charger when opening the lockscreen. Even the latest Android version (Nougat) displays the battery as "24 % %" in the French localization, "Press power button twice for camera" has been translated to the near unreadable "App. 2x sur interr. pr activ. app. photo" (roughly "Prs pwr btn 2x fr cam.") and some of the localized strings are comically nonsensical.
Except the fonts are cacheable for a year, so you won't be pinging Google on every site that uses Google Fonts. It wouldn't be a very good tracking mechanism, if that's what it were intended for.
It seems like you're confusing COMBINING GRAVE ACCENT and GRAVE ACCENT. These are two separate characters with distinct code points. One combines, the other does not. It's not a property of the font; it's a property of the characters.
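You can check this straight from the Unicode character database, e.g. with Python's unicodedata module:

    import unicodedata

    print(unicodedata.name("\u0300"))        # COMBINING GRAVE ACCENT
    print(unicodedata.combining("\u0300"))   # 230 -> a combining mark
    print(unicodedata.name("\u0060"))        # GRAVE ACCENT
    print(unicodedata.combining("\u0060"))   # 0 -> not combining

    # Base letter + combining grave composes to 'à' under NFC;
    # the standalone accent never combines with anything.
    print(unicodedata.normalize("NFC", "a\u0300"))   # à
    print(unicodedata.normalize("NFC", "`a"))        # `a, unchanged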
Yes they are unique characters. I went to some pains to ensure I used the word 'key' as in physical switch on a keyboard. However I did not go far enough in ensuring I was referring to the font package as opposed to the font, please accept my apologies.
Dear god why? They're Unicode fonts. There are perfectly good Unicode combining characters that can be used when this behavior is intended. Making non-combining characters behave like combining characters breaks all kinds of text.
> But isn't it normally the operating system's responsibility to implement this
Presumably the font has backticks, but they're (poorly) implementing this method of accenting in the demo web app to allow people to search for accented characters?
It looks like "web fonts" in general are poorly implemented. You can recreate the problem with just a text box and font styling CSS.
If you look at some of my other comments in this tree you will see screenshots of the same text box with and without the local fonts installed. Presumably when local fonts are installed the OS handles everything; without them, the browser and the noto-fonts definition file get in the way.
You are conflating keyboard layout, Unicode code points, and the font rendering of said code points.
The issue here is that the non-combining accent (as generated by many keyboard layouts, dead keys or not) is rendered by the font as if it were combining. All the other aspects are tangential.
> This is not necessarily the expected behaviour in other languages
Which language, for example?
I'm French; we have characters with a grave accent ('à', 'è' and 'ù'), and we do not expect the "` + a = à" behavior. So that's one language, at least.
EDIT: we do expect that "^ + a = â", though, but that's another thing, and we actually have two ^ keys, one for our behavior and one for the usual behavior.
Forgive me, I don't want to sound xenophobic, but this seems preposterous to me. So much energy wasted. I am deeply grateful to the guys who developed the keyboard standards in our country, so that I don't have to go through such waste. When we want an "o" with an accent we just press right Alt with "o". Simple and fast. Frankly speaking, this is the first time I've heard of such a weird combination just to get local characters, in Western countries of course. A PC keyboard is not a mechanical typewriter. Why someone would copy the same behavior is beyond me.
If you write in a single language that has only a few combinations, sure. But as the number of diacritic-letter combinations increases, dead keys become a more attractive solution.
With just two or three Western European languages to write in, you can easily have grave, acute and circumflex accents, umlauts, tildes, plus a couple one-off letters (like ß for German, ç for French, or å/æ for Scandinavian languages). Combining those with every vowel they can apply to grows too much for the poor Alt key.
The US-International layout runs into this problem. Since Alt + A = Á, if you want to type Ä or Å or Æ in a single keypress you have to use Alt + Q, W, or Z respectively, which is hardly intuitive. And if you want À or Ã, you're SOL.
The other main solution is to have a hotkey for switching language layouts - typing in each language will be slightly faster, but when I tried it I frequently forgot which layout I was on and had to stop, delete my mangled output, switch language, and type again. I find dead keys much more friendly to muscle-memory.
I find it useful on a Mac to use some of the shortcuts for accents and diacritics in the default US-English keyboard layout, since I occasionally do have to type accented characters, or type text in a non-English language and don't want the hassle of changing keyboard setup or going through the Unicode menu. They all involve holding Option and pressing the key, though (for example, 'à' is Option-` followed by typing the 'a'; Option-e gets acute, Option-n gets tilde, Option-i gets circumflex and Option-u gets diaresis; there are also shortcuts for some common specific characters like ç and å).
It makes perfect sense if you know anything about Portuguese.
'àquele' is the preposition 'a' + 'aquele'. Grave accents are only used to mark contractions such as this, and the circumflex and acute accents - which denote differences in vowel quality and stress - are much more common, which is why it makes perfect sense to prioritise ease of typing for those two over the grave accent.
> When we want to use "o" with accent we just press right ALT with "o". Simple and fast.
That's fine for ò I guess, but then how do you type ô, ö and ó?
But as has been said elsewhere, this thread is confusing dead keys, which have nothing to do with fonts: typing the ^ key followed by the o key enters a single character ô. With combining characters, the character o followed by the combining character ̂ is rendered as ô, which should look the same as the precomposed ô. (Edit: looks like this works fine in my fixed-width font when editing, but not so fine in the regular proportional font when displaying the comment; it's sadly quite usual for fonts to mishandle combining characters.)
Here, the issue is that the standalone non-combining ` (as well as ´, ¨ and ¸) is handled by this font as if they were combining, thus doubly confusing users with dead-key keyboards.
I guess it all depends on the frequency of such characters in the given language. For example, on French keyboards we have dedicated keys for 'é', 'è', 'à', 'ç' and 'ù', because they are incredibly frequent. On the other hand, 'â', 'ê', 'î', 'ô', 'û', 'ë' and 'ï' are less frequent, so they are generated using modifiers.
I'm quite glad of that, actually, it would be really annoying if we had to keep using modifiers every two words. I know french people who use English keyboards, they just usually forget about accents totally instead of using dedicated modifiers.
Hmm, here in Poland we use right alt plus a letter to make the diacritic version of it, e.g. alt+a = ą (the left alt functions like in other keyboards).
It is probably easier for a language that doesn't have multiple diacritics for the same letter.
In Poland we have only one such letter: z. The problem is solved by using 'x' for the other (less frequently used) diacritic type: alt+z = ż, alt+x = ź.
EDIT: To be clear, parent is technically correct. The problem is with the way "web fonts" work; the actual 'font' is the same in both cases, just the web-font version is having fun.
I can confirm this. When I type back tick in the search box, the cursor doesn't advance to the right. The next character I type after one or more back ticks will look like it has an accent on it (e.g. `````a becomes à).
For me at least (Chrome), it seems like it isn't applying any surrounding spacing for the backtick. If you hit ` a few times and then hit some other character, they seem to occupy the same space in the text box.
It does happen in the search box as you say but I have been using Noto Sans for a while now on my site and it doesn't have this problem: https://kitnic.it/.
Since when was the grave accent character a modifier or dead key on a computer keyboard? Using a compose key is one thing, but colons, carets, and tildes aren't magically combining with letters to form letters with diaereses, circumflexes, or tildes above them.
On many non-English layouts there are dead keys, and the ` key may not exist, except as a dead key.
For example, the letters å, ø and æ are very common in Danish, so they have their own keys. The acute accent is sometimes used to mark stress, like é or ǿ, and this is done using a dead key -- it wouldn't make sense to have several keys just for this purpose.
I keep wanting to find a font to use on the portions of my website that contain Japanese, but they are all so big! No way I'm making my visitors download 115MB worth of font just so it looks a little nicer.
Edit: Thanks for the tips, I will look into those options :)
The solution (more of a workaround) is to cut the font down to the JIS X 0208 subset (containing Kanji levels 1 and 2, 6,879 characters total) or the JIS X 0213 subset (containing Kanji levels 1-4, 11,233 characters total). That limits the character count to the most commonly used ones, which should reduce the file size by a lot (or skip JIS X and select only the glyphs for Kanji level 1 to reduce it even further).
There is a Subset Maker Tool[1] (in Japanese) that can do this. But you need to provide the list of characters you want to keep by yourself (searching for JIS第1第2水準漢字 usually helps).
If you do not want to do this yourself, Google also provides a version of Noto Sans stripped down to just the JIS X 0208 subset in their Early Access program[1].
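If you'd rather do the subsetting yourself, the fontTools library handles it in a few lines. A rough sketch (the file names and character list are placeholders; note the OFL point raised elsewhere in the thread, which is why the sketch also renames the font):

    # pip install fonttools -- subset a CJK font down to a character list.
    from fontTools import subset

    options = subset.Options()
    font = subset.load_font("NotoSansJP-Regular.otf", options)   # placeholder path

    # Characters to keep, e.g. your JIS X 0208 list as one string of chars.
    keep = open("jisx0208_chars.txt", encoding="utf-8").read()

    subsetter = subset.Subsetter(options)
    subsetter.populate(text=keep)
    subsetter.subset(font)

    # OFL: a redistributed modified version needs a new name. Name IDs
    # 1/4/6 are the family, full, and PostScript names (Windows platform).
    for name_id, value in [(1, "MySiteJP"), (4, "MySiteJP Regular"),
                           (6, "MySiteJP-Regular")]:
        font["name"].setName(value, name_id, 3, 1, 0x409)

    subset.save_font(font, "MySiteJP-Regular.otf", options)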
Since you mentioned that your entire site isn't in Japanese, in addition to what other people suggested, I would now recommend the CSS unicode-range descriptor. It lets you tag a web font declaration with a range of Unicode code points, and most modern browsers now implement the optimization of only downloading the font when text on the page contains a character in that range.
In modern versions of Chrome, Firefox, and Safari you can see that a page only downloads a few of the many language files available. This also works really nicely with traditional CSS font stacks: if your design goals allow you to specify system fonts that work for many users, you can list a few local fonts before the downloadable one.
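To make the mechanism concrete, here's a toy model in Python of the test a browser effectively applies before fetching a file (the range value is illustrative, not copied from Google's CSS, and real unicode-range values can also contain '?' wildcards this parser ignores):

    # Toy model of the unicode-range download decision.
    def parse_unicode_range(value):
        # Parse a CSS value like "U+3000-303F, U+4E00-9FFF" into int pairs.
        ranges = []
        for part in value.replace("U+", "").split(","):
            lo, _, hi = part.strip().partition("-")
            ranges.append((int(lo, 16), int(hi or lo, 16)))
        return ranges

    def needs_font(page_text, unicode_range):
        ranges = parse_unicode_range(unicode_range)
        return any(lo <= ord(ch) <= hi for ch in page_text for lo, hi in ranges)

    cjk = "U+3000-303F, U+4E00-9FFF"   # CJK punctuation + unified ideographs
    print(needs_font("hello world", cjk))       # False -> file never downloaded
    print(needs_font("日本語のテキスト", cjk))  # True  -> browser fetches it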
Does Google Fonts' CSS include do this directly? It's been a while since I've considered it, but it would be nice to see a font-family 'NotoSans' and 'NotoSerif' with all the Unicode ranges spec'd.
As an aside, I'd love to see something similar in a fixed-width font (where the Asian block characters are double-width)
On a related note, I've found the Chrome team to be really responsive about i18n as well – if you report something about even fairly obscure scripts not being supported, they usually respond rapidly. Quite pleasant to see.
Google's Noto Sans CJK JP contains 9 font files, each about 13MB. You can subset the fonts down to the glyphs or glyph sets you expect to use to further reduce the file size, although I suspect the CJK languages are never going to be easy to get to fast web speeds. The JIS X 0208 subset still weighs in at almost 4 MB for a single OTF.
The Wikipedia disambiguation page for "Tofu" mentions:
Slang for the empty boxes shown in place of undisplayable code points in computer character encoding, a form of mojibake
Mojibake is specifically the "garbage" from getting an encoding wrong—treating a sequence of bits as if they were a valid sequence of characters in some encoding, when they're either not from that encoding, or not from any textual encoding (because you're e.g. trying to render the contents of an executable binary as text.)
"Tofu" is a more recent phenomena, that didn't really have a specific name until now: when there are Unicode code-points—correctly decoded—in a document, but you have no font installed that offers a glyph to represent them.
Mojibake results from "I did this wrong and didn't notice"; tofu results from "I know what this is, and it's something I don't have a visual signifier for."
Mind you, often mojibake will result in "tofu"; the garbage code-points you get from bad encoding detection will turn out to be ones that don't have a currently-defined Unicode character, so they'll show up as U+FFFD REPLACEMENT CHARACTER (�). But that's a coincidence, rather than an equivalence.
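The decoding side of the distinction is easy to demonstrate in Python (tofu itself only appears at render time, when the font lacks a glyph):

    # Mojibake: bytes decoded with the wrong codec; valid but wrong characters.
    print("naïve café".encode("utf-8").decode("cp1252"))       # naÃ¯ve cafÃ©

    # Bytes that are invalid under the chosen codec become U+FFFD instead:
    print(b"\xff\xfeoops".decode("utf-8", errors="replace"))   # ��oops

    # Tofu is different: the code point decodes fine, your font just has no
    # glyph for it. U+16A0 (RUNIC LETTER FEHU) is valid Unicode either way.
    print("\u16a0")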
Good question. I can explain. "storage.googleapis.com" is an endpoint for Google Cloud Storage's "XML" API. Google Cloud Storage (GCS) is a cloud blob storage service similar to Amazon S3.
This particular API, the "XML" API, is designed to be API-compatible with S3. The XML namespace, 'http://doc.s3.amazonaws.com/2006-03-01', is therefore the same. This allows third party tools like 'boto' and the like which work with S3 to work with GCS with only a switch in hostname.
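For instance, a plain unauthenticated GET against the bucket returns an S3-style XML listing. A quick sketch with the Python requests library (the bucket's contents may of course change):

    import requests
    import xml.etree.ElementTree as ET

    # Public bucket listing via the S3-compatible XML API.
    r = requests.get("https://noto-website.storage.googleapis.com/")
    root = ET.fromstring(r.content)

    # Every element sits in the S3 namespace mentioned above.
    ns = {"s3": "http://doc.s3.amazonaws.com/2006-03-01"}
    for key in root.findall("s3:Contents/s3:Key", ns)[:5]:
        print(key.text)   # first few object names in the bucket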
I have no idea. I'm not sure what you were looking at.
If you visit just "https://noto-website.storage.googleapis.com/", that's a request to list all of the objects in the bucket, so you'll see an XML document with that namespace and the results of a listing.
If you were trying to download a specific object from that bucket, you'd either see the resource itself, or you'd see an XML error result of some sort (404, 403, etc). So I assume that either you mistyped the object name once or else the object had been deleted for some reason.
Maybe it would make sense, but I personally prefer their mobile site. It's much nicer to read. Not that their desktop site is filled with ads and gibberish, but for whatever reason I like it more.
Noto fonts are great. I especially love the OFL licensing. I wanted to include fonts in a mobile application and there was confusion on whether including a GPL version of the fonts would force me to GPL my app - something I could not do. I couldn't find a definite answer.
With Noto's OFL licensing, I no longer had that worry.
You don't have to use the GPL to link GPL code. You are just required to distribute the final binary under GPL. When you distribute the source files, individual files can be under individual licenses, they just have to all be license compatible with the final license.
I.e., use GPL code, put Apache license headers on all your own files, and distribute the binary as GPL. Anyone else can then use your own work under Apache if they want, since you are individually licensing all your own files under it; they just can't use the GPL parts without also distributing under the GPL.
In my eyes that practice would be equivalent to dual-licensing your code under the GPL and the (more liberal) license of your choice. This happens with some frequency.
It's a way to honor the GPL idea for the stuff you link to without having to follow GPL ideas for your new code.
The Linux kernel is one instance. Tons of drivers there, including GPU ones, are permissively licensed to make it possible to reuse that code in other kernels like the BSDs.
It's not meaningless. If I give something away under certain conditions, I'm not forcing anybody to adhere to those conditions.
Instead, if they choose to adopt those conditions, they get free stuff from me (and a lot of other people.) In other words, it's an incentive program, not a requirement such as a law or regulation.
The GPL requires anything that "includes" it to be GPL compatible. The definition of "includes" that it uses is pretty "iffy" and could easily be extended to include the whole application if you aren't very careful.
OFL has no requirement like that, so you can use it freely in whatever project you want.
I've just tried it, and it is absolutely the same as Droid Sans Mono (which I love and use), barring the extra line spacing. They haven't even fixed the kerning problem with "w".
It doesn't seem to support the Tangut script, which was added in Unicode 9.0. It's the first thing I test when a font claims to support all languages in Unicode. To my knowledge, I haven't really seen any general-purpose font with Tangut support (probably because no one is going to use it). I thought Google had already completed this project, but apparently they haven't yet.
Probably not -- unless you count the proprietary, obfuscated font they use to make the official Unicode charts.
Some people snark at Unicode for putting so much more effort into emoji recently than they put into scripts used in actual languages, but y'know, when they add emoji, at least people design glyphs for them pretty quickly. When they do the research to designate codepoints for Anatolian hieroglyphs, the codepoints just sit there unloved and unsupported.
Fun fact: People behind a WatchGuard firewall (default settings) won't see Google fonts (at least in Firefox) because the firewall filters the CORS headers.
What an awful overreach. Firewalls should not ship with defaults that break the internet, and such presumptuous header filtering can actually weaken security.
I'm just the victim. Whenever I suspect the firewall to be the cause of a problem I use a proxy via a ssh tunnel. As long as it's not forbidden for me to actually solve problems.
The list of software that isn't usable behind a WatchGuard firewall (and maybe similar enterprise firewalls) is getting longer and longer. No Google fonts, no Drupal, no ShopWare 5.2, JIRA barely usable, etc.
I'm sure I can find plenty of examples just by searching (and I'll do that in just a moment) but do you happen to have a list of these issues that you've encountered that can be attributed to WatchGuard firewalls?
I am (primarily) a network engineer and such a list would be wonderful to have when recommending for or against specific products.
Noto & Roboto are probably the two fonts I love more than anything now; I just find them so pleasing. They don't do anything stupid and they look good.
The claim of full coverage of 110,000+ characters (they are targeting Unicode 6, apparently) appears to be false: Noto Sans CJK covers approximately 30K characters[1], while as of Unicode 6.0 there were 74,614 CJK Unified Ideographs (calculated from [2]).
Edit: Using a script I made to check codepoint coverage[3] I get 63,639 codepoints with glyphs defined for all Noto fonts included in their default download (Noto-unhinted.zip).
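For anyone who wants to reproduce that kind of count, here's a rough sketch of such a coverage check with fontTools (a reimplementation of the idea, not the actual script at [3]):

    # Count distinct codepoints with a glyph across a directory of fonts.
    import glob
    from fontTools.ttLib import TTFont

    covered = set()
    for path in glob.glob("Noto-unhinted/*.[ot]tf"):   # placeholder directory
        font = TTFont(path)
        covered.update(font["cmap"].getBestCmap().keys())  # codepoint -> glyph name
        font.close()

    print(len(covered), "codepoints have a glyph in at least one font")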
When you select a different language from the search box, though, you see those same sentences translated into the other languages. Wouldn't be possible to get full alphabet coverage in all translation/alphabet combinations.
The sans has a real italic and not just an oblique. Color me impressed: real italics aren't very common in sanses, and that goes double for FOSS fonts. Very nice.
Looking at fonts recently, I am starting to crave a 4K LCD panel. When people were switching to HD panels there was a big WOW all around, but after buying a smartphone I realized that HD is good for a smartphone, not for anything big. I love to read on my smartphone and I love to code, too, but fonts just look crappy (especially on Windows). Sadly, good 4K panels are still quite pricey.
That is great. I am working on a project where I require a font which includes all (or at least as many as possible) Unicode symbols. Until now I was thankful that I could use GNU Unifont as a fallback, even if it was ugly. But this will make my app look so much better.
I don't have any inside information, but it seems like they have different goals. Roboto is primarily designed to evoke certain emotions and to display well across devices, whereas Noto is primarily designed to render consistently across languages.
This is really fantastic. It sent me looking for source materials for the character choices made for each language.
I'm dealing with a media experience in Dakota and Ojibwa right now where we have source material that is spelled, and uses characters, quite differently from the alphabet provided by Noto for those languages. Given the scale of this project, I assume that considerable thought went into each language's character set, but it's difficult to know for sure without any sourcing. The git commit logs don't offer up any hints. Anyone familiar with the project know where I could find this sort of source information?
Should I be referencing something in the Unicode definitions for these languages?
I know basically nothing about Native-American-specific scripts, but if you do, and they are not in Unicode (right now they list Cherokee, Deseret, Osage, and Unified Canadian Aboriginal Syllabics as their American scripts), then check out their web site and see about having the relevant scripts/alphabets/letters, etc., added to the standard.
You need the hinted version if your FreeType does hinting, whereas if it does autohinting (or no hinting at all) you don't need hints, as in that case hints are ignored. So if in doubt, install the hinted fonts.
Surprised to see that Google is supporting even archaic Korean [0], but it would have been nice to see a chunk of text in Korean, Japanese, and Chinese, as opposed to a bunch of gibberish in all three languages.
They even support Linear B, which fell in disuse more than 3000 years ago with the Bronze Age Collapse of the Mycenaean civilization: https://www.google.com/get/noto/#sans-linb
Last time I checked, Noto Mono was not considered a monospace font by Windows' console (Command Prompt) and cannot be used for working in the console (including Bash on Ubuntu on Windows). Is that still the case?
"The Unicode Basic Multilingual Plane includes the APL symbols in the Miscellaneous Technical block, which are therefore usually rendered accurately from the larger Unicode fonts installed with most modern operating systems."
[pedantic] "UTF-8 space"... UTF-8 code units space is quite limited by single byte - 0...255. You probably meant Unicode Code Points space that, at the moment, is 21-bits number:
from 0 to 0x10FFFF (1,114,112 decimal) [/pedantic]
> The UTF-8 code unit space is quite limited: a single byte, 0...255
Yes, but no one said "UTF-8 code unit space", and it's pretty clear that the "UTF-8 space" intended was the full space representable in UTF-8, not the space representable in a single UTF-8 code unit, so this is not only pedantic, but also a non sequitur.
Not true at all. Per the wiki article[1], "UTF-8 is a character encoding capable of encoding all possible characters, or code points, defined by Unicode"
It's a variable-length encoding, and just so happens to correspond to ASCII for the first 128 characters. But if the leading bit of the byte is 1, it indicates that it's part of a multi-byte glyph. With the encoding, you can represent the entirety of the Unicode space: the latter bytes start with 10 to indicate they're middle parts of the glyph, while the first byte uses 11 and continues to indicate how many bytes long the glyph is.
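You can watch those bit patterns directly in Python:

    # 1-, 2-, 3- and 4-byte UTF-8 examples, shown bit by bit.
    for ch in ["A", "\u00e0", "\u20ac", "\U0001d11e"]:
        print(f"U+{ord(ch):06X} ->", " ".join(f"{b:08b}" for b in ch.encode("utf-8")))
    # U+000041 -> 01000001                                (plain ASCII, lead bit 0)
    # U+0000E0 -> 11000011 10100000                       (110 lead, 10 continuation)
    # U+0020AC -> 11100010 10000010 10101100
    # U+01D11E -> 11110000 10011101 10000100 10011110     (4-byte lead starts 11110)

    # The ceiling today is U+10FFFF; Python rejects anything above it.
    print(chr(0x10FFFF).encode("utf-8"))   # b'\xf4\x8f\xbf\xbf'
    # chr(0x110000) raises ValueError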
As for the UTF-8 encoding... It is a variable-length encoding capable of encoding any 31-bit number: from 0 to 0x7FFFFFFF. Not just the current set of 21-bit Unicode code points.
As for "... indicates that it's part of a multi-byte glyph": you are mixing completely different entities here. Unicode has nothing to do with glyphs.
A glyph is an atomic component (image) of a font's internal structure. A single character (a Unicode code point here) can be composed on screen from multiple glyphs.
> As for the UTF-8 encoding... It is a variable-length encoding capable of encoding any 31-bit number: from 0 to 0x7FFFFFFF. Not just the current set of 21-bit Unicode code points.
You're thinking of an older version of UTF-8 which allowed sequences of up to 6 bytes to be used to encode a single code point. UTF-8 is now defined to not allow code point values above 0x10FFFF and not allow code point values between 0xD800 to 0xDFFF (inclusive) to allow only the same values as possible in UTF-16.
"code unit" is not "code point". "code unit" for utf-8 is 1 octet. "code unit" for utf-16 is a 16bit value. Both utf-8 and utf-16 can represent the entire space of unicode code points by using multiple code units to represent a single code point.
Among language computing professionals, these square boxes are commonly known as tofu – yes, those yummy squares of bean curd common in East Asian and South East Asian cuisine!
It may not have been intended to be derogatory but that doesn't mean it hasn't ended up actually being derogatory. "No more tofu" certainly sounds like a judgement that "tofu" is bad/unwanted
Homonyms are a thing. Of course a thing can also be good in one context and bad in another. No more tofu spilled in the bed. No more tofu spilled across my blog. "No more tofu" out of context sounds like a soy allergy. With context it means exactly what it says; eliminating a UX failure mode.
If something can only be derogatory if you fundamentally change the actual meaning of the statement, I think it is proven the statement as-is is not in fact derogatory. I would go so far as to say, it's abusing the notion and nature of human language to claim otherwise.
Please do not use these fonts directly from Google's servers. This is yet another avenue for Google to track browsing traffic across the whole web. That is too much power for one company.
Google's ToS state that they will not track font requests beyond simple aggregate numbers of downloads per [time], which they release publicly in their analytics dashboard.
Not only that but the cache headers are set to 1 year... That's a pretty shitty tracking system if I've ever seen one.
On the other hand, letting Google host the fonts will allow your users to use the cached font without needing to waste bandwidth and time unnecessarily. Especially if you already use Google Analytics, I'll say, please let Google host the fonts too.
It is such a shame that people downvote comments like this. There are issues with Google quite aside from the surveillance ones, like limited support of foreign languages (try finding a Chinese script in a known calligraphic style, for just one example). The more demand for independent hosting and subsetting exists, the better the open source tools will become. Let's not enter a period where fonts go the way of email - centralized providers with CDNs and obscure backend processing facilities are the only ones that can do a good job.
edit: why the downvotes? They bring up tofu and then name the cure something close to natto, cured soy beans. And by cured, I mean fermented. And by fermented, I mean stringy at the molecular level, smells and tastes awful. And when I say awful, I mean, most of the people from the originating culture think it's awful.