Google Noto Fonts (google.com)
1217 points by bpierre on Oct 6, 2016 | 304 comments


I don't think this communicates the _incredible_ amount of creative and technical work required not only to simply fill out over 100,000 characters in the UTF-8 space but to make them stylistically consistent.


Work, shmurk! Their set of Akkadian glyphs looks incomplete, and they don't even have Sumerian or Rongorongo. These guys need to stop sleeping on the job!

</sarcasm> (Obviously I'm awed...)


Just FYI, this is not sarcasm, it's ironic humor. Not the same thing. Sarcasm is "the use of irony to mock or convey contempt" which is obviously not what you meant.


I think it's sarcasm, actually? The target of my wit was not the mad typographers of Google, but rather the HN commenters who like to sprinkle comment threads on awe-inspiring subjects with dismissive pedantry. Sometimes it feels as though the CDC could announce that a universal cure for cancer has passed its phase 3 clinical trials, and the median HN response would be: "does this page really need to load 170kb worth of CSS? And why does it degrade so badly when I run NoScript?"

Or like the way that some people, when confronted with the awesome ineffable magnitude of my sense of humour...

</Ironic_humor> ;-)


Gosh, there is a lot of depth to language. I like lisper's note because there are a lot of people whose first language is not English and may not know the difference; I like your rejoinder that it was sarcasm against another audience, but was ironic humor to Google. (We can write to multiple audiences at once, no?)

And now here's a font complete enough that perhaps I can find the right glyph to punctuate the idea. We need something with a bit more nuance than the interrobang, I think.


Just to be clear, I quite liked Lister's note as well. I'm a language nerd myself, and always appreciate having more opportunities to cheekily overload some entendres.


Sarcasm derives from the Ancient Greek meaning 'tearing of the flesh'. Sarcasm is intended to wound the recipient. Ironic humor is also a poke, but only tickles. I suggest a new word, 'gargalism'.


Also, Rongorongo isn't yet in the Unicode standard. Can't fault a Unicode font for not supporting something which isn't there.


You are technically correct which, as we all know, is the best kind of correct.


Thank you for introducing this notion to me! I struggle with accusations of "sarcasm" when I am not intending to "mock or convey contempt", merely expressing an idea as its more accessible inverse!


You are most welcome.


Well, you did this by sort of mocking (imaginary / hypothetical) haters or nitpickers, so I'd argue it does fall under the sarcasm category. Sarcasm doesn't always have to be directed at the person you're addressing, or even an existing person.


I don't think you intended to direct this comment to me. I am not OP.


Instead of "you" it should read "the OP", I apologize.


> Just FYI, this is not sarcasm, it's ironic humor. Not the same thing. Sarcasm is "the use of irony to mock or convey contempt" which is obviously not what you meant.

Just FYI, you were humorless in your interpretation of his comment. Humorless is "unable to see humor in things when most others do."


There was no humor in the attempt. Can't help that.


I'm glad pedantry and nitpicking like this are acceptable on HN and I say this without any intention of irony (or, indeed sarcasm).


What % of people do you estimate would make that distinction?


Call me old fashioned, but I think it's important to know what words actually mean, and to use the words that mean what you actually intend to convey. To quote Tom Stoppard, "If there is any point to using language at all is that a word is taken to stand for a particular fact or idea and not for other facts or ideas." (If you're not familiar with that quote, it worth looking it up and reading it in context.)


Uh huh. Stephen Fry has you pegged, I'm afraid. http://youtu.be/J7E-aoXLZGY


Beautiful rejoinder. That was a fun video. :)


  “When I use a word,” Humpty Dumpty said, in rather a
  scornful tone, “it means just what I choose it to mean
  —neither more nor less.” “The question is,” said Alice, 
  “whether you can make words mean so many different 
  things.” “The question is,” said Humpty Dumpty, “which 
  is to be master—that’s all.”
-- LEWIS CARROLL (Charles L. Dodgson), Through the Looking-Glass, chapter 6, p. 205 (1934). First published in 1872.

:D


Yes I think what corrupts this ideal is the extent to which words come to represent feelings and associations instead of facts and ideas.

Like how "literally" has come to represent the feeling of emphasis, and the association with the experience of one-upping other emphatic adjectives like "extremely."


Common use does not make it right. People using litterally in this way are just showcasing their ignorance.


You must not be familiar with the "Prescriptivist vs Descriptivist" debate.

http://english.blogoverflow.com/2012/10/prescriptivism-and-d...


> Common use does not make it right.

In fact, this is the very mechanism by which language evolves.


You mean degrades?


Being flexible with my language doesn't keep me from being precise. I said what I meant.


Actually not. If you google the debate you'll find there's a history of using the word this way that goes back a very, very, long time in a variety of literature.

I have to admit I thought the same as you until someone pointed this out.


Common use is literally the only thing which defines correctness.


¿Por que no los dos? In current use, ‘literally’ for emphasis is a sign of a lowbrow sociolect.


> lowbrow sociolect

Is this Pretentious Twaddlese for "stupid"?


No, for ‘ignorant’, as the grandparent said.


The dictatorship of the Majority?


I find it weird that this fight is only over "literally" (which doesn't actually literally mean non-metaphorically, but rather "to do with letters") but not "really", "truly" and "actually" and other such words which do literally mean "this is true and real".


You've fallen victim to Muphry's law:

https://en.wikipedia.org/wiki/Muphry%27s_law

"litterally" only has one t in it.


The same word in French has two, so this is probably one letter that was dropped over time.


> Common use does not make it right.

Yes, it does. There is no "right" in language use beyond communication with the target audience. What people understand a word to mean is all there is.


That is a very counter-productive and shallow view of language. Language is always changing — semantics are no exception. Words don't mean the same today as they did 100 years ago, and even then they didn't mean the same as they did 200 years ago, and even then... You get the drift...

It's arrogant to think that the language that we're speaking right now is the pinnacle of linguistic evolution and that it's only downhill from here.


Clearly we aren't at the pinnacle of linguistic evolution. Ancient Greek and Latin were far more evolved than modern Indo-European languages. Indo-European languages have been going downhill for the past 2000 years or so ;-)


All this is true. Nonetheless, if you go too far towards the other extreme and apply Humpty-Dumpty's theory of language ("When I say a word it means exactly what I want it to mean") you won't be able to communicate at all.

http://www.sacred-texts.com/neu/eng/eft/eft43.htm


I hear that argument a lot, but I rarely see a case where someone notes "wrong" usage — like you did — where there's actually any ambiguity. Like when people complain about "literally" being used as an intensifier. I doubt they're confused if the speaker e.g. actually died of embarrassment or if they're just using it as an intensifier.

Language has so much redundancy that a slightly different understanding of a single word in a sentence rarely is of any consequence.


Again, everything you say is true (which, BTW, is why I prefaced my comment with "Just FYI..."). But the sarcasm/ironic-humor confusion is very prevalent even among native speakers, and in a different set of circumstances it could cause confusion or worse because the implied stance of sarcasm is the exact opposite of ironic humor. Also, HN is a public forum, and comments are read not only by their respondees but by lurkers as well, some of whom may not be native speakers and who might therefore appreciate having some of the subtleties of the language pointed out to them.


After reading all the comments here, you still stand behind your original pedantic comment as being appropriate and productive? Fascinating.

If you knew the OP was more likely to feel insulted than to feel helped, that wouldn't have changed your action?


Well, we don't have to speculate. The OP weighed in:

> Just to be clear, I quite liked Lister's [sic] note as well.

So yes.


You're missing a small word ("it") in your quote.

There's another small word ("is") missing in your last sentence.

Call me old fashioned, but even small words matter. To quote Ernest Cline, "People who live in glass houses should shut the fuck up."

I apologize for the profanity, but it's just such a great quote, and so appropriate in this case.


I hear you, but does Stoppard's affinity for style and precision relate to off-topic pedantry? I'm not trying to diminish the importance of correctness. I actually do share your appreciation for these things. Maybe we can make a distinction between literature and singling out one person publicly.

Btw, Stoppard also said "I was always looking for the entertainer in myself ... [but] it's really about human beings".


> Call me old fashioned, but I think it's important to know what words actually mean [...] To quote Thomas Stoppard

It's also important, when pretentiously namedropping a writer and bragging about how you actually read them (all to win a pointless internet argument about a throwaway 'sarcasm' tag) to actually know that writer's name.


This isn't an argument. If I used a word to mean the opposite of what it actually means I'd want someone to point it out. Just like if I get an author's name wrong I'd want someone to point it out. (Thanks for pointing it out.)


He didn't use "sarcasm" to mean the opposite of what it actually means.

Because his post could be read as sincere (unjustified) criticism of Google, he used the well-known internet convention of "/sarcasm" to avoid an unnecessary, off-topic sub-thread stemming from his post. How's that for irony.


Well, if people thought I was looking out of shape or overweight I wouldn't mind someone saying so.

However, wouldn't you agree that what I would like and what is generally considered polite in our culture are probably not always the same?


I literally believe you.


Webster Dictionary: Sarcasm http://www.merriam-webster.com/dictionary/sarcasm

Full Definition of sarcasm 1: a sharp and often satirical or ironic utterance designed to cut or give pain

2 : a mode of satirical wit depending for its effect on bitter, caustic, and often ironic language that is usually directed against an individual

Nope your Ironic contains sarcasm.


That's right, sarcasm is a particular kind of irony. And it's not the kind that the OP was employing.


[flagged]


What? And miss all this intellectually stimulating badinage?

(Now that was sarcasm.)


Maybe they used AI to style the fonts? Train a neural network on a particular style (thickness, density, curvature) and then write out heuristics to generate new fonts based on other fonts, that get classified by a neural net. Let the bots do all the work. Might be an interesting article if they actually did that.



And if they didn't do that, it can still be an interesting master's thesis.


I hate to be "that guy", but you are talking about the Unicode space.

UTF-8, UTF-16, etc. are just different encodings of the exact same Unicode space.
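
To make that concrete, here is a minimal Python sketch. The euro sign is an arbitrary example character chosen for illustration; the point is only that the code point stays the same while the bytes differ per encoding.

```python
# One Unicode code point, three different byte encodings.
ch = "\u20ac"  # EURO SIGN, chosen arbitrarily for illustration

code_point = ord(ch)               # position in the Unicode space
utf8 = ch.encode("utf-8")          # 3 bytes
utf16 = ch.encode("utf-16-be")     # 2 bytes (big-endian, no BOM)
utf32 = ch.encode("utf-32-be")     # 4 bytes (big-endian, no BOM)

print(f"U+{code_point:04X}")       # U+20AC
print(utf8.hex())                  # e282ac
print(utf16.hex())                 # 20ac
print(utf32.hex())                 # 000020ac

# Decoding any of them gives back the same character.
same = (
    utf8.decode("utf-8")
    == utf16.decode("utf-16-be")
    == utf32.decode("utf-32-be")
    == ch
)
print(same)                        # True
```

A font maps code points to glyphs, so its coverage is a statement about the Unicode space rather than about any one of these encodings.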


UTF-16 is not an encoding - It's an abomination.


and doesn't show it either, which is weird

edit: github page gives a glimpse in the header image: https://github.com/googlei18n/noto-source


actually that was the first thing that came to my mind, what a ginormous job it must have been with a team hunkering down and working with multiple experts from around the world, for many unsung months (maybe years). However I personally believe that Google not shouting about how hard this was, explicitly, makes it all the more commendable. Quiet, amazing work, not shouted from the rooftops by PR/marketing people. Honest, modest, brilliance. Like all the best things.


I wonder whether they used neural style transfer to automatically generate them.

https://arxiv.org/pdf/1508.06576.pdf (See p. 5 for examples.)


The fonts are mostly built by Monotype, with the exception of CJK fonts which are built by Adobe, in the regular way of designing and building fonts.


Interesting. Is that clear from the fonts themselves, or did you learn it out of band?


Yes I remember a Wikipedia post explaining that Helvetica was the only suitable choice as a font because of this. good job Google.


So in a nutshell, typefaces which literally span every single defined character?

Are there special optimizations implemented for different use cases as well, e.g. screen v. print and sub-varieties of each? Ten years ago with Vista, Microsoft Typography (https://www.microsoft.com/en-us/Typography/default.aspx) put out a family of typefaces--Cambria, Calibri, Consolas, etc.--which were optimized specifically for sub-pixel rendering on LCD screens while maintaining on-paper legibility. I'd be cool with Noto not having any such optimization in mind given that the stated objective appears to be to include every defined character, but I do wonder if it should happen eventually.

...or maybe not, who knows. Pixel densities now have approached ludicrous territories. It might just no longer matter at least when we're talking about optimizing for screens.


At the time I wrote to one of the team members who worked there (maybe a Program Manager - I forget exactly) and they sent me a beautiful book showing off each of the fonts with a description of why the font was made, how it achieves the goals, and some info on the font's designer.

I still have it at home - really impressed me with the attention to detail and the self-styled publicity.


It was called _Now Read This_ and it really is beautiful. As far as I know, it was not intended to be publicly available, but you can buy a copy on Amazon: https://www.amazon.com/Now-Read-This-Microsoft-Collection/dp...


  Currently unavailable. We don't know when or if this item will be back in stock.
Correction, you could buy a copy on Amazon...

Own up! Who got the last one?!


Some of the Noto faces are hinted, some aren't -- the filenames tell you which ones are which.


It also seems that we need incremental loading on an as-needed basis, e.g. on mobile devices with limited bandwidth :)


CSS unicode-range has that covered: https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face/... Pretty good browser support, too.
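
A rough Python sketch of the idea behind unicode-range, not of any browser's actual implementation: each font subset declares the code point ranges it covers, and only subsets that cover a character on the page need to be fetched. The subset table, `parse_unicode_range` and `needed_subsets` are hypothetical names made up for this illustration, and wildcard ranges such as `U+4??` are deliberately not handled.

```python
def parse_unicode_range(descriptor):
    """Parse a value like 'U+0000-00FF, U+0131' into (start, end) pairs.

    Simplified sketch: wildcard ranges such as 'U+4??' are not supported.
    """
    ranges = []
    for part in descriptor.split(","):
        part = part.strip()
        if not part.upper().startswith("U+"):
            raise ValueError(f"not a unicode-range token: {part!r}")
        start, _, end = part[2:].partition("-")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo  # single code point
        ranges.append((lo, hi))
    return ranges


def needed_subsets(text, subsets):
    """Return the names of subsets covering at least one character of text."""
    needed = []
    for name, descriptor in subsets.items():
        ranges = parse_unicode_range(descriptor)
        if any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges):
            needed.append(name)
    return needed


# Hypothetical subset table, for illustration only.
SUBSETS = {
    "latin": "U+0000-00FF",
    "greek": "U+0370-03FF",
    "cyrillic": "U+0400-04FF",
}

page_text = "Hello, \u03ba\u03cc\u03c3\u03bc\u03b5"  # Latin plus Greek letters
print(needed_subsets(page_text, SUBSETS))  # ['latin', 'greek']
```

In this sketch a page with only Latin and Greek text would never trigger a download of the Cyrillic subset.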



> "Google’s open-source Noto font family provides a beautiful and consistent digital type for every symbol in the Unicode standard, covering more than 800 languages and 110,000 characters."

Wow. That's a lot of work.


Indeed but worth it if you are Google though.


How is this worth it for Google? Honest question, always wondered what Google gets out of Google Fonts.


> How is this worth it for Google?

Text is more easily consumed by Google services (Search, Now, etc.) than, e.g., graphical representations, and information is more likely to be stored in text if there are fonts that support presenting the information people want to communicate. Having a font family covering all world languages (and covering the full gamut of Unicode characters) increases the scope of information that can be effectively communicated via text.


Google has a deceptively simple incentive to make everything better: better fonts, better browsers, better phones, etc, all this drives increased web adoption, hence increased advertising revenues.

PS: I love Noto specifically for its Unicode support. I use it on my blog. It's also the default "serif" font on Chrome for Android (but not Chrome for Linux).


"everything", except where it doesn't.

Its Usenet archives suffered from bitrot. Its RSS reader is no more.


I regret to inform you that Noto is licensed under the SIL Open Font License, which is FSF-approved, and essentially ensures that it will be available to all in perpetuity.

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=...


You aren't really contradicting that they have incentive to improve everything. Incentive is not the same as action. You're effectively arguing with clouds for being in the sky.


Then that makes "incentive" a rather useless word, because everyone who wants to make the world a better place also has the incentive to make everything better. Why single out Google if it's also true of Microsoft, Apple, and me?


[flagged]


Why are you so abusive towards me when you agree with me?

I'm not knocking them for stopping support for things they don't find sufficiently monetizable.

I disagreed with the statement "Google has a deceptively simple incentive to make everything better", and gave counter-examples.

Append something like "... where it helps drive profit", then I've no complaint.


Well Google has an incentive to make everything better, but they can't literally work on everything, so they have to pick and choose.

For example, Google has repeatedly explained that Reader was killed because "usage has declined".¹ So this validates my point: they put resources into developing a product, but it doesn't get enough traction, so they kill it and redirect their resources to other projects that will hopefully be more successful and provide more users and ad revenue to Google.

¹ http://googlereader.blogspot.ca/2013/03/powering-down-google...


That example does not validate your point because it supports multiple hypotheses, including my hypothesis that Google is only interested in improving things which generate sufficient profit.

That's not your '"everything"' but my '"everything", except where it doesn't.'

In other words, they pick and choose.

FWIW, I also have incentive to make everything better, because I want the world to be a better place. But I too must pick and choose.


No, there is an incentive to work on everything. Given bandwidth constraints, there isn't sufficient incentive to work on everything at once. The fact that they have to pick and choose does not negate the fact that they have more incentive to improve everything than pretty much any company in history (with the exception of Facebook, perhaps).


I'm honestly not sure I understand how your evidence supports your point. They did, indeed, have incentive to make everything better. That they later stopped supporting a thing does not contradict this, because they had the incentive to make Reader in the first place.

Stopping support is, as someone said, not about incentive, but about later action.

The fact that they cannot focus on everything does not in fact, mean they have less incentive. It just means they can only do a limited set of the stuff they have incentive to do at any point in time.

That does not change the incentive itself.

If you want to argue "they only have incentive to do what is profitable in the first place", you would have had to argue "they could have made Reader, but chose not to" or something similar (i.e., they did not have sufficient incentive).

TL;DR your argument is misplaced. Google has incentive to make everything better. They only have resources to long term focus on things that make profit.

These are not contradictory statements.


Then why single out Google? By that argument, I also have incentive to make everything better, because I want a better world. So, I imagine, do you.

DuPont wanted a "Better Living Through Chemistry" - didn't they also have an incentive to make everything (made of chemicals) better?

Is there something different about Google's incentive which doesn't apply to Microsoft, Apple, eBay, GitHub, ... or the Sierra Club or WWF for that matter?


"Is there something different about Google's incentive which doesn't apply to Microsoft, Apple, eBay, GitHub, ... or the Sierra Club or WWF for that matter? " Nope ;)


>>Get over it and move on with your life already.

Please review HN guidelines.

----

Be civil. Don't say things you wouldn't say in a face-to-face conversation. Avoid gratuitous negativity.

When disagreeing, please reply to the argument instead of calling names. E.g. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."


Aside from the obvious benefit to having a Web that's less broken for everybody, one of my previous projects at Google was implementing support for multiple scripts in a mapping product. I literally don't know how I could have pulled this off without Noto: all other alternatives would have involved some combination of subpar quality, incomplete coverage, incompatible appearance or near-impossible license requirements.

(Standard disclaimer: I work at Google, but these views are my own, not Google's.)


Well, they can track people across sites that use Google Fonts, so maybe having a larger set of characters and encouraging more use gives them a farther tracking reach.



For the lazy, a snippet indicating that no such tracking occurs:

What does using the Google Fonts API mean for the privacy of my users?

The Google Fonts API is designed to limit the collection, storage, and use of end-user data to what is needed to serve fonts efficiently.

Use of Google Fonts is unauthenticated. No cookies are sent by website visitors to the Google Fonts API. Requests to the Google Fonts API are made to resource-specific domains, such as fonts.googleapis.com or fonts.gstatic.com, so that your requests for fonts are separate from and do not contain any credentials you send to google.com while using other Google services that are authenticated, such as Gmail.

In order to serve fonts quickly and efficiently with the fewest requests, responses are cached by the browser to minimize round-trips to our servers.


Google also said that their new messaging app would be end-to-end encrypted by default, and then quickly changed their mind when business reasons prevailed. Business reasons will always prevail. But even beyond that:

>For the lazy, a snippet indicating that no such tracking occurs: Requests to the Google Fonts API are made to resource-specific domains, such as fonts.googleapis.com or fonts.gstatic.com, so that your requests for fonts are separate from and do not contain any credentials you send to google.com while using other Google services that are authenticated, such as Gmail.

That's not what the snippet says. It is a technical statement, not a privacy statement. It doesn't mean they can't track you, it means it's a tiny bit harder to track you. Nothing in there says "we don't cross-reference these data" or "these data aren't used for tracking purposes" or anything else. Just that your Google account information isn't sent to the font servers.

More relevantly, the Google Fonts privacy policy links to the general Google privacy policy, which doesn't have any special "if you're using Google Fonts we collect a whole lot less than this" subsection. They might be using the data. They might not be. They might not be using the data today but decide to start using it tomorrow.


If that was their goal, why would it be open source and licensed for self-hosting?


A completely valid point, even if you get downvoted for perceived cynicism.


But nothing requires you to link directly to the font on their CDN. The sources, the whole build pipeline, everything is available and you can host it all by yourself.

The font they created has nothing to do with tracking you, it just so happens that if hosted on Google Fonts they can also do that. But that's a feature of Google Fonts, not Noto.


Pure speculation, but I would guess the vast majority of people do NOT self-host. I use their CDN as it's faster and much easier to set up, just link in your CSS or whatever.


The SIL Open Font License restricts subsetting, so you do need to be careful when doing this (the license requires that you change the font name if you reduce the font's display quality). Google has gotten waivers to allow them to alter the fonts without changing their names.

The repository with the fonts and tools is here: https://github.com/google/fonts


Only fonts with “reserved name” clause have such restriction, and then you are only required to rename the modified font.


For it to be a valid point, it has to be a sensible strategy or position for Google to take. Do you actually think that under the list of goals and objectives for this project there was a bullet point that was like "* Better user tracking across the web"? There are loads of reasons why it makes sense for Google to do this project ahead of user tracking. It's possible - anything is possible really - but does it make sense?

Even if this was remotely close to their goal, it still doesn't make sense. They already have Google Analytics on basically every website in existence. Investing millions of dollars in an extremely robust font to increase web tracking by 0.0000000001% seems like a terrible investment.


You're saying that Google, THE tracking company, is possibly sitting on this huge pipeline of data from IPs connecting to their CDN downloading fonts and they're just not doing anything with it? I would have a very hard time believing this.


Exactly! This is, in my opinion, the only reason they do it.


Let me elaborate. I offer two situations and you can choose which one of them is more plausible.

a) What should we do next? Our cash is burning! Well, let's use countless human hours to make a beautiful font to make the world a better place!

b) How can we reach more traffic that might currently not be visible to us? Well, maybe we should design a really beautiful font that everyone likes and host it on our servers? Sounds like a plan, let's do it!


How about

c) We're making products supporting an unprecedented number of locales and there's no font family out there that covers them all and still looks consistent, so we're going to have to make it ourselves. Hey, if we open it up then there will be even more international content out there to put AdSense on!


Yes, this one also looks possible. Thank you for this.


Google has too much, not too little cash.


He may mean the cash is burning in that there are employees that need things to do at Google.


Although it's not always perfect, Google takes internationalization seriously. Its websites are used all over the world. Having a nice font that supports every language simplifies things.


They don't.

I18n on Android is seriously awful. For example, Android 5 shipped with an improperly escaped string in the French Canadian localization strings that would cause the phone to reboot in a loop if it was connected to a charger when opening the lockscreen. Even the latest Android version (Nougat) displays the battery as "24 % %" in the French localization, "Press power button twice for camera" has been translated to the near unreadable "App. 2x sur interr. pr activ. app. photo" (roughly "Prs pwr btn 2x fr cam.") and some of the localized strings are comically nonsensical.


"Takes seriously" is orthogonal to "is great at". There are also non-i18n problems in Android which lead to reboots.

If you expect the UI to be perfect in every language all the time, consider what you're actually asking.


Most of the fonts on Google Fonts are not, in fact, authored by Google. They're just distributing them, which is relatively cheap.

The Noto family is an exception.


> How is this worth it for Google?

Anytime you find yourself asking "why is big company X giving away Y for free?" the answer is usually that they're commoditizing their complements:

http://www.joelonsoftware.com/articles/StrategyLetterV.html

Virtually everything Google gives away for free follows this pattern.


Traffic insights for websites that don't use Google Analytics.


Except the fonts are cacheable for a year, so you won't be pinging Google on every site that uses Google Fonts. It wouldn't be a very good tracking mechanism, if that's what it were intended for.


Per their Fonts API ToS they don't track anything except total aggregate usage per font.


That only works if you use Google's hosted fonts though, right?


Yeah, but that's still plenty of webpages.


Well people keep saying they're interested in a better web. What they get out of it is data. If you include Google fonts they can track your users.


Tracking


110,000 characters is not every symbol in the Unicode standard, though. According to Wikipedia, Unicode 9.0 contains around 128,000.


Yep, Old Hungarian is still not there, nor anywhere else except for fonts that were specifically designed just for Old Hungarian.

I was happy when the script got included in Unicode 8.0 but if fonts don't support it, it's not much use.


It's bloody close though, and no doubt at least a useful chunk of that will eventually be filled out.


> When we began, we did not realize the enormity of the challenge.

That right there is probably the most important ingredient of many moonshot projects.



We had to stop using this font as it didn't support the backtick (`). Try typing ` multiple times in the search box.

We had reported this to Google some months back but got no response.


A quick summary of this issue:

- 'Dead' keys are keys used to modify the next character

- The backtick / grave key behaved as such on a typewriter

- ` + a = à

- Noto web fonts behave like this

- The latest Noto local fonts do not

- The expected modern behaviour in English is ` + a -> `a

- This is not necessarily the expected behaviour in other languages
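
The dead-key behaviour in the list above can be sketched roughly in Python. This is only an illustration of what an input method does, under the assumption that a dead key is modelled as "hold the accent, then try to compose it with the next letter"; `DEAD_KEYS` and `type_keys` are hypothetical names, not part of any real keyboard driver or of Noto.

```python
import unicodedata

# Hypothetical table: spacing accent typed on the key -> matching combining mark.
DEAD_KEYS = {
    "`": "\u0300",  # GRAVE ACCENT -> COMBINING GRAVE ACCENT
    "^": "\u0302",  # CIRCUMFLEX ACCENT -> COMBINING CIRCUMFLEX ACCENT
}


def type_keys(keys, dead_keys_enabled):
    """Turn a sequence of key presses into the text that ends up on screen."""
    out = []
    pending = None  # dead key waiting for its base letter
    for key in keys:
        if pending is None:
            if dead_keys_enabled and key in DEAD_KEYS:
                pending = key  # swallow the accent for now
            else:
                out.append(key)
            continue
        # A dead key is pending: try to compose it with this key.
        composed = unicodedata.normalize("NFC", key + DEAD_KEYS[pending])
        if len(composed) == 1:
            out.append(composed)       # e.g. ` + a -> à
        else:
            out.append(pending + key)  # no precomposed form: keep both
        pending = None
    if pending is not None:
        out.append(pending)  # trailing dead key with nothing after it
    return "".join(out)


print(type_keys("`a", dead_keys_enabled=True))   # à
print(type_keys("`a", dead_keys_enabled=False))  # `a
print(type_keys("`x", dead_keys_enabled=True))   # `x
```

Note that all of this happens before any font is involved: the function decides which characters are produced, not how they are drawn.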


It seems like you're confusing COMBINING GRAVE ACCENT and GRAVE ACCENT. These are two separate characters with distinct code points. One combines, the other does not. It's not a property of the font; it's a property of the characters.

http://www.fileformat.info/info/unicode/char/0060/index.htm

http://www.fileformat.info/info/unicode/char/0300/index.htm
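
For anyone who wants to check the difference locally, a small Python sketch using the standard `unicodedata` module shows that the two characters carry different properties in the Unicode character database, independent of any font:

```python
import unicodedata

GRAVE = "\u0060"            # GRAVE ACCENT, the spacing backtick
COMBINING_GRAVE = "\u0300"  # COMBINING GRAVE ACCENT

for ch in (GRAVE, COMBINING_GRAVE):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),   # Sk = modifier symbol, Mn = nonspacing mark
        unicodedata.combining(ch),  # canonical combining class
    )
# U+0060 GRAVE ACCENT Sk 0
# U+0300 COMBINING GRAVE ACCENT Mn 230

# Only the combining mark composes with a base letter under NFC.
spaced = unicodedata.normalize("NFC", GRAVE + "a")
combined = unicodedata.normalize("NFC", "a" + COMBINING_GRAVE)
print(len(spaced), len(combined))  # 2 1
```

So a renderer that draws U+0060 on top of the following letter is treating a spacing character as if it were U+0300.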


Yes they are unique characters. I went to some pains to ensure I used the word 'key' as in physical switch on a keyboard. However I did not go far enough in ensuring I was referring to the font package as opposed to the font, please accept my apologies.

It is the font package, look at iverg's comment on the github issue https://github.com/googlei18n/noto-fonts/issues/736

here are screenshots with local font installed http://i.imgur.com/ycRs3A0.png

and without local font http://i.imgur.com/TUKsIY4.png

EDIT: it goes deeper: https://news.ycombinator.com/item?id=12656642


> Noto web fonts behave like this

Dear god why? They're Unicode fonts. There are perfectly good Unicode combining characters that can be used when this behavior is intended. Making non-combining characters behave like combining characters breaks all kinds of text.


> This is not necessarily the expected behaviour in other languages

But isn't it normally the operating system's responsibility to implement this, depending on the keyboard layout?


Operating system? Buddy, this is web development. We're reinventing the whole stack on top of HTML and JavaScript. Welcome to the future.


    > But isn't it normally the operating system's
    > responsibility to implement this
Presumably the font has backticks, but they're (poorly) implementing this method of accenting in the demo web app to allow people to search for accented characters?


Nope.

It looks like "web fonts" in general are poorly implemented. You can recreate the problem with just a text box and font styling CSS.

If you look at some of my other comments in this tree you will see screenshots of the same text box with and without the local fonts installed. Presumably when local fonts are installed the OS handles everything, without the browser and the noto-fonts definition file getting in the way.


What do you mean by "Noto web fonts behave like this"? Whether dead keys are enabled is a property of your system / input method.

The standard en_us linux config does [alt+\] [a] = à instead for example. (unless you choose a keyboard layout)


That is what would make sense; however, that is not the case. I should have been clearer and used the term "package" after font:

Local fonts package installed: http://i.imgur.com/ycRs3A0.png

No Local fonts package installed: http://i.imgur.com/TUKsIY4.png


That's weird. The package seems to include the ttf files only... https://www.archlinux.org/packages/extra/any/noto-fonts/


It is seemingly to do with how the "web fonts" are implemented by browsers. For more weirdness look here: https://news.ycombinator.com/item?id=12656642


You are conflating keyboard layout, unicode codepoints and the font rendering of said codepoints.

The issue here is that the non-combining accent (as generated by many keyboard layouts, dead keys or not) is rendered by the font as if it were combining. All the other aspects are tangential.


> This is not necessarily the expected behaviour in other languages

Which language, for example?

I'm French, we have characters with a grave accent ('à', 'è' and 'ù'), and we do not expect the "` + a = à" behavior, to name one language.

EDIT: we do expect that "^ + a = â", though, but that's another thing, and we actually have two ^ keys, one for our behavior and one for the usual behavior.


Brazilian Portuguese expects that behavior to write words such as "àqueles"


Forgive me, I don't want to sound xenophobic but this seems preposterous to me. So much energy wasted. I am deeply grateful to the guys who were developing keyboard standards in our country that I don't have to go through such waste. When we want to use "o" with accent we just press right ALT with "o". Simple and fast. Frankly speaking, this is the first time I hear about such a weird combination just to get local characters, in western countries ofc. A PC keyboard is not a mechanical one. Why someone would copy the same behavior is beyond me.


If you write in a single language that has only a few combinations, sure. But as the number of diacritics - letter combinations increase, dead keys become a more attractive solution.

With just two or three Western European languages to write in, you can easily have grave, acute and circumflex accents, umlauts, tildes, plus a couple one-off letters (like ß for German, ç for French, or å/æ for Scandinavian languages). Combining those with every vowel they can apply to grows too much for the poor Alt key.

The US-International layout runs into this problem. Since Alt + A = Á, if you want to type Ä or Å or Æ in a single keypress you have to use Alt + Q, W, or Z respectively, which is hardly intuitive. And if you want À or Ã, you're SOL.

Dead keys (also in the US-International layout) solve this problem. I know which keys make the tilde, umlaut, both types of accents, and circumflex. If I want the bare symbol (or a quote in the case of umlaut) I press spacebar after the symbol key, otherwise I press any applicable letter to output the modified version. For example, I'm pretty sure there's a key combination for ç, but I don't remember what it is (Alt + C is ©); however I did remember that I could type it as ' + c.

The other main solution is to have a hotkey for switching language layouts - typing in each language will be slightly faster, but when I tried it I frequently forgot which layout I was on and had to stop, delete my mangled output, switch language, and type again. I find dead keys much more friendly to muscle-memory.


I find it useful on a Mac to use some of the shortcuts for accents and diacritics in the default US-English keyboard layout, since I occasionally do have to type accented characters, or type text in a non-English language and don't want the hassle of changing keyboard setup or going through the Unicode menu. They all involve holding Option and pressing the key, though (for example, 'à' is Option-` followed by typing the 'a'; Option-e gets acute, Option-n gets tilde, Option-i gets circumflex and Option-u gets diaeresis; there are also shortcuts for some common specific characters like ç and å).


It makes perfect sense if you know anything about Portuguese.

'àquele' is the preposition 'a' + 'aquele'. Grave accents are only used to mark contractions such as this, and the circumflex and acute accents - which denote differences in vowel quality and stress - are much more common, which is why it makes perfect sense to prioritise ease of typing for those two over the grave accent.


> When we want to use "o" with accent we just press right ALT with "o". Simple and fast.

That's fine for ò I guess, but then how do you type ô, ö and ó?

But as has been said elsewhere, this thread is confusing dead keys, which have nothing to do with fonts: typing the ^ key followed by the o key enters a single character ô, with combining characters where the character ̂ followed by the character o is rendered as ̂o (which should look the same as ô) (edit looks like this works fine on my fixed-width font when editing, but not so fine on the regular proportional font when displaying the comment, it's sadly quite usual for fonts to mishandle combining characters).

Here, the issue is that the standalone non-combining ` (as well as ´, ¨ and ¸) is handled by this font as if they were combining, thus doubly confusing users with dead-key keyboards.
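
To make the dead-key versus combining-character point concrete, here is a small sketch with Python's standard `unicodedata` module. The variable names are illustrative only. It assumes a dead key emits the precomposed ô (U+00F4), which is typical but depends on the input method. Note that a combining mark follows its base letter, so the mark has to come after the o to compose.

```python
import unicodedata

precomposed = "\u00f4"       # ô as a single code point, what a dead key typically emits
base_then_mark = "o\u0302"   # o followed by COMBINING CIRCUMFLEX ACCENT
mark_then_base = "\u0302o"   # mark placed before the letter, which has nothing to attach to

def nfc(text):
    """Normalize to NFC, which composes base + mark pairs where a precomposed form exists."""
    return unicodedata.normalize("NFC", text)

same_after_nfc = nfc(base_then_mark) == precomposed   # canonically equivalent sequences
wrong_order = nfc(mark_then_base) == precomposed      # the order matters
spacing_grave = nfc("`a")                             # U+0060 never composes with 'a'

print(same_after_nfc, wrong_order, repr(spacing_grave))
```

The last line is the relevant one for this bug: normalization leaves "`a" as two separate characters, so any merging on screen is coming from the rendering side.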


I guess it all depends on the frequency of such characters in the given language. For example, in French keyboards, we have dedicated keys for 'é', 'è', 'à', 'ç' and 'ù', because they are incredibly frequent. On the other hand, 'â', 'ê', 'î', 'ô', 'û', 'ë' and 'ï' are less frequent, so they are generated using modifiers.

I'm quite glad of that, actually, it would be really annoying if we had to keep using modifiers every two words. I know French people who use English keyboards, they just usually forget about accents totally instead of using dedicated modifiers.


I see. Thanks!


Spanish keyboards work like that too. If you want the accent by itself, you tap it followed by a space (same as on a typewriter).


Hmm, here in Poland we use right alt plus a letter to make the diacritic version of it, e.g. alt+a = ą (the left alt functions like in other keyboards).

It is probably easier for a language that doesn't have multiple diacritics for the same letter.

In Poland we have only one such letter: z. The problem is solved by using 'x' for the other (less frequently used) diacritic type:

alt+z = ż

alt+x = ź


> Which language, for example?

On Swedish keyboards, for example, "` + a = à"


Aside, I can't believe I never realized that was the intended use of the grave key.


This is not related to the font.


Local font installed: http://i.imgur.com/ycRs3A0.png

Web font: http://i.imgur.com/TUKsIY4.png

See iverg's comment on the github issue: https://github.com/googlei18n/noto-fonts/issues/736

EDIT: To be clear, parent is technically correct. The problem is with the way "web fonts" is working, the actual 'font' is the same across both, just the web fonts are having fun.


I can confirm this. When I type back tick in the search box, the cursor doesn't advance to the right. The next character I type after one or more back ticks will look like it has an accent on it (e.g. `````a becomes à).

I'm using Chrome v53.0.2785.143m on Windows 7.


I get the same behavior on firefox nightly.


Don't know what the problem looked like on your system, but there was no apparent problem here on Firefox/Linux.


Chromium/Linux here, I can replicate it on the specimen page.

https://fonts.google.com/specimen/Noto+Sans

Under the 'Styles' section, type:

    `foo`
You'll see the first ` overlap the f.

Now erase it all and type:

    ````
They all overlap each other and show up as just one `.

Edit: from what it looks like, the problem is specific to the sans. I can't replicate it with the serif (https://fonts.google.com/specimen/Noto+Serif if anyone wants to try).


Ouch! Thanks for the report. We'll look into fixing that ASAP.


Thanks for taking the time to explain the issue. Yes, same problem shows up here.


For me at least (Chrome), it seems like it isn't applying any surrounding spacing for the backtick. If you hit ` a few times and then hit some other character, they seem to occupy the same space in the text box.


I experience the same problem with Firefox 47.0 on Ubuntu 15.10.


On my system, it puts the backtick as an accent mark. When I type more than one backtick, they overlap. "`a" displays as "à". "``a" displays the same.


I don't get it, what's the problem?


Here's a link to the issue on Github: https://github.com/googlei18n/noto-fonts/issues/736


Nothing looks wrong to me (Firefox/Debian). Are you sure it's a problem with the font?


It does happen in the search box as you say but I have been using Noto Sans for a while now on my site and it doesn't have this problem: https://kitnic.it/.


Interesting. After playing with it a bit I came to the example:

You must not have local noto fonts installed for this to work.

http://codepen.io/anon/pen/dpdOzk

Using chromium we see that:

Inheriting the font means the "bug" doesn't affect you.

Adding a class with the font only affects textareas, not inputs.

Adding the font directly affects both.

Notably the google demonstration page has the rule:

    html,input,textarea{font-family:'noto sans',arial,sans-serif}
this applies the noto bug directly to the input field.


Can you link to the bug report?


Come on... google dead keys


Since when was the grave accent character a modifier or dead key on a computer keyboard? Using a compose key is one thing, but colons, carets, and tildes aren't magically combining with letters to form letters with diaereses, circumflexes, or tildes above them.


Windows does this with the US International keyboard enabled. `+a=à, '+e=é, "+o=ö, etc.


On many non-English layouts there are dead keys, and the ` key may not exist, except as a dead key.

For example, the letters å, ø and æ are very common in Danish, so they have their own keys. The acute accent is sometimes used to mark stress, like é or ǿ, and this is done using a dead key -- it wouldn't make sense to have several keys just for this purpose.

http://fontmeme.com/images/danish-keyboard-550x183.png


I keep wanting to find a font to use on the portions of my website that contain Japanese, but they are all so big! No way I'm making my visitors download 115MB worth of font just so it looks a little nicer.

Edit: Thanks for the tips, I will look into those options :)


The solution (more of a workaround) to this is to cut the font down to just the JIS X 0208 subset (containing Kanji level 1 and 2, total of 6,879 characters) or the JIS X 0213 subset (containing Kanji level 1-4, total of 11,233 characters). That limits the character count to the most commonly used ones, which should reduce file size by a lot (or skip JIS X and select only the glyphs for Kanji level 1 to reduce it even further).

There is a Subset Maker Tool[1] (in Japanese) that can do this. But you need to provide the list of characters you want to keep by yourself (searching for JIS第1第2水準漢字 usually helps).

If you do not want to do this by yourself, Google also provides a version of Noto Sans that has been stripped down to just the JIS X 0208 subset in their Early Access program[2].

[1]: http://opentype.jp/subsetfontmk.htm

[2]: https://fonts.google.com/earlyaccess#Noto+Sans+Japanese
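
If searching for a ready-made character list doesn't turn anything up, one possible way to generate it is sketched below. This is an assumption-heavy shortcut rather than an official list: it leans on Python's built-in `euc_jp` codec, whose two-byte codes are decoded through a JIS X 0208 mapping table, so the exact set depends on that table. The function name `jis_x_0208_characters` is made up here, and whatever input format the subsetting tool expects is not covered.

```python
def jis_x_0208_characters():
    """Collect every character Python's euc_jp codec maps from a two-byte code."""
    chars = []
    for lead in range(0xA1, 0xFF):        # EUC-JP lead bytes 0xA1..0xFE
        for trail in range(0xA1, 0xFF):   # EUC-JP trail bytes 0xA1..0xFE
            try:
                chars.append(bytes([lead, trail]).decode("euc_jp"))
            except UnicodeDecodeError:
                continue  # unassigned cell in the 94x94 grid
    return chars

keep = jis_x_0208_characters()
keep_text = "".join(keep)   # one long string that can be pasted into a "characters to keep" field

print(len(keep), keep_text[:20])
```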


Since you mentioned that your entire site isn't in Japanese, in addition to what other people specified, I would now recommend using the CSS unicode-range selector. That allows you to list a web font declaration with a range of Unicode code points and most modern browsers now implement the optimization where it only downloads the font when text on the page contains a character in that range.

I tossed together an example several years ago:

http://chris.improbable.org/experiments/browser/webfonts/uni...

The combined example on http://chris.improbable.org/experiments/browser/webfonts/uni... has CSS which looks like this:

    @font-face {
        font-family: NotoSansCombined;
        src: local("NotoSans"), local("Noto Sans"), url(NotoSans-Regular.woff) format("woff");
    }

    @font-face {
        font-family: NotoSansCombined;
        src: local("NotoSansArmenian"), local("Noto Sans Armenian"), url(NotoSansArmenian-Regular.woff) format("woff");
        unicode-range: U+530-58F, U+FB13-FB17;
    }

    @font-face {
        font-family: NotoSansCombined;
        src: local("NotoSansBengali"), local("Noto Sans Bengali"), url(NotoSansBengali-Regular.woff) format("woff");
        unicode-range: U+AD, U+D7, U+F7, U+964-965, U+981-983, U+985-98C, U+98F-990, U+993-9A8, U+9AA-9B0, U+9B2, U+9B6-9B9, U+9BC-9C4, U+9C7-9C8, U+9CB-9CE, U+9D7, U+9DC-9DD, U+9DF-9E3, U+9E6-9FB, U+200B-200D, U+2013-2014, U+2018-2019, U+201C-201D, U+2026, U+20B9, U+2212, U+25CC;
    }

    @font-face {
        font-family: NotoSansCombined;
        src: local("NotoSansCherokee"), local("Noto Sans Cherokee"), url(NotoSansCherokee-Regular.woff) format("woff");
        unicode-range: U+13A0-13F4;
    }
In modern versions of Chrome, Firefox, and Safari you can see that only downloads a few of the many language files available. This also works really nicely with the traditional CSS font stacks if your design goals allow you to specify system fonts which will work for many users so you can have a few local fonts specified before the downloadable one.
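
As a rough mental model of that optimization (not the browsers' actual code), the decision can be sketched in Python: a face with a `unicode-range` is only worth fetching if the page text contains at least one code point inside its ranges. The `FACES` table and `faces_needed` function are hypothetical names, and the ranges are copied from the Armenian and Cherokee rules above. The first rule has no `unicode-range`, so it applies to all text and is left out of the table.

```python
# Ranges copied from the @font-face rules above, as (first, last) code points.
FACES = {
    "NotoSansArmenian": [(0x530, 0x58F), (0xFB13, 0xFB17)],
    "NotoSansCherokee": [(0x13A0, 0x13F4)],
}

def faces_needed(text, faces=FACES):
    """Return the faces whose unicode-range covers at least one character of text."""
    needed = []
    for name, ranges in faces.items():
        if any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges):
            needed.append(name)
    return needed

print(faces_needed("Hello"))                    # Latin only, no extra face
print(faces_needed("Hello \u0540\u0561\u0575")) # contains Armenian letters
```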


Does google fonts' css include do this directly? It's been a while since I've considered it, but would be nice to see a font-family 'NotoSans' and 'NotoSerif' that has all the unicode ranges spec'd.

As an aside, I'd love to see something similar in a fixed-width font (where the asian block characters are double-width)


They did not in the past but a quick look at https://fonts.googleapis.com/css?family=Noto+Sans shows that they do now, at least if you request it using a modern browser.

This is not surprising since the Chrome team is generally really aggressive about performance in general and they were quick to jump on this issue:

https://bugs.chromium.org/p/chromium/issues/detail?id=247920

On a related note, I've found the Chrome team to be really responsive about i18n as well – if you report something about even fairly obscure scripts not being supported, they usually respond rapidly. Quite pleasant to see.


What you want is a font subset and google even provides[0] an API to get one.

[0] http://thenewcode.com/878/Slash-Page-Load-Times-With-CSS-Fon...


Google's Noto Sans CJK JP contains 9 font files, each about 13MB. You can edit fonts to subset glyphs or glyph sets you expect to use to further reduce the file size, although I suspect the CJK languages are never going to be as easy to get to fast web speeds. The JIS X 0208 subset still weighs in at almost 4 MB for a single OTF.


16M: NotoSansCJKjp-Regular.otf


I always thought the term Tofu was "mojibake": https://en.wikipedia.org/wiki/Mojibake

The Wikipedia disambiguation page for "Tofu" mentions: Slang for the empty boxes shown in place of undisplayable code points in computer character encoding, a form of mojibake

https://en.wikipedia.org/wiki/Tofu_(disambiguation)


Mojibake is specifically the "garbage" from getting an encoding wrong—treating a sequence of bits as if they were a valid sequence of characters in some encoding, when they're either not from that encoding, or not from any textual encoding (because you're e.g. trying to render the contents of an executable binary as text.)

"Tofu" is a more recent phenomenon, that didn't really have a specific name until now: when there are Unicode code-points—correctly decoded—in a document, but you have no font installed that offers a glyph to represent them.

Mojibake results from "I did this wrong and didn't notice"; tofu results from "I know what this is, and it's something I don't have a visual signifier for."

Mind you, often mojibake will result in "tofu"; the garbage code-points you get from bad encoding detection will turn out to be ones that don't have a currently-defined Unicode character, so they'll show up as U+FFFD REPLACEMENT CHARACTER (�). But that's a coincidence, rather than an equivalence.
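
A quick Python sketch of the difference, using only built-in codecs. The sample strings are made up for illustration. Mojibake and U+FFFD are both decoding-side effects and can be reproduced in code; tofu cannot, because the text is decoded correctly and the missing piece is a glyph in whatever font happens to be installed.

```python
original = "caf\u00e9"  # "café"

# Mojibake: UTF-8 bytes wrongly decoded as Latin-1 yield valid but wrong characters.
mojibake = original.encode("utf-8").decode("latin-1")

# Replacement character: Latin-1 bytes wrongly decoded as UTF-8 are invalid,
# so a lenient decoder substitutes U+FFFD for the bad byte.
replaced = original.encode("latin-1").decode("utf-8", errors="replace")

# Tofu candidate: a correctly round-tripped code point (here from the Tangut block).
# Nothing is wrong with the data; whether it shows as a box depends on the font.
tangut = chr(0x17000)
round_trip = tangut.encode("utf-8").decode("utf-8")

print(repr(mojibake), repr(replaced), round_trip == tangut)
```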


Thanks! That was a super helpful response!


Is Google using AWS to host this?

If you look at: https://noto-website.storage.googleapis.com/

You will see the following:

    <?xml version='1.0' encoding='UTF-8'?>
    <ListBucketResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>
      <Name>noto-website</Name>
      <Prefix></Prefix>
      <Marker></Marker>
      <NextMarker>emoji/emoji_u1f468_200d_1f468_200d_1f466_200d_1f466.png</NextMarker>
      <IsTruncated>true</IsTruncated>
      <Contents>
        <Key>css/emoji-zsye-color.css</Key>
        <Generation>1464738619772000</Generation>
        <MetaGeneration>1</MetaGeneration>
        <LastModified>2016-05-31T23:50:19.729Z</LastModified>
        <ETag>"e3aaae52d88ced070044f59d1efe2009"</ETag>
        <Size>152</Size>
        <Owner/>
      </Contents>

http://i.imgur.com/yLiWUGq.png

Are they using Amazon S3?

Edit:

They just changed it at 1:00 pm, so it no longer mentions AWS.


Good question. I can explain. "storage.googleapis.com" is an endpoint for Google Cloud Storage's "XML" API. Google Cloud Storage (GCS) is a cloud blob storage service similar to Amazon S3.

This particular API, the "XML" API, is designed to be API-compatible with S3. The XML namespace, 'http://doc.s3.amazonaws.com/2006-03-01', is therefore the same. This allows third party tools like 'boto' and the like which work with S3 to work with GCS with only a switch in hostname.
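
To illustrate why the shared namespace matters to tooling, here is a small sketch with Python's standard `xml.etree.ElementTree`. The `LISTING` string is an abridged copy of the listing quoted upthread with the closing tag added, not a live response. A parser written against S3's namespace reads the GCS listing unchanged, because every element lives in the same namespace.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://doc.s3.amazonaws.com/2006-03-01"

# Abridged from the bucket listing quoted earlier in the thread.
LISTING = """<ListBucketResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>
  <Name>noto-website</Name>
  <IsTruncated>true</IsTruncated>
  <Contents>
    <Key>css/emoji-zsye-color.css</Key>
    <Size>152</Size>
  </Contents>
</ListBucketResult>"""

root = ET.fromstring(LISTING)
ns = {"s3": S3_NS}  # prefix is local to this script; only the URI has to match

bucket = root.findtext("s3:Name", namespaces=ns)
keys = [c.findtext("s3:Key", namespaces=ns) for c in root.findall("s3:Contents", ns)]

print(root.tag, bucket, keys)
```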


Thanks for the answer. Do you know why they changed the scheme after the initial upload?


I have no idea. I'm not sure what you were looking at.

If you visit just "https://noto-website.storage.googleapis.com/", that's a request to list all of the objects in the bucket, so you'll see an XML document with that namespace and the results of a listing.

If you were trying to download a specific object from that bucket, you'd either see the resource itself, or you'd see an XML error result of some sort (404, 403, etc). So I assume that either you mistyped the object name once or else the object had been deleted for some reason.


If you do the DNS traces, it's easy to see: No. They're just implementing an S3-compatible API.

  ;; ANSWER SECTION:
  noto-website.storage.googleapis.com. 3174 IN CNAME storage.l.googleusercontent.com.
  storage.l.googleusercontent.com. 300 IN	A	172.217.5.112

  ;; ANSWER SECTION:
  112.5.217.172.in-addr.arpa. 86400 IN	PTR	sfo03s07-in-f16.1e100.net.
  112.5.217.172.in-addr.arpa. 86400 IN	PTR	sfo03s07-in-f16.1e100.net.


The host could be implementing an S3 compatible API.


Just schema


It was released originally in 2013: https://en.m.wikipedia.org/wiki/Noto_fonts

Such an amazing project


I wish Wikipedia redirected m. links to the desktop site if you aren't using a mobile browser.


Maybe it would make sense, but I personally prefer their mobile site. It's much nicer to read. Not that their desktop site is filled with ads and gibberish, but for whatever reason I like it more.


local userscript workaround: https://gist.github.com/jspenguin/11295206


> The name noto is to convey the idea that Google’s goal is to see “no more tofu”.

Nofu would have been a better name if that's actually the goal.


Noto fonts are great. I especially love the OFL licensing. I wanted to include fonts in a mobile application and there was confusion on whether including a GPL version of the fonts would force me to GPL my app - something I could not do. I couldn't find a definite answer.

With Noto's OFL licensing, I no longer had that worry.


I think that you are looking to the GPL from the wrong perspective. I also made this mistake.

GPL does not force you to licence your application under GPL.

Instead licensing your application under GPL allows you to use other GPL licensed code.


You don't have to use the GPL to link GPL code. You are just required to distribute the final binary under GPL. When you distribute the source files, individual files can be under individual licenses, they just have to all be license compatible with the final license.

IE, use GPL code, license header all your own stuff as Apache, distribute the binary as GPL, but anyone else can use your own work under Apache if they want if you are individually licensing all your own files under it, they just could not use the GPL parts without also distributing under GPL.


Do you know a project where such a licensing scheme is used as an acceptable practice? To be honest, it looks counterproductive to the GPL's ideas.


In my eyes that practise would be equivalent with dual-licensing your code under GPL and the (more liberal) license of your choice. This happens with some frequency.

It's a way to honor the GPL idea for the stuff you link to without having to follow GPL ideas for your new code.


The Linux kernel is one, for instance. Tons of drivers there, including GPU ones, are permissively licensed to make it possible to reuse that code for other kernels like BSD.


That seems like a meaningless semantic difference to me. There is no practical difference between these two sentences:

If you wish to use GPL-licensed code, your code must be open source.

If your code is open source, you may use GPL-licensed code.


It's not meaningless. If I give something away under certain conditions, I'm not forcing anybody to adhere to those conditions.

Instead, if they choose to adopt those conditions, they get free stuff from me (and a lot of other people.) In other words, it's an incentive program, not a requirement such as a law or regulation.


Yes, these two feel quite similar for me too, but the one "GPL forces you to licence your application under GPL." is quite different.

Both of your wordings are quite neutral, but this one contains in my opinion a quite strong negative element.

For me it was small, but important difference.


Following your logic you would have to release your app now with an Open Font License ...


Not at all...

The GPL requires anything that "includes" it to be GPL compatible. The definition of "includes" that it uses is pretty "iffy" and could easily be extended to include the whole application if you aren't very careful.

OFL has no requirement like that, so you can use it freely in whatever project you want.


I wanted to try the Noto mono out for programming, but it looks like the `O` is indistinguishable from the `0` :/

Otherwise, it's a nice looking font for the editor.


I've just tried it, and it is absolutely the same as Droid Sans Mono (which I love and use), barring the extra line spacing. They haven't even fixed the kerning problem with "w".


The lack of a dotted or slashed '0' is infuriating for me, especially considering that Droid Sans Mono has variants for both.


It doesn't seem to support the Tangut script, which was added in Unicode 9.0. It's the first thing I test if a font claims to support all languages in Unicode. To my knowledge, I haven't really seen any general-purpose fonts containing support for Tangut (because probably no one is going to use it). I thought Google had already completed this project, but apparently they haven't yet.


They've completed it up to Unicode 6.0 (released in 2010).

Are there any Unicode fonts at all for Tangut? I looked but didn't find any.


Probably not -- unless you count the proprietary, obfuscated font they use to make the official Unicode charts.

Some people snark at Unicode for putting so much more effort into emoji recently than they put into scripts used in actual languages, but y'know, when they add emoji, at least people design glyphs for them pretty quickly. When they do the research to designate codepoints for Anatolian hieroglyphs, the codepoints just sit there unloved and unsupported.


Fun fact: People behind a WatchGuard firewall (default settings) won't see Google fonts (at least in Firefox) because the firewall filters the CORS headers.


What an awful overreach. Firewalls should not ship with defaults that break the internet, and such presumptuous header filtering can actually weaken security.


Filtering X-Frame-Options is bad for security. Filtering anti CSRF headers like in Drupal and ShopWare is bad for security.

But this seems to be the current reality with enterprise firewalls. :-(

(No, TLS doesn't help in the long run. If your boss wants to know which pages you are reading he will let the experts setup this: https://www.howtoforge.com/filtering-https-traffic-with-squi...)



Nice.

I'm just the victim. Whenever I suspect the firewall to be the cause of a problem I use a proxy via a ssh tunnel. As long as it's not forbidden for me to actually solve problems.

The list of software that isn't usable behind a WatchGuard firewall (and maybe similar enterprise firewalls) is getting longer and longer. No Google fonts, no Drupal, no ShopWare 5.2, JIRA barely usable, etc.


I'm sure I can find plenty of examples just by searching (and I'll do that in just a moment) but do you happen to have a list of these issues that you've encountered that can be attributed to WatchGuard firewalls?

I am (primarily) a network engineer and such a list would be wonderful to have when recommending for or against specific products.


Cisco ASA firewalls perform "application inspection" (enabled by default) that breaks EDNS0, (SMTP) STARTTLS, and others.


Noto & Roboto are probably the two fonts I love more than anything now, I just find them so pleasing, they don't do anything stupid and they look good.


The claim of full coverage of 110,000+ characters (they are targeting Unicode 6, apparently) appears to be false: Noto Sans CJK covers approximately 30K characters[1], while as of Unicode 6.0 there were 74,614 CJK Unified Ideographs (calculated from [2]).

Edit: Using a script I made to check codepoint coverage[3] I get 63,639 codepoints with glyphs defined for all Noto fonts included in their default download (Noto-unhinted.zip).

[1] https://github.com/googlei18n/noto-fonts/issues/717#issuecom...

[2] https://en.wikipedia.org/wiki/CJK_Unified_Ideographs

[3] https://gist.githubusercontent.com/amake/53b2331a2547b94f430...


Did you compare it with other fonts listed on [1] like GNU Unifont, Code2000 and Arial Unicode MS?

The table of that article is missing many useful columns.

[1]: https://en.wikipedia.org/wiki/Open-source_Unicode_typefaces


How to make a quick logo? Just write the name of the company and use the https://www.google.com/get/noto/#sans-bugi font...


Luckily, the era where a different alphabet could be rendered by using a special font while using the exact same codepoints as Latin is behind us.

With Unicode, different alphabets use different codepoints and your trick won't work.

(unless of course it's a Buginese company)


Refreshing to see the lorem ipsum replaced with something meaningful.


It's better than the quick brown fox too.


not really. It's longer and it misses the letters j,k,v,x, and z. That's almost 20% of the alphabet.


When you select a different language from the search box, though, you see those same sentences translated into the other languages. Wouldn't be possible to get full alphabet coverage in all translation/alphabet combinations.


The sans has a real italic and not just an oblique. Color me impressed: real italics aren't very common in sanses, and that goes double for FOSS fonts. Very nice.


Looking at any fonts, recently, I am starting to crave a 4K LCD panel. When people were switching to HD panels there was a big WOW all around, but after buying a smartphone, I realized that HD is good for a smartphone, but not enough for anything big. I love to read on my smartphone and I love to code, too, but fonts look just crappy (especially on Windows). Sadly, good 4K panels out there are still quite pricey.


This is why I still use xterm with the default bitmap font. 10pt vector fonts on <100dpi screens look just awful.


I wonder why they didn't share the fact that it was created using AI.


Wow, they implemented Deseret - the Mormon-invented phonetic alphabet


And Glagolitic, an alphabet I'd never heard of before

https://en.wikipedia.org/wiki/Glagolitic_alphabet


And Klingon


? Klingon isn't in Unicode...


It technically isn't, but it's listed in a private-use area: https://en.wikipedia.org/wiki/Klingon_alphabets

And, if you check the languages for the full Noto font Klingon is listed so it should work.


Huh, I didn't know about that. Interesting!


Has anyone managed to install the Color Emoji font on Windows? Getting something about it not being a valid font file


I am getting similar errors on Mac OS 10.11. Looks like something may be messed up with that one.


That is great. I am working on a project where I require a font which includes all (or at least as many as possible) Unicode symbols. Until now I was thankful that I could use GNU unifont as a fallback, even if it was ugly. But this will make my app look so much better.


Any idea why they don't just merge Roboto and Noto?


I don't have any inside information, but it seems like they have different goals. Roboto is primarily designed to evoke certain emotions and to display well across devices, whereas Noto is primarily designed to render consistently across languages.


This is really fantastic. It sent me looking for source materials for the character choices made for each language.

I'm dealing with a media experience in Dakota and Ojibwa right now where we have source material that is spelled/character-ed quite differently than the alphabet provided by Noto in those languages. Given the scale of this project, I assume that some considerable thought went into each language's character set, but it's difficult to know for sure without any sourcing. The git commit logs don't offer up any hints. Anyone familiar with the project, know where I could find this sort of source information?

Should I be referencing something in the Unicode definitions for these languages?


The Noto characters are from the Unicode definitions, so that's where you should look to understand why those characters exist. These proposals provide some background: http://www.unicode.org/L2/L2008/08132r-n3427r-syllabics.pdf http://www.unicode.org/L2/L2008/08342-n3507-syllabics.pdf


I know basically nothing about Native American specific scripts, but if you do and they are not in Unicode (Right now they list Cherokee, Deseret, Osage, and Unified Canadian Aboriginal Syllabics as their American Scripts) then check out their web site and try to see about having the relevant scripts/alphabet/letters, etc added to the standard.


Why follow up Roboto? An honest question, not a rhetorical one. I don't yet prefer any one over the other.

I understand it covers a large part of Unicode, but if that is what makes it unique, couldn't Roboto just be extended?


Is it possible to replace the Mac system font for some languages with this?

I prefer the Noto Japanese typeface over the one that comes with Mac and would like to replace it.


I'm not sure if you can replace system-wide fonts without some hacks, like [1], but maybe TinkerTool [2][3] can help a bit:

[1] http://osxdaily.com/2015/10/15/change-default-system-font-ma...

[2] https://www.bresink.com/osx/TinkerTool.html

[3] http://i.imgur.com/HYwpDO4.png


Possibly. Various "Helvetica to Lucida" scripts and tools came out when the default font was switched in Mavericks or Yosemite.

Worth trying!


The font support spans so many languages. This is incredible. Font work is hard and tedious, and thankless. Kudos to Google for open sourcing it.


What are hints, and why are the Noto fonts packaged as hinted and unhinted? I'm not sure which package to use for Linux.


You need the hinted version if your freetype does hinting, whereas if it does autohinting (or no hinting at all) you don't need hints, as in this case hints are ignored. So if in doubt install the hinted fonts.


  > What are hints
https://en.wikipedia.org/wiki/Font_hinting

  > I'm not sure which package to use for linux.
Ooh, yummy canned worms.


Surprised to see that Google is supporting even archaic Korean [0], but it would have been nice to see a chunk of text in Korean, Japanese, and Chinese, as opposed to a bunch of gibberish in all three languages.

[0] https://www.google.com/get/noto/#sans-kore


They even support Linear B, which fell into disuse more than 3000 years ago with the Bronze Age Collapse of the Mycenaean civilization: https://www.google.com/get/noto/#sans-linb


How come I get "6 serious errors were found. Do not use these fonts." in macOS Font Book? v10.11.6


Wondering if Matthew Butterick will like those. Will keep looking at Practical Typography for updates.


Matthew Butterick just hates anything related to Google since they started Google Fonts, don’t hold your breath.



Last time I checked, Noto Mono was not considered a monospace font by Windows' console (Command Prompt) and cannot be used for working in the console (including Bash on Ubuntu on Windows). Is that still the case?


To see all of the characters that are on the internet, you will need to download a CD's worth of fonts, 473 MB. #MindBlown


I wish they would just support Latin Extended in Google Docs when Google Fonts are selected.


APL is finally gonna take the world. Thanks to google.

Wait, does it have APL symbols?


"The Unicode Basic Multilingual Plane includes the APL symbols in the Miscellaneous Technical block, which are therefore usually rendered accurately from the larger Unicode fonts installed with most modern operating systems."


Hey google team,

Windows 10 reported an error:

"NotoColorEmoji.ttf is not a valid font file".


Awesome. It took us only ~7 decades of Computing to invent this :)


This would make an excellent character set for a roguelike.


This would be better if it were called Nofu...


There is Old Persian but not Persian?


Persian is written in the Arabic script. If you search for Persian it will give you Noto Arabic fonts.
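
To illustrate the point, here is a minimal sketch (assuming Python 3 and its standard `unicodedata` module; `describe` and `ARABIC_BLOCK` are names made up for this example) showing that four letters used in Persian but not in Arabic itself still live in the Unicode Arabic block, which is why an Arabic-script font is what covers them:

```python
import unicodedata

# The Unicode "Arabic" block spans U+0600..U+06FF.
ARABIC_BLOCK = range(0x0600, 0x0700)

def describe(ch: str) -> tuple[str, bool]:
    """Return the Unicode name of ch and whether it sits in the Arabic block."""
    return unicodedata.name(ch), ord(ch) in ARABIC_BLOCK

# Letters added to the Arabic script for Persian: pe, che, zhe, gaf.
for ch in "\u067e\u0686\u0698\u06af":
    name, in_block = describe(ch)
    print(f"U+{ord(ch):04X} {name} in Arabic block: {in_block}")
```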


I'm not installing 500 MB of fonts without a flipping contact sheet on the page :(


You mean a specimen page?

I couldn't find one linked from the article, but I found this while Googling: https://fonts.google.com/specimen/Noto+Sans


thanks, i did mean specimen page, mea culpa. also thx for the link!


No slash through the monospace zero. What a huge shame.


Robots won't take over the world! Google will.


Ubuntu package?


Ubuntu has a metapackage in the universe repository called "fonts-noto": http://packages.ubuntu.com/xenial/fonts-noto

  Package:    fonts-noto
  Repository: universe
To install (if you have the universe repository enabled):

  sudo apt-get update
  sudo apt-get install fonts-noto


[pedantic] "UTF-8 space"... UTF-8 code units space is quite limited by single byte - 0...255. You probably meant the Unicode code point space, which at the moment is a 21-bit number: from 0 to 0x10FFFF (1,114,111 decimal, for 1,114,112 code points in total) [/pedantic]
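
As a quick sanity check of those numbers, a small sketch (plain Python 3 integer arithmetic; the variable names are illustrative only):

```python
# Upper bound of the Unicode code point space.
MAX_CODE_POINT = 0x10FFFF

# Code points run from 0 to MAX_CODE_POINT inclusive.
total_code_points = MAX_CODE_POINT + 1

# The space is organised into planes of 0x10000 (65,536) code points each.
planes = total_code_points // 0x10000

print(MAX_CODE_POINT)               # largest code point, in decimal
print(total_code_points)            # how many code points exist
print(planes)                       # number of planes
print(MAX_CODE_POINT.bit_length())  # bits needed to hold the largest value
```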


We detached this subthread from https://news.ycombinator.com/item?id=12654930 and marked it off-topic.


> UTF-8 code units space is quite limited by single byte - 0...255

Yes, but no one said "UTF-8 code units space", and it's pretty clear that the "UTF-8 space" intended was the full space representable in UTF-8, not the space representable in a single UTF-8 code unit, so this is not only pedantic, but also a non sequitur.


Not true at all. Per the wiki article[1], "UTF-8 is a character encoding capable of encoding all possible characters, or code points, defined by Unicode"

It's a variable-length encoding, and just so happens to correspond to ASCII for the first 128 characters. But if the leading bit of the byte is 1, it indicates that it's part of a multi-byte glyph. With the encoding, you can represent the entirety of the Unicode space: the later bytes start with 10 to indicate they're middle parts of the glyph, while the first byte starts with 11 and its following bits indicate how many bytes long the glyph is.

[1]https://en.wikipedia.org/wiki/UTF-8
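
The bit patterns described above can be inspected directly. A minimal sketch, assuming Python 3 (`utf8_bits` is a helper name invented for this example, not a library function):

```python
def utf8_bits(ch: str) -> list[str]:
    """Return each UTF-8 byte of a single character as an 8-bit binary string."""
    return [format(byte, "08b") for byte in ch.encode("utf-8")]

# One sample each for 1-, 2-, 3- and 4-byte sequences.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", utf8_bits(ch))
```

A single-byte (ASCII) character starts with 0, every continuation byte starts with 10, and the lead byte of a multi-byte sequence starts with 110, 1110, or 11110 for two, three, or four bytes respectively.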


Depends on what "space" means...

As of utf-8 encoding... It is a variable encoding that is capable to encode any 32-bit number: from 0 to 0xFFFFFFFF. Not just current set of 21-bit unicode code points.

As for "... indicates that it's part of a multi-byte glyph.": you are mixing completely different entities here. Unicode has nothing to do with glyphs.

A glyph is an atomic component (image) of the internal font structure. A single character (a Unicode code point here) can be composed on screen from multiple glyphs.


> As of utf-8 encoding... It is a variable encoding that is capable to encode any 32-bit number: from 0 to 0xFFFFFFFF. Not just current set of 21-bit unicode code points.

You're thinking of an older version of UTF-8, which allowed sequences of up to 6 bytes to encode a single code point. UTF-8 is now defined to disallow code point values above 0x10FFFF and code point values from 0xD800 to 0xDFFF (inclusive), so that it allows only the same values that are possible in UTF-16.

https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

https://tools.ietf.org/html/rfc3629
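
Those two restrictions are easy to observe in practice. A minimal sketch, assuming Python 3 and its default strict UTF-8 codec (`can_encode_utf8` is a hypothetical helper written for this example):

```python
def can_encode_utf8(code_point: int) -> bool:
    """Return True if Python's strict UTF-8 codec accepts this code point."""
    try:
        # chr() rejects values above 0x10FFFF with ValueError;
        # encode() rejects surrogates with UnicodeEncodeError.
        chr(code_point).encode("utf-8")
        return True
    except (ValueError, UnicodeEncodeError):
        return False

for cp in (0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000):
    print(f"0x{cp:X}", can_encode_utf8(cp))
```

The largest valid code point, 0x10FFFF, needs only four bytes, which is why the 5- and 6-byte forms of the older definition are no longer needed.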


Actually the UTF-8 format maxes out at 31 bits.


You are aware that UTF-8 is a variable width encoding, and not limited to ASCII, right?
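
To make the variable-width point concrete, here is a small sketch (Python 3 assumed; `code_units` is an illustrative helper name) that counts how many code points a string has and how many code units each encoding needs for it:

```python
def code_units(text: str) -> dict[str, int]:
    """Count code points and the code units needed in UTF-8 and UTF-16."""
    return {
        "code_points": len(text),                           # Python str is indexed by code point
        "utf8_units": len(text.encode("utf-8")),            # one code unit = 1 byte
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # one code unit = 2 bytes
    }

for sample in ["A", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(sample):04X}", code_units(sample))
```

Each sample is a single code point, but the number of code units differs by encoding: characters outside the Basic Multilingual Plane take four code units in UTF-8 and two (a surrogate pair) in UTF-16.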


"code unit" is not "code point". "code unit" for utf-8 is 1 octet. "code unit" for utf-16 is a 16bit value. Both utf-8 and utf-16 can represent the entire space of unicode code points by using multiple code units to represent a single code point.


[flagged]


I don't think it's meant as a derogatory term, but simply "the empty box symbol literally looks like tofu". An example of usage (from http://blog.tavultesoft.com/2015/01/keyman-pro-21-no-more-to...)

"""

Among language computing professionals, these square boxes are commonly known as tofu – yes, those yummy squares of bean curd common in East Asian and South East Asian cuisine!

"""


It may not have been intended to be derogatory but that doesn't mean it hasn't ended up actually being derogatory. "No more tofu" certainly sounds like a judgement that "tofu" is bad/unwanted


Homonyms are a thing. Of course a thing can also be good in one context and bad in another. No more tofu spilled in the bed. No more tofu spilled across my blog. "No more tofu" out of context sounds like a soy allergy. With context it means exactly what it says; eliminating a UX failure mode.

If something can only be derogatory if you fundamentally change the actual meaning of the statement, I think it is proven the statement as-is is not in fact derogatory. I would go so far as to say, it's abusing the notion and nature of human language to claim otherwise.


I've always assumed it's just because tofu comes in square blocks, and the character looks a bit like it.


Please do not use these fonts directly from Google servers. This is another attribute Google can use to track the whole browsing traffic. This is too much power for one company.


Google's ToS state that they will not track them beyond simple aggregate numbers of downloads per [time], which they release in their analytics dashboard publicly.

Not only that but the cache headers are set to 1 year... That's a pretty shitty tracking system if I've ever seen one.


ToS are known to change in time. Are you suggesting that if millions of websites use this font then it will be removed when Google changes its ToS?


On the other hand, letting Google host the fonts will allow your users to use the cached font without needing to waste bandwidth and time unnecessarily. Especially if you already use Google Analytics, I'll say, please let Google host the fonts too.


Use them or not, just be aware that Google is tracking them.


It is such a shame that people downvote comments like this. There are issues with Google quite aside from the surveillance ones, like limited support of foreign languages (try finding a Chinese script in a known calligraphic style, for just one example). The more demand for independent hosting and subsetting exists, the better the open source tools will become. Let's not enter a period where fonts go the way of email - centralized providers with CDNs and obscure backend processing facilities are the only ones that can do a good job.


It's a .zip...


He's referring to including Google Fonts as font-face into your webpage, so that they are downloaded by each client from Google's server.


So, did they actually handcraft each of the 100k+ characters? Probably not, but if yes, then I call it BS.


reminds me of natto. Gross.

edit: why the downvotes? They bring up tofu and then name the cure something close to natto, cured soy beans. And by cured, I mean fermented. And by fermented, I mean stringy at the molecular level, smells and tastes awful. And when I say awful, I mean, most of the people from the originating culture think it's awful.



