I'm calling bullshit on the last stat. Where does he source this from?



I agree. This ignores the fact that there are hundreds of languages we don't know about, and probably millions of texts lost through the ages that are entirely unaccounted for.


... which are probably all irrelevant at this scale.

It's like the "how many humans ever lived?" question. People immediately start debating the definition of a "human being", homo whatever. I say it's pretty much irrelevant to the final number.


I agree, though initially for a different reason than the other commenters here - text doesn't take up that much room.

If you assume that each book is translated into every language, then the figure is way short. But it's probably better to interpret it as each book in its original language.

I've read a figure of 7000 languages (http://en.wikipedia.org/wiki/Ethnologue). A survey discussed here (http://answers.yahoo.com/question/index?qid=1005120803482) suggests up to 175 million books ever published. That's about 1.2 trillion unique book-language combinations.

But if we just work with the 175 million figure, then 50 petabytes would allow about 300 megabytes per book. Going the other way, assuming say 3 megabytes (base 2, mebibytes) per book, you could fit just under 18 billion books.

My figures are a little off from mixing base 2 and base 10 in places, but you get the gist of these back-of-the-napkin numbers.
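
If you want to check the arithmetic, here's a quick Python sketch. All inputs are the assumptions above (50 petabytes, 175 million books, 7000 languages, 3 MiB per book); the base-2 line shows where the "just under 18 billion" comes from.

    # Back-of-the-napkin check of the figures above. All inputs are
    # the comment's assumptions, not measured data.
    BOOKS = 175_000_000
    LANGUAGES = 7_000
    BOOK_SIZE = 3 * 2**20                # 3 MiB per book

    storage_dec = 50 * 10**15            # 50 PB, base 10
    storage_bin = 50 * 2**50             # 50 PiB, base 2

    print(f"{storage_dec / BOOKS / 10**6:.0f} MB per book")                  # ~286
    print(f"{storage_dec / BOOK_SIZE / 10**9:.1f} billion books (base 10)")  # ~15.9
    print(f"{storage_bin / BOOK_SIZE / 10**9:.1f} billion books (base 2)")   # ~17.9
    print(f"{BOOKS * LANGUAGES / 10**12:.2f} trillion book-language pairs")  # ~1.23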


Probably ∫ estimated human population · estimated literacy rate · estimated hours per year a literate person would spend writing · words per hour · dt, or something like that. Given the dramatically higher levels of population, literacy, leisure time, and typing speed in recent decades, you should be able to get a pretty good estimate that way.
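
A minimal sketch of that estimate in Python, treating the integral as a discrete sum over years. Every number below is an invented placeholder, just to show the shape of the calculation, not real demographic data.

    # Discrete version of the integral above: sum over years of
    # population * literacy rate * writing hours per year * words per hour.
    # All rows are invented placeholders, not real demographic figures.
    def total_words(rows):
        return sum(pop * lit * hours * wph for _, pop, lit, hours, wph in rows)

    # (year, population, literacy rate, writing hours/year, words/hour)
    sample = [
        (1500, 450e6, 0.05, 10, 600),
        (1900, 1.6e9, 0.20, 50, 800),
        (2000, 6.1e9, 0.80, 200, 1500),
    ]
    # A real estimate would include one row per year; these three
    # illustrate how recent decades dominate the total.
    print(f"{total_words(sample):.2e} words from these three sample years")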



