The Largest Vocabulary in Hip Hop (2014) (pudding.cool)
97 points by kolbe on Dec 10, 2018 | 64 comments


DMX would have scored a little higher had barking sounds been counted as words.


Would like to see a similar analysis that instead looks at the number of syllables, specifically vocabulary that has greater than 2 syllables.

In addition to being of personal interest (I think it's much more difficult and impressive to create rhymes with 3+ syllables due to the natural constraint of a smaller basket of words), it would also, presumably, avoid the issue that the author highlights:

> I used a research methodology called token analysis to determine each artist’s vocabulary. Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin’ vs. pimpin), they’re removed from the dataset. It still isn’t perfect. Hip hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty), compound words (e.g., king shit)
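
A minimal sketch of the counting rule described in that quote, assuming a simple regex tokenizer (the article does not say which tokenizer was actually used, and `unique_tokens` is a hypothetical helper name). Apostrophes are stripped and no stemming is applied, so inflected forms stay distinct:

```python
import re

def unique_tokens(lyrics):
    """Return the set of distinct lowercase tokens, with apostrophes removed."""
    # Drop straight and curly apostrophes so "pimpin'" and "pimpin" become one token.
    cleaned = re.sub(r"['’‘]", "", lyrics.lower())
    # Assumption: a token is any run of letters or digits. No stemming is done,
    # so "pimp", "pimps", and "pimping" remain separate words.
    return set(re.findall(r"[a-z0-9]+", cleaned))

sample = "Pimps, pimp, pimping, and pimpin' vs. pimpin"
vocab = unique_tokens(sample)
print(sorted(vocab))
print(len(vocab))
```

Slang spellings such as shorty vs. shawty would still count as two words under this rule, which is the limitation the author points out.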



Thanks for the link... enjoyable and educational :)

Wonder where Immortal Technique would fit within that chart.


I'd also be interested in something like a WordNet similarity comparison score; i.e., how semantically different are words that are used together. To me, that's a better measure of linguistic creativity than unique words or rhyming words.


I am legitimately intrigued by this analysis. I've always wondered why I found certain rap so much better than others. I stopped liking the kind of rap that was becoming popular around the time of Snoop Dogg, Dr. Dre, Tupac, etc. Looking at this data, it's pretty obvious to me that I tend to like the stuff on the right a lot more than the stuff on the left. Sure enough, I just listened to Aesop Rock for the first time and it is exactly the kind of rap I enjoy.


Similarly, I was happy to see Blackalicious and MF Doom place fairly high in the list too. Aesop Rock is a legend and I have a couple old favorites by him ("Daylight", and that same song's antithesis "Night Light"), but I also find him to be mentally over-taxing at times.

Since you enjoyed your first listen of Aesop Rock so much, for something newer I highly recommend Milo's 2018 album "budding ornithologists are weary of tired analogies". Enjoy :)


Don't miss out on Hail Mary Mallon, which is Aesop Rock and Rob Sonic. I would highly recommend their album Bestiary if you like Aesop's solo stuff.


The bars at the end of the article are some of my favorite in hip hop. Jay Z is tackling the claims that the most skilled rappers don't sell head on...

He's taking our assumption that good art isn't the most popular art, or that pop isn't skill, and deconstructing it while making a platinum record, his best selling of the decade.

Chappelle talks about listening to that line on the Black Album with Common, Kanye, and Talib Kweli: https://www.youtube.com/watch?v=R4SYIfhzMmU

Maybe it happened that way and maybe not, funny story though.


For reference, the TV vocabulary is about 3,000 words.

The average high school vocabulary is 10,000 words.

The average college graduate vocabulary is 30,000 words.

The English language has about 1,000,000 words.

The good news is that one needs to learn only about 3,000 words to become conversant in English.


I'd like to see a source on this info. But it's definitely wrong.

4 years in college isn't enough to triple your vocabulary. And the average fluent speaker uses far more than 10000 words. In my second language, I'm slightly over 10,000 words (I keep track of everything I encounter in Anki) and I'm still nowhere near as fluent as a typical high schooler. I encounter several new words daily.


I'm also skeptical, but note that "average college graduate = 3x average high schooler" doesn't mean "college triples your vocabulary" because there are two things potentially going on here: the effect of 4 years at college, and the difference between people who graduate college and people who don't.


Sure. Someone who reads a lot is more likely to go to college, and reading a lot expands vocabulary a lot. (I often mispronounce words because I've never heard them spoken, and had long ago invented my own sound for them.)

Newspaper vocabulary is dumbed down, and TV even more so, because that's their market.


> it's definitely wrong.

I'd say it's debatable, but in the right ball park.

English is probably the language with the most words, but I've seen estimates around 300k, and even that might be high depending on the definition.

Here's Steve Pinker on it (from his 1994 book “The Language Instinct: How the Mind Creates Language”):

> The most sophisticated estimate comes from the psychologists William Nagy and Richard Anderson. They began with a list of 227,553 different words. Of these, 45,453 were simple roots and stems. Of the remaining 182,100 derivatives and compounds, they estimated that all but 42,080 could be understood in context by someone who knew their components. Thus there were a total of 44,453 + 42,080 = 88,533 listeme words. By sampling from this list and testing the sample, Nagy and Anderson estimated that an average American high school graduate knows 45,000 words—three times as many as Shakespeare managed to use! Actually, this is an underestimate, because proper names, numbers, foreign words, acronyms, and many common undecomposable compounds were excluded. There is no need to follow the rules of Scrabble in estimating vocabulary size; these forms are all listemes, and a person should be given credit for them. If they had been included, the average high school graduate would probably be credited with something like 60,000 words (a tetrabard?), and superior students, because they read more, would probably merit a figure twice as high, an octobard.

> Is 60,000 words a lot or a little? It helps to think of how quickly they must have been learned. Word learning generally begins around the age of twelve months. Therefore, high school graduates, who have been at it for about seventeen years, must have been learning an average of ten new words a day continuously since their first birthdays, or about a new word every ninety waking minutes. Using similar techniques, we can estimate that an average six-year-old commands about 13,000 words (notwithstanding those dull, dull Dick and Jane reading primers, which were based on ridiculously lowball estimates).
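
A quick check of the arithmetic in that second paragraph. The 16 waking hours per day is my assumption; the quote does not state the figure it used:

```python
WORDS_KNOWN = 60_000        # estimate for a high school graduate, from the quote
YEARS_LEARNING = 17         # from the first birthday to graduation, from the quote
WAKING_HOURS_PER_DAY = 16   # assumption, not given in the quote

days = YEARS_LEARNING * 365
words_per_day = WORDS_KNOWN / days
waking_minutes_per_word = WAKING_HOURS_PER_DAY * 60 / words_per_day

print(round(words_per_day, 1))         # 9.7
print(round(waking_minutes_per_word))  # 99
```

So "ten new words a day" holds up, and the interval comes out near 100 waking minutes under this assumption, roughly in line with the "every ninety waking minutes" in the quote.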


I'd guess the vocabulary of an average college graduate at the time they graduated from high school is considerably higher than 10,000 words.


I read it decades ago in a book, can't recall which one. Anyhow, some googling came up with this:

http://www.balancedreading.com/vocabulary.html

Specific counts vary widely, as it is hard to measure such things, and what counts as a word is also slippery. But the trend is pretty clear.


Somewhat related, I learned a new word today - conversant :) Thanks.

-ss


The OED is not the English language.


Definitely curious about some of the 90s Oakland rap like Del and Hieroglyphics, alt-hip-hop like Binary Star, and maybe some old school like Young MC.


Del is pretty well represented (#7) in the print version (https://popchart.co/products/the-hip-hop-flow-chart)


I was wondering the same thing. Casual is missing, I didn't catch that Del was in the list, not surprised he's up there. Aceyalone was another one I wonder about.


At first glance, I thought higher on the Y-axis (even though there isn't one) was more words. I was frantically hovering looking for Aesop Rock until I realized how the grid worked. Sure enough.

That's what I put on when I want to listen. It's like watching a movie, I don't want to work or exercise to it, it demands all my attention. On the other hand, nothing hypes me up like a DMX track.


I knew it would be Aesop Rock before I even clicked through. Definitely not the kind of thing you listen to at work to block out the noise. It’s interesting but I guess not surprising that people in the same circle, like El-P, have similarly huge vocabularies. But now El-P is practically the biggest name in hip hop. Maybe wordiness is catching on.


El-P is not nearly the biggest name in hip hop. Maybe in the online world him and RTJ are big, but amongst the general population of hip hop listeners I don’t think he’s very big.


Their last album was the #1 best-selling album on the Billboard R&B/Hip Hop chart and they sold out every venue on their tour. I think they're reasonably well-known.


It may have been best selling when initially released, but that relies on a lot of factors (other albums released at the time being a major one).

According to https://www.billboard.com/music/run-the-jewels, RTJ 3 peaked at #13 on Billboard 200. That's good, but it isn't being 'practically the biggest name in hip hop' good. Plus, most people probably wouldn't recognize Killer Mike or El-P (although more people may recognize Mike over El-P from his time supporting Sanders).


I think track time matters more than word count. You could have two songs that are the same length and have the same lyrics except one song has the N-word at the end of every line, and this analysis would prefer the censored version as having more vocabulary.


The count of unique words tends to be a function of the length of a document or corpus. So the comparison won't make sense either if you take every artist's complete works and do a unique count.
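
A toy illustration of that length effect, assuming whitespace-split tokens (`vocab_growth` is a hypothetical helper, not anything from the article). The unique count keeps climbing as more of the text is read, so a longer catalog gets a higher raw count:

```python
def vocab_growth(tokens, step):
    """Return (prefix_length, unique_count) pairs for growing prefixes of tokens."""
    seen = set()
    points = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        # Record a data point every `step` tokens and at the very end.
        if i % step == 0 or i == len(tokens):
            points.append((i, len(seen)))
    return points

tokens = "the cat sat on the mat and the dog sat on the log".split()
for length, unique in vocab_growth(tokens, step=4):
    print(length, unique)
```

Comparing artists at the same prefix length is one way to take corpus size out of the comparison.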


Surprised that Company Flow, Lootpack, Souls of Mischief and rappers from these collectives don't appear in that list. Or somebody like Ras Kass.


The measure is meaningless, because more doesn't equal better.

Music works well with repetition. Just look at graffiti for comparison, little one word poems repeated over and over.

The measure also doesn't account for ambiguity used to good effect--who makes the most sense?

I'm not sure why you put Lootpack in there, anyhow. Madlib for example talks a ton about weed, known to inhibit speech production. And the sound is blunted, too, literally. I just wonder how they can be as productive as they are, at all. I still like it, sure.

Nice Nick by the way.


>The measure is meaningless, because more doesn't equal better.

Absolutely.

>Music works well with repetition. Just look at graffiti for comparison, little one word poems repeated over and over.

Yeah, but it's a bit like writing texts, where in longer ones you might substitute a word that you already used a sentence before.

In Graffiti you have tags and throwups, muscle memory that is usually repeated, but also more complicated pieces where freshness and coming up with new stuff has more value.

>I'm not sure why you put Lootpack in there

A shameful confusion of Lootpack with Swollen Members (Wildchild != Madchild).

> Nice Nick by the way.

Thanks!


I wonder if when they count in Kool Keith they also factor in Dr. Octagon, Black Elvis, Dr. Dooom, etc, etc. If they did I bet he’d be a lot higher.


Just his rhymes in Ultramagnetic MC's alone would push him way up I think. Looks like the creator tried to collect only solo stuff.

Kool Keith gives the best interviews too: https://www.youtube.com/watch?v=pQtfeszBgAc


decorating his refrigerator :D


Same comment but for MF Doom, Madvillain, etc.

I assume they do this though. When writing about Kool Keith, they reference the album Dr. Octagonecologyst.


Seriously one of the nerdiest hip hop albums ever. It’s one of my favorite double LPs and never fails as a first record to kick off a party. Can’t believe we’re talking about Dr. Octagon on HN :-)


One of my four-month-old daughter's nicknames is Baby Octagon. She's destructive.


I'm assuming she was born on Jupiter.


And controlled by gamma-light?


Ha, just had a sudden need to listen to that album again - it's been far too long.

Earth people...


The creator of the site said they're working on updating it. Should be up in a couple weeks.

https://www.reddit.com/r/dataisbeautiful/comments/a4urje/rap...


Question to the OP of this link - were you listening to 6 Music yesterday (UK digital/online BBC Radio station) - purely mention this because there was a running thread on Radcliffe/Maconie about unique words in songs, and someone had mentioned a rapper's vocabulary being much greater than most others.


Amazing how many Wu-Tang solo members are up there, along with the group as a whole.


I remember seeing this a number of years ago and the sentiment then was that it's an effect of them all coming together, and because they got to bounce rhymes off of each other they started to learn each other's words and incorporate them into their own material.


Seems to go at least as far back as 2014: https://news.ycombinator.com/item?id=7704183


The data only goes to 2012 sadly.

Drake would likely rise since he's been prolific since then. Logic and Kendrick would also probably see some gains.

It's tough to think of new entrants that might debut high on the chart, at least in pop/modern rap. If anything "mumble rap" is the standard these days and it would be pretty funny to see how the kodak/purp/yachty cohort compares to some of the other eras.

And nobody is unseating Aesop Rock. The man is one of a kind.


"and nobody is unseating Aesop Rock"

I was going to say that. Several years ago I saw either this list or a similar one on Reddit and a lot of comments wanted to know why Aesop Rock was missing. I grew interested and people gave "none shall pass" as a good first song to listen to if you want to get familiar with his work. My only problem with his stuff is that I have zero idea what he is talking about. It's basically just a lot of words jumbled together except for that song about the dog saving a little girl from drowning. I've heard it 50x now and still tear up.


Ruby ‘81

That’s one of my favourite songs of all time. It’s an amazing example of poetry emphasized by music. Highly recommended, even if someone doesn’t like hip hop.


I think his music has gotten much more accessible and less obscure over time. I think The Impossible Kid (his most recent album) is a much better point of reference for what's interesting about his work, and it deals with topics in ways that are easier to understand and relate to, but equally as interesting and well considered (See: Lotta Years -> reflections on getting older, Blood Sandwich -> Looking back on some family moments, Kirby -> Heartfelt and hilarious ode to his cat)


Check out the album he did with Kimya Dawson (The Uncluded - Hokey Fright). Kimya is pretty matter-of-fact in a lot of the songs and seems to rein Aes in (or at least offer a point of reference for when he starts getting out-there). It's one of my favorite albums. "Earthquake" is beautiful.


It was this list. "Blood Sandwich" is another lyrically accessible song from him that tugs the heartstrings.


I feel like MC Paul Barman would have a shot, his whole schtick is his inflated vocab.

He's kind of a stunt rapper though, and had some dated offensive shock lyrics that are hard to listen to. I'd much rather listen to Aesop.

Mac Lethal might be up there too, not sure.


>While Lil Wayne has never been celebrated for the complexity of his word choices, I expected 2pac, Snoop, and Kanye to be well above average.

Huh, with data restricted to 2012 he didn't include most of these rappers' opus, but he expected differently?


He's keeping things "fair" (at least to dead artists) by only including the first 35k words of any artist. So even though time has passed and artists have a larger corpus, that wouldn't make a difference under this method.
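
A rough sketch of why the cap makes later releases irrelevant, assuming the lyrics are already tokenized in release order (`capped_vocab_size` is a hypothetical helper, and the tiny cap below just stands in for 35,000):

```python
def capped_vocab_size(tokens, cap=35_000):
    """Count distinct tokens among the first `cap` tokens only."""
    return len(set(tokens[:cap]))

early_catalog = ["a", "b", "c", "a"]
# Later releases append new words, but they fall past the cap.
full_catalog = early_catalog + ["d", "e", "f"]

print(capped_vocab_size(early_catalog, cap=4))  # 3
print(capped_vocab_size(full_catalog, cap=4))   # 3
```

Anything an artist put out after hitting the cap never enters the count, whichever year the data stops.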

I've wondered where Eyedea would end up...


No wonder the second one on the left is named "Too Short".


I'd be interested to see where Akala would fit on this list. His vocabulary is pretty impressive to listen to at least. But it seems they only included US artists.


It’s not how many words you know, it’s how you use them.


I'd put Aesop Rock at the top of that axis too.


I actually like most of the lower tier guys way more than all of the upper tier ones. I saw Bone Thugs-N-Harmony live in 2014 when they did their tour and it was pretty amazing. Bizzy Bone was really singing!


Wonder if Nicki Minaj gets credit for her made-up wordsounds


KRS-1 barely leading Lil Kim huh


KRS is the teacher, not the distinguished professor of fine arts and lexicography.


Can't take away from KRS, but I largely find the reason I am listening to KRS-One is because DJ Premier, Showbiz or another DITC member produced it. Lil Kim had really good ghostwriter(s) too, so I think she got a bit of a jump...


I would expect Eric B. & Rakim to be on the list for Microphone Fiend and Don't Sweat the Technique. Seems like this list should include more artists rather than just a random selection of popular artists.



