Disproportionately Common Names by Profession (verdantlabs.com)
166 points by colinprince on Jan 13, 2015 | 106 comments


> People with these names are more likely than others to have these professions.

Shouldn't it say: "People with these names happen to be in those professions more often than others"?

Anyway, there are a couple of fun ones in there, but I'll let you figure those out yourself. Unfortunately neither my name nor my profession are covered — I'm not quite sure what to make of that. :/

---

They use the same language in their blog post: "Arnolds therefore appear to have a much higher tendency to be accountants than Shanes." That's just wrong, no? By wrong I just mean intentionally misleading. I'm sure that has nothing to do with the fact that they sell an app that helps you find names for your babies though.


EDIT2: OK folks we're smart, let's use MATH.

Take the above quote, which compares "1.9% of Arnolds are accountants" to "0.55% of Shanes [are accountants]". They're implying that the probability of being an accountant (J), given that one's name (N) is Arnold, is above the expected probability of being an accountant in general. So they're looking for a high P[J|N]/P[J].

Now compare with what we were expecting to see. We assumed the chart showed, for a given job, names which had a higher incidence than normal. i.e., we're looking for a high P[N|J]/P[N].

Guess what. P[J|N]/P[J]=P[N|J]/P[N] by Bayes' Theorem [1]. These are EXACTLY the same metric! So their technique, and the chart, is correct. (And my original post, below, was wrong.)

(Not saying anything about causation here, and I don't think they were either.)

[1] http://en.wikipedia.org/wiki/Bayes%27_theorem
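A quick numeric check of that identity, as a sketch only: the counts are made up for illustration (they are not the site's data), and `lift_by_name` / `lift_by_job` are hypothetical helper names. Both ratios reduce to P[J,N]/(P[J]·P[N]), so they should agree for any counts you plug in.

```python
from fractions import Fraction

# Hypothetical counts, NOT taken from the Verdant Labs data set.
total = 10_000   # people in the sample
n_name = 200     # people named Arnold
n_job = 300      # accountants
n_both = 12      # Arnolds who are accountants

def lift_by_name(n_both, n_name, n_job, total):
    """P[J|N] / P[J]: how much more likely this name is to hold the job."""
    return Fraction(n_both, n_name) / Fraction(n_job, total)

def lift_by_job(n_both, n_name, n_job, total):
    """P[N|J] / P[N]: how over-represented the name is within the job."""
    return Fraction(n_both, n_job) / Fraction(n_name, total)

a = lift_by_name(n_both, n_name, n_job, total)
b = lift_by_job(n_both, n_name, n_job, total)
print(a, b, a == b)  # prints: 2 2 True
```

Exact fractions are used so the equality is not blurred by floating-point rounding.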

-----------

Yes, that is completely backward. 99% of Arnolds could go into farming, yet still the 1% who go into accounting dominate that field, and hence show up on this chart.

EDIT: but you missed the first half of that quote: "In our sample of two and a half million people, a whopping 1.9% of Arnolds are accountants. Contrast that with just 0.55% of Shanes." So I think the quote is correct. Makes me wonder if their chart is backward. (i.e., they put "Arnold" under "Accountant" because Arnolds are likely to be accountants, not because accountants are likely to be Arnolds, as the grouping implies).


After some pencil work I was convinced of EDIT2, but now I'm confused..

Before we get to my confusion, one interesting thing I found along the way is the following situation, call J1 "job 1" and N1 "name 1", and use the P(N|J)/P(N) (or its equivalent) metric:

  J1  J2
  ------
  N1  N1
  N1  N1
  N1  N3
  N2
If we limit each job to its top name, N1 doesn't get attached to either despite being the most common name in each. J1 gets N2 while J2 gets N3.

If this is the method, then don't use this chart to guess the names of people in a profession; use it to guess the professions of people whose names you know. "Guy" may be listed for investment bankers, but an investment banker is still more likely to be named Dave, and if you meet a Dave he's likely to be a mechanic.

For the same situation say we use the top value for P(N|J), then J1 gets N1 and so does J2. P(J|N) goes back to J1 getting N2 and J2 getting N3 and N1 being left out in the cold.
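The toy example above is small enough to check by machine. This is a sketch under the assumption that the table lists one person per row (four people in J1, three in J2); `top_name` and the three scoring functions are hypothetical names, not anything the site publishes.

```python
from collections import Counter
from fractions import Fraction

# The toy population from the table: one (job, name) pair per person.
people = [("J1", "N1")] * 3 + [("J1", "N2")] + [("J2", "N1")] * 2 + [("J2", "N3")]

pair = Counter(people)
by_job = Counter(j for j, _ in people)
by_name = Counter(n for _, n in people)
total = len(people)

def p_name_given_job(n, j):
    return Fraction(pair[(j, n)], by_job[j])

def p_job_given_name(n, j):
    return Fraction(pair[(j, n)], by_name[n])

def lift(n, j):
    # P(N|J) / P(N), which equals P(J|N) / P(J)
    return p_name_given_job(n, j) / Fraction(by_name[n], total)

def top_name(job, score):
    """Name with the highest score for this job."""
    return max(sorted(by_name), key=lambda n: score(n, job))

for job in sorted(by_job):
    print(job,
          top_name(job, lift),
          top_name(job, p_name_given_job),
          top_name(job, p_job_given_name))
# prints:
# J1 N2 N1 N2
# J2 N3 N1 N3
```

The columns are the top name by lift, by P(N|J), and by P(J|N). Only the plain P(N|J) ranking picks N1; the other two leave it out for both jobs.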

But here's where it's unclear, I think this:

> In our sample of two and a half million people, a whopping 1.9% of Arnolds are accountants. Contrast that with just 0.55% of Shanes. Arnolds therefore appear to have a much higher tendency to be accountants than Shanes

implies they're listing the top P(J|N) values for each J. *(edit: they're comparing Arnolds to Shanes, not Arnolds to all accountants?) I think your approach is the most consistent but is it what they're using?


It doesn't appear incorrect to me. The blurb on the graph says "For example, a higher percentage of Elwoods are farmers than of most any other name."

As I read it, it means that, say, 1% of Elwoods become farmers, while only .1% of Steves are farmers. That is, if your name is Elwood, you are much more likely to be a farmer than if your name is anything else.


That's Nominative Determinism:

http://en.wikipedia.org/wiki/Nominative_determinism

Think someone called Crapper being a toilet salesman, or Smith being a Blacksmith, Baker being a Baker etc.

I.e. a funny non-science based upon confusing historical coincidence (professions being used as surnames, a family marketing name being used as a crass synonym for something else) with causation.


What? No it isn't. They're talking about given names, for one thing. For another, unless I missed it, they aren't positing a reason for this; they are simply saying which given names have more people in a given profession than would be expected based on averages.


Covered heavily by New Scientist over the years (with tongue usually firmly in cheek).

https://www.google.com.au/webhp?sourceid=chrome-instant&ion=...


What you're implying here is that correlation implies causation.

If 99% of farmers are Elwoods, you can't claim that one's name being Elwood means one is more likely to become a farmer.


The predictive element itself has little to do with correlation/causation, though. "If you have gray hair, you're more likely to die in the next ten years than people without gray hair" is a perfectly valid statement, and does not imply at all that gray hair is causing people to die.

If anything, it's more like an ecological fallacy or unwarranted extrapolation to the future.


In that scenario, could one claim that "one's name being Elwood means one is more likely to BE a farmer"?


Yup, you got it right. The wording is important.

Apparently, if your name is Elwood you're more likely to be a farmer today. But that obviously doesn't mean that kids named Elwood are more likely to become farmers (it could, but I'm afraid you're gonna have to prove that).

Wikipedia has some simple examples: https://en.wikipedia.org/wiki/Correlation_does_not_imply_cau...


In this case, it probably means that people named Elwood are more likely to (a) be older and (b) live in rural areas, both of which are correlated with being a farmer.
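A sketch of how that could play out, with entirely hypothetical numbers and only the age half of the argument modelled: within each cohort Elwoods farm at exactly the same rate as everyone else, yet pooled over both cohorts the name still looks "disproportionately" linked to farming.

```python
from fractions import Fraction

# Hypothetical cohorts: (people, Elwoods, share of the cohort that farms).
# Within each cohort, Elwoods farm at exactly the cohort-wide rate.
cohorts = {
    "older":   (1000, 100, Fraction(10, 100)),
    "younger": (1000, 10, Fraction(1, 100)),
}

people = sum(n for n, _, _ in cohorts.values())
elwoods = sum(e for _, e, _ in cohorts.values())
farmers = sum(n * r for n, _, r in cohorts.values())
elwood_farmers = sum(e * r for _, e, r in cohorts.values())

overall_rate = farmers / people          # P(farmer)
elwood_rate = elwood_farmers / elwoods   # P(farmer | Elwood)
pooled_lift = elwood_rate / overall_rate

print(float(overall_rate))           # 0.055
print(round(float(elwood_rate), 4))  # 0.0918
print(round(float(pooled_lift), 2))  # 1.67
```

The pooled lift is above 1 even though the name carries no information once you know the cohort.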


Do we have to write comments here as though we're writing the final draft of a math textbook? What's meant is clear enough.


What's clear to you may not be clear to someone else.


I've seen this effect mentioned in psychology books before, and it seems, at least from a clinical perspective, that people do tend to choose their professions based on their names.

I'm sorry I can't provide sources.


It's mentioned in "Yes! 50 Scientifically Proven Ways to be Persuasive"[1], which despite the title, is a really interesting book. The Freakonomics blog quotes it here:

http://freakonomics.com/2009/04/24/yes-part-ii/

referencing this paper "Why Susie Sells Seashells by the Seashore: Implicit Egotism and Life Decisions" http://www.stat.columbia.edu/~gelman/stuff_for_blog/susie.pd...

[1]http://www.amazon.com/Yes-Scientifically-Proven-Ways-Persuas...


We would expect people to also tend to choose their names based on their parents, as well as choosing their jobs based on their upbringing.


That's obvious, but I would hope the psychologists purportedly studying this topic thought of that when they designed their experiment.


I remember when I joined CMU my Freshman year the big thing was that the previous year there had been more guys named "Dave" that had graduated from Computer Science than women. This kind of reminds me of that. This was circa year 2001.


A comedy channel in the UK decided to rename the channel a few years back. They bandied names around, then someone said, "How about Dave? Everyone knows Dave, Dave's your mate." Dave it was. As it turns out, David was in the top 3 names from 1954 to 1994, so the reality is that if you're a Boomer, Gen X, or Gen Y, you really do know a Dave. (Stats table here: http://www.ons.gov.uk/ons/publications/re-reference-tables.h...)


I don't get why this is presented as a 'chart'. Do the layout and colours mean anything? Or is it just a set of lists laid out in circles for no reason?


The chart is meaningless. Having a colorful image makes links to the post look more appealing in Facebook, G+, Pinterest, etc.

These days, you pretty much have to put a pretty picture in an article if you want traffic from social sites.


As far as this dude, with his aging eyeballs, is concerned, the colour's meaning, if any, was irrelevant.

I mean, the information was interesting enough to me to try to read it, but I frankly cannot parse what looks like #FFFFFF text on a #FEFFFE background.

For the love of god, if you want to present data like this, ensure at least a little contrast exists between the text and background.


Kurt and Jessica just had a lot of time on their hands.


Gah, I hate stats that a) don't say where the information comes from b) don't say which part of the world they're talking about (US sites are particularly prone to this). I'm assuming this is from US census data, but it's not particularly obvious. There's a rest of the world, you know.... [EDIT: looking at the app, maybe it is worldwide? Still, referring to things like "Republican" doesn't inspire confidence that it's international]


It says the data sources are the FEC, SSA, and Wikipedia.


OK, I missed that, but still, that suggests two US links (not that I can see the actual data, it just links to the homepage), and ... Wikipedia? What does that mean in terms of a data set?


It does, but it only points to the websites of FEC, SSA and Wikipedia, without really mentioning what data was used. That's barely more useful to me than "Source: Internet".


http://www.verdantlabs.com/blog/2014/12/30/names-by-professi...

Direct link to the blog that goes into a few details of how they work out the names.


I visited a fire station in the country in my province a few years ago and on the wall were names of the volunteers.

Nearly every one was John Gallant, the surname is very common and it's a small community.

The funny thing is the surname is so common people have nicknames such as John 'Rabbit' Gallant but that name becomes so well-known his son will be called Rabbit Jr.

So the plaque is full of a wild mix of actual surnames and given names and of course nicknames but also the junior of the nicknamed people.

Add to that one family has seven daughters all named Mary.


PEI?


Of course!


It's interesting to see the difference in type of names between Football Coach (Dan, Bill, Mike, Jim, Rich, Steve) and Electrical Engineer (Bernard, Eugene, Edwin, Charles, Alfred, Harvey). Short (one vowel each), common, English versus longer (two vowels each) French/"Posh" English names. Correlation between the names and background/education level seems likely?

Songwriter is interesting too: 4 out of 5 end with a variation of "y".


It's also possible that football coaches have a tendency to commonize their names - whether to fit in and make their players more relaxed, or maybe they were just raised in solid blue-collar American families.

Their birth certificates could very well read Daniel, William, Michael, James, Richard, Steven.

Similarly, the EEs probably have a tendency to be more formal on their resumes or business cards.

Their family and buddies probably call them Bernie, Gene, Eddie, Chuck, Freddy, Harv.


The data sources are the FEC and SSA (and Wikipedia, for some reason). Which means that this data is coming off the stuff they submitted to the government for whatever reason.


I work with a software engineer named Charles and a project manager named Eugene. They go by Chuck and Gene, respectively.


As another commenter points out (https://news.ycombinator.com/item?id=8881131), the actual results appear to be a heavily human-edited "6 of the top 37" list. In which case most of the interesting patterns would reflect the biases of the people that compiled the chart.

Chances are that there are largely unsurprising gender/age/class/race biases in the full dataset too, but these have been selected for effect. (I bet most of the rest of the disproportionately common football coach names are pretty regular names for men born between 30 and 55 years ago, with some of them possibly even having additional syllables, and similarly I wouldn't be surprised to find that Jim and Bill, for example, also featured high in the list of people disproportionately likely to be electrical engineers, possibly even above Eugene.)


That is interesting and I am sure there is a good point there but meteorologist is also a profession requiring a high education level. Yet the meteorologist list (Bill, Joe, Jim, Jeff, Mike, Scott) is way more similar to the football coach list to the point where 3 of the names are the same.


Perhaps it's because individuals in public-facing professions/roles tend to favor shortened versions of their names? I think this is largely why the media makes fun of Benedict [Timothy Carlton] Cumberbatch.

For example: http://benedictcumberbatchgenerator.tumblr.com/


Stage names. No newsreader on a newscast is going to say "Thanks for that great lasagna recipe, now let's talk to Ebenezer Ulysses about tomorrow's rain"; they're going to get a stage name of "Bennie" in that case.


I wonder, however, if Eugene or Edwin were a football coach, whether he would then go by a short form such as Ed. Perhaps Dan would turn into Daniel and Mike into Michael for an electrical engineer.


Heh, seeing Bernard as common name for Electrical Engineer made me think of Maniac Mansion...


The 'football coach' names you list there are all contractions of a name that would be at least 2 syllables long, though.

edit: (or yeah, what MichaelTieso said :) )


I do agree people with French/"Posh" English names are probably more educated and contribute more to society based upon this poor correlation


This seems fairly easy to explain: Many given names are passed down through family lines, and many families pass on knowledge, habits, and even professions from generation to generation.

There are lots of common surnames that demonstrate this - Taylor, Cooper, Cobbler, Smith, and so on. Why not given names?


Yep, my cousin was named George, as was my late uncle, as the idea was that he would carry on the family trade as a turf accountant (bookmaker) in Birmingham.

Note that this was pre-legalisation, so watching Peaky Blinders (the UK version of Boardwalk Empire) on the BBC was interesting, shall we say.


Some time ago, I saw a paper in which some scientists found that people often tend to live in places and hold jobs that are somehow connected with their names, e.g. there are more Denises who work as dentists or Louises in Louisiana. I also noticed that in my country (Poland) there are noticeably more people working in IT with names or surnames that begin with K (the Polish word for computer is "komputer"); I'm one such case.



> I also noticed that in my country (Poland) there are noticeably more people working in IT with names or surnames that begin with K (the Polish word for computer is "komputer")

You don't say... :) Perhaps because names starting with K are simply very common in Poland? ;) About 1 in 6 of the 100 most popular last names starts with K.

And noticeably more IT employees than whom exactly? Presidents of the country? Since 1989 Poland has had: Jaruzelski, Walesa, Kwasniewski, Kaczynski and Komorowski... Among prime ministers (since the 90s) about 1 in 5 had either a first name or a last name that begins with K :)


Okay, maybe you're right and I'm just biased by the serial killer who killed a few software developers with K-names living in Kraków.

Polish article: http://wiadomosci.gazeta.pl/wiadomosci/1,114873,9855837,Krak...


Wow, now that's interesting :)

By the way, believe it or not, I do work as a programmer (in Poland), and my first name does indeed start with K :) Not the last name though, so I feel safe.

While the serial killer theory could boil down to a statistical oddity, it reminded me of this novel by Lem: http://en.wikipedia.org/wiki/The_Investigation


> Here's a chart with 6 of the names that are the most disproportionately common in 37 professions.

It doesn't say they're the top 6, just that they're "6 of" -- and having worked with a lot of similar data sets in the past, the results here feel a little overly edited (i.e. exaggerated, stereotyped) to me. I'd be happy to be proven wrong, though.


http://www.verdantlabs.com/blog/2014/05/23/nametrix-2/

There they list the actual top 5 for some professions. Having read that, I am inclined to agree with you here.

The top 6 for car salesmen in that graph has literally only one name (Clay) that is in the actual top 5 and even that was only 5th. The top 4 names (Emmett, Luther, Emanuel, Morton) all got replaced with stereotypical white working class guy names.

The top 6 for surgeon in the graph has no female names yet the actual most disproportionately common name for surgeons is 'Vivienne'.


Not sure I trust the results. For guitarists they list: Mick, Richie, Trey, Sonny, Buddy, Eddie. That correlates strongly with famous musician names (Mick Jagger, Lionel Richie, Trey Anastasio, Sonny Rollins, Buddy Guy, Eddie Van Halen). Maybe kids are named after these legends and are pushed toward music, but maybe their software just counted a lot of duplicate mentions?


In some of those cases, it could also partly be people adapting their names after those legends. For example, a budding guitarist called Michael who loves Mick Jagger choosing to go by the name of Mick. As Michael is a far more common name than Mick, it would only take a small proportion of 'Michael's to do that and suddenly 'Mick' makes the guitarist list.


I don't know where their data is from. Are these names taken from birth certificates? Maybe people are using pseudonyms for their music careers?


Surely Buddy would refer to Buddy Holly, who died young in a plane crash and predates the height of Guy's career.

Not that I don't love both.


This is worthless unless they correct for age. Which they most likely don't, since age is not even mentioned in the post.


Indeed, the time period in which someone was born seems to have a significant effect on the likelihood of certain names. Here's one study that documents that effect in the US:

http://fivethirtyeight.com/features/how-to-tell-someones-age...


Wolfram Alpha parses queries like "how old is [name]" and renders the age distribution, e.g. http://www.wolframalpha.com/input/?i=how+old+is+jane


No study, but first hand experience.

My father is a Pediatrician, and he has always commented on the strong correlations between relatively common names in his current crop of patients and the names of 5-year-ago-popular TV shows' protagonists.

The first time I was able to make the connection, it was Brandon/Brenda/Dylan.


"Worthless" in what sense? If we want to know something in particular, maybe, but perhaps knowing that people who have a certain job tend to be older is of interest? Some jobs seem to have entirely female names in the chart; is the fact that women are apparently significantly overrepresented in those fields also "worthless?"


The correlation is obviously indirect, since names are correlated with age, social class, region and other demographics, and these are in turn correlated with career choices.

Once the stats were adjusted for the above, I suppose we'd end up with not much more than noise and the occasional spurious correlation: http://twentytwowords.com/funny-graphs-show-correlation-betw...


Rabbi - Chaim, Shlomo, Judah, Meir, Yosef, Moshe

Seems sort of ... logical.


Okay, so normally I hate people that bring up gender, but there were some interesting correlations between professions and gender here, assuming there aren't many guys named Sue. For example, meteorologist: 100% male. When I think meteorologist I think newscast in front of a green screen, so this is perhaps a case of my terrible misconceptions showing.

Also, WHY are things separated by color and opacity? If we're going by opacity, apparently one of the most populous and important professions is...race car driver. Really? That's one of the few professions on that huge infographic that is at maximum opacity?


... in the U.S. Do these people think it should be immediately obvious to everyone?


Looks like people alter their name to fit the stereotype of their profession. Lots of drummers named Billy, but I bet it says William on their birth certificate. Same for coaches named Rich, not Richard.


Apparently they don't state what "disproportionately" is supposed to mean. If that is indeed the case I doubt much valid insight can be derived from this chart.


"In our sample of two and a half million people, a whopping 1.9% of Arnolds are accountants. Contrast that with just 0.55% of Shanes. Arnolds therefore appear to have a much higher tendency to be accountants than Shanes."

http://www.verdantlabs.com/blog/2014/12/30/names-by-professi...

You see where this is going. If you correlate the data with the popularity of names in general, you'll find that Arnold is a much more popular name than Shane ...


Well, you're not right. Almost 2% of Arnolds is more than 0.5% of Shanes, no matter how many Arnolds and Shanes there are. If you take 100 Arnolds and 100 Shanes, there will be (statistically) 2 Arnolds and "half a Shane" who are accountants.
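To spell that out with a sketch: the two rates are the ones quoted from the blog post, while the population sizes are hypothetical and chosen only to show that they cancel. A per-name rate is already divided by the number of people with that name, so making one name more popular does not move the comparison.

```python
# Rates quoted in the blog post, in percent.
arnold_rate, shane_rate = 1.9, 0.55

# Hypothetical population sizes, chosen only to show that they cancel out.
scenarios = [(1_000, 1_000), (50_000, 5_000), (5_000, 50_000)]

ratios = []
for n_arnolds, n_shanes in scenarios:
    arnold_accountants = n_arnolds * arnold_rate / 100
    shane_accountants = n_shanes * shane_rate / 100
    # Dividing by the head count for each name recovers the same rates.
    ratio = (arnold_accountants / n_arnolds) / (shane_accountants / n_shanes)
    ratios.append(round(ratio, 2))
    print(n_arnolds, n_shanes, round(ratio, 2))
```

Each scenario prints the same ratio of about 3.45, whichever name is the more common one.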


> Well, you're not right.

I think I am, but I might be wrong on that one too.

Given a pool of 10 people to hire from where 6 people are named Arnold and 4 are named Shawn, wouldn't you expect that same relation to show in a specific profession? You can, of course, go ahead and compare those relations. So, if there were only 5 race car drivers in the world and 4 of them were named Arnold, that would be noteworthy (as opposed to 60% of them, which is what you'd expect). Please let me know if I got something wrong.
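That comparison can be written down as a sketch. The numbers are the ones from the example; the single Shawn among the drivers is implied by "5 drivers, 4 of them Arnold" rather than stated, and `ratio` is a hypothetical name for observed share over expected share.

```python
from fractions import Fraction

# A pool of 10 people: 6 Arnolds and 4 Shawns.
pool = {"Arnold": 6, "Shawn": 4}
# 5 race car drivers, 4 of them Arnold (so 1 Shawn, by implication).
drivers = {"Arnold": 4, "Shawn": 1}

pool_total = sum(pool.values())
driver_total = sum(drivers.values())

ratio = {}
for name in pool:
    expected = Fraction(pool[name], pool_total)       # share of the pool
    observed = Fraction(drivers[name], driver_total)  # share of the drivers
    ratio[name] = observed / expected
    print(name, float(expected), float(observed), round(float(ratio[name]), 2))
# prints:
# Arnold 0.6 0.8 1.33
# Shawn 0.4 0.2 0.5
```

A ratio above 1 means the name is over-represented among drivers relative to the pool, and below 1 means under-represented.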

---

Somebody just posted an article stating the numbers were actually correlated with the frequency with which the names are used. I'm not sure how they did it though, given that those numbers change over time.


I think their data is going the other direction. It is not 1.9% of Accountants are Arnolds. It's 1.9% of Arnolds are Accountants.

You would expect that the percent of each profession by name should be the same. So if there are 25 professions, then within any given name you should have 4% going to each profession. So 4% of Arnolds should be in a given profession, and 4% of Shawns should also be in that same profession. That is not what the data shows though. Within any given name there is a tendency towards the professions the chart shows.

This definitely does not account for location, birthdays, etc. But it is still interesting.


Interesting choice of professions. Race car drivers and musicians seem to have similar names. Other than that I have no idea what to do with this information.


Here's an interesting article about how names could have long lasting effects: http://www.livescience.com/6569-good-bad-baby-names-long-las...

Also, many cultures obsess over giving the child a good name with a good meaning as they think it determines their future.


Gads, they even got the title wrong. Should be "Disproportionately common professions by Name". Or, actually, all we can tell from the little on that page is that the title contradicts the subtitle. So maybe it's the subtitle that should be reversed. Who knows? Data is fun!


So, like, the richer you are, the higher the probability that your first name is actually a last name.


I like how this has "photographer" and "graphic designer" but not "artist". Never mind "cartoonist". Apparently my profession doesn't exist.


The farmers have the coolest names by far. Or at least to a non-American person, they seem rare. Maybe people in those professions are rarely in the public eye.



The venture capitalist bucket includes "Guy"... is that from Guy Kawasaki swaying the stats to be that high?


Nice chart. It would be interesting if hovering over a name also highlights the same name in other professions.


In the United States.

Aha, funny how people don't define their data correctly; this is an important data point.


Funny, I know electrical engineers named Bernard, Alfred and Eugene, all from the same company...


Where are the numbers? It would be interesting if it wasn't so vague.


Also the source of the data, e.g. is it influenced by geography?


Any Indian names? Like Raj, Pabu, Das, etc. included in this study?


What about douchebag? Josh, Chad, Tyler, Brad etc?


That's interprofessional.


I know a lot of IT professionals named Mike.


Hehe. I am biased to be Electrical Engineer.


Kim - Police officer? The only Kim I can imagine is Kim Kardashian.



After we tackled gender inequalities, we can move on to name inequalities. How about introducing name-quotas, that ought to fix the problem.


I gather you're being sarcastic, but high name inequality is just another illustration of a lack of social mobility.

People of a certain social class are more likely to name their child certain names, and those children are more likely to grow up into particular professions.

Name inequality represents clear evidence against the existence of a meritocracy.


> Name inequality represents clear evidence against the existence of a meritocracy.

Meritocracy in what sense?

Coming from a wealthier background, you're more likely to be well educated.

While we may dislike that it is so, it doesn't mean there's no meritocracy in the sense that employers don't hire people based on their qualifications alone. Only that they don't care where these qualifications come from.


> Coming from wealthier background, you're more likely to be well educated.

So, by your admission, education is not meritocratic.

I claim that employment is non-meritocratic first by your measure: if access to a better education is not merited, then employers concentrating on qualifications alone are not hiring according to merit.

I claim also that employment is non-meritocratic independently. The most obvious example is that wealth similarly gives you exclusive access to low-paying but prestigious jobs.

I suspect overall that background wealth is still a better marker for job status than education.


> I claim that employment is non-meritocratic first by your measure: if access to a better education is not merited, then employers concentrating on qualifications alone are not hiring according to merit.

That doesn't follow. Maybe education changes your merit, and people with a better education actually are better at their jobs.


It follows.

Or, in your meritocracy, people pay to increase their merit/the merit of their children? That contradicts my definition.


What definition would that be? A nation where people were not allowed to learn outside of standardized government training might achieve equality, but only by grinding everyone down to zero.


Haha. No, it's more like this for me: if the provided education system were good and appropriately diverse, few people would feel the need to pay to leave it.

Then you might be able to say that everyone had access to a decent standard of education.


That doesn't make the difference you think it does. There are European countries where barely anyone pays for education (and private schools/universities are less reputable than state ones). You still find the children of wealthy people are disproportionately likely to be wealthy themselves. Parental involvement, cultural attitudes to education - and, probably, as politically unacceptable as it is to say it, genetics as well - seem like bigger effects than expensive private education.


Sure, before I was derailed by education, my point was that society is not meritocratic. So we're now agreeing.

We don't know if genetics is an effect here because we can't eliminate background wealth - even twin studies are broken because twins get adopted into similar environments.

But, the suggestion from twin studies is that income is primarily environmental, i.e. not genetic. Here's a fresh reference: http://www.sciencedaily.com/releases/2014/11/141106113202.ht...

I suggest that it's fashionable to say social inequality is down to genetics - it certainly happens a lot in these forums and it comes up especially when people would rather pass the buck for difficult social problems.


Genetics aren't the only thing we inherit, and aren't the only possible source of differences in merit. It could equally be cultural differences - e.g. attitudes towards education, work, society and so on - that would be passed on even to adopted children, and would make people genuinely better at their jobs. Even if the advantages of the rich are purely environmental, that in no way proves that they're not meritocratic.


I don't think anybody claims that higher education in the US is a meritocracy.

When people on HN talk about the meritocracy as though it is something that exists, they are talking about it in the tech industry, not in the education system that precedes it.


If the tech industry is a meritocracy via higher education, and higher education is not itself a meritocracy, then the tech industry cannot be a meritocracy.

Putting the unfair part in an early funnelling step of the hiring process, and then abstracting that away, doesn't change that the whole process is unfair.


You're missing the fact that we can see gender inequalities in this data. All of the disproportionate names for fitness instructors are female. Either there's much wider variance of male names for fitness instructors or there's terrible gender disparity in that profession.



