
I note that while E is more common than A if we're counting letters appearing anywhere in a word, A is substantially more common than E if we only count first letters of words:

  $ egrep -o . /usr/share/dict/words | tr a-z A-Z | sort | uniq -c | sort -rn
  235415 E
  201093 I
  199606 A
  170740 O
  161024 R
  158783 N
  152868 T
  139578 S
  130507 L
  103460 C
   87390 U
   78180 P
   70725 M
   68217 D
   64377 H
   51683 Y
   47109 G
   40450 B
   24174 F
   20181 V
   16174 K
   13875 W
    8462 Z
    6933 X
    3734 Q
    3169 J
       2 -

  $ cut -c1 /usr/share/dict/words | tr a-z A-Z | sort | uniq -c | sort -rn
  25170 S
  24465 P
  19909 C
  17105 A
  16390 U
  12969 T
  12621 M
  11077 B
  10900 D
   9676 R
   9033 H
   8800 I
   8739 E
   7850 O
   6865 F
   6862 G
   6784 N
   6290 L
   3947 W
   3440 V
   2284 K
   1643 J
   1152 Q
    949 Z
    671 Y
    385 X

This also explains the prevalence of S, P, C, M, and B.


A bit off-topic, but this used to be (one of) my favorite Unix admin interview questions.

Given a file on Linux, tell me the unique values of column 2, each with its count, sorted by number of occurrences.

If the candidate knew 'sort | uniq -c | sort -rn' it was a medium-strong hire signal.

For candidates who didn't know that pipeline, I'd let them solve it any way they wanted, but they couldn't skip it. The candidates who copied the data into Excel usually didn't make it far.
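
For anyone who hasn't seen it, here is a minimal sketch of one way to answer the question. It assumes the columns are separated by whitespace, which the question doesn't actually specify, and the function name count_column is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the distinct values in one column of a file.
# Assumes whitespace-separated columns; COLUMN defaults to 2.
# Usage: count_column FILE [COLUMN]
count_column() {
    file=$1
    col=${2:-2}
    # awk splits on runs of spaces/tabs; lines with too few fields are skipped
    awk -v c="$col" 'NF >= c { print $c }' "$file" |
        sort |      # put identical values next to each other
        uniq -c |   # collapse each run into "count value"
        sort -rn    # largest count first
}
```

Calling it as `count_column somefile 2` prints one line per distinct value, most frequent first. For a tab-separated file, `cut -f2` could stand in for the awk step, since cut splits on tabs by default.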


> The candidates who copied the data into Excel usually didn't make it far.

Were they allowed to Google? If not, then Excel makes perfect sense, because the constraints are contrived.


Just like in engineering school, I always allowed open-book tests. It's not reasonable to answer everything from memory.

However, if they used Google, they might be a bit slower and not finish all the questions, resulting in a fail.


My intuitions start with: cut, wc, sort, uniq



