I note that while E is more common than A if we're counting letters appearing anywhere in a word, A is substantially more common than E if we only count first letters of words:
$ egrep -o . /usr/share/dict/words | tr a-z A-Z | sort | uniq -c | sort -rn
235415 E
201093 I
199606 A
170740 O
161024 R
158783 N
152868 T
139578 S
130507 L
103460 C
87390 U
78180 P
70725 M
68217 D
64377 H
51683 Y
47109 G
40450 B
24174 F
20181 V
16174 K
13875 W
8462 Z
6933 X
3734 Q
3169 J
2 -
$ cut -c1 /usr/share/dict/words | tr a-z A-Z | sort | uniq -c | sort -rn
25170 S
24465 P
19909 C
17105 A
16390 U
12969 T
12621 M
11077 B
10900 D
9676 R
9033 H
8800 I
8739 E
7850 O
6865 F
6862 G
6784 N
6290 L
3947 W
3440 V
2284 K
1643 J
1152 Q
949 Z
671 Y
385 X
This also explains the prevalence of S, P, C, M, and B.
A bit off-topic, but this used to be (one of) my favorite unix admin interview questions.
Given a file in Linux, tell me the unique values of column 2, sorted by number of occurrences, with the count.
If the candidate knew 'sort | uniq -c | sort -rn', it was a medium-strong hire signal.
For candidates who didn't know that pipeline, I'd allow them to solve it any way they wanted, but they couldn't skip it. The candidates who copied the data into Excel usually didn't make it far.
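For anyone curious, here is a minimal sketch of one possible answer to that question. It assumes the columns are whitespace-separated, so awk is used to pull out column 2 (for a tab-delimited file, cut -f2 would do the same job). The function name count_col2 is made up for illustration.

```shell
# count_col2 FILE
# Print each distinct value of column 2 with its count, most frequent first.
# Assumption: columns are separated by spaces or tabs.
count_col2() {
    awk '{ print $2 }' "$1" |  # extract the second column
        sort |                 # group identical values on adjacent lines
        uniq -c |              # collapse each group, prefixing its count
        sort -rn               # order by count, largest first
}
```

Calling it as count_col2 somefile prints one "count value" line per distinct value. The first sort is the step people tend to forget: uniq only collapses adjacent duplicate lines, so unsorted input gives fragmented counts.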