The English language is written in the Latin alphabet, and in principle, each letter corresponds to one sound. In practice, however, many letters can be pronounced in multiple ways, and many sounds can be written in multiple ways. The following diagram shows the relationships between English sounds and the letters used to write them.
- The sounds of English are represented here by the phonemes of English, i.e., the basic sound units of the language. This ignores differences between sounds that are not used to distinguish different words, such as the British and American ways of pronouncing “au”, which is simply written /ɔː/. Phonemes are represented using the International Phonetic Alphabet (IPA). See English phonology on Wikipedia for a description.
- English uses not only individual letters, but also groups of multiple letters which are pronounced as one phoneme. Groups of two letters are called digraphs.
The following diagram shows the relationship between the vowel phonemes of English and the letters and digraphs used to write them. Each cell shows a word that is written with the given letter or digraph and pronounced with the given phoneme. Each column represents words containing the same sound, but written in a different way. Each row represents words containing the same letter or digraph, but pronounced in a different way.
Note that the table shows only vowels, and also omits vowels followed by R, as well as the /ə/ (schwa) phoneme. Each word is color-coded:
- White words are regular English words
- Blue words have irregular spelling
- Green words have a foreign origin explaining their irregular spelling
- Yellow words have spellings or pronunciations which are either rare, or used only in certain English-speaking countries.
For each cell with at least one matching word, we show a word from the first of these categories that contains one.
Notes: The table is likely to contain errors and omissions! Please write a comment below, and I will correct it.
- I have very probably missed many rare and foreign words. In particular, there are currently no trigraphs (groups of three letters) in the table.
- The table does not give an indication of the frequency of each letter/phoneme combination. For instance, “oo” represents /ʌ/ only in very few words, such as “blood” and “flood”.
- The assessment of spellings as regular and irregular may be arbitrary in places. For instance, I count “knowledge”, “bought” and “thought” as irregular.
- Vowels followed by R are not shown, even though they are also irregular, for instance the pronunciations of “learn”, “hear”, “bear” and “heart”. I put “bury” in this table because the R is not in the same syllable as the vowel.
- Consonants are missing. In general, consonants have more regular pronunciation, but there are still some interesting cases, such as “who” and “white”.
Analysis
We may note at once that English orthography does not have a one-to-one relationship between sounds and letters. In fact, almost all letters and digraphs have multiple pronunciations, and almost all phonemes have multiple spellings. In terms of network analysis, we may ask whether it is possible to reach any phoneme from any other phoneme. For instance, we may go from /ɪ/ to /ɒ/ using the following sequence:
- /ɪ/ is written “i” in “in”, which is pronounced /aɪ/ in “mind”, which is written “ei” in “height”, which is pronounced /eɪ/ in “veil”, which is written “a” in “name”, which is pronounced /ɒ/ in “yacht”.
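The search for such a chain can be sketched in code. The following Python snippet is only a minimal illustration, not the method used to build the table: it treats phonemes and spellings as nodes and words as edges, contains only the six words from the example above, and uses a plain breadth-first search. The names `EDGES`, `build_graph` and `shortest_word_path` are made up for this sketch.

```python
from collections import deque

# (phoneme, spelling, word) triples taken from the example chain above.
# A complete edge list would have to be filled in from the table.
EDGES = [
    ("/ɪ/", "i", "in"),
    ("/aɪ/", "i", "mind"),
    ("/aɪ/", "ei", "height"),
    ("/eɪ/", "ei", "veil"),
    ("/eɪ/", "a", "name"),
    ("/ɒ/", "a", "yacht"),
]


def build_graph(edges):
    """Return an adjacency map: node -> list of (neighbour, word)."""
    graph = {}
    for phoneme, spelling, word in edges:
        graph.setdefault(phoneme, []).append((spelling, word))
        graph.setdefault(spelling, []).append((phoneme, word))
    return graph


def shortest_word_path(graph, start, goal):
    """Breadth-first search; return the words along a shortest path, or None."""
    if start not in graph or goal not in graph:
        return None
    came_from = {start: None}  # node -> (previous node, word used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbour, word in graph[node]:
            if neighbour not in came_from:
                came_from[neighbour] = (node, word)
                queue.append(neighbour)
    if goal not in came_from:
        return None
    # Walk back from the goal to the start, collecting the words.
    words = []
    node = goal
    while came_from[node] is not None:
        node, word = came_from[node]
        words.append(word)
    return words[::-1]


graph = build_graph(EDGES)
print(" → ".join(shortest_word_path(graph, "/ɪ/", "/ɒ/")))
# in → mind → height → veil → name → yacht
```

Phonemes are written between slashes, so a phoneme node can never be confused with a spelling node. With the full table as the edge list, the same function would return a shortest chain of words between any two phonemes, or `None` if no chain exists.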
This shows that there is a path from /ɪ/ to /ɒ/, even though these two sounds are completely different. The sequence “in” → “mind” → “height” → “veil” → “name” → “yacht” consists of six words linked by five steps. The underlying graph is bipartite: its nodes are phonemes on one side and letters/digraphs on the other, and its edges are words. Since each word is one edge, the path above has six edges.
Can every node in this graph be reached from every other node? The answer is no, because the phoneme /ɔɪ/ with its two spellings “oi” and “oy” is isolated from the rest of the graph. The rest of the graph is connected, however, so any phoneme can be reached from any other phoneme, except for /ɔɪ/.
Even so, the graph contains two clusters, the AEI group and the OU group, with many edges within each cluster but few edges between them:
- The AEI group contains the sounds /ɪ/, /aɪ/, /ɛ/, /iː/, /æ/, /eɪ/, /ɑː/ and /ɔː/, and the spellings “i”, “y”, “uy”, “ie”, “e”, “ei”, “ea”, “ee”, “eo”, “ey”, “a”, “ae”, “ai”, “ay”, “al”, “au” and “aw”.
- The OU group contains the sounds /ɒ/, /oʊ/, /ɔɪ/, /aʊ/, /ʊ/, /ʌ/, /uː/ and /juː/, and the spellings “ao”, “o”, “oa”, “oe”, “oi”, “oo”, “ou”, “ow”, “oy”, “u”, “ue”, “eu”, “ew” and “ui”.
The only links between the two groups are irregular and foreign words:
- Words with irregular spelling: “women”, “busy”, “build”, “bury”, “broad”, “bought”, “knowledge”, “yeoman” and “gaol” (alternative spelling of “jail”).
- Words of foreign origin: “phoenix”, “mauve” and “pharaoh”.
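Both observations, the isolated /ɔɪ/ component and the links between the two groups, can be checked mechanically once the table is available as a list of edges. The Python sketch below shows one way to do this under stated assumptions: the group memberships are copied from the two lists above, but the edge list is only a small illustrative subset, the phoneme assigned to each word is my own reading of its usual pronunciation, and “coin”, “boy” and “blood” are example words I picked for their cells. The function names are made up for this sketch.

```python
from collections import defaultdict

# Group membership as listed above.
AEI = {"/ɪ/", "/aɪ/", "/ɛ/", "/iː/", "/æ/", "/eɪ/", "/ɑː/", "/ɔː/",
       "i", "y", "uy", "ie", "e", "ei", "ea", "ee", "eo", "ey",
       "a", "ae", "ai", "ay", "al", "au", "aw"}
OU = {"/ɒ/", "/oʊ/", "/ɔɪ/", "/aʊ/", "/ʊ/", "/ʌ/", "/uː/", "/juː/",
      "ao", "o", "oa", "oe", "oi", "oo", "ou", "ow", "oy",
      "u", "ue", "eu", "ew", "ui"}

# Illustrative subset of (phoneme, spelling, word) edges (assumption:
# the phoneme for each word is my reading, not copied from the table).
EDGES = [
    ("/ɪ/", "i", "in"), ("/aɪ/", "i", "mind"), ("/aɪ/", "ei", "height"),
    ("/eɪ/", "ei", "veil"), ("/eɪ/", "a", "name"), ("/ʌ/", "oo", "blood"),
    ("/ɪ/", "o", "women"), ("/ɪ/", "u", "busy"), ("/ɪ/", "ui", "build"),
    ("/ɛ/", "u", "bury"), ("/ɔː/", "oa", "broad"), ("/ɔː/", "ou", "bought"),
    ("/ɔɪ/", "oi", "coin"), ("/ɔɪ/", "oy", "boy"),
]


def connected_components(edges):
    """Return the connected components of the graph as a list of node sets."""
    adjacency = defaultdict(set)
    for phoneme, spelling, _word in edges:
        adjacency[phoneme].add(spelling)
        adjacency[spelling].add(phoneme)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def group_of(node):
    """Return "AEI", "OU", or None for a node in neither group."""
    if node in AEI:
        return "AEI"
    if node in OU:
        return "OU"
    return None


def cross_group_words(edges):
    """Return the words whose phoneme and spelling lie in different groups."""
    return [word for phoneme, spelling, word in edges
            if group_of(phoneme) != group_of(spelling)]


for component in connected_components(EDGES):
    print(sorted(component))
print(cross_group_words(EDGES))
# ['women', 'busy', 'build', 'bury', 'broad', 'bought']
```

Because this edge list is only a subset, it splits into more components than the full table would. The component containing /ɔɪ/ consists of just /ɔɪ/, “oi” and “oy”, which matches the isolation described above.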
EDIT: Added the word “laugh” with pronunciation /æ/.