I’ve been trying to learn a new language (Vietnamese), and one thing that has been driving me crazy is all these instances of letters seemingly being pronounced differently at random in different words. If you don’t think about it too much, it’s easy to go “this language is dumb, why do they do this?” But then I think about English, and we have so many examples of this or other linguistic oddities that make no sense but which I’ve just accepted since I learned them so long ago.
So I wanted to generalize my question: For all the languages where this applies, why are there these cases where letters have inconsistent pronunciations? For cases where it sounds like another letter, why not just use that one? For cases where the letter or combination of letters creates a new sound not already covered by existing letters, why not make a new one? How did this happen? What is the history? Is there linguistic logic to it beyond these being quirks of how the languages historically developed?
In almost every language, writing was developed centuries or even millennia after the spoken language.
Writing is also more fixed, especially since the printing press, which made changes to spelling very slow. So while the spoken language keeps changing, the written language lags further and further behind.
Huh. I hadn’t even considered how technology might affect this. Interesting.
Yeah, English is the go-to example because the first people to run printing presses in English were Dutch and couldn’t really spell or speak English.
There were likely similar situations in Asia as well, I’m just not familiar with their history.
But go back far enough and even writing by hand is technology
English was written long before the printing press. However, it always (at least to my knowledge) used the Latin alphabet, which predates any written English I can find by more than 1000 years. Note that I’m not an expert on English linguistics, so if someone claims something from before 1200, I’m not aware of it, but that doesn’t mean they are wrong.
For cases where it sounds like another letter, why not just use that one?
In Spanish, words that use k instead of c tend to come from “other” languages, like Greek, Arabic, Japanese, or Russian.
“It appears in words from other languages in which an effort has been made to respect the original spelling, or in words transcribed from languages that use alphabets or writing systems different from ours, such as Greek, Arabic, Japanese, or Russian.”
Yep. The letter K is basically a concession of the Latin alphabet to make some more sense of Greek loanwords, where the letter K is originally from, following a series of pronunciation shifts. But C is the Latin K, so words of Latin origin (the majority of vocabulary in Romance languages like Spanish) will normally only use C for that sound.
K is more useful in languages where the soft C has entered use (like French, Spanish, English, and others) just because K is always hard and makes it easier to define the pronunciation of (loan)words that may otherwise encourage the wrong pronunciation when paired with certain vowels (kite, cite, and site all being different words in English, for example).
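A rough sketch of that hard/soft rule, in case it helps to see it spelled out. The function name `initial_sound` and the vowel list are my own assumptions for illustration; real English has exceptions, and this ignores digraphs like “ch” entirely.

```python
# Assumption: "c" is usually soft (an "s" sound) before e, i, or y, and hard otherwise.
# This is a toy rule of thumb, not a complete description of English spelling.
SOFTENING_VOWELS = ("e", "i", "y")

def initial_sound(word: str) -> str:
    """Guess whether a word starting with c or k begins with a 'k' or an 's' sound."""
    word = word.lower()
    if word.startswith("k"):
        return "k"  # k is always hard
    if word.startswith("c"):
        # Look at the letter right after the c (empty string if there is none).
        return "s" if word[1:2] in SOFTENING_VOWELS else "k"
    raise ValueError("this sketch only handles words starting with c or k")

for w in ("kite", "cite", "calendar", "celery"):
    print(w, initial_sound(w))
```

Note how “kite” and “cite” come out different even though only the first letter changed, which is the whole reason K earns its keep.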
Sometimes it’s because of linguistic shifts (like in the case of English): the letters were once pronounced the same, but the pronunciation shifted while the writing didn’t.
Sometimes it can also depend on things like stress, so for native speakers the sounds are the “same”, just stressed/unstressed, so they have the same letter, but if you don’t know the rule it seems arbitrary.
And it can also come from the use of loanwords. If a language has an established writing system, but users adopt a word that uses a sound not otherwise present in the language, they’ll write it as close to phonetically as they can and simply know that loanwords are an exception to the rule, while this again is not obvious to an outsider.
Probably some other cases too, and I’m not sure which one applies to the specific sound you’re struggling with, but a couple of examples would help.
A recent example I came across while doing Vietnamese vocab was several letters being used to make a “Z” or “L” sound in some cases. For compound sounds, there are things like “ng” sounding like an “M” at the start of words sometimes in Vietnamese, or in English we have things like “th” or “ch” where the resulting sound doesn’t sound like it comes from either of the building blocks. “T” is pronounced like “tuh” and “H” is pronounced “huh” (there is a certain irony in trying to use letters to communicate the pronunciation of letters, but whatever), so you’d think that “th” would be something like “tuh huh” instead of the actual pronunciation. While I was writing this I thought of an example where this is how it works: “tw” gets pronounced like “twuh”, like in “twelve” or “twenty”… and then I remembered “two” exists and sounds like “too” (and “to”) for some reason. So yeah. It’s really hard to come up with a consistent rule for a lot of these.
EDIT: Oh I just remembered another funny exception for “ch”: In “Chemistry” the “H” is neither pronounced nor does it modify the “C” to make the normal “ch” sound. It just sounds like there is a “C” there. Like “Cemistry.” Except looking at that, that pattern is used in something like “Cemetery” and then the “C” sounds like an “S”. I’m going to stop now because there are so many of these I could probably go on forever if I kept thinking about it.
A recent example I came across while doing Vietnamese vocab was several letters being used to make a “Z” or “L” sound in some cases.
There’s actually a similar thing in my native language, where we have multiple letters for the same sound :3. (Ų and Ū make the same long “oo” sound) those letters were originally distinct with Ų being nasal and Ū being long, but the nasal letters have come to simply be longer variants of the base letter, however both letters are still useful as they serve a distinct grammatical role :3… the letter clusters ei and iai also tend to sound the same but be used for different purposes
Sometimes the letters can also make the same sound in some words but not others due to palatalization. I don’t know if Vietnamese palatalizes any letters, but it can make certain letters sound the same, especially in some letter clusters, despite being otherwise distinct.
in English we have things like “th” or “ch” where the resulting sound doesn’t sound like it comes from either of the building blocks.
Th is a fun one, because English did originally have distinct letters for both of the possible sounds it makes (þ and ð), but with the rise of the printing press from Germany, which did not have those letters, they were replaced with other letters like y (ye olde) and later th :3
But yes, the twelve/two example is due to the sounds shifting, I believe, as that’s where most of the silent letters in English come from, excluding the French-origin words (so it would’ve been twuh-o back in the day instead of too)
What is your native language?
Seems like you skipped a small but vital detail.
Lithuanian :3
Cool.
EDIT: Oh I just remembered another funny exception for “ch”: In “Chemistry” the “H” is neither pronounced nor does it modify the “C” to make the normal “ch” sound. It just sounds like there is a “C” there. Like “Cemistry.” Except looking at that, that pattern is used in something like “Cemetery” and then the “C” sounds like an “S”. I’m going to stop now because there are so many of these I could probably go on forever if I kept thinking about it.
That one’s the loanword problem. Greek has letters Κ (kappa) and Χ (chi, pronounced similar to “key” but from the back of the throat). Kappa is a close approximation to the English K, while chi doesn’t have anything like it in English. So loanwords from Greek that used chi are written differently.
Wall of random language knowledge coming:
In the Latin language, from which our alphabet derives, C was originally always hard (like “calendar” as opposed to “celery”). When Greek loanwords entered Latin, kappa was transliterated to C (Kronos became Cronus). Chi, being similar but just a bit more breathy, was transliterated as Ch (Chimera).
Latin experienced pronunciation shifts and gradually branched off into the modern Romance languages. In several of them, the letter C conditionally softened (e.g. cerveza in Spanish, cent in French, etc).
The Latin alphabet did not enter use for the English language until Christianity came to Britain in the Middle Ages. Before then, Old English, which should be more accurately called the Anglo-Saxon language, was written in Futhorc, a runic system like that of Old Norse. The Latin alphabet was adapted to Anglo-Saxon, but the letters did not always map 1:1 onto its sounds, so the pronunciation of certain letters shifted, and some runic holdovers from Futhorc like Þ (thorn) for Th remained in use.
In the intervening centuries, Anglo-Saxon/English would undergo a pronunciation shift and a series of invasions from the Danes and Normans, while Ecclesiastical Latin (Latin after undergoing a pronunciation shift) remained present for religious purposes. All of these would introduce new loanwords and expand the English vocabulary at different times. The Germanic loanwords would be transliterated, while the Romance loanwords would be lifted directly or edited slightly because they already used the same writing system. The softer Ch sound (like “chair”) existed in English by the time the Normans arrived, and they started writing it as Ch because that was closer to its use in French.
Finally, this was all further complicated by the invention of the printing press. By the time this occurred, the Latin alphabet became the de facto writing system for most of Europe, but languages did not quite meet 1:1 on which letters were used. Some innovations like the letter W stuck, because it was very convenient for German. And as it happens, the German printing presses invented by Gutenberg were the first to cross over into Britain. The German W was a convenient enough replacement for the English Ƿ (Wynn), but German had no equivalent for Þ (thorn) or Ð (eth, the th pronounced like “that”), so early English printers first approximated by using the letter Y for being less common and looking close enough (“ye old” is really “the old”) before eventually settling on Th.
Okay, one final note. On the random topic of W, and why it looks like two Vs, V is how U was written in classical Latin, and so W is double that. You’ll find the logic of W persists in a lot of words if you replace it with a U, even though we think of W as a consonant and U as a vowel. You can look at an edited word like “flouer” and potentially still read it as “flower” because we have other words like “flour” which have the same sound.
There’s a reason kids in Spelling Bee competitions are allowed to ask for the language of origin of a word.
It can often give a hint that a certain sound is spelled an unusual way. The “Ch” of “Chemistry” comes through Greek where it’s spelled with their letter “chi”, which for reasons I won’t get into, looks like our X.
Kids in a spelling bee wouldn’t need to ask about “Chemistry”, of course, but there may be other examples where that would be useful.
There are 26 letters in the Latin alphabet. There are between 38 and 49 sounds in English, depending on dialect: https://en.wikipedia.org/wiki/English_phonology (I’ve seen reports as high as 56, but I can’t find sources, so I’ll stick with Wikipedia, which is often accurate). With more sounds than letters, there is no way to have nice spelling in English. Some languages using the Latin alphabet have various accent marks, which help. At this point the dialects of English are different enough that reformed spelling would need to start with reforming how we pronounce words. (There are other alphabets in the world; I have no comment on whether any would be better.)
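To put rough numbers on that mismatch, here is a small pigeonhole-style calculation. It assumes, unrealistically, that every sound is written with exactly one letter (no digraphs like “th”, no accent marks); the function name is made up for this sketch.

```python
import math

def min_sounds_on_busiest_letter(num_sounds: int, num_letters: int = 26) -> int:
    """Pigeonhole bound: with one letter per sound, some letter must
    stand for at least this many different sounds."""
    return math.ceil(num_sounds / num_letters)

# The low and high sound counts quoted above.
for sounds in (38, 49):
    print(sounds, "sounds ->", min_sounds_on_busiest_letter(sounds), "per letter at minimum")
```

Either way, at least one letter has to do double duty, which is why English leans on letter combinations and context instead.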
Languages don’t get designed in a lab by a creator who comes up with a consistent set of rules. Languages constantly shift and change as the people who speak them do. Languages borrow loanwords from each other, then proceed to mangle them. Slang arises, becomes part of the lexicon, becomes passé. Regional dialects drift apart but then mingle again.
And at no point does logic ever enter into the equation. Change just happens haphazardly.
There’s a pair of concepts in Linguistics referred to as prescriptivism and descriptivism. Prescriptivism refers to trying to declare a set of rules for how language should be. If your teacher ever told you that ‘ain’t’ isn’t a real word, that’s prescriptivism, and it’s bunk. Descriptivism is just a best effort to describe how speakers of a language actually use it. If English speakers regularly say ‘ain’t’, then it’s an English word. The fun thing about descriptivism is that there will always be holes and inconsistencies, because not all English speakers are necessarily speaking the same way.
Compare the English we speak today with Ye Olde Englishe. Many words are now spelled or pronounced differently from how they used to be. Many old words have been replaced by completely different ones. Syntax has changed quite a bit. And if you go back far enough, English used to be written with a different set of characters from the Latin alphabet we use now. But this all happened so gradually that you can’t establish any clear dividing line to separate these languages; there’s no date on which you could say everything prior was Old English and everything after is Modern English. And if you look towards the future, 100, 1000, 10000 years from now, English won’t be the same as it is now either.
Below is just one possible aspect of this, the other answers you’ve received are also valid. Writing systems are complicated!
You’re making the mistake of assuming that writing systems are supposed to represent speech sounds. They do not (or at least they don’t have to). As an example, in my accent (Midwestern American English) there are at least three different sounds I make for “t”:
- “touch”: (aspirated) voiceless alveolar plosive
- “matter”: voiced alveolar tap
- “mat”: glottal stop
These are the technical names linguists use for these sounds; you can find them on Wikipedia if you want to know more. English speakers can agree though that they are all “the same thing”; the technical terminology is that they are all allophones of the same phoneme. Different accents will have different allophones, for example some English accents may pronounce this “t” phoneme in “matter” and “mat” the same way as my “touch”. If you think this is splitting hairs, that’s just false; the way languages divide sounds into phonemes varies greatly. For example, Japanese speakers consider my “touch” “t” and my “matter” “t” to be two completely different sounds, i.e. two different phonemes which are not interchangeable.
(Very) roughly speaking, writing systems tend to map better onto phonemes than onto actual sounds. Part of your frustration with Vietnamese writing could come from this: Vietnamese possibly treats some sounds as allophones of one phoneme which in English belong to different phonemes. In other words, to a Vietnamese speaker they are the same sound.
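If it helps, the phoneme idea can be sketched as a lookup table from actual sounds to the “bucket” a speaker hears them as. The tables below are toy data I made up from the examples in this comment (the IPA symbols stand for the three “t” sounds listed above, and the Japanese grouping just follows the claim above, heavily simplified).

```python
# Toy model: each language groups phones (actual sounds) into phonemes
# (what its speakers perceive as "the same sound").
# tʰ = aspirated plosive ("touch"), ɾ = alveolar tap ("matter"), ʔ = glottal stop ("mat")
ENGLISH = {"tʰ": "/t/", "ɾ": "/t/", "ʔ": "/t/"}
JAPANESE = {"tʰ": "/t/", "ɾ": "/r/"}  # simplified, per the claim above

def same_sound(language: dict, phone_a: str, phone_b: str) -> bool:
    """True if speakers of this language treat the two phones as one phoneme."""
    return language[phone_a] == language[phone_b]

print(same_sound(ENGLISH, "tʰ", "ɾ"))   # True: both get written "t"
print(same_sound(JAPANESE, "tʰ", "ɾ"))  # False: two different phonemes
```

A writing system built on the English table needs one letter for all three sounds; one built on the Japanese table needs two.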
One more example is the Cot-Caught merger present in some varieties of English. In my accent, the vowels in these words are two separate sounds for two separate phonemes. In English accents which have the merger, they have become the same phoneme and in fact are pronounced identically, with the exact sound depending on the particular variety of English.
This shows one way you can end up with different spellings for identically-pronounced words.
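The merger can be sketched the same way: keep the spellings fixed, change the pronunciations, and see which words collapse together. The transcriptions below are simplified placeholders of my own, not a claim about any particular accent.

```python
from collections import defaultdict

# Hypothetical, simplified pronunciations for illustration only.
UNMERGED = {"cot": "kɑt", "caught": "kɔt"}
MERGED = {"cot": "kɑt", "caught": "kɑt"}  # the two vowels have become one

def homophone_groups(pronunciations: dict) -> list:
    """Group spellings that share a pronunciation; keep only groups of 2+."""
    groups = defaultdict(list)
    for word, sound in pronunciations.items():
        groups[sound].append(word)
    return [sorted(words) for words in groups.values() if len(words) > 1]

print(homophone_groups(UNMERGED))  # []
print(homophone_groups(MERGED))    # [['caught', 'cot']]
```

The spelling never changed in either table; only the sounds did.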
Evolution happens for all things that self-replicate, not just life. Languages self-replicate through those who use them, and as such they change over time, with leftovers like the ones we often see in living things.
As a case study, I think Vietnamese is especially apt to show how the written language develops in parallel and sometimes at odds with the spoken language. The current alphabetical script of Vietnamese was only adopted for general use in the late 19th Century, in order to improve literacy. Before that, the grand majority of Vietnamese written works were in a logographic system based on Chinese characters, but with extra Vietnamese-specific characters that conveyed how the Vietnamese would pronounce those words.
The result was that Vietnamese scholars pre-20th Century basically had to learn most of the Chinese characters and their Cantonese pronunciations (not Mandarin, since that’s the dialect that’s geographically farther away), and then memorize how they are supposed to be read in Vietnamese, then compounded by characters that sort-of convey hints about the pronunciation. This is akin to writing a whole English essay using Japanese katakana; try writing “ornithology” like that.
Also, the modern Vietnamese script is a work of Portuguese Jesuit scholars, who were interested in rendering the Vietnamese language into a more familiar script that could be read phonetically, so that words are pronounced letter-by-letter. That process, however faithful they could manage it, necessarily obliterates some nuance that a logographic language can convey. For example, the word bầu can mean either a gourd or to be pregnant. But in the old script, no one would confuse 匏 (gourd) with 保 (to protect; pregnant) in the written form, even though the spoken form requires context to distinguish the two.
Some Vietnamese words were also imported into the language from elsewhere, having not previously existed in spoken Vietnamese. So the pronunciation would hew closer to the origin pronunciation, and then to preserve the lineage of where the pronunciation came from, the written word might also be written slightly differently. For example, nhôm (meaning aluminum) draws from the last syllable of how the French pronounce aluminum. Loanwords (and there are many in Vietnamese, going back centuries) will mess up the writing system too.
Oh I didn’t know the current alphabet came from the Portuguese. I assumed it was from the French when they colonized Vietnam.
The point about the logographic characters being distinct is interesting. I guess if you don’t have to phonetically spell it out you have some more freedom in picking what written characters will represent the meanings of the two words. It is still a shame we ended up with those homophones, but I guess that’s just a path dependency thing since the spoken words came first. I guess they just had to work with what they had when they converted them into characters.





