For all the concentration on things Japanese here, there seems to be no decent guide to the pronunciation as such. The following covers the basics, adding a bit of optional accuracy for linguists, and explains why there are problems both with the native Japanese writing and with any system of romanization. It refers to the standard dialect taught to foreigners.

Word shape

Japanese words are characteristically composed of open syllables, that is, ones ending in a vowel. Names like Yo-ko-ha-ma are typical. There are only two exceptions. First, the voiceless consonants p t k s can be doubled, as in Ni-k-ke-i, and this double (or geminate) consonant is noticeably prolonged. Second, there can be a separate n, as in Ni-p-po-n. These are the only consonants that can end a syllable.
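To make the word shape concrete, here is a minimal sketch in Python that cuts a romanized word into the pieces shown with hyphens above. The function name split_units is my own, and it assumes lower-case input in the one-letter-per-consonant romanization used in this writeup (so tu and si, not tsu and shi).

```python
VOWELS = set("aeiou")

def split_units(word):
    """Split a romanized word into units such as ni-p-po-n."""
    units = []
    onset = ""  # consonant letters waiting for their vowel
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS:
            # A vowel closes the current unit, with any onset attached.
            units.append(onset + ch)
            onset = ""
        elif ch == nxt and ch in "ptks":
            # First half of a doubled voiceless consonant stands alone.
            units.append(ch)
        elif ch == "n" and nxt not in VOWELS and nxt != "y":
            # n at the end of the word or before a consonant stands alone.
            units.append(ch)
        else:
            onset += ch
    return units

print(split_units("yokohama"))
print(split_units("nikkei"))
print(split_units("nippon"))
```

It is only a sketch: it treats every vowel letter as its own unit, so a long vowel written double comes out as two pieces.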

A syllable can begin with at most one consonant. When an English word like strike is borrowed, extra vowels need to be inserted to break up the first syllable's str-, and it would become sutoraiku. Usually the vowel inserted is u but there might be reasons to use another one.
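The vowel insertion can be sketched roughly as follows. This is an illustration under assumptions of my own, not a full account of how loanwords are adapted: the input is a crude spelling of the English sounds ("straik" for strike), the name break_clusters is hypothetical, and the choice of o after t and d is one of the "reasons to use another one", labelled as an assumption in the code.

```python
VOWELS = set("aeiou")

def break_clusters(sounds):
    """Insert a vowel after every consonant not followed by a vowel."""
    out = []
    for i, ch in enumerate(sounds):
        out.append(ch)
        nxt = sounds[i + 1] if i + 1 < len(sounds) else ""
        if ch not in VOWELS and nxt not in VOWELS:
            # Assumption: o after t and d, u after everything else.
            out.append("o" if ch in "td" else "u")
    return "".join(out)

print(break_clusters("straik"))  # sutoraiku
```

The sketch deliberately ignores the two exceptions to open syllables (doubled consonants and the separate n), so it would wrongly break those up too.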

There is no stress in Japanese. The argument over whether it's HiroSHIma or HiROSHima is moot: it's neither. See Japanese accent for details of what it does have, which is two levels of pitch (high and low). A beginner can ignore this: it doesn't make much difference to words.


Vowels

There are five vowels, a e i o u. Each of these is always pronounced the same (with an exception noted below). There is no variation depending on the position in the word, or on accent, or anything like that. Each vowel can occur short or long. The long vowel is the same quality as the short, just longer (unlike English, which hasn't got simple short/long pairs). In different romanization systems the long vowel is written either double aa ee ii oo uu or with some kind of accent over it. I'm going to use double here because it's easier to type, and it's standard among linguists.
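If you have text in one of the accented romanizations and want the doubled form used here, something like the following Python sketch will do it. It assumes the accent is a macron or a circumflex (the two I would expect to meet), and the function name double_long_vowels is my own.

```python
import unicodedata

# Combining macron and combining circumflex, as found after NFD decomposition.
LENGTH_MARKS = {"\u0304", "\u0302"}

def double_long_vowels(text):
    """Rewrite accented long vowels as doubled vowels."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in LENGTH_MARKS and out and out[-1].lower() in "aeiou":
            # Replace the accent with a second copy of the vowel.
            out.append(out[-1].lower())
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))

print(double_long_vowels("T\u014dky\u014d"))  # Tookyoo
```

Any other accented letters are left alone, since the final normalization step puts them back together.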

The vowels are basically as in Spanish or Italian. Using English examples is a bit less helpful because we have dialect variation and short/long changes, but they're roughly thus:
a as in father
e as in bed or bear
i as in machine
o as in law or lord (but not if your accent gives law the vowel of father)
u as in rule but with spread lips, not rounded. This is the tricky one.

There are no diphthongs as such, but any vowel can combine with any other. So ie, au, ai, oe etc. should be pronounced with each part distinct, as two syllables. Don't add a consonant glide between them: don't turn ie into iye. The only exception in pronunciation is that ei is usually pronounced as a long ee (speakers vary). A further exception is in writing: although ou does exist, it is rare, and that written combination almost always stands for long oo.

The only time a vowel is pronounced differently from its usual value is when a high vowel i or u is between two voiceless consonants p t k s h. In this case the vowel becomes voiceless too: whispered, or effectively silent, just adding a whispery pause or prolongation between the consonants. So kisa is pronounced almost khsa, and Yasukuni is almost Yashkuni or Yasskuni. This also happens at the ends of the grammatical endings desu and -masu. Effectively the u is silent. This devoicing effect only happens to short vowels, not long, and usually only to one of the syllables if two consecutive syllables could be affected.
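The devoicing rule can be sketched in Python as below. The notation is mine: a devoiced vowel is wrapped in parentheses. The sketch assumes plain one-letter consonants, and where two consecutive syllables qualify it keeps the first and skips the second, which is a simplification of "usually only one".

```python
VOICELESS = set("ptksh")

def mark_devoiced(word):
    """Wrap devoiced i and u in parentheses, e.g. kisa -> k(i)sa."""
    out = list(word)
    last = None  # position of the most recently devoiced vowel
    for i in range(1, len(word) - 1):
        if (word[i] in "iu"
                and word[i - 1] in VOICELESS
                and word[i + 1] in VOICELESS
                and (last is None or i - last > 2)):
            out[i] = "(" + word[i] + ")"
            last = i
    # The final u of desu and -masu is devoiced as well.
    if word.endswith(("desu", "masu")):
        out[-1] = "(u)"
    return "".join(out)

print(mark_devoiced("kisa"))      # k(i)sa
print(mark_devoiced("yasukuni"))  # yas(u)kuni
print(mark_devoiced("desu"))      # des(u)
```

Long vowels need no special handling: a doubled vowel always has a vowel on one side, so it never sits between two voiceless consonants.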


Consonants

The following basic consonants exist: p b t d k g s z m n r h w y. A few of them are affected by preceding or following vowels, as follows:

t is pronounced ts before u, as in Tukuba, the city commonly romanized as Tsukuba.
g is as in get or give at the start of a word, but as in sing or singer (with no g-sound) after a vowel. So eigo 'English' is pronounced een(g)o. This also happens to the grammatical particle ga: in romanization it is written as a separate word, but phonetically it is attached to the preceding word, as in sukiyaki-ga suki desu 'Sukiyaki is nice'.
z is often pronounced dz before u, as in Suzuki, azuki bean. It is almost never romanized as dz (but: adzuki).
r is difficult to pin down, and varies somewhat among Japanese speakers anyway. It's not exactly like English R, D, or L, but may be like the Spanish R in pero or the common American English T in water: a quick tap.
h is pronounced as a bilabial ɸ before u, a softer sound than the English labiodental F. In some romanizations it's written with f, as in Hukuoka or Fukuoka.
w has spread lips, not rounded; it resembles u in this.
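The vowel-dependent changes above that can be written down mechanically (t, z, h before u, and g after a vowel) are collected in this small Python sketch. It gives a rough respelling, not a proper phonetic transcription: the function name is my own, ŋ stands for the sing sound and ɸ for the bilabial sound, and it treats "often" as "always" for z.

```python
import re

def rough_phonetic(word):
    """Apply the context-dependent consonant changes to a romanized word."""
    word = re.sub(r"t(?=u)", "ts", word)             # t before u
    word = re.sub(r"z(?=u)", "dz", word)             # z before u
    word = re.sub(r"h(?=u)", "\u0278", word)         # h before u (bilabial)
    word = re.sub(r"(?<=[aeiou])g", "\u014b", word)  # g after a vowel
    return word

for w in ["tukuba", "suzuki", "hukuoka", "eigo", "ga"]:
    print(w, "->", rough_phonetic(w))
```

Because it works on single words, it leaves the particle ga alone; attaching it to the preceding word first would give the singer sound.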

The romanizations tsu and fu are quite unnecessary: they would be much better as tu and hu. Supposedly they make it easier for English-speakers to say them, but using an English F in hu is no closer to Japanese than using an H, and it is contrary to how Japanese writing and grammar work.

Any consonant followed by i is changed, as I shall now explain.

Palatalized consonants

The question of transcription is more difficult with the palatalized consonants. Japanese writing and grammar treat them one way, the Japanese sound system organizes them a different way, and romanizations are forced to choose between these. Apart from the two semivowels w y, all the consonants I have given above come in two forms, one plain and one palatalized. The palatalization for some consonants is effectively the adding of y: so py is the sound in English pure.

For other consonants there is a significant change of sound: so ty is pronounced CH as in cheese, sy is pronounced SH as in sheet, and dy and zy are pronounced J as in jeep. Some romanizations use the letters ch, sh, j for these. (The Japanese sounds are alveolo-palatal, slightly softer than the English palato-alveolar CH, J, SH.) With h the palatalized hy is the palatal sound in German ich.

The way Japanese writing and grammar treat them is that any one of the basic consonants can be followed by any one of the five vowels. The palatalized versions can only be followed by a o u. So you get this sort of pattern:

        na   ne   ni   no   nu
        nya            nyo  nyu

        za   ze   zi   zo   zu
        zya            zyo  zyu

In actual pronunciation however the sounds pattern like this, where i always palatalizes:

        na   ne        no   nu
        nya       nyi  nyo  nyu

        za   ze        zo   zu
        zya       zyi  zyo  zyu

With the merely Y-coloured ones like N, there's not a lot of difference between hearing ni and nyi. With the more striking changes there is much more a sense of two different sounds. In the former case all romanizations defer to the script and write them thus:

        na   ne   ni   no   nu
        nya            nyo  nyu

But with the larger change, either the script/grammar or the actual pronunciation is chosen as a model for romanization, giving one of these two patterns:

        za   ze   zi   zo   zu
        zya            zyo  zyu

        za   ze        zo   zu
        ja        ji   jo   ju

The voiceless palatalized forms behave like the plain in two respects mentioned above: they can be doubled (issyo, matti, also romanized issho, matchi); and the vowel can be voiceless between two of them, and in fact the palatal quality of the consonant is the main difference between suki and siki with the vowel so faint.
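Going from the script-based spelling to the pronunciation-based one is mechanical enough to sketch in Python. The table below covers only the spellings discussed in this writeup, plus di, which I have assumed follows dy. The names SPELLINGS and respell are my own.

```python
import re

# Script-based spelling -> pronunciation-based spelling.
SPELLINGS = {
    "sy": "sh", "ty": "ch", "zy": "j", "dy": "j",
    "si": "shi", "ti": "chi", "zi": "ji",
    "di": "ji",  # assumption: di patterns with dy
    "tu": "tsu", "hu": "fu",
}
PATTERN = re.compile("|".join(SPELLINGS))

def respell(word):
    """Respell a word in one left-to-right pass over the table above."""
    return PATTERN.sub(lambda m: SPELLINGS[m.group(0)], word)

for w in ["issyo", "matti", "tukuba", "hukuoka", "zyu"]:
    print(w, "->", respell(w))
```

Doing it in a single pass matters: replacing sy with sh first and then hu with fu in a second pass would wrongly turn syu into sfu. The doubled forms need no special rule, since matti comes out as mat plus chi.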

The syllabic N

See the writeup syllabic N in Japanese for detail, but briefly there is a nasal consonant separate from ordinary N and M. It occupies an entire mora (or roughly a syllable) on its own, and does not need to be followed by a vowel.

It assimilates to some following consonants: np nb nm nt nd nn nk ng, and their palatalizations. In some romanizations np nb nm are written mp mb mm. In the group ng the g is in the middle of a word so has its singer pronunciation, but lengthened here.
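The mp mb mm spelling is a one-line substitution, sketched here in Python with a function name and example words of my own choosing. It relies on the fact that an n directly before p, b or m can only be the syllabic one, since an ordinary n is always followed by a vowel or y.

```python
import re

def spell_nasal_as_m(word):
    """Write the syllabic n as m before p, b and m."""
    return re.sub(r"n(?=[pbm])", "m", word)

for w in ["senpai", "sinbun", "sanmai", "nippon"]:
    print(w, "->", spell_nasal_as_m(w))
```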

The pronunciation at the end of a word is variable. It is some kind of nasalization, perhaps a nasalized schwa or a lax uvular nasal, and may have closed lips. You really need coaching from a native speaker for this one.