The following is a cut-and-paste copy
of the definition of the IPA/ASCII standard, widely used on
sci.lang (among other places) for accurate phonetic
transcription in a 7-bit medium. Aside from splitting off
the large appendices into their own nodes and hardlinking some of the
more esoteric linguistic terminology, I have left it untouched,
since its author knows the subject far better than I do.
The page was written by Evan Kirshenbaum and can be found at
http://www.hpl.hp.com/personal/Evan_Kirshenbaum/.
I'm not expecting upvotes for this, but please don't downvote
it either -- I refer to IPA/ASCII in a number of nodes on the
Japanese language, and I expect that quite a few other
linguistics-oriented noders will find it helpful.
--gn0sis
Representing IPA Phonetics in ASCII
Evan Kirshenbaum
<kirshenbaum@hpl.hp.com>
Last Modified, 4 Jan 1993 / Error corrected 22 Jan 2001
This article describes a standard scheme for representing IPA
transcriptions in ASCII for use in Usenet articles and email. The
following guidelines were kept in mind:
- It should be usable for both phonemic and narrow phonetic
transcription.
- It should be possible to represent all symbols and
diacritics in the IPA.
- The previous guideline notwithstanding, it is expected that
(as in the past) most use will be in transcribing English,
so where tradeoffs are necessary, decisions should be made
in favor of ease of representation of phonemes which are
common in English.
- The representation should be readable.
- It should be possible to mechanically translate from the
representation to a character set which includes IPA. The
reverse would also be nice.
In order to be able to represent a wide range of segments while making
common segments easy to type, we allow more than one representation
for a given segment. Each segment has an "explicit" representation,
which is a set of features between curly braces ("{" and "}"). Each
feature is represented as a three-letter abbreviation taken from a
standardized set. The phoneme /b/ (a voiced, bilabial stop) could be
represented as /{vcd,blb,stp}/. A first cut at the feature set appears
in appendix A.
The word "tag" could thus be represented phonemically as
/{vls,alv,stp}{low,fnt,unr,vwl}{vcd,vel,stp}/
and phonetically as
[{vls,asp,alv,stp}{low,fnt,lng,unr,vwl}{unx,vcd,vel,stp}]
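The explicit form is regular enough to read mechanically. The following is a minimal sketch, not part of the original standard: the name parse_explicit is hypothetical, and it assumes that the order of features inside the braces carries no meaning, so each segment is stored as an unordered set.

```python
import re

# One explicit segment: three-letter features, comma-separated, in braces.
FEATURE_SET = re.compile(r"\{([a-z]{3}(?:,[a-z]{3})*)\}")

def parse_explicit(transcription):
    """Return one frozenset of features per segment.

    The surrounding / / or [ ] delimiters are simply skipped.
    Assumption: feature order inside the braces is not significant.
    """
    return [frozenset(match.group(1).split(","))
            for match in FEATURE_SET.finditer(transcription)]

tag = parse_explicit("/{vls,alv,stp}{low,fnt,unr,vwl}{vcd,vel,stp}/")
print(len(tag))          # number of segments found
print(sorted(tag[0]))    # features of the first segment
```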
This works, but it's a bit of a pain. To simplify transcription, we
allow an "implicit" representation for a segment which consists of a
(generally alphabetic) symbol followed by diacritics. Thus
/b/ stands for /{vcd,blb,stp}/. Case is significant (/n/ and /N/ are
different segments). The segment symbols are given in appendix B.
The word "tag" can thus be represented phonemically as
/t&g/
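The relationship between the two forms can be sketched as a table lookup. This is only an illustration: the table below holds just the four symbols whose feature sets are spelled out in the text above (appendix B has the real list), and to_explicit is a hypothetical helper name.

```python
# Tiny subset of the segment table, taken from the examples in the text.
# The full mapping lives in appendix B and is NOT reproduced here.
SEGMENTS = {
    "b": ("vcd", "blb", "stp"),
    "t": ("vls", "alv", "stp"),
    "&": ("low", "fnt", "unr", "vwl"),
    "g": ("vcd", "vel", "stp"),
}

def to_explicit(implicit):
    """Expand a diacritic-free phonemic string such as "/t&g/"."""
    body = implicit.strip("/")
    expanded = "".join("{" + ",".join(SEGMENTS[symbol]) + "}" for symbol in body)
    return "/" + expanded + "/"

print(to_explicit("/t&g/"))
```

The result is the explicit phonemic transcription of "tag" given earlier. A symbol missing from the table raises KeyError, which seems preferable to guessing at a feature set.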
The diacritics for a segment are represented between angle
brackets ("<" and ">") and consist of
symbols or features. (In the common case where the diacritic symbol
is a single character which does not encode a segment, the brackets
may be removed.) The features which the diacritics map to override
those of the segment.
In narrow transcription, the word "tag" thus becomes
[t<asp>&<lng>g<unx>]
or
[t<h>&<:>g<o>]
or
[t<h>&:g<o>]
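As a rough check that the three spellings above say the same thing, here is a small tokenizer sketch. The tables are assumptions limited to this one example: only "t", "&" and "g" are treated as segments, only "h", ":" and "o" as diacritic symbols, and only ":" is allowed without brackets. The name tokenize is hypothetical.

```python
import re

SEGMENT_SYMBOLS = {"t", "&", "g"}                        # subset for this example
DIACRITIC_SYMBOLS = {"h": "asp", ":": "lng", "o": "unx"}  # symbol -> feature
BARE_DIACRITICS = {":"}     # may appear without brackets (assumed subset)

TOKEN = re.compile(r"<([^<>]+)>|(.)")

def tokenize(narrow):
    """Split "[t<h>&:g<o>]" into (segment, [diacritic features]) pairs."""
    result = []
    for bracketed, char in TOKEN.findall(narrow.strip("[]")):
        if char in SEGMENT_SYMBOLS:
            result.append((char, []))
            continue
        if not result:
            raise ValueError("diacritic with no preceding segment")
        if bracketed:
            # <asp> is already a feature; <h> is a symbol that maps to one.
            result[-1][1].append(DIACRITIC_SYMBOLS.get(bracketed, bracketed))
        elif char in BARE_DIACRITICS:
            result[-1][1].append(DIACRITIC_SYMBOLS[char])
        else:
            raise ValueError("unknown symbol: " + char)
    return result

forms = ["[t<asp>&<lng>g<unx>]", "[t<h>&<:>g<o>]", "[t<h>&:g<o>]"]
print(all(tokenize(f) == tokenize(forms[0]) for f in forms))
```

The sketch stops at attaching features to segments. Applying the override rule would additionally need to know which features exclude one another, which depends on appendix A.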
Some diacritic symbols encode more than one feature set. Which one is
meant should be apparent from context. For example, "."
stands for "{rnd}" when attached to a vowel, but
"{rfx}" when attached to a consonant.
Clicks are common to many languages (especially in Africa), but
there is no IPA diacritic that means "click". Rather than use up
several characters for clicks (which are infrequent in the languages
most often discussed), we instead use the diacritic "!"
after the homorganic unvoiced stop. Thus /t!/ (= /t<clk>/ =
/{alv,clk}/) is the sound commonly written "tsk" and used in English
to show disapproval.
The complete set of diacritic symbols appears in
appendix C below.
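A diacritic whose meaning depends on context, such as "." described earlier, can be resolved from the features of the segment it attaches to. The sketch below assumes that vowels always carry the "vwl" feature and consonants never do, as in the explicit examples in this article; resolve_dot is a hypothetical name.

```python
def resolve_dot(segment_features):
    """Pick the feature that "." stands for on the given segment.

    Assumption: a segment is a vowel exactly when it has the "vwl" feature.
    """
    return "rnd" if "vwl" in segment_features else "rfx"

print(resolve_dot({"low", "fnt", "unr", "vwl"}))   # vowel: rounded
print(resolve_dot({"vls", "alv", "stp"}))          # consonant: retroflex
```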
Appendices D and E contain representations of segments more or less
ordered by feature (appendix D in tabular form, appendix E as a list).
Appendix F contains a list of all of the ASCII characters and the uses
to which they have been pressed.
For transcription of any specific language a group can by convention
alter the character mappings (as an example, for Spanish
/R/ may be better used to represent
/{alv,trl}/ than /{mid,cnt,rzd,vwl}/). An
author may also press a little-used symbol (for the language under
consideration) into service to highlight a distinction. Such an
alteration should be made explicitly to avoid confusion.
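In software, one way to keep such an alteration explicit is to layer a per-language table over the default one instead of editing the default. This is a sketch under that assumption: the table names are hypothetical and contain only the /R/ example from the paragraph above.

```python
from collections import ChainMap

# Default reading of /R/, as given in the text.
DEFAULT_SEGMENTS = {"R": ("mid", "cnt", "rzd", "vwl")}

# Convention for Spanish transcriptions, declared separately and explicitly.
SPANISH_OVERRIDES = {"R": ("alv", "trl")}

# Lookups try the overrides first and fall back to the defaults.
spanish = ChainMap(SPANISH_OVERRIDES, DEFAULT_SEGMENTS)

print(spanish["R"])            # the Spanish convention
print(DEFAULT_SEGMENTS["R"])   # the default is left untouched
```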
The diacritics "+" and "=" and the
segment symbols "$" and "%" are
explicitly left unspecified so that they can be used to mark
language-specific features (that are otherwise cumbersome to mark).
Such symbols can be assigned either by convention for a specific
language or in an ad-hoc manner by an individual author.
Stress marks are prepended to the syllable they attach to.
"'" signals primary stress, "," signals
secondary stress. Spaces should be employed to separate words
(cliticized words may be written unseparated). When discussing single
words, it may be helpful to insert a space before each syllable that
doesn't carry a suprasegmental marker.
Thus, "I hear the secretary" for an American might
be something like
/aI hir D@ 'sEkrI,t&ri/
while to an Englishman it might be more like
/aI hi@ DI 'sEkrVtri/
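The stress convention can be read back out of a transcription with a short routine. This is a sketch only: it assumes implicit (brace-free) transcription, so that "," never appears as a feature separator, and it can split a word only where a stress mark occurs, so an unmarked stretch may span several syllables. The name stress_chunks is hypothetical.

```python
import re

STRESS = {"'": "primary", ",": "secondary"}

# An optional stress mark followed by everything up to the next mark.
CHUNK = re.compile(r"([',]?)([^',]+)")

def stress_chunks(transcription):
    """Return, for each word, a list of (stress, text) pairs."""
    words = transcription.strip("/[]").split()
    return [[(STRESS.get(mark, "unmarked"), text)
             for mark, text in CHUNK.findall(word)]
            for word in words]

for word in stress_chunks("/aI hir D@ 'sEkrI,t&ri/"):
    print(word)
```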
Transcribing tone is harder. Here's an attempt. For register tone
languages (e.g., Hausa, Navajo), numbers should be used with one being
the lowest. Thus in Navajo, "1" is low tone and
"2" is high. In Yoruba "1" is low,
"2" is mid, and "3" is high. The language's
"default" tone need not be specified. For contour tone languages
(e.g., Mandarin, Thai), there is generally a numeric system in place
(Mandarin: "1" is high, "2" is rising,
"3" is falling-rising, "4" is falling). The
tone indication should follow the syllable (vowel?).
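Since the tone proposal is tentative, the following is only one possible reading of it: the tone digit is taken to be the last character of a space-separated syllable, and a syllable without a digit is taken to carry the language's default tone. The tables repeat the values given above; read_tones and the placeholder syllable "ma" are illustrative, not real transcriptions.

```python
# Tone numbers per language, as listed in the text above.
TONES = {
    "Navajo": {"1": "low", "2": "high"},
    "Yoruba": {"1": "low", "2": "mid", "3": "high"},
    "Mandarin": {"1": "high", "2": "rising",
                 "3": "falling-rising", "4": "falling"},
}

def read_tones(language, syllables):
    """Return (syllable, tone) pairs; tone is None when left unmarked.

    Assumption: the tone digit, if any, is the final character of each
    space-separated syllable.
    """
    table = TONES[language]
    result = []
    for syllable in syllables.split():
        if syllable[-1] in table:
            result.append((syllable[:-1], table[syllable[-1]]))
        else:
            result.append((syllable, None))
    return result

print(read_tones("Mandarin", "ma1 ma3 ma"))
```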
The symbol "#" is used to represent a syllable or word boundary.
Appendices
Appendix A. Feature Abbreviations
Appendix B. Segment Symbols
Appendix C. Diacritics
Appendix D. Segment Table
Appendix E. Segment List
Appendix F. ASCII Table