The very first Unicode code block, Basic Latin, encodes 7-bit ASCII, also known as ANSI X3.4, ISO/IEC 646:1991-IRV and the first 128 characters of ISO 8859-1. Only a small fraction of the languages written in the Latin script can be written entirely with the characters in the Basic Latin code block, but its letters (a-z and A-Z) form the core of all Latin scripts.
The Latin script was derived from the Greek script. Today it is used to write a wide variety of languages all over the world. In the process of adapting it to other languages, numerous extensions have been devised. The most common is the addition of diacritical marks. Digraphs, inverse or reverse forms, and outright new characters have also been used to extend the Latin script.
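To make the role of diacritical marks concrete, here is a minimal Python sketch using the standard `unicodedata` module. It assumes canonical decomposition (NFD) is an acceptable way to separate a base letter from its marks. The helper name `base_and_marks` is purely illustrative.

```python
import unicodedata

def base_and_marks(ch):
    """Split one character into its base character and combining marks via NFD."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    # Everything after the base is reported as a U+XXXX code.
    marks = [f"U+{ord(m):04X}" for m in decomposed[1:]]
    return base, marks

# U+00E9 (e with acute) and U+00C5 (A with ring above) both live in
# Latin-1 Supplement, but their base letters are Basic Latin characters.
print(base_and_marks("\u00e9"))
print(base_and_marks("\u00c5"))
```

Not every extended Latin letter decomposes this way. Outright new characters generally have no Basic Latin base at all.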
Characters for more complex Latin scripts can be found in Latin-1 Supplement, Latin Extended A, Latin Extended B, IPA Extensions and Latin Extended Additional, with additional help for more obscure cases in Letterlike Symbols, Currency Symbols, Alphabetic Presentation Forms, Miscellaneous Symbols, Enclosed Alphanumerics, Halfwidth and Fullwidth Forms and Combining Diacritical Marks.
Unicode's Basic Latin code block reserves the 128 code points from U+0000 to U+007F, all of which are currently assigned.
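The overlap between Basic Latin, ASCII and the lower half of ISO 8859-1 can be checked with a short Python sketch. It relies only on the standard codecs `ascii`, `latin-1` and `utf-8`. The variable names are illustrative.

```python
# All 128 Basic Latin code points, as raw byte values 0x00..0x7F.
raw = bytes(range(0x80))

as_ascii = raw.decode("ascii")
as_latin1 = raw.decode("latin-1")   # ISO 8859-1
as_utf8 = raw.decode("utf-8")

# The three decodings agree, and each byte value equals its code point.
same_text = as_ascii == as_latin1 == as_utf8
code_points = [ord(c) for c in as_ascii]
print(same_text, code_points == list(range(0x80)))
```

This is why text restricted to this block is byte-for-byte identical in ASCII, ISO 8859-1 and UTF-8.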
All the characters in this code block were added in Unicode 1.1.
Number of characters in each General Category :
Letter, Uppercase Lu : 26
Letter, Lowercase Ll : 26
Number, Decimal Digit Nd : 10
Punctuation, Connector Pc : 1
Punctuation, Dash Pd : 1
Punctuation, Open Ps : 3
Punctuation, Close Pe : 3
Punctuation, Other Po : 15
Symbol, Math Sm : 6
Symbol, Currency Sc : 1
Symbol, Modifier Sk : 2
Separator, Space Zs : 1
Other, Control Cc : 33
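The General Category tally above can be reproduced with Python's standard `unicodedata` module. This is a minimal sketch, assuming the Unicode database bundled with the interpreter agrees with the chart for this block. `category_counts` is an illustrative name.

```python
import unicodedata
from collections import Counter

# Count the General Category of every code point from U+0000 to U+007F.
category_counts = Counter(unicodedata.category(chr(cp)) for cp in range(0x80))

for category, count in sorted(category_counts.items()):
    print(category, count)
```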
Number of characters in each Bidirectional Category :
Left To Right L : 52
European Number EN : 10
European Number Separator ES : 2
European Number Terminator ET : 3
Common Number Separator CS : 4
Boundary Neutral BN : 24
Paragraph Separator B : 5
Segment Separator S : 3
Whitespace WS : 2
Other Neutral ON : 23
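The Bidirectional Category tally can be checked the same way. The sketch below uses `unicodedata.bidirectional` from the Python standard library. `bidi_counts` is an illustrative name.

```python
import unicodedata
from collections import Counter

# Count the Bidirectional Category of every code point from U+0000 to U+007F.
bidi_counts = Counter(unicodedata.bidirectional(chr(cp)) for cp in range(0x80))

for bidi_class, count in sorted(bidi_counts.items()):
    print(bidi_class, count)
```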
The columns below should be interpreted as :
- The Unicode code for the character
- The character in question
- The Unicode name for the character
- The Unicode General Category for the character
- The Unicode Bidirectional Category for the character
If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.
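As a rough illustration of the five columns, the following Python sketch builds one row from the standard `unicodedata` module. The function name `chart_row` is hypothetical. Note that the database reports names in upper case, while the listing below shows them in lower case, and that control characters have no formal name in the database.

```python
import unicodedata

def chart_row(cp):
    """Return (code, character, name, general category, bidi category)."""
    ch = chr(cp)
    # Control characters have no formal name, so fall back to an empty string.
    name = unicodedata.name(ch, "")
    # Leave the character column empty for non-printable characters.
    shown = ch if ch.isprintable() else ""
    return (
        f"U+{cp:04X}",
        shown,
        name,
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )

print(chart_row(0x0041))
print(chart_row(0x0000))
```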
Basic Latin
- U+0000 null Cc BN
- U+0001 start of heading Cc BN
- U+0002 start of text Cc BN
- U+0003 end of text Cc BN
- U+0004 end of transmission Cc BN
- U+0005 enquiry Cc BN
- U+0006 acknowledge Cc BN
- U+0007 bell Cc BN
- U+0008 backspace Cc BN
- U+0009 character tabulation Cc S
- sgml &Tab;
- aka horizontal tabulation (ht), tab
- U+000A line feed Cc B
- sgml &NewLine;
- aka line feed (lf)
- aka new line (nl), end of line (eol)
- U+000B line tabulation Cc S
- aka vertical tabulation (vt)
- U+000C form feed Cc WS
- aka form feed (ff)
- U+000D carriage return Cc B
- aka carriage return (cr)
- U+000E shift out Cc BN
- * known as LOCKING-SHIFT ONE in 8-bit environments
- U+000F shift in Cc BN
- * known as LOCKING-SHIFT ZERO in 8-bit environments
- U+0010 data link escape Cc BN
- U+0011 device control one Cc BN
- U+0012 device control two Cc BN
- U+0013 device control three Cc BN
- U+0014 device control four Cc BN
- U+0015 negative acknowledge Cc BN
- U+0016 synchronous idle Cc BN
- U+0017 end of transmission block Cc BN
- U+0018 cancel Cc BN
- U+0019 end of medium Cc BN
- U+001A substitute Cc BN
- ref U+FFFD � replacement character (Specials)
- U+001B escape Cc BN
- U+001C information separator four Cc B
- aka file separator (fs)
- U+001D information separator three Cc B
- aka group separator (gs)
- U+001E information separator two Cc B
- aka record separator (rs)
- U+001F information separator one Cc S
- aka unit separator (us)
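The labels shown for the control characters above are aliases rather than formal character names. A small Python sketch illustrates the difference, assuming a Python 3.3 or later interpreter, where `unicodedata.lookup` resolves name aliases. The variable names are illustrative.

```python
import unicodedata

tab = "\t"  # U+0009

# Control characters have no formal name, so name() yields the default.
formal_name = unicodedata.name(tab, None)

# lookup() also accepts name aliases such as the labels used in the list above.
by_alias = {
    alias: unicodedata.lookup(alias)
    for alias in ("CHARACTER TABULATION", "LINE FEED", "CARRIAGE RETURN")
}

print(formal_name)
print({alias: f"U+{ord(ch):04X}" for alias, ch in by_alias.items()})
```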
ASCII punctuation and symbols
Based on ISO/IEC 646.
- U+0020 space Zs WS
- * sometimes considered a control code
- * other space characters: 2000-200A
- ref U+00A0 no break space (Latin-1 Supplement)
- ref U+200B zero width space (General Punctuation)
- ref U+2060 word joiner (General Punctuation)
- ref U+3000 ideographic space (CJK Symbols and Punctuation)
- ref U+FEFF zero width no break space (Arabic Presentation Forms B)
- U+0021 ! exclamation mark Po ON
- sgml !
- aka factorial
- aka bang
- ref U+00A1 ¡ inverted exclamation mark (Latin-1 Supplement)
- ref U+01C3 ǃ Latin letter retroflex click (Latin Extended B)
- ref U+203C ‼ double exclamation mark (General Punctuation)
- ref U+203D ‽ interrobang (General Punctuation)
- ref U+2762 ❢ heavy exclamation mark ornament (Dingbats)
- U+0022 " quotation mark Po ON
- html &quot;
- sgml "
- * neutral (vertical), used as opening or closing quotation mark
- * preferred characters in English for paired quotation marks are 201C & 201D
- ref U+02BA ʺ modifier letter double prime (Spacing Modifier Letters)
- ref U+030B ̋ combining double acute accent (Combining Diacritical Marks)
- ref U+030E ̎ combining double vertical line above (Combining Diacritical Marks)
- ref U+2033 ″ double prime (General Punctuation)
- ref U+3003 〃 ditto mark (CJK Symbols and Punctuation)
- U+0023 # number sign Po ET
- sgml #
- aka pound sign, hash, crosshatch, octothorpe
- ref U+2114 ℔ l b bar symbol (Letterlike Symbols)
- ref U+266F ♯ music sharp sign (Miscellaneous Symbols)
- U+0024 $ dollar sign Sc ET
- sgml $
- aka milreis, escudo
- * Dollar
- * glyph may have one or two vertical bars
- * other currency symbol characters: 20A0-20B5
- ref U+00A4 ¤ currency sign (Latin-1 Supplement)
- U+0025 % percent sign Po ET
- sgml %
- ref U+066A ٪ Arabic percent sign (Arabic)
- ref U+2030 ‰ per mille sign (General Punctuation)
- ref U+2031 ‱ per ten thousand sign (General Punctuation)
- ref U+2052 ⁒ commercial minus sign (General Punctuation)
- U+0026 & ampersand Po ON
- html &amp;
- sgml &
- ref U+204A ⁊ tironian sign et (General Punctuation)
- ref U+214B ⅋ turned ampersand (Letterlike Symbols)
- U+0027 ' apostrophe Po ON
- sgml '
- aka apostrophe-quote (1.0)
- aka APL quote
- * neutral (vertical) glyph with mixed usage
- * 2019 is preferred for apostrophe
- * preferred characters in English for paired quotation marks are 2018 & 2019
- ref U+02B9 ʹ modifier letter prime (Spacing Modifier Letters)
- ref U+02BC ʼ modifier letter apostrophe (Spacing Modifier Letters)
- ref U+02C8 ˈ modifier letter vertical line (Spacing Modifier Letters)
- ref U+0301 ́ combining acute accent (Combining Diacritical Marks)
- ref U+2032 ′ prime (General Punctuation)
- ref U+A78C ꞌ Latin small letter saltillo (Latin Extended D)
- U+0028 ( left parenthesis Ps ON
- sgml (
- aka opening parenthesis (1.0)
- U+0029 ) right parenthesis Pe ON
- sgml )
- aka closing parenthesis (1.0)
- * see discussion on semantics of paired bracketing characters
- U+002A * asterisk Po ON
- sgml *
- aka star (on phone keypads)
- ref U+066D ٭ Arabic five pointed star (Arabic)
- ref U+204E ⁎ low asterisk (General Punctuation)
- ref U+2217 ∗ asterisk operator (Mathematical Operators)
- ref U+26B9 ⚹ sextile (Miscellaneous Symbols)
- ref U+2731 ✱ heavy asterisk (Dingbats)
- U+002B + plus sign Sm ES
- sgml +
- U+002C , comma Po CS
- sgml ,
- aka decimal separator
- ref U+060C ، Arabic comma (Arabic)
- ref U+201A ‚ single low 9 quotation mark (General Punctuation)
- ref U+3001 、 ideographic comma (CJK Symbols and Punctuation)
- U+002D - hyphen minus Pd ES
- sgml &hyphen;
- aka hyphen or minus sign
- * used for either hyphen or minus sign
- ref U+2010 ‐ hyphen (General Punctuation)
- ref U+2011 ‑ non breaking hyphen (General Punctuation)
- ref U+2012 ‒ figure dash (General Punctuation)
- ref U+2013 – en dash (General Punctuation)
- ref U+2212 − minus sign (Mathematical Operators)
- ref U+10191 𐆑 Roman uncia sign (Ancient Symbols)
- U+002E . full stop Po CS
- sgml .
- aka period, dot, decimal point
- * may be rendered as a raised decimal point in old style numbers
- ref U+06D4 ۔ Arabic full stop (Arabic)
- ref U+3002 。 ideographic full stop (CJK Symbols and Punctuation)
- U+002F / solidus Po CS
- sgml /
- aka slash, virgule
- ref U+01C0 ǀ Latin letter dental click (Latin Extended B)
- ref U+0338 ̸ combining long solidus overlay (Combining Diacritical Marks)
- ref U+2044 ⁄ fraction slash (General Punctuation)
- ref U+2215 ∕ division slash (Mathematical Operators)
ASCII digits
- U+0030 0 digit zero Nd EN
- U+0031 1 digit one Nd EN
- U+0032 2 digit two Nd EN
- U+0033 3 digit three Nd EN
- U+0034 4 digit four Nd EN
- U+0035 5 digit five Nd EN
- U+0036 6 digit six Nd EN
- U+0037 7 digit seven Nd EN
- U+0038 8 digit eight Nd EN
- U+0039 9 digit nine Nd EN
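Because the ten digits occupy consecutive code points starting at U+0030, a digit's numeric value is its code point minus 0x30. The Python sketch below checks this against the decimal values recorded in the standard `unicodedata` module. The variable names are illustrative.

```python
import unicodedata

digits = [chr(cp) for cp in range(0x30, 0x3A)]

# Value by arithmetic on the code point.
by_offset = [ord(d) - 0x30 for d in digits]

# Value as recorded in the Unicode character database.
by_database = [unicodedata.decimal(d) for d in digits]

print(by_offset == by_database)
```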
ASCII punctuation and symbols
- U+003A : colon Po CS
- sgml :
- ref U+0589 ։ Armenian full stop (Armenian)
- ref U+05C3 ׃ Hebrew punctuation sof pasuq (Hebrew)
- ref U+2236 ∶ ratio (Mathematical Operators)
- ref U+A789 ꞉ modifier letter colon (Latin Extended D)
- U+003B ; semicolon Po ON
- sgml ;
- * this, and not 037E, is the preferred character for 'Greek question mark'
- ref U+037E ; Greek question mark (Greek and Coptic)
- ref U+061B ؛ Arabic semicolon (Arabic)
- ref U+204F ⁏ reversed semicolon (General Punctuation)
- U+003C < less than sign Sm ON
- html &lt;
- sgml <
- ref U+2039 ‹ single left pointing angle quotation mark (General Punctuation)
- ref U+2329 〈 left pointing angle bracket (Miscellaneous Technical)
- ref U+27E8 ⟨ mathematical left angle bracket (Miscellaneous Mathematical Symbols A)
- ref U+3008 〈 left angle bracket (CJK Symbols and Punctuation)
- U+003D = equals sign Sm ON
- sgml =
- * other related characters: 2241-2263
- ref U+2260 ≠ not equal to (Mathematical Operators)
- ref U+2261 ≡ identical to (Mathematical Operators)
- ref U+A78A ꞊ modifier letter short equals sign (Latin Extended D)
- ref U+10190 𐆐 Roman sextans sign (Ancient Symbols)
- U+003E > greater than sign Sm ON
- html &gt;
- sgml >
- ref U+203A › single right pointing angle quotation mark (General Punctuation)
- ref U+232A 〉 right pointing angle bracket (Miscellaneous Technical)
- ref U+27E9 ⟩ mathematical right angle bracket (Miscellaneous Mathematical Symbols A)
- ref U+3009 〉 right angle bracket (CJK Symbols and Punctuation)
- U+003F ? question mark Po ON
- sgml ?
- ref U+00BF ¿ inverted question mark (Latin-1 Supplement)
- ref U+037E ; Greek question mark (Greek and Coptic)
- ref U+061F ؟ Arabic question mark (Arabic)
- ref U+203D ‽ interrobang (General Punctuation)
- ref U+2048 ⁈ question exclamation mark (General Punctuation)
- ref U+2049 ⁉ exclamation question mark (General Punctuation)
- U+0040 @ commercial at Po ON
- sgml @
- aka at sign
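The note on U+003B says it is preferred over U+037E for the Greek question mark. One way to see the relationship is that U+037E is canonically equivalent to the semicolon, so normalization replaces it. A minimal Python sketch using the standard `unicodedata` module follows. The variable names are illustrative.

```python
import unicodedata

greek_question_mark = "\u037e"

# Before normalization the two characters are distinct code points.
differs_before = greek_question_mark != ";"

# NFC replaces U+037E with its canonical equivalent, U+003B.
normalized = unicodedata.normalize("NFC", greek_question_mark)

print(differs_before, f"U+{ord(normalized):04X}")
```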
Uppercase Latin alphabet
- U+0041 A Latin capital letter A Lu L
- U+0042 B Latin capital letter B Lu L
- ref U+212C ℬ script capital b (Letterlike Symbols)
- U+0043 C Latin capital letter C Lu L
- ref U+2102 ℂ double struck capital c (Letterlike Symbols)
- ref U+212D ℭ black letter capital c (Letterlike Symbols)
- U+0044 D Latin capital letter D Lu L
- U+0045 E Latin capital letter E Lu L
- ref U+2107 ℇ euler constant (Letterlike Symbols)
- ref U+2130 ℰ script capital e (Letterlike Symbols)
- U+0046 F Latin capital letter F Lu L
- ref U+2131 ℱ script capital f (Letterlike Symbols)
- ref U+2132 Ⅎ turned capital f (Letterlike Symbols)
- U+0047 G Latin capital letter G Lu L
- U+0048 H Latin capital letter H Lu L
- ref U+210B ℋ script capital h (Letterlike Symbols)
- ref U+210C ℌ black letter capital h (Letterlike Symbols)
- ref U+210D ℍ double struck capital h (Letterlike Symbols)
- U+0049 I Latin capital letter I Lu L
- * Turkish and Azerbaijani use 0131 for lowercase
- ref U+0130 İ Latin capital letter I with dot above (Latin Extended A)
- ref U+0406 І Cyrillic capital letter byelorussian ukrainian i (Cyrillic)
- ref U+04C0 Ӏ Cyrillic letter palochka (Cyrillic)
- ref U+2110 ℐ script capital i (Letterlike Symbols)
- ref U+2111 ℑ black letter capital i (Letterlike Symbols)
- ref U+2160 Ⅰ Roman numeral one (Number Forms)
- U+004A J Latin capital letter J Lu L
- U+004B K Latin capital letter K Lu L
- ref U+212A K kelvin sign (Letterlike Symbols)
- U+004C L Latin capital letter L Lu L
- ref U+2112 ℒ script capital l (Letterlike Symbols)
- U+004D M Latin capital letter M Lu L
- ref U+2133 ℳ script capital m (Letterlike Symbols)
- U+004E N Latin capital letter N Lu L
- ref U+2115 ℕ double struck capital n (Letterlike Symbols)
- U+004F O Latin capital letter O Lu L
- U+0050 P Latin capital letter P Lu L
- ref U+2119 ℙ double struck capital p (Letterlike Symbols)
- U+0051 Q Latin capital letter Q Lu L
- ref U+211A ℚ double struck capital q (Letterlike Symbols)
- U+0052 R Latin capital letter R Lu L
- ref U+211B ℛ script capital r (Letterlike Symbols)
- ref U+211C ℜ black letter capital r (Letterlike Symbols)
- ref U+211D ℝ double struck capital r (Letterlike Symbols)
- U+0053 S Latin capital letter S Lu L
- U+0054 T Latin capital letter T Lu L
- U+0055 U Latin capital letter U Lu L
- U+0056 V Latin capital letter V Lu L
- U+0057 W Latin capital letter W Lu L
- U+0058 X Latin capital letter X Lu L
- U+0059 Y Latin capital letter Y Lu L
- U+005A Z Latin capital letter Z Lu L
- ref U+2124 ℤ double struck capital z (Letterlike Symbols)
- ref U+2128 ℨ black letter capital z (Letterlike Symbols)
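The note on U+0049 about Turkish and Azerbaijani matters in practice because default case mappings are not language-sensitive. The Python sketch below shows the default behaviour of `str.lower` and `str.upper`, followed by two hypothetical helpers (`turkish_lower`, `turkish_upper`) that are not part of any library. They only illustrate the dotted and dotless pairing and make no attempt at full locale support.

```python
# Default (locale-independent) case mappings, as applied by Python's str methods.
lower_of_I = "I".lower()              # 'i', not the dotless U+0131 Turkish expects
upper_of_i = "i".upper()              # 'I', not the dotted U+0130 Turkish expects
upper_of_dotless = "\u0131".upper()   # 'I'
lower_of_dotted = "\u0130".lower()    # 'i' followed by U+0307 combining dot above

def turkish_lower(text):
    """Illustrative only: map I to U+0131 and U+0130 to i before lowercasing."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()

def turkish_upper(text):
    """Illustrative only: map i to U+0130 and U+0131 to I before uppercasing."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()

print(turkish_lower("ISTANBUL"), turkish_upper("istanbul"))
```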
ASCII punctuation and symbols
- U+005B [ left square bracket Ps ON
- sgml [ [
- aka opening square bracket (1.0)
- * other bracket characters: 27E6-27EB, 2983-2998, 3008-301B
- U+005C \ reverse solidus Po ON
- sgml \ &sbsol;
- aka backslash
- ref U+20E5 ⃥ combining reverse solidus overlay (Combining Diacritical Marks for Symbols)
- ref U+2216 ∖ set minus (Mathematical Operators)
- U+005D ] right square bracket Pe ON
- sgml ] ]
- aka closing square bracket (1.0)
- U+005E ^ circumflex accent Sk ON
- sgml ^ ˆ
- * this is a spacing character
- ref U+02C4 ˄ modifier letter up arrowhead (Spacing Modifier Letters)
- ref U+02C6 ˆ modifier letter circumflex accent (Spacing Modifier Letters)
- ref U+0302 ̂ combining circumflex accent (Combining Diacritical Marks)
- ref U+2038 ‸ caret (General Punctuation)
- ref U+2303 ⌃ up arrowhead (Miscellaneous Technical)
- U+005F _ low line Pc ON
- sgml _
- aka spacing underscore (1.0)
- * this is a spacing character
- ref U+02CD ˍ modifier letter low macron (Spacing Modifier Letters)
- ref U+0331 ̱ combining macron below (Combining Diacritical Marks)
- ref U+0332 ̲ combining low line (Combining Diacritical Marks)
- ref U+2017 ‗ double low line (General Punctuation)
- U+0060 ` grave accent Sk ON
- sgml `
- * this is a spacing character
- ref U+02CB ˋ modifier letter grave accent (Spacing Modifier Letters)
- ref U+0300 ̀ combining grave accent (Combining Diacritical Marks)
- ref U+2035 ‵ reversed prime (General Punctuation)
Lowercase Latin alphabet
- U+0061 a Latin small letter A Ll L
- U+0062 b Latin small letter B Ll L
- U+0063 c Latin small letter C Ll L
- U+0064 d Latin small letter D Ll L
- U+0065 e Latin small letter E Ll L
- ref U+212E ℮ estimated symbol (Letterlike Symbols)
- ref U+212F ℯ script small e (Letterlike Symbols)
- U+0066 f Latin small letter F Ll L
- U+0067 g Latin small letter G Ll L
- ref U+0261 ɡ Latin small letter script g (IPA Extensions)
- ref U+210A ℊ script small g (Letterlike Symbols)
- U+0068 h Latin small letter H Ll L
- ref U+04BB һ Cyrillic small letter shha (Cyrillic)
- ref U+210E ℎ planck constant (Letterlike Symbols)
- U+0069 i Latin small letter I Ll L
- * Turkish and Azerbaijani use 0130 for uppercase
- ref U+0131 ı Latin small letter dotless i (Latin Extended A)
- ref U+1D6A4 𝚤 mathematical italic small dotless i (Mathematical Alphanumeric Symbols)
- U+006A j Latin small letter J Ll L
- sgml ȷ
- ref U+0237 ȷ Latin small letter dotless j (Latin Extended B)
- ref U+1D6A5 𝚥 mathematical italic small dotless j (Mathematical Alphanumeric Symbols)
- U+006B k Latin small letter K Ll L
- U+006C l Latin small letter L Ll L
- ref U+2113 ℓ script small l (Letterlike Symbols)
- ref U+1D4C1 𝓁 mathematical script small l (Mathematical Alphanumeric Symbols)
- U+006D m Latin small letter M Ll L
- U+006E n Latin small letter N Ll L
- ref U+207F ⁿ superscript Latin small letter N (Superscripts and Subscripts)
- U+006F o Latin small letter O Ll L
- ref U+2134 ℴ script small o (Letterlike Symbols)
- U+0070 p Latin small letter P Ll L
- U+0071 q Latin small letter Q Ll L
- U+0072 r Latin small letter R Ll L
- U+0073 s Latin small letter S Ll L
- U+0074 t Latin small letter T Ll L
- U+0075 u Latin small letter U Ll L
- U+0076 v Latin small letter V Ll L
- U+0077 w Latin small letter W Ll L
- U+0078 x Latin small letter X Ll L
- U+0079 y Latin small letter Y Ll L
- U+007A z Latin small letter Z Ll L
- ref U+01B6 ƶ Latin small letter Z with stroke (Latin Extended B)
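The ref lines in these letter lists point at look-alike or related characters, and only some of them are compatibility equivalents of the Basic Latin letter. As a hedged illustration, the Python sketch below applies NFKC normalization from the standard `unicodedata` module to a few of the referenced characters. The selection and the name `folded` are illustrative.

```python
import unicodedata

# A few characters referenced from the lowercase letters above.
refs = {
    "\u212f": "script small e",
    "\u212e": "estimated symbol",
    "\u210e": "planck constant",
    "\u2113": "script small l",
    "\u207f": "superscript Latin small letter N",
    "\u0131": "Latin small letter dotless i",
}

# NFKC folds compatibility variants onto their plain counterparts
# and leaves characters without a decomposition unchanged.
folded = {ch: unicodedata.normalize("NFKC", ch) for ch in refs}

for ch, label in refs.items():
    print(f"U+{ord(ch):04X} {label}: {'folds to ' + folded[ch] if folded[ch] != ch else 'unchanged'}")
```

A cross-reference in the list therefore signals only that two characters are worth comparing, not that they are interchangeable.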
ASCII punctuation and symbols
- U+007B { left curly bracket Ps ON
- sgml { {
- aka opening curly bracket (1.0)
- aka left brace
- U+007C | vertical line Sm ON
- sgml | |
- aka vertical bar
- * used in pairs to indicate absolute value
- ref U+01C0 ǀ Latin letter dental click (Latin Extended B)
- ref U+05C0 ׀ Hebrew punctuation paseq (Hebrew)
- ref U+2223 ∣ divides (Mathematical Operators)
- ref U+2758 ❘ light vertical bar (Dingbats)
- U+007D } right curly bracket Pe ON
- sgml } }
- aka closing curly bracket (1.0)
- aka right brace
- U+007E ~ tilde Sm ON
- * this is a spacing character
- ref U+02DC ˜ small tilde (Spacing Modifier Letters)
- ref U+0303 ̃ combining tilde (Combining Diacritical Marks)
- ref U+2053 ⁓ swung dash (General Punctuation)
- ref U+223C ∼ tilde operator (Mathematical Operators)
- ref U+FF5E ~ fullwidth tilde (Halfwidth and Fullwidth Forms)
Control character
- U+007F delete Cc BN
http://unicode.org
Some prose may have been lifted verbatim from unicode.org,
as is permitted by their terms of use at http://www.unicode.org/copyright.html