Code pages
A code page is a table that maps numeric byte values to the characters of a character set, typically one that serves a particular language or group of languages. The following table lists the code pages supported by International Components for Unicode (ICU).
Code page | Description |
---|---|
ASCII | 7-bit ASCII |
LATIN1 | ISO 8859-1 Western European |
ISO8859_2 | ISO 8859-2 Eastern European |
ISO8859_3 | ISO 8859-3 Southeast European |
ISO8859_4 | ISO 8859-4 Baltic |
ISO8859_5 | ISO 8859-5 Cyrillic |
ISO8859_6 | ISO 8859-6 Arabic |
ISO8859_7 | ISO 8859-7 Greek |
ISO8859_8 | ISO 8859-8 Hebrew |
ISO8859_9 | ISO 8859-9 Latin 5 (Turkish) |
ISO8859_10 | ISO 8859-10 Latin 6 (Nordic) |
ISO8859_11 | ISO 8859-11 Thai |
ISO8859_13 | ISO 8859-13 Latin 7 (Baltic Rim) |
ISO8859_14 | ISO 8859-14 Latin 8 (Celtic) |
ISO8859_15 | ISO 8859-15 Latin 9 (Western Europe) |
UTF_8 | UTF-8 encoding of Unicode |
EUC_CN | Simplified Chinese Combined (367 + 1382) |
EUC_KR | Korean EUC Combined (367 + 971) |
EUC_JP | Japanese Combined (895 + 952 + 896 + 953) |
EUC_TW | Taiwan Extended UNIX® Code (CNS 11643-1986), Combined (367 + 960 + 961) |
UCS2 | UCS-2 (Really UTF-16 BE) |
CP037 | IBM® EBCDIC US English |
CP037_S390 | IBM EBCDIC US English LF & NL reversed |
CP256 | IBM EBCDIC Netherlands |
CP259 | IBM EBCDIC Symbols Set 7 |
CP273 | IBM EBCDIC German |
CP274 | IBM EBCDIC Belgium |
CP275 | IBM EBCDIC Brazil |
CP276 | IBM EBCDIC French-Canada |
CP277 | IBM EBCDIC Danish |
CP278 | IBM EBCDIC Swedish |
CP280 | IBM EBCDIC Italian |
CP282 | IBM EBCDIC Portugal |
CP284 | IBM EBCDIC Latin American Spanish |
CP285 | IBM EBCDIC UK English |
CP290 | IBM EBCDIC Japanese Katakana |
CP297 | IBM EBCDIC French |
CP420 | IBM EBCDIC Arabic |
CP421 | IBM EBCDIC Maghreb/French |
CP423 | IBM EBCDIC Greek |
CP424 | IBM EBCDIC Latin/Hebrew |
CP437 | MS-DOS US English |
CP500 | IBM EBCDIC 500V1 |
CP708 | Arabic (ASMO 708) |
CP709 | Arabic (ASMO 449+, BCON V4) |
CP710 | Arabic (Transparent Arabic) |
CP720 | Arabic (Transparent ASMO) |
CP737 | Greek (formerly 437G) |
CP770 | Lithuanian Standard RST 1095-89 |
CP771 | KBL (Lithuanian and Russian characters) |
CP772 | Lithuanian Standard LST 1284:1993 |
CP773 | Lithuanian (Mix of 771 and 775) |
CP774 | Lithuanian Standard 1283:1993 |
CP775 | Baltic |
CP776 | Lithuanian 770 extended |
CP777 | Lithuanian 771 extended |
CP778 | Lithuanian 775 extended |
CP790 | Mazovia (Polish + code page 437 extended characters) |
CP803 | IBM EBCDIC Hebrew (old) |
CP813 | ISO 8859-7 Greek/Latin |
CP819 | ISO 8859-1 Latin Alphabet No. 1 |
CP833 | IBM EBCDIC Korean SBCS |
CP834 | IBM EBCDIC Korean DBCS |
CP835 | IBM EBCDIC Traditional Chinese DBCS |
CP837 | IBM EBCDIC Simplified Chinese DBCS |
CP838 | IBM EBCDIC Thai |
CP850 | MS-DOS Latin 1 |
CP851 | MS-DOS Greek |
CP852 | MS-DOS Slavic (Latin 2) |
CP853 | MS-DOS Turkey Latin 3 (replaced by Latin 5) |
CP855 | IBM Cyrillic (primarily Russian) |
CP856 | PC Hebrew |
CP857 | IBM Turkish (Latin 5) |
CP860 | MS-DOS Portuguese |
CP861 | MS-DOS Icelandic |
CP862 | Hebrew (Migration) |
CP863 | MS-DOS Canadian-French |
CP864 | PC Arabic |
CP865 | MS-DOS Nordic |
CP866 | MS-DOS Russian |
CP868 | MS-DOS Urdu |
CP869 | IBM Modern Greek |
CP870 | IBM EBCDIC Multilingual Latin 2 |
CP871 | IBM EBCDIC Icelandic |
CP872 | PC Cyrillic with Euro update |
CP874 | MS-DOS Thai, superset of TIS 620 |
CP875 | IBM EBCDIC Greek |
CP878 | KOI8-R (Cyrillic) |
CP880 | Cyrillic Multilingual |
CP899 | PC Symbols |
CP905 | IBM EBCDIC Turkey Latin 3 (replaced by Latin 5) |
CP912 | ISO 8859-2; ROECE Latin-2 Multilingual |
CP913 | ISO 8859-3 Southeast European |
CP914 | ISO 8859-4 Baltic |
CP915 | ISO 8859-5; Cyrillic; 8-bit ISO |
CP916 | ISO 8859-8; Hebrew |
CP918 | IBM EBCDIC Urdu |
CP920 | ISO 8859-9; Latin 5 |
CP921 | ISO Baltic (8-bit) |
CP922 | ISO Estonia (8-bit) |
CP929 | Thai PC double byte |
CP930 | IBM EBCDIC Japanese Katakana Extended, Combined (290 + 300) |
CP931 | IBM EBCDIC Japanese Latin-Kanji, Combined (037 + 300) |
CP932 | MS Windows® Japanese, superset of Shift-JIS, Combined (897 + 301) |
CP933 | IBM EBCDIC Korean Combined (833 + 834) |
CP934 | Korean PC Combined (891 + 926) |
CP935 | IBM EBCDIC Simplified Chinese, Combined (836 + 837) |
CP936 | MS Windows Simplified Chinese, Combined (903 + 928) |
CP937 | IBM EBCDIC Traditional Chinese, Combined (037 + 835) |
CP938 | Traditional Chinese Combined (904 + 927) |
CP939 | IBM EBCDIC Japanese Latin Extended, Combined (1027 + 300) |
CP942 | MS-DOS Japanese Kana Combined (1041 + 301) |
CP943 | MS-DOS Japanese Combined (1041 + 941) |
CP944 | Korean PC Combined (1040 + 926) |
CP946 | Simplified Chinese PC Combined (1042 + 928) |
CP948 | MS-DOS Traditional Chinese, Combined (1043 + 927) |
CP949 | MS Windows Korean, superset of KS C 5601-1992, Combined (1088 + 951) |
CP950 | MS Windows Traditional Chinese, superset of Big 5, Combined (1114 + 947) |
CP1004 | PC-data Latin-1 extended desktop publishing |
CP1006 | Urdu, 8-bit |
CP1008 | Arabic, 8-bit ISO/ASCII |
CP1025 | IBM EBCDIC Cyrillic |
CP1026 | IBM EBCDIC Turkish |
CP1027 | IBM EBCDIC Japanese Extended Single Byte |
CP1040 | Korean PC extended Single Byte |
CP1041 | Japanese PC extended Single Byte |
CP1043 | Traditional Chinese extended Single Byte |
CP1046 | Arabic |
CP1047 | Latin 1 / Open Systems (US 3270) |
CP1047_S390 | Latin 1 / Open Systems (US 3270) LF & NL reversed |
CP1051 | HP-UX Latin1 |
CP1097 | IBM EBCDIC Farsi |
CP1098 | MS-DOS Farsi |
CP1112 | IBM EBCDIC Baltic Multilingual |
CP1114 | Traditional Chinese Single Byte (IBM Big 5) |
CP1115 | Simplified Chinese Single Byte (IBM GB) |
CP1122 | IBM EBCDIC Estonia |
CP1123 | IBM EBCDIC Cyrillic Ukraine |
CP1124 | Cyrillic Ukraine 8-bit |
CP1130 | IBM EBCDIC Vietnamese |
CP1137 | IBM EBCDIC India |
CP1140 | IBM EBCDIC US (with Euro) |
CP1141 | IBM EBCDIC Germany, Austria (with Euro) |
CP1142 | IBM EBCDIC Denmark (with Euro) |
CP1143 | IBM EBCDIC Sweden (with Euro) |
CP1144 | IBM EBCDIC Italy (with Euro) |
CP1145 | IBM EBCDIC Spain (with Euro) |
CP1146 | IBM EBCDIC UK Ireland (with Euro) |
CP1147 | IBM EBCDIC France (with Euro) |
CP1148 | IBM EBCDIC International Latin1 (with Euro) |
CP1149 | IBM EBCDIC Iceland (with Euro) |
CP1153 | IBM EBCDIC Latin2 (with Euro) |
CP1154 | IBM EBCDIC Cyrillic (with Euro) |
CP1155 | IBM EBCDIC Turkish (with Euro) |
CP1156 | IBM EBCDIC Baltic Multilingual (with Euro) |
CP1157 | IBM EBCDIC Estonia (with Euro) |
CP1158 | IBM EBCDIC Cyrillic Ukraine (with Euro) |
CP1159 | SBCS Traditional Chinese Host (with Euro) |
CP1160 | IBM EBCDIC Thailand (with Euro) |
CP1164 | IBM EBCDIC Vietnamese (with Euro) |
CP1250 | MS Windows Latin 2 (Central Europe) |
CP1251 | MS Windows Cyrillic (Slavic) |
CP1252 | MS Windows Latin 1 (ANSI), superset of Latin1 |
CP1253 | MS Windows Greek |
CP1254 | MS Windows Latin 5 (Turkish), superset of ISO 8859-9 |
CP1255 | MS Windows Hebrew |
CP1256 | MS Windows Arabic |
CP1257 | MS Windows Baltic Rim |
CP1258 | MS Windows Vietnamese |
CP1279 | Hitachi Japanese Katakana Host |
CP1361 | MS Windows Korean (Johab) |
CP1381 | MS-DOS Simplified Chinese Combined (1115 + 1380) |
CP1383 | China EUC |
CP1386 | GBK Chinese |
CP1392 | Simplified Chinese GB18030 |
CP5026 | IBM EBCDIC Japan Katakana-Kanji Combined (290 + 300) |
CP5028 | Japan Mixed Combined (897 + 301) |
CP5031 | IBM EBCDIC Simplified Chinese Combined (836 + 837) |
CP5033 | IBM EBCDIC Traditional Chinese Combined (037 + 835) |
CP5035 | IBM EBCDIC Japan Latin Combined (1027 + 300) |
CP5038 | Japan Mixed Combined (1041 + 301) |
CP5045 | Korean PC Combined (1088 + 951) |
CP5050 | Japanese EUC Combined (895 + 952 + 896 + 953) |
CP5488 | Simplified Chinese GB18030 |
CP9125 | IBM EBCDIC Korean Combined (833 + 834) |
EuroShift_JIS | Test code page, Shift-JIS with European characters |
SBCS | Single Byte Code Set |
DBCS | Double Byte Code Set |
MBCS | MultiByte Code Set |
UTF16BE | UTF-16 big endian |
UTF16LE | UTF-16 little endian |
UTF32BE | UTF-32 big endian |
UTF32LE | UTF-32 little endian |
HZ | HZ code set |
SCSU | SCSU code set |
ISCII | ISCII code set |
UTF7 | UTF-7 |
BOCU1 | BOCU-1 |
UTF16 | UTF-16 code set |
UTF32 | UTF-32 code set |
CESU8 | CESU-8 code set |
GB18030 | GB18030 code set |
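
To illustrate why the choice of code page matters, the following minimal sketch decodes the same byte under three of the code pages listed above, and then encodes one character as UTF-16 in both byte orders. It uses Python's built-in codecs rather than ICU. The codec names (`cp037`, `latin_1`, `cp437`, `utf_16_be`, `utf_16_le`) are Python's names, and their pairing with the table entries is an assumption made for illustration. The helper `decode_with_each` is hypothetical.

```python
# Assumption: these Python codec names correspond to the CP037, LATIN1,
# and CP437 entries in the table. ICU itself is not used here.
CODEC_FOR = {
    "CP037": "cp037",      # IBM EBCDIC US English
    "LATIN1": "latin_1",   # ISO 8859-1 Western European
    "CP437": "cp437",      # MS-DOS US English
}


def decode_with_each(raw: bytes) -> dict[str, str]:
    """Return the text that each code page assigns to the same bytes."""
    return {name: raw.decode(codec) for name, codec in CODEC_FOR.items()}


# One byte, three different characters depending on the code page.
result = decode_with_each(b"\xc1")
for name, text in result.items():
    # Print the Unicode code point so the output stays ASCII-only.
    print(f"{name}: U+{ord(text):04X}")

# UTF-16 big endian and little endian differ only in byte order.
be = "A".encode("utf_16_be")
le = "A".encode("utf_16_le")
print(be.hex(), le.hex())
```

Under these codecs, the byte `0xC1` is the letter A in EBCDIC (CP037), an accented capital A in LATIN1, and a box-drawing character in CP437. Data read with the wrong code page therefore produces incorrect characters without raising an error.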