??????????????
[good ?]
?????????
???? ??????????? ????????????
?????????
(writing system) ????????????? ????????? ?????????????????? ????????????? ????????????? ?????????????
????????
(industry standard) ?????????????
Universal Character Set
?????????? ????????????? ???????????? ??? ??? ?????????? ?????? ????????????? ??????????????? ?????????? ????????? encoding ????????????
character encoding
??????? ??????? ???????????? (???? ??????????? ?????????)? ????????????
computer file
????? ????? ?????????????? (???????????
normalization
????? ????????? ???????????? rendering ???? ??????????? ??????? ????? ????? ????????? ?????) ?????????
[?]
???????? ?????????? ???????
Unicode Consortium
??? ???????? ?????????????????????? ??????????????? ??????? character encoding ?????????? ???????? ???????????????????? ??????????? ???????? ?
Unicode Transformation Format
(UTF) scheme ???????? ??????? encoding ???????????
???????
(multilingual) ????? ?????????????????
??????? ???????? ???????? ? ???????????????????
computer software
???
internationalization and localization
?????????? ??????? ?????????????????? ????????????????? ??? ??????????????
XML
?
Java programming language
?
Microsoft .NET Framework
????? ????????
Operating System
???????? ??????????????????
????????????????? ??????????????
character encoding
???????? ????????????????? ?????????????????
UTF-8
(1 byte ??? ASCII ?????? ????? ????? ? byte ??? ???? ????????? ????? ?????? ?????)? ????????????
UCS-2
(2 bytes ??? ?????? ???????????? ???????? ???????? ?????? ??????????) ?????
UTF-16
(? bytes ???????? UCS-2 ??? ???????? ??????????????? ???????? ?????? ??????? ?????????) ??????????
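As a rough illustration of the size differences between these encodings, the Python sketch below encodes a few sample characters with the built-in `str.encode` method and counts the resulting bytes. The helper name `encoded_sizes` and the sample characters are chosen here purely for illustration. For characters in the Basic Multilingual Plane the UTF-16 result is the same two bytes that UCS-2 would use.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character needs in UTF-8 and UTF-16."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The big-endian variant is used so that no byte order mark is prepended.
        "utf-16": len(ch.encode("utf-16-be")),
    }

# U+0041 (Latin A), U+00E9 (e with acute), U+1000 (Myanmar letter Ka),
# U+10380 (a Ugaritic letter outside the Basic Multilingual Plane).
samples = ["\u0041", "\u00e9", "\u1000", "\U00010380"]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The last sample needs four bytes in UTF-16 because it is stored as a surrogate pair, which is the case UCS-2 cannot represent.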
???? ????? ?????????????????????
??????????? ???????? ??????????????
?????? ????????????
(character encoding) ?????
ISO 8859
?? ???? ??? ?????????????????? ?????????????? ISO 8859 ?? ??? ????????????????????? ???????????????? ??? ??????????????????????? ??????????????? ?????????? (incompatible) ??????????????? ?????????? ?????????? ?????? ????????????? ???????? ???????? ????????????? ???? ???????????? (???????????????
Roman characters
???? ?????) ???? ???????????? ???????????
??????????? ????????????
?????????
(glyph) ??????? ??? ??????
??????
(character) ???? ?
graphemes
and grapheme-like units ? ??? ???????????????? ??????????? ??????
??????
????
?????????
????????????????? ?????????
????? ???????????? (text processing) ???? ??????????? ????????????????? ??????????
??????
(code point) ? ????????? ???? ??????????? ? ?????? ??????????? ??????????????? ??????????? ????????????? ?????????? (an abstract way) ?????????????????? ?????????? ?????????? (?????? ????
??????
(????) ???????)
software
(web browser, word processor) ??????????? ?????????????? ?????????????????? ?????????????? ????????????? ??????????? ???????? ????????????????? ???????? ???????????? ??????????? ???????????????? ??????????? ???????????????????????????????
????????? ??? ??? ???????
ISO 8859-1
????? ???????????? ??????? ??????????????? ??????????? ??????? ??????? ?????????? ????? ????????????? ?????? ?????????????? ??????????????? ???????
?????? ????????????
????? ???????????? ???????? ??????????? ????????? ?????? ??????????? ????????????? ???????????? ????????????
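The relationship with ISO 8859-1 mentioned above can be checked with a short Python sketch. It assumes only Python's built-in `latin-1` codec, which implements ISO 8859-1, and verifies that every byte value decodes to the code point with the same number.

```python
# Every byte value 0x00-0xFF, interpreted as ISO 8859-1 (Latin-1) text.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

# Iterating over a bytes object yields integers, so each decoded character's
# code point can be compared directly with the byte it came from.
matches = all(ord(ch) == byte for ch, byte in zip(text, all_bytes))
print(matches)
```

This is why converting existing ISO 8859-1 text to Unicode code points needs no lookup table.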
???????? ?????????? ???? ??
Xerox
?? Joe Becker ?????
Apple
?? Lee Collins and Mark Davis ????? ???????????????? ?????? ???? ??????????????? ???? ??????????????
[?]
??????? ?????????? Joe Becker ??? "international/multilingual text character encoding system, tentatively called Unicode." ????????? ?????????????
Unicode 88
??????????????? ?????????? ??
???
(bit) ??????? ?????? ??????????
Unicode is intended to address the need for a workable, reliable world text encoding. Unicode could be roughly described as "wide-body ASCII" that has been stretched to 16 bits to encompass the characters of all the world's living languages. In a properly engineered design, 16 bits per character are more than sufficient for this purpose.
??????????????? ??? ?? ??? (bit) ??????? ?????? ????????? ??? ???????????? ????????? ????????? ?????????????? ???????? ???????????? ?????????????
Unicode gives higher priority to ensuring utility for the future than to preserving past antiquities. Unicode aims in the first instance at the characters published in modern text (e.g. in the union of all newspapers and magazines printed in the world in 1988), whose number is undoubtedly far below 2^14 = 16,384. Beyond those modern-use characters, all others may be defined to be obsolete or rare; these are better candidates for private-use registration than for congesting the public list of generally-useful Unicodes.
?? ?????????????? ???????????? ??????????? ???????????? ???????????? ???????????? ??????????????????
?????????? ????????
applications
??????? ??????????? ?????????? ????????? ??????????????
??????????? ??????? ???????????? ???????????? ????????
??????
(script) ??????? ?????????
[good ?]
alphabets, abugidas and syllabaries ???????? ?????? ?? ????? ??? ?????? ???????????????????????? ????????? ??????????? ?????????????? ???? ?????? ?????????? ??? ?????? ?????????? ?????????????? ?????? ?????????? ???????? ??????????? ????????? ???????????? ??? ???? ???????????
Michael Everson
, Rick McGowan, and Ken Whistler ???????????? ???????? ??????? ??????? (The Unicode Roadmap Committee) ??? ??????? ?????? ????? ?????? ???????????? ???????
???????? ?????
(Unicode Consortium) ????????????????
???????? ???????
????? ??????????? ????? ????????????? ??????????? ??????? ?????????????? ????? ?????????? ????????? ??????????
???????? ?.? ???? ????????????? ????????????? Avestan, Egyptian Hieroglyphics, Tai Tham, Tai Viet, Imperial Aramaic, Inscriptional Pahlavi, Inscriptional Parthian, Javanese, Kaithi, Lisu, Meetei Mayek, Nu Shu, Old South Arabian, Old Turkic, Samaritan ???? Tangut ?????????? ????? ????????????? Yi, ???? Uyghur and Oracle Bone Script ???? ??????????
??????????????????? ???????????????????? ??????????? ???????? ????????????? ConScript Unicode Registry ???? Private Use Area ??????????? ??????????
?????????????
???????
Unicode Consortium
??? ???????? ????????? ?????????????? ???????? ????????? ??????? ??????? ??????? ????????? ????????? ?????????? ???????? ?????????? ??? ?????????????? ????????? ?????????????? ???? ?????????
software
?????
hardware
?????????????? ???????
Adobe Systems, Apple, Google, HP, IBM, Microsoft, Sun Microsystems ????? Yahoo
???? ??????????
???????????????????? ???????? ?? (
ISBN 0-321-18578-1
) ??? ???? ???? ??????????? ?????? ??????? ??????????????? ????????? ????? ???????? ?? ? (
ISBN 0-321-48091-0
) ???????? ???? ???? ????????????? ????????? ??????????? ???????? ?? ?.? ??? ???? ???? ? ???? ??????????????? ?????????????? ????????????
????????????? ??????? ????????? ???? ???????? ?? ?????????????? ???????????? ???????? ?? ?.? ???? ????????????? ???????? ?? ?.? ???? ????????????? Myanmar2? ???????? ?? ?.? ???? ?????????????????? ?????????????
Myanmar3
???????????
Version | Date | Book | Corresponding ISO/IEC 10646 Edition | Scripts | Characters (#) | Notable additions
1.0.0 | October 1991 | ISBN 0-201-56788-1 (Vol.1) | | 24 | 7,161 | Initial repertoire covers these scripts: Arabic, Armenian, Bengali, Bopomofo, Cyrillic, Devanagari, Georgian, Greek and Coptic, Gujarati, Gurmukhi, Hangul, Hebrew, Hiragana, Kannada, Katakana, Lao, Latin, Malayalam, Oriya, Tamil, Telugu, Thai, and Tibetan.[?]
1.0.1 | June 1992 | ISBN 0-201-60845-6 (Vol.2) | | 25 | 28,359 | The initial set of 20,902 CJK Unified Ideographs is defined.[?]
1.1 | June 1993 | | ISO/IEC 10646-1:1993 | 24 | 34,233 | 4,306 more Hangul syllables added to original set of 2,350 characters. Tibetan removed.[?]
2.0 | July 1996 | ISBN 0-201-48345-9 | ISO/IEC 10646-1:1993 plus Amendments 5, 6 and 7 | 25 | 38,950 | Original set of Hangul syllables removed, and a new set of 11,172 Hangul syllables added at a new location. Tibetan added back in a new location and with a different character repertoire. Surrogate character mechanism defined, and Plane 15 and Plane 16 Private Use Areas allocated.[?]
2.1 | May 1998 | | ISO/IEC 10646-1:1993 plus Amendments 5, 6 and 7, and two characters from Amendment 18 | 25 | 38,952 | Euro sign added.[?]
3.0 | September 1999 | ISBN 0-201-61633-5 | ISO/IEC 10646-1:2000 | 38 | 49,259 | Cherokee, Ethiopic, Khmer, Mongolian, Myanmar, Ogham, Runic alphabet, Sinhala, Syriac, Thaana, Unified Canadian Aboriginal Syllabics, and Yi Syllables added, as well as a set of Braille patterns.[?]
3.1 | March 2001 | | ISO/IEC 10646-1:2000, ISO/IEC 10646-2:2001 | 41 | 94,205 | Deseret, Gothic and Old Italic added, as well as sets of symbols for Western music and Byzantine music, and 42,711 additional CJK Unified Ideographs.[?]
3.2 | March 2002 | | ISO/IEC 10646-1:2000 plus Amendment 1, ISO/IEC 10646-2:2001 | 45 | 95,221 | Philippine scripts Buhid, Hanunoo, Tagalog, and Tagbanwa added.[??]
4.0 | April 2003 | ISBN 0-321-18578-1 | ISO/IEC 10646:2003 | 52 | 96,447 | Cypriot syllabary, Limbu, Linear B, Osmanya, Shavian, Tai Le, and Ugaritic added, as well as Hexagram symbols.[??]
4.1 | March 2005 | | ISO/IEC 10646:2003 plus Amendment 1 | 59 | 97,720 | Buginese, Glagolitic, Kharoshthi, New Tai Lue, Old Persian, Syloti Nagri, and Tifinagh added, and Coptic was disunified from Greek. Ancient Greek numbers and musical symbols were also added.[??]
5.0 | July 2006 | ISBN 0-321-48091-0 | ISO/IEC 10646:2003 plus Amendments 1 and 2, and four characters from Amendment 3 | 64 | 99,089 | Balinese, Cuneiform, N'Ko, Phags-pa, and Phoenician added.[??]
5.1 | April 2008 | | ISO/IEC 10646:2003 plus Amendments 1, 2, 3 and 4 | 75 | 100,713 | Carian, Cham, Kayah Li, Lepcha, Lycian, Lydian, Ol Chiki, Rejang, Saurashtra, Sundanese, and Vai added, as well as sets of symbols for the Phaistos Disc, Mahjong tiles, and Domino tiles. There were also important additions for Myanmar, additions of letters and scribal abbreviations used in medieval manuscripts, and the addition of capital ß.[??]
?????????? (mapping) ????? ???????????? (encoding)
?????????? ???????? ????????????? ?????????????? ???????????? ??????? ?????????? ?????????????? ?????????????? ?????? ?????????????????????
source code
???????????? ????? ??????? ???????????? ???????????? ????????? ?????????
Unicode Transformation Format (???????? ???????????????) ????? Universal Character Set (?????? ?????? ???????)
????????? ?????????? ?????????????
Unicode Transformation Format
(UTF) ???????????? ?????
Universal Character Set
(UCS) ???????????? ???? ???????? ???????????? ?????????? (encoding maps) ???? ????????
??????
?? ????????? ???????? ???? ??? ??? ????????????? (code value) ?? ??????????? ???????????? ???????? ????????? UTF ???????????? ???? ????????????? ?????? ??? (bit) ???? ???????? ??? ?????????? UCS ???????????? ???? ????????????? ?????? ????? (byte) ???? ???????? ??? ??????????
UTF ??????????????????? -
- UTF-1 ? ???????????????? UTF-8 ????, maximizes compatibility with ISO 2022, no longer part of The Unicode Standard
- UTF-7 ? ????????????? 7-bit ????????????
- UTF-8 ? 8-bit ????? ???????????????? ???????????? ????? ASCII ???? ???????????
?????? ????????
???????
- UTF-EBCDIC ? 8-bit ????? ???????????????? ???????????? ????? EBCDIC ????? ?????????? EBCDIC (?????????? ?????)
- UTF-16 ? 16-bit ????? ???????????????? ????????????
- UTF-32 ? 32-bit ????? ?????? ????????????
UTF-8 ??? ?????? ???????? ? ?? ? ????? ??? ??????? ?????????? ASCII ????? ????????????????? ???????? ????? ???????????? (
de facto
standard) ???????? ??????? ???????
Linux distribution
???????? ??? ???????????? ???? ??????????? ????????
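To make the variable-length scheme of UTF-8 more concrete, here is a minimal Python sketch of an encoder for a single code point. The function name `utf8_encode` is hypothetical and exists only for this illustration. In practice the built-in `str.encode("utf-8")` does the same job, and the sketch can be checked against it.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x80:
        # 0xxxxxxx: a single byte, identical to ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

# Compare the sketch with Python's own encoder for a few code points.
for cp in (0x41, 0xE9, 0x1000, 0x10380):
    print(f"U+{cp:04X}", utf8_encode(cp).hex(), utf8_encode(cp) == chr(cp).encode("utf-8"))
```

Because code points below U+0080 are emitted as a single unchanged byte, any pure ASCII file is already valid UTF-8.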
???????? (ready-made) ????? ????????? ??????
???????????? ?????? ????????????? ?????????????????? ????????? ?????????? ???????? ????????????? ??????????????? ?????????? ??????????????
????????? ???????????? ???????????
(combining diacritical marks) ?????????? ???????????????? ?????? ?????? ????????? ???????????? ???????? (????
?????? ????????
???? ??????????????)? ???????????? ?????????????? ??????????????????????? ????????????? ?????????? ??????? ???? ???? ????? ????????? ?????????? ????????????? ??????????? e ??? ???????? U+0065 (???????????? ???? e) ???? ???????? U+0301 (????????) ??????? ?????????? ??????????? U+00E9 ????? ?????????? ??????????????? ??????????? ?????????????? ????????????? ???????? ???????????? ?????????????? ????????? ????????????
?????? ???????
(canonical equivalence) ??????? ???????
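The equivalence between U+0065 followed by U+0301 and the ready-made U+00E9 can be demonstrated with Python's standard `unicodedata` module. This is only a sketch of the idea: the two strings differ when compared code point by code point, but become equal once both are brought into the same normalization form.

```python
import unicodedata

decomposed = "e\u0301"   # U+0065 followed by U+0301 (combining acute accent)
precomposed = "\u00e9"   # U+00E9, the ready-made character

# A plain comparison looks at code points, so the two strings differ.
print(decomposed == precomposed)                                # False

# NFC composes the sequence into the single ready-made character.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# NFD decomposes the ready-made character into base letter plus mark.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

Software that compares or searches text usually normalizes both sides first for this reason.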
??????????????? (ligatures)
?????? ??????????????? ????????? ?????????? ??????????? ???? ????????????? ??????? (orthographic rule) ??????? ???????????????? ??????? ??? ?????????? ??????-??????????? (special script-shaping) ??????? ??????????? ???????? ?????????????????? ?????????? ??????-??????????? ??????? ???? ??? DecoType ??????????
?????????
????? Arabic Calligraphic Engine (ACE) ?????
Adobe
?????
Microsoft
???? ??????????
OpenType
?????
SIL International
??
Graphite
????
Apple
??
AAT
???? ???? ?????????? ?????? ?????????? ???????
?????????????
??? Graphite ??????? ???????????
??????????????
Myanmar3
?????
?????????????
OpenType
??????? ?????????? ???????? ????????????
operating system
??? ???????????????? ??????? ?????????????? ????????????????? ????????? ????????? ???????????? ??????? ?????????? ????????? ??????? ??? ?????? ????????? ????????? ???? ?????? ???????? ????????? ??????? ??? ???????????????????? ??????????? ????????? ????????????? ?????? ??????? ???????? ????? ????????? ?????????????????? ?????????? ????? ???????????????? ?????????????????? ??????????? ????????????
?????? ????????
??????? ??? ???????????????????? ??????????????? ??????????????? ?????????????
???????? ??????????? ??????
(monospaced font) ?????? ?????????????????????? ???????????????? ????????? ?????????????? ???????????????
???????? ???????????
????????? ?????????????
??????????? ???????????? ???????????????? ????? ???????????? (storage) ???????????? UCS-2 ???? ????????????? UTF-16 ???? ????????????????? ????????????????
Windows NT
(?????????????
Windows 2000
,
Windows XP
and
Windows Vista
?????????) ???????? ???????? ?????? ??????????????? ?????????????????
Java
????
.NET
bytecode ?????
Mac OS X
????
KDE
???????? ???????? ??? ?????????? ???????????
UTF-8 ???????
Unix-like
operating system ???? ????? ???? ???????????? ???????????? ????????????????
extended ASCII
??? ???????? ???????????????? ???????????????
??????????? ??????? ????????????? (text-rendering engine) ????????? ?????????????
Microsoft Windows
????
Uniscribe
???????????????? Mac OS X ????
ATSUI
???????????????? GTK+ (
GNOME
desktop) ???
Pango
??????????????? ?????????????
???????
??????? ????????????? ??????????????????
?????????????
(operating system) ???????? ?????????????? ????????????? ?????????? ??????? ??????? ??????? ?????????? ???????? ??????????? ????? ??????? ???????? ????????????? ??????????????
ISO 14755
??? ???????? ?????????? ??????? ????????????????? ??????????? ???????????????? ??????????? ??????? ?? ????? ????? ???????? ??????? hex ??????? ???? ???? ??????????????? ??? ????????????????? ????????????????? ????????????? ???????????? ??????????????????
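The hex-based input method described above comes down to reading a hexadecimal code point and producing the matching character. A minimal Python sketch of that last step follows. The function name `char_from_hex` is hypothetical, and the key sequences that begin and end the input on a real system are not modelled here.

```python
def char_from_hex(hex_digits: str) -> str:
    """Turn the hexadecimal digits of a code point into the character itself."""
    code_point = int(hex_digits, 16)   # raises ValueError for non-hex input
    return chr(code_point)             # raises ValueError above U+10FFFF

print(char_from_hex("20AC"))   # the euro sign, U+20AC
print(char_from_hex("1000"))   # Myanmar letter Ka, U+1000
```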
???????? ??????????? ?????????
??????????????
(Han unification) (
????? ????
???? ?????? ???? ??????? ??????? ?????????????) ??? ????????? ???? ????????? ????????? ???????? ?????? ?????????????? ???????? ??????? ???????
Ideographic Rapporteur Group
????? ????????? ??????? ???????????????????? ????? ?????????????
??????????? ???? ????? ??????????
????????
?? ?????????????? ????????????? ???????????? ?????? ???????? ?????????? ???????????? ??????????? ????? ????????? ????? ????? ??????? ???????????????? ????????? ????????? ???????????????? ???????????????????? ????????????????? ?????? ?????? ????????? ??????????? ??????????????? ?????????????? ??? ??????????????? ????? ??????? ???????????? ?????????????? ???????? ????????????????? ???????? ??????????
TRON
?????
UTF-2000
????????????
???? ???????????? ???????????? ????????? ?? ??? ??????????????????? ??? ???????????? ????????? ?? ??? ??????????? ?????? ?????? ????????? ????? ???????? ?????????? ?????? ????????????????????
????????? ??????????
?????????????? ???????????? ????????? ?????????? ????????????????????? ????????? ??????????? ???? ??????????????? ???? ?????? ??????? ???? ???????? ??????? ???????????????? ? ??? ?????? ???????? ???????? ???????? e ??????? ??????????????? ?????????????????? ??????????? ??? ?????? ??????????????? ????????? ??????????????? ????????? ??????????????? ???????? ??????????????? ????????? ???????
?????? ????????
?????????????? ?????
Graphite
?
OpenType
?????????
AAT
??????? ?????????? ????????????????????? ??????????????????????