
ISO/IEC 8859-8

From Wikipedia, the free encyclopedia
ISO-8859-8: Latin/Hebrew
MIME / IANA: ISO-8859-8
Alias(es): iso-ir-138, hebrew, csISOLatinHebrew [1]
Language(s): Hebrew, English
Standard: ISO/IEC 8859-8, ECMA-121, SI 1311
Classification: extended ASCII, ISO 8859
Based on: DEC Hebrew (8-bit), ISO/IEC 8859-1
Other related encoding(s): Windows-1255

ISO/IEC 8859-8, Information technology - 8-bit single-byte coded graphic character sets - Part 8: Latin/Hebrew alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings. ISO/IEC 8859-8:1999 is its second and current edition; the first edition, ISO/IEC 8859-8:1988, was published in 1988. It is informally referred to as Latin/Hebrew. ISO/IEC 8859-8 covers all the Hebrew letters but no Hebrew vowel signs. IBM assigned code page 916 (CCSIDs 916 and 5012) to it. [2] [3] [4] This character set was also adopted, with some extensions, by Israeli Standard SI1311:2002.
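As a rough illustration of that coverage, the following Python sketch encodes a word written in plain Hebrew letters and then tries the same word with a vowel point added. It assumes Python's built-in `iso8859_8` codec as the implementation (the article itself names none), and `encodable` is a hypothetical helper introduced only for this example.

```python
# Sketch: Hebrew letters fit in ISO-8859-8, vowel points (niqqud) do not.
# Assumes Python's built-in "iso8859_8" codec; the helper name is illustrative.

def encodable(text: str, encoding: str = "iso8859_8") -> bool:
    """Return True if every character of `text` can be encoded."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

shalom = "\u05e9\u05dc\u05d5\u05dd"           # shin, lamed, vav, final mem
shalom_pointed = "\u05e9\u05b8" + shalom[1:]  # same word with a qamats point

encoded = shalom.encode("iso8859_8")
print(encoded.hex(" "))           # f9 ec e5 ed
print(encodable(shalom))          # True
print(encodable(shalom_pointed))  # False
```

Each letter occupies one byte in the E0-FA range, while the combining point U+05B8 has no code point in the set.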

ISO-8859-8 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The text is (usually) in logical order, so bidi processing is required for display. Nominally, ISO-8859-8 (code page 28598) is for "visual order" and ISO-8859-8-I (code page 38598) is for logical order. In practice, however, ISO-8859-8 usually also denotes logical-order text, and this is required for XML documents. [citation needed] The WHATWG Encoding Standard used by HTML5 treats ISO-8859-8 and ISO-8859-8-I as distinct encodings with the same mapping, because the label influences the layout direction; it notes that this no longer applies to ISO-8859-6 (Arabic), only to ISO-8859-8. [5]
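The difference between the two orders can be sketched for the simplest possible case: a single line made up only of Hebrew letters, where visual order is just the reverse of logical order. This is an assumption-laden toy, not a bidi implementation; text that mixes in digits, Latin letters or punctuation needs the full Unicode Bidirectional Algorithm. The function name is hypothetical, and Python's `iso8859_8` codec is assumed for the byte mapping.

```python
# Sketch only: relate visual order and logical order for one line that
# contains nothing but Hebrew letters and spaces. Mixed-direction text is
# deliberately rejected, because plain reversal would be wrong for it.

def visual_to_logical_pure_hebrew(line: str) -> str:
    """Turn a line stored left-to-right as displayed into reading order."""
    if not all("\u05d0" <= ch <= "\u05ea" or ch == " " for ch in line):
        raise ValueError("only Hebrew letters and spaces are supported")
    return line[::-1]

# The word "shalom" as a visual-order file would store it: the leftmost
# glyph on screen (final mem) comes first in the byte stream.
visual_bytes = b"\xed\xe5\xec\xf9"
visual_text = visual_bytes.decode("iso8859_8")  # byte mapping is the same in both orders
logical_text = visual_to_logical_pure_hebrew(visual_text)

print(logical_text == "\u05e9\u05dc\u05d5\u05dd")  # True
```

The byte-to-character mapping is identical in both cases; only the storage order of the characters differs, which is why the charset label matters for layout.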

There is also ISO-8859-8-E, which supposedly requires directionality to be explicitly specified with special control characters; this variant is in practice unused.

The Microsoft Windows code page for Hebrew, Windows-1255, is mostly an extension of ISO/IEC 8859-8 without C1 controls, except that it omits the double underscore and replaces the generic currency sign (¤) with the sheqel sign (₪). It adds support for vowel points as combining characters, and some additional punctuation.
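One way to see these differences is to decode every byte from A0 to FF with both encodings and list the positions where the results disagree. The sketch below assumes Python's built-in `iso8859_8` and `cp1255` codecs as stand-ins for the two standards; `decode_byte` is a hypothetical helper, and U+FFFD marks a byte that a codec leaves undefined.

```python
# Sketch: list the byte values in 0xA0-0xFF where Python's "iso8859_8" and
# "cp1255" codecs decode to different characters. Undefined bytes decode to
# U+FFFD because errors="replace" is used.

def decode_byte(value: int, encoding: str) -> str:
    """Decode a single byte value, substituting U+FFFD if it is undefined."""
    return bytes([value]).decode(encoding, errors="replace")

differences = {}
for value in range(0xA0, 0x100):
    iso = decode_byte(value, "iso8859_8")
    win = decode_byte(value, "cp1255")
    if iso != win:
        differences[value] = (iso, win)

for value, (iso, win) in differences.items():
    print(f"0x{value:02X}: ISO-8859-8 {iso!r}  Windows-1255 {win!r}")
```

The range starts at A0 because the 80-9F block holds C1 controls in one encoding and printable characters in the other, so every position there differs.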

Over a decade after the publication of that standard, Unicode is preferred, at least for the Internet [6] (meaning UTF-8, the dominant encoding for web pages). ISO-8859-8 is used by less than 0.1% of websites. [7]

Code page layout

ISO/IEC 8859-8 [8] [9] [10] [11]
     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0x
1x
2x   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3x   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~
8x
9x
Ax   NBSP      ¢    £    ¤    ¥    ¦    §    ¨    ©    ×    «    ¬    SHY  ®    ¯
Bx   °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    ÷    »    ¼    ½    ¾
Cx
Dx                                                                              ‗
Ex   א    ב    ג    ד    ה    ו    ז    ח    ט    י    ך    כ    ל    ם    מ    ן
Fx   נ    ס    ע    ף    פ    ץ    צ    ק    ר    ש    ת              LRM  RLM
   Different from both DEC Hebrew (8-bit) and ISO-8859-1.

FD is the left-to-right mark (U+200E) and FE is the right-to-left mark (U+200F), as specified in the newer edition, ISO/IEC 8859-8:1999.
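Because the upper half of the layout is quite regular, it can be rebuilt from a few rules. The following Python sketch does so; `build_high_half` is a hypothetical helper, and the rules it encodes (A0-BE follow ISO-8859-1 apart from three cells, the letters sit contiguously at E0-FA) are a reading of the table above, not text from the standard.

```python
# Sketch: rebuild the 0xA0-0xFF half of the ISO/IEC 8859-8:1999 layout from
# its regular structure. Hypothetical helper, written for illustration.

def build_high_half() -> dict[int, str]:
    table: dict[int, str] = {}
    # 0xA0-0xBE carry the same characters as ISO-8859-1, except three cells.
    for value in range(0xA0, 0xBF):
        table[value] = chr(value)
    del table[0xA1]           # unassigned in ISO/IEC 8859-8
    table[0xAA] = "\u00d7"    # multiplication sign
    table[0xBA] = "\u00f7"    # division sign
    table[0xDF] = "\u2017"    # double low line
    # 0xE0-0xFA: the 27 Hebrew letters alef..tav, contiguous in Unicode.
    for offset in range(27):
        table[0xE0 + offset] = chr(0x05D0 + offset)
    table[0xFD] = "\u200e"    # left-to-right mark
    table[0xFE] = "\u200f"    # right-to-left mark
    return table

high_half = build_high_half()
print(len(high_half))                 # 60
print(f"U+{ord(high_half[0xFA]):04X}")  # U+05EA
```

Every byte value in A0-FF that is missing from the returned dictionary is unassigned in the 1999 edition.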

2002 Israeli Standard extensions

Israeli Standard SI1311:2002 matches ISO/IEC 8859-8:1999 except for a number of additional character allocations for the euro sign, new shekel sign and more advanced explicit bidirectional formatting. [12]

SI1311:2002 [12]
0 1 2 3 4 5 6 7 8 9 A B C D E F
Dx € ₪ LRO RLO PDF ‗
Ex א ב ג ד ה ו ז ח ט י ך כ ל ם מ ן
Fx נ ס ע ף פ ץ צ ק ר ש ת LRE RLE LRM RLM
   Absent from ISO/IEC 8859-8:1999, added in SI1311:2002: €, ₪, LRO, RLO, PDF, LRE, RLE.

References

  1. Character Sets, Internet Assigned Numbers Authority (IANA), 2018-12-12
  2. "Code page 916 information document". Archived from the original on 2017-02-16.
  3. "CCSID 916 information document". Archived from the original on 2014-11-29.
  4. "CCSID 5012 information document". Archived from the original on 2016-03-27.
  5. van Kesteren, Anne. "9. Legacy single-byte encodings". Encoding Standard. WHATWG. Note: ISO-8859-8 and ISO-8859-8-I are distinct encoding names, because ISO-8859-8 has influence on the layout direction. And although historically this might have been the case for ISO-8859-6 and "ISO-8859-6-I" as well, that is no longer true.
  6. John, Nicholas A. (2013). "The Construction of the Multilingual Internet: Unicode, Hebrew, and Globalization". Journal of Computer-Mediated Communication. 18 (3): 321–338. doi:10.1111/jcc4.12015. ISSN 1083-6101. Background: the problem of Hebrew and the Internet
  7. "Usage Statistics of ISO-8859-8 for Websites, January 2019". w3techs.com. Retrieved 2019-01-17.
  8. Code Page CPGID 00916 (PDF), IBM
  9. Code Page CPGID 00916 (txt), IBM
  10. International Components for Unicode (ICU), ibm-916_P100-1995.ucm, 2002-12-03
  11. International Components for Unicode (ICU), ibm-5012_P100-1999.ucm, 2002-12-03
  12. Standards Institution of Israel. ISO-IR-234: Latin/Hebrew character set for 8-bit codes (PDF). ITSCJ/IPSJ.
