Urdu/Other encodings

From the LDC Language Resource Wiki

Latest revision as of 07:15, 14 May 2010


The following is based on an LDC analysis done in 2005.

Microsoft

Urdu may be encoded using the Microsoft encoding for Arabic, which is Code Page 1256. This encoding can be converted to UTF-8 using either of the GNU programs iconv or recode. The necessary commands are:

iconv -f CP1256 -t UTF-8 < InputFileName > OutputFileName
recode CP1256..UTF-8 < InputFileName > OutputFileName
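
Where neither iconv nor recode is installed, the same conversion can be approximated in Python, whose standard library includes a cp1256 codec. The following is only a minimal sketch: the function name convert_cp1256_to_utf8 and the file names are hypothetical, and it assumes the input really is Code Page 1256 text.

```python
def convert_cp1256_to_utf8(src_path, dst_path):
    """Re-encode a text file from Code Page 1256 to UTF-8.

    Hypothetical helper for illustration; not part of the LDC analysis.
    """
    # newline="" keeps the original line endings unchanged.
    with open(src_path, "r", encoding="cp1256", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


if __name__ == "__main__":
    import sys

    # Usage: python convert.py InputFileName OutputFileName
    if len(sys.argv) == 3:
        convert_cp1256_to_utf8(sys.argv[1], sys.argv[2])
```

Reading line by line keeps memory use low for large corpus files; a byte sequence that is not valid in the source encoding would raise UnicodeDecodeError rather than being silently dropped.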

IBM

There are three IBM encodings for Urdu: IBM CP868, IBM CP918, and IBM CP1006. Java supports all three:

Code page   IBM description            Java description
868         Urdu - Personal Computer   MS-DOS Pakistan
918         Urdu Bilingual             IBM Pakistan (Urdu)
1006        Urdu, 8-Bit                IBM AIX Pakistan (Urdu)
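
Support for these code pages outside Java varies from one runtime to another, so it can be worth probing before relying on a particular name. The following sketch does this for Python; the function name known_codecs and the spellings "cp868", "cp918", and "cp1006" are assumptions made for illustration, and a given Python installation may not recognize all of them.

```python
import codecs


def known_codecs(names):
    """Map each codec name to True if this Python runtime can look it up."""
    result = {}
    for name in names:
        try:
            codecs.lookup(name)
            result[name] = True
        except LookupError:
            # The runtime has no codec registered under this name.
            result[name] = False
    return result


if __name__ == "__main__":
    # Assumed spellings of the code pages discussed in this page.
    for name, ok in known_codecs(["cp868", "cp918", "cp1006", "cp1256"]).items():
        print(name, "available" if ok else "not available")
```

A name reported as not available would need a third-party codec or a conversion through another tool before the text can be decoded.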

Other encodings

Two other encodings have been proposed, but we do not know whether they are in use:

  • Urdu Zabta Takhti (UZT), an 8-bit encoding proposed by the Urdu Standards Committee, which appears to be authorized by the Government of Pakistan. See Hussain, Sarmad, & M. Afzal (2001): "Urdu Computing Standards: Urdu Zabta Takhti (UZT) 1.01." IEEE INMIC 2001. PDF from CRULP (http://www.crulp.org/Publication/papers/2001/uzt1.01.pdf) or IEEE (http://ieeexplore.ieee.org/stampPDF/getPDF.jsp?tp=&arnumber=995341).
  • Perso-Arabic Standard for Computer Information Interchange (PASCII). This is the Indian government standard for the scheduled languages written in Arabic-based writing systems, and the counterpart to ISCII. Although ISCII was originally intended to include the languages written in Perso-Arabic writing systems, this was never implemented. There is no systematic correspondence between the ISCII and PASCII encodings of the alphabets. PASCII agrees with ISCII in the encoding of characters outside the Arabic alphabet.