This language recognition chart presents a variety of clues one can use to help determine the language in which a text is written.
The language of a foreign text can often be identified by looking up characters specific to that language.
- ABCDEFGHIJKLMNOPQRSTUVWXYZ (Latin alphabet)
- and no other - English, Indonesian, Latin, Malay, Swahili, Zulu
- àäèéëïĳöü - Dutch (Except for the ligature ĳ, these letters are very rare in Dutch. Even fairly long Dutch texts often have no diacritics.)
- áéèêëíîïóôúû - Afrikaans
- eouu ?
West Frisian
- ÆØÅæøå - Danish, Norwegian
- single diacritics, mostly umlauts
- ÄÖäö - Finnish (BCDFGQWXZÅbcfgqwxzå are found only in names and loanwords, occasionally also ŠšŽž)
- ÅÄÖåäö - Swedish (occasionally é)
- ÄÖÕÜäöõü - Estonian (BCDFGQWXYZcfqwxyz are found only in names and loanwords, occasionally also ŠšŽž)
- ÄÖÜẞäöüß - German
- Circumflexes
- ÇÊÎŞÛçêîşû - Kurdish
- ĂÂÎȘȚăâîșț - Romanian
- AEIOU??AEIIaeiou??aeii ?
Welsh
; (OU?YAEIOU??AEOU??ou?yaeiou??aeou?y used also but much less commonly)
- ĈĜĤĴŜŬĉĝĥĵŝŭ - Esperanto
- Three or more types of diacritics
- ÇĞİÖŞÜçğıöşü - Turkish
- ÁÐÉÍÓÚÝÞÆÖáðéíóúýþæö - Icelandic
- ÁÐÍÓÚÝÆØáðíóúýæø - Faroese
- ÁÉÍÓÖŐÚÜŰáéíóöőúüű - Hungarian
- ÀÇÈÉÍÓÒÚÜÏàçèéíóòúüï· - Catalan
- ÀÂÆÇÈÉÊËÎÏÔŒÙÛÜŸàâæçèéêëîïôœùûüÿ - French (Ÿ and ÿ are found only in certain proper names)
- AACEEIOOUEUIaaceeiooueui (· only in Gascon dialect) ?
Occitan
- ÁÉÍÓÚÂÊÔÀãõçáéíóúâêôà (ü Brazilian and k, w and y not in native words) - Portuguese
- ÁÉÍÑÓÚÜáéíñóúü ¡¿ - Spanish
- ÀÉÈÌÒÙàéèìòù - Italian
- ÁÉÍÓÚÝÃẼĨÕŨỸÑG̃áéíóúýãẽĩõũỹñg̃ - Guarani (the only language to use g̃)
- A???E???I???Ł? a???e???i???ł? (FQRVfqrv not in native words) ?
Southern Athabaskan languages
- ąłńóż - Lechitic languages
- A, Ą, Ã, B, C, D, E, É, Ë, F, G, H, I, J, K, L, Ł, M, N, Ń, O, Ò, Ó, Ô, P, R, S, T, U, Ù, W, Y, Z, Ż - Kashubian
- ???
- đA?AA???????A?????E??E?E?????I??I?O?OO?O???????????U??U??????????Y? đa?aa???????a?????e??e?e?????i??i?o?oo???????????u??u??????????y???
Vietnamese
- ? ? ? ? ? ? May be seen in some
Japanese texts in Rōmaji or transcriptions (see below) or Hawaiian and Māori texts.
- é - Sundanese
- ñ - Basque
- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Arabic script
- Brahmic family
of scripts
- Bengali script
- ? ? ?? ?? ?? ? ?? ? ?? ? ?? ? ?? ? ?? ? ?? ? ?? ?? ???? ?? ?? ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ?? ?? ? ? ? ? ? ? ? ? ? ?
- used to write
Bengali
and
Assamese
.
- Devanāgarī
- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ?? ?? ?? ?? ??
- used to write, either along with other scripts or exclusively, several Indian languages including Sanskrit, Hindi, Maithili, Magahi, Marathi, Kashmiri, Sindhi, Bhili, Konkani, Bhojpuri and Nepali from Nepal.
- Gurmukhi
- Gujarati script
- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ???
- used to write
Gujarati
and
Kachchi
- Tibetan script
- АБВГДЕЖЗИКЛМНОПРСТУФХЦЧШ (
Cyrillic alphabet
)
- ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ αβγδεζηθικλμνξοπρσςτυφχψω (Greek alphabet) - Greek
- ?????????????????????? (
Hebrew alphabet
)
- and maybe some odd dots and lines above, below, or inside characters ?
Hebrew
- ??; dots/lines below letters appearing
only
with ?,?, and ? ?
Yiddish
- no dots or lines around the letters, and more than a few words end with ? (i.e., they have it at the leftmost position) ?
Aramaic
- Ladino
- 漢字文化圈 ? Some East Asian Languages
- 위키百科에 (note commonplace ellipses and circles)
Korean
- ??????????? etc. -- ??????????? (
Bopomofo
)
- ????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? ?????????? (
Khmer alphabet
) -
Khmer
- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (
Armenian alphabet
) ?
Armenian
- ? ? ?? ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (
Georgian alphabet
) ?
Georgian
- ?????????????????????????????????????????????????????????????????????? (
Thai script
) -
Thai
- AEIOUHKLMNPW'
Hawaiian alphabet
- Hawaiian
- ??????????????????????????????????????????????????????
Tifinagh
, a script used for
Tamazight
(Berber)
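The character clues in the chart above lend themselves to a simple lookup. The following is a minimal sketch, not a complete detector: the table `DISTINCTIVE_CHARS` and the function `candidate_languages` are hypothetical names, and the table holds only a small subset of the chart, chosen for illustration.

```python
from collections import Counter

# Illustrative subset of the chart: character -> languages it points to.
# Both the name and the selection of entries are assumptions for this sketch.
DISTINCTIVE_CHARS = {
    "ß": ["German"],
    "¿": ["Spanish"],
    "¡": ["Spanish"],
    "þ": ["Icelandic"],
    "ð": ["Icelandic", "Faroese"],
    "ø": ["Danish", "Norwegian", "Faroese"],
    "ı": ["Turkish"],
}


def candidate_languages(text):
    """Return languages suggested by distinctive characters, best match first."""
    votes = Counter()
    # Each distinct character votes once for every language it is typical of.
    for ch in set(text.lower()):
        for lang in DISTINCTIVE_CHARS.get(ch, []):
            votes[lang] += 1
    return [lang for lang, _ in votes.most_common()]
```

A text with no distinctive characters yields an empty list, which is the expected outcome for the languages at the top of the chart that use only the unadorned Latin alphabet.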
Lots of Latin roots.
French
- Accented letters: â ç è é ê î ô û, rarely ë ï; ù only in the word où, à only at the ends of a few words (including à). Never á í ì ó ò ú.
- Angle quotation marks: « » (though "curly-Q" quotation marks are also used); dialogue traditionally indicated by means of dashes.
- Common short words: la, le, les, un, une, des, de, du, à, au, et, ou, où, sur, il, elle, ils, se, je, vous, que, qui, y, en, si, ne, est, sont, a, ont.
- Many apostrophised contractions for common pronouns and particles, i.e. words beginning with l' or d', less often c', j', m', n', s', t', or rarely z'; these occur only before a word starting with a vowel or, in some cases, an h.
- Common digraphs and trigraphs:
- Vowel digraphs: au, ai, ei, ou. Word-final -ez.
- Vowel digraphs (nasals): an, en, in, on, rarely un. For all of these, the n becomes m before b, p or m (e.g. embouchure, never *enbouchure).
- Vowel trigraphs: eau, ein, ain, oin.
- Consonant digraphs: ch, gu-. Rarely sh. Semi-consonant -ill-.
- Letters w and k are rare and used only in loanwords, most often from Germanic languages (e.g. whisky).
- Ligatures œ and æ are conventional but are rarely used (a few words are well known, e.g. œil, œuf(s), bœuf(s); most others are scientific/technical and borrowed from Latin).
- Words ending in -aux, -eux, or -oux.
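The French clues above (common short words and apostrophised contractions) can be counted mechanically. This is a rough sketch under stated assumptions: `FRENCH_WORDS`, `ELISION` and `french_clues` are hypothetical names, and the counts are raw evidence rather than a calibrated score.

```python
import re

# Common short words taken from the list above.
FRENCH_WORDS = {
    "la", "le", "les", "un", "une", "des", "de", "du", "à", "au", "et",
    "ou", "où", "sur", "il", "elle", "ils", "se", "je", "vous", "que",
    "qui", "y", "en", "si", "ne", "est", "sont", "a", "ont",
}

# A single-letter particle, an apostrophe (straight or curly), then a letter.
# The \b keeps it from firing inside words such as "aujourd'hui".
ELISION = re.compile(r"\b[cdjlmnst]['’]\w", re.IGNORECASE)


def french_clues(text):
    """Count common French words and elided particles in a text."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return {
        "words": len(words),
        "common_words": sum(w in FRENCH_WORDS for w in words),
        "elisions": len(ELISION.findall(text)),
    }
```

For example, `french_clues("L'homme est dans la maison et il n'a pas d'argent.")` reports three elisions and five common words out of thirteen tokens.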
Spanish
- Characters: ¿ ¡ (inverted question and exclamation marks), ñ
- All vowels (a, e, i, o, u) may take an acute accent
- The letter u can take a diaeresis (ü), but only after the letter g
- Some words frequently used: de, el, del, los, la(s), uno(s), una(s), y
- No apostrophised contractions
- No use of grave accent
- Letters k and w are rare and only used in loanwords (e.g. walkman)
- Word beginnings: ll- (check not Welsh or Catalan); double L (ll)
- Word endings: -o, -a, -ción, -miento, -dad
- Angle quotation marks: « » (though "curly-Q" quotation marks are also used); dialogue often indicated by means of dashes
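Several of the Spanish clues above are simple yes/no checks. The sketch below, with the hypothetical function name `spanish_clues`, tests for inverted punctuation, ñ, the absence of grave accents, and the rule that ü appears only after g. It reports the observations and leaves the final judgement to the reader.

```python
import re


def spanish_clues(text):
    """Report which of the Spanish orthographic clues hold for a text."""
    lowered = text.lower()
    # Capture the character (if any) immediately before each ü.
    before_diaeresis = re.findall(r"(.?)ü", lowered)
    return {
        "inverted_punctuation": any(c in text for c in "¿¡"),
        "has_enye": "ñ" in lowered,
        "grave_accent": any(c in lowered for c in "àèìòù"),
        # Vacuously true when the text contains no ü at all.
        "diaeresis_only_after_g": all(c == "g" for c in before_diaeresis),
    }
```

A grave accent or a ü after a letter other than g counts against Spanish, which is why both are reported separately rather than folded into one score.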
Italian
- Almost every native word ends in a vowel. Example exceptions include non, il, per, con, del.
- Common one-letter word: e.
- Common word: perché.
- Letter sequences: gli, gn, sci.
- Letters j, k, w, x and y are rare and used only in loanwords (e.g. whisky).
- Word endings: -o, -a, -zione, -mento, -tà, -aggio.
- Grave accent (e.g., on à) almost always occurs in the last letter of words.
- Double consonants (tt, zz, cc, ss, bb, pp, ll, etc.) are frequent.
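Two of the Italian clues above are easy to measure: the share of words ending in a vowel and the number of doubled consonants. The following is a minimal sketch with a hypothetical function name, `italian_clues`; what counts as a "high" ratio is left open, since the chart gives no threshold.

```python
import re

# Plain vowels plus the accented forms that can end an Italian word.
VOWELS = set("aeiouàèéìòù")


def italian_clues(text):
    """Measure vowel-final words and doubled consonants in a text."""
    lowered = text.lower()
    words = re.findall(r"[^\W\d_]+", lowered)
    vowel_final = sum(w[-1] in VOWELS for w in words)
    # A consonant immediately followed by itself (tt, zz, ll, ...).
    doubles = re.findall(r"([b-df-hj-np-tv-z])\1", lowered)
    return {
        "vowel_final_ratio": vowel_final / len(words) if words else 0.0,
        "double_consonants": len(doubles),
    }
```

Short function words such as il, per and con pull the ratio below 1.0 even in genuine Italian text, so the ratio is best read alongside the other clues.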
Catalan
- Characters: à, é, è, í, ï, ó, ò, ú, ü, ç, ·
- Character combination tz (also common in Basque, however) and l·l
- Syllables and words ending in -aig, -eig, -oig, -uig, -aix, -eix, -oix, -uix
- Letter sequences: tx (also common in Basque, however) and tg
- Letter y is only used in the combination ny and loanwords
- Letters k and w are rare and only used in loanwords (e.g. walkman)
- Word endings: -o, -a, -es, -ció, -tat, -ment
- Word beginning: ll- (also common in Spanish and Welsh, however)
- Common words: això, amb, mateix, tots, que
Romanian
- Characters: ă â î ș ț
- Common words: și, de, la, a, ai, ale, alor, cu
- Word endings: -a, -ă, -u, -ul, -ului, -ție (or -țiune), -ment, -tate; names ending in -escu
- Double and triple i: copii, copiii
- Note that Romanian is sometimes written online with no diacritics, making it harder to identify. A cedilla is sometimes used on S (ş) and on T (ţ) instead of the correct diacritic, the comma (as shown above).
Portuguese
- Characters: ã, õ, â, ê, ô, á, é, í, ó, ú, à, ç
- Common one-letter words: a, à, e, é, o
- Common two-letter words: ao, as, às, da, de, do, em, os, ou, um
- Common three-letter words: aos, com, das, dos, ele, ela, mas, não, por, que, são, uma
- Common endings: -ção, -dade, -ismo, -mente
- Common digraphs: ch, nh, lh; examples: chave, galinha, baralho.
- The letters k, w and y are rare. They are found mostly in loanwords, e.g.: keynesianismo, walkie-talkie, nylon.
- Most singular words end in a vowel, l, m, r, or z.
- Plural words end in -s.
- Characters: a, e, e, e, i, o, u
- Common digraphs and trigraphs: ai, ae, en, -jh-, tch, oe, -nn-, -nnm-, xh, ou
- Common one-letter words: a, a, e, i, t', l', s', k'
- Common two-letter words: al, as, li, el, vs, ki, si, po, pa, po, ni, en, dj'
- Common three-letter words: dji, nen, ren, ben, pol, mel
- Common endings: -aedje, -mint, -xhmint, -es, -ou, -owe, -yi, -acion
- Apostrophes are followed by a space (preferably a non-breaking one), e.g. l' ome instead of l'ome.
Galician
- Similar to Portuguese; the indefinite article "unha" (fem. sing.), the suffix -ción and a heavier usage of the letter "x" usually signal Galician.
- Definite articles o (masc. sing.), os (masc. plural), a (fem. sing.), as (fem. plural)
- Common digraphs: nh (ningunha)
- The letters j, k, w and y are not in the alphabet, and appear only in loanwords
English
- words: a, an, and, in, of, on, the, that, to, is, what, I (I is always capital when talking about oneself)
- letter sequences: th, ch, sh, ough, augh, qu
- word endings: -ing, -tion, -ed, -age, -s, -’s, -’ve, -n’t, -’d
- vast majority of words end with a consonant, or sometimes with an e. Some common exceptions: who, to, so, no, do, a, and a few names like Julia.
- diacritics or accents only in loanwords (piñata)
Dutch
- letter sequences ij (capitalized as IJ, and also found as a ligature, Ĳ or ĳ), ei, ou, au, oe, doubled vowels (but not ii), kw, ch, sch, oei, ooi, aai and uw (especially eeuw, ieuw, auw, and ouw).
- all consonants, except h, j, q, v, w, x and z, can be doubled.
- the letters c (except in the sequence (s)ch), q, x and y are almost only found in loanwords.
- words: het, op, en, een, voor (and compounds of voor).
- word endings: -tje, -sje, -ing, -en, -lijk
- at the start of words: z-, v-, ge-
- t/m occasionally occurs between two points in time or between numbers (e.g. house numbers).
West Frisian
- letter sequences: ij, ei, oa
- words: yn
Afrikaans
- Words: 'n, as, vir, nie.
- Similar to Dutch, but:
- the common Dutch letters c and z are rare and used only in loanwords (e.g. chalet);
- the common Dutch vowel ij is not used; instead, i and y are used (e.g. -lik, sy);
- the common Dutch word ending -en is rare, being replaced by -e.
German
- umlauts (ä, ö, ü), ess-zett (ß)
- letter sequences: ch, sch, tsch, tz, ss
- common words: der, die, das, den, dem, des, er, sie, es, ist, ich, du, aber
- common endings: -en, -er, -ern, -st, -ung, -chen, -tät
- rare letters: x, y (except in loanwords)
- letter c rarely used except in the sequences listed above and in loanwords
- long compound words
- a period (.) after ordinal numbers, e.g. 3. Oktober
- many capitalised words in the middle of sentences, since German capitalizes all nouns.
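Two of the German clues above, the period after ordinal numbers and capitalised words in mid-sentence, can be picked out with regular expressions. The sketch below uses hypothetical names (`ORDINAL`, `MID_CAPITAL`, `german_clues`) and assumes ordinary prose with single spaces between words; it is an illustration, not a reliable classifier.

```python
import re

# One or two digits, a period, then a capitalised word, as in "3. Oktober".
ORDINAL = re.compile(r"\b\d{1,2}\.\s+[A-ZÄÖÜ][a-zäöüß]+")

# A capitalised word preceded by a lowercase letter (or comma/semicolon) and
# a space, so that sentence-initial words are not counted.
MID_CAPITAL = re.compile(r"(?<=[a-zäöüß,;] )[A-ZÄÖÜ][a-zäöüß]+")


def german_clues(text):
    """Collect ordinal-number patterns and mid-sentence capitalised words."""
    return {
        "ordinals": ORDINAL.findall(text),
        "mid_sentence_capitals": MID_CAPITAL.findall(text),
    }
```

Proper names also show up among the mid-sentence capitals in any language, so the useful signal is how many such words appear relative to the length of the text.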
Swedish
- letters å, ä, ö, rarely é
- common words: och, i, att, det, en, som, är, av, den, på, om, inte, men
- common endings: -ning, -lig, -isk, -ande, -ade, -era, -rna
- common surname endings: -sson, -berg, -borg, -gren, -lund, -lind, -ström, -kvist/qvist/quist
- long compound words
- letter sequences: stj, sj, skj, tj, ck, än
- no use of characters w, z except for foreign proper nouns and some loanwords, but x is used, unlike Danish and Norwegian, which replace it with ks
- doubling of consonants common, but doubling of vowels very rare
Danish
- letters æ, ø, å
- common words: af, og, til, er, på, med, det, den;
- common endings: -tion, -ing, -else, -hed;
- long compound words;
- no use of characters q, w, x and z except for foreign proper nouns and some loanwords;
- to distinguish from Norwegian: uses letter combination øj; frequent use of æ; spellings of borrowed foreign words are retained (in particular use of c), such as centralstation.
- doubling of consonants common, but doubling of vowels very rare
Norwegian
- letters æ, ø, å
- common words: av, ble, er, og, en, et, men, i, å, for, eller;
- common endings: -sjon, -ing, -else, -het;
- long compound words;
- no use of characters c, w, z and x except for foreign proper nouns and some loanwords;
- two versions of the language: Bokmål (much closer to Danish) and Nynorsk; for example ikke, lørdag, Norge (Bokmål) vs. ikkje, laurdag, Noreg (Nynorsk); Nynorsk uses the word og; printed materials almost always published in Bokmål only;
- to distinguish from Danish: uses letter combination øy; less frequent use of æ; spellings of borrowed foreign words are ‘Norsified’ (in particular removing use of c), such as sentralstasjon.
- doubling of consonants common, but doubling of vowels very rare
Icelandic
- letters á, ð, é, í, ó, ú, ý, þ, æ, ö
- common beginnings: fj-, gj-, hj-, hl-, hr-, hv-, kj-, and sj-
- common endings: -ar (especially -nar), -ir (especially -nir), -ur, -nn (especially -inn)
- no use of characters c, q, w, or z except for foreign proper nouns, some loanwords, and, in the case of z, older texts.
- doubling of consonants common, but doubling of vowels very rare
Faroese
- letters á, ð, í, ó, ú, ý, æ, ø
- letter combinations: ggj, oy, skt
- to distinguish from Icelandic: does not use é or þ, uses ø instead of ö (occasionally rendered as ö on road signs, or even ő).
- doubling of consonants common, but doubling of vowels very rare
- uses
diacritics
: ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?
- does not have letters: q, w, x, y
- no longer uses ? or ? in modern language
- extremely rare doubling of
vowels
- rare doubling of
consonants
- a period (.) after ordinal numbers, e.g.
2005. gads
- common words:
ir
,
bija
,
tika
,
es
,
vi??
- visual abundance of letters ?, ?, ?, ?, ?, ?, ?, ?, ?
- does not have letters q, w, x
- extremely rare doubling of
vowels
and
consonants
- many varying forms (usually endings) of the same word, e.g. namas, namo, namus, namams, etc.
- generally long words (absence of articles and fewer prepositions in comparison to Germanic languages)
- common words:
ir
,
yra
,
kad
,
bet
.
- consonant clusters
rz, sz, cz, prz, trz
- includes: ?, ?, ?, ?, ł, ?, o, ?, ?
- words
w, z, we, i, na
(several one-letter words)
- words
jest, si?
- words beginning with
był, b?dzie, jest
(forms of
copula
by?
, "to be").
- visual abundance of letters
? ? ? ? ?
- words
je, v
- to distinguish from Slovak: does not use a, ?, ?, ? or o; u only appears at the beginning of words.
- visual abundance of letters
? ? ?
;
- uses: a, ?, and o and (very rarely) ? and ?;
- typical suffixes:
-cia
,
-?
;
- to distinguish from Czech: does not use ?, ? or ?.
- similar to Serbian
- letters-digraphs
d?, lj, nj
- does not have q, w, x, y
- typical suffixes:
-ti
,
-?i
- special letters: ?, ?, ?, ?, đ
- common words: a, i, u, je
- to distinguish from Serbian: sequences
-ije-
and
-je-
are common; verbs ending in
-irati
,
-iran
- similar to Croatian
- letters-digraphs
d?, lj, nj
(lj and nj are somewhat more common than d?, although not by much)
- no q, w, x, y
- typical verb suffixes
-ti
,
-?i
(infinitive is much less used than in Croatian)
- foreign words might end in
-tija
,
-ovan
,
-ovati
,
-uje
- special letters: đ (rare), ?, ? (common), ?, ? (less common)
- common words: a, i, u, je, jeste
- future tense
suffix
-i?e
,
-i?e?
,
-i?emo
,
-i?ete
(not found in Croatian)
- vowel sequences -ije- and -je- are very common in Serbian as spoken in Bosnia and Herzegovina, Montenegro and Croatia (ijekavica), but they do not appear in Serbia, where each of those sequences is replaced with -e- (ekavica).
- uses ?, ?, ?, ?, ?, ?
- does not use Щ, Ъ, Ы, Ь, Э, Ю, Я, Ё, ?, ?, ?, ?, ?
- to distinguish from Macedonian: does not use ?, ?, ?
- letters
?, ?
used in Welsh
- words
y, yr, yn, a, ac, i, o
- letter sequences
wy, ch, dd, ff, ll, mh, ngh, nh, ph, rh, th, si
- letters not used:
k, q, v, x, z
- letter only used rarely, in loanwords:
j
- commonly accented letters:
a, e, i, o, u, ?, ?
, although acute (
´
), grave (
`
), and dieresis (
¨
) accents can hypothetically occur on all vowels
- word endings:
-ion, -au, -wr, -wyr
- y
is the most common letter in the language
- w
between consonants (
w
in fact represents a vowel in the Welsh language)
- circumflex accent (
^
) is by far the commonest diacritical mark, although diacritics are often omitted altogether
Irish
- vowels with acute accents: á é í ó ú
- words beginning with letter sequences bp dt gc bhf
- letter sequences sc cht
- no use of the letters J, K, Q, V, W.
- frequent bh, ch, dh, fh, gh, mh, th, sh
- to distinguish from (Scottish) Gaelic: there may be words or names with the second (or even third) letter capitalized instead of the first: hÉireann.
Scottish Gaelic
- vowels with grave accents: à è ì ò ù (é and ó still occasionally seen but usage is now discouraged)
- letter sequences sg chd
- frequent bh, ch, dh, fh, gh, mh, th, sh
- to distinguish from Irish: prefixes are hyphenated, so capitals in the middle of words generally do not occur: an t-Òban.
Albanian
- unique letters: ë, ç.
- ë is the most common letter in the language.
- the letter
w
is not used except in loanwords.
- dh
,
gj
,
ll
,
nj
,
rr
,
sh
,
th
,
xh
, and
zh
are considered one letter instead of two.
- common words: po, jo, dhe, i, te, me
- unique letters: ?, ?, ħ, għ, ħ, ?
- semitic origin, fairly intelligible with Arabic
- uses il-xxx for the definite article
- uses circumflex ( ^ ): ê, î, û and cedilla ( ¸ ): ç, ş
- the word
xwe
(oneself, myself, yourself etc.) appears frequently and is highly specific (
xw
combination)
- ( I, i ) is the most common letter in the language
- uses eight vowels (a, e, e, i, i, o, u, u)
- impossible to find a word without any vowel
- has lots of compound words
- distinct letters
a
,
a
and
o
; but never
o
or
u
(
y
takes the place of
u
)
- b
,
f
,
z
,
?
and
?
appear in
loanwords
and
proper names
only; the last two are substituted with
sh
or
zh
in some texts
- c
,
q
,
w
,
x
,
a
appear in (typically foreign) proper names only
- outside of loanwords,
d
appears only between vowels or in
hd
- outside of loanwords,
g
only appears in
ng
- outside of loanwords, words do not begin with two consonants; this is reflected in the general syllable structure, where consonant clusters only occur across syllable boundaries, except in some loanwords
- common words:
sina
,
on
- common endings:
-nen
,
-ka
/
-ka
,
-in
,
-t
(plural suffix)
- common vowel combinations:
ai
,
uo
,
ei
,
ie
,
oi
,
yo
,
ai
- unusually high degree of letter duplication, both vowels and consonants will be geminated, for example
aa
,
ee
,
ii
,
kk
,
ll
,
ss
,
yy
,
aa
- frequent long words
- distinct letters:
o
,
a
,
o
and
u
; but never
ß
or
a
- similar to Finnish, except:
- letter
y
is not used, except in loanwords (
u
is the corresponding vowel)
- letters
b
and
g
(without preceding
n
) are found outside of loanwords
- occasional use of
?
and
?
, mainly in loanwords (plus combination
t?
)
- loanwords more common generally than in Finnish, mainly loaned from German
- words end in consonants more frequently than in Finnish, word-final
b
,
d
,
v
being particularly typical
- letter
d
is much more common in Estonian than in Finnish, and in Estonian it is often the last letter of the word (plural suffix), which it never is in Finnish
- double
oo
more common than in Finnish; other doubles can include
oo
,
uu
, rarely
hh
(for German
ch
) and even
??
- common words:
ja
,
on
,
ei
,
ta
,
see
,
voi
.
Hungarian
- letters ő and ű (double acute accent) unique to Hungarian
- accented letters á and é frequent
- letter combinations: cs, dz, dzs, gy, ly, ny, sz, ty, zs (all classed as separate letters), leg-, -obb (note: sz also common in Polish)
- common words: a, az, ez, egy, és, van, hogy
- letter k very frequent (plural suffix)
- long polysynthetic words (a single word can number 30+ letters)
- relatively abundant
n
,
q
(not necessarily followed by
u
),
u
- ubiquitous double consonants and vowels (
aa
,
ii
,
qq
,
uu
, more rarely
ee
,
oo
)
- vowels
a
,
i
,
u
conspicuously more frequent than
e
,
o
(which are only found before
q
and
r
)
- no diphthongs except occasional word-final
ai
, only consonant combinations besides double consonants and
(n)ng
consist of
r
+ consonant
- old spellings (now abolished in spelling reform) sometimes included acute accent, circumflex and/or tilde:
Qanaq
vs.
Qaanaaq
.
- vowels with acute accent,
ogonek
(nasal hook), or both: a, ?, ??
- doubled vowels: aa, aa, ??, ????
- slashed
l
: ł (check not Polish!)
- n
with acute accent: ?
- quotation mark: ' or ’
- sequences: dl, tł, tł’, dz, ts’, ii, aa, aa
- may have rather long words
In addition to the above,
- does
not
use
u
,
u
, or
?
In addition to the above,
- uses: u, u, ?
- does
not
use
o
,
o
, or
?
- lots of tildes over vowels (including y) and n
- tilde over g (g̃); it is the only language in the world to use it. Example words:
hag?a
and
g?uah?
.
- b, d, and g usually do not occur without m or n before (mb, nd, ng) unless they're Spanish loan words.
- f, l, q, w, x, z extremely rare outside loan words
- does not use c without h: ch
- words: desu, aru, suru, esp. at end of sentences;
- word endings: -masu, -masen, -shita;
- letters: Japanese almost always alternates between a consonant and a vowel. Exceptions are the digraphs shi and chi, the affricate tsu, gemination (two of the same consonant in a row) and palatalization (a consonant followed by the letter y).
- a macron or circumflex may be used to indicate doubled vowels, e.g. Tōkyō
- common words: no, o, wa, de, ni
(Note: Rōmaji is not often used in Japanese script. It is most often used for foreigners learning the pronunciation of the Japanese language.)
- Almost all written words are quite short (one syllable).
- Syllables (unless they are pronounced with mid tone) end in a tone letter: one of b s j v m g d, leading to apparent "consonant clusters" such as -wj
- w can be the main vowel of a syllable (e.g. tswv)
- Syllables can begin with sequences such as hm-, ntxh-, nq-.
- Syllables ending in double vowels (especially -oo, -ee), possibly followed by a tone letter (as in Hmoob "Hmong").
- Roman characters with more than one diacritical mark on the same vowel. See
above
.
- Almost all written words are quite short (one syllable, mostly less than six characters long).
- Words beginning with
ng
or
ngh
- Words ending with
nh
- common words:
cai, khong, co, ?, c?a, va, t?i, v?i, đ?, đa, s?, đang, toi, b?n, chung, la
Vietnamese Quoted-Readable (VIQR)
- The following characters (often in combination) after vowels: ^ ( + ' ` ? ~ .
- DD, Dd, or dd
- The following character before punctuation: \
Vietnamese VNI encoding
- The digits 1-8 after vowels
- The digit 9 after a D or d
- The following character before numbers: \
- The following characters after vowels: s f r x j
- The following vowels, doubled up: a e o
- The letter
w
after the following characters: a o u
- DD, Dd, or dd
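The VNI convention described above (digits 1-8 after vowels, the digit 9 after D or d) is regular enough to spot with one pattern. This is a minimal sketch; `VNI_MARK` and `vni_density` are hypothetical names, and treating y as a vowel is an assumption of this example.

```python
import re

# A vowel followed by a digit 1-8, or d followed by 9.
VNI_MARK = re.compile(r"[aeiouy][1-8]|d9", re.IGNORECASE)


def vni_density(text):
    """Return the fraction of whitespace-separated words showing a VNI mark."""
    words = text.split()
    if not words:
        return 0.0
    marked = sum(1 for w in words if VNI_MARK.search(w))
    return marked / len(words)
```

Ordinary prose rarely places digits directly after vowels inside words, so even a modest density is suggestive. Product codes and usernames can produce false positives.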
Chinese, Romanized
- In general, Mandarin syllables end only in vowels or n, ng, r; never in p, t, k, m
- Words beginning with x, q, zh
- Tone marks on vowels, such as ā, á, ǎ, à
- For convenience while using a computer, these are sometimes substituted with numbers, e.g. a1, a2, a3, a4
- Words do not begin with
b, d, g, z, q, x, r
- Words beginning with
hs
- Many hyphenated words
- Apostrophes after initial letters or digraphs, e.g.
t'a, ch'i
- Many unusual vowel combinations such as ae, eei, ii, iee, oou, yy, etc.
- Insertion of r, e.g. arn, erng, etc.
- Words ending in nn, nq
- Many hyphenated words.
- Words can end in p, t, k, m, n, ng, h; never r
- Roman characters with many diacritical marks on vowels. Unlike Vietnamese, each character has at most one such mark.
- Unusual combining characters, namely · (middle dot, always after
o
) and | (vertical bar). ? (
macron
) is also common.
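Some of the romanization clues above can be told apart by how words begin: initial x, q or zh on one hand, and initial hs or an apostrophe after an initial letter or digraph (t'a, ch'i) on the other. The sketch below counts both kinds of word. The function name `romanization_hints` and the result keys are hypothetical labels chosen for this example, and tone-marked vowels are not handled.

```python
import re


def romanization_hints(text):
    """Count words matching each of the two sets of word-initial clues."""
    # Hyphens split words; apostrophes are kept because they are a clue.
    words = re.findall(r"[a-zü']+", text.lower())
    xqzh = sum(1 for w in words if w.startswith(("x", "q", "zh")))
    hs_or_apostrophe = sum(
        1
        for w in words
        # "hs-", or one or two consonants directly followed by an apostrophe.
        if w.startswith("hs") or re.match(r"[^aeiouü']{1,2}'", w)
    )
    return {"x_q_zh_initials": xqzh, "hs_or_apostrophe_initials": hs_or_apostrophe}
```

Requiring consonants before the apostrophe keeps a word such as Xi'an, where the apostrophe follows a vowel, out of the second count.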
May contain the following:
Prefixes:
me-, mem-, memper-, pe-, per-, di-, ke-
Suffixes:
-kan, -an, -i
Others (these almost always written in lowercase):
yang, dan, di, ke, oleh, itu
Malay and Indonesian are mutually intelligible to proficient speakers, although translators and interpreters will generally be specialists in one or the other language. See Comparison of Standard Malay and Indonesian.
Frequent use of the letter 'a' (comparable to the frequency of the English 'e').
Most Polynesian languages use A E F G H I K L M N O P R S T U V and ʻ (sometimes written ' or Q)
- Consonants always separated by one or more vowels (fenua, Haʻapai, ʻolelo)
- Short and long vowels, written either with a macron (āēīōū) or by replication (aa, ee, ii, oo, uu)
- Frequent diphthongs (oiaue, māori)
- Words always end with a vowel
- Loanwords are transliterated (like in Japanese): Sesu Kilisito = Jesus Christ, polokalama = program
- Frequent English or French loanwords (depending on colonial history)
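The two structural rules above (consonants are always separated by vowels, and words always end with a vowel) amount to a simple word shape. The following sketch encodes it as a regular expression. `POLYNESIAN_WORD` and `fits_polynesian_shape` are hypothetical names, and treating ng as a single consonant is an assumption based on its use in Tongan.

```python
import re

# Repeated units of an optional consonant plus exactly one vowel.
# "ng" counts as one consonant; ʻ and ' stand for the glottal stop.
POLYNESIAN_WORD = re.compile(r"(?:(?:ng|[fghklmnprstvʻ'])?[aeiouāēīōū])+")


def fits_polynesian_shape(word):
    """Return True if the word has no consonant clusters and ends in a vowel."""
    return POLYNESIAN_WORD.fullmatch(word.lower()) is not None
```

Transliterated loanwords such as polokalama pass the check while their source forms such as program do not, which matches the transliteration pattern described in the list.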
Tongan (lea fakatonga)
- A E F H I K L M N NG O P S T U V ʻ
- ng (Tonga), h, endings in -onua (fonua)
- article
te
- frequent words: 'o, te, ki, mei, i, faka-
- English loanwords
- A E F G I L M N O P S T U V ʻ
- no K letter, uses okina (ʻ) or nothing instead (faka in Tongan is faʻa in Samoan)
- frequent use of L (
le
)
- frequent words:
o
,
e
,
le
,
se
,
a
,
i
,
ma
- A E F G H I K L M N O P S T U V ?
- distinguish from Tongan: g instead of ng (
tokaga
)
- article
te
- h is more frequent than s (
tahi
)
- frequent words: ko, te, ki, mai, i, o, ne'e, e, mo, faka-
- French loanwords
- A E F G H I K L M N O P S T U V ?
- article
le
- frequent words: ko, le, ki, mei, i, o, mo, faka-
- distinguish from Wallisian: S is more frequent than H (
tasi
)
- distinguish from Samoan: letter K
- French loanwords
Note that some Turkic languages like Azeri and Turkmen use a similar Latin alphabet (often Jaŋalif) and similar words, and might be confused with Turkish.
Azeri has the letters ??, Xx and Qq not present in the Turkish alphabet, and Turkmen has Aa, ??, ??, Yy and Ww.
Latin Characters uniquely (or nearly uniquely) used for Turkic languages: ??, Ŋŋ, ??, Ьь, ??, ??, ?, and ı.
All Turkic languages can form long words by adding multiple suffixes.
Turkish Alphabet
Lowercase: a b c ç d e f g ğ h ı i j k l m n o ö p r s ş t u ü v y z
Uppercase: A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z
Common words
- bir - one, a
- bu - this
- ancak - but
- oldu - was (happened)
- şu - that
Misc.
- The letter "j" is only used in loanwords.
- Words never begin with "ğ"
- Look for common word endings. Tense changes in Turkish verbs are created by adding suffixes to the end of the verb. Plurals are formed by adding -lar or -ler.
- Common Tense Changes: -yor -mış -muş -sun
- Possessivity/person: -im -un -ın -in -iz -dur -tır
- Example: Yapmıştır, "[He] did it"; Yap is the verb stem meaning "to do", -mış indicates the perfect tense, -tır indicates the third person (he/she/it).
- Example: Adalar, "Islands"; Ada is a noun meaning "island", -lar makes it plural.
- Example: Evimiz, "Our house"; Ev is a noun meaning "house", -im indicates the first-person possessor, which -iz then makes plural.
Azeri can be easily recognized by the frequent use of ə. This letter is not used in any other officially recognized modern Latin alphabet. In addition, it uses the letters x and q, which are not used in Turkish.
- Common words: və, ki, ilə, bu, o, isə, görə, da, də
- Frequent use of diacritics:
c
,
?
,
ı
,
?
,
o
,
?
,
u
- Words ending in
-lar
,
-l?r
,
-ın
,
-in
,
-da
,
-d?
,
-dan
,
-d?n
- Words never beginning with
?
or
ı
- Words rarely beginning with two or more consonants
- Transliteration of foreign words and names, e.g.
Audrey Hepburn
=
Odri Hepbern
- No spaces, except between punctuation marks and (sometimes) foreign words.
- Arabic numerals (0-9) sometimes used
- Punctuation:
- Period 。 (not .)
- Serial comma 、 (distinguished from the regular comma ,)
- Ellipsis …… (six dots)
- No
hiragana
,
katakana
, or
hangul
- May be written vertically
Note: Many characters were not simplified. As a result, it is common for a short word or phrase to be identical between Simplified and Traditional, but it is rare for an entire sentence to be identical as well.
Common radicals different between Traditional and Simplified:
- Simplified: 讠钅饣纟门 (e.g. 语 银 饭 纪 问)
- Traditional: 訁釒飠糹門 (e.g. 語 銀 飯 紀 問)
Common characters different between Traditional and Simplified:
- Simplified: 国 会 这 来 对 开 关 门 时 个 书 长 万 边 东 车 爱 儿
- Traditional: 國 會 這 來 對 開 關 門 時 個 書 長 萬 邊 東 車 愛 兒
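The list of common characters above can serve as a quick vote between the two character sets. This is a minimal sketch with hypothetical names (`SIMPLIFIED`, `TRADITIONAL`, `guess_chinese_variant`). It uses only the eighteen Traditional characters listed and their standard Simplified counterparts, so short texts may well come back undetermined.

```python
# The Traditional characters listed above and their Simplified counterparts.
SIMPLIFIED = set("国会这来对开关门时个书长万边东车爱儿")
TRADITIONAL = set("國會這來對開關門時個書長萬邊東車愛兒")


def guess_chinese_variant(text):
    """Guess the character set by counting marker characters from each list."""
    simplified_hits = sum(ch in SIMPLIFIED for ch in text)
    traditional_hits = sum(ch in TRADITIONAL for ch in text)
    if simplified_hits > traditional_hits:
        return "simplified"
    if traditional_hits > simplified_hits:
        return "traditional"
    return "undetermined"
```

Because many characters were never simplified, a short phrase can be identical in both sets. The "undetermined" result reflects that case rather than forcing a guess.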
Standard written Chinese (based on Mandarin) vs written Vernacular Cantonese
Note: Apart from Hong Kong, there are also Cantonese speakers in southern Mainland China, Malaysia and Singapore [1], so written Cantonese can be written in either Simplified or Traditional characters.
Common characters in Vernacular Cantonese that do not occur or seldom occur in Mandarin:
Some of the above characters are not supported in all character encodings, so sometimes the 口 radical on the left is substituted with a
0
or
o
, e.g.
Sometimes, different Chinese characters are used to express the same meaning in Cantonese and Mandarin. If you use the one commonly used in Cantonese to express the same meaning when you are speaking or writing Mandarin, a native speaker may be confused or even find it difficult to understand, and vice versa. Some examples are:
(Cantonese vs Mandarin)
- 食vs吃(eat) ?vs喝(drink) 企vs站(stand) 凍vs冷(cold) 落vs下(down) 著vs穿(wear) 讀vs?(read) 鬧vs罵(scold) 計vs算(calculate) ?vs別(do not) 行vs走(walk/go) 先vs才(then)
There are Chinese words used to construct vocabularies used in Cantonese that are not or seldomly implemented in modern Mandarin. Some examples are:
(Cantonese vs Mandarin)
- 成日vs整天(always) 傾計vs聊天(talk) 返工vs上班(go to work) 溫書vs溫習(study) 影片vs視頻(video) 隔離vs旁邊(nearby) 起屋vs蓋樓(build a house) 聽日vs明天(tomorrow) 巴閉vs?張(arrogant) ??vs完成(finished) 定係vs還是(or) ?仔vs帥哥(handsome male) 鍾意vs喜歡(like) 犀利vs?害(powerful) 同埋vs和/及(and) ??vs?的(crazy) 雪櫃vs?箱(fridge)
Cantonese vocabularies constructed by Cantonese words are used in daily life in southern China and are not used in modern Mandarin. Some examples are:
- ??(don't be like this) 好?(ok?) 玩?(to play tricks) 做?(to work) ??(to watch a film/movie) ?知(don't know) 埋?(come) ?個(that) ???(such thing) ??(they) ?事/?事(what?) ??(nothing) ?陣(at that moment) 越?越多(more and more) 我?(mine) 梗係(of course) ??(to peek) ??(love him/her) ??我(take it to me) ??(everything is wasted) ??(you are right) ??住(to cover something) ???(all) ?實(to press something tightly) ?覺(to sleep) ?石仔(to throw a tiny stone) ?[a modal word to express comtemption] ?[a modal word for reminding or warning someone] ?交(to argue) 好?(very angry) 心?(feeling depressed in heart) ??女仔(to please a girl) 得?多?(only this much) 做好?(done something well)
Finally, when terms are introduced from other countries(especially the US and the UK) to China, Cantonese and Mandarin often get different translations, where Cantonese often translates according to pronunciation of the terms in English and Mandarin often translates according to the meaning of the terms. Some examples are:
(Cantonese vs Mandarin)
- 的士 (dik1 si2, has no direct meaning, translated according to the English pronunciation) vs 出租車 (chū zū chē, meaning cars for renting), translated from Taxi.
- 巴士 (baa1 si2, has no direct meaning, translated according to the English pronunciation) vs 公車 (gōng chē, meaning public cars), translated from Bus.
- 多士 (do1 si2, has no direct meaning, translated according to the English pronunciation) vs 土司 (tǔ sī, has no direct meaning, translated according to the English pronunciation), translated from Toast.
- 騷 (sou1, has no direct meaning, translated according to the English pronunciation) vs 秀 (xiù, has no direct meaning, translated according to the English pronunciation), translated from Show.
- 士多 (si2 do1, has no direct meaning, translated according to the English pronunciation) vs 小店 (xiǎo diàn, meaning small shop), translated from Store.
- 𨋢 (lip1, has no direct meaning, translated according to the English pronunciation) vs 升降機 (shēng jiàng jī, meaning machine that elevates and lowers itself), translated from Lift/Elevator.
- 拜拜 (baai1 baai3, has no direct meaning, translated according to the English pronunciation) vs 再見 (zài jiàn, meaning see you again), translated from Byebye/Goodbye.
- Katakana (カタカナ) and hiragana (ひらがな) characters mixed with kanji (漢字)
- No spaces
- Number system = Arabic Numerals (1,2,3 etc.)
- Punctuation:
- Period 。
- Comma 、 (the full-width comma , is also used)
- Quotation marks 「」
- Occasional small characters beside large ones, e.g. しゃ りゅ しょ って シャ リュ ショ ッテ
- Double tick marks (known as dakuten) appearing at the upper right of characters, e.g. で が ず デ ガ ズ
- Empty circles (known as handakuten) appearing at the upper right of characters, e.g. ぱ ぴ パ ピ
- Frequent characters: の を は が
- Traditionally written vertically (books, school texts, etc.), but mostly written horizontally online.
- Western-style punctuation marks
- Western-style spacing
- Hangul letters (phonetic), e.g. ㅂ (b in book), ㅈ (j in jump), ㅅ (s in sock), ㅊ (ch in champion), ㅍ (p in pox)
- Hangul letters are combined into syllable blocks, e.g. ㅅ s + ㅓ eo + ㅇ ng = 성 seong
- Circles and ellipses are commonplace in Hangul but exceedingly rare in Chinese.
- The general appearance has relatively uniform complexity, in contrast with Chinese or Japanese.
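The syllable-block construction described above can be reproduced with Unicode arithmetic. The following is a minimal sketch, assuming the standard Unicode layout of precomposed Hangul syllables (19 initials, 21 vowels, 27 optional finals, starting at U+AC00). The names `LEADS`, `VOWELS`, `TAILS` and `compose` are illustrative, not taken from any library.

```python
# Jamo listed in the order Unicode uses for precomposed Hangul syllables.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"              # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"          # 21 medial vowels
TAILS = ["", *"ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"]  # no final + 27 finals

SYLLABLE_BASE = 0xAC00  # code point of the first precomposed syllable


def compose(lead, vowel, tail=""):
    """Combine an initial, a vowel and an optional final into one block."""
    index = (LEADS.index(lead) * len(VOWELS) + VOWELS.index(vowel)) * len(TAILS)
    return chr(SYLLABLE_BASE + index + TAILS.index(tail))


def is_hangul_syllable(ch):
    """True if ch is a precomposed Hangul syllable block."""
    return 0xAC00 <= ord(ch) <= 0xD7A3


print(compose("ㅅ", "ㅓ", "ㅇ"))  # the block built from s + eo + ng
```

Checking `is_hangul_syllable` on a handful of characters from an unknown text is often enough to separate Korean from Chinese or Japanese.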
Khmer is written using the distinctive Khmer alphabet.
- rarely uses spaces
- Letters have a distinctively "taller" shape than other Brahmic scripts.
- Uses Khmer numerals in writing: ? ? ? ? ? ? ? ? ?.
- Has smaller version of consonants placed below main consonants that may appear clustered
- Has 24 diacritics denoting syllable rhymes - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ?? ? ?? ? ?? ?? ??
- Uses this as a full stop: ។
Modern Greek is written in the Greek alphabet in monotonic, polytonic or atonic form, following either Demotic (Triantafyllidis) grammar or Katharevousa grammar. Some people write in Greeklish (Greek in Latin script), which may be visual-based, orthographic, phonetic, or simply mixed ("messed-up"). The only official orthographic forms of the Greek language are monotonic and polytonic.
Normal Modern Greek (Greek Monotonic)
- words: και, είναι;
- Each multi-syllable word has one accent/tone mark (oxia): ά έ ή ί ό ύ ώ
- The only other diacritic ever used is the trema: Ϊ/ϊ, Ϋ/ϋ, etc.
Pre-1980s Greek (Greek Polytonic)
Katharevousa, Dimotiki (Triantafyllidis' grammar)
- Diacritics: ?, ?, ?, ?, and combinations, also with other vowels.
- Some texts, especially in Katharevousa, also have ?, ?, in combination with other diacritics.
Ancient Greek
- Diacritics: ?, ?, ?, ?, ?, ?, and combinations, also with other vowels; ?; tilde (?) often appears more like a rounded circumflex
- some texts feature lunate sigma (looks like c) instead of σ/ς
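The accent conventions described so far (monotonic, polytonic, and the accent-free atonic style mentioned in the introduction) can be told apart mechanically by decomposing the text and looking at which combining marks occur. This is only a sketch: the function name and the choice of marks are assumptions, and a short polytonic sample that happens to contain only acute accents would be reported as monotonic.

```python
import unicodedata

# Combining marks that occur only in polytonic spelling: grave, perispomeni
# (circumflex), smooth breathing, rough breathing, iota subscript.
POLYTONIC_MARKS = {"\u0300", "\u0342", "\u0313", "\u0314", "\u0345"}
# Marks used in monotonic spelling: acute (tonos/oxia) and trema.
MONOTONIC_MARKS = {"\u0301", "\u0308"}


def greek_accent_system(text):
    """Classify Greek text as 'polytonic', 'monotonic' or 'atonic'."""
    decomposed = unicodedata.normalize("NFD", text)
    if not any("GREEK" in unicodedata.name(ch, "") for ch in decomposed):
        return "no Greek letters"
    # After NFD, accents appear as separate combining characters.
    marks = {ch for ch in decomposed if unicodedata.combining(ch)}
    if marks & POLYTONIC_MARKS:
        return "polytonic"
    if marks & MONOTONIC_MARKS:
        return "monotonic"
    return "atonic"


print(greek_accent_system("και είναι"))
print(greek_accent_system("και ειναι αυτο"))
```

Normalizing to NFD first means the check works the same way whether the input uses precomposed letters or separate combining marks.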
Greek Atonic
- Was common in some Greek media (television);
- Greek characters appear without accents/tones;
- words: και, ειναι, αυτο.
- Automated software for converting Greeklish to Greek exists. If you come across a Greeklish text, it may be useful for the Greek Wikipedia (el.wikipedia) after conversion.
- Keep in mind that in Greeklish more than one character may be used for a single letter (for example, th for Θ (theta)).
Orthographic Greeklish
Phonetic Greeklish
- words: ke, ine;
- omega appears as o;
- ei, oi appear as i;
- ai appears as e.
Visual-based Greeklish
- omega (Ω or ω) may appear as W or w;
- epsilon (E) may appear as 3;
- alpha (A) may appear as 4;
- theta (Θ) may appear as 8;
- upsilon (Y) may appear as \|/;
- gamma (γ) may appear as y;
- More than one character may be used for one letter.
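The digit-for-letter substitutions listed above give a simple heuristic for spotting visual-based Greeklish: look for words that mix Latin letters with the digits 3, 4 or 8. The sketch below assumes only the single-character substitutions from the list (the multi-character upsilon form is left out). The function names are illustrative, and the partial conversion deliberately leaves every other Latin letter untouched.

```python
import re

# Single-character substitutions taken from the list above.
VISUAL_MAP = str.maketrans(
    {"W": "Ω", "w": "ω", "3": "ε", "4": "α", "8": "θ", "y": "γ"}
)

# A word made of Latin letters and the digits 3/4/8 that contains
# at least one letter and at least one of those digits.
MIXED_WORD = re.compile(
    r"\b(?=[A-Za-z348]*[A-Za-z])(?=[A-Za-z348]*[348])[A-Za-z348]+\b"
)


def visual_greeklish_words(text):
    """Return words that look like visual-based Greeklish."""
    return MIXED_WORD.findall(text)


def undo_visual_substitutions(word):
    """Replace only the listed look-alike characters with Greek letters."""
    return word.translate(VISUAL_MAP)


print(visual_greeklish_words("8elw na fygw sto 2024"))
print(undo_visual_substitutions("8elw"))
```

Plain numbers such as years are ignored because they contain no letters, which keeps false positives down in ordinary Latin-script text.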
Messed-up (Mixed) Greeklish
- words: kai, eine;
- combines principles of phonetic, visual-based and orthographic Greeklish according to the writer's idiosyncrasy;
- The most commonly used form of Greeklish.
Armenian can be recognized by its unique 39-letter alphabet:
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??(?) ? ?
Georgian can be recognised by its unique alphabet (note some characters have fallen out of use).
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Bolding denotes letters unique to the language
- uses: ё, і, й, ў, ы, э, ’
- features: шч used instead of щ
- the only Cyrillic language not to feature и.
- uses: ъ, щ, я, ю, й
- words: със, в
- features: many words end in the definite article -ът, -ят, -та, -то, -те
- uses: ј, љ, њ, џ, ѓ, ќ, ѕ
- words: во, со
- features: р is usually found between consonants, for example првин
- uses: ё, й, ъ (rarely), ы, э, щ
- uses: ј, љ, њ, џ, ђ, ћ
- does not use: ъ, щ, я, ю, й
- words: је, у
- features: large consonant clusters, for example српски
- uses: ?, и, ?, ?, й, ?, ? щ, ’
- does not use: ъ, ё, ы, э
- uses: ?, ?
- does not use: ё, й, к, щ, ъ, ы, ь, ю, я
- used only in names or borrowed words: в, е, з, ф, ц
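The letter inventories above lend themselves to a rule-based guess. The sketch below rests on two loudly stated assumptions. First, the language names are not present in the lists above and are supplied here from the usual ordering of such charts (Belarusian, Bulgarian, Macedonian, Russian, Serbian, Ukrainian). Second, several of the distinctive letters are illegible in the lists and are filled in from general knowledge of those alphabets. Treat it as a rough heuristic for reasonably long samples, not a reliable classifier.

```python
def guess_cyrillic_language(text):
    """Guess a Cyrillic-script language from distinctive letters.

    Language labels and several letters are assumptions (see the note above).
    """
    letters = set(text.lower())
    if letters & set("ѓќѕ"):
        return "Macedonian"
    if letters & set("ђћ"):
        return "Serbian"
    if letters & set("єїґ"):
        return "Ukrainian"
    # Short u, or dotted i in a text that never uses и.
    if "ў" in letters or ("і" in letters and "и" not in letters):
        return "Belarusian"
    if letters & set("ыэё"):
        return "Russian"
    if "ъ" in letters:
        return "Bulgarian"
    return "undetermined"


for sample in ["ён быў у горадзе", "това е със съня", "это был хороший день"]:
    print(sample, "->", guess_cyrillic_language(sample))
```

The rules are ordered from the most exclusive letters to the least, so a text is labelled by the rarest evidence it contains.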
- All languages using the Arabic alphabet are written right-to-left.
- A number of other languages have been written in the Arabic alphabet in the past, but now are more commonly written in Latin characters; examples include
Turkish
,
Somali
and
Swahili
.
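Right-to-left writing can be detected without knowing the language, because Unicode assigns every character a bidirectional class. The sketch below uses the first strongly directional character as the deciding one, which is a simplification of the full Unicode bidirectional algorithm. The function name is illustrative.

```python
import unicodedata


def is_right_to_left(text):
    """Decide direction from the first strongly directional character."""
    for ch in text:
        direction = unicodedata.bidirectional(ch)
        if direction in ("R", "AL"):   # Hebrew-type or Arabic-type letter
            return True
        if direction == "L":           # left-to-right letter
            return False
    return False  # no strong character found (digits, punctuation only)


print(is_right_to_left("\u0645\u0631\u062d\u0628\u0627"))
print(is_right_to_left("hello"))
```

Digits and spaces are skipped because they carry no strong direction, so a sample that starts with a number is still classified by its first letter.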
- backwards question mark: ؟
- short vowels are not written, so many words are written with no vowel at all
- common prefix: -???
- common suffix: ??-
- words: ???? ??? ???
Except in very rare cases, verbs come at the end of a phrase.
- common verbs: ???? ???? ??? ???? ??????
- uses: ?? ?? ?? ?
- words: ??? ??
- uses: ???? ??? ??? ?? ?
- many words ending in ?
- words: ???? ??
- to distinguish from Arabic: in many texts, Urdu is written stylistically with words ‘slanting’ downwards from top-right to bottom-left (unlike the ‘linear’ style of Arabic, Persian etc.).
- short vowels are not usually written so many words are written with no vowel at all
- three styles of writing (Estrangela, Serto, Madnhaya) and two different ways of representing vowels
- basic alphabet in Estrangela style is: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
- basic alphabet in Serto style is:
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?
- basic alphabet in Madnhaya style is:
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?,
?
?
- All Dravidian languages are written from left to right.
- Each Dravidian language has its own script, but their orthographies are similar.
Kannada
- Kannada has a 49-letter alphabet.
Tamil
- common word endings: ?????, ?????, ??????, ??
- common words: ?????, ????, ????, ???
- Tamil has a unique 30-letter alphabet. With the help of diacritics, as many as 247 letters can be written.
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
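The figure of 247 can be reconstructed by enumeration. The sketch below assumes the traditional breakdown, which the text above does not spell out: 12 vowels, 18 consonants written with the vowel-suppressing dot (pulli), 12 × 18 consonant-vowel combinations, and one special letter (aytham). All variable names are illustrative.

```python
VOWELS = ["அ", "ஆ", "இ", "ஈ", "உ", "ஊ", "எ", "ஏ", "ஐ", "ஒ", "ஓ", "ஔ"]
CONSONANTS = ["க", "ங", "ச", "ஞ", "ட", "ண", "த", "ந", "ப",
              "ம", "ய", "ர", "ல", "வ", "ழ", "ள", "ற", "ன"]

# Dependent vowel signs, one per vowel. The first vowel is inherent in the
# bare consonant, so its "sign" is the empty string.
VOWEL_SIGNS = ["", "\u0bbe", "\u0bbf", "\u0bc0", "\u0bc1", "\u0bc2",
               "\u0bc6", "\u0bc7", "\u0bc8", "\u0bca", "\u0bcb", "\u0bcc"]
PULLI = "\u0bcd"   # dot that removes the inherent vowel
AYTHAM = "\u0b83"  # the one special letter


def tamil_letters():
    """List every letter form counted in the traditional total."""
    pure_consonants = [c + PULLI for c in CONSONANTS]
    combinations = [c + sign for c in CONSONANTS for sign in VOWEL_SIGNS]
    return VOWELS + pure_consonants + combinations + [AYTHAM]


print(len(tamil_letters()))  # 12 + 18 + 216 + 1
```

The 30 base letters are the 12 vowels and 18 consonants; everything else in the total comes from attaching a vowel sign or the dot to a consonant.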
Telugu
Telugu has 56 characters (Aksharamulu) including vowels (Achchulu) and consonants (Hallulu).
Telugu uses eighteen vowels, each of which has both an independent form and a diacritic form used with consonants to create syllables. The language makes a distinction between short and long vowels.
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ??
? ? ? ? ?
? ? ? ? ?
? ? ? ? ?
? ? ? ? ?
? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
The Bengali alphabet or Bangla alphabet (Bengali: বাংলা বর্ণমালা, bangla bornomala) or Bengali script (Bengali: বাংলা লিপি, bangla lipi) is the writing system, originating in the Indian subcontinent, for the Bengali language and is the fifth most widely used writing system in the world. The script is also used for other languages such as Assamese, Maithili, Meithei and Bishnupriya Manipuri, and has historically been used to write Sanskrit within Bengal.
Bengali
Bengali has a unique 50-letter alphabet.
- The Bengali script has a total of 9 vowel graphemes, each of which is called a স্বরবর্ণ sworoborno "vowel letter". The sworobornos represent six of the seven main vowel sounds of Bengali, along with two vowel diphthongs. All of them are used in both the Bengali and Assamese languages.
? ? ? ? ? ? ? ? ? ? ?
- The Bengali script has a total of 39 consonants. Consonant letters are called ব্যঞ্জনবর্ণ bænjonborno "consonant letter" in Bengali. The names of the letters are typically just the consonant sound plus the inherent vowel অ o. Since the inherent vowel is assumed and not written, most letters' names look identical to the letter itself (the name of the letter ঘ is itself gho, not gh).
? ? ? ? ?
? ? ? ? ?
? ? ? ? ?
? ? ? ? ?
? ? ? ? ?
? ? ? ? ?
? ? ?? ?? ??
? ? ? ?
- has 10 diacritics denoting syllable rhymes -
? ? ? ? ? ? ? ? ? ?
Assamese
- The Assamese script has a total of 9 vowel graphemes, each of which is called a ???????? sworoborno "vowel letter" too.
? ? ? ? ? ? ? ? ? ? ?
- has a total of 39 consonants. Consonant letters are called ব্যঞ্জনবর্ণ bænjonborno "consonant letter" in Bengali.
? ? ? ? ?
? ? ? ? ?
? ? ? ? ?
? ? ? ? ?
? ? ? ? ?
? ? ? ? ?
? ? ?? ?? ??
? ? ? ?
- has 10 diacritics denoting syllable rhymes -
? ? ? ? ? ? ? ? ? ?
In modern writing, Canadian Aboriginal syllabics are indicative of Cree languages, Inuktitut, or Ojibwe, though the latter two are also written in alternative scripts. The basic glyph set is ? ? ? ? ? ? ? ? ?, each of which may appear in any of four orientations, boldfaced, superscripted, and with diacritics including ? ? ? ? ? ? ? ? ? ? ? ? ?. This abugida has also been used for Blackfoot.
Cherokee writing features a unique syllabary consisting of the following characters:
????????????????????????????????????????????????????????????????????????????????????.
- words: de, la, al, kaj
- Six accented letters: ĉ Ĉ ĝ Ĝ ĥ Ĥ ĵ Ĵ ŝ Ŝ ŭ Ŭ, their corresponding H-system representation ch Ch gh Gh hh Hh jh Jh sh Sh u U, or their corresponding X-system representation cx Cx gx Gx hx Hx jx Jx sx Sx ux Ux
- words ending in o, a, oj, aj, on, an, ojn, ajn, as, os, is, us, u, i, aŭ
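Because the X-system marks every accented letter with a following x, it can be converted back to the accented letters unambiguously (x is not otherwise part of the Esperanto alphabet). The sketch below shows one way to do that; the function and table names are illustrative. The H-system is not handled, since it writes ŭ as a plain u and so cannot be reversed reliably.

```python
import re

# X-system digraph -> accented letter (lowercase forms).
X_TO_ACCENTED = {"cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ"}

X_DIGRAPH = re.compile(r"([cghjsuCGHJSU])[xX]")


def from_x_system(text):
    """Convert X-system Esperanto text to accented letters."""
    def replace(match):
        base = match.group(1)
        accented = X_TO_ACCENTED[base.lower() + "x"]
        # Keep the capitalisation of the base letter.
        return accented.upper() if base.isupper() else accented
    return X_DIGRAPH.sub(replace, text)


print(from_x_system("Cxu vi parolas Esperanton? Mi ankaux."))
```

The presence of many cx, gx, sx or ux pairs in a Latin-script text is itself a strong hint that the text is Esperanto in the X-system.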
- When written in the Latin alphabet, Klingon has the unusual property of a distinction in case: q and Q are different letters, and other letters are either always (e.g. D, I, S) or never (e.g. ch, tlh, v) written in upper case. This produces many words that look quite strange to people who aren't used to it, for example yIDoghQo', tlhIngan Hol (with mixed case).
- The apostrophe is fairly frequent, especially at the end of a word or syllable.
- Common suffixes: -be', -'a'
- Common words: 'oH, Qapla'
- May use one or more apostrophes in the middle of a word: SuvwI''a'
- (almost) all lowercase;
- common words: lo, mi, cu, la, nu, do, na, se;
- paragraphs delimited with ni'o and sentences delimited with .i (or i);
- many five-letter words in consonant-vowel shape CCVCV or CVCCV;
- many short words with apostrophes between vowels, like ko'a, pi'o etc.;
- usually no punctuation except for dots;
- may use commas in the middle of words (typically proper nouns).
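The five-letter CCVCV and CVCCV shapes mentioned above are easy to test for. The sketch below assumes the usual Lojban inventory of 17 consonants and 5 vowels, which the list above does not spell out, and uses illustrative function names. It checks only the shape of a word, not whether the word actually exists in the language.

```python
# Assumed Lojban consonant and vowel inventory.
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")


def word_shape(word):
    """Map each letter to C (consonant), V (vowel) or ? (anything else)."""
    return "".join(
        "C" if ch in CONSONANTS else "V" if ch in VOWELS else "?"
        for ch in word.lower()
    )


def root_word_candidates(text):
    """Return the words whose shape is CCVCV or CVCCV."""
    words = text.replace(".", " ").split()
    return [w for w in words if word_shape(w) in ("CCVCV", "CVCCV")]


print(root_word_candidates("mi klama le zarci .i do prami mi"))
```

A high share of such words, together with the short function words listed above, is a strong signal that a text is Lojban.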
- alphabet is all lowercase except loanwords
- no diacritics
- uses only unvoiced stops in writing, e.g. p, t, k (no b, d, g)
Full alphabet: p, t, k, s, m, n, l, j, w, a, e, i, o, u
- common words: li, mi, e, sina, ona, jan
- often sounds like a simplified and phonetic form of English or Swedish
- many two-syllable words
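With only fourteen letters in the full alphabet, a quick filter is to check that a text uses nothing else. The sketch below takes the letter set directly from the list above; the function name is illustrative. Because loanwords and names may be capitalised, the comparison is case-insensitive, and a pass only means the text is compatible with this alphabet, not that it is certainly this language.

```python
# The full alphabet as listed above.
ALPHABET = set("ptksmnljwaeiou")


def fits_fourteen_letter_alphabet(text):
    """True if every letter in text belongs to the fourteen-letter alphabet."""
    return all(ch in ALPHABET for ch in text.lower() if ch.isalpha())


print(fits_fourteen_letter_alphabet("mi olin e sina"))
print(fits_fourteen_letter_alphabet("hello world"))
```

A single b, d, g, r or h is enough to rule a text out, so the check becomes decisive after only a few words.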
- Language Identification Web Service, language detection API, 100+ languages supported
- Google Translate, Google's translation service.
- Xerox, an online language identifier, 47 languages supported
- Language Guesser, a statistical language identifier, 74 languages recognized
- NTextCat - free Language Identification API for .NET (C#): 280+ languages available out of the box. Recognizes language and encoding (UTF-8, Windows-1252, Big5, etc.) of text. Mono compatible.