Internationalization (i18n)
¿Qué? for AUTOMATICALLY identifying language and character
encoding in software applications
For Customer Relations Management, Knowledge Management,
Spell Check, Database or Automated Translation, ¿Qué? is
the BEST solution there is for adding value to your software
applications.
In this age of globalization, more and more companies are
doing business worldwide and in a growing number of languages.
The Internet explosion confronts us daily with documents
written in a host of languages. Although English clearly
predominates, dozens of other languages are now making their
presence felt on the Internet and, of course, on local
intranets. It is therefore becoming necessary to identify the
language and character set of these documents in order to
handle them appropriately. ¿Qué?,
our language and character set identifier, is an invaluable
component of any internationalized information retrieval,
machine translation, spell check or even voice synthesis
system. The applications are limitless.
Of all the identifiers on the market, ¿Qué? is the fastest
and most complete. It can now correctly identify 29
languages and 101 language/character set pairs.
Other, less widely used languages are available
on request.
Based on revolutionary technology, our identifier is intended
primarily for applications designers and for integrators.
The identification system can be obtained under licence
to be incorporated into products such as Web browsers, word
processors, customer relations management systems, search
engines and more.
You can try it out right away by filling out this form.
Thanks to extremely accurate state-of-the-art technology,
the identification system needs only about 100 bytes to
identify a document's language and character set. The system
recognizes the language and the character set at the same
time, unlike other software that recognizes the language
of a document by searching for commonly occurring letter
strings. Bear in mind, however, that some writing systems
have no "letters" as such, which makes it impossible
to search for particular letter strings in them.
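
The idea of recognizing language and character set at the same time from a short prefix can be illustrated with byte trigram profiles. This is only a minimal sketch of one generic technique, not a description of how ¿Qué? works internally: the training sentences, the `identify` function and the 100-byte `window` default are all assumptions made for the illustration, and a usable system would need far larger training corpora.

```python
from collections import Counter

# Hypothetical training samples (assumption): one short sentence per language.
SAMPLES = {
    "fr": "Le système reconnaît la langue et le jeu de caractères "
          "en même temps, à partir des premiers octets du document.",
    "de": "Das System erkennt die Sprache und den Zeichensatz "
          "gleichzeitig, schon für die ersten Bytes größerer Dokumente.",
}
CHARSETS = ["cp1252", "utf-8"]


def trigrams(data: bytes) -> Counter:
    """Count every run of three consecutive bytes in data."""
    return Counter(data[i:i + 3] for i in range(len(data) - 2))


# One profile per (language, charset) pair, built from raw bytes, so the
# language and the encoding are learned and scored together.
PROFILES = {
    (lang, charset): trigrams(text.encode(charset))
    for lang, text in SAMPLES.items()
    for charset in CHARSETS
}


def identify(data: bytes, window: int = 100) -> tuple:
    """Return the (language, charset) pair whose profile best matches
    the first `window` bytes of data."""
    head = trigrams(data[:window])

    def score(profile: Counter) -> int:
        # Number of trigram occurrences in the input known to the profile.
        return sum(n for gram, n in head.items() if gram in profile)

    return max(PROFILES, key=lambda pair: score(PROFILES[pair]))


if __name__ == "__main__":
    text = "la langue et le jeu de caractères en même temps"
    print(identify(text.encode("utf-8")))
    print(identify(text.encode("cp1252")))
```

Because the profiles are built over bytes rather than letters, the same approach applies to scripts that have no letters as such; the byte sequences for accented characters are what separate the cp1252 and utf-8 profiles of the same language.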
Advantages of Alis' Professional Services:
- Performs quickly and accurately
- Covers a large number of languages and encodings
- Is multi-platform
- Recognizes both language and encoding, unlike other products
- Needs only about 100 characters to recognize language and character set
- Reduces the need for human intervention to classify multilingual documents
Simplified illustration of the process ¿Qué? uses to recognize a language
and character set
Language and character set recognition can be used for
many applications. A computer system for handling Internet
documents should, in principle, be able to identify each
document's language and how it is encoded. If it cannot
recognize the character set, it will not be able to display
or print it correctly. Similarly, if it cannot detect the
language of documents in a documentation library, it will
have difficulty performing an intelligent search. For
example, when a user searches for the French word lime (a tool),
the system should not return documents that contain the
English word lime (a fruit). By the same logic, if
it cannot identify a document's language, it will not
easily be able to carry out machine translation or voice
synthesis.
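
The lime example above can be sketched as a search that filters on a language tag stored with each document. The `Document` type, the `search` function and the two sample documents are hypothetical; the sketch assumes each document's ISO 639 code has already been assigned by a language identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    lang: str  # ISO 639 code, assumed to come from a language identifier
    text: str


def search(docs, word, lang):
    """Return the ids of documents in language `lang` that contain `word`."""
    word = word.lower()
    return [
        d.doc_id
        for d in docs
        if d.lang == lang and word in d.text.lower().split()
    ]


# Hypothetical two-document library for the illustration.
LIBRARY = [
    Document("a", "fr", "Une lime sert à polir le métal"),
    Document("b", "en", "Add lime juice to the drink"),
]

if __name__ == "__main__":
    print(search(LIBRARY, "lime", "fr"))  # only the French document matches
```

Without the language tag, both documents would match the query, which is exactly the false hit described above.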
¿Qué? is available as a static or dynamic library for the
following systems: AIX, HP-UX, Linux, Macintosh, SGI, Solaris
and Windows. A description of the API and a list of frequently
asked questions (FAQ) are also available. Contact Alis for this
information.
Other internationalization products and services:
- Batam™: Internationalization library
- Florès™: Character set converter
Combinations of languages and encodings supported by ¿Qué?
| Language | ISO 639 code | Recognized character sets |
| Albanian | sq | cp1252, cp850, Macintosh, utf8 |
| German | de | cp1252, cp850, Macintosh, utf8 |
| English | en | cp1252, utf8 |
| Arabic | ar | cp1256, ISO 8859-6, utf8 |
| Basque | eu | cp1252, cp850, Macintosh, utf8 |
| Bulgarian | bg | cp1251, ISO 8859-5, utf8 |
| Chinese | zh | gb2312, hz, big5, utf8 |
| Korean | ko | ks c 5601, ISO 2022-kr, utf8 |
| Croatian | sh | cp1250, ISO 8859-2, Macintosh-Croat, utf8 |
| Danish | da | cp1252, cp850, Macintosh, utf8 |
| Spanish | es | cp1252, cp850, Macintosh, utf8 |
| Estonian | et | ISO 8859-4, utf8 |
| Finnish | fi | cp1252, cp850, Macintosh, utf8 |
| French | fr | cp1252, cp850, Macintosh, utf8 |
| Greek | el | cp1253, cp869, ISO 8859-7, Macintosh-Greek, utf8 |
| Hebrew | he | ISO 8859-8 |
| Hungarian | hu | cp1250, cp852, utf8 |
| Italian | it | cp1252, cp850, Macintosh, utf8 |
| Japanese | ja | euc-jp, ISO 2022-jp, shift-jis, utf8 |
| Malay | ms | cp1252, cp850, Macintosh, utf8 |
| Dutch | nl | cp1252, cp850, Macintosh, utf8 |
| Norwegian | no | cp1252, cp850, Macintosh, utf8 |
| Polish | pl | cp1252, ISO 8859-2, utf8 |
| Portuguese | pt | cp1252, cp850, Macintosh, utf8 |
| Russian | ru | cp1251, ISO 8859-5, koi8-r, utf8 |
| Swedish | sv | cp1252, cp850, Macintosh, utf8 |
| Czech | cs | cp1250, ISO 8859-2, utf8 |
| Thai | th | tis 620, utf8 |
| Turkish | tr | cp853, ISO 8859-9, utf8 |
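
An application that consumes an identifier's result might keep the table above as a lookup and decode text only for the pairs it lists. The following is a minimal sketch under stated assumptions: only three rows are copied from the table, the function names are hypothetical, and the mapping from the table's character set labels to Python codec names (for example "Macintosh" to `mac_roman`) is a guess made for the illustration, not something the table specifies.

```python
# A few rows copied from the table; the remaining rows follow the same shape.
SUPPORTED = {
    "en": ["cp1252", "utf8"],
    "fr": ["cp1252", "cp850", "Macintosh", "utf8"],
    "ru": ["cp1251", "ISO 8859-5", "koi8-r", "utf8"],
}

# Assumed mapping from the table's labels to Python codec names.
PYTHON_CODECS = {
    "cp1252": "cp1252",
    "cp850": "cp850",
    "cp1251": "cp1251",
    "Macintosh": "mac_roman",
    "ISO 8859-5": "iso8859_5",
    "koi8-r": "koi8_r",
    "utf8": "utf_8",
}


def is_supported(lang: str, charset: str) -> bool:
    """True if the (language, character set) pair appears in SUPPORTED."""
    return charset in SUPPORTED.get(lang, [])


def decode(data: bytes, lang: str, charset: str) -> str:
    """Decode data using an identified pair, rejecting unlisted pairs."""
    if not is_supported(lang, charset):
        raise ValueError(f"unsupported pair: {lang}/{charset}")
    return data.decode(PYTHON_CODECS[charset])


if __name__ == "__main__":
    raw = "Привет".encode("koi8_r")
    print(decode(raw, "ru", "koi8-r"))
    # Decoding with the wrong character set garbles the text:
    print("é".encode("utf_8").decode("cp1252"))
```

The last line shows why the character set matters for display: the two UTF-8 bytes of "é" come out as two unrelated characters when read as cp1252.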
Applications already using ¿Qué?:
- Customer Relations Management
- Knowledge Management
- Search engines
- Internet browsers
- Web filters
- Machine translation