Internationalization (i18n)

¿Qué? for AUTOMATICALLY identifying language and character encoding in software applications

Whether your application handles customer relationship management, knowledge management, spell checking, databases or machine translation, ¿Qué? is the BEST solution available for adding value to it.

In this age of globalization, more and more companies do business worldwide and in a growing number of languages. The explosion of the Internet confronts us daily with documents written in a host of languages: although English clearly predominates, dozens of other languages are now making their presence felt on the Internet and, of course, on local intranets. It is therefore becoming necessary to identify the language and character set of these documents in order to handle them appropriately. ¿Qué?, our language and character set identifier, is an invaluable component of any internationalized information retrieval, machine translation, spell checking or even voice synthesis system. The applications are limitless.

Of all the identifiers on the market, ¿Qué? is the fastest and most complete. It currently identifies 29 languages and 101 language/character set pairs correctly. Less widely used languages are available on request.

Based on revolutionary technology, our identifier is intended primarily for application designers and integrators. The identification system can be licensed for incorporation into products such as Web browsers, word processors, customer relationship management systems and search engines.


Thanks to highly accurate, state-of-the-art technology, the identification system needs only about 100 bytes of text to identify a document's language and character set. It recognizes the language and the character set at the same time, unlike other software that identifies the language of a document by searching for commonly occurring letter strings. Bear in mind, however, that some writing systems (Chinese and Japanese, for example) do not have "letters" as such, so an approach based on letter strings cannot work for them.
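The page does not describe how ¿Qué? works internally. As a hedged illustration of the general idea of recognizing language and character set together, here is a minimal sketch of a byte-trigram classifier. Everything in it (the class `ByteNgramIdentifier`, the tiny training samples, the trigram size, add-one smoothing) is an assumption chosen for illustration, not Alis' method. Because the profiles are built from encoded bytes rather than from letters, the same text stored in two encodings produces two distinct profiles, so a single scoring pass selects a (language, encoding) pair.

```python
import math
from collections import Counter

# Hypothetical training samples: each text is stored under several
# (language, encoding) labels. A real identifier would need far more text.
FRENCH = "Le système reconnaît la langue et le jeu de caractères d'un document en même temps."
RUSSIAN = "Система определяет язык и кодировку документа одновременно."
SAMPLES = [
    ("fr", "cp1252", FRENCH),
    ("fr", "utf8", FRENCH),
    ("ru", "koi8-r", RUSSIAN),
    ("ru", "cp1251", RUSSIAN),
]


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count overlapping n-byte substrings of a raw byte string."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


class ByteNgramIdentifier:
    """Sketch of a joint language/charset identifier (not ¿Qué?'s algorithm).

    Profiles are built from encoded bytes, so one scoring pass picks a
    (language, encoding) pair rather than a language alone.
    """

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.profiles: dict = {}  # (language, encoding) -> Counter of byte n-grams
        self.totals: dict = {}    # (language, encoding) -> total n-gram count

    def train(self, language: str, encoding: str, text: str) -> None:
        label = (language, encoding)
        self.profiles[label] = byte_ngrams(text.encode(encoding), self.n)
        self.totals[label] = sum(self.profiles[label].values())

    def score(self, data: bytes, label: tuple) -> float:
        """Log-likelihood of `data` under one profile, with add-one smoothing."""
        profile, total = self.profiles[label], self.totals[label]
        # The smoothing vocabulary is every possible n-byte sequence.
        log_denominator = math.log(total + 256 ** self.n)
        return sum(
            count * (math.log(profile[gram] + 1) - log_denominator)
            for gram, count in byte_ngrams(data, self.n).items()
        )

    def identify(self, data: bytes, max_bytes: int = 100) -> tuple:
        """Best (language, encoding) label, looking only at the first max_bytes bytes."""
        sample = data[:max_bytes]  # mirrors the "about 100 bytes" figure above
        return max(self.profiles, key=lambda label: self.score(sample, label))


def build_demo_identifier() -> ByteNgramIdentifier:
    identifier = ByteNgramIdentifier()
    for language, encoding, text in SAMPLES:
        identifier.train(language, encoding, text)
    return identifier


if __name__ == "__main__":
    ident = build_demo_identifier()
    print(ident.identify("определяет язык и кодировку".encode("koi8-r")))
```

Note that no notion of "letter" appears anywhere in the sketch, which is one reason a byte-level model can also be applied to writing systems without an alphabet.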


Advantages of ¿Qué?:
  • Performs quickly and accurately
  • Covers a large number of languages and encodings
  • Is multi-platform
  • Recognizes both language and encoding, unlike other products
  • Needs only about 100 bytes of text to recognize the language and character set
  • Reduces the need for human intervention to classify multilingual documents


Simplified illustration of the process ¿Qué? uses to recognize a language and character set


Language and character set recognition can be used for many applications. A computer system for handling Internet documents should, in principle, be able to identify each document's language and how it is encoded. If it cannot recognize the character set, it will not be able to display or print the document correctly. Similarly, if it cannot detect the language of the documents in a documentation library, it will have difficulty performing an intelligent search. For example, when a user searches for the French word lime (a tool), the system should not return documents that contain the English word lime (a fruit). By the same logic, if it cannot identify a document's language, it will not easily be able to carry out machine translation or voice synthesis.
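The lime example can be made concrete with a small sketch. It assumes each document already carries an ISO 639 language code produced by an identifier such as ¿Qué?; the `Document` class and `search` function are hypothetical names for illustration, not part of any Alis API. The search simply discards documents whose detected language differs from the query's before matching the word.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    language: str  # ISO 639 code, assumed to be supplied by a language identifier


def words(text: str) -> set:
    """Lower-cased word tokens (the regex is Unicode-aware for str input)."""
    return set(re.findall(r"\w+", text.lower()))


def search(docs: list, word: str, query_language: str) -> list:
    """IDs of documents in the query's language that contain the word."""
    target = word.lower()
    return [
        d.doc_id
        for d in docs
        if d.language == query_language and target in words(d.text)
    ]


if __name__ == "__main__":
    library = [
        Document("fr-1", "Il faut une lime pour polir le métal.", "fr"),
        Document("en-1", "Add a slice of lime to the drink.", "en"),
    ]
    print(search(library, "lime", "fr"))
```

Filtering by language before matching is the simplest possible design; a larger system might instead keep a separate index per language, but either way the language tag has to be known first.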

¿Qué? is available as a static or dynamic library for the following systems: AIX, HP-UX, Linux, Macintosh, SGI, Solaris and Windows. A description of the API and a list of frequently asked questions (FAQ) are also available; contact Alis for this information.



Other internationalization products and
  • Batam™: Internationalization library
  • Florès™: Character set converter


Combinations of languages and encoding supported by ¿Qué?

Language    ISO 639  Recognized character sets
Albanian    sq       cp1252, cp850, Macintosh, utf8
German      de       cp1252, cp850, Macintosh, utf8
English     en       cp1252, utf8
Arabic      ar       cp1256, ISO 8859-6, utf8
Basque      eu       cp1252, cp850, Macintosh, utf8
Bulgarian   bg       cp1251, ISO 8859-5, utf8
Chinese     zh       gb2312, hz, big5, utf8
Korean      ko       KS C 5601, ISO 2022-kr, utf8
Croatian    sh       cp1250, ISO 8859-2, Macintosh-Croat, utf8
Danish      da       cp1252, cp850, Macintosh, utf8
Spanish     es       cp1252, cp850, Macintosh, utf8
Estonian    et       ISO 8859-4, utf8
Finnish     fi       cp1252, cp850, Macintosh, utf8
French      fr       cp1252, cp850, Macintosh, utf8
Greek       el       cp1253, cp869, ISO 8859-7, Macintosh-Greek, utf8
Hebrew      he       ISO 8859-8
Hungarian   hu       cp1250, cp852, utf8
Italian     it       cp1252, cp850, Macintosh, utf8
Japanese    ja       euc-jp, ISO 2022-jp, shift-jis, utf8
Malay       ms       cp1252, cp850, Macintosh, utf8
Dutch       nl       cp1252, cp850, Macintosh, utf8
Norwegian   no       cp1252, cp850, Macintosh, utf8
Polish      pl       cp1252, ISO 8859-2, utf8
Portuguese  pt       cp1252, cp850, Macintosh, utf8
Russian     ru       cp1251, ISO 8859-5, koi8-r, utf8
Swedish     sv       cp1252, cp850, Macintosh, utf8
Czech       cs       cp1250, ISO 8859-2, utf8
Thai        th       TIS 620, utf8
Turkish     tr       cp853, ISO 8859-9, utf8
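Once a character set has been identified, the text still has to be decoded with it. The following sketch maps the charset labels in the table above to Python's standard codec names and decodes raw bytes. The mapping is an assumption made for illustration: "Macintosh" is taken to mean Mac OS Roman, "KS C 5601" its usual EUC-KR form, and cp853 is left unmapped because Python's standard library has no codec for it. It also shows why the character set matters: the single byte 0xE9 is "é" in cp1252 but "й" in cp1251.

```python
# Hypothetical mapping from the charset labels in the table to Python codec
# names. Assumptions: "Macintosh" means Mac OS Roman, "KS C 5601" means its
# usual EUC-KR form, and cp853 has no codec in Python's standard library.
TABLE_TO_PYTHON_CODEC = {
    "utf8": "utf_8",
    "cp850": "cp850", "cp852": "cp852", "cp853": None, "cp869": "cp869",
    "cp1250": "cp1250", "cp1251": "cp1251", "cp1252": "cp1252",
    "cp1253": "cp1253", "cp1256": "cp1256",
    "iso 8859-2": "iso8859_2", "iso 8859-4": "iso8859_4",
    "iso 8859-5": "iso8859_5", "iso 8859-6": "iso8859_6",
    "iso 8859-7": "iso8859_7", "iso 8859-8": "iso8859_8",
    "iso 8859-9": "iso8859_9",
    "macintosh": "mac_roman", "macintosh-croat": "mac_croatian",
    "macintosh-greek": "mac_greek",
    "gb2312": "gb2312", "hz": "hz", "big5": "big5",
    "ks c 5601": "euc_kr", "iso 2022-kr": "iso2022_kr",
    "euc-jp": "euc_jp", "iso 2022-jp": "iso2022_jp", "shift-jis": "shift_jis",
    "koi8-r": "koi8_r", "tis 620": "tis_620",
}


def decode_as(data: bytes, table_charset: str) -> str:
    """Decode raw bytes using a charset label as written in the table."""
    codec = TABLE_TO_PYTHON_CODEC.get(table_charset.strip().lower())
    if codec is None:
        raise LookupError(f"no standard-library codec for {table_charset!r}")
    return data.decode(codec)


if __name__ == "__main__":
    raw = b"\xe9"
    for label in ("cp1252", "cp1251"):
        print(label, decode_as(raw, label))
```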


Applications already using ¿Qué?:

  • Customer Relationship Management
  • Knowledge Management
  • Search engines
  • Internet browsers
  • Web filters
  • Machine translation