Wiktionary:Spraken
- For a list of all language codes, see Wiktionary:List vun de Spraken.
- For information on how to add or remove a language from Wiktionary, see Wiktionary:Guide to adding and removing languages.
Wiktionary includes many words in many languages. To distinguish languages, Wiktionary gives each a unique name and a unique code, which identify it.
Language names
Wiktionary calls each language by a different name; these language names are used in headers, translation tables, lexical categories, appendices, and some other places. Language names are chosen by consensus. Whenever possible, common English names of languages are used, and diacritics are avoided. Attested names (names which meet CFI) are strongly preferred.
When a single language is known by multiple names, only one (the first listed) is used.
When two languages are commonly known by the same name, Wiktionary distinguishes them by using synonyms for one or both, or (rarely) by using appended identifiers. For example, the Ghanaian language commonly called "Buli" is referred to as "Buli (Ghana)" on Wiktionary and represented by the code "bwu"; the Indonesian language commonly called "Buli" is referred to as "Buli (Indonesia)" on Wiktionary and represented by the code "bzq". The Indonesian language commonly called "Maba" is referred to as "Maba" on Wiktionary and represented by the code "mqa"; the Chadian language commonly called "Maba" is referred to on Wiktionary as "Bura Mabang" and represented by the code "mde".
Language codes
Wiktionary has an intricate system for determining which string of letters (code) represents each language and language family, as well as storing other information associated with a particular language or family. Language codes are used in naming categories, and are called by many templates. The module Module:languages stores all language-related information, while Module:families covers language families. The module stores different pieces of information, such as the names, the family and the scripts of the language. These modules cannot be used directly in a template, so instead there is another module named Module:language utilities, which allows templates to access the information.
If you know the name of a language, you can determine its code by using {{langrev}} with the language's name as a parameter: the template will return the language's code if it can find it. (Type {{langrev|English}}, for example, in the Sandbox or Special:ExpandTemplates, and it will return "en".)
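The name-to-code lookup described above can be illustrated outside the wiki with a small Python sketch. This is not how {{langrev}} or Module:languages is implemented; the table below is a hypothetical sample built only from names and codes mentioned on this page, and `code_for_name` is an illustrative helper name.

```python
# Hypothetical sample of code/name pairs taken from the examples on this page;
# the real data lives in Module:languages and is far larger.
CODE_TO_NAME = {
    "en": "English",
    "de": "German",
    "eo": "Esperanto",
    "bwu": "Buli (Ghana)",
    "bzq": "Buli (Indonesia)",
    "mqa": "Maba",
    "mde": "Bura Mabang",
}

# Invert the table once so that a name lookup is a single dictionary access.
# This only works because every language has a unique name.
NAME_TO_CODE = {name: code for code, name in CODE_TO_NAME.items()}


def code_for_name(name):
    """Return the code for a language name, or None if the name is unknown."""
    return NAME_TO_CODE.get(name)
```

Because names are unique, the inverted table is unambiguous: the bare name "Buli" is not a key in this sample, while "Buli (Ghana)" and "Buli (Indonesia)" each map to exactly one code.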
Wiktionary also has a simple system for recording which family individual languages belong to, and which scripts they are written in.
Wiktionary represents individual languages as follows:
- Languages which were assigned two-letter codes in the international standard ISO 639-1 are generally represented on Wiktionary by those codes. The individual codes are stored in Module:languages. English, for example, is represented by en. German is represented by de. Esperanto is represented by eo. Wiktionary has a list of ISO 639-1 codes here.
- A few languages are represented on Wiktionary by 639-1 codes the ISO has deprecated. (This is generally the case when the ISO has come to consider a lect a group of languages, but Wiktionary still considers it a single language.) Serbo-Croatian, for example, is represented by sh.
- Languages which were not assigned codes by ISO 639-1, but which were assigned three-letter codes (based on Ethnologue codes) in the international standard ISO 639-3 are generally represented on Wiktionary by those codes. Abenaki, for example, is represented by abe. Wiktionary has a list of ISO 639-3 codes here.
- A few languages are represented by other, "exceptional" codes. (A complete list of these is in the section "List of languages with exceptional codes".) Exceptional codes are chosen as follows:
  - A few are ISO 639-2 codes. (This is the case, for example, for languages which were not assigned specific, single codes by either ISO 639-1 or ISO 639-3.) Nahuatl, for example, is represented by nah.
  - A few are codes devised by the Wikimedia Foundation Language Committee. (This is the case when a Wikimedia project is begun in a language which was not assigned a code by any ISO standard.) Zamboanga Chavacano, for example, is represented by cbk-zam. Wiktionary has a list of such codes in its Appendix:Wikimedia language codes.
  - Any language which does not have an ISO or specially-devised Wikimedia code, but which is to be included in Wiktionary, is given a two-part exceptional code. The first part of this code is a relevant ISO 639-5 family code (see Wiktionary's appendix); after a hyphen, the second part of the code is a series of three lowercase letters which generally approximate the language name. (No digits, upper-case letters, etc. are used: IANA tags allow these and are case-insensitive, but the MediaWiki software is more restrictive.) For example, Samoan Plantation Pidgin is cpe-spp: "cpe" is the ISO 639-5 code for English-based creoles and pidgins, "spp" abbreviates "Samoan Plantation Pidgin". Gallo is roa-gal: "roa" is the ISO 639-5 code for Romance languages, "gal" abbreviates "Gallo".
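The code shapes listed above can be told apart mechanically. The Python sketch below classifies a code by its shape only; the labels and the function name `code_shape` are assumptions made for this illustration. Shape alone does not reveal where a code comes from: "nah" (ISO 639-2) and "abe" (ISO 639-3) look alike, and "cbk-zam" (a Wikimedia code) has the same shape as the two-part exceptional codes.

```python
import re

# Patterns for the shapes described in the list above (lowercase letters only).
TWO_LETTER = re.compile(r"[a-z]{2}")             # e.g. en, de, sh
THREE_LETTER = re.compile(r"[a-z]{3}")           # e.g. abe, nah
TWO_PART = re.compile(r"[a-z]{3}-[a-z]{3}")      # e.g. cpe-spp, roa-gal


def code_shape(code):
    """Classify a code by shape; the labels are hypothetical, not Wiktionary terms."""
    if TWO_LETTER.fullmatch(code):
        return "two-letter"
    if THREE_LETTER.fullmatch(code):
        return "three-letter"
    if TWO_PART.fullmatch(code):
        return "two-part"
    # Digits, upper-case letters and other shapes are not covered by this sketch.
    return "other"
```

A check like this can catch malformed codes (for example, ones containing digits or upper-case letters) before they are looked up, but deciding which standard a code belongs to still requires the actual lists.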
Constructed languages which are not widely used but which have been assigned ISO 639-3 codes are sometimes accepted by Wiktionary for inclusion in dedicated Appendices. These languages are represented by their ISO 639-3 codes. Láadan, for example, is represented by the ISO 639-3 code ldn. Such languages have their type set to appendix-constructed in Module:languages. Some other constructed languages are also included in dedicated Appendices though they do not have ISO 639-3 codes: these languages are given codes which consist of "art-" followed by three letters.
Reconstructed languages are assigned special codes consisting of the language family's code with "-pro" added to the end. Proto-Germanic, for example, is represented by the code gem-pro. Such languages have their type set to reconstructed in Module:languages.
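As a minimal sketch of the two conventions just described, the Python below builds a reconstructed-language code from a family code and keeps a small table of type values. Both names (`proto_code`, `LANGUAGE_TYPES`) are hypothetical, and the table holds only the two examples given on this page; it does not mirror the real layout of Module:languages.

```python
# Type values for the two examples mentioned above; a hypothetical excerpt.
LANGUAGE_TYPES = {
    "ldn": "appendix-constructed",  # Láadan
    "gem-pro": "reconstructed",     # Proto-Germanic
}


def proto_code(family_code):
    """Build a reconstructed-language code: the family code plus '-pro'."""
    return f"{family_code}-pro"
```

For the Germanic family code "gem", `proto_code` produces "gem-pro", which is the key under which the sample table records the type reconstructed.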
Not all lects which have been assigned codes by the ISO are assigned codes or included by Wiktionary.
- The ISO has assigned codes to some constructed languages which Wiktionary excludes.
- The ISO has assigned codes to some lects which Wiktionary treats as dialects of other languages and therefore represents by those languages' codes. (This is the case, for example, with Moldovan/Moldavian: the ISO assigned the lect the 639-1 code mo, but Wiktionary regards it as a form of Romanian and represents both it and Romanian by the same code, ro. See Wiktionary:Language treatment.)
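The merging of an ISO code into another language's code, as with mo and ro above, can be pictured as a simple alias table. The Python sketch below is an illustration only: `MERGED_CODES` and `wiktionary_code` are hypothetical names, and the table contains just the one pair named on this page.

```python
# ISO codes that Wiktionary folds into another language's code.
# Hypothetical excerpt containing only the example from the text.
MERGED_CODES = {
    "mo": "ro",  # Moldovan/Moldavian is treated as a form of Romanian
}


def wiktionary_code(iso_code):
    """Return the code Wiktionary uses for an ISO code, following merges."""
    return MERGED_CODES.get(iso_code, iso_code)
```

Codes that are not merged pass through unchanged, so the function can be applied to any ISO code before it is used for a lookup.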
Languages' family and script information
Wiktionary sorts languages into families. Most families are related through descent from a common ancestor, but a few are merely categories, such as "creoles and pidgins". Wiktionary records which family a language belongs to in Module:languages. Each family is represented by a code; the family codes are explained in Wiktionary:Families.
- English belongs to the family of West Germanic languages; this information is recorded in the module as gmw. Serbo-Croatian is a South Slavic language, recorded as zls. Abenaki is an Algonquian language, recorded as alg. Nahuatl is a Nahuan language, recorded as azc-nah.
- The widely-used constructed language Esperanto has its membership in the category "Artificial languages" recorded as art.
- Zamboanga Chavacano has its membership in the category "Creole or pidgin languages" recorded as crp.
- Wiktionary even records information about appendix-only constructed languages in this way: Láadan has its membership in the category "Artificial languages" recorded as art.
Wiktionary records which script(s) a language uses through the module as well. Each script is represented by a code, which is itself the name of a template, stored in the Template: namespace. The script codes are explained in Wiktionary:Scripts.
- English is written in the Latin script; this is recorded as Latn.
- Serbo-Croatian is written in both the Latin and the Cyrillic scripts; this is recorded as Latn and Cyrl.
- Wiktionary even records information about appendix-only constructed languages in this way: the information that Láadan is written in the Latin script is recorded as Latn.
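The family and script information described in this section can be pictured as one record per language code. The Python sketch below is an illustration under stated assumptions: the `LanguageRecord` class, its field names and the helper functions are hypothetical and do not reflect the actual structure of Module:languages. Only languages whose family and scripts are both given above are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageRecord:
    name: str       # the language name used on Wiktionary
    family: str     # family code, e.g. "gmw"
    scripts: tuple  # script codes, e.g. ("Latn", "Cyrl")


# Sample records built from the examples in this section.
LANGUAGES = {
    "en": LanguageRecord("English", "gmw", ("Latn",)),
    "sh": LanguageRecord("Serbo-Croatian", "zls", ("Latn", "Cyrl")),
    "ldn": LanguageRecord("Láadan", "art", ("Latn",)),
}


def languages_in_family(family):
    """Return the sorted codes of all sample languages in the given family."""
    return sorted(code for code, rec in LANGUAGES.items() if rec.family == family)


def uses_script(code, script):
    """Report whether the language with this code is recorded with the script."""
    return script in LANGUAGES[code].scripts
```

Storing the scripts as a tuple allows a language such as Serbo-Croatian to carry more than one script code, while single-script languages use a one-element tuple.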