Tisane language models are stored in directories. The data they contain falls into two categories:
- Language-specific data that describes a particular language.
- Crosslingual data used by all languages (for example, semantic connections between concepts).
Language-specific data stores are named according to the following convention: (language_code)-(data_store_name)
- Language code: Based on the ISO 639-1 language code standard, optionally including a dialect (for example, zh_CN).
- Data store name: Identifies the structures stored (for example, phrase for phrasal patterns).
Examples:
- en-phrase: English phrasal patterns
- fr-nondic: French nondictionary entity heuristics
- zh_CN-phrase: Chinese (Simplified) phrasal patterns
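The naming convention can be illustrated with a minimal Python sketch that splits a store name into its two parts. The function name `parse_store_name` is hypothetical and not part of Tisane. The sketch assumes that a language code never contains a hyphen (dialects use an underscore, as in zh_CN), so the first hyphen separates the language code from the data store name.

```python
# Hypothetical helper, not part of Tisane: illustrates the
# (language_code)-(data_store_name) convention only.
SHARED_STORES = {"family", "role", "pragma"}  # stores used by all languages


def parse_store_name(name):
    """Split a data store name into (language_code, data_store_name).

    Shared stores carry no language prefix, so their language code
    is returned as None.
    """
    if name in SHARED_STORES:
        return None, name
    # Assumption: the language code contains no hyphen, so the first
    # hyphen is the separator.
    language_code, sep, store_name = name.partition("-")
    if not sep or not language_code or not store_name:
        raise ValueError(f"not a language-specific store name: {name!r}")
    return language_code, store_name
```

For example, `parse_store_name("zh_CN-phrase")` yields the language code `zh_CN` and the data store name `phrase`.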
The following data stores are used by all languages:
- family
- role
- pragma
Important: All data stores for a language must reside in the same directory.
To conserve space, or for other reasons, languages or components can be excluded from deployment.
To include only specific languages, identify the appropriate language codes (e.g., en, de, zh_CN) and include the corresponding language-specific data stores along with the three shared data stores (family, role, pragma).
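As a rough illustration of this selection, the following Python sketch filters a list of store names down to the chosen languages plus the three shared stores. The function name `select_stores` is hypothetical, and the sketch assumes each store appears under exactly its store name, which may not match how a given installation lays out its files.

```python
# Hypothetical helper, not part of Tisane: picks the stores needed
# for a deployment limited to certain languages.
SHARED_STORES = ("family", "role", "pragma")  # always required


def select_stores(available, language_codes):
    """Return the names from `available` needed for `language_codes`.

    A store is kept if it is one of the shared stores or if its
    language prefix (the text before the first hyphen) was requested.
    """
    wanted = set(language_codes)
    selected = []
    for name in available:
        language_code, sep, _ = name.partition("-")
        if name in SHARED_STORES or (sep and language_code in wanted):
            selected.append(name)
    return selected
```

The `available` list could come from a directory listing of the model directory, for instance via `os.listdir`. Because all data stores for a language must reside in the same directory, the selected stores should be copied together into a single target directory.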
The xx-famlex and xx-famphrase stores (where xx stands for the language code) are used for translation only and can be excluded from distribution if Tisane is not used for translation.
Spellchecking data is stored in the xx-spell stores. If these are omitted, spellchecking will not work.
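Excluding these optional components could be sketched in Python as follows. The function name `drop_optional_stores` and its two flags are hypothetical. The sketch only filters names and assumes the component is the part of the store name after the first hyphen.

```python
# Hypothetical helper, not part of Tisane: removes optional
# components from a list of store names.
TRANSLATION_ONLY = {"famlex", "famphrase"}  # needed for translation only
SPELLCHECK = {"spell"}                      # needed for spellchecking only


def drop_optional_stores(store_names, use_translation, use_spellcheck):
    """Return `store_names` without the components that are not needed."""
    excluded = set()
    if not use_translation:
        excluded |= TRANSLATION_ONLY
    if not use_spellcheck:
        excluded |= SPELLCHECK
    kept = []
    for name in store_names:
        _, sep, component = name.partition("-")
        # Shared stores have no hyphen and are never dropped here.
        if sep and component in excluded:
            continue
        kept.append(name)
    return kept
```

Calling it with `use_spellcheck=False` drops every xx-spell store, which disables spellchecking in the resulting deployment.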