Language Model Data Stores
Tisane language models are stored in directories. The stored data falls into two categories:
- Language-specific data that describes a particular language.
- Crosslingual data used by all languages (for example, semantic connections between concepts).
Language-Specific Data
Language-specific data stores are named according to the following convention: (language_code)-(data_store_name)
- Language code: based on the ISO 639-1 standard, optionally followed by a dialect suffix (for example, zh_CN).
- Data store name: the type of structures held in the store.
Examples:
- en-phrase: English phrasal patterns
- fr-nondic: French nondictionary entity heuristics
- zh_CN-phrase: Chinese (Simplified) phrasal patterns
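The naming convention can be illustrated with a short Python sketch that splits a store name into its language code and data store name. It assumes, based on the examples above, that a language code never contains a hyphen (dialects use an underscore, as in zh_CN), so the first hyphen separates the two parts. The function name parse_store_name is hypothetical and not part of Tisane.

```python
def parse_store_name(name: str) -> tuple[str, str]:
    """Split a language-specific store name into (language_code, data_store_name).

    Hypothetical helper, not part of Tisane. Assumes the language code has no
    hyphen, so everything before the first hyphen is the language code.
    """
    language, sep, store = name.partition("-")
    if not sep or not language or not store:
        raise ValueError(f"not a language-specific store name: {name!r}")
    return language, store


for name in ["en-phrase", "fr-nondic", "zh_CN-phrase"]:
    language, store = parse_store_name(name)
    print(f"{name}: language={language}, store={store}")
```

A name without a hyphen is rejected rather than guessed at, so a misspelled store name surfaces as an error instead of being silently misclassified.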
Crosslingual Data Stores
The following data stores are used by all languages:
- family
- role
- pragma
Important: All data stores for a language must reside in the same directory.
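Because every store for a language must sit in one directory, a deployment script might check a planned layout before shipping it. The Python sketch below takes a hypothetical mapping of directory names to store names (how stores actually appear on disk is not described here, so this input format is an assumption) and reports any language whose language-specific stores are split across directories. The function name split_languages is hypothetical.

```python
from collections import defaultdict

SHARED_STORES = {"family", "role", "pragma"}


def split_languages(layout: dict[str, list[str]]) -> dict[str, set[str]]:
    """Return each language whose stores appear in more than one directory.

    `layout` maps a directory name to the store names it contains; this
    input format is a hypothetical stand-in for a real directory listing.
    """
    dirs_by_language: dict[str, set[str]] = defaultdict(set)
    for directory, names in layout.items():
        for name in names:
            if name in SHARED_STORES:
                continue  # crosslingual stores are not tied to one language
            language, sep, _ = name.partition("-")
            if sep:
                dirs_by_language[language].add(directory)
    return {
        language: dirs
        for language, dirs in dirs_by_language.items()
        if len(dirs) > 1
    }


layout = {
    "models": ["family", "role", "pragma", "en-phrase", "de-phrase"],
    "extra": ["en-spell"],
}
print(split_languages(layout))  # flags "en", whose stores span two directories
```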
Partial Distribution
To conserve space, or for other reasons, you can exclude languages or components from a deployment.
Providing Selected Languages Only
To include only specific languages, identify the appropriate language codes (e.g., en, de, zh_CN) and include the corresponding language-specific data stores along with the three shared data stores (family, role, pragma).
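The selection rule above can be sketched in Python as a filter over store names. The list of available stores is hypothetical, the helper select_stores is not part of Tisane, and the sketch assumes every language-specific store follows the (language_code)-(data_store_name) convention.

```python
SHARED_STORES = {"family", "role", "pragma"}


def select_stores(available: list[str], languages: list[str]) -> list[str]:
    """Keep the shared stores plus every store for the requested languages."""
    wanted = set(languages)
    selected = []
    for name in available:
        if name in SHARED_STORES:
            selected.append(name)  # always needed, whatever the language
            continue
        language, sep, _ = name.partition("-")
        if sep and language in wanted:
            selected.append(name)
    return sorted(selected)


available = [
    "family", "role", "pragma",
    "en-phrase", "en-spell", "de-phrase", "fr-nondic", "zh_CN-phrase",
]
print(select_stores(available, ["en", "zh_CN"]))
# ['en-phrase', 'en-spell', 'family', 'pragma', 'role', 'zh_CN-phrase']
```

Language codes are matched exactly, so a dialect such as zh_CN must be requested by its full code; asking for zh alone would not pick up zh_CN stores.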
Providing Partial Functionality
The xx-famlex and xx-famphrase stores (where xx is the language code) are used for translation only and can be excluded from distribution if Tisane is not used for translation.
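For a deployment that does not need translation, the translation-only stores can be filtered out by their data store name. A minimal Python sketch, again assuming the naming convention above; the helper drop_translation_stores is hypothetical.

```python
TRANSLATION_ONLY = {"famlex", "famphrase"}


def drop_translation_stores(names: list[str]) -> list[str]:
    """Remove xx-famlex and xx-famphrase stores, keeping everything else."""
    kept = []
    for name in names:
        _, sep, store = name.partition("-")
        if sep and store in TRANSLATION_ONLY:
            continue  # only needed when Tisane is used for translation
        kept.append(name)
    return kept


print(drop_translation_stores(["family", "en-famlex", "en-famphrase", "en-phrase"]))
# ['family', 'en-phrase']
```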
Spellchecking
Spellchecking data is stored in the xx-spell stores. If these stores are omitted, spellchecking will not work.
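Since omitting an xx-spell store disables spellchecking for that language, a packaging script could warn about it. The Python sketch below lists the languages that have at least one store but no xx-spell store; the input list and the function name languages_without_spellcheck are hypothetical.

```python
def languages_without_spellcheck(names: list[str]) -> list[str]:
    """Return language codes that have stores but no xx-spell store."""
    languages: set[str] = set()
    with_spell: set[str] = set()
    for name in names:
        language, sep, store = name.partition("-")
        if not sep:
            continue  # crosslingual store such as family, role, or pragma
        languages.add(language)
        if store == "spell":
            with_spell.add(language)
    return sorted(languages - with_spell)


print(languages_without_spellcheck(["family", "en-phrase", "en-spell", "de-phrase"]))
# ['de']
```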