# Lexical Chunking and Tokenization

Tisane uses a unified, logical, morpheme-based representation for lexical chunks. In languages that use compounding, such as German, compounds are split into their constituents. Idiomatic [multi-word expressions](https://en.wikipedia.org/wiki/Multiword_expression) ("kung fu", "power plant", "clay pigeon") are treated as a single lexeme.

### Examples

* English: "I don't see the power plant." => ["I", "do", "n't", "see", "the", "power plant", "."]
* German: "Jetzt sollen die Stahlkugeln ersetzt werden." => ["Jetzt", "sollen", "die", "Stahl", "kugeln", "ersetzt", "werden", "."]
* Simplified Chinese: "我给了老张三本书" => ["我", "给了", "老张", "三", "本", "书"] (In languages written without white space, particles are often joined to the word they modify.)
* Spanish: "Asimismo, San Francisco es una de las mejores ciudades de EE. UU." => ["Asimismo", ",", "San Francisco", "es", "una", "de", "las", "mejores", "ciudades", "de", "EE. UU."]

## How To Use

To use Tisane for tokenization/lexical chunking:

1. Specify `"words": true` in your `settings`.
2. In the response, traverse the elements of the `sentence_list` section (individual sentences).
3. The lexical chunks are under `words`.
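
Below is a minimal Python sketch of these steps. The `settings`, `sentence_list`, and `words` fields come from this page; the endpoint URL and the `Ocp-Apim-Subscription-Key` header are assumptions about a typical Tisane deployment, so check them against your own account details before use.

```python
import json
import urllib.request

# Assumptions: endpoint URL and auth header name may differ for your account.
API_URL = "https://api.tisane.ai/parse"   # assumed endpoint
API_KEY = "your-subscription-key"         # placeholder

payload = {
    "language": "en",
    "content": "I don't see the power plant.",
    "settings": {"words": True},  # step 1: request lexical chunks
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": API_KEY,  # assumed header name
    },
    method="POST",
)

with urllib.request.urlopen(request) as response:
    result = json.load(response)

# Steps 2-3: traverse sentence_list and read the chunks under "words".
for sentence in result.get("sentence_list", []):
    print(sentence.get("words", []))
```

For the English sample above, the chunks printed per sentence should correspond to the tokenization shown in the Examples section: ["I", "do", "n't", "see", "the", "power plant", "."].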