This section describes the components and data structures of the responses returned by Tisane.
The POST /parse response includes sections that can be shown or hidden, depending on the provided settings. (See the Configuration And Customization Guide for more information.)
The root-level attributes are:
- text (string) - The original input text.
- language (string) - The detected language code, if language identification was used.
- reduced_output (boolean) - Indicates if verbose information was omitted due to input size.
- sentiment (floating-point number) - A document-level sentiment score (-1 to 1). Only shown when the document_sentiment setting is set to true.
- signal2noise (floating-point number) - A relevance score showing how closely the text matches the concepts from the relevant setting. This value appears only if the relevant setting exists.
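As a minimal sketch of how a client might read these root-level attributes, the Python snippet below works on an already decoded response (a plain dict). The helper name summarize_root and the sample values are hypothetical; treating a missing reduced_output flag as False is an assumption, not documented behavior.

```python
from typing import Any, Optional


def summarize_root(response: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Collect the root-level attributes, using None for absent ones."""
    return {
        "text": response.get("text"),
        "language": response.get("language"),
        # Assumption: a missing flag means nothing was omitted.
        "reduced_output": response.get("reduced_output", False),
        # Present only when document_sentiment is set to true.
        "sentiment": response.get("sentiment"),
        # Present only when the relevant setting exists.
        "signal2noise": response.get("signal2noise"),
    }


# Hypothetical decoded response, for illustration only.
sample = {"text": "Great location.", "language": "en", "sentiment": 0.6}
summary = summarize_root(sample)
```

Optional attributes are read with dict.get so that a response produced without the corresponding settings does not raise a KeyError.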
Tisane automatically analyzes the text for potential instances of abuse (problematic content) and flags them in the response.
The abuse section lists detected instances of content that may require the attention of moderators or be relevant to law enforcement agencies.
This section appears if:
- Instances of problematic content are found, and;
- The abuse setting is set to true (or omitted).
Note: Use cases vary. It is the responsibility of integrators and community administrators to decide whether and how to act upon different types of flagged content.
For example, it might not be appropriate to restrict sexual advances in a dating app, censor profanity in communities where it is commonly accepted, or restrict external contact where it is allowed.
If you don't want problematic content to appear in your response, you must explicitly set the abuse setting to false.
Every instance contains the following attributes:
- type (string) - The type of the abuse.
- offset (unsigned integer) - The starting position of the instance. It is zero-based.
- length (unsigned integer) - Length of the content.
- sentence_index (unsigned integer) - Index of the sentence containing the instance. It is zero-based.
- text (string) - The fragment of text containing the instance. Only appears if the snippets setting is set to true.
- tags (array of strings) - Additional abuse details. For example: if the fragment is classified as an attempt to sell hard drugs, the instance contains the hard_drug tag.
- severity (string) - How severe the abuse is. The levels of severity are low, medium, high, and extreme.
- explanation (string) - The rationale for classification (if explain is enabled).
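The attributes above are enough to filter instances by severity and to recover the flagged fragment from the input when snippets are not returned. The following Python sketch assumes that offset and length count characters in a way that matches Python string indices, which the text above does not state; the helper names and the sample data are hypothetical.

```python
# Severity levels in ascending order, as listed above.
SEVERITY_ORDER = ["low", "medium", "high", "extreme"]


def at_least(instances, minimum):
    """Keep only the instances whose severity is `minimum` or higher."""
    threshold = SEVERITY_ORDER.index(minimum)
    return [
        inst for inst in instances
        if SEVERITY_ORDER.index(inst["severity"]) >= threshold
    ]


def fragment(text, instance):
    """Slice the flagged fragment out of the original input text.

    Assumption: offset and length are character-based.
    """
    start = instance["offset"]
    return text[start:start + instance["length"]]


# Hypothetical input and abuse section, for illustration only.
text = "You are an idiot. Call me at 555-0100."
instances = [
    {"type": "personal_attack", "offset": 0, "length": 16,
     "sentence_index": 0, "severity": "medium"},
    {"type": "external_contact", "offset": 18, "length": 19,
     "sentence_index": 1, "severity": "low"},
]

serious = at_least(instances, "medium")
```

If the input contains characters outside the Basic Multilingual Plane, it is worth verifying the offset convention against real responses before relying on the slicing helper.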
The supported problematic types are:
- personal_attack - Direct insults, bullying, or attacks targeting an individual. For example: instances of cyberbullying. Note: Criticism of ideas, posts, or general negative sentiment is not the same as a personal insult. See: Personal Attack
- bigotry - Hate speech or expression of bigoted opinions; adversarial remarks targeting protected groups. This includes not just racial slurs but any hostile statements directed at the group as a whole. See: Bigotry and Hate Speech
- profanity - Use of profane language, regardless of the context or intent. Note that racial slurs are not captured by this type. See: Profanity
- sexual_advances - Sexual solicitation. Attempts, whether welcome or unwelcome, to seek sexual favors or gratification. Also see: Sexual Advances
- criminal_activity - Content involving attempts to sell or acquire illegal items, engage in criminal services, issue threats, or similar actions. Also see: Criminal Activity
- external_contact - Attempts to initiate communication or payment through external channels. For example: Phone, email, messaging apps. These attempts might violate rules in certain communities, such as gig economy platforms or e-commerce sites. Also see: Attempts to Establish External Contact
- adult_only - Activities restricted for minors. For example: Consumption of alcohol. Also see: Adult-Only Content
- mental_issues - Content that might indicate mental health concerns, such as suicidal thoughts or signs of depression. Also see: Mental Issues
- allegation - Claims or accusations of misconduct, which may or may not involve criminal activity. Also see: Allegations
- contentious - Content likely to incite or provoke emotional reactions from individuals or groups. Also see: Contentious Content
- disturbing - Graphic or unsettling descriptions that might be distressing to readers. Also see: Disturbing Content
- no_meaningful_content - Nonsensical or gibberish text that lacks clear meaning. Also see: Meaningless Content
- data_leak - Sensitive personal information or privacy violations. For example: Passwords, ID numbers. Also see: Data Leaks
- spam - Spam content. Also see: Spam
- social_hierarchy - Forceful assertion of hierarchy in a community. For example: Someone is acting as a control freak. Also see: Assertion of Hierarchy
- generic - Content that doesn't fit into any specific category; undefined.
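Because the appropriate reaction to each type depends on the community, an integrator might keep a per-community policy table rather than blocking everything that is flagged. The sketch below is one possible shape for such a table; the action names (hide, escalate, allow, review) and the policy itself are hypothetical and not part of Tisane.

```python
# Hypothetical policy for a dating app: sexual advances are tolerated,
# while attacks and bigotry are hidden and criminal activity is escalated.
DATING_APP_POLICY = {
    "personal_attack": "hide",
    "bigotry": "hide",
    "criminal_activity": "escalate",
    "sexual_advances": "allow",
}


def decide(instance, policy, default="review"):
    """Map an abuse instance to an action; unknown types go to review."""
    return policy.get(instance["type"], default)


actions = [
    decide({"type": "sexual_advances"}, DATING_APP_POLICY),
    decide({"type": "criminal_activity"}, DATING_APP_POLICY),
    decide({"type": "spam"}, DATING_APP_POLICY),
]
```

Falling back to a manual review action for unlisted types keeps the policy safe when new types appear in the response.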
The sentiment_expressions section highlights the sentiment towards aspects or entities.
This section appears if:
- Instances where sentiment is expressed are found, and;
- The sentiment setting is set to true (or omitted).
Every instance contains the following attributes:
- offset (unsigned integer) - The starting position of the instance. It is zero-based.
- length (unsigned integer) - Length of the content.
- sentence_index (unsigned integer) - Index of the sentence containing the instance. It is zero-based.
- text (string) - The fragment of text containing the instance (if the snippets setting is set to true).
- polarity (string) - Indicates the sentiment of the text: positive, negative, or mixed. There is also a default value, used when the sentiment has been pre-determined by the application. For example, a review form may be split into two portions, "What did you like?" and "What did you not like?", and the reviewer may reply briefly with "The quiet. The service." In such cases, the default polarity allows the application to assign sentiment externally based on the context.
- targets (array of strings) - Lists the specific aspects or entities the sentiment refers to, if available. For example, in the sentence "The breakfast was yummy but the staff is unfriendly", the sentiment targets are meal and staff. Named entities can also be sentiment targets.
- reasons (array of strings) - Justifications for the sentiment, if available. For example, in the sentence "The breakfast was yummy but the staff is unfriendly", the reasons array for staff is ["unfriendly"], while the reasons array for meal is ["tasty"].
- explanation (string) - The rationale for the sentiment (if explain is enabled).
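A common way to consume this section is to group the expressions by target, so that each aspect carries its polarities and reasons. The Python sketch below is one possible approach; the helper name by_target is hypothetical, and the sample data reuses the values from this section's example.

```python
from collections import defaultdict


def by_target(expressions):
    """Group (polarity, reasons) pairs under each sentiment target."""
    grouped = defaultdict(list)
    for expr in expressions:
        # targets and reasons are optional, so default to empty lists.
        for target in expr.get("targets", []):
            grouped[target].append((expr["polarity"], expr.get("reasons", [])))
    return dict(grouped)


expressions = [
    {"sentence_index": 0, "offset": 0, "length": 32,
     "polarity": "positive", "reasons": ["close"], "targets": ["location"]},
    {"sentence_index": 0, "offset": 38, "length": 29,
     "polarity": "negative", "reasons": ["disrespectful"], "targets": ["staff"]},
]

grouped = by_target(expressions)
```

Expressions without targets are skipped by this grouping; an application that needs them could collect them under a placeholder key instead.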
Example:
"sentiment_expressions": [
  {
    "sentence_index": 0,
    "offset": 0,
    "length": 32,
    "polarity": "positive",
    "reasons": ["close"],
    "targets": ["location"]
  },
  {
    "sentence_index": 0,
    "offset": 38,
    "length": 29,
    "polarity": "negative",
    "reasons": ["disrespectful"],
    "targets": ["staff"]
  }
]
The entities_summary section lists detected entities in the text.
This section appears if:
- Named entities are found, and;
- The entities setting is set to true (or omitted).
Every entity contains the following attributes:
- name (string) - The most complete form of the entity's name found in the text across all mentions.
- ref_lemma (string) - The dictionary form (lemma) of the entity in English, if available, regardless of the input language.
- type (string or array of strings) - Defines the entity's type. Some entities may have multiple types. A country (or any other geopolitical entity declaring itself a country, even if not universally recognized as such) is considered both a place and an organization.
- subtype (string) - Specifies a more detailed classification within the entity type.
- mentions (array of objects) - Lists all instances where the entity appears in the text.
Every mention contains the following attributes:
- offset (unsigned integer) - The starting position of the instance. It is zero-based.
- length (unsigned integer) - Length of the content.
- sentence_index (unsigned integer) - Index of the sentence containing the instance. It is zero-based.
- text (string) - The fragment of text containing the instance (if the snippets setting is set to true).
Example:
"entities_summary": [
  {
    "type": "person",
    "name": "John Smith",
    "ref_lemma": "John Smith",
    "mentions": [
      { "sentence_index": 0, "offset": 0, "length": 10 }
    ]
  },
  {
    "type": ["organization", "place"],
    "name": "UK",
    "ref_lemma": "U.K.",
    "mentions": [
      { "sentence_index": 0, "offset": 40, "length": 2 }
    ]
  }
]
The currently supported entity types are:
- person, with optional subtypes: fictional_character, important_person, spiritual_being
- organization. Note: A country is both an organization and a place
- place
- time_range
- date
- time
- hashtag
- email
- amount_of_money
- phone - Phone number, either domestic or international, in a variety of formats
- role - A social role. For example: A position in an organization.
- software - A named software package
- website (URL), with an optional subtype: tor for Onion links. Note: Web services can also have the software type assigned
- item_of_interest - Any type of artifact of potential interest to the investigation. For example: Weapons, drugs, vehicles, luxury.
- weight
- bank_account - Only IBAN format is currently supported; subtypes: iban
- credit_card - A credit card number, with optional subtypes: visa, mastercard, american_express, diners_club, discovery, jcb, unionpay
- coordinates - GPS coordinates
- credential, with optional subtypes: md5, sha-1
- crypto, with optional subtypes: bitcoin, ethereum, monero, monero_payment_id, litecoin, dash
- event - A notable event involving participation of multiple people.
- file - Only Windows pathnames are supported; subtypes: windows, facebook (for images downloaded from Facebook)
- flight_code
- identifier - Any alphanumeric identifiers (ID numbers, codes, etc.) not classified elsewhere.
- ip_address, subtypes: v4, v6
- mac_address
- numeric (an unclassified numeric entity)
- username - A user name or a user's alias.
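Since the type attribute may be either a single string or an array of strings, client code usually benefits from normalizing it before filtering. The Python sketch below shows one way to do that; the helper names are hypothetical, and the sample data reuses the entities from the example above.

```python
def entity_types(entity):
    """Return the entity's types as a list, whatever form `type` takes."""
    value = entity.get("type", [])
    return [value] if isinstance(value, str) else list(value)


def entities_of_type(entities, wanted):
    """Names of all entities that carry the wanted type."""
    return [e["name"] for e in entities if wanted in entity_types(e)]


entities = [
    {"type": "person", "name": "John Smith", "ref_lemma": "John Smith",
     "mentions": [{"sentence_index": 0, "offset": 0, "length": 10}]},
    {"type": ["organization", "place"], "name": "UK", "ref_lemma": "U.K.",
     "mentions": [{"sentence_index": 0, "offset": 40, "length": 2}]},
]

places = entities_of_type(entities, "place")
```

With this normalization, a country such as the UK is found both when searching for places and when searching for organizations.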
The topics section lists the topics detected in the text, in other words its subjects, domains, or themes.
This section appears if:
- Topics are found, and;
- The topics setting is set to true (or omitted).
If the topic_stats setting is set to true, every entry in the array contains:
- topic (string) - The name of the topic.
- coverage (floating-point number) - A relevance score representing the ratio of sentences where the topic is detected to the total number of sentences.
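The entries therefore take a different shape depending on topic_stats. The sketch below normalizes both shapes into (topic, coverage) pairs. It assumes that, without topic_stats, each entry is a plain string, which the text above implies but does not state; the helper name and the sample topics are hypothetical.

```python
def normalize_topics(topics):
    """Return (topic, coverage) pairs; coverage is None without topic_stats."""
    pairs = []
    for entry in topics:
        if isinstance(entry, str):
            # Assumed shape when topic_stats is not enabled.
            pairs.append((entry, None))
        else:
            pairs.append((entry["topic"], entry.get("coverage")))
    return pairs


# Hypothetical topics sections, for illustration only.
plain = normalize_topics(["hotel", "breakfast"])
with_stats = normalize_topics([
    {"topic": "hotel", "coverage": 1.0},
    {"topic": "breakfast", "coverage": 0.5},
])
```

A coverage of 0.5 would mean the topic was detected in half of the sentences, for example in one sentence of a two-sentence text.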
The memory section provides optional context that can be passed to the settings in subsequent messages within the same conversation thread.
See Context and Long-Term Memory for more details.
Tisane can also provide more detailed linguistic data:
- Sentences: Original sentences, along with their corrected versions if any misspellings are detected.
- Lexical chunks: Groups of words (chunks) annotated with grammatical and stylistic features.
- Parse Trees and Phrases: Hierarchical representations of sentence structure, highlighting phrases and their relationships.
The sentence_list section is generated if:
- The words or the parses setting is set to true.
Every sentence structure in the list contains:
- offset (unsigned integer) - The starting position of the instance. It is zero-based.
- length (unsigned integer) - Length of the content.
- text (string) - The original input text.
- corrected_text (string) - The automatically corrected version of the sentence, if a misspelling or obfuscation/algospeak was detected and spellchecking is enabled.
- words (array of structures) - Provides detailed information about each lexical chunk, if the words setting is set to true. Note: While the term "word" is used for simplicity, it may not be linguistically accurate to equate lexical chunks with individual words.
- parse_tree (object) - Contains the parse tree and detected phrases for the sentence when the parses setting is set to true.
- nbest_parses (array of parse objects) - Lists alternative parse trees that are close to the best one. Generated when both the parses setting is true and the deterministic setting is explicitly set to false.
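One simple use of sentence_list is to rebuild the text with corrections applied wherever corrected_text is present. The Python sketch below illustrates the idea; the helper name and the sample sentences are hypothetical.

```python
def effective_sentences(sentence_list):
    """Prefer corrected_text when present, otherwise keep the original."""
    return [s.get("corrected_text", s["text"]) for s in sentence_list]


# Hypothetical sentence_list section, for illustration only.
sentence_list = [
    {"offset": 0, "length": 16, "text": "Teh car is fast.",
     "corrected_text": "The car is fast."},
    {"offset": 17, "length": 8, "text": "I agree."},
]

cleaned = " ".join(effective_sentences(sentence_list))
```

Joining with a single space is a simplification; an application that needs exact spacing can use each sentence's offset and length against the original input instead.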
The words section is generated if:
- The words setting is set to true.
Every lexical chunk (referred to as a "word" for simplicity) structure in the words array contains:
- type (string) - Defines the element's category. For example: punctuation for punctuation marks, numeral for numerals, or word for all other text elements.
- text (string) - The original input text.
- offset (unsigned integer) - The starting position of the instance. It is zero-based.
- length (unsigned integer) - Length of the content.
- corrected_text (string) - The automatically corrected version of the word, if a misspelling was detected.
- lettercase (string) - Indicates the original letter case of the word: upper, capitalized, or mixed. Note: If the text is entirely lowercase or case-insensitive, this attribute is omitted.
- stopword (boolean) - Specifies whether the word is a stopword.
- grammar (array of strings or structures) - Lists the grammatical features associated with the word. If the feature_standard setting is defined as native, each feature is an object containing an index (a numeral) and a value (string). Otherwise, each feature is represented as a plain string.
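Because grammar features arrive either as plain strings or as index/value objects, a small normalization step lets the rest of the client ignore the feature_standard setting. The sketch below is one way to write it; the helper name and the feature values (NOUN, SG) are hypothetical placeholders, not actual Tisane feature codes.

```python
def feature_values(features):
    """Flatten features to strings, for both native and other standards."""
    return [f if isinstance(f, str) else f["value"] for f in features]


# Hypothetical feature lists, for illustration only.
as_strings = feature_values(["NOUN", "SG"])
as_native = feature_values([
    {"index": 1, "value": "NOUN"},
    {"index": 5, "value": "SG"},
])
```

The index of a native feature is dropped here; keep the full object if the application needs to distinguish features that share a value.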
For lexical words only:
- role (string) - The semantic role of the word. For example: agent or patient. Note: In the passive voice, semantic roles are reversed relative to syntactic roles. For example: In "The car was driven by David", car is the patient, and David is the agent.
- numeric_value (floating-point number) - The numeric value, if the word represents or is associated with one.
- family (integer number) - The ID of the family associated with the disambiguated word-sense of the lexical chunk.
- definition (string) - The definition of the family. Included if the fetch_definitions setting is set to true.
- lexeme (integer number) - The ID of the lexeme entry associated with the disambiguated word-sense.
- nondictionary_pattern (integer number) - The ID of a non-dictionary pattern that matched the word if it was not found in the language model but classified using non-dictionary heuristics.
- style (array of strings or structures) - A list of style features associated with the word. Included if the feature_standard setting is set to native or description.
- semantics (array of strings or structures) - A list of semantic features associated with the word. Included if the feature_standard setting is set to native or description.
- segmentation (structure) - Provides information on the selected segmentation. A segmentation is an array of word structures. Included if multiple segmentations are possible, and the deterministic setting is set to false.
- other_segmentations (array of structures) - Lists alternative segmentations considered incorrect during the disambiguation process. Each entry has the same structure as segmentation.
- nbest_senses (array of structures) - Provides alternative disambiguation hypotheses. Included if the deterministic setting is false. Each hypothesis includes:
  - grammar, style, and semantics - These are structured identically to the corresponding attributes above.
  - senses - Lists word-senses for the hypothesis, each containing:
    - family: The associated family ID.
    - definition: The family's definition if fetch_definitions is enabled.
    - ref_lemma: The reference lemma, if available.
For punctuation marks only:
- id (integer number) - The ID of the punctuation mark.
- behavior (string) - The behavior code that defines the function of the punctuation mark. Values:
  - sentenceTerminator
  - genericComma
  - bracketStart
  - bracketEnd
  - scopeDelimiter
  - hyphen
  - quoteStart
  - quoteEnd
  - listComma (for East-Asian enumeration commas such as 、)
Punctuation marks have no n-best senses.
A parse tree, or more precisely, a parse forest, is a hierarchical collection of phrases linked to one another.
The parse tree section is generated if:
- The parses setting is set to true.
At the top level of the parse, there is an array of root phrases contained within the phrases element, each associated with a numeric id.
Every phrase can have child phrases, forming a nested structure.
Each phrase includes the following attributes:
- type (string) - A standard phrase tag denoting the type of the phrase. For example: S, VP, NP, ADJP, ADVP.
- family (integer number) - An ID of the phrase family.
- offset (unsigned integer) - The starting position of the instance. It is zero-based.
- length (unsigned integer) - Length of the phrase.
- role (string) - The semantic role of the phrase, if applicable, similar to semantic roles assigned to individual words.
- text (string) - The textual representation of the phrase. Phrase members are separated by the vertical bar character (|). Child phrases are enclosed in parentheses (). For example:
  - driven|by|David (a flat phrase with three members)
  - (The|car)|was|(driven|by|David) (a hierarchical structure with child phrases).
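The bar-and-parenthesis notation of the text attribute can be turned into nested lists with a small stack-based parser. The Python sketch below is one possible implementation, not part of Tisane; it assumes that phrase members never contain a literal vertical bar or parenthesis, and the function name is hypothetical.

```python
def parse_phrase_text(text):
    """Turn a phrase's text attribute into nested lists of members."""
    root = []
    stack = [root]   # stack[-1] is the phrase currently being filled
    token = ""

    def flush():
        # Append the pending member, if any, to the current phrase.
        nonlocal token
        if token:
            stack[-1].append(token)
            token = ""

    for ch in text:
        if ch == "(":
            flush()
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif ch == ")":
            flush()
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif ch == "|":
            flush()
        else:
            token += ch
    flush()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


flat = parse_phrase_text("driven|by|David")
nested = parse_phrase_text("(The|car)|was|(driven|by|David)")
```

For the two examples above, flat is a list of three strings, and nested contains two child lists with the string was between them. When the children array is present in the response, walking it directly is more robust than parsing the text.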
Example:
"parse_tree": {
  "id": 4,
  "phrases": [
    {
      "type": "S",
      "family": 1451,
      "offset": 0,
      "length": 27,
      "text": "(The|car)|was|(driven|by|David)",
      "children": [
        {
          "type": "NP",
          "family": 1081,
          "offset": 0,
          "length": 7,
          "text": "The|car",
          "role": "patient"
        },
        {
          "type": "VP",
          "family": 1172,
          "offset": 12,
          "length": 15,
          "text": "driven|by|David",
          "role": "verb"
        }
      ]
    }
  ]
}
Tisane supports context-aware spelling correction. It identifies and corrects misspellings or intentional obfuscations by deducing the intended meaning, especially when the language model does not recognize a word.
When a correction is made, Tisane adds the corrected_text attribute:
- At the word level: If individual words or lexical chunks are returned.
- At the sentence level: If the sentence text is generated.
Sentence-level corrected_text appears when:
- The words or parses setting is set to true.
Tisane works with large dictionaries. You can exclude esoteric terms by adjusting the min_generic_frequency setting.
Note: Spell-checking runs regardless of whether sentence or word sections are included in the output.
You can control this behavior with the following settings:
- Set disable_spellcheck to true to turn off spell-checking entirely.
- To avoid correcting proper nouns (in languages with capitalization), set lowercase_spellcheck_only to true. This restricts spell-checking to lowercase words, excluding capitalized and uppercase terms.
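As a small illustration, the two settings can be assembled into a settings object as follows. This Python sketch only builds the dictionary; the helper name and its parameters are hypothetical, and how the settings are attached to a request is outside its scope.

```python
import json


def spellcheck_settings(enabled=True, skip_capitalized=False):
    """Build the spell-checking part of a settings object."""
    settings = {}
    if not enabled:
        # Turns spell-checking off entirely.
        settings["disable_spellcheck"] = True
    elif skip_capitalized:
        # Restricts spell-checking to lowercase words.
        settings["lowercase_spellcheck_only"] = True
    return settings


off = spellcheck_settings(enabled=False)
lowercase_only = spellcheck_settings(skip_capitalized=True)
as_json = json.dumps(lowercase_only)
```

Leaving both settings out keeps the default behavior, in which spell-checking runs on all words.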