Algospeak and Adversarial Text Manipulations

Tisane uses a built-in spellchecker module to process text that contains unintentional errors (misspellings), adversarial text manipulations (e.g. algospeak), or both.

The spellchecker employs several techniques to handle different types of manipulations (masking characters, substitutions, etc.). These corrections are not limited to profanities or slurs, and they take context into account: the same misspelled word may be interpreted differently in different sentences.

If a sentence needs corrections, it gets a corrected_text attribute containing the corrected text. (Set words to true to include sentence data in the output.)
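
As a minimal sketch of how a client might read this attribute, the Python snippet below walks the sentence_list of an already-parsed response and collects the sentences that carry corrected_text. The helper name corrected_sentences and the sample dictionary are hypothetical illustrations, not part of the Tisane API; the sample is hand-written, not real output.

```python
from typing import Iterator, Tuple


def corrected_sentences(response: dict) -> Iterator[Tuple[str, str]]:
    """Yield (original, corrected) pairs for sentences that have corrected_text."""
    for sentence in response.get("sentence_list", []):
        corrected = sentence.get("corrected_text")
        # Sentences that needed no correction have no corrected_text attribute.
        if corrected is not None:
            yield sentence["text"], corrected


# Hand-written stand-in for a parsed response (assumption, not real API output).
sample = {
    "sentence_list": [
        {
            "offset": 0,
            "text": "I will br*k his neck and kll him",
            "corrected_text": "I will break his neck and kill him",
        },
        {"offset": 33, "text": "See you later"},
    ]
}

for original, corrected in corrected_sentences(sample):
    print(f"{original!r} -> {corrected!r}")
```

Checking for the presence of the attribute, rather than comparing strings, follows the rule stated above: the attribute only appears when corrections were made.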

Limitations

The spellchecker is not a "did you mean" tool, as many people seem to believe:

If a word is legitimate, however misused or esoteric, Tisane will not correct it. For example, if noun is misspelled as nun, or house as horse, Tisane won't help (unless the word is part of a known, frequently obfuscated concept, e.g. corn star in English).
The primary purpose of the spellchecker is to decipher obfuscations. Therefore, the spellchecker is biased toward more profane, objectionable, or heavily used concepts.

Excluding Esoteric Senses And Words To Get Better Results

To work around the first limitation, where a rare but legitimate word blocks a correction, you can use the min_generic_frequency parameter.

This allows you to exclude the most esoteric senses and words.

The frequency is graded between 0 and 10, with 10 being the most frequent.

Some esoteric senses are also graded at -10.

We recommend initially setting min_generic_frequency to 1 or 2 to see whether it works in your situation.
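
The recommendation above could be tried out with a sketch like the following, which builds one request body per candidate threshold so the results can be compared side by side. It assumes that min_generic_frequency is passed inside the settings object, next to words; the helper name build_request and the range check are hypothetical, and no network call is made here.

```python
import json


def build_request(content: str, language: str, min_generic_frequency: int) -> dict:
    """Build a request body that excludes senses below the given frequency grade."""
    # The scale described above runs from 0 to 10; rejecting other values is
    # a choice made for this sketch, not documented API behavior.
    if not 0 <= min_generic_frequency <= 10:
        raise ValueError("min_generic_frequency is expected to be between 0 and 10")
    return {
        "language": language,
        "content": content,
        # Assumption: the parameter belongs in "settings".
        "settings": {"words": True, "min_generic_frequency": min_generic_frequency},
    }


# One payload per recommended starting value.
payloads = [build_request("I will kll him", "en", grade) for grade in (1, 2)]
for payload in payloads:
    print(json.dumps(payload))
```

Sending each payload and comparing the corrected_text values is one way to decide which threshold suits your content.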

Excluding Potential Proper Nouns

If you need to avoid spell-checking potential proper nouns, set lowercase_spellcheck_only to true.
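
A minimal sketch of enabling this flag on an existing request body is shown below. It assumes that lowercase_spellcheck_only lives in the settings object; the helper name with_lowercase_spellcheck_only and the sample content are hypothetical.

```python
import copy
import json


def with_lowercase_spellcheck_only(request: dict) -> dict:
    """Return a copy of the request with lowercase_spellcheck_only set to true."""
    updated = copy.deepcopy(request)  # leave the caller's request untouched
    # Assumption: the flag belongs in "settings", created here if missing.
    updated.setdefault("settings", {})["lowercase_spellcheck_only"] = True
    return updated


base = {
    "language": "en",
    "content": "Ask Kll about the kll switch",
    "settings": {"words": True},
}
request = with_lowercase_spellcheck_only(base)
print(json.dumps(request, indent=2))
```

Python's True is serialized as the JSON literal true, which is the form the setting takes in the request.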

Example

Request:

{
  "language":"en",
  "content":"I will br*k his neck and kll him",
  "settings": 
  {
    "words":true,"topics":false,"sentiment":false,"snippets":true
  }
}

Response:

{
	"text": "I will br*k his neck and kll him",
	"abuse": [
		{
			"sentence_index": 0,
			"offset": 0,
			"length": 32,
			"text": "I will br*k his neck and kll him",
			"type": "criminal_activity",
			"severity": "medium",
			"tags": [
				"threat",
				"violence",
				"death"
			]
		}
	],
	"sentence_list": [
		{
			"offset": 0,
			"text": "I will br*k his neck and kll him",
			"words": [
				{
					"type": "word",
					"offset": 0,
					"text": "I",
					"lettercase": "capitalized",
					"role": "agent",
					"lexeme": 63061,
					"family": 301,
					"grammar": [
						"PRON"
					],
					"stopword": true
				},
				{
					"type": "word",
					"offset": 2,
					"text": "will",
					"lexeme": 146938,
					"family": 316,
					"grammar": [
						"VERB"
					],
					"stopword": true
				},
				{
					"type": "word",
					"offset": 7,
					"text": "br*k",
					"role": "verb",
					"lexeme": 20996,
					"family": 107846,
					"grammar": [
						"VERB"
					]
				},
				{
					"type": "word",
					"offset": 12,
					"text": "his",
					"lexeme": 63064,
					"family": 303,
					"grammar": [
						"DET"
					],
					"stopword": true
				},
				{
					"type": "word",
					"offset": 16,
					"text": "neck",
					"lexeme": 93293,
					"family": 40510,
					"wikidata": "Q9633",
					"grammar": [
						"NOUN"
					]
				},
				{
					"type": "word",
					"offset": 21,
					"text": "and",
					"lexeme": 4096,
					"family": 322,
					"grammar": [
						"CCONJ"
					],
					"stopword": true
				},
				{
					"type": "word",
					"offset": 25,
					"text": "kll",
					"role": "verb",
					"lexeme": 77380,
					"family": 113102,
					"grammar": [
						"VERB"
					]
				},
				{
					"type": "word",
					"offset": 29,
					"text": "him",
					"role": "patient",
					"lexeme": 63062,
					"family": 303,
					"grammar": [
						"PRON"
					],
					"stopword": true
				}
			],
			"corrected_text": "I will break his neck and kill him"
		}
	]
}
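
To see which words were changed in a response like the one above, one rough approach is to align the entries of words with the whitespace-separated tokens of corrected_text. This is a heuristic sketch, not a feature of the API: it assumes that the correction keeps the number of words unchanged, and it gives up when that does not hold. The helper name changed_words is hypothetical, and the sample keeps only the offset and text fields from the example.

```python
from typing import List, Tuple


def changed_words(sentence: dict) -> List[Tuple[int, str, str]]:
    """Return (offset, original, corrected) for each word that was corrected."""
    corrected = sentence.get("corrected_text")
    if corrected is None:
        return []
    tokens = corrected.split()
    words = sentence.get("words", [])
    # Heuristic: only align when both sides have the same number of items.
    if len(tokens) != len(words):
        return []
    return [
        (word["offset"], word["text"], token)
        for word, token in zip(words, tokens)
        if word["text"] != token
    ]


# Trimmed copy of the sentence from the example response.
sentence = {
    "text": "I will br*k his neck and kll him",
    "words": [
        {"offset": 0, "text": "I"},
        {"offset": 2, "text": "will"},
        {"offset": 7, "text": "br*k"},
        {"offset": 12, "text": "his"},
        {"offset": 16, "text": "neck"},
        {"offset": 21, "text": "and"},
        {"offset": 25, "text": "kll"},
        {"offset": 29, "text": "him"},
    ],
    "corrected_text": "I will break his neck and kill him",
}

print(changed_words(sentence))
# prints [(7, 'br*k', 'break'), (25, 'kll', 'kill')]
```

The offsets come straight from the words entries, so they can be used to highlight the obfuscated spans in the original text.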
