Topic Extraction

Topic extraction determines the dominant topics in the text.

This functionality is also known as:

  • theme identification
  • subject detection
  • key topic recognition

Tisane returns the topics in the topics array. Each entry is a plain string by default, or an object when the topic_stats setting is enabled (see Topic Statistics below). Topics are determined at the document level.
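Because the shape of each entry depends on topic_stats, client code that may receive either form can normalize the array before using it. The Python sketch below assumes the response has already been parsed from JSON into a dictionary; the function name normalize_topics is hypothetical and not part of any Tisane SDK.

```python
from typing import Any, Dict, List, Optional, Tuple


def normalize_topics(response: Dict[str, Any]) -> List[Tuple[str, Optional[float]]]:
    """Return (topic, coverage) pairs; coverage is None when topic_stats was off."""
    result: List[Tuple[str, Optional[float]]] = []
    # Assumption: a missing "topics" key is treated as "no topics detected".
    for entry in response.get("topics", []):
        if isinstance(entry, str):
            # topic_stats off: the entry is just the topic name.
            result.append((entry, None))
        else:
            # topic_stats on: the entry is {"topic": ..., "coverage": ...}.
            result.append((entry["topic"], float(entry["coverage"])))
    return result


plain = {"text": "Juno is the wife of Jupiter",
         "topics": ["supernatural", "Roman mythology", "relationship", "family"]}
with_stats = {"topics": [{"topic": "outer space", "coverage": 0.5},
                         {"topic": "astronomy", "coverage": 0.5}]}

print(normalize_topics(plain))       # [('supernatural', None), ('Roman mythology', None), ('relationship', None), ('family', None)]
print(normalize_topics(with_stats))  # [('outer space', 0.5), ('astronomy', 0.5)]
```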

When a word has multiple interpretations, the sense that applies in the current context must be determined. For example, Jupiter can be a planet or a Roman deity; which one is meant depends on the surrounding text.

The sentence Juno is the wife of Jupiter refers to the deity. Tisane determines the relevant topics as Roman mythology, supernatural (gods), relationship, and family (because the sentence mentions a spousal connection).

{
	"text": "Juno is the wife of Jupiter",
	"topics": [
		"supernatural",
		"Roman mythology",
		"relationship",
		"family"
	]
}

On the other hand, the sentence Jupiter is farther from the sun than Mars refers to planets. Tisane determines the topics to be outer space and astronomy.

{
	"text": "Jupiter is farther from the sun than Mars",
	"topics": [
		"outer space",
		"astronomy"
	]
}
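To make context-dependent sense selection concrete, here is a toy Python sketch using a simplified Lesk-style approach: it picks the sense whose short gloss shares the most words with the sentence. This is a textbook technique swapped in purely for illustration; this page does not describe how Tisane disambiguates, and the sense inventory and the disambiguate function are hypothetical.

```python
from typing import Dict, Set

# Hypothetical toy glosses for "Jupiter"; NOT Tisane's lexicon.
SENSES: Dict[str, Set[str]] = {
    "planet": {"planet", "sun", "mars", "orbit", "moon", "space"},
    "deity": {"god", "roman", "juno", "wife", "husband", "temple", "myth"},
}


def disambiguate(sentence: str, senses: Dict[str, Set[str]] = SENSES) -> str:
    """Pick the sense whose gloss overlaps most with the sentence's words."""
    # Lowercase the words and strip simple trailing punctuation.
    context = {w.strip(".,!?").lower() for w in sentence.split()}
    # Ties go to the first sense in dictionary order.
    return max(senses, key=lambda s: len(senses[s] & context))


print(disambiguate("Juno is the wife of Jupiter"))                # deity
print(disambiguate("Jupiter is farther from the sun than Mars"))  # planet
```

A real system relies on far richer context and knowledge than word overlap, but the principle is the same: the surrounding words select the sense, and the sense selects the topics.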

Topic Statistics

If the topic_stats setting is true, Tisane also reports how much of the input each topic covers. Each topic is then returned not as a string but as an object with two attributes: topic (string), the topic itself, and coverage (float), the share of the input where the topic is active.
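A request is a JSON object with language, content, and an optional settings object. The Python sketch below only builds that body; the endpoint, authentication, and HTTP client are not covered on this page and are left out. The helper name build_request is hypothetical.

```python
import json
from typing import Any, Dict


def build_request(language: str, content: str, **settings: Any) -> str:
    """Serialize a request body; keyword arguments become the settings object."""
    body: Dict[str, Any] = {"language": language, "content": content}
    if settings:
        # Only include "settings" when at least one setting is given.
        body["settings"] = settings
    return json.dumps(body)


payload = build_request(
    "en",
    "Jupiter is farther from the sun than Mars. Which is not important in the current context",
    topic_stats=True,
)
print(payload)
```

The same helper can pass other settings, for example build_request("en", "Jupiter is farther from the sun than Mars.", topic_standard="wikidata").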

Example

Request:

{
  "language":"en",
  "content":"Jupiter is farther from the sun than Mars. Which is not important in the current context",
  "settings": 
  {
    "topic_stats": true
  }
}

Response:

{
	"text": "Jupiter is farther from the sun than Mars. Which is not important in the current context",
	"topics": [
		{
			"topic": "outer space",
			"coverage": 0.5
		},
		{
			"topic": "astronomy",
			"coverage": 0.5
		}
	]
}

(both detected topics appear in 1 sentence out of 2, which is 0.5 of all sentences)
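The arithmetic in this example (sentences in which the topic is active, divided by the total number of sentences) can be sketched in Python as follows. The per-sentence topic sets are a hypothetical input: the response only exposes the document-level coverage, and the page confirms the sentence-based ratio only for this example.

```python
from collections import Counter
from typing import Dict, List, Set


def topic_coverage(sentence_topics: List[Set[str]]) -> Dict[str, float]:
    """Fraction of sentences in which each topic is active."""
    if not sentence_topics:
        return {}
    counts: Counter = Counter()
    for topics in sentence_topics:
        # A set counts each topic at most once per sentence.
        counts.update(topics)
    total = len(sentence_topics)
    return {topic: n / total for topic, n in counts.items()}


# Sentence 1 is about the planets; sentence 2 has no detected topic.
per_sentence = [{"outer space", "astronomy"}, set()]
print(sorted(topic_coverage(per_sentence).items()))  # [('astronomy', 0.5), ('outer space', 0.5)]
```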

Standards

Tisane can also express topics using common taxonomy standards, such as Wikidata. To select a standard, add the topic_standard setting to the request settings.

Example

Request:

{
  "language":"en",
  "content":"Jupiter is farther from the sun than Mars.",
  "settings": 
  {
    "topic_standard": "wikidata"
  }
}

Response:

{
	"text": "Jupiter is farther from the sun than Mars.",
	"topics": [
		"Q4169",
		"Q333"
	]
}

The standard taxonomies cover only a small fraction of Tisane's native topic taxonomy. When a topic has no equivalent in the selected standard, it is omitted from the response.
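In effect, this behaves like a lookup that silently drops unmapped topics. The Python sketch below uses a hypothetical two-entry mapping taken from the Wikidata example, assuming (as the order of the two responses suggests) that Q4169 corresponds to outer space and Q333 to astronomy. The actual mapping is maintained by Tisane and is not published on this page.

```python
from typing import Dict, List

# Hypothetical excerpt of a native-topic -> Wikidata mapping.
NATIVE_TO_WIKIDATA: Dict[str, str] = {
    "outer space": "Q4169",
    "astronomy": "Q333",
}


def to_standard(native_topics: List[str], mapping: Dict[str, str]) -> List[str]:
    """Translate native topics to standard IDs, dropping topics with no equivalent."""
    return [mapping[t] for t in native_topics if t in mapping]


print(to_standard(["outer space", "astronomy"], NATIVE_TO_WIKIDATA))  # ['Q4169', 'Q333']
# "relationship" is not in this toy mapping, so it is dropped.
print(to_standard(["astronomy", "relationship"], NATIVE_TO_WIKIDATA))  # ['Q333']
```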