mindmeld.core module

This module contains a collection of the core data structures used in MindMeld.

class mindmeld.core.Bunch(**kwargs)[source]

Bases: dict

Dictionary-like object that exposes its keys as attributes.

Inspired by scikit-learn's Bunch objects.

>>> b = Bunch(a=1, b=2)
>>> b['b']
2
>>> b.b
2
>>> b.a = 3
>>> b['a']
3
>>> b.c = 6
>>> b['c']
6
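
The attribute access shown above can be approximated with a small dict subclass. The following is a minimal sketch of the idea, not MindMeld's implementation; the class name AttrDict is hypothetical.

```python
class AttrDict(dict):
    """Hypothetical dict subclass that exposes its keys as attributes."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # Store attribute assignments as dictionary items.
        self[key] = value


b = AttrDict(a=1, b=2)
b.c = 6
print(b.b, b["c"])
```

Raising AttributeError for missing keys keeps helpers such as hasattr() and getattr() with a default working as expected.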
class mindmeld.core.CallableRegistry[source]

Bases: object

A registration class to map callable object names to corresponding objects.

functions_registry

Getter for functions registry

class mindmeld.core.Entity(text, entity_type, role=None, value=None, display_text=None, confidence=None)[source]

Bases: object

An Entity is any important piece of text that provides more information about the user intent.

text

str -- The text contents that span the entity

type

str -- The type of the entity

role

str -- The role of the entity

value

dict -- The resolved value of the entity

display_text

str -- A human readable text representation of the entity for use in natural language responses.

confidence

float -- A confidence value from 0 to 1 about how confident the entity recognizer was for the given class label.

is_system_entity[source]

bool -- True if the entity is a system entity

static from_cache(obj)[source]
static from_cache_typed(obj)[source]

Instantiates a cached Entity using the class type that was serialized when its to_cache() function was called.

static is_system_entity(entity_type)[source]

Checks whether the provided entity type is a MindMeld-recognized system entity.

Parameters:entity_type (str) -- An entity type
Returns:True if the entity is a system entity type, else False
Return type:bool
to_cache()[source]
to_dict()[source]

Converts the entity into a dictionary

static value_to_cache(value)[source]
entity_class_map = {'Entity': <class 'mindmeld.core.Entity'>, 'NestedEntity': <class 'mindmeld.core.NestedEntity'>, 'QueryEntity': <class 'mindmeld.core.QueryEntity'>}
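
One way to picture how a class map such as entity_class_map supports from_cache_typed() is a name-to-class lookup at load time. The sketch below uses hypothetical stand-in classes and an assumed cache layout (a dictionary with an "entity_class" key); the actual fields MindMeld writes to its cache may differ.

```python
class CachedThing:
    """Hypothetical stand-in for Entity; not the MindMeld class."""

    def __init__(self, text):
        self.text = text

    def to_cache(self):
        # Assumed layout: the class name is stored next to the payload.
        return {"entity_class": type(self).__name__, "text": self.text}

    @staticmethod
    def from_cache_typed(obj):
        # Look up the serialized class name and rebuild that exact type.
        cls = CLASS_MAP[obj["entity_class"]]
        return cls(obj["text"])


class NestedCachedThing(CachedThing):
    """Hypothetical subclass, used to show that the subtype survives a round trip."""


# Plays the role that entity_class_map plays in the listing above.
CLASS_MAP = {"CachedThing": CachedThing, "NestedCachedThing": NestedCachedThing}

restored = CachedThing.from_cache_typed(NestedCachedThing("tomorrow").to_cache())
print(type(restored).__name__)
```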
class mindmeld.core.FormEntity(entity: str, role: Optional[str] = None, responses: Optional[List[str]] = None, retry_response: Optional[List[str]] = None, value: Optional[Dict] = None, default_eval: Optional[bool] = True, hints: Optional[List[str]] = None, custom_eval: Optional[str] = None)[source]

Bases: object

A form entity is used for defining custom objects for the entity form used in AutoEntityFilling (slot-filling).

entity

str -- Entity name

role

str, optional -- The role of the entity

responses

list/str, optional -- Message(s) for prompting the user for missing entities

retry_response

list/str, optional -- Message(s) for re-prompting users. If not provided, defaults to responses.

value

str, optional -- The resolved value of the entity

default_eval

bool, optional -- Use system validation (default: True)

hints

list, optional -- Developer-defined list of keywords to verify the user input against

custom_eval

str, optional -- Name of a custom validation function. The function should return either a bool (validated or not) or a custom resolved value for the entity. If a custom resolved value is returned, the slot response is considered valid.
to_dict()[source]

Converts the entity into a dictionary

class mindmeld.core.NestedEntity(texts, spans, token_spans, entity, children=None)[source]

Bases: object

An entity with the context of the query it came from, along with information like the entity's parent and children.

texts

tuple -- Tuple containing the three forms of text: raw text, processed text, and normalized text

spans

tuple -- Tuple containing the character index spans of the text for this entity for each text form

token_spans

tuple -- Tuple containing the token index spans of the text for this entity for each text form

entity

Entity -- The entity object

parent

NestedEntity -- The parent of the nested entity

children

tuple of NestedEntity -- A tuple of children nested entities

static from_cache(obj)[source]
classmethod from_query(query, span=None, normalized_span=None, entity_type=None, role=None, entity=None, parent_offset=None, children=None)[source]

Creates an entity node using a parent entity node

Parameters:
  • query (Query) -- The query the entity is extracted from
  • span (Span) -- The span of the entity in the query's raw text
  • normalized_span (Span, optional) -- The span of the entity in the query's normalized text
  • entity_type (str, optional) -- The entity type. One of this and entity must be provided
  • role (str, optional) -- The entity role. Ignored if entity is provided.
  • entity (Entity, optional) -- The entity. One of this and entity_type must be provided
  • parent_offset (int) -- The offset of the parent in the query
  • children (tuple of NestedEntity, optional) -- The child nested entities
Returns:

the created entity

static get_largest_non_overlapping_entities(candidates, get_span_func)[source]

This function filters out overlapping entity spans

Parameters:
  • candidates (iterable) -- An iterable of candidates to filter based on span
  • get_span_func (function) -- A function that accesses the span from each candidate
Returns:

A list of non-overlapping candidates

Return type:

list

to_cache()[source]
to_dict()[source]

Converts the query entity into a dictionary

with_children(children)[source]

Creates a copy of this entity with the provided children

normalized_span

The character index span of the entity in the normalized text

normalized_text

The normalized input text

normalized_token_span

The token index span of the entity in the normalized text

processed_span

The character index span of the entity in the preprocessed text

processed_text

The input text after it has been preprocessed

processed_token_span

The token index span of the entity in the preprocessed text

span

The character index span of the entity in the original input text

text

The original input text span

token_span

The token index span of the entity in the original input text

class mindmeld.core.ProcessedQuery(query, domain=None, intent=None, entities=None, is_gold=False, nbest_transcripts_queries=None, nbest_transcripts_entities=None, nbest_aligned_entities=None, confidence=None)[source]

Bases: object

A processed query contains a query and the additional metadata that has been labeled or predicted.

query

Query -- The underlying query object.

domain

str -- The domain of the query

entities

list -- A list of entities present in this query

intent

str -- The intent of the query

is_gold

bool -- True if the details in this query were human labeled, False if they were predicted

nbest_transcripts_queries

list -- A list of n best transcript queries

nbest_transcripts_entities

list -- A list of lists of entities for each query

nbest_aligned_entities

list -- A list of lists of aligned entities

confidence

dict -- A dictionary of the class probabilities for the domain and intent classifiers

static from_cache(obj)[source]
to_cache()[source]
to_dict()[source]

Converts the processed query into a dictionary

class mindmeld.core.Query(raw_text, processed_text, normalized_tokens, char_maps, locale=None, language=None, time_zone=None, timestamp=None, stemmed_tokens=None)[source]

Bases: object

The query object is responsible for processing and normalizing raw user text input so that classifiers can deal with it. A query stores three forms of text: raw text, processed text, and normalized text. The query object is also responsible for translating text ranges across these forms.

raw_text

str -- the original input text

processed_text

str -- the text after it has been preprocessed. The pre-processing happens at the application level and is generally used for special characters

normalized_tokens

tuple of str -- A tuple of normalized tokens

system_entity_candidates

tuple -- A tuple of system entities extracted from the text

locale

str, optional -- The locale, given as an ISO 639-1 language code and an ISO 3166-1 alpha-2 country code separated by an underscore character.

language

str, optional -- The language code representing ISO 639-1 language codes.

time_zone

str -- The IANA id for the time zone in which the query originated such as 'America/Los_Angeles'

timestamp

long, optional -- A unix timestamp used as the reference time. If not specified, the current system time is used. If time_zone is not also specified, this parameter is ignored.

stemmed_tokens

list -- A sequence of stemmed tokens for the query text

static char_maps_from_cache(obj)[source]
char_maps_to_cache()[source]
static from_cache(obj)[source]
get_system_entity_candidates(sys_types)[source]
Parameters:sys_types (set of str) -- A set of entity types to select
Returns:Returns candidate system entities of the types specified
Return type:list
get_text_form(form)[source]

Programmatically retrieves text by form

Parameters:form (int) -- A valid text form (TEXT_FORM_RAW, TEXT_FORM_PROCESSED, or TEXT_FORM_NORMALIZED)
Returns:The requested text
Return type:str
get_token_ngram_raw_ngram_span(tokens, start_token_index, end_token_index)[source]
get_verbose_normalized_tokens()[source]

This function returns a list of dictionaries containing details of each normalized token

to_cache()[source]
transform_index(index, form_in, form_out)[source]

Transforms a text index from one form to another.

Parameters:
  • index (int) -- the index being transformed
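  • form_in (int) -- the input form. Should be one of TEXT_FORM_RAW, TEXT_FORM_PROCESSED or TEXT_FORM_NORMALIZED
  • form_out (int) -- the output form. Should be one of TEXT_FORM_RAW, TEXT_FORM_PROCESSED or TEXT_FORM_NORMALIZED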
Returns:

the equivalent index of text in the output form

Return type:

int

transform_span(text_span, form_in, form_out)[source]

Transforms a text range from one form to another.

Parameters:
  • text_span (Span) -- the text span being transformed
  • form_in (int) -- the input text form. Should be one of TEXT_FORM_RAW, TEXT_FORM_PROCESSED or TEXT_FORM_NORMALIZED
  • form_out (int) -- the output text form. Should be one of TEXT_FORM_RAW, TEXT_FORM_PROCESSED or TEXT_FORM_NORMALIZED
Returns:

the equivalent range of text in the output form

Return type:

tuple
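
To illustrate what translating a span between text forms involves, the sketch below builds character maps between a raw string and a toy normalized form, then uses them to convert a span in both directions. Everything here is an assumption made for illustration: the normalization rule (lowercase, drop punctuation), the map layout, the inclusive (start, end) tuples, and the helper names build_char_maps and transform_span_sketch. It does not reproduce the structure of MindMeld's char_maps.

```python
def build_char_maps(raw_text):
    """Return (normalized_text, raw_to_norm, norm_to_raw) for a toy normalizer."""
    kept = []
    raw_to_norm, norm_to_raw = {}, {}
    for raw_index, char in enumerate(raw_text):
        if char.isalnum() or char.isspace():
            raw_to_norm[raw_index] = len(kept)
            norm_to_raw[len(kept)] = raw_index
            kept.append(char.lower())
        else:
            # A dropped character maps to the last kept character (or index 0).
            raw_to_norm[raw_index] = max(len(kept) - 1, 0)
    return "".join(kept), raw_to_norm, norm_to_raw


def transform_span_sketch(span, char_map):
    """Map an inclusive (start, end) span through one character map."""
    start, end = span
    return (char_map[start], char_map[end])


raw = "Hi, Bob!"
normalized, raw_to_norm, norm_to_raw = build_char_maps(raw)

raw_span = (4, 6)  # "Bob" in the raw text
norm_span = transform_span_sketch(raw_span, raw_to_norm)
round_trip = transform_span_sketch(norm_span, norm_to_raw)
print(normalized, norm_span, round_trip)
```

Keeping one map per direction makes each transformation a pair of dictionary lookups, at the cost of storing both maps.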

language

Language of the query, specified using an ISO 639-2 code.

locale

The locale, given as an ISO 639-1/2 language code and an ISO 3166-1 alpha-2 country code separated by an underscore character.

normalized_text

The normalized input text

normalized_tokens

The tokens of the normalized input text

processed_text

The input text after it has been preprocessed

stemmed_text

The stemmed input text

text

The original input text

time_zone

The IANA id for the time zone in which the query originated such as 'America/Los_Angeles'.

timestamp

A unix timestamp for when the query was created. If time_zone is None, this parameter is ignored.

class mindmeld.core.QueryEntity(texts, spans, token_spans, entity, children=None)[source]

Bases: mindmeld.core.NestedEntity

An entity with the context of the query it came from.

text

str -- The raw text that was processed into this entity

processed_text

str -- The processed text that was processed into this entity

normalized_text

str -- The normalized text that was processed into this entity

span

Span -- The character index span of the raw text that was processed into this entity

processed_span

Span -- The character index span of the processed text that was processed into this entity

start

int -- The character index start of the text range that was processed into this entity. This index is based on the normalized text of the query passed in.

end

int -- The character index end of the text range that was processed into this entity. This index is based on the normalized text of the query passed in.

class mindmeld.core.Span(start, end)[source]

Bases: object

Object representing a text span with start and end indices

start

int -- The index from the original text that represents the start of the span

end

int -- The index from the original text that represents the end of the span

static from_cache(obj)[source]
static get_largest_non_overlapping_candidates(spans)[source]

Finds the set of the largest non-overlapping candidates.

Parameters:spans (list) -- List of tuples representing candidate spans (start_index, end_index + 1).
Returns:List of the largest non-overlapping spans.
Return type:selected_spans (list)
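
One plausible way to select the largest non-overlapping candidates is a greedy pass that considers the longest spans first. The sketch below follows the input convention stated above, tuples of (start_index, end_index + 1), but the selection strategy and the function name are assumptions; MindMeld's own algorithm may choose differently.

```python
def largest_non_overlapping(spans):
    """Greedily keep the longest spans that do not overlap earlier picks.

    Each span is a (start_index, end_index + 1) tuple, so the end is exclusive.
    """
    selected = []
    # Longest first; ties are broken by the earlier start index.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    for start, end in ordered:
        # Two half-open ranges are disjoint if one ends before the other starts.
        if all(end <= sel_start or start >= sel_end for sel_start, sel_end in selected):
            selected.append((start, end))
    return sorted(selected)


candidates = [(0, 3), (2, 8), (8, 10), (4, 6)]
print(largest_non_overlapping(candidates))
```

A greedy pass is simple but not guaranteed to maximize total coverage; it only guarantees that no kept span overlaps a longer kept span.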
has_overlap(other)[source]

Determines whether two spans overlap.

shift(offset)[source]

Shifts a span by offset

Parameters:offset (int) -- The number to change start and end by
slice(obj)[source]

Returns the slice of the object for this span

Parameters:obj -- The object to slice
Returns:The slice of the passed in object for this span
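
The three operations above (has_overlap, shift, and slice) can be sketched with a small dataclass. This is a hypothetical stand-in, not the MindMeld Span class. It assumes that end is an inclusive index, which is consistent with candidate tuples being described as (start_index, end_index + 1), and that shift returns a new span rather than modifying the original.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InclusiveSpan:
    """Hypothetical span whose end index is assumed to be inclusive."""

    start: int
    end: int

    def has_overlap(self, other):
        # Inclusive ranges overlap when each starts no later than the other ends.
        return self.start <= other.end and other.start <= self.end

    def shift(self, offset):
        # Assumption: return a new span instead of mutating this one.
        return InclusiveSpan(self.start + offset, self.end + offset)

    def slice(self, obj):
        # Add 1 because Python slices exclude the end index.
        return obj[self.start:self.end + 1]


name = InclusiveSpan(4, 6)
print(name.slice("Hi, Bob!"))
print(name.has_overlap(InclusiveSpan(6, 9)), name.has_overlap(InclusiveSpan(7, 9)))
print(name.shift(-4))
```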
to_cache()[source]
to_dict()[source]

Converts the span into a dictionary

end
start
mindmeld.core.resolve_entity_conflicts(query_entities)[source]

This function takes a list of the query entities for a query and resolves any entity conflicts. The resolved list is returned.

If two entities in a query conflict with each other, use the following logic:
  • If the target entity is a subset of another entity, then delete the target entity.
  • If the target entity shares the identical span as another entity, then keep the one with the highest confidence.
  • If the target entity overlaps with another entity, then keep the one with the highest confidence.
Parameters:query_entities (list of QueryEntity) -- A list of query entities to resolve
Returns:A filtered list of query entities
Return type:list of QueryEntity
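
The three rules above can be sketched as a pairwise comparison over the candidates. The Candidate class, the helper names, the inclusive span indices, and the tie-breaking rule (the earlier candidate wins when confidences are equal) are all assumptions made for illustration; this is not MindMeld's implementation and it does not operate on real QueryEntity objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a QueryEntity: inclusive span plus confidence."""

    text: str
    start: int
    end: int
    confidence: float


def overlaps(a, b):
    return a.start <= b.end and b.start <= a.end


def contains(outer, inner):
    return outer.start <= inner.start and inner.end <= outer.end


def resolve_conflicts(candidates):
    """Apply the subset, identical-span, and overlap rules to every pair."""
    kept = []
    for i, target in enumerate(candidates):
        dropped = False
        for j, other in enumerate(candidates):
            if i == j or not overlaps(target, other):
                continue
            # Higher confidence wins; on a tie the earlier candidate wins.
            other_wins = (other.confidence, -j) > (target.confidence, -i)
            if (target.start, target.end) == (other.start, other.end):
                dropped = other_wins   # identical span
            elif contains(other, target):
                dropped = True         # target is a subset of other
            elif contains(target, other):
                dropped = False        # other is the subset, so target survives
            else:
                dropped = other_wins   # partial overlap
            if dropped:
                break
        if not dropped:
            kept.append(target)
    return kept


# Candidate spans over the text "new york city".
entities = [
    Candidate("new york", 0, 7, 0.6),
    Candidate("york", 4, 7, 0.9),
    Candidate("york city", 4, 12, 0.8),
]
print([c.text for c in resolve_conflicts(entities)])
```

Each candidate is compared against the original list, so a candidate that is itself dropped can still cause another one to be dropped. An iterative variant that re-checks only the survivors would behave differently in some cases.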