mindmeld.core module¶
This module contains a collection of the core data structures used in MindMeld.
-
class
mindmeld.core.
Bunch
(**kwargs)[source]¶ Bases:
dict
Dictionary-like object that exposes its keys as attributes.
Inspired by scikit-learn's Bunch objects.
>>> b = Bunch(a=1, b=2)
>>> b['b']
2
>>> b.b
2
>>> b.a = 3
>>> b['a']
3
>>> b.c = 6
>>> b['c']
6
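The attribute-style access shown in the doctest above can be approximated with a small dict subclass. The following is a minimal sketch and not MindMeld's actual implementation; the class name AttrDict is hypothetical.

```python
class AttrDict(dict):
    """Hypothetical stand-in for Bunch: a dict whose keys double as attributes."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # Route attribute assignment into the underlying dict.
        self[key] = value


b = AttrDict(a=1, b=2)
b.a = 3  # same effect as b['a'] = 3
b.c = 6  # creates a new key
```

Raising AttributeError for a missing key keeps helpers such as hasattr() and getattr() with a default behaving as expected.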
-
class
mindmeld.core.
CallableRegistry
[source]¶ Bases:
object
A registration class to map callable object names to corresponding objects.
-
functions_registry
¶ Getter for functions registry
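As an illustration of the name-to-callable mapping described above, here is a minimal sketch of a registry. The class name NameRegistry and its register method are assumptions made for this example; only the functions_registry getter appears in the documentation.

```python
class NameRegistry:
    """Hypothetical registry mapping callable names to the callables themselves."""

    def __init__(self):
        self._functions = {}

    @property
    def functions_registry(self):
        # Getter for the registered callables, keyed by name.
        return self._functions

    def register(self, func):
        # Assumption: callables are keyed by their __name__.
        self._functions[func.__name__] = func
        return func


registry = NameRegistry()


@registry.register
def greet(name):
    return f"Hello, {name}!"


# Look the callable up by name and invoke it.
result = registry.functions_registry["greet"]("Ada")
```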
-
-
class
mindmeld.core.
Entity
(text, entity_type, role=None, value=None, display_text=None, confidence=None)[source]¶ Bases:
object
An Entity is any important piece of text that provides more information about the user intent.
-
text
¶ str -- The text contents that span the entity
-
type
¶ str -- The type of the entity
-
role
¶ str -- The role of the entity
-
value
¶ dict -- The resolved value of the entity
-
display_text
¶ str -- A human readable text representation of the entity for use in natural language responses.
-
confidence
¶ float -- A value from 0 to 1 indicating how confident the entity recognizer was in the given class label.
-
static
from_cache_typed
(obj)[source]¶ Instantiates a cached Entity using the class type that was serialized when its to_cache() function was called.
-
static
is_system_entity
(entity_type)[source] Checks whether the provided entity type is a MindMeld-recognized system entity.
Parameters: entity_type (str) -- An entity type Returns: True if the entity is a system entity type, else False Return type: bool
-
entity_class_map
= {'Entity': <class 'mindmeld.core.Entity'>, 'NestedEntity': <class 'mindmeld.core.NestedEntity'>, 'QueryEntity': <class 'mindmeld.core.QueryEntity'>}¶
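One way to picture how a class map supports typed deserialization is sketched below. The classes, the "entity_class" key, and the cache layout are all assumptions made for illustration; the real serialized format is not described here.

```python
class SimpleEntity:
    """Hypothetical, simplified entity used only for this sketch."""

    def __init__(self, text, entity_type):
        self.text = text
        self.type = entity_type

    def to_cache(self):
        # Assumption: the serialized form records the class name with the data.
        return {
            "entity_class": type(self).__name__,
            "text": self.text,
            "type": self.type,
        }


class SimpleQueryEntity(SimpleEntity):
    """Hypothetical subclass, to show that the right class is restored."""


# Maps a serialized class name back to the class object.
CLASS_MAP = {cls.__name__: cls for cls in (SimpleEntity, SimpleQueryEntity)}


def from_cache_typed(obj):
    # Look up the class by its serialized name, then rebuild the instance.
    cls = CLASS_MAP[obj["entity_class"]]
    return cls(obj["text"], obj["type"])


cached = SimpleQueryEntity("tomorrow", "sys_time").to_cache()
restored = from_cache_typed(cached)
```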
-
-
class
mindmeld.core.
FormEntity
(entity: str, role: Optional[str] = None, responses: Optional[List[str]] = None, retry_response: Optional[List[str]] = None, value: Optional[Dict] = None, default_eval: Optional[bool] = True, hints: Optional[List[str]] = None, custom_eval: Optional[str] = None)[source]¶ Bases:
object
A form entity is used for defining custom objects for the entity form used in AutoEntityFilling (slot-filling).
-
entity
¶ str -- Entity name
-
role
¶ str, optional -- The role of the entity
-
responses
¶ list/str, optional -- Message(s) for prompting the user for missing entities
-
retry_response
¶ list/str, optional -- Message(s) for re-prompting users. If not provided, defaults to responses.
-
value
¶ dict, optional -- The resolved value of the entity
-
default_eval
¶ bool, optional -- Use system validation (default: True)
-
hints
¶ list, optional -- Developer-defined list of keywords to verify the user input against
-
custom_eval
¶ str, optional -- Name of a custom validation function. The function should return either a bool (validated or not) or a custom resolved value for the entity. If a custom resolved value is returned, the slot response is considered valid.
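The return-value convention for custom validation can be sketched as follows. This is a hypothetical helper that mirrors the documented semantics of hints and custom_eval; it is not MindMeld's internal code, and for simplicity it takes the validator as a callable rather than as a function name.

```python
def evaluate_slot(user_text, hints=None, custom_eval=None):
    """Return (is_valid, resolved_value) for a slot response (hypothetical helper)."""
    # Hints: accept only input that matches one of the developer-defined keywords.
    if hints is not None and user_text.lower() not in {h.lower() for h in hints}:
        return False, None

    if custom_eval is not None:
        result = custom_eval(user_text)
        if isinstance(result, bool):
            # A bool means "validated or not"; the text itself is the value.
            return result, (user_text if result else None)
        # Anything else is treated as a custom resolved value, hence valid.
        return True, result

    return True, user_text


def parse_size(text):
    # Example validator: returns a resolved value, or False when validation fails.
    sizes = {"small": "S", "medium": "M", "large": "L"}
    return sizes.get(text.lower(), False)


accepted = evaluate_slot("Large", custom_eval=parse_size)
rejected = evaluate_slot("huge", custom_eval=parse_size)
```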
-
-
class
mindmeld.core.
NestedEntity
(texts, spans, token_spans, entity, children=None)[source]¶ Bases:
object
An entity with the context of the query it came from, along with information like the entity's parent and children.
-
texts
¶ tuple -- Tuple containing the three forms of text: raw text, processed text, and normalized text
-
spans
¶ tuple -- Tuple containing the character index spans of the text for this entity for each text form
-
token_spans
¶ tuple -- Tuple containing the token index spans of the text for this entity for each text form
-
entity
¶ Entity -- The entity object
-
parent
¶ NestedEntity -- The parent of the nested entity
-
children
¶ tuple of NestedEntity -- A tuple of children nested entities
-
classmethod
from_query
(query, span=None, normalized_span=None, entity_type=None, role=None, entity=None, parent_offset=None, children=None)[source]¶ Creates an entity node using a parent entity node
Parameters: - query (Query) -- The query the entity comes from
- span (Span) -- The span of the entity in the query's raw text
- normalized_span (None, optional) -- The span of the entity in the query's normalized text
- entity_type (str, optional) -- The entity type. Either this or entity must be provided.
- role (str, optional) -- The entity role. Ignored if entity is provided.
- entity (Entity, optional) -- The entity. Either this or entity_type must be provided.
- parent_offset (int) -- The offset of the parent in the query
- children (None, optional) -- The child nested entities
Returns: the created entity
-
static
get_largest_non_overlapping_entities
(candidates, get_span_func)[source]¶ This function filters out overlapping entity spans
Parameters: - candidates (iterable) -- An iterable of candidates to filter based on span
- get_span_func (function) -- A function that accesses the span from each candidate
Returns: A list of non-overlapping candidates
Return type:
-
normalized_span
¶ The span of the normalized text
-
normalized_text
¶ The normalized input text
-
normalized_token_span
¶ The token span of the normalized text
-
processed_span
¶ The span of the preprocessed text
-
processed_text
¶ The input text after it has been preprocessed
-
processed_token_span
¶ The token span of the preprocessed text
-
span
¶ The span of the original input text
-
text
¶ The original input text span
-
token_span
¶ The token span of the original input text
-
-
class
mindmeld.core.
ProcessedQuery
(query, domain=None, intent=None, entities=None, is_gold=False, nbest_transcripts_queries=None, nbest_transcripts_entities=None, nbest_aligned_entities=None, confidence=None)[source]¶ Bases:
object
A processed query contains a query and the additional metadata that has been labeled or predicted.
-
query
¶ Query -- The underlying query object.
-
domain
¶ str -- The domain of the query
-
entities
¶ list -- A list of entities present in this query
-
intent
¶ str -- The intent of the query
-
is_gold
¶ bool -- Indicates whether the details in this query were predicted or human labeled
-
nbest_transcripts_queries
¶ list -- A list of n best transcript queries
-
nbest_transcripts_entities
¶ list -- A list of lists of entities for each query
-
nbest_aligned_entities
¶ list -- A list of lists of aligned entities
-
confidence
¶ dict -- A dictionary of the class probabilities for the domain and intent classifiers
-
-
class
mindmeld.core.
Query
(raw_text, processed_text, normalized_tokens, char_maps, locale=None, language=None, time_zone=None, timestamp=None, stemmed_tokens=None)[source]¶ Bases:
object
The query object is responsible for processing and normalizing raw user text input so that classifiers can deal with it. A query stores three forms of text: raw text, processed text, and normalized text. The query object is also responsible for translating text ranges across these forms.
-
raw_text
¶ str -- the original input text
-
processed_text
¶ str -- the text after it has been preprocessed. The pre-processing happens at the application level and is generally used for special characters
-
normalized_tokens
¶ tuple of str -- a list of normalized tokens
-
system_entity_candidates
¶ tuple -- A list of system entities extracted from the text
-
locale
¶ str, optional -- The locale, given as an ISO 639-1 language code and an ISO 3166-1 alpha-2 country code separated by an underscore character (for example, en_US).
-
language
¶ str, optional -- The language code representing ISO 639-1 language codes.
-
time_zone
¶ str -- The IANA id for the time zone in which the query originated such as 'America/Los_Angeles'
-
timestamp
¶ long, optional -- A Unix timestamp used as the reference time. If not specified, the current system time is used. If time_zone is not also specified, this parameter is ignored.
-
stemmed_tokens
¶ list -- A sequence of stemmed tokens for the query text
-
get_system_entity_candidates
(sys_types)[source]¶ Parameters: sys_types (set of str) -- A set of entity types to select Returns: Returns candidate system entities of the types specified Return type: list
-
get_text_form
(form)[source]¶ Programmatically retrieves text by form
Parameters: form (int) -- A valid text form (TEXT_FORM_RAW, TEXT_FORM_PROCESSED, or TEXT_FORM_NORMALIZED) Returns: The requested text Return type: str
-
get_verbose_normalized_tokens
()[source]¶ This function returns a list of dictionaries containing details of each normalized token
-
transform_index
(index, form_in, form_out)[source]¶ Transforms a text index from one form to another.
Parameters: Returns: the equivalent index of text in the output form
Return type:
-
transform_span
(text_span, form_in, form_out)[source]¶ Transforms a text range from one form to another.
Parameters: Returns: the equivalent range of text in the output form
Return type:
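Translating an index or span between text forms can be pictured as a lookup in a per-direction character map. The sketch below uses hand-written maps and hypothetical form identifiers (RAW, NORMALIZED); it illustrates the idea only and does not reproduce how MindMeld builds or stores its char_maps.

```python
RAW, NORMALIZED = 0, 1  # hypothetical form identifiers

raw_text = "Hi, there!"
normalized_text = "hi there"

# Hypothetical character maps: position i holds the index in the output form
# that corresponds to index i in the input form.
char_maps = {
    (RAW, NORMALIZED): [0, 1, 1, 2, 3, 4, 5, 6, 7, 7],
    (NORMALIZED, RAW): [0, 1, 3, 4, 5, 6, 7, 8],
}


def transform_index(index, form_in, form_out):
    # Identity when both forms are the same, otherwise a map lookup.
    if form_in == form_out:
        return index
    return char_maps[(form_in, form_out)][index]


def transform_span(span, form_in, form_out):
    # Spans are (start, end) pairs with an inclusive end index.
    start, end = span
    return (
        transform_index(start, form_in, form_out),
        transform_index(end, form_in, form_out),
    )


raw_span = (4, 8)  # "there" in the raw text
norm_span = transform_span(raw_span, RAW, NORMALIZED)
```

Characters that normalization removes, such as the comma, are mapped here to a neighbouring surviving character so that every input index has a defined output.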
-
language
Language of the query, specified using an ISO 639-2 code.
-
locale
The locale, given as an ISO 639-1/2 language code and an ISO 3166-1 alpha-2 country code separated by an underscore character.
-
normalized_text
¶ The normalized input text
-
normalized_tokens
The tokens of the normalized input text
-
processed_text
The input text after it has been preprocessed
-
stemmed_text
¶ The stemmed input text
-
text
¶ The original input text
-
time_zone
The IANA id for the time zone in which the query originated such as 'America/Los_Angeles'.
-
timestamp
A Unix timestamp for when the query was created. If time_zone is None, this parameter is ignored.
-
-
class
mindmeld.core.
QueryEntity
(texts, spans, token_spans, entity, children=None)[source]¶ Bases:
mindmeld.core.NestedEntity
An entity with the context of the query it came from.
-
text
¶ str -- The raw text that was processed into this entity
-
processed_text
¶ str -- The processed text that was processed into this entity
-
normalized_text
¶ str -- The normalized text that was processed into this entity
-
span
¶ Span -- The character index span of the raw text that was processed into this entity
-
processed_span
¶ Span -- The character index span of the processed text that was processed into this entity
-
start
¶ int -- The character index start of the text range that was processed into this entity. This index is based on the normalized text of the query passed in.
-
end
¶ int -- The character index end of the text range that was processed into this entity. This index is based on the normalized text of the query passed in.
-
-
class
mindmeld.core.
Span
(start, end)[source]¶ Bases:
object
Object representing a text span with start and end indices
-
start
¶ int -- The index from the original text that represents the start of the span
-
end
¶ int -- The index from the original text that represents the end of the span
-
static
get_largest_non_overlapping_candidates
(spans)[source]¶ Finds the set of the largest non-overlapping candidates.
Parameters: spans (list) -- List of tuples representing candidate spans (start_index, end_index + 1). Returns: List of the largest non-overlapping spans. Return type: selected_spans (list)
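A simple way to select large non-overlapping spans is a greedy pass that prefers longer spans. The sketch below assumes the (start_index, end_index + 1) tuple convention described above; it is one plausible heuristic and may differ from the algorithm MindMeld actually uses. The function name is hypothetical.

```python
def largest_non_overlapping(spans):
    """Greedy sketch: prefer longer spans, drop any that overlap a kept one."""
    selected = []
    # Longest first; ties broken by earliest start for a deterministic result.
    for start, stop in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        overlaps = any(
            start < kept_stop and kept_start < stop
            for kept_start, kept_stop in selected
        )
        if not overlaps:
            selected.append((start, stop))
    return sorted(selected)


candidates = [(0, 5), (3, 12), (12, 15), (13, 14)]
result = largest_non_overlapping(candidates)
```

Because the second element is one past the last index, the spans (3, 12) and (12, 15) touch but do not overlap.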
-
shift
(offset)[source]¶ Shifts a span by offset
Parameters: offset (int) -- The number to change start and end by
-
slice
(obj)[source]¶ Returns the slice of the object for this span
Parameters: obj -- The object to slice Returns: The slice of the passed in object for this span
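The shift and slice operations can be illustrated with a small stand-in class. This sketch assumes that end is an inclusive index and that shift returns a new span rather than modifying the existing one; TextSpan is a hypothetical name, not the library's Span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSpan:
    """Hypothetical stand-in for Span; `end` is assumed to be inclusive."""

    start: int
    end: int

    def shift(self, offset):
        # Move both ends by the same amount, returning a new span.
        return TextSpan(self.start + offset, self.end + offset)

    def slice(self, obj):
        # Inclusive end, so add one for Python's half-open slicing.
        return obj[self.start:self.end + 1]


text = "order a large pizza"
span = TextSpan(8, 12)
word = span.slice(text)
shifted_word = span.shift(6).slice(text)
```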
-
-
mindmeld.core.
resolve_entity_conflicts
(query_entities)[source]¶ Takes a list of query entities for a query, resolves any entity conflicts, and returns the resolved list.
- If two entities in a query conflict with each other, use the following logic:
- If the target entity is a subset of another entity, then delete the target entity.
- If the target entity shares the identical span as another entity, then keep the one with the highest confidence.
- If the target entity overlaps with another entity, then keep the one with the highest confidence.
Parameters: query_entities (list of QueryEntity) -- A list of query entities to resolve Returns: A filtered list of query entities Return type: list of QueryEntity
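The conflict rules listed above can be sketched on a simplified data model. Candidate is a hypothetical, flattened stand-in for QueryEntity with inclusive character indices, and the tie-breaking rule (the earlier candidate wins on equal confidence) is an assumption; this is not the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical, flattened stand-in for a QueryEntity."""

    text: str
    start: int
    end: int  # inclusive
    confidence: float


def loses_to(target, other, t_idx, o_idx):
    # Higher confidence wins; on a tie the earlier candidate wins (assumption).
    return (other.confidence, -o_idx) > (target.confidence, -t_idx)


def resolve_conflicts(candidates):
    kept = []
    for i, target in enumerate(candidates):
        dropped = False
        for j, other in enumerate(candidates):
            if i == j:
                continue
            if not (target.start <= other.end and other.start <= target.end):
                continue  # no overlap, so no conflict
            same = (target.start, target.end) == (other.start, other.end)
            target_in_other = other.start <= target.start and target.end <= other.end
            other_in_target = target.start <= other.start and other.end <= target.end
            if same:
                dropped = loses_to(target, other, i, j)  # identical span
            elif target_in_other:
                dropped = True  # target is a subset of another entity
            elif other_in_target:
                dropped = False  # the other entity is dropped on its own turn
            else:
                dropped = loses_to(target, other, i, j)  # partial overlap
            if dropped:
                break
        if not dropped:
            kept.append(target)
    return kept


entities = [
    Candidate("new york", 0, 7, 0.90),
    Candidate("york", 4, 7, 0.95),
    Candidate("york pizza", 4, 13, 0.60),
    Candidate("tomorrow", 20, 27, 0.80),
]
resolved = [c.text for c in resolve_conflicts(entities)]
```

Each target is compared against every original candidate in a single pass, so a candidate can be dropped because of another candidate that is itself dropped later; a production implementation might iterate until the result is stable.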