mindmeld.auto_annotator module
class mindmeld.auto_annotator.Annotator(app_path, annotation_rules=None, language='en', locale='en_US', overwrite=False, unannotate_supported_entities_only=True, unannotation_rules=None, **kwargs)
Bases: abc.ABC
Abstract Annotator class that can be used to build a custom Annotation class.
parse(sentence, **kwargs)
Extract entities from a sentence. Detected entities should be represented as dictionaries with the following keys: "body", "start" (start index), "end" (end index), "value", "dim" (entity type).
Parameters: sentence (str) -- Sentence in which to detect entities.
Returns: List of QueryEntity objects.
Return type: query_entities (list)
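As a rough illustration of the entity dictionary shape described above, the sketch below finds runs of digits with a regular expression. The function name detect_numbers, the choice of "sys_number" as the "dim", the plain integer used for "value", and the exclusive "end" index are all assumptions made for this example; it does not use or reproduce MindMeld's implementation.

```python
import re


def detect_numbers(sentence):
    """Hypothetical detector that returns one entity dict per run of digits."""
    entities = []
    for match in re.finditer(r"\d+", sentence):
        entities.append({
            "body": match.group(),       # matched text
            "start": match.start(),      # index of the first matched character
            "end": match.end(),          # assumed exclusive, as in Python slicing
            "value": int(match.group()), # assumed representation of the value
            "dim": "sys_number",         # entity type
        })
    return entities


entities = detect_numbers("book 2 tables for 10 people")
```

With an exclusive "end", sentence[start:end] recovers "body" directly, which is why that convention is assumed here.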

class mindmeld.auto_annotator.AnnotatorAction
Bases: enum.Enum
An enumeration.
ANNOTATE = 'annotate'
UNANNOTATE = 'unannotate'
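The two members behave like any other enum.Enum values. The sketch below redefines the enumeration locally so that it runs without MindMeld installed; the local class is a stand-in for illustration, not an import of the real one.

```python
from enum import Enum


class AnnotatorAction(Enum):
    """Local mirror of the documented enum, so the snippet is self-contained."""
    ANNOTATE = "annotate"
    UNANNOTATE = "unannotate"


# Look up a member from its string value, e.g. one read from a config file.
action = AnnotatorAction("unannotate")

# Iterating an Enum yields members in definition order.
values = [member.value for member in AnnotatorAction]
```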

class mindmeld.auto_annotator.BootstrapAnnotator(*args, **kwargs)
Bases: mindmeld.auto_annotator.Annotator
Bootstrap Annotator class used to generate annotations based on existing annotations.
parse(sentence, entity_types, domain: str, intent: str, **kwargs)
Returns: List of QueryEntity objects.
Return type: query_entities (list)
text_queries_to_processed_queries(text_queries: List[str])
Converts text queries into processed queries.
Parameters: text_queries (List[str]) -- List of raw text queries.
Returns: List of processed queries.
Return type: processed_queries (List[ProcessedQuery])

class mindmeld.auto_annotator.MultiLingualAnnotator(*args, **kwargs)
Bases: mindmeld.auto_annotator.Annotator
The MultiLingualAnnotator detects entities in English and non-English sentences.
- If the 'language' is English, this annotator uses only Spacy's English NER model to detect entities.
- If the 'language' is not English, this annotator detects entities using both Spacy non-English NER models and a Duckling-based annotator:
  A. The TranslationDucklingAnnotator is used if a 'translator' service is available (e.g. "GoogleTranslator"). Non-English Duckling candidates are matched to English entities detected by Spacy's English NER model.
  B. The NoTranslationDucklingAnnotator is used if a 'translator' service is not available. The set of non-English Duckling candidates with the largest non-overlapping spans is selected.

class mindmeld.auto_annotator.NoTranslationDucklingAnnotator(*args, **kwargs)
Bases: mindmeld.auto_annotator.Annotator
The NoTranslationDucklingAnnotator detects entities by filtering non-English candidates from Duckling to a set containing the largest non-overlapping spans.
Unlike the TranslationDucklingAnnotator, this annotator does not use a translation service. Unlike the MultiLingualAnnotator, this annotator does not use non-English Spacy NER models.
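One plausible reading of "largest non-overlapping spans" is a greedy selection that prefers longer candidates and discards any candidate overlapping one already kept. The sketch below illustrates that idea only; the function name, the tie-breaking rule, the exclusive "end" index, and the sample candidates are assumptions, and the annotator's actual filtering logic may differ.

```python
def largest_non_overlapping(candidates):
    """Greedy sketch: keep longer spans first, drop anything overlapping a kept span."""
    kept = []
    # Longest span first; ties are broken by the earlier start index (assumed rule).
    ordered = sorted(candidates, key=lambda c: (-(c["end"] - c["start"]), c["start"]))
    for cand in ordered:
        # Two half-open spans overlap when each starts before the other ends.
        overlaps = any(
            cand["start"] < k["end"] and k["start"] < cand["end"] for k in kept
        )
        if not overlaps:
            kept.append(cand)
    # Return the survivors in sentence order.
    return sorted(kept, key=lambda c: c["start"])


# Hypothetical Duckling-style candidates with illustrative indices.
candidates = [
    {"body": "mañana", "start": 0, "end": 6, "dim": "sys_time"},
    {"body": "mañana a las 5", "start": 0, "end": 14, "dim": "sys_time"},
    {"body": "5", "start": 13, "end": 14, "dim": "sys_number"},
    {"body": "dos", "start": 20, "end": 23, "dim": "sys_number"},
]
selected = largest_non_overlapping(candidates)
```

Here the two shorter candidates nested inside "mañana a las 5" are dropped, while "dos" survives because it overlaps nothing.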

class mindmeld.auto_annotator.SpacyAnnotator(*args, **kwargs)
Bases: mindmeld.auto_annotator.Annotator
Annotator class that uses Spacy to generate annotations. Depending on the language, supported entities can include: "sys_time", "sys_interval", "sys_duration", "sys_number", "sys_amount-of-money", "sys_distance", "sys_weight", "sys_ordinal", "sys_quantity", "sys_percent", "sys_org", "sys_loc", "sys_person", "sys_gpe", "sys_norp", "sys_fac", "sys_product", "sys_event", "sys_law", "sys_language", "sys_work-of-art", "sys_other-quantity". For more information on the entities supported by the Spacy Annotator, check the MindMeld docs.
parse(sentence, entity_types=None, **kwargs)
Extracts entities from a sentence. Detected entities are represented as dictionaries with the following keys: "body", "start" (start index), "end" (end index), "value", "dim" (entity type).
Returns: List of QueryEntity objects.
Return type: query_entities (list)
supported_entity_types
This function generates a list of supported entities for the given language. These entity labels are mapped to MindMeld sys_entities. The "misc" Spacy entity is skipped since the category is too broad to be helpful in an application.
Returns: List of supported entity types.
Return type: supported_entity_types (list)
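The label mapping described above might look roughly like the following sketch, which lowercases each NER label, prefixes it with "sys_", and skips "misc". The function name and the simple prefixing rule are assumptions for illustration; the real mapping may special-case some labels.

```python
def map_ner_labels(ner_labels):
    """Hypothetical mapping from NER labels to MindMeld-style sys_ entity names."""
    supported = []
    for label in ner_labels:
        key = label.lower()
        if key == "misc":
            # Skipped: the category is too broad to be helpful in an application.
            continue
        supported.append("sys_" + key)
    return supported


supported = map_ner_labels(["ORG", "LOC", "MISC", "PERSON"])
```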

class mindmeld.auto_annotator.TranslationDucklingAnnotator(*args, **kwargs)
Bases: mindmeld.auto_annotator.Annotator
The TranslationDucklingAnnotator detects entities in non-English sentences using a translation service and Duckling by following these steps:
1. The non-English sentence is translated to English.
2. Spacy detects entities in the translated English sentence.
3. Duckling detects non-English entities in the non-English sentence.
4. A heuristic in parse() is used to match and filter the non-English entities against the English entities.
5. The final set of filtered non-English entities is returned.
Unlike the NoTranslationDucklingAnnotator, this annotator uses a translation service. Unlike the MultiLingualAnnotator, this annotator does not use non-English Spacy NER models.
parse(sentence, entity_types=None, **kwargs)
Implements a heuristic to match English entities detected by Spacy on the translated non-English sentence against the non-English entities detected by Duckling on the non-English sentence.
Returns: List of QueryEntity objects.
Return type: query_entities (list)
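The exact matching heuristic is not spelled out here. Purely as an illustration of what "matching non-English candidates against English entities" could mean, the sketch below keeps a non-English candidate only if an English entity of the same type is still unmatched. The function name, the type-based matching rule, and the sample data are all assumptions and should not be read as the behavior of parse().

```python
from collections import Counter


def filter_by_english_types(english_entities, non_english_candidates):
    """Assumed rule: each English entity can account for one candidate of the same dim."""
    # Count how many English entities of each type are available for matching.
    remaining = Counter(entity["dim"] for entity in english_entities)
    kept = []
    for cand in non_english_candidates:
        if remaining[cand["dim"]] > 0:  # Counter returns 0 for unseen types
            remaining[cand["dim"]] -= 1
            kept.append(cand)
    return kept


# Hypothetical inputs: entities from the translated sentence and Duckling candidates.
english_entities = [{"body": "tomorrow", "dim": "sys_time"}]
non_english_candidates = [
    {"body": "mañana", "dim": "sys_time"},
    {"body": "5", "dim": "sys_number"},
]
filtered = filter_by_english_types(english_entities, non_english_candidates)
```

In this toy input only "mañana" is kept, because no English entity of type "sys_number" was detected to support the second candidate.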