mindmeld.components.entity_resolver module¶
This module contains the entity resolver component of the MindMeld natural language processor.
-
class
mindmeld.components.entity_resolver.
BaseEntityResolver
(app_path, entity_type, resource_loader=None, **_kwargs)[source]¶ Bases:
abc.ABC
Base class for Entity Resolvers
-
dump
(model_path, incremental_model_path=None)[source]¶ Persists the trained classification model to disk. For an embedder-based model, the state is the cached embeddings; for text-feature-based resolvers, it is generally (if required) a serialized pickle of the underlying model/algorithm and its associated data.
In general, this method creates the following files:
- .configs.pkl: pickle of the resolver's configurable parameters
- .pkl.hash: a hash string obtained from a combination of KB data and the config params
- .pkl (optional, for non-ES models): pickle of the underlying model/algo state
- .embedder_cache.pkl (optional, for embedder models): pickle of underlying embeddings
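The role of the .pkl.hash file can be illustrated with a minimal sketch. How MindMeld actually derives its hash is not stated here; the use of SHA-256 over a JSON serialization, the function name compute_resolver_hash, and the config keys are all assumptions made for illustration.

```python
import hashlib
import json


def compute_resolver_hash(entity_map, config):
    """Return a hex digest that changes whenever the KB data or config changes."""
    # sort_keys makes the serialization independent of dict insertion order
    payload = json.dumps({"kb": entity_map, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative KB data and config; the keys and values are hypothetical
entity_map = {"entities": [{"id": "1", "cname": "Seaweed Salad", "whitelist": ["seaweed"]}]}
config = {"model_type": "text_relevance"}

stored_hash = compute_resolver_hash(entity_map, config)
# A changed config yields a different hash, signalling that a refit is needed
needs_refit = compute_resolver_hash(entity_map, {"model_type": "tfidf"}) != stored_hash
```

Comparing a stored hash against a freshly computed one is a common way to decide whether previously dumped state can be reused.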
Parameters:
-
fit
(clean=False, entity_map=None)[source]¶ Fits the resolver model, if required
Parameters: Raises: EntityResolverError
-- if the resolver cannot be fit with the loaded/passed-in data
Example of an entity_map:
entity_map = {
    "some_optional_key": "value",
    "entities": [
        {
            "id": "B01MTUORTQ",
            "cname": "Seaweed Salad",
            "whitelist": [...],
        },
    ],
}
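Before passing an entity_map to fit, it can help to check its shape. The following is a minimal sketch, not part of MindMeld: the helper name validate_entity_map, the choice of which keys are treated as mandatory, and the whitelist values are assumptions.

```python
def validate_entity_map(entity_map):
    """Raise ValueError if the entity map lacks the structure documented for fit."""
    entities = entity_map.get("entities")
    if not isinstance(entities, list):
        raise ValueError("entity_map must contain an 'entities' list")
    for position, entity in enumerate(entities):
        # Assumption: every entity needs a canonical name
        if not entity.get("cname"):
            raise ValueError(f"entity at position {position} has no 'cname'")
        # 'whitelist' holds synonyms; it is treated as optional here
        if not isinstance(entity.get("whitelist", []), list):
            raise ValueError(f"'whitelist' of entity {position} must be a list")
    return entity_map


entity_map = {
    "some_optional_key": "value",
    "entities": [
        # The whitelist entries are made up for illustration
        {"id": "B01MTUORTQ", "cname": "Seaweed Salad", "whitelist": ["seaweed", "wakame salad"]},
    ],
}
validate_entity_map(entity_map)
```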
-
load
(path, entity_map=None)[source]¶ Loads the state of the entity resolver as well as the KB data. For an embedder model, the state is the cached embeddings; for text-feature-based resolvers, it is generally (if required) a serialized pickle of the underlying model/algorithm. The Elasticsearch resolver has no state of its own to dump.
Parameters: Raises: EntityResolverError
-- if the resolver cannot be loaded from the specified path
-
load_deprecated
()[source]¶ Handles the deprecated way of using the .load() method in entity resolvers. This ensures backwards compatibility when loading models that were built with an older version of MindMeld, i.e., a version <= 4.4.0. Because older versions of MindMeld do not dump a hash pickle file, using the latest .load() method on such models throws a FileNotFoundError.
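The backwards-compatibility idea can be sketched as a try/except fallback. This is only an illustration of the pattern: whether MindMeld wires load() and load_deprecated() together this way is not stated here, and load_with_fallback, current_load, and deprecated_load are hypothetical stand-ins.

```python
def load_with_fallback(load, load_deprecated):
    """Try the current loader first; fall back to the deprecated one."""
    try:
        return load()
    except FileNotFoundError:
        # Models dumped by older versions have no hash file on disk
        return load_deprecated()


def current_load():  # hypothetical stand-in for the hash-aware loader
    raise FileNotFoundError("model.pkl.hash")


def deprecated_load():  # hypothetical stand-in for the legacy loader
    return "loaded via deprecated path"


result = load_with_fallback(current_load, deprecated_load)
```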
-
predict
(entity_or_list_of_entities, top_n=20, allowed_cnames=None)[source]¶ Predicts the resolved value(s) for the given entity using the loaded entity map or the trained entity resolution model.
Parameters: - entity_or_list_of_entities (Entity, tuple[Entity], str, tuple[str]) -- One or more entity query strings or Entity objects that need to be resolved.
- top_n (int, optional) -- Maximum number of results to populate. If explicitly passed as 0 or None, the embedder and tfidf entity resolvers return an unsorted list of results. This is helpful when a developer wants to post-process unsorted results, for example by combining scores from multiple resolvers and then sorting.
- allowed_cnames (Iterable, optional) -- If passed, predictions will only include objects related to these canonical names.
Returns: The top n resolved values for the provided entity.
Return type: (list)
Raises: EntityResolverError
-- if unable to obtain predictions for the given input
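The top_n description mentions combining scores from multiple resolvers and then sorting. A minimal sketch of that wrapper follows. The result shape (dicts with "cname" and "score" keys), the summing strategy, and the helper name combine_and_rank are assumptions, and the scores are made up.

```python
from collections import defaultdict


def combine_and_rank(result_lists, top_n=20, allowed_cnames=None):
    """Sum scores per canonical name across resolvers, then sort descending."""
    totals = defaultdict(float)
    for results in result_lists:
        for item in results:
            totals[item["cname"]] += item["score"]
    if allowed_cnames is not None:
        allowed = set(allowed_cnames)
        totals = {cname: score for cname, score in totals.items() if cname in allowed}
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
    # Slicing with top_n=None keeps every result
    return [{"cname": cname, "score": score} for cname, score in ranked[:top_n]]


# Made-up unsorted outputs standing in for two resolvers called with top_n=None
tfidf_results = [{"cname": "Seaweed Salad", "score": 0.4}, {"cname": "Caesar Salad", "score": 0.7}]
embedder_results = [{"cname": "Caesar Salad", "score": 0.1}, {"cname": "Seaweed Salad", "score": 0.9}]

combined = combine_and_rank([tfidf_results, embedder_results], top_n=1)
```

Summing is only one possible combination rule; a weighted average or a maximum would fit the same wrapper.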
-
unload
()[source]¶ Unloads the model from memory. This helps reduce memory requirements while training other models.
-
resolver_configurations
¶
-
-
class
mindmeld.components.entity_resolver.
ElasticsearchEntityResolver
(app_path, entity_type, **kwargs)[source]¶ Bases:
mindmeld.components.entity_resolver.BaseEntityResolver
Resolver class based on Elasticsearch
-
static
ingest_synonym
(app_namespace, index_name, index_type='syn', field_name=None, data=None, es_host=None, es_client=None, use_double_metaphone=False)[source]¶ Loads synonym documents from the mapping.json data into the specified index. If an index with the specified name doesn't exist, a new index with that name will be created.
Parameters: - app_namespace (str) -- The namespace of the app. Used to prevent collisions between the indices of this app and those of other apps.
- index_name (str) -- The name of the new index to be created.
- index_type (str) -- Whether to import into the synonym index or the knowledge base object index. INDEX_TYPE_SYNONYM (the default) imports the synonyms into the synonym index, while INDEX_TYPE_KB imports them into an existing knowledge base index.
- field_name (str) -- specify name of the knowledge base field that the synonym list corresponds to when index_type is INDEX_TYPE_SYNONYM.
- data (list) -- A list of documents to be loaded into the index.
- es_host (str) -- The Elasticsearch host server.
- es_client (Elasticsearch) -- The Elasticsearch client.
- use_double_metaphone (bool) -- Whether to use the phonetic mapping or not.
-
load_deprecated
()[source]¶ Handles the deprecated way of using the .load() method in entity resolvers. This ensures backwards compatibility when loading models that were built with an older version of MindMeld, i.e., a version <= 4.4.0. Because older versions of MindMeld do not dump a hash pickle file, using the latest .load() method on such models throws a FileNotFoundError.
-
ES_SYNONYM_INDEX_PREFIX
= 'synonym'¶ The prefix of the ES index.
-
resolver_configurations
¶
-
-
class
mindmeld.components.entity_resolver.
EmbedderCosSimEntityResolver
(app_path, entity_type, **kwargs)[source]¶ Bases:
mindmeld.components.entity_resolver.BaseEntityResolver
Resolver class for embedder models that create dense embeddings
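The cosine-similarity matching that the class name refers to can be sketched in plain Python. This is an illustration under stated assumptions, not the resolver's implementation: the three-dimensional vectors are toy values standing in for real embeddings, and cosine_similarity is a hypothetical helper.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional embeddings; a real embedder would produce these vectors
synonym_embeddings = {
    "seaweed salad": [0.9, 0.1, 0.0],
    "caesar salad": [0.1, 0.9, 0.2],
}
query_embedding = [0.8, 0.2, 0.0]

# Rank synonyms by similarity to the query, best match first
ranked = sorted(
    ((name, cosine_similarity(query_embedding, vec)) for name, vec in synonym_embeddings.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
```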
-
get_processed_entity_map
(entity_map)[source]¶ Processes the entity map into a format suitable for indexing and similarity searching
Parameters: entity_map (Dict[str, Union[str, List]]) -- Entity map if passed in directly instead of loading from a file path
Returns: A processed entity map better suited for indexing and querying
Return type: processed_entity_map (Dict)
-
load_deprecated
()[source]¶ Handles the deprecated way of using the .load() method in entity resolvers. This ensures backwards compatibility when loading models that were built with an older version of MindMeld, i.e., a version <= 4.4.0. Because older versions of MindMeld do not dump a hash pickle file, using the latest .load() method on such models throws a FileNotFoundError.
-
resolver_configurations
¶
-
-
class
mindmeld.components.entity_resolver.
EntityResolver
[source]¶ Bases:
object
Class for backwards compatibility
- deprecated usage
>>> entity_resolver = EntityResolver( app_path, resource_loader, entity_type )
- new usage
>>> entity_resolver = EntityResolverFactory.create_resolver( app_path, entity_type )
# or ...
>>> entity_resolver = EntityResolverFactory.create_resolver( app_path, entity_type, resource_loader=resource_loader )
-
class
mindmeld.components.entity_resolver.
EntityResolverFactory
[source]¶ Bases:
object
-
classmethod
create_resolver
(app_path, entity_type, config=None, resource_loader=None, **kwargs)[source]¶ Identifies the appropriate entity resolver based on the input config and returns it.
Parameters: - app_path (str) -- The application path.
- entity_type (str) -- The entity type associated with this entity resolver.
- resource_loader (ResourceLoader) -- An object which can load resources for the resolver.
- er_config (dict) -- A classifier config
- es_host (str) -- The Elasticsearch host server.
- es_client (Elasticsearch) -- The Elasticsearch client.
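The factory idea of picking a resolver class from a config can be sketched with a small registry. Everything here is hypothetical and only illustrates the pattern: the "model_type" key, its values, the default, and the stand-in classes are assumptions rather than MindMeld's actual configuration schema.

```python
class ExactMatchResolver:  # hypothetical stand-in class
    def __init__(self, app_path, entity_type, **kwargs):
        self.app_path = app_path
        self.entity_type = entity_type
        self.kwargs = kwargs


class TfIdfResolver(ExactMatchResolver):  # hypothetical stand-in class
    pass


# Registry mapping an assumed config value to a resolver class
RESOLVERS = {"exact_match": ExactMatchResolver, "tfidf": TfIdfResolver}


def create_resolver(app_path, entity_type, config=None, **kwargs):
    """Pick a resolver class from config["model_type"] and instantiate it."""
    model_type = (config or {}).get("model_type", "exact_match")
    try:
        resolver_class = RESOLVERS[model_type]
    except KeyError:
        raise ValueError(f"unknown resolver type: {model_type!r}") from None
    return resolver_class(app_path, entity_type, **kwargs)


resolver = create_resolver("my_app", "dish", config={"model_type": "tfidf"})
```

Keeping the mapping in one registry means callers never name a concrete resolver class, which is the point of routing construction through a factory.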
-
-
class
mindmeld.components.entity_resolver.
ExactMatchEntityResolver
(app_path, entity_type, **kwargs)[source]¶ Bases:
mindmeld.components.entity_resolver.BaseEntityResolver
Resolver class based on exact matching
-
get_processed_entity_map
(entity_map)[source]¶ Processes the entity map into a format suitable for indexing and similarity searching
Parameters: entity_map (Dict[str, Union[str, List]]) -- Entity map if passed in directly instead of loading from a file path
Returns: A processed entity map better suited for indexing and querying
Return type: processed_entity_map (Dict)
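One way to picture an entity map processed for exact matching is a lookup table from normalized text to canonical name. This is a minimal sketch, assuming lowercasing and whitespace stripping as the normalization; build_lookup and resolve_exact are hypothetical helpers, and the actual processed format is not documented here.

```python
def build_lookup(entity_map):
    """Map each normalized cname and whitelist entry to its canonical name."""
    lookup = {}
    for entity in entity_map.get("entities", []):
        cname = entity["cname"]
        for text in [cname, *entity.get("whitelist", [])]:
            # The first entity to claim a text wins
            lookup.setdefault(text.strip().lower(), cname)
    return lookup


def resolve_exact(text, lookup):
    """Return the canonical name for text, or None when nothing matches exactly."""
    return lookup.get(text.strip().lower())


entity_map = {
    "entities": [
        # The whitelist entry is made up for illustration
        {"id": "B01MTUORTQ", "cname": "Seaweed Salad", "whitelist": ["wakame salad"]},
    ],
}
lookup = build_lookup(entity_map)
```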
-
load_deprecated
()[source]¶ Handles the deprecated way of using the .load() method in entity resolvers. This ensures backwards compatibility when loading models that were built with an older version of MindMeld, i.e., a version <= 4.4.0. Because older versions of MindMeld do not dump a hash pickle file, using the latest .load() method on such models throws a FileNotFoundError.
-
resolver_configurations
¶
-
-
class
mindmeld.components.entity_resolver.
SentenceBertCosSimEntityResolver
(app_path, entity_type, **kwargs)[source]¶ Bases:
mindmeld.components.entity_resolver.EmbedderCosSimEntityResolver
Resolver class for BERT models based on the sentence-transformers library: https://github.com/UKPLab/sentence-transformers
-
class
mindmeld.components.entity_resolver.
TfIdfSparseCosSimEntityResolver
(app_path, entity_type, **kwargs)[source]¶ Bases:
mindmeld.components.entity_resolver.BaseEntityResolver
A TF-IDF based entity resolver using sparse matrices. Ref: scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
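The idea of TF-IDF with sparse cosine similarity can be sketched without any dependencies. This is an illustration under stated assumptions, not the resolver's implementation: it tokenizes on whitespace, stores each vector as a dict of non-zero weights, and, for brevity, computes document frequencies over the synonyms and the query together, whereas a real pipeline would fit on the synonyms only.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse {token: weight} dict per text, L2-normalized."""
    tokenized = [text.lower().split() for text in texts]
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    n_docs = len(texts)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed idf: ln((1 + n) / (1 + df)) + 1, scikit-learn's default form
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def sparse_cosine(vec_a, vec_b):
    # Vectors are already unit length, so the dot product is the cosine
    return sum(w * vec_b[t] for t, w in vec_a.items() if t in vec_b)


synonyms = ["seaweed salad", "caesar salad", "miso soup"]
*synonym_vecs, query_vec = tfidf_vectors(synonyms + ["seaweed"])
scores = {name: sparse_cosine(query_vec, vec) for name, vec in zip(synonyms, synonym_vecs)}
```

Only tokens shared by both vectors contribute to the dot product, which is what makes the sparse representation cheap to compare.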
-
find_similarity
(src_texts, top_n=20, scores_normalizer=None, _return_as_dict=False, _no_sort=False)[source]¶ Computes sparse cosine similarity
Parameters: - src_texts (Union[str, list]) -- string or list of strings to obtain matching scores for.
- top_n (int, optional) -- Maximum number of results to populate. If None, equals the length of self._syn_tfidf_matrix.
- scores_normalizer -- normalizer type to normalize scores. Allowed values are: "min_max_scaler", "standard_scaler"
Returns: If _return_as_dict is set, a dictionary of tgt_texts and their scores; otherwise a list of sorted synonym names paired with their similarity scores (descending order)
Return type:
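The two scores_normalizer options correspond by name to standard rescaling techniques. The sketch below shows the textbook definitions of min-max and standard scaling; how the resolver implements them internally is not shown here, so the function names and the handling of constant inputs are assumptions.

```python
import statistics


def min_max_scale(scores):
    """Rescale scores linearly into the range [0, 1]."""
    low, high = min(scores), max(scores)
    if high == low:
        # All scores are equal, so there is no spread to rescale
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


def standard_scale(scores):
    """Shift scores to zero mean and scale to unit (population) variance."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    if std == 0:
        return [0.0 for _ in scores]
    return [(s - mean) / std for s in scores]


raw_scores = [0.2, 0.5, 0.8]
scaled = min_max_scale(raw_scores)
standardized = standard_scale(raw_scores)
```

Normalizing before combining scores from different resolvers puts them on a comparable scale.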
-
get_processed_entity_map
(entity_map)[source]¶ Processes the entity map into a format suitable for indexing and similarity searching
Parameters: entity_map (Dict[str, Union[str, List]]) -- Entity map if passed in directly instead of loading from a file path
Returns: A processed entity map better suited for indexing and querying
Return type: processed_entity_map (Dict)
-
load_deprecated
()[source]¶ Handles the deprecated way of using the .load() method in entity resolvers. This ensures backwards compatibility when loading models that were built with an older version of MindMeld, i.e., a version <= 4.4.0. Because older versions of MindMeld do not dump a hash pickle file, using the latest .load() method on such models throws a FileNotFoundError.
-
resolver_configurations
¶
-