mindmeld.resource_loader module¶
This module contains the processor resource loader.
-
class
mindmeld.resource_loader.
Hasher
(algorithm='sha1')[source]¶ Bases:
object
A thin wrapper around hashlib. Uses a cache for commonly hashed strings.
-
algorithm
¶ str -- The hashing algorithm to use. Defaults to 'sha1'. See hashlib.algorithms_available for a list of options.
-
hash
(string)[source]¶ Hashes a string.
Parameters: string (str) -- The string to hash Returns: The hash result Return type: str
-
hash_file
(filename)[source]¶ Creates a hash of the file. If the file does not exist, the empty string is hashed instead and the resulting hash digest is returned.
Parameters: filename (str) -- The path of a file to hash. Returns: A hex digest of the file hash Return type: str
-
hash_list
(strings)[source]¶ Hashes a list of strings.
Parameters: strings (list[str]) -- The strings to hash Returns: The hash result Return type: str
-
algorithm
Getter for algorithm property.
Returns: the hashing algorithm Return type: str
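The behavior described for Hasher can be approximated with the standard library alone. The sketch below is not MindMeld's implementation: the class name SketchHasher, the dict-based cache, the chunk_size parameter, and the way hash_list combines its items are all assumptions made for illustration.

```python
import hashlib
import os


class SketchHasher:
    """Illustrative stand-in for a cached hashlib wrapper (not MindMeld's code)."""

    def __init__(self, algorithm="sha1"):
        # hashlib.new raises ValueError for an unknown algorithm name.
        hashlib.new(algorithm)
        self._algorithm = algorithm
        self._cache = {}  # maps input string -> hex digest

    @property
    def algorithm(self):
        return self._algorithm

    def hash(self, string):
        """Return the hex digest of a string, reusing cached results."""
        if string not in self._cache:
            digest = hashlib.new(self._algorithm, string.encode("utf-8")).hexdigest()
            self._cache[string] = digest
        return self._cache[string]

    def hash_list(self, strings):
        """Assumption: feed each item's digest, in order, into one hash object."""
        combined = hashlib.new(self._algorithm)
        for string in strings:
            combined.update(self.hash(string).encode("utf-8"))
        return combined.hexdigest()

    def hash_file(self, filename, chunk_size=65536):
        """Hash a file's bytes; a missing file is hashed as the empty string."""
        if not os.path.isfile(filename):
            return self.hash("")
        file_hash = hashlib.new(self._algorithm)
        with open(filename, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                file_hash.update(chunk)
        return file_hash.hexdigest()
```

Reading the file in chunks keeps memory use flat for large files. Because this hash_list is order sensitive, callers who want a stable result would need to sort their inputs first.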
-
-
class
mindmeld.resource_loader.
ProcessedQueryList
(cache=None, elements=None)[source]¶ Bases:
object
ProcessedQueryList provides a memory efficient disk backed list representation for a list of queries.
-
class
ListIterator
(elements)[source]¶ Bases:
mindmeld.resource_loader.Iterator
ListIterator is a wrapper around an in-memory list that supports the same functionality as a ProcessedQueryList.Iterator. This allows arbitrary lists of data to be built and presented as a ProcessedQueryList.Iterator to functions that require one.
-
class
MemoryCache
(queries)[source]¶ Bases:
object
A class to provide cache functionality for in-memory lists of ProcessedQuery objects
-
static
from_in_memory_list
(queries)[source]¶ Creates a ProcessedQueryList wrapper around an in-memory list of ProcessedQuery objects
Parameters: queries (list(ProcessedQuery)) -- queries to wrap Returns: ProcessedQueryList object
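One way to picture the relationship between the list, its cache, and from_in_memory_list is a list that stores only ids and resolves them through a cache object. The sketch below is a guess at that shape, not the library's code: SketchMemoryCache, SketchQueryList, the get method, and the use of list positions as ids are all hypothetical.

```python
class SketchMemoryCache:
    """Serves items from a plain list, addressed by integer id (hypothetical)."""

    def __init__(self, queries):
        self._queries = list(queries)

    def get(self, row_id):
        return self._queries[row_id]


class SketchQueryList:
    """Stores ids only and resolves them through a cache during iteration."""

    def __init__(self, cache=None, elements=None):
        self.cache = cache
        self.elements = elements if elements is not None else []

    @staticmethod
    def from_in_memory_list(queries):
        queries = list(queries)
        return SketchQueryList(
            cache=SketchMemoryCache(queries),
            elements=list(range(len(queries))),  # assumption: ids are positions
        )

    def __len__(self):
        return len(self.elements)

    def __iter__(self):
        # Resolve one element at a time, so a disk-backed cache would never
        # need to hold every item in memory at once.
        for row_id in self.elements:
            yield self.cache.get(row_id)
```

With this shape, swapping the in-memory cache for a disk-backed one changes only the cache object. Code that iterates over the list is unaffected.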
-
cache
¶
-
-
class
mindmeld.resource_loader.
ResourceLoader
(app_path, query_factory, query_cache=None)[source]¶ Bases:
object
ResourceLoader objects are responsible for loading the resources needed by NLP components (classifiers, entity recognizers, parsers, etc.).
Note: we need to keep resource helpers as instance methods, as
load_feature_resource
assumes all helpers to be instance methods.
-
class
CharNgramFreqBuilder
(lengths, thresholds)[source]¶ Bases:
object
Compiles n-gram character frequency dictionary of normalized query tokens
-
class
QueryFreqBuilder
(enable_stemming=False)[source]¶ Bases:
object
Compiles frequency dictionary of normalized and stemmed query strings
-
class
WordFreqBuilder
(enable_stemming=False)[source]¶ Bases:
object
Compiles unigram frequency dictionary of normalized query tokens
-
class
WordNgramFreqBuilder
(lengths, thresholds, enable_stemming=False)[source]¶ Bases:
object
Compiles n-gram frequency dictionary of normalized query tokens
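The frequency builders above take lengths and thresholds, but this page does not say how they interact. The sketch below assumes that thresholds[i] is the minimum count an n-gram of length lengths[i] must reach to be kept, and it substitutes a lowercase whitespace split for the library's normalization. The function name is hypothetical.

```python
from collections import Counter


def word_ngram_frequencies(queries, lengths, thresholds):
    """Count word n-grams, keeping those seen at least `threshold` times.

    Assumption: thresholds[i] is the minimum count for n-grams of length
    lengths[i]. Tokenization here is a lowercase whitespace split.
    """
    counts = Counter()
    for query in queries:
        tokens = query.lower().split()
        for n in lengths:
            for start in range(len(tokens) - n + 1):
                counts[" ".join(tokens[start:start + n])] += 1

    min_count = dict(zip(lengths, thresholds))
    return {
        ngram: count
        for ngram, count in counts.items()
        if count >= min_count[len(ngram.split())]
    }


freqs = word_ngram_frequencies(
    ["set an alarm", "set an alarm now", "cancel alarm"],
    lengths=[1, 2],
    thresholds=[2, 2],
)
```

In this example the unigrams "now" and "cancel" and the bigrams "alarm now" and "cancel alarm" each occur once, so they fall below the threshold and are dropped.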
-
build_gazetteer
(gaz_name, exclude_ngrams=False, force_reload=False)[source]¶ Builds the specified gazetteer using the entity data and mapping files.
Parameters:
-
static
create_resource_loader
(app_path, query_factory=None, text_preparation_pipeline=None)[source]¶ Creates the resource loader for the app at app path.
Parameters: - app_path (str) -- The path to the directory containing the app's data
- query_factory (QueryFactory) -- The app's query factory
- text_preparation_pipeline (TextPreparationPipeline) -- The app's text preparation pipeline.
Returns: a resource loader
Return type: ResourceLoader
-
filter_file_paths
(compiled_pattern, file_paths=None)[source]¶ Gets a list of file paths that match the given compiled pattern.
Parameters: - compiled_pattern (sre.SRE_Pattern) -- A compiled regex pattern to filter with.
- file_paths (list) -- A list of file paths.
Returns: A list of file paths.
Return type:
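Filtering paths with a compiled pattern can be sketched as below. Whether the library matches from the start of the path or searches anywhere in it is not stated here, so the use of match is an assumption. The function name and the example paths are also hypothetical.

```python
import re


def filter_paths(compiled_pattern, file_paths):
    """Keep the paths that the compiled pattern matches from the start."""
    return [path for path in file_paths if compiled_pattern.match(path)]


paths = [
    "domains/weather/check/train.txt",
    "domains/weather/check/test.txt",
    "domains/weather/check/notes.md",
]
train_only = filter_paths(re.compile(r".*train\.txt$"), paths)
```

In a pattern such as '.*.txt' the second dot matches any character, not only a literal period. Escaping it as '\.' and anchoring with '$' gives a stricter filter.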
-
static
flatten_query_tree
(query_tree)[source]¶ Takes a query tree and returns the elements in list form.
Parameters: query_tree (dict) -- A nested dictionary that organizes queries by domain then intent. Returns: A list of Query objects. Return type: list
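Flattening a two-level tree of this kind can be sketched as follows. It assumes that each intent maps to an iterable of queries. Plain strings stand in for Query objects, and the function name is hypothetical.

```python
def flatten_tree(query_tree):
    """Collect the queries of a {domain: {intent: [queries]}} mapping in order."""
    flattened = []
    for intents in query_tree.values():
        for queries in intents.values():
            flattened.extend(queries)
    return flattened


tree = {
    "weather": {"check": ["q1", "q2"], "forecast": ["q3"]},
    "music": {"play": ["q4"]},
}
all_queries = flatten_tree(tree)
```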
-
get_all_file_paths
(file_pattern='.*.txt')[source]¶ Get a list of text file paths across all intents.
Returns: A list of all file paths. Return type: list
-
get_entity_map
(entity_type, force_reload=False)[source]¶ Creates a mapping file for a given entity.
Parameters: entity_type (str) -- The name of the entity
-
get_gazetteer
(gaz_name, force_reload=False)[source]¶ Gets a gazetteer by name.
Parameters: gaz_name (str) -- The name of the entity the gazetteer corresponds to Returns: Gazetteer data Return type: dict
-
get_gazetteer_hash
(gaz_name)[source]¶ Gets the hash of a gazetteer by entity name.
Parameters: gaz_name (str) -- The name of the entity the gazetteer corresponds to Returns: Hash of a gazetteer specified by name. Return type: str
-
get_gazetteers
(force_reload=False)[source]¶ Gets gazetteers for all entities.
Returns: Gazetteer data keyed by entity type Return type: dict
-
get_gazetteers_hash
()[source]¶ Gets a single hash of all the gazetteers, ordered alphabetically by entity type.
Returns: Hash of a list of gazetteer hashes. Return type: str
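Sorting by entity type before hashing makes the combined hash independent of the order in which the gazetteers were loaded. The sketch below illustrates that idea with hashlib. The helper names, the choice of SHA-1, and the way the per-gazetteer hashes are combined are assumptions.

```python
import hashlib


def sha1_hex(text):
    """Hex digest of a string, standing in for a per-gazetteer hash."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def combined_gazetteer_hash(gazetteer_hashes):
    """Hash a dict of {entity_type: hash}, visiting entity types alphabetically."""
    combined = hashlib.sha1()
    for entity_type in sorted(gazetteer_hashes):
        combined.update(gazetteer_hashes[entity_type].encode("utf-8"))
    return combined.hexdigest()


first = combined_gazetteer_hash({"city": sha1_hex("c"), "artist": sha1_hex("a")})
second = combined_gazetteer_hash({"artist": sha1_hex("a"), "city": sha1_hex("c")})
```

Both calls visit "artist" before "city", so first and second are equal even though the dictionaries were built in different orders.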
-
get_labeled_queries
(domain=None, intent=None, label_set=None, force_reload=False)[source]¶ Gets labeled queries from the cache, or loads them from disk.
Parameters:
Returns: ProcessedQuery objects (or strings) loaded from labeled query files, organized by domain and intent.
Return type:
-
static
get_sentiment_analyzer
()[source]¶ Returns a sentiment analyzer, downloading the required data libraries from NLTK.
-
get_sys_entity_types
(labels)[source]¶ Get all system entity types from the entity labels.
Parameters: labels (list of QueryEntity) -- a list of labeled entities
-
get_text_preparation_pipeline
()[source]¶ Gets the text preparation pipeline from the query_factory attribute.
Returns: The class responsible for the normalization and tokenization of text.
Return type: TextPreparationPipeline
-
hash_feature_resource
(name)[source]¶ Hashes the named resource.
Parameters: name (str) -- The name of the resource to hash Returns: The hash result Return type: str
-
hash_list
(items)[source]¶ Hashes the list of items.
Parameters: items (list[str]) -- A list of strings to hash Returns: The hash result Return type: str
-
hash_string
(string)[source]¶ Hashes a string.
Parameters: string (str) -- The string to hash Returns: The hash result Return type: str
-
load_entity_map
(entity_type)[source]¶ Loads an entity mapping file.
Parameters: entity_type (str) -- The name of the entity
-
load_gazetteer
(gaz_name)[source]¶ Loads a gazetteer specified by the entity name.
Parameters: gaz_name (str) -- The name of the entity the gazetteer corresponds to
-
load_query_file
(domain, intent, file_path)[source]¶ Loads the queries from the specified file.
Parameters:
-
RSC_HASH_MAP
= {'c_ngram_freq': <function ResourceLoader.<lambda>>, 'enable-stemming': <function ResourceLoader.<lambda>>, 'gazetteers': <function ResourceLoader.get_gazetteers_hash>, 'q_freq': <function ResourceLoader.<lambda>>, 'sys_types': <function ResourceLoader.<lambda>>, 'vader_classifier': <function ResourceLoader.<lambda>>, 'w_freq': <function ResourceLoader.<lambda>>, 'w_ngram_freq': <function ResourceLoader.<lambda>>}¶
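The map above reads like a dispatch table: each resource name points to a plain function stored on the class, which is then called with the loader instance. The sketch below illustrates that pattern under stated assumptions. The class SketchLoader, its two helper methods, the JSON serialization, and the KeyError raised for unknown names are all hypothetical and are not taken from MindMeld.

```python
import hashlib
import json


class SketchLoader:
    """Illustrates name-to-function dispatch for hashing resources."""

    def __init__(self, word_freqs, gazetteer_hashes):
        self.word_freqs = word_freqs
        self.gazetteer_hashes = gazetteer_hashes

    def hash_string(self, string):
        return hashlib.sha1(string.encode("utf-8")).hexdigest()

    def _hash_word_freqs(self):
        # Sorted keys make the serialization, and so the hash, deterministic.
        return self.hash_string(json.dumps(self.word_freqs, sort_keys=True))

    def _hash_gazetteers(self):
        ordered = [self.gazetteer_hashes[key] for key in sorted(self.gazetteer_hashes)]
        return self.hash_string("".join(ordered))

    # Plain functions are stored here, so they must be called with the instance.
    RSC_HASH_MAP = {"w_freq": _hash_word_freqs, "gazetteers": _hash_gazetteers}

    def hash_feature_resource(self, name):
        # Assumption: an unknown resource name simply raises KeyError.
        return self.RSC_HASH_MAP[name](self)


loader = SketchLoader({"alarm": 3}, {"city": "abc"})
w_freq_hash = loader.hash_feature_resource("w_freq")
```

Because the table holds unbound functions, every helper has to accept the instance as its first argument. That is consistent with the note that resource helpers must remain instance methods.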
-
hash_to_model_path
¶ dict -- A dictionary that maps hashes to the file path of the classifier.
-
query_cache
¶ Lazy load the query cache since it's not required for inference.
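Lazy loading of this kind is commonly written as a property that builds its value on first access. The sketch below shows the general pattern only. The class name and the cache_factory argument are hypothetical, and nothing here reflects how MindMeld constructs its query cache.

```python
class LazyCacheLoader:
    """Builds an expensive resource only when it is first requested."""

    def __init__(self, cache_factory):
        self._cache_factory = cache_factory  # hypothetical zero-argument callable
        self._query_cache = None

    @property
    def query_cache(self):
        if self._query_cache is None:
            self._query_cache = self._cache_factory()
        return self._query_cache


calls = []


def build_cache():
    calls.append("built")
    return {"loaded": True}


lazy_loader = LazyCacheLoader(build_cache)
```

Constructing lazy_loader does not call build_cache. The first read of lazy_loader.query_cache does, and later reads return the same object. An inference-only process that never touches the property never pays the cost.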
-
class