mindmeld.resource_loader module¶

This module contains the processor resource loader.

class mindmeld.resource_loader.Hasher(algorithm='sha1')[source]¶

Bases: object

A thin wrapper around hashlib. Uses a cache for commonly hashed strings.

algorithm¶: str -- The hashing algorithm to use. Defaults to 'sha1'. See hashlib.algorithms_available for a list of options.

hash(string)[source]¶

Hashes a string.

Parameters:	string (str) -- The string to hash
Returns:	The hash result
Return type:	str
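The caching behaviour described for Hasher can be pictured with a small stand-in class. This is a minimal sketch built directly on hashlib, not MindMeld's implementation: the class name `SketchHasher`, the UTF-8 encoding, and the plain dict cache are all assumptions made for illustration.

```python
import hashlib


class SketchHasher:
    """Hypothetical stand-in for a cached hashlib wrapper (not MindMeld's code)."""

    def __init__(self, algorithm="sha1"):
        # hashlib.new raises ValueError for an unknown algorithm name.
        hashlib.new(algorithm)
        self.algorithm = algorithm
        self._cache = {}

    def hash(self, string):
        """Return the hex digest of `string`, reusing a cached result if present."""
        if string not in self._cache:
            digest = hashlib.new(self.algorithm, string.encode("utf-8")).hexdigest()
            self._cache[string] = digest
        return self._cache[string]


hasher = SketchHasher()
first = hasher.hash("hello")
second = hasher.hash("hello")  # served from the cache, not recomputed
```

A cache like this only pays off when the same strings are hashed repeatedly, which is the situation the class description mentions.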

hash_file(filename)[source]¶

Creates a hash of the file. If the file does not exist, hashes the empty string instead and returns the resulting hex digest.

Parameters:	filename (str) -- The path of a file to hash.
Returns:	A hex digest of the file hash
Return type:	str
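The missing-file fallback described above could look roughly like the following. It is a sketch under assumptions: the function name `sketch_hash_file`, the chunked read, and the chunk size are illustrative and are not taken from MindMeld.

```python
import hashlib
import os
import tempfile


def sketch_hash_file(filename, algorithm="sha1", chunk_size=65536):
    """Hash a file in chunks; hash the empty string if the file is missing."""
    hash_obj = hashlib.new(algorithm)
    try:
        with open(filename, "rb") as file_obj:
            # Read fixed-size chunks so large files are never fully in memory.
            for chunk in iter(lambda: file_obj.read(chunk_size), b""):
                hash_obj.update(chunk)
    except FileNotFoundError:
        # Missing file: fall back to the digest of the empty string.
        hash_obj.update(b"")
    return hash_obj.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "train.txt")
    with open(path, "wb") as file_obj:
        file_obj.write(b"order a pizza\n")
    present_digest = sketch_hash_file(path)
    missing_digest = sketch_hash_file(os.path.join(tmp_dir, "missing.txt"))
```

One consequence of this design is that a missing file and an empty file produce the same digest.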

hash_list(strings)[source]¶

Hashes a list of strings.

Parameters:	strings (list[str]) -- The strings to hash
Returns:	The hash result
Return type:	str

algorithm

Getter for algorithm property.

Returns:	the hashing algorithm
Return type:	str

class mindmeld.resource_loader.ProcessedQueryList(cache=None, elements=None)[source]¶

Bases: object

ProcessedQueryList provides a memory-efficient, disk-backed list representation for a list of queries.

class DomainIterator(source)[source]¶: Bases: mindmeld.resource_loader.Iterator

class EntitiesIterator(source, cached=False)[source]¶: Bases: mindmeld.resource_loader.Iterator

class IntentIterator(source)[source]¶: Bases: mindmeld.resource_loader.Iterator

class Iterator(source, cached=False)[source]¶

Bases: object

reorder(indices)[source]¶

class ListIterator(elements)[source]¶

Bases: mindmeld.resource_loader.Iterator

ListIterator is a wrapper around an in-memory list that supports the same functionality as a ProcessedQueryList.Iterator. This allows arbitrary lists of data to be built and presented as a ProcessedQueryList.Iterator to functions that require one.

class MemoryCache(queries)[source]¶

Bases: object

A class that provides cache functionality for in-memory lists of ProcessedQuery objects.

get(row_id)[source]¶

get_domain(row_id)[source]¶

get_entities(row_id)[source]¶

get_intent(row_id)[source]¶

get_query(row_id)[source]¶

get_raw_query(row_id)[source]¶
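The accessor methods listed above suggest a cache keyed by row id. The sketch below assumes that, for an in-memory list, the row id is simply the list index. `FakeProcessedQuery`, its field names, and `SketchMemoryCache` are hypothetical stand-ins, not MindMeld classes.

```python
from dataclasses import dataclass


@dataclass
class FakeProcessedQuery:
    """Hypothetical stand-in for a ProcessedQuery; field names are assumptions."""
    text: str
    domain: str
    intent: str
    entities: tuple = ()


class SketchMemoryCache:
    """Serves rows from an in-memory list, using the list index as the row id."""

    def __init__(self, queries):
        self._queries = list(queries)

    def get(self, row_id):
        return self._queries[row_id]

    def get_domain(self, row_id):
        return self._queries[row_id].domain

    def get_intent(self, row_id):
        return self._queries[row_id].intent

    def get_entities(self, row_id):
        return self._queries[row_id].entities


cache = SketchMemoryCache([
    FakeProcessedQuery("what time do you close", "store_info", "get_store_hours"),
    FakeProcessedQuery("hello there", "store_info", "greet"),
])
```

Exposing per-field getters lets a caller fetch only a label without touching the rest of the row, which matters more for a disk-backed cache than for this in-memory one.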

class QueryIterator(source, cached=False)[source]¶: Bases: mindmeld.resource_loader.Iterator

class RawQueryIterator(source, cached=False)[source]¶: Bases: mindmeld.resource_loader.Iterator

append(query_id)[source]¶

domains()[source]¶

entities()[source]¶

extend(query_ids)[source]¶

static from_in_memory_list(queries)[source]¶

Creates a ProcessedQueryList wrapper around an in-memory list of ProcessedQuery objects

Parameters:	queries (list(ProcessedQuery)) -- queries to wrap
Returns:	ProcessedQueryList object
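One way to picture the wrapper is as a list of row ids whose values are resolved through a cache only when they are iterated. The sketch below is an assumption-laden illustration: `SketchQueryList`, the dict-shaped rows, and the use of list indices as row ids are hypothetical and do not reflect MindMeld's internals.

```python
class SketchQueryList:
    """Hypothetical list of row ids whose values are resolved through a cache."""

    def __init__(self, cache, elements=None):
        self.cache = cache
        self.elements = list(elements or [])

    @staticmethod
    def from_in_memory_list(queries):
        # Assumption: with an in-memory backing list, row ids are list indices.
        return SketchQueryList(cache=list(queries), elements=range(len(queries)))

    def append(self, query_id):
        self.elements.append(query_id)

    def domains(self):
        # Generator: each domain is looked up only when it is requested.
        for row_id in self.elements:
            yield self.cache[row_id]["domain"]

    def __len__(self):
        return len(self.elements)


rows = [
    {"text": "play some jazz", "domain": "music"},
    {"text": "will it rain", "domain": "weather"},
]
query_list = SketchQueryList.from_in_memory_list(rows)
domain_values = list(query_list.domains())
```

Because the list itself holds only ids, reordering or subsetting it never copies the underlying query objects.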

intents()[source]¶

processed_queries()[source]¶

queries()[source]¶

raw_queries()[source]¶

cache¶

class mindmeld.resource_loader.ResourceLoader(app_path, query_factory, query_cache=None)[source]¶

Bases: object

ResourceLoader objects are responsible for loading the resources needed by NLP components (classifiers, entity recognizers, parsers, etc.).

Note: we need to keep resource helpers as instance methods, as load_feature_resource assumes all helpers to be instance methods.

class CharNgramFreqBuilder(lengths, thresholds)[source]¶

Bases: object

Compiles n-gram character frequency dictionary of normalized query tokens

add(query)[source]¶

get_resource()[source]¶
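The add/get_resource pattern for character n-grams can be sketched as follows. This is not MindMeld's implementation: the class name is hypothetical, `add` takes a plain string rather than a query object, and the reading of `thresholds[i]` as the minimum count kept for `lengths[i]` is an assumption.

```python
from collections import Counter


class SketchCharNgramFreqBuilder:
    """Hypothetical character n-gram counter (not MindMeld's implementation)."""

    def __init__(self, lengths, thresholds):
        # Assumption: thresholds[i] is the minimum count kept for lengths[i].
        self.lengths = list(lengths)
        self.thresholds = list(thresholds)
        self.counts = {length: Counter() for length in self.lengths}

    def add(self, text):
        # Count every contiguous character window of each configured length.
        for length in self.lengths:
            for start in range(len(text) - length + 1):
                self.counts[length][text[start:start + length]] += 1

    def get_resource(self):
        # Keep only the n-grams that meet the threshold for their length.
        resource = {}
        for length, threshold in zip(self.lengths, self.thresholds):
            for ngram, count in self.counts[length].items():
                if count >= threshold:
                    resource[ngram] = count
        return resource


builder = SketchCharNgramFreqBuilder(lengths=[2], thresholds=[2])
builder.add("banana")
char_resource = builder.get_resource()
```

For the input "banana", the bigram "ba" occurs once and is dropped, while "an" and "na" each occur twice and are kept.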

class QueryFreqBuilder(enable_stemming=False)[source]¶

Bases: object

Compiles frequency dictionary of normalized and stemmed query strings

add(query)[source]¶

get_resource()[source]¶

class WordFreqBuilder(enable_stemming=False)[source]¶

Bases: object

Compiles unigram frequency dictionary of normalized query tokens

add(query)[source]¶

get_resource()[source]¶

class WordNgramFreqBuilder(lengths, thresholds, enable_stemming=False)[source]¶

Bases: object

Compiles n-gram frequency dictionary of normalized query tokens

add(query)[source]¶

get_resource()[source]¶
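A word-level n-gram frequency dictionary can be sketched in a similar way. The function names below are hypothetical, the input is assumed to be already normalized token lists, and the per-length threshold interpretation is an assumption rather than documented behaviour.

```python
from collections import Counter


def word_ngrams(tokens, length):
    """Return the contiguous token n-grams of the given length, joined by spaces."""
    return [" ".join(tokens[i:i + length]) for i in range(len(tokens) - length + 1)]


def build_word_ngram_freq(token_lists, lengths, thresholds):
    """Count word n-grams over all token lists and drop the infrequent ones."""
    # Assumption: thresholds[i] is the minimum count kept for lengths[i].
    resource = {}
    for length, threshold in zip(lengths, thresholds):
        counts = Counter()
        for tokens in token_lists:
            counts.update(word_ngrams(tokens, length))
        resource.update({ng: c for ng, c in counts.items() if c >= threshold})
    return resource


queries = [["order", "a", "pizza"], ["order", "a", "salad"]]
ngram_resource = build_word_ngram_freq(queries, lengths=[1, 2], thresholds=[2, 2])
```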

build_gazetteer(gaz_name, exclude_ngrams=False, force_reload=False)[source]¶

Builds the specified gazetteer using the entity data and mapping files.

Parameters:

gaz_name (str) -- The name of the entity the gazetteer corresponds to
exclude_ngrams (bool, optional) -- Whether partial matches of entities should be excluded from the gazetteer
force_reload (bool, optional) -- Whether the file should be forcefully reloaded from disk

static create_resource_loader(app_path, query_factory=None, text_preparation_pipeline=None)[source]¶

Creates the resource loader for the app at app path.

Parameters:

app_path (str) -- The path to the directory containing the app's data
query_factory (QueryFactory) -- The app's query factory
text_preparation_pipeline (TextPreparationPipeline) -- The app's text preparation pipeline
Returns:	a resource loader
Return type:	ResourceLoader

filter_file_paths(compiled_pattern, file_paths=None)[source]¶

Gets a list of file paths that match the compiled pattern.

Parameters:

compiled_pattern (sre.SRE_Pattern) -- A compiled regex pattern to filter with
file_paths (list) -- A list of file paths
Returns:	A list of file paths.
Return type:	list
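Filtering paths with a compiled pattern can be sketched in a few lines. The helper name is hypothetical, and the choice of `match` (anchored at the start of the path) rather than `search` is an assumption about the intended behaviour.

```python
import re


def sketch_filter_file_paths(compiled_pattern, file_paths):
    """Keep only the paths that the compiled pattern matches from the start."""
    return [path for path in file_paths if compiled_pattern.match(path)]


paths = [
    "domains/food/order/train.txt",
    "domains/food/order/test.txt",
    "domains/food/order/notes.md",
]
train_paths = sketch_filter_file_paths(re.compile(r".*train\.txt$"), paths)
```

Note that in a pattern such as `.*.txt` the second dot is unescaped and matches any character; escaping it as `\.` restricts the match to a literal dot.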

static flatten_query_tree(query_tree)[source]¶

Takes a query tree and returns the elements in list form.

Parameters:	query_tree (dict) -- A nested dictionary that organizes queries by domain then intent.
Returns:	A list of Query objects.
Return type:	list
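Flattening a domain-then-intent tree amounts to two nested loops. The sketch below uses plain strings in place of Query objects, and the function name is hypothetical.

```python
def sketch_flatten_query_tree(query_tree):
    """Collect every query from a {domain: {intent: [queries]}} mapping."""
    flattened = []
    for _domain, intents in query_tree.items():
        for _intent, queries in intents.items():
            flattened.extend(queries)
    return flattened


tree = {
    "food": {"order": ["order a pizza", "get me a salad"], "cancel": ["cancel my order"]},
    "weather": {"forecast": ["will it rain"]},
}
flat = sketch_flatten_query_tree(tree)
```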

get_all_file_paths(file_pattern='.*.txt')[source]¶

Get a list of text file paths across all intents.

Returns:	A list of all file paths.
Return type:	list

get_entity_map(entity_type, force_reload=False)[source]¶

Creates a mapping file for a given entity.

Parameters:	entity_type (str) -- The name of the entity

get_flattened_label_set(domain=None, intent=None, label_set=None, force_reload=False)[source]¶

get_gazetteer(gaz_name, force_reload=False)[source]¶

Gets a gazetteer by name.

Parameters:	gaz_name (str) -- The name of the entity the gazetteer corresponds to
Returns:	Gazetteer data
Return type:	dict

get_gazetteer_hash(gaz_name)[source]¶

Gets the hash of a gazetteer by entity name.

Parameters:	gaz_name (str) -- The name of the entity the gazetteer corresponds to
Returns:	Hash of a gazetteer specified by name.
Return type:	str

get_gazetteers(force_reload=False)[source]¶

Gets gazetteers for all entities.

Returns:	Gazetteer data keyed by entity type
Return type:	dict

get_gazetteers_hash()[source]¶

Gets a single hash of all the gazetteers, ordered alphabetically by entity type.

Returns:	Hash of a list of gazetteer hashes.
Return type:	str
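Sorting by entity type before combining makes the result independent of the order in which gazetteers were loaded. The sketch below illustrates that idea only: the helper names are hypothetical, and combining the digests by concatenating and re-hashing them is an assumption.

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 hex digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def sketch_gazetteers_hash(gazetteer_hashes):
    """Combine per-entity hashes in alphabetical entity order."""
    ordered = [gazetteer_hashes[name] for name in sorted(gazetteer_hashes)]
    # Assumption: the list is combined by hashing the concatenated digests.
    return sha1_hex("".join(ordered))


hashes_a = {"dish": sha1_hex("dish data"), "city": sha1_hex("city data")}
hashes_b = {"city": sha1_hex("city data"), "dish": sha1_hex("dish data")}
hashes_c = {"city": sha1_hex("city data"), "dish": sha1_hex("new dish data")}
```

Here `hashes_a` and `hashes_b` differ only in insertion order and yield the same combined hash, while `hashes_c` changes one gazetteer and yields a different one.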

get_labeled_queries(domain=None, intent=None, label_set=None, force_reload=False)[source]¶

Gets labeled queries from the cache, or loads them from disk.

Parameters:

domain (str) -- The domain of queries to load
intent (str) -- The intent of queries to load
force_reload (bool) -- Will not load queries from the cache when True

Returns:

ProcessedQuery objects (or strings) loaded from labeled query files, organized by domain and intent.

Return type:

dict
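The cache-or-disk behaviour controlled by force_reload can be sketched as below. Everything here is illustrative: `SketchQueryStore`, the loader callable standing in for disk access, and the `load_count` counter are hypothetical, and both domain and intent are required for simplicity.

```python
class SketchQueryStore:
    """Hypothetical cache in front of a slow loader (not MindMeld's code)."""

    def __init__(self, loader):
        self._loader = loader  # callable: (domain, intent) -> list of queries
        self._cache = {}       # (domain, intent) -> loaded queries
        self.load_count = 0    # how many times the loader was actually called

    def get_labeled_queries(self, domain, intent, force_reload=False):
        key = (domain, intent)
        if force_reload or key not in self._cache:
            self._cache[key] = self._loader(domain, intent)
            self.load_count += 1
        # Result is organized by domain, then intent.
        return {domain: {intent: self._cache[key]}}


store = SketchQueryStore(lambda domain, intent: [f"{domain}/{intent} example"])
store.get_labeled_queries("food", "order")
store.get_labeled_queries("food", "order")  # cache hit, loader not called
result = store.get_labeled_queries("food", "order", force_reload=True)
```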

static get_sentiment_analyzer()[source]¶: Returns a sentiment analyzer and downloads the data it requires from nltk.

get_sys_entity_types(labels)[source]¶

Get all system entity types from the entity labels.

Parameters:	labels (list of QueryEntity) -- a list of labeled entities
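Collecting system entity types from labels can be sketched as a simple filter. The sketch assumes that system entity types are recognizable by a "sys_" prefix and that the labels have already been reduced to their type strings. The function name is hypothetical.

```python
def sketch_sys_entity_types(entity_types):
    """Return the distinct entity types that look like system entity types."""
    # Assumption: system entity types are identified by a "sys_" prefix.
    return {entity_type for entity_type in entity_types if entity_type.startswith("sys_")}


label_types = ["sys_time", "dish", "sys_number", "sys_time"]
sys_types = sketch_sys_entity_types(label_types)
```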

get_text_preparation_pipeline()[source]¶

Gets the text preparation pipeline from the query_factory attribute.

Returns:	Class responsible for the normalization and tokenization of text.
Return type:	TextPreparationPipeline

hash_feature_resource(name)[source]¶

Hashes the named resource.

Parameters:	name (str) -- The name of the resource to hash
Returns:	The hash result
Return type:	str

hash_list(items)[source]¶

Hashes the list of items.

Parameters:	items (list[str]) -- A list of strings to hash
Returns:	The hash result
Return type:	str

hash_string(string)[source]¶

Hashes a string.

Parameters:	string (str) -- The string to hash
Returns:	The hash result
Return type:	str

load_entity_map(entity_type)[source]¶

Loads an entity mapping file.

Parameters:	entity_type (str) -- The name of the entity

load_gazetteer(gaz_name)[source]¶

Loads a gazetteer specified by the entity name.

Parameters:	gaz_name (str) -- The name of the entity the gazetteer corresponds to

load_query_file(domain, intent, file_path)[source]¶

Loads the queries from the specified file.

Parameters:

domain (str) -- The domain of the query file
intent (str) -- The intent of the query file
file_path (str) -- The name of the query file

RSC_HASH_MAP¶: dict -- Maps each resource name to a function on ResourceLoader. The keys are 'c_ngram_freq', 'enable-stemming', 'gazetteers', 'q_freq', 'sys_types', 'vader_classifier', 'w_freq' and 'w_ngram_freq'. The 'gazetteers' entry is ResourceLoader.get_gazetteers_hash; the other entries are lambdas defined on ResourceLoader.

hash_to_model_path¶: dict -- A dictionary that maps hashes to the file path of the classifier.

query_cache¶: Lazily loads the query cache, since it is not required for inference.