mindmeld.models.features.query_features module
This module contains feature extractors for queries.
-
mindmeld.models.features.query_features.
char_ngrams
(n, word, **kwargs) Extracts character n-grams for the given word.
Returns: A list of character n-grams for the given word.
Return type: list
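The idea of character n-gram extraction can be illustrated with a minimal sketch. The helper name `char_ngrams_sketch` is hypothetical and this is not MindMeld's implementation; in particular, how the library treats words shorter than `n` is not stated here, so the sketch simply returns an empty list in that case.

```python
def char_ngrams_sketch(n, word):
    """Return all contiguous character n-grams of length n in word."""
    # Assumption: a word shorter than n yields no n-grams.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(char_ngrams_sketch(2, "hello"))  # ['he', 'el', 'll', 'lo']
```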
-
mindmeld.models.features.query_features.
enabled_stemming
(**kwargs) Feature extractor for enabling stemming of the query.
-
mindmeld.models.features.query_features.
extract_bag_of_words_features
(ngram_lengths_to_start_positions, thresholds=(1, ), **kwargs) Returns a bag-of-words feature extractor.
Returns: (function) The feature extractor.
-
mindmeld.models.features.query_features.
extract_char_ngrams
(lengths=(1, ), thresholds=(1, ), **kwargs) Extract character n-grams of the specified lengths.
Parameters: - lengths (list of int) -- The n-gram lengths to extract.
- thresholds (list of int) -- Frequency cutoff values for including an n-gram in the vocabulary.
Returns: (function) A feature extraction function that takes a query and returns character n-grams of the specified lengths.
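A frequency cutoff of this kind can be pictured as a vocabulary filter. The sketch below is illustrative only: `build_char_ngram_vocab` is a hypothetical helper, and treating the threshold as an inclusive minimum count is an assumption rather than documented MindMeld behavior.

```python
from collections import Counter


def build_char_ngram_vocab(queries, length, threshold):
    """Collect character n-grams of one length that occur often enough."""
    counts = Counter()
    for text in queries:
        for i in range(len(text) - length + 1):
            counts[text[i:i + length]] += 1
    # Assumption: an n-gram is kept when its count is at least the threshold.
    return {gram for gram, count in counts.items() if count >= threshold}


vocab = build_char_ngram_vocab(["abab", "abc"], length=2, threshold=2)
print(vocab)  # {'ab'}
```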
-
mindmeld.models.features.query_features.
extract_char_ngrams_features
(ngram_lengths_to_start_positions, thresholds=(1, ), **kwargs) Returns a character n-gram feature extractor.
Returns: (function) The feature extractor.
-
mindmeld.models.features.query_features.
extract_edge_ngrams
(lengths=(1, ), **kwargs) Extract n-grams of the specified lengths from the edges of the query.
Parameters: lengths (list of int) -- The n-gram lengths to extract. Returns: (function) A feature extraction function that takes a query and returns n-grams of the specified lengths at the start and end of the query.
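As a rough illustration of edge n-grams, the sketch below takes a pre-tokenized query and returns the leading and trailing n-grams for each requested length. The function name and the feature key format are hypothetical; the actual extractor's key names and tokenization are not shown in this reference.

```python
def edge_ngrams_sketch(tokens, lengths=(1,)):
    """Return the n-grams at the start and end of a token list."""
    features = {}
    for length in lengths:
        if length > len(tokens):
            continue  # the query is too short for an n-gram of this length
        # Hypothetical feature keys, chosen only for readability.
        features[f"left-edge|length:{length}"] = " ".join(tokens[:length])
        features[f"right-edge|length:{length}"] = " ".join(tokens[-length:])
    return features


edges = edge_ngrams_sketch("play some jazz music".split(), lengths=(1, 2))
```

Here `edges` holds "play" and "music" for length 1, and "play some" and "jazz music" for length 2.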
-
mindmeld.models.features.query_features.
extract_freq
(bins=5, **kwargs) Extract frequency bin features.
Parameters: bins (int) -- The number of frequency bins (in addition to the out-of-vocabulary bin). Returns: A feature extraction function that returns the log of the count of query tokens within each frequency bin. Return type: (function)
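One way to picture frequency bin features is sketched below. Everything about the binning scheme is an assumption made for illustration: the helper name, the equal-width bins over relative frequency, the key format, and the use of log(1 + count) instead of a plain log are not taken from MindMeld.

```python
import math
from collections import Counter


def freq_bin_features_sketch(tokens, token_freqs, bins=5):
    """Map query tokens to log-scaled counts per frequency bin."""
    max_freq = max(token_freqs.values())
    bin_counts = Counter()
    for token in tokens:
        freq = token_freqs.get(token, 0)
        if freq == 0:
            # Tokens never seen in training go to a separate OOV bin.
            bin_counts["OOV"] += 1
        else:
            # Assumption: equal-width bins over frequency relative to the maximum.
            index = min(bins - 1, int(bins * freq / max_freq))
            bin_counts[index] += 1
    # Assumption: log(1 + count), so a bin holding a single token is nonzero.
    return {f"freq_bin|{name}": math.log(1 + count) for name, count in bin_counts.items()}


token_freqs = {"the": 100, "play": 40, "jazz": 5}  # hypothetical training counts
features = freq_bin_features_sketch(["play", "the", "jazz", "zydeco"], token_freqs)
```

In this example each of the four tokens falls into a different bin (2, 4, 0, and OOV), so every feature value is log(2).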
-
mindmeld.models.features.query_features.
extract_gaz_freq
(**kwargs) Extract frequency bin features for each gazetteer.
Returns: A feature extraction function that returns the log of the count of query tokens within each gazetteer's frequency bins. Return type: (function)
-
mindmeld.models.features.query_features.
extract_in_gaz_feature
(scaling=1, **kwargs) Returns a feature extractor that generates a set of features indicating the presence of query n-grams in different entity gazetteers. Used by the domain and intent classifiers when the 'in-gaz' feature is specified in the config.
Parameters: scaling (int) -- A multiplicative scale factor applied to the ratio_pop and ratio features of the in-gaz feature set.
Returns: An extractor function.
Return type: function
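For orientation, a configuration fragment that enables this feature might look like the sketch below. Only the 'in-gaz' key comes from the description above; the variable name, the surrounding dict layout, and the assumption that the nested settings are forwarded as keyword arguments to `extract_in_gaz_feature` are illustrative and should be checked against the MindMeld configuration documentation.

```python
# Hypothetical feature settings for a classifier config; the layout is an assumption.
CLASSIFIER_FEATURES_SKETCH = {
    # Assumed to be forwarded as extract_in_gaz_feature(scaling=10).
    "in-gaz": {"scaling": 10},
}
```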
-
mindmeld.models.features.query_features.
extract_in_gaz_ngram_features
(**kwargs) Returns a feature extractor for surrounding n-grams in gazetteers.
-
mindmeld.models.features.query_features.
extract_in_gaz_span_features
(**kwargs) Returns a feature extractor for properties of spans in gazetteers.
-
mindmeld.models.features.query_features.
extract_length
(**kwargs) Extract length measures (tokens and characters; linear and log scales) for the whole query.
Returns: (function) A feature extraction function that takes a query and returns the number of tokens and characters on linear and log scales.
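The four length measures can be sketched as follows. The function name and feature keys are hypothetical, whitespace tokenization stands in for the library's tokenizer, and log(1 + n) is an assumed smoothing so that an empty query does not raise an error.

```python
import math


def length_features_sketch(text):
    """Return token and character counts on linear and log scales."""
    tokens = text.split()  # assumption: simple whitespace tokenization
    return {
        "tokens": len(tokens),
        "chars": len(text),
        # Assumption: log(1 + n) keeps the value defined for empty input.
        "tokens_log": math.log(len(tokens) + 1),
        "chars_log": math.log(len(text) + 1),
    }


lengths = length_features_sketch("play jazz")
```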
-
mindmeld.models.features.query_features.
extract_ngrams
(lengths=(1, ), thresholds=(1, ), **kwargs) Extract n-grams of the specified lengths.
Parameters: - lengths (list of int) -- The n-gram lengths to extract.
- thresholds (list of int) -- Frequency cutoff values for including an n-gram in the vocabulary.
Returns: (function) A feature extraction function that takes a query and returns n-grams of the specified lengths.
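Most functions in this module return a feature extraction function instead of computing features directly. The closure pattern behind that design can be sketched as below; `make_ngram_extractor`, the token-list input, and the feature key format are hypothetical, and the real extractors take a query object rather than a plain list.

```python
from collections import Counter


def make_ngram_extractor(lengths=(1,)):
    """Build an extractor that counts word n-grams of the given lengths."""

    def extractor(tokens):
        features = Counter()
        for length in lengths:
            for i in range(len(tokens) - length + 1):
                ngram = " ".join(tokens[i:i + length])
                # Hypothetical feature key format.
                features[f"ngram|length:{length}|{ngram}"] += 1
        return dict(features)

    return extractor


extract = make_ngram_extractor(lengths=(1, 2))
features = extract(["play", "some", "jazz"])
```

Configuring the extractor once and then applying it to many queries keeps the settings (here `lengths`) out of the per-query call.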
-
mindmeld.models.features.query_features.
extract_query_string
(scaling=1000, **kwargs) Extract the whole query string as a feature.
Returns: (function) A feature extraction function that takes a query and returns the whole query string for exact matching.
-
mindmeld.models.features.query_features.
extract_sentiment
(analyzer='composite', **kwargs) Generates sentiment intensity scores for each query.
Returns: (function) A feature extraction function that takes a query and returns sentiment values across positive, negative, and neutral.
-
mindmeld.models.features.query_features.
extract_sys_candidate_features
(start_positions=(0, ), **kwargs) Return an extractor for features based on a heuristic guess of numeric candidates at or near the current token.
Parameters: start_positions (tuple) -- Positions relative to the current token (=0). Returns: (function) The feature extractor.
-
mindmeld.models.features.query_features.
extract_sys_candidates
(entities=None, **kwargs) Return an extractor for features based on a heuristic guess of numeric candidates in the current query.
Returns: (function) The feature extractor.
-
mindmeld.models.features.query_features.
extract_word_shape
(lengths=(1, ), **kwargs) Extracts word shapes for n-grams of the specified lengths.
Parameters: lengths (list of int) -- The n-gram lengths to extract. Returns: (function) A feature extraction function that takes a query and returns n-grams of word shapes for the specified lengths.
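A word shape abstracts a token into its pattern of character classes. The mapping below (uppercase to "X", lowercase to "x", digit to "d", everything else unchanged) is a common convention used here as an assumption; the exact mapping MindMeld applies is not given in this reference, and `word_shape_sketch` is a hypothetical name.

```python
def word_shape_sketch(token):
    """Map each character of a token to a coarse character class."""
    shape = []
    for ch in token:
        if ch.isdigit():
            shape.append("d")
        elif ch.isalpha():
            shape.append("X" if ch.isupper() else "x")
        else:
            shape.append(ch)  # punctuation and symbols are kept as-is
    return "".join(shape)


print(word_shape_sketch("iPhone"))  # xXxxxx
print(word_shape_sketch("A-1"))     # X-d
```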
-
mindmeld.models.features.query_features.
find_ngrams
(input_list, n, **kwargs) Generates all n-gram combinations from a list of strings.
Returns: A list of n-grams across all the strings in the input list.
Return type: list
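A compact way to generate n-grams from a list of strings is to zip the list against shifted copies of itself. This is a sketch of the general technique under a hypothetical name, not the library's code; joining each n-gram with a single space is an assumption.

```python
def find_ngrams_sketch(input_list, n):
    """Return space-joined n-grams over consecutive items of input_list."""
    # zip stops at the shortest shifted copy, so only complete n-grams are produced.
    shifted = [input_list[i:] for i in range(n)]
    return [" ".join(gram) for gram in zip(*shifted)]


print(find_ngrams_sketch(["play", "some", "jazz"], 2))  # ['play some', 'some jazz']
```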
-
mindmeld.models.features.query_features.
update_features_sequence
(feat_seq, update_feat_seq, **kwargs) Updates a list of features with another parallel list of features.
Parameters: - feat_seq (list of dict) -- The original list of feature dicts, which is mutated in place.
- update_feat_seq (list of dict) -- The list of feature dicts to update with.
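The in-place update of parallel feature lists can be sketched as below. The helper name is hypothetical, and pairing the two lists with `zip` (which silently stops at the shorter list) is an assumption about how mismatched lengths are handled.

```python
def update_features_sequence_sketch(feat_seq, update_feat_seq):
    """Merge each dict of update_feat_seq into the dict at the same position."""
    for feats, update in zip(feat_seq, update_feat_seq):
        feats.update(update)  # mutates the original dict; existing keys are overwritten


feat_seq = [{"bias": 1}, {"bias": 1, "in_gaz": 0}]
update_features_sequence_sketch(feat_seq, [{"length": 4}, {"in_gaz": 1}])
print(feat_seq)  # [{'bias': 1, 'length': 4}, {'bias': 1, 'in_gaz': 1}]
```

Because the original dicts are mutated, callers that need the unmodified features should copy them first.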