mindmeld.components.classifier module¶
This module contains the base class for all the machine-learned classifiers in MindMeld.
-
class
mindmeld.components.classifier.
Classifier
(resource_loader)[source]¶ Bases:
abc.ABC
The base class for all the machine-learned classifiers in MindMeld. A classifier is a machine-learned model that categorizes input examples into one of the pre-determined class labels. Among other functionality, each classifier provides means by which to fit a statistical model on a given training dataset and then use the trained model to make predictions on new unseen data.
-
ready
¶ bool -- Whether the classifier is ready.
-
dirty
¶ bool -- Whether the classifier has unsaved changes to its model.
-
config
¶ ClassifierConfig -- The classifier configuration.
-
hash
¶ str -- A hash representing the inputs into the model.
-
dump
(model_path, incremental_model_path=None)[source]¶ Persists the trained classification model to disk.
Parameters:
-
evaluate
(queries=None, label_set=None, fetch_distribution=False)[source]¶ Evaluates the trained classification model on the given test data
Parameters: Returns: A ModelEvaluation object that contains evaluation results
Return type: ModelEvaluation
-
fit
(queries=None, label_set=None, incremental_timestamp=None, load_cached=True, **kwargs)[source]¶ Trains a statistical model for classification using the provided training examples and model configuration.
Parameters: - queries (list(ProcessedQuery) or ProcessedQueryList, optional) -- A list of queries to train on. If not specified the queries will be loaded from the label_set.
- label_set (str) -- A label set to load. If not specified, the default training set will be loaded.
- incremental_timestamp (str, optional) -- The timestamp folder to cache models in
- model_type (str, optional) -- The type of machine learning model to use. If omitted, the default model type will be used.
- model_settings (dict) -- Settings specific to the model type specified
- features (dict) -- Features to extract from each example instance to form the feature vector used for model training. If omitted, the default feature set for the model type will be used.
- params (dict) -- Params to pass to the underlying classifier
- params_selection (dict) -- The grid of hyper-parameters to search, for finding the optimal hyper-parameter settings for the model. If omitted, the default hyper-parameter search grid will be used.
- param_selection (dict) -- Configuration for param selection (using cross-validation) {'type': 'shuffle', 'n': 3, 'k': 10, 'n_jobs': 2, 'scoring': '', 'grid': { 'C': [100, 10000, 1000000]}}
- features -- The keys are the names of feature extractors and the values are either a kwargs dict which will be passed into the feature extractor function, or a callable which will be used as to extract features.
- load_cached (bool) -- If the model is cached on disk, load it into memory.
Returns: True if model was loaded and fit, False if a valid cached model exists but was not loaded (controlled by the load_cached arg).
Examples
Fit using default the configuration.
>>> clf.fit()
Fit using a 'special' label set.
>>> clf.fit(label_set='special')
Fit using given params, bypassing cross-validation. This is useful for speeding up train times if you are confident the params are optimized.
>>> clf.fit(params={'C': 10000000})
Fit using given parameter selection settings (also known as cross-validation settings).
>>> clf.fit(param_selection={})
Fit using a custom set of features, including a custom feature extractor. This is only for advanced users.
>>> clf.fit(features={ 'in-gaz': {}, // gazetteer features 'contrived': lambda exa, res: {'contrived': len(exa.text) == 26} })
-
load
(model_path)[source]¶ Loads the trained classification model from disk
Parameters: model_path (str) -- The location on disk where the model is stored
-
predict
(query, time_zone=None, timestamp=None, dynamic_resource=None)[source]¶ Predicts a class label for the given query using the trained classification model
Parameters: - query (Query or str) -- The input query
- time_zone (str, optional) -- The name of an IANA time zone, such as 'America/Los_Angeles', or 'Asia/Kolkata' See the [tz database](https://www.iana.org/time-zones) for more information.
- timestamp (long, optional) -- A unix time stamp for the request (in seconds).
- dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference
Returns: The predicted class label
Return type:
-
predict_proba
(query, time_zone=None, timestamp=None, dynamic_resource=None)[source]¶ Runs prediction on a given query and generates multiple hypotheses with their associated probabilities using the trained classification model
Parameters: - query (Query) -- The input query
- time_zone (str, optional) -- The name of an IANA time zone, such as 'America/Los_Angeles', or 'Asia/Kolkata' See the [tz database](https://www.iana.org/time-zones) for more information.
- timestamp (long, optional) -- A unix time stamp for the request (in seconds).
- dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference
Returns: a list of tuples of the form (str, float) grouping predicted class labels and their probabilities
Return type:
-
unload
()[source]¶ Unloads the model from memory. This helps reduce memory requirements while training other models.
-
view_extracted_features
(query, time_zone=None, timestamp=None, dynamic_resource=None)[source]¶ Extracts features for the given input based on the model config.
Parameters: - query (Query or str) -- The input query
- time_zone (str, optional) -- The name of an IANA time zone, such as 'America/Los_Angeles', or 'Asia/Kolkata' See the [tz database](https://www.iana.org/time-zones) for more information.
- timestamp (long, optional) -- A unix time stamp for the request (in seconds).
- dynamic_resource (dict) -- Dynamic gazetteer to be included for feature extraction.
Returns: The extracted features from the given input
Return type:
-
CLF_TYPE
= None¶ Classifier type (str).
-
-
class
mindmeld.components.classifier.
ClassifierConfig
(model_type=None, features=None, model_settings=None, params=None, param_selection=None)[source]¶ Bases:
object
A value object representing a classifier configuration
-
model_type
¶ str -- The name of the model type. Will be used to find the model class to instantiate.
-
model_settings
¶ dict -- Settings specific to the model type specified.
-
params
¶ dict -- Params to pass to the underlying classifier.
-
param_selection
¶ dict -- Configuration for param selection (using cross validation). For example: {'type': 'shuffle', 'n': 3, 'k': 10, 'n_jobs': 2, 'scoring': '', 'grid': {} }
-
features
¶ dict -- The keys are the names of feature extractors and the values are either a kwargs dict which will be passed into the feature extractor function, or a callable which will be used as to extract features.
-
to_dict
()[source]¶ Converts the model config object into a dict.
Returns: A dict version of the config. Return type: (dict)
-
to_json
()[source]¶ Converts the model config object to JSON.
Returns: JSON representation of the classifier. Return type: (str)
-
features
-
model_settings
-
model_type
-
param_selection
-
params
-