mindmeld.models.tagger_models module¶
This module contains the Memm entity recognizer.
-
class
mindmeld.models.tagger_models.
PytorchTaggerModel
(config)[source]¶ Bases:
mindmeld.models.model.PytorchModel
-
evaluate
(examples, labels)[source]¶ Evaluates a model against the given examples and labels
Parameters: - examples -- A list of examples to predict
- labels -- A list of expected labels
Returns: an object containing information about the evaluation
Return type: ModelEvaluation
-
ALLOWED_CLASSIFIER_TYPES
= ['embedder', 'lstm-pytorch', 'cnn-lstm', 'lstm-lstm']¶
-
-
class
mindmeld.models.tagger_models.
TaggerModel
(config)[source]¶ Bases:
mindmeld.models.model.Model
A machine learning classifier for tags.
This class manages feature extraction, training, cross-validation, and prediction. The design goal is that after providing initial settings like hyperparameters, grid-searchable hyperparameters, feature extractors, and cross-validation settings, TaggerModel manages all of the details involved in training and prediction such that the input to training or prediction is Query objects, and the output is class names, and no data manipulation is needed from the client.
-
classifier_type
¶ str -- The name of the classifier type. Currently recognized values are "memm","crf", and "lstm"
-
hyperparams
¶ dict -- A kwargs dict of parameters that will be used to initialize the classifier object.
-
grid_search_hyperparams
¶ dict -- Like 'hyperparams', but the values are lists of parameters. The training process will grid search over the Cartesian product of these parameter lists and select the best via cross-validation.
-
feat_specs
¶ dict -- A mapping from feature extractor names, as given in FEATURE_NAME_MAP, to a kwargs dict, which will be passed into the associated feature extractor function.
-
cross_validation_settings
¶ dict -- A dict that contains "type", which specifies the name of the cross-validation strategy, such as "k-folds" or "shuffle". The remaining keys are parameters specific to the cross-validation type, such as "k" when the type is "k-folds".
-
evaluate
(examples, labels, fetch_distribution=False)[source]¶ Evaluates a model against the given examples and labels
Parameters: - examples -- A list of examples to predict
- labels -- A list of expected labels
Returns: an object containing information about the evaluation
Return type: ModelEvaluation
-
fit
(examples, labels, params=None)[source]¶ Trains the model.
Parameters: - examples (ProcessedQueryList.QueryIterator) -- A list of queries to train on.
- labels (ProcessedQueryList.EntitiesIterator) -- A list of expected labels.
- params (dict) -- Parameters of the classifier.
-
classmethod
load
(path)[source]¶ Load the model state to memory.
Parameters: path (str) -- The path to dump the model to
-
predict
(examples, dynamic_resource=None)[source]¶ Parameters: - examples (list of mindmeld.core.Query) -- a list of queries to train on
- dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference
Returns: a list of predicted labels
Return type: (list of tuples of mindmeld.core.QueryEntity)
-
predict_proba
(examples, dynamic_resource=None, fetch_distribution=False)[source]¶ Parameters: - examples (list of mindmeld.core.Query) -- a list of queries to train on
- dynamic_resource (dict, optional) -- A dynamic resource to aid NLP inference
Returns: a list of predicted labels with confidence scores
Return type: list of tuples of (mindmeld.core.QueryEntity)
-
select_params
(examples, labels, selection_settings=None)[source]¶ - Selects the best set of hyper-parameters for a given set of examples and true labels
- through cross-validation
Parameters: - examples -- A list of example queries
- labels -- A list of labels associated with the queries
- selection_settings -- A dictionary of parameter lists to select from
Returns: A dictionary of optimized parameters to use
Return type:
-
view_extracted_features
(query, dynamic_resource=None)[source]¶ Returns a dictionary of extracted features and their weights for a given query
Parameters: - query (mindmeld.core.Query) -- The query to extract features from
- dynamic_resource (dict) -- The dynamic resource used along with the query
Returns: A list of dictionaries of extracted features and their weights
Return type:
-
ACCURACY_SCORING
= 'accuracy'¶
-
ALLOWED_CLASSIFIER_TYPES
= ['crf', 'memm', 'lstm']¶
-
CRF_TYPE
= 'crf'¶
-
DEFAULT_FEATURES
= {'bag-of-words-seq': {'ngram_lengths_to_start_positions': {1: [-2, -1, 0, 1, 2], 2: [-2, -1, 0, 1]}}, 'in-gaz-span-seq': {}, 'sys-candidates-seq': {'start_positions': [-1, 0, 1]}}¶
-
LSTM_TYPE
= 'lstm'¶
-
MEMM_TYPE
= 'memm'¶
-
SEQUENCE_MODELS
= ['crf']¶
-
SEQ_ACCURACY_SCORING
= 'seq_accuracy'¶
-