mindmeld.models.nn_utils.sequence_classification module¶
Custom modules built on top of nn layers that can do sequence classification
- class mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification[source]¶
Bases: mindmeld.models.nn_utils.classification.BaseClassification
Base class that defines all the elements necessary to train and run inference with custom PyTorch modules wrapped on top of it. Classes derived from this base can be trained for sequence classification.
- forward(batch_data)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.
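The note above describes standard PyTorch behavior, which a minimal illustration (using a stock nn.Identity module, unrelated to MindMeld's classes) makes concrete:

```python
import torch
import torch.nn as nn

calls = []
module = nn.Identity()
module.register_forward_hook(lambda mod, inp, out: calls.append("hook ran"))

x = torch.ones(2)
module(x)          # calling the Module instance runs registered hooks
module.forward(x)  # calling forward() directly silently skips them
```

After this snippet runs, `calls` contains a single entry, confirming that only the `module(x)` call triggered the hook.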
- predict(examples)[source]¶
Returns predicted class labels.
Parameters: examples (List[str]) -- The list of examples for which predictions are computed and returned.
- predict_proba(examples)[source]¶
Returns predicted class probabilities.
Parameters: examples (List[str]) -- The list of examples for which class prediction probabilities are computed and returned.
- classification_type¶
- class mindmeld.models.nn_utils.sequence_classification.BertForSequenceClassification[source]¶
Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification
- fit(examples, labels, **params)[source]¶
Trains the underlying neural model on the input data and retains the best-scoring model across all iterations.
Because neural models can be large, instead of retaining a copy of the best model weights in RAM, it is advisable to dump them in a temporary folder and, upon completing the training process, load the best checkpoint weights.
Parameters: - examples (List[str]) -- A list of text strings that will be used for model training and validation
- labels (Union[List[int], List[List[int]]]) -- A list of labels passed in as integers corresponding to the examples. The encoded labels must have values between 0 and n_classes-1 -- one label per example in case of sequence classification and a sequence of labels per example in case of token classification
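The disk-based checkpointing strategy described above can be sketched as follows. This is an illustrative stand-in, not MindMeld's internal implementation: the model, the per-epoch validation scores, and the checkpoint file name are all placeholders.

```python
# Instead of keeping the best weights in RAM, dump them to a temporary
# folder during training and reload the best checkpoint at the end.
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder for the real neural model
best_score = float("-inf")

with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_path = os.path.join(tmp_dir, "best_checkpoint.pt")
    for dev_score in [0.71, 0.83, 0.79]:  # stand-in validation scores per epoch
        # ... one epoch of training would happen here ...
        if dev_score > best_score:
            best_score = dev_score
            torch.save(model.state_dict(), ckpt_path)  # persist best weights to disk
    model.load_state_dict(torch.load(ckpt_path))  # restore the best checkpoint
```

The temporary folder is cleaned up automatically when the context manager exits, so only the in-memory model keeps the restored best weights.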
- class mindmeld.models.nn_utils.sequence_classification.CnnForSequenceClassification[source]¶
Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification
A CNN module that operates on a batched sequence of token ids. The tokens can be characters, words, or sub-words. The module outputs a single 1D representation for each instance in the batch (i.e. of shape [BS, EMB_DIM]).
The forward method of this module expects only padded token ids as input.
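The embed-convolve-pool pattern described above can be sketched with a small stand-alone module. This is a hedged sketch, not MindMeld's implementation: the class name, hyperparameters, and the choice of max-over-time pooling are assumptions, and the pooled width here is n_filters * len(kernel_sizes) rather than the embedding dimension.

```python
import torch
import torch.nn as nn

class CnnPoolingSketch(nn.Module):
    """Embed padded token ids, convolve, and max-pool over the sequence."""

    def __init__(self, vocab_size=100, emb_dim=32, n_filters=16, kernel_sizes=(2, 3)):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes
        )

    def forward(self, padded_token_ids):  # [BS, SEQ_LEN]
        x = self.emb(padded_token_ids).transpose(1, 2)  # [BS, EMB_DIM, SEQ_LEN]
        # max-pool each feature map over the sequence dimension
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)  # [BS, n_filters * len(kernel_sizes)]

reprs = CnnPoolingSketch()(torch.randint(1, 100, (4, 10)))  # 4 instances -> [4, 32]
```

Max-over-time pooling collapses the variable-length sequence axis, which is why only padded token ids (and no length information) are needed as input.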
- class mindmeld.models.nn_utils.sequence_classification.EmbedderForSequenceClassification[source]¶
Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification
An embedder pooling module that operates on a batched sequence of token ids. The tokens can be characters, words, or sub-words. The module outputs a single 1D representation for each instance in the batch (i.e. of shape [BS, EMB_DIM]).
The forward method of this module expects padded token ids along with the number of tokens per instance in the batch.
Additionally, one can set different coefficients for different tokens of the embedding matrix (e.g. tf-idf weights).
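Length-aware pooling with optional per-token coefficients, as described above, can be sketched as a single function. This is an illustrative sketch, not MindMeld's implementation; the function name and the choice of weighted mean pooling are assumptions.

```python
import torch

def weighted_mean_pool(embedded, lengths, token_weights=None):
    """Mean-pool [BS, SEQ_LEN, EMB_DIM] embeddings into [BS, EMB_DIM],
    ignoring padding and optionally weighting tokens (e.g. tf-idf)."""
    bs, seq_len, _ = embedded.shape
    # mask[i, j] is True only for the first lengths[i] (non-padding) positions
    mask = torch.arange(seq_len).unsqueeze(0) < lengths.unsqueeze(1)  # [BS, SEQ_LEN]
    weights = mask.float() if token_weights is None else token_weights * mask.float()
    weights = weights / weights.sum(dim=1, keepdim=True).clamp(min=1e-8)
    return (embedded * weights.unsqueeze(2)).sum(dim=1)  # [BS, EMB_DIM]

pooled = weighted_mean_pool(torch.randn(2, 5, 8), torch.tensor([5, 3]))  # [2, 8]
```

Passing the per-instance lengths keeps padding positions out of the average, which is why the forward method needs them alongside the padded token ids.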
- class mindmeld.models.nn_utils.sequence_classification.LstmForSequenceClassification[source]¶
Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification
An LSTM module that operates on a batched sequence of token ids. The tokens can be characters, words, or sub-words. The module outputs a single 1D representation for each instance in the batch (i.e. of shape [BS, EMB_DIM]).
The forward method of this module expects padded token ids along with the number of tokens per instance in the batch.
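The way an LSTM turns padded token ids plus per-instance lengths into one vector per instance can be sketched with plain PyTorch. This is a hedged sketch, not MindMeld's implementation: the vocabulary size, hidden size, bidirectionality, and the use of the final hidden states as the sentence representation are all assumptions.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

emb = nn.Embedding(50, 8, padding_idx=0)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)

token_ids = torch.tensor([[4, 7, 9, 0], [3, 5, 0, 0]])  # padded [BS, SEQ_LEN]
lengths = torch.tensor([3, 2])                          # true tokens per instance

# pack so the LSTM never reads padding positions
packed = pack_padded_sequence(emb(token_ids), lengths, batch_first=True,
                              enforce_sorted=False)
_, (h_n, _) = lstm(packed)
# concatenate the final forward and backward hidden states: one vector per instance
sentence_repr = torch.cat([h_n[-2], h_n[-1]], dim=1)  # [BS, 2 * hidden_size]
```

Packing is the reason the forward method needs the per-instance token counts: without them the recurrence would run over padding and contaminate the final hidden states.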