mindmeld.models.nn_utils package
class mindmeld.models.nn_utils.EmbedderForSequenceClassification
Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification
An embedder pooling module that operates on a batched sequence of token ids. The tokens can be characters, words, or sub-words. This module outputs one 1D representation for each instance in the batch (i.e. [BS, EMB_DIM]).
The forward method of this module expects padded token ids along with the number of tokens per instance in the batch (see the sketch below).
Additionally, one can set different coefficients for different tokens of the embedding matrix (e.g. tf-idf weights).
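For illustration, a minimal sketch in plain PyTorch (not MindMeld's internal code) of the input format described above: a batch of right-padded token ids plus the true token count per instance. The ids, shapes, and padding value are illustrative assumptions.

    import torch
    from torch.nn.utils.rnn import pad_sequence

    # Two tokenized instances of unequal length (ids are arbitrary).
    seqs = [torch.tensor([4, 9, 2]), torch.tensor([7, 1])]
    padded = pad_sequence(seqs, batch_first=True, padding_value=0)  # [BS=2, SEQ_LEN=3]
    lengths = torch.tensor([3, 2])  # number of real (non-pad) tokens per instance
    # forward(padded, lengths) would then pool each instance into one
    # [BS, EMB_DIM] representation.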
class mindmeld.models.nn_utils.CnnForSequenceClassification
Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification
A CNN module that operates on a batched sequence of token ids. The tokens can be characters, words, or sub-words. This module outputs one 1D representation for each instance in the batch (i.e. [BS, EMB_DIM]).
The forward method of this module expects only padded token ids as input, with no per-instance lengths.
class mindmeld.models.nn_utils.LstmForSequenceClassification
Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification
An LSTM module that operates on a batched sequence of token ids. The tokens can be characters, words, or sub-words. This module outputs one 1D representation for each instance in the batch (i.e. [BS, EMB_DIM]).
The forward method of this module expects padded token ids along with the number of tokens per instance in the batch.
class mindmeld.models.nn_utils.BertForSequenceClassification
Bases: mindmeld.models.nn_utils.sequence_classification.BaseSequenceClassification
fit(examples, labels, **params)
Trains the underlying neural model on the input data and retains the best-scoring model across all training iterations.
Because neural models can be large, rather than keeping a copy of the best set of model weights in RAM, it is advisable to dump them to a temporary folder and load the best checkpoint weights once training completes. A usage sketch follows the parameter list below.
Parameters:
- examples (List[str]) -- A list of text strings used for model training and validation.
- labels (Union[List[int], List[List[int]]]) -- A list of labels, passed as integers corresponding to the examples. The encoded labels must have values between 0 and n_classes - 1: one label per example for sequence classification, and a sequence of labels per example for token classification.
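A hedged usage sketch of the fit signature above for sequence classification. The training data and label values are illustrative, and the no-argument construction of the class is an assumption, not taken from this documentation.

    from mindmeld.models.nn_utils import BertForSequenceClassification

    examples = ["book a table for two", "what's the weather tomorrow"]
    labels = [0, 1]  # one integer label per example, in [0, n_classes - 1]

    clf = BertForSequenceClassification()  # constructor args assumed optional
    clf.fit(examples, labels)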
class mindmeld.models.nn_utils.EmbedderForTokenClassification
Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification
class mindmeld.models.nn_utils.LstmForTokenClassification
Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification
An LSTM module that operates on a batched sequence of token ids. The tokens can be characters, words, or sub-words. This module takes an additional input that determines how the sequence of embeddings produced by the LSTM layers for each instance in the batch should be split. Once split, each sub-group of embeddings (corresponding to a word or a phrase) can be collapsed into a 1D representation through pooling operations, as sketched below. Finally, this module outputs a 2D representation for each instance in the batch (i.e. [BS, SEQ_LEN', EMB_DIM]).
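A minimal sketch, in plain PyTorch, of the split-then-pool step described above for a single instance. The tensor shapes, split lengths, and the choice of mean pooling are illustrative assumptions, not MindMeld's exact implementation.

    import torch

    lstm_out = torch.randn(6, 128)   # [SEQ_LEN=6, EMB_DIM=128] per-token outputs
    split_lengths = [1, 3, 2]        # sub-word counts for three words/phrases
    groups = torch.split(lstm_out, split_lengths, dim=0)
    pooled = torch.stack([g.mean(dim=0) for g in groups])  # [SEQ_LEN'=3, EMB_DIM]
    # Batched over instances, this yields the [BS, SEQ_LEN', EMB_DIM] output.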
class mindmeld.models.nn_utils.CharCnnWithWordLstmForTokenClassification
Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification
class mindmeld.models.nn_utils.CharLstmWithWordLstmForTokenClassification
Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification
class mindmeld.models.nn_utils.BertForTokenClassification
Bases: mindmeld.models.nn_utils.token_classification.BaseTokenClassification
fit(examples, labels, **params)
Trains the underlying neural model on the input data and retains the best-scoring model across all training iterations.
Because neural models can be large, rather than keeping a copy of the best set of model weights in RAM, it is advisable to dump them to a temporary folder and load the best checkpoint weights once training completes. A usage sketch follows the parameter list below.
Parameters:
- examples (List[str]) -- A list of text strings used for model training and validation.
- labels (Union[List[int], List[List[int]]]) -- A list of labels, passed as integers corresponding to the examples. The encoded labels must have values between 0 and n_classes - 1: one label per example for sequence classification, and a sequence of labels per example for token classification.
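A hedged sketch of the same fit signature for token classification, where labels carries one integer tag per token of each example. The data, tag values, and no-argument construction are illustrative assumptions.

    from mindmeld.models.nn_utils import BertForTokenClassification

    examples = ["play jazz in the kitchen", "stop the music"]
    labels = [[0, 1, 0, 0, 2], [0, 0, 0]]  # one tag per token, in [0, n_classes - 1]

    tagger = BertForTokenClassification()  # constructor args assumed optional
    tagger.fit(examples, labels)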
class mindmeld.models.nn_utils.TokenizerType
Bases: enum.Enum
An enumeration of the tokenizer choices available in this package.
BPE_TOKENIZER = 'bpe-tokenizer'
CHAR_TOKENIZER = 'char-tokenizer'
HUGGINGFACE_PRETRAINED_TOKENIZER = 'huggingface_pretrained-tokenizer'
WHITESPACE_AND_CHAR_DUAL_TOKENIZER = 'whitespace_and_char-tokenizer'
WHITESPACE_TOKENIZER = 'whitespace-tokenizer'
WORDPIECE_TOKENIZER = 'wordpiece-tokenizer'
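Since TokenizerType is a standard Python Enum, a config string can be mapped to a member by value lookup; a small sketch:

    from mindmeld.models.nn_utils import TokenizerType

    tok = TokenizerType("whitespace-tokenizer")  # by-value lookup
    assert tok is TokenizerType.WHITESPACE_TOKENIZER
    print(tok.name, "->", tok.value)  # WHITESPACE_TOKENIZER -> whitespace-tokenizer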
class mindmeld.models.nn_utils.EmbedderType
Bases: enum.Enum
An enumeration of the embedder choices available in this package.
BERT = 'bert'
GLOVE = 'glove'
NONE = None
class mindmeld.models.nn_utils.SequenceClassificationType
Bases: enum.Enum
An enumeration of the available sequence classification architectures.
CNN = 'cnn'
EMBEDDER = 'embedder'
LSTM = 'lstm'
class mindmeld.models.nn_utils.TokenClassificationType
Bases: enum.Enum
An enumeration of the available token classification architectures.
CNN_LSTM = 'cnn-lstm'
EMBEDDER = 'embedder'
LSTM = 'lstm-pytorch'
LSTM_LSTM = 'lstm-lstm'
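The same by-value lookup applies to the architecture enums; note the value asymmetry between the two LSTM members ('lstm' for sequence classification vs. 'lstm-pytorch' for token classification). A small sketch:

    from mindmeld.models.nn_utils import (
        SequenceClassificationType,
        TokenClassificationType,
    )

    assert SequenceClassificationType("lstm") is SequenceClassificationType.LSTM
    assert TokenClassificationType("lstm-pytorch") is TokenClassificationType.LSTM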