mindmeld.active_learning.heuristics module
This module contains query selection heuristics for the Active Learning Pipeline.
-
class mindmeld.active_learning.heuristics.DisagreementSampling
Bases: abc.ABC
-
static rank_2d(confidences_2d: List[List[float]]) → List[int]
DisagreementSampling needs confidences from more than one model (confidences_3d) to run; a single confidences_2d array is not sufficient.
Parameters: confidences_2d (List[List[float]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
-
static rank_3d(confidences_3d: List[List[List[float]]]) → List[int]
Finds the most frequent class label for each element across all models, then calculates the agreement per element (the percentage of models that voted for the most frequent class). Elements are ranked from highest to lowest disagreement.
Parameters: confidences_3d (List[List[List[float]]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
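A minimal sketch of the disagreement ranking described above, using only the standard library. It assumes confidences_3d is laid out as [model][element][class], that each model "votes" for its highest-confidence class, and that ties keep the original element order. The function names are hypothetical, not MindMeld's API.

```python
from collections import Counter
from typing import List


def argmax(row: List[float]) -> int:
    # Index of the highest-confidence class (first one on ties).
    return max(range(len(row)), key=row.__getitem__)


def disagreement_rank_3d(confidences_3d: List[List[List[float]]]) -> List[int]:
    # Assumed layout: confidences_3d[model][element][class].
    num_models = len(confidences_3d)
    num_elements = len(confidences_3d[0])
    agreements = []
    for e in range(num_elements):
        # Each model votes for its top class for element e.
        votes = [argmax(confidences_3d[m][e]) for m in range(num_models)]
        # Agreement = fraction of models voting for the most frequent class.
        top_count = Counter(votes).most_common(1)[0][1]
        agreements.append(top_count / num_models)
    # Lowest agreement means highest disagreement, so sort ascending.
    return sorted(range(num_elements), key=lambda e: agreements[e])
```

With three models, an element on which the votes split 2 to 1 (agreement 2/3) is ranked ahead of elements on which all three models agree.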
-
class mindmeld.active_learning.heuristics.EnsembleSampling
Bases: abc.ABC
-
static rank_2d(confidences_2d: List[List[float]]) → List[int]
Combines the rankings of all heuristics that support ranking with 2d confidence input.
Parameters: confidences_2d (List[List[float]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
-
static rank_3d(confidences_3d: List[List[List[float]]]) → List[int]
Combines the rankings of all heuristics that support ranking with 3d confidence input.
Parameters: confidences_3d (List[List[List[float]]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
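The text above does not say how the individual rankings are combined. The sketch below assumes rank summation (the same "rankings are added" approach that the rank_3d methods describe) over three 2d heuristics: entropy, least confidence, and margin. All function names are hypothetical, and MindMeld's ensemble may include other heuristics or break ties differently.

```python
import math
from typing import List


def entropy_rank(conf: List[List[float]]) -> List[int]:
    # Highest entropy first.
    scores = [-sum(p * math.log(p) for p in row if p > 0) for row in conf]
    return sorted(range(len(conf)), key=lambda i: -scores[i])


def least_confidence_rank(conf: List[List[float]]) -> List[int]:
    # Lowest max confidence first.
    return sorted(range(len(conf)), key=lambda i: max(conf[i]))


def margin_rank(conf: List[List[float]]) -> List[int]:
    # Smallest gap between the top two confidences first.
    def margin(row: List[float]) -> float:
        first, second = sorted(row, reverse=True)[:2]
        return first - second
    return sorted(range(len(conf)), key=lambda i: margin(conf[i]))


def combine_rankings(rankings: List[List[int]]) -> List[int]:
    # Sum each index's position across rankings; lowest total ranks first.
    totals = [0] * len(rankings[0])
    for ranking in rankings:
        for position, index in enumerate(ranking):
            totals[index] += position
    return sorted(range(len(totals)), key=lambda i: totals[i])


def ensemble_rank_2d(conf: List[List[float]]) -> List[int]:
    return combine_rankings(
        [entropy_rank(conf), least_confidence_rank(conf), margin_rank(conf)]
    )
```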
-
class mindmeld.active_learning.heuristics.EntropySampling
Bases: abc.ABC
-
static rank_2d(confidences_2d: List[List[float]]) → List[int]
Calculates the entropy of the confidences per element. Elements are ranked from highest to lowest entropy.
Parameters: confidences_2d (List[List[float]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
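A minimal sketch of entropy ranking on a 2d confidence array. It assumes natural-log (nat) entropy, treats 0 · log 0 as 0, and keeps the original order on ties; the function names are hypothetical. The log base does not change the ranking, since it only rescales every score by the same constant.

```python
import math
from typing import List


def entropy(row: List[float]) -> float:
    # Shannon entropy in nats; zero-probability classes contribute nothing.
    return -sum(p * math.log(p) for p in row if p > 0)


def entropy_rank_2d(confidences_2d: List[List[float]]) -> List[int]:
    scores = [entropy(row) for row in confidences_2d]
    # Most uncertain (highest entropy) elements first; sorted() is stable.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

A uniform row such as [0.5, 0.5] has the maximum entropy for two classes (ln 2) and is ranked ahead of a confident row such as [1.0, 0.0] (entropy 0).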
-
static rank_3d(confidences_3d: List[List[List[float]]]) → List[int]
Calculates the entropy of the confidences per element and ranks elements from highest to lowest entropy. This is done for each confidences_2d in confidences_3d, and the per-model rankings are added to produce a final ranking.
Parameters: confidences_3d (List[List[List[float]]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
-
static rank_entities(entity_confidences: List[List[List[float]]]) → List[int]
Calculates the entropy of the entity confidences per element. Elements are ranked from highest to lowest entropy.
Returns: Ranked lists based on either Token Entropy (the average of per-token entropies across a query) or Total Token Entropy (the sum of token entropies across a query).
Return type: List[int]
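A minimal sketch of the two query-level scores named above. It assumes entity_confidences is laid out as [query][token][tag], uses natural-log entropy, and selects the score with a hypothetical `total` flag; the function names are illustrative, not MindMeld's API.

```python
import math
from typing import List


def token_entropy(probs: List[float]) -> float:
    # Shannon entropy (nats) of one token's tag distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_token_entropy(
    entity_confidences: List[List[List[float]]], total: bool = False
) -> List[int]:
    # Assumed layout: entity_confidences[query][token][tag].
    scores = []
    for query in entity_confidences:
        entropies = [token_entropy(token) for token in query]
        # Total Token Entropy sums; Token Entropy averages over the query's tokens.
        scores.append(sum(entropies) if total else sum(entropies) / len(entropies))
    # Highest entropy first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

The two scores can disagree: summing favors longer queries, while averaging normalizes for length. In the test below, a 4-token query with one confident token wins under Total Token Entropy but loses under Token Entropy to a 2-token query whose tokens are all uncertain.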
-
class mindmeld.active_learning.heuristics.Heuristic
Bases: abc.ABC
Heuristic base class used as Active Learning query selection strategies.
-
static ordered_indices_list_to_final_rank(ordered_sample_indices_list: List[List[int]])
Converts multiple lists of ordered indices into a final rank.
Parameters: ordered_sample_indices_list (List[List[int]]) -- Multiple lists of ordered sample indices.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
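A minimal sketch of turning several ordered index lists into one final rank. It assumes each index's positions are summed across the lists and the lowest total ranks first, with ties keeping index order. This mirrors the "rankings are added" description of the rank_3d methods but may not match MindMeld's exact tie-breaking; the function name is hypothetical.

```python
from typing import List


def combine_rankings(ordered_sample_indices_list: List[List[int]]) -> List[int]:
    num_elements = len(ordered_sample_indices_list[0])
    position_sums = [0] * num_elements
    for ordered_indices in ordered_sample_indices_list:
        # An index at the front of a list contributes 0, the next one 1, etc.
        for position, index in enumerate(ordered_indices):
            position_sums[index] += position
    # Lowest summed position = most consistently top-ranked element.
    return sorted(range(num_elements), key=lambda i: position_sums[i])
```

For the lists [2, 0, 1], [0, 2, 1] and [2, 1, 0], the summed positions are 3 for index 0, 5 for index 1 and 1 for index 2, giving the final rank [2, 0, 1].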
-
static rank_2d(confidences_2d: List[List[float]]) → List[int]
Ranking method for 2d confidence arrays.
Parameters: confidences_2d (List[List[float]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
-
static rank_3d(confidences_3d: List[List[List[float]]]) → List[int]
Ranking method for 3d confidence arrays.
Parameters: confidences_3d (List[List[List[float]]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
-
class mindmeld.active_learning.heuristics.HeuristicsFactory
Bases: object
Factory class for heuristics.
-
class mindmeld.active_learning.heuristics.KLDivergenceSampling
Bases: abc.ABC
-
static get_divergences_per_element_no_segments(confidences_3d: List[List[List[float]]]) → List[List[float]]
Parameters: confidences_3d (List[List[List[float]]]) -- Confidence probabilities per element.
Returns: divergences -- Divergences per model for each element.
Return type: List[List[float]]
-
static get_divergences_per_element_with_segments(confidences_3d: List[List[List[float]]], confidence_segments: Dict) → List[List[float]]
Calculates divergences by the segments defined in confidence_segments, where p_d is the probability distribution within class X and q_d is the mean probability distribution for class X. Divergence(p_d, q_d) is calculated for each element in all classes.
Parameters: confidences_3d (List[List[List[float]]]) -- Confidence probabilities per element; confidence_segments (Dict) -- A mapping from domains (str) to the corresponding indices in the probability vector. Used for intent-level KLD.
Returns: divergences -- Divergences per model for each element.
Return type: List[List[float]]
-
static get_domain(confidence_segments: Dict, row: List[List[float]]) → str
Gets the domain for a given probability row, inferred from its non-zero values.
Parameters: confidence_segments (Dict) -- A mapping from domains (str) to the corresponding indices in the probability vector. Used for intent-level KLD; row (List[List[float]]) -- A single row representing a query's probability distribution.
Returns: domain -- The domain that the given row belongs to.
Return type: str
Raises: AssertionError -- If a row does not have an associated domain.
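A minimal sketch of inferring a row's domain from its non-zero values. It assumes confidence_segments maps each domain to a list of positions in the probability vector, and it treats the row as a flat list of floats. The function name `infer_domain` is hypothetical. It raises AssertionError explicitly rather than using `assert`, so the check survives `python -O`.

```python
from typing import Dict, List


def infer_domain(confidence_segments: Dict[str, List[int]], row: List[float]) -> str:
    # Positions of the row that carry probability mass.
    nonzero = {i for i, p in enumerate(row) if p > 0}
    for domain, indices in confidence_segments.items():
        # The row belongs to the domain whose segment contains all of its mass.
        if nonzero and nonzero <= set(indices):
            return domain
    raise AssertionError("Row does not have an associated domain.")
```

For example, with segments {"weather": [0, 1], "music": [2, 3, 4]}, the row [0.0, 0.0, 0.7, 0.2, 0.1] belongs to "music". A row with mass in both segments has no associated domain.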
-
static rank_2d(confidences_2d: List[List[float]]) → List[int]
KLDivergenceSampling needs confidences from more than one model (confidences_3d) to run; a single confidences_2d array is not sufficient.
Parameters: confidences_2d (List[List[float]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
-
static rank_3d(confidences_3d: List[List[List[float]]], confidence_segments: Dict = None) → List[int]
Calculates the KL divergence between the average confidence distribution across all models for a given class and the confidence distribution of a given element in that class. Elements are ranked from highest to lowest divergence.
Parameters: confidences_3d (List[List[List[float]]]) -- Confidence probabilities per element; confidence_segments (Dict, optional) -- A mapping from domains (str) to the corresponding indices in the probability vector. Used for intent-level KLD.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
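A minimal sketch of the no-segments case, in the style of query-by-committee "KL divergence to the mean". Several choices here are assumptions and may differ from MindMeld: the [model][element][class] layout, a consensus distribution built as the class-wise mean of all models' distributions for an element, natural-log KL, and elements ranked by their mean divergence across models. Function names are hypothetical.

```python
import math
from typing import List


def kl_divergence(p: List[float], q: List[float]) -> float:
    # KL(p || q) in nats; terms with p_i == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def divergences_per_element(confidences_3d: List[List[List[float]]]) -> List[List[float]]:
    # Returns divergences[element][model] (assumed layout).
    num_models = len(confidences_3d)
    num_elements = len(confidences_3d[0])
    divergences = []
    for e in range(num_elements):
        rows = [confidences_3d[m][e] for m in range(num_models)]
        # Consensus: class-wise mean across models. Since every row is part of
        # the mean, q_i > 0 wherever p_i > 0, so there is no division by zero.
        mean = [sum(col) / num_models for col in zip(*rows)]
        divergences.append([kl_divergence(row, mean) for row in rows])
    return divergences


def kl_rank_3d(confidences_3d: List[List[List[float]]]) -> List[int]:
    divs = divergences_per_element(confidences_3d)
    scores = [sum(d) / len(d) for d in divs]
    # Highest average divergence from the consensus first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

When every model outputs the same distribution for an element, all its divergences are 0. Models that split sharply, such as [0.9, 0.1] and [0.1, 0.9], push that element to the front.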
-
class mindmeld.active_learning.heuristics.LeastConfidenceSampling
Bases: abc.ABC
-
static rank_2d(confidences_2d: List[List[float]]) → List[int]
First calculates the highest (max) confidence per element, then ranks elements from lowest to highest max confidence.
Parameters: confidences_2d (List[List[float]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
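A minimal sketch of least-confidence ranking on a 2d array. Only each element's top confidence matters; how the remaining probability mass is spread across the other classes is ignored. The function name is hypothetical, and ties keep the original order.

```python
from typing import List


def least_confidence_rank_2d(confidences_2d: List[List[float]]) -> List[int]:
    # The model's confidence in its own prediction for each element.
    max_confidences = [max(row) for row in confidences_2d]
    # Least confident predictions first.
    return sorted(range(len(max_confidences)), key=lambda i: max_confidences[i])
```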
-
static rank_3d(confidences_3d: List[List[List[float]]]) → List[int]
First calculates the highest (max) confidence per element, then ranks elements from lowest to highest max confidence. This is done for each confidences_2d in confidences_3d, and the per-model rankings are added to produce a final ranking.
Parameters: confidences_3d (List[List[List[float]]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
-
class mindmeld.active_learning.heuristics.MarginSampling
Bases: abc.ABC
-
static rank_2d(confidences_2d: List[List[float]]) → List[int]
Calculates the margin, i.e. the difference between the highest and second-highest confidence scores, per element. Elements are ranked from lowest to highest margin.
Parameters: confidences_2d (List[List[float]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
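A minimal sketch of margin ranking on a 2d array, assuming at least two classes per element. The function name is hypothetical. The test illustrates why margin can rank differently from least confidence. For [0.5, 0.49, 0.01] and [0.4, 0.3, 0.3], least confidence puts the second element first (max 0.4 < 0.5). Margin puts the first element first, because its top two classes are almost tied (0.01 vs 0.1).

```python
from typing import List


def margin_rank_2d(confidences_2d: List[List[float]]) -> List[int]:
    margins = []
    for row in confidences_2d:
        # Gap between the two most likely classes.
        first, second = sorted(row, reverse=True)[:2]
        margins.append(first - second)
    # Smallest margin (most ambiguous between the top two) first.
    return sorted(range(len(margins)), key=lambda i: margins[i])
```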
-
static rank_3d(confidences_3d: List[List[List[float]]]) → List[int]
Calculates the margin, i.e. the difference between the highest and second-highest confidence scores, per element. Elements are ranked from lowest to highest margin. This is done for each confidences_2d in confidences_3d, and the per-model rankings are added to produce a final ranking.
Parameters: confidences_3d (List[List[List[float]]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
-
static rank_entities(entity_confidences: List[List[List[float]]]) → List[int]
Ranks queries using Margin Sampling for tag sequences. This approach uses beam search to obtain the top two tag sequences for each query, scored by the entity confidences, and then calculates the margin between these two sequences. (For more information about this method: https://dl.acm.org/doi/pdf/10.5555/1613715.1613855)
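The method above uses beam search, which is needed when tag sequences are scored jointly (for example, with transition scores). As a simplified stand-in, the sketch below assumes each token's tag distribution is independent, so a sequence's score is the product of its per-token probabilities. Under that assumption, the best sequence takes each token's top tag. The second-best sequence changes exactly one token to its second-best tag: the token with the largest second-to-first probability ratio. Both can therefore be computed exactly without beam search. This is a swapped-in simplification, not MindMeld's method, and the function names are hypothetical.

```python
from typing import List


def sequence_margin(token_confidences: List[List[float]]) -> float:
    # Assumed layout: token_confidences[token][tag], at least two tags per token.
    best = 1.0
    best_ratio = 0.0  # largest second/first ratio over all tokens
    for probs in token_confidences:
        first, second = sorted(probs, reverse=True)[:2]
        best *= first
        if first > 0:
            best_ratio = max(best_ratio, second / first)
    # Swapping the single cheapest token gives the second-best sequence score.
    second_best = best * best_ratio
    return best - second_best


def margin_rank_entities(entity_confidences: List[List[List[float]]]) -> List[int]:
    margins = [sequence_margin(query) for query in entity_confidences]
    # Queries whose top two tag sequences are closest come first.
    return sorted(range(len(margins)), key=lambda i: margins[i])
```

For the query [[0.9, 0.1], [0.6, 0.4]], the best sequence scores 0.9 × 0.6 = 0.54 and the second-best scores 0.9 × 0.4 = 0.36, so the margin is 0.18.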
-
class mindmeld.active_learning.heuristics.RandomSampling
Bases: abc.ABC
-
static random_rank(num_elements: int) → List[int]
Randomly shuffles indices.
Parameters: num_elements (int) -- Number of elements to randomly sample.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
-
static rank_2d(confidences_2d: List[List[float]]) → List[int]
Randomly shuffles indices.
Parameters: confidences_2d (List[List[float]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
-
static rank_3d(confidences_3d: List[List[List[float]]]) → List[int]
Randomly shuffles indices.
Parameters: confidences_3d (List[List[List[float]]]) -- Confidence probabilities per element.
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
-
mindmeld.active_learning.heuristics.stratified_random_sample(labels: List) → List[int]
Reorders indices so that their labels follow an evenly repeating pattern for as long as possible, then shuffles and appends the remaining indices. The first part of the list maintains a uniform distribution across labels. Because the labels may not be perfectly balanced, the remaining portion will have a distribution similar to that of the original data.
For example, the labels in the returned order could look like:
["R","B","C","R","B","C","R","B","C","B","R","R","B","B","B","R"]
|--------- Evenly Repeating --------||--- Shuffled Remaining ---|
Parameters: labels (List[str or int]) -- A list of labels (e.g. labels = ["R", "B", "B", "C"]).
Returns: ranked_indices -- Indices corresponding to elements ranked by the heuristic.
Return type: List[int]
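A minimal sketch of the reordering described above. The evenly repeating part cycles through labels in the order they first appear and runs for as many rounds as the rarest label allows. The leftover indices are pooled and shuffled. The name `stratified_order` and its `seed` parameter are hypothetical additions for reproducibility, not part of MindMeld's signature.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Optional


def stratified_order(labels: List[Hashable], seed: Optional[int] = None) -> List[int]:
    if not labels:
        return []
    rng = random.Random(seed)
    # Group element indices by label, keeping labels in first-seen order.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    for indices in by_label.values():
        rng.shuffle(indices)
    # Evenly repeating part: one index per label per round, for as many
    # rounds as the rarest label allows.
    rounds = min(len(indices) for indices in by_label.values())
    label_order = list(by_label)
    evenly_repeating = [by_label[label][r] for r in range(rounds) for label in label_order]
    # Leftover indices are pooled and shuffled, so this tail roughly follows
    # the original label distribution.
    remaining = [i for label in label_order for i in by_label[label][rounds:]]
    rng.shuffle(remaining)
    return evenly_repeating + remaining
```

With labels ["R", "B", "C", "R", "B", "C", "B", "B", "B"], "C" occurs twice, so the first six returned indices carry the labels R, B, C, R, B, C. The three extra "B" indices follow in shuffled order.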