mindmeld.text_preparation.normalizers module
This module contains Normalizers.
class mindmeld.text_preparation.normalizers.ASCIIFold
Bases: mindmeld.text_preparation.normalizers.Normalizer
An ASCII Folding Normalizer.
fold_char_to_ascii(char)
Return the ASCII character corresponding to the folding token.
Parameters: char -- ASCII folding token
Returns: an ASCII character
Return type: char
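To illustrate the general idea of ASCII folding, here is a minimal sketch built only on Python's standard library. It is not MindMeld's implementation: the helper name `fold_to_ascii` is hypothetical, and the approach (decompose, then drop combining marks) is an assumption that only handles accented Latin letters, not characters such as "ß" or "ø".

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Approximate ASCII folding (hypothetical helper, not MindMeld's API).

    Decomposes each character into base letter + combining marks (NFD),
    then drops the combining marks.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


folded = fold_to_ascii("café naïve")
print(folded)  # cafe naive
```

A table-driven folder, which maps each non-ASCII character to a chosen ASCII replacement, covers more cases than this decomposition trick but needs an explicit mapping.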
class mindmeld.text_preparation.normalizers.Lowercase
Bases: mindmeld.text_preparation.normalizers.Normalizer
Lowercase Normalizer Class.
class mindmeld.text_preparation.normalizers.NFC
Bases: mindmeld.text_preparation.normalizers.Normalizer
Unicode NFC Normalizer Class. (Canonical Decomposition, followed by Canonical Composition)
For more details: https://unicode.org/reports/tr15/#Norm_Forms
class mindmeld.text_preparation.normalizers.NFD
Bases: mindmeld.text_preparation.normalizers.Normalizer
Unicode NFD Normalizer Class. (Canonical Decomposition)
For more details: https://unicode.org/reports/tr15/#Norm_Forms
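The difference between the NFC and NFD forms can be seen with Python's standard `unicodedata` module, which implements the same Unicode normalization forms. This is a sketch of the forms themselves; whether the MindMeld classes delegate to `unicodedata` is an assumption not confirmed by this page.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# NFC: canonical decomposition, then canonical composition.
nfc = unicodedata.normalize("NFC", decomposed)
# NFD: canonical decomposition only.
nfd = unicodedata.normalize("NFD", composed)

print(len(nfc), len(nfd))  # 1 2
```

Both strings render identically, which is why text is usually normalized to one form before comparison or lookup.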
class mindmeld.text_preparation.normalizers.NFKC
Bases: mindmeld.text_preparation.normalizers.Normalizer
Unicode NFKC Normalizer Class. (Compatibility Decomposition, followed by Canonical Composition)
For more details: https://unicode.org/reports/tr15/#Norm_Forms
class mindmeld.text_preparation.normalizers.NFKD
Bases: mindmeld.text_preparation.normalizers.Normalizer
Unicode NFKD Normalizer Class. (Compatibility Decomposition)
For more details: https://unicode.org/reports/tr15/#Norm_Forms
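The compatibility forms additionally replace characters such as ligatures with their plain equivalents. The sketch below uses Python's standard `unicodedata` module to show the effect; it illustrates the Unicode forms, not MindMeld's internals.

```python
import unicodedata

text = "\ufb01\u00e9"  # "ﬁ" ligature (U+FB01) followed by precomposed "é"

# Canonical normalization leaves the ligature untouched.
nfc = unicodedata.normalize("NFC", text)
# NFKC: compatibility decomposition, then canonical composition.
nfkc = unicodedata.normalize("NFKC", text)
# NFKD: compatibility decomposition only.
nfkd = unicodedata.normalize("NFKD", text)

print(len(nfc), len(nfkc), len(nfkd))  # 2 3 4
```

Compatibility normalization is lossy (the ligature cannot be recovered), so it suits matching and search more than storage of original text.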
class mindmeld.text_preparation.normalizers.NoOpNormalizer
Bases: mindmeld.text_preparation.normalizers.Normalizer
A No-Op Normalizer.
class mindmeld.text_preparation.normalizers.Normalizer
Bases: abc.ABC
Abstract Normalizer Base Class.
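The abstract-base-class pattern used by this module can be sketched as follows. All names here (`SketchNormalizer`, `SketchLowercase`, `SketchNoOp`, `apply_all`, and the `normalize` method) are hypothetical and chosen for illustration; this page does not list the methods that the real Normalizer class declares.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class SketchNormalizer(ABC):
    """Hypothetical stand-in for an abstract normalizer base class."""

    @abstractmethod
    def normalize(self, text: str) -> str:
        """Return the normalized form of text."""


class SketchLowercase(SketchNormalizer):
    def normalize(self, text: str) -> str:
        return text.lower()


class SketchNoOp(SketchNormalizer):
    def normalize(self, text: str) -> str:
        return text  # returns the input unchanged


def apply_all(normalizers: Iterable[SketchNormalizer], text: str) -> str:
    """Apply each normalizer in order, feeding each output to the next."""
    for normalizer in normalizers:
        text = normalizer.normalize(text)
    return text


result = apply_all([SketchNoOp(), SketchLowercase()], "Hello World")
print(result)  # hello world
```

Because every subclass exposes the same method, normalizers can be chained in any order, and a no-op subclass gives callers a safe default.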
class mindmeld.text_preparation.normalizers.NormalizerFactory
Bases: object
Normalizer Factory Class
class mindmeld.text_preparation.normalizers.RegexNormalizerRule(pattern: str, replacement: str)
class mindmeld.text_preparation.normalizers.RegexNormalizerRuleFactory
Bases: object
static get_default_regex_normalizer_rule(regex_normalizer: str)
Creates a RegexNormalizerRule object based on the given rule and the current EXCEPTION_CHARS.
Parameters: regex_normalizer (str) -- Name of the desired RegexNormalizerRule
Returns: Default Regex Normalizer Rule
Return type: (RegexNormalizerRule)
static get_regex_normalizers(regex_norm_rules)
Builds a list of RegexNormalizerRule objects from regex_norm_rules.
Parameters: regex_norm_rules (List[Dict], optional) -- Regex normalization rules represented as dictionaries. The example rule below removes any text in parentheses.
{"pattern": "\\(.+?\\)", "replacement": ""}
Returns: List of RegexNormalizerRule objects created from the provided regex_norm_rules.
Return type: regex_normalizer_rules (List[RegexNormalizerRule])
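The rule format described above (a dictionary with "pattern" and "replacement" keys) can be sketched with the standard `re` module. The names `SketchRegexRule` and `build_rules` are hypothetical stand-ins, and applying a rule with `re.sub` is an assumption; this page does not show how RegexNormalizerRule applies its pattern.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SketchRegexRule:
    """Hypothetical stand-in for a pattern/replacement rule."""

    pattern: str
    replacement: str

    def normalize(self, text: str) -> str:
        # Replace every match of the pattern with the replacement string.
        return re.sub(self.pattern, self.replacement, text)


def build_rules(rule_dicts: List[Dict[str, str]]) -> List[SketchRegexRule]:
    """Turn rule dictionaries into rule objects, preserving their order."""
    return [SketchRegexRule(d["pattern"], d["replacement"]) for d in rule_dicts]


# The parentheses are escaped so they match literal "(" and ")".
rules = build_rules([{"pattern": r"\(.+?\)", "replacement": ""}])
cleaned = rules[0].normalize("large latte (extra hot) please")
print(repr(cleaned))
```

Note that removing the parenthesized text leaves the whitespace on both sides of it in place, so a whitespace-collapsing rule is often applied afterwards.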
EXCEPTION_CHARS = "\\@\\[\\]'"
-
static