langml.baselines.contrastive.utils

Module Contents

Functions

aeda_augment(words: List[str], ratio: float = 0.3, language: str = 'EN') → str

AEDA: An Easier Data Augmentation Technique for Text Classification

whitespace_tokenize(text: str) → List[str]

Attributes

CN_PUNCTUATIONS

EN_PUNCTUATIONS

langml.baselines.contrastive.utils.CN_PUNCTUATIONS = ['。', ',', '?', '!', ';']
langml.baselines.contrastive.utils.EN_PUNCTUATIONS = ['.', ',', '!', '?', ';', ':']
langml.baselines.contrastive.utils.aeda_augment(words: List[str], ratio: float = 0.3, language: str = 'EN') → str

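AEDA: An Easier Data Augmentation Technique for Text Classification. Augments the input by inserting punctuation marks at random positions.

:param words: List[str], input words (tokens)
:param ratio: float, ratio of punctuation marks to add randomly
:param language: str, language of the input, one of ['EN', 'CN']; default 'EN'
:return: str, the augmented text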
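To illustrate the idea, here is a minimal sketch of AEDA-style augmentation. It is not the langml implementation: the function name `aeda_sketch` and its `rng` parameter are hypothetical, and the rule "insert between 1 and about `ratio * len(words)` marks, each placed before a randomly chosen token" is an assumption modeled on the AEDA paper. The punctuation lists are copied from the module attributes documented above. English tokens are joined with spaces and Chinese tokens are concatenated, which is also an assumption.

```python
import random
from typing import List, Optional

# Punctuation sets copied from the module attributes documented above.
EN_PUNCTUATIONS = ['.', ',', '!', '?', ';', ':']
CN_PUNCTUATIONS = ['。', ',', '?', '!', ';']


def aeda_sketch(words: List[str], ratio: float = 0.3, language: str = 'EN',
                rng: Optional[random.Random] = None) -> str:
    """Hypothetical AEDA-style augmentation: insert random punctuation marks
    before randomly chosen tokens, then join the tokens back into a string."""
    if language not in ('EN', 'CN'):
        raise ValueError("language must be 'EN' or 'CN'")
    if not words:
        return ''
    rng = rng or random.Random()
    punctuations = EN_PUNCTUATIONS if language == 'EN' else CN_PUNCTUATIONS
    # Assumed rule: insert between 1 and ratio * len(words) marks,
    # never more than there are tokens.
    max_insertions = min(len(words), max(1, int(ratio * len(words))))
    q = rng.randint(1, max_insertions)
    positions = set(rng.sample(range(len(words)), q))
    out: List[str] = []
    for i, word in enumerate(words):
        if i in positions:
            out.append(rng.choice(punctuations))  # mark goes before the token
        out.append(word)
    # Assumed joining: spaces for English, no separator for Chinese.
    joiner = ' ' if language == 'EN' else ''
    return joiner.join(out)


if __name__ == '__main__':
    print(aeda_sketch(['the', 'movie', 'was', 'great'], rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the augmentation reproducible. Because only punctuation is inserted, the original tokens survive unchanged and in order. This is why AEDA is considered a conservative augmentation compared with deletion or swapping.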

langml.baselines.contrastive.utils.whitespace_tokenize(text: str) → List[str]
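The documentation gives no description for `whitespace_tokenize`. Judging by its name and signature, it splits text on whitespace. The sketch below assumes that behavior, using a hypothetical name and Python's built-in `str.split()` with no separator. That call discards leading and trailing whitespace and treats runs of spaces, tabs, and newlines as a single separator. The returned `List[str]` has the shape expected by the `words` parameter of `aeda_augment`.

```python
from typing import List


def whitespace_tokenize_sketch(text: str) -> List[str]:
    """Hypothetical equivalent of whitespace_tokenize: split on any run of
    whitespace, ignoring leading/trailing whitespace."""
    # str.split() with no argument returns [] for empty or blank input.
    return text.split()


if __name__ == '__main__':
    print(whitespace_tokenize_sketch('  the movie\twas\ngreat  '))
```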