fibber.defense_strategies.sem_strategy module

class fibber.defense_strategies.sem_strategy.SEMStrategy(arg_dict, dataset_name, strategy_gpu_id, defense_desc, metric_bundle, attack_strategy, field)[source]

Bases: fibber.defense_strategies.defense_strategy_base.DefenseStrategyBase

Defense strategy based on the Synonym Encoding Method (SEM).

Initialize the defense strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._defense_desc, and self._dataset_name.

You should not override this function.

  • self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.

  • self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.

  • self._device (torch.device): any computation that requires a GPU accelerator should use this device.

  • self._defense_desc (str): the name of the directory where the defense will save files.

  • self._dataset_name (str): the dataset name.

Parameters
  • arg_dict (dict) – all arguments loaded from the command line.

  • dataset_name (str) – the name of the dataset.

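  • strategy_gpu_id (int) – the ID of the GPU on which to run the strategy.

  • defense_desc (str) – the name of the directory where the defense will save files.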

  • metric_bundle (MetricBundle) – a MetricBundle object.

  • attack_strategy (ParaphraseStrategyBase or None) – the attack strategy. Used in some defense methods.

  • field (str) – the field in which perturbation can happen.

fit(trainset)[source]

Fit the defense strategy on a training set.

Parameters

trainset (dict) – a fibber dataset.

load(trainset)[source]

fibber.defense_strategies.sem_strategy.load_or_build_sem_wordmap(save_path, trainset, field, device, kk=10, delta=0.5)[source]

Load or build the synonym encoding.

See Natural Language Adversarial Defense through Synonym Encoding (https://arxiv.org/abs/1909.06723)

Parameters
  • save_path (str) – the path where the synonym encoding is saved and from which it is loaded.

  • trainset (dict) – the training set, used to compute word frequencies.

  • kk (int) – the maximum number of synonyms considered for each word.

  • delta (float) – the threshold used when selecting synonyms.

Returns

a map from word to encoding.

Return type

(dict)
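The load-or-build behavior and the idea of a word-to-encoding map can be illustrated with a minimal sketch. This is not fibber's implementation: `build_toy_wordmap`, `load_or_build_toy_wordmap`, the JSON cache file, and the hand-written `synonyms` table are all hypothetical stand-ins, and the sketch ignores `field`, `device`, and `delta`. It assumes only what the description above states: frequent words are preferred as encodings, and at most `kk` synonyms are considered per word.

```python
import json
import os
import tempfile
from collections import Counter


def build_toy_wordmap(sentences, synonyms, kk=10):
    """Map every word to a shared code word for its synonym cluster."""
    freq = Counter(tok for s in sentences for tok in s.lower().split())
    word_map = {}
    # Visit words from most to least frequent, so frequent words become codes.
    for word, _ in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        neighbors = synonyms.get(word, [])[:kk]  # consider at most kk synonyms
        if word not in word_map:
            encoded = [n for n in neighbors if n in word_map]
            if encoded:
                # Reuse the code of the most frequent already-encoded synonym.
                best = max(encoded, key=lambda n: freq[n])
                word_map[word] = word_map[best]
            else:
                word_map[word] = word
        # Synonyms without a code inherit this word's code.
        for n in neighbors:
            word_map.setdefault(n, word_map[word])
    return word_map


def load_or_build_toy_wordmap(save_path, sentences, synonyms, kk=10):
    """Load the map from save_path if it exists; otherwise build and save it."""
    if os.path.exists(save_path):
        with open(save_path) as f:
            return json.load(f)
    word_map = build_toy_wordmap(sentences, synonyms, kk=kk)
    with open(save_path, "w") as f:
        json.dump(word_map, f)
    return word_map


# Hypothetical toy corpus and synonym table.
sentences = ["a good movie", "a good film", "a great movie", "a fine plot"]
synonyms = {
    "good": ["great", "fine"],
    "great": ["good"],
    "fine": ["good"],
    "movie": ["film"],
    "film": ["movie"],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_wordmap.json")
    built = load_or_build_toy_wordmap(path, sentences, synonyms)
    # The second call finds the cached file and never rebuilds.
    loaded = load_or_build_toy_wordmap(path, [], {})
```

In this toy run, "great" and "fine" share the encoding of the more frequent "good", and "film" shares the encoding of "movie".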

fibber.defense_strategies.sem_strategy.sem_fix_sentences(sentences, data_record_list, word_map, reformat=False)[source]

fibber.defense_strategies.sem_strategy.sem_substitute(tok, word_map)[source]

fibber.defense_strategies.sem_strategy.sem_transform_dataset(dataset, word_map, field, deepcopy=True)[source]
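
These helper functions are undocumented here, so the following is only a guess at how a word map might be applied, based on the signatures alone. The names `toy_substitute` and `toy_transform_dataset` are hypothetical, and the sketch assumes whitespace tokenization and a dataset shaped as a dict with a "data" list of records; neither assumption is confirmed by this page.

```python
import copy


def toy_substitute(tok, word_map):
    """Return the encoding for tok, or tok itself when it has no entry."""
    return word_map.get(tok, tok)


def toy_transform_dataset(dataset, word_map, field, deepcopy=True):
    """Encode `field` of every record token by token."""
    if deepcopy:
        # Work on a copy so the caller's dataset is left unchanged.
        dataset = copy.deepcopy(dataset)
    for record in dataset["data"]:
        tokens = record[field].split()  # assumed whitespace tokenization
        record[field] = " ".join(toy_substitute(t, word_map) for t in tokens)
    return dataset


# Hypothetical word map and dataset layout.
word_map = {"great": "good", "film": "movie"}
dataset = {"data": [{"text0": "a great film", "label": 1}]}
encoded = toy_transform_dataset(dataset, word_map, field="text0")
```

With `deepcopy=True` the original `dataset` is untouched and `encoded` holds the rewritten text; passing `deepcopy=False` in this sketch would modify the records in place.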