fibber.paraphrase_strategies.fudge_strategy module

class fibber.paraphrase_strategies.fudge_strategy.FudgeStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

A baseline paraphrase strategy that simply returns the reference.

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

  • self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.

  • self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.

  • self._device (torch.device): any computation that requires a GPU accelerator should use this device.

  • self._output_dir (str): the name of the directory where the strategy can save files.

  • self._dataset_name (str): the dataset name.

Parameters
  • arg_dict (dict) – all arguments loaded from the command line.

  • dataset_name (str) – the name of the dataset.

  • strategy_gpu_id (int) – the ID of the GPU on which to run the strategy.

  • output_dir (str) – a directory to save any models or temporary files.

  • metric_bundle (MetricBundle) – a MetricBundle object.

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters

trainset (dict) – a fibber dataset.

paraphrase_example(data_record, n)

Paraphrase one data record.

This function should be overridden by subclasses. When overriding it, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

Parameters
  • data_record (dict) – a dict storing one data of a dataset.

  • n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])
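The contract of fit and paraphrase_example can be illustrated with a minimal sketch of a baseline that returns the reference. It does not use fibber itself: IdentityLikeStrategy is a hypothetical stand-in class, and the assumption that the text lives under a "text0" key of data_record is likewise hypothetical.

```python
from typing import Dict, List


class IdentityLikeStrategy:
    """Hypothetical stand-in mimicking fit() and paraphrase_example().

    This is NOT fibber's StrategyBase; it only illustrates the documented contract.
    """

    def __init__(self, field: str = "text0"):
        # Hypothetical: the key in data_record that holds the text to paraphrase.
        self._field = field

    def fit(self, trainset: Dict) -> None:
        # A baseline that returns the reference needs no training.
        pass

    def paraphrase_example(self, data_record: Dict, n: int) -> List[str]:
        # Return the reference unchanged, truncated so that at most n strings come back.
        return [data_record[self._field]][:n]


strategy = IdentityLikeStrategy()
strategy.fit({"data": []})
result = strategy.paraphrase_example({"label": 1, "text0": "a good movie"}, n=3)
print(result)  # ['a good movie']
```

Returning a single string is allowed here because the documented return value only requires at most n strings.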

score(data_record, tmp, text, ll)

fibber.paraphrase_strategies.fudge_strategy.make_batch(toks_list)

Convert multiple tokenized texts into a batch tensor.
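A common way to batch variable-length token lists is to right-pad them to the longest length and record which positions are real tokens. The sketch below shows that idea in pure Python. It assumes padding with id 0 and returning an attention mask, which may differ from what make_batch actually does. make_batch_sketch is a hypothetical name, and wrapping the result with torch.tensor(ids) would produce the tensor form.

```python
from typing import List, Tuple


def make_batch_sketch(
    toks_list: List[List[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token id lists to a common length and build a 0/1 mask.

    Hypothetical helper; pad_id=0 is an assumption, not fibber's documented choice.
    """
    max_len = max(len(toks) for toks in toks_list)
    ids, mask = [], []
    for toks in toks_list:
        n_pad = max_len - len(toks)
        # Real tokens first, then padding up to max_len.
        ids.append(list(toks) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        mask.append([1] * len(toks) + [0] * n_pad)
    return ids, mask


ids, mask = make_batch_sketch([[5, 6, 7], [8]])
print(ids)   # [[5, 6, 7], [8, 0, 0]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```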

fibber.paraphrase_strategies.fudge_strategy.make_input_output_pair(tokenizer, x)

Tokenize the text, then construct the input and output sequences for GPT-2.
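For a causal language model such as GPT-2, a typical input/output pair is the token sequence and the same sequence shifted by one position, so that each position predicts the next token. The sketch below assumes make_input_output_pair follows this pattern, which the description above does not confirm. make_input_output_pair_sketch and toy_encode are hypothetical. With Hugging Face transformers, a GPT-2 tokenizer's encode method could be passed in place of toy_encode.

```python
from typing import Callable, Dict, List, Tuple


def make_input_output_pair_sketch(
    encode: Callable[[str], List[int]], x: str
) -> Tuple[List[int], List[int]]:
    """Build a next-token prediction pair: output is input shifted left by one."""
    ids = encode(x)
    # Input drops the last token; output drops the first token.
    return ids[:-1], ids[1:]


# Toy stand-in for a GPT-2 tokenizer: assigns one new id per unseen word.
vocab: Dict[str, int] = {}


def toy_encode(text: str) -> List[int]:
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


inp, out = make_input_output_pair_sketch(toy_encode, "the movie was great")
print(inp, out)  # [0, 1, 2] [1, 2, 3]
```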