fibber.paraphrase_strategies.strategy_base module

class fibber.paraphrase_strategies.strategy_base.StrategyBase(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: object

The base class for all paraphrase strategies.

The simplest way to write a strategy is to override the paraphrase_example function. This function takes one data record and returns multiple paraphrases of a given field.

For more advanced use cases, you can override the paraphrase_dataset function.

Some strategies have hyperparameters. Declare them in the class attribute __hyper_parameters__.

Hyperparameters declared in __hyper_parameters__ can be added to the command-line argument parser by calling add_parser_args(parser). Their values are then stored in self._strategy_config.
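A minimal sketch of the subclassing pattern described above. So that it runs without fibber installed, StrategyStub is a hypothetical stand-in for StrategyBase that only mimics what the example reads: self._strategy_config and a self._field attribute holding the field to paraphrase (that attribute name is an assumption). The class WordRepeatStrategy, its abbreviation "wr", the max_words hyperparameter, and the "text0" record key are illustrative only.

```python
# StrategyStub is a hypothetical stand-in for StrategyBase so this sketch runs
# without fibber; a real strategy would subclass StrategyBase instead.
class StrategyStub:
    __abbr__ = "base"
    __hyper_parameters__ = []

    def __init__(self, strategy_config, field):
        self._strategy_config = strategy_config  # hyperparameter values
        self._field = field  # assumed attribute: field to paraphrase

    def paraphrase_example(self, data_record, n):
        raise NotImplementedError


class WordRepeatStrategy(StrategyStub):
    """Toy strategy: each paraphrase duplicates one of the leading words."""

    __abbr__ = "wr"
    # Each tuple is (name, type, default, help).
    __hyper_parameters__ = [
        ("max_words", int, 3, "how many leading words may be duplicated"),
    ]

    def paraphrase_example(self, data_record, n):
        words = data_record[self._field].split()
        limit = self._strategy_config["max_words"]
        paraphrases = []
        for i, word in enumerate(words[:limit]):
            # Insert a copy of word i directly after it.
            paraphrases.append(" ".join(words[:i + 1] + [word] + words[i + 1:]))
        return paraphrases[:n]  # never return more than n strings


strategy = WordRepeatStrategy({"strategy_name": "wr", "max_words": 3}, field="text0")
print(strategy.paraphrase_example({"text0": "the movie was great"}, n=2))
# ['the the movie was great', 'the movie movie was great']
```

Only paraphrase_example and the class attributes are customized; everything else is inherited, which is the "simplest way" this page recommends.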

__abbr__

A unique string used as an abbreviation for the strategy.

Type

str

__hyper_parameters__

A list of tuples that define the hyperparameters for the strategy. Each tuple is (name, type, default, help). For example:

__hyper_parameters__ = [("p1", int, -1, "the first hyperparameter"), ...]

Type

list

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

  • self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.

  • self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.

  • self._device (torch.device): any computation that requires a GPU accelerator should use this device.

  • self._output_dir (str): the directory where the strategy can save files.

  • self._dataset_name (str): the dataset name.

Parameters
  • arg_dict (dict) – all arguments loaded from the command line.

  • dataset_name (str) – the name of the dataset.

  • strategy_gpu_id (int) – the GPU id on which to run the strategy.

  • output_dir (str) – a directory to save any models or temporary files.

  • metric_bundle (MetricBundle) – a MetricBundle object.

  • field (str) – the field of each data record to paraphrase.
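self._strategy_config holds the strategy name and every hyperparameter value; how those values are read out of arg_dict is not specified here. A minimal sketch, assuming each hyperparameter arrives in arg_dict under the key "<abbr>_<name>" and that the stored strategy name is the abbreviation. Both the key scheme and the helper build_strategy_config are hypothetical.

```python
def build_strategy_config(abbr, hyper_parameters, arg_dict):
    """Hypothetical helper: gather a strategy's name and hyperparameters.

    Assumes each hyperparameter is stored in arg_dict under "<abbr>_<name>",
    and falls back to the declared default when the key is absent.
    """
    config = {"strategy_name": abbr}  # assumption: name == abbreviation
    for name, _type, default, _help in hyper_parameters:
        config[name] = arg_dict.get(f"{abbr}_{name}", default)
    return config


hyper_parameters = [("p1", int, -1, "the first hyperparameter")]
print(build_strategy_config("ex", hyper_parameters, {"ex_p1": 3, "dataset": "demo"}))
# {'strategy_name': 'ex', 'p1': 3}
```

Unrelated entries in arg_dict (such as "dataset" above) are ignored, so only the strategy's own hyperparameters end up in the config that is saved with the results.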

classmethod add_parser_args(parser)

Create command-line arguments for all hyperparameters in __hyper_parameters__.

Parameters

parser – an argument parser.
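A sketch of what a method like add_parser_args can do with the (name, type, default, help) tuples, using the standard library's argparse. It is written as a free function that takes the abbreviation and tuple list explicitly (the real method is a classmethod taking only parser), and the --<abbr>_<name> flag naming is an assumption, not something this page confirms.

```python
import argparse

HYPER_PARAMETERS = [
    ("p1", int, -1, "the first hyperparameter"),
    ("temperature", float, 1.0, "sampling temperature"),
]


def add_parser_args(parser, abbr, hyper_parameters):
    """Register one --<abbr>_<name> flag per (name, type, default, help) tuple."""
    for name, arg_type, default, help_text in hyper_parameters:
        parser.add_argument(f"--{abbr}_{name}", type=arg_type,
                            default=default, help=help_text)


parser = argparse.ArgumentParser()
add_parser_args(parser, "ex", HYPER_PARAMETERS)
args = parser.parse_args(["--ex_p1", "5"])
print(vars(args))  # {'ex_p1': 5, 'ex_temperature': 1.0}
```

Prefixing each flag with the strategy's abbreviation keeps hyperparameters of different strategies from colliding when they share one parser; argparse converts the string "5" with the declared type.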

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters

trainset (dict) – a fibber dataset.
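What fit should do depends on the strategy. As a hypothetical example, a strategy might collect word statistics from the training records for later use when paraphrasing. The sketch assumes a fibber dataset is a dict whose "data" entry is a list of records with the text under "text0"; that layout and the class VocabStrategySketch are assumptions for illustration.

```python
from collections import Counter


class VocabStrategySketch:
    """Hypothetical strategy whose fit() learns word frequencies."""

    def __init__(self, field="text0"):
        self._field = field  # assumed record key holding the text
        self._vocab = Counter()

    def fit(self, trainset):
        # Assumed layout: {"data": [{"text0": "...", ...}, ...], ...}
        for record in trainset["data"]:
            self._vocab.update(record[self._field].lower().split())

    def most_common_words(self, k):
        return [word for word, _count in self._vocab.most_common(k)]


trainset = {"data": [{"text0": "A good movie"}, {"text0": "a bad movie"}]}
strategy = VocabStrategySketch()
strategy.fit(trainset)
print(strategy.most_common_words(2))
```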

paraphrase_dataset(paraphrase_set, n, tmp_output_filename)

Paraphrase one dataset.

Parameters
  • paraphrase_set (dict) – a fibber dataset to paraphrase.

  • n (int) – number of paraphrases to generate for each data record.

  • tmp_output_filename (str) – the JSON file to which intermediate results are saved while the strategy runs.

Returns

A dict containing the original text and paraphrased text.

Return type

(dict)
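The loop behind a method like paraphrase_dataset can be sketched as below: call a per-record paraphrase function on every record and periodically write partial results to tmp_output_filename, so an interrupted run keeps its progress. The dataset layout (a "data" list of records), the "<field>_paraphrases" output key, and the save_every interval are assumptions, not fibber's confirmed behavior.

```python
import copy
import json
import os
import tempfile


def paraphrase_dataset_sketch(paraphrase_example, paraphrase_set, n,
                              tmp_output_filename, field="text0", save_every=100):
    """Hypothetical sketch: paraphrase every record, checkpointing to JSON."""
    results = copy.deepcopy(paraphrase_set)  # leave the input untouched
    for i, record in enumerate(results["data"]):
        # Assumed output key; keep at most n paraphrases per record.
        record[field + "_paraphrases"] = paraphrase_example(record, n)[:n]
        if (i + 1) % save_every == 0:
            # Save partial results so an interrupted run keeps its progress.
            with open(tmp_output_filename, "w") as f:
                json.dump(results, f, indent=2)
    # Final save so the file always holds the complete results.
    with open(tmp_output_filename, "w") as f:
        json.dump(results, f, indent=2)
    return results


def shout(record, n):
    # Toy per-record function standing in for paraphrase_example.
    return [record["text0"].upper()] * n


dataset = {"data": [{"text0": "hello"}, {"text0": "world"}]}
path = os.path.join(tempfile.mkdtemp(), "tmp_output.json")
output = paraphrase_dataset_sketch(shout, dataset, 2, path)
print(output["data"][1]["text0_paraphrases"])  # ['WORLD', 'WORLD']
```

Working on a deep copy keeps the original text and the paraphrases side by side in the returned dict without mutating the caller's dataset.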

paraphrase_example(data_record, n)

Paraphrase one data record.

This function should be overridden by subclasses. When overriding it, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

Parameters
  • data_record (dict) – a dict storing one data record of a dataset.

  • n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])
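An override only has to honor the documented contract: take one record and n, and return a list of at most n strings. The helper below is a hypothetical, deliberately naive body for such an override (it deletes one word at a time); inside a subclass you might call it as drop_one_word_paraphrases(data_record["text0"], n), where the "text0" key is an assumption.

```python
def drop_one_word_paraphrases(text, n):
    """Toy paraphrase logic: delete one word at a time.

    Returns at most n distinct, non-empty candidates, matching the contract
    that paraphrase_example returns a list of at most n strings.
    """
    words = text.split()
    candidates = []
    for i in range(len(words)):
        if len(candidates) >= n:
            break  # stop once n paraphrases have been produced
        candidate = " ".join(words[:i] + words[i + 1:])
        if candidate and candidate not in candidates:
            candidates.append(candidate)
    return candidates


print(drop_one_word_paraphrases("the movie was great", 3))
# ['movie was great', 'the was great', 'the movie great']
```

Returning fewer than n strings (for example, for a one-word input) is allowed by the contract, so the helper skips empty and duplicate candidates instead of padding.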