fibber.paraphrase_strategies package

Submodules

Module contents

class fibber.paraphrase_strategies.ASRSStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters: trainset (dict) – a fibber dataset.

paraphrase_example(data_record, n, early_stop=False)

Paraphrase one data record.

This function should be overridden by subclasses. When overriding it, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

Parameters

data_record (dict) – a dict storing one record of a dataset.
n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])

class fibber.paraphrase_strategies.CheatStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

A baseline paraphrase strategy. Just return the reference.

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

paraphrase_example(data_record, n)

Paraphrase one data record.

Parameters

data_record (dict) – a dict storing one record of a dataset.
n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])

class fibber.paraphrase_strategies.FudgeStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

A baseline paraphrase strategy. Just return the reference.

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters: trainset (dict) – a fibber dataset.

paraphrase_example(data_record, n)

Paraphrase one data record.

Parameters

data_record (dict) – a dict storing one record of a dataset.
n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])

score(data_record, tmp, text, ll)

class fibber.paraphrase_strategies.IdentityStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

A baseline paraphrase strategy. Just return the original sentence.

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

paraphrase_example(data_record, n)

Paraphrase one data record.

Parameters

data_record (dict) – a dict storing one record of a dataset.
n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])

class fibber.paraphrase_strategies.OpenAttackStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

This strategy is a wrapper for strategies implemented in the OpenAttack package.

The recipe is used to attack the classifier in metric_bundle. If the attack succeeds, we use the adversarial sentence as a paraphrase. If the attack fails, we use the original sentence as a paraphrase.

This strategy always returns one paraphrase for one data record, regardless of n.
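
The fallback rule described above can be sketched as follows. This is only an illustration: the `attack_fn` callable (which returns an adversarial sentence, or `None` on failure), the `"text0"` field name, and the `attack_or_original` helper are hypothetical and are not part of OpenAttack or fibber.

```python
def attack_or_original(data_record, attack_fn, n=1, field="text0"):
    """Return exactly one paraphrase, regardless of n."""
    original = data_record[field]
    # Hypothetical convention: attack_fn returns None when the attack fails.
    adversarial = attack_fn(original)
    return [adversarial if adversarial is not None else original]


record = {"text0": "the movie was great"}
# A toy "attack" that succeeds by swapping one word.
success = attack_or_original(record, lambda s: s.replace("great", "grand"), n=20)
# A toy "attack" that always fails, so the original sentence is returned.
failure = attack_or_original(record, lambda s: None, n=20)
```

Both calls return a one-element list even though `n=20` is requested, which mirrors the behavior stated above.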

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters: trainset (dict) – a fibber dataset.

paraphrase_example(data_record, n)

Generate paraphrased sentences.

class fibber.paraphrase_strategies.RandomStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

Randomly shuffle words in a sentence to generate paraphrases.
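
A word-shuffling paraphraser can be sketched in a few lines. This illustrates the idea rather than reproducing the `RandomStrategy` source: the whitespace tokenization, the `shuffle_paraphrases` name, and the `seed` parameter are assumptions.

```python
import random


def shuffle_paraphrases(text, n, seed=0):
    """Return n strings, each a random permutation of the words in text."""
    rng = random.Random(seed)  # seeded here only to make the sketch repeatable
    words = text.split()       # assumption: split on whitespace
    paraphrases = []
    for _ in range(n):
        shuffled = words[:]
        rng.shuffle(shuffled)
        paraphrases.append(" ".join(shuffled))
    return paraphrases


candidates = shuffle_paraphrases("the quick brown fox", n=3)
```

Every candidate contains the same words as the input, only in a different (possibly identical) order.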

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

paraphrase_example(data_record, n)

Paraphrase one data record.

Parameters

data_record (dict) – a dict storing one record of a dataset.
n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])

class fibber.paraphrase_strategies.RemoveStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters: trainset (dict) – a fibber dataset.

paraphrase_example(data_record, n)

Paraphrase one data record.

Parameters

data_record (dict) – a dict storing one record of a dataset.
n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])

class fibber.paraphrase_strategies.RewriteRollbackStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters: trainset (dict) – a fibber dataset.

paraphrase_example(data_record, n)

Paraphrase one data record.

Parameters

data_record (dict) – a dict storing one record of a dataset.
n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])

paraphrase_multiple_examples(data_record_list)

class fibber.paraphrase_strategies.SSRSStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters: trainset (dict) – a fibber dataset.

paraphrase_example(data_record, n)

Paraphrase one data record.

Parameters

data_record (dict) – a dict storing one record of a dataset.
n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])

paraphrase_multiple_examples(data_record_list)

class fibber.paraphrase_strategies.SapStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

A baseline paraphrase strategy. Just return the original sentence.

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters: trainset (dict) – a fibber dataset.

paraphrase_example(data_record, n)

Paraphrase one data record.

Parameters

data_record (dict) – a dict storing one record of a dataset.
n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])

class fibber.paraphrase_strategies.StrategyBase(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: object

The base class for all paraphrase strategies.

The simplest way to write a strategy is to override the paraphrase_example function. This function takes one data record and returns multiple paraphrases of a given field.

For more advanced use cases, you can override the paraphrase function.

Some strategies may have hyperparameters. Add them to the class attribute __hyperparameters__.

Hyperparameters defined in __hyperparameters__ can be added to the command-line arg parser by add_parser_args(parser). The values of the hyperparameters will be added to self._strategy_config.

__abbr__

A unique string used as an abbreviation for the strategy.

Type: str

__hyperparameters__

A list of tuples that defines the hyperparameters for the strategy. Each tuple is (name, type, default, help). For example:

__hyperparameters__ = [ ("p1", int, -1, "the first hyper parameter"), ...]

Type: list
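
As a minimal sketch of the subclassing pattern described above, the following defines a toy strategy with one hyperparameter. To stay runnable without fibber installed, it uses a small stand-in class `_StandInBase` rather than the real `StrategyBase`. The stand-in, the `UppercaseStrategy` name, the `"text0"` field name, and the way defaults are copied into `_strategy_config` are all assumptions made for illustration.

```python
class _StandInBase:
    """Hypothetical stand-in for StrategyBase (not the fibber class)."""

    __abbr__ = "base"
    __hyperparameters__ = []

    def __init__(self, field):
        self._field = field
        # Assumption: default hyperparameter values end up in the config dict.
        self._strategy_config = {
            name: default
            for name, _type, default, _help in self.__hyperparameters__
        }

    def paraphrase_example(self, data_record, n):
        raise NotImplementedError


class UppercaseStrategy(_StandInBase):
    """Toy strategy: return upper-cased copies of the chosen field."""

    __abbr__ = "upper"
    __hyperparameters__ = [
        ("max_copies", int, 2, "upper bound on the number of paraphrases"),
    ]

    def paraphrase_example(self, data_record, n):
        text = data_record[self._field]
        # Never return more than n strings.
        k = min(n, self._strategy_config["max_copies"])
        return [text.upper()] * k


strategy = UppercaseStrategy(field="text0")
result = strategy.paraphrase_example({"text0": "a short sentence"}, n=5)
```

With a real strategy, the subclass would derive from `StrategyBase` and leave the constructor untouched, overriding only `paraphrase_example` and the two class attributes.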

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

classmethod add_parser_args(parser)

Create command-line args for all hyperparameters in __hyperparameters__.

Parameters: parser – an arg parser.
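
One way such a class method could turn `(name, type, default, help)` tuples into command-line options is sketched below with `argparse`. The `--<abbr>_<name>` flag naming and the `register_hyperparameters` helper are assumptions for illustration, not fibber's actual implementation.

```python
import argparse


def register_hyperparameters(parser, abbr, hyperparameters):
    """Add one option per (name, type, default, help) tuple."""
    for name, type_, default, help_text in hyperparameters:
        # Assumed naming scheme: prefix each flag with the strategy abbreviation.
        parser.add_argument(
            f"--{abbr}_{name}", type=type_, default=default, help=help_text
        )


parser = argparse.ArgumentParser()
register_hyperparameters(
    parser, "demo", [("p1", int, -1, "the first hyper parameter")]
)
# Parse once with an explicit value and once with no arguments (defaults).
args = vars(parser.parse_args(["--demo_p1", "3"]))
defaults = vars(parser.parse_args([]))
```

Prefixing each flag with the abbreviation is one way to keep hyperparameters of different strategies from colliding in a shared parser.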

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters: trainset (dict) – a fibber dataset.

paraphrase_dataset(paraphrase_set, n, tmp_output_filename)

Paraphrase one dataset.

Parameters

paraphrase_set (dict) – a dict storing the dataset to paraphrase.
n (int) – number of paraphrases.
tmp_output_filename (str) – the output json filename to save results during running.

Returns

A dict containing the original text and paraphrased text.

Return type

(dict)
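
The loop below sketches how a dataset-level method might be built on top of a per-record paraphrase function, writing intermediate results to a JSON file as it goes. The dataset layout (a `"data"` list of records), the `"text0"` field, the `"text0_paraphrases"` output key, and the `paraphrase_records` helper are assumptions for illustration rather than fibber's documented format.

```python
import copy
import json
import os
import tempfile


def paraphrase_records(paraphrase_set, paraphrase_fn, n, tmp_output_filename):
    """Apply paraphrase_fn to each record and checkpoint results to JSON."""
    results = copy.deepcopy(paraphrase_set)  # leave the input untouched
    for done, record in enumerate(results["data"], start=1):
        # Keep at most n paraphrases per record.
        record["text0_paraphrases"] = paraphrase_fn(record, n)[:n]
        # Assumption: rewrite the file after every record so that an
        # interrupted run loses little work.
        with open(tmp_output_filename, "w") as f:
            json.dump({"data": results["data"][:done]}, f)
    return results


dataset = {"data": [{"text0": "hello world"}, {"text0": "good morning"}]}
tmp_path = os.path.join(tempfile.mkdtemp(), "tmp.json")
output = paraphrase_records(
    dataset, lambda rec, n: [rec["text0"]] * n, n=2, tmp_output_filename=tmp_path
)
```

The returned dict holds each original record together with its paraphrases, and the temporary file contains the same records once the loop finishes.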

paraphrase_example(data_record, n)

Paraphrase one data record.

Parameters

data_record (dict) – a dict storing one record of a dataset.
n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])

class fibber.paraphrase_strategies.TextAttackStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

This strategy is a wrapper for strategies implemented in the TextAttack package.

The recipe is used to attack the classifier in metric_bundle. If the attack succeeds, we use the adversarial sentence as a paraphrase. If the attack fails, we use the original sentence as a paraphrase.

This strategy always returns one paraphrase for one data record, regardless of n.

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all args loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters: trainset (dict) – a fibber dataset.

paraphrase_example(data_record, n)

Generate paraphrased sentences.
