fibber.paraphrase_strategies package
Submodules
- fibber.paraphrase_strategies.asrs_strategy module
- fibber.paraphrase_strategies.asrs_utils_wpe module
- fibber.paraphrase_strategies.cheat_strategy module
- fibber.paraphrase_strategies.fudge_strategy module
- fibber.paraphrase_strategies.identity_strategy module
- fibber.paraphrase_strategies.openattack_strategy module
- fibber.paraphrase_strategies.random_strategy module
- fibber.paraphrase_strategies.remove_strategy module
- fibber.paraphrase_strategies.rewrite_rollback_strategy module
- fibber.paraphrase_strategies.sap_strategy module
- fibber.paraphrase_strategies.sap_utils_euba module
- fibber.paraphrase_strategies.ssrs_strategy module
- fibber.paraphrase_strategies.strategy_base module
- fibber.paraphrase_strategies.textattack_strategy module
Module contents
- class fibber.paraphrase_strategies.ASRSStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)
Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.
- fit(trainset)
Fit the paraphrase strategy on a training set.
- Parameters
trainset (dict) – a fibber dataset.
- paraphrase_example(data_record, n, early_stop=False)
Paraphrase one data record.
This function should be overridden by subclasses. When overriding this function, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.
- Parameters
data_record (dict) – a dict storing one record of a dataset.
n (int) – the number of paraphrases.
- Returns
A list containing at most n strings.
- Return type
([str,])
- class fibber.paraphrase_strategies.CheatStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)
Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase
A baseline paraphrase strategy. Just return the reference.
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.
- paraphrase_example(data_record, n)
Paraphrase one data record.
This function should be overridden by subclasses. When overriding this function, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.
- Parameters
data_record (dict) – a dict storing one record of a dataset.
n (int) – the number of paraphrases.
- Returns
A list containing at most n strings.
- Return type
([str,])
- class fibber.paraphrase_strategies.FudgeStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)
Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase
A baseline paraphrase strategy. Just return the reference.
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.
- fit(trainset)
Fit the paraphrase strategy on a training set.
- Parameters
trainset (dict) – a fibber dataset.
- paraphrase_example(data_record, n)
Paraphrase one data record.
This function should be overridden by subclasses. When overriding this function, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.
- Parameters
data_record (dict) – a dict storing one record of a dataset.
n (int) – the number of paraphrases.
- Returns
A list containing at most n strings.
- Return type
([str,])
- class fibber.paraphrase_strategies.IdentityStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)
Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase
A baseline paraphrase strategy. Just return the original sentence.
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.
- paraphrase_example(data_record, n)
Paraphrase one data record.
This function should be overridden by subclasses. When overriding this function, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.
- Parameters
data_record (dict) – a dict storing one record of a dataset.
n (int) – the number of paraphrases.
- Returns
A list containing at most n strings.
- Return type
([str,])
- class fibber.paraphrase_strategies.OpenAttackStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)
Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase
This strategy is a wrapper for strategies implemented in the OpenAttack package.
The recipe is used to attack the classifier in metric_bundle.
If the attack succeeds, we use the adversarial sentence as a paraphrase. If the attack fails, we use the original sentence as a paraphrase.
This strategy always returns one paraphrase for one data record, regardless of n.
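The fallback behaviour described above can be sketched as follows. This is only an illustration under stated assumptions, not the wrapper's actual code: run_attack is a hypothetical callable standing in for an attack recipe, assumed to return the adversarial sentence on success and None on failure.

```python
from typing import Callable, List, Optional


def paraphrase_with_attack(
    text: str,
    n: int,
    run_attack: Callable[[str], Optional[str]],
) -> List[str]:
    """Return exactly one paraphrase, regardless of n.

    run_attack is a hypothetical stand-in for an attack recipe: it is assumed
    to return the adversarial sentence on success and None on failure.
    """
    adversarial = run_attack(text)
    if adversarial is None:
        # Attack failed: fall back to the original sentence.
        return [text]
    # Attack succeeded: use the adversarial sentence as the paraphrase.
    return [adversarial]


# Toy attacks used only to exercise both branches.
def always_fails(text: str) -> Optional[str]:
    return None


def swap_good_for_fine(text: str) -> Optional[str]:
    return text.replace("good", "fine") if "good" in text else None


failed = paraphrase_with_attack("a good movie", 5, always_fails)
succeeded = paraphrase_with_attack("a good movie", 5, swap_good_for_fine)
```

Both calls request five paraphrases but receive a single-element list, mirroring the "regardless of n" behaviour.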
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.
- class fibber.paraphrase_strategies.RandomStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)
Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase
Randomly shuffle words in a sentence to generate paraphrases.
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.
- paraphrase_example(data_record, n)
Paraphrase one data record.
This function should be overridden by subclasses. When overriding this function, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.
- Parameters
data_record (dict) – a dict storing one record of a dataset.
n (int) – the number of paraphrases.
- Returns
A list containing at most n strings.
- Return type
([str,])
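To illustrate the word-shuffling idea behind RandomStrategy, here is a minimal standalone sketch. It assumes whitespace tokenization and adds a hypothetical seed argument for reproducibility; the actual strategy may tokenize and sample differently.

```python
import random
from typing import List, Optional


def shuffle_paraphrases(sentence: str, n: int, seed: Optional[int] = None) -> List[str]:
    """Generate n candidates by shuffling the whitespace-separated words."""
    rng = random.Random(seed)
    words = sentence.split()
    paraphrases = []
    for _ in range(n):
        shuffled = words[:]  # copy so the original word order is preserved
        rng.shuffle(shuffled)
        paraphrases.append(" ".join(shuffled))
    return paraphrases


candidates = shuffle_paraphrases("the quick brown fox", n=3, seed=0)
```

Every candidate contains the same words as the input, so this baseline changes word order only.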
- class fibber.paraphrase_strategies.RemoveStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)
Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.
- fit(trainset)
Fit the paraphrase strategy on a training set.
- Parameters
trainset (dict) – a fibber dataset.
- paraphrase_example(data_record, n)
Paraphrase one data record.
This function should be overridden by subclasses. When overriding this function, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.
- Parameters
data_record (dict) – a dict storing one record of a dataset.
n (int) – the number of paraphrases.
- Returns
A list containing at most n strings.
- Return type
([str,])
- class fibber.paraphrase_strategies.RewriteRollbackStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)
Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.
- fit(trainset)
Fit the paraphrase strategy on a training set.
- Parameters
trainset (dict) – a fibber dataset.
- paraphrase_example(data_record, n)
Paraphrase one data record.
This function should be overridden by subclasses. When overriding this function, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.
- Parameters
data_record (dict) – a dict storing one record of a dataset.
n (int) – the number of paraphrases.
- Returns
A list containing at most n strings.
- Return type
([str,])
- class fibber.paraphrase_strategies.SSRSStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)
Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.
- fit(trainset)
Fit the paraphrase strategy on a training set.
- Parameters
trainset (dict) – a fibber dataset.
- paraphrase_example(data_record, n)
Paraphrase one data record.
This function should be overridden by subclasses. When overriding this function, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.
- Parameters
data_record (dict) – a dict storing one record of a dataset.
n (int) – the number of paraphrases.
- Returns
A list containing at most n strings.
- Return type
([str,])
- class fibber.paraphrase_strategies.SapStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle)
Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase
A baseline paraphrase strategy. Just return the original sentence.
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.
- fit(trainset)
Fit the paraphrase strategy on a training set.
- Parameters
trainset (dict) – a fibber dataset.
- paraphrase_example(data_record, n)
Paraphrase one data record.
This function should be overridden by subclasses. When overriding this function, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.
- Parameters
data_record (dict) – a dict storing one record of a dataset.
n (int) – the number of paraphrases.
- Returns
A list containing at most n strings.
- Return type
([str,])
- class fibber.paraphrase_strategies.StrategyBase(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)
Bases: object
The base class for all paraphrase strategies.
The simplest way to write a strategy is to override the paraphrase_example function. This function takes one data record and returns multiple paraphrases of a given field.
For more advanced use cases, you can override the paraphrase function.
Some strategies may have hyperparameters. Add hyperparameters to the class attribute __hyperparameters__.
Hyperparameters defined in __hyperparameters__ can be added to the command-line argument parser by add_parser_args(parser). The values of the hyperparameters will be added to self._strategy_config.
- __abbr__
A unique string used as an abbreviation for the strategy.
- Type
str
- __hyperparameters__
A list of tuples that defines the hyperparameters for the strategy. Each tuple is (name, type, default, help). For example: __hyperparameters__ = [("p1", int, -1, "the first hyper parameter"), ...]
- Type
list
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.
- classmethod add_parser_args(parser)
Create command-line arguments for all hyperparameters in __hyperparameters__.
- Parameters
parser – an argument parser.
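As a rough illustration of how (name, type, default, help) tuples can be turned into command-line options, the sketch below uses the standard argparse module. The ABBR and HYPERPARAMETERS names, the standalone function, and the "--<abbr>_<name>" flag format are assumptions made for this example, not confirmed details of the library.

```python
import argparse

# Hypothetical strategy metadata, following the (name, type, default, help)
# tuple layout used by __hyperparameters__.
ABBR = "demo"
HYPERPARAMETERS = [
    ("p1", int, -1, "the first hyperparameter"),
    ("temperature", float, 1.0, "a sampling temperature"),
]


def add_parser_args(parser, abbr, hyperparameters):
    """Register one command-line option per hyperparameter tuple.

    The "--<abbr>_<name>" flag format is an assumption made for this sketch.
    """
    for name, type_, default, help_text in hyperparameters:
        parser.add_argument(
            "--%s_%s" % (abbr, name), type=type_, default=default, help=help_text
        )


parser = argparse.ArgumentParser()
add_parser_args(parser, ABBR, HYPERPARAMETERS)
args = parser.parse_args(["--demo_p1", "3"])
arg_dict = vars(args)  # a plain dict, similar in spirit to arg_dict above
```

Prefixing each flag with the strategy abbreviation is one way to keep options from different strategies from colliding in a shared parser.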
- fit(trainset)
Fit the paraphrase strategy on a training set.
- Parameters
trainset (dict) – a fibber dataset.
- paraphrase_dataset(paraphrase_set, n, tmp_output_filename)
Paraphrase one dataset.
- Parameters
paraphrase_set (dict) – a dict storing the dataset to paraphrase.
n (int) – the number of paraphrases.
tmp_output_filename (str) – the output JSON filename used to save results while running.
- Returns
A dict containing the original text and paraphrased text.
- Return type
(dict)
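The general pattern of paraphrasing a whole dataset while saving intermediate results can be sketched as below. The dataset layout (a "data" list of records), the "text0" field name, and the "<field>_paraphrases" output key are all assumptions chosen for illustration; the real method may organize its output differently.

```python
import copy
import json
import os
import tempfile


def paraphrase_dataset(paraphrase_set, n, tmp_output_filename, paraphrase_example,
                       field="text0"):
    """Paraphrase every record and save partial results as JSON.

    Assumptions for this sketch: records live in paraphrase_set["data"], and
    paraphrases are stored under the key "<field>_paraphrases".
    """
    results = copy.deepcopy(paraphrase_set)  # leave the input untouched
    for record in results["data"]:
        record[field + "_paraphrases"] = paraphrase_example(record, n)[:n]
        # Rewrite the temporary file after each record so that an interrupted
        # run still leaves usable partial output on disk.
        with open(tmp_output_filename, "w") as f:
            json.dump(results, f, indent=2)
    return results


def identity(record, n):
    """Toy paraphraser: return the original text once."""
    return [record["text0"]]


dataset = {"data": [{"text0": "hello world"}, {"text0": "good morning"}]}
tmp_path = os.path.join(tempfile.mkdtemp(), "partial.json")
output = paraphrase_dataset(dataset, 2, tmp_path, identity)
with open(tmp_path) as f:
    saved = json.load(f)
```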
- paraphrase_example(data_record, n)
Paraphrase one data record.
This function should be overridden by subclasses. When overriding this function, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.
- Parameters
data_record (dict) – a dict storing one record of a dataset.
n (int) – the number of paraphrases.
- Returns
A list containing at most n strings.
- Return type
([str,])
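A minimal sketch of overriding paraphrase_example is shown below. To stay self-contained it uses StrategyBaseSketch, a hypothetical stand-in that is not the real StrategyBase; the _field attribute and the "text0" record key are likewise assumptions for illustration.

```python
class StrategyBaseSketch:
    """Hypothetical stand-in for StrategyBase; not the real fibber class."""

    __abbr__ = "base"
    __hyperparameters__ = []

    def __init__(self, strategy_config, field):
        self._strategy_config = strategy_config
        self._field = field  # assumed name for the field to paraphrase

    def paraphrase_example(self, data_record, n):
        raise NotImplementedError


class UppercaseStrategy(StrategyBaseSketch):
    """Toy strategy: the only 'paraphrase' is the upper-cased text."""

    __abbr__ = "upper"

    def paraphrase_example(self, data_record, n):
        text = data_record[self._field]
        # Return a list with at most n strings, as the contract requires.
        return [text.upper()][:n]


strategy = UppercaseStrategy(strategy_config={"strategy_name": "upper"}, field="text0")
result = strategy.paraphrase_example({"text0": "hello world", "label": 0}, n=3)
```

Returning fewer than n strings is allowed by the contract, which only promises at most n paraphrases.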
- class fibber.paraphrase_strategies.TextAttackStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)
Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase
This strategy is a wrapper for strategies implemented in the TextAttack package.
The recipe is used to attack the classifier in metric_bundle.
If the attack succeeds, we use the adversarial sentence as a paraphrase. If the attack fails, we use the original sentence as a paraphrase.
This strategy always returns one paraphrase for one data record, regardless of n.
Initialize the paraphrase strategy.
This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name. You should not override this function.
self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the directory where the strategy can save files.
self._dataset_name (str): the dataset name.
- Parameters
arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the gpu id to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.