fibber.paraphrase_strategies.ssrs_strategy module

class fibber.paraphrase_strategies.ssrs_strategy.SSRSStrategy(arg_dict, dataset_name, strategy_gpu_id, output_dir, metric_bundle, field)

Bases: fibber.paraphrase_strategies.strategy_base.StrategyBase

Initialize the paraphrase strategy.

This function initializes self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

You should not override this function.

self._strategy_config (dict): a dictionary that stores the strategy name and all hyperparameter values. The dict is also saved to the results.
self._metric_bundle (MetricBundle): the metrics that will be used to evaluate paraphrases. Strategies can compute metrics during paraphrasing.
self._device (torch.device): any computation that requires a GPU accelerator should use this device.
self._output_dir (str): the dir name where the strategy can save files.
self._dataset_name (str): the dataset name.

Parameters

arg_dict (dict) – all arguments loaded from the command line.
dataset_name (str) – the name of the dataset.
strategy_gpu_id (int) – the GPU id on which to run the strategy.
output_dir (str) – a directory to save any models or temporary files.
metric_bundle (MetricBundle) – a MetricBundle object.

fit(trainset)

Fit the paraphrase strategy on a training set.

Parameters: trainset (dict) – a fibber dataset.

paraphrase_example(data_record, n)

Paraphrase one data record.

This function should be overridden by subclasses. When overriding this method, you can use self._strategy_config, self._metric_bundle, self._device, self._output_dir, and self._dataset_name.

Parameters

data_record (dict) – a dict storing one record of a dataset.
n (int) – number of paraphrases.

Returns

A list containing at most n strings.

Return type

([str,])
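
The contract described above (one record in, at most n strings out) can be sketched with a stand-in class. Everything in this sketch is hypothetical: `ToyStrategy` does not inherit from the real StrategyBase, the record key "text0" is an assumed field name, and the word-swap table exists only for illustration.

```python
class ToyStrategy:
    """Stand-in that mimics the paraphrase_example contract (not fibber code)."""

    def __init__(self, field="text0"):
        # Assumed: `field` names the record key that holds the text to paraphrase.
        self._field = field
        # Illustrative word substitutions; a real strategy would use a model.
        self._swaps = {"good": "great", "movie": "film", "bad": "poor"}

    def paraphrase_example(self, data_record, n):
        words = data_record[self._field].split()
        candidates = []
        # Build one candidate per swappable word by replacing that word only.
        for i, word in enumerate(words):
            if word in self._swaps:
                swapped = words[:i] + [self._swaps[word]] + words[i + 1:]
                candidates.append(" ".join(swapped))
        # Return at most n paraphrases, as the documented return value requires.
        return candidates[:n]


record = {"text0": "a good movie"}
paraphrases = ToyStrategy().paraphrase_example(record, n=1)
```

The final slice is what enforces the "at most n" guarantee: a strategy may produce fewer than n paraphrases, but never more.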

paraphrase_multiple_examples(data_record_list)

fibber.paraphrase_strategies.ssrs_strategy.all_accept_criteria(candidate_paraphrases, **kwargs)

Always accept proposed words.

fibber.paraphrase_strategies.ssrs_strategy.assign_cadidates(paraphrases_with_mask, candidate_words, tokenizer)

fibber.paraphrase_strategies.ssrs_strategy.bleu_criteria_score(origin_list, paraphrases, bleu_metric, bleu_weight, bleu_threshold)

fibber.paraphrase_strategies.ssrs_strategy.clf_criteria_score(origin_list, paraphrases, data_record_list, field, clf_metric, clf_weight)

fibber.paraphrase_strategies.ssrs_strategy.count_mask(data, tokenizer)

fibber.paraphrase_strategies.ssrs_strategy.joint_weighted_criteria(origin_list, prev_paraphrases, candidate_paraphrases, data_record_list, field, sim_metric, sim_threshold, sim_weight, clf_metric, clf_weight, ppl_metric, ppl_weight, stats, state, log_prob_trans_forward, log_prob_trans_backward, bleu_metric, bleu_weight, bleu_threshold, **kwargs)

fibber.paraphrase_strategies.ssrs_strategy.none_constraint(**kwargs)

fibber.paraphrase_strategies.ssrs_strategy.ppl_criteria_score(origin_list, paraphrases, ppl_metric, ppl_weight)

Estimate the score of a sentence using GPT-2 perplexity.

Parameters

origin_list ([str]) – a list of original sentences.
paraphrases ([str]) – a list of paraphrases.
ppl_metric (GPT2PerplexityMetric) – a GPT2PerplexityMetric metric object.
ppl_weight (float) – the weight parameter for the criteria.

Returns

a numpy array of size (batch_size,). All entries <=0.

Return type

(np.array)

fibber.paraphrase_strategies.ssrs_strategy.sample_word_from_logits(logits, temperature=1.0, top_k=0)

Sample a word from a distribution.

Parameters

logits (torch.Tensor) – tensor of logits with size (batch_size, vocab_size).
temperature (float) – the temperature of softmax. The PMF is softmax(logits/temperature).
top_k (int) – if top_k > 0, only sample from the top_k most probable words.
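
The sampling rule described by these parameters can be illustrated with a minimal plain-Python sketch. It is not the module's implementation, which operates on a batched torch.Tensor. The function name `sample_index_from_logits` and the `rng` argument are assumptions made for illustration. The sketch handles a single row of logits: it keeps the top_k largest entries when top_k > 0, applies softmax(logits / temperature), and draws one index.

```python
import math
import random


def sample_index_from_logits(logits, temperature=1.0, top_k=0, rng=None):
    """Draw one index from softmax(logits / temperature) (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    # Candidate indices: all of them, or only the top_k largest logits.
    indices = list(range(len(logits)))
    if top_k > 0:
        indices = sorted(indices, key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scaled logits, shifted by their max for numerical stability.
    scaled = [logits[i] / temperature for i in indices]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights, so this samples from the softmax PMF.
    return rng.choices(indices, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 0.5, -1.0, 3.0]
# With top_k=1 only the largest logit (index 3) can be drawn.
greedy = sample_index_from_logits(logits, top_k=1, rng=rng)
# With top_k=2 every draw is restricted to indices 3 and 0.
draws = {
    sample_index_from_logits(logits, temperature=0.7, top_k=2, rng=rng)
    for _ in range(200)
}
```

Lower temperatures concentrate probability on the largest logits, while higher temperatures flatten the distribution toward uniform over the retained candidates.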

fibber.paraphrase_strategies.ssrs_strategy.sim_criteria_score(origin_list, paraphrases, sim_metric, sim_threshold, sim_weight)

Estimate the score of a sentence using Universal Sentence Encoder (USE) similarity.

Parameters

origin_list ([str]) – a list of original sentences.
paraphrases ([str]) – a list of paraphrases.
sim_metric (MetricBase) – a similarity metric object.
sim_threshold (float) – the universal sentence encoder similarity threshold.
sim_weight (float) – the weight parameter for the criteria.

Returns

a numpy array of size (batch_size,). All entries <=0.

Return type

(np.array)
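
One scoring rule consistent with the description above (every score is at most 0, and a threshold and a weight are involved) is a weighted hinge penalty on the similarity shortfall. The sketch below is an assumption for illustration, not the module's code: the helper name `hinge_penalty_scores` is hypothetical, plain floats stand in for the values a sim_metric would produce, and a Python list stands in for the documented numpy array.

```python
def hinge_penalty_scores(similarities, sim_threshold, sim_weight):
    """Return one non-positive score per paraphrase (hypothetical helper)."""
    # Penalize only the shortfall below the threshold; scores never exceed 0.
    return [-sim_weight * max(sim_threshold - s, 0.0) for s in similarities]


# Assumed similarity values for three candidate paraphrases.
scores = hinge_penalty_scores([0.5, 1.0, 0.75], sim_threshold=0.75, sim_weight=2.0)
```

Under this assumed rule, a paraphrase whose similarity meets the threshold scores 0, and the penalty grows linearly with the shortfall, scaled by sim_weight.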

fibber.paraphrase_strategies.ssrs_strategy.smart_mask(toks, op)

fibber.paraphrase_strategies.ssrs_strategy.tostring(tokenizer, seq)

Convert a sequence of word ids to a sentence. Post-processing is applied.

Parameters

tokenizer (transformers.BertTokenizer) – a BERT tokenizer.
seq (list) – a list-like sequence of word ids.
