fibber.metrics.bert_lm_utils module

fibber.metrics.bert_lm_utils.compute_lm_loss(lm_model, seq, mask, tok_type, lm_label, stats)[source]

Compute masked language model training loss.

Parameters
  • lm_model (transformers.BertForMaskedLM) – a BERT language model.

  • seq (torch.Tensor) – an int tensor of size (batch_size, length) representing the word pieces.

  • mask (torch.Tensor) – an int tensor of size (batch_size, length) representing the attention mask.

  • tok_type (torch.Tensor) – an int tensor of size (batch_size, length) representing the token type id.

  • lm_label (torch.Tensor) – an int tensor of size (batch_size, length) representing the label for each position. Use -100 if the loss is not computed for that position.

  • stats (dict) – a dictionary storing training stats.

Returns

(torch.Tensor) a scalar loss value.
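
For illustration, here is a minimal sketch of how compute_lm_loss could be called. The tokenizer and model names, the random masking scheme, and the 15% masking fraction are illustrative assumptions, not part of this module.

    import torch
    from transformers import BertForMaskedLM, BertTokenizerFast

    from fibber.metrics.bert_lm_utils import compute_lm_loss, new_stats

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
    lm_model = BertForMaskedLM.from_pretrained("bert-base-cased")

    batch = tokenizer(["a toy sentence", "another toy sentence"],
                      padding=True, return_tensors="pt")
    seq = batch["input_ids"].clone()
    mask = batch["attention_mask"]
    tok_type = batch["token_type_ids"]

    # Randomly mask ~15% of the non-padding positions (an illustrative choice).
    # Positions that are not masked get label -100, so they are excluded from the loss.
    lm_label = torch.full_like(seq, -100)
    masked = (torch.rand(seq.shape) < 0.15) & mask.bool()
    lm_label[masked] = seq[masked]
    seq[masked] = tokenizer.mask_token_id

    stats = new_stats()
    loss = compute_lm_loss(lm_model, seq, mask, tok_type, lm_label, stats)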

fibber.metrics.bert_lm_utils.fine_tune_lm(output_dir, trainset, filter, device, model_init='bert-base-cased', lm_steps=5000, lm_bs=32, lm_opt='adamw', lm_lr=0.0001, lm_decay=0.01, lm_period_summary=100, lm_period_save=5000, as_masked_lm=True, select_field=None)[source]

Returns a BERT language model finetuned on a given dataset.

The language model will be stored at <output_dir>/lm_all if filter is -1, or <output_dir>/lm_filter_? if filter is not -1.

If filter is not -1, the pretrained language model will first be trained on the whole dataset, then finetuned on the data excluding the filter category.

Parameters
  • output_dir (str) – a directory to store pretrained language model.

  • trainset (DatasetForTransformers) – the training set for finetuning the language model.

  • filter (int) – a category to exclude from finetuning.

  • device (torch.device) – a device to train the model.

  • model_init (str) – the backbone BERT model.

  • lm_steps (int) – finetuning steps.

  • lm_bs (int) – finetuning batch size.

  • lm_opt (str) – optimizer name. Choose from [“sgd”, “adam”, “adamW”].

  • lm_lr (float) – learning rate.

  • lm_decay (float) – weight decay for the optimizer.

  • lm_period_summary (int) – number of steps to write training summary.

  • lm_period_save (int) – number of steps to save the finetuned model.

  • as_masked_lm (bool) – use BERT as a masked language model. If False, use it as an autoregressive language model.

  • select_field (None or str) – select one field to train the language model on.

Returns

a finetuned language model.

Return type

(BertForMaskedLM)
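
A hedged usage sketch follows. The output directory name is illustrative, and trainset is assumed to be a DatasetForTransformers prepared elsewhere (its construction is not part of this module).

    import torch

    from fibber.metrics.bert_lm_utils import fine_tune_lm

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # `trainset` must be a DatasetForTransformers built beforehand (placeholder here).
    lm_model = fine_tune_lm(
        output_dir="exp-demo",         # illustrative path; the model is saved under exp-demo/lm_all
        trainset=trainset,
        filter=-1,                     # -1: fine-tune on the whole dataset
        device=device,
        model_init="bert-base-cased",
        lm_steps=5000)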

fibber.metrics.bert_lm_utils.get_lm(lm_option, dataset_name, trainset, device, model_init='bert-base-cased', filter=-1, lm_steps=5000, lm_bs=32, lm_opt='adamw', lm_lr=0.0001, lm_decay=0.01, lm_period_summary=100, lm_period_save=5000, select_field=None)[source]

Returns a BERT language model or a list of language models on a given dataset.

The language model will be stored at <output_dir>/lm_all if lm_option is finetune, or at <output_dir>/lm_filter_? if lm_option is adv.

If filter is not -1, the pretrained language model will first be trained on the whole dataset, then finetuned on the data excluding the filter category.

Parameters
  • lm_option (str) – choose from [“pretrain”, “finetune”, “adv”]. “pretrain” means the pretrained BERT model without fine-tuning on the current dataset. “finetune” means fine-tuning the BERT model on the current dataset. “adv” means adversarial tuning on the current dataset.

  • dataset_name (str) – the name of the dataset, which determines the directory where the pretrained language model is stored.

  • trainset (dict) – the training set for finetuning the language model.

  • device (torch.device) – a device to train the model.

  • model_init (str) – the backbone BERT model.

  • lm_steps (int) – finetuning steps.

  • lm_bs (int) – finetuning batch size.

  • lm_opt (str) – optimizer name. Choose from [“sgd”, “adam”, “adamW”].

  • lm_lr (float) – learning rate.

  • lm_decay (float) – weight decay for the optimizer.

  • lm_period_summary (int) – number of steps to write training summary.

  • lm_period_save (int) – number of steps to save the finetuned model.

  • select_field (str or None) – train the language model on one specific field.

Returns

the tokenizer for the language model, together with a finetuned language model if lm_option is pretrain or finetune, or a list of finetuned language models if lm_option is adv. The i-th language model in the list is fine-tuned on data not having label i.

Return type

(BertTokenizerFast, BertForMaskedLM or [BertForMaskedLM])
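
A hedged usage sketch follows. The dataset loading call (fibber.datasets.get_dataset) and the dataset name "ag_news" are assumptions used only for illustration.

    import torch

    from fibber.datasets import get_dataset
    from fibber.metrics.bert_lm_utils import get_lm

    trainset, testset = get_dataset("ag_news")   # assumed dataset loader and name
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # "finetune": a single language model fine-tuned on the whole training set.
    tokenizer, lm = get_lm("finetune", "ag_news", trainset, device, lm_steps=5000)

    # "adv": one language model per label; lms[i] is fine-tuned on data not having label i.
    tokenizer, lms = get_lm("adv", "ag_news", trainset, device)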

fibber.metrics.bert_lm_utils.new_stats()[source]

Create a new stats dict.

fibber.metrics.bert_lm_utils.write_summary(stats, summary, global_step)[source]

Save language model training summary.
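
A hedged sketch of how new_stats and write_summary might fit into a training loop. The use of torch.utils.tensorboard.SummaryWriter and the summary period are assumptions.

    from torch.utils.tensorboard import SummaryWriter

    from fibber.metrics.bert_lm_utils import new_stats, write_summary

    summary = SummaryWriter("exp-demo/summary")   # illustrative log directory
    stats = new_stats()

    for global_step in range(1, 1001):
        # ... one training step that calls
        # compute_lm_loss(lm_model, seq, mask, tok_type, lm_label, stats) ...
        if global_step % 100 == 0:
            write_summary(stats, summary, global_step)
            stats = new_stats()   # start a fresh stats dict for the next period (assumption)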