fibber.resources.resource_utils module

fibber.resources.resource_utils.get_bert_clf_demo()[source]

Download the pretrained classifier for the demo dataset.

fibber.resources.resource_utils.get_bert_lm_demo()[source]

Download the pretrained language model for the demo dataset.
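Example (a minimal sketch; assumes the fibber package is installed and the demo resources can be downloaded):

    from fibber.resources import resource_utils

    # Fetch the pretrained classifier and language model used by the demo dataset.
    resource_utils.get_bert_clf_demo()
    resource_utils.get_bert_lm_demo()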

fibber.resources.resource_utils.get_counter_fitted_vector(download_only=False)[source]

Download the default pretrained counter-fitted embeddings and return a dict.

See https://github.com/nmrksic/counter-fitting

Parameters

download_only (bool) – set to True to download only; in that case the function returns None.

Returns

a dict containing the counter-fitted word embedding model.

  • "emb_table": a numpy array of size (N, 300).

  • "id2tok": a list of strings.

  • "tok2id": a dict that maps a word (string) to its id.

Return type

(dict)
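Example (a minimal sketch; assumes the download succeeds and that the word "good" is in the counter-fitted vocabulary):

    from fibber.resources import resource_utils

    emb = resource_utils.get_counter_fitted_vector()
    # Look up the 300-dimensional counter-fitted vector for a word.
    vec = emb["emb_table"][emb["tok2id"]["good"]]
    print(vec.shape)  # (300,)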

fibber.resources.resource_utils.get_glove_emb(download_only=False)[source]

Download the default pretrained GloVe embeddings and return a dict.

We use the 300-dimensional model trained on Wikipedia 2014 + Gigaword 5. See https://nlp.stanford.edu/projects/glove/

Parameters

download_only (bool) – set to True to download only; in that case the function returns None.

Returns

a dict containing the GloVe word embedding model.

  • "emb_table": a numpy array of size (N, 300).

  • "id2tok": a list of strings.

  • "tok2id": a dict that maps a word (string) to its id.

Return type

(dict)
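Example (a minimal sketch; assumes numpy is available and that the words "apple" and "banana" are in the GloVe vocabulary):

    import numpy as np
    from fibber.resources import resource_utils

    glove = resource_utils.get_glove_emb()
    # Map a word to its row in the embedding table and back.
    idx = glove["tok2id"]["apple"]
    assert glove["id2tok"][idx] == "apple"
    vec = glove["emb_table"][idx]  # numpy array of shape (300,)
    # Cosine similarity between two word vectors.
    other = glove["emb_table"][glove["tok2id"]["banana"]]
    sim = np.dot(vec, other) / (np.linalg.norm(vec) * np.linalg.norm(other))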

fibber.resources.resource_utils.get_nltk_data()[source]

Download nltk data to <fibber_root_dir>/nltk_data.

fibber.resources.resource_utils.get_stopwords()[source]

Download the default stopword list.

Returns

a list of stopwords (strings).

Return type

([str])
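Example (a minimal sketch of filtering stopwords out of a whitespace-tokenized sentence):

    from fibber.resources import resource_utils

    stopwords = set(resource_utils.get_stopwords())
    tokens = [t for t in "this is a simple example".split() if t not in stopwords]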

fibber.resources.resource_utils.get_transformers(name)[source]

Download pretrained transformer models.

Parameters

name (str) – the name of the pretrained model. Options are ["bert-base-cased", "bert-base-uncased", "gpt2-medium"].

Returns

directory of the downloaded model.

Return type

(str)
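Example (a minimal sketch; assumes the transformers library is installed and that the returned directory contains a standard Hugging Face checkpoint):

    from transformers import BertModel, BertTokenizerFast
    from fibber.resources import resource_utils

    # Download "bert-base-cased" and load it from the local directory.
    model_dir = resource_utils.get_transformers("bert-base-cased")
    tokenizer = BertTokenizerFast.from_pretrained(model_dir)
    model = BertModel.from_pretrained(model_dir)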

fibber.resources.resource_utils.get_universal_sentence_encoder()[source]

Download the pretrained Universal Sentence Encoder.

Returns

directory of the downloaded model.

Return type

(str)
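Example (a minimal sketch; assumes tensorflow and tensorflow_hub are installed and that the returned directory can be loaded with hub.load):

    import tensorflow_hub as hub
    from fibber.resources import resource_utils

    use_dir = resource_utils.get_universal_sentence_encoder()
    encoder = hub.load(use_dir)
    embeddings = encoder(["fibber paraphrases sentences."])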

fibber.resources.resource_utils.get_wordpiece_emb_demo()[source]

Download wordpiece embeddings for the demo dataset.

fibber.resources.resource_utils.load_glove_model(glove_file, dim)[source]

Load GloVe embeddings from a txt file.

Parameters
  • glove_file – the filename of the GloVe embedding text file.

  • dim – the dimension of the embedding.

Returns

  • "emb_table": a numpy array of size (N, dim).

  • "id2tok": a list of strings.

  • "tok2id": a dict that maps a word (string) to its id.

Return type

(dict)
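Example (a minimal sketch; "glove.6B.300d.txt" is a hypothetical local path to a GloVe text file):

    from fibber.resources.resource_utils import load_glove_model

    glove = load_glove_model("glove.6B.300d.txt", 300)
    print(len(glove["id2tok"]), glove["emb_table"].shape)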