
Releases: huggingface/transformers

Better model/tokenizer serialization, relax network connection requirements, new scripts and bug fixes

25 Apr 19:47

General updates:

  • Better serialization for all models and tokenizers (BERT, GPT, GPT-2 and Transformer-XL), with best practices for saving/loading documented in the readme and examples (see the sketch after this list).
  • Relaxed network connection requirements (fall back on the last model downloaded in the cache when AWS cannot be reached to check the eTag).
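As a quick illustration of the recommended save/load pattern, here is a minimal sketch following the best practices described in the readme for this release; the `WEIGHTS_NAME`/`CONFIG_NAME` constants and the `save_vocabulary`/`to_json_file` helpers are assumed to be part of this release's serialization update.

```python
import os
import torch
from pytorch_pretrained_bert import BertForSequenceClassification, BertTokenizer, WEIGHTS_NAME, CONFIG_NAME

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
# ... fine-tune the model ...

# Save the fine-tuned model, its configuration and the vocabulary to a directory.
output_dir = "./my_finetuned_model"  # hypothetical path
os.makedirs(output_dir, exist_ok=True)
model_to_save = model.module if hasattr(model, "module") else model  # unwrap DataParallel if needed
torch.save(model_to_save.state_dict(), os.path.join(output_dir, WEIGHTS_NAME))
model_to_save.config.to_json_file(os.path.join(output_dir, CONFIG_NAME))
tokenizer.save_vocabulary(output_dir)

# Reload the model and tokenizer by pointing from_pretrained at the directory.
model = BertForSequenceClassification.from_pretrained(output_dir, num_labels=2)
tokenizer = BertTokenizer.from_pretrained(output_dir, do_lower_case=True)
```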

Breaking changes:

  • The warmup_linear method in OpenAIAdam and BertAdam is replaced by flexible schedule classes for linear, cosine and multi-cycle schedules (see the sketch below).
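A minimal sketch of the new API, assuming the schedule classes (e.g. `WarmupCosineSchedule`) live in `pytorch_pretrained_bert.optimization` and that `BertAdam` accepts either a schedule name or a schedule instance:

```python
from pytorch_pretrained_bert import BertForSequenceClassification
from pytorch_pretrained_bert.optimization import BertAdam, WarmupCosineSchedule

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
num_train_steps = 1000  # hypothetical number of optimization steps

# Before: BertAdam(params, lr=..., warmup=0.1, t_total=num_train_steps) implied warmup_linear.
# Now: build a schedule object (linear, cosine, multi-cycle, ...) and pass it explicitly.
schedule = WarmupCosineSchedule(warmup=0.1, t_total=num_train_steps)
optimizer = BertAdam(model.parameters(), lr=5e-5, schedule=schedule)
```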

Bug fixes and improvements to the library modules:

  • add a flag in BertTokenizer to skip basic tokenization (@john-hewitt; see the sketch after this list)
  • Allow tokenization of sequences > 512 (@CatalinVoss)
  • clean up and extend learning rate schedules in BertAdam and OpenAIAdam (@lukovnikov)
  • Update GPT/GPT-2 Loss computation (@CatalinVoss, @thomwolf)
  • Make the TensorFlow conversion tool more robust (@marpaia)
  • fixed BertForMultipleChoice model init and forward pass (@dhpollack)
  • Fix gradient overflow in GPT-2 FP16 training (@SudoSharma)
  • catch exception if pathlib not installed (@potatochip)
  • Use Dropout Layer in OpenAIGPTMultipleChoiceHead (@pglock)
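For the first item above, a minimal sketch of the new BertTokenizer flag; the keyword name `do_basic_tokenize` is an assumption based on the linked change, so check the tokenizer docstring.

```python
from pytorch_pretrained_bert import BertTokenizer

# Skip BERT's basic (whitespace/punctuation/lower-casing) pass and apply only WordPiece,
# e.g. when the input text has already been pre-tokenized upstream.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_basic_tokenize=False)
print(tokenizer.tokenize("unaffable puppeteer"))  # WordPiece sub-tokens only
```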

New scripts and improvements to the example scripts:

v0.6.1 - Small install tweak release

18 Feb 11:01
8f46cd1

Add regex to the requirements for OpenAI GPT-2 tokenizer.

v0.6.0 - Adding OpenAI small GPT-2 pretrained model

18 Feb 10:40
0856a23

Add OpenAI small GPT-2 pretrained model
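A minimal usage sketch for the new GPT-2 model and tokenizer, following the pattern shown in the readme (the next-token prediction below is only illustrative):

```python
import torch
from pytorch_pretrained_bert import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Encode a prompt and predict the next token.
indexed_tokens = tokenizer.encode("Who was Jim Henson ? Jim Henson was a")
tokens_tensor = torch.tensor([indexed_tokens])
with torch.no_grad():
    predictions, past = model(tokens_tensor)  # past caches keys/values for faster decoding
predicted_index = torch.argmax(predictions[0, -1, :]).item()
print(tokenizer.decode([predicted_index]))
```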

Bug fix update to load the pretrained `TransfoXLModel` from s3; added a fallback for OpenAIGPTTokenizer when SpaCy is not installed

13 Feb 09:21
4e56da3

Mostly a bug fix update for loading the TransfoXLModel from s3:

  • Fixed a bug in the loading of the pretrained TransfoXLModel from the s3 dump (which is a converted TransfoXLLMHeadModel) that prevented the weights from being loaded.
  • Added a fallback of OpenAIGPTTokenizer on BERT's BasicTokenizer when SpaCy and ftfy are not installed. Using BERT's BasicTokenizer instead of SpaCy should be fine in most cases as long as the input is relatively clean (SpaCy and ftfy were included to reproduce the paper's pre-processing on the Toronto Book Corpus exactly). The fallback also lets us use the never_split option to avoid splitting special tokens like [CLS] and [SEP], which is easier than adding the tokens back after tokenization (see the sketch after this list).
  • Updated the README section on tokenizer options and methods, which was lagging behind a bit.
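A short sketch of the fallback behaviour described above: without SpaCy and ftfy installed, the same code now runs on BERT's BasicTokenizer instead of failing. This is a sketch of the usage, not a definitive reproduction of the fallback logic.

```python
from pytorch_pretrained_bert import OpenAIGPTTokenizer

# Uses SpaCy + ftfy if they are installed (exact paper pre-processing);
# otherwise falls back on BERT's BasicTokenizer for a "good enough" tokenization.
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
tokens = tokenizer.tokenize("Jim Henson was a puppeteer")
ids = tokenizer.convert_tokens_to_ids(tokens)
```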

Adding OpenAI GPT and Transformer-XL pretrained models, python2 support, pre-training script for BERT, SQuAD 2.0 example

11 Feb 13:52
03cdb2a

New pretrained models:

  • OpenAI GPT pretrained on the Toronto Book Corpus ("Improving Language Understanding by Generative Pre-Training" by Alec Radford et al.).

    • This is a slightly modified version of our previous PyTorch implementation that improves performance by splitting the word and position embeddings into separate embedding matrices.
    • Performance was checked to be on par with the TF implementation on ROCStories: single-run evaluation accuracy of 86.4% vs. a median accuracy of 85.8% reported by the authors with the TensorFlow code (see details in the example section of the readme).
  • Transformer-XL pretrained on WikiText 103 ("Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" by Zihang Dai, Zhilin Yang et al.). This is a slightly modified version of Google/CMU's PyTorch implementation that matches the performance of the TensorFlow version by:

    • untying relative positioning embeddings across layers,
    • changing the memory cell initialization to keep sinusoidal positions identical,
    • adding full logits outputs in the adaptive softmax so it can be used in a generative setting.
    • Performance was checked to be on par with the TF implementation on WikiText 103: evaluation perplexity of 18.213 vs. a perplexity of 18.3 reported by the authors with the TensorFlow code (see details in the example section of the readme, and the loading sketch after this list).
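A minimal loading sketch for the two new pretrained models; the shortcut names ("openai-gpt", "transfo-xl-wt103") follow the readme, and the Transformer-XL forward pass is assumed to return the last hidden state plus the memory cells.

```python
import torch
from pytorch_pretrained_bert import (OpenAIGPTTokenizer, OpenAIGPTModel,
                                     TransfoXLTokenizer, TransfoXLModel)

text = "Jim Henson was a puppeteer"

# OpenAI GPT
gpt_tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
gpt_model = OpenAIGPTModel.from_pretrained("openai-gpt")
gpt_model.eval()
gpt_ids = torch.tensor([gpt_tokenizer.convert_tokens_to_ids(gpt_tokenizer.tokenize(text))])
with torch.no_grad():
    gpt_hidden = gpt_model(gpt_ids)  # hidden states of the last layer

# Transformer-XL (carries a "mems" state between successive forward passes)
txl_tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
txl_model = TransfoXLModel.from_pretrained("transfo-xl-wt103")
txl_model.eval()
txl_ids = torch.tensor([txl_tokenizer.convert_tokens_to_ids(txl_tokenizer.tokenize(text))])
with torch.no_grad():
    last_hidden, mems = txl_model(txl_ids)
```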

New scripts:

  • Updated the SQuAD fine-tuning script to also work on SQuAD V2.0 (by @abeljim and @Liangtaiwan)
  • run_lm_finetuning.py lets you pre-train a BERT language model or fine-tune it with masked-language-modeling and next-sentence-prediction losses (by @deepset-ai, @tholor and @nhatchan; Python 3.5 compatible)

Backward compatibility:

  • The library is now also compatible with Python 2

Improvements and bug fixes:

  • add a never_split option and arguments to the tokenizers (@WrRan)
  • better handle errors when BERT is fed inputs that are too long (@patrick-s-h-lewis)
  • better layer-normalization layer initialization, and a bug fix in the example scripts where args.do_lower_case was always True (@donglixp)
  • fix learning rate schedule issue in example scripts (@matej-svejda)
  • readme fixes (@danyaljj, @nhatchan, @davidefiocco, @girishponkiya)
  • importing unofficial TF models in BERT (@nhatchan)
  • only keep the active part of the loss for token classification (@Iwontbecreative; see the sketch after this list)
  • fix argparse type error in example scripts (@ksurya)
  • docstring fixes (@rodgzilla, @wlhgtc)
  • improving run_classifier.py loading of saved models (@SinghJasdeep)
  • in example scripts: allow do_eval to be used without do_train and to use the pretrained model in the output folder (@jaderabbit, @likejazz and @JoeDumoulin)
  • in run_squad.py: fix error when bert_model param is path or url (@likejazz)
  • add license to source distribution and use entry-points instead of scripts (@sodre)
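For the token-classification loss item above, the idea is to mask padding positions out of the cross-entropy loss using the attention mask. A standalone sketch of the pattern (not the library's exact code):

```python
import torch
from torch.nn import CrossEntropyLoss

num_labels = 5
logits = torch.randn(2, 8, num_labels)                # (batch, seq_len, num_labels)
labels = torch.randint(0, num_labels, (2, 8))         # per-token labels
attention_mask = torch.tensor([[1] * 6 + [0] * 2,     # 1 = real token, 0 = padding
                               [1] * 8])

# Keep only the positions where attention_mask == 1 so padding does not affect the loss.
active = attention_mask.view(-1) == 1
active_logits = logits.view(-1, num_labels)[active]
active_labels = labels.view(-1)[active]
loss = CrossEntropyLoss()(active_logits, active_labels)
```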

4x speed-up using NVIDIA apex, new multi-choice classifier and example for SWAG-like dataset, pytorch v1.0, improved model loading, improved examples...

14 Dec 14:21
e1bfad4

New:

  • 3-4 times speed-ups in fp16 (versus fp32) thanks to NVIDIA's work on apex (by @FDecaYed)
  • new sequence-level multiple-choice classification model + example fine-tuning on SWAG (by @rodgzilla; see the sketch after this list)
  • improved backward compatibility to python 3.5 (by @hzhwcmhf)
  • bump up to PyTorch 1.0
  • load fine-tuned model with from_pretrained
  • add examples on how to save and load fine-tuned models.
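For the new multiple-choice head, a minimal sketch of the intended usage; note that a later release (see above) fixes this model's init and forward pass, so treat this as illustrative.

```python
import torch
from pytorch_pretrained_bert import BertForMultipleChoice

# SWAG-style setup: each example carries num_choices candidate continuations.
model = BertForMultipleChoice.from_pretrained("bert-base-uncased", num_choices=4)
model.eval()

# input_ids has shape (batch_size, num_choices, sequence_length); zeros stand in for real data.
input_ids = torch.zeros(1, 4, 32, dtype=torch.long)
with torch.no_grad():
    logits = model(input_ids)  # shape: (batch_size, num_choices)
```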

Added two pre-trained models and one new fine-tuning class

30 Nov 22:15
66d50ca

This release comprises the following improvements and updates:

  • added two new pre-trained models from Google: bert-large-cased and bert-base-multilingual-cased,
  • added a model that can be fine-tuned for token-level classification: BertForTokenClassification (see the sketch after this list),
  • added tests for every model class, with and without labels,
  • fixed tokenizer loading function BertTokenizer.from_pretrained() when loading from a directory containing a pretrained model,
  • fixed typos in model docstrings and completed the docstrings,
  • improved examples (added a do_lower_case argument).
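A minimal usage sketch for the new BertForTokenClassification head; the num_labels keyword is assumed to be forwarded by from_pretrained to the model constructor.

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForTokenClassification

tokenizer = BertTokenizer.from_pretrained("bert-large-cased", do_lower_case=False)
model = BertForTokenClassification.from_pretrained("bert-large-cased", num_labels=9)  # e.g. an NER tag set
model.eval()

tokens = tokenizer.tokenize("Jim Henson was a puppeteer")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
with torch.no_grad():
    logits = model(input_ids)  # shape: (batch_size, sequence_length, num_labels)
```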

Small improvements and a few bug fixes.

26 Nov 09:57

Improvement:

  • Added a cache_dir option to the from_pretrained() function to select a specific path for downloading and caching the pre-trained model weights. Useful for distributed training (see readme; fixes issue #44). A sketch follows below.
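A minimal sketch of the new option (the cache path is hypothetical); each process or user can point to its own cache instead of the default one.

```python
from pytorch_pretrained_bert import BertModel

# Download and cache the weights under a custom directory, e.g. one per distributed worker.
cache_dir = "/tmp/bert_cache_rank0"  # hypothetical path
model = BertModel.from_pretrained("bert-base-uncased", cache_dir=cache_dir)
```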

Bug fixes in model training and tokenizer loading:

  • Fixed error in CrossEntropyLoss reshaping (issue #55).
  • Fixed unicode error in vocabulary loading (issue #52).

Bug fixes in examples:

  • Fix weight decay in examples (previously, bias and layer-norm weights were also decayed due to an erroneous check in the training loop).
  • Fix fp16 grad norm is None error in examples (issue #43).

Updated readme and docstrings

First release

17 Nov 11:23

This is the first release of pytorch_pretrained_bert.