This repository was archived by the owner on Mar 20, 2026. It is now read-only.
Releases: facebookresearch/fairseq
v0.7.0
Notable (possibly breaking) changes:
- d45db80: Move checkpoint utility functions from utils.py into checkpoint_utils.py
- f2563c2: Move LM definitions into separate files
- dffb167: Updates to model API:
  - `FairseqModel` -> `FairseqEncoderDecoderModel`
  - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
  - `encoder_out_dict` -> `encoder_out`
  - remove unused `remove_head` functions
- 34726d5: Move `distributed_init` into `DistributedFairseqModel`
- cf17068: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
- d45db80: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
- 96ac28d: Rename `--sampling-temperature` -> `--temperature`
- fc1a19a: Deprecate dummy batches
- a1c997b: Add memory mapped datasets
- 0add50c: Allow cycling over multiple datasets, where each one becomes an "epoch"
Plus many additional features and bugfixes
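The memory-mapped datasets added in a1c997b follow a standard pattern that can be illustrated with a small sketch (this is an assumption-laden toy, not fairseq's actual `MMapIndexedDataset` format): token data lives in a flat binary file and is read through `numpy.memmap`, so the OS page cache serves reads lazily instead of each worker loading the whole dataset into RAM.

```python
# Toy illustration of a memory-mapped token dataset (hypothetical layout;
# fairseq's on-disk format differs). Tokens are written to a flat binary
# file, then mapped read-only so indexing only pages in the bytes touched.
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "tokens.bin")
np.arange(1000, dtype=np.int32).tofile(path)  # write a toy token stream

# mode="r" maps the file lazily; no full read into memory happens here
tokens = np.memmap(path, dtype=np.int32, mode="r")
window = tokens[100:105]  # slicing pages in only the needed region
```

Multiple data-loader workers mapping the same file share pages through the kernel, which is the main memory win over per-worker in-memory copies.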
v0.6.2
Changelog:
- 998ba4f: Add language models from Baevski & Auli (2018)
- 4294c4f: Add mixture of experts code from Shen et al. (2019)
- 0049349: Add example for multilingual training
- 48d9afb: Speed improvements, including fused operators from apex
- 44d27e6: Add Tensorboard support
- d17fa85: Add Adadelta optimizer
- 9e1c880: Add `FairseqEncoderModel`
- b65c579: Add `FairseqTask.inference_step` to modularize generate.py
- 2ad1178: Add back `--curriculum`
- Misc bug fixes and other features
v0.6.1
v0.6.0
Changelog:
- 4908863: Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
- no more FP16Trainer, we just have an FP16Optimizer wrapper
- most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
- Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0
- Trainer.train_step now takes a list of samples, which will allow cleaner `--update-freq` handling
- 1c56b58: Parallelize preprocessing
- Misc bug fixes and features
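The dummy-batch mechanism described above can be sketched in a few lines (a minimal illustration of the trick, not fairseq's Trainer code): a worker with no real batch left still runs forward/backward so collective ops stay in sync, but multiplies the loss by 0 so every resulting gradient is exactly zero.

```python
# Sketch of hiding gradients from a dummy batch by zeroing the loss.
# The model and train_step here are hypothetical stand-ins.
import torch

model = torch.nn.Linear(4, 2)

def train_step(batch, is_dummy=False):
    loss = model(batch).sum()
    if is_dummy:
        # multiplying by 0 keeps the backward graph alive but makes
        # every gradient contribution exactly zero
        loss = loss * 0
    loss.backward()
    return loss

model.zero_grad()
train_step(torch.randn(3, 4), is_dummy=True)
# all gradients are zero, so a distributed all-reduce sees no contribution
assert all(p.grad.abs().sum().item() == 0 for p in model.parameters())
```

Running the backward pass even for dummy batches matters in distributed training: every worker must participate in the same number of gradient synchronizations or the job deadlocks.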
v0.5.0: 0.4.0 -> 0.5.0
Changelog:
- 97b58b4: Add Transformer model from Vaswani et al. (2017)
- b2374e5: Faster Transformer inference with improved caching
- 2d27ae0: Simulate large mini-batch training with delayed updates (`--update-freq`)
- 7ee1d28: Add FP16 training support (`--fp16`)
- 2a84f46: Faster inference by removing completed sentences from the batch
- 663fd80: Batched interactive generation
- 4c2ef2d: Add language modeling / gated convolutional model from Dauphin et al. (2017)
- b59815b: Add Hierarchical Neural Story Generation model from Fan et al. (2018)
- ff68a9e: Add FairseqTask to modularize task definitions (e.g., translation, language modeling)
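The delayed-update idea behind `--update-freq` can be sketched as plain gradient accumulation (an illustration under assumed toy shapes, not fairseq's implementation): gradients from N consecutive small batches are summed before a single optimizer step, approximating an N-times larger mini-batch without the memory cost.

```python
# Gradient accumulation sketch: one optimizer step per `update_freq`
# batches. Model, data, and loss here are hypothetical toys.
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
update_freq = 4  # accumulate gradients over this many batches

num_updates = 0
optimizer.zero_grad()
for i, batch in enumerate(torch.randn(8, 3, 4)):
    loss = model(batch).pow(2).mean()
    loss.backward()  # .grad buffers accumulate across iterations
    if (i + 1) % update_freq == 0:
        optimizer.step()       # single update for update_freq batches
        optimizer.zero_grad()  # reset accumulation for the next group
        num_updates += 1
```

With 8 batches and `update_freq = 4`, only 2 parameter updates are applied, each driven by gradients pooled from 4 batches.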