
RAVEN

This repository contains code for pretraining and evaluating RAVEN, introduced in "RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models" (COLM 2024, https://arxiv.org/abs/2308.07922).

Installation

The codebase uses the following dependencies:

  • python 3 (tested with 3.8)
  • fairscale (tested with 0.4.6)
  • transformers (tested with 4.18.0)
  • numpy (tested with 1.22.4)
  • faiss (tested with 1.7.2)
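If you prefer not to use Docker, a minimal sketch of a manual install with pip (assuming a CUDA-enabled environment; the faiss-gpu package name is an assumption, since the repository does not specify how faiss should be installed — the Dockerfile below is the supported route):

pip install fairscale==0.4.6 transformers==4.18.0 numpy==1.22.4 faiss-gpu==1.7.2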

Recommended: use the provided Dockerfile to build a Docker image that satisfies all of the above requirements.
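A minimal sketch, assuming the Dockerfile sits at the repository root (the image tag raven is illustrative):

docker build -t raven .
# --gpus all requires the NVIDIA Container Toolkit
docker run --gpus all -it -v "$(pwd)":/workspace raven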

Training RAVEN

Download corpora

python preprocessing/download_corpus.py --corpus enwiki-dec2021 --output_directory ${DATA_DIR}
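${DATA_DIR} is any directory where downloads should land; for example (the path is illustrative):

export DATA_DIR=./atlas_data
python preprocessing/download_corpus.py --corpus enwiki-dec2021 --output_directory ${DATA_DIR}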

Download pretrained Atlas checkpoint

python preprocessing/download_model.py --model {model download key} --output_directory ${DATA_DIR} 

Model download keys:

  • models/atlas/xl for Atlas 3B
  • models/atlas/xxl for Atlas 11B
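For example, to fetch the Atlas 3B checkpoint:

python preprocessing/download_model.py --model models/atlas/xl --output_directory ${DATA_DIR}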

Train RAVEN

xl (3B)

sbatch example_scripts/train-raven-xl.sh

xxl (11B)

sbatch example_scripts/train-raven-xxl.sh

Note:

  • For the first run, set PRETRAINED_MODEL=${DATA_DIR}/models/atlas/${size} to initialize the model weights from the Atlas checkpoint. For subsequent runs, set PRETRAINED_MODEL=none so that the latest saved checkpoint is loaded (see the sketch after this list).
  • If there is no prebuilt index, omit the load_index_path option and an index will be built automatically. You can use the save_index_path option to save the index for later use.
  • We use a batch size of 64 for both the 3B and the 11B model. batch_size = #nodes $\times$ ntasks-per-node $\times$ per_gpu_batch_size; for example, 8 nodes $\times$ 4 tasks per node $\times$ a per-GPU batch size of 2 gives 64.
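A minimal sketch of these settings (variable and option names are taken from the notes above; how the example scripts actually wire them in may differ):

# first run: initialize from the downloaded Atlas checkpoint
PRETRAINED_MODEL=${DATA_DIR}/models/atlas/xl
# subsequent runs: resume from the latest RAVEN checkpoint
# PRETRAINED_MODEL=none

# no prebuilt index: omit load_index_path and save the built index for reuse
# --save_index_path ${DATA_DIR}/saved_index
# later runs can then pass:
# --load_index_path ${DATA_DIR}/saved_index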

Training time:

  • Training RAVEN (xl, 5k steps) takes about 1 day on 8 A100 GPUs
  • Training RAVEN (xxl, 5k steps) takes about 2-3 days on 32 A100 GPUs

In-Context Learning with RAVEN

NQ (Natural Questions)

sbatch example_scripts/evaluate-nq.sh

TQA (TriviaQA)

sbatch example_scripts/evaluate-tqa.sh

MMLU

sbatch example_scripts/evaluate-mmlu.sh

Note:

  • Refer to the Atlas repository for instructions on downloading and preprocessing the datasets.
  • If there is no prebuilt index, omit the load_index_path option and an index will be built automatically. You can use the save_index_path option to save the index for later use.
  • The default batch_size in the scripts is for xl. Set a smaller batch_size, e.g., 2, for xxl (see the sketch after this list).
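For example, inside an evaluation script (the variable name per_gpu_batch_size is taken from the training notes above; the exact variable in the scripts may differ):

# xxl needs a smaller per-GPU batch to fit in memory
per_gpu_batch_size=2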

Prompting Strategies:

  • Standard In-Context Learning: set n_shots and fusion=0, e.g., n_shots=0 for 0-shot, n_shots=5 for 5-shot.

  • Fusion-In-Context Learning: set both n_shots and fusion, e.g., n_shots=64 and fusion=5 for [64-shot, 5-fusion].
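A sketch of the two configurations (parameter names as in the list above; how the evaluation scripts expose them may differ):

# standard 5-shot in-context learning
n_shots=5
fusion=0

# Fusion-In-Context Learning ([64-shot, 5-fusion])
n_shots=64
fusion=5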

Acknowledgements and License

This repository is adapted from Meta's Atlas, a retrieval-augmented encoder-decoder language model on which RAVEN is based.

Code License

The majority of the code is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Hugging Face Transformers is licensed under the Apache 2.0 license, which covers src/modeling_bert.py and src/modeling_t5.py.

Data License

The Wikipedia-derived data used in this repository, such as the corpora and indices available from download_corpus.py and download_index.py, are licensed under CC-BY-SA.
