
🧠 QLORA: Efficient Finetuning of Quantized Large Language Models
About this listen
The research introduces QLORA, a method for efficient finetuning of large language models that quantises a pretrained model to 4-bit precision and backpropagates gradients into Low-Rank Adapters (LoRA). This approach drastically reduces memory usage, enabling the finetuning of models with up to 65 billion parameters on a single 48GB GPU while preserving full 16-bit finetuning performance. Key innovations include the 4-bit NormalFloat (NF4) data type, which is information-theoretically optimal for normally distributed weights, double quantisation of the quantisation constants themselves, and paged optimisers to manage memory spikes. Using QLORA, the authors developed Guanaco, a family of models that reaches 99.3% of ChatGPT's performance on the Vicuna benchmark, demonstrating state-of-the-art chatbot capabilities. The paper also examines the importance of data quality over quantity in finetuning and analyses chatbot evaluation methods, including a comparison between human and GPT-4 assessments.
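To make the ingredients above concrete, here is a minimal sketch of a QLORA-style setup using the Hugging Face transformers, peft, and bitsandbytes libraries (into which the technique has been integrated). The base model name, LoRA rank, target modules, and training hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of a QLORA-style finetuning setup: 4-bit NF4 base model, double
# quantisation, LoRA adapters, and a paged optimiser. Values are illustrative.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-7b"  # assumed base model for illustration

# 4-bit NormalFloat (NF4) quantisation with double quantisation of the
# quantisation constants; compute is done in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-Rank Adapters: only these small trainable matrices receive gradients;
# the 4-bit quantised base model stays frozen.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# A paged 32-bit AdamW optimiser avoids out-of-memory crashes during
# gradient-checkpointing memory spikes.
training_args = TrainingArguments(
    output_dir="qlora-guanaco-sketch",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    optim="paged_adamw_32bit",
)
```

The key design point is that memory savings come from keeping the frozen base weights in 4-bit NF4 while gradients flow only into the small LoRA matrices, which is what allows a 65B-parameter model to fit on a single 48GB GPU.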