🧠 QLORA: Efficient Finetuning of Quantized Large Language Models

About this listen

The research introduces QLORA, a method for efficiently finetuning large language models by quantising a pretrained model to 4-bit precision and training Low-Rank Adapters on top of it. This approach drastically reduces memory usage, enabling the finetuning of models with up to 65 billion parameters on a single 48GB GPU while preserving full 16-bit finetuning performance. Key innovations include the 4-bit NormalFloat (NF4) data type, double quantisation of the quantisation constants, and paged optimisers to manage memory spikes. Using QLORA, the authors trained Guanaco, a family of models whose performance is competitive with ChatGPT on the Vicuna benchmark, demonstrating state-of-the-art chatbot capabilities. The paper also examines the importance of data quality over quantity in finetuning and analyses chatbot evaluation methods, including a comparison between human and GPT-4 assessments.
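For listeners who want to see how these pieces fit together in practice, below is a minimal sketch using the Hugging Face transformers, peft, and bitsandbytes libraries, which ship implementations of NF4, double quantisation, and paged optimisers. The base model name, adapter hyperparameters, and batch settings here are illustrative assumptions, not the paper's exact training recipe.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantisation with double quantisation of the quantisation constants.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,         # quantise the quantisation constants too
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in 16-bit
)

# Illustrative base model; any causal LM checkpoint would do.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-Rank Adapters: the frozen 4-bit base stays fixed, and only these
# small 16-bit adapter matrices receive gradient updates.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Paged optimiser to absorb the memory spikes that occur during training.
training_args = TrainingArguments(
    output_dir="qlora-out",
    optim="paged_adamw_32bit",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
)
```

The design point worth noticing is that gradients backpropagate through the frozen, 4-bit-quantised weights into the small 16-bit adapters, so the memory cost of the full model is paid only once, at 4-bit precision.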
