🧠 QLORA: Efficient Finetuning of Quantized Large Language Models

About this listen

The research introduces QLORA, a method for efficiently finetuning large language models by quantising a pretrained model to 4-bit precision and training Low-Rank Adapters on top of it. This approach drastically reduces memory usage, enabling the finetuning of models with up to 65 billion parameters on a single 48GB GPU while preserving full 16-bit finetuning performance. Key innovations include the 4-bit NormalFloat (NF4) data type, double quantisation of the quantisation constants, and paged optimisers to manage memory spikes. Using QLORA, the authors trained Guanaco, a family of models whose performance is competitive with ChatGPT on the Vicuna benchmark, demonstrating state-of-the-art chatbot capabilities. The paper also examines the importance of data quality over quantity in finetuning and analyses chatbot evaluation methods, including a comparison between human and GPT-4 assessments.
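For listeners who want to see how these pieces fit together in practice, below is a minimal sketch using the Hugging Face transformers, peft, and bitsandbytes libraries, which ship implementations of NF4, double quantisation, and paged optimisers. The base model name, adapter hyperparameters, and batch settings here are illustrative assumptions, not the paper's exact training recipe.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantisation with double quantisation of the quantisation constants.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,         # quantise the quantisation constants too
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in 16-bit
)

# Illustrative base model; any causal LM checkpoint would do.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-Rank Adapters: the frozen 4-bit base stays fixed, and only these
# small 16-bit adapter matrices receive gradient updates.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Paged optimiser to absorb the memory spikes that occur during training.
training_args = TrainingArguments(
    output_dir="qlora-out",
    optim="paged_adamw_32bit",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
)
```

The design point worth noticing is that gradients backpropagate through the frozen, 4-bit-quantised weights into the small 16-bit adapters, so the memory cost of the full model is paid only once, at 4-bit precision.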
