Abdullah Şamil Güser

Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment

Introduction

This paper reviews Parameter Efficient Fine-Tuning (PEFT) methods for large pretrained language models (PLMs), focusing on their utility in resource-constrained settings. The review covers various PEFT methods, their applications, and future prospects, and includes experiments evaluating their efficiency, providing key insights for researchers and practitioners in the field.

Some advantages of PEFT compared to full fine-tuning:

- Far fewer trainable parameters, so fine-tuning fits on more modest hardware
- Lower storage cost, since only a small set of task-specific weights needs to be saved per task
- Performance that is often comparable to full fine-tuning
- Less risk of catastrophic forgetting, because the pretrained weights stay (mostly) frozen

Main contributions of the paper:

- A structured review and categorization of existing PEFT methods
- Experiments evaluating the efficiency of representative PEFT methods
- A discussion of PEFT applications and future prospects

PEFT Methods

The paper provides a figure that categorizes each PEFT method; here is a simpler version of it for readers who are not interested in all of the details.

| Main Category | Subcategories | Examples |
| --- | --- | --- |
| Additive Fine-tuning | Adapter-based Fine-tuning | Sequential Adapter, Residual Adapter |
| | Soft Prompt-based Fine-tuning | WARP, Prompt-tuning, Prefix-tuning |
| | Others | LST, (IA)³, PASTA, AttentionFusion |
| Partial Fine-tuning | Bias Update | BitFit |
| | Pretrained Weight Masking | Threshold-Mask |
| | Delta Weight Masking | LT-SFT |
| Reparameterized Fine-tuning | Low-rank Decomposition | LoRA |
| | LoRA Derivatives | Low-rank Adjustment, LoRA-guided Pretrained Weight Update, Quantization Adaption, LoRA-based Improvements, LoRA-based Multi-task Fine-tuning |
| Hybrid Fine-tuning | Manual Combination | MAM Adapter |
| | Automatic Combination | AutoPEFT |
| Unified Fine-tuning | — | AdaMix, SparseAdapter |


I also put the descriptions of these main categories in a table for easier comparison.

| Main Category | Description |
| --- | --- |
| Additive Fine-tuning | Introduces new trainable parameters on top of the frozen pretrained model for task-specific fine-tuning |
| Partial Fine-tuning | Updates only a subset of the model's parameters, reducing the computational load |
| Reparameterized Fine-tuning | Reparameterizes the weight updates through techniques like low-rank factorization so that fewer parameters need to be trained |
| Hybrid Fine-tuning | Combines different PEFT techniques for more efficient fine-tuning |
| Unified Fine-tuning | Integrates various PEFT strategies into a single coherent approach |
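All of these categories share the same goal: training only a small fraction of the model's parameters. As a quick illustration (my own sketch, not from the paper), here is how that fraction can be measured in PyTorch; the helper name `trainable_fraction` is mine.

```python
# Minimal sketch: measuring "parameter efficiency" as the share of parameters
# that actually receive gradient updates. Assumes a PyTorch model; the helper
# name `trainable_fraction` is illustrative, not from the paper.
import torch.nn as nn


def trainable_fraction(model: nn.Module) -> float:
    """Return the fraction of parameters that will be updated during training."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable / total


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))
    # Freeze the first layer to mimic partial fine-tuning.
    for p in model[0].parameters():
        p.requires_grad = False
    print(f"trainable fraction: {trainable_fraction(model):.2%}")
```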


"figure-1"

1. Additive Fine-tuning

| Subcategory | Description |
| --- | --- |
| Adapter-based Fine-tuning | Introduces small modules that learn task-specific adaptations without altering the base model |
| Soft Prompt-based Fine-tuning | Inserts trainable vectors (soft prompts) into the input sequence for task adaptation |
| Others | Various other methods that add a small number of parameters, such as scaling vectors or small side networks |
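To make the adapter idea concrete, here is a minimal PyTorch sketch of a bottleneck ("sequential") adapter. It is my own illustration, not code from the paper; the names and dimensions (`hidden_dim`, `bottleneck_dim`) are placeholders.

```python
# Illustrative bottleneck adapter: down-project -> nonlinearity -> up-project,
# added residually to the hidden states. Only the adapter weights are trained;
# the base model stays frozen.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the original representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


x = torch.randn(2, 16, 768)      # (batch, sequence, hidden)
adapter = BottleneckAdapter(768)
print(adapter(x).shape)          # torch.Size([2, 16, 768])
```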

2. Partial Fine-tuning

| Subcategory | Description |
| --- | --- |
| Bias Update | Only the bias terms in the attention layers, feed-forward layers, and layer normalization of the transformer are updated |
| Pretrained Weight Masking | The pretrained weights are masked using various pruning criteria |
| Delta Weight Masking | The delta (task-specific update) weights are masked via pruning techniques and optimization approximation |
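BitFit-style bias updates are easy to picture in code: freeze everything and re-enable gradients only for parameters whose name ends with "bias". The snippet below is my own sketch on a toy transformer layer, not the paper's implementation.

```python
# Sketch of bias-only (BitFit-style) fine-tuning on a standard PyTorch
# transformer encoder layer. Parameter naming follows PyTorch conventions.
import torch.nn as nn

model = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)

# Train only bias parameters; freeze all weights.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # bias terms of attention, feed-forward and LayerNorm only
```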

3. Reparameterized Fine-tuning

| Subcategory | Description |
| --- | --- |
| Low-rank Decomposition | Various low-rank decomposition techniques are used to reparameterize the weight update matrix |
| LoRA Derivatives | A series of PEFT methods developed based on LoRA |
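The core LoRA idea is to keep the pretrained weight W frozen and learn the update as a low-rank product BA. Below is a minimal sketch (mine, not the paper's); the `rank` and `alpha` names follow common LoRA implementations but the values are placeholders.

```python
# Minimal LoRA-style reparameterization: the frozen weight stays in place and
# the update is expressed as a low-rank product B @ A, scaled by alpha / rank.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad = False          # freeze pretrained W
        if self.base.bias is not None:
            self.base.bias.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scaling * x (BA)^T ; only A and B are trained.
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)


layer = LoRALinear(nn.Linear(768, 768))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 2 * 8 * 768
```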

4. Hybrid Fine-tuning

| Subcategory | Description |
| --- | --- |
| Manual Combination | Multiple PEFT methods are combined manually through sophisticated design |
| Automatic Combination | Various PEFT methods are incorporated automatically via structure search |
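As a rough picture of manual combination, the sketch below wires both a LoRA-style low-rank branch and a bottleneck adapter branch around a single frozen linear layer. It only illustrates the idea of combining techniques by hand; it is not the MAM Adapter design from the paper, and all names are my own.

```python
# Hedged sketch of "manual combination": one frozen linear sub-layer gets both
# a low-rank (LoRA-style) update and a parallel bottleneck adapter.
import torch
import torch.nn as nn


class CombinedPEFTLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, bottleneck: int = 64):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                          # frozen pretrained weights
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # LoRA branch
        self.B = nn.Parameter(torch.zeros(d_out, rank))
        self.down = nn.Linear(d_in, bottleneck)                 # adapter branch
        self.up = nn.Linear(bottleneck, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output + low-rank update + parallel adapter output.
        return self.base(x) + x @ self.A.T @ self.B.T + self.up(torch.relu(self.down(x)))


layer = CombinedPEFTLinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 16, 768)).shape)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```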

References