This paper reviews Parameter Efficient Fine-Tuning (PEFT) methods for large pretrained language models (PLMs), focusing on their utility in resource-constrained settings. The review covers various PEFT methods, their applications, and future prospects, and includes experiments evaluating their efficiency, providing key insights for researchers and practitioners in the field.
Some advantages of PEFT compared to full fine-tuning:

- Only a small fraction of the parameters is updated, which greatly reduces memory and compute requirements.
- The task-specific weights are tiny, so they are cheap to store, share, and swap while the pretrained backbone stays untouched.
- Performance is often comparable to full fine-tuning despite training far fewer parameters.

Main contributions of the paper:

- A taxonomy that organizes PEFT methods into additive, partial, reparameterized, hybrid, and unified fine-tuning.
- A discussion of applications of PEFT methods and of future research directions.
- Experiments comparing the efficiency of representative PEFT methods.
The paper provides a figure that categorizes each PEFT method; here is a simpler version of it for readers who are not interested in all of the details.
| Main Category | Subcategories | Examples |
|---|---|---|
| Additive Fine-tuning | Adapter-based Fine-tuning | Sequential Adapter, Residual Adapter |
|  | Soft Prompt-based Fine-tuning | WARP, Prompt-tuning, Prefix-tuning |
|  | Others | LST, (IA)³, PASTA, AttentionFusion |
| Partial Fine-tuning | Bias Update | BitFit |
|  | Pretrained Weight Masking | Threshold-Mask |
|  | Delta Weight Masking | LT-SFT |
| Reparameterized Fine-tuning | Low-rank Decomposition | LoRA |
|  | LoRA Derivatives | Low-rank Adjustment, LoRA-guided Pretrained Weight Update, Quantization Adaption, LoRA-based Improvements, LoRA-based Multi-task Fine-tuning |
| Hybrid Fine-tuning | Manual Combination | MAM Adapter |
|  | Automatic Combination | AutoPEFT |
| Unified Fine-tuning | - | AdaMix, SparseAdapter |
I also put the descriptions of these main categories, and of their subcategories, in tables for easier comparison.
| Main Category | Description |
|---|---|
| Additive Fine-tuning | These methods introduce additional trainable parameters for task-specific fine-tuning |
| Partial Fine-tuning | Only a subset of the model's parameters is updated, reducing computational load |
| Reparameterized Fine-tuning | Reparameterize the weights (typically the weight updates) in a more parameter-efficient form through techniques like low-rank factorization |
| Hybrid Fine-tuning | Combine different PEFT techniques for more efficient fine-tuning |
| Unified Fine-tuning | Integrate fine-tuning into a single unified framework, typically streamlining one kind of PEFT module rather than mixing different methods |

| Additive Subcategory | Description |
|---|---|
| Adapter-based Fine-tuning | Introduce small trainable modules that learn task-specific adaptations without altering the base model |
| Soft Prompt-based Fine-tuning | Insert trainable vectors (soft prompts) into the input sequence for task adaptation |
| Others | Other additive methods that attach small trainable components, such as side networks, scaling vectors, or extra attention modules |
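
To make the additive ideas concrete, here is a minimal PyTorch sketch (my own illustration, not the paper's code) of a bottleneck adapter and a soft prompt; the class names and dimensions are assumptions chosen for readability.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Sequential adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the frozen sublayer's output intact;
        # only the adapter's small projections receive gradients.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


class SoftPrompt(nn.Module):
    """Prompt-tuning: learnable vectors prepended to the input embeddings."""

    def __init__(self, prompt_len: int, embed_dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)
```

In both cases the pretrained transformer stays frozen and only these small added modules are trained.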

| Partial Subcategory | Description |
|---|---|
| Bias Update | Only the bias terms in the attention layers, feed-forward layers, and layer normalization of the transformer are updated |
| Pretrained Weight Masking | The pretrained weights are masked using various pruning criteria |
| Delta Weight Masking | Delta weights are masked via pruning techniques and optimization approximation |
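
A bias-update method like BitFit boils down to freezing everything except the bias parameters. A minimal sketch, assuming a standard PyTorch model where bias parameters contain "bias" in their names:

```python
import torch.nn as nn


def apply_bitfit(model: nn.Module) -> list[str]:
    """Freeze all pretrained weights; leave only bias parameters trainable."""
    trainable = []
    for name, param in model.named_parameters():
        if "bias" in name:
            param.requires_grad = True
            trainable.append(name)
        else:
            param.requires_grad = False
    return trainable
```

The optimizer is then built only over `p for p in model.parameters() if p.requires_grad`, so the update touches a tiny fraction of the model.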

| Reparameterized Subcategory | Description |
|---|---|
| Low-rank Decomposition | Various low-rank decomposition techniques are used to reparameterize the updated weight matrices |
| LoRA Derivatives | A series of PEFT methods developed on top of LoRA |
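
Low-rank decomposition as used by LoRA keeps the pretrained weight W frozen and learns the update as the product of two small matrices B and A of rank r. A minimal sketch (not the official LoRA implementation); `r` and `alpha` are the usual rank and scaling hyperparameters:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # W (and its bias) stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + (x A^T B^T) * scaling; the update starts at zero
        # because B is initialized to zeros.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```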

| Hybrid Subcategory | Description |
|---|---|
| Manual Combination | Multiple PEFT methods are combined manually through careful design |
| Automatic Combination | Various PEFT methods are incorporated automatically via structure search |
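
As a toy illustration of manual combination (this is not the MAM Adapter itself), one can attach both a low-rank update and a bottleneck adapter to the same frozen layer, reusing the `LoRALinear` and `BottleneckAdapter` sketches above:

```python
import torch
import torch.nn as nn


class LoRAPlusAdapter(nn.Module):
    """Toy manual combination: LoRA on the projection, adapter on its output."""

    def __init__(self, base: nn.Linear, r: int = 8, bottleneck_dim: int = 64):
        super().__init__()
        self.lora = LoRALinear(base, r=r)                                 # reparameterized part
        self.adapter = BottleneckAdapter(base.out_features, bottleneck_dim)  # additive part

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.lora(x))
```

Hybrid methods in the paper combine such components in more sophisticated ways, and automatic combination searches over which components to enable instead of fixing them by hand.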