Tuning with PEFT: LLMs are no longer heavyweights.

Vishnu Nandakumar
3 min read · Aug 3, 2023

The process of fine-tuning large language models (LLMs) can be a time-consuming and resource-intensive task. Traditionally, it requires significant computational power and storage capacity, making it difficult for individuals or organizations without access to high-end hardware to participate in this field. However, there is now an approach that promises to change all that: PEFT. This method allows users to achieve results comparable to traditional fine-tuning with only a P100 GPU on Kaggle. In this article, we will explore how PEFT works and how it overcomes common challenges associated with fine-tuning LLMs.

What is PEFT?

PEFT stands for Parameter-Efficient Fine-Tuning. As the name suggests, it freezes most of a model's parameters and trains only a small, relevant subset, making it a practical approach for fine-tuning LLMs with minimal resources. By doing so, PEFT lets users drastically reduce the number of trainable parameters, and the size of the saved checkpoints, while maintaining performance.
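To make this concrete, here is a minimal sketch (the base model name and LoRA hyperparameters are illustrative, not the ones used in the notebook linked below): wrapping a Hugging Face model with a LoRA configuration from the peft library freezes the base weights and leaves only a small adapter trainable.

from transformers import AutoModelForTokenClassification
from peft import LoraConfig, TaskType, get_peft_model

# Illustrative base model; any Hugging Face checkpoint can be wrapped the same way
base_model = AutoModelForTokenClassification.from_pretrained("roberta-base", num_labels=4)

# LoRA configuration: rank, scaling and dropout values are reasonable defaults, not tuned
lora_config = LoraConfig(task_type=TaskType.TOKEN_CLS, r=8, lora_alpha=16, lora_dropout=0.1)

peft_model = get_peft_model(base_model, lora_config)

# Prints trainable vs. total parameter counts; with LoRA only a small fraction is trainable
peft_model.print_trainable_parameters()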

Overcoming Challenges in Dealing with LLMs

One major challenge associated with fine-tuning LLMs is the sheer amount of computation required for training and inference. With PEFT, users can achieve comparable results with just a P100 GPU on Kaggle. Additionally, PEFT addresses two other critical issues faced by LLMs: catastrophic forgetting and storage requirements.

Catastrophic forgetting occurs when an LLM is fine-tuned for multiple downstream tasks and begins to forget knowledge learned during pre-training or earlier tasks. PEFT addresses this with techniques such as LoRA (Low-Rank Adaptation of Large Language Models), which keeps the original model weights frozen and trains only small low-rank adapter matrices for the task at hand. Because the base weights are never overwritten, the model retains its prior knowledge while still adapting quickly to new tasks.

Another problem facing LLMs is the massive storage required to keep full copies of fine-tuned models. Since a PEFT checkpoint contains only the small set of adapter weights rather than the whole model, it is drastically smaller, which makes storage and deployment for inference much easier.
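Continuing the sketch above (the paths here are illustrative), saving a PEFT model writes out only the adapter weights, which are typically a few megabytes; at load time the adapter is re-attached to the frozen base model.

# Save only the LoRA adapter weights, not a full copy of the fine-tuned model
peft_model.save_pretrained("./lora-adapter")

# Later, reload the frozen base model and attach the adapter for inference
from transformers import AutoModelForTokenClassification
from peft import PeftModel

base = AutoModelForTokenClassification.from_pretrained("roberta-base", num_labels=4)
restored = PeftModel.from_pretrained(base, "./lora-adapter")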

There are different ways to approach PEFT; they are listed below at a high level, with a short configuration sketch after the list.

  • LoRA: Injects trainable low-rank weight matrices into each transformer layer while keeping the rest of the model frozen.
  • Prefix tuning: Rather than fine-tuning the whole model, the language model parameters are kept frozen and only a sequence of continuous task-specific vectors is optimized.
  • Prompt tuning: An effective mechanism for learning "soft prompts" that condition a frozen language model to perform specific downstream tasks; these soft prompts are learned through backpropagation, unlike hard (discrete) prompts, which are written by hand.
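As a rough sketch (the hyperparameter values are illustrative), each of these approaches corresponds to a config class in the peft library, while the rest of the workflow, wrapping the model with get_peft_model, stays the same:

from peft import LoraConfig, PrefixTuningConfig, PromptTuningConfig, TaskType

# LoRA: inject trainable low-rank matrices into the frozen transformer layers
lora = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.1)

# Prefix tuning: optimize continuous task-specific vectors prepended at every layer
prefix = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)

# Prompt tuning: learn "soft prompt" embeddings prepended to the input embeddings
prompt = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)

Any of these configs can then be passed to get_peft_model(base_model, config), exactly as in the LoRA sketch earlier.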

Below, I have shared links to the model and notebook I created for an NLI knowledge-graph task. Please give them a try and share your feedback.

Notebook: https://www.kaggle.com/code/vishnunkumar/peft-nligraph
Model: https://huggingface.co/vishnun/lora-NLIGraph

For inference, follow the snippet below:


import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification
from peft import PeftConfig, PeftModel

# Label maps for the token tags; the exact id-to-label order here is an assumption,
# check the model card for the mapping used during training
id2lab = {0: "O", 1: "SRC", 2: "REL", 3: "TGT"}
lab2id = {label: idx for idx, label in id2lab.items()}

peft_model_id = "vishnun/lora-NLIGraph"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the frozen base model and attach the LoRA adapter on top of it
inference_model = AutoModelForTokenClassification.from_pretrained(
    config.base_model_name_or_path, num_labels=4, id2label=id2lab, label2id=lab2id
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(inference_model, peft_model_id)

text = "Arsenal will win the Premier League"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

tokens = inputs.tokens()
predictions = torch.argmax(logits, dim=2)

for token, prediction in zip(tokens, predictions[0].numpy()):
    print((token, model.config.id2label[prediction]))

## Results:
## ('<s>', 'O')
## ('Arsenal', 'SRC')
## ('Ġwill', 'O')
## ('Ġwin', 'REL')
## ('Ġthe', 'O')
## ('ĠPremier', 'TGT')
## ('ĠLeague', 'O')
## ('</s>', 'O')

Until then, thanks, and I am highly grateful to all the readers. :)

