
Fine-Tuning and Transfer Learning: Model Training Guide

Comprehensive technical guide for LLM fine-tuning techniques, LoRA, QLoRA, PEFT methods, and customizing enterprise AI models.

Veni AI Technical Team · January 11, 2025 · 6 min read

Fine-tuning is the process of customizing a pre-trained model for a specific task or domain. With the right fine-tuning strategy, enterprise AI solutions can see task-level performance gains of up to 40%.

Transfer Learning Fundamentals

Transfer learning reuses the knowledge a model has learned on one task to improve its performance on another.

Advantages of Transfer Learning

  1. Data Efficiency: Good results with less data
  2. Time Savings: Much faster than training from scratch
  3. Cost Reduction: Requires far fewer compute resources
  4. Performance: Leverages pre-trained knowledge

Pre-training vs Fine-tuning

Pre-training:
- Large, general dataset (TBs)
- Learns general language and task understanding
- Training takes months
- Costs millions of dollars

Fine-tuning:
- Small, domain-specific dataset (MB-GB)
- Adapts the model to a specific task
- Training takes hours to days
- Costs thousands of dollars

Full Fine-Tuning

Updating all model parameters.

Advantages

  • Maximum adaptation capacity
  • Highest potential performance

Disadvantages

  • High memory requirement
  • Risk of catastrophic forgetting
  • Separate model copy for each task

Hardware Requirements

Model Size | GPU Memory (FP32) | GPU Memory (FP16)
7B         | 28 GB             | 14 GB
13B        | 52 GB             | 26 GB
70B        | 280 GB            | 140 GB
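
The FP32 and FP16 figures above follow directly from parameter count × bytes per parameter, and cover the weights only; full fine-tuning needs several times more for gradients and optimizer state. A minimal sanity check (helper name is ours):

```python
def weight_memory_gb(n_params, bytes_per_param):
    """GB needed just to store the weights; gradients and optimizer state are extra."""
    return n_params * bytes_per_param / 1e9

print(weight_memory_gb(7e9, 4))  # 7B model in FP32 -> 28.0
```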

Parameter-Efficient Fine-Tuning (PEFT)

Fine-tuning by updating only a small portion of parameters.

PEFT Advantages

  • Memory Efficiency: 90%+ reduction in trainable parameters
  • Speed: Faster training
  • Modularity: A single base model with multiple task adapters
  • Stability: Reduced risk of catastrophic forgetting

LoRA (Low-Rank Adaptation)

The most popular PEFT method.

LoRA Theory

Approximates the weight update with the product of two low-rank matrices:

W' = W + ΔW = W + BA

Where:
- W: Original weight matrix (d × k)
- B: Low-rank matrix (d × r)
- A: Low-rank matrix (r × k)
- r: Rank (typical: 8-64)

Parameter Savings

Original: d × k parameters
LoRA: r × (d + k) parameters

Example (d = 4096, k = 4096, r = 16):
Original: 16.7M parameters
LoRA: 131K parameters
Savings: ~128×
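
The savings figure can be checked with a few lines of Python (function name is ours):

```python
def param_counts(d, k, r):
    """Return (full, lora) trainable-parameter counts for one d×k weight matrix."""
    full = d * k          # full fine-tuning updates the whole matrix
    lora = r * (d + k)    # LoRA trains only B (d×r) and A (r×k)
    return full, lora

full, lora = param_counts(4096, 4096, 16)
print(full, lora, full // lora)  # 16777216 131072 128
```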

LoRA Configuration

```python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,                 # Rank
    lora_alpha=32,        # Scaling factor
    target_modules=[      # Which layers to apply LoRA to
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, config)
```

LoRA Hyperparameters

Rank (r):

  • Low (4-8): Simple tasks, little data
  • Medium (16-32): General use
  • High (64-128): Complex adaptation

Alpha:

  • Generally alpha = 2 × r

Target Modules:

  • Attention layers: q_proj, k_proj, v_proj, o_proj
  • MLP layers: gate_proj, up_proj, down_proj
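
To make the alpha-to-rank relationship concrete, here is a toy, pure-Python sketch of the LoRA forward pass (not the peft implementation; all names are ours). The update B(Ax) is scaled by alpha / r, so with alpha = 2 × r the low-rank contribution is simply doubled:

```python
def matvec(M, v):
    """Plain-Python matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, r, alpha):
    """y = Wx + (alpha / r) * B(Ax): frozen base output plus the scaled low-rank update."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy shapes: W is 2×2 (frozen), A is 1×2, B is 2×1, rank r = 1, alpha = 2·r
y = lora_forward([[1, 0], [0, 1]], [[1, 1]], [[2], [3]], [1, 2], r=1, alpha=2)
```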

QLoRA (Quantized LoRA)

Combination of LoRA + 4-bit quantization.

QLoRA Features

  1. 4-bit NormalFloat (NF4): Custom quantization format
  2. Double Quantization: Quantizing quantization constants
  3. Paged Optimizers: Page optimizer state between GPU and CPU memory to absorb memory spikes

QLoRA Memory Comparison

Method         | 7B Model | 70B Model
Full FT (FP32) | 28 GB    | 280 GB
Full FT (FP16) | 14 GB    | 140 GB
LoRA (FP16)    | 12 GB    | 120 GB
QLoRA (4-bit)  | 6 GB     | 48 GB

QLoRA Implementation

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```

Other PEFT Methods

Prefix Tuning

Prepends learnable prefix vectors to the attention inputs at every layer; conceptually:

Input: [PREFIX_1, PREFIX_2, ..., PREFIX_N, token_1, token_2, ...]

Prompt Tuning

Learning soft prompts:

[SOFT_PROMPT] + "Actual input text"

Adapter Layers

Adding small networks between transformer layers:

Attention → Adapter → LayerNorm → FFN → Adapter → LayerNorm
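
A minimal pure-Python sketch of one bottleneck adapter (names and the ReLU choice are illustrative): project down to a small bottleneck, apply a nonlinearity, project back up, and add the residual.

```python
def matvec(M, v):
    """Plain-Python matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def adapter_block(h, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, then residual add."""
    z = matvec(W_down, h)                 # d -> r bottleneck
    z = [max(0.0, v) for v in z]          # nonlinearity
    up = matvec(W_up, z)                  # r -> d
    return [hi + ui for hi, ui in zip(h, up)]

out = adapter_block([1.0, -1.0], W_down=[[1, 0]], W_up=[[1], [1]])
```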

(IA)³ - Infused Adapter

Multiplying activations with learned vectors:

output = activation × learned_vector

Data Preparation

Data Formats

Instruction Format:

```json
{
  "instruction": "Summarize this text",
  "input": "Long text...",
  "output": "Summary..."
}
```

Chat Format:

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Question..."},
    {"role": "assistant", "content": "Answer..."}
  ]
}
```
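
Datasets often arrive in instruction format but are trained in chat format; a small converter along these lines (function name and default system prompt are ours) keeps the two consistent:

```python
def instruction_to_chat(record, system_prompt="You are a helpful assistant"):
    """Convert an instruction-format record into chat-format messages."""
    user = record["instruction"]
    if record.get("input"):
        user += "\n\n" + record["input"]
    return {"messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user},
        {"role": "assistant", "content": record["output"]},
    ]}

chat = instruction_to_chat(
    {"instruction": "Summarize this text", "input": "Long text...", "output": "Summary..."}
)
```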

Data Quality

Good Data Characteristics:

  • Diversity: varied examples, topics, and phrasings
  • Consistency: a uniform format across examples
  • Accuracy: correct labels and outputs
  • Sufficient quantity: usually 1K-100K examples

Data Augmentation

```python
# Paraphrasing
augmented_data = paraphrase(original_data)

# Back-translation
translated = translate(text, "tr")
back_translated = translate(translated, "en")

# Synonym replacement
augmented = replace_synonyms(text)
```

Training Strategies

Hyperparameter Selection

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    learning_rate=2e-4,               # Typical for LoRA
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,    # Effective batch size = 4 × 4 = 16
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    evaluation_strategy="epoch",
)
```

Learning Rate

  • Full fine-tuning: 1e-5 - 5e-5
  • LoRA: 1e-4 - 3e-4
  • QLoRA: 2e-4 - 5e-4
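
A warmup-plus-cosine schedule like the one configured above can be sketched in a few lines (function name is ours): the learning rate ramps up linearly over the warmup steps, then decays to zero along a cosine curve.

```python
import math

def lr_at(step, total_steps, base_lr=2e-4, warmup_ratio=0.03):
    """Linear warmup followed by cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```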

Regularization

```python
# Weight decay
weight_decay=0.01

# Dropout
lora_dropout=0.05

# Gradient clipping
max_grad_norm=1.0
```

Evaluation and Validation

Metrics

Perplexity:

PPL = exp(average cross-entropy loss)

Lower is better.
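
For instance, computing perplexity from per-token cross-entropy losses (helper name is ours):

```python
import math

def perplexity(token_losses):
    """PPL = exp(mean cross-entropy loss over tokens); lower is better."""
    return math.exp(sum(token_losses) / len(token_losses))
```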

BLEU/ROUGE: Text generation quality

Task-specific: Accuracy, F1, custom metrics

Detecting Overfitting

Train loss ↓ + Validation loss ↑ = Overfitting

Solutions:
- Early stopping
- More dropout
- Data augmentation
- Fewer epochs
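
Early stopping amounts to tracking the best validation loss with a patience counter; a minimal sketch (class name is ours):

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` evals."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```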

Deployment

Model Merging

Merging LoRA adapter into base model:

```python
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_model")
```

Multi-Adapter Serving

Multiple adapters with a single base model:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("base")
model_a = PeftModel.from_pretrained(base_model, "adapter_a")
model_b = PeftModel.from_pretrained(base_model, "adapter_b")
```

Enterprise Fine-Tuning Pipeline

```text
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Data     │────▶│  Training   │────▶│ Evaluation  │
│ Preparation │     │ (LoRA/QLoRA)│     │  & Testing  │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
                    ┌─────────────┐     ┌──────▼──────┐
                    │ Production  │◀────│    Model    │
                    │ Deployment  │     │  Registry   │
                    └─────────────┘     └─────────────┘
```

Common Issues and Solutions

1. Out of Memory

Solution: QLoRA, gradient checkpointing, reducing batch size

2. Catastrophic Forgetting

Solution: Lower learning rate, replay buffer, elastic weight consolidation

3. Overfitting

Solution: More data, regularization, early stopping

4. Poor Generalization

Solution: Increasing data diversity, instruction diversity

Conclusion

Fine-tuning is the most effective way to adapt pre-trained models to enterprise needs. Powerful customizations can be made even with limited resources using PEFT methods like LoRA and QLoRA.

At Veni AI, we provide consultancy and implementation services for enterprise fine-tuning projects. Contact us for your needs.
