Model Card for Model ID
Model Details
This project fine-tunes Microsoft's Phi-2 language model on the Nemotron-Personas-India dataset using LoRA, a parameter-efficient fine-tuning (PEFT) method. The base model is loaded with 4-bit NF4 quantization through BitsAndBytes, which reduces memory consumption so that training and inference remain feasible on limited hardware.
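To make the memory saving concrete, the following is a rough back-of-the-envelope sketch of the weight storage alone. It assumes about 2.7 billion parameters for Phi-2, treats every parameter as quantized (in practice some layers stay in higher precision), and uses the roughly 0.127 bits per parameter of overhead that the QLoRA paper reports for double-quantized constants. Activations, the KV cache, and optimizer state are not included.

```python
# Rough weight-memory estimate; all figures are approximations, not measurements.
PHI2_PARAMS = 2.7e9  # approximate parameter count of Phi-2 (assumption)


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Return storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# FP16 stores 16 bits per weight.
fp16_gb = weight_memory_gb(PHI2_PARAMS, 16)

# NF4 stores 4 bits per weight plus quantization constants; with double
# quantization the constants cost about 0.127 extra bits per weight.
nf4_gb = weight_memory_gb(PHI2_PARAMS, 4 + 0.127)

print(f"FP16 weights: {fp16_gb:.2f} GB")
print(f"NF4 weights:  {nf4_gb:.2f} GB")
```

Under these assumptions the quantized weights need roughly a quarter of the FP16 footprint.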
Model Description
- Developed by: Sachin Singh
- Model type: Causal Language Model
- Base model: Phi-2
- Language(s): English
- Quantization: 4-bit NF4 (BitsAndBytes)
- Fine-tuning method: LoRA (PEFT)
- Dataset: NVIDIA Nemotron-Personas-India (en_IN split)
Model Sources
- Base Model: microsoft/phi-2
- Dataset: nvidia/Nemotron-Personas-India
Direct Use
This model is intended for:
- Persona-conditioned text generation
- Instruction-following experiments
- Low-memory LLM deployment research
- Quantization benchmarking
- LoRA fine-tuning demonstrations
- LLM performance analytics studies
Downstream Use
The fine-tuned model can serve as a foundation for:
- Persona-based conversational agents
- Lightweight chatbot deployments
- LLM optimization research
- Quantization and efficiency studies
Out-of-Scope Use
This model is not intended for:
- Medical advice
- Legal advice
- Financial decision making
- Safety-critical systems
- High-risk automated decision systems
Bias, Risks, and Limitations
The model inherits limitations from:
- The Phi-2 base model
- The Nemotron-Personas-India dataset
- Quantization-induced approximation errors
- Limited fine-tuning duration
Generated responses may contain inaccuracies, hallucinations, biases, or incomplete information.
How to Get Started with the Model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Base checkpoint (this snippet loads the quantized base model only)
model_id = "microsoft/phi-2"

# 4-bit NF4 quantization with FP16 compute and double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" places the quantized weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
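The snippet above loads the quantized base model. A minimal sketch of attaching a LoRA adapter and generating text might look like the following. The helper names (`build_prompt`, `attach_adapter`, `generate_reply`) are hypothetical, and this card does not state an adapter location or the prompt template used in training; the `Instruct: ... Output:` layout is the QA format documented for Phi-2 and is only an assumption here.

```python
def build_prompt(instruction: str) -> str:
    """Format an instruction in the 'Instruct: ... Output:' style (assumed template)."""
    return f"Instruct: {instruction.strip()}\nOutput:"


def attach_adapter(base_model, adapter_id: str):
    """Wrap the quantized base model with LoRA adapter weights via PEFT."""
    from peft import PeftModel  # imported lazily; requires the peft package

    return PeftModel.from_pretrained(base_model, adapter_id)


def generate_reply(model, tokenizer, instruction: str, max_new_tokens: int = 128) -> str:
    """Generate a reply and return only the newly generated text."""
    import torch  # imported lazily so build_prompt works without a GPU stack

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Drop the prompt tokens so only the continuation is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

With `model` and `tokenizer` from the loading code, usage would be along the lines of `model = attach_adapter(model, "<path-or-repo-of-your-adapter>")` followed by `print(generate_reply(model, tokenizer, "Introduce yourself."))`, where the adapter path is a placeholder to fill in.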
Training Details
Training Data
The model is fine-tuned using:
- Dataset: nvidia/Nemotron-Personas-India
- Split: en_IN
- Sample Size: 5,000 records
Persona records are transformed into instruction-response training examples before fine-tuning.
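The exact transformation is not described in this card. As an illustrative sketch only, one way to turn a persona record into an instruction-response example is shown below. The field names `occupation` and `persona`, the instruction wording, and the sample record are all assumptions made up for this example; check the dataset schema before adapting it.

```python
def persona_to_example(record: dict) -> dict:
    """Turn one persona record into a single instruction-response training string."""
    # Field names are hypothetical; the real dataset columns may differ.
    occupation = record.get("occupation", "").strip()
    persona = record["persona"].strip()

    instruction = "Describe a person from India"
    if occupation:
        instruction += f" who works as: {occupation}"
    instruction += "."

    # 'Instruct/Output' layout as documented for Phi-2 QA prompts (assumed here).
    return {"text": f"Instruct: {instruction}\nOutput: {persona}"}


# Made-up record for illustration; not taken from the dataset.
sample = {
    "occupation": "school teacher",
    "persona": "Meera is a patient school teacher in Pune who enjoys classical music.",
}
print(persona_to_example(sample)["text"])
```

A function of this shape can be applied to every record with the `map` method of a Hugging Face `datasets.Dataset`.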
Training Hyperparameters
- Fine-tuning Method: LoRA
- Quantization: 4-bit NF4
- Epochs: 1
- Compute Type: FP16
- Double Quantization: Enabled
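Since the card lists one epoch over 5,000 records, the number of optimizer steps follows from the effective batch size. The sketch below is a rough estimate only: the per-device batch size and gradient accumulation values are hypothetical, as they are not stated here, and training frameworks may round the last partial batch differently.

```python
import math


def optimizer_steps(num_examples: int, epochs: int, per_device_batch: int,
                    grad_accum: int, num_devices: int = 1) -> int:
    """Estimate optimizer updates: one update per effective batch, rounded up per epoch."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return epochs * math.ceil(num_examples / effective_batch)


# 5,000 records and 1 epoch come from this card; batch settings are assumptions.
steps = optimizer_steps(num_examples=5000, epochs=1, per_device_batch=4, grad_accum=4)
print(steps)
```

With so few updates in a single epoch, the learning-rate schedule and warmup length have a proportionally larger effect than in longer runs.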
Summary
The project evaluates the trade-offs between model efficiency and generation capability when applying 4-bit quantization and LoRA fine-tuning to Phi-2.
Model Architecture and Objective
- Architecture: Phi-2 Transformer
- Objective: Causal Language Modeling
- Adaptation Method: LoRA
- Quantization Method: BitsAndBytes NF4 4-bit Quantization
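To illustrate why LoRA is parameter-efficient on this architecture, the sketch below counts the trainable parameters that adapters would add. The hidden size (2560) and layer count (32) follow the published Phi-2 configuration; the rank of 16 and the choice of the four attention projections as target modules are hypothetical, since this card does not list the LoRA settings.

```python
HIDDEN_SIZE = 2560   # Phi-2 hidden dimension
NUM_LAYERS = 32      # Phi-2 transformer blocks
BASE_PARAMS = 2.7e9  # approximate base parameter count


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two matrices, A (rank x d_in) and B (d_out x rank), per adapted weight."""
    return rank * (d_in + d_out)


rank = 16  # hypothetical LoRA rank
# Attention projections in the Hugging Face Phi implementation, each 2560 x 2560.
target_modules = ["q_proj", "k_proj", "v_proj", "dense"]

per_layer = sum(lora_params(HIDDEN_SIZE, HIDDEN_SIZE, rank) for _ in target_modules)
total_trainable = per_layer * NUM_LAYERS

print(f"Trainable LoRA parameters: {total_trainable:,}")
print(f"Share of base model: {100 * total_trainable / BASE_PARAMS:.2f}%")
```

Under these assumed settings, well under one percent of the model's parameters receive gradient updates, while the 4-bit base weights stay frozen.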
Compute Infrastructure
- GPU: 2 x NVIDIA T4
Citation
@misc{phi2,
title={Phi-2: The surprising power of small language models},
author={Microsoft Research}
}
Dataset
@misc{nemotron_personas_india,
title={Nemotron Personas India Dataset},
author={NVIDIA}
}
Model Card Authors
Sachin Singh