Model Card for Phi-2 LoRA Fine-Tune on Nemotron-Personas-India

Model Details

This project fine-tunes Microsoft's Phi-2 language model on the Nemotron-Personas-India dataset using LoRA, a parameter-efficient fine-tuning (PEFT) method. The model is loaded with 4-bit NF4 quantization through BitsAndBytes, which reduces memory consumption so that both training and inference fit on limited hardware.
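As a rough illustration of why 4-bit loading helps on limited hardware, the sketch below estimates the memory needed for the weights alone. It assumes Phi-2 has roughly 2.7 billion parameters and ignores activations, the KV cache, quantization constants, and any layers kept in higher precision, so real usage will be higher. `weight_memory_gb` is an illustrative helper, not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: Phi-2 has roughly 2.7 billion parameters.
PHI2_PARAMS = 2.7e9

fp16_gb = weight_memory_gb(PHI2_PARAMS, 16)  # half-precision weights
nf4_gb = weight_memory_gb(PHI2_PARAMS, 4)    # 4-bit NF4 weights, before quantization overhead

print(f"FP16 weights: ~{fp16_gb:.2f} GB")
print(f"NF4 weights:  ~{nf4_gb:.2f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the FP16 footprint, which is what leaves headroom for LoRA gradients and optimizer state on a small GPU.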

Model Description

Developed by: Sachin Singh
Model type: Causal Language Model
Base model: Phi-2
Language(s): English
Quantization: 4-bit NF4 (BitsAndBytes)
Fine-tuning method: LoRA (PEFT)
Dataset: NVIDIA Nemotron-Personas-India (en_IN split)

Model Sources

Base Model: microsoft/phi-2
Dataset: nvidia/Nemotron-Personas-India

Direct Use

This model is intended for:

Persona-conditioned text generation
Instruction-following experiments
Low-memory LLM deployment research
Quantization benchmarking
LoRA fine-tuning demonstrations
LLM performance analytics studies

Downstream Use

The fine-tuned model can serve as a foundation for:

Persona-based conversational agents
Lightweight chatbot deployments
LLM optimization research
Quantization and efficiency studies

Out-of-Scope Use

This model is not intended for:

Medical advice
Legal advice
Financial decision making
Safety-critical systems
High-risk automated decision systems

Bias, Risks, and Limitations

The model's limitations stem from:

The Phi-2 base model
The Nemotron-Personas-India dataset
Approximation errors introduced by 4-bit quantization
Limited fine-tuning (1 epoch on 5,000 records)

Generated responses may contain inaccuracies, hallucinations, biases, or incomplete information.

How to Get Started with the Model

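from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Base checkpoint; this snippet loads Phi-2 itself, without the LoRA adapter.
model_id = "microsoft/phi-2"

# 4-bit NF4 quantization with FP16 compute and double quantization,
# matching the training configuration described in this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" places the quantized weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)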
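Once `model` and `tokenizer` are loaded as above, generation could look like the following sketch. The `Instruct: ... Output:` prompt layout follows the QA format documented for the Phi-2 base model; whether the fine-tuning data used the same layout is an assumption. `build_prompt` and `generate_reply` are illustrative helper names, and the adapter path in the closing comment is a placeholder, since this card does not name a published adapter.

```python
def build_prompt(instruction: str) -> str:
    """Format an instruction in the Phi-2 'Instruct/Output' QA style (assumed here)."""
    return f"Instruct: {instruction.strip()}\nOutput:"


def generate_reply(model, tokenizer, instruction: str, max_new_tokens: int = 128) -> str:
    """Generate a completion and return only the newly generated text."""
    import torch  # imported lazily so build_prompt can be used without a GPU stack

    prompt = build_prompt(instruction)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for reproducible output
            pad_token_id=tokenizer.eos_token_id,  # avoids a warning when no pad token is set
        )
    # Drop the prompt tokens so only the model's continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# To use fine-tuned weights, a LoRA adapter can be attached with PEFT first, e.g.:
#   from peft import PeftModel
#   model = PeftModel.from_pretrained(model, "path/to/lora-adapter")  # placeholder path
#   print(generate_reply(model, tokenizer, "Describe a weekday for a school teacher in Pune."))
```

Greedy decoding is used here only to make outputs repeatable; sampling parameters can be passed to `generate` instead for more varied persona text.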

Training Details

Training Data

The model is fine-tuned using:

Dataset: nvidia/Nemotron-Personas-India
Split: en_IN
Sample Size: 5,000 records

Persona records are transformed into instruction-response training examples before fine-tuning.
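The card does not specify how persona records are mapped to training examples, so the following is only one plausible sketch. The field names `persona` and `occupation`, the instruction wording, the choice of the first 5,000 rows, and the helper names are all assumptions for illustration; the dataset card should be checked for the actual column names.

```python
def persona_to_example(record: dict) -> dict:
    """Turn one persona record into an instruction/response pair (hypothetical schema)."""
    occupation = record.get("occupation", "person")
    instruction = f"Write a short persona description for a {occupation} in India."
    response = record.get("persona", "").strip()
    return {
        "instruction": instruction,
        "response": response,
        # Single training string in the Phi-2 'Instruct/Output' style (assumed format).
        "text": f"Instruct: {instruction}\nOutput: {response}",
    }


def load_training_examples(sample_size: int = 5000) -> list:
    """Load the en_IN split and convert the first `sample_size` records (assumed sampling)."""
    from datasets import load_dataset  # lazy import: needs the `datasets` package and network access

    ds = load_dataset("nvidia/Nemotron-Personas-India", split="en_IN")
    ds = ds.select(range(min(sample_size, len(ds))))
    return [persona_to_example(row) for row in ds]


# Small offline check with a made-up record.
sample = {"occupation": "teacher", "persona": " Enjoys weekend cricket and reading. "}
example = persona_to_example(sample)
print(example["text"])
```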

Training Hyperparameters

Fine-tuning Method: LoRA
Quantization: 4-bit NF4
Epochs: 1
Compute Type: FP16
Double Quantization: Enabled
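The card lists the method but not the LoRA rank, scaling factor, dropout, or target modules. The sketch below uses placeholder values (`r=16`, `lora_alpha=32`, `lora_dropout=0.05`) and assumes the projection names used by the Transformers Phi implementation (`q_proj`, `k_proj`, `v_proj`, `dense`); none of these are confirmed by the author. It also shows how the number of trainable parameters per adapted layer follows from the rank.

```python
# Placeholder hyperparameters: the card does not state the values actually used.
LORA_SETTINGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "dense"],  # assumed module names
}


def lora_params_per_layer(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds two matrices, A (r x d_in) and B (d_out x r), per adapted linear layer."""
    return r * (d_in + d_out)


def build_peft_model(model):
    """Wrap a 4-bit quantized model with LoRA adapters using PEFT (lazy import)."""
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Readies a k-bit quantized model for training (e.g. enables input gradients).
    model = prepare_model_for_kbit_training(model)
    config = LoraConfig(task_type="CAUSAL_LM", bias="none", **LORA_SETTINGS)
    return get_peft_model(model, config)


# Example: one 2560 x 2560 projection (Phi-2's hidden size) at the placeholder rank.
print(lora_params_per_layer(2560, 2560, LORA_SETTINGS["r"]))
```

Because only the small `A` and `B` matrices are trained, the optimizer state stays a small fraction of what full fine-tuning would require.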

Summary

The project evaluates the trade-offs between model efficiency and generation capability when applying 4-bit quantization and LoRA fine-tuning to Phi-2.
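One simple way to quantify the efficiency side of this trade-off is generation throughput. The helper below is a sketch rather than the project's actual benchmark: `measure_throughput` and `fake_generate` are hypothetical names, and the callable passed in is expected to return the number of tokens it produced.

```python
import time
from typing import Callable


def measure_throughput(generate_fn: Callable[[], int], runs: int = 3) -> dict:
    """Time `generate_fn` over several runs and report tokens per second.

    `generate_fn` must run one generation and return how many new tokens it produced.
    """
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += generate_fn()
    elapsed = time.perf_counter() - start
    return {
        "runs": runs,
        "total_tokens": total_tokens,
        "seconds": elapsed,
        "tokens_per_second": total_tokens / elapsed if elapsed > 0 else float("inf"),
    }


def fake_generate() -> int:
    """Stand-in for a real model call so the helper can be tried without a GPU."""
    time.sleep(0.01)
    return 32


stats = measure_throughput(fake_generate, runs=3)
print(f"{stats['tokens_per_second']:.1f} tokens/s over {stats['runs']} runs")
```

With a real model, `generate_fn` would wrap a `model.generate` call and return the output length minus the prompt length. On a GPU, calling `torch.cuda.synchronize()` before reading the timer keeps asynchronous kernels from skewing the measurement.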

Model Architecture and Objective

Architecture: Phi-2 Transformer
Objective: Causal Language Modeling
Adaptation Method: LoRA
Quantization Method: BitsAndBytes NF4 4-bit Quantization
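For reference, the standard formulations behind these terms are given below in generic notation that is not taken from this card: x_1, ..., x_T is a token sequence, W_0 is a frozen pretrained weight matrix, and r and alpha are the LoRA rank and scaling factor.

```latex
% Causal language modeling: minimize the next-token negative log-likelihood
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)

% LoRA: the frozen weight W_0 receives a trainable low-rank update
W = W_0 + \frac{\alpha}{r} B A, \qquad
B \in \mathbb{R}^{d_{\text{out}} \times r}, \quad
A \in \mathbb{R}^{r \times d_{\text{in}}}, \quad
r \ll \min(d_{\text{in}}, d_{\text{out}})
```

In a setup like the one described here, W_0 would be stored in 4-bit NF4 and dequantized to the FP16 compute type for matrix multiplications, while only A and B receive gradient updates.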

Compute Infrastructure

Hardware: 2 x NVIDIA T4 GPUs

Citation

@misc{phi2,
  title={Phi-2: The surprising power of small language models},
  author={Microsoft Research}
}

Dataset

@misc{nemotron_personas_india,
  title={Nemotron Personas India Dataset},
  author={NVIDIA}
}

Model Card Authors

Sachin Singh

Model in Notebook

[More Information Needed]
