Model Card: Phi-2 Fine-Tuned with LoRA on Nemotron-Personas-India

Model Details

This project fine-tunes Microsoft's Phi-2 language model on the Nemotron-Personas-India dataset using LoRA, a parameter-efficient fine-tuning (PEFT) method. The base model is loaded with 4-bit NF4 quantization through BitsAndBytes, which reduces memory consumption enough to train and run inference on limited hardware.

Model Description

  • Developed by: Sachin Singh
  • Model type: Causal Language Model
  • Base model: Phi-2
  • Language(s): English
  • Quantization: 4-bit NF4 (BitsAndBytes)
  • Fine-tuning method: LoRA (PEFT)
  • Dataset: NVIDIA Nemotron-Personas-India (en_IN split)

Model Sources

  • Base Model: microsoft/phi-2
  • Dataset: nvidia/Nemotron-Personas-India

Direct Use

This model is intended for:

  • Persona-conditioned text generation
  • Instruction-following experiments
  • Low-memory LLM deployment research
  • Quantization benchmarking
  • LoRA fine-tuning demonstrations
  • LLM performance analytics studies

Downstream Use

The fine-tuned model can serve as a foundation for:

  • Persona-based conversational agents
  • Lightweight chatbot deployments
  • LLM optimization research
  • Quantization and efficiency studies

Out-of-Scope Use

This model is not intended for:

  • Medical advice
  • Legal advice
  • Financial decision making
  • Safety-critical systems
  • High-risk automated decision systems

Bias, Risks, and Limitations

The model inherits limitations from:

  • The Phi-2 base model
  • The Nemotron-Personas-India dataset
  • Quantization-induced approximation errors
  • Limited fine-tuning duration (1 epoch on 5,000 records)

Generated responses may contain inaccuracies, hallucinations, biases, or incomplete information.

How to Get Started with the Model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Base checkpoint. This snippet loads the quantized base model only.
model_id = "microsoft/phi-2"

# 4-bit NF4 weights with double quantization; computation runs in FP16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" places the weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
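
Once a model and tokenizer are loaded as above, a generation call could look like the following sketch. The helper name generate_reply, the example prompt, and the default of 64 new tokens are illustrative assumptions, not settings taken from this project.

```python
def generate_reply(model, tokenizer, prompt, max_new_tokens=64):
    """Return only the newly generated text for `prompt`.

    `model` and `tokenizer` are expected to be already-loaded
    Hugging Face objects, such as the ones created above.
    """
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns prompt + continuation; slice off the prompt tokens.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example usage (hypothetical prompt, using Phi-2's "Instruct/Output" style):
# print(generate_reply(model, tokenizer, "Instruct: Introduce yourself.\nOutput:"))
```

The helper takes the model and tokenizer as arguments, so it works the same way whether or not a LoRA adapter has been attached to the base model.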

Training Details

Training Data

The model is fine-tuned using:

  • Dataset: nvidia/Nemotron-Personas-India
  • Split: en_IN
  • Sample Size: 5,000 records

Persona records are transformed into instruction-response training examples before fine-tuning.
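
The exact transformation is not documented in this card. As a minimal sketch of what such a step might look like, the code below maps one record to an instruction-response pair and then to a single training string. The field names "persona" and "occupation", the instruction wording, and the "Instruct/Output" template (the QA prompt style described for Phi-2) are all assumptions made for illustration.

```python
def persona_to_example(record):
    """Turn one persona record into an instruction-response pair.

    The keys "occupation" and "persona" are hypothetical field names;
    adjust them to the actual dataset schema.
    """
    occupation = record.get("occupation", "person")
    persona = record["persona"].strip()
    instruction = f"Describe a {occupation} from India in a few sentences."
    return {"instruction": instruction, "response": persona}


def to_training_text(example):
    """Render a pair as one training string (assumed prompt template)."""
    return f"Instruct: {example['instruction']}\nOutput: {example['response']}"


# Usage with a made-up record:
record = {"occupation": "teacher", "persona": " A patient teacher. "}
print(to_training_text(persona_to_example(record)))
```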

Training Hyperparameters

  • Fine-tuning Method: LoRA
  • Quantization: 4-bit NF4
  • Epochs: 1
  • Compute Type: FP16
  • Double Quantization: Enabled
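
The LoRA rank and target modules are not listed above. To illustrate why LoRA counts as parameter-efficient, the sketch below counts trainable adapter parameters under assumed settings: rank 16, adapters on three attention projections per layer, and a hidden size of 2560 with 32 layers, as in the published Phi-2 configuration. The rank and the choice of adapted matrices are assumptions, not this project's settings.

```python
def lora_params_per_matrix(d_in, d_out, rank):
    """LoRA learns A (rank x d_in) and B (d_out x rank) per adapted weight."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 2560        # Phi-2 hidden size
NUM_LAYERS = 32           # Phi-2 transformer layers
RANK = 16                 # assumption: not stated in this card
ADAPTED_PER_LAYER = 3     # assumption: query, key and value projections
BASE_PARAMS = 2.7e9       # approximate Phi-2 parameter count

trainable = NUM_LAYERS * ADAPTED_PER_LAYER * lora_params_per_matrix(
    HIDDEN_SIZE, HIDDEN_SIZE, RANK
)
print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of base model: {trainable / BASE_PARAMS:.3%}")
```

Under these assumptions the adapters amount to well under 1% of the base model's parameters, while the quantized base weights stay frozen.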

Summary

The project evaluates the trade-offs between model efficiency and generation capability when applying 4-bit quantization and LoRA fine-tuning to Phi-2.

Model Architecture and Objective

  • Architecture: Phi-2 Transformer
  • Objective: Causal Language Modeling
  • Adaptation Method: LoRA
  • Quantization Method: BitsAndBytes NF4 4-bit Quantization
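
A back-of-the-envelope estimate shows why 4-bit NF4 matters on small GPUs. The sketch below compares weight storage for roughly 2.7 billion parameters in FP16 and in NF4, with and without double quantization. It assumes a quantization block size of 64 and a second-level block size of 256, and it treats all weights as quantized; it ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Storage for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 2.7e9          # approximate Phi-2 parameter count
BLOCK = 64                # assumed NF4 block size
SECOND_LEVEL_BLOCK = 256  # assumed block size for double quantization

# One 32-bit scaling constant per block of 64 weights.
overhead_plain = 32 / BLOCK
# Double quantization stores the constants in 8 bits, plus one 32-bit
# constant per 256 first-level constants.
overhead_double = 8 / BLOCK + 32 / (BLOCK * SECOND_LEVEL_BLOCK)

fp16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, 4 + overhead_plain)
nf4_dq_gb = weight_memory_gb(N_PARAMS, 4 + overhead_double)

print(f"FP16 weights:              {fp16_gb:.2f} GB")
print(f"NF4 weights:               {nf4_gb:.2f} GB")
print(f"NF4 + double quantization: {nf4_dq_gb:.2f} GB")
```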

Compute Infrastructure

2× NVIDIA T4 GPUs

Citation

@misc{phi2,
  title={Phi-2: The surprising power of small language models},
  author={Microsoft Research}
}

Dataset

@misc{nemotron_personas_india,
  title={Nemotron Personas India Dataset},
  author={NVIDIA}
}

Model Card Authors

Sachin Singh

Model in Notebook

[More Information Needed]
