Banking Multilingual Intent Classifier

  • Repository: learn-abc/banking-multilingual-intent-classifier
  • Base Model: google/muril-base-cased
  • Task: Multilingual Intent Classification (Banking Domain)
  • Languages: English, Bangla (bn), Bangla Latin (bn-latn), Code-Mixed

Model Overview

This model is a multilingual banking intent classifier fine-tuned on a balanced English–Bangla–Banglish dataset derived from Banking77 and extended with synthetic code-mixed augmentation.

It is designed for:

  • AI banking assistants
  • Multilingual chatbots
  • Voice-to-intent pipelines
  • Intent routing systems
  • Hybrid Bangla-English financial applications

Supported Intents (14 Classes)

ACCOUNT_INFO
ATM_SUPPORT
CARD_ISSUE
CARD_MANAGEMENT
CARD_REPLACEMENT
CHECK_BALANCE
EDIT_PERSONAL_DETAILS
FAILED_TRANSFER
FALLBACK
FEES
GREETING
LOST_OR_STOLEN_CARD
MINI_STATEMENT
TRANSFER
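
The label set is best read from the model config at runtime rather than hard-coded; the sketch below shows the idea, but the id-to-label order is illustrative only and should be taken from `model.config.id2label` in practice.

```python
# The 14 intent labels supported by the classifier. The id -> label
# mapping below is illustrative; in a real pipeline it should be read
# from model.config.id2label rather than hard-coded.
INTENT_LABELS = [
    "ACCOUNT_INFO", "ATM_SUPPORT", "CARD_ISSUE", "CARD_MANAGEMENT",
    "CARD_REPLACEMENT", "CHECK_BALANCE", "EDIT_PERSONAL_DETAILS",
    "FAILED_TRANSFER", "FALLBACK", "FEES", "GREETING",
    "LOST_OR_STOLEN_CARD", "MINI_STATEMENT", "TRANSFER",
]

id2label = {i: label for i, label in enumerate(INTENT_LABELS)}
label2id = {label: i for i, label in id2label.items()}
```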

Dataset Details

Total Samples

66,768

Language Distribution

  • English (en): 22,256
  • Bangla (bn): 22,256
  • Bangla Latin (bn-latn): 22,256

Code-Mixed Augmentation

  • 2,500 synthetic code-mixed examples added

Final Training Split

  • Train: 63,306
  • Test: 13,854

Training Configuration

  • Base Model: google/muril-base-cased
  • Architecture: BertForSequenceClassification
  • Epochs: 7
  • Class weights applied to address imbalance
  • Tokenizer: MuRIL tokenizer
  • Framework: Hugging Face Transformers

Note: The classification head is newly initialized when adapting the base MuRIL encoder, so loading warnings about newly initialized classifier weights are expected.
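
Class weights of the kind mentioned above are commonly derived from inverse label frequency; a minimal sketch (the label counts below are made up for illustration and are not the real dataset counts):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Compute per-class weights inversely proportional to class frequency.

    Weight for class c = total_samples / (num_classes * count_c), so rare
    classes get weights > 1 and frequent classes get weights < 1.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {c: total / (num_classes * n) for c, n in counts.items()}

# Illustrative skewed distribution (not the real dataset counts)
labels = ["TRANSFER"] * 600 + ["FALLBACK"] * 300 + ["GREETING"] * 100
weights = inverse_frequency_weights(labels)
```

In training, such weights would typically be converted to a tensor and passed to `torch.nn.CrossEntropyLoss(weight=...)` inside a custom loss computation.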


Evaluation Results

Overall Performance

  • Accuracy: 99.57%
  • F1 (micro): 0.9957
  • F1 (macro): 0.9959
  • Eval loss: 0.0178
  • Evaluation runtime: 10.1 seconds
  • Samples/sec: ~1,365

Language-wise Performance

  • English: 99.26%
  • Bangla: 99.80%
  • Bangla Latin: 99.62%
  • Code-Mixed: 100.00%

Multilingual Prediction Examples

  • "what is my balance" (en) → CHECK_BALANCE
  • "আমার ব্যালেন্স কত" ("how much is my balance", bn) → CHECK_BALANCE
  • "amar balance koto ache" ("how much balance do I have", bn-latn) → CHECK_BALANCE
  • "আমার balance দেখাও" ("show my balance", code-mixed) → CHECK_BALANCE
  • "card ta hariye geche" ("the card has been lost", bn-latn) → LOST_OR_STOLEN_CARD
  • "weather kemon" ("how's the weather", code-mixed) → FALLBACK

All tested predictions returned high confidence (~1.000).


Intended Use Cases

  • Banking chatbot intent routing
  • Voice assistant → STT → Intent classification
  • Multilingual customer support
  • Code-mixed South Asian applications
  • Fintech AI pipelines

Limitations

  1. Domain-specific: Focused only on banking intents.
  2. Synthetic augmentation: Code-mixed data partially generated programmatically.
  3. Overconfidence: Softmax confidence may saturate near 1.0.
  4. Not tested on adversarial or out-of-distribution queries.
  5. Classification only: not designed to generate responses.
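
The overconfidence noted in (3) follows directly from the softmax: once one logit dominates by a few units, the winning probability saturates near 1.0. A small illustration in plain Python:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A modest logit gap already yields near-certain confidence
probs = softmax([8.0, 1.0, 0.5, 0.2])
print(round(max(probs), 4))  # -> 0.9981
```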

Architecture Notes

  • Based on MuRIL, optimized for Indian languages.
  • Classification head added on top of encoder.
  • Some warnings regarding unexpected/missing keys are normal due to task adaptation.
  • Class weights applied to handle skewed distribution.

Bias & Fairness

  • Balanced across the three language representations (en, bn, bn-latn).
  • Augmented for code-mixed robustness.
  • May not generalize to:
    • Non-banking domains
    • Slang-heavy dialects outside the training distribution

Example Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "learn-abc/banking-multilingual-intent-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Prediction function
def predict_intent(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model(**inputs)
        prediction = torch.argmax(outputs.logits, dim=-1).item()
        confidence = torch.softmax(outputs.logits, dim=-1)[0][prediction].item()
    
    predicted_intent = model.config.id2label[prediction]
    
    return {
        "intent": predicted_intent,
        "confidence": confidence
    }

# Example usage - English
result = predict_intent("what is my balance")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.99

# Example usage - Bangla
result = predict_intent("আমার ব্যালেন্স কত")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.98

# Example usage - Banglish (Romanized)
result = predict_intent("amar balance koto ache")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.97

# Example usage - Code-mixed
result = predict_intent("আমার last 10 transaction দেখাও")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: MINI_STATEMENT, Confidence: 0.98

Production Recommendations

For real-world deployment:

  • Add a confidence-threshold fallback
  • Add an out-of-distribution (OOD) detector
  • Combine with:
    • An STT (speech-to-text) system
    • An intent router
    • A business rule engine
  • Log misclassifications for continual fine-tuning
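
A confidence-threshold fallback of the kind recommended above can be a thin wrapper around the classifier; a minimal sketch (the 0.85 threshold and the `predict_fn` interface are illustrative, not part of the released model):

```python
def route_intent(predict_fn, text, threshold=0.85):
    """Route to the predicted intent only when confidence clears the
    threshold; otherwise fall back to FALLBACK for human/OOD handling.

    predict_fn is any callable returning {"intent": str, "confidence": float},
    such as the predict_intent function from the usage example above.
    """
    result = predict_fn(text)
    if result["confidence"] < threshold:
        return {"intent": "FALLBACK",
                "confidence": result["confidence"],
                "routed": False}
    return {**result, "routed": True}

# Illustrative stub standing in for the real model
def fake_predict(text):
    return {"intent": "CHECK_BALANCE", "confidence": 0.42}

print(route_intent(fake_predict, "hmm?"))  # low confidence -> FALLBACK
```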


Summary

This model achieves high multilingual intent classification accuracy (99.57% overall on the held-out test set) for banking-specific queries across:

  • English
  • Bangla (native script)
  • Bangla Latin
  • Code-mixed variants

It is optimized for fintech AI systems targeting South Asian multilingual users.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact Me

For any inquiries or support, please reach out to:

