Banking Multilingual Intent Classifier

  • Repository: learn-abc/banking-multilingual-intent-classifier
  • Base Model: google/muril-base-cased
  • Task: Multilingual Intent Classification (Banking Domain)
  • Languages: English, Bangla (bn), Bangla Latin (bn-latn), Code-Mixed

Model Overview

This model is a multilingual banking intent classifier fine-tuned on a balanced English–Bangla–Banglish dataset derived from Banking77 and extended with synthetic code-mixed augmentation.

It is designed for:

  • AI banking assistants
  • Multilingual chatbots
  • Voice-to-intent pipelines
  • Intent routing systems
  • Hybrid Bangla-English financial applications

Supported Intents (14 Classes)

ACCOUNT_INFO
ATM_SUPPORT
CARD_ISSUE
CARD_MANAGEMENT
CARD_REPLACEMENT
CHECK_BALANCE
EDIT_PERSONAL_DETAILS
FAILED_TRANSFER
FALLBACK
FEES
GREETING
LOST_OR_STOLEN_CARD
MINI_STATEMENT
TRANSFER
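
The label set is best read from the model config at runtime rather than hard-coded; the sketch below shows the idea, but the id-to-label order is illustrative only and should be taken from `model.config.id2label` in practice.

```python
# The 14 intent labels supported by the classifier. The id -> label
# mapping below is illustrative; in a real pipeline it should be read
# from model.config.id2label rather than hard-coded.
INTENT_LABELS = [
    "ACCOUNT_INFO", "ATM_SUPPORT", "CARD_ISSUE", "CARD_MANAGEMENT",
    "CARD_REPLACEMENT", "CHECK_BALANCE", "EDIT_PERSONAL_DETAILS",
    "FAILED_TRANSFER", "FALLBACK", "FEES", "GREETING",
    "LOST_OR_STOLEN_CARD", "MINI_STATEMENT", "TRANSFER",
]

id2label = {i: label for i, label in enumerate(INTENT_LABELS)}
label2id = {label: i for i, label in id2label.items()}
```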

Dataset Details

Total Samples

66,768

Language Distribution

  • English (en): 22,256
  • Bangla (bn): 22,256
  • Bangla Latin (bn-latn): 22,256

Code-Mixed Augmentation

  • 2,500 synthetic code-mixed examples added

Final Training Split

  • Train: 63,306
  • Test: 13,854

Training Configuration

  • Base Model: google/muril-base-cased
  • Architecture: BertForSequenceClassification
  • Epochs: 7
  • Class weights applied to address imbalance
  • Tokenizer: MuRIL tokenizer
  • Framework: Hugging Face Transformers

Note: The classification head is newly initialized when adapting the base MuRIL encoder, so loading warnings about newly initialized classifier weights are expected.
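
Class weights of the kind mentioned above are commonly derived from inverse label frequency; a minimal sketch (the label counts below are made up for illustration and are not the real dataset counts):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Compute per-class weights inversely proportional to class frequency.

    Weight for class c = total_samples / (num_classes * count_c), so rare
    classes get weights > 1 and frequent classes get weights < 1.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {c: total / (num_classes * n) for c, n in counts.items()}

# Illustrative skewed distribution (not the real dataset counts)
labels = ["TRANSFER"] * 600 + ["FALLBACK"] * 300 + ["GREETING"] * 100
weights = inverse_frequency_weights(labels)
```

In training, such weights would typically be converted to a tensor and passed to `torch.nn.CrossEntropyLoss(weight=...)` inside a custom loss computation.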


Evaluation Results

Overall Performance

  • Accuracy: 99.57%
  • F1 (micro): 0.9957
  • F1 (macro): 0.9959
  • Eval loss: 0.0178
  • Evaluation runtime: 10.1 seconds
  • Samples/sec: ~1,365

Language-wise Performance

  • English: 99.26%
  • Bangla: 99.80%
  • Bangla Latin: 99.62%
  • Code-Mixed: 100.00%

Multilingual Prediction Examples

  • "what is my balance" (en) → CHECK_BALANCE
  • "আমার ব্যালেন্স কত" ("how much is my balance", bn) → CHECK_BALANCE
  • "amar balance koto ache" ("how much balance do I have", bn-latn) → CHECK_BALANCE
  • "আমার balance দেখাও" ("show my balance", code-mixed) → CHECK_BALANCE
  • "card ta hariye geche" ("the card has been lost", bn-latn) → LOST_OR_STOLEN_CARD
  • "weather kemon" ("how's the weather", code-mixed) → FALLBACK

All tested predictions returned high confidence (~1.000).


Intended Use Cases

  • Banking chatbot intent routing
  • Voice assistant → STT → Intent classification
  • Multilingual customer support
  • Code-mixed South Asian applications
  • Fintech AI pipelines

Limitations

  1. Domain-specific: Focused only on banking intents.
  2. Synthetic augmentation: Code-mixed data partially generated programmatically.
  3. Overconfidence: Softmax confidence may saturate near 1.0.
  4. Not tested on adversarial or out-of-distribution queries.
  5. Classification only: not designed to generate responses.
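
The overconfidence noted in (3) follows directly from the softmax: once one logit dominates by a few units, the winning probability saturates near 1.0. A small illustration in plain Python:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A modest logit gap already yields near-certain confidence
probs = softmax([8.0, 1.0, 0.5, 0.2])
print(round(max(probs), 4))  # -> 0.9981
```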

Architecture Notes

  • Based on MuRIL, optimized for Indian languages.
  • Classification head added on top of encoder.
  • Some warnings regarding unexpected/missing keys are normal due to task adaptation.
  • Class weights applied to handle skewed distribution.

Bias & Fairness

  • Balanced across the three language representations (en, bn, bn-latn).
  • Augmented for code-mixed robustness.
  • May not generalize to:
    • Non-banking domains
    • Slang-heavy dialects outside the training distribution

Example Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "learn-abc/banking-multilingual-intent-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Prediction function
def predict_intent(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model(**inputs)
        prediction = torch.argmax(outputs.logits, dim=-1).item()
        confidence = torch.softmax(outputs.logits, dim=-1)[0][prediction].item()
    
    predicted_intent = model.config.id2label[prediction]
    
    return {
        "intent": predicted_intent,
        "confidence": confidence
    }

# Example usage - English
result = predict_intent("what is my balance")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.99

# Example usage - Bangla
result = predict_intent("আমার ব্যালেন্স কত")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.98

# Example usage - Banglish (Romanized)
result = predict_intent("amar balance koto ache")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.97

# Example usage - Code-mixed
result = predict_intent("আমার last 10 transaction দেখাও")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: MINI_STATEMENT, Confidence: 0.98

Production Recommendations

For real-world deployment:

  • Add a confidence-threshold fallback
  • Add an out-of-distribution (OOD) detector
  • Combine with:
    • An STT (speech-to-text) system
    • An intent router
    • A business rule engine
  • Log misclassifications for continual fine-tuning
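
A confidence-threshold fallback of the kind recommended above can be a thin wrapper around the classifier; a minimal sketch (the 0.85 threshold and the `predict_fn` interface are illustrative, not part of the released model):

```python
def route_intent(predict_fn, text, threshold=0.85):
    """Route to the predicted intent only when confidence clears the
    threshold; otherwise fall back to FALLBACK for human/OOD handling.

    predict_fn is any callable returning {"intent": str, "confidence": float},
    such as the predict_intent function from the usage example above.
    """
    result = predict_fn(text)
    if result["confidence"] < threshold:
        return {"intent": "FALLBACK",
                "confidence": result["confidence"],
                "routed": False}
    return {**result, "routed": True}

# Illustrative stub standing in for the real model
def fake_predict(text):
    return {"intent": "CHECK_BALANCE", "confidence": 0.42}

print(route_intent(fake_predict, "hmm?"))  # low confidence -> FALLBACK
```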


Summary

This model achieves high multilingual intent classification accuracy (99.57% overall on the held-out test set) for banking-specific queries across:

  • English
  • Bangla (native script)
  • Bangla Latin
  • Code-mixed variants

It is optimized for fintech AI systems targeting South Asian multilingual users.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact Me

For any inquiries or support, please reach out to:

