---
title: Napolab Leaderboard
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: true
python_version: '3.10'
tags:
  - nlp
  - portuguese
  - benchmarking
  - language-models
  - gradio
datasets:
  - ruanchaves/napolab
  - assin
  - assin2
  - ruanchaves/hatebr
  - ruanchaves/faquad-nli
short_description: The Natural Portuguese Language Benchmark
---
# Napolab Leaderboard - Gradio App

A comprehensive Gradio web application for exploring and benchmarking Portuguese language models using the Napolab dataset collection.
## Features

- **Benchmark Results**: Single comprehensive table with one column per dataset and clickable model links
- **Model Analysis**: Radar chart showing model performance across all datasets
- **About**: Information about Napolab and citation details
## Installation

- Navigate to the leaderboard directory:

  ```bash
  cd dev/napolab/leaderboard
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Extract data from external sources (optional but recommended):

  ```bash
  # Extract data from the Portuguese LLM Leaderboard
  python extract_portuguese_leaderboard.py

  # Download external models data
  python download_external_models.py
  ```

- Run the Gradio app:

  ```bash
  python app.py
  ```
The app will be available at http://localhost:7860
## Data Extraction

The leaderboard includes scripts that automatically extract and update data from external sources:
### `extract_portuguese_leaderboard.py`

This script extracts benchmark results from the Open Portuguese LLM Leaderboard:

- Fetches data from the Hugging Face Spaces leaderboard
- Updates the `portuguese_leaderboard.csv` file
- Includes both open-source and proprietary models
- Automatically handles data formatting and validation
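The script itself defines the real data sources and columns; the sketch below only illustrates the general fetch, validate, and save flow, using a hypothetical `RESULTS_URL` and an assumed `model` column rather than the actual implementation.

```python
import pandas as pd

# Hypothetical source URL; the real script defines its own data sources and columns.
RESULTS_URL = "https://example.com/open_pt_llm_leaderboard/results.csv"


def update_portuguese_leaderboard(output_path: str = "portuguese_leaderboard.csv") -> pd.DataFrame:
    """Fetch results, keep rows with valid numeric scores, and write the CSV the app reads."""
    df = pd.read_csv(RESULTS_URL)

    # Assumed layout: a "model" column plus one numeric score column per benchmark.
    score_columns = [c for c in df.columns if c != "model"]
    df[score_columns] = df[score_columns].apply(pd.to_numeric, errors="coerce")
    df = df.dropna(subset=score_columns)  # drop rows with missing or malformed scores

    df.to_csv(output_path, index=False)
    return df


if __name__ == "__main__":
    print(update_portuguese_leaderboard().head())
```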
### `download_external_models.py`

This script downloads additional model data:

- Fetches model metadata from various sources
- Updates the `external_models.csv` file
- Includes model links and performance metrics
- Ensures data consistency with the main leaderboard
**Note:** These scripts require an internet connection and may take a few minutes to complete. Run them periodically to keep the leaderboard data up to date.
## Usage

### Benchmark Results Tab

- **Single Comprehensive Table**: Shows all models with one column per dataset
- **Dataset Columns**: Each dataset has its own column showing model performance scores
- **Average Column**: Shows the average performance across all datasets for each model
- **Model Column**: Clickable links to Hugging Face model pages
- **Sorted Results**: Models are sorted by overall average performance (descending)
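The table is built from the scores in `data.yaml`; the sketch below is only a rough illustration of the pivot, average, sort, and link steps with pandas, using made-up scores and a hypothetical model URL pattern rather than the app's actual code.

```python
import pandas as pd

# Illustrative long-format results (model, dataset, score); the app reads these from data.yaml.
results = pd.DataFrame([
    {"model": "bertimbau-large",  "dataset": "assin2", "score": 0.90},
    {"model": "bertimbau-large",  "dataset": "hatebr", "score": 0.92},
    {"model": "mdeberta-v3-base", "dataset": "assin2", "score": 0.89},
    {"model": "mdeberta-v3-base", "dataset": "hatebr", "score": 0.91},
])

# One row per model, one column per dataset.
table = results.pivot(index="model", columns="dataset", values="score")

# Average across all datasets, sorted in descending order.
table["average"] = table.mean(axis=1)
table = table.sort_values("average", ascending=False).reset_index()

# Clickable model links (a Gradio Dataframe can render a markdown-typed column).
table["model"] = table["model"].map(
    lambda name: f"[{name}](https://huggingface.co/{name})"  # hypothetical URL pattern
)

print(table)
```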
### Model Analysis Tab

- Radar chart showing each model's performance across all datasets
- **Default view**: Shows only the bertimbau-large and mdeberta-v3-base models
- **Interactive legend**: Click to show/hide models, double-click to isolate
- Each line represents one model, each point represents one dataset
- Color-coded by model architecture
- Interactive hover information with detailed performance metrics
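A minimal sketch of this kind of radar chart with Plotly is shown below; the scores are placeholders, and the app's actual figure includes more datasets, models, and hover details.

```python
import plotly.graph_objects as go

# Placeholder scores per dataset; the app builds these from data.yaml.
datasets = ["assin", "assin2", "hatebr", "faquad-nli", "porsimplessent"]
models = {
    "bertimbau-large":  [0.89, 0.90, 0.92, 0.88, 0.85],
    "mdeberta-v3-base": [0.87, 0.89, 0.91, 0.86, 0.84],
}

fig = go.Figure()
for name, scores in models.items():
    fig.add_trace(go.Scatterpolar(
        r=scores + scores[:1],           # repeat the first point to close the polygon
        theta=datasets + datasets[:1],
        name=name,
        fill="toself",
    ))

fig.update_layout(polar=dict(radialaxis=dict(range=[0, 1])), showlegend=True)
fig.show()
```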
### Model Hub Tab
- Access links to pre-trained models on Hugging Face
- Models are organized by dataset and architecture type
- Direct links to model repositories
## Supported Datasets

The app includes all Napolab datasets:

- **ASSIN**: Semantic Similarity and Textual Entailment
- **ASSIN 2**: Semantic Similarity and Textual Entailment (v2)
- **ReRelEM**: Relation recognition between named entities
- **HateBR**: Hate Speech Detection
- **ReLi-SA**: Sentiment analysis of book reviews
- **FaQuAD-NLI**: Natural Language Inference derived from the FaQuAD question answering dataset
- **PorSimplesSent**: Sentence simplification (original vs. simplified sentences)
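These datasets can also be loaded directly from the Hugging Face Hub with the `datasets` library. A small example with two of the dataset IDs listed in the Space metadata follows; config and split names vary per dataset, and some script-based datasets may need `trust_remote_code=True` with recent versions of `datasets`.

```python
from datasets import load_dataset

# Two of the datasets referenced in the Space metadata; the other
# Napolab datasets load the same way with their own Hub IDs.
hatebr = load_dataset("ruanchaves/hatebr")
faquad_nli = load_dataset("ruanchaves/faquad-nli")

print(hatebr)
print(faquad_nli)
```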
## Model Architectures

The benchmark includes models based on:

- **mDeBERTa v3**: Multilingual DeBERTa v3
- **BERT Large**: Large Portuguese BERT
- **BERT Base**: Base Portuguese BERT
## Data Management

The app uses a YAML configuration file (`data.yaml`) for all of its data, making it easy to edit and maintain.
### Editing Data

Simply edit the `data.yaml` file to:

- Add new datasets
- Update benchmark results
- Add new models
- Modify model metadata
### Data Structure

The YAML file contains four main sections:

- `datasets`: Information about each dataset
- `benchmark_results`: Performance metrics for models on datasets
- `model_metadata`: Model information (parameters, architecture, etc.)
- `additional_models`: Additional models for the Model Hub
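In Python, these sections can be read back with PyYAML; the sketch below assumes only the four top-level keys listed above and makes no assumptions about their nested layout.

```python
import yaml  # PyYAML

# Load the app's data file and pull out the four top-level sections.
with open("data.yaml", encoding="utf-8") as f:
    data = yaml.safe_load(f)

datasets = data["datasets"]
benchmark_results = data["benchmark_results"]
model_metadata = data["model_metadata"]
additional_models = data["additional_models"]

print(f"{len(datasets)} datasets, {len(model_metadata)} model metadata entries")
```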
### Data Management Tools

Use the `manage_data.py` script for data operations:

```bash
# Validate the data structure
python manage_data.py validate

# Add a new dataset
python manage_data.py add-dataset \
    --dataset-name "new_dataset" \
    --dataset-display-name "New Dataset" \
    --dataset-description "Description of the dataset" \
    --dataset-tasks "Classification" "Sentiment Analysis" \
    --dataset-url "https://huggingface.co/datasets/new_dataset"

# Add benchmark results
python manage_data.py add-benchmark \
    --dataset-name "assin" \
    --model-name "new-model" \
    --metrics "accuracy=0.92" "f1=0.91"

# Add model metadata
python manage_data.py add-model \
    --model-name "new-model" \
    --parameters 110000000 \
    --architecture "BERT Base" \
    --base-model "bert-base-uncased" \
    --task "Classification" \
    --huggingface-url "https://huggingface.co/new-model"
```
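The real `manage_data.py` ships with the repository; purely as an illustration, the sketch below shows how a small argparse CLI with `validate` and `add-benchmark` subcommands could edit `data.yaml`. The nested layout it assumes for `benchmark_results` (dataset, then model, then metrics) is a guess, not the script's actual schema.

```python
import argparse
import yaml

DATA_FILE = "data.yaml"


def load_data() -> dict:
    with open(DATA_FILE, encoding="utf-8") as f:
        return yaml.safe_load(f)


def save_data(data: dict) -> None:
    with open(DATA_FILE, "w", encoding="utf-8") as f:
        yaml.safe_dump(data, f, sort_keys=False, allow_unicode=True)


def main() -> None:
    parser = argparse.ArgumentParser(description="Manage the leaderboard's data.yaml (illustrative sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("validate")

    add_bench = sub.add_parser("add-benchmark")
    add_bench.add_argument("--dataset-name", required=True)
    add_bench.add_argument("--model-name", required=True)
    add_bench.add_argument("--metrics", nargs="+", required=True)  # e.g. "accuracy=0.92" "f1=0.91"

    args = parser.parse_args()
    data = load_data()

    if args.command == "validate":
        # Minimal check: the four expected top-level sections are present.
        missing = {"datasets", "benchmark_results", "model_metadata", "additional_models"} - data.keys()
        print("OK" if not missing else f"Missing sections: {sorted(missing)}")
    elif args.command == "add-benchmark":
        metrics = {k: float(v) for k, v in (m.split("=", 1) for m in args.metrics)}
        # Assumed layout: benchmark_results[dataset][model] = {metric: value, ...}
        data["benchmark_results"].setdefault(args.dataset_name, {})[args.model_name] = metrics
        save_data(data)
        print(f"Added {args.model_name} results for {args.dataset_name}")


if __name__ == "__main__":
    main()
```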
## Customization

To add new datasets or benchmark results:

- Edit the `data.yaml` file directly, or
- Use the `manage_data.py` script for structured additions

The app will automatically reload the data when restarted.
## Troubleshooting

- **Dataset loading errors**: Ensure you have an internet connection to access Hugging Face datasets
- **Memory issues**: Reduce the number of samples in the Dataset Explorer
- **Port conflicts**: Change the port in the `app.launch()` call
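For example, assuming the Gradio object in `app.py` is called `app` as in the call above, the port and bind address can be set explicitly; the tiny interface below is just a stand-in for the real app.

```python
import gradio as gr

# Hypothetical stand-in for the real app object defined in app.py.
app = gr.Interface(fn=lambda name: f"Hello, {name}!", inputs="text", outputs="text")

# Pick a free port explicitly to avoid "address already in use" errors;
# server_name="0.0.0.0" also makes the app reachable from other machines.
app.launch(server_name="0.0.0.0", server_port=7861)
```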
## Contributing
Feel free to contribute by:
- Adding new datasets
- Improving visualizations
- Adding new features
- Reporting bugs
## License
This project follows the same license as the main Napolab repository.