Haakkim

حَكِّم

An open, arena-style platform for evaluating Arabic large language models (LLMs) through human preference votes, built from the ground up for Arabic.

Current Snapshot: v1.0
Statistics from the first public release of Haakkim battle data:
Total battles: 1,273
BT-ranked battles: 831
Models ranked: 67
Arabic dialects covered: 11
Effective sample size (ESS, clamped weights): 582
Comparison-graph density: 0.35
MSA Leaderboard: Top 10
Bradley–Terry scores (1000-centered log-odds). The full 67-model leaderboard is available at haakkim.tech.
Rank Model BT Score 95% CI Battles
1 mistralai/ministral-3b-2512 1001.75 [1001.20, 1002.93] 40
2 mistralai/ministral-8b-2512 1001.61 [1000.72, 1002.97] 43
3 Qwen/Qwen3-235B-A22B-Thinking-2507 1001.21 [1000.47, 1002.00] 38
4 Qwen/Qwen3-30B-A3B-Instruct-2507 1001.14 [999.96, 1002.83] 31
5 deepseek/deepseek-v3.2-exp 1001.13 [1000.27, 1002.16] 38
6 deepseek/deepseek-v3.1 1000.99 [999.81, 1002.07] 29
7 Qwen/Qwen3-235B-A22B-Instruct-2507 1000.98 [1000.12, 1002.08] 39
8 deepseek/deepseek-r1-0528 1000.93 [1000.10, 1002.14] 38
9 openai/gpt-oss-120b 1000.93 [1000.04, 1002.58] 25
10 deepseek/deepseek-v3.2 1000.89 [999.86, 1002.25] 31
Score scale: Haakkim reports scores in unscaled natural-log-odds units centered at 1000. A 1-point gap corresponds to win odds of e¹ ≈ 2.7:1 (an expected win rate of about 73%), which yields a spread of roughly 4 points across the 67 ranked models. Chatbot Arena-style Elo multiplies the same log-odds by 400/ln 10 ≈ 173.7, so it encodes identical win probabilities with spreads of hundreds of points.
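Because the scores are plain natural-log-odds, converting a gap into a win probability or an Elo-style number takes one line each. This minimal sketch assumes the standard Bradley–Terry form P(A beats B) = 1 / (1 + e^-(s_A - s_B)); the helper names (`win_probability`, `win_odds`, `to_elo_scale`) and the choice to center the Elo-style scale at 1000 are illustrative assumptions, not part of Haakkim's pipeline.

```python
import math

# Elo points per natural-log-odds unit: 400 / ln(10) ≈ 173.7
ELO_PER_LOGIT = 400 / math.log(10)


def win_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that model A beats model B.

    Scores are unscaled natural-log-odds; the 1000 offset cancels out.
    """
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def win_odds(score_a: float, score_b: float) -> float:
    """Odds of A beating B: e raised to the score gap."""
    return math.exp(score_a - score_b)


def to_elo_scale(score: float, center: float = 1000.0, elo_center: float = 1000.0) -> float:
    """Re-express a 1000-centered log-odds score on an Elo-style scale.

    elo_center is a hypothetical choice; only score differences matter.
    """
    return elo_center + (score - center) * ELO_PER_LOGIT


if __name__ == "__main__":
    # Rank 1 vs rank 10 from the MSA table (rounded published values)
    top, tenth = 1001.75, 1000.89
    print(f"P(win) = {win_probability(top, tenth):.3f}")
    print(f"Elo-style gap = {to_elo_scale(top) - to_elo_scale(tenth):.1f}")
```

Using the rounded table values, the rank-1 and rank-10 models (gap 0.86) come out at roughly a 70% expected win rate, or about 149 Elo-style points.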
Arabic Dialect Coverage
11 varieties, from Modern Standard Arabic (MSA) to regional dialects across the Arab world:
MSA: 77.5%
Tunisian: 9.0%
Saudi: 6.5%
Egyptian: 3.5%
Levantine: 1.7%
Sudanese: 0.9%
Omani: 0.4%
Iraqi: 0.2%
Moroccan: <0.1%
Libyan: <0.1%
Algerian: <0.1%
Evaluation Modes
Three ways to compare Arabic LLMs; only Ranked Arena feeds the official leaderboard.

Ranked Arena

Random model pairing, single-turn MSA prompts, and the same system instruction for both models. Results feed the official Bradley–Terry leaderboard.

Scoring: Bradley–Terry leaderboard

Side-by-Side

User-selected model pair, any dialect. Useful for targeted comparisons, but excluded from ranked scoring to prevent selection bias.

Scoring: win rate only

10 Questions

Fixed pool of Arabic prompts, any dialect. Provides consistent benchmarking on a curated question set.

Scoring: win rate only
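The routing rule for the three modes (every mode can report win rates, but only Ranked Arena votes reach the Bradley–Terry fit) can be sketched as follows. The `Battle` record, its field names, and the mode strings are hypothetical and need not match the columns of the released dataset; skipped battles are assumed to have no winner.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical mode labels; the dataset's real field values may differ.
RANKED_ARENA = "ranked_arena"
SIDE_BY_SIDE = "side_by_side"
TEN_QUESTIONS = "ten_questions"


@dataclass
class Battle:
    model_a: str
    model_b: str
    winner: Optional[str]  # "a" or "b"; None marks a skipped battle
    mode: str


def split_for_scoring(battles: Iterable[Battle]) -> Tuple[List[Battle], Dict[str, List[Battle]]]:
    """Return (battles eligible for the BT fit, voted battles grouped by mode).

    Only voted Ranked Arena battles reach the Bradley-Terry fit; every mode,
    Ranked Arena included, can still be summarized with raw win rates.
    """
    bt_input: List[Battle] = []
    by_mode: Dict[str, List[Battle]] = defaultdict(list)
    for b in battles:
        if b.winner not in ("a", "b"):
            continue  # skipped battles carry no preference signal
        by_mode[b.mode].append(b)
        if b.mode == RANKED_ARENA:
            bt_input.append(b)
    return bt_input, by_mode


def win_rates(battles: Iterable[Battle]) -> Dict[str, float]:
    """Fraction of decisive battles each model won."""
    wins: Dict[str, int] = defaultdict(int)
    games: Dict[str, int] = defaultdict(int)
    for b in battles:
        games[b.model_a] += 1
        games[b.model_b] += 1
        wins[b.model_a if b.winner == "a" else b.model_b] += 1
    return {m: wins[m] / games[m] for m in games}


battles = [
    Battle("model-x", "model-y", "a", RANKED_ARENA),
    Battle("model-x", "model-y", "b", SIDE_BY_SIDE),
    Battle("model-x", "model-z", None, RANKED_ARENA),
    Battle("model-y", "model-z", "a", TEN_QUESTIONS),
]
bt_input, by_mode = split_for_scoring(battles)
print(len(bt_input), {mode: win_rates(bs) for mode, bs in by_mode.items()})
```

Keeping Side-by-Side and 10 Questions votes out of the fit while still reporting their win rates mirrors the selection-bias argument given for the Side-by-Side mode.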
Scoring Methodology
Bradley–Terry scoring with four key components:
1. Inverse-Probability Weighting

Corrects for non-uniform model exposure by weighting each vote by the inverse of its ε-greedy adaptive sampling probability, with weights clamped to the [P1, P99] percentile range.

2. Bootstrap Confidence Intervals

Each run draws 200 vote-level bootstrap resamples to produce a 95% confidence interval for every model's BT score.

3. Rankability Gate

BT scores are published only when the comparison graph is fully connected and the effective sample size (ESS) is sufficient; otherwise, win rates are shown instead.

4. Log-odds Scale

Scores are unscaled log-odds centered at 1000, so a 1-point gap corresponds to win odds of about 2.7:1. The pipeline and dataset are open, so results can be fully reproduced.
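A compact sketch of how these components could fit together: each vote carries a clamped inverse-probability weight, a weighted Bradley–Terry model is fit by maximum likelihood, scores are expressed as log-odds centered at 1000, and 95% intervals come from vote-level bootstrap resampling. This is not Haakkim's published code. The solver (plain gradient ascent with a small L2 penalty), mean-centering at 1000, the percentile-interval method, and the names `fit_bt` and `bootstrap_ci` are all assumptions.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# One decisive Ranked Arena vote: (winner, loser, clamped IPW weight)
Vote = Tuple[str, str, float]


def fit_bt(votes: List[Vote], l2: float = 0.01, iters: int = 2000) -> Dict[str, float]:
    """Weighted Bradley-Terry fit by gradient ascent on the log-likelihood.

    Returns natural-log-odds scores re-centered so their mean is 1000
    (an assumed centering). The small L2 penalty, also an assumption, keeps
    scores finite for models that never won or never lost.
    """
    models = sorted({m for winner, loser, _ in votes for m in (winner, loser)})
    degree: Dict[str, float] = defaultdict(float)
    for winner, loser, w in votes:
        degree[winner] += w
        degree[loser] += w
    # Step size from a bound on the Hessian's largest eigenvalue
    # (0.5 * max weighted degree + l2), so plain gradient ascent stays stable.
    lr = 1.0 / (0.5 * max(degree.values()) + l2)
    s = {m: 0.0 for m in models}
    for _ in range(iters):
        grad = {m: -l2 * s[m] for m in models}
        for winner, loser, w in votes:
            # derivative of w * log(sigmoid(s_winner - s_loser))
            g = w / (1.0 + math.exp(s[winner] - s[loser]))
            grad[winner] += g
            grad[loser] -= g
        for m in models:
            s[m] += lr * grad[m]
    mean = sum(s.values()) / len(s)
    return {m: 1000.0 + v - mean for m, v in s.items()}


def percentile(sorted_vals: List[float], q: float) -> float:
    """Linear-interpolation percentile of a sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_ci(votes: List[Vote], n_boot: int = 200, seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """95% percentile intervals from vote-level resampling with replacement.

    A model missing from a resample contributes no draw for that round.
    """
    rng = random.Random(seed)
    draws: Dict[str, List[float]] = defaultdict(list)
    for _ in range(n_boot):
        sample = [rng.choice(votes) for _ in votes]
        for m, score in fit_bt(sample).items():
            draws[m].append(score)
    return {m: (percentile(sorted(d), 2.5), percentile(sorted(d), 97.5)) for m, d in draws.items()}


if __name__ == "__main__":
    votes = [("A", "B", 1.0)] * 3 + [("B", "A", 1.0)] + [("B", "C", 1.0)] * 3 + [("C", "B", 1.0)]
    print(fit_bt(votes))
    print(bootstrap_ci(votes, n_boot=50))
```

Pure-Python loops keep the sketch dependency-free but are slow; with 831 ranked votes and 200 resamples, a vectorized or Newton-type solver would be the practical choice.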

Haakkim/Haakkim-1.0v: Battle Dataset

1,273 battle records in Parquet format, scrubbed of personally identifiable information (PII). The release includes both voted comparisons and skipped battles across all 11 dialects and 3 evaluation modes, along with full conversation transcripts, sampling weights, and category annotations.
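Since the dataset includes per-battle sampling weights, the weighting and rankability checks described under Scoring Methodology can be approximated from it. This sketch assumes each battle's sampling probability p is available, uses 1/p as the raw weight, clamps weights to their [P1, P99] range, and computes Kish's effective sample size (Σw)² / Σw², the usual definition (assumed, not confirmed, to match the reported clamped ESS). It also checks that the comparison graph is connected and reports its density. The function names and the `min_ess` threshold are hypothetical; the source does not state a threshold.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def percentile(vals: List[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(vals)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def clamped_ipw(probs: List[float]) -> List[float]:
    """Inverse-probability weights 1/p, clamped to their own [P1, P99] range."""
    raw = [1.0 / p for p in probs]
    lo, hi = percentile(raw, 1), percentile(raw, 99)
    return [min(max(w, lo), hi) for w in raw]


def kish_ess(weights: List[float]) -> float:
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def graph_stats(pairs: Iterable[Tuple[str, str]]) -> Tuple[bool, float]:
    """Connectivity and density of the undirected model comparison graph."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    nodes = list(adj)
    if not nodes:
        return False, 0.0
    # Depth-first search from an arbitrary model
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    n = len(nodes)
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    density = edges / (n * (n - 1) / 2) if n > 1 else 0.0
    return len(seen) == n, density


def rankable(pairs: Iterable[Tuple[str, str]], weights: List[float], min_ess: float = 100.0) -> bool:
    """Publish BT scores only if the graph is connected and ESS is sufficient."""
    connected, _ = graph_stats(pairs)
    return connected and kish_ess(weights) >= min_ess


if __name__ == "__main__":
    # Hypothetical sampling probabilities for four battles
    weights = clamped_ipw([0.5, 0.25, 0.1, 0.01])
    print(weights, kish_ess(weights))
    print(graph_stats([("A", "B"), ("B", "C")]))
```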

Team
College of Computing, Umm Al-Qura University — Mecca, Saudi Arabia
Dr. Mourad Mars
Academic Supervisor
Assistant Professor · Umm Al-Qura University
Hassan Barmandah
AI Researcher
B.S. Software Engineering · Umm Al-Qura University
Abdulrhman Alassaf
Software Engineer
Umm Al-Qura University
Citation
If you use Haakkim or this dataset in your research, please cite:
@misc{mars2026haakkim,
  title        = {Haakkim: An Arena-Style Human Preference Evaluation Platform for Arabic {LLMs}},
  author       = {Mars, Mourad and Barmandah, Hassan and Alassaf, Abdulrhman},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/datasets/Haakkim/Haakkim-1.0v}},
  note         = {College of Computing, Umm Al-Qura University, Mecca, Saudi Arabia}
}