| --- |
| MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML |
| license: apache-2.0 |
| base_model: |
| - Qwen/Qwen2.5-7B-Instruct |
| --- |
| |
| # MachineLearningLM |
|
|
| ## Model Summary |
|
|
| Can LLMs learn from 1,000 in-context examples? |
|
|
| Introducing **MachineLearningLM** 🧪📊, a model continually pretrained on millions of synthetic tabular ML tasks to enable robust many-shot in-context learning. |
|
|
| 📈 **Scales from 8 to 1,024 examples** |
|
|
| 📈 **~15% improvement** on unseen tabular tasks compared to o3-mini / GPT-5-mini / Qwen-2.5-7B |
|
|
| 🌲 **Random-Forest–level robustness** |
| |
| 🧠 **MMLU score: 75.4%** |
|
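To make "many-shot in-context learning" on tabular data concrete, here is a minimal sketch of how labeled rows and one unlabeled query row could be serialized into a single prompt. The prompt layout, the function names (`serialize_row`, `build_many_shot_prompt`), and the toy columns are assumptions made for illustration only; the format the model was actually trained on is defined by the prompt-generation step in the GitHub repository.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def serialize_row(row: Row, columns: Sequence[str]) -> str:
    """Render one row as 'col=value' pairs in a fixed column order."""
    return ", ".join(f"{col}={row[col]}" for col in columns)


def build_many_shot_prompt(
    examples: List[Tuple[Row, str]],
    query: Row,
    columns: Sequence[str],
    max_shots: int = 1024,
) -> str:
    """Build a prompt of labeled demonstrations followed by one unlabeled query.

    Hypothetical layout: this is NOT the official MachineLearningLM format.
    """
    if not examples:
        raise ValueError("at least one labeled example is required")
    # Keep at most max_shots demonstrations (1,024 is the upper end quoted above).
    shots = examples[:max_shots]
    lines = ["Predict the label of the last row from the labeled rows."]
    for features, label in shots:
        lines.append(f"{serialize_row(features, columns)} -> {label}")
    # The query row has no label; the model is expected to fill it in.
    lines.append(f"{serialize_row(query, columns)} -> ?")
    return "\n".join(lines)


# Toy data, invented for this example.
columns = ["age", "income"]
examples = [
    ({"age": 25, "income": 30000}, "no"),
    ({"age": 52, "income": 90000}, "yes"),
]
query = {"age": 40, "income": 70000}

print(build_many_shot_prompt(examples, query, columns))
```

With real data, `examples` would hold anywhere from 8 to 1,024 labeled rows, and the resulting string would be sent to the model as the user message.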
|
| 📄 Read the paper: https://huggingface.co/papers/2509.06806 |
|
|
| GitHub: https://github.com/HaoAreYuDong/MachineLearningLM |
|
|
| ## Evaluation and Validation |
|
|
| We provide an automated evaluation framework: configure the parameters, then run validation and evaluation. |
| **The code is open-sourced in our GitHub repository.** |
|
|
| **Quick Start** |
|
|
| ```bash |
| pip install -r requirements.txt |
| python ./src/evaluation/model_pred/dl_model_pred.py \ |
| --input_dir ./demo_input.jsonl \ |
| --output_dir ./demo_output.jsonl \ |
| --model_name MachineLearningLM/MachineLearningLM-7B-v1 |
| ``` |
|
| **Pipeline** |
| ```bash |
| # Edit evaluate_parameters.sh first, then load it into the current shell |
| source evaluate_parameters.sh |
| |
| # Option 1: End-to-End Pipeline |
| ./scripts/evaluate_pipeline.sh |
| |
| # Option 2: Parallel Processing |
| ./scripts/multi_process/data_prep.sh |
| ./scripts/multi_process/prompt_gen.sh # For deep learning only |
| ./scripts/multi_process/model_pred.sh |
| ./scripts/multi_process/evaluation.sh |
| ./scripts/multi_process/report.sh |
| |
| # Option 3: Sequential Processing |
| ./scripts/single_process/data_prep.sh |
| ./scripts/single_process/prompt_gen.sh # For deep learning only |
| ./scripts/single_process/model_pred.sh |
| ./scripts/single_process/evaluation.sh |
| ./scripts/single_process/report.sh |
| ``` |
|
|
| **Quantized Versions (GGUF)** |
|
|
| https://huggingface.co/mradermacher/MachineLearningLM-7B-v1-GGUF |
|
|
| For more usage details, please visit our GitHub. |