| --- |
| license: apache-2.0 |
| language: |
| - en |
| library_name: transformers |
| tags: |
| - code |
| - python |
| - maincoder |
| - code-generation |
| - reinforcement-learning |
| - mcpo |
| pipeline_tag: text-generation |
| |
| --- |
| <img src="https://huggingface.co/datasets/Maincode/assets/resolve/e51154e034201be1a5dad0e9c8de31d8b9f17643/maincoder_logo.png" alt="" width="1250"> |
|
|
| [**Maincoder-1B**](https://maincode.com/maincoder/) is a code-focused language model optimized for code generation and completion tasks. The model achieves strong performance on coding benchmarks while maintaining a compact size suitable for local deployment. |
|
|
| # Key Features |
|
|
| - **Code Generation**: Optimized for Python code completion and generation tasks. |
| - **Compact Size**: 1 billion parameters, lightweight enough to run on consumer hardware. |
| - **Deep Architecture**: Modern transformer architecture with RoPE embeddings, grouped-query attention, QK normalization, and a high depth-to-width ratio. |
| - **Advanced Data Mixing**: Pre-trained and mid-trained on custom data mixes developed for high-performance coding. |
| - **MCPO Algorithm**: Fine-tuned with a specialised reinforcement learning policy optimisation algorithm to improve training stability and accelerate convergence. |
| - **SOTA Performance**: Best HumanEval, HumanEval+ and MBPP+ scores among the small models compared in the Benchmark Results section. |
|
|
| # Benchmark Results |
|
|
| <img src="https://huggingface.co/datasets/Maincode/assets/resolve/main/performance_h.png" alt="Benchmark Performance Across Baseline LLMs" width="1050"> |
|
|
| | Model | HumanEval | HumanEval+ | MBPP+ | MMLU | GSM8K | |
| |---|---:|---:|---:|---:|---:| |
| | [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B) | **0.7622** | **0.7256** | **0.7090** | 0.3054 | 0.2976 | |
| | [deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct) | 0.5610 | 0.5305 | 0.6217 | 0.2705 | 0.0413 | |
| | [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) | 0.5366 | 0.5000 | 0.6799 | **0.5928** | 0.5505 | |
| | [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) | 0.4634 | 0.4451 | 0.6561 | 0.4984 | 0.4944 | |
| | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | 0.4024 | 0.3780 | 0.5582 | 0.5571 | **0.6865** | |
|
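To make the coding scores above more concrete, the sketch below converts them into approximate numbers of solved problems. It assumes the scores are pass-rate fractions and that the benchmarks have their standard public sizes (164 problems for HumanEval and HumanEval+, 378 for MBPP+). Neither assumption is stated in this card, so treat the counts as illustrative.

```python
# Assumed problem counts from the public benchmark releases (not stated in this card).
BENCHMARK_SIZES = {"HumanEval": 164, "HumanEval+": 164, "MBPP+": 378}

# Maincoder-1B scores copied from the benchmark table.
MAINCODER_SCORES = {"HumanEval": 0.7622, "HumanEval+": 0.7256, "MBPP+": 0.7090}


def solved_count(score: float, total: int) -> int:
    """Convert a pass-rate fraction into the nearest whole number of solved problems."""
    return round(score * total)


if __name__ == "__main__":
    for name, score in MAINCODER_SCORES.items():
        total = BENCHMARK_SIZES[name]
        print(f"{name}: about {solved_count(score, total)} of {total} problems")
```

Under these assumptions, each reported score corresponds to a whole number of problems, which is a quick sanity check when comparing against your own evaluation runs.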
|
| # Model Overview |
|
|
| Maincoder uses a modern transformer decoder architecture with: |
|
|
| - **Rotary Position Embeddings**: With theta of 1,000,000. |
| - **RMSNorm**: Pre-normalization for stable training. |
| - **Grouped Query Attention**: 4:1 ratio of query to key-value heads. |
| - **QK Normalization**: RMSNorm applied to attention queries and keys. |
| - **SwiGLU MLP**: Gated linear units with SiLU activation. |
|
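As a rough illustration of three of the components listed above, here is a minimal pure-Python sketch of RMSNorm, the SiLU gate used in SwiGLU, and rotary inverse frequencies for a theta of 1,000,000. It is not the model's implementation: the epsilon value, the function names, and the standard `theta ** (-2i / d)` frequency formula are assumptions, and the real model operates on tensors rather than Python lists.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root-mean-square, then apply a per-element learned scale.

    eps is an assumed value added for numerical stability.
    """
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_gate(gate, up):
    """Element-wise SwiGLU gating: silu(gate) * up.

    In a full MLP, gate and up are two linear projections of the same input,
    and the gated result is passed through a third (down) projection.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


def rope_inv_freq(head_dim, theta=1_000_000.0):
    """Inverse frequencies for rotary embeddings, one per pair of dimensions."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    print(swiglu_gate([0.0, 1.0], [2.0, 2.0]))
    freqs = rope_inv_freq(96)  # head dimension of 96, as listed in this card
    print(len(freqs), freqs[0], freqs[-1])
```

QK normalization applies the same `rms_norm` operation to each attention head's query and key vectors before the dot product.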
|
| | Attribute | Value | |
| |-----------|-------| |
| | Parameters | 1B | |
| | Hidden Size | 1536 | |
| | Layers | 32 | |
| | Attention Heads | 16 (4 KV heads) | |
| | Head Dimension | 96 | |
| | Vocabulary Size | 151,936 | |
| | Context Length | 2,048 | |
| | Precision | bfloat16 | |
|
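The attribute table is enough to check a few derived quantities. The sketch below recomputes the head dimension and the query-to-KV head ratio, and estimates the key-value cache size at full context. The cache formula (keys plus values, per layer, per KV head) is the usual back-of-the-envelope estimate and assumes cache entries stored in bfloat16 at 2 bytes each; it is not a measured figure.

```python
# Values copied from the attribute table.
HIDDEN_SIZE = 1536
NUM_LAYERS = 32
NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 4
CONTEXT_LENGTH = 2048
BYTES_PER_VALUE = 2  # assumption: cache stored in bfloat16


def head_dim(hidden_size: int, num_heads: int) -> int:
    """Per-head width when the hidden size is split evenly across query heads."""
    return hidden_size // num_heads


def kv_cache_bytes(num_layers, num_kv_heads, dim, seq_len, bytes_per_value):
    """Estimate cache size for one sequence; the factor of 2 covers keys and values."""
    return 2 * num_layers * num_kv_heads * dim * seq_len * bytes_per_value


if __name__ == "__main__":
    dim = head_dim(HIDDEN_SIZE, NUM_QUERY_HEADS)
    gqa = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, dim, CONTEXT_LENGTH, BYTES_PER_VALUE)
    mha = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, dim, CONTEXT_LENGTH, BYTES_PER_VALUE)
    print("head dimension:", dim)
    print("query heads per KV head:", NUM_QUERY_HEADS // NUM_KV_HEADS)
    print("KV cache with 4 KV heads (MiB):", gqa / 2**20)
    print("KV cache with 16 KV heads (MiB):", mha / 2**20)
```

With these numbers the estimate comes to 96 MiB for one 2,048-token sequence, compared with 384 MiB if all 16 heads kept their own keys and values, which is the saving the 4:1 grouped-query ratio buys.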
|
| # Usage |
|
|
| ### Installation |
|
|
| ```bash |
| pip install transformers torch accelerate |
| ``` |
|
|
| ### Quick Start |
|
|
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| # trust_remote_code=True allows the custom model code in the repository to run |
| model = AutoModelForCausalLM.from_pretrained( |
|     "Maincode/Maincoder-1B", |
|     torch_dtype="auto", |
|     device_map="auto",  # requires the accelerate package |
|     trust_remote_code=True, |
| ) |
| tokenizer = AutoTokenizer.from_pretrained( |
|     "Maincode/Maincoder-1B", |
|     trust_remote_code=True, |
| ) |
| |
| # Code completion example: the model continues from the end of the prompt |
| prompt = '''def fibonacci(n: int) -> int: |
|     """Return the n-th Fibonacci number.""" |
| ''' |
| |
| inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
| outputs = model.generate( |
|     **inputs, |
|     max_new_tokens=256, |
|     temperature=0.2, |
|     do_sample=True, |
| ) |
| # The decoded text contains the prompt followed by the completion |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
| ``` |
|
|
| ### Code Completion |
|
|
| ```python |
| # Each prompt below can be used in place of `prompt` in the Quick Start example. |
| |
| # Function completion |
| function_prompt = '''def quicksort(arr: list) -> list: |
|     """Sort a list using the quicksort algorithm.""" |
| ''' |
| |
| # Class completion |
| class_prompt = '''class BinarySearchTree: |
|     """A binary search tree implementation.""" |
| |
|     def __init__(self): |
| ''' |
| |
| # Algorithm implementation |
| algorithm_prompt = '''def dijkstra(graph: dict, start: str, end: str) -> tuple: |
|     """Find the shortest path using Dijkstra's algorithm. |
| |
|     Args: |
|         graph: Adjacency list representation of the graph |
|         start: Starting node |
|         end: Target node |
| |
|     Returns: |
|         Tuple of (distance, path) |
|     """ |
| ''' |
| ``` |
|
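Because these prompts are plain code prefixes, the model may keep generating past the end of the function or class you asked for. A common post-processing step, sketched below, is to drop the echoed prompt and cut the completion at the first stop sequence. The helper name `extract_completion` and the stop sequences are assumptions chosen for illustration, not something this model card specifies.

```python
# Assumed stop sequences: top-level constructs that usually mark the start of unrelated code.
DEFAULT_STOP_SEQUENCES = ("\ndef ", "\nclass ", "\nif __name__", "\nprint(")


def extract_completion(generated: str, prompt: str, stop_sequences=DEFAULT_STOP_SEQUENCES) -> str:
    """Return only the newly generated text, cut at the earliest stop sequence."""
    # Decoded output normally starts with the prompt; remove it if present.
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    cut = len(completion)
    for stop in stop_sequences:
        index = completion.find(stop)
        if index != -1:
            cut = min(cut, index)
    return completion[:cut]


if __name__ == "__main__":
    prompt = 'def add(a, b):\n    """Add two numbers."""\n'
    generated = prompt + "    return a + b\n\ndef unrelated():\n    pass\n"
    # Prints the prompt plus the function body, without the unrelated function.
    print(prompt + extract_completion(generated, prompt))
```

The stop sequences only match unindented lines, so indented methods inside a class completion are kept while a new top-level definition ends the completion.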
|
| # Additional Notes |
|
|
| ## Reproducibility |
|
|
| <details> |
| <summary>Model evaluations were run on 8 AMD MI355X GPUs using the <a href="https://github.com/EleutherAI/lm-evaluation-harness">EleutherAI lm-evaluation-harness</a>.</summary> |
|
|
| ```bash |
| docker run --rm -it \ |
| --device=/dev/kfd --device=/dev/dri --group-add=video \ |
| --ipc=host --security-opt seccomp=unconfined \ |
| -v $(pwd):/workspace -w /workspace \ |
| -e HF_TOKEN \ |
| -e PYTHONHASHSEED=0 \ |
| -e TORCH_DETERMINISTIC=1 \ |
| -e ROCBLAS_ATOMICS_MODE="0" \ |
| -e MIOPEN_FIND_MODE="1" \ |
| -e CUBLAS_WORKSPACE_CONFIG=":4096:8" \ |
| -e HF_ALLOW_CODE_EVAL="1" \ |
| rocm/pytorch:rocm7.1.1_ubuntu24.04_py3.12_pytorch_release_2.9.1 \ |
| bash -c 'pip install "lm_eval[hf]" && \ |
| accelerate launch -m lm_eval \ |
| --model hf --model_args "pretrained=Maincode/Maincoder-1B,trust_remote_code=True,dtype=float32" \ |
| --tasks humaneval,humaneval_plus,mbpp_plus,mmlu,gsm8k \ |
| --device cuda:0 --batch_size 32 --seed 42 \ |
| --confirm_run_unsafe_code' |
| ``` |
|
|
| </details> |
|
|
| ## Limitations |
|
|
| - Context length is limited to 2,048 tokens. |
| - Primarily optimized for Python; performance may vary on other languages. |
| - May generate code with bugs or security issues. Always review generated code before use. |
|
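Two of the limitations above lend themselves to simple guards. The following sketch, which uses only the standard library, trims a list of prompt token IDs so that prompt plus generation fits in the 2,048-token window, and checks that generated Python at least parses before it is reviewed further. The function names are hypothetical, keeping the most recent tokens is only one possible truncation policy, and a successful parse says nothing about correctness or security.

```python
import ast

CONTEXT_LENGTH = 2048  # context limit stated in this card


def fit_prompt(token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Keep the most recent prompt tokens so prompt + generation fits the context window."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return list(token_ids[-budget:])


def is_valid_python(source: str) -> bool:
    """Return True if the source parses as Python; this does not check behaviour."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


if __name__ == "__main__":
    long_prompt = list(range(3000))  # stand-in for real token IDs
    trimmed = fit_prompt(long_prompt, max_new_tokens=256)
    print(len(trimmed))
    print(is_valid_python("def f():\n    return 1\n"))
    print(is_valid_python("def f(:\n"))
```

A syntax check like this is only a first filter; running generated code in a sandbox with tests remains the more reliable review step.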
|
| <div style="margin-left:14px; border-left:4px solid #3b82f6; background:rgba(59,130,246,0.08); padding:8px 10px; border-radius:8px; font-size:0.92em; margin:10px 0;"> |
| <strong>Disclaimer</strong>: This model has <strong>not</strong> undergone any alignment or safety tuning (e.g., RLHF/RLAIF, DPO, or safety fine-tuning). Outputs may be unsafe or biased. Please use appropriate safeguards and evaluate carefully for your use case. |
| </div> |
|
|
| ## License |
|
|
| This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). |
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{maincoder2025, |
| title = {Maincoder-1B: A High-Performance 1B Parameter Coding Model}, |
| author = {Maincode Team}, |
| year = {2025}, |
| organization = {Maincode}, |
| howpublished = {\url{https://huggingface.co/Maincode/Maincoder-1B}} |
| } |
| ``` |
|
|
| ## Contact |
|
|
| For questions, issues, or collaboration inquiries, please visit [Maincode](https://maincode.com). |
|
|
|
|
|
|