Upload folder using huggingface_hub
README.md CHANGED

---
license: mit
tags:
- binary-neural-network
- zero-tokenization
- wire-speed-learning
- bit-level
- byte-level
language:
- en
pipeline_tag: text-generation
---

# Binary Transformers: Learning Language from Raw Binary

**Zero-tokenization transformers that learn directly from network bytes, bits, and beyond.**

Traditional LLMs use tokenizers (BPE, SentencePiece) with 32k-256k vocabularies. These models learn directly from raw binary data - no tokenizer, no preprocessing, just bytes flowing into neural networks. The ultimate goal: **wire-speed learning**, where models absorb network traffic in real time.
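
As a minimal sketch of what "zero tokenization" means in practice (the repo's actual data pipeline may differ, and `crawl.bin` is a placeholder path), raw bytes can become model-ready token ids directly:

```python
import numpy as np

# Hypothetical illustration: any raw data (a file, a packet capture, a socket
# buffer) becomes integer token ids with no tokenizer in the loop.
raw = open("crawl.bin", "rb").read()            # placeholder input

byte_ids = np.frombuffer(raw, dtype=np.uint8)   # vocab=256: one token per byte
bit_ids = np.unpackbits(byte_ids)               # vocab=2: one token per bit (8x longer)

print(byte_ids[:6])   # [ 60 104 116 109 108  62] if the file starts with b"<html>"
print(bit_ids[:16])   # the same data as raw 0s and 1s
```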

## Results (Live Experiments - 16 Jan 2026)

### Byte-Level (vocab=256)
```
Data: 350KB web crawl
BPB: 4.68 (vs 8.0 random = 41% compression)
Speed: 8.7 KB/s learning throughput
Params: 0.6M
```
Learns HTML structure, XML tags, and timestamps from raw bytes.
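
Bits-per-byte (BPB) here is the model's average cross-entropy converted from nats to bits; a short sketch (the loss value is hypothetical, and this is not the repo's evaluation code):

```python
import math

# For a byte-level model one token is one byte, so BPB = loss / ln(2).
def bits_per_byte(loss_nats_per_byte: float) -> float:
    return loss_nats_per_byte / math.log(2)

loss = 3.244                               # hypothetical eval loss in nats/byte
bpb = bits_per_byte(loss)                  # ~4.68
print(f"{bpb:.2f} BPB vs 8.0 random -> {100 * (1 - bpb / 8):.0f}% compression")
```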

### Bit-Level (vocab=2)
```
Data: 550KB
Entropy: 1.008 bit/bit (0.8% above the 1.0 random baseline)
Speed: 0.7 KB/s
Params: 85M
```
Pure binary learning - discovers byte boundaries and ASCII from 0s and 1s.
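
For the vocab=2 models the "1.0 random" baseline is exact: a model that predicts p=0.5 for every next bit scores exactly 1.0 bit per bit. A sketch of the metric (assumed helper name, not the repo's code):

```python
import numpy as np

def bits_per_bit(p_next_is_one: np.ndarray, targets: np.ndarray) -> float:
    """Mean binary cross-entropy, in bits, over a 0/1 target stream."""
    p = np.clip(p_next_is_one, 1e-9, 1 - 1e-9)
    return float(np.mean(-(targets * np.log2(p) + (1 - targets) * np.log2(1 - p))))

targets = np.random.randint(0, 2, size=10_000)
print(bits_per_bit(np.full(targets.shape, 0.5), targets))  # exactly 1.0
```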

### Dibit (vocab=4: 00,01,10,11)
```
Data: 437KB
BPB: 7.55 (vs 8.0 random = 5.7% compression)
Speed: 0.25 KB/s
Params: 37.8M
```
2-bit tokens provide 2x context efficiency vs bit-level. **Best compression of the sub-byte models so far!**
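
A sketch of how a byte stream becomes dibit tokens (the most-significant-dibit-first ordering is an assumption; the repo may pack differently). Each byte yields four vocab=4 tokens, so sequences are half as long as the bit-level encoding of the same data:

```python
import numpy as np

def bytes_to_dibits(data: bytes) -> np.ndarray:
    """Split each byte into four 2-bit tokens (00,01,10,11 -> ids 0..3), MSB first."""
    b = np.frombuffer(data, dtype=np.uint8)
    return np.stack([(b >> 6) & 3, (b >> 4) & 3, (b >> 2) & 3, b & 3], axis=1).ravel()

tokens = bytes_to_dibits(b"<html>")
print(len(tokens), tokens[:4])   # 24 tokens for 6 bytes ('<' = 0b00111100 -> [0 3 3 0])
```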

### Pure Binary (vocab=2, binary weights)
```
Data: 806KB
Entropy: 0.995 bit/bit (0.5% compression)
Binary params: 99.8%
Params: 4.7M
```
**BITS ALL THE WAY DOWN** - input bits, binary weights (-1/+1), output bits. On specialized hardware, this enables XNOR+popcount operations instead of multiply-accumulate.
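
A minimal illustration of that claim (not the repo's kernel): with activations and weights constrained to -1/+1 and encoded as bits, a dot product is "matches minus mismatches", which XNOR+popcount (or XOR+popcount, as below) computes without any multiplies:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = rng.choice([-1, 1], size=n)       # binary activations
w = rng.choice([-1, 1], size=n)       # binary weights

x_bits = (x > 0).astype(np.uint8)     # encode +1 -> 1, -1 -> 0
w_bits = (w > 0).astype(np.uint8)
mismatches = int(np.count_nonzero(x_bits ^ w_bits))   # popcount of XOR
dot_via_bits = n - 2 * mismatches     # matches - mismatches

assert dot_via_bits == int(x @ w)     # same answer as multiply-accumulate
print(dot_via_bits)
```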

## Architecture