Upload folder using huggingface_hub
README.md CHANGED

---
license: mit
tags:
- binary-neural-network
- zero-tokenization
- wire-speed-learning
- bit-level
- byte-level
language:
- en
pipeline_tag: text-generation
---

# Binary Transformers: Learning Language from Raw Binary

**Zero-tokenization transformers that learn directly from network bytes, bits, and beyond.**

Traditional LLMs use tokenizers (BPE, SentencePiece) with 32k-256k vocabularies. These models learn directly from raw binary data - no tokenizer, no preprocessing, just bytes flowing into neural networks. The ultimate goal: **wire-speed learning**, where models absorb network traffic in real time.
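
As a minimal sketch of what "zero tokenization" means in practice (the repo's actual data pipeline may differ, and `crawl.bin` is a placeholder path), raw bytes can become model-ready token ids directly:

```python
import numpy as np

# Hypothetical illustration: any raw data (a file, a packet capture, a socket
# buffer) becomes integer token ids with no tokenizer in the loop.
raw = open("crawl.bin", "rb").read()            # placeholder input

byte_ids = np.frombuffer(raw, dtype=np.uint8)   # vocab=256: one token per byte
bit_ids = np.unpackbits(byte_ids)               # vocab=2: one token per bit (8x longer)

print(byte_ids[:6])   # [ 60 104 116 109 108  62] if the file starts with b"<html>"
print(bit_ids[:16])   # the same data as raw 0s and 1s
```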

## Results (Live Experiments - 16 Jan 2026)

### Byte-Level (vocab=256)
```
Data: 350KB web crawl
BPB: 4.68 (vs 8.0 random = 41% compression)
Speed: 8.7 KB/s learning throughput
Params: 0.6M
```
Learns HTML structure, XML tags, and timestamps from raw bytes.
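
Bits-per-byte (BPB) here is the model's average cross-entropy converted from nats to bits; a short sketch (the loss value is hypothetical, and this is not the repo's evaluation code):

```python
import math

# For a byte-level model one token is one byte, so BPB = loss / ln(2).
def bits_per_byte(loss_nats_per_byte: float) -> float:
    return loss_nats_per_byte / math.log(2)

loss = 3.244                               # hypothetical eval loss in nats/byte
bpb = bits_per_byte(loss)                  # ~4.68
print(f"{bpb:.2f} BPB vs 8.0 random -> {100 * (1 - bpb / 8):.0f}% compression")
```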

### Bit-Level (vocab=2)
```
Data: 550KB
Entropy: 1.008 bit/bit (0.8% above the 1.0 random baseline)
Speed: 0.7 KB/s
Params: 85M
```
Pure binary learning - discovers byte boundaries and ASCII from 0s and 1s.
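
For the vocab=2 models the "1.0 random" baseline is exact: a model that predicts p=0.5 for every next bit scores exactly 1.0 bit per bit. A sketch of the metric (assumed helper name, not the repo's code):

```python
import numpy as np

def bits_per_bit(p_next_is_one: np.ndarray, targets: np.ndarray) -> float:
    """Mean binary cross-entropy, in bits, over a 0/1 target stream."""
    p = np.clip(p_next_is_one, 1e-9, 1 - 1e-9)
    return float(np.mean(-(targets * np.log2(p) + (1 - targets) * np.log2(1 - p))))

targets = np.random.randint(0, 2, size=10_000)
print(bits_per_bit(np.full(targets.shape, 0.5), targets))  # exactly 1.0
```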

### Dibit (vocab=4: 00,01,10,11)
```
Data: 437KB
BPB: 7.55 (vs 8.0 random = 5.7% compression)
Speed: 0.25 KB/s
Params: 37.8M
```
2-bit tokens provide 2x context efficiency vs bit-level. **Best compression of the sub-byte models so far!**
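
A sketch of how a byte stream becomes dibit tokens (the most-significant-dibit-first ordering is an assumption; the repo may pack differently). Each byte yields four vocab=4 tokens, so sequences are half as long as the bit-level encoding of the same data:

```python
import numpy as np

def bytes_to_dibits(data: bytes) -> np.ndarray:
    """Split each byte into four 2-bit tokens (00,01,10,11 -> ids 0..3), MSB first."""
    b = np.frombuffer(data, dtype=np.uint8)
    return np.stack([(b >> 6) & 3, (b >> 4) & 3, (b >> 2) & 3, b & 3], axis=1).ravel()

tokens = bytes_to_dibits(b"<html>")
print(len(tokens), tokens[:4])   # 24 tokens for 6 bytes ('<' = 0b00111100 -> [0 3 3 0])
```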

### Pure Binary (vocab=2, binary weights)
```
Data: 806KB
Entropy: 0.995 bit/bit (0.5% compression)
Binary params: 99.8%
Params: 4.7M
```
**BITS ALL THE WAY DOWN** - input bits, binary weights (-1/+1), output bits. On specialized hardware, this enables XNOR+popcount operations instead of multiply-accumulate.
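
A minimal illustration of that claim (not the repo's kernel): with activations and weights constrained to -1/+1 and encoded as bits, a dot product is "matches minus mismatches", which XNOR+popcount (or XOR+popcount, as below) computes without any multiplies:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = rng.choice([-1, 1], size=n)       # binary activations
w = rng.choice([-1, 1], size=n)       # binary weights

x_bits = (x > 0).astype(np.uint8)     # encode +1 -> 1, -1 -> 0
w_bits = (w > 0).astype(np.uint8)
mismatches = int(np.count_nonzero(x_bits ^ w_bits))   # popcount of XOR
dot_via_bits = n - 2 * mismatches     # matches - mismatches

assert dot_via_bits == int(x @ w)     # same answer as multiply-accumulate
print(dot_via_bits)
```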

## Architecture