OpenTransformer committed
Commit 892b5b4 · verified · 1 Parent(s): 9d43dda

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +27 -9
README.md CHANGED
@@ -1,3 +1,16 @@
  # Binary Transformers: Learning Language from Raw Binary

  **Zero-tokenization transformers that learn directly from network bytes, bits, and beyond.**
@@ -22,39 +35,44 @@ Traditional LLMs use tokenizers (BPE, SentencePiece) with 32k-256k vocabulary. T

  These models learn directly from raw binary data - no tokenizer, no preprocessing, just bytes flowing into neural networks. The ultimate goal: **wire-speed learning** where models absorb network traffic in real-time.

- ## Results

  ### Byte-Level (vocab=256)
  ```
  Data: 350KB web crawl
  BPB: 4.68 (vs 8.0 random = 41% compression)
  Speed: 8.7 KB/s learning rate
  ```
  Learns HTML structure, XML tags, timestamps from raw bytes.

  ### Bit-Level (vocab=2)
  ```
  Data: 550KB
- Entropy: 1.008 bit/bit (vs 1.0 random)
  Speed: 0.7 KB/s
  ```
  Pure binary learning - discovers byte boundaries and ASCII from 0s and 1s.

  ### Dibit (vocab=4: 00,01,10,11)
  ```
- Data: 37KB
- BPB: 7.70 (vs 8.0 random = 3.7% compression)
- Speed: 0.26 KB/s
  ```
- 2-bit tokens provide 2x context efficiency vs bit-level.

  ### Pure Binary (vocab=2, binary weights)
  ```
- Data: 37KB
- Entropy: 1.027 bit/bit
  Binary params: 99.8%
  ```
- **BITS ALL THE WAY DOWN** - input bits, binary weights, output bits. On specialized hardware, this enables XNOR+popcount operations instead of multiply-accumulate.

  ## Architecture
 
+ ---
+ license: mit
+ tags:
+ - binary-neural-network
+ - zero-tokenization
+ - wire-speed-learning
+ - bit-level
+ - byte-level
+ language:
+ - en
+ pipeline_tag: text-generation
+ ---
+
  # Binary Transformers: Learning Language from Raw Binary

  **Zero-tokenization transformers that learn directly from network bytes, bits, and beyond.**
 
  These models learn directly from raw binary data - no tokenizer, no preprocessing, just bytes flowing into neural networks. The ultimate goal: **wire-speed learning** where models absorb network traffic in real-time.
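As a rough sketch of what zero-tokenization means in practice (assuming PyTorch; the payload and layer sizes below are illustrative), the raw bytes themselves are the token ids:

```python
# Minimal sketch (assumes PyTorch): the raw bytes of any payload are used
# directly as token ids in [0, 255]; no tokenizer, no learned vocabulary.
import torch
import torch.nn as nn

data = b"<html><body>GET /index.html HTTP/1.1</body></html>"  # illustrative payload
ids = torch.tensor(list(data), dtype=torch.long)   # each byte is already an id, 0-255

embed = nn.Embedding(num_embeddings=256, embedding_dim=64)    # fixed 256-entry "vocab"
x = embed(ids.unsqueeze(0))                        # (1, len(data), 64), ready for attention layers
```

The same path works for HTML, packet captures, or logs, since nothing format-specific is assumed about the bytes.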
 
+ ## Results (Live Experiments - 16 Jan 2026)

  ### Byte-Level (vocab=256)
  ```
  Data: 350KB web crawl
  BPB: 4.68 (vs 8.0 random = 41% compression)
  Speed: 8.7 KB/s learning rate
+ Params: 0.6M
  ```
  Learns HTML structure, XML tags, timestamps from raw bytes.
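The BPB figures above are read against the 8 bits/byte of uniformly random data. A sketch of the standard conversion from mean cross-entropy to bits-per-byte (the loss value is illustrative, and the repo's evaluation code may differ in detail):

```python
import math

# Standard conversion from mean cross-entropy (nats per byte token) to bits-per-byte.
def bits_per_byte(mean_ce_nats: float) -> float:
    return mean_ce_nats / math.log(2)

bpb = bits_per_byte(3.244)          # illustrative loss -> ~4.68 BPB
compression = 1.0 - bpb / 8.0       # vs. 8.0 BPB for uniformly random bytes
print(f"BPB={bpb:.2f}, compression={compression:.1%}")   # BPB=4.68, compression=41.5%
```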

  ### Bit-Level (vocab=2)
  ```
  Data: 550KB
+ Entropy: 1.008 bit/bit (vs 1.0 random = 0.8% above random)
  Speed: 0.7 KB/s
+ Params: 85M
  ```
  Pure binary learning - discovers byte boundaries and ASCII from 0s and 1s.
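A sketch of what vocab=2 input looks like, assuming NumPy and MSB-first bit order (the bit order actually used in these experiments is not stated here):

```python
import numpy as np

# Sketch: unpack raw bytes into a 0/1 stream for a vocab=2 model.
# MSB-first bit order is an assumption; the experiments may pack bits differently.
data = b"Hi!"
bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))   # 8 tokens per byte
print(bits.tolist())
# [0,1,0,0,1,0,0,0, 0,1,1,0,1,0,0,1, 0,0,1,0,0,0,0,1]  (grouped by byte: 'H', 'i', '!')
```

Every byte becomes 8 tokens, so the same amount of data costs 8x the sequence length of the byte-level model.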

  ### Dibit (vocab=4: 00,01,10,11)
  ```
+ Data: 437KB
+ BPB: 7.55 (vs 8.0 random = 5.7% compression)
+ Speed: 0.25 KB/s
+ Params: 37.8M
  ```
+ 2-bit tokens provide 2x context efficiency vs bit-level. **Best compression among the sub-byte experiments so far!**
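A sketch of dibit packing, assuming each byte is split high-to-low into four 2-bit symbols (the actual packing order may differ):

```python
# Sketch: split each byte into four 2-bit symbols (vocab=4).
def to_dibits(data: bytes) -> list[int]:
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):          # four 2-bit fields per byte, high to low
            out.append((byte >> shift) & 0b11)
    return out

print(to_dibits(b"A"))   # 0x41 = 01 00 00 01 -> [1, 0, 0, 1]
```

Four tokens per byte instead of eight halves the sequence length relative to bit-level, which is the 2x context efficiency noted above.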

  ### Pure Binary (vocab=2, binary weights)
  ```
+ Data: 806KB
+ Entropy: 0.995 bit/bit (0.5% compression)
  Binary params: 99.8%
+ Params: 4.7M
  ```
+ **BITS ALL THE WAY DOWN** - input bits, binary weights (-1/+1), output bits.
+ On specialized hardware, this enables XNOR+popcount operations instead of multiply-accumulate.
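A sketch of why {-1,+1} weights and activations reduce a dot product to XNOR plus popcount; plain Python integers stand in for packed bit vectors here:

```python
# Sketch: dot product of {-1,+1} vectors via XNOR + popcount.
# Encoding assumption: bit 1 stands for +1, bit 0 stands for -1.
def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    agree = ~(a_bits ^ w_bits) & ((1 << n) - 1)   # XNOR: 1 wherever the signs match
    return 2 * agree.bit_count() - n              # matches minus mismatches

a = 0b1011   # encodes +1, -1, +1, +1
w = 0b1101   # encodes +1, +1, -1, +1
print(binary_dot(a, w, 4))   # (+1)(+1) + (-1)(+1) + (+1)(-1) + (+1)(+1) = 0
```

With wide popcount support in hardware or SIMD, one such bitwise pass covers many weights at once, which is where the claimed advantage over multiply-accumulate comes from.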

  ## Architecture