Unary coding is an entropy encoding that represents a natural number, n, with n − 1 ones followed by a zero. For example 5 is represented as 11110. Some representations use n − 1 zeros followed by a one. The ones and zeros are interchangeable without loss of generality.
| n |
Unary code |
| 1 |
1 |
| 2 |
01 |
| 3 |
001 |
| 4 |
0001 |
| 5 |
00001 |
| 6 |
000001 |
| 7 |
0000001 |
| 8 |
00000001 |
| 9 |
000000001 |
| 10 |
0000000001 |
Unary coding is an optimally efficient encoding for the following discrete probability distribution

for n = 1,2,3,....
In symbol-by-symbol coding, it is optimal for any geometric distribution

for which k ≥ φ = 1.61803398879…, the golden ratio, or, more generally, for any discrete distribution for which

for n = 1,2,3,.... Although it is the optimal symbol-by-symbol coding for such probability distributions, its optimality can, like that of Huffman coding, be over-stated. Arithmetic coding has better compression capability for the last two distributions mentioned above because it does not consider input symbols independently, but rather implicitly groups the inputs.
A modified unary encoding is used in UTF-8. Unary codes are also used in split-index schemes like the Golomb Rice code. Unary coding is prefix-free, and can be uniquely decoded.
References
- Khalid Sayood, Data Compression, 3rd ed, Morgan Kaufmann.
- Professor K.R Rao, EE5359:Principles of Digital Video Coding.
|