Coding Systems

Bar Codes

In January, the US is joining the rest of the world in using 13-digit, rather than 12-digit, bar codes. The current US system is called UPC, or 'Universal Product Code'. The system everyone else uses is EAN, originally short for 'European Article Number' (now 'International Article Number').

Bar codes are quite an interesting topic in themselves. When I can, I'll put something together about them.

Morse Code

Morse Code is probably one of the best-known coding systems. It was developed by Samuel Morse in the late 1830s, initially for use on the electric telegraph. It is still used today by radio operators, and a Morse signal will often get through when voice communication would be unintelligible.

It is a system made of dots and dashes. A dot lasts one time 'unit' and a dash lasts three units. Within a character, the dots and dashes are separated by one unit of silence; the gap between characters is three units, and the gap between words is seven units.

It is essentially a binary system: the signal is either 'on' or 'off', and information is carried by varying the length of the 'on' signals and of the gaps between them.
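
The timing rules above can be sketched in Python. This is a minimal illustration, assuming Morse is written with '.' and '-', one space between characters and ' / ' between words (a convention chosen here, not part of Morse itself); `to_keying` is a hypothetical helper name. Each character of the output stands for one time unit: '1' for signal on, '0' for off.

```python
DOT = "1"              # a dot: signal on for 1 unit
DASH = "111"           # a dash: signal on for 3 units
ELEMENT_GAP = "0"      # 1 unit off between the dots/dashes of one character
LETTER_GAP = "000"     # 3 units off between characters
WORD_GAP = "0000000"   # 7 units off between words


def to_keying(morse: str) -> str:
    """Turn Morse such as '... --- ...' into a string of on/off units.

    Assumed input convention: characters separated by one space,
    words separated by ' / '.
    """
    encoded_words = []
    for word in morse.split(" / "):
        encoded_letters = []
        for letter in word.split():
            elements = []
            for symbol in letter:
                if symbol == ".":
                    elements.append(DOT)
                elif symbol == "-":
                    elements.append(DASH)
                else:
                    raise ValueError(f"unexpected symbol {symbol!r}")
            encoded_letters.append(ELEMENT_GAP.join(elements))
        encoded_words.append(LETTER_GAP.join(encoded_letters))
    return WORD_GAP.join(encoded_words)


print(to_keying("... --- ..."))  # SOS: prints 101010001110111011100010101
```

Counting the characters gives the length of a message in units; SOS takes 27 units.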

Letter  Morse   Letter  Morse   Digit  Morse
A       .-      N       -.      0      -----
B       -...    O       ---     1      .----
C       -.-.    P       .--.    2      ..---
D       -..     Q       --.-    3      ...--
E       .       R       .-.     4      ....-
F       ..-.    S       ...     5      .....
G       --.     T       -       6      -....
H       ....    U       ..-     7      --...
I       ..      V       ...-    8      ---..
J       .---    W       .--     9      ----.
K       -.-     X       -..-
L       .-..    Y       -.--
M       --      Z       --..

Letter  Morse   Punctuation mark         Morse
Ä       .-.-    Full stop (period)       .-.-.-
Á       .--.-   Comma                    --..--
Å       .--.-   Colon                    ---...
Ch      ----    Question mark (query)    ..--..
É       ..-..   Apostrophe               .----.
Ñ       --.--   Hyphen                   -....-
Ö       ---.    Fraction bar             -..-.
Ü       ..--    Brackets (parentheses)   -.--.-
                Quotation marks          .-..-.
                At sign (@)              .--.-.
                Equals sign              -...-
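
A lookup table makes encoding and decoding straightforward. The sketch below covers only the letters and digits from the first table; the function names `encode` and `decode` and the separator convention (one space between characters, ' / ' between words) are assumptions for illustration.

```python
# International Morse for letters and digits, copied from the table above.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}
# Reverse lookup for decoding; this works because no two entries share a pattern.
FROM_MORSE = {code: char for char, code in MORSE.items()}


def encode(text: str) -> str:
    """Encode text as Morse: one space between characters, ' / ' between words."""
    words = []
    for word in text.upper().split():
        try:
            words.append(" ".join(MORSE[ch] for ch in word))
        except KeyError as exc:
            raise ValueError(f"no Morse code for {exc.args[0]!r} in this table") from exc
    return " / ".join(words)


def decode(morse: str) -> str:
    """Reverse of encode(), using the same separators."""
    return " ".join(
        "".join(FROM_MORSE[code] for code in word.split())
        for word in morse.split(" / ")
    )


print(encode("SOS"))                      # ... --- ...
print(decode(encode("Hello World 73")))   # HELLO WORLD 73
```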

Note that American Morse differs from the International Morse shown above (as might be expected from the US!), notably in the full stop (period), comma and question mark (query).

Morse Code uses patterns of varying length. The most common English letter, 'E', is given the shortest symbol, a single 'dit'. The next most common, 'T', is a single 'dah'. 'A' is 'di-dah' and 'I' is 'di-dit'. This is no coincidence: giving the most frequent letters the shortest codes makes typical messages quicker to send.
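
To see how much the choice of codes matters, the sketch below works out how long a few characters take to send, using the timing rules given earlier (dot 1 unit, dash 3 units, 1 unit between elements). The selection of letters, common E, T, I and A against the rarer J and Q, is an arbitrary sample for illustration, and `duration_units` is a hypothetical helper name.

```python
# A few patterns from the table above: common letters first, then rarer ones.
SAMPLE = {"E": ".", "T": "-", "I": "..", "A": ".-", "J": ".---", "Q": "--.-"}


def duration_units(pattern: str) -> int:
    """Length of one character in time units, excluding the gap after it.

    Dot = 1 unit, dash = 3 units (any symbol other than '.' is treated
    as a dash), plus 1 unit of silence between the elements of a character.
    """
    on_time = sum(1 if symbol == "." else 3 for symbol in pattern)
    gaps = len(pattern) - 1
    return on_time + gaps


# List the sample letters from quickest to slowest to send.
for letter, pattern in sorted(SAMPLE.items(), key=lambda item: duration_units(item[1])):
    print(letter, pattern, duration_units(pattern))
```

'E' costs 1 unit and 'A' costs 5, while 'J' and 'Q' each cost 13.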

ASCII

ASCII is a very well-known coding system. Strictly, it is a 7-bit code (normally stored in 8-bit bytes), used in computer systems to represent characters and symbols as numbers. At the fundamental level, a computer deals only in lists of numbers.

Computers work in binary; in other words, they have only two digits, 'zero' and 'one'.

To count in binary, we begin as usual: 0, 1... and then we run out of digits. When counting in decimal we run out of digits at 9 and move to the next column; binary does the same, but after 1.

So we have: 0, 1, 10, 11, 100, 101, 110, 111, 1000.

 0   0000   0
 1   0001   1
 2   0010   2
 3   0011   3
 4   0100   4
 5   0101   5
 6   0110   6
 7   0111   7
 8   1000   8
 9   1001   9
10   1010   A
11   1011   B
12   1100   C
13   1101   D
14   1110   E
15   1111   F
16  10000  10
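
Python's built-in `format()` can produce the same table. In this sketch `counting_table` is a hypothetical helper name, and padding the binary to four digits simply mirrors the layout above.

```python
def counting_table(limit: int) -> list[tuple[int, str, str]]:
    """Rows of (decimal, binary, hex) for 0..limit, like the table above."""
    rows = []
    for n in range(limit + 1):
        binary = format(n, "04b")       # at least 4 binary digits, zero-padded
        hexadecimal = format(n, "X")    # upper-case hex digits
        rows.append((n, binary, hexadecimal))
    return rows


for n, binary, hexadecimal in counting_table(16):
    print(f"{n:2}  {binary:>5}  {hexadecimal:>2}")
```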

This leads to the old joke that there are 10 kinds of people in the world: those who understand binary and those who don't.

Note that in binary the columns aren't 'units', 'tens', 'hundreds', but are instead '1', '2', '4', '8', '16' and so on: each column is worth twice the one to its right.

Using this, we can decode a binary number: 01001010 is 1 lot of '2', 1 lot of '8' and 1 lot of '64', so 01001010 is 2 + 8 + 64 = 74 in decimal.
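
The same place-value decoding can be sketched in a few lines of Python. `binary_to_decimal` is a hypothetical name; in practice Python's built-in `int(text, 2)` does the job, and the sketch checks against it.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the place values of the '1' columns, as described above."""
    total = 0
    place_value = 1                  # the rightmost column is worth 1
    for bit in reversed(bits):
        if bit == "1":
            total += place_value
        elif bit != "0":
            raise ValueError(f"not a binary digit: {bit!r}")
        place_value *= 2             # each column is worth twice the previous one
    return total


print(binary_to_decimal("01001010"))  # 74
print(int("01001010", 2))             # 74, using Python's built-in conversion
```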

The right-hand column above is 'hexadecimal' (hex). It is a counting system with 16 digits, 0 through to F; when we run out of digits, we move to the next column, just as before. Hex is convenient for computers because a string of bits can be broken up into groups of 4, and each group of 4 bits is exactly one hex digit. So 01001010 becomes 0100 1010, which in hex is written 4A: 4 lots of 16, plus 10 ('A' is 10 in decimal), giving 74 again. Hex is most commonly seen today when specifying colours in web pages, or when using painting packages. To tell these systems apart, people often write binary like this: %10 (binary for 2), and hex like this: $10 (hex for 16).
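
The 'groups of 4 bits' trick can be sketched directly. `bits_to_hex` is a hypothetical helper; Python itself writes binary and hex literals with `0b` and `0x` prefixes rather than `%` and `$`, and its built-in `format(..., "X")` is used as a cross-check.

```python
HEX_DIGITS = "0123456789ABCDEF"


def bits_to_hex(bits: str) -> str:
    """Convert a bit string to hex by splitting it into groups of 4 bits."""
    # Pad on the left so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    hex_digits = []
    for start in range(0, len(padded), 4):
        nibble = padded[start:start + 4]          # e.g. '0100', then '1010'
        hex_digits.append(HEX_DIGITS[int(nibble, 2)])
    return "".join(hex_digits)


print(bits_to_hex("01001010"))                  # 4A
print(format(int("01001010", 2), "X"))          # 4A, via Python's built-ins
```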

You should be able to see that with 8 bits we can represent 2^8 = 256 different numbers (0 to 255 in decimal, $00 to $FF in hex).

ASCII assigns a character or control code to each number. It defines only 0 through to 127 (%00000000 to %01111111), which is why it is really a 7-bit code. When ASCII is stored in 8-bit bytes, the remaining 128 values (128 to 255) are free for other purposes, and several 'extended ASCII' character sets used them for additional characters such as 'é'.
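
Python's built-in codecs can illustrate this. The helper `code_in` is hypothetical; 'latin-1' (ISO 8859-1) and 'cp437' (the original IBM PC character set) are standard Python codec names, picked here as two examples of 8-bit extensions that place 'é' at different numbers.

```python
from typing import Optional


def code_in(ch: str, charset: str) -> Optional[int]:
    """Return the single-byte code for ch in a Python codec, or None if absent."""
    try:
        encoded = ch.encode(charset)
    except UnicodeEncodeError:
        return None
    return encoded[0]


print(code_in("A", "ascii"))    # 65: plain ASCII
print(code_in("é", "ascii"))    # None: ASCII stops at 127
print(code_in("é", "latin-1"))  # 233 ($E9)
print(code_in("é", "cp437"))    # 130 ($82)
```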

This is the ASCII chart. The number down the side is the first hex digit of the code, and the number along the top is the second hex digit. The low codes, $00 to $1F, are control codes rather than printable characters, as is $7F (DEL).

Thus $20 (32 in decimal) is a space, and 'A' is $41 (65 in decimal).

    0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0  NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
1  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
2   SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
3   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
4   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
6   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
7   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ DEL
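
In Python, `ord()` gives a character's code and `chr()` goes the other way. The hypothetical `chart_position` helper below shows how the chart above is read: the high four bits of the code pick the row and the low four bits pick the column.

```python
def chart_position(ch: str) -> tuple[int, int]:
    """Row and column of a character in the ASCII chart.

    The row is the first hex digit of the code (the high 4 bits) and the
    column is the second hex digit (the low 4 bits).
    """
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code >> 4, code & 0x0F


print(hex(ord(" ")), ord(" "))   # 0x20 32
print(hex(ord("A")), ord("A"))   # 0x41 65
print(chr(0x41))                 # A
print(chart_position("A"))       # (4, 1)
```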

Some of the control codes are quite fun: $07 is 'bell'. When I was at school we used Lynx computers with a single printer in the room. We discovered that sending an ASCII $07 to the printer would make it beep, and we were soon using it to send Morse messages to each other. A primitive instant messaging system!

Note that whilst 'A' is $41, 'a' is $61. In binary these are %01000001 and %01100001: the upper and lower case letters differ by a single bit, the one worth 32 ($20). This is quite intentional, as it makes converting between cases very simple.
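
Flipping that one bit with an exclusive-or is enough to swap the case of an ASCII letter. This is a minimal sketch; `swap_case` and `CASE_BIT` are hypothetical names, and the check restricts it to plain ASCII letters, since the trick does not apply to other characters.

```python
CASE_BIT = 0x20  # the single bit (value 32) separating 'A' ($41) from 'a' ($61)


def swap_case(ch: str) -> str:
    """Swap the case of an ASCII letter by flipping the case bit."""
    if len(ch) != 1 or not ("A" <= ch <= "Z" or "a" <= ch <= "z"):
        raise ValueError(f"{ch!r} is not a single ASCII letter")
    return chr(ord(ch) ^ CASE_BIT)


print(format(ord("A"), "08b"), format(ord("a"), "08b"))  # 01000001 01100001
print(swap_case("A"), swap_case("q"))                   # a Q
```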

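Of course, ASCII is rather limited, especially when we consider the many languages around the world. This is why there is so much work on Unicode, which aims to give every character in every writing system its own number (its first 128 codes are the same as ASCII).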