Index of Coincidences - A worked example

This article is designed to walk you through the breaking of a Beaufort Cipher. The same method can be used for Vigenère - or indeed any time when a repeating is key applied letter by letter.

Suppose that we wished to crack the following message:

VKMHG QFVMO IJOII OHNSN IZXSS CSZEA WWEXU
LIOZB AGEKQ UHRDH IKHWE OBNSQ RVIES LISYK
BIOVF IEWEO BQXIE UUIXK EKTUH NSZIB SWJIZ
BSKFK YWSXS EIDSQ INTBD RKOZD QELUM AAAEV
MIDMD GKJXR UKTUH TSBGI EQRVF XBAYG UBTCS
XTBDR SLYKW AFHMM TYCKU JHBWV TUHRQ XYHWM
IJBXS LSXUB BAYDI OFLPO XBULU OZAHE JOBDT
ATOUT GLPKO FHNSO KBHMW XKTWX SX

We might begin by trying to establish the key length. This is done by using the 'Index of Coincidences'.

Essentially, the message is 'shifted' along, compared with the original, and the number of matches are counted.

Original: VKMHGQFVMOIJOIIOHNSNIZXSSCSZEA...
 Shift 1: KMHGQFVMOIJOIIOHNSNIZXSSCSZEAW...
 Shift 2: MHGQFVMOIJOIIOHNSNIZXSSCSZEAWW...
 Shift 3: HGQFVMOIJOIIOHNSNIZXSSCSZEAWWE...
 Shift 4: GQFVMOIJOIIOHNSNIZXSSCSZEAWWEX...
 Shift 5: QFVMOIJOIIOHNSNIZXSSCSZEAWWEXU...
 Shift 6: FVMOIJOIIOHNSNIZXSSCSZEAWWEXUL...
 Shift 7: VMOIJOIIOHNSNIZXSSCSZEAWWEXULI...

When this is done for the full message above, we get the following number of coincidences:

Shift Coincidences
1 8
2 12
3 11
4 13
5 9
6 25
7 11

Notice the big leap in coincidences at a shift of 6? There is another leap (though not as noticeable) at a shift of 12 too. This implies that the key is 6 letters long.

If this isn't immediately clear, then consider the what is happening. If we match a random string of letters against another random string, we would expect roughly 1 match in 26 letters. So in 260 letters we'd expect roughly 10 coincidences (though we would not be surprised by deviations from this).

English, or any language, does not consist of a random string of letters. A random English sentence compared to another English sentence would see more matches as some letters are more common than others.

If we shift the ciphertext against itself by a distance equal to the size of the key, we are comparing letters which are encoded using the same key letter. Therefore 'e' will be encoded the same way each time - as will the other letters. So we'd expect more matches than for an arbitrary shift (where 'e' will have been encoded using different key letters).

Once the key length is established, the rest of the decode should come quite readily - but I'll save this for a later article.