Suppose that we wished to crack the following message:
VKMHG QFVMO IJOII OHNSN IZXSS CSZEA WWEXU LIOZB AGEKQ UHRDH IKHWE OBNSQ RVIES LISYK BIOVF IEWEO BQXIE UUIXK EKTUH NSZIB SWJIZ BSKFK YWSXS EIDSQ INTBD RKOZD QELUM AAAEV MIDMD GKJXR UKTUH TSBGI EQRVF XBAYG UBTCS XTBDR SLYKW AFHMM TYCKU JHBWV TUHRQ XYHWM IJBXS LSXUB BAYDI OFLPO XBULU OZAHE JOBDT ATOUT GLPKO FHNSO KBHMW XKTWX SX
We might begin by trying to establish the key length. This is done by using the 'Index of Coincidences'.
Essentially, the message is 'shifted' along, compared with the original, and the number of matches are counted.
Original: VKMHGQFVMOIJOIIOHNSNIZXSSCSZEA... Shift 1: KMHGQFVMOIJOIIOHNSNIZXSSCSZEAW... Shift 2: MHGQFVMOIJOIIOHNSNIZXSSCSZEAWW... Shift 3: HGQFVMOIJOIIOHNSNIZXSSCSZEAWWE... Shift 4: GQFVMOIJOIIOHNSNIZXSSCSZEAWWEX... Shift 5: QFVMOIJOIIOHNSNIZXSSCSZEAWWEXU... Shift 6: FVMOIJOIIOHNSNIZXSSCSZEAWWEXUL... Shift 7: VMOIJOIIOHNSNIZXSSCSZEAWWEXULI...
When this is done for the full message above, we get the following number of coincidences:
Notice the big leap in coincidences at a shift of 6? There is another leap (though not as noticeable) at a shift of 12 too. This implies that the key is 6 letters long.
If this isn't immediately clear, then consider the what is happening. If we match a random string of letters against another random string, we would expect roughly 1 match in 26 letters. So in 260 letters we'd expect roughly 10 coincidences (though we would not be surprised by deviations from this).
English, or any language, does not consist of a random string of letters. A random English sentence compared to another English sentence would see more matches as some letters are more common than others.
If we shift the ciphertext against itself by a distance equal to the size of the key, we are comparing letters which are encoded using the same key letter. Therefore 'e' will be encoded the same way each time - as will the other letters. So we'd expect more matches than for an arbitrary shift (where 'e' will have been encoded using different key letters).
Once the key length is established, the rest of the decode should come quite readily - but I'll save this for a later article.