Index of Coincidences – A worked example

This article is designed to walk you through the breaking of a Beaufort Cipher. The same method can be used for Vigenère – or indeed any time when a repeating is key applied letter by letter.

Suppose that we wished to crack the following message:

VKMHG QFVMO IJOII OHNSN IZXSS CSZEA WWEXU
LIOZB AGEKQ UHRDH IKHWE OBNSQ RVIES LISYK
BIOVF IEWEO BQXIE UUIXK EKTUH NSZIB SWJIZ
BSKFK YWSXS EIDSQ INTBD RKOZD QELUM AAAEV
MIDMD GKJXR UKTUH TSBGI EQRVF XBAYG UBTCS
XTBDR SLYKW AFHMM TYCKU JHBWV TUHRQ XYHWM
IJBXS LSXUB BAYDI OFLPO XBULU OZAHE JOBDT
ATOUT GLPKO FHNSO KBHMW XKTWX SX

We might begin by trying to establish the key length. This is done by using the ‘Index of Coincidences’.

Essentially, the message is ’shifted’ along, compared with the original, and the number of matches are counted.

Original: VKMHGQFVMOIJOIIOHNSNIZXSSCSZEA...
 Shift 1: KMHGQFVMOIJOIIOHNSNIZXSSCSZEAW...
 Shift 2: MHGQFVMOIJOIIOHNSNIZXSSCSZEAWW...
 Shift 3: HGQFVMOIJOIIOHNSNIZXSSCSZEAWWE...
 Shift 4: GQFVMOIJOIIOHNSNIZXSSCSZEAWWEX...
 Shift 5: QFVMOIJOIIOHNSNIZXSSCSZEAWWEXU...
 Shift 6: FVMOIJOIIOHNSNIZXSSCSZEAWWEXUL...
 Shift 7: VMOIJOIIOHNSNIZXSSCSZEAWWEXULI...

When this is done for the full message above, we get the following number of coincidences:

Shift Coincidences
1 8
2 12
3 11
4 13
5 9
6 25
7 11

Notice the big leap in coincidences at a shift of 6? There is another leap (though not as noticeable) at a shift of 12 too. This implies that the key is 6 letters long.

If this isn’t immediately clear, then consider the what is happening. If we match a random string of letters against another random string, we would expect roughly 1 match in 26 letters. So in 260 letters we’d expect roughly 10 coincidences (though we would not be surprised by deviations from this).

English, or any language, does not consist of a random string of letters. A random English sentence compared to another English sentence would see more matches as some letters are more common than others.

If we shift the ciphertext against itself by a distance equal to the size of the key, we are comparing letters which are encoded using the same key letter. Therefore ‘e’ will be encoded the same way each time – as will the other letters. So we’d expect more matches than for an arbitrary shift (where ‘e’ will have been encoded using different key letters).

Once the key length is established, the rest of the decode should come quite readily – but I’ll save this for a later article.

Series Information

"Index of Coincidences – A worked example" is 29th in a larger sequence of 47 posts

  • Add to Delicious
  • Digg This Post
  • Stumble This Post
  • RSS Feed
This entry was posted in Classical Cryptography and tagged , . Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.

One Comment

  1. Matthew
    Posted October 17, 2004 at 12:56 am | Permalink

    I’m curious as to how dependable this method is for short keylengths. Just looking over the frequency charts for English, I think we would expect one out of fifteen pairs of randomly selected letters to be the same, but looking at shifts of 1 and 2 on plaintext, I’m not seeing anywhere near that level of hits. Of course, a keylength of 1 indicates a monoalphabet and is a trivial case, but one might like a test that turns up a distinctly negative answer instead of an indeterminate one.

    Another technique that people use to find the keylength of periodic ciphers is the Kasiski method, where you find the gaps between sufficiently long repeated texts. For instance, in your sample message, WEOB appears twice 24 characters apart, KTUH 60 characters apart, and TBDR 54 characters apart. Assuming that it is very unlikely for tetragrams to repeat in a 250 character passage by chance, we conclude that these are probably the same words or partial words encoded by the same part of the key. From that, we first investigate the greatest common factor of 24, 54, and 60, which agrees that the key-length is 6.

    (Edit: This method does rely on reasonable amounts of ciphertext – as does Kasiski. The best it can do is to give an indication. When I was writing this I tried a shorter length of puzzle and found that by chance there was not a noticeable spike at the key length. With a key length of ‘1′ we would expect it to be ‘all spike’ – i.e. all at the 1 in 15. Note I haven’t confirmed that figure myself. Like any method for finding key length, it is a tool in the toolbox, no more, no less.

    If the text is to short it won’t work. Similarly, there is some probability that the text will be ‘freakish’ in some way, and a spike won’t appear. In which case, tough. Best case scenario, use a different tool, worst case assume Key Length=1, try to solve. Repeat for Key Length=2 and so on.

    I’d better do that Kasiski article now! -Murk )

    [Obviously, you can cut this out if I'm forecasting your next entry. :) ]

    (Shucks, that’d be unfair! – Murk )

Post a Comment

Your email is never published nor shared.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Subscribe without commenting

  • Categories

  • Archives

  • Recent Comments