Classical Cryptography

Vigenère

The Vigenère cipher is a polyalphabetic substitution. Blaise de Vigenère actually produced a more sophisticated autokey cipher, but through an accident of history his name has become attached to this weaker cipher. The cipher below is not le chiffre indéchiffrable, though it has sometimes been mistakenly called that. This said, the Vigenère cipher is reasonably secure - requiring more work than a simple monoalphabetic substitution. Yet it is still possible to break by pencil and paper methods, and if one knows the techniques it is quite vulnerable. I have heard that a well known scientific magazine said that it was "uncrackable" as late as 1917 - even though it had been broken before then.

The Vigenère cipher makes use of a tableau.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
Z A B C D E F G H I J K L M N O P Q R S T U V W X Y

The cipher also requires a key. The ciphertext is formed by first writing the key underneath the plaintext, repeating it as often as necessary. In the following example I will use the plaintext "electronics for dogs" and the key "grommit".

 Plaintext : electronics for dogs
       Key : GROMMITGROM MIT GROM
Ciphertext :

The question remains: How do I now form the ciphertext?

To form the ciphertext we need to use the Vigenère tableau. The plaintext is encrypted letter by letter using the corresponding key letter.

Using the tableau (a paper copy is useful), locate the first plaintext letter, "E", along the left hand side of the table, then locate the corresponding letter of the key, "G", along the top of the table.

Reading along from the E and down from the G will give the ciphertext letter, in this case "K".

Continuing this for all of the other letters we can obtain the full ciphertext.

 Plaintext : electronics for dogs
       Key : GROMMITGROM MIT GROM
Ciphertext : KCSOFZHTZQE RWK JFUE
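
The table lookup is the same as adding the position of the key letter to the position of the plaintext letter and wrapping round at Z. Here is a minimal Python sketch of that idea. The function name is my own, it assumes plain A-Z text, and it lets spaces pass through without using up a key letter, as in the example above.

```python
from itertools import cycle

def vigenere_encrypt(plaintext, key):
    """Encipher A-Z letters; anything else is copied unchanged."""
    key_letters = cycle(key.upper())
    out = []
    for ch in plaintext.upper():
        if "A" <= ch <= "Z":
            # Row and column lookup in the tableau is addition modulo 26.
            shift = ord(next(key_letters)) - ord("A")
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # spaces do not consume a key letter
    return "".join(out)

print(vigenere_encrypt("electronics for dogs", "grommit"))
# KCSOFZHTZQE RWK JFUE
```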

You'll notice that a given letter of the plaintext does not yield the same letter of the ciphertext each time it appears. For instance the word "electronics" with the key "grommit" yields the ciphertext "kcsofzhtzqe" - the plaintext letter "e" yields both the ciphertext letters "k" and "s". This is a feature of any polyalphabetic cipher. Polyalphabetic meaning "many alphabets".

Decoding is easy for anyone with the key. One just needs to find the letter of the key from the side of the table, read along the row to find the ciphertext letter and then move to the top of the column to find the original plaintext letter.
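
The decoding steps can be followed literally in code. This is only a sketch: it builds the tableau as a list of rows, finds the key letter's row, looks for the ciphertext letter in it, and reads off the column heading. The names TABLEAU, decrypt_letter and vigenere_decrypt are my own.

```python
import string

ALPHABET = string.ascii_uppercase
# Row i of the tableau is the alphabet rotated left by i places.
TABLEAU = [ALPHABET[i:] + ALPHABET[:i] for i in range(26)]

def decrypt_letter(cipher_letter, key_letter):
    row = TABLEAU[ALPHABET.index(key_letter)]   # row starting with the key letter
    column = row.index(cipher_letter)           # where the ciphertext letter sits
    return ALPHABET[column]                     # letter at the top of that column

def vigenere_decrypt(ciphertext, key):
    key = key.upper()
    out, used = [], 0
    for ch in ciphertext.upper():
        if ch in ALPHABET:
            out.append(decrypt_letter(ch, key[used % len(key)]))
            used += 1
        else:
            out.append(ch)  # spaces are copied and use no key letter
    return "".join(out)

print(vigenere_decrypt("KCSOFZHTZQE RWK JFUE", "grommit"))
# ELECTRONICS FOR DOGS
```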

How can it be Broken?

Obviously, a Vigenère cipher is more secure than a straightforward monoalphabetic substitution to a casual analysis. Nevertheless, the cipher is still vulnerable to attack.

The first job is to find the length of the key. This may be done by using the Method of Coincidences. The ciphertext is shifted against itself and the number of matches between letters is counted. A random shift will give a low number, while a shift which is a multiple of the key length will give a high number. This is because some letters are more frequent in English, and a letter is more likely to match a copy of itself which has been coded with the same key letter.

Using the ciphertext above, a shift of 3 gives the following...

KCSOF ZHTZQ ERWKJ FUE.. .
   KC SOFZH TZQER WKJFU E

There is one coincidence. The ciphertext would be shifted by different amounts, and the coincidences counted each time, in order to select the most likely key length. (Obviously there is not enough ciphertext in this example to produce a statistically significant result, but hopefully you get the idea.)
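
Counting coincidences by hand is tedious, so here is a small Python sketch of the count. The function name is my own, and with only 18 letters the numbers mean very little - it simply shows the mechanics.

```python
def coincidences(ciphertext, shift):
    """Count positions where the text matches itself shifted by `shift`."""
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    return sum(1 for a, b in zip(letters, letters[shift:]) if a == b)

ciphertext = "KCSOF ZHTZQ ERWKJ FUE"
print(coincidences(ciphertext, 3))  # 1, the pair of Zs three places apart

# On a real message you would compare the counts for a range of shifts.
for shift in range(1, 10):
    print(shift, coincidences(ciphertext, shift))
```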

That bit is quite important, go back and make sure you get it.

So, we have the key length, how does this help?

Let's assume that the key is found to be five letters long. We may then divide the message up into ciphertext which has all been enciphered with the same letter. In other words, letters one, six, eleven etc. were all encrypted with the first letter of the key, letters two, seven, twelve etc. were all encrypted with the second letter and so on.

Each "group" of letters is simply enciphered by a Caesar shift (think what happens if the Vigenère cipher is used with the key GGG). Thus the key letter may simply be found by observing the relative frequencies of letters in that group. Once the key letter has been determined for each group then the message may be reassembled. The cipher is solved.
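
A rough Python sketch of the grouping step follows. The function names are my own, and guess_key_letter makes the crude assumption that the most common letter in a group stands for E. That is only likely to hold for long messages, so treat it as a first guess rather than an answer.

```python
from collections import Counter

def split_into_groups(ciphertext, key_length):
    """Group 0 holds letters 1, 1+n, 1+2n ...; group 1 holds letters 2, 2+n ... etc."""
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    return ["".join(letters[i::key_length]) for i in range(key_length)]

def guess_key_letter(group):
    """Assume (crudely) that the commonest ciphertext letter is plaintext E."""
    commonest = Counter(group).most_common(1)[0][0]
    shift = (ord(commonest) - ord("E")) % 26
    return chr(shift + ord("A"))

print(split_into_groups("ABCDEFGHIJK", 5))
# ['AFK', 'BG', 'CH', 'DI', 'EJ']
```

With a real message you would call guess_key_letter on each group in turn and then check whether the resulting key produces sensible plaintext.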

Edit (17/10/04): A series of posts which show how Beaufort can be analysed are elsewhere on this site. Vigenère can be analysed using identical methods; the only difference is in how the key is applied to the text.

Monoalphabetic Substitution

A monoalphabetic substitution is one where a letter of plaintext always produces the same letter of ciphertext.

The simplest examples of monoalphabetic substitutions are probably the Caesar Cipher and Atbash.

These are special cases of the more general substitution, so you may like to read the description of these first.

In general, an example of a monoalphabetic substitution is shown below.

PLAINTEXT   a b c d e f g h i j k l m
CIPHERTEXT  Q R S K O W E I P L T U Y

PLAINTEXT   n o p q r s t u v w x y z
CIPHERTEXT  A C Z M N V D H F G X J B
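
In code, a table like this is just a lookup from one alphabet to another. The sketch below uses Python's built-in str.maketrans and str.translate with the cipher alphabet from the table above. The function names are my own.

```python
import string

PLAIN = string.ascii_lowercase
CIPHER = "QRSKOWEIPLTUYACZMNVDHFGXJB"  # the bottom rows of the table, a to z

ENCRYPT = str.maketrans(PLAIN, CIPHER)
DECRYPT = str.maketrans(CIPHER, PLAIN)

def encrypt(text):
    return text.lower().translate(ENCRYPT)

def decrypt(text):
    return text.upper().translate(DECRYPT)

print(encrypt("hello"))   # IOUUC
print(decrypt("IOUUC"))   # hello
```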

You may naïvely think that this cipher is secure, after all there are 26! different cipher alphabets ( 4 x 10^26 ) to choose from, however the letter frequencies and underlying patterns will be unchanged - and as such the cipher can be solved by pen and paper techniques. The best way to see how the cryptanalysis is performed is by doing some analysis.

Let us try to solve a monoalphabetic cipher. In solving this I will not use any of the more advanced techniques available.

These more advanced techniques include looking systematically at the position of letters in words in order to identify vowels, looking for pattern words, and making fuller use of the letter frequencies, though common pairings (TH, HE etc.) may come up. For more detail, see books such as Cryptanalysis by Helen Fouché Gaines.

Note, this cipher was NOT constructed specially for this page, it was generated automatically by a program and I solved it as I wrote the page. (I no longer have a copy of the program)

Here is the ciphertext:

Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv 

fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

nz tvqq bdfz f qvluzo.

                                      Pnx Mnviny

Let's count some letters. The most common letter is V, followed by F, N, B, Z. The most common English letters are ETAIN... it is highly likely that we have some matches here - though it is not a certainty. Also 'F' probably stands for I or A as 'F' appears alone more than once.
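
If you would rather not count by hand, a few lines of Python will do it. This is only a sketch using collections.Counter from the standard library. The name letter_counts is my own.

```python
from collections import Counter

CIPHERTEXT = (
    "Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv "
    "fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb "
    "nz tvqq bdfz f qvluzo. Pnx Mnviny"
)

def letter_counts(text):
    """Count letters only, ignoring case, spaces and punctuation."""
    return Counter(c for c in text.upper() if "A" <= c <= "Z")

# Print the five most frequent ciphertext letters with their counts.
for letter, count in letter_counts(CIPHERTEXT).most_common(5):
    print(letter, count)
```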

It should also be noted that where V ends a three letter word, that word is 'BDV'. The most common trigraph (three letter sequence) in English is 'THE' - so let us guess that 'BDV' is 'THE'.

    the t  e         e       et  e    the 
Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv 

  e   e   e        e   e t    the    e   e t 
fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

    e   th      e
nz tvqq bdfz f qvluzo.

                                            e
                                      Pnx Mnviny

The sequence 'BDFZ' gives a little hope, as f is likely to be 'A' or 'I' then 'BDFZ' could be 'THIS' or 'THAN' (not 'THAT'!).

What else can we establish? Either N or Z is likely to be a vowel ('NZ') - and so is I or C ('IC').

Looking at the first word, Ftt - F is likely to be A or I, so what could this be?

  • ADD
  • AHH
  • ALL
  • ANN
  • ASS
  • ILL

I think that the only reasonable answer is 'ALL'. This, in turn, implies that 'BDFZ' is 'THAN'. Filling in these assumptions gives us the following:

All the ta e   a     e  a l  et  e    the
Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv

a e a e A e   an a e   ent    the    e n ent 
fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

 n le   than a  e  n
nz tvqq bdfz f qvluzo.

                                            e
                                      Pnx Mnviny

It's at this stage where, if all is correct, words will appear. I spent a few moments looking, and I thought I recognised a word in 'yukvjzxvzb' - '***e*n*ent'.

To my mind, this looks like 'govErNmENT'. Trying these letters, 'Fxvjnlfz' reveals itself to be 'Amer**an'. This is looking promising, let us assume further that 'nl' represents 'ic'.

All the ta e   ai  over a li etime    the
Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv

average American are   ent    the government 
fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

in le   than a  econ 
nz tvqq bdfz f qvluzo.

                                       im  ie ig
                                      Pnx Mnviny

We immediately recognise the words 'taxes' and 'lifetime' (so 'H' represents 'X', 'Q' represents 'S' and 'M' represents 'F'), and can guess therefore that 'E' represents 'P', 'O' represents 'D', and 'IC' represents 'BY'.

We therefore have:

All the taxes paid over a lifetime by the 

average American are spent by the government 

in less than a second

                                      *im Fiebig
                                      Pnx Mnviny

Now, all we have is the first letter of the name to account for. The surname is already settled, because 'M' stood for 'F' in 'lifetime', but 'P' appears nowhere else, so it is a bit of a shot in the dark as we have no way of checking it elsewhere in the text.

Unaccounted for letters are: J, K, Q, U, W, Z.
The name could be Jim Fiebig or Kim Fiebig.

I would have to try and look this up as I don't know of any person by any of these names. Fortunately, I do have the facility to cheat on my monoalphabet generator, and this quote was by someone called Jim Fiebig.
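
To check the whole solution in one go, the letters recovered above can be put into a lookup table and applied to the ciphertext. This is just a sketch. The dictionary holds ciphertext letter to plaintext letter, the function name is my own, and any ciphertext letter not in the table comes out as '*'.

```python
# Ciphertext letter -> plaintext letter, as worked out step by step above.
KEY = {
    "f": "a", "t": "l", "b": "t", "d": "h", "v": "e", "z": "n",
    "y": "g", "u": "o", "k": "v", "j": "r", "x": "m", "n": "i",
    "l": "c", "h": "x", "q": "s", "m": "f", "e": "p", "o": "d",
    "i": "b", "c": "y", "p": "j",
}

def decrypt(text):
    out = []
    for ch in text:
        if ch.isalpha():
            plain = KEY.get(ch.lower(), "*")  # '*' marks an unsolved letter
            out.append(plain.upper() if ch.isupper() else plain)
        else:
            out.append(ch)  # keep spaces and punctuation
    return "".join(out)

print(decrypt("Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv"))
# All the taxes paid over a lifetime by the
```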

Try this one for yourself... please don't email me for a solution - I won't give it to you! A solution may be found by viewing the HTML comments on this page.

Yxdy pq  yjc xzpvpyw ya icqdepzc ayjceq xq 

yjcw qcc yjcuqcvrcq.

                                   Xzexjxu Vpsdavs

When cryptanalysing more complex ciphers, such as Vigenère, one of the first steps could be to try and reduce the cipher into a series of monoalphabetic ciphers.

Of course the analysis may be done automatically by a computer program which observes letter positions and frequencies etc.


26!

26! is a shorthand way of writing the number given by

26 x 25 x 24 x 23 x 22 x 21 ..... 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1


4 x 10^26

4 x 10^26 is an example of scientific notation. It is a shorthand way of writing very large, or very small numbers. e.g. an atomic nucleus is about 10^-15 metres across.

4 x 10^26 is an approximation to 26! and corresponds to the following number.

400 000 000 000 000 000 000 000 000
I am sure you'll agree that this is a lot of possible combinations!
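
If you want the exact figure rather than the approximation, Python's standard math module will work it out. A minimal sketch:

```python
import math

keys = math.factorial(26)   # 26 x 25 x 24 x ... x 2 x 1
print(keys)                 # the exact number of cipher alphabets
print(len(str(keys)))       # how many digits it has
print(f"{keys:.1e}")        # the same number in scientific notation
```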

Playfair

The Playfair cipher was invented by a rather clever chap by the name of Wheatstone. Playfair's name is attached to it as he is the one who was a vocal supporter of it in government circles. History is funny like that.

Playfair first demonstrated this cipher at a dinner in 1854. The dinner was given by Lord Granville, and a notable guest was Lord Palmerston.

The cipher is a form of monoalphabetic substitution, but relies on DIGRAPHS rather than single letters - and it is simple to master. The Playfair cipher is believed to be the first digraphic system.

Again, we start with a keyword. Write it out, skipping any repeated letters, and then write the remaining letters of the alphabet beneath it - for instance using "cryptogram" as a keyword we obtain:

CRYPTOGAM
BDEFHIKLN
QSUVWXZ--

This can be read off by columns:

CBQRDSYEUPFVTHWOIXGKZALMN

and then placed into a 5 by 5 square:

C  B  Q  R  D
S  Y  E  U  P
F  V  T  H  W
O IJ  X  G  K
Z  A  L  M  N
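
The column method is easy to get wrong by hand, so here is a Python sketch of it. The function name is my own, and I have assumed that J is folded into I before the square is built.

```python
def columnar_square(keyword):
    alphabet = "ABCDEFGHIKLMNOPQRSTUVWXYZ"  # 25 letters, J shares a cell with I
    key = []
    for ch in keyword.upper().replace("J", "I"):
        if ch in alphabet and ch not in key:
            key.append(ch)  # keep the first occurrence of each keyword letter
    letters = key + [ch for ch in alphabet if ch not in key]
    width = len(key)
    # Write the 25 letters in rows as wide as the (deduplicated) keyword...
    rows = [letters[i:i + width] for i in range(0, 25, width)]
    # ...then read them off column by column, skipping gaps in the last row.
    by_columns = [row[c] for c in range(width) for row in rows if c < len(row)]
    return ["".join(by_columns[i:i + 5]) for i in range(0, 25, 5)]

for row in columnar_square("cryptogram"):
    print(row)
# CBQRD
# SYEUP
# FVTHW
# OIXGK
# ZALMN
```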

Note that I and J are entered into the same cell. This system of generating the square degenerated into simply entering the keyword directly into the 5 by 5 square (this is the method we shall use for demonstration purposes, however you should be aware that ANY method of placing letters into the grid may be used).

Playfair demonstrated this system at the party by using the keyword Palmerston.

P  A  L  M  E
R  S  T  O  N
B  C  D  F  G
H IJ  K  Q  U
V  W  X  Y  Z

To encipher some text, that text must first be split into digraphs. Double letters are separated (here I've used an x) so that each digraph will consist of different letters. If it turns out that the last letter is on its own, an x is added to the end of the message.

So the message "Lord Granville's dinner party", when split into digraphs will become lo rd gr an vi lx le sd in ne rp ar ty.
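
Here is a Python sketch of the splitting rule. The function name and the filler parameter are my own, and the sketch does not try to cope with awkward cases such as a double x in the message.

```python
def to_digraphs(message, filler="x"):
    letters = [c for c in message.lower() if "a" <= c <= "z"]
    pairs = []
    i = 0
    while i < len(letters):
        first = letters[i]
        if i + 1 < len(letters) and letters[i + 1] != first:
            pairs.append(first + letters[i + 1])
            i += 2
        else:
            # Either a double letter or a lone final letter: pad with the filler.
            pairs.append(first + filler)
            i += 1
    return pairs

print(" ".join(to_digraphs("Lord Granville's dinner party")))
# lo rd gr an vi lx le sd in ne rp ar ty
```

Notice that the double n in "dinner" needs no filler, because the two letters happen to fall into different digraphs.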

Now the text is ready to encipher. For example, in order to encipher ay we must locate a and y in the square, and find the letter which is in the same row as a and the same column as y.

P  A  L  M  E
.  .  .  O  .
.  .  .  F  .
.  .  .  Q  .
.  .  .  Y  .

Hence the first letter of the enciphered digraph is M. The second letter is found by examining the column containing the first letter and the row containing the second.

.  A  .  .  .
.  S  .  .  .
.  C  .  .  .
. IJ  .  .  .
V  W  X  Y  Z

So the second letter is W. Therefore ay becomes MW. You may like to think of this by imagining the plaintext letters as being one corner of a rectangle, and the ciphertext letters as being the other corners of the rectangle.

What happens if the two letters fall in the same row or column? If they fall in the same row then the letters to the right are taken, and if they fall in the same column then the letters underneath are taken.

Note that the table "wraps", so Y is to the right of X, Z is to the right of Y, and V is to the right of Z. Thus el becomes PM. Note that the order of the letters in the digraph is important and should be preserved.

Using these rules the message "Lord Granville's Dinner Party" is encoded as follows:

lo rd gr an vi lx le sd in ne rp ar ty
MT TB BN ES WH TL MP TC US GN BR PS OX

and becomes "MTTBBNESWHTLMPTCUSGNBRPSOX". (Note the encoding of LX to avoid a double letter). To decode the same rules are used in reverse.
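
The three rules (rectangle, same row, same column) can be sketched in Python as follows. The function names are my own, the square is filled by entering the keyword directly as in the Palmerston example, and the digraphs are assumed to have been split already so that no pair contains a double letter.

```python
def build_square(keyword):
    alphabet = "ABCDEFGHIKLMNOPQRSTUVWXYZ"  # J shares a cell with I
    seen = []
    for ch in keyword.upper().replace("J", "I") + alphabet:
        if ch in alphabet and ch not in seen:
            seen.append(ch)
    return ["".join(seen[i:i + 5]) for i in range(0, 25, 5)]

def encipher_pair(square, pair):
    where = {ch: (r, c) for r, row in enumerate(square) for c, ch in enumerate(row)}
    a, b = pair.upper().replace("J", "I")
    (ra, ca), (rb, cb) = where[a], where[b]
    if ra == rb:    # same row: take the letters to the right, wrapping round
        return square[ra][(ca + 1) % 5] + square[rb][(cb + 1) % 5]
    if ca == cb:    # same column: take the letters underneath, wrapping round
        return square[(ra + 1) % 5][ca] + square[(rb + 1) % 5][cb]
    # otherwise: the other two corners of the rectangle, keeping each letter's row
    return square[ra][cb] + square[rb][ca]

square = build_square("Palmerston")
pairs = "lo rd gr an vi lx le sd in ne rp ar ty".split()
print("".join(encipher_pair(square, p) for p in pairs))
# MTTBBNESWHTLMPTCUSGNBRPSOX
```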

What are the advantages of such a system?

The prime reason is that one of the main weapons of the cryptanalyst is weakened. You will have noticed for example, that the letter "e" does not always encipher to the same letter - how it enciphers depends upon what it is paired with - much more ciphertext must be obtained in order to make use of digraphic frequency analysis (and there are many more digraphs than single letters). In other words it COULD be broken using the same techniques as a single-letter monoalphabet, but we'd need more text. (Note that this is not the best way to crack playfair!)

Also, we now have fewer elements available for analysis. In a 100 letter message enciphered using a single letter substitution we have 100 message elements (from a choice of 26) for analysis - if the message had been enciphered using digraphs then we'd only have 50 message elements (from a choice of 676).

The cipher had many advantages: no cumbersome tables or apparatus were required, it had a keyword which could be easily changed and remembered, and it was very simple to operate. These considerations lend the system well to use as a 'field cipher'.

Apparently Wheatstone and Playfair presented this system to the Foreign Office for diplomatic use, but it was dismissed as being too complex. Wheatstone countered by claiming that he could teach three schoolboys out of four to use the system in less than fifteen minutes - the under secretary at the FO replied "That is very possible, but you could never teach it to attachés."

The cipher was mentioned at Granville's party with a view to its use in the Crimea. The system was not used in the Crimean War, but there are reports that it served in the Boer War.

Jefferson Wheel Cipher

The Jefferson Cipher was invented by a certain Mr. Thomas Jefferson. It is simple in operation and yet is still reasonably secure by today's standards (about the same strength as Vigenère - so don't trust your life to it!).

It was not adopted by the US when it was invented through some historical quirk of fate - even though it would undoubtedly have withstood any contemporary cryptanalytical attack.

Instead of being placed into use it was filed in Jefferson's papers until 1922 when it was rediscovered. By coincidence, in that year the US army started using an almost identical system which was invented independently.

But what was the Jefferson wheel cipher? Simply imagine a cylinder of wood, about 15cm long and 4cm across. Bore out the centre to allow a spindle to be inserted, then slice the cylinder into slices about 5mm across.

The surface of each slice is divided into 26 sections, and one letter is assigned randomly to each section.

Jefferson Wheel Cipher

The slices are placed onto the spindle, and you are now ready to encode. Of course, the person receiving the message must have a similar cylinder whose wheels are arranged in exactly the same way. In the picture, you can see that the wheels are arranged to spell out the name of the cipher.

When in use, the wheels are turned so that a fragment of the message appears along one side of the cylinder. The cylinder is then turned and another line, chosen at random, is copied out.

This is the ciphertext.

This is repeated for each message fragment until the entire message is coded.

The decoder uses their cylinder to enter the ciphertext, and then turns the cylinder examining each row until the plaintext is seen.
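
To make the mechanics concrete, here is a toy simulation in Python. It is only a sketch under my own assumptions: each wheel is a randomly shuffled alphabet generated from a seed (so that both parties can build the same cylinder), the function names are mine, and the message fragment must be no longer than the number of wheels.

```python
import random
import string

def make_wheels(count, seed=0):
    """Each wheel is the alphabet in a random order; the seed stands in for the shared cylinder."""
    rng = random.Random(seed)
    return ["".join(rng.sample(string.ascii_uppercase, 26)) for _ in range(count)]

def read_row(wheels, text, offset):
    """Line `text` up along one row, then read the row `offset` places round the cylinder."""
    return "".join(w[(w.index(ch) + offset) % 26] for w, ch in zip(wheels, text))

def candidate_rows(wheels, ciphertext):
    """What the decoder sees: the other 25 rows once the ciphertext is lined up."""
    return [read_row(wheels, ciphertext, k) for k in range(1, 26)]

wheels = make_wheels(6, seed=1)
ciphertext = read_row(wheels, "ATTACK", 7)   # copy out the row 7 places round
print(ciphertext)
print("ATTACK" in candidate_rows(wheels, ciphertext))  # True
```

The decoder does not need to know which row was copied out. Only one of the 25 candidate rows should read as sensible text.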

Polybius

Polybius was a Greek who invented a system of converting alphabetic characters into numeric characters. It was devised to enable messages to be easily signalled using torches.

Here I've presented a polybius square using our current alphabet. Note that i and j share the same position. However thjs wjll not cause much of a problem when decoding as jt wjll usually be obvjous from the context iust whjch was jntended!

# 1 2 3 4 5
1 a b c d e
2 f g h ij k
3 l m n o p
4 q r s t u
5 v w x y z

Each letter may be represented by two numbers by looking up the row the letter is in and the column. For instance h=23 and r=42.
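
The lookup is simple enough to sketch in a few lines of Python. The names are my own, and I have assumed that j is written as i before encoding, since the two share a cell.

```python
SQUARE = ["abcde", "fghik", "lmnop", "qrstu", "vwxyz"]  # i and j share a cell

def polybius_encode(text):
    codes = []
    for ch in text.lower().replace("j", "i"):
        for r, row in enumerate(SQUARE, start=1):
            c = row.find(ch)
            if c != -1:
                codes.append(f"{r}{c + 1}")  # row number first, then column number
    return " ".join(codes)

def polybius_decode(codes):
    return "".join(SQUARE[int(pair[0]) - 1][int(pair[1]) - 1] for pair in codes.split())

print(polybius_encode("hr"))      # 23 42
print(polybius_decode("23 42"))   # hr
```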

The idea was that a message may be transmitted by holding different combinations of torches in each hand. The chequerboard has other important characteristics, namely the reduction in the number of different characters, the conversion to numbers and the reduction of a symbol into two parts which are separately manipulable. As such chequerboards form the basis for many more ciphers.

Polybius is still a monoalphabetic cipher, as one character is always represented by the same two digits - but it does make a good building block for other ciphers, sometimes very strong ciphers, such as ADFGVX, which I will discuss later.

Atbash

Atbash is a simple substitution very similar in nature to the Caesar Substitution. Whereas the Caesar substitution was Roman in origin, atbash is Jewish in origin.

In atbash, the last letter represents the first, the second to last represents the second and so on.

Atbash is even simpler to solve than the Caesar Substitution as there is only one solution to try!

You may start to wonder how we could start to solve a monoalphabetic substitution if we do not know whether the cipher alphabet is atbash, caesar, some combination of the two or just random! This will be addressed later when I discuss monoalphabetic substitution. In the meantime, can you decode the following atbash? I've provided a key to save some legwork.

PLAINTEXT   a b c d e f g h i j k l m
CIPHERTEXT  Z Y X W V U T S R Q P O N
PLAINTEXT   n o p q r s t u v w x y z
CIPHERTEXT  M L K J I H G F E D C B A

Blf ziv mlg z Qvwr bvg (Wzigs Ezwvi gl Ofpv Hpbdzopvi)

You shouldn't have had a problem with this, it's fairly easy!
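
For completeness, here is atbash as a Python sketch using str.maketrans (the function name is my own). Because the alphabet is simply reversed, the same function both encodes and decodes.

```python
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase
# Map a<->z, b<->y and so on, keeping upper and lower case apart.
ATBASH = str.maketrans(LOWER + UPPER, LOWER[::-1] + UPPER[::-1])

def atbash(text):
    return text.translate(ATBASH)

print(atbash("atbash"))          # zgyzhs
print(atbash(atbash("atbash")))  # atbash
```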

Atbash can also be combined with a Caesar shift, to produce a Reversed Caesar substitution.

An example is shown below.

PLAINTEXT   a b c d e f g h i j k l m
CIPHERTEXT  W V U T S R Q P O N M L K
PLAINTEXT   n o p q r s t u v w x y z
CIPHERTEXT  J I H G F E D C B A Z Y X
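
Numbering the letters a=0 to z=25, the table above amounts to replacing letter p with letter (22 - p), wrapping round at the ends. A Python sketch follows. The function name and the parameter k are my own, and k=22 is simply the value that reproduces this particular table.

```python
def reversed_caesar(text, k=22):
    """Replace letter number p (a=0 ... z=25) by letter number (k - p) mod 26."""
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            p = ord(ch) - ord("A")
            out.append(chr((k - p) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

print(reversed_caesar("abcdefghijklmnopqrstuvwxyz"))
# WVUTSRQPONMLKJIHGFEDCBAZYX
```

As with plain atbash, applying the same function a second time gets the plaintext back.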