Monoalphabetic Substitution

A monoalphabetic substitution is one where a letter of plaintext always produces the same letter of ciphertext.

The simplest examples of monoalphabetic substitutions are probably the Caesar Cipher and Atbash.

These are special cases of the more general substitution, so you may like to read the descriptions of these first.

In general, an example of a monoalphabetic substitution is shown below.

PLAINTEXT   a b c d e f g h i j k l m
CIPHERTEXT  Q R S K O W E I P L T U Y

PLAINTEXT   n o p q r s t u v w x y z
CIPHERTEXT  A C Z M N V D H F G X J B
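
For readers who like to experiment, the table above can be written as a lookup in a few lines of Python. This is only a sketch: the names KEY, encrypt and decrypt are made up for illustration, and passing spaces and punctuation through unchanged is an assumption rather than part of the cipher's definition.

```python
import string

# Cipher alphabet from the table above, listed in plaintext order a..z.
KEY = "QRSKOWEIPLTUYACZMNVDHFGXJB"

ENCRYPT = dict(zip(string.ascii_lowercase, KEY))
DECRYPT = {c: p for p, c in ENCRYPT.items()}


def encrypt(plaintext: str) -> str:
    """Replace each letter using the table; leave everything else alone."""
    return "".join(ENCRYPT.get(ch, ch) for ch in plaintext.lower())


def decrypt(ciphertext: str) -> str:
    """Invert the table to recover the (lowercase) plaintext."""
    return "".join(DECRYPT.get(ch, ch) for ch in ciphertext.upper())


print(encrypt("attack at dawn"))
print(decrypt(encrypt("attack at dawn")))
```

Because the table is a one-to-one rearrangement of the alphabet, decrypting is just a matter of reading it in the other direction.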

You may naïvely think that this cipher is secure; after all, there are 26! different cipher alphabets (about 4 x 10^26) to choose from. However, the letter frequencies and underlying patterns are unchanged, and as such the cipher can be solved by pen-and-paper techniques. The best way to see how the cryptanalysis is performed is by doing some analysis.

Let us try to solve a monoalphabetic cipher. In solving this I will not use any of the more advanced techniques available.

These more advanced techniques include looking systematically at the position of letters in words in order to identify vowels, identifying pattern words, and looking at the letter frequencies, though common pairings (TH, HE, etc.) may come up. For more detail, see books such as 'Cryptanalysis' by Helen Fouché Gaines.

Note: this cipher was NOT constructed specially for this page; it was generated automatically by a program and I solved it as I wrote the page. (I no longer have a copy of the program.)

Here is the ciphertext:

Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv 

fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

nz tvqq bdfz f qvluzo.

                                      Pnx Mnviny

Let's count some letters. The most common letter is V, followed by F, N, B, Z. The most common English letters are E, T, A, I, N, so it is highly likely that we have some matches here, though it is not a certainty. Also, 'F' probably stands for I or A, as 'F' appears alone more than once.
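
The counting can be checked with a short Python sketch. The signature is included in the count here (an assumption; leaving it out lowers the figure for N), and counts and singles are illustrative names.

```python
from collections import Counter

CIPHERTEXT = (
    "Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv "
    "fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb "
    "nz tvqq bdfz f qvluzo. Pnx Mnviny"
)

# Count letters only, ignoring case, spaces and punctuation.
counts = Counter(ch for ch in CIPHERTEXT.upper() if ch.isalpha())

for letter, n in counts.most_common(5):
    print(letter, n)

# Letters that stand alone as a word are candidates for 'a' or 'I'.
singles = {w.upper() for w in CIPHERTEXT.split() if len(w) == 1}
print(singles)
```

With the signature included, V occurs 17 times and F 11, with N and B tied on 8 and Z on 7.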

It should also be noted that the three-letter word 'BDV', which ends in V, appears three times. The most common trigraph (three-letter sequence) in English is 'THE', so let us guess that 'BDV' is 'THE'.

    the t  e         e       et  e    the 
Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv 

  e   e   e        e   e t    the    e   e t 
fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

    e   th      e
nz tvqq bdfz f qvluzo.

                                            e
                                      Pnx Mnviny
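
The annotated lines above can be produced mechanically. A minimal sketch, assuming the partial key is kept as a Python dictionary from cipher letter to plaintext letter (apply_partial_key is a hypothetical helper name):

```python
def apply_partial_key(ciphertext: str, key: dict) -> str:
    """Return a line aligned with the ciphertext: solved letters are shown,
    everything else (unsolved letters, spaces, punctuation) becomes a space."""
    return "".join(key.get(ch.lower(), " ") for ch in ciphertext)


key = {"b": "t", "d": "h", "v": "e"}  # the guess BDV = THE
line = "Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv"

print(apply_partial_key(line, key))
print(line)
```

Printing the two lines one above the other gives the same layout as used on this page, so new guesses can be added to the dictionary and checked by eye.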

The sequence 'BDFZ' gives a little hope: as 'F' is likely to be 'A' or 'I', 'BDFZ' could be 'THIS' or 'THAN' (not 'THAT', since T is already represented by 'B').

What else can we establish? Either N or Z is likely to be a vowel ('NZ') - and so is I or C ('IC').

Looking at the first word, Ftt - F is likely to be A or I, so what could this be?

  • ADD
  • AHH
  • ALL
  • ANN
  • ASS
  • ILL
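
Candidates like these can also be screened against the letters already guessed. The sketch below is one possible check, not the method used on this page: is_consistent is a hypothetical helper, and the candidate list is simply the one above. It rejects a word if it contradicts an earlier guess or reuses a plaintext letter that already belongs to a different cipher letter.

```python
def is_consistent(cipher_word: str, candidate: str, key: dict) -> bool:
    """Could candidate be the plaintext of cipher_word, given the partial
    key (cipher letter -> plaintext letter) found so far?"""
    cipher_word, candidate = cipher_word.lower(), candidate.lower()
    if len(cipher_word) != len(candidate):
        return False
    trial = dict(key)
    for c, p in zip(cipher_word, candidate):
        if c in trial:
            if trial[c] != p:        # contradicts an earlier assignment
                return False
        elif p in trial.values():    # plaintext letter already taken
            return False
        else:
            trial[c] = p
    return True


key = {"b": "t", "d": "h", "v": "e"}  # from the guess BDV = THE
candidates = ["add", "ahh", "all", "ann", "ass", "ill"]
print([w for w in candidates if is_consistent("ftt", w, key)])
```

Under the guess BDV = THE this drops 'AHH', because H is already represented by 'D'; choosing between the remaining words is still a matter of judgement.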

I think that the only reasonable answer is 'ALL'. This makes 'F' an A, which in turn implies that 'BDFZ' is 'THAN'. Filling in these assumptions gives us the following:

All the ta e   a     e  a l  et  e    the
Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv

a e a e A e   an a e   ent    the    e n ent 
fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

 n le   than a  e  n
nz tvqq bdfz f qvluzo.

                                            e
                                      Pnx Mnviny

It's at this stage that, if all is correct, words will start to appear. I spent a few moments looking, and I thought I recognised a word in 'yukvjzxvzb' - '***e*n*ent'.

To my mind, this looks like 'govErNmENT'. Trying these letters, 'Fxvjnlfz' reveals itself to be 'Amer**an'. This is looking promising; let us assume further that 'nl' represents 'ic'.

All the ta e   ai  over a li etime    the
Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv

average American are   ent    the government 
fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

in le   than a  econ
nz tvqq bdfz f qvluzo.

                                       im  ie ig
                                      Pnx Mnviny

We immediately recognise the words 'taxes' and 'lifetime' (so 'H' represents 'X', 'Q' represents 'S' and 'M' represents 'F'), and can therefore guess that 'E' represents 'P', 'O' represents 'D', and 'IC' represents 'BY'.

We therefore have:

All the taxes paid over a lifetime by the 

average American are spent by the government 

in less than a second

                                      *im Fiebig
                                      Pnx Mnviny

Now, all we have is the first letter of the name to account for. 'M' has already appeared in 'tnmvbnxv' (lifetime) as F, so the surname is Fiebig, but 'P' occurs nowhere else. This is a bit of a shot in the dark as we have no way of checking the letter elsewhere in the text, and the name might be something unusual, like Qim.

Unaccounted for letters are: J, K, Q, U, W, Z.
The name could be Kim or Jim.

I would have to try and look this up as I don't know of any person by either of these names. Fortunately, I do have the facility to cheat on my monoalphabet generator, and this quote was by someone called Jim Fiebig.
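
Collecting the word guesses made so far into a single key makes it easy to see which cipher letters are still open. The sketch below is an assumption about how one might do this in Python (extend_key and decrypt_known are made-up names); it refuses any guess that contradicts an earlier one. Run on the last lines of the text, it leaves only 'P' unresolved, because 'M' also occurs in 'tnmvbnxv' (lifetime).

```python
def extend_key(key: dict, cipher_word: str, plain_word: str) -> dict:
    """Return a copy of key extended with the guess cipher_word = plain_word.
    Raises ValueError if the guess clashes with an earlier assignment."""
    if len(cipher_word) != len(plain_word):
        raise ValueError("guess has the wrong length")
    new_key = dict(key)
    for c, p in zip(cipher_word.lower(), plain_word.lower()):
        if c in new_key:
            if new_key[c] != p:
                raise ValueError(f"{c!r} is already {new_key[c]!r}, not {p!r}")
        elif p in new_key.values():
            raise ValueError(f"{p!r} already belongs to another cipher letter")
        else:
            new_key[c] = p
    return new_key


def decrypt_known(ciphertext: str, key: dict) -> str:
    """Decrypt solved letters; mark unsolved letters with '*'."""
    return "".join(
        key.get(ch.lower(), "*") if ch.isalpha() else ch for ch in ciphertext
    )


# The word guesses made in the walkthrough, in the order they were made.
guesses = [
    ("bdv", "the"), ("ftt", "all"), ("bdfz", "than"),
    ("yukvjzxvzb", "government"), ("fxvjnlfz", "american"),
    ("bfhvq", "taxes"), ("tnmvbnxv", "lifetime"),
    ("efno", "paid"), ("ic", "by"),
]

key = {}
for cipher_word, plain_word in guesses:
    key = extend_key(key, cipher_word, plain_word)

print(decrypt_known("nz tvqq bdfz f qvluzo. Pnx Mnviny", key))
```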

Try this one for yourself. Please don't email me for a solution - I won't give it to you! A solution may be found by viewing the HTML comments on this page.

Yxdy pq yjc xzpvpyw ya icqdepzc ayjceq xq 

yjcw qcc yjcuqcvrcq.

                                   Xzexjxu Vpsdavs

When cryptanalysing more complex ciphers, such as Vigenère, one of the first steps could be to try and reduce the cipher into a series of monoalphabetic ciphers.
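
To illustrate the reduction: in a Vigenère cipher, every letter whose position is the same modulo the key length is enciphered with the same alphabet, so grouping the letters that way yields one monoalphabetic (in fact Caesar) cipher per group. The sketch below assumes the key length is already known; finding it is a separate problem not covered here. The sample text is 'attackatdawn' enciphered under the key 'lemon', chosen only for illustration, and split_into_columns is a made-up name.

```python
def split_into_columns(ciphertext: str, key_length: int) -> list:
    """Group the letters by their position modulo key_length."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    return ["".join(letters[i::key_length]) for i in range(key_length)]


# 'attackatdawn' under the key 'lemon' (key length 5).
print(split_into_columns("LXFOPV EFRNHR", 5))
```

The third group comes out as 'FF': both letters are a plaintext T enciphered with the same key letter, which is exactly the repetition a monoalphabetic attack relies on.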

Of course, the analysis may be done automatically by a computer program which observes letter positions, frequencies, etc.
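
A very crude first pass of such a program might simply pair the ciphertext letters with English letters in order of frequency. The sketch below is an assumption about one possible starting point, not a description of any particular solver; ENGLISH_ORDER is one commonly quoted ordering and others differ in the details.

```python
from collections import Counter

# Assumed frequency ordering of English letters, most common first.
# Published orderings differ slightly; treat this as a starting point only.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

CIPHERTEXT = (
    "Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv "
    "fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb "
    "nz tvqq bdfz f qvluzo. Pnx Mnviny"
)


def first_guess(ciphertext: str) -> dict:
    """Pair cipher letters with English letters by frequency rank."""
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    ranked = [letter for letter, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))


guess = first_guess(CIPHERTEXT)
print(guess["v"], guess["f"])
```

On the ciphertext solved above, this correctly pairs V with e but wrongly pairs F with t, which is why such a guess has to be refined against words and patterns, as was done by hand here.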


26!

26! is a shorthand way of writing the number given by

26 x 25 x 24 x 23 x 22 x 21 x ... x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1


4 x 10^26

4 x 10^26 is an example of scientific notation. It is a shorthand way of writing very large or very small numbers, e.g. an atomic nucleus is about 10^-15 metres across.

4 x 10^26 is an approximation to 26! and corresponds to the following number.

400 000 000 000 000 000 000 000 000
I am sure you'll agree that this is a lot of possible combinations!
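
Both figures are easy to confirm in Python, whose integers have arbitrary precision. This is just a check of the arithmetic above; n_keys is an illustrative name.

```python
import math

n_keys = math.factorial(26)   # 26 x 25 x ... x 2 x 1

print(n_keys)                 # the exact integer
print(f"{n_keys:.1e}")        # the same value in scientific notation
```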