Monoalphabetic Substitution

A monoalphabetic substitution is one where a letter of
plaintext always produces the same letter of ciphertext.

The simplest examples of monoalphabetic substitutions are
probably the Caesar Cipher and Atbash.

These are special cases of the more general substitution,
so you may like to read the descriptions of these first.
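
To see why these are special cases, here is a minimal Python sketch that builds the cipher alphabet for each. The function names and the shift of 3 are illustrative choices of mine, not taken from the descriptions mentioned above.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_alphabet(shift):
    """Cipher alphabet for a Caesar cipher: the plain alphabet rotated by `shift`."""
    shift %= 26
    return ALPHABET[shift:] + ALPHABET[:shift]

def atbash_alphabet():
    """Cipher alphabet for Atbash: the plain alphabet reversed."""
    return ALPHABET[::-1]

print(caesar_alphabet(3))  # DEFGHIJKLMNOPQRSTUVWXYZABC
print(atbash_alphabet())   # ZYXWVUTSRQPONMLKJIHGFEDCBA
```

Caesar allows only 25 useful alphabets and Atbash just one, whereas a general substitution may use any ordering of the 26 letters.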

An example of a general monoalphabetic substitution is
shown below.

PLAINTEXT   a b c d e f g h i j k l m
CIPHERTEXT  Q R S K O W E I P L T U Y

PLAINTEXT   n o p q r s t u v w x y z
CIPHERTEXT  A C Z M N V D H F G X J B
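
As a rough sketch, the table above can be applied in Python with a translation table. The names `encrypt` and `decrypt` are my own, and I have assumed that anything other than a letter (spaces, punctuation) passes through unchanged.

```python
import string

PLAIN = string.ascii_lowercase
CIPHER = "QRSKOWEIPLTUYACZMNVDHFGXJB"  # the cipher alphabet from the table above

ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)
DECRYPT_TABLE = str.maketrans(CIPHER, PLAIN)

def encrypt(plaintext):
    """Replace each plaintext letter with its fixed ciphertext letter."""
    return plaintext.lower().translate(ENCRYPT_TABLE)

def decrypt(ciphertext):
    """Reverse the substitution using the inverse table."""
    return ciphertext.upper().translate(DECRYPT_TABLE)

print(encrypt("hello"))  # IOUUC
print(decrypt("IOUUC"))  # hello
```

Because every letter always maps to the same letter, the double L in "hello" is still visible as a double U in the ciphertext.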

You may naïvely think that this cipher is secure; after all,
there are 26! different cipher alphabets (about 4 x 10^26)
to choose from. However, the letter frequencies and underlying
patterns are unchanged, and as such the cipher
can be solved by pen and paper techniques. The best
way to see how the cryptanalysis is performed is by
doing some analysis.

Let us try to solve a monoalphabetic cipher. In solving this I will
not use any of the more advanced techniques available.

These more advanced techniques include looking systematically
at the position of letters in words in order to identify vowels,
using pattern words, and making a systematic study of
the letter frequencies, though common pairings (TH, HE etc.)
may come up as we go. For more detail, see books such as
Cryptanalysis by Helen Fouché Gaines.

Note that this cipher was NOT constructed specially for this
page; it was generated automatically by a program and I solved
it as I wrote the page. (I no longer have a copy of the program.)

Here is the ciphertext:

Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv 

fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

nz tvqq bdfz f qvluzo.

                                      Pnx Mnviny

Let’s count some letters. The most common letter is V, followed
by F, N, B, Z. The most common English letters are E, T, A, I, N,
so it is highly likely that we have some matches here, though it
is not a certainty. Also, ‘F’ probably stands for I or A, as it
appears alone more than once.
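
The counting can be sketched in a few lines of Python. This is only an illustration of the step above: the variable and function names are mine, and I have included the signature in the text being counted.

```python
from collections import Counter

CIPHERTEXT = (
    "Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv "
    "fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb "
    "nz tvqq bdfz f qvluzo. Pnx Mnviny"
)

def letter_counts(text):
    """Count each letter, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

def single_letter_words(text):
    """Count words that are one letter long (likely 'a' or 'I' in English)."""
    return Counter(w for w in text.lower().split() if len(w) == 1)

counts = letter_counts(CIPHERTEXT)
for letter, n in counts.most_common(5):
    print(letter.upper(), n)

print(single_letter_words(CIPHERTEXT))
```

With a text this short the counts are only a hint; letters with similar counts cannot be ranked reliably.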

It should also be noted that the three letter word ‘BDV’,
which ends in V, appears three times. The most common trigraph
(three letter sequence) in English is ‘THE’, so let us guess
that ‘BDV’ is ‘THE’.
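
A quick way to spot a repeated short word like this is to count the words of a given length. The sketch below is my own illustration (the helper name is hypothetical) and uses only the body of the message, without the signature.

```python
import re
from collections import Counter

CIPHERTEXT = (
    "Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv "
    "fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb "
    "nz tvqq bdfz f qvluzo."
)

def words_of_length(text, n):
    """Count the distinct words of exactly n letters, ignoring case."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) == n)

three_letter = words_of_length(CIPHERTEXT, 3)
print(three_letter.most_common())
```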

    the t  e         e       et  e    the
Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv 

  e   e   e        e   e t    the    e   e t
fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

    e   th      e
nz tvqq bdfz f qvluzo.

                                            e
                                      Pnx Mnviny

The sequence ‘BDFZ’ gives a little hope. As F is likely to
be ‘A’ or ‘I’, ‘BDFZ’ could be ‘THIS’ or ‘THAN’ (not ‘THAT’,
because ‘T’ is already represented by B, and Z is a different letter).

What else can we establish? Either N or Z is likely to be
a vowel (‘NZ’) – and so is I or C (‘IC’).

Looking at the first word, Ftt – F is likely to be A or I,
so what could this be?

  • ADD
  • AHH
  • ALL
  • ANN
  • ASS
  • ILL

I think that the only reasonable answer is ‘ALL’. This, in turn,
implies that ‘BDFZ’ is ‘THAN’ (F is now ‘A’, which rules out ‘THIS’).
Filling in these assumptions gives us the following:

All the ta e   a     e  a l  et  e    the
Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv

a e a e A e   an a e   ent    the    e n ent
fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

 n le   than a  e  n
nz tvqq bdfz f qvluzo.

                                            e
                                      Pnx Mnviny

It is at this stage that, if all is correct, words will start
to appear. I spent a few moments looking, and I thought I
recognised a word in ‘yukvjzxvzb’: ‘***e*n*ent’.

To my mind, this looks like ‘govErNmENT’. Trying these letters,
‘Fxvjnlfz’ reveals itself to be ‘Amer**an’. This is looking
promising, so let us assume further that ‘NL’ represents ‘IC’.

All the ta e   ai  over a li etime    the
Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv

average American are   ent    the government
fkvjfyv Fxvjnlfz fjv qevzb ic bdv yukvjzxvzb 

in le   than a secon
nz tvqq bdfz f qvluzo.

                                       im  ie ig
                                      Pnx Mnviny

We immediately recognise the words ‘taxes’ and ‘lifetime’, which
give ‘H’ as ‘X’, ‘Q’ as ‘S’ and ‘M’ as ‘F’. We can therefore
guess that ‘E’ represents ‘P’ (‘paid’, ‘spent’), ‘O’ represents ‘D’,
and ‘IC’ represents ‘BY’.

We therefore have:

All the taxes paid over a lifetime by the 

average American are spent by the government 

in less than a second
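
As a sanity check, the key implied by the solved text can be rebuilt by lining the ciphertext up against the plaintext. This is a sketch with names of my own choosing; it raises an error if any cipher letter would have to stand for two different plaintext letters, or the other way round.

```python
import string

CIPHER = ("Ftt bdv bfhvq efno ukvj f tnmvbnxv ic bdv fkvjfyv Fxvjnlfz "
          "fjv qevzb ic bdv yukvjzxvzb nz tvqq bdfz f qvluzo")
PLAIN = ("All the taxes paid over a lifetime by the average American "
         "are spent by the government in less than a second")

def recover_key(ciphertext, plaintext):
    """Build the cipher -> plain mapping and check that it is one-to-one."""
    key, reverse = {}, {}
    for c, p in zip(ciphertext.lower(), plaintext.lower()):
        if not c.isalpha():
            continue
        if key.setdefault(c, p) != p or reverse.setdefault(p, c) != c:
            raise ValueError(f"inconsistent pair: {c} -> {p}")
    return key

key = recover_key(CIPHER, PLAIN)
unused_plain = sorted(set(string.ascii_lowercase) - set(key.values()))

print(len(key), "cipher letters identified")
print("plaintext letters not yet used:", " ".join(unused_plain).upper())
```

The unused plaintext letters are the only candidates for any cipher letter that has not yet been identified.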

                                      *im Fiebig
                                      Pnx Mnviny

Now, all we have left to account for is the first letter of the
name; ‘M’ is already known to be ‘F’ from ‘lifetime’, so the
surname is Fiebig. The remaining letter is a bit of a shot in
the dark as we have no way of checking it elsewhere in the
text, and the name might be something unusual, like Qim Fiebig.

Unaccounted for letters are: J, K, Q, U, W, Z.
The name could be Jim or Kim.

I would have to try and look this up as I don’t know of any
person by any of these names. Fortunately, I do have the
facility to cheat on my monoalphabet generator, and this
quote was by someone called Jim Fiebig.

Try this one for yourself… please don’t email me for a
solution – I won’t give it to you! A solution may be found
by viewing the HTML comments on this page.

Yxdy pq  yjc xzpvpyw ya icqdepzc ayjceq xq 

yjcw qcc yjcuqcvrcq.

                                   Xzexjxu Vpsdavs

When cryptanalysing more complex ciphers, such
as Vigenère, one of the first steps could be
to try and reduce the cipher into a series of monoalphabetic ciphers.
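
As a rough illustration of that reduction, the sketch below splits a ciphertext into columns, assuming the key length is already known (finding it is a separate problem not covered here). The function name is hypothetical. For Vigenère, every letter in a column was enciphered with the same key letter, so each column is a Caesar cipher, the simplest kind of monoalphabetic substitution.

```python
def split_into_columns(ciphertext, key_length):
    """Group letters by position modulo key_length.

    Assumes key_length is known; non-letters are dropped first so that
    spaces and punctuation do not disturb the alignment.
    """
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    return ["".join(letters[i::key_length]) for i in range(key_length)]

print(split_into_columns("ABCDEFG", 3))  # ['ADG', 'BE', 'CF']
```

Each column can then be attacked with letter frequencies in the same way as the example above.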

Of course the analysis may be done automatically by a computer
program which observes letter positions and frequencies etc.


26!

26! is a shorthand way of writing the number given by

26 x 25 x 24 x 23 x 22 x 21 x ... x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1


4 x 10^26

4 x 10^26 is an example of scientific notation. It is a shorthand way of writing very large or very small numbers, e.g. an atomic nucleus is about 10^-15 metres across.

4 x 10^26 is an approximation to 26! and corresponds to the following number.

400 000 000 000 000 000 000 000 000
I am sure you’ll agree that this is a lot of possible combinations!
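
For anyone who wants to check the arithmetic, Python's standard library can compute the exact value; this snippet is my own addition.

```python
import math

n = math.factorial(26)  # 26 x 25 x ... x 2 x 1

print(n)           # 403291461126605635584000000
print(f"{n:.2e}")  # 4.03e+26
```

So 4 x 10^26 is the exact value rounded to one significant figure.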

Series Information

"Monoalphabetic Substitution" is 6th in a larger sequence of 47 posts

This entry was posted in Classical Cryptography.

One Comment

  1. Babatunde
    Posted May 15, 2008 at 12:47 am | Permalink

    The solution to the code puzzle is:

    Tact is the ability to describe others as they see themselves

    Abraham Lincoln.
