Simple Substitution Cipher

What is it? A simple substitution cipher is a cipher in which every letter of the English alphabet is replaced by another letter of the English alphabet according to a fixed permutation of the [A-Z] alphabet. This is also known as a monoalphabetic substitution cipher. Note that a letter MAY map to itself. The following are all examples of simple substitution ciphers:

The Caesar Shift Cipher, in which each letter is shifted by a fixed amount (e.g., with a shift of 3, A --> D and B --> E).

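The Affine Shift Cipher, which relies on modular arithmetic. Encryption is given by C = (aP + b) mod 26, where a and b are keys chosen by the message-sender, P is the numeric value of a plaintext letter (A = 0, B = 1, ..., Z = 25), and C is the numeric value of the corresponding ciphertext letter. For the message to be decryptable, a must share no common factor with 26.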

The Atbash Cipher, in which each letter is replaced by its opposite: A --> Z, B --> Y, C --> X, etc.

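The Keyword Cipher is only slightly more secure than the ciphers above. First, the message-sender decides on a key phrase. For example, let's say the key is "APPLE". Repeated letters are dropped from the key (APPLE becomes APLE), and the unused letters of the alphabet are appended in order. The mapping is then generated as follows:

Plaintext:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext: A P L E B C D F G H I J K M N O Q R S T U V W X Y Z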

Based on this mapping, A --> A, B --> P, ... Y --> Y, Z --> Z. As you can see, this mapping is not very secure because a large portion of the letters map to themselves. Generally, the longer the keyword, the more secure the cipher. The most secure type of simple substitution cipher is one in which the keyed alphabet is generated randomly. This deters dictionary attacks, in which a cryptanalyst generates keyed alphabets from the words in a dictionary and checks whether any of them breaks the message.
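To make the mappings above concrete, here is a minimal Python sketch that builds the cipher alphabet for each of the four examples. The function names are made up for illustration, and the keyword construction assumes the usual convention (drop repeated key letters, then append the unused letters in alphabetical order), which is consistent with the APPLE mapping described above.

```python
import string
from math import gcd

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def caesar_alphabet(shift):
    """Cipher alphabet for a Caesar shift: each letter moves `shift` places forward."""
    return "".join(ALPHABET[(i + shift) % 26] for i in range(26))

def affine_alphabet(a, b):
    """Cipher alphabet for C = (a*P + b) mod 26, with A = 0 ... Z = 25."""
    if gcd(a, 26) != 1:
        # Otherwise two plaintext letters would share a ciphertext letter.
        raise ValueError("a must share no common factor with 26")
    return "".join(ALPHABET[(a * i + b) % 26] for i in range(26))

def atbash_alphabet():
    """Cipher alphabet for Atbash: the alphabet reversed."""
    return ALPHABET[::-1]

def keyword_alphabet(keyword):
    """Keyed alphabet (assumed convention): unique key letters, then the rest in order."""
    seen = []
    for ch in keyword.upper() + ALPHABET:
        if ch in ALPHABET and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def encrypt(plaintext, cipher_alphabet):
    """Replace each letter A-Z with the letter at the same position in cipher_alphabet."""
    return plaintext.upper().translate(str.maketrans(ALPHABET, cipher_alphabet))

def decrypt(ciphertext, cipher_alphabet):
    """Invert the mapping; characters outside A-Z (spaces, punctuation) pass through."""
    return ciphertext.upper().translate(str.maketrans(cipher_alphabet, ALPHABET))

print(keyword_alphabet("APPLE"))             # APLEBCDFGHIJKMNOQRSTUVWXYZ
print(encrypt("HELLO", caesar_alphabet(3)))  # KHOOR
```

Representing every cipher as a 26-letter string keeps encryption and decryption identical across all four. It also shows that Caesar and Atbash are special cases of the affine cipher (a = 1 and a = b = 25, respectively).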

An Aristocrat cipher is any simple substitution cipher that keeps the spaces between words, and a Patristocrat cipher is one with the spaces removed. An Aristocrat is much easier to decode by hand than a Patristocrat because word boundaries are known.

Symbolic Substitution Ciphers  

Pigpen is a simple substitution cipher that works with 26 symbols instead of 26 letters. Sometimes, it is easier to manually replace these symbols with letters so that the ciphers can more easily be cracked using computer software.

Morse Code is a substitution cipher that relies on dots and dashes. You can look up a Morse Code table to learn more about it. What is unique about this cipher, though, is that the substitutions are of variable lengths, so separator characters are needed to indicate where one symbol ends and the next one begins.
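As a quick illustration of why the separators matter, here is a small Python sketch using only a handful of Morse letters. The table is deliberately partial, and the function names are hypothetical.

```python
# Partial Morse table, just enough letters for the demonstration.
MORSE = {"E": ".", "T": "-", "I": "..", "S": "...", "O": "---"}
FROM_MORSE = {code: letter for letter, code in MORSE.items()}

def to_morse(text, sep=" "):
    """Encode text, placing `sep` between the codes of consecutive letters."""
    return sep.join(MORSE[ch] for ch in text.upper())

def from_morse(signal):
    """Decode a signal whose letter codes are separated by single spaces."""
    return "".join(FROM_MORSE[code] for code in signal.split(" "))

print(to_morse("SOS"))            # ... --- ...
print(from_morse("... --- ..."))  # SOS

# Without a separator, different messages collapse to the same signal:
print(to_morse("S", sep="") == to_morse("EEE", sep=""))  # True
```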

How do I recognize it?  Check out Sample Challenge #1 under Tutorials. Almost always, the index of coincidence (IOC) is close to or greater than the IOC of English plaintext (0.0667), because a substitution relabels the letters without changing how often each one occurs. The percentages of the frequency distribution, once sorted, should also closely match the percentages for a normal English plaintext as shown below.
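If you want to compute these statistics yourself, the following Python sketch uses the standard definition of the IOC: the probability that two letters drawn at random from the message, without replacement, are the same. The helper names are my own for illustration.

```python
from collections import Counter

def letters_only(text):
    """Keep only the letters A-Z, uppercased."""
    return [ch for ch in text.upper() if "A" <= ch <= "Z"]

def index_of_coincidence(text):
    """Sum of c*(c-1) over all letter counts c, divided by n*(n-1)."""
    letters = letters_only(text)
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def frequency_percentages(text):
    """Percentage of the message taken up by each letter, most frequent first."""
    letters = letters_only(text)
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: 100.0 * c / len(letters) for ch, c in counts.most_common()}

# A substitution only relabels letters, so the IOC is unchanged:
print(index_of_coincidence("HELLO WORLD") == index_of_coincidence("KHOOR ZRUOG"))  # True
```

Keep in mind that both statistics are noisy on short messages, so they are most useful on ciphertexts of at least a few hundred letters.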


How do I break it?  Practice. Do a search for "free cryptograms". Once again, check out the tutorial for Sample Challenge #1 to see some cool tricks that you can use to break a substitution cipher by hand. For Patristocrats, one thing that I like to do is slide a three-letter window across the message and count how often each trigram appears. The most common trigrams in English are "THE" and "AND", so the most common trigrams in the message should correspond to these words. You can do this trick for bigrams and quadrigrams too, but MAKE SURE TO CROSS CHECK with single-letter frequencies. If "ABC" is the trigram that corresponds to the word "THE", then I would still expect "C" to be the most frequent letter in the ciphertext since "E" is the most common letter in English plaintext.
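The sliding-window count and the cross-check described above can be sketched in a few lines of Python. The function names are hypothetical, and the ciphertext in the example is made up purely to show the mechanics.

```python
from collections import Counter

def ngram_counts(ciphertext, n):
    """Count every run of n consecutive letters, ignoring spaces and punctuation."""
    letters = "".join(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

def cross_check(ciphertext):
    """Return (top trigram, top letter, whether the trigram ends in that letter).

    If the top trigram stands for THE, its last letter should stand for E,
    which is usually also the most frequent single letter.
    """
    top_trigram = ngram_counts(ciphertext, 3).most_common(1)[0][0]
    top_letter = ngram_counts(ciphertext, 1).most_common(1)[0][0]
    return top_trigram, top_letter, top_trigram[-1] == top_letter

print(ngram_counts("ABCABC", 3).most_common(1))  # [('ABC', 2)]
print(cross_check("XYZ QQ XYZ Z XYZ"))           # ('XYZ', 'Z', True)
```

When the check fails, the top trigram is probably not "THE", so try the next most common trigram before committing to any letters.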

For reference, the most frequent letters in English in order are:



I would recommend memorizing the first 12 of these so that you can easily make solid progress on a cryptogram without having to constantly look up the frequencies.


To find some tools that can do the work for you, check out the links provided in the Tools section. Cryptograms can easily be solved using online software through dictionary attacks and a strategy known as hill-climbing. With hill-climbing, cryptanalysts find a way to score a message based on how closely it matches English plaintext. They start with a randomly generated alphabet as the current letter mapping. Then they randomly swap two letters in the alphabet. If the new plaintext looks "closer" to English than the old one (a better score), the old mapping is discarded and the new mapping becomes the current mapping; otherwise the swap is undone. These steps repeat until the score stops improving.
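Here is a minimal Python sketch of the hill-climbing loop just described. The scoring function is a toy assumption (it simply counts a few common English n-grams) and is not what any particular tool uses. As far as I know, practical solvers usually score with much larger n-gram tables and restart from several random alphabets, so treat this as an outline of the idea rather than a working cracker.

```python
import random
import string

ALPHABET = string.ascii_uppercase

# Toy scoring table (an assumption for this sketch, not a real language model).
COMMON_NGRAMS = ("THE", "AND", "ING", "TH", "HE", "IN", "ER", "AN", "RE", "ON")

def score(text):
    """Higher means more English-like: count occurrences of common n-grams."""
    return sum(text.count(gram) for gram in COMMON_NGRAMS)

def decrypt(ciphertext, key):
    """key[i] is the ciphertext letter that stands for plaintext letter ALPHABET[i]."""
    return ciphertext.translate(str.maketrans(key, ALPHABET))

def hill_climb(ciphertext, iterations=2000, seed=None):
    """Return (best key found, its score) after a fixed number of random swaps."""
    rng = random.Random(seed)
    key = list(ALPHABET)
    rng.shuffle(key)  # randomly generated starting alphabet
    best_score = score(decrypt(ciphertext, "".join(key)))
    for _ in range(iterations):
        i, j = rng.sample(range(26), 2)
        key[i], key[j] = key[j], key[i]  # swap two letters
        candidate = score(decrypt(ciphertext, "".join(key)))
        if candidate > best_score:
            best_score = candidate           # keep the new mapping
        else:
            key[i], key[j] = key[j], key[i]  # undo the swap
    return "".join(key), best_score
```

A call looks like `key, best = hill_climb(ciphertext, seed=0)`, after which `decrypt(ciphertext, key)` gives the candidate plaintext. Because only improving swaps are kept, the search can get stuck on a mapping that is better than its neighbors but still wrong, which is why restarts are common.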

Additional Notes:  A more secure variant on the simple substitution cipher is the homophonic substitution cipher, which is a substitution cipher in which each letter can be represented by multiple "homophones", or symbols. Usually, more frequent letters are given more homophones. This flattens the frequency distribution of symbols in the message and makes it harder to crack. These ciphers usually need to be decoded manually because most automated solvers are only made to solve monoalphabetic substitution ciphers. I have coded a manual decryption tool that works for both simple and homophonic substitution ciphers in Python. You can check it out in the Tools section under the simple substitution folder.
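To show how homophones flatten the distribution, here is a small Python sketch. The homophone table is entirely made up and covers only five letters. It is not taken from the decryption tool mentioned above.

```python
import random

# Hypothetical homophone table: frequent letters get more two-digit symbols.
HOMOPHONES = {
    "E": ["01", "02", "03"],
    "T": ["04", "05"],
    "A": ["06", "07"],
    "H": ["08"],
    "S": ["09"],
}
SYMBOL_TO_LETTER = {sym: letter for letter, syms in HOMOPHONES.items() for sym in syms}

def encrypt(plaintext, rng=random):
    """Replace each letter with one of its homophones, chosen at random."""
    return " ".join(rng.choice(HOMOPHONES[ch]) for ch in plaintext.upper())

def decrypt(ciphertext):
    """Each symbol belongs to exactly one letter, so decryption is a plain lookup."""
    return "".join(SYMBOL_TO_LETTER[sym] for sym in ciphertext.split())

message = "THESEATEASE"
print(decrypt(encrypt(message)) == message)  # True
```

Encrypting the same message twice will usually give different ciphertexts. Since the occurrences of E are spread across three symbols, no single symbol stands out the way E does in a monoalphabetic cipher.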