Polyalphabetic Ciphers

What is it?  A polyalphabetic substitution cipher is any substitution-based cipher that uses multiple substitution alphabets. The following best describes a general algorithm that applies to any polyalphabetic substitution cipher:

  1. Choose N = # of alphabets to use. 

  2. Determine an algorithm to generate each alphabet.

  3. Break the plaintext into N-grams. In each N-gram, encrypt the first letter using the first alphabet, the second letter using the second alphabet... the Nth letter using the Nth alphabet.

 

The only difference between different polyalphabetic substitution ciphers is the algorithm that is used to generate each alphabet. Let's look at some examples to understand this better:

The Vigenere cipher is the most widely known polyalphabetic cipher. It is also the easiest to decrypt. Each alphabet is generated by applying a Caesar Shift to the [A-Z] alphabet. The length of a key used for a Vigenere cipher is equal to the number of alphabets that were used. The key is given in a shorthand alphabetical form so that each alphabet does not have to be explicitly written out:

                                        

P = ABCDEF                                English alphabet:  ABCDEFGHIJKLMNOPQRSTUVWXYZ

Key = KEY   --------------->                   Alphabet #1:   KLMNOPQRSTUVWXYZABCDEFGHIJ

                                                             Alphabet #2:  EFGHIJKLMNOPQRSTUVWXYZABCD

                                                             Alphabet #3:  YZABCDEFGHIJKLMNOPQRSTUVWX

N = 3 because we are using 3 alphabets

Now we split the message into 3-grams and encrypt using the appropriate alphabet:

ABC | DEF

A --> K from alphabet #1, B --> F from alphabet #2, C --> A from alphabet #3, D --> N from alphabet #1, E --> I from alphabet #2, and F --> D from alphabet #3. So our ciphertext is:

C = KFANID

This is clearly much more secure than a simple substitution cipher, and yet we only applied different Caesar Shifts to different parts of the message. But this cipher can be EVEN MORE secure. We just need to choose a stronger method of alphabet generation. The ideal method would be to randomly generate each of the alphabets. But there are more complicated algorithm-based alphabet generation methods out there known as the Quagmire Ciphers. Quagmire III is known as a Keyed Vigenere Cipher and it is very difficult to decrypt. There are currently NO ONLINE TOOLS available to decrypt the Quagmires, but if you need to manually decode one, you can either do it by hand or write a manual decryption Python program (I have yet to write this). For more information on the Quagmires, check out this link.

The Progressive Key Cipher is essentially a Vigenere cipher with 26 alphabets. The alphabets are generated as shown:

English alphabet:  ABCDEFGHIJKLMNOPQRSTUVWXYZ

Alphabet #1:          BCDEFGHIJKLMNOPQRSTUVWXYZA

Alphabet #2:          CDEFGHIJKLMNOPQRSTUVWXYZAB

Alphabet #3:          DEFGHIJKLMNOPQRSTUVWXYZABC

...........................................................................................

Alphabet #25:        ZABCDEFGHIJKLMNOPQRSTUVWXY

Alphabet #26:        ABCDEFGHIJKLMNOPQRSTUVWXYZ   

 

Another way to think about it is that the first letter is shifted by 1, the second letter is shifted by 2... the nth letter is shifted by n...all the way until the end of the message.  A variant on this cipher is starting from a shift other than 1. For example, the first letter might be shifted by 13, the second by 14, ... the nth letter shifted by n + 12... till the end of the message. Yet another variant is that the change in shift will occur after every m characters. So if m = 2 then the shift will only increase every two characters. Combining these variants can greatly increase this cipher's security.

How do I recognize it?  See Sample Challenge #4 under Tutorials. The index of coincidence of a polyalphabetic cipher is much lower than that of English plaintext, typically around 0.045. For a progressive key cipher, it is even lower, around 0.040. The more alphabets that are used, the lower the index of coincidence will be. The frequencies are flattened. Shown below are sample frequency distributions for the same message encrypted using different polyalphabetic schemes. The one on the left was encrypted with a Vigenere cipher that used 6 alphabets and the one on the right was encrypted using a standard progressive key cipher (26 alphabets):

How do I break it?  First, you need to determine the number of alphabets (N) that were used. For that, you need a probable key lengths calculator (see Tools), or you can watch the Tutorial for Sample Challenge #4 to see how this can be done manually. Then, you need to break up the ciphertext into groups such that each group contains letters that were encrypted using the same alphabet. Conduct a frequency analysis on each group separately and use the letter frequencies as hints as you would when solving a substitution cipher. If this is a Vigenere cipher, you could guess that the most frequent letter in each group represents the letter 'E' and use that as a basis for determining the shift that was applied to the alphabet. For the Quagmires, though... good luck! You can use cribbing to solve a Quagmire with spaces, but a Quagmire without spaces is very difficult to decrypt when a sizeable number of substitution alphabets is used.

Takeaway: Polyalphabetic ciphers are more secure than simple substitution ciphers because they use multiple alphabets. As such, they require more rigorous cryptanalysis techniques and more manual decryption skills. Even today, modern computing cannot easily be used to solve many of these ciphers and their variants.

© 2020 by The Gunn Cryptology Club

Website created and managed by Arush Chhatrapati and Aaryan Agrawal