A polygraphic substitution cipher replaces groups of letters in a message, rather than single letters, with different groups. Specific types of polygraphic substitution ciphers include digraph substitutions (groups of two), trigraph substitutions (groups of three), and so on. The following are examples of such ciphers:
The Playfair Cipher is a digraph substitution cipher that uses a 5 * 5 grid (the letters 'I' and 'J' typically share one cell) and replaces each digraph in a message according to the following rules:
both letters in the same row --> shift each letter right one space (rightmost space wraps around to the left)
both letters in the same column --> shift each letter down one space (bottommost space wraps around to the top)
letters in neither the same row nor column --> draw a rectangle in the grid with the two letters as opposite corners. Replace each letter with the letter at the other corner in its own row.
A doubled letter that would form a digraph (such as 'PP') is split by inserting padding ('X'); a single letter left over at the end is padded the same way.
Some variants of the Playfair cipher change how letters are replaced. For example, letters that appear in the same column might be shifted up instead of down, and letters that appear in the same row might be shifted left instead of right. A sample encryption with the rules given above is shown here:
P = BEHAPPY
Key = PLAYFAIR
Grid:
P   L A Y F
I/J R B C D
E   G H K M
N   O Q S T
U   V W X Z
P: BE | HA | PX | PY
BE --> rectangle switch --> IH
HA --> same column --> QB
PX --> rectangle switch --> YU
PY --> same row --> LF
C = IHQBYULF
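The rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`build_grid`, `make_digraphs`, `encrypt`) are made up for this example, and it assumes the conventions used above (J folded into I, 'X' as padding, shifts right and down).

```python
ALPHABET_NO_J = "ABCDEFGHIKLMNOPQRSTUVWXYZ"


def build_grid(key):
    """Return the 5x5 grid as a 25-character string, read row by row."""
    seen = []
    for ch in key.upper() + ALPHABET_NO_J:
        ch = "I" if ch == "J" else ch
        if "A" <= ch <= "Z" and ch not in seen:
            seen.append(ch)
    return "".join(seen)


def make_digraphs(plaintext, pad="X"):
    """Split the plaintext into digraphs, padding doubled and leftover letters."""
    letters = ["I" if c == "J" else c for c in plaintext.upper() if "A" <= c <= "Z"]
    pairs, i = [], 0
    while i < len(letters):
        first = letters[i]
        if i + 1 < len(letters) and letters[i + 1] != first:
            pairs.append(first + letters[i + 1])
            i += 2
        else:
            # Doubled letter or single trailing letter: pad it.
            pairs.append(first + pad)
            i += 1
    return pairs


def encrypt(plaintext, key):
    grid = build_grid(key)
    out = []
    for a, b in make_digraphs(plaintext):
        row_a, col_a = divmod(grid.index(a), 5)
        row_b, col_b = divmod(grid.index(b), 5)
        if row_a == row_b:        # same row: shift right, wrapping around
            col_a, col_b = (col_a + 1) % 5, (col_b + 1) % 5
        elif col_a == col_b:      # same column: shift down, wrapping around
            row_a, row_b = (row_a + 1) % 5, (row_b + 1) % 5
        else:                     # rectangle: swap the two columns
            col_a, col_b = col_b, col_a
        out.append(grid[row_a * 5 + col_a] + grid[row_b * 5 + col_b])
    return "".join(out)
```

With the key and message from the example, `encrypt("BEHAPPY", "PLAYFAIR")` returns `"IHQBYULF"`. The sketch does not handle a doubled padding letter in the plaintext ("XX").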
The Hill Cipher is a more complex digraph/trigraph substitution cipher that relies on matrix multiplication modulo 26. The key is a square matrix, either 2 * 2 or 3 * 3. To decrypt, you need the inverse of the key matrix mod 26, which is the adjugate of the key matrix multiplied by the modular multiplicative inverse of the key's determinant. For the sake of cleanliness, a worked encryption by hand is not shown here, but you can read more about the cipher over here. Note that for a Hill Cipher to work properly, the key matrix MUST BE invertible mod 26, meaning that its determinant must be coprime to 26. Otherwise, the determinant has no multiplicative inverse and an encrypted message cannot be decoded.
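The inverse-key computation can be sketched for the 2 * 2 case as follows. This is only an illustration under stated assumptions: the helper names are invented here, letters are numbered A = 0 through Z = 25, odd-length input is padded with 'X', and `pow(det, -1, 26)` requires Python 3.8 or later.

```python
from math import gcd


def hill2_apply(matrix, text):
    """Multiply each letter pair of `text` by a 2x2 matrix, mod 26."""
    (a, b), (c, d) = matrix
    nums = [ord(ch) - 65 for ch in text.upper() if "A" <= ch <= "Z"]
    if len(nums) % 2:
        nums.append(23)  # assumption: pad odd-length input with 'X'
    out = []
    for i in range(0, len(nums), 2):
        x, y = nums[i], nums[i + 1]
        out.append((a * x + b * y) % 26)
        out.append((c * x + d * y) % 26)
    return "".join(chr(n + 65) for n in out)


def hill2_inverse(matrix):
    """Return the inverse of a 2x2 key mod 26, or raise if none exists."""
    (a, b), (c, d) = matrix
    det = (a * d - b * c) % 26
    if gcd(det, 26) != 1:
        raise ValueError("determinant is not coprime to 26; key is not invertible")
    det_inv = pow(det, -1, 26)  # modular multiplicative inverse of the determinant
    # The adjugate of [[a, b], [c, d]] is [[d, -b], [-c, a]].
    return (
        (d * det_inv % 26, -b * det_inv % 26),
        (-c * det_inv % 26, a * det_inv % 26),
    )
```

For example, the key `((3, 3), (2, 5))` has determinant 9, which is coprime to 26, so `hill2_inverse` returns `((15, 17), (20, 9))`. Applying the key to `"HELP"` gives `"HIAT"`, and applying the inverse to `"HIAT"` gives back `"HELP"`. A key such as `((2, 4), (6, 8))` raises `ValueError` because its determinant is even.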
How do I recognize it? Although both ciphers belong to the same general category, their frequency distributions are strikingly different. A 2 * 2 Hill cipher has an index of coincidence (IOC) of around 0.040. The IOC of a 3 * 3 Hill cipher is just slightly lower, around 0.03886. On the other hand, a Playfair cipher has an IOC of around 0.055. This makes it easy to distinguish a Playfair cipher solely on the basis of the IOC, since none of the other ciphers have an IOC anywhere close to 0.055. But the frequencies of a Playfair cipher are intriguing, nevertheless:
Note: not all of the bigram frequencies are shown here, so you may notice that the frequencies do not add up to 100%.
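For reference, the index of coincidence can be computed with a short function like the one below. This is a sketch of the usual unnormalized definition (the probability that two letters drawn at random from the text are the same); the function name is just a label chosen for this example. Under that definition, uniformly random letters give about 1/26 ≈ 0.0385 and ordinary English text gives about 0.067.

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two distinct positions in `text` hold the same letter."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))
```

Short ciphertexts give noisy values, so a result near one of the figures quoted above is a hint rather than proof.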
Looking at the single-letter frequencies, one indicator that a Playfair cipher may have been used is the number of distinct letters that appear in the message: because 'I' and 'J' share a cell, the ciphertext can contain at most twenty-five. Seeing only twenty-five distinct letters in a 1114-character message is suggestive, but it is not significant enough to support any concrete conclusion. As for the bigram frequencies, a Playfair ciphertext should contain no digraph made of a doubled letter. For example, no ciphertext digraph should be "AA" or "BB", because doubled letters in the plaintext are separated by padding before encryption (so "AA" becomes "AXA" and "BB" becomes "BXB") and the substitution rules never map two different letters to the same letter. However, in some variants of the Playfair cipher, doubled letters are not padded and simply appear unchanged in the ciphertext. Hence, the best basis for distinguishing a Playfair cipher is its index of coincidence. The Hill cipher is a completely different story. A 2 * 2 Hill cipher's frequencies closely match those of a polyalphabetic cipher.
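The two Playfair indicators discussed above (the number of distinct letters and the absence of doubled digraphs) are easy to check mechanically. The helper below is a hypothetical sketch; it assumes the ciphertext digraphs are aligned from the first letter and it ignores any non-letter characters.

```python
def playfair_indicators(ciphertext):
    """Collect simple statistics that are consistent with a Playfair ciphertext."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    # Aligned digraphs: positions (0, 1), (2, 3), ...
    digraphs = [letters[i] + letters[i + 1] for i in range(0, len(letters) - 1, 2)]
    return {
        "even_length": len(letters) % 2 == 0,
        "distinct_letters": len(set(letters)),
        "doubled_digraphs": sum(1 for d in digraphs if d[0] == d[1]),
    }
```

A standard Playfair ciphertext should report an even length, at most 25 distinct letters, and zero doubled digraphs. None of these is conclusive alone, which is why the IOC remains the main test.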
So how do you distinguish a Hill cipher from a polyalphabetic cipher? One method is to assume that the cipher is polyalphabetic and use a probable key lengths calculator to determine the number of alphabets that were used. If the cipher is a Hill cipher, then even the "best" probable key length should correspond to an IOC of only around 0.040, rather than rising toward the value expected of English text. But how then do you distinguish a Hill cipher from a transposed polyalphabetic cipher? This is a lot more difficult, but if we look at the bigram frequencies for a transposed polyalphabetic cipher (on the left) and compare them to the bigram frequencies of a Hill cipher (on the right), the Hill cipher's frequencies are slightly less "flat".
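A probable key length check of the kind mentioned above can be approximated by splitting the ciphertext into columns for each candidate key length and averaging the IOC of the columns. The sketch below is one simple way to do this, not the method of any particular calculator; the function names and the `max_len` parameter are assumptions made for this example.

```python
from collections import Counter


def _ioc(letters):
    """Index of coincidence of a list of letters."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def average_column_ioc(text, max_len=10):
    """Map each candidate key length to the mean IOC of its columns."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    results = {}
    for key_len in range(1, max_len + 1):
        columns = [letters[i::key_len] for i in range(key_len)]
        values = [_ioc(col) for col in columns if len(col) > 1]
        results[key_len] = sum(values) / len(values) if values else 0.0
    return results
```

For a periodic polyalphabetic cipher, the correct key length (and its multiples) should stand out with a column IOC near that of English. If no candidate length rises much above 0.040, that is consistent with a Hill cipher.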
How do I break it? Both of these ciphers are extremely difficult to decrypt by hand. The best way is to use a brute-force method, though you could also compare the ciphertext's digraph frequencies with the bigram frequencies of English plaintext and make some conjectures about which digraphs represent which bigrams. This is much more difficult than with simple substitution, though: there, you had only twenty-six letters to match up. Now, you have six hundred seventy-six (26 * 26) possible digraphs! There exist online brute-forcing programs, and I have provided links under Tools.
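To give a feel for what a brute-force attack on a 2 * 2 Hill cipher involves, the sketch below tries all 26^4 = 456,976 candidate matrices, keeps the invertible ones, and tests each against a crib. The crib (a known plaintext and its ciphertext) is an assumption made for this example, as are the function names; without a crib, each candidate decryption would have to be scored some other way, for instance by how English-like it looks.

```python
from itertools import product
from math import gcd


def invertible_2x2_keys():
    """Yield every 2x2 matrix mod 26 whose determinant is coprime to 26."""
    for a, b, c, d in product(range(26), repeat=4):
        if gcd((a * d - b * c) % 26, 26) == 1:
            yield ((a, b), (c, d))


def keys_matching_crib(plain, cipher):
    """Return all invertible keys that encrypt `plain` to `cipher`.

    Both arguments are assumed to be uppercase A-Z strings of the same
    even length.
    """
    p = [ord(ch) - 65 for ch in plain]
    q = [ord(ch) - 65 for ch in cipher]
    matches = []
    for key in invertible_2x2_keys():
        (a, b), (c, d) = key
        if all(
            (a * p[i] + b * p[i + 1]) % 26 == q[i]
            and (c * p[i] + d * p[i + 1]) % 26 == q[i + 1]
            for i in range(0, len(p), 2)
        ):
            matches.append(key)
    return matches
```

There are 157,248 invertible 2 * 2 keys mod 26, so this search finishes quickly on an ordinary computer. For instance, `keys_matching_crib("HELP", "HIAT")` returns the single key `((3, 3), (2, 5))`. A 3 * 3 key has far more candidates, so exhaustive search of this kind becomes much slower.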
Takeaway: Even though the Playfair cipher and the Hill cipher are in the same category of ciphers, they behave completely differently. This shows that cipher categorizations are not always a reliable guide and that cryptanalytic techniques should not always be generalized across a category of ciphers, because individual ciphers may have unique properties.