Caesar cipher
Spoiler: The Caesar cipher is a classic in the field of cryptography. Pretty easy to use but also to crack, it have been reviewed so many times.
The Caesar cipher is the most ancient known cryptographic algorithm. It was made famous by Julius Caesar as he used this cipher in its private correspondences (and through his self-promotion books, which shows that personnal branding is not that new).
But I digress … This algorithm replaces each letter of the plaintext message with the letter that follows it exactly $ n $ places in the alphabet; and if it overflows, we continue from the beginning.
!!!! The reference shift, which is often wrongly called Caesar
cipher, consists of shifting three letters. For example, the cipher
for hello
iserqmrxu
. You can think of it as a
4.7 bits block cipher in ECB mode, which means that it breaks faster
than a trinket.
Mathematically
This algorithm is an automorphism for the cyclic group . You must admit it impresses. By associating each letter (from to ) with a number (from to ), we describe “very simply” the encryption operation of a character after a shift by the following equation:
clear : a b c d e f g h i j k l m n o p q r s t u v w x y z
ciphered : d e f g h i j k l m n o p q r s t u v w x y z a b c
!!! Since we are in a cyclic group, and it is obvious that all the operations will be modulo n, we will systematically omit it in the following to simplify the notations.
The decryption is the reverse operation, in the mathematical sense, it is, knowing the cipher and the shift, to find the clear text.
And this is where the Caesar cipher takes on its modern meaning as the first asymmetric encryption in history… For any encryption key , there is a reverse key , within the meaning of the group $ (Z_ {26}, +)$, which cancels its effects.
The encryption is then considered as a 4.7-bit block cipher (one alphabetic character) in ECB mode (each block is encrypted independently of the others).
Implementation
The implementation of the Caesar cipher does not involve any particular difficulty. The arithmetic on characters avoids us the use of correspondence table and makes the code shorter.
The following implementation, in C
, encrypts a single
character or a complete string.
char code_char(int k, char c)
{
if (c >= 'a' && c <= 'z') {
return 'a' + (c - 'a' + k) % 26 ;
} else if (c >= 'A' && c <= 'Z') {
return 'A' + (c - 'A' + k) % 26 ;
} else{
return c ;
}
}
void code_string(int k, char * message)
{
for (char * ptr = message; *ptr != '\0'; ++ptr) {
*ptr = code_char(k, *ptr) ;
}
}
Example
Security of the algorithm
As you can imagine, this algorithm is not secure (at all). The first traces of attacks date back to the 9th century by Frequency analysis. We will now see why and to what extent.
Asymetric Version
The asymmetric version of the algorithm, using a public key and a private key, is not secure for two reasons: the chosen operation and the size of the group.
- The chosen operation The security of this cryptosystem is based on the difficulty of reversing an addition, i.e. of computing a subtraction (grade 2 / year 3) in a cyclic group (grade 12 / year 13).
- The size of the group. With only 26 elements, it is very fast to compute all the possibilities, even if the operation was more complex.
To make this algorithm more secure, it would then be necessary to increase the size of the group and at the same time use an operation that is more difficult to reverse. You will then naturally get modern algorithms like RSA or El Gamal.
Symetric Version
Given the ease in finding the reverse of the key, we can understand why this algorithm was used symmetrically, the correspondents agreeing a priori on a pair of keys that they keep secret.
The problem this time is due to the size of the key and the ECB operating mode which always encrypts the blocks in the same way.
The size of the keys. Once again, their small number makes it possible to test all the keys and to compare the results with the information on the clear text (i.e. a dictionary).
ECB mode. Which, by encrypting the same combinations in the same way, preserves the structure of the text in its encrypted version and which allows analysis directly on the encrypted text in order to cryptanalyze it.
It is on this point that the Caesar cipher was and remains attacked. It is possible to find the key by comparing the frequency of appearance of letters in the cipher text and in a reference corpus.
Regal point of view
Regarding cryptography, France offers two complementary and all in all consistent points of view. At one side, the law n°2004-575 of june 21th 2004 and its Décret n°2007-663 of may 2nd 2007 that tells us maximum key length allowed, and at the other side, the ANSSI’s security general norms, appendix B1 that tells us minimum key length to be considered secure.
- For the asymetric version, ANSSI recommends using a multiplicative group and 2048 bits length keys. The maximum allowed being 112bits (for the same groups).
- For the symetric version, ANSSI recommends using 128 bits long keys and to avoid ECB mode. The maximum allowed being 56 bits.
The Caesar cipher is thus fully legal to be used (since it respect the maximum key lengths). And for the same reason, ANSSI does not consider this cipher as secure.
Conclusion
Compared to modern cryptographic standards, Caesar cipher, is not only obsolete, but downright picky. The keys are ridiculously weak by standards but the operations are not even worthy of appearing in the texts.
And this is precisely where its interest lies. Its simplicity makes it the ideal candidate for starting out in cryptography, discovering the basics before continuing with more complex algorithms.