BYU logo Computer Science

To start this assignment, download this zip file.

The following guide pages cover material needed for this assignment:

Project 5 - Cipher


Julius Caesar was known (among many things) for encoding important military instructions using what became known as the Caesar Cipher.

The Caesar Cipher was a simple mapping from one letter of the alphabet to another: A -> D, B -> E, C -> F, etc.

A cipher that performs a simple character substitution is called a simple substitution cipher. Many other simple substitution ciphers have since been invented, each one using a different strategy for coming up with the letter mappings.

Text that has not been encoded is called plaintext. Text that has been encoded is called ciphertext.

These ciphers are easy to generate and easy to use. They are also fairly easy to break.

One strategy for breaking a simple substitution cipher is to do a character frequency analysis.

In regular English text, each letter is used at a fairly consistent frequency. By analyzing how frequent a letter is in plaintext and ciphertext, you can create reasonable predictions for which characters map to which.


In this project, you will implement a simple substitution cipher using a randomized alphabet. You will use that cipher to encode and decode some messages. Finally, you will perform a character frequency analysis on encoded and decoded text as a first step towards breaking the cipher.


Write a program called that generates a randomized simple substitution cipher which it saves to CSV file.

The program should have the following command-line arguments:

  • The name of the CSV file in which to save the cipher

The output file should have the following format:

  • each line is a mapping from one letter to another, separated by a comma (no spaces)

For example:



  • Use random.sample(alphabet, len(alphabet)) to generate a randomized version of the alphabet
  • You can import string and then use alphabet = string.ascii_lowercase
    • Or you can simply hardcode the alphabet: alphabet = 'abcdefg...'

You can test your file using:

$ python my_cipher.csv


Write a program named that takes

  • A CSV file containing a simple substitution cipher (in the format described above)
  • An input file containing plaintext to encode
  • An output file to write with the ciphertext

This program then encodes the input file and writes the output file using the provided cipher.

Characters that are not in the cipher should be preserved.

Preserve casing of the input characters; for example, i -> x and I -> X.

You can test your file using:

$ python test_files/cipher.csv test_files/message.txt my_encoded.txt


Write a program named that takes

  • A CSV file containing a simple substitution cipher (in the format described above)
  • An input file containing ciphertext to decode
  • An output file to write with the decoded plaintext

This program then decodes the input file and writes the output file using the provided cipher.

As with, preserve casing and characters not found in the cipher.

NOTE The cipher CSV is the same one as used by To decode using this cipher, the cipher will need to be inverted.

You can test your file using:

$ python test_files/cipher.csv test_files/message.encoded.expected.txt my_decoded.txt


Write a program named that performs a character-frequency analysis on an input file.

Every distinct character in the file should be counted, and the final counts for each character should be divided by the total number of characters in the file. Also convert any uppercase characters to lowercase.

The frequencies should be rounded to 3 decimal places.

The program should print a dictionary mapping each character to its associated frequency.


For a file containing


The frequencies should be

{'a': 0.111, 'b': 0.333, 'c': 0.333, '\n': 0.222}

Another file might have


The frequencies should be

{'a': 0.25, 'b': 0.5, '\n': 0.25}

You can test your file by creating your own file with a few characters:

$ python my_short_file.txt


The test_files folder contains several files you can use to test your code. These files are used by the autograder.

You are also encouraged to use your own files to test these programs.

  • Generate a cipher CSV
  • Encode and decode custom messages
  • Perform your own frequency analysis

Manual Grading

Refer to the Quality Code guide page for detailed explanations and examples on each of these rubric criteria.

Intent The intent of this assignment is that you create and use dictionaries to map keys to values.

Note: creating a dictionary is not necessary for build_cipher. You should not use a csv writer for this assignment. They are useful, but so is understanding how to create a list of strings with a particular format.

You should not use a csv reader for this assignment. They are useful, but so is understanding how to strip and split a line yourself.

Remember not to use system arguments outside of if __name__ == "__main__":.




Just for fun

More ciphers

There are many other ways to generate a simple substitution cipher. Which ones can you implement?

For example, the ROT ciphers (e.g. ROT13) use a rotation to generate the cipher, and you only need the rotation factor to decode the message.

Crack the code

Can you use the character frequency analysis to derive the cipher from a ciphertext?

Try building a custom cipher CSV by hand based on the frequencies you see. How well does the custom cipher decode the message?

Here is a ciphertext you can practice on:

Bmansopvwopimah! Gmv uorz tiaihuzd puz bicuzs csmfzbp, puz wohp csmfzbp mt BH110. 

Lz umcz gmv uorz zafmgzd puih bwohh. 
Oad xmsz ixcmspoapwg, lz umcz gmv uorz nsmla tsmx puih zkczsizabz.

Lz nirz gmv hmxz cospian lmsdh mt mvs csmcuzp pm bmahidzs:

  I cwzod lipu gmv pm poez buosnz mt gmvs pzhpixmag mt Fzhvh Busihp. 
  Lmse tms ip. Mla ip. Bosz tms ip. 
  Avspvsz ip hm puop ip liww nsml. 
  Puza lopbu tms xisobwzh pm uoccza ia gmvs witz.
  --Svhhzww X. Azwhma
Oh gmv bmxz vapm Busihp liww oww gmvs uzosp wmrz Uix ia toipu,
Uz liww ywzhh gmv oad czstmsx xisobwzh ia gmvs witz.

Xog gmv uorz Uih ywzhhianh lipu gmv ia gmvs tvpvsz zadzormsh.

Hiabzszwg, BH 110 iahpsvbpmsh oad POh