Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 22, 2024

In this project, we will write some code to crack codes. There is a simple kind of code called a substitution cipher in which every

In this project, we will write some code to crack codes. There is a simple kind of code called a substitution cipher in which every letter of the alphabet is mapped to a dierent letter. You will write some code to encode and decode messages. The details of your task are included in this document, so you will really have to read it. A simple case of this might be described as follows:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z J K L M N W X C D P Q R S T U V A E F O B G H I Z Y

Each letter in the top line (called the plaintext) will get changed into the corresponding letter in the bottom line (called the ciphertext). For example the string "HELLO" is encoded as "CNRRU" using the code above. The string "GDLOUEZ" is decoded as "VICTORY" . It is not too hard to decode such a code if you know the ciphertext for the whole alphabet. We'll call this the codestring. Here is a little code that prints a decoded output.

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" codestring = "BCDEFGHIJKLMNOPQRSTUVWXYZA" ciphertext = "IFMMPXPSME"

for char in ciphertext: print(alphabet[codestring.index(char)], end = "")

# Output HELLOWORLD

If you wanted to produce a string and not just print the output, you might try something like the following.

Simple Substitution Ciphers

Lesson: Appending to a string is slow. Appending to a list is faster.

# Don't do this! # Concatenating strings creates a whole new string each time. plaintext = "" for char in ciphertext: plaintext = plaintext + alphabet[codestring.index(char)]

You could do this instead by using a list and appending to the list. Then, to get a string at the end, you use the join function. Technically, join is a string method, so you call it on the the string you want to separate the individual elements of the list.

plaintextlist = [] for char in ciphertext: plaintextlist.append(alphabet[codestring.index(char)])

plaintext = "".join(plaintextlist)

It is a very common operation in python to produce a list by iterating over a dierent list. As a result, there is a special syntax for it, called list comprehension. Here is the above code again, using list comprehension.

# List Comprehension is the right tool in this case. plaintextlist = [alphabet[codestring.index(char)] for char in ciphertext] plaintext = "".join(plaintextlist)

# Output

def decode(codestring, cyphertext): alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" plaintextlist = [alphabet[codestring.index(char)] for char in cyphertext return "".join(plaintextlist)

code1 = "BCDEFGHIJKLMNOPQRSTUVWXYZA" code2 = "CDEFGHIJKLMNOPQRSTUVWXYZAB"

print(decode(code1, "IFMMPXPSME")) print(decode(code1, "TFDSFUTFDSFU")) print(decode(code2, "FKHHGTGPVEQFG"))

Packaging this into a function

HELLOWORLD SECRETSECRET DIFFERENTCODE

One slightly annoying thing about the code above is that it requires us to nd the index of each character in the cyphertext as it appears in the alphabet string. This isn't so bad, but it does require searching through the whole string. That is, it's about 26 times slower than a normal list access. Imagine if we also had lowercase letters and punctuation. It could be 100 times slower. Again, for tiny problems you can't see the dierence, but as soon as you need to decode millions of messages, the dierence between 5 minutes and 500 minutes, is a lot. A better way to map encoded letters to their corresponding decoded letters is to use a dictionary. (A dictionary is also known as a mapping.) We can create a dictionary from the code string as follows.

codestring = "BCDEFGHIJKLMNOPQRSTUVWXYZA" alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" code = {} inverse = {} for i in range(26): code[alphabet[i]] = codestring[i] inverse[codestring[i]] = alphabet[i]

Now, we can decode a ciphertext as follows.

ciphertext = "IFMMPXPSME" plaintext = "".join([inverse[c] for c in ciphertext])

When you are comfortable with list comprehensions, you will nd this version very satisfying because it does pretty much exactly what it says: make a list of characters where each character is the inverse of the next character in the ciphertext, then join them back up.. Was that too fast? Break it into pieces.

ciphertext = "IFMMPXPSME" # Use list comprehension to convert the letters to a decoded list listofplaintext = [inverse[c] for c in ciphertext]

Storing a code as a dictionary

# Join the list of plaintext letters into a string. plaintext = "".join(listofplaintext)

In case you were wondering if there is dictionary comprehension in the same way that there is list comprehension, there is! So, we could create the code dictionary by rst turning the alphabet and codestring into a list of pairs (tuples) and doing a dictionary comprehension. There is a convenient function called zip that does this for us. So, the following code could create a code dictionary.

code = {b:a for (a,b) in zip(codestring, alphabet)} inverse = {a:b for (a,b) in zip(alphabet, codestring)}

If this is terribly confusing, try making two lists in a python shell and zip them. What is the result? Play around with some small examples. Imagine an alphabet of only 4 letters perhaps to keep things short.

We could start with a mostly empty class called Cipher stored in a le called cipher.py. Here is its contents.

# contents of cipher.py class Cipher: def __init__(self): pass

In the code above, we created a new class. We also added a constructor, but it doesn't do anything. The methods of a class all accept a variable called self as their rst parameter. In this case, self can be used to access attributes or other methods. In our case, we will make an attribute to store the code. We will make another attribute to store the inverse of the code. We will add two methods, encode and decode .

Next, we will write some code to test it. We'll use the unittest package. Here is a sample test le.

Packaging this into a class

Testing our code

#contents of testcipher.py import unittest from cipher import Cipher

class TestCipher(unittest.TestCase): def testcreate(self): pass

if __name__ == '__main__': unittest.main()

The unittest.main() function nds all the classes that inherit from unittest.Testcase and runs all the methods that start with the word test . The line if __name__ == '__main__': checks that the test le itself is being run, not just imported.

1. Write a constructor for the Cipher class that takes a codestring and stores both the code and its inverse in two dictionaries. 2. Write an encode method for the Cipher class that takes a plaintext as input and returns the corresponding ciphertext. 3. Write a decode method for the Cipher class that takes a ciphertext as input and returns the corresponding plaintext.

1. Adapt your code so that it automatically converts plaintext, ciphertext, and codestrings to uppercase. Use the str.upper() method. This method returns an uppercase version of the string so, for example, 'Hello'.upper() == 'HELLO' . 2. The encode and decode methods should leave all spaces and punctuation in place. This will mean that you should check if the letter is in the code and leave it if not. Checking if a key k is in a dictionary D can be done by writing if k in D: .

Your mission

Some other concerns

Part 2

It is possible to break simple substitution ciphers without even using a computer. This is why, for example, such puzzles appear in newspapers. In this homework assignment, you will write a program that automatically decodes a variant of these codes without knowing the codestring in advance.

One approach to dening a codestring for a substitution cipher is to specify a codeword instead. The idea is that a codeword (with no repeating letters is given) and that is used as the start of the codestring. The rest of the codestring is lled in with the remaining letters of the alphabet in reverse order. For example, if the codeword is ZEBRA , then the codestring would be:

ZEBRAYXWVUTSQPONMLKJIHGFDC

If the codeword is DOG , then the codestring would be:

DOGZYXWVUTSRQPNMLKJIHFECBA

You will want to be able to generate such a codestring from a codeword. Here is the code I used to extend the codeword into a codestring.

w = 'DOG' cs = '' for a in 'ZYXWVUTSRQPONMLKJIHGFEDCBA': if a not in w: cs = cs + a

In the kingdom of Vulgaria, instead of choosing a cipher in advance, the spies send coded messages that all start with a string of 30 characters that we'll call the preamble. The codeword for the cipher is a substring of those rst 30 characters. They have some secret

Breaking Substitution Ciphers

Codeword Substitution

Hiding Codewords in Plain Sight

way of nding the codeword in there, so they can decode messages. Their idea is that they want anyone deciphering the messages to break a new code for each message. Maybe this would work until word gets out that the codeword is hiding in plain sight. Now code breakers can try all the dierent substrings of those rst 30 characters and see which one produces a message that looks like real words. Incidentally, this is related to how the early versions of the German Enigma machine was cracked.

You can get a program to do this for you. You will be given a list of frequently used words. Use this list to check if you have the correct codeword. Then try all the dierent choices and see which one produces something the looks like a real message. If we have a guess at what the codestring is, we can try to decode the message. Then, we can look at it to see if it resembles a real message, for example by checking if the words are real words. If there is a small number of possible codestrings, we can choose between them by checking which produces the largest There is a le called words.py included in the skeleton. It contains a string with a big collection of words. Technically, I got this list from processing the text of Bram Stoker's Dracula and pulling out the unique words that appeared more than a certain number of times. You can import the variable wordlist from words.py using the following line of python at the top of your ciphercracker.py le.

from words import wordlist

You will then be able to use the wordlist variable in your code. This approach is not considered great style, but it is easy and appropriate for this case.

1. Add support to the Cipher class for codeword substitution ciphers. 2. Implement a new class called CipherCracker that will be saved in a le called ciphercracker.py . A CipherCracker object should store a ciphertext that you are

Using data

To Do:

trying to decode. It should separate out the preamble from the beginning of the ciphertext. It should implement methods called decode , quality , mostlikelydecode , and mostlikelycodeword . The description of the desired input and output of each method is given in the skeleton

____________ the ciphercracker class is as below___________

from words import wordlist from cipher import Cipher

class CipherCracker: def __init__(self, ciphertext): """ The constructor takes a ciphertext that is assumed to have a preamble of 30 characters that contains the codeword.

The preamble is removed from the ciphertext and stored separately. When we decode the message, we will omit the preamble. """ self.preamble = ciphertext[:30] self.ciphertext = ciphertext[30:]

def decode(self, i, j): """ Attempts to decode the message using self.preamble[i:j] as the codeword. Returns the resulting string. """ pass

def quality(self, i, j): """ Decodes the message using self.preamble[i:j] as the codeword and returns a number that gives an indication of how many of the words in the decoded string are real words. It does this by checking if the words are in the `wordlist` variable imported from `words.py`.

There are several ways this could be done. The most important thing is that for a real message, the correct values of i and j should give a higher output than other values. """ pass

def mostlikelycodeword(self): """ Return the codeword that has the highest quality score among all substrings of the preamble. """ pass

def mostlikelydecode(self): """ Return the decoded message that you get by decoding with the most likely codeword. """ pass