CAB340 Assessment 1 Historical cryptanalysis,

Probability, Information theory

This is an individual assignment. It is acceptable to discuss the general approach with your fellow students and

the unit staff, but your answers should be your own.

Submission

There are 30 marks available in total for this assignment with part marks as shown, plus 5 marks of extra credit.

Your answers should be written in a report and submitted by the due date of Friday 11 September, 2019 at

11:59pm. Submission should be made online via Blackboard under Assessment > Assignment 1 submission.

Your submission must be in PDF format. If you are submitting a scan, make sure that it is easy to read.

1 Cryptanalysis of historical ciphers

This part of the assignment involves practical experiments with historical ciphers. It is allowed and expected

that you will make use of cryptanalytic software, such as the CrypTool1 and CrypTool2 packages, to perform

the experiments.

Although no other tools should be necessary, you are free to use any other software or even to write your

own little programs or shell scripts to solve the tasks at hand. Note that different versions of CrypTool have

slightly different capabilities; and in particular the java-based JCrypTool and browser-based CrypTool-Online

are unfortunately more limited than the Windows-only CrpyTool1 and CrypTool2.

For each experiment, describe clearly what outputs you found and explain why you think these results occurred

using the theory that we have studied in the lectures. Use tables and figures as appropriate.

Obtaining data

You need to obtain your individual folder of 4 ciphertext files. To do this, go to Blackboard and navigate to

Assessment > Assignment 1 and download the ZIP archive Assignment1-ciphertexts.zip (the archive is located

right next to the present PDF). Unzip the archive. It extracts to a folder containing 200 sub-folders of the form

. . .-Student, numbered from 000-Student to 199-Student.

Next, identify the last two digits of your QUT Student ID. If abcdefxy is your 8-digit QUT ID, then xy are

the last two digits. Your individual sub-folder is at your option either the one named 0xy-Student or the

one named 1xy-Student. Be sure to select either one of those two folders only. (You may delete all the other

sub-folders, which are useless to you.) Within your folder, you will find four files, 0.txt, 1.txt, 2.txt,

3.txt. These are the files that you will use for the questions in this task.

Be sure to write your full Student ID and the full name of the actual folder you used on your

report. No mix-and-match: you must use one folder only for the entire task, and that folder must be named

wxy-Student where xy equal the last two digits of your QUT ID and you get to choose w ∈ {0, 1}.

Cryptanalysis tasks

This problem solving tasks concern cryptanalysis of historical ciphers. You are given four ciphertext files. The

plaintexts were all written in English with a similar style of text. Each text is different and each is encrypted

with a different cipher. The four ciphers used, in random order in your set, are:

? random simple substitution cipher

? Vigen`ere cipher

? transposition cipher

? 2 × 2 Hill cipher

All plaintexts are written in English and use upper and lower case letters.

a. (3 marks) For each ciphertext, using either a table or a histogram, present the following characteristics:

? the 10 most common single-character frequencies;

? the 10 most common digram frequencies;

? the autocorrelation (as a function of the shift up to 10).

b. (3 marks) Explain briefly but clearly how each of the three characteristics measured in the preceding

part is expected to look for the ciphertext from each of the four cipher algorithms used.

Use the measured values to argue which ciphertext comes from which of the four ciphers.

c. (4 marks) Explain, with direct reference to the statistics that you found, the procedure you can use to

cryptanalyse each of the 4 ciphers. You are not expected to perform the cryptanalysis in this part — you

need to explain how it can be done in principle for each type of cipher and how the measured statistics

can help.

d. (4 marks) Now do the cryptanalysis for TWO ciphertexts of your choice, amongst the ones which you’ll

have identified as: (1) random simple substitution, (2) Vigen`ere, and (3) transposition — i.e., leave out

the significantly harder (4) Hill cipher as extra credit for the next question.

You can use the automatic tools in CrypTool or other where applicable, or write your own. If you are

using a tool, reference it. If you’re working by hand, you may need to calculate (for your own use) more

than the 10 most common frequencies that you wrote down in Task 1.

No need to decrypt the entire ciphertexts, especially if working by hand, but provide enough decrypted

text to convince us that your cryptanalyses succeeded.

e. Optional: extra credit (5 marks)

If you’re in for a little extra challenge, attempt to cryptanalyse the 2 × 2 Hill cipher as follows. Recall

that the attack on the Hill cipher we’ve seen in class is, at its core, a known-plaintext attack. Since we

don’t know the plaintext, we have to guess it, to make this into a ciphertext-only attack.

i. Determine the non-overlapping digram frequencies. Note that using the n-gram tool in CrypTool

directly will taint the statistics because it counts overlapping bigrams, which can start at every

position. (In the 2-by-2 Hill cipher, the relevant blocks start at every other position.) The nonoverlapping

digram frequencies can be obtained in CrypTool by:

Page 2

? defining the alphabet to include uppercase and lowercase letters with no space;

? using the format text document tool to remove spaces;

? using the format text document tool to split into blocks of size 2;

? using the n-gram analysis tool to count the bigrams.

(On MacOS and Linux, if you have a little bit of experience with basic Unix tools such as sed, grep

and sort, you can probably do it faster with a small shell script or directly on the command line.)

ii. Next, try to find the bigram which maps to ‘th’. This should be one of the most common of your

non-overlapping bigrams. Because spaces in the plaintext are preserved in the ciphertext you can

use the word structure of the original ciphertext to rule out most options. For example, ‘th’ will not

occur on its own as a word and you are likely to see it at the start of some 3-letter words.

iii. Next, try to find at least two more likely candidates for bigram plaintext/ciphertext pairs. Possible

plaintext bigrams you might search for are:

? ‘ti’ - this bigram cannot usually end a word or be a word on its own.

? ‘on’ - this can be a word on its own or part of another word.

? ‘he’ - this is likely to occur at the end of common 3-letter words and could be a word on its own.

iv. Once you have identified likely candidates for two bigrams, perform the known-plaintext attack that

we covered in the lecture, under the assumption that your guesses are correct.

You can get the full “extra credit” marks for this question without finding the plaintext as long as you

complete the method described above and show that at least two different reasonable candidate pairs do

not result in a sensible plaintext.

In any case, this “extra credit” question is time-consuming and entirely optional, so I recommend you

leave it until the end.

Practical hints

? The Hill cipher used here assumes that the character ‘A’ maps to the number 0. This is not the default

in CrypTool1. You can change this by clicking the “Further Hill options” button.

? The transposition cipher tool in both versions of CrypTool allow you to set various things to rows or

columns. The encryption was done reading in by rows, permuting by columns, and reading out by rows.

This is equivalent to the method used in class.

? CrypTool can be installed without special permissions, so you should be able to install it on lab computers

if needed.

? The java-based multi-platform JCrypTool and the platform-independent CrypTool-Online do not provide

all the tools or functionalities you’d want to use to complete all the tasks. If you use those versions,

be prepared to spend more time working out certain transformations by hand, or writing some basic

processing functions in your favourite programming or scripting language.

2 Probability

Imagine that you are playing the following game, involving three closed doors and a prize behind one of them.

You may have heard of this game before, perhaps presented to you in the form of a paradox, as some aspects

Page 3

of its strategies can be counter-intuitive. In this question, we’ll use probability theory to get to the bottom of

it and shine some bright light on any paradox.

In front of you are three closed doors, numbered 1, 2, 3. Behind one of the doors, call it D ∈ {1, 2, 3}, is a shiny

diamond (as you know: a highly valued form of pure carbon). Behind the two other doors are small lumps of

coal (also pure carbon, but considered much less valuable). You get to select one door, let’s call it S ∈ {1, 2, 3}.

You get to keep what is behind the door you open.

Clearly, the outcome you seek is S = D. You have control over S, but have no idea whatsoever how D was

selected. D could be uniformly random; D could be the constant 2; or D could be equivalent to the room

temperature modulo 3, you just don’t know. The only thing that you know is that D does not change once the

game has started, and in particular D is not affected by your choice of S.

a. (3 marks) Is there a strategy you can employ to choose S, so that, no matter how D was chosen, you

have a well defined prior probability of getting the diamond? (Hint: always picking the same door is not

the answer, and what you’re thinking next probably is.)

Justify your strategy as follows.

First write down the 3 × 3 joint probability table for Pr(D, S). Be sure to use placeholder mathematical

symbol(s) for any numerical probability(ies) which you don’t know — e.g., don’t make unstated assumptions

such as taking the prior distribution of D to be

); write it as (p1, p2, p3) with pi ≥ 0 and

p1 + p2 + p3 = 1 but otherwise unknown.

Then, using the table, determine as precisely as you can Pr(D = S), i.e., the marginal or prior probability

of the event of interest “D = S”.

Does your strategy induce an exact numerical probability for this event, even if there were numerical

unknowns in the joint table?

After you announce your choice of S, but before you can open that door, you are shown one of the other two

doors: call it B where B 6= S. You are told, truthfully, that door B is bad (B 6= D, i.e., it has coal behind it).

You are then given an opportunity to switch your selection from S to to the third door: call it T, where T 6= R

and T 6= B. Should you accept the offer to switch? In other words, should you open S or T? Let’s find out.

b. (3 marks) Assume for now that this “unexpected twist” was actually totally expected: you knew in

advance, maybe from the rules of the game, that — regardless of D and for any choice of S you make —

you would be shown a bad door B (B 6= S and B 6= D) and given the option to switch.

Under this rule of the game, write down the joint distribution Pr(D, S, B). Since this is a 3-dimensional

table, for clarity make a 2-dimensional table with axes D and S as you did above, and in each cell consider

all the values of B. Use as many additional undetermined placeholder symbols as you need to denote new

unknown probabilities. (Hint: you don’t know how D and B are chosen, but there will be constraints,

e.g., on what combinations of D, S, B can have non-zero probability, etc.)

Then, using the table, condition on the event that B = b for some specific value of b ∈ {1, 2, 3}: how do

the posterior probabilities of the respective events (D = S|B = b) and (D = T|B = b) compare?

Conclude whether or not you should be accepting the offer to switch.

(Hint: there are several ways to attack this problem; all will lead to the same conclusion, but some

more easily than others. Here I suggest that you investigate the effects of conditioning on B = b only.

“Without loss of generality”, you can even condition on B = b for b = 1. Now, you might think you should

be conditioning not just on B = b but on (S = s∧B = b), since by the time you’re told that B = b you have

Page 4

already announced S = s, but if you do that you’ll need to do some contortions to reach the conclusion

you want. Intuitively, you want to condition on B to see how learning B affects the probabilities that

D = S and D = T. However you don’t need or want to condition on S, as doing do might just unravel

the very benefit of the strategy for S which you studied in part (a) of this problem. One of the great

powers of rigorous probability theory is that you can make sound inferences while conditioning or not on

anything you want, but it takes a bit of flair (or trials) to figure out what inferences you want to make...)

c. (2 marks) Assume, contrarily to part (b), that the game host is adversarial and that any information

they tell you, while true, is trying to “manipulate” you in the worst possible way. Specifically, now the

host has discretion to tell you about B and offer you to switch — or not — based, e.g., on whether you

picked the good door or not.

With those slightly different rules, should you switch — as in: “definitely yes”, “neither here nor there”,

or “absolutely not”?

Briefly justify your answer in any way you see fit.

3 Information theory

AES128 is a symmetric cipher which uses a 128-bit key, which Kerchhoff’s principle dictates should be keyed

as uniformly at random as possible.

Due to a design flaw bug, the hardware random number generator we use turns out to have a (not so subtle)

bias. Specifically, when we use it to generate an AES128 key K, with probability 1

4

the generator gets stuck

and output a key made of 128 zero bits, 000 . . . 02. When the bug is not triggered, the generator functions

normally and outputs a uniformly distributed 128-bit random key (including possibly the all-zero key as one of

its random outputs).

a. (2 marks) Calculate the entropy of (the distribution of) the key K as generated by this generator.

b. (2 marks) Now, let F ∈ {0, 1} be a binary random variable such that F = 1 when the generator’s flaw

is triggered, and F = 0 when the generator is working properly.

Calculate the conditional entropy H(F|K) of F given K, and in one short sentence interpret the result.

c. (2 marks) Is this key generator secure (in the natural sense that: will an encryption system built that

way be “almost always” hard to break)? Briefly argue why or why not.

d. (1 mark) From the above, would you say that generating keys of sufficiently high entropy is: necessary;

sufficient; both; or neither ; for the cryptographic security of a system based on them?

e. (1 mark) Can you think of a possibly better metric than entropy to capture the security of a non-uniform

key distribution, but ideally with the same additive properties as entropy when combining independent

variables? (Hint: recall that the entropy is defined as a weighted average of something, but perhaps

averages are not the most suitable for this purpose?)

Page 5

版权所有：留学生编程辅导网 2018 All Rights Reserved 联系方式：QQ:99515681 电子信箱：99515681@qq.com

免责声明：本站部分内容从网络整理而来，只供参考！如有版权问题可联系本站删除。