Question

1 Approved Answer

Posted on Feb 16, 2024

(a) For continuous random variables X and Y, taking on continuous values x and y respectively with probability densities p(x) and p(y) and with joint probability distribution p(x, y) and conditional probability distribution p(x|y), define:
  (i) the differential entropy h(X) of random variable X;
  (ii) the joint entropy h(X, Y) of the random variables X and Y; [1 mark]
  (iii) the conditional entropy h(X|Y) of X, given Y;
  (iv) the mutual information i(X; Y) between the continuous random variables X and Y;
  (v) how the channel capacity of a continuous channel which takes X as its input and emits Y as its output would be determined.
(b) For a time-varying continuous signal g(t) which has Fourier transform G(k), state the modulation theorem and explain its role in AM radio broadcasting. How does modulation enable many independent signals to be encoded into a common medium for transmission, and then separated out again via tuners upon reception?
(c) Briefly define
  (i) The Differentiation Theorem of Fourier analysis: if a function g(x) has Fourier transform G(k), then what is the Fourier transform of the nth derivative of g(x), denoted $g^{(n)}(x)$?
  (ii) If discrete symbols from an alphabet S having entropy H(S) are encoded into blocks of length n, we derive a new alphabet of symbol blocks $S^n$. If the occurrence of symbols is independent, then what is the entropy $H(S^n)$ of the new alphabet of symbol blocks? [2 marks]
  (iii) If symbols from an alphabet of entropy H are encoded with a code rate of R bits per symbol, what is the efficiency of this coding? [2 marks]
(d) Briefly explain
  (i) how 10 V is expressed in dBV; [1 mark]
  (ii) the YCrCb coordinate system.
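
The quantities in parts (c)(ii), (c)(iii) and (d)(i) can be checked numerically. The sketch below is only an illustration under stated assumptions: the four-symbol distribution, the code rate of 2 bits per symbol, and all function names are made up for the example; dBV is taken to mean decibels relative to 1 V, and efficiency is taken to mean the ratio H/R.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def block_probs(probs, n):
    """Probabilities of every length-n block when symbols occur independently."""
    return [math.prod(block) for block in product(probs, repeat=n)]

def dbv(volts):
    """Voltage level in decibels relative to 1 V (assumed reading of dBV)."""
    return 20 * math.log10(volts)

def efficiency(h, rate):
    """Coding efficiency, assumed here to be entropy divided by code rate."""
    return h / rate

probs = [0.5, 0.25, 0.125, 0.125]      # hypothetical alphabet S
h_s = entropy(probs)                   # entropy of single symbols
h_s3 = entropy(block_probs(probs, 3))  # entropy of blocks of length 3

print(h_s, h_s3, dbv(10), efficiency(h_s, 2))
```

For independent symbols the block entropy comes out as n times the single-symbol entropy, which is the relationship part (c)(ii) asks about.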

(a) Although the following code compiles and executes without error, give four reasons why it is a poor test.
    @Test
    public void testIt() {
        long time = System.currentTimeMillis();
        double r = solve(40229321L);
        if (r < 1000.0) {
            assertThat(r == 430.6).isTrue();
        }
        long elapsed = System.currentTimeMillis() - time;
        assertThat(elapsed).isLessThan(3000);
    }
[8 marks]
(b) You are running a project to develop the next version of an operating system that supports mandatory access control and limits the covert-channel bandwidth that a process at security level High can use to signal down to a process at security level Low. Discuss the relative contribution of unit testing, integration testing and regression testing in checking that the covert-channel bandwidth is still acceptably low. [8 marks]
(c) The operating system is now going to be used in a different environment so that the load pattern will change. How might this affect covert-channel bandwidth?

(a) In the database context, what do we mean by redundant data? [1 mark]
(b) Why might it be a good idea to have redundant data in a database? [2 marks]
(c) Why might it be a bad idea to have redundant data in a database? [2 marks]
(d) Suppose a database has tables R(A, B) and S(B, C). Explain how using an index could improve performance when joining R and S. Is there a downside to using an index? [4 marks]
(e) In SQL, what could be returned when evaluating the following expression? NOT (a OR (NOT a)) [2 marks]
(f) Suppose R(start, end) is a table in a relational database representing arcs in a directed graph. That is, each record $(x, y) \in R$ represents an arc from node x to node y.
  (i) Write an SQL query that returns the start and end of all 3-hop paths in the directed graph represented by R. Your query should return columns named start, end. Each row (x, y) in the result of your query should indicate that there exists a path in R $x \to z \to u \to y$ for some nodes z and u. [4 marks]
  (ii) What is the transitive closure of R? Why is this difficult to compute in SQL if we ignore recursive query constructs?
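
Parts (e) and (f)(i) can be tried out against an in-memory SQLite database. This is one possible sketch, not a model answer: the sample arcs, the helper name `not_a_or_not_a`, and the choice of SQLite are assumptions, and other SQL engines may report boolean results differently (SQLite uses 1, 0 and NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "end" is quoted because END is an SQL keyword.
conn.execute('CREATE TABLE R (start INTEGER, "end" INTEGER)')
# Hypothetical arcs forming the chain 1 -> 2 -> 3 -> 4 -> 5.
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (2, 3), (3, 4), (4, 5)])

# A 3-hop path x -> z -> u -> y is a three-way self-join of R.
THREE_HOP = """
SELECT DISTINCT r1.start AS start, r3."end" AS "end"
FROM R AS r1
JOIN R AS r2 ON r1."end" = r2.start
JOIN R AS r3 ON r2."end" = r3.start
ORDER BY 1, 2
"""
three_hop = conn.execute(THREE_HOP).fetchall()

def not_a_or_not_a(a):
    """Evaluate NOT (a OR (NOT a)) for a given value of a (None means NULL)."""
    return conn.execute("SELECT NOT (? OR (NOT ?))", (a, a)).fetchone()[0]

# Three-valued logic: true and false inputs give false, NULL gives NULL.
results = {a: not_a_or_not_a(a) for a in (1, 0, None)}

print(three_hop)
print(results)
```

Each extra hop needs one more self-join, which hints at why a path of unbounded length, as in part (f)(ii), cannot be written as a single fixed join.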

A mobile telephone company has 5 million prepay customers who buy scratchcards to pay for air time. Each card has a code of 9 decimal digits, and at any time there are about 20 million cards active (issued to the supply chain and not yet used).
(a) Discuss the relative advantages and disadvantages of implementing the code system with a database of random numbers versus an encrypted counter. [4 marks]
(b) If you were using an encrypted-counter system, how would you go about selecting, adapting or designing a suitable cipher? [4 marks]
(c) Some of the customers have got clever. As they are allowed two invalid code attempts, they try two random codes before entering a correct one. The telephone company is now getting 2000 complaints a month from people who bought a scratchcard and found, when they tried to use it, that someone else had already guessed the number. How would you modify the system to reduce the level of complaints? [8 marks]
(d) You are now approached by a telephone company in China which wants to use your system to manage 100 million prepay customers. What further modifications would you consider?

(a) A dataflow analyser is required which can report on local variables having write-write dataflow anomalies. A write-write anomaly is present in a program if there is a path in the flowgraph containing two writes to a given variable and with no intervening read to that variable. For example
    y=a; if (p) x=1; if (q) x=2; if (y==b) y=1; else y=2;
has an anomaly for x but not for y. Given node n in the flowgraph, let R(n) be the set of variables v for which a node $n'$ exists with $n'$ writing to v and having a path from $n'$ to n which does not contain a read from v.
  (i) Give dataflow equations for R(n) and thence construct an algorithm which reports variables having such anomalies. Pay attention to the initialisation of any iteration which you employ. [8 marks]
  (ii) Discuss briefly to what extent your algorithm could be extended to deal with global variables or with address-taken local variables. [4 marks]
(b) Let us say that an undirected graph (N, E) is k-cyclic if $N = \{n_1, \ldots, n_k\}$ and $E = \{(n_1, n_2), (n_2, n_3), \ldots, (n_{k-1}, n_k), (n_k, n_1)\}$.
  (i) Give a function body, or flowgraph, for which the register interference graph for its local variables forms a 4-cyclic graph.
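
One way to read the definition of R(n) in part (a) is as a forward "may" analysis solved by iterating from empty sets. The sketch below is a possible formulation rather than the expected answer: the node encoding, the function name `write_write_anomalies`, and the hand-built flowgraph for the example program are all assumptions, and a node that both reads and writes a variable is assumed to read first.

```python
def write_write_anomalies(nodes, edges):
    """nodes: {name: (reads, writes)} as sets; edges: list of (src, dst)."""
    preds = {n: [] for n in nodes}
    for src, dst in edges:
        preds[dst].append(src)

    # r_in[n] plays the role of R(n): variables written on some path to n
    # with no read since that write. Start from the empty set everywhere.
    r_in = {n: set() for n in nodes}
    r_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n, (reads, writes) in nodes.items():
            new_in = set().union(*(r_out[p] for p in preds[n]))
            new_out = (new_in - reads) | writes
            if new_in != r_in[n] or new_out != r_out[n]:
                r_in[n], r_out[n] = new_in, new_out
                changed = True

    # A write to v at n is anomalous if v reaches n unread and n does not read it.
    anomalies = set()
    for n, (reads, writes) in nodes.items():
        anomalies |= writes & (r_in[n] - reads)
    return anomalies

# Flowgraph for: y=a; if (p) x=1; if (q) x=2; if (y==b) y=1; else y=2;
nodes = {
    "n0": ({"a"}, {"y"}),       # y = a
    "n1": ({"p"}, set()),       # test p
    "n2": (set(), {"x"}),       # x = 1
    "n3": ({"q"}, set()),       # test q
    "n4": (set(), {"x"}),       # x = 2
    "n5": ({"y", "b"}, set()),  # test y == b
    "n6": (set(), {"y"}),       # y = 1
    "n7": (set(), {"y"}),       # y = 2
}
edges = [("n0", "n1"), ("n1", "n2"), ("n1", "n3"), ("n2", "n3"),
         ("n3", "n4"), ("n3", "n5"), ("n4", "n5"), ("n5", "n6"), ("n5", "n7")]

found = write_write_anomalies(nodes, edges)
print(found)
```

Because sets only grow under union, starting every set empty makes the loop converge to the least solution.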

Write a Java program that prompts the user to input the size of an array. Create an array of type int and ask the user to fill it. Create the functions sortArray() and mostFrequency().
Write a program that asks the user to input 10 integers of an array. The program then inserts a new value at the position (or index) given by the user, shifting each element right and dropping off the last element.
Write a program that asks the user to input 5 integers into an array named "gradearray", then passes gradearray to a function to find how many passing grades exist.
Write a program that first gets a list of integers from the input and adds them to an array. The input begins with an integer indicating the number of integers that follow

(a) List four items of metadata that you might find in a File Control Block (FCB). [4 marks]
(b) Consider a Unix process accessing a file using the standard API. Is protection provided through Access Control Lists or Capabilities? Justify your answer. [2 marks]
(c) Consider a filesystem structured as a directed acyclic graph (DAG) where files are structured from sets of 4096-byte disk blocks with 64-bit addresses. The first block of each file contains the following information:
    control information: 1024 bytes
    direct block pointers: 1008 bytes
    indirect block pointer: 8 bytes
    double indirect block pointer: 8 bytes
    immediate data: 2048 bytes
The data bytes of a file start at the beginning of the immediate data. After the immediate data, the file data is found on the block addressed by the first direct block pointer and then carries on in a fashion similar to the structure defined by a Unix inode. We consider the first byte of the file to be byte 0, then byte 1, etc. Files are named by directory entries that are 128 bytes long. Directories are stored as files limited to a single block in size. Only two levels of directory are allowed. The root directory is stored in block 0. You may find it useful to know that $126 \times 8 = (2^7 - 2) \times 2^3 = 1008$.
  (i) Assuming identical structure for the first blocks of both files and directories, what is the maximum number of files this filesystem may contain? Without changing the size of a disk block, a disk block address, or a directory, how might you increase this, and to what? [4 marks]
  (ii) How many disk blocks must be read to access byte 72 of a named file? How many must be read to access byte 223? [3 marks]
  (iii) How big is the largest single file that can be stored in this filesystem?
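
The layout in part (c) lends itself to a short arithmetic check. The sketch below rests on assumptions that the question leaves open: indirect and double indirect blocks are taken to hold nothing but 8-byte pointers, every data block after the first is taken to hold 4096 bytes of file data, and the names (`region`, `max_file_bytes` and so on) are illustrative only.

```python
BLOCK = 4096          # bytes per disk block
ADDR = 8              # bytes per 64-bit block address
DIRECT_BYTES = 1008   # space for direct pointers in the first block
IMMEDIATE = 2048      # immediate data bytes in the first block

direct = DIRECT_BYTES // ADDR   # number of direct pointers
per_block = BLOCK // ADDR       # pointers held by one indirect block (assumed)

# Largest file: immediate data plus every block reachable through the
# direct, indirect and double indirect pointers.
max_file_bytes = IMMEDIATE + BLOCK * (direct + per_block + per_block ** 2)

def region(offset):
    """Name the part of the file structure that holds a given byte offset."""
    if offset < IMMEDIATE:
        return "immediate"
    offset -= IMMEDIATE
    if offset < direct * BLOCK:
        return "direct"
    offset -= direct * BLOCK
    if offset < per_block * BLOCK:
        return "indirect"
    offset -= per_block * BLOCK
    if offset < per_block ** 2 * BLOCK:
        return "double indirect"
    raise ValueError("offset beyond maximum file size")

print(direct, per_block, max_file_bytes)
```

Knowing which region a byte falls in is the first step towards counting block reads in part (c)(ii), since each level of indirection adds one read on top of the directory lookups.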

(a) Consider the eigenface algorithm for face recognition in computer vision.
  (i) What is the role of the database population of example faces upon which this algorithm depends? [3 marks]
  (ii) What are the features that the algorithm extracts, and how does it compute them? How is any given face represented in terms of the existing population of faces? [4 marks]
  (iii) What are the strengths and the weaknesses of this type of representation for human faces? What invariances, if any, does this algorithm capture over the factors of perspective angle (or pose), illumination geometry, and facial expression? [4 marks]
  (iv) Describe the relative computational complexity of this algorithm, its ability to learn over time, and its typical performance in face recognition trials.

Let D be a domain with bottom element $\bot$. Let $h, k : D \to D$ be continuous functions with h strict (so $h(\bot) = \bot$). Let $B = \{\mathit{true}, \mathit{false}\}$. Define the conditional function $\mathit{if} : B_\bot \times D \times D \to D$ by
$$\mathit{if}(b, d, d') = \begin{cases} d & \text{if } b = \mathit{true}, \\ d' & \text{if } b = \mathit{false}, \\ \bot & \text{otherwise.} \end{cases}$$
Let $p : D \to B_\bot$ be a continuous function. The function f is the least continuous function from $D \times D$ to D such that
$$\forall x \in D.\ f(x, y) = \mathit{if}(p(x), y, h(f(k(x), y))).$$
(a) State the principle of fixed point induction. What does it mean for a property to be admissible? [4 marks]
(b) Show that $\forall b \in B_\bot.\ \forall d, d' \in D.\ h(\mathit{if}(b, d, d')) = \mathit{if}(b, h(d), h(d'))$. [3 marks]
(c) Prove that the property $Q(g) \overset{\text{def}}{\iff} \forall x, y \in D.\ h(g(x, y)) = g(x, h(y))$, where g is a continuous function from $D \times D$ to D, is admissible. [5 marks]
(d) Prove Q(f) by fixed point induction.

In (1) and (2) below, the words in the sentences have been assigned tags from the CLAWS 5 tagset by a stochastic part-of-speech (POS) tagger:
  (1) Turkey NP0 will VM0 keep VVI for PRP several DT0 days NN2 in PRP a AT0 fridge NN1
  (2) We PNP have VHB hope VVB that CJT the AT0 next ORD year NN1 will VM0 be VBI peaceful AJ0
In sentence (1), Turkey is tagged as a proper noun (NP0), but should have been tagged as a singular noun (NN1). In sentence (2), hope is tagged as the base form of a verb (VVB: i.e., the present tense form other than for third person singular), but should be NN1. All other tags are correct.
(a) Describe how the probabilities of the tags are estimated in a basic stochastic POS tagger. [7 marks]
(b) Explain how the probability estimates from the training data could have resulted in the tagging errors seen in (1) and (2). [6 marks]
(c) In what ways can better probability estimates be obtained to improve the accuracy of the basic POS tagger you described in part (a)? For each improvement you mention, explain whether you might expect it to improve performance on examples (1) and (2).

(a) Let $R \subseteq X \times Y$ and $P \subseteq Y$ for sets X and Y. Prove that
$$\forall y \in Y.\ \big((\exists x \in X.\ x \mathrel{R} y) \Rightarrow y \in P\big) \iff \big(\forall x \in X.\ (x \mathrel{R} y \Rightarrow y \in P)\big)$$
(b) Define the notions of
  (i) injective function between two sets [1 mark]
  (ii) surjective function between two sets [1 mark]
(c) Let $\mathbb{N}^+ = \{n \in \mathbb{N} \mid n > 0\}$ and define the function $e : \mathbb{N} \times \mathbb{N} \to \mathbb{N}^+$ by $e(m, n) = 2^m(2n + 1)$. Without using the Fundamental Theorem of Arithmetic, prove that e is
  (i) injective
  (ii) surjective
You may use any other standard results provided that you state them clearly.
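
Before attempting the proofs in part (c), it can help to see the claimed bijection on small cases. The sketch below assumes the function is $e(m, n) = 2^m(2n + 1)$; the inverse `e_inverse` and the test ranges are illustrative choices, and a finite check of this kind is evidence only, not a proof.

```python
def e(m, n):
    """The pairing function e(m, n) = 2^m * (2n + 1)."""
    return 2 ** m * (2 * n + 1)

def e_inverse(k):
    """Recover (m, n) from k >= 1 by stripping factors of two."""
    m = 0
    while k % 2 == 0:
        k //= 2
        m += 1
    return m, (k - 1) // 2

# Injectivity on a small grid: no two pairs share an image.
pairs = [(m, n) for m in range(8) for n in range(8)]
images = [e(m, n) for m, n in pairs]
injective_on_grid = len(set(images)) == len(images)

# Surjectivity on an initial segment: every k in 1..200 has a preimage.
surjective_on_segment = all(e(*e_inverse(k)) == k for k in range(1, 201))

print(injective_on_grid, surjective_on_segment)
```

The loop in `e_inverse` mirrors the usual surjectivity argument: repeatedly halve k until an odd number 2n + 1 remains.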

Consider the problem of rendering a scene consisting of spheres graphically using ray tracing. (a) Give a brief overall description of the mathematics underlying the algorithm. Discuss modelling the geometry of individual spheres, formulating the vector equation of a ray, modelling different lighting effects on the surfaces of the spheres, and considering spheres made of refractive and mirrored material. (b) What is meant by spatial aliasing and temporal aliasing in an image? (c) Describe how super-sampling can be used to reduce spatial aliasing.
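
The core calculation behind part (a) is the intersection of a ray with a sphere, which reduces to a quadratic in the ray parameter t. The sketch below shows that one step only, under the assumptions that the ray is written as origin + t * direction and that the nearest hit with t >= 0 is wanted; the function name and argument layout are illustrative, and lighting, reflection and refraction are not covered.

```python
import math

def ray_sphere_t(origin, direction, centre, radius):
    """Smallest t >= 0 with |origin + t*direction - centre| = radius, else None."""
    oc = [o - c for o, c in zip(origin, centre)]
    # Substituting the ray into |P - centre|^2 = radius^2 gives a*t^2 + b*t + c = 0.
    a = sum(d * d for d in direction)
    b = 2 * sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius ** 2
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    # Roots in increasing order, so the first non-negative one is the nearest hit.
    for t in ((-b - root) / (2 * a), (-b + root) / (2 * a)):
        if t >= 0:
            return t
    return None  # the sphere lies entirely behind the ray origin

hit = ray_sphere_t((0, 0, 0), (0, 0, 1), (0, 0, 5), 1)
miss = ray_sphere_t((0, 0, 0), (0, 0, 1), (0, 3, 5), 1)
print(hit, miss)
```

The hit point is origin + t * direction, and the surface normal there is the hit point minus the centre, which is what the lighting and reflection calculations build on.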

Some areas of land currently covered by forest had a different previous purpose. An experiment is to be conducted to see whether areas of forest can be automatically classified according to their purpose 50 years ago. There are three categories: meadow, garden and managed woodland. The classification is to be based on the trees currently present: for instance, an area with several apple trees is relatively likely to have been a garden. There are some locations where the true category is known from historical data and the number of trees of each type observed within a fixed distance has been recorded. There is data from 25 meadows, 30 gardens and 45 woodlands. The average number of individual trees at each location is 52. (a) Give formulae for two possible approaches to Naive Bayes classification for this task. (b) How could you derive parameter estimates for use in the Naive Bayes classifiers from this type of data? (c) How would you use the available data to train and test a Naive Bayes classifier? (d) You are now given a large catalogue of tree species with each species manually assigned to zero or more of the categories. Describe a modification to your previous experiment which makes use of this data.