Question
1. While working in a consulting centre, a client comes to you with the following problem:
“I am studying disease A in a certain population, and have managed to collect p features, X, on each of a number of subjects, and we have k different outcomes, Y, related to the disease. The feature of most interest is a categorical variable with two levels related to the presence or absence of a certain genetic mutation.”
(a) Describe how your client might go about testing whether the k outcomes differ, on average, between the two groups (based on presence or absence of this mutation). You can ignore the other features for the moment.
(b) Describe a technique to your client that may help them to find interesting combinations of features and outcomes. What sort of information / output does this technique provide?
(c) Suppose none of the features is associated in any way with the outcomes. Can you detect this with this technique?
(d) Your client tells you: “A colleague of mine mentioned a technique called PCA. What if I run PCA on my features X and my outcomes Y separately, then regress the scores of Y on its most important principal component onto the scores of X on its most important principal component? Won’t I get the same thing?” Will the client get the same result?
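The procedure in (d) can be tried on toy data. The sketch below is not the official solution; it assumes that the technique meant in (b) is canonical correlation analysis (CCA) and uses synthetic data in which the high-variance directions of X and Y are unrelated while a low-variance feature drives one outcome. The function names are hypothetical. For a simple regression of one score on another, R² equals the squared correlation, so the correlation of the two first-PC scores summarises the client's proposal.

```python
import numpy as np

def first_pc_scores(M):
    """Scores of the centered data on its first principal component."""
    Mc = M - M.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Mc, full_matrices=False)
    return Mc @ Vt[0]

def pca_then_regress_correlation(X, Y):
    """Absolute correlation between the first PC scores of X and of Y."""
    return abs(np.corrcoef(first_pc_scores(X), first_pc_scores(Y))[0, 1])

def first_canonical_correlation(X, Y):
    """Largest canonical correlation between X and Y, via QR and SVD."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    # Singular values of Qx^T Qy are the canonical correlations.
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

rng = np.random.default_rng(0)
n = 500
# Synthetic data (an assumption for illustration): the high-variance feature
# is pure noise, while the low-variance feature drives the first outcome.
X = np.column_stack([10.0 * rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([X[:, 1] + 0.1 * rng.normal(size=n),
                     10.0 * rng.normal(size=n)])

r_pca = pca_then_regress_correlation(X, Y)
r_cca = first_canonical_correlation(X, Y)
print(f"PCA-then-regress |r|: {r_pca:.3f}")
print(f"first canonical correlation: {r_cca:.3f}")
```

In this construction the first principal components follow the directions of largest variance within each block, which carry no shared signal, whereas the canonical correlation picks out the low-variance pair that is strongly related. The two procedures therefore need not agree in general.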
2. Suppose your next client in the consulting centre comes in with a problem from biophysics about molecular structure.
“Given a pair of molecules M_i, M_j, we have computed dissimilarities / distances between the two molecules using some computer application that is considered the ‘gold standard’ in our field. We have these dissimilarities for n different molecules, and we would like to see if there is any interesting structure in our population of molecules. For instance, is there a natural grouping of the molecules based on these dissimilarities?”
(a) Describe some techniques your client might use to discover interesting groupings of the molecules.
(b) The client then asks: “We actually have some labels for these molecules based on other studies. Can you help us come up with a rule for classifying a new protein into these classes based on its dissimilarities with the n existing proteins we have?” Is there a natural way to create discriminant functions for use in some sort of LDA technique in this situation?
(c) What if your client told you: “In fact, we’ve been working on approximating this gold standard ‘distance’ by using some features of each molecule to come up with our own distance function d_approx(M_i, M_j), which is some symmetric function based on the features of molecules M_i and M_j. It seems that this d_approx approximates the ‘gold standard’ really well.” Describe how you might use d_approx to create discriminant functions for classification.
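Because the client in Question 2 has only an n x n table of pairwise dissimilarities and no feature vectors, one possible starting point (a sketch, not the official solution) is classical multidimensional scaling, which turns the table into coordinates that clustering or discriminant methods can then work with. The example below uses numpy only; the function name `classical_mds` and the toy points standing in for molecules are assumptions made for illustration.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed n objects in Euclidean space from an n x n dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean dissimilarities) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

rng = np.random.default_rng(1)
# Hypothetical stand-in for the molecules: two groups of points in the plane.
points = np.vstack([rng.normal(0.0, 0.3, size=(10, 2)),
                    rng.normal(5.0, 0.3, size=(10, 2))])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

Z = classical_mds(D, n_components=2)
D_hat = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
print("max reconstruction error:", np.abs(D - D_hat).max())
```

The rows of `Z` can be passed to a clustering method for part (a) or used as features in a discriminant analysis for part (b). Placing a new molecule in the same coordinates requires an out-of-sample extension based on its dissimilarities to the n existing molecules, which this sketch does not cover.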