GENE_NAME HEALTHY_SAMPLE_1 HEALTHY_SAMPLE_2 HEALTHY_SAMPLE_3 DISEASE_SAMPLE_1 DISEASE_SAMPLE_2 DISEASE_SAMPLE_3 DIFFERENCE_OF_MEANS MEAN(HEALTHY) MEAN(DISEASE) MEAN(DISEASE)-MEAN(HEALTHY) S(HEALTHY) S(DISEASE) T-STATISTIC_DENOMINATOR DF_NUMERATOR DF_DENOMINATOR DEGREES_OF_FREEDOM T-STATISTIC P-VALUE NUMBER_OF_TESTS(nt) RANK(i) nt/i nt/i*P-VALUE FDR_ADJUSTED_P-VALUE
Gene26 207.16 201.27 214.23 159.31 169.06 160.62 -44.55666667 207.5533333 162.9966667 -44.55666667 6.488947013 5.291694751 4.834199922 546.1330109 142.0591999 3.844404383 -9.21696814 0.000926441 40 1 40 0.037057624 0.037057624
Gene 154.01 162.38 152.39 231.79 216.31 242.15 73.82333333 156.26 230.0833333 73.82333333 5.361613563 13.00426597 8.121121296 4349.746912 1634.716224 2.660857492 9.09028823 0.004465702 40 2 20 0.089314049 0.089314049
Gene17 278.08 267.81 296.51 316.85 325.78 310.72 36.98333333 280.8 317.7833333 36.98333333 14.54205281 7.573257775 9.466177217 8029.685264 2667.201045 3.010528689 3.906892136 0.029595763 40 3 13.33333333 0.394610179 0.363444022
Gene2 173.46 166.88 165.8 179.86 175.83 176.21 8.586666667 168.7133333 177.3 8.586666667 4.146050329 2.22515168 2.716678937 54.46948412 17.77790576 3.063886424 3.160721921 0.049395567 40 4 10 0.493955672 0.363444022
Gene28 223.9 217.51 228.59 212.11 203.86 189.46 -21.52333333 223.3333333 201.81 -21.52333333 5.561693387 11.46331104 7.356175033 2928.250439 1012.485916 2.892139429 -2.925886515 0.064009448 40 5 8 0.512075581 0.363444022
Gene14 203.75 178.36 205.84 220.4 223.32 231.75 29.17333333 195.9833333 225.1566667 29.17333333 15.29798789 5.893694371 9.46509494 8026.013726 3109.770657 2.580902134 3.082201871 0.065832187 40 6 6.666666667 0.43888125 0.363444022
Gene7 165.18 175.59 157.79 173.97 189.2 193.47 19.36 166.1866667 185.5466667 19.36 8.942596566 10.2504943 7.853718157 3804.532054 968.6369171 3.92771738 2.465074454 0.070470097 40 7 5.714285714 0.402686267 0.363444022
Gene12 161.68 153.8 162.62 165.97 167.88 177.27 11.00666667 159.3666667 170.3733333 11.00666667 4.843731344 6.048556302 4.473878506 400.6237986 104.9398664 3.817651121 2.460206877 0.072688804 40 8 5 0.363444022 0.363444022
Gene21 173.53 173.02 169.04 155.81 169.28 159.01 -10.49666667 171.8633333 161.3666667 -10.49666667 2.458339548 7.037445086 4.30383808 343.1023522 138.2950802 2.48094402 -2.438908358 0.110444113 40 9 4.444444444 0.490862724
Gene6 160.54 159.04 159.18 168.69 169.78 161.04 6.916666667 159.5866667 166.5033333 6.916666667 0.828573071 4.762670819 2.791031271 60.68184958 28.6106294 2.120954724 2.478175984 0.124365329 40 10 4 0.497461316
Gene3 129.47 133.65 130.25 126.61 103.36 115.61 -15.93 131.1233333 115.1933333 -15.93 2.222641072 11.63059901 6.836445633 2184.34613 1017.922404 2.145886683 -2.330158222 0.13651653 40 11 3.636363636 0.496423779
Gene10 221.49 227.81 220.85 210.66 211.34 222.68 -8.49 223.3833333 214.8933333 -8.49 3.846938176 6.752016983 4.486597325 405.199006 127.635123 3.174666953 -1.892302648 0.14972102 40 12 3.333333333 0.499070084
Gene34 144.39 145.79 142.1 141.34 136.3 126.76 -9.293333333 144.0933333 134.8 -9.293333333 1.862802548 7.404836257 4.408387208 377.6756055 167.6969717 2.25213134 -2.108102781 0.15511478 40 13 3.076923077 0.47727625
Gene40 178.2 178.8 161.1 164.88 156.88 162.4 -11.31333333 172.7 161.3866667 -11.31333333 10.05037313 4.095135325 6.265783626 1541.35109 582.4587484 2.646283696 -1.805573574 0.18088588 40 14 2.857142857 0.516816803
Gene16 187.04 193.29 178.72 189.46 206.84 194.59 10.61333333 186.35 196.9633333 10.61333333 7.309466465 8.929761102 6.662555399 1970.440534 511.8417692 3.849706399 1.592982377 0.189137685 40 15 2.666666667 0.50436716
Gene32 243.67 245.03 231.43 239.2 227.51 218.8 -11.54 240.0433333 228.5033333 -11.54 7.490295944 10.23621186 7.323126988 2875.982644 784.8096692 3.664560665 -1.575829563 0.196605768 40 16 2.5 0.491514419
Gene27 235.59 214.19 221.41 210.92 214.27 217.37 -9.543333333 223.73 214.1866667 -9.543333333 10.88700142 3.225807392 6.555726081 1847.069326 786.4935381 2.348486335 -1.455724845 0.264916172 40 17 2.352941176 0.62333217
Gene35 183.02 186.25 170.81 205.8 184.13 183.43 11.09333333 180.0266667 191.12 11.09333333 8.143613039 12.71806982 8.719092715 5779.432332 1697.831703 3.404007784 1.272303632 0.283222705 40 18 2.222222222 0.629383789
Gene37 172.84 170.62 162.29 162.99 142.6 172.56 -9.2 168.5833333 159.3833333 -9.2 5.562070957 15.30217087 9.400231676 7808.259333 3099.239475 2.519411422 -0.978699283 0.412126503 40 19 2.105263158 0.867634744
Gene 191.52 183.82 213.75 204.0 210.97 9.01 196.3633333 205.3733333 9.01 15.54170626 5.069243862 9.438253134 7935.357255 3278.008761 2.42078586 0.954625805 0.425095263 40 20 2 0.850190526
Gene18 180.03 185.2 175.23 163.36 184.83 174.02 -6.083333333 180.1533333 174.07 -6.083333333 4.986144135 10.73508733 6.833830876 2181.006233 772.1579125 2.824559844 -0.890179087 0.442627273 40 21 1.904761905 0.843099567
Gene20 261.86 264.82 259.12 265.12 259.74 242.65 -6.096666667 261.9333333 255.8366667 -6.096666667 2.850707515 11.73252885 6.970862373 2361.27209 1056.343216 2.235326601 -0.87459289 0.465530226 40 22 1.818181818 0.846418593
Gene19 243.01 235.72 241.9 255.87 237.82 240.49 4.516666667 240.21 244.7266667 4.516666667 3.927862014 9.742311498 6.064671833 1352.786381 513.6907909 2.633464344 0.744750382 0.517252411 40 23 1.739130435 0.89956941
Gene11 204.9 184.05 202.28 184.9 197.88 192.71 -5.246666667 197.0766667 191.83 -5.246666667 11.35722824 6.534592566 7.564996879 3275.17878 1025.60658 3.193406568 -0.69354512 0.535054582 40 24 1.666666667 0.891757637
Gene36 201.5 221.9 227.63 208.63 214.28 211.04 -5.693333333 217.01 211.3166667 -5.693333333 13.73420183 2.83514256 8.096631673 4297.516296 1980.291419 2.170143371 -0.703173068 0.549752923 40 25 1.6 0.879604677
Gene38 164.19 166.46 172.62 174.9 167.88 166.99 2.166666667 167.7566667 169.9233333 2.166666667 4.362021703 4.332832022 3.549679172 158.7656 39.6931893 3.999819688 0.610383801 0.57458660 40 26 1.538461538 0.883979398
Gene31 271.77 263.68 263.21 260.1 269.53 262.68 -2.116666667 266.22 264.1033333 -2.116666667 4.812182457 4.873462151 3.954226993 244.4817163 61.13021467 3.999359688 -0.53529215 0.620826911 40 27 1.481481481 0.919743572
Gene9 222.19 219.64 202.64 210.75 210 213.12 -3.533333333 214.8233333 211.29 -3.533333333 10.62783296 1.628588346 6.207606445 1484.898271 709.1625248 2.093875831 -0.56919416 0.62435446 40 28 1.428571429 0.891934954
Gene15 270.1 273.21 252.17 273.4 266.46 266.97 3.783333333 265.16 268.9433333 3.783333333 11.35663242 3.868001206 6.926628168 2301.905543 936.5502784 2.45785581 0.546201303 0.630519611 40 29 1.379310345 0.869682222
Gene4 164.45 175.86 158.25 171.02 163.9 170.97 2.443333333 166.1866667 168.63 2.443333333 8.932526705 4.096376448 5.673632973 1036.203253 369.3348826 2.805592708 0.430647055 0.697645993 40 30 1.333333333 0.930194657
Gene2 192.98 183.48 187.05 184.15 180.52 193.2 -1.88 187.8366667 185.9566667 -1.88 4.798607437 6.530209287 4.678684525 479.1759916 130.4836281 3.672307387 -0.401822348 0.710089069 40 31 1.290322581 0.91624396
Gene23 242.72 229.33 241.1 248.89 235.52 234.44 1.9 237.7166667 239.6166667 1.9 7.308093687 8.049076552 6.276832181 1552.251433 391.6598234 3.963264395 0.302700462 0.777339445 40 32 1.25 0.971674306
Gene30 318.27 325.53 319.21 323.8 318.32 323.3 0.803333333 321.0033333 321.8066667 0.803333333 3.948282327 3.029873485 2.87338747 68.16740706 18.18276058 3.749013069 0.279577099 0.79454164 40 33 1.212121212 0.963080776
Gene8 115.98 112.07 118.3 118.15 110.09 116.44 -0.556666667 115.45 114.8933333 -0.556666667 3.148634625 4.246767398 3.052263277 86.79365272 23.53043559 3.688569742 -0.182378326 0.864864756 40 34 1.176470588 1.017487948
Gene33 170.07 154.47 189.61 177.47 161.28 170.08 -1.773333333 171.3833333 169.61 -1.773333333 17.60677521 8.105226709 11.19066923 15682.82284 5578.604682 2.811244699 -0.158465352 0.884781592 40 35 1.142857143 1.011178962
Gene24 179.89 165.14 172.93 172.74 174.66 172.41 0.616666667 172.6533333 173.27 0.616666667 7.378891064 1.215030864 4.317573907 347.503451 164.8204331 2.108376034 0.142827125 0.898917114 40 36 1.111111111 0.998796793
Gene 199.84 178.48 188.19 181.07 193.52 189.02 -0.966666667 188.8366667 187.87 -0.966666667 10.69467313 6.30416529 7.167485225 2639.169146 814.5190726 3.240156351 -0.134868317 0.9006638 40 37 1.081081081 0.973690692
Gene13 175.05 184.9 169.23 179.55 166.27 181.36 -0.666666667 176.3933333 175.7266667 -0.666666667 7.920898518 8.239565118 6.598764194 1896.052842 474.7498894 3.99379312 -0.101029018 0.924396116 40 38 1.052631579 0.973048544
Gene22 175.2 166.65 168.74 181.87 165.8 164.78 0.62 170.1966667 170.8166667 0.62 4.457245039 9.586043675 6.103530854 1387.792632 491.0489756 2.82617967 0.101580547 0.925863083 40 39 1.025641026 0.949603162
Gene25 215.35 204.72 207.54 217.3 212.95 195.21 -0.716666667 209.2033333 208.4866667 -0.716666667 5.50674435 11.70183889 7.466754464 3108.332584 1092.789708 2.844401407 -0.095981014 0.929894473 40 40 1 0.929894473

Introduction

In this project we'll explore some of the techniques that can be used to study gene expression activity. As a motivating example, we'll imagine that we are comparing the gene activity in a healthy human lung tissue sample to the gene activity in a lung cancer tumor sample. We will attempt to identify genes that seem to show a change in activity in tumor tissue samples relative to healthy control samples.

A quick primer on gene expression: while we won't dive too deeply into the underlying biology, the basic idea is that genes (DNA sequences) in the genome are transcribed into messenger RNA sequences, which are then used as blueprints to manufacture proteins (the biological molecules that carry out most cellular functions).
When we refer to "gene activity" in this study, we broadly mean this process of taking instructions from the genome and using them to produce proteins for the cell. Each gene has a baseline, quantitative activity level in healthy (lung) tissue: it may be inactive (off), or active to some degree along a spectrum of values corresponding to how much protein product is ultimately produced. The numerical value associated with "activity" broadly captures the number of messenger RNAs and/or proteins produced. This activity level often changes dramatically in cancerous tissue, and the focus of this study is to explore some techniques used to identify and study the genes that behave differently in tumors, based on their observed numerical activity levels.

Note: Genes work in complicated ways, and you shouldn't read too much into whether a difference in activity is positive or negative. A drop in activity could result in less production of some compound, or in less production of something that inhibits production of that compound (and therefore more of the compound itself).

The study will proceed in two phases. First, we will compare the gene activity levels of 40 handpicked genes across two samples (one healthy, one cancerous). These genes are suspected to be broadly involved in tumorigenesis and the onset of cancer, but we'll seek to confirm whether they are behaving differently in this strain of cancer specifically. To do this, we will use a scatterplot and regression analysis to explore the overall trend in gene activity behaviour (and to get a preliminary glimpse of the genes that may be showing differences in activity level). In the second phase of the study, we'll collect multiple replicate measurements from healthy and disease tissues, allowing us to conduct hypothesis tests to confirm the (statistical) significance of any changes in activity level.
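The phase-one screen (one healthy and one tumor measurement per gene, a scatterplot, and a regression line) can be sketched as follows. This is a minimal sketch under stated assumptions: it fits an ordinary least-squares line of tumor activity against healthy activity and treats the genes lying farthest from that line as the most suspicious. The residual-ranking rule, the `top_k` cut, and the names `fit_line` and `flag_by_residual` are hypothetical illustrations, not the study's stated method.

```python
from typing import Dict, List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_by_residual(healthy: Dict[str, float], disease: Dict[str, float],
                     top_k: int = 3) -> List[str]:
    """Hypothetical screen: the top_k genes farthest (vertically) from the fitted line."""
    genes = list(healthy)
    slope, intercept = fit_line([healthy[g] for g in genes],
                                [disease[g] for g in genes])
    # Residual = observed tumor activity minus the activity the line predicts.
    residual = {g: disease[g] - (slope * healthy[g] + intercept) for g in genes}
    return sorted(genes, key=lambda g: abs(residual[g]), reverse=True)[:top_k]
```

With real data, `healthy` and `disease` would map each of the 40 gene names to its single measured activity level in that tissue.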
We will first run these hypothesis tests on a subset of the 40 candidates (those that seemed most suspicious in the scatterplot / regression analysis), and then on the complete set of 40 candidates. Because we will be conducting several hypothesis tests within a single study, this will give us a chance to try out corrections for the multiple testing problem.

Hypothesis Testing (all genes)

A spreadsheet showing the results of independent samples t-tests on all forty genes is provided in "Hypothesis Testing all.xls". For the hypothesis tests in this spreadsheet, the following null and alternative hypotheses were used:

$H_0: \mu_{\text{healthy}} = \mu_{\text{diseased}}$
$H_a: \mu_{\text{healthy}} \neq \mu_{\text{diseased}}$

The numerator of each test statistic was computed (gene by gene) by taking the mean of the disease tissue activity levels and subtracting the mean of the healthy tissue activity levels. In addition to providing the data for the remaining 37 genes, the spreadsheet has several new columns showing intermediate calculations for the Benjamini-Hochberg False Discovery Rate adjustment of the p-values. All genes were then sorted by p-value, from smallest (first row) to largest (last row).

Question 3 a): (2 marks) In addition to the three genes we examined previously, a fourth gene has a p-value that would be considered statistically significant at a p-value cutoff of 0.05 (without making any attempt to correct for multiple testing). Looking at the information available in this spreadsheet, suggest why we might not have recognized this candidate in our prior scatterplot analysis.

Question 3 b): (3 marks) Use the Benjamini-Hochberg method to determine False Discovery Rate adjusted p-values for the genes in this data set. The provided spreadsheet contains some preliminary work toward this goal, including correct adjusted p-values for the first three genes. Provide the completed list of 40 adjusted p-values, in their current order, as part of your report.
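The per-gene t-tests in the spreadsheet appear to use the unequal-variance (Welch) form: its t-statistic denominator and its two intermediate degrees-of-freedom columns match the Welch-Satterthwaite formulas, and the resulting degrees of freedom are non-integer. A minimal pure-Python sketch of that calculation, assuming a two-sided p-value and the disease-minus-healthy numerator stated above, could look like this. The p-value comes from the regularized incomplete beta function (a standard continued-fraction evaluation) so that no third-party library is needed; all function names are hypothetical, and `df_num` / `df_den` simply mirror the spreadsheet's intermediate columns.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple

TINY = 1e-300  # guards against division by zero in the continued fraction


def _beta_continued_fraction(a: float, b: float, x: float,
                             max_iter: int = 200, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b), using whichever continued fraction converges faster."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def two_sided_t_p_value(t: float, df: float) -> float:
    """P(|T| >= |t|) for Student's t with a possibly non-integer df."""
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(healthy: Sequence[float],
                 disease: Sequence[float]) -> Tuple[float, float, float]:
    """Return (t, df, p); the numerator is mean(disease) - mean(healthy)."""
    se2_h = variance(healthy) / len(healthy)  # s^2 / n with the (n - 1) sample variance
    se2_d = variance(disease) / len(disease)
    t = (mean(disease) - mean(healthy)) / math.sqrt(se2_h + se2_d)
    # Welch-Satterthwaite degrees of freedom, split like the spreadsheet columns.
    df_num = (se2_h + se2_d) ** 2
    df_den = se2_h ** 2 / (len(healthy) - 1) + se2_d ** 2 / (len(disease) - 1)
    df = df_num / df_den
    return t, df, two_sided_t_p_value(t, df)
```

For example, `welch_t_test([207.16, 201.27, 214.23], [159.31, 169.06, 160.62])` uses Gene26's replicates (reading the third healthy value as 214.23, which is what that row's mean and standard deviation imply) and should land close to that row's t of about -9.217, degrees of freedom of about 3.844, and p-value of about 0.00093. If SciPy is available, `scipy.stats.ttest_ind(disease, healthy, equal_var=False)` runs the same unequal-variance test with the same sign convention.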
Question 3 c): (2 marks) How many genes would be considered statistically significant if we used a Benjamini-Hochberg False Discovery Rate adjusted cutoff of 0.40? Out of this list of genes, how many would you expect to be false positives? (Round up to the next integer.)
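Questions 3 b) and 3 c) rest on the Benjamini-Hochberg step-up procedure. A minimal sketch, assuming the p-values arrive as a plain list: each p-value is scaled by nt/i (nt tests, rank i), which is the spreadsheet's `nt/i * p-value` column, and a running minimum taken from the largest p-value upward gives the adjusted value (the same recipe R's `p.adjust` uses for `method = "BH"`). The second helper reads an FDR cutoff as the expected fraction of false discoveries among the genes called significant, which is the usual interpretation, and rounds up as the question asks. The function names and the toy p-values in the usage check are hypothetical; they are not the assignment data.

```python
import math
from typing import List, Sequence, Tuple


def bh_adjust(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda k: p_values[k])  # index of rank 1, 2, ..., n
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    # Walk from the largest p-value (rank n) back to the smallest (rank 1).
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        scaled = p_values[idx] * n / rank       # the "nt/i * p-value" column
        running_min = min(running_min, scaled)  # keeps adjusted values monotone in rank
        adjusted[idx] = running_min
    return adjusted


def significant_and_expected_false_positives(
        adjusted: Sequence[float], cutoff: float) -> Tuple[int, int]:
    """Count adjusted p-values <= cutoff and the expected false positives among them."""
    significant = sum(1 for q in adjusted if q <= cutoff)
    # Round before ceil so float noise (e.g. 2.0000000000000004) cannot bump the count.
    expected_false = math.ceil(round(significant * cutoff, 9))
    return significant, expected_false
```

For instance, `bh_adjust([0.01, 0.04, 0.03, 0.20])` returns roughly `[0.04, 0.0533, 0.0533, 0.2]`: the third-ranked value (0.04 scaled by 4/3) also becomes the adjusted value for the second-ranked gene, because its own scaled value (0.06) is larger.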