
Question


Python logistic regression exercise (scikit-learn is not allowed)

This project must be organized into classes, with the functions written by the programmer. The task is to implement logistic regression for genetic alteration/mutation datasets. There are two datasets: the first is used to train the model and the second to test it. The 1_adel dataset contains approximately 190,000 mutations and 2_beqw around 380,000. Each has 52 columns: 51 input variables (features) plus a last column that is treated as the output (response) variable. The last column is the important one; its name is lab_dri_mut and it takes one of two values, A or B (in the samples below it appears as labels_driver_mutation with the values "driver" and "passenger"). Use only numpy, pandas, and matplotlib (for the cost vs. number of iterations plot, etc.); scikit-learn is not allowed.

First: write a class logistic_reg in logreg.py and implement the functions (methods) of this class in that file. Then write LRdriver.py, which calls logistic_reg.
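One possible shape for logreg.py is sketched below. Only the class name logistic_reg comes from the assignment; the method names (fit, predict_proba, predict), the attribute names, the default learning rate and iteration count, and the choice of plain batch gradient descent on the mean cross-entropy cost are all assumptions made for illustration.

```python
import numpy as np


class logistic_reg:
    """Binary logistic regression trained with batch gradient descent (sketch)."""

    def __init__(self, learning_rate=0.1, n_iterations=1000):
        # Both values are parameters so the driver script can vary them.
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None      # one weight per feature, set in fit()
        self.bias = 0.0
        self.cost_history = []   # cost recorded once per iteration

    @staticmethod
    def _sigmoid(z):
        # Clip z so np.exp cannot overflow for extreme inputs.
        return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

    @staticmethod
    def _cost(y, p):
        # Mean binary cross-entropy; eps keeps log() away from zero.
        eps = 1e-12
        return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        self.cost_history = []
        for _ in range(self.n_iterations):
            p = self._sigmoid(X @ self.weights + self.bias)
            # Record the cost of the current parameters, then update them.
            self.cost_history.append(self._cost(y, p))
            error = p - y
            self.weights -= self.learning_rate * (X.T @ error) / n_samples
            self.bias -= self.learning_rate * error.mean()
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        return self._sigmoid(X @ self.weights + self.bias)

    def predict(self, X, threshold=0.5):
        # Returns 0/1 labels; 1 is whichever class was encoded as 1 in y.
        return (self.predict_proba(X) >= threshold).astype(int)
```

In LRdriver.py, the class could then be imported with `from logreg import logistic_reg`, fitted on the training features and labels, and its `cost_history` list passed to matplotlib's `plt.plot` to draw the cost vs. number of iterations curve.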

Second: make the learning rate and the number of iterations parameters, then find and report suitable values for them. Plot cost (error) vs. number of iterations. List the weights in ascending order, together with their corresponding features.

Third: evaluate the model by comparing the real labels y (the last column of the 1_adel dataset) with the predicted_y values, using the classical measures: specificity, sensitivity, precision, accuracy, and the confusion matrix.
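The evaluation measures and the sorted weight listing can be computed with numpy alone. The sketch below assumes labels are already encoded as 0/1 with 1 as the positive class; the function names (confusion_counts, evaluate, weights_ascending) and the [[TN, FP], [FN, TP]] matrix layout are illustrative choices, not part of the assignment.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for 0/1 labels, with 1 as the positive class."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, fp, tn, fn


def _ratio(num, den):
    # A measure is undefined (NaN) when its denominator is zero.
    return num / den if den else float("nan")


def evaluate(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        # Rows are the real class (0, 1); columns are the predicted class.
        "confusion_matrix": np.array([[tn, fp], [fn, tp]]),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


def weights_ascending(weights, feature_names):
    """Pair each weight with its feature name, sorted from smallest to largest."""
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(weights)
    return [(feature_names[i], float(weights[i])) for i in order]
```

Because the sample rows suggest the two classes may be unbalanced, accuracy alone can look good while sensitivity is poor, which is one reason to report all of the measures side by side.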

Last: repeat the task with scikit-learn and compare its evaluation with yours.

(P.S. When I checked the datasets in Excel, the last column header was empty. I think it is the first column's name that should be empty, since that column acts like an index, right?)

1_adel dataset (.txt file), header and first 5 rows:

"max_no_interactions" "cellular.component" "development" "DNA.damage" "immune" "metabolic" "pathway" "proliferation" "signaling" "NA_hallmarks" "acetylation" "caspase.cleavage" "di.methylation" "methylation" "mono.methylation" "N.Glycosylation" "O.GalNAc" "O.GlcNAc" "phosphorylation" "succinylation" "sumoylation" "tri.methylation" "ubiquitylation" "gene.length" "silent" "nonsense" "splice.site" "missense" "recurrent.missense" "normalized.missense.position.entropy" "frameshift.indel" "inframe.indel" "normalized.mutation.entropy" "Mean.Missense.MGAEntropy" "Mean.VEST.Score" "lost.start.and.stop" "missense.to.silent" "non.silent.to.silent" "expression_CCLE" "replication_time" "HiC_compartment" "gene_betweeness" "gene_degree" "oncogene.score" "tsg.score" "other.score" "driver.score" "SIFT" "PolyPhen" "Condel" "average_rank" "labels_driver_mutation"
"X10_52587965_A.T" 8 "no" "no" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "yes" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "yes" 1809 0.263157894737 0.0877192982456 0 0.631578947368 0.0526315789474 0.974452200301 0.0175438596491 0 0.914987035442 0.523518675893 0.467035087719 0 2.25 2.625 113129.1 613 0.01875195 6.51647478599e-06 9 0.2 0.061 0.739 0.261 0 0.992 0.892 0.708475128205128 "driver"
"X17_67016638_C.T" 1 "no" "no" "no" "no" "yes" "no" "no" "no" "no" "yes" "no" "no" "no" "no" "yes" "no" "no" "yes" "no" "no" "no" "yes" 4875 0.221153846154 0.0769230769231 0.00961538461538 0.673076923077 0.0288461538462 0.988917632922 0.0192307692308 0 0.938567987975 0.972683935097 0.318442307692 0 2.91666666667 3.375 263022.9 602 -0.041954 0 0 0.074 0.038 0.888 0.112 0.14 0.3 0.343 0.333121842105263 "driver"
"X2_169788976_C.A" 1 "no" "no" "no" "yes" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "yes" 3966 0.229166666667 0.0520833333333 0 0.697916666667 0 1 0.0208333333333 0 0.965749797911 0.802749053609 0.448927083333 0 2.91304347826 3.21739130435 223582.3 560 0.001437665 5.04083044541e-06 5 0.102 0.044 0.854 0.146 0.17 0.998 0.533 0.662369743589744 "passenger"
"X2_169781249_G.C" 1 "no" "no" "no" "yes" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "yes" 3966 0.229166666667 0.0520833333333 0 0.697916666667 0 1 0.0208333333333 0 0.965749797911 0.802749053609 0.448927083333 0 2.91304347826 3.21739130435 223582.3 560 0.001437665 5.04083044541e-06 5 0.102 0.044 0.854 0.146 0 0.999 0.935 0.786867692307692 "passenger"
"X2_169780319_G.T" 1 "no" "no" "no" "yes" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "yes" 3966 0.229166666667 0.0520833333333 0 0.697916666667 0 1 0.0208333333333 0 0.965749797911 0.802749053609 0.448927083333 0 2.91304347826 3.21739130435 223582.3 560 0.001437665 5.04083044541e-06 5 0.102 0.044 0.854 0.146 0 0.93 0.821 0.776464358974359 "passenger"
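In the sample above, the header lists 52 names while every data row has 53 fields, the first being an unnamed mutation identifier such as "X10_52587965_A.T". That matches the guess in the P.S.: the first field behaves like a row index, and a spreadsheet that lines the names up from the left ends up with an apparently empty last header. When a file has one more data column than header names, pandas' read_csv uses the first column as the index. The sketch below relies on that behavior; the function name load_mutations, the encoding of "driver" as 1, and the toy two-row table (with made-up column names f1, f2, label) are assumptions for illustration.

```python
import io

import pandas as pd


def load_mutations(path_or_buffer):
    """Read a whitespace-separated mutation table into features X and 0/1 labels y."""
    # The header has one name fewer than each row has fields, so pandas
    # takes the first (unnamed) field of every row as the index.
    df = pd.read_csv(path_or_buffer, sep=r"\s+")
    label_col = df.columns[-1]
    # Assumption: "driver" is the positive class (1), anything else is 0.
    y = (df[label_col] == "driver").astype(int)
    X = df.drop(columns=label_col)
    # Map the "yes"/"no" columns to 1/0; numeric columns are left alone.
    for col in X.select_dtypes(exclude="number").columns:
        X[col] = X[col].map({"yes": 1, "no": 0})
    return X.astype(float), y


# Toy stand-in for the real file, laid out the same way (hypothetical columns).
demo = io.StringIO(
    '"f1" "f2" "label"\n'
    '"r1" 8 "no" "driver"\n'
    '"r2" 1 "yes" "passenger"\n'
)
X_demo, y_demo = load_mutations(demo)
```

For the real data, the same function would be called with the path to the 1_adel or 2_beqw text file instead of the in-memory buffer.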

2_beqw dataset (.txt file), header and first 5 rows:

"max_no_interactions" "cellular.component" "development" "DNA.damage" "immune" "metabolic" "pathway" "proliferation" "signaling" "NA_hallmarks" "acetylation" "caspase.cleavage" "di.methylation" "methylation" "mono.methylation" "N.Glycosylation" "O.GalNAc" "O.GlcNAc" "phosphorylation" "succinylation" "sumoylation" "tri.methylation" "ubiquitylation" "gene.length" "silent" "nonsense" "splice.site" "missense" "recurrent.missense" "normalized.missense.position.entropy" "frameshift.indel" "inframe.indel" "normalized.mutation.entropy" "Mean.Missense.MGAEntropy" "Mean.VEST.Score" "lost.start.and.stop" "missense.to.silent" "non.silent.to.silent" "expression_CCLE" "replication_time" "HiC_compartment" "gene_betweeness" "gene_degree" "oncogene.score" "tsg.score" "other.score" "driver.score" "SIFT" "PolyPhen" "Condel" "average_rank" "labels_driver_mutation"
"X1_100007041_A.T" 3 "no" "no" "no" "no" "yes" "no" "no" "no" "no" "yes" "no" "no" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "yes" 1104 0.375 0 0 0.625 0 1 0 0 1 0.536624619856 0.490375 0 1.25 1.25 287540.5 530 0.008874619 6.41846868919e-08 2 0.016 0.005 0.979 0.021 0.01 0.997 0.911 0.741031176470588 "passenger"
"X1_100007056_T.C" 3 "no" "no" "no" "no" "yes" "no" "no" "no" "no" "yes" "no" "no" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "yes" 1104 0.375 0 0 0.625 0 1 0 0 1 0.536624619856 0.490375 0 1.25 1.25 287540.5 530 0.008874619 6.41846868919e-08 2 0.016 0.005 0.979 0.021 0.16 0.998 0.919 0.787120294117647 "passenger"
"X1_100007065_C.T" 3 "no" "no" "no" "no" "yes" "no" "no" "no" "no" "yes" "no" "no" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "yes" 1104 0.375 0 0 0.625 0 1 0 0 1 0.536624619856 0.490375 0 1.25 1.25 287540.5 530 0.008874619 6.41846868919e-08 2 0.016 0.005 0.979 0.021 0.02 1 0.945 0.789172058823529 "passenger"
"X1_100007074_C.G" 3 "no" "no" "no" "no" "yes" "no" "no" "no" "no" "yes" "no" "no" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "yes" 1104 0.375 0 0 0.625 0 1 0 0 1 0.536624619856 0.490375 0 1.25 1.25 287540.5 530 0.008874619 6.41846868919e-08 2 0.016 0.005 0.979 0.021 0.35 0.997 0.897 0.809202352941176 "passenger"
"X1_100011456_G.T" 3 "no" "no" "no" "no" "yes" "no" "no" "no" "no" "yes" "no" "no" "no" "no" "no" "no" "no" "yes" "no" "no" "no" "yes" 1104 0.375 0 0 0.625 0 1 0 0 1 0.536624619856 0.490375 0 1.25 1.25 287540.5 530 0.008874619 6.41846868919e-08 2 0.016 0.005 0.979 0.021 0.03 0.999 0.873 0.843884705882353 "passenger"
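The sample rows show features on very different scales: gene.length is in the thousands and expression_CCLE in the hundreds of thousands, while most scores lie between 0 and 1. Gradient descent tends to converge slowly, or to need a very small learning rate, on data like this, so standardizing the features first is a common step. The assignment does not ask for it; the sketch below is a suggestion, with hypothetical function names fit_scaler and apply_scaler. The statistics are computed on the training set only and then reused for the test set, so the test data does not influence the model.

```python
import numpy as np


def fit_scaler(X_train):
    """Compute per-column mean and standard deviation from the training set."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Constant columns have std 0; use 1 instead to avoid dividing by zero.
    std[std == 0] = 1.0
    return mean, std


def apply_scaler(X, mean, std):
    """Standardize X with statistics obtained earlier from fit_scaler."""
    return (np.asarray(X, dtype=float) - mean) / std
```

A typical use would be to call `fit_scaler` on the 1_adel features, then pass the returned mean and standard deviation to `apply_scaler` for both the 1_adel and the 2_beqw features before training and prediction.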
