Question
- Small Project 1: Linear Regression Models
- In this project, you will use the WEKA tool.
- See the References & Resources section for guidelines on obtaining the tool.
- This assignment involves building and evaluating fault prediction models using linear regression as implemented in WEKA. Your task is to build models that predict the number of faults from the other program attributes in the dataset. Each model is built and evaluated with 10-fold cross-validation on the fit dataset and then validated on the test dataset.
- The datasets have already been preprocessed for use in Weka.
- You can download the datasets from the links under References & Resources.
- Use the fit dataset to build models with 10-fold cross-validation. At the end of each run, WEKA reports several statistical indicators, which measure the quality of fit (for the fit data) and the predictive quality (for the test data):
- Correlation coefficient
- Mean absolute error (also called AAE, which stands for Average Absolute Error)
- Root mean squared error
- Relative absolute error
- Root relative squared error
- Linear regression models can be built in WEKA with three different attribute selection options:
- No Attribute Selection
- M5 method
- Greedy method
- Build one model with each attribute selection method, giving three different models. Compare the models: how many independent variables, and which ones, did each method select? After building the models, evaluate their performance on the test dataset. For each model, compare its quality of fit with its predictive quality, and then compare the quality of fit and the predictive quality across all three models. Do not base your comparisons on a single parameter; use all of the statistical indicators listed above that Weka provides.
- Include all results from the 10-fold cross-validation and from the test dataset for each model.
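The five indicators can be computed directly from the actual and predicted FAULTS values, which helps when checking or tabulating WEKA's output. This sketch uses the usual definitions: MAE is the mean of |p - a|, RMSE is the square root of the mean of (p - a)^2, RAE is the sum of |p - a| divided by the sum of |a - mean| times 100%, RRSE is the square root of the sum of (p - a)^2 over the sum of (a - mean)^2 times 100%, and the correlation coefficient is Pearson's r between a and p. As far as we know, WEKA measures the relative errors against a predictor that always outputs the mean of the training targets, so the `reference_mean` parameter (a name chosen for this sketch, not a WEKA option) lets you supply that value; small differences from WEKA's printed figures are still possible.

```python
import math


def evaluation_indicators(actual, predicted, reference_mean=None):
    """Return the five indicators listed in the assignment.

    reference_mean is the baseline the relative errors are measured against.
    It defaults to the mean of `actual`; pass the mean FAULTS value of the
    training data to approximate WEKA's convention more closely (assumption).
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    ref = mean_a if reference_mean is None else reference_mean

    errors = [p - a for a, p in zip(actual, predicted)]
    abs_err = sum(abs(e) for e in errors)  # total absolute error
    sq_err = sum(e * e for e in errors)    # total squared error
    # Same totals for a predictor that always outputs `ref`.
    base_abs = sum(abs(a - ref) for a in actual)
    base_sq = sum((a - ref) ** 2 for a in actual)

    # Pearson correlation between actual and predicted values.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_a * var_p) if var_a > 0 and var_p > 0 else 0.0

    return {
        "correlation_coefficient": corr,
        "mean_absolute_error": abs_err / n,
        "root_mean_squared_error": math.sqrt(sq_err / n),
        "relative_absolute_error_pct": 100.0 * abs_err / base_abs,
        "root_relative_squared_error_pct": 100.0 * math.sqrt(sq_err / base_sq),
    }


print(evaluation_indicators([0, 1, 2, 3], [0.5, 1, 1.5, 3.5]))
```

For this toy input the MAE is 0.375 and the RAE is 37.5%. Reporting all five numbers side by side, rather than one, is what the comparison step above asks for.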
Data Set fit.arff
@relation FIT
@attribute NUMUORS real
@attribute NUMUANDS real
@attribute TOTOTORS real
@attribute TOTOPANDS real
@attribute VG real
@attribute NLOGIC real
@attribute LOC real
@attribute ELOC real
@attribute FAULTS real
@data
22,85,203,174,9,0,362,40,0
21,87,186,165,5,0,379,32,0
30,107,405,306,25,0,756,99,0
6,5,19,6,2,0,160,9,0
21,47,168,148,7,0,352,29,0
28,38,161,114,10,3,375,40,0
27,218,1522,1328,114,0,1026,310,0
21,78,156,135,5,0,300,27,0
6,13,55,38,1,0,291,21,0
7,6,19,8,2,0,135,9,0
22,83,168,145,6,0,317,30,0
5,3,14,6,1,0,144,7,0
22,37,115,95,8,0,164,21,0
9,9,32,13,3,0,201,14,0
26,26,90,64,10,0,166,24,0
24,35,120,83,6,4,151,25,0
26,82,313,275,12,0,293,44,0
14,50,130,108,8,0,291,27,0
6,5,19,6,2,0,144,9,0
4,9,34,27,1,0,237,13,0
31,172,1221,1104,35,4,1158,151,0
21,26,81,56,3,2,236,18,0
9,17,59,47,2,0,136,21,0
6,5,19,6,2,0,155,9,0
8,7,104,30,1,0,495,32,0
6,5,19,6,2,0,162,9,0
12,59,232,205,13,0,410,47,0
21,79,156,135,5,0,303,27,0
36,212,921,827,36,3,736,137,0
19,86,220,193,7,0,349,34,0
41,203,1285,1055,66,13,882,190,0
10,25,147,102,9,0,602,53,0
23,86,173,150,6,0,322,31,0
14,25,72,42,2,0,363,21,0
23,29,255,189,16,0,584,68,0
18,10,56,33,2,0,211,16,0
13,62,238,207,15,0,491,53,0
14,49,124,102,8,0,311,27,0
26,41,119,88,9,1,298,30,0
15,36,111,92,9,0,196,31,0
10,18,41,33,3,0,183,11,0
17,65,126,113,5,0,184,21,0
31,59,239,187,10,3,403,48,0
7,7,29,12,3,0,186,13,0
18,14,64,47,3,0,116,12,0
18,37,97,67,2,0,380,28,0
7,6,43,18,4,0,240,19,0
27,86,380,321,16,1,709,59,0
20,68,231,207,7,1,257,41,0
31,142,519,427,30,0,501,104,0
4,3,6,3,1,0,19,3,0
6,9,137,90,1,0,802,51,0
30,97,374,312,9,1,605,48,0
5,4,19,10,1,0,179,8,0
11,27,50,40,3,0,171,11,0
17,66,132,119,5,0,240,26,0
42,673,2194,1494,614,1,1992,687,0
18,44,242,159,7,7,671,52,0
5,2,7,2,1,0,104,3,0
11,6,17,9,3,0,108,6,0
4,3,11,4,1,0,141,5,0
12,19,138,93,3,0,664,44,0
6,7,93,60,1,0,623,35,0
5,4,31,9,1,0,328,13,0
5,2,10,4,1,0,129,5,0
35,171,1301,1154,50,9,1769,223,0
21,86,188,167,5,0,341,31,0
4,2,6,2,1,0,108,3,0
4,4,16,6,1,0,177,7,0
21,89,184,163,5,0,339,31,0
21,56,241,196,18,0,382,58,0
7,12,175,108,1,0,1137,65,0
19,89,267,240,7,0,368,40,0
23,137,492,459,68,0,483,99,0
9,12,153,87,7,0,638,58,0
18,37,129,82,9,3,533,36,0
5,5,27,12,1,0,257,13,0
9,10,105,59,5,0,486,40,0
24,82,404,307,30,0,718,94,0
18,34,100,64,3,0,315,27,0
6,5,19,6,2,0,162,9,0
11,14,150,98,2,0,779,50,0
6,5,19,6,2,0,155,9,0
21,98,193,172,5,0,352,31,0
22,98,220,198,14,0,349,37,0
5,14,99,60,1,0,603,38,0
18,37,79,65,4,0,152,15,0
5,8,59,35,1,0,362,23,0
6,8,71,25,6,0,361,31,0
6,11,113,40,9,0,581,49,0
10,20,42,31,1,0,77,7,0
21,90,178,157,5,0,347,30,0
7,8,20,12,1,0,105,8,0
6,5,18,7,2,0,169,9,0
21,90,191,170,5,0,346,32,0
6,5,19,6,2,0,157,9,0
6,22,46,43,1,0,158,12,0
23,32,90,64,7,2,202,23,0
4,2,6,2,1,0,94,3,1
6,9,35,10,2,0,286,17,1
5,7,46,20,1,0,396,21,1
24,109,250,216,12,0,415,46,1
22,66,229,191,7,2,477,46,1
18,60,159,135,8,0,287,26,1
8,8,27,12,1,0,192,11,1
24,104,238,211,20,0,333,45,1
20,44,122,101,11,0,364,33,1
30,111,509,442,8,12,686,74,1
28,110,362,318,16,0,361,54,1
33,206,940,872,77,0,816,173,1
25,39,141,116,12,0,224,39,1
19,41,170,135,10,0,306,38,1
26,189,772,705,60,0,494,140,1
37,63,330,258,18,5,477,56,1
31,248,1339,1202,80,0,1109,238,1
19,16,96,56,6,1,327,27,1
30,60,562,443,45,6,1001,105,1
36,115,350,252,18,0,909,83,1
21,83,163,142,5,0,306,28,1
22,103,401,360,12,0,585,64,1
7,20,55,26,1,0,538,24,1
18,57,163,137,4,0,300,33,1
4,2,6,2,1,0,99,3,1
6,16,69,45,8,0,190,23,1
6,6,30,18,4,0,128,11,1
6,13,79,52,9,0,194,26,1
10,8,131,84,5,2,631,44,1
21,10,49,15,4,4,118,17,1
4,9,41,16,1,0,383,17,1
13,34,154,121,14,0,540,45,1
25,112,303,246,7,1,886,66,1
4,4,11,4,1,0,114,5,1
8,11,19,15,1,0,106,6,1
20,63,311,259,10,10,603,52,2
22,77,227,201,13,5,289,38,2
36,308,1822,1637,63,13,1914,280,2
39,178,1678,1433,45,1,1884,237,2
30,56,241,205,15,9,306,54,2
28,56,233,199,16,9,331,52,2
63,409,2565,1980,185,20,4416,555,2
46,232,1067,843,80,1,1632,233,2
54,222,1168,889,39,12,1716,200,2
26,97,304,254,5,1,687,51,2
28,79,748,585,38,12,1398,163,2
23,109,524,399,48,0,986,147,3
21,152,817,723,17,7,1041,120,3
26,47,226,181,10,2,457,53,3
40,102,654,533,29,0,949,106,3
22,117,324,298,4,1,382,49,3
30,136,428,350,21,1,1043,83,3
35,284,1335,1251,41,1,1267,184,3
28,40,421,286,28,0,726,111,3
85,446,3094,2637,225,3,2758,637,4
29,131,515,457,22,0,445,83,4
21,31,95,56,7,0,173,26,4
31,148,422,350,23,0,619,76,4
41,240,807,715,28,2,1014,135,4
29,116,361,314,8,2,506,56,4
31,130,375,283,25,0,515,77,5
30,198,959,863,29,6,992,142,5
36,132,444,326,16,0,447,84,5
28,164,1084,892,43,0,1903,246,5
42,147,686,584,28,5,1057,112,5
18,143,458,402,19,0,743,77,5
31,181,738,669,25,1,843,104,5
27,154,1577,1308,94,6,2111,294,6
29,92,838,693,42,0,1460,185,6
38,189,1076,925,35,2,1479,169,6
32,212,854,741,44,0,896,170,6
24,420,3614,3243,7,0,2348,546,7
46,351,1472,1365,58,3,1571,233,8
62,386,5801,5029,221,42,6795,866,8
56,781,5231,4706,343,10,6028,957,8
25,150,871,704,16,8,1374,148,9
43,308,2237,1797,109,15,2909,485,10
31,122,410,315,19,0,899,69,10
73,409,3325,2754,140,32,3329,671,10
57,547,2204,2011,78,7,2491,322,11
45,185,965,800,33,13,1912,179,12
50,435,1811,1676,63,23,1665,268,12
46,418,2647,2301,115,32,2042,365,13
68,345,1997,1657,87,19,2228,396,14
54,319,2238,1798,126,33,2579,463,15
32,303,1085,990,34,4,1323,161,16
61,453,2364,2023,74,58,3374,367,20
42,318,1715,1477,52,17,3336,300,22
66,1124,8606,7736,448,31,9163,1412,29
63,633,4180,3748,145,30,3991,607,29
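Both files use only numeric attributes, so a small reader covering just that subset of ARFF is enough if you want to cross-check WEKA's numbers in a script. This is a minimal sketch, not WEKA's own loader: it assumes no missing values ('?'), no quoted names, no sparse instances and no spaces inside an instance. Because it splits on whitespace, it also copes with header or data lines that have been run together. The function names `load_numeric_arff` and `split_target` are hypothetical.

```python
def load_numeric_arff(text):
    """Parse a numeric-only ARFF document into (attribute_names, rows).

    Assumes: numeric/real/integer attributes only, no missing values ('?'),
    no quoted names, no sparse instances, and no spaces inside an instance.
    """
    tokens = []
    for raw in text.splitlines():
        tokens.extend(raw.split("%", 1)[0].split())  # '%' starts an ARFF comment
    names, rows = [], []
    i = 0
    while i < len(tokens):
        keyword = tokens[i].lower()
        if keyword == "@relation":
            i += 2  # skip the relation name
        elif keyword == "@attribute":
            name, kind = tokens[i + 1], tokens[i + 2].lower()
            if kind not in ("real", "numeric", "integer"):
                raise ValueError(f"unsupported attribute type: {kind}")
            names.append(name)
            i += 3
        elif keyword == "@data":
            # Every remaining token is one comma-separated instance.
            for instance in tokens[i + 1:]:
                values = [float(v) for v in instance.split(",")]
                if len(values) != len(names):
                    raise ValueError(f"expected {len(names)} values in {instance!r}")
                rows.append(values)
            break
        else:
            raise ValueError(f"unexpected token {tokens[i]!r}")
    return names, rows


def split_target(names, rows, target="FAULTS"):
    """Separate the target column (FAULTS by default) from the predictors."""
    t = names.index(target)
    features = [n for j, n in enumerate(names) if j != t]
    X = [[v for j, v in enumerate(r) if j != t] for r in rows]
    y = [r[t] for r in rows]
    return features, X, y


sample = "@relation FIT\n@attribute LOC real @attribute FAULTS real\n@data 362,0 94,1"
features, X, y = split_target(*load_numeric_arff(sample))
print(features, X, y)  # ['LOC'] [[362.0], [94.0]] [0.0, 1.0]
```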
Data Set test.arff
@relation TEST
@attribute NUMUORS real
@attribute NUMUANDS real
@attribute TOTOTORS real
@attribute TOTOPANDS real
@attribute VG real
@attribute NLOGIC real
@attribute LOC real
@attribute ELOC real
@attribute FAULTS real
@data
6,12,127,45,10,0,641,55,0
5,5,41,12,1,0,407,17,0
23,28,95,66,4,2,241,20,0
5,5,35,20,1,0,254,14,0
6,10,43,26,1,0,264,17,0
3,6,25,6,1,0,279,13,0
15,21,47,32,5,1,122,12,0
6,11,155,96,1,0,915,58,0
36,159,1480,1275,41,1,1704,203,0
17,62,121,108,5,0,200,21,0
25,27,109,75,4,2,285,24,0
40,77,488,360,30,5,498,99,0
6,5,41,24,1,0,303,16,0
24,18,172,100,11,2,422,52,0
13,16,40,33,5,0,136,9,0
32,68,320,253,12,7,437,60,0
10,11,36,24,3,0,158,13,0
14,29,52,42,2,0,123,9,0
15,43,91,72,3,0,355,21,0
6,5,41,24,1,0,303,16,0
28,131,440,365,30,0,447,79,0
16,21,88,50,6,0,396,29,0
22,37,115,95,8,0,147,21,0
34,25,306,183,2,0,1170,96,0
17,65,126,113,5,0,182,21,0
12,13,54,34,6,0,336,20,0
23,96,189,163,7,0,284,31,0
18,50,118,103,5,0,195,21,0
13,20,76,53,3,0,300,23,0
21,80,159,138,5,0,297,28,0
4,9,15,12,1,0,128,6,0
6,15,111,72,13,0,248,38,0
4,9,27,24,1,0,174,10,0
31,186,860,766,25,0,1072,129,0
26,104,328,268,18,0,403,65,0
5,2,10,4,1,0,127,5,0
4,3,14,4,1,0,140,7,0
19,51,237,201,4,0,344,38,0
35,138,800,590,35,0,1656,149,0
21,91,186,165,5,0,337,30,0
27,103,356,307,37,0,445,75,0
18,95,238,217,18,0,343,42,0
24,95,357,275,38,1,443,70,0
30,99,634,510,10,0,866,84,0
4,11,62,45,1,0,404,24,0
8,12,155,90,7,0,556,59,0
7,5,16,8,2,0,170,7,0
17,65,125,112,5,0,208,21,0
20,81,221,184,15,2,427,46,1
21,59,425,356,9,0,427,80,1
8,8,33,14,1,0,216,13,1
17,71,345,283,19,0,812,87,1
10,13,44,30,2,0,194,14,1
13,23,150,81,1,1,675,48,1
33,66,221,166,14,4,531,46,1
18,30,197,158,15,1,362,46,1
18,17,95,57,6,1,329,27,1
19,40,107,75,8,0,276,27,1
1,1,4,1,1,0,23,2,1
25,49,146,102,13,0,529,45,1
4,2,6,2,1,0,91,3,1
4,2,6,2,1,0,96,3,1
6,11,66,42,8,0,177,23,1
23,86,173,151,7,0,293,31,1
21,123,595,437,85,0,719,125,1
32,145,831,733,22,1,1023,150,1
28,173,632,581,86,0,763,135,2
37,188,1454,1300,71,0,1189,244,2
34,135,1116,889,52,4,2108,246,2
43,293,1413,1269,56,7,1425,217,2
32,111,747,544,69,0,1103,199,2
37,111,446,342,32,1,831,116,2
42,142,793,635,53,1,1095,169,3
25,39,349,278,16,0,674,67,3
6,6,31,12,4,0,165,13,3
30,149,627,469,28,0,1152,115,4
35,149,675,527,34,0,894,140,4
32,166,819,704,69,4,1302,149,4
40,179,1477,1285,50,1,1775,248,4
21,87,240,218,7,0,296,32,5
42,276,3205,2852,95,25,3302,339,5
48,172,821,622,70,12,874,174,5
31,89,431,339,26,6,503,86,6
31,408,2553,2281,31,0,2580,315,6
52,313,1170,1038,47,1,1171,211,7
88,450,2189,1703,127,23,2927,450,8
29,141,363,307,19,1,727,80,9
44,241,1239,1044,69,2,1448,225,10
36,297,1234,1106,31,0,1066,137,12
49,375,1645,1496,57,21,1404,242,12
52,208,1134,814,58,2,1540,204,15
55,363,2378,1933,109,20,2680,437,19
53,430,3063,2668,108,23,2954,429,25
72,737,5163,4603,169,25,4530,665,42
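The assignment's protocol is to estimate each model on fit.arff with 10-fold cross-validation and then score a model trained on all of fit.arff against test.arff. A minimal sketch of that protocol, assuming plain least squares with an intercept solved through the normal equations, is shown here. It is not WEKA's `LinearRegression`: the tiny ridge term is an assumption added only to keep the solve stable with collinear metrics, no attribute selection is performed, and the shuffled fold assignment will not match WEKA's, so the numbers will not reproduce WEKA's output exactly. All function names are hypothetical.

```python
import random


def fit_linear_regression(X, y, ridge=1e-8):
    """Least squares with an intercept via the normal equations.

    Returns [b0, b1, ..., bk]. The small ridge on the slopes is an assumed
    stabiliser for nearly collinear predictors, not a WEKA setting.
    """
    k = len(X[0])
    rows = [[1.0] + list(r) for r in X]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    for i in range(1, k + 1):
        A[i][i] += ridge
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k + 1)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    n = k + 1
    M = [A[i] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def predict(coefs, X):
    """Apply intercept + slopes to each row of X."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in X]


def cross_validated_predictions(X, y, folds=10, seed=1):
    """Out-of-fold predictions: each row is predicted by a model that never saw it."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    preds = [0.0] * len(y)
    for f in range(folds):
        test_idx = order[f::folds]  # every `folds`-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        coefs = fit_linear_regression([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, p in zip(test_idx, predict(coefs, [X[i] for i in test_idx])):
            preds[i] = p
    return preds


# Usage (hypothetical variable names):
#   cv_preds = cross_validated_predictions(X_fit, y_fit)        # quality of fit
#   test_preds = predict(fit_linear_regression(X_fit, y_fit), X_test)  # predictive quality
```

Scoring `cv_preds` against the fit targets and `test_preds` against the test targets with the five indicators listed in the assignment gives the two sets of results the comparison calls for.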
References & Resources
- References:
- A number of research papers related to the coursework are available on the References page. Students are encouraged to study these papers to follow the material being covered now or later in the course.
- WEKA Tool:
- WEKA, developed by researchers at the University of Waikato, New Zealand, stands for Waikato Environment for Knowledge Analysis.
- Weka is open-source software released under the GNU General Public License and can be downloaded free of charge from http://www.cs.waikato.ac.nz/ml/weka/index.html.
- For your projects, download the file weka-3-2-3jre.exe (11,305,621 bytes, including the Java Runtime Environment), which can be found under the "Note for Windows users" heading on the page mentioned above.
- You can also refer to the Weka Tutorial if you need help with the tool.
- Datasets:
- Weka requires datasets in ".arff" format. There are two datasets: the Fit dataset (FIT.arff), used to build the models, and the Test dataset, used to evaluate their performance on fresh data. Both datasets come from a project labeled CCCS. You can download them using the following links:
- Fit dataset
- Test dataset
- The metrics of the CCCS data set are listed below:
- Number of unique operators (NUMUORS)
- Number of unique operands (NUMUANDS)
- Total number of operators (TOTOTORS)
- Total number of operands (TOTOPANDS)
- McCabe's cyclomatic complexity (VG)
- Number of logical operators (NLOGIC)
- Lines of code (LOC)
- Executable lines of code (ELOC)
- Number of faults (FAULTS)
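The first eight metrics listed above are the candidate predictors for FAULTS, and the attribute selection options in the assignment decide which of them a model keeps. WEKA's documentation describes its Greedy option as a greedy selection driven by the Akaike information criterion, and its M5 option as stepping through the attributes, removing the one with the smallest standardized coefficient until the Akaike estimate stops improving. The sketch below is a generic greedy backward elimination using the textbook Gaussian AIC, n ln(SSE/n) + 2p, where p counts the fitted coefficients. WEKA's internal criterion and tie-breaking may differ, so treat this as an illustration of the idea rather than a reproduction of WEKA's choices; the helper names are hypothetical.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is non-singular (an exactly singular system raises ZeroDivisionError).
    """
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def sse_for(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on columns `cols`."""
    rows = [[1.0] + [r[c] for c in cols] for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = _solve(A, b)
    return sum((t - sum(bi * v for bi, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def aic(X, y, cols):
    """Textbook Gaussian AIC: n*ln(SSE/n) + 2*(number of coefficients)."""
    n = len(y)
    sse = max(sse_for(X, y, cols), 1e-12)  # guard against log(0) on a perfect fit
    return n * math.log(sse / n) + 2 * (len(cols) + 1)


def greedy_backward_elimination(X, y, names):
    """Repeatedly drop the single predictor whose removal lowers the AIC most."""
    selected = list(range(len(names)))
    best = aic(X, y, selected)
    while selected:
        trials = [(aic(X, y, [c for c in selected if c != drop]), drop) for drop in selected]
        score, drop = min(trials)
        if score >= best:  # no removal improves the criterion: stop
            break
        best = score
        selected.remove(drop)
    return [names[c] for c in selected]


# Toy data: y depends on x0 only, so x1 should be dropped.
X = [[0, 1], [1, 0], [2, 0], [3, 0], [4, 0], [5, 1]]
y = [1, 4, 4, 6, 10, 11]
print(greedy_backward_elimination(X, y, ["x0", "x1"]))  # ['x0']
```

With the CCCS data, `names` would be the eight predictor names and the returned list answers the assignment's question of how many and which independent variables were selected, under this sketch's criterion.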