Question

Data mining subject
1- Summarize the article.
2- What is the data size?
3- Which records were applied?
4- Which techniques are used?
5- Explain the results.
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS - DATA MINING APPROACH

SANGITA GUPTA, SUMA V.
Jain University, Bangalore; Dayananda Sagar Institute, Bangalore, India

Abstract - One of the essential requisites of any software industry is the development of customer-satisfying products. However, accomplishing this business objective depends upon the depth of quality of the product that is engineered in the organization. Generation of high quality depends upon the process, which in turn depends upon the people. The existing scenario in IT industries demands deploying the right personnel for achieving desirable quality in the product through the existing process. The goal of this paper is to identify the criteria which will be used in industrial practice to select members of a software project team, and to look for relationships between these criteria and project success. Using semi-structured interviews and qualitative methods for data analysis and synthesis, a set of team building criteria was identified from project managers in industry. The findings show that the consistent use of the set of criteria correlated significantly with project success, and the criteria related to human factors present strong correlations with software quality and thereby project success. This knowledge enables decision making for project managers in allocation of the right personnel to realize the desired level [...]

I. INTRODUCTION

The main objective of any software company is to provide quality software to its customers. Even the best software system is bound to fail without the right people working on it. One of the ways to achieve the highest level of quality in a software system is through discovering knowledge for deployment of project personnel to a project by predicting their performance. The knowledge is hidden among the data set and is extractable through data mining techniques. The present paper is designed to justify the capabilities of data mining techniques in the context of software success by offering a data mining model for software companies to select the right personnel for their project. In this research, the classification method is used to evaluate project members' performance. By this task we extract knowledge that describes project members' performance in the current project. It helps earlier identification of parameters related to the human component, resulting in better software quality and thereby project success.

Software Engineering is a discipline that aims at producing high quality software through a systematic, well-disciplined approach to software development. It involves methods, tools, best practices and standards to achieve its objective [5]. However, software engineering is not only about tools and methods but also about the human aspect involved in working on it. Even the best software system cannot be developed without the correct team members. Therefore the human aspect in software engineering, which is an important basis for software quality, needs more understanding and deeper investigation. To achieve high quality software, it is essential to extract knowledge from the large dataset related to project members.

The main purpose of a knowledge infrastructure for project management is to provide information from past experience of the organization to improve the execution of new projects. To achieve this objective, the knowledge infrastructure has to compile and organize empirical data which is present in the systems and is available for use by project managers [6]. Consequently, the key elements of building a knowledge infrastructure are collecting and organizing the knowledge, making it available through models, and reusing it to improve the execution of projects. In software development the main components can be broadly classified as the human aspect (people) and processes. Though processes have been well organized and developed, the human aspect is still at preliminary stages of study. Without deep consideration of the human aspect of software engineering, even the best of processes will not give the desired quality. To accomplish the above objective of software quality, organizations are now looking deeply into human aspects using various techniques, some of them non-parametric and some parametric data mining methods. Data mining techniques in software engineering have proved to be important tools for decision making by management. The data collected require a proper method of extracting knowledge from large repositories for better decision making.

Knowledge discovery in databases (KDD), often called data mining, aims at the discovery of useful information from large collections of data. The main functions of data mining are applying various methods and algorithms in order to discover and extract patterns of stored data [2]. Data mining and knowledge discovery applications have got a rich focus due to their significance in decision making, and have become an essential component in various organizations. Data mining techniques have been introduced into new fields of Statistics, Databases, Machine Learning and Pattern Recognition. There are increasing research interests in using data mining in every aspect of technology. Data mining is concerned with developing methods that discover knowledge from data originating from empirical environments [2]. Data mining uses many techniques such as Decision Trees, Neural Networks, Naive Bayes, K-Nearest Neighbour, and many others. Using these techniques many kinds of knowledge can be discovered, such as association rules, classifications and clustering. The discovered knowledge can be used for prediction in diverse applications [2]. Section II has more references on applications.

The main objective of this paper is to use data mining methodologies to predict project members' performance for the particular project. Data mining provides many tasks that could be used to study the project member's performance. In this research, the classification task is used to evaluate project members' performance. There are many approaches that are used for data classification; the decision tree method is used here. Information like college percentile, experience, domain knowledge assessment, communication skills, reasoning skills, time efficiency etc. was collected from the project management system for prediction of performance for that project.

Organization of the paper is as follows: Section II specifies the related work in the domains of data mining. Section III provides the research methodology followed during this investigation. Section IV presents the research work and technique development details. Section V indicates the results obtained by the classification technique for effective project management. Section VI summarizes and concludes the paper.

II. BACKGROUND AND RELATED WORK

[...] life. The increasing demand for software has led to the progress of continual research in the areas of quality assurance and effective project management [6]. Data mining and pattern recognition techniques have proven to be among the established techniques for effective project management. Data mining has been used for many aspects of software projects like defect management, test analysis, code optimization etc. [4]. Authors in [8] have used data mining for Bug Reports Classification using Text Data Mining. Data mining has also been used by authors in [9] for other domains like educational databases. The authors in [7] have developed a data mining framework based on decision tree and association rules to generate useful rules for personnel selection and retention based on several attributes of employees for high technology industry.

Data mining, also popularly known as Knowledge Discovery in Database, refers to extracting or "mining" knowledge from large amounts of data. The sequence of steps identified in extracting knowledge from data is shown in Figure 1.

Figure 1 - Research Methodology

Various algorithms and techniques like Classification, Clustering, Regression, Artificial Intelligence, Neural Networks, Association Rules, Decision Trees, Genetic Algorithm, Nearest Neighbour method etc. are used for the data mining process [2]. Our techniques and methods in data mining need a brief mention for better understanding.

Classification is the most commonly applied data mining technique, which employs a set of pre-classified examples to develop a model that can classify the population of records at large. This approach frequently employs decision tree or neural network-based classification algorithms. The classifier-training algorithm uses these pre-classified examples to determine the set of parameters required for proper discrimination. The algorithm then encodes these parameters into a model called a classifier. The authors will be using decision trees for their research work.

[...] represents a choice between a number of alternatives, [...] take actions. From this node, users split each node recursively according to the decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible scenario of decision and its outcome. Decision trees are tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). The authors in [1] have done a comparative study of the methods. The authors in [11] have developed many decision tree algorithms like ID3 and C5. The authors in [3] have investigated an incremental method for finding the next node of the decision tree.

The Decision Trees algorithm is a classification algorithm used for predictive modeling of multivariate attributes. For discrete attributes, the algorithm makes predictions based on the existing relationships between input columns in a dataset. It uses the values, known as states, of those columns to predict the states of a column that you designate as predictable. Specifically, the algorithm identifies the input columns that are correlated with the predictable column. The authors in [10] have used a Knowledge based Decision Trees algorithm which uses feature selection to guide the selection of the most useful attributes. In this study we have developed an algorithm to find the attributes using an incremental method according to their mapping with performance. Thereafter a decision tree was constructed based on the derived knowledge.

III. RESEARCH METHODOLOGY

This research focuses upon the selection of project personnel using a classification technique for effective project management, thereby resulting in good software quality. In order to achieve this objective, a deep investigation is carried out upon similar non-critical projects from software industries to get the parameters for selection of project personnel.

Data preparation - The training data set used in this study was obtained from software companies of Bangalore. Initially the size of the data is 40. In this step data stored in different tables was joined in a single table; after the joining process, errors were removed.

Data selection and transformation - In this step only those fields were selected which were required for data mining. A few derived variables were selected, while some of the information for the variables was extracted from the database. All the predictor and response variables which were derived from the database are given in Table 1 for reference.

The huge data collected was thereby sampled and analyzed. This work is therefore directed towards formulation of a hypothesis for selection criteria of project personnel, which has further impact on the software quality of software projects. Modes of data collection include interactions with the project developing team and human resource management. Empirical data analysis includes application of decision tree techniques to predict the efficiency of each project member for further deployment. The observational results indicate that though most software companies lay a lot of emphasis on the general percentile aggregate, other factors like domain specific knowledge and reasoning skills play a significant role for best performance in the organization. The most important factor which was analyzed was programming skills. Hereby the selection criteria have to be reframed, giving more weightage to aspects like programming skills, depth in domain knowledge and reasoning skills rather than aggregate percentile.

Data mining and pattern recognition is gaining popularity because of its potential to enhance our understanding by identifying, extracting and evaluating variables related to any process. By means of this classification method on multivariate attributes, it was found that factors like project members' programming skills, reasoning, knowledge assessment and other attributes were highly correlated with the project member's performance, rather than GPA.

IV. RESEARCH WORK

PROPOSED TECHNIQUE

In our research work, a data mining technique is used which is based on the Attribute Selection Measure function (heuristic) on the C4.5 algorithm [10]. The drawback of the C4.5 heuristic function (gain ratio) is that, if the split information approaches zero, the ratio becomes unstable. In the proposed technique for split criteria we have considered the maximum occurrences of each attribute value, then calculated the average maximum occurrences of the combination of each category attribute; thus split information never reaches zero, which gives more importance to realistic attributes and accurate results. The algorithm is as follows.

Step 1. Let D, the data partition, be a training set of class-labelled tuples. Suppose the class label attribute has m distinct values defining m distinct classes, Ci (for i = 1, 2, ..., m). Let Ci,D be the set of tuples of class Ci in D [...] tuples in D and Ci,D respectively. Suppose attribute A on partition D has distinct values a1, a2, ..., av, as observed from the training data.

Step 2. Calculate the Attribute Selection Measurement Function (ASMF) for that attribute. The steps for calculating this function are as follows:
2.1 Number of occurrences of each attribute value.
2.2 Occurrences of each category attribute.

Step 3. Compute the average maximum occurrence for each attribute, which denotes the ASMF = ai/Ci,D.
3.1 Maximum occurrence of combination of each category and repeat Step 2 [...] is maximum.

Step 4. On the basis of the sorted values of ASMF, divide the given training set into subsets and move to another level of the tree.

Step 5. Repeat the same steps on each new subset iteratively and derive a decision tree.

V. EMPIRICAL DATA ANALYSIS USING DECISION TREE ALGORITHM

Data was collected from a software company in Bangalore. It was preprocessed and prepared for analysis. It was subjected to the data mining technique and the [...] related to the hypothesis. Data preparation is shown in Table 1 (project personnel's attributes selected for analysis) and Table 2 (data with values of the attributes mentioned in Table 1).

TABLE I. Software project personnel related [...]

The domain values for some of the variables were defined for the present investigation as follows. All attribute marks are normalized out of 10.
GPA - Previous institution marks.
DKA - Domain Knowledge Assessment. Performance in the domain knowledge assessment of the company.
PS - Programming skills. Results obtained by taking an internal assessment on programming concepts.
CS - Communication skills. Results obtained by seminar presentation of the employee. Seminar performance is evaluated into four classes: Poor - presentation and communication skill is low; Average - either presentation is fine or communication skill is average; Good - both presentation and communication skill are good.
RS - Reasoning skills. Reasoning skills performance.
GP - General Proficiency performance. Overall performance from the previous project.
TE - Time efficiency of the employee.
P - Performance.

The results obtained by the algorithm can be put in a matrix. A confusion matrix is a table that shows the results of the classification experiment. (ai/Ci,D) is calculated by dividing the number of occurrences of ai in Ci,D. The best attribute is such that good maps to good performance. Therefore the ideal matrix should be as [...]

The ASMF = 3 ideally, when all good will perform good, all average will perform average and all poor will perform poor. The confusion matrix for GPA and PS is shown below. Table III denotes the ASMF score of all attributes.

Therefore ASMF(GPA) = 3/8 + 6/18 + 5/15 = 1.07. Similarly for PS. Computing for all attributes we get the following:

TABLE III - ASMF SCORE FOR ALL ATTRIBUTES

Based on this computation (Table III) we can derive a decision tree with PS, which has the highest ASMF, as the root node and the other attributes further down in their order. One classification rule can be generated for each path from each terminal node to the root node. A pruning technique was executed by removing nodes with less than the desired number of objects, and after the pruning process we have the following rules:

RULE 1: If (PS = "GOOD") and (GPA = "GOOD" or "AVERAGE") and (RS = "GOOD" or "AVERAGE") and (DKA = "GOOD" or "AVERAGE") and (CS = "GOOD" or "AVERAGE") then P = GOOD.
RULE 2: If (PS = "AVERAGE") and (GPA = "AVERAGE" or "GOOD") and (RS = "GOOD" or "AVERAGE") and [...] P = GOOD.
RULE 3: If (PS = "GOOD") and (GPA = "AVERAGE" or "POOR") and (RS = "AVERAGE") and (DKA = "AVERAGE") and (CS = "GOOD" or "AVERAGE") then P = AVERAGE.
RULE 4: If [...] "AVERAGE") and (DKA = "AVERAGE") and [...]
RULE 5: If (PS = "POOR") then P = POOR.

The implementation of the above study was done in TreePlan software, also called DTREG, and the results of classification by the author and the tool were matching. Table IV shows the results obtained from the software tool.

TABLE IV - IMPLEMENTATION RESULT

Finished the analysis at 20-Jun-2013. Analysis run time: 00:00.17

The data set of 40 employees used in this study, which was obtained from a software company in Bangalore, was the basis for our classification technique. The results and rules obtained can classify project members into three classes of performance: good (should be deployed), average (can be deployed with training) and poor (should not be deployed).

Further scope of this research work is using other tree classification techniques and doing a comparative study. We can experiment on different sets of attributes and find the most promising selection of criteria.

VI. CONCLUSION

In this paper, the classification task is used on the project members' database to predict the project member's performance. There are many approaches that are used for data classification; the decision tree method is used here. The resulting decision tree provides a representation of the concept that appeals to humans because it renders the classification process self-evident. These parameters were collected from the employer's current project. It was noted that though the GPA [...] skills, domain knowledge and reasoning skills played [...]
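
The ASMF scoring described in Steps 1 to 4 of Section IV can be illustrated with a minimal Python sketch. It rests on one plausible reading of the transcribed text, which is an assumption: for each distinct value of an attribute, take the count of its most frequent performance class divided by the number of records carrying that value, then sum over the values, so that a perfectly predictive three-valued attribute scores 3. The function names `asmf` and `rank_attributes` and the six toy records are hypothetical; they are not taken from the paper and do not reproduce Table III.

```python
from collections import Counter, defaultdict


def asmf(values, labels):
    """ASMF under the assumed reading: for each distinct attribute value,
    the share of its records that fall in its most frequent class,
    summed over all distinct values."""
    per_value = defaultdict(Counter)
    for value, label in zip(values, labels):
        per_value[value][label] += 1
    return sum(max(c.values()) / sum(c.values()) for c in per_value.values())


def rank_attributes(rows, attributes, target):
    """Score every candidate attribute and sort by ASMF, highest first.
    The top entry would become the root node (Step 4)."""
    labels = [row[target] for row in rows]
    scores = {a: asmf([row[a] for row in rows], labels) for a in attributes}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records (NOT the paper's 40-employee data set).
rows = [
    {"PS": "GOOD", "GPA": "GOOD", "P": "GOOD"},
    {"PS": "GOOD", "GPA": "AVERAGE", "P": "GOOD"},
    {"PS": "AVERAGE", "GPA": "GOOD", "P": "AVERAGE"},
    {"PS": "AVERAGE", "GPA": "POOR", "P": "AVERAGE"},
    {"PS": "POOR", "GPA": "AVERAGE", "P": "POOR"},
    {"PS": "POOR", "GPA": "GOOD", "P": "POOR"},
]

ranking = rank_attributes(rows, ["PS", "GPA"], "P")
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy data PS predicts P perfectly and reaches the ideal score of 3, while GPA scores lower, which mirrors the paper's observation that programming skills rank above GPA. Repeating `rank_attributes` on each subset produced by the chosen attribute would correspond to Step 5.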
