Question
1. Use the ML Practitioner Assessment project to answer this question. Which two of the following statements about the "grade" column in the schoolsdata dataset are true?
   (One answer.)
   - The median is higher than the mean.
   - It has a minimum value of ___ and a maximum value of ___.
   - It has a mean of ___ and a standard deviation of ___ (both values rounded down).
   - It has ___ distinct values.

2. What kind of columns can be used as inputs to the PCA card?
   (One answer.)
   - Only numerical variables.
   - Both numerical and categorical variables.
   - Any kind of variable (numerical, categorical, text, dates, vectors...).
   - Only categorical variables.

3. To use Spearman's correlation in the correlation matrix card, what steps must you first perform on the ordinal variables in your data?
   (One answer.)
   - Map the categories to numbers, and treat the variables as numerical variables.
   - Instead of Spearman's, compute Pearson's correlation matrix, because it handles ordinal variables as well.
   - Drop them from your dataset, as DSS cannot compute Spearman's correlation matrix when you have ordinal variables.
   - No special steps are ever required.

4. You have a dataset about bank customers and credit card defaults, and you want to build an ML model that will help you predict whether a new customer will default. What kind of ML task should you choose for this?
   (One answer.)
   - Regression
   - Clustering
   - Two-class classification
   - Multiclass classification

5. Use the ML Practitioner Assessment project to answer this question. Classification models use a certain probability as a threshold. In the Evaluate recipe, if the model assigns a record a probability below the threshold, what will the prediction for that record be?
   (One answer.)
   - Unable to be determined
   - grade repeated
   - Missing value
   - grade not repeated

6. Which two of the following statements are true about the PCA card heatmap?
   (One answer.)
   - A blue color means the column has a negative relationship with the component.
   - The darker a cell's color, the weaker the relationship between the column and the principal component.
   - A blue color means the column has a positive relationship with the component.
   - The darker a cell's color, the stronger the relationship between the column and the principal component.

7. Which three of the following actions can you perform in the Design tab of the Visual ML tool?
   (More than one answer.)
   - Process your features (cleaning, feature generation, etc.).
   - Choose how you want to split the data into the training set and the testing set.
   - Visualize the performance of your previous sessions to update your model.
   - Customize the evaluation metric you want to optimize.

8. Which of the following statements regarding adjustment methods is false?
   (One answer.)
   - Adjustment methods adjust the value of the observed p-value and compare it to the predefined significance level.
   - Adjustment methods are used for statistical tests that test multiple hypotheses at the same time.
   - Using adjustment methods can decrease the probability of making Type I errors (incorrectly rejecting the null hypothesis).
   - Adjustment methods adjust the distribution of the population so that it resembles the normal distribution.

9. Which of the following is not a possible way to categorize statistical tests?
   (One answer.)
   - Parametric tests vs nonparametric tests.
   - Location tests vs distribution tests.
   - ___-sample tests vs ___-sample tests.
   - Descriptive tests vs inferential tests.

10. When using a bivariate analysis card in Dataiku, which one of the following statements is true?
   (One answer.)
   - You can choose one or more columns as factors and one response column, in order to see how the response varies across the values of each factor.
   - Bivariate analysis is not an option in the Statistics tab of a dataset.
   - You can choose one column as a factor and one or more response columns, in order to see how each response varies across the values of the factor.
   - DSS automatically selects all the possible pairs of factors and responses and runs the analysis.

11. Use the ML Practitioner Assessment project to answer this question. Looking at the validationscored dataset, for which school does the model achieve the highest rate of correct predictions?
   (One answer.)
   - RC
   - GP
   - LT
   - MS

12. Which of the following statements regarding p-values is false?
   (One answer.)
   - If a p-value is smaller than the significance level alpha, this is evidence that the null hypothesis should be rejected.
   - The p-value is the probability of observing a test statistic at least as extreme as the one computed from the sample, given that the null hypothesis is true.
   - The p-value is the probability that the null hypothesis is true.
   - The smaller the p-value, the more confidence you can have in rejecting the null hypothesis H0.

13. Which three of the following things can you do with a model that has been deployed in the Flow?
   (More than one answer.)
   - Modify the model (by doing some further feature engineering, for instance) in order to improve it.
   - Use the model to score an unlabeled dataset.
   - Evaluate the model against a labeled test dataset.
   - Package and deploy the model as an API to make real-time predictions.

14. Use the ML Practitioner Assessment project to answer this question. Looking at the variable importance chart for the random forest model, which of the following features is the most important to the model?
   (One answer.)
   - schoolsup
   - school is RC
   - grade
   - traveltime

15. What feature of Dataiku can help detect differences in group-level performance? For example, for a model that predicts student outcomes, you may want to analyze the difference in model performance between girls and boys.
   (One answer.)
   - Individual explanations
   - Partial Dependence
   - Subpopulation Analysis
   - Variable importance

16. When using a model to score a dataset, which three of the following can be computed for the scored dataset?
   (More than one answer.)
   - The output prediction value.
   - The individual explanations, i.e. the most important features for each prediction.
   - The ML model and hyperparameters that were used for this prediction.
   - The probability associated with each class, for a classification task.

17. Use the ML Practitioner Assessment project to answer this question. Which school in the schoolsdata dataset has the highest median grade?
   (One answer.)
   - MS
   - RC
   - LT
   - GP

18. Which two of these statements are true regarding the active version of the model?
   (More than one answer.)
   - You cannot retrain a model that has been activated.
   - The active version of the model is always the most recently deployed model.
   - You can deploy several versions of a model, then roll back and activate a previous version.
   - The active version of the model is the version used when running the Retrain, Score, or Evaluate recipes.

19. Use the ML Practitioner Assessment project to answer this question. In the confusion matrix for the random forest model, if you increase the cutoff threshold, which of the following measures strictly increases?
   - Recall
   - Precision

20. Use the ML Practitioner Assessment project to answer this question. Which one of the following statements about the median grade, by number of past failures, in the schoolsdata dataset is true?
   (One answer.)
   - There does not seem to be a trend.
   - There seems to be a trend: the fewer failures a student had in the past, the lower the median grade.
   - There seems to be a trend: the fewer failures a student had in the past, the higher the median grade.
   - The relationship is U-shaped: students with a moderate number of failures had a higher median grade, while students with very few or very many past failures had a lower median grade.

21. Use the ML Practitioner Assessment project to answer this question. Say a subject matter expert has analyzed a chart showing the proportion of students who repeat a grade versus those who don't, considering the following factors: a travel time of ___ or more hours and at least ___ absences. The subject matter expert asserts that the proportion of these students that the model will predict as "repeated" will likely be more than ___.
   - False. When analyzing schoolsdata, the percentage of "repeated" is less than ___ of the records when traveltime is ___ or more hours and absences are ___ or more.
   - True. When analyzing schoolsdata, the percentage of "repeated" is greater than ___ of the records when traveltime is ___ or more hours and absences are ___ or more.

22. After building a machine learning model, Dataiku generates a number of visualizations and statistical summaries to help you understand the model. Which of the following charts will be found in the model report for both classification and clustering tasks?
   (One answer.)
   - Variables importance
   - ROC Curve
   - Confusion matrix
   - Cluster profiles

23. Use the ML Practitioner Assessment project to answer this question.
    Looking at the partial dependence plots for the studytime feature on the logistic regression model, which one of the following statements is true?
   (One answer.)
   - There seems to be a trend: the more time a student studies, the lower the probability of repeating a grade.
   - There seems to be a trend: the less time a student studies, the lower the probability of repeating a grade.
   - There does not seem to be a trend.
   - The relationship is U-shaped: students who spend a moderate amount of time studying had a lower probability of repeating a grade, while students who spend very few or very many hours studying had a higher probability of repeating a grade.

24. Use the ML Practitioner Assessment project to answer this question. In the deployed model, which two of the following features contributed the least to the prediction?
   (More than one answer.)
   - absences
   - Walc
   - failures
   - Medu

25. What kind of insights does the Individual explanations tool provide?
   (One answer.)
   - Statistics about the predictions, such as average and standard deviation.
   - How the predictions vary depending on a specific feature.
   - Which features were the most important in the prediction for a specific record.
   - The number of records that were correctly predicted.
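The questions above about the threshold in the Evaluate recipe and about the cutoff in the confusion matrix both rest on the same mechanism: a classifier outputs a probability, and a cutoff turns that probability into a class. The following is a minimal, library-free Python sketch of that mechanism, assuming binary 0/1 labels. It is not Dataiku's implementation; the function names `predict` and `precision_recall` and the toy data are hypothetical.

```python
"""Sketch: turning class probabilities into predictions with a cutoff.

Assumption: generic illustration only, not Dataiku's internal code.
Labels are binary (1 = positive class, 0 = negative class).
"""
from typing import List, Tuple


def predict(probas: List[float], threshold: float) -> List[int]:
    # A record gets the positive class only when its probability reaches
    # the threshold; any probability below it yields the negative class.
    return [1 if p >= threshold else 0 for p in probas]


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumed convention for empty denominators: precision 1.0 when nothing
    # is predicted positive, recall 0.0 when there are no actual positives.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy data: true labels and model probabilities.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    probas = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    for threshold in (0.25, 0.5, 0.65, 0.75, 0.85):
        p, r = precision_recall(y_true, predict(probas, threshold))
        print(f"threshold={threshold:.2f} precision={p:.3f} recall={r:.3f}")
```

On this toy data, raising the cutoff from 0.25 to 0.85 gives recall 1.000, 0.750, 0.500, 0.250, 0.250: it can never go up, because the set of predicted positives only shrinks. Precision goes 0.571, 0.600, 0.667, 0.500, 1.000, so here it does not move in one direction.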
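One option in the correlation-matrix question above is to map ordinal categories to numbers before computing Spearman's correlation. To show what that computation amounts to, here is a small pure-Python sketch: ordinal labels are mapped to integer codes, values are converted to ranks (ties get the average rank), and Spearman's rho is computed as Pearson's correlation of those ranks. The mapping `STUDY_LEVELS` and the sample data are hypothetical, and this is not how Dataiku computes the coefficient internally; `scipy.stats.spearmanr` is a standard library routine for the same coefficient.

```python
"""Sketch: Spearman's rank correlation for an ordinal column (hypothetical data)."""
from typing import Dict, List, Sequence

# Hypothetical ordinal scale: only the order of the codes matters.
STUDY_LEVELS: Dict[str, int] = {"low": 1, "medium": 2, "high": 3}


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    study = ["low", "high", "medium", "low"]          # hypothetical ordinal column
    grades = [8, 15, 12, 9]                            # hypothetical numeric column
    codes = [STUDY_LEVELS[level] for level in study]   # map categories to numbers
    print(f"Spearman's rho = {spearman(codes, grades):.4f}")
```

For this toy data the script prints `Spearman's rho = 0.9487` (the square root of 0.9). Because Spearman's rho only uses ranks, any order-preserving choice of codes for the levels gives the same result.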