Question

1. Use the ML Practitioner Assessment project to answer this question. Which two of the following statements about the "grade" column in the schools_data dataset are true? (only one)
a) The median is higher than the mean.
b) It has a minimum value of 10 and a maximum value of 21.
c) It has a mean of 10.56 and a standard deviation of 3.06 (both values rounded down).
d) It has 21 distinct values.

2. What kind of columns can be used as inputs to the PCA card? (only one)
a) Only numerical variables.
b) Both numerical and categorical variables.
c) Any kind of variables (numerical, categorical, text, dates, vectors...).
d) Only categorical variables.

3. To use Spearman's correlation in the correlation matrix card, what are some necessary steps to first perform on ordinal variables in your data? (only one)
a) Map the categories to numbers, and treat the variables as numerical variables.
b) Instead of Spearman's, you can compute Pearson's correlation matrix because it handles the ordinal variables as well.
c) Drop them from your dataset, as DSS cannot compute Spearman's correlation matrix when you have ordinal variables.
d) No special steps are ever required.

4. You have a dataset about bank customers and credit card defaults, and you want to build an ML model that will help you predict whether a new customer will default or not. What kind of ML task should you choose for this? (only one)
a) Regression
b) Clustering
c) Two-class classification
d) Multiclass classification

5. Use the ML Practitioner Assessment project to answer this question. Classification models establish a certain probability as a threshold. In the Evaluate recipe, if the model assigns a probability to a record that is below the threshold, then what will the prediction for that record be? (only one)
a) Unable to be determined
b) 1 (grade repeated)
c) Missing value
d) 0 (grade not repeated)

6. Which two of the following statements are true about the PCA card heatmap? (only one)
a) A blue color means the column has a negative relationship with the component.
b) The darker a cell color, the weaker the relationship between the column and the principal component.
c) A blue color means the column has a positive relationship with the component.
d) The darker a cell color, the stronger the relationship between the column and the principal component.

7. Which three of the following actions can you perform in the Design tab of the Visual ML tool? (more than one)
a) Process your features (cleaning, feature generation, etc.).
b) Choose how you want to split the data into the training set and testing set.
c) Visualize the performance of your previous sessions to update your model.
d) Customize the evaluation metric you want to optimize.

8. Which of the following statements regarding adjustment methods is false? (only one)
a) Adjustment methods will adjust the value of the observed p-value, and compare it to the pre-defined significance level.
b) Adjustment methods are used for statistical tests that test multiple hypotheses at the same time.
c) Using adjustment methods can decrease the probability of making Type I errors (incorrectly rejecting the null hypothesis).
d) Adjustment methods adjust the distribution of the population so that it resembles the normal distribution.

9. Which of the following is not a possible way to categorize statistical tests? (only one)
a) Parametric tests vs. non-parametric tests.
b) Location tests vs. distribution tests.
c) 1-sample tests vs. 2-sample tests.
d) Descriptive tests vs. inferential tests.

10. When using a bivariate analysis card in Dataiku, which one of the following statements is true? (only one)
a) You can choose one or more columns as factors and choose one response column in order to see how the response varies across values of each factor.
b) Bivariate analysis is not an option in the Statistics tab of a dataset.
c) You can choose one column as a factor and one or more response columns in order to see how each response varies across values of the factor.
d) DSS automatically selects all the possible pairs of factors and responses and runs the analysis.

11. Use the ML Practitioner Assessment project to answer this question. Looking at the validation_scored dataset, the model achieves the highest correct prediction rate for which school? (only one)
a) RC
b) GP
c) LT
d) MS

12. Which of the following statements regarding p-values is false? (only one)
a) If a p-value is smaller than the significance level (alpha), this is evidence that the null hypothesis should be rejected.
b) The p-value is the probability of observing a test statistic at least as extreme as the one computed from the sample, given that the null hypothesis is true.
c) The p-value is the probability that the null hypothesis is true.
d) The smaller the p-value, the more confidence you can have in rejecting the null hypothesis H0.

13. Which three of the following things can you do with a model that has been deployed in the Flow? (more than one)
a) Modify the model, for instance by doing some further feature engineering, in order to improve it.
b) Use the model to score an unlabeled dataset.
c) Evaluate the model against a labeled test dataset.
d) Package and deploy the model as an API to make real-time predictions.

14. Use the ML Practitioner Assessment project to answer this question. Looking at the variable importance charts for the random forest model, which of the following features is the most important to the model? (only one)
a) schoolsup
b) school is RC
c) grade
d) traveltime

15. What feature of Dataiku can help detect differences in group-level performance? For example, for a model that predicts student outcomes, you want to analyze the difference in model performance between girls and boys. (only one)
a) Individual explanations
b) Partial Dependence
c) Subpopulation Analysis
d) Variable importance

16. When using a model to score a dataset, which three of the following options can be computed for the scored dataset? (more than one)
a) The output prediction value.
b) The individual explanations (i.e., the most important features) for each prediction.
c) The ML model and hyperparameters that have been used for this prediction.
d) The probability associated with each class, for a classification task.

17. Use the ML Practitioner Assessment project to answer this question. Which school in the schools_data dataset has the highest median grade? (only one)
a) MS
b) RC
c) LT
d) GP

18. Which two of these statements are true regarding the active version of the model? (more than one)
a) You cannot retrain a model that has been activated.
b) The active version of the model is always the most recently deployed model.
c) You can deploy several versions of a model, then roll back and activate a previous version of the model.
d) The active version of the model is the version used when running the Retrain, Score or Evaluate recipes.

19. Use the ML Practitioner Assessment project to answer this question. In the confusion matrix for the Random forest model, if you increase the cut-off threshold, which of the following measures strictly increases?
a) Recall
b) Precision

20. Use the ML Practitioner Assessment project to answer this question. Which one of the following statements about the median grade, by number of past failures, in the schools_data dataset is true? (only one)
a) It does not seem to have a trend here.
b) It seems to have a trend: the fewer failures a student had in the past, the lower the median grade.
c) It seems to have a trend: the fewer failures a student had in the past, the higher the median grade.
d) The relationship has a U-shape: students with a moderate number of failures had a higher median grade, while students with very few or very many past failures had a lower median grade.

21. Use the ML Practitioner Assessment project to answer this question. Let's say a subject matter expert has analyzed a chart showing the proportion of students who repeat a grade versus those who don't, restricted to students whose travel time is 2 or more hours and whose number of absences is at least 8. The subject matter expert has asserted that the proportion of these students the model will predict as "repeated" will likely be more than 20%.
a) False. When analyzing schools_data, the percentage of repeated=1 is less than 20% of the records when traveltime is 2 or more hours and absences are equal to 8 or more.
b) True. When analyzing schools_data, the percentage of repeated=1 is greater than 20% of the records when traveltime is 2 or more hours and absences are equal to 8 or more.

22. After building a machine learning model, Dataiku generates a number of visualizations and statistical summaries to help you understand the model. Which of the following charts will be found in the model report for both classification and clustering tasks? (only one)
a) Variable importance
b) ROC curve
c) Confusion matrix
d) Cluster profiles

23. Use the ML Practitioner Assessment project to answer this question. Looking at the partial dependence plots for the studytime feature on the logistic regression model, which one of the following statements is true? (only one correct)
a) It seems to have a trend: the more time a student studies, the lower the probability of repeating a grade.
b) It seems to have a trend: the less time a student studies, the lower the probability of repeating a grade.
c) It does not seem to have a trend here.
d) The relationship has a U-shape: students who spend a moderate amount of time studying had a lower probability of repeating a grade, while students with very few or very many hours studying had a higher probability of repeating a grade.

24. Use the ML Practitioner Assessment project to answer this question. In the deployed model, which two of the following features contributed the least to the prediction? (more than one option)
a) absences
b) Walc
c) failures
d) Medu

25. What kind of insights does the Individual explanations tool provide? (only one option)
a) Statistics about the predictions, such as average and standard deviation.
b) How the predictions vary depending on a specific feature.
c) Which features were the most important in the prediction for a specific record.
d) The number of records that were correctly predicted.
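Questions 5 and 19 both depend on how a classification cut-off turns predicted probabilities into 0/1 labels, and how the resulting confusion matrix feeds precision and recall. The following is a minimal Python sketch of that general idea, not Dataiku's implementation. It assumes binary 0/1 labels and that a record is labeled 1 when its probability is at or above the threshold (the exact tie-breaking rule in DSS is not stated here). The function names and the toy data are hypothetical.

```python
from typing import List, Tuple


def predict_labels(probabilities: List[float], threshold: float) -> List[int]:
    """Turn positive-class probabilities into 0/1 labels.

    Assumption (not taken from DSS): a record is labeled 1 when its
    probability is at or above the threshold, and 0 otherwise.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Compute precision and recall from the confusion-matrix counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    # Return 0.0 instead of dividing by zero when a denominator is empty
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the assessment project
    y_true = [1, 0, 1, 1, 0, 0]
    probas = [0.9, 0.6, 0.4, 0.7, 0.2, 0.55]
    for threshold in (0.3, 0.5, 0.8):
        y_pred = predict_labels(probas, threshold)
        p, r = precision_recall(y_true, y_pred)
        print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

Running the loop at several thresholds shows how both measures move as the cut-off changes. The toy values are invented and say nothing about the schools_data results.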

