Quiz on Knowledge Check, created by Mohammed Arif Mazumder on 04/25/2020.

Knowledge Check

Each question in this quiz is timed.

Question 1 of 15

Which of these is the last step of an iteration within the CRISP-DM process?

Select one of the following:

  • Deployment

  • Evaluation

  • Modelling

  • Analysis

Explanation

Question 2 of 15

If a model predicts the number of days a customer takes to come back to a company's website, which metric is the most appropriate for assessing the model's performance?

Select one of the following:

  • Recall

  • Precision

  • Mean absolute error

  • Accuracy

Explanation
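
The target here, a number of days, is a numeric quantity, so this is a regression problem, whereas recall, precision, and accuracy are defined for classification. A minimal sketch of mean absolute error follows; the day counts and the function name are illustrative assumptions, not data from the quiz.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical days until each customer returned, and the model's predictions.
actual_days = [3, 7, 10, 2]
predicted_days = [4, 5, 10, 4]

mae = mean_absolute_error(actual_days, predicted_days)
print(mae)  # 1.25: predictions are off by 1.25 days on average
```

The result is expressed in the same unit as the target (days), which makes it easy to interpret.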

Question 3 of 15

Steps generally followed in a machine learning workflow, in order:

1)
2)
3)
4)
5)

Drag and drop to complete the text.

    Model Deployment
    Preparing Data
    Build and Train Models
    Data Ingestion
    Monitoring Models

Explanation

Question 4 of 15

A face recognition model for accessing a gym is posing issues because it denies access to too many customers. Why might this be?

Select one of the following:

  • The model is too sensitive and a higher threshold would solve the problem.

  • The model is too specific and a higher threshold would solve the problem.

  • The model is too sensitive and a lower threshold would solve the problem.

  • The model is too specific and a lower threshold would solve the problem.

Explanation
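
Denying access to a genuine customer is a false negative: the match score falls below the decision threshold. Lowering the threshold admits more genuine customers (higher sensitivity) at the cost of admitting more strangers (lower specificity). The sketch below shows that trade-off; the similarity scores, the two threshold values, and the function name are made-up assumptions.

```python
def admission_rates(member_scores, stranger_scores, threshold):
    """Return (share of members admitted, share of strangers admitted)."""
    members_in = sum(s >= threshold for s in member_scores) / len(member_scores)
    strangers_in = sum(s >= threshold for s in stranger_scores) / len(stranger_scores)
    return members_in, strangers_in


# Hypothetical face-match scores between 0 and 1.
member_scores = [0.62, 0.71, 0.55, 0.90, 0.48, 0.83]
stranger_scores = [0.10, 0.35, 0.52, 0.20]

strict = admission_rates(member_scores, stranger_scores, threshold=0.8)
lenient = admission_rates(member_scores, stranger_scores, threshold=0.5)

print(strict)   # 2 of 6 members admitted, no strangers admitted
print(lenient)  # 5 of 6 members admitted, 1 of 4 strangers admitted
```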

Question 5 of 15

What is PII?

Select one of the following:

  • Personally Identifiable Index

  • Publicly Incriminating Index

  • Publicly Identifiable Information

  • Personally Identifiable Information

Explanation

Question 6 of 15

What is used to compare the predicted and actual values of a binary classification model?

Select one of the following:

  • R-squared value

  • BLEU score

  • Confusion matrix

  • Correlation coefficient

Explanation
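
A confusion matrix tabulates each (actual, predicted) pair of a classifier into true positives, false negatives, false positives, and true negatives. A minimal sketch follows, assuming labels encoded as 1 (positive) and 0 (negative); the label lists and the function name are illustrative.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count the four outcomes of a binary classifier (labels are 0 or 1)."""
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(1, 1)],  # actual positive, predicted positive
        "FN": counts[(1, 0)],  # actual positive, predicted negative
        "FP": counts[(0, 1)],  # actual negative, predicted positive
        "TN": counts[(0, 0)],  # actual negative, predicted negative
    }


# Hypothetical labels for eight samples.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_matrix(actual, predicted)
print(matrix)  # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}

precision = matrix["TP"] / (matrix["TP"] + matrix["FP"])
recall = matrix["TP"] / (matrix["TP"] + matrix["FN"])
print(precision, recall)  # 0.75 0.75
```

Metrics such as precision, recall, and accuracy are all derived from these four counts.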

Question 7 of 15

A weather forecast model must update its predictions first thing every morning. It is trained on publicly available daily historical data, but that data is only refreshed at noon each day, so it is not available when the model needs to update its predictions. How could this problem be solved?

Select one of the following:

  • Replicate the dataset and run the predictions in production with that.

  • Do not use the dataset when running the model in production.

  • Train the model with different data because the model must make inferences using the same type of input data it saw during training.

  • Replicate the dataset and train the model with that.

Explanation

Question 8 of 15

An online retail company wants to improve the speed at which it analyzes how users interact with its website. What is the most pressing architectural question you would address first?

Select one of the following:

  • Whether the data is being put into a data lake before analysis

  • Whether processes are in place to clean and preprocess data before storage

  • Whether the data ingestion processes are event-driven and real time, or nightly batch.

  • Whether business rules are applied on data in transit or in-situ

Explanation

Question 9 of 15

Which function would you use to evaluate the performance of a binary classifier using scikit-learn?

Select one of the following:

  • sklearn.metrics.median_absolute_error

  • sklearn.metrics.auc

  • sklearn.metrics.mean_absolute_error

  • sklearn.metrics.r2_score

Explanation
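
Three of the listed functions (median_absolute_error, mean_absolute_error, r2_score) are regression metrics. sklearn.metrics.auc computes the area under a curve from its points and is commonly paired with roc_curve to score a binary classifier. A small sketch follows, assuming scikit-learn is installed; the labels and scores are made up.

```python
from sklearn.metrics import auc, roc_curve

# Hypothetical true labels and predicted scores for the positive class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# roc_curve returns the false positive rate and true positive rate
# at each candidate threshold; auc integrates the resulting curve.
fpr, tpr, _thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

print(roc_auc)  # 0.75
```

An area of 1.0 would mean every positive sample is scored above every negative one, while 0.5 corresponds to random ranking.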

Question 10 of 15

After sampling 10,000 values of a random variable, you observe that the mode, median, and mean are the same. What is the most likely distribution of the variable?

Select one of the following:

  • Logarithmic distribution

  • Uniform distribution

  • Normal distribution

  • Poisson distribution

Explanation
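
Mean, median, and mode coincide for a symmetric distribution with a single peak. A uniform distribution also has equal mean and median, but no single most frequent value. The sketch below compares a bell-shaped sample with a right-skewed one. To keep it deterministic, both samples are built from evenly spaced quantiles rather than random draws; the parameters (mean 50, standard deviation 10, exponential mean 10) are arbitrary assumptions.

```python
import math
from statistics import NormalDist, mean, median, mode

N = 10_000
# Evenly spaced probabilities in (0, 1), used as quantile levels.
probs = [(i + 0.5) / N for i in range(N)]

# Symmetric, bell-shaped sample: normal quantiles rounded to integers.
normal = NormalDist(mu=50, sigma=10)
bell = [round(normal.inv_cdf(p)) for p in probs]

# Right-skewed sample for contrast: exponential quantiles rounded to integers.
skewed = [round(-10 * math.log(1 - p)) for p in probs]

bell_stats = (mean(bell), median(bell), mode(bell))
skewed_stats = (mean(skewed), median(skewed), mode(skewed))

print(bell_stats)    # all three values sit at (approximately) 50
print(skewed_stats)  # mean > median > mode
```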

Question 11 of 15

Which technique can be useful to handle highly imbalanced true/false labels?

Select one of the following:

  • Simple random sampling

  • Convenience sampling

  • Systematic sampling

  • Stratified sampling

Explanation
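
Stratified sampling draws from each label group separately, so a rare label keeps the same share in every subset instead of disappearing by chance. Below is a minimal sketch of a stratified train/test split; the function name, the 10/90 label mix, and the 20% test fraction are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_indices, test_indices), sampling each label group separately."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Hypothetical imbalanced labels: 10 positives, 90 negatives.
labels = [True] * 10 + [False] * 90
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

test_positives = sum(labels[i] for i in test_idx)
train_positives = sum(labels[i] for i in train_idx)
print(len(test_idx), test_positives)    # 20 2
print(len(train_idx), train_positives)  # 80 8
```

Both subsets keep the original 10% share of positive labels.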

Question 12 of 15

You have a dataset with Female and Male features. What will be the feature names when the code below is executed?

import pandas as pd


def azureml_main(dataframe1 = None, dataframe2 = None):
    pd.get_dummies(dataframe1)
    return dataframe1,

Select one of the following:

  • Female, Male

  • Female_Yes, Female_No, Male_Yes, Male_No

  • Female_Yes, Male_No

  • Female_No, Male_Yes

Explanation
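
pd.get_dummies returns a new DataFrame and leaves its argument unchanged. In the snippet above the result is not assigned, so dataframe1 is returned with its original columns. The sketch below shows the difference, assuming pandas is installed. The quiz does not show the data, so the "Yes"/"No" values are an assumption.

```python
import pandas as pd

# Hypothetical data: the column values are assumed, not taken from the quiz.
df = pd.DataFrame({"Female": ["Yes", "No", "Yes"], "Male": ["No", "Yes", "No"]})

# get_dummies builds and returns a new DataFrame; df itself is not modified.
encoded = pd.get_dummies(df)

original_columns = list(df.columns)
encoded_columns = list(encoded.columns)

print(original_columns)  # ['Female', 'Male']
print(encoded_columns)   # one indicator column per (feature, value) pair
```

To keep the encoded columns, the result would have to be assigned and returned, for example `dataframe1 = pd.get_dummies(dataframe1)`.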

Question 13 of 15

You have a dataset that you want to use for your company's ML algorithm. It has 30 dimensions and you want to reduce the size to 3 dimensions to decrease memory usage and computation time. Which method should you choose?

Select one of the following:

  • t-Distributed Stochastic Neighbor Embedding

  • Linear Discriminant Analysis

  • Principal Component Analysis

  • K-means model stacking

Explanation
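
Principal Component Analysis is an unsupervised linear projection, and once fitted it can be applied to new data with the same transform. A minimal sketch of reducing 30 features to 3 follows, assuming NumPy and scikit-learn are installed; the random matrix is a synthetic stand-in for the company dataset.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in: 200 samples with 30 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))

# Project onto the 3 directions of largest variance.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (200, 3)
print(pca.explained_variance_ratio_)  # share of variance kept by each component
```

The explained variance ratio indicates how much information the reduced representation retains.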

Question 14 of 15

What is Kubernetes?

Select one of the following:

  • A serverless platform to build and manage your apps.

  • A proprietary platform built by Google and Docker to run and manage your applications.

  • An open-source system to deploy, manage, and run Cloud Foundry apps.

  • A container orchestrator to provision, manage, and scale applications.

Explanation

Question 15 of 15

Your team is building a data engineering and data science development environment.
The environment must support the following requirements:
✑ support Python and Scala
✑ compose data storage, movement, and processing services into automated data pipelines
✑ the same tool should be used for the orchestration of both data engineering and data science
✑ support workload isolation and interactive workloads
✑ enable scaling across a cluster of machines

You need to create the environment.
What should you do?

Select one of the following:

  • Build the environment in Apache Hive for HDInsight and use Azure Data Factory for orchestration.

  • Build the environment in Azure Databricks and use Azure Data Factory for orchestration.

  • Build the environment in Azure Databricks and use Azure Container Instances for orchestration.

  • Build the environment in Apache Spark for HDInsight and use Azure Container Instances for orchestration.

Explanation