Databricks Certified Professional Data Scientist Exam - Databricks Actual Exam Questions and Answers - 138 Questions

Question No : 1

You recommend a movie with three stars but the user loves it (he'd rate it five stars).
So which statement correctly applies?

A.In both cases, the contribution to the RMSE is the same
B.In both cases, the contribution to the RMSE is different
C.In both cases, the contribution to the RMSE could vary
D.None of the above

Answer:
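
The symmetry of squared error can be checked numerically. The sketch below assumes the case missing from the question is a three-star prediction for a movie the user would rate one star (an assumption; the source only states the five-star case). `squared_error` and `rmse` are illustrative helper names.

```python
import math

def squared_error(predicted: float, actual: float) -> float:
    """Squared error that a single prediction adds to the RMSE sum."""
    return (predicted - actual) ** 2

def rmse(predictions, actuals):
    """Root mean squared error over paired predictions and actual ratings."""
    total = sum(squared_error(p, a) for p, a in zip(predictions, actuals))
    return math.sqrt(total / len(predictions))

# Case stated in the question: predicted 3 stars, user would rate 5.
loved = squared_error(3, 5)
# ASSUMED other case (not in the source): predicted 3 stars, user would rate 1.
disliked = squared_error(3, 1)

print(loved, disliked)          # both errors are 2 stars off, squared
print(rmse([3, 3], [5, 1]))     # RMSE over the two cases together
```

Because the error is squared, an over-prediction and an under-prediction of the same size contribute equally.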

Question No : 2

Suppose that the probability that a pedestrian will be hit by a car while crossing the road at a pedestrian crossing without paying attention to the traffic light is to be computed. Let H be a discrete random variable taking one value from {Hit, Not Hit}. Let L be a discrete random variable taking one value from {Red, Yellow, Green}.
Realistically, H will be dependent on L. That is, P(H = Hit) and P(H = Not Hit) will take different values depending on whether L is red, yellow or green. A person is, for example, far more likely to be hit by a car when trying to cross while the lights for cross traffic are green than if they are red. In other words, for any given possible pair of values for H and L, one must consider the joint probability distribution of H and L to find the probability of that pair of events occurring together if the pedestrian ignores the state of the light. Here is a table showing the conditional probabilities of being hit, depending on the state of the lights. (Note that the columns in this table must add up to 1 because the probability of being hit or not hit is 1 regardless of the state of the light.)

A.The marginal probability P(H=Hit) is the sum along the H=Hit row of this joint distribution table, as this is the probability of being hit when the lights are red OR yellow OR green.
B.The marginal probability P(H=Not Hit) is the sum of the H=Not Hit row
C.The marginal probability P(H=Not Hit) is the sum of the H=Hit row

Answer:
Explanation:
The marginal probability P(H=Hit) is the sum along the H=Hit row of this joint distribution table, as this is the probability of being hit when the lights are red OR yellow OR green. Similarly, the marginal probability P(H=Not Hit) is the sum of the H=Not Hit row.
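
The row-sum rule can be sketched in a few lines. The probability table from the question is not reproduced in this text, so every number below is a hypothetical placeholder chosen only for illustration, and the variable names are assumptions as well.

```python
# HYPOTHETICAL values: the original table is missing from this document.
p_light = {"Red": 0.2, "Yellow": 0.1, "Green": 0.7}             # P(L)
p_hit_given_light = {"Red": 0.01, "Yellow": 0.1, "Green": 0.8}  # P(H=Hit | L)

# Joint distribution P(H, L) = P(H | L) * P(L), stored as one row per value of H.
joint = {
    "Hit": {l: p_hit_given_light[l] * p_light[l] for l in p_light},
    "Not Hit": {l: (1 - p_hit_given_light[l]) * p_light[l] for l in p_light},
}

# Marginal P(H) is the sum along each row, i.e. over red OR yellow OR green.
marginal = {h: sum(row.values()) for h, row in joint.items()}

print(marginal)
```

Whatever the actual table values are, the two marginals must add up to 1, which is a quick sanity check on the joint table.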

Question No : 3

A fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of the

A.Presence of the other features.
B.Absence of the other features.
C.Presence or absence of the other features
D.None of the above

Answer:
Explanation:
In simple terms, a naive Bayes classifier assumes that the value of a particular feature is unrelated to the presence or absence of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of the presence or absence of the other features.
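
The independence assumption means the per-feature likelihoods are simply multiplied together. The following is a minimal sketch of that idea; the priors, the likelihood values, and the class name "other" are all made up for illustration and do not come from the source.

```python
# HYPOTHETICAL priors and per-feature likelihoods P(feature | class).
priors = {"apple": 0.5, "other": 0.5}
likelihoods = {
    "apple": {"red": 0.8, "round": 0.9, "about_3_inches": 0.7},
    "other": {"red": 0.3, "round": 0.4, "about_3_inches": 0.2},
}

def posterior(observed_features):
    """Return P(class | features) under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        # Each feature contributes its own factor, independent of the others.
        for feature in observed_features:
            score *= likelihoods[label][feature]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

result = posterior(["red", "round", "about_3_inches"])
print(result)
```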

Question No : 4

Refer to image below

A.Option A
B.Option B
C.Option C
D.Option D

Answer:
Explanation:


Question No : 5

Select the choice where regression algorithms are not the best fit

A.The dimensions of an object are given
B.The weight of a person is given
C.Temperature in the atmosphere
D.Employee status

Answer:
Explanation:
Regression algorithms are usually employed when the data points are inherently numerical variables (such as the dimensions of an object, the weight of a person, or the temperature in the atmosphere), but unlike Bayesian algorithms, they're not very good for categorical data (such as employee status or credit score description).

Question No : 6

What describes a true limitation of the logistic regression method?

A.It does not handle redundant variables well.
B.It does not handle missing values well.
C.It does not handle correlated variables well.
D.It does not have explanatory values.

Answer:

Question No : 7

Stories appear on the front page of Digg as they are "voted up" (rated positively) by the community. As the community becomes larger and more diverse, the promoted stories can better reflect the average interest of the community members.
Which of the following techniques is used to build such a recommendation engine?

A.Naive Bayes classifier
B.Collaborative filtering
C.Logistic Regression
D.Content-based filtering

Answer:
Explanation:
One scenario of collaborative filtering application is to recommend interesting or popular information as judged by the community. As a typical example, stories appear in the front page of Digg as they are "voted up" (rated positively) by the community. As the community becomes larger and more diverse, the promoted stories can better reflect the average interest of the community members.

Question No : 8

A data scientist wants to predict the probability of death from heart disease based on three risk factors: age, gender, and blood cholesterol level.
What is the most appropriate method for this project?

A.Linear regression
B.K-means clustering
C.Logistic regression
D.Apriori algorithm

Answer:
Explanation:
Logistic regression is used widely in many fields, including the medical and social sciences. For example, the Trauma and Injury Severity Score (TRISS), which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression. Many other medical scales used to assess the severity of a patient's condition have been developed using logistic regression. Logistic regression may be used to predict whether a patient has a given disease (e.g. diabetes or coronary heart disease), based on observed characteristics of the patient (age, sex, body mass index, results of various blood tests, etc., or age, blood cholesterol level, systolic blood pressure, relative weight, blood hemoglobin level, smoking (at 3 levels), and abnormal electrocardiogram). Another example might be to predict whether an American voter will vote Democratic or Republican, based on age, income, sex, race, state of residence, votes in previous elections, etc. The technique can also be used in engineering, especially for predicting the probability of failure of a given process, system or product. It is also used in marketing applications such as predicting a customer's propensity to purchase a product or halt a subscription. In economics it can be used to predict the likelihood of a person's choosing to be in the labor force, and a business application would be to predict the likelihood of a homeowner defaulting on a mortgage. Conditional random fields, an extension of logistic regression to sequential data, are used in natural language processing.
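
To make the "predict a probability" idea concrete, here is a minimal sketch of how a fitted logistic regression model turns the three risk factors into a probability between 0 and 1. The coefficients are invented placeholders, not estimates from any real study, and `death_probability` is a hypothetical function name.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# HYPOTHETICAL coefficients; a real model would estimate these from data.
COEF = {"intercept": -9.0, "age": 0.08, "is_male": 0.5, "cholesterol": 0.01}

def death_probability(age: float, is_male: int, cholesterol: float) -> float:
    """Predicted probability from a linear score passed through the sigmoid."""
    z = (COEF["intercept"]
         + COEF["age"] * age
         + COEF["is_male"] * is_male
         + COEF["cholesterol"] * cholesterol)
    return sigmoid(z)

younger = death_probability(age=40, is_male=0, cholesterol=180)
older = death_probability(age=70, is_male=0, cholesterol=180)
print(younger, older)
```

Unlike a linear regression, the output can never fall outside the range 0 to 1, which is why the method suits a probability target.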

Question No : 9

Select the correct statement which applies to Supervised learning

A.We ask the machine to learn from our data when we specify a target variable.
B.It reduces the machine's task to only divining some pattern from the input data to get the target variable.
C.Instead of telling the machine "Predict Y for our data X", we're asking "What can you tell me about X?"

Answer:
Explanation:
Supervised learning asks the machine to learn from our data when we specify a target variable.
This reduces the machine's task to only divining some pattern from the input data to get the target variable.
In unsupervised learning we don't have a target variable as we did in classification and regression.
Instead of telling the machine "Predict Y for our data X", we're asking "What can you tell me about X?"
Things we ask the machine to tell us about X may be "What are the six best groups we can make out of X?" or "What three features occur together most frequently in X?"

Question No : 10

Which of the following statements are true with regard to the linear regression model?

A.Ordinary Least Squares can be used to estimate the parameters of a linear model
B.A linear model tries to find multiple lines which can approximate the relationship between the outcome and input variables.
C.Ordinary Least Squares is a sum of the individual distances between each point and the fitted line of the regression model.
D.Ordinary Least Squares is a sum of the squared individual distances between each point and the fitted line of the regression model.

Answer:
Explanation:
A linear regression model is represented by the equation below:

y = B(0) + B(1)x

where B(0) is the intercept and B(1) is the slope. As B(0) and B(1) change, the fitted line shifts accordingly on the plot. The purpose of the Ordinary Least Squares method is to estimate these parameters B(0) and B(1). Its objective is the sum of the squared distances between the observed points and the fitted line: ordinary least squares (OLS) regression minimizes the sum of the squared residuals. A model fits the data well if the differences between the observed values and the model's predicted values are small and unbiased.
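
For a single input variable, the OLS estimates have a well-known closed form. The sketch below applies it to a small made-up data set; the function names and the data are illustrative assumptions, not part of the original explanation.

```python
def fit_ols(xs, ys):
    """Closed-form OLS estimates (intercept b0, slope b1) for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """The quantity OLS minimizes: squared vertical distances to the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up points that lie exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b0, b1 = fit_ols(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
shifted = sum_squared_residuals(xs, ys, b0 + 0.5, b1)  # any other line does worse
print(b0, b1, best, shifted)
```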

Question No : 11

A researcher is interested in how variables such as GRE (Graduate Record Exam) scores, GPA (grade point average), and prestige of the undergraduate institution affect admission into graduate school. The response variable, admit/don't admit, is a binary variable.
The above is an example of

A.Linear Regression
B.Logistic Regression
C.Recommendation system
D.Maximum likelihood estimation
E.Hierarchical linear models

Answer:
Explanation:
Logistic regression
Pros: Computationally inexpensive, easy to implement, knowledge representation easy to interpret
Cons: Prone to underfitting, may have low accuracy
Works with: Numeric values, nominal values

Question No : 12

A problem statement is given below.
Hospital records show that of patients suffering from a certain disease, 75% die of it.
What is the probability that of 6 randomly selected patients, 4 will recover?
Which of the following models would you use to solve it?

A.Binomial
B.Poisson
C.Normal
D.Any of the above

Answer:
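
One way to reason about the setup: each of the 6 patients either recovers or dies, independently and with the same probability, which is the setting of a binomial model. The sketch below assumes a per-patient recovery probability of 1 - 0.75 = 0.25; `binomial_pmf` is an illustrative helper name.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_recover = 1 - 0.75                 # 75% die, so 25% recover
prob_four_recover = binomial_pmf(4, 6, p_recover)

print(prob_four_recover)             # C(6,4) * 0.25^4 * 0.75^2 = 135/4096
```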

Question No : 13

Which of the following are true with regard to the K-Means clustering algorithm?

A.Labels are not pre-assigned to each object in the cluster.
B.Labels are pre-assigned to each object in the cluster.
C.It classifies the data based on the labels.
D.It discovers the center of each cluster.
E.It finds which particular cluster each object falls in

Answer:
Explanation:
Clustering does not require any predefined labels on the objects; rather, it considers the attributes of the objects. Hence, option B is out. Clustering is different from the classification technique, so you can discard option C as well. It does not use predefined labels, hence it is called unsupervised learning and option A is correct. The main purpose of the clustering technique is to determine the center of each cluster and then find the distance from that center. If an object is near a center, then it falls in that particular cluster. Hence, you finally have groups or clusters created and get to know which objects fall in which particular cluster.
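
The two steps described above (assign each object to its nearest center, then recompute the centers) can be sketched as follows. This is a bare-bones version with fixed starting centers and a fixed number of iterations; the data points and function names are illustrative assumptions.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: dist2(point, centers[i]))

def kmeans(points, centers, iterations=10):
    """Plain k-means: no labels are used, only the point coordinates."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    assignments = [nearest(p, centers) for p in points]
    return centers, assignments

# Made-up 2-D points forming two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centers, assignments = kmeans(points, centers=[points[0], points[3]])
print(centers, assignments)
```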

Question No : 14

Which of the following is a correct example of the target variable in regression (supervised learning)?

A.Nominal values like true, false
B.Reptile, fish, mammal, amphibian, plant, fungi
C.Infinite number of numeric values, such as 0.100, 42.001, 1000.743..
D.All of the above

Answer:
Explanation:
We address two cases of the target variable. The first case occurs when the target variable can take only nominal values: true or false; reptile, fish, mammal, amphibian, plant, fungi. The second case occurs when the target variable can take an infinite number of numeric values, such as 0.100, 42.001, 1000.743, .... This case is called regression.

Question No : 15

What is the probability that the total of two dice will be greater than 8, given that the first die is a 6?

A.1/3
B.2/3
C.1/6
D.2/6

Answer:
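
A conditional probability like this can be checked by brute-force enumeration: restrict the sample space to outcomes where the first die is 6, then count those whose total exceeds 8. The sketch below assumes two fair six-sided dice; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Condition: the first die shows 6.
given = [o for o in outcomes if o[0] == 6]

# Event of interest within the condition: total greater than 8.
favorable = [o for o in given if sum(o) > 8]

probability = Fraction(len(favorable), len(given))
print(probability)
```

With the first die fixed at 6, the total exceeds 8 exactly when the second die shows 3, 4, 5, or 6.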
