 • Premanand S

# Q&A | Data Science | Metrics

Hey guys, in this blog, we will go through some Q&A on metrics in data science.

How do you analyze the performance of the predictions generated by regression models versus classification models?

For regression models, the most commonly used evaluation metrics include:

• R-squared (R2)

• Root Mean Squared Error (RMSE)

• Residual Standard Error (RSE)

• Mean Absolute Error (MAE)

Classification is the problem of identifying which of a set of categories (classes) a new observation belongs to, based on a training set of data containing records whose class labels are known. The following performance metrics are used to evaluate a classification model:

• Accuracy

• Precision and Recall

• Specificity

• F1-score

• AUC-ROC

How do you assess logistic regression versus simple linear regression models?

• Linear regression is used for regression problems, whereas logistic regression is used for classification problems.

• Linear regression produces a continuous output. Logistic regression produces a probability between 0 and 1, which is then thresholded into a discrete class label.

• Linear regression finds the best-fitting line. Logistic regression goes one step further and passes that linear combination through the sigmoid curve.

• Linear regression is usually fit by minimizing the mean squared error. Logistic regression is fit by maximum likelihood estimation, which is equivalent to minimizing the log loss (cross-entropy).
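The contrast can be made concrete with a minimal pure-Python sketch. The coefficients `w` and `b` and all helper names are hypothetical. The same linear score feeds both models. Linear regression uses the score directly and is scored with MSE. Logistic regression passes the score through the sigmoid and is scored with log loss, the negative mean log-likelihood that maximum likelihood estimation minimizes.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mse_loss(y_true, y_pred):
    """Mean squared error: the loss minimized by ordinary least squares."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    """Negative mean log-likelihood of 0/1 labels under predicted probabilities.
    Minimizing it is equivalent to maximum likelihood estimation."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)

# Hypothetical coefficients shared by both views of the same linear score.
w, b = 2.0, -1.0
xs = [0.0, 0.5, 1.0, 1.5]
linear_scores = [w * x + b for x in xs]                 # continuous output
probabilities = [sigmoid(s) for s in linear_scores]     # values in (0, 1)
labels = [1 if p >= 0.5 else 0 for p in probabilities]  # discrete output

print(linear_scores)  # [-1.0, 0.0, 1.0, 2.0]
print(labels)         # [0, 1, 1, 1]
```

In practice, scikit-learn's `LinearRegression` and `LogisticRegression` handle the fitting. This sketch only shows how the outputs and losses differ.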

What is cross-validation and why would you use it?

Cross-validation is a statistical method used to estimate how well a machine learning model will perform on unseen data. It helps detect and guard against overfitting, particularly when the amount of data is limited. In k-fold cross-validation, you split the data into a fixed number k of folds (partitions). Each fold is held out once as the validation set while the model is trained on the remaining k - 1 folds. The k validation errors are then averaged into an overall error estimate.
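The k-fold procedure can be sketched in plain Python as follows. The helper names are hypothetical, and the "model" is deliberately trivial: it just predicts the mean of its training targets. Unlike typical library implementations, this sketch does not shuffle the data before splitting it.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_validate(xs, ys, fit, score, k=5):
    """Fit on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)

# Hypothetical model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mae_score(model, xs, ys):
    return sum(abs(y - model) for y in ys) / len(ys)

xs = list(range(10))
ys = [2.0 * x for x in xs]
print(cross_validate(xs, ys, fit_mean, mae_score, k=5))
```

scikit-learn provides the same idea as `sklearn.model_selection.KFold` and `cross_val_score`.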

What’s the name of the matrix used to evaluate predictive models?

Confusion Matrix: A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It can be used to evaluate the performance of a classification model through the calculation of performance metrics like accuracy, precision, recall, and F1-score.

What’s the F1 score? How would you use it?

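The F1 score is a single measure of a model's performance that combines precision and recall. It is the harmonic mean of the two, ranging from 0 (worst) to 1 (best). It is useful when you need one number that balances precision and recall, for example when comparing classifiers on an imbalanced dataset where accuracy can be misleading.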

What is overfitting and how can you prevent it?

Overfitting happens when a model fits the training data too well, learning its noise along with the underlying pattern. The result is a model with high variance and low bias. As a consequence, an overfit model predicts new data points poorly even though it has high accuracy on the training data.

A few approaches to prevent overfitting are:

- Cross-validation: Cross-validation is a powerful preventative measure against overfitting. We use the initial training data to generate multiple mini train-test splits, and then use these splits to tune the model.

- Train with more data: It won't work every time, but training with more data can help the algorithm detect the signal better and learn general trends rather than noise.

- Remove noise: We can remove irrelevant information or noise from the dataset.

- Early stopping: When you train a learning algorithm iteratively, you can measure how well each iteration of the model performs on held-out validation data.

Up until a certain number of iterations, new iterations improve the model. After that point, however, the model’s ability to generalize can weaken as it begins to overfit the training data.

Early stopping refers to stopping the training process before the learner passes that point.

- Regularization: A broad range of techniques that force the model to be simpler, typically by adding a penalty on model complexity to the loss. The three main types are L1 (Lasso), L2 (Ridge), and Elastic Net (a combination of L1 and L2).

- Ensembling: Combine the predictions of multiple learners into a single, stronger model. The two main types are bagging and boosting.
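Of the approaches above, early stopping is the easiest to show in a few lines. The sketch below assumes you already have one validation loss per epoch. The function name and the `patience` and `min_delta` parameters are illustrative choices, not a specific library's API. The function returns the epoch whose weights you would keep.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the index of the epoch whose weights would be kept.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, waited = epoch, loss, 0  # new best: reset counter
        else:
            waited += 1
            if waited >= patience:
                break  # stop training here
    return best_epoch

# Hypothetical validation losses: they fall, then rise as the model starts to overfit.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
print(early_stopping_epoch(losses, patience=2))  # 3
```

Keras offers this behaviour as `tf.keras.callbacks.EarlyStopping` (with `patience` and `restore_best_weights` arguments).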

How will you define the number of clusters in a clustering algorithm?

We can determine the number of clusters with the elbow method, or by computing the silhouette score for different numbers of clusters and comparing the results.

How can you select k for k-means?

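The two methods to calculate the optimal value of k in k-means are:

• Elbow method: plot the within-cluster sum of squares (inertia) against k, and pick the k where the curve bends and further decreases level off.

• Silhouette score method: compute the average silhouette score for each candidate k, and pick the k with the highest score.

The silhouette score is often preferred because it gives a single value to maximize, whereas the elbow can be ambiguous to read.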
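The silhouette score can be sketched in pure Python, assuming Euclidean distance and at least two clusters. For each point, a is its mean distance to the other members of its own cluster. b is its lowest mean distance to the members of any other cluster. The point's score is (b - a) / max(a, b). The toy points below are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points (Euclidean distance, >= 2 clusters)."""
    clusters = sorted(set(labels))
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
                if lj == li and j != i]
        if not same:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lj in zip(points, labels) if lj == c) / labels.count(c)
            for c in clusters
            if c != li
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

tight = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(tight, [0, 0, 1, 1]))  # close to 1: well-separated clusters
print(silhouette_score(tight, [0, 1, 0, 1]))  # negative: points sit in the wrong clusters
```

To choose k, you would run k-means for several values of k and keep the labelling with the highest score. scikit-learn exposes the metric as `sklearn.metrics.silhouette_score`.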

What is a ROC curve? Explain how a ROC curve works.

The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies. The AUC (area under the curve) represents the degree of separability, that is, how well the model can distinguish between the classes. An AUC of 1 means perfect separation and 0.5 means no better than random guessing. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.
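One way to see what "degree of separability" means: the AUC equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counted as half. Below is a minimal sketch of that pairwise view. The function name and the toy scores are hypothetical.

```python
def roc_auc_pairwise(y_true, scores):
    """AUC as P(score of random positive > score of random negative), ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 1]
print(roc_auc_pairwise(y, [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(roc_auc_pairwise(y, [0.1, 0.2, 0.8, 0.9]))   # 1.0: perfect separation
```

The pairwise loop is O(positives × negatives), so for real datasets `sklearn.metrics.roc_auc_score` is the practical choice.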

What is the Confusion Matrix?

A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It is used to measure the performance of a classification model through the calculation of performance metrics like accuracy, precision, recall, and F1-score.

For example, a binary classifier predicts every data instance of a test dataset as either positive or negative. This produces four possible outcomes:

• True positive (TP): correct positive prediction

• False positive (FP): incorrect positive prediction

• True negative (TN): correct negative prediction

• False negative (FN): incorrect negative prediction

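Let P = TP + FN be the number of actual positives and N = TN + FP the number of actual negatives. The confusion matrix then lets us calculate measures such as error rate = (FP + FN)/(P + N), specificity = TN/N, accuracy = (TP + TN)/(P + N), sensitivity = TP/P, and precision = TP/(TP + FP).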

A confusion matrix is used to evaluate the performance of a machine learning model when the true labels are already known and the target has two or more classes. It helps in visualizing and evaluating a classifier's results.
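The four counts and the measures listed above can be computed in a few lines of plain Python. The labels are hypothetical, and 1 is assumed to be the positive class. The divisions would fail if a class (or the set of positive predictions) were empty, so the sketch assumes it is not.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def metrics_from_counts(tp, fp, tn, fn):
    p, n = tp + fn, tn + fp  # actual positives, actual negatives
    return {
        "accuracy": (tp + tn) / (p + n),
        "error_rate": (fp + fn) / (p + n),
        "sensitivity": tp / p,         # recall, true positive rate
        "specificity": tn / n,
        "false_positive_rate": fp / n,  # equals 1 - specificity
        "precision": tp / (tp + fp),
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (3, 2, 2, 1)
print(metrics_from_counts(*counts))
```

For binary labels, `sklearn.metrics.confusion_matrix(y_true, y_pred).ravel()` returns the same counts in the order TN, FP, FN, TP.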

When should accuracy not be used as a metric to measure the performance of a classification model?

There are two major situations in which accuracy is not a good measure of classification performance:

A) When you have a severely imbalanced dataset. Suppose 90% of the data belongs to the positive class and 10% to the negative class. A trivial model that labels every data point as positive attains 90% accuracy without learning anything.

B) When the predicted probabilities matter, because accuracy ignores them. With a threshold of 0.5, suppose model M1 predicts a probability of 0.9 for a positive point x and model M2 predicts 0.55. Both label x as positive and get the same accuracy, even though M1 is clearly more confident in the correct answer.
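Both situations can be reproduced with a short sketch. The data and the two probabilities 0.9 and 0.55 follow the examples above, and the helper names are hypothetical. For case B, the per-example log loss is used as one metric that does look at probabilities.

```python
import math

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of examples of class `positive` that were predicted as that class."""
    hits = [p == positive for t, p in zip(y_true, y_pred) if t == positive]
    return sum(hits) / len(hits)

# Case A: 90 positives, 10 negatives; a trivial model predicts "positive" for everything.
y_true = [1] * 90 + [0] * 10
always_positive = [1] * 100
print(accuracy(y_true, always_positive))            # 0.9
print(recall(y_true, always_positive, positive=0))  # 0.0: every negative is missed

# Case B: x is truly positive, and both models cross the 0.5 threshold.
p_m1, p_m2 = 0.9, 0.55
print([int(p >= 0.5) for p in (p_m1, p_m2)])  # [1, 1]: identical accuracy on x
# The log loss on this single positive example (lower is better) separates them.
print(-math.log(p_m1), -math.log(p_m2))       # roughly 0.105 vs 0.598
```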

What is the difference between R square and adjusted R square?

R square and adjusted R square are used for model validation in linear regression. R square is the proportion of the variation in the dependent variable explained by all the independent variables together. It never decreases when another independent variable is added, even a useless one. Adjusted R square corrects for this by penalizing the number of predictors: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of predictors. It increases only when a new variable improves the model more than would be expected by chance.

Thus adjusted R^2 is never greater than R^2, and it can even be negative.
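The formula translates directly into code. The R^2 values and sample sizes below are hypothetical. They illustrate a case where adding ten extra predictors raises R^2 slightly but lowers adjusted R^2.

```python
def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical: 10 weak extra predictors lift R^2 from 0.800 to 0.805 ...
print(adjusted_r2(0.800, n_samples=50, n_predictors=5))   # about 0.777
print(adjusted_r2(0.805, n_samples=50, n_predictors=15))  # ... but adjusted R^2 falls to about 0.719
```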

What do you understand about the true positive rate and the false-positive rate?

The true positive rate (TPR) is the probability that an actual positive is predicted as positive.

It is the ratio of true positives (TP) to all actual positives, i.e. true positives plus false negatives (FN):

Formula: TPR = TP / (TP + FN)

The false positive rate (FPR) is the probability that an actual negative is predicted as positive, i.e. the probability that the model raises a false alarm.

It is the ratio of false positives (FP) to all actual negatives, i.e. true negatives (TN) plus false positives:

Formula: FPR = FP / (TN + FP)

What is the difference between recall and precision?

Precision is calculated over all samples the model classified as positive. It therefore involves both actual positives (true positives) and actual negatives (false positives).

Recall is calculated only over the actual positive samples (true positives plus false negatives). Negative samples are ignored.

Hence precision quantifies how many of the positive class predictions actually belong to the positive class. Recall quantifies how many of all the positive examples in the dataset were correctly predicted as positive.

What do you understand by Recall and Precision?

Precision is defined as the fraction of relevant instances among all retrieved instances. Recall, sometimes referred to as 'sensitivity', is the fraction of relevant instances that were retrieved. A perfect classifier has precision and recall both equal to 1.

Explain ROC curve.

A Receiver Operating Characteristic (ROC) curve is a graphical plot used to show the diagnostic ability of a binary classifier as its decision threshold is varied. A ROC curve is constructed by plotting the true positive rate (TPR) against the false positive rate (FPR) at each threshold. The true positive rate is the proportion of observations correctly predicted to be positive out of all positive observations (TP/(TP + FN)). Similarly, the false positive rate is the proportion of observations incorrectly predicted to be positive out of all negative observations (FP/(TN + FP)).
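The construction can be sketched in plain Python. For each distinct score used as a threshold (from strictest to most lenient), the sketch counts TPR and FPR. It then measures the area under the resulting curve with the trapezoidal rule. The labels, scores, and function names are hypothetical.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs, one per distinct threshold, from strict to lenient."""
    p = sum(1 for y in y_true if y == 1)
    n = len(y_true) - p
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / n, tp / p))
    return points

def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

y = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y, scores)
print(pts)                  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(pts))   # 0.75
```

`sklearn.metrics.roc_curve` and `sklearn.metrics.auc` do the same job for real data.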

Why is mean square error a bad measure of model performance?

A disadvantage of the mean squared error is that it is hard to interpret on its own. It depends on the scale of the target, so MSE values cannot be compared across different prediction tasks. Assume, for example, that one prediction task is about estimating the weight of trucks and another is about estimating the weight of apples. In the first task a good model may have an RMSE of 100 kg, while a good model for the second task may have an RMSE of 0.5 kg. Therefore, while MSE and RMSE are useful for choosing between models on the same task, a scale-free measure such as R2 is often reported alongside them or instead of them.
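The scale dependence shows up even within a single task if you only change the units. In this sketch, hypothetical predictions are expressed first in tonnes and then in kilograms. The MSE grows by a factor of 1000^2, while R^2 stays the same.

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean) ** 2 for t in y_true)
    return 1 - rss / tss

# Hypothetical targets and predictions in tonnes, then the same data in kilograms.
y_t, p_t = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
y_kg, p_kg = [v * 1000 for v in y_t], [v * 1000 for v in p_t]

print(mse(y_t, p_t), mse(y_kg, p_kg))  # the MSE grows by a factor of 1000**2
print(r2(y_t, p_t), r2(y_kg, p_kg))    # R^2 is unchanged (about 0.98)
```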

What is the difference between squared error and absolute error?

Squared error is the square of the difference between the predicted value and the actual value. Absolute error is the absolute value of that difference. The squared error is differentiable everywhere, whereas the absolute error is not (its derivative is undefined at 0). This makes the squared error more amenable to mathematical optimization techniques.

How will you evaluate the performance of a logistic regression model?

The confusion matrix can be used to evaluate a logistic regression model. Accuracy, sensitivity, and specificity are useful indicators. Which one to focus on depends on what you want to use the model for, for example whether it matters more to catch positives (high sensitivity) or to avoid false alarms (high specificity). We can also use precision, recall, and the F1 score to evaluate the model.

Squared error and absolute error?

Mean squared error (MSE) and mean absolute error (MAE) are used to evaluate the error of regression models. The squared error is differentiable everywhere, while the absolute error is not (its derivative is undefined at 0). This makes the squared error more amenable to the techniques of mathematical optimization.

Precision and recall? How are they related to the ROC curve?

Recall is the ratio of the relevant results returned (for example, by a search engine) to the total number of relevant results that could have been returned. Precision is the proportion of relevant results among all returned results. The ROC curve plots recall (the true positive rate) against the false positive rate, while a precision-recall (PR) curve plots precision against recall across thresholds. When dealing with highly skewed datasets, PR curves give a more informative picture of an algorithm's performance. The two views are closely connected: it has been shown that a curve dominates in ROC space if and only if it dominates in PR space.

What is F1 score?

The F1 score is defined as the harmonic mean of precision and recall: F1 = 2 * (Precision * Recall) / (Precision + Recall). As a short reminder, the harmonic mean is an alternative to the more common arithmetic mean and is often useful when averaging rates. It is pulled toward the smaller of the two values, so F1 is high only when precision and recall are both high.
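A quick sketch shows why the harmonic mean is used. The precision and recall values below are hypothetical, describing a classifier that is very precise but finds few positives. The arithmetic mean looks respectable, while F1 stays low.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

precision, recall = 1.0, 0.1
print((precision + recall) / 2)     # arithmetic mean: 0.55
print(f1_score(precision, recall))  # harmonic mean (F1): about 0.18
```

The function here takes precision and recall directly. `sklearn.metrics.f1_score` instead takes true and predicted labels.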

Write the formula to calculate R-squared?

R^2 = 1 - (RSS/TSS), where RSS = Σ(y_i - ŷ_i)^2 is the residual sum of squares and TSS = Σ(y_i - ȳ)^2 is the total sum of squares.
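The formula maps directly to code. This sketch uses hypothetical data to show three reference points. A close fit gives R^2 near 1. Predicting the mean gives exactly 0. A model worse than the mean gives a negative value.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - RSS / TSS."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in y_true)                     # total sum of squares
    return 1 - rss / tss

y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, [1.1, 1.9, 3.2, 3.8]))  # about 0.98: close fit
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # 0.0: no better than predicting the mean
print(r_squared(y, [4.0, 3.0, 2.0, 1.0]))  # -3.0: worse than predicting the mean
```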

The idea behind the precision-recall trade-off is this: when you change the threshold that decides whether a prediction counts as positive or negative, you tilt the scales. Raising the threshold typically increases precision and decreases recall. Lowering it does the opposite.
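The trade-off can be seen by sweeping the threshold over hypothetical labels and model scores. Returning a precision of 1.0 when nothing is predicted positive is a convention chosen for this sketch.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention: no positive predictions
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical labels and model scores.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.45, 0.6, 0.7, 0.75, 0.9]
for thr in (0.3, 0.5, 0.8):
    print(thr, precision_recall_at(y_true, scores, thr))
# As the threshold rises, precision goes up (4/7, 0.75, 1.0) and recall goes down (1.0, 0.75, 0.25).
```

`sklearn.metrics.precision_recall_curve` computes these pairs for every threshold at once.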

What are the equations for precision and recall?

Precision = True positives / (True positives + False positives) = TP / (TP + FP)

Recall = True positives / (True positives + False negatives) = TP / (TP + FN)

Explain the difference between variance and R squared.

In regression analysis, the residual variance is estimated by the mean squared error: the average of the squared differences between the actual values and the values predicted by the fitted regression equation, expressed in squared units of the target. R-squared is a different kind of quantity. It is a unitless proportion of the variance of the dependent variable that the model explains, R^2 = 1 - RSS/TSS, so it compares the residual variation with the total variation of the target.

What is the precision/recall ratio?

Precision is the number of true positives divided by the true positives plus the false positives, TP / (TP + FP). Recall, by contrast, is the number of true positives divided by the true positives plus the false negatives, TP / (TP + FN).

RMSE and MSE?

MSE (Mean Squared Error) is the average of the squared differences between the original and predicted values over the data set. It measures how close a fitted line is to the actual data points: the lower the MSE, the closer the fit. MSE is expressed in the squared units of whatever is plotted on the vertical axis. RMSE (Root Mean Squared Error) is the square root of MSE. RMSE is easier to interpret because it has the same units as the quantity plotted on the vertical axis (Y-axis). Since it can be read directly in measurement units, it is a more interpretable measure of fit than a unitless correlation coefficient.
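A minimal sketch with hypothetical values makes the order of operations explicit: square each error first, then average. The last line shows that squaring the average error instead gives a different and misleading number.

```python
import math

def mse(y_true, y_pred):
    """Average of the squared differences (square first, then average)."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of MSE, back in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mse(y_true, y_pred))   # 0.375
print(rmse(y_true, y_pred))  # about 0.612

mean_error = sum(y - y_hat for y, y_hat in zip(y_true, y_pred)) / len(y_true)
print(mean_error ** 2)       # 0.0625: squaring the average error is not the MSE
```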

Specificity?

Specificity (SP) is calculated as the number of correct negative predictions divided by the total number of actual negatives: SP = TN / (TN + FP).

What is R2? What are some other metrics that could be better than R2 and why?

R-squared (R2) is a statistical measure that represents the proportion of the variance of a dependent variable that is explained by the independent variable or variables in a regression model. It has several limitations:

• A high R2 does not by itself show that the model is correctly specified.

• It does not measure predictive error in the units of the target.

• It cannot be used to compare models fit to transformed responses (for example, y versus log y).

• It does not show that one variable causes or explains another.

Metrics that can be more informative than R2, because they measure prediction error directly (in the target's units, or in squared units for MSE), include:

• Mean Squared Error (MSE)

• Root Mean Squared Error (RMSE)

• Mean Absolute Error (MAE)

Which evaluation metric should you prefer to use for a dataset having a lot of outliers in it?

Mean Absolute Error (MAE) is preferred when the dataset contains many outliers. Each error contributes to MAE in proportion to its size, which makes it more robust to outliers. MSE and RMSE square the error terms and so penalize outliers heavily.
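The difference shows up clearly when one hypothetical prediction is off by 20. That single error contributes 400 of the 402 in the sum of squared errors, so RMSE is driven almost entirely by one point. MAE grows only linearly with it.

```python
import math

def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

y_true = [10.0, 10.0, 10.0, 10.0, 10.0]
clean = [11.0, 9.0, 10.0, 10.0, 10.0]
with_outlier = [11.0, 9.0, 10.0, 10.0, 30.0]  # one prediction is off by 20

for name, pred in [("clean", clean), ("with outlier", with_outlier)]:
    print(name, mae(y_true, pred), rmse(y_true, pred))
# clean: MAE 0.4, RMSE about 0.63; with outlier: MAE 4.4, RMSE about 8.97
```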

I know that a linear regression model is generally evaluated using Adjusted R² or F value. How would you evaluate a logistic regression model?

We can use the following methods:

• Since logistic regression is used to predict probabilities, we can use AUC-ROC curve along with the confusion matrix to determine its performance.

• Also, AIC plays a role in logistic regression analogous to that of adjusted R². AIC is a measure of fit that penalizes the model for the number of model coefficients. Therefore, when comparing models fit to the same data, we prefer the model with the minimum AIC value.

• Null deviance indicates how well the response is predicted by a model with nothing but an intercept. Residual deviance indicates how well the response is predicted once the independent variables are added. The lower the residual deviance, and the larger its drop from the null deviance, the better the model.
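For binary outcomes, both deviances and AIC come from the log-likelihood. The saturated model has a log-likelihood of 0, so deviance = -2 * logL and AIC = 2k - 2 * logL, where k is the number of fitted coefficients. The sketch below uses hypothetical labels and hypothetical fitted probabilities, which were not produced by an actual fit. It assumes every probability is strictly between 0 and 1.

```python
import math

def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of 0/1 labels under predicted probabilities in (0, 1)."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y, p in zip(y_true, p_pred))

def deviance(y_true, p_pred):
    """For 0/1 outcomes the saturated model has logL = 0, so deviance = -2 * logL."""
    return -2 * log_likelihood(y_true, p_pred)

def aic(y_true, p_pred, n_params):
    """AIC = 2k - 2 logL, with k counting fitted coefficients (intercept included)."""
    return 2 * n_params - 2 * log_likelihood(y_true, p_pred)

y = [1, 0, 1, 1, 0, 1]
null_p = [sum(y) / len(y)] * len(y)       # intercept-only model predicts the base rate
model_p = [0.9, 0.2, 0.8, 0.7, 0.1, 0.6]  # hypothetical fitted probabilities (1 predictor)

print("null deviance:", deviance(y, null_p))       # about 7.64
print("residual deviance:", deviance(y, model_p))  # about 3.05
print("AIC null vs model:", aic(y, null_p, 1), aic(y, model_p, 2))
```

Fitted GLM results in statsmodels report these quantities as `null_deviance`, `deviance`, and `aic`.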

How would you evaluate a logistic regression model?

Model evaluation is a very important part of any analysis, because it answers questions such as:

How well does the model fit the data? Which predictors are most important? Are the predictions accurate?

So, the following are criteria to assess the model performance:

1. Akaike Information Criterion (AIC): In simple terms, AIC estimates the relative amount of information lost by a given model. The less information lost, the higher the quality of the model. Therefore, among candidate models fit to the same data, we prefer the one with the minimum AIC.

2. Receiver operating characteristic (ROC) curve: The ROC curve illustrates the diagnostic ability of a binary classifier. It is created by plotting the true positive rate against the false positive rate at various threshold settings. The curve is summarized by the AUC (area under the curve). The higher the AUC, the better the predictive power of the model.

3. Confusion matrix: To find out how well the model predicts the target variable, we use a confusion matrix and the classification rate derived from it. It is a tabular representation of actual vs. predicted values, which helps us find the accuracy of the model.