Get your Machine Learning Basics Right to Crack the Interviews

Introduction to machine learning

By implementing cutting-edge technologies such as artificial intelligence (AI) and machine learning, businesses are making information and services more accessible to consumers. These technologies are increasingly being adopted in various business sectors such as banking, finance, retail, manufacturing, and healthcare.

Some of the in-demand organizational roles adopting AI include data scientists, artificial intelligence engineers, machine learning engineers, and data analysts. Machine learning interviews require in-depth knowledge of machine learning concepts and algorithms, as well as rigorous preparation in technical and programming skills, so if you are applying for a position in this area, you should be aware of the machine learning questions your hiring manager might ask. Knowing the types of interview questions is essential.Such

To help streamline your efforts as you embrace this learning journey, we decided to launch a set of key ML questions you can expect to face during your interview. Each part consists of 10 questions, providing concise and focused content for each topic. In the first part, I decided to address meaningful questions related to machine learning and statistics. This will give you enough background and revision material before your next interview. The remaining sections address questions specific to deep learning, computer vision, NLP, time series analysis, and more.

So, if you’re ready to start your dream career in ML, continue reading below to refresh your memory and add new knowledge to your existing know-how.

1. What are the main types of machine learning algorithms?

Broadly speaking, ML algorithms can be classified into three main categories:

A. Supervised learning: These algorithms base their predictions on inferring functions based on labeled training data. That is, the target variable exists.

If the target variable is continuous, the usual choice of algorithms varies. Regression models (linear, quadratic, polynomial)

If the target variable is a categorical variable, recommended algorithms include Logistic Regression, Naive Bayes, KNN, SVM, Decision Trees, Boosting Algorithms, Random Forests, etc.

B. Unsupervised learning: These algorithms predict a target variable based on some pattern in a given data set. The data for this purpose have no dependent variable or label to predict. Algorithms that fall into this category include: Clustering algorithms, anomaly detection, latent space models, singular value decomposition, principal component analysis, etc.

C. Reinforcement learning: These algorithms use a trial-and-error based approach and learning is based on rewards received from previous actions.

Source: Expert Insights

2. How can you determine important variables from your dataset?

Various measures can be implemented to select the variables of interest from the dataset.

1. Identify and discard correlation values before finalizing the variables of interest

2. Select variables based on the p” value obtained from the hypothesis test

3.Forward, backward, gradual selection

4. Lasso Regression

5. Use Random Forest to Select Variables Based on Feature Importance Plots

6. You can choose top features based on the information you get from the set of features available

3. Describe covariance and correlation.

covariance Indicates the degree to which two random variables depend on each other. Higher numbers indicate higher dependence. Their values are in the range -∞ and +∞. The problem with covariance is that it is hard to compute without performing normalization over the entire dataset, and changing the scale of the data affects covariance.

correlation A statistical measure that determines how strongly two variables are related. Its values range from -1 to +1 and are scale independent.

Source: Expert Insights

4. What is the “P” value?

The P – value is used to determine the hypothesis test. The P-value indicates the minimum significance level at which the null hypothesis can be rejected. The lower the P – value, the more likely you are to reject the null hypothesis.

5. What are parametric and nonparametric models?

parametric model It is parameter bound and only knowledge of the model’s parameters is required to predict new data.

Nonparametric model An unlimited number of input parameters allows more flexibility in predicting new data. All it needs to know to provide predictions is the state of the data and the model parameters.

Tabular representation of differences between parametric and nonparametric models

6. What is the difference between the sigmoid function and the softmax function?

The sigmoid function is used for binary classification methods with only two output classes, while the softmax function is applied to multiclass methods. So it’s clear that the inputs and outputs of both parts are slightly different.

The sigmoid function takes only one input and outputs a single number representing the probability of belonging to class 1 or 2.

On the other hand, the softmax function is vectorized. That is, you receive a vector with as many entries as there are classes. The output vector contains the probabilities of belonging to that class.

Schematic diagram of the activation function, source: Nomidl

7. How can the normality of a dataset be determined?

The easiest way to determine normality is to plot the given data. However, some normality tests also exist:

Shapiro-Wilk test
Anderson-Darling test
Kolmogorov-Smirnov test
Martinez-Iglewicz test
D’Agostino skewness test

8. How can I choose the K value for the K-means clustering algorithm?

K values can be selected in two ways: direct method and statistical test method.

1. Direct method: Includes elbow and silhouette methods

2. Statistical test method: Contains gap statistics

The silhouette method is most frequently used to determine the optimal K value.

9. How can I handle outliers in my dataset?

Outliers are data points that differ significantly from the rest of the dataset. Approaches that can be used to find outliers include box plots, z-scores, and scatterplots.

You can usually handle outliers with the following strategies:

1. The easiest way is to remove outliers

2. They are marked as outliers separately and can be used as different feature vectors

3. You can transform the features instead to reduce the influence of outliers

10. Explain the difference between a loss function and a cost function.

The term loss function can be used when dealing with a single data point, while the term cost function can be used when calculating the sum of the errors of multiple data. So intuitively both terms mean the same thing and there is not much difference between them. So the loss function captures the difference between the actual and predicted values for a single data point, while the cost function sums the differences across the training data.

Machine learning summary

In the first part of this series, we brushed up on the fundamental problems of machine learning that you can expect to face. Having these thorough guys to boost your preparation. In summary, the main points of this article are:

Different categories of machine learning – how and by what criteria can we classify into supervised, unsupervised and reinforcement learning.
We then dealt with how to determine various essential features of data, how to find correlations and covariances, and how to extract important and meaningful inferences from such data. We talked about p-values and LASSO regression, but
Next, we discussed parametric and nonparametric models
The main differences between sigmoid and softmax activation functions were discussed next.
We then discussed the important steps of data normalization and different ways to do it.
We next discussed outliers, another important factor that affects model performance, and detailed different ways to handle them.
And finally, we discussed the difference between a cost function and a loss function. These are the two most common terms you may have used while developing your ML model.

These basic questions should be a great primer to build on in the next few blogs. Please look forward to future parts.

Part 2 of this series covered essential aspects of deep learning and DL. We hope this article adds value to your existing technical know-how on machine learning!

Media shown in this article are not owned by Analytics Vidhya and are used at the author’s discretion.

Get your Machine Learning Basics Right to Crack the Interviews

Introduction to machine learning

1. What are the main types of machine learning algorithms?

2. How can you determine important variables from your dataset?

3. Describe covariance and correlation.

4. What is the “P” value?

5. What are parametric and nonparametric models?

6. What is the difference between the sigmoid function and the softmax function?

7. How can the normality of a dataset be determined?

8. How can I choose the K value for the K-means clustering algorithm?

9. How can I handle outliers in my dataset?

10. Explain the difference between a loss function and a cost function.

Machine learning summary

Related

Top Customer Analytics Interview Questions

Book Your Seats Now For Upcoming DataHour Session(s)

You may also like

Leave a Comment Cancel Reply

About Us

Recent Articles

Featured