A Glimpse of Machine Learning
An introduction to the steps involved in Machine Learning
Hey guys! Thanks for the overwhelming support, detailed feedback and motivation on the past blogs, especially the ones on Machine Learning. Keep supporting and encouraging me, and correct me if I have misunderstood any of the topics! This is not one-sided knowledge sharing. I am viewing an object (here, Machine Learning) from my side; my observation may be correct, or it may not be as accurate as you expect, so let's learn together. Learning always helps us grow wiser.
REASON for blogging....
I went through many blogs, videos, books and websites, each covering a different step or piece of code in Machine Learning, to make things clear for myself. While exploring, I asked myself (being lazy, literally): for every concept, why do we need to go through so many web links, so much googling and all that? What if I could see all the concepts and the coding part in one place, like the notes we wrote for each lesson at school? That motivated me to write these notes as a blog. (Again, they reflect my level of understanding only; they are not THE best notes, just a kind of e-notes for me, which may be useful for someone as lazy as I am.)
Why Still Machine Learning?!...
Secondly, many may ask: when Deep Learning plays such a vital role in recent applications, why still Machine Learning? The simple answer is that each technology has its own strengths, which the other cannot simply replace. Machine Learning can work well with a small dataset, whereas Deep Learning usually needs much more data, or else the over-fitting problem appears (if you read the same 10 questions again and again you can answer them, but if anything outside those questions is asked you can't answer well). In addition, depending on the dataset, execution time is often shorter than with Deep Learning, the algorithms are easier to interpret, and there is no need for specialised hardware such as a GPU (Graphics Processing Unit).
Glimpse of Machine Learning.....
In this blog, we take a glimpse at the overall steps involved in Machine Learning. Many websites and people share different road-maps for writing Machine Learning code, and these differ slightly depending on usage and application. Beginners usually want a basic understanding at both the programming level and the layman's conceptual level, but it is rare to find both in the same place. The upcoming blogs will explain each topic in depth. Here we are drawing the blueprint (a kind of table of contents) of Machine Learning before starting construction (understanding each topic in a flow). Put another way, we are breaking a complex structure into simple pieces (to the best of my ability, I hope) so that each step and its importance become clear.
“In preparing for battle I have always found that plans are useless, but planning is indispensable”
To develop a Machine Learning application, we need to decide on three basics before writing any code:
Languages: you can choose any one of these languages (Python, R, MATLAB, Octave, Julia, C++, C); each has its own advantages
Integrated development environments (IDEs): again, it is up to the user's interest (RStudio, PyCharm, IPython / Jupyter Notebook, Spyder, Rodeo, Google Colab; the Anaconda distribution bundles several of these). With an IDE, coding becomes fancier and easier too
Platform: where the Machine Learning application will be deployed (IBM, Microsoft Azure, Google Cloud, Amazon Web Services, MLflow)
The following are the steps in Machine Learning, which will be discussed in depth in the upcoming blogs:
Data Collection or Acquisition:
The very first step is getting the information (data), whether for academic practice or for research that is useful to society.
For academic purposes, there are many websites where different datasets are already stored for practice.
The other type is research on social problems (for example, diagnosis in the healthcare domain). For that we use sensors or devices such as ECG, PPG or any other medical device for data acquisition, then apply some signal processing to obtain the dataset.
There is no life without oxygen; similarly, there won't be any program (w.r.t. Machine Learning) without libraries. In this technical world you would not want to write 1000's of lines of code for a small application or operation (you still can, of course, but the time is better spent on the main part of the program than on re-implementing basic functionality). There are many libraries that solve this problem and replace 100's or 1000's of lines with very few.
Loading Datasets (.csv, .json, .xlsx, .xml, .docx, .txt, .pdf, images (.png, .jpg), .mp3, .mp4 )
Nowadays Machine Learning is applied very widely, so the word DATA covers an enormous variety of types. In simple words, for a time series application the data will typically be numeric (.csv format); for an image processing application the data will be images (.png, .jpg or others); and the list goes on...
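As a rough illustration of loading tabular data, here is a minimal sketch using pandas. The CSV content and column names are made up for the example; in practice you would pass the path of your own file (for instance a hypothetical "data.csv") to the same function.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "age,height_cm,label\n25,170,yes\n32,165,no\n41,180,yes\n"

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column names read from the header line
```

pandas has similar readers for other formats in the list above, such as read_json and read_excel; images, audio and video need their own libraries.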
Pre-processing and Exploratory Data Analysis (EDA)
Machine Learning is not just about algorithms; it is the art of understanding the data. We need to understand the data before we process it. As an easy analogy, pre-processing is like the make-up before a wedding (the wedding being modelling in Machine Learning): we need to analyse the data and do whatever is needed before modelling it, such as the following:
Viewing the data (.head(), .tail(), .shape, .columns)
Kind of data (.info())
Basic summary statistics (.describe())
Target / label details (.unique(), .value_counts()) - for checking whether the classification problem is imbalanced
Checking and processing missing data (.isnull().sum(), SimpleImputer)
Outlier detection (Univariate outlier: the box-plot rule, Grubbs' test; Multivariate outlier: Mahalanobis distance, Cook's distance etc...)
Skewness of data (log transformation, square root transformation, Box-Cox transformation) and kurtosis of data
Correlation between features (different variables) (.corr())
Separation of datasets (dependent and independent variables)
Encoding categorical data for both dependent and independent variables (LabelEncoder, OneHotEncoder, pd.get_dummies)
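Several of the checks above can be sketched in a few lines. This is only a sketch: the toy DataFrame, its column names and the choice of mean imputation are assumptions made for the example, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy dataset with one missing value and one categorical column.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0],
    "city": ["Chennai", "Delhi", "Chennai", "Mumbai"],
    "label": ["yes", "no", "yes", "yes"],
})

print(df.head())                   # first rows of the data
df.info()                          # column types and non-null counts
print(df.describe())               # summary statistics of numeric columns
print(df["label"].value_counts())  # class balance of the target

# Count missing values per column before fixing them.
missing_per_column = df.isnull().sum()

# Replace the missing age with the column mean (an assumed strategy).
imputer = SimpleImputer(strategy="mean")
df[["age"]] = imputer.fit_transform(df[["age"]])

# Separate independent variables (X) from the dependent variable (y).
X = df.drop(columns="label")
y = df["label"]

# One-hot encode the categorical feature.
X_encoded = pd.get_dummies(X, columns=["city"])
print(X_encoded.columns.tolist())
```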
Exploratory Data Analysis (EDA), in short, adds visual representation on top of the mathematics. I am the biggest fan of the Sherlock Holmes movies, so let me explain EDA from that point of view. When a murder happens, the police bring in a suspect for investigation. Before anyone interrogates him, our hero Sherlock Holmes starts explaining why he or she is (or is not) the culprit; he can tell a lot about the person through his sheer power of observation. In the same way, we can say a lot about the data and the relationships among the features before modelling and feeding the data to algorithms. Some of the common plots are jotted below:
Relationship (Scatter plot, Marginal Histogram, Pair Plot, Heat Map)
Data over time (Line chart, Area chart, Stack area chart, Area chart unstacked)
Ranking (Vertical bar chart, Horizontal bar chart, Multi set bar chart, Stack bar, Lollipop chart)
Distribution (Histogram, Density curve with histogram, Density plot, Box plot, Strip plot, Violin plot, Population pyramid)
Comparison (Bubble chart, Bullet chart, Pie chart, Net pie chart, Donut chart, Tree map, Diverging bar, Choropleth map, Bubble map)
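To give a feel for two of the plot families listed above, here is a small sketch with matplotlib that draws a scatter plot (relationship) and a histogram (distribution). The data is randomly generated for the example, and the figure is written to an in-memory buffer only so that the snippet runs without a display; normally you would call plt.show() or save to a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y depends linearly on x, plus some noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Relationship between two variables.
ax_scatter.scatter(x, y, s=8)
ax_scatter.set_title("Relationship: scatter plot")

# Distribution of a single variable.
ax_hist.hist(x, bins=20)
ax_hist.set_title("Distribution: histogram")

# Save the figure as PNG bytes in memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```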
Splitting of Datasets
Once all the pre-processing and EDA are done, the data is ready for the next step. Here we split the data into training and testing sets for the algorithm. Whatever we do with the data, we need to check how well the model performs on data it has not seen, and that needs a separate set of data. That's why we split the single dataset into training and testing sets by some percentage.
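A minimal sketch of the split using scikit-learn's train_test_split is shown here. The toy arrays, the 80/20 ratio and the random_state value are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, and a balanced binary label.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the rows for testing; stratify keeps the class ratio
# the same in both parts, and random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```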
There is a famous quote, "Equality is not in regarding different things similarly, equality is in regarding different things differently". There may be 100's of meanings behind it, but don't worry about them. The reason I quote it here is that we will have different features (variables) with very different ranges of variation, and in order to process them we should bring them to the same reasonable range, as shown in the figure above.
Normalization (min-max, scaling to unit length, logarithmic scale) and standardization (z-score)
Feature transformation (Scaling (MinMaxScaler, StandardScaler, Normalizer, RobustScaler), Discretization, Binning)
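The difference between standardization and min-max scaling is easiest to see on a tiny example. The sketch below uses a made-up array whose two columns have very different ranges.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Standardization (z-score): each column gets mean 0 and standard deviation 1.
X_std = StandardScaler().fit_transform(X)

# Min-max normalization: each column is rescaled to the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

print(X_std)
print(X_minmax)
```

In a real project the scaler is usually fitted on the training set only and then applied to the test set with transform, so that no information from the test data leaks into training.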
As the saying goes, "If you put garbage in, you will only get garbage to come out". Not all the features in the dataset will be useful for modelling; some may be redundant or uninformative. So, with respect to the target / label variable, we need to remove the features that are not useful. It is like an HR process: not every candidate who appears for an interview is selected; he / she has to go through some process before selection. Some of these processes are:
Random forest feature importance, SelectFromModel, variance threshold, correlation threshold, Pearson's correlation (heat map), chi-squared, ANOVA F-value, maximal information coefficient (MIC)
Wrapper based – forward search, backward search, recursive feature elimination (RFE) (rfe.support_, rfe.ranking_)
Sequential feature selector (SFS, SBS, SFFS, SBFS)
Embedded methods – lasso regularization in linear regression, feature importances from random forest and gradient boosting machine (GBM) (SelectKBest, by contrast, is a univariate filter)
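Here is a small sketch of three of the techniques above: a variance threshold, a univariate ANOVA F-test with SelectKBest, and recursive feature elimination. The synthetic dataset (one informative feature, one noise feature, one constant feature) and the choice of logistic regression as the RFE estimator are assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data: the label depends only on the first feature.
rng = np.random.default_rng(0)
n_samples = 200
informative = rng.normal(size=n_samples)
noise = rng.normal(size=n_samples)
constant = np.ones(n_samples)
X = np.column_stack([informative, noise, constant])
y = (informative > 0).astype(int)

# Variance threshold: drop features that never change.
variance_filter = VarianceThreshold(threshold=0.0)
X_var = variance_filter.fit_transform(X)
print(variance_filter.get_support())

# Univariate filter: keep the feature with the highest ANOVA F-value.
k_best = SelectKBest(score_func=f_classif, k=1).fit(X_var, y)
print(k_best.get_support())

# Wrapper: recursive feature elimination around a simple classifier.
rfe = RFE(LogisticRegression(), n_features_to_select=1).fit(X_var, y)
print(rfe.support_, rfe.ranking_)
```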
"Not everything that can be counted counts and not everything that counts can be counted"
What if we have many features in our dataset and each has at least a minimum level of usefulness, so that we do not want to eliminate any of them before the algorithmic step? Is there a way? Yes: dimensionality reduction makes use of all the features without discarding any of them outright. Still confused? The main purpose of dimensionality reduction is to find a low-dimensional representation of the data that retains as much information as possible. Some popular techniques are:
Principal Component Analysis (PCA)
Linear Discriminant Analysis (LDA)
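As a sketch of the idea, the example below applies PCA to three synthetic features that are almost copies of one another, so a single component can retain nearly all of the variance. The data and the choice of one component are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Three strongly correlated features built from one hidden variable,
# plus a little independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([base, 2 * base, -base]) + rng.normal(scale=0.05, size=(100, 3))

# Project the 3 features onto 1 principal component.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)  # share of variance kept by the component
```

PCA is unsupervised and ignores the labels, whereas LDA uses the class labels to find the directions that best separate the classes.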
After all this processing of the data, the main step is to fit a model that classifies, predicts, or (for unsupervised learning) finds structure, according to the application.
Regression (Simple Regression, Multiple Linear Regression, Support Vector Regression, Decision tree, Random Forest, Ridge Regression, Lasso Regression, Polynomial Regression, Elastic Net Regression)
Classification (K nearest neighbor, Decision tree, Random Forest, Support Vector Machine, Naïve Bayes, Stochastic Gradient Descent, Logistic Regression)
Ensemble learning (Boosting, Bagging, Stacking)
Hyperparameter tuning – Algorithm parameters, Grid search: GridSearchCV / RandomizedSearchCV / Bayesian Hyperparameter Optimization
Cross validation - holdout method, LOOCV, k fold CV, stratified k fold CV, leave P out CV, CV for time series
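Modelling, cross validation and hyperparameter tuning often go together. The sketch below uses the iris dataset that ships with scikit-learn and a K nearest neighbor classifier; the dataset, the candidate values for n_neighbors and the 5 folds are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Stratified k-fold keeps the class proportions the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Cross validation of one fixed model: one accuracy score per fold.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=cv)
print(scores.mean())

# Grid search: try every candidate value and keep the best by CV score.
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=cv,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```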
Supervised learning (classification and regression algorithms) follows almost the same steps except for the algorithmic part. Unsupervised learning is useful when you want to explore data without a specific goal such as prediction or classification. It infers patterns from a dataset without labels or outcomes.
Clustering (K Means clustering, K Medoids, Hierarchical clustering, Self Organizing Map)
Association Rule learning (Apriori, Eclat)
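A minimal clustering sketch with K Means is given below. The two well separated blobs are generated for the example, and the number of clusters is assumed to be known in advance, which is often not the case with real data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic groups of points, far apart from each other.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# No labels are given; K Means groups the points by distance alone.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)
```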
Another interesting type of learning is Reinforcement learning, which is quite different from supervised and unsupervised algorithms. It simply learns from experience:
Upper Confidence Bound
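To show what learning from experience can look like, here is a sketch of the UCB1 variant of Upper Confidence Bound on a simulated multi-armed bandit. The function name ucb1, the three success rates and the number of rounds are all made up for the example; other UCB variants use a different exploration bonus.

```python
import math
import random


def ucb1(true_rates, n_rounds, seed=0):
    """Play a simulated bandit with UCB1 and return how often each arm was chosen."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms    # times each arm was played
    totals = [0.0] * n_arms  # sum of rewards collected from each arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Play every arm once so each has an initial estimate.
            arm = t - 1
        else:
            # Choose the arm with the highest average reward plus
            # exploration bonus; the bonus shrinks as an arm is played more.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # Simulated environment: reward 1 with the arm's hidden success rate.
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    return counts


counts = ucb1([0.2, 0.5, 0.8], n_rounds=2000)
print(counts)
```

The algorithm is never told which arm is best; it finds out by trying the arms and gradually playing the most rewarding one more often.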
And finally, the model we created is verified using statistical measures such as the following, which help decide whether our model is meaningful or not:
Regression - MAE, MSE, RMSE, R², Adjusted R², AIC, AICc, BIC, Mallows's Cp; Errors: MSPE, MAPE, (R)MSLE, MASE
Classification - AUC ROC curve, accuracy_score, precision, recall, f1, confusion matrix (2 class and multi), sensitivity, specificity, logarithmic loss
Accuracy alone does not justify our model. Each metric tells us something different about the model we created, so we need to consider all of them when analysing the application.
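A small worked example shows why accuracy should not be read alone. The true and predicted labels below are made up: the model gets 7 of 10 predictions right, yet it finds only half of the positive cases.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

# For binary labels the flattened confusion matrix is ordered tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)    # (tp + tn) / all samples
precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn), i.e. sensitivity
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```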
I hope I have jotted down most of the steps involved in Machine Learning with a basic level of explanation; depending on the application, some steps may be added or left out. In the upcoming series we will look at these steps in more depth from a programming point of view. If I have missed anything, please mention it so that I can include it in the upcoming blogs for clarity and better understanding!
“A clear vision, backed by definite plans, gives you a tremendous feeling of confidence and personal power”
Happy Learning! Stay safe! Stay happy!
"Never stop learning because life never stops teaching!"