• Premanand S

Machine Learning - An Intuition behind the trending word!

Hey guys! Thanks for the support and motivation on my past blogs. Keep supporting me, encouraging me, and correcting me if I am wrong in any of my views on these topics!! I hope the last blog made the Deep Learning concepts clear; the link there also provides some insights and might be helpful for beginners!


So, in this post we will discuss Machine Learning (ML), my actual area of interest. It's a bit of a bigger blog, but I bet it won't waste your time. Some may even wonder why we still need Machine Learning when we are in the Deep Learning era, given how effective Deep Learning is and the advantages it has over Machine Learning. Some simple reasons: Machine Learning has no need for huge datasets and works well when the dataset is comparatively small, computation (processing) time is reduced, and Deep Learning architectures are a bit complex to understand.

It's normal human psychology again: if we get a better result with less effort, why should we go for more effort! But still, Deep Learning has its own advantages, in both process and accuracy.


Okies! Let's dive into the Machine Learning concept.


Introduction: (Our hero's introduction)


Slightly general intro,

A Machine Learning algorithm (a set of rules to achieve some outcome) can access data (categorical, numerical, image, video or anything) and use it to learn for itself, without any explicit programming (that is, without being given an ordered sequence of steps to follow!). But then how does it work? Simply by observing the data (through instructions meant for observing the pattern and making a decision or prediction).


Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. —Arthur Samuel, 1959

Slightly Engineering intro,

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. —Tom Mitchell, 1997

Slightly Layman intro,

In a very simple explanation, if I need to explain Machine Learning: if we need to teach a child the alphabet, what will we do?

  1. teach by drawing lines, like standing lines, sleeping lines and slanting lines

  2. show images in different formats (a kind of art)

  3. hold their hands and make them write the letter

  4. show some kids' animation songs on YouTube

  5. further, if we want to make them understand more, we will teach them "A for Apple" and so on, either by showing a chart, or real things, etc.

Mostly we will do all these things to make our kiddo learn, right! So why do we do all of them? It's the same thing, but with different methods. The simple answer is that we are making our kiddo observe the pattern for each letter along with the phonics and wording ("A for Apple...") for better understanding. This is what Machine Learning also does: it observes the data (here, the alphabet) for better understanding, which is then used for application purposes!


I hope this picture gives an epic layman's understanding of Machine Learning,



A baby learns to crawl, walk and then run. We are in the crawling stage when it comes to applying Machine Learning - Dave Waters

I hope you got some idea of what Machine Learning is?! If you still need more information, just watch the video below for a better understanding,

https://www.youtube.com/watch?v=ukzFI9rgwfU&list=PLEiEAq2VkUULYYgj13YHUWmRePqiu8Ddy


Why Use Machine Learning?

We are living in an internet era. Whenever we read, watch, monitor, browse, send, transact or buy anything online, the data that is generated gets stored in the cloud (a virtual database we can use anywhere in the world, irrespective of device, like a Gmail account) and will be used for different applications.


Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining (another technology).


Some simple examples of data collected in our day-to-day life are,

  1. health monitoring

  2. weather forecasting

  3. GPS tracking

  4. E-commerce

Doctors can be replaced by software – 80% of them can. I’d much rather have a good machine learning system diagnose my disease than the median or average doctor. —Vinod Khosla

Types of Machine Learning:

On the basis of the amount and type of supervision they get during training, ML can be broadly classified as,



  1. Supervised Learning (data with labels) – Classification and Regression

  2. Unsupervised Learning (data without labels) – Clustering, Anomaly detection, Dimensionality reduction, Association rule learning

  3. Semi-Supervised Learning (data partially with and without labels)

  4. Reinforcement Learning (Learning through rewards and punishment)

Types of Machine Learning - Insights:

Anyway, in the upcoming sessions we will see each algorithm in detail; before that, here is some introduction to the types of ML and their algorithms,


1. SUPERVISED LEARNING:

In supervised learning, you train your model on a labelled dataset, which means we have both the raw input data (some numerical information) and its results (class / label). We split our data into a training dataset and a test dataset, where the training dataset is used to train our model, while the test dataset acts as new data for predicting results or checking the accuracy of our model.
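To make that workflow concrete, here is a minimal sketch in Python with scikit-learn; the built-in Iris dataset and the k-nearest-neighbours model are my own illustrative choices, not something this blog prescribes:

```python
# A minimal supervised-learning sketch: split labelled data, train, test.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)            # raw input data and its labels

# Training set to fit the model, test set to act as unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)                  # learn from labelled examples

predictions = model.predict(X_test)          # predict on unseen data
print("Test accuracy:", accuracy_score(y_test, predictions))
```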


The model performs fast because the training time taken is less, as we already have the desired results in our dataset. The trained model can then predict results on unseen or new data whose target is not known in advance. Under this learning, we have two broader types,

  1. Classification

  2. Regression

CLASSIFICATION


Classification is the process of finding an algorithm which helps divide a dataset into classes based on different parameters (the features, i.e., the columns other than the label column). In classification, a computer program is trained on the training dataset and, based on that training, it categorizes data into the different classes.

Example : Spam mail filter (Spam or Ham classification)

Each message (data point) has a label (two classes: spam or ham). We model (process) the labelled messages with some algorithm, and in the testing part, when we give a new instance (testing data), the algorithm needs to classify / predict the correct label. Some classification algorithms are listed here, followed by a small sketch of the spam example,

  1. Support Vector Machine

  2. Kernel Support Vector Machine

  3. K Nearest Neighbor

  4. Logistic Regression

  5. Decision Tree classification

  6. Random Forest classification

  7. Naive Bayes classifier
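To tie this to the spam / ham example above, here is a hedged sketch using Naive Bayes (algorithm 7 in the list); the four toy messages are made up purely for illustration:

```python
# A tiny spam/ham classifier: word counts + Multinomial Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting at 10 am tomorrow",
            "free lottery ticket claim now", "lunch with the team today"]
labels = ["spam", "ham", "spam", "ham"]      # the label of each message

vectorizer = CountVectorizer()               # turn text into word counts
X = vectorizer.fit_transform(messages)

model = MultinomialNB()
model.fit(X, labels)                         # learn word patterns per class

new_message = vectorizer.transform(["claim your free prize"])
print(model.predict(new_message))            # expected: ['spam']
```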


REGRESSION

Regression is the process of finding correlations between dependent and independent variables. It helps in predicting continuous variables. Regression is the straight opposite of classification (no classifying into labels, but predicting future values).

Example: Share values in the market (prediction), weather forecasting


Here the data values follow a continuous pattern; the model predicts the future but is not used for classification. Some of the algorithms are as follows, with a small sketch after the list,

  1. Simple Linear Regression

  2. Multiple Linear Regression

  3. Polynomial Regression

  4. Support Vector Regression

  5. Decision Tree Regression

  6. Random Forest Regression

  7. Lasso Regression

  8. Ridge Regression

  9. Elastic Net Regression

  10. Logistic Regression (despite its name, mainly used for classification)

  11. LAD Regression (Least Absolute Deviation)
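As a quick illustration of regression, here is a hedged sketch of simple linear regression (algorithm 1); the house-size and price numbers are made up purely for illustration:

```python
# Fit a straight line relating house size (sq ft) to price.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[650], [800], [1100], [1400], [1800]])   # one feature
y = np.array([70000, 85000, 115000, 140000, 175000])   # continuous target

model = LinearRegression()
model.fit(X, y)                              # learn slope and intercept

print("Price per extra sq ft:", model.coef_[0])
print("Predicted price for 1200 sq ft:", model.predict([[1200]])[0])
```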

2. Unsupervised Learning:

In unsupervised learning, the information used to train is neither classified nor labelled in the dataset. Unsupervised learning studies how systems can infer a function to describe a hidden structure from unlabelled data. The main task of unsupervised learning is to find patterns in the data.


Once a model learns to develop patterns, it can easily predict patterns for any new dataset in the form of clusters. The system doesn't figure out the right output, but it explores the data and can draw inferences from datasets to describe hidden structures in unlabelled data. To be precise, it's like joining college for the first time: we don't know who is good and who is bad, but as the days pass we come to know about each person and make up our minds to form groups, like a gang called friends, good people, bad people and the studious (padips gang) group. This learning is broadly classified as,

  1. Clustering

  2. Anomaly detection

  3. Dimensionality reduction

  4. Association Rule

Clustering

Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to those in other groups. In simple words, the aim is to segregate groups with similar traits and assign them into clusters.


Some of the algorithms are listed here, with a small sketch after the list,

  1. K Means Clustering

  2. Hierarchical clustering
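Here is a hedged K-Means sketch (algorithm 1) on tiny two-dimensional toy points, made up so that two groups are visually obvious:

```python
# Group unlabelled 2-D points into two clusters with K-Means.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [2, 3],       # one group near the origin
              [9, 8], [10, 9], [9, 10]])    # another group far away

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)                                # note: no labels are given

print("Cluster assignments:", kmeans.labels_)
print("Cluster centers:\n", kmeans.cluster_centers_)
```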

Anomaly detection

Anomaly detection is the process of identifying unexpected items or events in datasets, ones which differ from the norm. Anomaly detection is often applied to unlabelled data, which is known as unsupervised anomaly detection. It rests on two basic assumptions (a small sketch follows the list):

  1. Anomalies occur only very rarely in the data.

  2. Their features differ significantly from those of normal instances.
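A hedged sketch of unsupervised anomaly detection; Isolation Forest is my own choice of algorithm here (the blog does not name one), and the numbers are made up so that one point is an obvious outlier:

```python
# Flag the point that differs from the norm with Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.array([[10.1], [9.8], [10.3], [10.0], [9.9], [55.0]])  # 55.0 is odd

detector = IsolationForest(contamination=0.2, random_state=0)
labels = detector.fit_predict(X)             # 1 = normal, -1 = anomaly

print(labels)                                # the 55.0 point should be -1
```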


Association Rule learning

The goal is to dig into large amounts of data and discover interesting relations between attributes. For example, suppose you own a supermarket. Running an association rule algorithm on your sales logs may reveal that people who purchase basmati rice and chicken masala powder also tend to buy ghee (a biryani combination). Some algorithms are listed here, with a small sketch after the list,

  1. Apriori

  2. Eclat
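A hedged sketch of the supermarket example using Apriori (algorithm 1); it assumes the third-party mlxtend library, which the blog does not mention, and the four transactions are made up:

```python
# Mine "people who buy X also buy Y" rules from toy sales logs.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

transactions = [
    ["basmati rice", "chicken masala", "ghee"],
    ["basmati rice", "chicken masala", "ghee", "onions"],
    ["bread", "butter"],
    ["basmati rice", "chicken masala"],
]

te = TransactionEncoder()                    # one-hot encode the baskets
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "confidence"]])
```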

Dimensionality Reduction

The goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one. Some algorithms are listed here, with a small sketch after the list,

  1. Principal Component Analysis

  2. Linear Discriminant Analysis

  3. Kernel PCA
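A hedged PCA sketch (algorithm 1): two strongly correlated synthetic features are merged into one component with almost no information lost:

```python
# Compress two correlated features into a single principal component.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.1, size=100)   # nearly a copy of x1
X = np.column_stack([x1, x2])

pca = PCA(n_components=1)                    # keep just one component
X_reduced = pca.fit_transform(X)

print("Explained variance ratio:", pca.explained_variance_ratio_)
```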

3. Semi-Supervised Learning

This is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data). Some of the algorithms are (a small sketch follows the list),

  1. Semi-supervised Generative Adversarial Network
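A semi-supervised GAN is too heavy for a snippet, so this hedged sketch swaps in scikit-learn's simpler LabelPropagation instead: a few labelled points spread their labels to the many unlabelled ones (marked with -1):

```python
# Spread two known labels across mostly-unlabelled toy data.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[1.0], [1.2], [0.9], [8.0], [8.3], [7.9]])
y = np.array([0, -1, -1, 1, -1, -1])         # -1 means "no label given"

model = LabelPropagation()
model.fit(X, y)                              # labels flow to nearby points

print(model.transduction_)                   # inferred labels for all points
```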


4. Reinforcement Learning

Reinforcement learning is a Machine Learning approach that allows software agents and machines to automatically determine the ideal behavior within a specific context so as to maximize their performance. It does not have a labelled dataset or results associated with the data, so the only way to perform a given task is to learn from experience.


Consider dog training as an example. The goal of reinforcement learning in this case is to train the dog (agent) to complete a task within an environment, which includes the surroundings of the dog as well as the trainer. First, the trainer issues a command or cue, which the dog observes (observation). The dog then responds by taking an action. If the action is close to the desired behavior, the trainer will likely provide a reward, such as food or a ball; otherwise, no reward or a negative reward will be provided (such as no food, a scolding, or being tied up in a corner). At the beginning of training, the dog will likely take more random actions, like rolling over when the command given is "sit", as it is trying to associate specific observations with actions and rewards. This association, or mapping, between observations and actions is called the policy. Some of the algorithms are (a small sketch follows the list),

  1. Upper Confidence Bound

  2. Thompson Sampling
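Here is a hedged sketch of Upper Confidence Bound (algorithm 1) on a toy multi-armed bandit; the hidden reward probabilities are made up, and the agent must learn the best arm purely from experience:

```python
# UCB1: balance trying every arm (exploration) with the best one (exploitation).
import math
import random

true_reward_prob = [0.2, 0.5, 0.8]           # hidden payout rate per arm
counts = [0] * 3                             # times each arm was pulled
values = [0.0] * 3                           # running mean reward per arm

for t in range(1, 1001):
    # Choose the arm with the highest upper confidence bound
    ucb = [values[a] + math.sqrt(2 * math.log(t) / counts[a])
           if counts[a] > 0 else float("inf") for a in range(3)]
    arm = ucb.index(max(ucb))

    reward = 1 if random.random() < true_reward_prob[arm] else 0  # environment
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # update running mean

print("Pull counts per arm:", counts)        # the 0.8 arm should dominate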


How Does Machine Learning Work?

In a simple way, Machine Learning works in the flow shown below; the steps in each stage will be explained in the upcoming blogs,

Main Challenges of Machine Learning:

Every technology, if it is to work properly, needs to overcome some challenges; here it's not about the technology itself but about the algorithms and the data,

  1. Insufficient Quantity of Training Data (for a simple algorithm to work properly and efficiently, we need at least thousands of samples)

  2. Non-representative Training Data (data that does not represent the cases we want to generalize to interrupts the efficiency and the modelling)

  3. Poor-Quality Data (data containing outliers, errors and noise will surely affect the accuracy)

  4. Irrelevant Features (garbage in, garbage out: if we process irrelevant data, it affects the efficiency of the system)

  5. Over-fitting the Training Data (an algorithm that models the training data too well but fails to generalize; see the sketch after this list)

  6. Under-fitting the Training Data (an algorithm that can neither model the training data nor generalize to new data - a useless model for that particular dataset)

  7. Testing and Validating

  8. Hyperparameter Tuning and Model Selection (parameter tuning makes for the best-performing model)
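As promised under challenge 5, here is a hedged sketch of how over-fitting shows up in practice; the dataset and the unconstrained decision tree are my own illustrative choices:

```python
# An unlimited-depth tree memorizes the training data: train accuracy ~1.0,
# test accuracy noticeably lower - the signature of over-fitting.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0)    # no depth limit at all
tree.fit(X_train, y_train)

print("Train accuracy:", tree.score(X_train, y_train))  # close to 1.0
print("Test accuracy:", tree.score(X_test, y_test))     # noticeably lower
```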

A technology whose foundation was laid in the 17th century and which is still trending in the year 2020 is no ordinary thing (History of ML):

The Machine Learning concept did not start just now; its foundation was laid back in the 17th century itself. History is important here because we need to know what motivated people to discover this trending technology; just understanding the mathematics and the algorithms alone won't be sufficient. History makes us think better about finding new things, so if you are interested, please go through it, or just skip to the next topic; you can see the reality of the technology and how its evolution took place.


For easy understanding in history, it is separated as groundwork, theory to reality and modern ML,


Laying Groundwork,

  1. In 1642, French teen builds the first mechanical calculator (Pascaline)

  2. In 1679, the modern binary system is born (German mathematician, philosopher and poet Gottfried Wilhelm Leibniz) – laying the foundation for modern computing

  3. In 1770 – ‘The Turk’ - A chess playing automaton debuts, then dupes Europe for decades

  4. In 1834 – Father of Computer (Charles Babbage) invents punch card programming

  5. In 1842 – Ada Lovelace’s algorithm makes her the world’s first computer programmer

  6. In 1847 – A mystic’s algebra makes CPUs possible more than a century before they are invented

  7. In 1927 - AI debuts on the silver screen (Metropolis) thinking machine

  8. In 1936 - Alan Turing conceives his "Universal Machine"

From theory to reality,

  1. In 1943 – A human ‘neural network’ is modeled with electrical circuits

  2. In 1952 – A computer improves its checker game (Arthur Samuel created a program that helped an IBM computer get better at checkers)

  3. In 1959 – A neural network learns to make phone calls clearer (Stanford’s MADALINE – used to reduce the echoes over phone lines)

  4. In 1968 – Kubrick’s ‘2001: A Space Odyssey’ sets a high bar for computer intelligence (Kubrick consulted Marvin Minsky of MIT’s Artificial Intelligence Lab)

  5. In 1979 – the Stanford cart takes a slow but significant spin

  6. In 1982 – movie audiences meet ‘Blade Runner’s’ replicants

  7. In 1985 - NETtalk teaches itself to pronounce new words (Terry Sejnowski and Charles Rosenberg)

  8. In 1997 – IBM’s Deep Blue beats a chess champion

  9. In 1999 – Computer-aided diagnosis catches more cancers (developed at the University of Chicago, it reviewed 22,000 mammograms and detected cancer 52% more accurately than radiologists)

Modern ML takes shape,

  1. In 2006, Neural Net research gets a reboot as ‘Deep Learning’ (When his field fell off the academic radar, computer scientist Geoffrey Hinton rebranded neural net research as “deep learning.” Today, the internet’s heaviest hitters use his techniques to improve tools like voice recognition and image tagging)

  2. In 2009, BellKor’s Pragmatic Chaos nets the $1M Netflix prize (In 2006, Netflix offered $1M to anyone who could beat its algorithm at predicting consumer film ratings. The BellKor team of AT&T scientists took the prize three years later, beating the second-place team by mere minutes)

  3. In 2011, Watson computer wins at Jeopardy (Though not a perfect player, IBM’s Watson did manage to outwit two Jeopardy! champions in a three-day showdown. Plans for this technology include powering a computerized doctor’s assistant.)

  4. In 2012, Google Brain detects human faces in images (A neural network created by Google learned to recognize humans and cats in YouTube videos — without ever being told how to characterize either. It taught itself to detect felines with 74.8% accuracy and faces with 81.7%)

  5. In 2014, Chat-bot “Eugene Goostman” passes the Turing test (Devised by cryptanalyst Alan Turing in 1950, this test requires a machine to fool a person into thinking it’s human through conversation. Sixty years to the day after Turing’s death, a chat-bot convinced 33% of human judges that it was a Ukrainian teen)

  6. In 2014, Computers help improve the ER experience (Health-tech began using event simulation to predict ER wait times based on data like staffing levels, medical histories, and hospital layouts. These predictions help hospitals reduce the wait, a key factor in better patient outcomes)

  7. In 2015, A computer wins at the world’s hardest board-game (Google’s AlphaGo was the first program to best a professional player at Go, considered the most difficult board game in the world. With this defeat, computers officially beat human opponents in every classical board game)

  8. In 2015, Machine and Humans paired up to fight fraud online (When PayPal set out to fight fraud and money laundering on its site, it took a hybrid approach. Human detectives define the characteristics of criminal behavior, then a machine learning program uses those parameters to root out the bad guys on the PayPal site)

  9. In 2016, Read my lips, LipNet (Kubrick’s fictional HAL 9000 could read lips in 2001. It would take an Oxford team a little longer, but the results were no less impressive. This artificial-intelligence system identified lip-read words with an accuracy of 93.4%)

  10. In 2016, Natural Language Processing gives life to digital personal shopper (The North Face became the first retailer to use IBM Watson’s natural language processing in a mobile app. The Expert Personal Shopper helps consumers find what they’re looking for through conversation, just as a human sales associate would)

  11. In 2017, a machine learns how to stop online trolling (As part of its anti-harassment efforts, Alphabet’s Jigsaw team built a system that learned to identify trolling by reading millions of website comments. The underlying algorithms could be a huge help for sites with limited resources for moderation.)

Applications of ML

There are numerous applications of ML that we use in day-to-day life,

  1. Virtual Personal Assistant (Siri, Alexa)

  2. Social Media Services (People you may know, Face Recognition)

  3. Email Spam and Malware Filtering

  4. Online Customer Support

  5. Product Recommendation (Amazon, Flipkart)

  6. Online Fraud Detection


The video below shows the top 10 applications in the field of Machine Learning,

https://www.youtube.com/watch?v=HKcO3-6TYr0


Must-have ML engineer skills that will get you hired,

  1. Programming Language: Python, R, SQL

  2. Mathematics: Calculus, Linear Algebra, Matrices, Statistics, Probability Theory, Optimization, Graph Theory

  3. Neural Network Architecture

  4. Language Processing: Audio & Video Analysis

  5. Industry Knowledge (Business & Profits)

  6. Effective Communication (Presentation skills, Attention and Focus)

  7. Rapid Prototyping (Ideas, A/B testing)

  8. Keep Updating (Papers, R&D)

Some important tools for Machine Learning are,


Some of the best ML online courses are,

  1. Free Machine Learning Course (fast.ai)

  2. Machine Learning Course by Stanford University (Coursera)

  3. Deep Learning Course (deeplearning.ai)

  4. Machine Learning A-Z: Hands-On Python & R in Data Science (Udemy)

  5. Free Machine Learning Data Science Course (Harvard University)

Datasets to practice

Datasets play an important role in Machine Learning; if we don't have a good dataset, whatever algorithm we use will only be a waste. We can come across many web links for getting datasets, and they play a vital role in processing; hence, below are a few globally approved links,

  1. UC Irvine Machine Learning Repository

  2. Kaggle datasets

  3. Amazon’s AWS datasets

  4. Wikipedia’s list of Machine Learning datasets

  5. http://dataportals.org/search

  6. http://opendatamonitor.eu/

  7. https://www.quandl.com/

  8. https://www.reddit.com/r/datasets

Some of the famous Machine Learning projects you should try are,

  1. House Price Prediction project

  2. Stock Price Prediction project

  3. EMAIL spam detection project

  4. UBER data analysis project

  5. IRIS flowers classification project

  6. Credit card fraud detection project

  7. Movie recommendation system project

  8. RGB color detection project

  9. Customer sentiment analysis project

  10. Speech Emotion Recognition project

Important libraries for Machine Learning in Python:

A library in a school or university makes it comfortable for us to share knowledge. Likewise, as technology gets better each day, programming is also getting easier through libraries; some of them are as follows, with a typical import preamble sketched after the list,

  1. Numpy

  2. Matplotlib

  3. Scikit-learn

  4. Lightgbm

  5. Pandas

  6. Seaborn

  7. Statsmodels

  8. Scipy

  9. XGBOOST

  10. NLTK
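Following on from the list above, here is a hedged sketch of a typical import preamble using these libraries; the aliases are community conventions, not requirements:

```python
# The usual first cell of a Machine Learning notebook in Python.
import numpy as np                 # numerical arrays and linear algebra
import pandas as pd                # tabular data loading and wrangling
import matplotlib.pyplot as plt    # basic plotting
import seaborn as sns              # statistical plots on top of matplotlib
from scipy import stats            # scientific computing and statistics
from sklearn.model_selection import train_test_split  # scikit-learn utility

# Heavier, task-specific libraries are imported only when needed, e.g.:
# import xgboost, lightgbm, statsmodels.api, nltk
```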

Machine Learning Vs Deep Learning:

  1. On data: excellent performance on small datasets Vs needs large datasets

  2. On hardware: can work on a low-end / normal-specification machine Vs needs a powerful system with a GPU

  3. On features: careful feature extraction is a must Vs no need to hand-craft features

  4. On execution time: a bit slow (minutes to hours) depending on the algorithm and dataset Vs very slow (maybe a few days to a week) depending on the dataset size

  5. On interpretability: a few algorithms can be interpreted, but many cannot Vs difficult to interpret


In order to proceed with Machine Learning, the road-map is given below,

I have given a very brief but insightful introduction to Machine Learning in this blog (to my knowledge), and I hope you are also convinced by the information. Enjoy reading! Share it with your friends if you find it useful! Sharing knowledge is another kind of greatest service! Love more! Learn more! Happy days ahead!


