William Gosset: Pioneer of modern statistics

Testing new ideas and figuring out better ways of doing things is an everyday process. But how do we know whether a change we implemented is significantly better than the previous method? This is where the t-test comes to our rescue. A t-test compares two sample means to find out whether the difference between them is real or occurred simply by chance. An idea to be tested is called a hypothesis, and the process of testing such ideas is called hypothesis testing.
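To make the comparison of two sample means concrete, here is a minimal sketch using only Python's standard library. It assumes Welch's variant of the t-test (which does not require equal variances); that choice, the function name `welch_t`, and the two small samples are illustrative assumptions, not something taken from the article.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator), each divided by its sample size.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    # Difference of means, scaled by the standard error of that difference.
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Made-up measurements for an "old" and a "new" method (illustration only).
old_method = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
new_method = [11.2, 11.5, 11.0, 11.4, 11.3, 11.1]

t_stat, dof = welch_t(old_method, new_method)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
```

The larger the absolute value of t, the harder it is to explain the gap between the means by chance alone. Turning t into a p-value needs the t-distribution, which this sketch leaves out; a statistics library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) would normally handle that step.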

The t-test is one of the most important statistical tools used to test the…

Source: smbc-comics

Today, in this golden age of being drowned in data every single moment, we are flooded with opinions masquerading as information. False conclusions are disastrous! Here’s Hasan Minhaj’s take on it: https://www.youtube.com/watch?v=icNirsV1rLA

Generally, the bigger the data, the better the results. But the more data we have, the higher the risk of being fooled by randomness into drawing false conclusions. And p-hacking is exactly that.
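One way to see how randomness fools us is to count how quickly false positives pile up when many tests are run on the same data. The sketch below assumes the tests are independent and that each uses a 5% significance level; both are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of `num_tests` independent tests
    looks significant purely by chance, when no real effect exists."""
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 20, 100):
    print(f"{k:>3} tests -> {chance_of_false_positive(k):.0%} chance of a false alarm")
```

With a single test the risk is the familiar 5%, but it grows with every extra comparison we try, which is why testing many things and reporting only the "significant" ones is so misleading.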

But before we delve into the details, let’s understand the basic terms.

Statistics is the science that helps us manage risk in this uncertain world. …

Source: Blindmen-and-Elephant

If you have been in the data science space for a decent amount of time, you will have already realized that ensemble techniques are one of the core strategies for winning Kaggle competitions.

The story of the 2006 Netflix Prize is one of the game-changing tales in AI folklore. The winning entry used an ensemble technique to bag the million-dollar prize.

Before we move on to ensemble techniques, let’s look at the concept of the “wisdom of the crowd”.

The wisdom of the crowd is the idea of collective intelligence popularized by James Surowiecki in his book, The Wisdom of…
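A tiny simulation can illustrate the collective-intelligence idea. It assumes each person's guess is the true value plus independent random noise; the true value, the noise level, and the crowd size are all made-up numbers for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

true_value = 100.0
# 500 noisy individual guesses around the true value (hypothetical crowd).
guesses = [random.gauss(true_value, 20.0) for _ in range(500)]

# Error of the crowd's averaged guess versus the typical individual error.
crowd_error = abs(statistics.mean(guesses) - true_value)
typical_individual_error = statistics.mean([abs(g - true_value) for g in guesses])

print(f"crowd error:              {crowd_error:.2f}")
print(f"typical individual error: {typical_individual_error:.2f}")
```

The averaged guess can never be further from the truth than the average individual error, and when the errors are independent it is usually much closer. Ensembles lean on the same effect by combining the predictions of several models.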

Let’s say we are left in the middle of a mountain range, blindfolded, and need to find the lowest point in the range.

Source : The spine mountain range

One of the most intuitive ways to go about it is to feel the slope of the ground.

From where we are standing, we check all possible directions for the steepest downhill slope and move in that direction.

We take one step at a time, iteratively, until we reach a point where there is no downward slope in any direction, and stop there. …
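The walk described above can be sketched in a few lines. This is a minimal illustration, assuming a smooth bowl-shaped "landscape" whose slope we can compute exactly; the function names, the step size, and the stopping tolerance are hypothetical choices, not values from the article.

```python
def gradient_descent(grad, start, step_size=0.1, tolerance=1e-8, max_steps=10_000):
    """Follow the steepest downhill direction until the ground is flat."""
    x, y = start
    for _ in range(max_steps):
        gx, gy = grad(x, y)
        # Stop when there is (almost) no slope left in any direction.
        if (gx * gx + gy * gy) ** 0.5 < tolerance:
            break
        # The gradient points uphill, so step the opposite way.
        x -= step_size * gx
        y -= step_size * gy
    return x, y


def bowl_gradient(x, y):
    """Slope of the toy landscape f(x, y) = (x - 3)^2 + (y + 1)^2."""
    return 2 * (x - 3), 2 * (y + 1)


lowest = gradient_descent(bowl_gradient, start=(-4.0, 5.0))
print(f"lowest point found near x = {lowest[0]:.4f}, y = {lowest[1]:.4f}")
```

On this single bowl the walk ends at the true lowest point, (3, -1). On a real mountain range with many valleys, the same procedure can stop in whichever valley happens to be nearest, which may not be the lowest one.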

Source : https://wiki.pathmind.com/word2vec

Natural Language Processing (NLP) is a branch of AI that helps machines understand and interpret human language, bridging the gap between human and machine language.

We use the concept of analogies between words to predict a country, given the name of a capital city.
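Here is a minimal sketch of the analogy idea. The three-dimensional vectors are made up by hand for illustration (real word embeddings are learned from text and have hundreds of dimensions), and `predict_country` is a hypothetical helper name.

```python
import math

# Hand-made toy vectors, for illustration only.
embeddings = {
    "Athens": (1.0, 0.1, 1.0),
    "Greece": (0.9, 0.0, 0.0),
    "Cairo": (0.1, 1.0, 0.9),
    "Egypt": (0.0, 0.9, 0.1),
    "Tokyo": (-0.9, -1.0, 1.0),
    "Japan": (-1.0, -1.0, 0.0),
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_country(capital_1, country_1, capital_2):
    """Answer 'capital_1 is to country_1 as capital_2 is to ?'."""
    # Shift capital_2 by the same offset that takes capital_1 to country_1.
    target = tuple(
        c1 - a1 + a2
        for a1, c1, a2 in zip(
            embeddings[capital_1], embeddings[country_1], embeddings[capital_2]
        )
    )
    # Pick the closest word that was not part of the question itself.
    candidates = [w for w in embeddings if w not in (capital_1, country_1, capital_2)]
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))


print(predict_country("Athens", "Greece", "Cairo"))
```

The offset from Athens to Greece roughly encodes "capital of", so adding it to Cairo lands closest to Egypt among the remaining words.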

Word Embedding:

Machine learning and deep learning algorithms generally deal with numeric data. So, to convert text into numbers, the Bag of Words technique was developed to extract numeric features from text. It uses the frequency distribution of words, counting the number of times each word appears in the text, which is also known as the vectorization…
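The word-counting idea can be sketched as follows. This is a minimal illustration that assumes lowercasing and whitespace splitting are enough to tokenize the text; the example sentences and the helper name `bag_of_words` are hypothetical.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn each document into a vector of word counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    # One column per distinct word, in alphabetical order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = [
    "the movie was good",
    "the movie was not good the plot was bad",
]
vocabulary, vectors = bag_of_words(docs)

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each document becomes a row of numbers with one position per vocabulary word, which is the numeric form a learning algorithm can work with. Word order is lost along the way, which is where the "bag" in the name comes from.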

Image Source: Reputationx

Sentiment Analysis:

Sentiment analysis is an NLP technique that allows us to classify whether a text, tweet, or comment is positive, neutral, or negative. Today’s technology enables users to express their emotions and thoughts more openly on social platforms than ever before. So sentiment analysis has become an essential tool for businesses to understand user sentiment, gauge their performance, and tailor their products and services to user needs, thus making systems more efficient.

Logistic Regression:

Logistic regression is a supervised machine learning technique for classification problems. Supervised machine learning algorithms train on a labeled dataset along…

In a world with no perfect solutions, optimizing existing solutions is the only way to progress. But real-life problems often come with a set of constraints, and the Lagrangian function comes to the rescue in such situations. Every business deals with many constraints on a daily basis, such as manufacturing equipment, workforce, and budget. So the goal is to optimize the function within the constraints.

Step 1: Identify the objective function. It represents the goal, such as maximizing the profit or minimizing the error rate.

Step 2: Identify the constraint function. It represents the limitations in the system…
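As a small illustration of the two steps listed so far, consider a toy problem invented for this sketch: maximize f(x, y) = x * y subject to x + y = 10. The objective, the constraint, and all function names are hypothetical. The Lagrangian combines both into L(x, y, lam) = f(x, y) - lam * (x + y - 10), and a constrained optimum must make every partial derivative of L zero.

```python
def objective(x, y):
    """Step 1 (toy example): the quantity we want to maximize."""
    return x * y


def constraint(x, y):
    """Step 2 (toy example): the limit, written so that 0 means 'satisfied'."""
    return x + y - 10.0


def lagrangian_gradient(x, y, lam):
    """Partial derivatives of L = objective - lam * constraint."""
    return (y - lam, x - lam, -constraint(x, y))


# Solving y = lam, x = lam, x + y = 10 by hand gives x = y = lam = 5.
candidate = (5.0, 5.0, 5.0)
gradient_at_candidate = lagrangian_gradient(*candidate)

# Sanity check: scan points that satisfy the constraint and keep the best one.
feasible = [(x / 10, 10 - x / 10) for x in range(0, 101)]
best = max(feasible, key=lambda p: objective(*p))

print("gradient of L at candidate:", gradient_at_candidate)
print("best feasible point on the grid:", best)
```

The hand-solved candidate makes all three derivatives vanish, and the brute-force scan along the constraint agrees that (5, 5) gives the largest value.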

Naïve Machine Translation:

The aim of this project is to translate English words to French using word embedding and vector space models.

Image Source: Fiverr

When we train word embeddings for a vocabulary, the main focus is to optimize the embeddings so that the core meanings of words and the relationships between them are maintained. The idea behind this concept was given by John Rupert Firth in the 1950s: “You shall know a word by the company it keeps” (Firth, J.R., 1957).

It works on the principle that the semantics, or meaning, of a word is mostly captured by the context or its…


Trying to make sense of this world using math and data
