Testing new ideas and figuring out better ways of doing things is an everyday process. But how do we know whether the change we implemented is significantly better than the previous method? Here’s where the t-test comes to our rescue. A t-test compares two sample means to find out whether the difference between them is real or occurred simply due to chance. An idea we want to test is called a hypothesis, and the process of testing such ideas is called hypothesis testing.
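To make the comparison of two sample means concrete, here is a minimal sketch that computes the t statistic by hand. It assumes Welch's variant of the t-test (which does not require equal variances), and the two samples are made-up numbers for illustration only. The function name `welch_t` is hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance uses the sample (n - 1) denominator.
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Squared standard error contributed by each sample.
    se_a, se_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Made-up measurements (say, task times) under the old and the new method.
old_method = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
new_method = [11.2, 11.5, 11.0, 11.4, 11.6, 11.1]

t_stat, dof = welch_t(old_method, new_method)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large absolute t value suggests the difference in means is unlikely to be due to chance alone. Turning it into a p-value requires the t distribution with `dof` degrees of freedom, which a statistics library can provide.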

Source: smbc-comics

Today, in an age where we are drowning in data every single moment, we are flooded with opinions masquerading as information. False conclusions are disastrous! Here’s Hasan Minhaj’s take on it: https://www.youtube.com/watch?v=icNirsV1rLA

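Generally, the bigger the data, the better the results. On the other hand, the more data we have, the higher the risk of being fooled by randomness into drawing false conclusions. And p-hacking is just that: slicing the data in many ways until some result looks significant by chance.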

But before we delve into the details, let’s understand the basic terms.

Source: Blindmen-and-Elephant

If you have been in the data science space for a decent amount of time, you will already have realized that ensemble techniques are one of the core strategies for winning Kaggle competitions.

The story of the 2006 Netflix Prize is one of the game-changing tales in AI folklore. The winning entry used an ensemble technique to bag the million-dollar prize.

Before we look into ensemble techniques, let’s look at the concept of the “wisdom of the crowd”.

Let’s say we are left in the middle of a mountain range, blindfolded, and need to find the lowest point in the range.

Source: The spine mountain range

One of the most intuitive ways to go about it is to feel the slope of the ground.

From where we are standing, we check all possible directions for the steepest downhill slope and move in that direction.
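The blindfolded walk described above matches the idea usually called gradient descent. Here is a minimal sketch, assuming a hypothetical bowl-shaped landscape `height(x, y) = x**2 + 3*y**2` whose lowest point is at (0, 0). The function names, starting point and step size are all illustrative choices.

```python
def height(x, y):
    # Hypothetical landscape; its lowest point is at (0, 0).
    return x ** 2 + 3 * y ** 2


def slope(x, y):
    # Partial derivatives of height(): how steeply the ground rises along x and y.
    return 2 * x, 6 * y


def walk_downhill(x, y, step_size=0.1, steps=100):
    for _ in range(steps):
        dx, dy = slope(x, y)
        # Step against the slope, i.e. in the steepest downhill direction.
        x -= step_size * dx
        y -= step_size * dy
    return x, y


final_x, final_y = walk_downhill(4.0, -3.0)
print(final_x, final_y, height(final_x, final_y))
```

The step size matters: too small and the walk takes forever, too large and we overshoot the valley and may never settle.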

Source: https://wiki.pathmind.com/word2vec

Natural Language Processing (NLP) is a branch of AI that helps machines understand and interpret human language, bridging the gap between human and machine language.

We use the concept of analogies between words to predict a country, given the name of a capital city.
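As a rough sketch of how such an analogy can work, the snippet below uses tiny hand-made 3-dimensional vectors in place of real word embeddings. The vectors, the `embeddings` dictionary and the `predict_country` helper are all assumptions for illustration. The idea is that "Paris is to France as Rome is to ?" can be answered by computing `Rome - Paris + France` and picking the word whose vector is most similar.

```python
import math

# Toy embeddings, hand-made for illustration. Real embeddings are learned
# from text and have hundreds of dimensions.
embeddings = {
    "Paris": [1.0, 1.0, 0.0],
    "France": [0.0, 1.0, 0.0],
    "Rome": [1.0, 0.0, 1.0],
    "Italy": [0.0, 0.0, 1.0],
    "Tokyo": [1.0, 1.0, 1.0],
    "Japan": [0.0, 1.0, 1.0],
}


def cosine(u, v):
    # Cosine similarity: 1.0 means the vectors point in the same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_country(capital, known_capital="Paris", known_country="France"):
    # Analogy vector: capital - known_capital + known_country.
    target = [
        c - kc + kn
        for c, kc, kn in zip(
            embeddings[capital],
            embeddings[known_capital],
            embeddings[known_country],
        )
    ]
    # Ignore the three words used to build the query.
    excluded = {capital, known_capital, known_country}
    candidates = [word for word in embeddings if word not in excluded]
    return max(candidates, key=lambda word: cosine(embeddings[word], target))


print(predict_country("Rome"))
print(predict_country("Tokyo"))
```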

Word Embedding:

Image Source: Reputationx

Sentiment Analysis:

Sentiment analysis is an NLP technique that allows us to classify whether a text, tweet or comment is positive, neutral or negative. Today’s technology enables users to express their emotions and thoughts more openly on social platforms than ever before. So sentiment analysis has become an essential tool for businesses to understand user sentiment, gauge their performance, and tailor their products and services to user needs, thus making their systems more efficient.

Logistic Regression:

In a world with no perfect solutions, optimizing existing solutions is the only way to progress. But real-life problems often come with a set of constraints. The Lagrangian function comes to the rescue in such situations. Every business has many constraints to deal with on a daily basis, such as manufacturing equipment, workforce, and budget. So the goal is to optimize the function within the constraints.

Step 1: Identify the objective function. It represents the goal, such as maximizing profit or minimizing the error rate.
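As a small worked illustration of optimizing within a constraint, assume a hypothetical objective f(x, y) = xy (say, profit from two resources) and a budget constraint x + y = 10. Both are invented for this sketch. The Lagrangian combines the objective and the constraint, and setting its partial derivatives to zero gives the constrained optimum:

```latex
\begin{aligned}
\mathcal{L}(x, y, \lambda) &= xy - \lambda\,(x + y - 10) \\
\frac{\partial \mathcal{L}}{\partial x} &= y - \lambda = 0 \\
\frac{\partial \mathcal{L}}{\partial y} &= x - \lambda = 0 \\
\frac{\partial \mathcal{L}}{\partial \lambda} &= -(x + y - 10) = 0 \\
\Rightarrow\quad x &= y = \lambda = 5, \qquad f(5, 5) = 25
\end{aligned}
```

Here the multiplier λ = 5 can be read as how much the optimal value of f would improve per extra unit of budget.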

Naïve Machine Translation:

The aim of this project is to translate English words to French using word embeddings and vector space models.

Image Source: Fiverr

When we train the word embeddings for a vocabulary, the main focus is to optimize the embeddings such that the core meanings of words and the relationships between them are maintained. The idea behind this concept was given by John Rupert Firth in the 1950s: “You shall know a word by the company it keeps” (Firth, J.R., 1957).

