How is Bayes Theorem derivated f
Bayes' theorem
Bayes' theorem describes the probability of occurrence of an event related to any condition, which is also considered for the case of Conditional probability. It is the most important rule in data science, the mathematical rule that describes how to update a belief by given some evidence. In other words - it describes the act of learning.
Conditional probabiltiy
"Probability of event A and event B equals
the probability of event A times the probability of event B given event A"
Suppose we want to compute the intersection of Event A and Event B, which is .
For is the probability of event B given the consideration of event A, thus yields the red portion which is .
On the other hand, the formula can also be interpreted as follows
which will also result in the red portion
Derivation
Bayes' theorem
For is the also called "total probability"
Interpretation
The formula can be interpreted as follows
-
Posterior
The updated probability after the evidence is considered. -
Prior
The probability before the evidence is even considered. -
Likelihood
The probability of the evidence, given the belief is true. -
Marginal
The probability of the evidence, under any circumstance
Bayes' Rule lets you calculate the posterior (or "updated") probability. This is a conditional probability. It is the probability of the hypothesis being true if the evidence is present.
Think of the prior (or "previous") probability as your belief in the hypothesis before seeing the new evidence. If you had a strong belief in the hypothesis already, the prior probability will be large.
The prior is multiplied by a fraction. Think of this as the "strength or weight" of the evidence. The posterior probability is greater when the top part (numerator) is big, and the bottom part (denominator) is small.
The numerator is the likelihood. This is another conditional probability. It is the probability of the evidence being present, given the hypothesis is true.
The denominator is the marginal probability of the evidence. That is, it is the probability of the evidence being present, whether the hypothesis is true or false. The smaller the denominator, the more "convincing" the evidence.