Last update: **28/06/2016 11:32**

What does **prediction** mean in an application? Let's be
extremely pragmatic and work through a limited set of examples from the field of software
prediction.

Imagine you're serving a community of customers to whom you offer web templates. Which web templates should you propose right away to a new customer? Simple stats should give you valuable answers!

Imagine that you already have 2000 customers and that 10% of them (200) are Beauty Centers and Salons. Imagine you have 100 templates to choose from. Imagine that Beauty Salons have chosen template #1 100 times, template #2 50 times, template #3 20 times, templates #4 and #5 10 times each, and the rest of the templates once, if chosen at all.

What prediction can you make for a brand new customer who happens to be a … Beauty Salon? Which templates best fit her activity?

Well, obviously the most appropriate template that comes to mind is template #1 because it represents 50% of the choices made by the group of Beauty Centers & Salons. Then comes template #2 with 25%, etc. Easy stats; easy probabilities. Correct?
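To make the "easy stats" concrete, here is a minimal Python sketch that turns the counts above into shares and sorts them. The names `salon_choices` and `template_shares` are made up for this illustration, and the sketch assumes each of the 200 Beauty Salons chose exactly one template.

```python
# Template choices made by Beauty Salons, taken from the example above.
# Templates chosen only once (or never) are left out for brevity.
salon_choices = {1: 100, 2: 50, 3: 20, 4: 10, 5: 10}
total_salons = 200  # 10% of 2000 customers


def template_shares(choices, group_size):
    """Return (template, share) pairs sorted from most to least chosen."""
    shares = {template: count / group_size for template, count in choices.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


for template, share in template_shares(salon_choices, total_salons):
    print(f"template #{template}: {share:.0%}")
```

Template #1 comes out first with 50%, followed by template #2 with 25%, which matches the reasoning above.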

The problem can be posed in these terms: what's the probability that a
given template (`A`) will be chosen knowing that the customer is a
Beauty Salon (`B`)? This is noted `P(A|B)`, where A
designates the template and B designates the group of customers (Beauty Salons
in our case).

Well, if we believe **Bayes' theorem**, **this probability** is
to be calculated as follows: it is the **probability that a customer
who chose the given template (A) is a Beauty Salon** x the **probability that this template is chosen (entire
customer base)** / the **probability that a customer is a Beauty Salon**.

`P(A|B) = (P(B|A) x P(A)) / P(B)`

**P(A|B)** should be read "probability that A will be true if we
know that B is true": more on Bayes' theorem on its Wikipedia page.

Let's put this in numbers. For that we need one more piece of information: across
the 2000 customers, template #1 has been chosen 500 times, and 100 of those
times it was by a Beauty Salon. That corresponds to
`100 / 500 = 20%`

Now we can calculate the whole thing!

P(A) = 500 / 2000 = 25% (template #1 probability)

P(B) = 200 / 2000 = 10% (Beauty Salon probability)

P(B|A) = 100 / 500 = 20% (20% of the time, when template #1 was chosen it was a Beauty Salon)

P(A|B) = ( 20% x 25% ) / 10% = 50%
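The same calculation can be checked in a few lines of Python. This is only a sketch: the function name `bayes` and the variable names are chosen for this illustration, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) x P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Figures from the example above.
customers = 2000
salons = 200               # Beauty Salons in the customer base
template1_total = 500      # times template #1 was chosen overall
template1_by_salons = 100  # times template #1 was chosen by a Beauty Salon

p_a = Fraction(template1_total, customers)                    # P(A)   = 25%
p_b = Fraction(salons, customers)                             # P(B)   = 10%
p_b_given_a = Fraction(template1_by_salons, template1_total)  # P(B|A) = 20%

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(f"P(A|B) = {float(p_a_given_b):.0%}")
```

The result agrees with the direct count (100 of the 200 Beauty Salons chose template #1), which is a useful sanity check before applying the formula to the other templates.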

Do the same with the other templates you have to offer and present the templates in order, from the most probable to the least. You're done! Your software is now capable of easy prediction and you help the customer make an enlightened choice.

Now that you have grasped the idea, there should be no major difficulty in applying the same principle to … more difficult problems, and even in combining your calculations with other things you may already know.

Imagine that you are developing a **Document Classifier** whose
purpose is to categorize documents: customer emails, customer letters, supplier
letters, contracts, invoices, etc. Imagine you have received 200,000 documents.
Imagine that half of these documents are invoices. Imagine that half of the
invoices are coming from consultancy firms. What's the probability that a new
document will be an invoice of a consultancy firm? The **Bayes
formula** can answer this question in the form of a probability level.
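With only these two figures, the answer follows from the product rule that underlies the Bayes formula: P(invoice and consultancy) = P(invoice) x P(consultancy | invoice). A minimal sketch, assuming that new documents follow the same proportions as the ones already received (the names are made up for this illustration):

```python
def joint_probability(p_first, p_second_given_first):
    """Product rule: P(X and Y) = P(X) x P(Y|X)."""
    return p_first * p_second_given_first


documents = 200_000
p_invoice = 0.5                    # half of the documents are invoices
p_consultancy_given_invoice = 0.5  # half of the invoices come from consultancy firms

p_consultancy_invoice = joint_probability(p_invoice, p_consultancy_given_invoice)
expected_count = documents * p_consultancy_invoice

print(f"P(invoice from a consultancy firm) = {p_consultancy_invoice:.0%}")
print(f"Expected among {documents} documents: {expected_count:.0f}")
```

So, under that assumption, one new document in four is expected to be an invoice from a consultancy firm.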

Go a bit further, combine this with the words that the OCR-capable solution has detected on a document, and bring the Bayes formula to a slightly higher level: what's the probability that a series of words will appear on an invoice of a consulting firm? You simply apply Bayes on Bayes (which is what we call serial combinations).
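One common way to chain Bayes over a series of words is a naive Bayes classifier. The sketch below is an interpretation of "Bayes on Bayes", not necessarily the exact method meant here: it assumes that words are independent of each other within a category, and the word counts are invented purely for illustration. The 25% prior is the consultancy-invoice share from the previous paragraph.

```python
import math

# Hypothetical word counts observed in already classified documents.
word_counts = {
    "consultancy_invoice": {"invoice": 40, "consulting": 30, "hours": 20, "total": 10},
    "other": {"invoice": 10, "contract": 40, "dear": 30, "total": 20},
}
# Prior probability of each category before looking at any word.
priors = {"consultancy_invoice": 0.25, "other": 0.75}


def posterior(words, word_counts, priors):
    """Return P(category | words), applying Bayes once per detected word."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    log_scores = {}
    for category, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(priors[category])
        for word in words:
            # Add-one smoothing so that an unseen word does not zero the product.
            score += math.log((counts.get(word, 0) + 1) / (total + len(vocabulary)))
        log_scores[category] = score
    # Normalise the scores so that the probabilities sum to 1.
    top = max(log_scores.values())
    weights = {category: math.exp(score - top) for category, score in log_scores.items()}
    norm = sum(weights.values())
    return {category: weight / norm for category, weight in weights.items()}


result = posterior(["invoice", "consulting", "hours"], word_counts, priors)
for category, probability in result.items():
    print(f"{category}: {probability:.2%}")
```

With these made-up counts, the three detected words push the document strongly towards the consultancy invoice category, even though its prior is only 25%.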

As a matter of fact, serial combinations of that sort are limitless and the
more you know about a situation, the more you will know tomorrow and the more
accurate your predictions will be, something that is definitely worth it in the
field of **Business Intelligence**.

Hope it helps.