### Project Description

After reading a paper about Thomson sampling, I implemented it. This code replicates multi armed bandit problem. We have 5 different distributions from which we choose. Due to the fact we have a simulation, we set up their distributions, but we pretend we do not know them. We go through 30 time periods and each time periods we can make 100 draws together. We decide from which distribution we will draw. Outcome of each draw is either 1 or 0 and we want to maximize the sum of all draws and determine distributions, if possible.

First, we only draw the same amount (20) from each distribution each time frame. In second step we implement Thomson sampling method where I create an automatic process of determining from where to draw so we optimize the good results. If we know that distribution is bad for example, we do not draw from there as much as from the others. The more certain we are about distributions, the better decisions we can make based on where we can draw from. The algorithm deployed is an algorithm of Thomson sampling.

This can be useful in automating different marketing campaigns or features on website where we decide which features to display to the user to have the best outcome (conversion). Even though we do not know which one to display at the beginning, we can have better guesses as time goes on and Thomson sampling for example will give much better results than just random sampling.

## Random choosing

### We should be able to at least beat no-system way of selecting.

import random

import numpy as np

### say we have 5 different distributions.

### Distribution 1): binomial, mean 0.3

### 2): mean 0.35, 3): mean 0.28, 4) mean 0.24, 5) mean 0.27

### Say we have 300 days of testing.

gamma = 1

m = 300

#### Each day we have 1000 selections.

selects = 100

distribution = [0.2, 0.2, 0.2, 0.2, 0.2] ### At the beginning all are equally likely to be chosen.

clicks1, clicks2,clicks3,clicks4,clicks5 = 0,0,0,0,0

impr1,impr2,impr3,impr4,impr5 = 0,0,0,0,0

successes = 0

i = 0

while i<30:

first_draw = int(round(distribution[0]*selects))

second_draw = int(round(distribution[1]*selects))

third_draw = int(round(distribution[2]*selects))

fourth_draw = int(round(distribution[3]*selects))

fifth_draw = int(round(distribution[4]*selects))

one = np.random.binomial(first_draw,0.3)

two = np.random.binomial(second_draw,0.35)

three = np.random.binomial(third_draw,0.28)

four = np.random.binomial(fourth_draw,0.24)

five = np.random.binomial(fifth_draw,0.27)

clicks1 = clicks1 + one

clicks2 = clicks2 + two

clicks3 = clicks3 + three

clicks4 = clicks4 + four

clicks5 = clicks5 + five

impr1 = impr1 + first_draw

impr2 = impr2 + second_draw

impr3 = impr3 + third_draw

impr4 = impr4 + fourth_draw

impr5 = impr5 + fifth_draw

impr1,impr2,impr3,impr4,impr5 = float(impr1), float(impr2),float(impr3),float(impr4),float(impr5)

# distribution = [(clicks1/impr1)**gamma, (clicks2/impr2)**gamma, (clicks1/impr3)**gamma,

# (clicks4/impr4)**gamma, (clicks5/impr5)**gamma]

i += 1

success = clicks1 + clicks2 + clicks3 + clicks4 + clicks5

## Thomson sampling;

import random

import numpy as np

### say we have 5 different distributions.

### Distribution 1): binomial, mean 0.3

### 2): mean 0.35, 3): mean 0.28, 4) mean 0.24, 5) mean 0.27

### Say we have 300 days of testing.

gamma = 1

m = 300

#### Each day we have 1000 selections.

selects = 100

distribution = [0.2, 0.2, 0.2, 0.2, 0.2] ### At the beginning all are equally likely to be chosen.

best_ones = [0,0,0,0,0]

### We count successes and tries.

clicks1, clicks2,clicks3,clicks4,clicks5 = 0,0,0,0,0

impr1,impr2,impr3,impr4,impr5 = 0,0,0,0,0

successes = 0

i = 0

while i<30:

### We get beta distribution and random trials

sample1 = np.random.beta(clicks1 + 1,impr1 - clicks1 + 1, size = 1000)

sample2 = np.random.beta(clicks2 + 1,impr2 - clicks2 + 1, size = 1000)

sample3 = np.random.beta(clicks3 + 1,impr3 - clicks3 + 1, size = 1000)

sample4 = np.random.beta(clicks4 + 1,impr4 - clicks4 + 1, size = 1000)

sample5 = np.random.beta(clicks5 + 1,impr5 - clicks5 + 1, size = 1000)

### And using MC method we find the probability of certain distribution being the best.

indeksi = np.zeros(len(sample1))

for a in range(len(sample1)):

indeksi[a] = np.argmax([sample1[a],sample2[a],sample3[a],sample4[a],sample5[a]])

prob1 = float(len(np.where(indeksi==0)[0]))/float(len(sample1))

prob2 = float(len(np.where(indeksi==1)[0]))/float(len(sample1))

prob3 = float(len(np.where(indeksi==2)[0]))/float(len(sample1))

prob4 = float(len(np.where(indeksi==3)[0]))/float(len(sample1))

prob5 = float(len(np.where(indeksi==4)[0]))/float(len(sample1))

### Based on that we determine next draws

first_draw = int(round(prob1*selects))

second_draw = int(round(prob2*selects))

third_draw = int(round(prob3*selects))

fourth_draw = int(round(prob4*selects))

fifth_draw = int(round(prob5*selects))

### And finally we get results of the draws

one = np.random.binomial(first_draw,0.3)

two = np.random.binomial(second_draw,0.35)

three = np.random.binomial(third_draw,0.28)

four = np.random.binomial(fourth_draw,0.24)

five = np.random.binomial(fifth_draw,0.27)

### We update successful clicks and trials.

clicks1 = clicks1 + one

clicks2 = clicks2 + two

clicks3 = clicks3 + three

clicks4 = clicks4 + four

clicks5 = clicks5 + five

impr1 = impr1 + first_draw

impr2 = impr2 + second_draw

impr3 = impr3 + third_draw

impr4 = impr4 + fourth_draw

impr5 = impr5 + fifth_draw

### And just collect some data.

impr1,impr2,impr3,impr4,impr5 = float(impr1), float(impr2),float(impr3),float(impr4),float(impr5)

distribution = [(clicks1/impr1)**gamma, (clicks2/impr2)**gamma, (clicks1/impr3)**gamma,

(clicks4/impr4)**gamma, (clicks5/impr5)**gamma]

i += 1

### In the end we will get distribution as well as how many successful ones we had

### (opposed to loss) success = clicks1 + clicks2 + clicks3 + clicks4 + clicks5