Multi armed bandits example 2018-02-27T22:13:46+00:00

Project Description

After reading a paper about Thomson sampling, I implemented it. This code replicates multi armed bandit problem. We have 5 different distributions from which we choose. Due to the fact we have a simulation, we set up their distributions, but we pretend we do not know them. We go through 30 time periods and each time periods we can make 100 draws together. We decide from which distribution we will draw. Outcome of each draw is either 1 or 0 and we want to maximize the sum of all draws and determine distributions, if possible.

First, we only draw the same amount (20) from each distribution each time frame. In second step we implement Thomson sampling method where I create an automatic process of determining from where to draw so we optimize the good results. If we know that distribution is bad for example, we do not draw from there as much as from the others. The more certain we are about distributions, the better decisions we can make based on where we can draw from. The algorithm deployed is an algorithm of Thomson sampling.

This can be useful in automating different marketing campaigns or features on website where we decide which features to display to the user to have the best outcome (conversion). Even though we do not know which one to display at the beginning, we can have better guesses as time goes on and Thomson sampling for example will give much better results than just random sampling.

Random choosing

### We should be able to at least beat no-system way of selecting.
import random
import numpy as np
### say we have 5 different distributions.
### Distribution 1): binomial, mean 0.3
### 2): mean 0.35, 3): mean 0.28, 4) mean 0.24, 5) mean 0.27
### Say we have 300 days of testing.
gamma = 1
m = 300
#### Each day we have 1000 selections.
selects = 100
distribution = [0.2, 0.2, 0.2, 0.2, 0.2] ### At the beginning all are equally likely to be chosen.
clicks1, clicks2,clicks3,clicks4,clicks5 = 0,0,0,0,0
impr1,impr2,impr3,impr4,impr5 = 0,0,0,0,0
successes = 0
i = 0
while i<30:
first_draw = int(round(distribution[0]*selects))
second_draw = int(round(distribution[1]*selects))
third_draw = int(round(distribution[2]*selects))
fourth_draw = int(round(distribution[3]*selects))
fifth_draw = int(round(distribution[4]*selects))
one = np.random.binomial(first_draw,0.3)
two = np.random.binomial(second_draw,0.35)
three = np.random.binomial(third_draw,0.28)
four = np.random.binomial(fourth_draw,0.24)
five = np.random.binomial(fifth_draw,0.27)
clicks1 = clicks1 + one
clicks2 = clicks2 + two
clicks3 = clicks3 + three
clicks4 = clicks4 + four
clicks5 = clicks5 + five
impr1 = impr1 + first_draw
impr2 = impr2 + second_draw
impr3 = impr3 + third_draw
impr4 = impr4 + fourth_draw
impr5 = impr5 + fifth_draw
impr1,impr2,impr3,impr4,impr5 = float(impr1), float(impr2),float(impr3),float(impr4),float(impr5)
# distribution = [(clicks1/impr1)**gamma, (clicks2/impr2)**gamma, (clicks1/impr3)**gamma,
# (clicks4/impr4)**gamma, (clicks5/impr5)**gamma]
i += 1

success = clicks1 + clicks2 + clicks3 + clicks4 + clicks5

Thomson sampling;

import random
import numpy as np
### say we have 5 different distributions.
### Distribution 1): binomial, mean 0.3
### 2): mean 0.35, 3): mean 0.28, 4) mean 0.24, 5) mean 0.27
### Say we have 300 days of testing.
gamma = 1
m = 300
#### Each day we have 1000 selections.
selects = 100
distribution = [0.2, 0.2, 0.2, 0.2, 0.2] ### At the beginning all are equally likely to be chosen.
best_ones = [0,0,0,0,0]
### We count successes and tries.
clicks1, clicks2,clicks3,clicks4,clicks5 = 0,0,0,0,0
impr1,impr2,impr3,impr4,impr5 = 0,0,0,0,0
successes = 0
i = 0
while i<30:
### We get beta distribution and random trials
sample1 = np.random.beta(clicks1 + 1,impr1 - clicks1 + 1, size = 1000)
sample2 = np.random.beta(clicks2 + 1,impr2 - clicks2 + 1, size = 1000)
sample3 = np.random.beta(clicks3 + 1,impr3 - clicks3 + 1, size = 1000)
sample4 = np.random.beta(clicks4 + 1,impr4 - clicks4 + 1, size = 1000)
sample5 = np.random.beta(clicks5 + 1,impr5 - clicks5 + 1, size = 1000)
### And using MC method we find the probability of certain distribution being the best.
indeksi = np.zeros(len(sample1))
for a in range(len(sample1)):
indeksi[a] = np.argmax([sample1[a],sample2[a],sample3[a],sample4[a],sample5[a]])
prob1 = float(len(np.where(indeksi==0)[0]))/float(len(sample1))
prob2 = float(len(np.where(indeksi==1)[0]))/float(len(sample1))
prob3 = float(len(np.where(indeksi==2)[0]))/float(len(sample1))
prob4 = float(len(np.where(indeksi==3)[0]))/float(len(sample1))
prob5 = float(len(np.where(indeksi==4)[0]))/float(len(sample1))
### Based on that we determine next draws
first_draw = int(round(prob1*selects))
second_draw = int(round(prob2*selects))
third_draw = int(round(prob3*selects))
fourth_draw = int(round(prob4*selects))
fifth_draw = int(round(prob5*selects))
### And finally we get results of the draws
one = np.random.binomial(first_draw,0.3)
two = np.random.binomial(second_draw,0.35)
three = np.random.binomial(third_draw,0.28)
four = np.random.binomial(fourth_draw,0.24)
five = np.random.binomial(fifth_draw,0.27)
### We update successful clicks and trials.
clicks1 = clicks1 + one
clicks2 = clicks2 + two
clicks3 = clicks3 + three
clicks4 = clicks4 + four
clicks5 = clicks5 + five
impr1 = impr1 + first_draw
impr2 = impr2 + second_draw
impr3 = impr3 + third_draw
impr4 = impr4 + fourth_draw
impr5 = impr5 + fifth_draw
### And just collect some data.
impr1,impr2,impr3,impr4,impr5 = float(impr1), float(impr2),float(impr3),float(impr4),float(impr5)
distribution = [(clicks1/impr1)**gamma, (clicks2/impr2)**gamma, (clicks1/impr3)**gamma,
(clicks4/impr4)**gamma, (clicks5/impr5)**gamma]
i += 1
### In the end we will get distribution as well as how many successful ones we had
### (opposed to loss)

success = clicks1 + clicks2 + clicks3 + clicks4 + clicks5

error: Content is protected !!