Project Description
After reading a paper about Thomson sampling, I implemented it. This code replicates multi armed bandit problem. We have 5 different distributions from which we choose. Due to the fact we have a simulation, we set up their distributions, but we pretend we do not know them. We go through 30 time periods and each time periods we can make 100 draws together. We decide from which distribution we will draw. Outcome of each draw is either 1 or 0 and we want to maximize the sum of all draws and determine distributions, if possible.
First, we only draw the same amount (20) from each distribution each time frame. In second step we implement Thomson sampling method where I create an automatic process of determining from where to draw so we optimize the good results. If we know that distribution is bad for example, we do not draw from there as much as from the others. The more certain we are about distributions, the better decisions we can make based on where we can draw from. The algorithm deployed is an algorithm of Thomson sampling.
This can be useful in automating different marketing campaigns or features on website where we decide which features to display to the user to have the best outcome (conversion). Even though we do not know which one to display at the beginning, we can have better guesses as time goes on and Thomson sampling for example will give much better results than just random sampling.
Random choosing
### We should be able to at least beat no-system way of selecting. |
### say we have 5 different distributions. |
### Distribution 1): binomial, mean 0.3 |
### 2): mean 0.35, 3): mean 0.28, 4) mean 0.24, 5) mean 0.27 |
### Say we have 300 days of testing. |
#### Each day we have 1000 selections. |
distribution = [0.2, 0.2, 0.2, 0.2, 0.2] ### At the beginning all are equally likely to be chosen. |
clicks1, clicks2,clicks3,clicks4,clicks5 = 0,0,0,0,0 |
impr1,impr2,impr3,impr4,impr5 = 0,0,0,0,0 |
first_draw = int(round(distribution[0]*selects)) |
second_draw = int(round(distribution[1]*selects)) |
third_draw = int(round(distribution[2]*selects)) |
fourth_draw = int(round(distribution[3]*selects)) |
fifth_draw = int(round(distribution[4]*selects)) |
one = np.random.binomial(first_draw,0.3) |
two = np.random.binomial(second_draw,0.35) |
three = np.random.binomial(third_draw,0.28) |
four = np.random.binomial(fourth_draw,0.24) |
five = np.random.binomial(fifth_draw,0.27) |
clicks3 = clicks3 + three |
impr1 = impr1 + first_draw |
impr2 = impr2 + second_draw |
impr3 = impr3 + third_draw |
impr4 = impr4 + fourth_draw |
impr5 = impr5 + fifth_draw |
impr1,impr2,impr3,impr4,impr5 = float(impr1), float(impr2),float(impr3),float(impr4),float(impr5) |
# distribution = [(clicks1/impr1)**gamma, (clicks2/impr2)**gamma, (clicks1/impr3)**gamma, |
# (clicks4/impr4)**gamma, (clicks5/impr5)**gamma] |
success = clicks1 + clicks2 + clicks3 + clicks4 + clicks5
Thomson sampling;
### say we have 5 different distributions. |
### Distribution 1): binomial, mean 0.3 |
### 2): mean 0.35, 3): mean 0.28, 4) mean 0.24, 5) mean 0.27 |
### Say we have 300 days of testing. |
#### Each day we have 1000 selections. |
distribution = [0.2, 0.2, 0.2, 0.2, 0.2] ### At the beginning all are equally likely to be chosen. |
### We count successes and tries. |
clicks1, clicks2,clicks3,clicks4,clicks5 = 0,0,0,0,0 |
impr1,impr2,impr3,impr4,impr5 = 0,0,0,0,0 |
### We get beta distribution and random trials |
sample1 = np.random.beta(clicks1 + 1,impr1 - clicks1 + 1, size = 1000) |
sample2 = np.random.beta(clicks2 + 1,impr2 - clicks2 + 1, size = 1000) |
sample3 = np.random.beta(clicks3 + 1,impr3 - clicks3 + 1, size = 1000) |
sample4 = np.random.beta(clicks4 + 1,impr4 - clicks4 + 1, size = 1000) |
sample5 = np.random.beta(clicks5 + 1,impr5 - clicks5 + 1, size = 1000) |
### And using MC method we find the probability of certain distribution being the best. |
indeksi = np.zeros(len(sample1)) |
for a in range(len(sample1)): |
indeksi[a] = np.argmax([sample1[a],sample2[a],sample3[a],sample4[a],sample5[a]]) |
prob1 = float(len(np.where(indeksi==0)[0]))/float(len(sample1)) |
prob2 = float(len(np.where(indeksi==1)[0]))/float(len(sample1)) |
prob3 = float(len(np.where(indeksi==2)[0]))/float(len(sample1)) |
prob4 = float(len(np.where(indeksi==3)[0]))/float(len(sample1)) |
prob5 = float(len(np.where(indeksi==4)[0]))/float(len(sample1)) |
### Based on that we determine next draws |
first_draw = int(round(prob1*selects)) |
second_draw = int(round(prob2*selects)) |
third_draw = int(round(prob3*selects)) |
fourth_draw = int(round(prob4*selects)) |
fifth_draw = int(round(prob5*selects)) |
### And finally we get results of the draws |
one = np.random.binomial(first_draw,0.3) |
two = np.random.binomial(second_draw,0.35) |
three = np.random.binomial(third_draw,0.28) |
four = np.random.binomial(fourth_draw,0.24) |
five = np.random.binomial(fifth_draw,0.27) |
### We update successful clicks and trials. |
clicks3 = clicks3 + three |
impr1 = impr1 + first_draw |
impr2 = impr2 + second_draw |
impr3 = impr3 + third_draw |
impr4 = impr4 + fourth_draw |
impr5 = impr5 + fifth_draw |
### And just collect some data. |
impr1,impr2,impr3,impr4,impr5 = float(impr1), float(impr2),float(impr3),float(impr4),float(impr5) |
distribution = [(clicks1/impr1)**gamma, (clicks2/impr2)**gamma, (clicks1/impr3)**gamma, |
(clicks4/impr4)**gamma, (clicks5/impr5)**gamma] |
### In the end we will get distribution as well as how many successful ones we had |
success = clicks1 + clicks2 + clicks3 + clicks4 + clicks5