BDA3 Chapter 1 Exercise 9
Here’s my solution to exercise 9, chapter 1, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.
Suppose there 3 doctors, who open their practice at 09:00 and stop accepting patients at 16:00. If customers arrive in exponentially distributed intervals with mean 10 minutes, and appointment duration is uniformly distributed between 5 and 10 minutes, we want to know:
- how many patients arrive per day?
- how many patients have to wait for their appointment?
- how long do patients have to wait?
- when does the last patient leave the practice?
We do this by simulation. The arrivals
function will simulate the arrival times of the patients, in minutes after 09:00. In principle, we should simulate draws from the exponential distribution until the sum of all draws is above 420, the number of minutes the practice accepts patients. However, I couldn’t find any efficient way to run this in R. Instead we’ll draw so many variables such that is is highly unlikely that we have too few, then just filter out what we don’t need.
To calculate a suitably large number, note that the number of patients in one day is \(\dpois(\frac{1}{10} \cdot (16 - 9) \cdot 60)\)-distributed. The 99.99999% percentile of this distribution is qpois(0.9999999, (16 - 9) * 6) =
80. We’ll err on the safe side and use \(n=100\).
arrivals <- function(λ, t, n=100) {
rexp(n, λ) %>%
tibble(
delay = .,
time = cumsum(delay)
) %>%
filter(time <= t) %>%
pull(time) %>%
return()
}
λ <- 1 / 10
t <- (16 - 9) * 60
arrivals(λ, t)
[1] 4.854265 4.963889 6.651014 9.518990 17.415709 20.110178
[7] 28.852188 34.862538 35.970215 44.205468 48.342152 52.934693
[13] 83.072579 86.746318 117.586517 122.811176 133.662687 142.016603
[19] 170.935913 190.999325 202.511439 204.915770 205.191951 208.422873
[25] 219.437526 225.162971 233.122550 235.351649 253.558658 254.097711
[31] 255.639118 256.270049 277.905899 291.055504 291.737173 294.688419
[37] 300.949679 302.681417 329.751112 335.998940 355.506712 361.162687
[43] 381.436543 388.767558 393.072689 393.088807 395.570811 401.669343
[49] 401.911643 408.650710 418.291196
Given the patients that arrive in a day, we now need a function to simulate the appointments. Let’s assume the patients get seen in the order they arrive. As we cycle through the patients, we’ll keep track of
n_waited
, the number of patients who have had to wait for their appointment so far;time_waiting
, the sum of all waiting times of the patients so far; anddoctors
, the next time at which each doctor is free to see another patient.
The doctors
variable starts at c(0, 0, 0)
because they are immediately availble to see patients. The doctor with the smallest availability time is the next doctor to see a patient. The start of the appointment is either the doctor’s availability time or the arrival time of the patient, whichever is greater. The end of the appointment is \(\duniform(5, 20)\)-minutes after the start of the appointment. The doctor’s availability time is then set to the end of the appointment. Once all patients have been given an appointment, the closing time is the maximum of the doctors’ next availability times or the closing time (16 - 9) * 60
, whichever is greater.
process <- function(arrivals, t=0) {
n_waited <- 0 # number of patients who have had to wait so far
time_waiting <- 0 # total waiting time so far
doctors <- c(0, 0, 0) # next time at which each doctor is free to see another patient
for(i in (1:length(arrivals))) {
wait <- pmax(min(doctors) - arrivals[i], 0) # waiting time of patient i
time_waiting <- time_waiting + wait
n_waited <- n_waited + (wait > 0)
appointment_start <- max(c(min(doctors), arrivals[i]))
appointment_end <- appointment_start + runif(1, 5, 20)
doctors[which.min(doctors)] <- appointment_end
}
list(
n_patients = length(arrivals),
n_waited = n_waited,
time_waiting = time_waiting,
time_waiting_per_patient = time_waiting / length(arrivals),
time_waiting_per_waiting_patient = time_waiting / n_waited,
closing_time = pmax(max(doctors), t)
) %>% return()
}
arrivals(λ, t) %>%
process(t)
$n_patients
[1] 39
$n_waited
[1] 0
$time_waiting
[1] 0
$time_waiting_per_patient
[1] 0
$time_waiting_per_waiting_patient
[1] NaN
$closing_time
[1] 426.9273
To simulate the above many times, we’ll use the replicate
function. For convenience, we’ll turn this into a tibble
.
simulate <- function(iters, λ, t, n=100) {
iters %>%
replicate(process(arrivals(λ, t, n), t)) %>%
t() %>%
as_tibble() %>%
mutate_all(unlist)
}
sims <- simulate(1000, λ, t)
n_patients | n_waited | time_waiting | time_waiting_per_patient | time_waiting_per_waiting_patient | closing_time |
---|---|---|---|---|---|
45 | 8 | 10.493347 | 0.2331855 | 1.311668 | 421.2221 |
44 | 6 | 13.226377 | 0.3005995 | 2.204396 | 435.4524 |
29 | 1 | 1.174172 | 0.0404887 | 1.174172 | 435.9763 |
31 | 1 | 9.413756 | 0.3036696 | 9.413756 | 440.1210 |
40 | 5 | 16.692607 | 0.4173152 | 3.338521 | 431.8950 |
37 | 2 | 11.329963 | 0.3062152 | 5.664981 | 420.0000 |
Finally, we can calculate the 50% intervals by applying the quantile
function to each summary.
sims_summary <- sims %>%
gather(variable, value) %>%
group_by(variable) %>%
summarise(
q25 = quantile(value, 0.25, na.rm=TRUE),
q50 = quantile(value, 0.5, na.rm=TRUE),
q75 = quantile(value, 0.75, na.rm=TRUE),
simulations = n()
)
variable | q25 | q50 | q75 | simulations |
---|---|---|---|---|
closing_time | 420.000000 | 424.9488049 | 430.7023807 | 1000 |
n_patients | 38.000000 | 42.0000000 | 46.2500000 | 1000 |
n_waited | 3.000000 | 5.0000000 | 9.0000000 | 1000 |
time_waiting | 8.266744 | 19.3983242 | 38.9367142 | 1000 |
time_waiting_per_patient | 0.212705 | 0.4685262 | 0.8622254 | 1000 |
time_waiting_per_waiting_patient | 2.660497 | 3.8266619 | 5.1309744 | 1000 |