Questioning lifecycle

Repeatedly draw samples of size n from a data frame. Useful for creating sampling distributions.

rep_sample_n(tbl, size, replace = FALSE, reps = 1, prob = NULL)

Arguments

tbl

Data frame of population from which to sample.

size

Number of rows in each sample.

replace

Should sampling be with replacement?

reps

Number of samples of size n = size to take.

prob

A vector of probability weights for obtaining the rows of tbl being sampled, one weight per row.

Value

A tibble with reps × size rows, corresponding to reps samples of size n = size from tbl. The tibble is grouped by a replicate column that identifies the sample each row belongs to.
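
To make the shape of the return value concrete, here is a minimal base R sketch of the same idea. The helper name rep_sample_sketch is hypothetical and this is not the package's implementation; it assumes a plain data frame as input, returns an ungrouped data frame, and only illustrates how reps samples of size rows stack up under a replicate column.

```r
# Hypothetical helper for illustration only; not the package's own code.
rep_sample_sketch <- function(tbl, size, replace = FALSE, reps = 1, prob = NULL) {
  n_rows <- nrow(tbl)
  pieces <- lapply(seq_len(reps), function(r) {
    # Draw row indices for one sample; prob (if given) has one weight per row
    idx <- sample.int(n_rows, size = size, replace = replace, prob = prob)
    # Label the sampled rows with the replicate they belong to
    data.frame(replicate = rep.int(r, size),
               tbl[idx, , drop = FALSE],
               row.names = NULL)
  })
  # Stack all samples into one data frame with reps * size rows
  do.call(rbind, pieces)
}

toy_population <- data.frame(id = 1:20)
set.seed(1)
toy_samples <- rep_sample_sketch(toy_population, size = 5, reps = 3)
nrow(toy_samples)  # reps * size rows
```

Each block of size consecutive rows shares one replicate value, which is what makes the later group_by(replicate) step possible.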

Examples

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))

# A virtual population of N = 10,010, of which 3091 are hurricanes
population <- dplyr::storms %>%
  select(status)

# Take samples of size n = 50 storms without replacement; do this 1000 times
samples <- population %>%
  rep_sample_n(size = 50, reps = 1000)
samples
#> # A tibble: 50,000 x 2
#> # Groups:   replicate [1,000]
#>    replicate status
#>        <int> <chr>
#>  1         1 tropical storm
#>  2         1 hurricane
#>  3         1 tropical depression
#>  4         1 tropical depression
#>  5         1 hurricane
#>  6         1 tropical depression
#>  7         1 tropical depression
#>  8         1 tropical depression
#>  9         1 hurricane
#> 10         1 tropical storm
#> # … with 49,990 more rows

# Compute p_hats for all 1000 samples = proportion hurricanes
p_hats <- samples %>%
  group_by(replicate) %>%
  summarize(prop_hurricane = mean(status == "hurricane"))
p_hats
#> # A tibble: 1,000 x 2
#>    replicate prop_hurricane
#>        <int>          <dbl>
#>  1         1           0.38
#>  2         2           0.28
#>  3         3           0.32
#>  4         4           0.38
#>  5         5           0.3
#>  6         6           0.38
#>  7         7           0.28
#>  8         8           0.32
#>  9         9           0.32
#> 10        10           0.26
#> # … with 990 more rows

# Plot sampling distribution (geom_density() puts density, not counts, on the y-axis)
ggplot(p_hats, aes(x = prop_hurricane)) +
  geom_density() +
  labs(x = "p_hat", y = "Density",
       title = "Sampling distribution of p_hat from 1000 samples of size 50")
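
As a rough check on the spread of the sampling distribution, the simulated standard deviation of p_hat can be compared with the textbook approximation sqrt(p * (1 - p) / n). The sketch below uses base R only and a synthetic stand-in population built from the counts quoted in the comment above (N = 10,010 with 3091 hurricanes); the "other" label, the seed, and the variable names are assumptions made for illustration, and it does not call rep_sample_n().

```r
# Synthetic stand-in for the population, built from the counts quoted above
set.seed(2024)
sim_population <- c(rep("hurricane", 3091), rep("other", 10010 - 3091))
p_true <- mean(sim_population == "hurricane")
n <- 50

# Draw 1000 samples of size n without replacement and record each proportion
sim_p_hats <- replicate(
  1000,
  mean(sample(sim_population, size = n) == "hurricane")
)

# Simulated spread versus the approximation sqrt(p * (1 - p) / n)
c(simulated_se = sd(sim_p_hats),
  theoretical_se = sqrt(p_true * (1 - p_true) / n))
```

The two values should be close but not identical: the simulation has Monte Carlo noise, and sampling without replacement from a finite population shrinks the spread slightly relative to the formula.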