Title: | Simulate Models Based on the Generalized Linear Model |
---|---|
Description: | Simulates regression models, including both simple regression and generalized linear mixed models with up to three levels of nesting. Flexible power simulations that allow the specification of missing data, unbalanced designs, and different random error distributions are built into the package. |
Authors: | Brandon LeBeau [aut, cre] |
Maintainer: | Brandon LeBeau <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.9.20 |
Built: | 2025-03-04 04:18:29 UTC |
Source: | https://github.com/lebebr01/simglm |
Convenience function for computing density values for plotting.
compute_density_values(data, group_var, parameter, values)
data |
A dataframe that contains the parameter estimates. |
group_var |
A group variable that specifies the attributes to group by. By default, this would likely be the term attribute, but can contain more than one attribute. |
parameter |
The attribute that represents the parameter estimate. |
values |
A list of numeric vectors that specifies the values at which the density is computed. |
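The source does not show how the density values are calculated, so the following is only a base R sketch of the general idea: estimate a kernel density per group and evaluate it at user-supplied values. The data frame `estimates`, the list `grid`, and the use of `density()` with `approx()` are assumptions made for illustration, not the package's implementation.

```r
set.seed(11)

# Hypothetical parameter estimates, grouped by a 'term' attribute.
estimates <- data.frame(
  term = rep(c("(Intercept)", "x"), each = 100),
  estimate = c(rnorm(100, mean = 2, sd = 0.3), rnorm(100, mean = 0.5, sd = 0.1)),
  stringsAsFactors = FALSE
)

# One numeric vector of evaluation points per group.
grid <- list(
  "(Intercept)" = seq(1, 3, length.out = 50),
  "x" = seq(0, 1, length.out = 50)
)

dens <- lapply(names(grid), function(g) {
  est <- estimates$estimate[estimates$term == g]
  d <- density(est)  # kernel density estimate for this group
  # Interpolate the estimated density at the requested values.
  data.frame(
    term = g,
    value = grid[[g]],
    density = approx(d$x, d$y, xout = grid[[g]], rule = 2)$y
  )
})
dens <- do.call(rbind, dens)
head(dens)
```

The result is a long data frame with one row per group and evaluation point, which is a convenient shape for plotting.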
Compute Power, Type I Error, or Precision Statistics
compute_statistics( data, sim_args, power = TRUE, type_1_error = TRUE, precision = TRUE, alternative_power = FALSE, type_s_error = FALSE, type_m_error = FALSE )
data |
A list of model results generated by |
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
power |
TRUE/FALSE flag indicating whether power should be computed. Defaults to TRUE. |
type_1_error |
TRUE/FALSE flag indicating whether type I error rate should be computed. Defaults to TRUE. |
precision |
TRUE/FALSE flag indicating whether precision should be computed. Defaults to TRUE. |
alternative_power |
TRUE/FALSE flag indicating whether alternative power estimates should be computed. If TRUE, this must be accompanied by thresholds specified within the power simulation arguments. Defaults to FALSE. |
type_s_error |
TRUE/FALSE flag indicating whether Type S error should be computed. Defaults to FALSE. |
type_m_error |
TRUE/FALSE flag indicating whether Type M error should be computed. Defaults to FALSE. |
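As a rough illustration of what an empirical power estimate means, the base R sketch below replicates a simple regression and reports the proportion of replications with a significant slope. The sample size, effect size, and 0.05 threshold are assumptions chosen for the example. It does not call compute_statistics and should not be read as its internal calculation.

```r
set.seed(123)

n_rep <- 200   # number of simulated data sets (assumed)
alpha <- 0.05  # significance threshold (assumed)

p_values <- replicate(n_rep, {
  x <- rnorm(50)
  y <- 0.5 * x + rnorm(50)  # true slope of 0.5
  fit <- lm(y ~ x)
  summary(fit)$coefficients["x", "Pr(>|t|)"]
})

# Empirical power: share of replications that reject the null hypothesis.
power <- mean(p_values < alpha)
power
```

Setting the true slope to 0 in the same sketch would give an empirical type I error rate instead.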
Correlate elements
correlate_variables(data, sim_args, ...)
data |
Data simulated from other functions to pass to this function. |
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
... |
Additional arguments, currently not used. |
Takes the desired overall variance, the number of distributions, and the mean of each distribution, and returns the variance of each mixture distribution.
desireVar(desVar, num_dist, means, equalWeight = TRUE)
desVar |
Desired overall variance of mixture normal distribution. |
num_dist |
Number of normal distributions. |
means |
Vector of means for each normal distribution. Its length must equal num_dist. |
equalWeight |
Whether equal weights should be used. Only TRUE is currently supported. |
This function can be used to generate the inputs for the rbimod
variances when a specific variance is desired. Especially useful when
attempting to simulate a mixture normal/bimodal distribution.
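The arithmetic behind choosing component variances can be sketched in base R. With equal weights and a common component variance, the overall variance of the mixture is the component variance plus the variance of the component means, so the component variance is the desired variance minus that spread. The helper `component_variance` is hypothetical and assumes a common variance across components. It is not the package's desireVar and may differ from it.

```r
# Hypothetical helper: solves for a common component variance
# under equal mixture weights.
component_variance <- function(des_var, means) {
  # Variance contributed by the spread of the component means.
  between <- mean((means - mean(means))^2)
  within <- des_var - between
  if (within <= 0) {
    stop("desired variance is too small for these means")
  }
  within
}

# Two components centered at -2 and 2, overall variance of 10.
component_variance(des_var = 10, means = c(-2, 2))
```

Here the means contribute a variance of 4, leaving 6 for each component.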
Extract Coefficients
extract_coefficients(model, extract_function = NULL)
model |
A returned model object from a fitted model. |
extract_function |
A function that extracts model results. The function must take the model object as the only argument. |
Tidy Missing Data Function
generate_missing(data, sim_args)
data |
Data simulated from other functions to pass to this function. |
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
Simulate response variable
generate_response(data, sim_args, keep_intermediate = TRUE, ...)
data |
Data simulated from other functions to pass to this function. |
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
keep_intermediate |
TRUE/FALSE flag indicating whether intermediate steps should be kept. This would include fixed effects times regression weights, random effect summations, etc. Default is TRUE. |
... |
Other arguments to pass to error simulation functions. |
Function that inputs simulated data and returns data frame with new response variable that includes missing data. Missing data types incorporated include dropout missing data, missing at random, and random missing data.
missing_data( sim_data, resp_var = "sim_data", new_outcome = "sim_data2", clust_var = NULL, within_id = NULL, miss_prop = NULL, dropout_location = NULL, type = c("dropout", "random", "mar"), miss_cov, mar_prop )
dropout_missing( sim_data, resp_var = "sim_data", new_outcome = "sim_data2", clust_var = "clustID", within_id = "withinID", miss_prop = NULL, dropout_location = NULL )
random_missing( sim_data, resp_var = "sim_data", new_outcome = "sim_data2", miss_prop, clust_var = NULL, within_id = "withinID" )
mar_missing( sim_data, resp_var = "sim_data", new_outcome = "sim_data2", miss_cov, mar_prop )
sim_data |
Simulated data frame |
resp_var |
Character string of response variable with complete data. |
new_outcome |
Character string of new outcome variable name that includes the missing data. |
clust_var |
Cluster variable used for the grouping, set to NULL by default which means no clustering. |
within_id |
ID variable within each cluster. |
miss_prop |
Proportion of missing data overall |
dropout_location |
A vector the same length as the number of clusters representing the number of data observations for each individual. |
type |
The type of missing data to generate, currently supports dropout, random, or missing at random (mar) missing data. |
miss_cov |
Covariate that the missing values are based on. |
mar_prop |
Proportion of missing data for each unique value specified in the miss_cov argument. |
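To make the idea of random missing data concrete, here is a minimal base R sketch that copies a complete response into a new outcome column and sets a proportion of values to NA completely at random. The column names follow the defaults shown in the usage above ("sim_data" and "sim_data2"). The per-row Bernoulli mechanism is an assumption for illustration and is not necessarily how random_missing selects rows.

```r
set.seed(42)

# Hypothetical simulated data with a complete response named 'sim_data'.
sim_data <- data.frame(id = 1:200, sim_data = rnorm(200))
miss_prop <- 0.25  # target proportion of missing values

# Flag each row as missing with probability miss_prop.
missing <- runif(nrow(sim_data)) < miss_prop

# New outcome keeps observed values and replaces flagged rows with NA.
sim_data$sim_data2 <- ifelse(missing, NA, sim_data$sim_data)

mean(is.na(sim_data$sim_data2))  # realized proportion, close to miss_prop
```

Keeping the complete response alongside the new outcome makes it easy to compare analyses with and without missing data.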
Tidy Model Fitting Function
model_fit(data, sim_args, ...)
data |
A data object, most likely generated from within simglm |
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
... |
Currently not used. |
This function is used to parse user-specified correlation attributes. The correlation attributes need to be in a dataframe to be processed internally. Within the dataframe, 3 columns are expected: 1) the names of the variables/attributes, 2) the variable/attribute paired with the one in column 1, and 3) the correlation.
parse_correlation(sim_args)
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
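The three-column layout described above can be sketched as a small data frame and expanded into a symmetric correlation matrix with base R. The column names `x`, `y`, and `corr` and the variable names are assumptions for illustration. The expansion loop is not the package's parsing code.

```r
# Hypothetical correlation specification: one row per variable pair.
correlate_spec <- data.frame(
  x = c("x1", "x1", "x2"),     # column 1: variable name
  y = c("x2", "x3", "x3"),     # column 2: paired variable
  corr = c(0.3, 0.1, -0.2),    # column 3: correlation
  stringsAsFactors = FALSE
)

vars <- unique(c(correlate_spec$x, correlate_spec$y))
cor_mat <- diag(length(vars))           # ones on the diagonal
dimnames(cor_mat) <- list(vars, vars)

# Fill both triangles so the matrix stays symmetric.
for (i in seq_len(nrow(correlate_spec))) {
  a <- correlate_spec$x[i]
  b <- correlate_spec$y[i]
  cor_mat[a, b] <- correlate_spec$corr[i]
  cor_mat[b, a] <- correlate_spec$corr[i]
}
cor_mat
```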
A function that parses the formula simulation syntax in order to simulate data.
parse_formula(sim_args)
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
Parse Multiple Membership Random Effects
parse_multiplemember(sim_args, random_formula_parsed)
sim_args |
Simulation arguments |
random_formula_parsed |
This is the output from
|
Parse power specifications
parse_power(sim_args, samp_size)
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
samp_size |
The sample size pulled from the simulation arguments or the power model results when vary_arguments is used. |
Parses random effect specification
parse_randomeffect(formula)
formula |
Random effect formula already parsed by |
Parse between varying arguments
parse_varyarguments(sim_args)
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
Parse within varying arguments
parse_varyarguments_w(sim_args, name)
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
name |
The name of the within simulation condition. This is primarily an internal function. |
Takes simulation parameters and returns a mixture normal random variable.
rbimod(n, mean, var, num_dist)
n |
Number of random draws. Optionally, a vector giving the number of draws from each simulated normal distribution. |
mean |
Vector of mean values for each normal distribution. Must be the same length as num_dist. |
var |
Vector of variance values for each normal distribution. Must be the same length as num_dist. |
num_dist |
Number of normal distributions to use when simulating mixture normal distribution. |
Function to simulate mixture normal distributions. The function combines draws from the specified number of normal distributions into a single vector.
The function desireVar can be used to generate a mixture normal distribution with a specific global variance.
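A mixture normal draw of this kind can be sketched in base R by sampling from each component and concatenating the results. The helper `draw_mixture` is hypothetical. In particular, splitting a scalar `n` as evenly as possible across components is an assumption, and the package's rbimod may allocate draws differently.

```r
set.seed(1)

# Hypothetical helper mirroring the documented arguments.
draw_mixture <- function(n, mean, var, num_dist) {
  stopifnot(length(mean) == num_dist, length(var) == num_dist)
  if (length(n) == 1) {
    # Assumed behavior: spread a scalar n evenly over the components.
    extra <- as.integer(seq_len(num_dist) <= n %% num_dist)
    n <- rep(n %/% num_dist, num_dist) + extra
  }
  # Draw from each normal component and combine into one vector.
  unlist(Map(function(k, m, v) rnorm(k, mean = m, sd = sqrt(v)), n, mean, var))
}

x <- draw_mixture(1000, mean = c(-2, 2), var = c(6, 6), num_dist = 2)
var(x)  # roughly 10: component variance 6 plus mean spread 4
```

With equal component sizes, the overall variance is the common component variance plus the variance of the component means, which is why the two values above target an overall variance near 10.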
Replicate Simulation
replicate_simulation(sim_args, return_list = FALSE, future.seed = TRUE, ...)
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
return_list |
TRUE/FALSE indicating whether a full list output should be returned. If TRUE, the nested list is returned. If FALSE, replications are combined with a replication id appended. |
future.seed |
TRUE/FALSE or numeric. Default value is true, see
|
... |
Currently not used. |
Function runs Shiny Application Demo
run_shiny()
This function does not take any arguments and runs the Shiny Application. If run from RStudio, the application opens in the viewer; otherwise it opens in the default web browser.
Function that simulates continuous variables. Any distribution function in R is supported.
sim_continuous2( n, dist = "rnorm", var_level = 1, variance = NULL, ther_sim = FALSE, ther_val = NULL, ceiling = NULL, floor = NULL, ... )
n |
A list of sample sizes. |
dist |
A distribution function. This argument takes a quoted R distribution function (e.g. 'rnorm'). Default is 'rnorm'. |
var_level |
The level the variable should be simulated at. This can either be 1, 2, or 3 specifying a level 1, level 2, or level 3 variable respectively. |
variance |
The variance for random effect simulation. |
ther_sim |
A TRUE/FALSE flag indicating whether the mean and standard deviation used for standardization should be simulated. |
ther_val |
A vector of length 2 that contains the theoretical mean and standard deviation of the generating function. |
ceiling |
A numeric value that specifies the ceiling (maximum) of an attribute being generated. Defaults to NULL, meaning no ceiling effect. If a value is specified, any value larger than the ceiling is set to that ceiling value. |
floor |
A numeric value that specifies the floor (minimum) of an attribute being generated. Defaults to NULL, meaning no floor effect. If a value is specified, any value smaller than the floor is set to that floor value. |
... |
Additional parameters to pass to the distribution function specified in dist. |
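The ceiling and floor arguments describe a censoring step that can be illustrated in base R with `pmin()` and `pmax()`. The distribution, bounds, and variable names below are assumptions for the example. The sketch shows only the general technique, not the package's internal code.

```r
set.seed(7)

x <- rnorm(100, mean = 50, sd = 10)
ceiling_val <- 60  # assumed maximum
floor_val <- 40    # assumed minimum

# Values above the ceiling are set to the ceiling,
# values below the floor are set to the floor.
x_censored <- pmax(pmin(x, ceiling_val), floor_val)

range(x_censored)
```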
Function that simulates factor or categorical variables. Is essentially a wrapper around the sample function from base R.
sim_factor2(n, levels, var_level = 1, replace = TRUE, force_equal = FALSE, ...)
n |
A list of sample sizes. |
levels |
Scalar indicating the number of levels for categorical or factor variable. Can also specify levels as a character vector. |
var_level |
The level the variable should be simulated at. This can either be 1, 2, or 3 specifying a level 1, level 2, or level 3 variable respectively. |
replace |
TRUE/FALSE indicating whether levels should be sampled with replacement. Default is TRUE. |
force_equal |
TRUE/FALSE indicating if the sample size should be forced to be equal. Should not be used with the 'replace = FALSE' argument. |
... |
Additional parameters passed to the sample function. |
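Because the function is described as a wrapper around `sample()`, the two sampling modes can be sketched directly in base R. The level names and sample size are assumptions, and the way equal group sizes are forced here (repeat each level, then shuffle) is one possible approach rather than the package's documented mechanism.

```r
set.seed(3)

lvls <- c("control", "treatment")  # hypothetical factor levels
n <- 10

# Sampling with replacement: group sizes vary from draw to draw.
with_replace <- sample(lvls, size = n, replace = TRUE)
table(with_replace)

# Forcing equal group sizes: repeat each level n / length(lvls) times,
# then shuffle the order.
forced_equal <- sample(rep(lvls, each = n / length(lvls)))
table(forced_equal)
```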
Function that simulates discrete variables. Is essentially a wrapper around the sample function from base R.
sim_ordinal2(n, levels, var_level = 1, replace = TRUE, ...)
n |
A list of sample sizes. |
levels |
Scalar indicating the number of levels for discrete variable. Can also specify levels as a character vector. |
var_level |
The level the variable should be simulated at. This can either be 1, 2, or 3 specifying a level 1, level 2, or level 3 variable respectively. |
replace |
TRUE/FALSE indicating whether levels should be sampled with replacement. Default is TRUE. |
... |
Additional parameters passed to the sample function. |
This function simulates data for the time variable of longitudinal data.
sim_time(n, time_levels = NULL, ...)
n |
Sample size of the levels. |
time_levels |
The values the time variable should take. If NULL (default), the time values are discrete integers starting at 0 and going to n - 1. |
... |
Currently not used. |
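The default time coding described for time_levels, integers from 0 to n - 1, can be written in one line of base R. The custom values shown afterwards are a hypothetical example of unequally spaced measurement occasions.

```r
n <- 5  # number of measurement occasions (assumed)

# Default coding: 0, 1, ..., n - 1.
time_default <- seq_len(n) - 1
time_default

# Hypothetical custom time values, e.g. unequally spaced occasions.
time_custom <- c(0, 1, 2, 4, 8)
stopifnot(length(time_custom) == n)
```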
This function is most useful to pass to replicate_simulation.
The function attempts to determine automatically which aspects to add to
the simulation/power generation based on the elements found in the sim_args
argument.
simglm(sim_args)
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
Tidy error simulation
simulate_error(data, sim_args, ...)
data |
Data simulated from other functions to pass to this function. |
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
... |
Other arguments to pass to error simulation functions. |
This function simulates the fixed portion of the model using a formula syntax.
simulate_fixed(data, sim_args, ...)
data |
Data simulated from other functions to pass to this function. Can pass NULL if first in simulation string. |
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
... |
Other arguments to pass to error simulation functions. |
This function simulates heterogeneity of level one error variance.
simulate_heterogeneity(data, sim_args, ...)
data |
Data simulated from other functions to pass to this function. This function needs to be specified after 'simulate_fixed' and 'simulate_error'. |
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
... |
Other arguments to pass to error simulation functions. |
Function that generates knot locations. An example of usefulness of this function would be with generation of interrupted time series data. Another application may be with simulation of piecewise linear data structures.
simulate_knot(data, sim_args)
data |
Mostly internal argument. |
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
This function simulates the random portion of the model using a formula syntax.
simulate_randomeffect(data, sim_args, ...)
data |
Data simulated from other functions to pass to this function. Can pass NULL if first in simulation string. |
sim_args |
A named list with special model formula syntax. See details and examples for more information. The named list may contain the following:
|
... |
Other arguments to pass to error simulation functions. |
Transform response variable
transform_outcome(outcome, type, categories = NULL, ...)
outcome |
The outcome variable to transform. |
type |
Type of transformation to apply. |
categories |
A vector of named categories for multinomial sim |
... |
Additional arguments passed to distribution functions. |