Title: | Data Modification and Analysis for Communication Research |
---|---|
Description: | Provides convenience functions for common data modification and analysis tasks in communication research. This includes functions for univariate and bivariate data analysis, index generation and reliability computation, and intercoder reliability tests. All functions follow the style and syntax of the tidyverse, and are construed to perform their computations on multiple variables at once. Functions for univariate and bivariate data analysis comprise summary statistics for continuous and categorical variables, as well as several tests of bivariate association including effect sizes. Functions for data modification comprise index generation and automated reliability analysis of index variables. Functions for intercoder reliability comprise tests of several intercoder reliability estimates, including simple and mean pairwise percent agreement, Krippendorff's Alpha (Krippendorff 2004, ISBN: 9780761915454), and various Kappa coefficients (Brennan & Prediger 1981 <doi: 10.1177/001316448104100307>; Cohen 1960 <doi: 10.1177/001316446002000104>; Fleiss 1971 <doi: 10.1037/h0031619>). |
Authors: | Julian Unkel [aut, cre] |
Maintainer: | Julian Unkel <[email protected]> |
License: | GPL-3 |
Version: | 0.4.1 |
Built: | 2025-03-03 04:35:01 UTC |
Source: | https://github.com/joon-e/tidycomm |
Add a rowwise mean or sum index of specific variables to the dataset.
add_index(data, name, ..., type = "mean", na.rm = TRUE, cast.numeric = FALSE)
add_index(data, name, ..., type = "mean", na.rm = TRUE, cast.numeric = FALSE)
data |
|
name |
Name of the index column to compute. |
... |
Variables used for the index. |
type |
Type of index to compute. Either "mean" (default) or "sum". |
na.rm |
a logical value indicating whether |
cast.numeric |
a logical value indicating whether all variables selected
for index computation should be converted to numeric. Useful if computing
indices from factor variables. Defaults to |
a tdcmm model
get_reliability()
to compute reliability estimates of added index
variables.
WoJ %>% add_index(ethical_flexibility, ethics_1, ethics_2, ethics_3, ethics_4) WoJ %>% add_index(ethical_flexibility, ethics_1, ethics_2, ethics_3, ethics_4, type = "sum")
WoJ %>% add_index(ethical_flexibility, ethics_1, ethics_2, ethics_3, ethics_4) WoJ %>% add_index(ethical_flexibility, ethics_1, ethics_2, ethics_3, ethics_4, type = "sum")
This function recodes one or more numeric variables into categorical variables based on a specified lower end, upper end, and intermediate breaks. The intervals created include the right endpoint of the interval. For example, breaks = c(2, 3) with lower_end = 1 and upper_end = 5 creates intervals from 1 to <= 2, >2 to <= 3, and >3 to <= 5. If the lower or upper ends are not provided, the function defaults to the minimum and maximum values of the data and issues a warning. This default behavior is prone to errors, however, because a scale may not include its actual lower and upper ends which might in turn affect the recoding process. Hence, it is strongly suggested to manually set the lower and upper bounds of the original continuous scale.
categorize_scale( data, ..., breaks, labels, lower_end = NULL, upper_end = NULL, name = NULL, overwrite = FALSE )
categorize_scale( data, ..., breaks, labels, lower_end = NULL, upper_end = NULL, name = NULL, overwrite = FALSE )
data |
A tibble or a tdcmm model. |
... |
Variables to recode as factor variables in categories. If no variables are specified, all numeric columns will be recoded. |
breaks |
A vector of numeric values specifying the breaks for categorizing the data between the lower and upper ends. The breaks define the boundaries of the intervals. Setting this parameter is required. |
labels |
A vector of string labels for each interval. The number of labels must match the number of intervals defined by the breaks and lower/upper ends.Setting this parameter is required. |
lower_end |
Optional numeric value specifying the lower end of the scale. If not provided, defaults to the minimum value of the data. |
upper_end |
Optional numeric value specifying the upper end of the scale. If not provided, defaults to the maximum value of the data. |
name |
Optional string specifying the name of the new variable(s).
By default, the new variable names are the original variable names suffixed
with |
overwrite |
Logical indicating whether to overwrite the original
variable(s) with the new categorical variables. If |
A modified tibble or tdcmm model with the recoded variables.
Other scaling:
center_scale()
,
dummify_scale()
,
minmax_scale()
,
recode_cat_scale()
,
reverse_scale()
,
setna_scale()
,
z_scale()
WoJ %>% dplyr::select(trust_parliament, trust_politicians) %>% categorize_scale(trust_parliament, trust_politicians, lower_end = 1, upper_end = 5, breaks = c(2, 3), labels = c("Low", "Medium", "High"), overwrite = FALSE) WoJ %>% dplyr::select(autonomy_selection) %>% categorize_scale(autonomy_selection, breaks = c(2, 3, 4), lower_end = 1, upper_end = 5, labels = c("Low", "Medium", "High", "Very High"), name = "autonomy_in_categories")
WoJ %>% dplyr::select(trust_parliament, trust_politicians) %>% categorize_scale(trust_parliament, trust_politicians, lower_end = 1, upper_end = 5, breaks = c(2, 3), labels = c("Low", "Medium", "High"), overwrite = FALSE) WoJ %>% dplyr::select(autonomy_selection) %>% categorize_scale(autonomy_selection, breaks = c(2, 3, 4), lower_end = 1, upper_end = 5, labels = c("Low", "Medium", "High", "Very High"), name = "autonomy_in_categories")
This function centers the specified numeric columns or all numeric columns if none are specified. A centered scale has a mean of 0.0.
center_scale(data, ..., name = NULL, overwrite = FALSE)
center_scale(data, ..., name = NULL, overwrite = FALSE)
data |
|
... |
Numeric variables to be centered. If none are provided, all numeric columns will be centered. |
name |
Optional name for the new centered variable when a single
variable is provided. By default, the name will be the original variable
name suffixed with |
overwrite |
Logical. If |
A tdcmm model with the centered variable(s).
Other scaling:
categorize_scale()
,
dummify_scale()
,
minmax_scale()
,
recode_cat_scale()
,
reverse_scale()
,
setna_scale()
,
z_scale()
WoJ %>% dplyr::select(autonomy_emphasis) %>% center_scale(autonomy_emphasis) WoJ %>% center_scale(autonomy_emphasis, name = "my_centered_variable") WoJ %>% center_scale(overwrite = TRUE) WoJ %>% center_scale(autonomy_emphasis) %>% tab_frequencies(autonomy_emphasis, autonomy_emphasis_centered)
WoJ %>% dplyr::select(autonomy_emphasis) %>% center_scale(autonomy_emphasis) WoJ %>% center_scale(autonomy_emphasis, name = "my_centered_variable") WoJ %>% center_scale(overwrite = TRUE) WoJ %>% center_scale(autonomy_emphasis) %>% tab_frequencies(autonomy_emphasis, autonomy_emphasis_centered)
Computes correlation coefficients for all combinations of the specified variables. If no variables are specified, all numeric (integer or double) variables are used.
correlate(data, ..., method = "pearson", partial = NULL, with = NULL)
correlate(data, ..., method = "pearson", partial = NULL, with = NULL)
data |
|
... |
Variables to compute correlations for (column names). Leave empty to compute for all numeric variables in data. |
method |
a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman" |
partial |
Specifies a variable to be used as a control in a partial correlation.
By default, this parameter is set to |
with |
Specifies a focus variable to correlate all other variables with.
By default, this parameter is set to |
a tdcmm model
WoJ %>% correlate(ethics_1, ethics_2, ethics_3) WoJ %>% correlate() WoJ %>% correlate(ethics_1, ethics_2, ethics_3, with = work_experience) WoJ %>% correlate(autonomy_selection, autonomy_emphasis, partial = work_experience) WoJ %>% correlate(with = work_experience)
WoJ %>% correlate(ethics_1, ethics_2, ethics_3) WoJ %>% correlate() WoJ %>% correlate(ethics_1, ethics_2, ethics_3, with = work_experience) WoJ %>% correlate(autonomy_selection, autonomy_emphasis, partial = work_experience) WoJ %>% correlate(with = work_experience)
Computes contingency table for one independent (column) variable and one or more dependent (row) variables.
crosstab( data, col_var, ..., add_total = FALSE, percentages = FALSE, chi_square = FALSE )
crosstab( data, col_var, ..., add_total = FALSE, percentages = FALSE, chi_square = FALSE )
data |
|
col_var |
Independent (column) variable. |
... |
Dependent (row) variables. |
add_total |
Logical indicating whether a 'Total' column should be
computed. Defaults to |
percentages |
Logical indicating whether to output column-wise
percentages instead of absolute values. Defaults to |
chi_square |
Logical indicating whether a Chi-square test should be computed.
Test results will be reported via message(). Defaults to |
a tdcmm model
Other categorical:
tab_frequencies()
WoJ %>% crosstab(reach, employment) WoJ %>% crosstab(reach, employment, add_total = TRUE, percentages = TRUE, chi_square = TRUE)
WoJ %>% crosstab(reach, employment) WoJ %>% crosstab(reach, employment, add_total = TRUE, percentages = TRUE, chi_square = TRUE)
Describe numeric variables by several measures of central tendency and variability. If no variables are specified, all numeric (integer or double) variables are described.
describe(data, ..., na.rm = TRUE)
describe(data, ..., na.rm = TRUE)
data |
|
... |
Variables to describe (column names). Leave empty to describe all numeric variables in data. |
na.rm |
a logical value indicating whether |
N: number of valid cases (i.e., all but missing)
Missing: number of NA cases
M: mean average
SD: standard deviation, sd
Min: minimum value, min
Q25: 25% quantile, quantile
Mdn: median average, same as 50% quantile
Q75: 75% quantile, quantile
Max: maximum value, max
Range: difference between Min and Max
CI_95_LL: where
denotes Student t's stats::quantile function with a probability of
and
degrees of freedom
CI_95_UL: where
denotes Student t's stats::quantile function with a probability of
and
degrees of freedom
Skewness: traditional Fisher-Pearson coefficient of skewness of valid cases as per
where
denotes
, following Doane & Seward (2011, p. 6, 1a). See DOI doi:10.1080/10691898.2011.11889611.
Kurtosis: empirical sample kurtosis (i.e., standardized fourth population moment about the mean) as per , following DeCarlo (1997, p. 292, b2). See DOI doi:10.1037/1082-989X.2.3.292.
a tdcmm model
Other descriptives:
describe_cat()
,
tab_percentiles()
WoJ %>% describe(autonomy_selection, autonomy_emphasis, work_experience) fbposts %>% describe(n_pictures)
WoJ %>% describe(autonomy_selection, autonomy_emphasis, work_experience) fbposts %>% describe(n_pictures)
Describe categorical variables by N, number of unique values, and mode. Note that in case of multiple modes, the first mode by order of values is chosen.
describe_cat(data, ...)
describe_cat(data, ...)
data |
|
... |
Variables to describe (column names). Leave empty to describe all categorical variables in data. |
If no variables are specified, all categorical (character or factor) variables are described.
N: number of valid cases (i.e., all but missing)
Missing: number of NA cases
Unique: number of unique categories in a given variable, without Missing
Mode: mode average (if multiple modes exist, first mode by order of values is returned)
Mode_N: number of cases reflecting the Mode
a tdcmm model
Other descriptives:
describe()
,
tab_percentiles()
WoJ %>% describe_cat(reach, employment, temp_contract) fbposts %>% describe_cat(type)
WoJ %>% describe_cat(reach, employment, temp_contract) fbposts %>% describe_cat(type)
Gray design
design_gray()
design_gray()
a list with main_color_1
, a vector of 12 main_colors
, a
corresponding main_contrast_1
(the color of text to write on top of the
main color) and a corresponding main_contrasts
, the main_size
(for
lines), a comparison_linetype
, comparison_color
, and comparison_size
for all lines that act as comparative lines, and a ggplot2 theme
Grey design
design_grey()
design_grey()
a list with main_color_1
, a vector of 12 main_colors
, a
corresponding main_contrast_1
(the color of text to write on top of the
main color) and a corresponding main_contrasts
, the main_size
(for
lines), a comparison_linetype
, comparison_color
, and comparison_size
for all lines that act as comparative lines, and a ggplot2 theme
Colorbrewer-inspired design with focus on LMU (lmu.de) green
design_lmu()
design_lmu()
a list with main_color_1
, a vector of 12 main_colors
, a
corresponding main_contrast_1
(the color of text to write on top of the
main color) and a corresponding main_contrasts
, the main_size
(for
lines), a comparison_linetype
, comparison_color
, and comparison_size
for all lines that act as comparative lines, and a ggplot2 theme
This function transforms specified categorical variables into dummy variables. Each level of the categorical variable is represented by a new dummy variable. Missing values are retained. These new dummy variables are appended to the original data frame. This function does not allow specifying new column names for the dummy variables. Instead, it follows a consistent naming pattern: the new dummy variables are named using the original variable name with the category value appended. For example, if a categorical variable named "autonomy" with levels "low", "medium", "high" is dummified, the new dummy variables will be named "autonomy_low", "autonomy_medium", "autonomy_high".
dummify_scale(data, ..., overwrite = FALSE)
dummify_scale(data, ..., overwrite = FALSE)
data |
|
... |
Categorical variables to be transformed into dummy variables. Category names will be automatically appended to the newly created dummy variables. |
overwrite |
Logical. If |
A tdcmm model with the dummy variables appended.
Other scaling:
categorize_scale()
,
center_scale()
,
minmax_scale()
,
recode_cat_scale()
,
reverse_scale()
,
setna_scale()
,
z_scale()
WoJ %>% dplyr::select(temp_contract) %>% dummify_scale(temp_contract) WoJ %>% categorize_scale(autonomy_emphasis, breaks = c(2, 3), labels = c('low', 'medium', 'high')) %>% dummify_scale(autonomy_emphasis_cat) %>% dplyr::select(starts_with('autonomy_emphasis'))
WoJ %>% dplyr::select(temp_contract) %>% dummify_scale(temp_contract) WoJ %>% categorize_scale(autonomy_emphasis, breaks = c(2, 3), labels = c('low', 'medium', 'high')) %>% dummify_scale(autonomy_emphasis_cat) %>% dplyr::select(starts_with('autonomy_emphasis'))
45 political facebook posts coded by 6 coders for an intercoder reliability test, focused on populist messages.
fbposts
fbposts
A data frame with 270 rows and 7 variables
Numeric id of the coded Facebook post
Numeric id of the coder
Type of Facebook post, one of "link", "photo", "status", or "video
Amount of pictures attached to the post, ranges from 0 to 6
Populism indicator: Does the Facebook post attack elites?, 0 = "no attacks on elites", 1 = "attacks political actors", 2 = "attacks public administration actors", 3 = "attacks economical actors", 4 = "attacks media actors/journalists", 9 = "attacks other elites"
Populism indicator: Does the Facebook refer to 'the people'?, 0 = "does not refer to 'the people'", 1 = "refers to 'the people'"
Populism indicator: Does the Facebook attack 'others'?, 0 = "no attacks on 'others'", 1 = "attacks other cultures", 2 = "attacks other political stances", 3 = "attacks other 'others'"
Get reliability estimates of index variables created with add_index
.
get_reliability( data, ..., type = "alpha", interval.type = NULL, bootstrap.samples = NULL, conf.level = NULL, progress = FALSE )
get_reliability( data, ..., type = "alpha", interval.type = NULL, bootstrap.samples = NULL, conf.level = NULL, progress = FALSE )
data |
|
... |
Index variables created with |
type |
Type of reliability estimate. See |
interval.type |
Type of reliability estimate confidence interval.
See |
bootstrap.samples |
Number of bootstrap samples for CI calculation.
See |
conf.level |
Confidence level for estimate CI.
See |
progress |
Show progress for reliability estimate computation. Useful if using computationally intense computations (e. g., many bootstrapping samples) and many index variables. |
a tdcmm model
add_index()
to create index variables
WoJ %>% add_index(ethical_flexibility, ethics_1, ethics_2, ethics_3, ethics_4) %>% get_reliability()
WoJ %>% add_index(ethical_flexibility, ethics_1, ethics_2, ethics_3, ethics_4) %>% get_reliability()
A dataset of a preregistered factorial survey experiment with a nationally representative sample of 964 German online users. Participants were presented with manipulated user comments that included statements associated with incivil discourse (such as profanity and attacks on arguments) and intolerant discourse (such as offensive stereotyping and violent threats). Participants rated the comments, e.g. offensiveness, harm to society, and their intention to delete the comment containing the statement.
incvlcomments
incvlcomments
A data frame of 3856 observations nested in 964 participants and 22 variables:
Numeric id of the participant
Age of the participant
Gender of the participant, either 'male' or 'not male'
Level of formal education of the participant, either 'high formal education' or 'low formal education'
Numeric id of the comment that the participant was exposed to
The subject of the comment that the participant was exposed to, either 'Gender', 'Abortion', 'Climate', or 'Migration'
Whether the comment contained profanities as an indicator of incivility
Whether the comment contained attacks towards arguments as an indicator of incivility
Whether the comment contained offensive stereotypes as an indicator of intolerant discourse
Whether the comment contained violent threats as an indicator of intolerant discourse
Rate statement whether the comment is being perceived as offensive & hostile (Scale from 1 to 7)
Rate statement whether the comment is being perceived as necessary & accurate (Scale from 1 to 7)
Rate statement whether the comment is being perceived as harmful to society (Scale from 1 to 7)
Whether the participant wants to delete the comment
How similar the participant feels to the person who created the post (Scale from 1 to 7)
How similar the participant feels to the group of people criticized in the post (Scale from 1 to 7)
Rate agreement with statements on gender policies (Scale from 1 to 7)
Rate agreement with statements on abortion (Scale from 1 to 7)
Rate agreement with statements on migration (Scale from 1 to 7)
Rate agreement with statements on climate change (Scale from 1 to 7)
Placement on a political spectrum from left to right (Scale from 1 to 9)
Rate agreement with statements about the freedom of speech and expression (Scale from 1 to 7)
The dataset was created from the OSF project: Differential perceptions of and reactions to incivil and intolerant user comments, corresponding to the paper: Kümpel, A. S., Unkel, J (2023). Differential perceptions of and reactions to incivil and intolerant user comments, Journal of Computer-Mediated Communication, Volume 28, Issue 4, https://doi.org/10.1093/jcmc/zmad018
Given a specified minimum and maximum, this function translates each value into a new value within this specified range. The transformation maintains the relative distances between values, resulting in changes to the mean and standard deviations. However, if both the original scale and the transformed scale are z-standardized, they will be equal again, indicating that the relative positions and distributions of the values remain consistent.
minmax_scale( data, ..., change_to_min = 0, change_to_max = 1, name = NULL, overwrite = FALSE )
minmax_scale( data, ..., change_to_min = 0, change_to_max = 1, name = NULL, overwrite = FALSE )
data |
|
... |
Numeric variables to be min-max scaled. If none are provided, all numeric columns will be scaled. |
change_to_min |
The desired minimum value after scaling. |
change_to_max |
The desired maximum value after scaling. |
name |
Optional name for the new scaled variable when a single variable is provided. By default, the name will be the original variable name suffixed with the range. For example, "variable" becomes "variable_3to5". Negative values are prefixed with "neg" to avoid invalid columns names (e.g., -3 to 3 becomes "variable_neg3to5"). |
overwrite |
Logical. If |
A tdcmm model with the min-max scaled variable(s).
Other scaling:
categorize_scale()
,
center_scale()
,
dummify_scale()
,
recode_cat_scale()
,
reverse_scale()
,
setna_scale()
,
z_scale()
WoJ %>% minmax_scale(autonomy_emphasis, change_to_min = 0, change_to_max = 1) WoJ %>% minmax_scale(autonomy_emphasis, name = "my_scaled_variable", change_to_min = 0, change_to_max = 1) WoJ %>% minmax_scale(autonomy_emphasis, change_to_min = 0, change_to_max = 1) %>% tab_frequencies(autonomy_emphasis, autonomy_emphasis_0to1)
WoJ %>% minmax_scale(autonomy_emphasis, change_to_min = 0, change_to_max = 1) WoJ %>% minmax_scale(autonomy_emphasis, name = "my_scaled_variable", change_to_min = 0, change_to_max = 1) WoJ %>% minmax_scale(autonomy_emphasis, change_to_min = 0, change_to_max = 1) %>% tab_frequencies(autonomy_emphasis, autonomy_emphasis_0to1)
This function transforms one or more categorical variables into new categories based on specified mapping. For unmatched cases not specified in the mapping, a default value can be assigned. Missing values are retained.
recode_cat_scale( data, ..., assign = NULL, other = NA, overwrite = FALSE, name = NULL )
recode_cat_scale( data, ..., assign = NULL, other = NA, overwrite = FALSE, name = NULL )
data |
A tibble or a tdcmm model. |
... |
Variables to recode. |
assign |
A named vector where names are the old values and values are the new values to be assigned. |
other |
The value for unmatched cases. By default, it is |
overwrite |
Logical. If |
name |
The name of the new variable(s). If not specified, this is the same
name as the provided variable(s) but suffixed with |
A tdcmm model or a tibble.
Other scaling:
categorize_scale()
,
center_scale()
,
dummify_scale()
,
minmax_scale()
,
reverse_scale()
,
setna_scale()
,
z_scale()
WoJ %>% recode_cat_scale(country, assign = c("Germany" = 1, "Switzerland" = 2), overwrite = TRUE) WoJ %>% recode_cat_scale(country, assign = c("Germany" = "german", "Switzerland" = "swiss"), other = "other", overwrite = TRUE) WoJ %>% recode_cat_scale(ethics_1, ethics_2, assign = c(`1` = 5, `2` = 4, `3` = 3, `4` = 2, `5` = 1), other = 6, overwrite = TRUE) WoJ %>% recode_cat_scale(ethics_1, ethics_2, assign = c(`1` = "very low", `2` = "low", `3` = "medium", `4` = "high", `5` = "very high"), overwrite = TRUE) WoJ %>% dplyr::select(temp_contract) %>% recode_cat_scale(temp_contract, assign = c(`Permanent` = "P", `Temporary` = "T"), other = "O")
WoJ %>% recode_cat_scale(country, assign = c("Germany" = 1, "Switzerland" = 2), overwrite = TRUE) WoJ %>% recode_cat_scale(country, assign = c("Germany" = "german", "Switzerland" = "swiss"), other = "other", overwrite = TRUE) WoJ %>% recode_cat_scale(ethics_1, ethics_2, assign = c(`1` = 5, `2` = 4, `3` = 3, `4` = 2, `5` = 1), other = 6, overwrite = TRUE) WoJ %>% recode_cat_scale(ethics_1, ethics_2, assign = c(`1` = "very low", `2` = "low", `3` = "medium", `4` = "high", `5` = "very high"), overwrite = TRUE) WoJ %>% dplyr::select(temp_contract) %>% recode_cat_scale(temp_contract, assign = c(`Permanent` = "P", `Temporary` = "T"), other = "O")
Computes linear regression for all independent variables on the specified dependent variable. Linear modeling of multiple independent variables uses stepwise regression modeling. If specified, preconditions for (multi-)collinearity and for homoscedasticity are checked.
regress( data, dependent_var, ..., check_independenterrors = FALSE, check_multicollinearity = FALSE, check_homoscedasticity = FALSE )
regress( data, dependent_var, ..., check_independenterrors = FALSE, check_multicollinearity = FALSE, check_homoscedasticity = FALSE )
data |
|
dependent_var |
The dependent variable on which the linear model is fitted. Specify as column name. |
... |
Independent variables to take into account as (one or many) predictors for the dependent variable. Specify as column names. At least one has to be specified. |
check_independenterrors |
if set, the independence of errors among any two cases is being checked using a Durbin-Watson test |
check_multicollinearity |
if set, multicollinearity among all specified independent variables is being checked using the variance inflation factor (VIF) and the tolerance (1/VIF); this check can only be performed if at least two independent variables are provided, and all provided variables need to be numeric |
check_homoscedasticity |
if set, homoscedasticity is being checked using a Breusch-Pagan test |
a tdcmm model
WoJ %>% regress(autonomy_selection, ethics_1) WoJ %>% regress(autonomy_selection, work_experience, trust_government)
WoJ %>% regress(autonomy_selection, ethics_1) WoJ %>% regress(autonomy_selection, work_experience, trust_government)
Reverses a continuous scale into a new variable. A 5-1 scale thus turns into a 1-5 scale. Missing values are retained. For a given continuous variable the lower and upper end of the scale should be provided. If they are not provided, the function assumes the scale's minimum and maximum value to represent these lower/upper ends (and issues a warning about this fact). This default behavior is prone to errors, however, because a scale may not include its actual lower and upper ends which might in turn affect correct reversing. Hence, it is strongly suggested to manually set the lower and upper bounds of the original continuous scale.
reverse_scale( data, ..., lower_end = NULL, upper_end = NULL, name = NULL, overwrite = FALSE )
reverse_scale( data, ..., lower_end = NULL, upper_end = NULL, name = NULL, overwrite = FALSE )
data |
|
... |
Numeric variables to be reverse scaled. If none are provided, all numeric columns will be scaled. |
lower_end |
Lower end of provided continuous scale (default is to use minimum value of current values, which might not be the actual lower end of the scale). |
upper_end |
Upper end of provided continuous scale (default is to use maximum value of current values, which might not be the actual upper end of the scale). |
name |
Optional name for the new reversed variable when a single
variable is provided. By default, the name will be the original variable
name suffixed with |
overwrite |
Logical. If |
A tdcmm model with the reversed variable(s).
Other scaling:
categorize_scale()
,
center_scale()
,
dummify_scale()
,
minmax_scale()
,
recode_cat_scale()
,
setna_scale()
,
z_scale()
WoJ %>% reverse_scale(autonomy_emphasis, lower_end = 0, upper_end = 1) WoJ %>% reverse_scale(autonomy_emphasis, name = "my_reversed_variable", lower_end = 0, upper_end = 1) WoJ %>% reverse_scale(overwrite = TRUE) WoJ %>% reverse_scale(autonomy_emphasis, lower_end = 0, upper_end = 1) %>% tab_frequencies(autonomy_emphasis, autonomy_emphasis_rev)
WoJ %>% reverse_scale(autonomy_emphasis, lower_end = 0, upper_end = 1) WoJ %>% reverse_scale(autonomy_emphasis, name = "my_reversed_variable", lower_end = 0, upper_end = 1) WoJ %>% reverse_scale(overwrite = TRUE) WoJ %>% reverse_scale(autonomy_emphasis, lower_end = 0, upper_end = 1) %>% tab_frequencies(autonomy_emphasis, autonomy_emphasis_rev)
This function allows users to set specific values to NA
in chosen variables
within a data frame. It can handle numeric, character, and factor variables.
setna_scale(data, ..., value, name = NULL, overwrite = FALSE)
setna_scale(data, ..., value, name = NULL, overwrite = FALSE)
data |
|
... |
One or more variables where specified values will be set to NA. If no variables are provided, the function is applied to the entire data frame. |
value |
A value (or vector of values) that needs to be set to NA. |
name |
The name of the new variable(s). By default, this is the same name
as the provided variable(s) but suffixed with |
overwrite |
Logical. If |
A tdcmm model or a tibble.
Other scaling:
categorize_scale()
,
center_scale()
,
dummify_scale()
,
minmax_scale()
,
recode_cat_scale()
,
reverse_scale()
,
z_scale()
WoJ %>% dplyr::select(autonomy_emphasis) %>% setna_scale(autonomy_emphasis, value = 5) WoJ %>% dplyr::select(autonomy_emphasis) %>% setna_scale(autonomy_emphasis, value = 5, name = "new_na_autonomy") WoJ %>% setna_scale(value = c(2, 3, 4), overwrite = TRUE) WoJ %>% dplyr::select(country) %>% setna_scale(country, value = "Germany") WoJ %>% dplyr::select(country) %>% setna_scale(country, value = c("Germany", "Switzerland"))
WoJ %>% dplyr::select(autonomy_emphasis) %>% setna_scale(autonomy_emphasis, value = 5) WoJ %>% dplyr::select(autonomy_emphasis) %>% setna_scale(autonomy_emphasis, value = 5, name = "new_na_autonomy") WoJ %>% setna_scale(value = c(2, 3, 4), overwrite = TRUE) WoJ %>% dplyr::select(country) %>% setna_scale(country, value = "Germany") WoJ %>% dplyr::select(country) %>% setna_scale(country, value = c("Germany", "Switzerland"))
A dataset of 630 German participants in an online experiment. The experiment investigated the effects of user comments on social network sites (SNS) on individuals' perceptions of journalistic quality. The researchers varied the subject of the article (factor 1: 'Copyright directive' or 'Social housing'), the order of comment presentation (factor 2: before or after the article) and the valence of the comments (factor 3: positive or negative).
snscomments
snscomments
A data frame of 630 observations and 15 variables:
Age of the participant
Gender of the participant, either 'female' or 'not female'
Level of formal education of the participant, either 'low formal education' or 'high formal education'
Index measuring the psychological trait of a person to enjoy thinking, calculated from several survey items
Index measuring a person's prior knowledge of the presented subject of the article, calculated from several survey items
Numeric id of the group that the participant was in during the experiment
Subject of the article that the participant was given to read, either 'Copyright directive' or 'Social housing'
Order of the comments that the participant was exposed to, either 'Comments after', 'Comments before', or 'Control group'
Valence of the comments that the participant was exposed to, either 'Negative', 'Positive', or 'Control group'
Indicates whether the participant was in the 'Control group' or 'Experimental group'
Index measuring participant's evaluation of the medium's quality, calculated from several survey items
Index measuring participant's evaluation of the article's quality, calculated from several survey items
Participant's perception of the quality of the comments
Participant's perception of the valence of the comments
Participant's measure of how much attention they put in reading the article
This dataset was created from the OSF project: https://osf.io/r867v/, corresponding to the paper: Kümpel, A. S., & Unkel, J. (2020). Negativity wins at last: How presentation order and valence of user comments affect perceptions of journalistic quality. Journal of Media Psychology: Theories, Methods, and Applications, 32(2), 89–99. doi:10.1027/1864-1105/a000261
Computes t-tests for one group variable and specified test variables. If no variables are specified, all numeric (integer or double) variables are used. A Levene's test will automatically determine whether the pooled variance is used to estimate the variance. Otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
t_test( data, group_var, ..., var.equal = TRUE, paired = FALSE, pooled_sd = TRUE, levels = NULL, case_var = NULL, mu = NULL )
t_test( data, group_var, ..., var.equal = TRUE, paired = FALSE, pooled_sd = TRUE, levels = NULL, case_var = NULL, mu = NULL )
data |
|
group_var |
group variable (column name) to specify where to split two samples (two-sample t-test) or which variable to compare a one-sample t-test on |
... |
test variables (column names). Leave empty to compute t-tests for all numeric variables in data. Also leave empty for one-sample t-tests. |
var.equal |
this parameter is deprecated (previously: a logical variable indicating whether to treat the two
variances as being equal. If |
paired |
a logical indicating whether you want a paired t-test. Defaults
to |
pooled_sd |
a logical indicating whether to use the pooled standard
deviation in the calculation of Cohen's d. Defaults to |
levels |
optional: a vector of length two specifying the two levels of the group variable. |
case_var |
optional: case-identifying variable (column name). If you
set |
mu |
optional: a number indicating the true value of the mean in the
general population ( |
a tdcmm model
WoJ %>% t_test(temp_contract, autonomy_selection, autonomy_emphasis) WoJ %>% t_test(temp_contract) WoJ %>% t_test(employment, autonomy_selection, autonomy_emphasis, levels = c("Full-time", "Freelancer")) WoJ %>% t_test(autonomy_selection, mu = 3.62)
WoJ %>% t_test(temp_contract, autonomy_selection, autonomy_emphasis) WoJ %>% t_test(temp_contract) WoJ %>% t_test(employment, autonomy_selection, autonomy_emphasis, levels = c("Full-time", "Freelancer")) WoJ %>% t_test(autonomy_selection, mu = 3.62)
Tabulates frequencies for one or more categorical variable, including relative, and cumulative frequencies.
tab_frequencies(data, ...)
tab_frequencies(data, ...)
data |
|
... |
Variables to tabulate |
a tdcmm model
Other categorical:
crosstab()
WoJ %>% tab_frequencies(employment) WoJ %>% tab_frequencies(employment, country)
WoJ %>% tab_frequencies(employment) WoJ %>% tab_frequencies(employment, country)
This function tabulates specified percentiles for given numeric variables. If no variables are provided, the function will attempt to describe all numeric (either integer or double) variables found within the input. The percentiles are calculated based on the levels parameter, which defaults to every 10% from 10% to 90%. NA values are always removed because the concept of a percentile is based on ranking. As NA is not a value, it cannot be ordered in relation to actual numbers.
tab_percentiles( data, ..., levels = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) )
tab_percentiles( data, ..., levels = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) )
data |
a tibble or a tdcmm model that contains the numeric data to be tabulated. |
... |
Variables within the data for which to tabulate the percentiles. If no variables are provided, all numeric variables are used. |
levels |
a numeric vector specifying the percentiles to compute. Defaults to c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0). |
a tdcmm model
Other descriptives:
describe()
,
describe_cat()
WoJ %>% tab_percentiles(work_experience) WoJ %>% tab_percentiles(work_experience, autonomy_emphasis)
WoJ %>% tab_percentiles(work_experience) WoJ %>% tab_percentiles(work_experience, autonomy_emphasis)
Performs an intercoder reliability test by computing various intercoder reliability estimates for the included variables
test_icr( data, unit_var, coder_var, ..., levels = NULL, na.omit = FALSE, agreement = TRUE, holsti = TRUE, kripp_alpha = TRUE, cohens_kappa = FALSE, fleiss_kappa = FALSE, brennan_prediger = FALSE, lotus = FALSE, s_lotus = FALSE )
test_icr( data, unit_var, coder_var, ..., levels = NULL, na.omit = FALSE, agreement = TRUE, holsti = TRUE, kripp_alpha = TRUE, cohens_kappa = FALSE, fleiss_kappa = FALSE, brennan_prediger = FALSE, lotus = FALSE, s_lotus = FALSE )
data |
|
unit_var |
Variable with unit identifiers |
coder_var |
Variable with coder identifiers |
... |
Variables to compute intercoder reliability estimates for. Leave
empty to compute for all variables (excluding |
levels |
Optional named vector with levels of test variables |
na.omit |
Logical indicating whether |
agreement |
Logical indicating whether simple percent agreement should
be computed. Defaults to |
holsti |
Logical indicating whether Holsti's reliability estimate
(mean pairwise agreement) should be computed. Defaults to |
kripp_alpha |
Logical indicating whether Krippendorff's Alpha should
be computed. Defaults to |
cohens_kappa |
Logical indicating whether Cohen's Kappa should
be computed. Defaults to |
fleiss_kappa |
Logical indicating whether Fleiss' Kappa should
be computed. Defaults to |
brennan_prediger |
Logical indicating whether Brennan & Prediger's Kappa
should be computed (extension to 3+ coders as proposed by von Eye (2006)).
Defaults to |
lotus |
Logical indicating whether Fretwurst's Lotus should be
computed. Defaults to |
s_lotus |
Logical indicating whether Fretwurst's standardized Lotus
(S-Lotus) should be computed. Defaults to |
a tdcmm model
Brennan, R. L., & Prediger, D. J. (1981). Coefficient Kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement, 41(3), 687-699. https://doi.org/10.1177/001316448104100307
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46. https://doi.org/10.1177/001316446002000104
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378-382. https://doi.org/10.1037/h0031619
Fretwurst, B. (2015). Reliabilität und Validität von Inhaltsanalysen. Mit Erläuterungen zur Berechnung des Reliabilitätskoeffizienten „Lotus“ mit SPSS. In W. Wirth, K. Sommer, M. Wettstein, & J. Matthes (Ed.), Qualitätskriterien in der Inhaltsanalyse (S. 176–203). Herbert von Halem.
Krippendorff, K. (2011). Computing Krippendorff's Alpha-Reliability. Retrieved from http://repository.upenn.edu/asc_papers/43
von Eye, A. (2006). An Alternative to Cohen's Kappa. European Psychologist, 11(1), 12-24. https://doi.org/10.1027/1016-9040.11.1.12
fbposts %>% test_icr(post_id, coder_id, pop_elite, pop_othering) fbposts %>% test_icr(post_id, coder_id, levels = c(n_pictures = "ordinal"), fleiss_kappa = TRUE)
fbposts %>% test_icr(post_id, coder_id, pop_elite, pop_othering) fbposts %>% test_icr(post_id, coder_id, levels = c(n_pictures = "ordinal"), fleiss_kappa = TRUE)
Turns the tibble exported from correlate
into a correlation
matrix.
to_correlation_matrix(data, verbose = FALSE)
to_correlation_matrix(data, verbose = FALSE)
data |
|
verbose |
A logical, defaulted to |
a tdcmm model
WoJ %>% correlate() %>% to_correlation_matrix()
WoJ %>% correlate() %>% to_correlation_matrix()
Computes one-way ANOVAs for one group variable and specified test variables. If no variables are specified, all numeric (integer or double) variables are used. A Levene's test will automatically determine whether a classic ANOVA is used. Otherwise Welch's ANOVA with a (Satterthwaite's) approximation to the degrees of freedom is used.
unianova(data, group_var, ..., descriptives = FALSE, post_hoc = FALSE)
unianova(data, group_var, ..., descriptives = FALSE, post_hoc = FALSE)
data |
|
group_var |
group variable (column name) |
... |
test variables (column names). Leave empty to compute ANOVAs for all numeric variables in data. |
descriptives |
a logical indicating whether descriptive statistics (mean
& standard deviation) for all group levels should be added to the returned
tibble. Defaults to |
post_hoc |
a logical value indicating whether post-hoc tests should be performed. Tukey's HSD is employed when the assumption of equal variances is met, whereas the Games-Howell test is automatically applied when this assumption is violated. The results of the post-hoc test will be added to a list column in the resulting tibble. |
a tdcmm model
WoJ %>% unianova(employment, autonomy_selection, autonomy_emphasis) WoJ %>% unianova(employment, descriptives = TRUE, post_hoc = TRUE) ## Not run: WoJ %>% unianova(employment) ## End(Not run)
WoJ %>% unianova(employment, autonomy_selection, autonomy_emphasis) WoJ %>% unianova(employment, descriptives = TRUE, post_hoc = TRUE) ## Not run: WoJ %>% unianova(employment) ## End(Not run)
Returns ggplot2 visualization appropriate to respective tdcmm
model
(see list below). Returns NULL
(and a warning) if no visualization has
been implemented for the particular model.
## S3 method for class 'tdcmm_ctgrcl' visualize(x, ..., .design = design_lmu()) ## S3 method for class 'tdcmm_crrltn' visualize(x, which = "jitter", ..., .design = design_lmu()) ## S3 method for class 'tdcmm_dscrb' visualize(x, ..., .design = design_lmu()) ## S3 method for class 'tdcmm_rgrssn' visualize(x, which = "jitter", ..., .design = design_lmu()) ## S3 method for class 'tdcmm_prcntl' visualize(x, ..., .design = design_lmu()) visualize(x, ..., .design = design_lmu()) ## S3 method for class 'tdcmm_ttst' visualize(x, ..., .design = design_lmu()) ## S3 method for class 'tdcmm_nnv' visualize(x, ..., .design = design_lmu())
## S3 method for class 'tdcmm_ctgrcl' visualize(x, ..., .design = design_lmu()) ## S3 method for class 'tdcmm_crrltn' visualize(x, which = "jitter", ..., .design = design_lmu()) ## S3 method for class 'tdcmm_dscrb' visualize(x, ..., .design = design_lmu()) ## S3 method for class 'tdcmm_rgrssn' visualize(x, which = "jitter", ..., .design = design_lmu()) ## S3 method for class 'tdcmm_prcntl' visualize(x, ..., .design = design_lmu()) visualize(x, ..., .design = design_lmu()) ## S3 method for class 'tdcmm_ttst' visualize(x, ..., .design = design_lmu()) ## S3 method for class 'tdcmm_nnv' visualize(x, ..., .design = design_lmu())
x |
|
... |
other arguments |
.design |
a list to style the visualization; by default and good practice
use one of the ready-made design functions' returns (e.g., |
which |
string to specify type of regression visualization. One of "jitter" (default), "alpha", "correlogram", "residualsfitted" (or "resfit"), "pp", "qq", "scalelocation" (or "scaloc"), "residualsleverage" (or "reslev"). See below for details. |
describe()
: horizontal box plot depicting a box from Q25 to Q75, a thick
line for Mdn, and two whiskers to Min/Max respectively;
no additional arguments
describe_cat()
: horizontal bar plot depicting number of occurrences;
no additional arguments
tab_frequencies()
: either a histogram (if 1 variable is given) or
multiple histograms wrapped, 5+ variables issue a warning about readability;
no additional arguments
tab_percentiles()
: quantile plot
crosstab()
: horizontal stacked bar plot, either absolute or relative
(depending on the percentages
argument in crosstab()
)
t_test()
: plot with points and appended 95% confidence intervals;
no additional arguments
unianova()
: plot with points and appended 95% confidence intervals;
no additional arguments
correlate()
: plot as scatter; for more than 2 variables, a correlogram
is plotted (just like for to_correlation_matrix()
); use the which
parameter to select how points are visualized:
"jitter" adds a bit of random noise to each point to better reflect categorical values
"alpha" depicts points slightly transparent so that multiple points in the same position are more easily visible
correlate()
: for partial correlation, a scatter plot with some jitter is
plotted using the residuals between the control variable and (a) the
dependent as well as (b) the independent variable; no additional arguments
to_correlation_matrix()
: plot as correlogram building on
GGally::ggpairs()
with jittered scatter plots in lower half, histograms as
diagonals, and correlation coefficients with 95% confidence intervals in
upper half
regress()
: plot regression results as scatter (without jitter) and an
additional depicted model line with including its 95% confidence intervals;
alternatively, visual check inspection helpers can be plotted through the
which
parameter which can be set to yield one of the following:
"jitter" (default): plots a scatter plot with jitter per independent variable and adds a linear regression line with 95% confidence intervals to it; keep in mind that if you have, say, three independent variables, this visualization shows you three plots with one linear regression for each, so that the three models (i.e., the three colored lines) reflect only the particular combination of one independent and the dependent variable
"alpha" (default): almost like jitter
but instead of jitter it plots
scatter plots with some transparency so that multiple data points in the
same position appear as darker
"correlogram": like to_correlation_matrix()
, a correlogram between
independent variables are produced to help determine independent errors
and multicollinearity
"residualsfitted" or "resfit": a residuals-versus-fitted plot is useful to determine distributions; for a normal distribution the colored line should ideally fit on the dashed line
"pp": a (normal) probability-probability plot helps checking for multicollinearity whereby the data (here mostly the center data from within the IQR) should ideally align with the dashed line
"qq": a (normal) quantile-quantile plot helps checking for multicollinearity but focuses more on outliers; the data should align with the dashed line
"scalelocation" or "scaloc": a scale-location (sometimes also called a spread-location) plot checks whether residuals are spread equally to help check for homoscedasticity; ideally, the colored line is horizontal and the data spreads more or less randomly
"residualsleverage" or "reslev": a residuals-versus-leverage plot allows to check for influential outliers affecting the final model more than the rest of the data; ideally, no data is far off compared to the bulk of the the data and thus shows high Cook's distance to the rest; the colored line helps to identify the bulk of the data and the five most-distant outliers are labelled with their case number (i.e., the row number in the dataset); note that 5 is arbitrary here, meaning that they might not be too far off or there might be more than 5 noteworthy outliers in this model; interpret with care
Note that the returned ggplot2 object can be modified easily by appending or overwriting individual geom's or scale's. See the examples below and the documentation of ggplot2.
A ggplot2 object
## Not run: WoJ %>% describe() %>% visualize() fbposts %>% describe_cat() %>% visualize() WoJ %>% tab_frequencies(trust_parliament) %>% visualize() fbposts %>% tab_frequencies(pop_elite, pop_people, pop_othering) %>% visualize() WoJ %>% crosstab(reach, employment) %>% visualize() fbposts %>% crosstab(coder_id, type, percentages = TRUE) %>% visualize() WoJ %>% t_test(temp_contract, autonomy_selection, autonomy_emphasis) %>% visualize() WoJ %>% unianova(country, autonomy_selection, autonomy_emphasis) %>% visualize() fbposts %>% correlate(pop_elite, pop_people) %>% visualize() fbposts %>% correlate(pop_elite, pop_people, with = pop_othering) %>% visualize() fbposts %>% correlate(pop_elite, pop_people) %>% visualize("alpha") WoJ %>% correlate(autonomy_selection, ethics_1, partial = work_experience) %>% visualize() WoJ %>% correlate(ethics_1, ethics_2, ethics_3, ethics_4) %>% to_correlation_matrix() %>% visualize() r <- WoJ %>% regress(autonomy_selection, temp_contract, work_experience, ethics_2) r %>% visualize() # same as r %>% visualize("jitter") r %>% visualize("alpha") r %>% visualize("correlogram") r %>% visualize("resfit") r %>% visualize("pp") r %>% visualize("qq") r %>% visualize("scaloc") r %>% visualize("reslev") # To overwrite a certain scale or geom, just append as you would with ggplot2 fbposts %>% describe_cat() %>% visualize() + ggplot2::scale_fill_grey() ## End(Not run)
## Not run: WoJ %>% describe() %>% visualize() fbposts %>% describe_cat() %>% visualize() WoJ %>% tab_frequencies(trust_parliament) %>% visualize() fbposts %>% tab_frequencies(pop_elite, pop_people, pop_othering) %>% visualize() WoJ %>% crosstab(reach, employment) %>% visualize() fbposts %>% crosstab(coder_id, type, percentages = TRUE) %>% visualize() WoJ %>% t_test(temp_contract, autonomy_selection, autonomy_emphasis) %>% visualize() WoJ %>% unianova(country, autonomy_selection, autonomy_emphasis) %>% visualize() fbposts %>% correlate(pop_elite, pop_people) %>% visualize() fbposts %>% correlate(pop_elite, pop_people, with = pop_othering) %>% visualize() fbposts %>% correlate(pop_elite, pop_people) %>% visualize("alpha") WoJ %>% correlate(autonomy_selection, ethics_1, partial = work_experience) %>% visualize() WoJ %>% correlate(ethics_1, ethics_2, ethics_3, ethics_4) %>% to_correlation_matrix() %>% visualize() r <- WoJ %>% regress(autonomy_selection, temp_contract, work_experience, ethics_2) r %>% visualize() # same as r %>% visualize("jitter") r %>% visualize("alpha") r %>% visualize("correlogram") r %>% visualize("resfit") r %>% visualize("pp") r %>% visualize("qq") r %>% visualize("scaloc") r %>% visualize("reslev") # To overwrite a certain scale or geom, just append as you would with ggplot2 fbposts %>% describe_cat() %>% visualize() + ggplot2::scale_fill_grey() ## End(Not run)
A subset of data from the Worlds of Journalism 2012-16 study containing survey data of 1,200 journalists from five European countries.
WoJ
WoJ
A data frame with 1200 rows and 15 variables:
Country of residence
Reach of medium
Current employment situation
Type of contract (if current employment situation is either full-time or part-time
Autonomy in news story selection, scale from 1 (no freedom at all) to 5 (complete freedom)
Autonomy in news story emphasis, scale from 1 (no freedom at all) to 5 (complete freedom)
Agreement with statement "Journalists should always adhere to codes of professional ethics, regardless of situation and context", scale from 1 (strongly disagree) to 5 (strongly agree) (reverse-coded!)
Agreement with statement "What is ethical in journalism depends on the specific situation.", scale from 1 (strongly disagree) to 5 (strongly agree)
Agreement with statement "What is ethical in journalism is a matter of personal judgment.", scale from 1 (strongly disagree) to 5 (strongly agree)
Agreement with statement "It is acceptable to set aside moral standards if extraordinary circumstances require it.", scale from 1 (strongly disagree) to 5 (strongly agree)
Work experience as a journalist in years
Trust placed in parliament, scale from 1 (no trust at all) to 5 (complete trust)
Trust placed in government, scale from 1 (no trust at all) to 5 (complete trust)
Trust placed in parties, scale from 1 (no trust at all) to 5 (complete trust)
Trust placed in politicians in general, scale from 1 (no trust at all) to 5 (complete trust)
‘https://worldsofjournalism.org/data/data-and-key-tables-2012-2016’
This function z-standardizes the specified numeric columns or all numeric columns if none are specified. A z-standardized scale centers at a mean of 0.0 and has a standard deviation of 1.0, making it comparable to other z-standardized distributions.
z_scale(data, ..., name = NULL, overwrite = FALSE)
z_scale(data, ..., name = NULL, overwrite = FALSE)
data |
|
... |
Numeric variables to be z-standardized. If none are provided, all numeric columns will be z-standardized. |
name |
Optional name for the new z-standardized variable when a single variable is provided. By default, the name will be the original variable name suffixed with |
overwrite |
Logical. If |
A tdcmm model with the z-standardized variable(s).
Other scaling:
categorize_scale()
,
center_scale()
,
dummify_scale()
,
minmax_scale()
,
recode_cat_scale()
,
reverse_scale()
,
setna_scale()
WoJ %>% z_scale(autonomy_emphasis) WoJ %>% z_scale(autonomy_emphasis, name = "my_zstdized_variable") WoJ %>% z_scale(autonomy_emphasis) %>% tab_frequencies(autonomy_emphasis, autonomy_emphasis_z)
WoJ %>% z_scale(autonomy_emphasis) WoJ %>% z_scale(autonomy_emphasis, name = "my_zstdized_variable") WoJ %>% z_scale(autonomy_emphasis) %>% tab_frequencies(autonomy_emphasis, autonomy_emphasis_z)