Package 'resquin'

Title: Response Quality Indicators for Survey Research
Description: Calculate common survey data quality indicators for multi-item scales and matrix questions. Currently supports the calculation of response style indicators and response distribution indicators. For an overview of response quality indicators see Bhaktha N, Silber H, Lechner C (2024). 'Characterizing response quality in surveys with multi-item scales: A unified framework' <https://osf.io/9gs67/>.
Authors: Matthias Roth [aut, cre, cph], Nivedita Bhaktha [aut, ctb], Matthias Bluemke [aut, ctb], Thomas Knopf [aut, ctb], Fabienne Krämer [aut, ctb], Clemens Lechner [aut, ctb], Çağla Yildiz [aut, ctb]
Maintainer: Matthias Roth <[email protected]>
License: GPL (>= 3)
Version: 0.0.2.9000
Built: 2025-02-20 10:22:03 UTC
Source: https://github.com/matroth/resquin

Help Index


Compute response distribution indicators

Description

Compute response distribution indicators for responses to multi-item scales or matrix questions.

Usage

resp_distributions(x, min_valid_responses = 1)

Arguments

x

A data frame containing survey responses in wide format. For more information see section "Data requirements" below.

min_valid_responses

numeric between 0 and 1. Defines the share of valid responses a respondent must have to calculate response quality indicators. Default is 1.
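The threshold can be illustrated with a minimal base-R sketch: a respondent's share of valid responses is the proportion of non-missing answers in their row. The helper passes_validity_threshold() is hypothetical and not part of resquin, and treating a share exactly equal to min_valid_responses as sufficient (>=) is an assumption.

```r
# Hypothetical helper (not part of resquin): which respondents have a large
# enough share of non-missing answers? Using ">=" is an assumption.
passes_validity_threshold <- function(x, min_valid_responses = 1) {
  stopifnot(is.data.frame(x), min_valid_responses >= 0, min_valid_responses <= 1)
  valid_share <- rowMeans(!is.na(x))  # share of non-missing answers per row
  valid_share >= min_valid_responses
}

testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

passes_validity_threshold(testdata)       # only complete rows pass
passes_validity_threshold(testdata, 0.2)  # rows with at least 1 of 3 answers pass
```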

Details

The following response distribution indicators are calculated per respondent:

  • n_na: number of intra-individual missing answers

  • prop_na: proportion of intra-individual missing responses

  • ii_mean: intra-individual mean

  • ii_median: intra-individual median

  • ii_sd: intra-individual standard deviation

  • mahal: Mahalanobis distance per respondent.

Intra-individual response variability (ii_sd) has been proposed to measure insufficient effort responding (Dunn et al., 2018) and to distinguish between random and conscientious responding (Marjanovic et al., 2015).

Intra-individual location indicators (ii_mean, ii_median) can be used to assess the average location of a respondent's answers across a set of questions.
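The per-respondent statistics listed above (apart from mahal) can be sketched in base R as row-wise summaries. The helper ii_stats() is hypothetical; using sd() (denominator n - 1) for ii_sd and ignoring min_valid_responses are assumptions, so the values are not guaranteed to match resp_distributions().

```r
# Hypothetical helper: row-wise distribution statistics in base R.
# ii_sd uses sd(), i.e. the n - 1 denominator (an assumption).
ii_stats <- function(x) {
  m <- as.matrix(x)
  data.frame(
    n_na      = rowSums(is.na(m)),                   # count of missing answers
    prop_na   = rowMeans(is.na(m)),                  # share of missing answers
    ii_mean   = apply(m, 1, mean, na.rm = TRUE),
    ii_median = apply(m, 1, median, na.rm = TRUE),
    ii_sd     = apply(m, 1, sd, na.rm = TRUE)
  )
}

testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

round(ii_stats(testdata), 2)
```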

Mahalanobis distance is an outlier detection indicator. It represents the distance of a participant's responses from the center of a multivariate normal distribution defined by the data of all respondents.
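As an illustration, Mahalanobis distances can be computed with stats::mahalanobis(), using the mean vector and sample covariance matrix of all complete cases. The helper mahal_sketch() is hypothetical, and restricting the computation to complete cases is an assumption. Note that stats::mahalanobis() returns squared distances; this documentation does not state whether mahal is squared.

```r
# Hypothetical helper: squared Mahalanobis distance per respondent,
# estimated from complete cases only (an assumption).
mahal_sketch <- function(x) {
  m <- as.matrix(x)
  complete <- stats::complete.cases(m)
  out <- rep(NA_real_, nrow(m))                 # incomplete rows stay NA
  cm <- m[complete, , drop = FALSE]
  out[complete] <- stats::mahalanobis(cm, center = colMeans(cm),
                                      cov = stats::cov(cm))
  out
}

testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

round(mahal_sketch(testdata), 2)
```

With the sample covariance matrix, the squared distances of the n complete cases always sum to (n - 1) times the number of variables, which is a handy sanity check.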

Value

Returns a data frame with response quality indicators per respondent. Dimensions:

  • Rows: Equal to number of rows in x.

  • Columns: Six, one for each response distribution indicator.

Data requirements

resp_distributions() assumes that data comes from multi-item scales or matrix questions, which have the same number and labeling of response options for many questions. The input data frame must be structured in the following way:

  • The data frame is in wide format, meaning each row represents one respondent, each column represents one variable.

  • All responses have integer values.

  • Missing values are set to NA.
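These requirements can be checked before calling the function. The validator below is a hypothetical sketch, not a resquin function; it only checks that x is a data frame, that all columns are numeric, and that all non-missing values are whole numbers. It cannot detect numeric missing-value codes such as -99 that were not recoded to NA.

```r
# Hypothetical validator for the data requirements listed above.
check_resp_data <- function(x) {
  if (!is.data.frame(x)) stop("x must be a data frame in wide format")
  vals <- unlist(x, use.names = FALSE)          # all responses as one vector
  if (!is.numeric(vals)) stop("all columns must be numeric")
  obs <- vals[!is.na(vals)]
  if (any(obs != round(obs))) stop("all non-missing responses must be integer values")
  invisible(TRUE)
}

testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

check_resp_data(testdata)
```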

Reverse coding of variables

The interpretation of the indicators depends on whether response data of negatively worded questions were reversed or not:

  • Do not reverse data of negatively worded questions if you want to assess average response patterns (Dunn et al., 2018).

  • Reverse data of negatively worded questions if you want to assess whether responses are distributed randomly or not with respect to an assumed latent variable (Marjanovic et al., 2015).
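On a response scale from scale_min to scale_max, reversing a negatively worded item maps a response r to scale_min + scale_max - r, and missing values stay NA. A minimal sketch; the helper reverse_code() and the choice of var_b as the negatively worded item are hypothetical.

```r
# Hypothetical helper: reverse-code selected columns of a wide data frame.
reverse_code <- function(x, cols, scale_min, scale_max) {
  x[cols] <- lapply(x[cols], function(r) scale_min + scale_max - r)
  x
}

testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

# Assume, for illustration only, that var_b is negatively worded
reversed <- reverse_code(testdata, "var_b", scale_min = 1, scale_max = 5)
reversed$var_b  # 4 1 4 3 2 5 NA 4 NA NA
```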

Mahalanobis distance could not be calculated

Under certain circumstances, the Mahalanobis distance cannot be calculated, for example if there is high collinearity (strong correlations between variables) or if there are too many missing values. Although this can happen with genuine survey data, the message can also indicate that something in the data is "off" for one of these reasons. A manual inspection for low-quality responses can be a useful next step.
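The two causes named above can be checked directly: the covariance matrix cannot be inverted if there are no more complete cases than variables, or if variables are perfectly collinear. The helper mahal_diagnostics() is a hypothetical illustration, not resquin's internal logic; it relies on the default tolerance of qr() to decide the rank.

```r
# Hypothetical diagnostic: why might a Mahalanobis distance not be computable?
mahal_diagnostics <- function(x) {
  cm <- as.matrix(x)[stats::complete.cases(x), , drop = FALSE]
  n <- nrow(cm)
  p <- ncol(cm)
  if (n <= p) return("too few complete cases to estimate the covariance matrix")
  S <- stats::cov(cm)
  if (qr(S)$rank < p) return("covariance matrix is singular (e.g. perfect collinearity)")
  "covariance matrix is invertible"
}

# var_b is an exact multiple of var_a, so the covariance matrix is singular
collinear <- data.frame(var_a = 1:5, var_b = 2 * (1:5), var_c = c(1, 3, 2, 5, 4))
mahal_diagnostics(collinear)
```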

Author(s)

Matthias Roth, Matthias Bluemke & Clemens Lechner

References

Dunn, Alexandra M., Eric D. Heggestad, Linda R. Shanock, and Nels Theilgard. 2018. “Intra-Individual Response Variability as an Indicator of Insufficient Effort Responding: Comparison to Other Indicators and Relationships with Individual Differences.” Journal of Business and Psychology 33(1):105–21. doi: 10.1007/s10869-016-9479-0.

Marjanovic, Zdravko, Ronald Holden, Ward Struthers, Robert Cribbie, and Esther Greenglass. 2015. “The Inter-Item Standard Deviation (ISD): An Index That Discriminates between Conscientious and Random Responders.” Personality and Individual Differences 84:79–83. doi: 10.1016/j.paid.2014.08.021.

See Also

resp_styles() for calculating response style indicators. resp_nondifferentiation() for calculating response nondifferentiation indicators.

Examples

# A small test data set with ten respondents
# and responses to three survey questions
# with response scales from 1 to 5.
testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

# Calculate response distribution indicators
resp_distributions(x = testdata) |>
    round(2)

# Include respondents with NA values by lowering the
# required share of valid responses per respondent.

resp_distributions(
      x = testdata,
      min_valid_responses = 0.2) |>
   round(2)

Compute response nondifferentiation indicators

Description

Compute response nondifferentiation indicators for responses to multi-item scales or matrix questions.

Usage

resp_nondifferentiation(x, min_valid_responses = 1)

Arguments

x

A data frame containing survey responses in wide format. For more information see section "Data requirements" below.

min_valid_responses

numeric between 0 and 1. Defines the share of valid responses a respondent must have to calculate response quality indicators. Default is 1.

Details

Response nondifferentiation is the result of response behavior in which respondents deviate from an ideal response process. Optimal response behavior is termed optimizing, while deviations from it are termed satisficing (Krosnick, 1991). Optimizing describes a behavior in which respondents go through all steps of comprehension, retrieval, judgment, and response selection. When satisficing, respondents skip all or parts of this process. Satisficing can lead to nonresponse, "don't know" responses, random responding, or nondifferentiation. The latter is targeted by resp_nondifferentiation().

Nondifferentiation is characterized by respondents choosing similar or even the same response options regardless of the content of the question. Multiple indicators for response nondifferentiation have been developed. For resp_nondifferentiation(), the following response nondifferentiation indicators described by Kim et al. (2019) are calculated per respondent:

  • Simple Nondifferentiation: Respondents are assigned 1 or 0 depending on whether all responses have the same value (1) or not (0).

  • Mean Root of Pairs Method: Mean of the square root of the absolute differences between all pairs of responses in a multi-item scale or matrix question. It ranges from 0 (least straightlining) to 1 (most straightlining). The indicator is rescaled using the minimum and maximum values across all respondents, so including or excluding responses or respondents changes the indicator's values.

  • Maximum Identical Rating Method: Proportion of the most commonly selected response option among all responses in a multi-item scale or matrix question. It ranges from 0 (least straightlining) to 1 (most straightlining).

  • Scale Point Variation Method: The probability of differentiation is defined as $1 - \sum_i p_i^2$, where $p_i$ is the proportion of responses at scale point $i$ and $i$ indexes the scale points of the rating scale. The measure becomes larger the more scale points a respondent uses in a multi-item scale or matrix question.
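The verbal definitions above can be sketched for a single respondent in base R. The helper nondiff_sketch() is hypothetical: mean_root_pairs is the raw value before the rescaling across respondents that resp_nondifferentiation() applies, which is not reproduced here, and returning NA for fewer than two valid answers is an assumption.

```r
# Hypothetical helper: nondifferentiation measures for one respondent.
nondiff_sketch <- function(r) {
  r <- r[!is.na(r)]
  if (length(r) < 2) {
    return(c(simple = NA, mean_root_pairs = NA,
             max_identical = NA, scale_point_var = NA))
  }
  p <- as.numeric(table(r)) / length(r)  # share of answers at each used scale point
  pairs <- utils::combn(r, 2)            # all pairs of responses, one per column
  c(
    simple          = as.numeric(length(unique(r)) == 1),
    mean_root_pairs = mean(sqrt(abs(pairs[1, ] - pairs[2, ]))),  # unrescaled
    max_identical   = max(p),
    scale_point_var = 1 - sum(p^2)
  )
}

testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

round(t(apply(as.matrix(testdata), 1, nondiff_sketch)), 2)
```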

It should be noted that Kim et al. (2019) average the response nondifferentiation indicators to obtain an aggregate measure of response nondifferentiation. To do so, the summary() function can be called on the results of resp_nondifferentiation().

Value

Returns a data frame with response nondifferentiation indicators per respondent. Dimensions:

  • Rows: Equal to number of rows in x.

  • Columns: Four, one corresponding to each response nondifferentiation indicator.

Data requirements

resp_nondifferentiation() assumes that the input data frame is structured in the following way:

  • The data frame is in wide format, meaning each row represents one respondent, each column represents one variable.

  • The variables are in the same order as the questions respondents saw while taking the survey.

  • Reverse keyed variables are in their original form. No items were recoded.

  • All responses have integer values.

  • Questions have the same number of response options.

  • Missing values are set to NA.

Author(s)

Matthias Roth

References

Kim, Yujin, Jennifer Dykema, John Stevenson, Penny Black, and D. Paul Moberg. 2019. “Straightlining: Overview of Measurement, Comparison of Indicators, and Effects in Mail–Web Mixed-Mode Surveys.” Social Science Computer Review 37(2):214–33. doi: 10.1177/0894439317752406.

Krosnick, Jon A. 1991. “Response Strategies for Coping with the Cognitive Demands of Attitude Measures in Surveys.” Applied Cognitive Psychology 5(3):213–36. doi: 10.1002/acp.2350050305.

See Also

resp_styles() for calculating response style indicators. resp_distributions() for calculating response distribution indicators.

Examples

# A small test data set with ten respondents
# and responses to three survey questions
# with response scales from 1 to 5.
testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

# Calculate response nondifferentiation indicators
resp_nondifferentiation(x = testdata) |>
    round(2)

# Include respondents with NA values by lowering the
# required share of valid responses per respondent.

resp_nondifferentiation(
      x = testdata,
      min_valid_responses = 0.2) |>
   round(2)

resp_nondifferentiation(
     x = testdata,
     min_valid_responses = 0.2) |>
  summary() # To obtain aggregate measures of response nondifferentiation

Compute response pattern indicators

Description

Compute response pattern indicators for responses to multi-item scales or matrix questions.

Usage

resp_patterns(
  x,
  min_valid_responses = 1,
  defined_patterns,
  arbitrary_patterns,
  min_repetitions = 2
)

Arguments

x

A data frame containing survey responses in wide format. For more information see section "Data requirements" below.

min_valid_responses

numeric between 0 and 1. Defines the share of valid responses a respondent must have to calculate response pattern indicators. Default is 1.

defined_patterns

A character vector with patterns to search for. Will not be computed if not specified or if an empty vector is supplied.

arbitrary_patterns

A vector of integer values or a list containing vectors of integer values. The values determine the length of the patterns that should be searched for (see section "Arbitrary patterns"). Will not be computed if not specified or if 0 is supplied.

Details

The following response pattern indicators are calculated per respondent:

  • n_transitions: Number of times two consecutive response options differ.

  • mean_string_length: Mean length of strings of identical answers.

  • longest_string_length: Length of the longest string of identical answers.

  • (optional) defined_pattern: A list column that contains one named vector per respondent. The names of the vector are repeating patterns found in the responses of a respondent. The values of the vector are how often each pattern specified in the argument "defined_patterns" occurs. See section "Defined patterns" for more information.

  • (optional) arbitrary_patterns: A list column that contains one named vector per respondent. The names of the vector are repeating patterns found in the responses of a respondent. The values of the vector are how often each pattern occurred. See section "Arbitrary patterns" for more information.
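The three default indicators can be sketched for a single respondent with rle(), which splits a vector into runs of identical values. The helper pattern_sketch() is hypothetical, and dropping NA values before computing runs (which joins answers on either side of a gap) is an assumption.

```r
# Hypothetical helper: default pattern indicators for one respondent.
pattern_sketch <- function(r) {
  r <- r[!is.na(r)]
  if (length(r) == 0) {
    return(c(n_transitions = NA, mean_string_length = NA,
             longest_string_length = NA))
  }
  runs <- rle(r)$lengths  # lengths of strings of identical answers
  c(
    n_transitions         = sum(diff(r) != 0),  # consecutive answers that differ
    mean_string_length    = mean(runs),
    longest_string_length = max(runs)
  )
}

pattern_sketch(c(1, 1, 2, 2, 2, 5))
# n_transitions 2, mean_string_length 2, longest_string_length 3
```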

Value

Returns a data frame with response quality indicators per respondent. Dimensions:

  • Rows: Equal to number of rows in x.

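  • Columns: Three (n_transitions, mean_string_length, longest_string_length), plus one list column for each optional pattern indicator that is computed.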

Defined and arbitrary pattern indicators:

Responses of an individual respondent can follow patterns, such as zig-zagging across the response scale over multiple items. There may be a priori knowledge of which response patterns could occur and might indicate low-quality responding. In this case, the defined_patterns argument can be used to specify one or more patterns whose presence will be checked for each respondent. If no a priori knowledge exists, the arbitrary_patterns argument can be used to check for all patterns of a specified length.

Defined patterns

A pattern is defined by providing one or more patterns in a character vector. A few examples: resp_patterns(x, defined_patterns = "123") checks how often the response pattern "123" occurs in the responses of a single respondent. resp_patterns(x, defined_patterns = c("123","321")) checks how often each of the two patterns "123" and "321" occurs in the responses of a single respondent. There can be an arbitrary number of patterns.
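Counting a defined pattern amounts to sliding a window of the pattern's length over a respondent's answers. The sketch below is hypothetical: it assumes single-digit response codes (so each character of the pattern is one response) and counts overlapping occurrences; resquin's own matching rules may differ.

```r
# Hypothetical helper: count occurrences of a defined pattern such as "123"
# in one respondent's answers (overlapping matches are counted).
count_defined_pattern <- function(r, pattern) {
  r <- r[!is.na(r)]
  target <- as.integer(strsplit(pattern, "")[[1]])  # "123" -> 1, 2, 3
  k <- length(target)
  if (length(r) < k) return(0L)
  starts <- seq_len(length(r) - k + 1)
  sum(vapply(starts, function(i) all(r[i:(i + k - 1)] == target), logical(1)))
}

responses <- c(1, 2, 3, 1, 2, 3, 2, 1)
sapply(c("123", "321"), count_defined_pattern, r = responses)
# "123" occurs 2 times, "321" occurs 1 time
```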

Arbitrary patterns

Checks for arbitrary patterns are defined by providing one or more integer values in a numeric vector. The integers must be greater than or equal to two. A few examples: resp_patterns(x, arbitrary_patterns = 2) checks for sequences of responses of length two that repeat at least two times. resp_patterns(x, arbitrary_patterns = c(2,3,4,5)) checks for sequences of responses of length two, three, four, and five that repeat at least two times.
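Searching for arbitrary patterns of a given length can be sketched by tabulating every window of that length and keeping those that occur often enough. The helper find_arbitrary_patterns() is hypothetical; that its min_repetitions parameter corresponds to the min_repetitions argument in the usage above, that overlapping windows are counted, and the pasted naming scheme are all assumptions.

```r
# Hypothetical helper: sequences of length `length_k` that repeat at least
# `min_repetitions` times in one respondent's answers.
find_arbitrary_patterns <- function(r, length_k, min_repetitions = 2) {
  r <- r[!is.na(r)]
  if (length(r) < length_k) return(integer(0))
  starts <- seq_len(length(r) - length_k + 1)
  windows <- vapply(starts,
                    function(i) paste(r[i:(i + length_k - 1)], collapse = ""),
                    character(1))
  counts <- table(windows)                       # occurrences of each window
  counts <- counts[counts >= min_repetitions]
  stats::setNames(as.integer(counts), names(counts))
}

find_arbitrary_patterns(c(1, 2, 1, 2, 1, 2, 5), length_k = 2)
# "12" occurs 3 times, "21" occurs 2 times

# Several lengths at once, analogous to arbitrary_patterns = c(2, 3)
lapply(c(2, 3), find_arbitrary_patterns, r = c(1, 2, 1, 2, 1, 2, 5))
```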

Data requirements

resp_patterns() assumes that the input data frame is structured in the following way:

  • The data frame is in wide format, meaning each row represents one respondent, each column represents one variable.

  • The variables are in the same order as the questions respondents saw while taking the survey.

  • Reverse keyed variables are in their original form. No items were recoded.

  • All responses have integer values.

  • Questions have the same number of response options.

  • Missing values are set to NA.

Author(s)

Matthias Roth, Thomas Knopf

References

Curran, Paul G. 2016. “Methods for the Detection of Carelessly Invalid Responses in Survey Data.” Journal of Experimental Social Psychology 66:4–19. doi: 10.1016/j.jesp.2015.07.006.

See Also

resp_styles() for calculating response style indicators. resp_distributions() for calculating response distribution indicators. resp_nondifferentiation() for calculating response nondifferentiation indicators.

Examples

# A small test data set with ten respondents
# and responses to three survey questions
# with response scales from 1 to 5.
testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

# Calculate response pattern indicators
resp_patterns(x = testdata) |>
    round(2)

# Include respondents with NA values by lowering the
# required share of valid responses per respondent.

resp_patterns(
      x = testdata,
      min_valid_responses = 0.2) |>
   round(2)

Compute response style indicators

Description

Calculates response style indicators for matrix questions or multi-item scales.

Usage

resp_styles(x, scale_min, scale_max, min_valid_responses = 1, normalize = TRUE)

Arguments

x

A data frame containing survey responses in wide format. For more information see section "Data requirements" below.

scale_min

numeric. Minimum value of the response scale.

scale_max

numeric. Maximum value of the response scale.

min_valid_responses

numeric between 0 and 1. Defines the share of valid responses a respondent must have to calculate response style indicators. Default is 1.

normalize

logical. If TRUE, counts of response style indicators will be divided by the number of non-missing responses per respondent. Default is TRUE.

Details

Response styles capture systematic shifts in respondents' response behavior. resp_styles() is aimed at multi-item scales or matrix questions that use the same number of response options for many questions.

The following response style indicators are calculated per respondent: Middle response style (MRS), acquiescence response style (ARS), disacquiescence response style (DRS), extreme response style (ERS) and non-extreme response style (NERS).

The response style indicators are calculated in the following way:

  • MRS: Number of midpoint responses.

  • ARS: Number of responses above the midpoint.

  • DRS: Number of responses below the midpoint.

  • ERS: Number of responses in the lowest or highest category.

  • NERS: Number of responses between the lowest and highest category.

Note that ARS and DRS assume that the polarity of the scale is positive. This means that higher numerical values indicate agreement and lower numerical values indicate disagreement. MRS can only be calculated if the scale has a numeric midpoint.
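The five definitions above can be sketched for a single respondent in base R. The helper resp_style_sketch() is hypothetical and not resquin's implementation; returning NA for MRS when the midpoint is not a whole number (an even number of response options) is an assumption, as is dropping missing values before counting. With normalize = TRUE, counts are divided by the number of non-missing responses, as described for the normalize argument.

```r
# Hypothetical helper: response style counts (or shares) for one respondent.
resp_style_sketch <- function(r, scale_min, scale_max, normalize = TRUE) {
  r <- r[!is.na(r)]
  mid <- (scale_min + scale_max) / 2
  counts <- c(
    MRS  = if (mid == round(mid)) sum(r == mid) else NA,  # needs an integer midpoint
    ARS  = sum(r > mid),
    DRS  = sum(r < mid),
    ERS  = sum(r == scale_min | r == scale_max),
    NERS = sum(r > scale_min & r < scale_max)
  )
  if (normalize) counts / length(r) else counts
}

resp_style_sketch(c(1, 4, 3, 5, 3), scale_min = 1, scale_max = 5, normalize = FALSE)
# MRS 2, ARS 2, DRS 1, ERS 2, NERS 3
```

Because ERS and NERS partition the responses, their normalized values always sum to 1, which reflects the statement that NERS is the inverse of ERS.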

Also note that the response style literature is fragmented (Bhaktha et al., 2024). Response styles calculated with resp_styles() are based on van Vaerenbergh & Thomas (2013). However, we use the name non-extreme response style (NERS) instead of mild response style to emphasize that NERS is the inverse of ERS. Both names appear in the literature (for a NERS example, see Wetzel et al. (2013)). Consult the literature in your field of research to find appropriate names for the response style indicators calculated here.

Value

Returns a data frame with response style indicators per respondent.

  • Rows: Equal to number of rows in x.

  • Columns: Five, one for each response style indicator.

Data requirements

resp_styles() assumes that the input data frame is structured in the following way:

  • The data frame is in wide format, meaning each row represents one respondent, each column represents one variable.

  • The variables are in the same order as the questions respondents saw while taking the survey.

  • Reverse keyed variables are in their original form. No items were recoded.

  • All responses have integer values.

  • Questions have the same number of response options.

  • Missing values are set to NA.

Author(s)

Matthias Roth, Matthias Bluemke & Clemens Lechner

References

Bhaktha, Nivedita, Henning Silber, and Clemens Lechner. 2024. “Characterizing response quality in surveys with multi-item scales: A unified framework.” OSF preprint. https://osf.io/9gs67/

van Vaerenbergh, Y., and T. D. Thomas. 2013. “Response Styles in Survey Research: A Literature Review of Antecedents, Consequences, and Remedies.” International Journal of Public Opinion Research 25(2):195–217. doi: 10.1093/ijpor/eds021.

Wetzel, Eunike, Claus H. Carstensen, and Jan R. Böhnke. 2013. “Consistency of Extreme Response Style and Non-Extreme Response Style across Traits.” Journal of Research in Personality 47(2):178–89. doi: 10.1016/j.jrp.2012.10.010.

See Also

resp_distributions() for calculating response distribution indicators. resp_nondifferentiation() for calculating response nondifferentiation indicators.

Examples

# A test data set with ten respondents
# and responses to three survey questions
# with response scales from 1 to 5.
testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

# Calculate response style indicators
resp_styles(testdata,
            scale_min = 1,
            scale_max = 5) |>
   round(2) # round to second decimal

# Include respondents with NA values by lowering the
# required share of valid responses per respondent.
resp_styles(testdata,
            scale_min = 1,
            scale_max = 5,
            min_valid_responses = 0.2) |>
   round(2) # round to second decimal

# Get counts of responses attributable to response styles.
resp_styles(testdata,
            scale_min = 1,
            scale_max = 5,
            normalize = FALSE)