The function bvq_responses()
retrieves participants’
responses to the Barcelona Vocabulary Questionnaire (BVQ) using the formr API, and returns them
along participant- and item-level information. This function returns a
tidy data frame in which each row is one participant’s response to an
individual item.
responses <- bvq_responses()
# select relevant variables
responses %>%
select(id, time, item, response, randomisation) %>%
drop_na(response) # drop unanswered items
#> # A tibble: 34,465 × 16
#> child_id response_id time version version_list date_birth date_started
#> <chr> <chr> <dbl> <chr> <chr> <date> <date>
#> 1 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 2 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 3 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 4 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 5 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 6 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 7 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 8 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 9 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 10 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> # ℹ 34,455 more rows
#> # ℹ 9 more variables: date_finished <date>, item <chr>, response <int>,
#> # sex <chr>, doe_catalan <dbl>, doe_spanish <dbl>, doe_others <dbl>,
#> # edu_parent1 <chr>, edu_parent2 <chr>
This dataset is the base for many analyses of interest, like
participants’ vocabulary size, word prevalence, or modelling item-level
probability of acquisition. This package offers several functions to do
this (e.g., bvq_vocabulary()
, bvq_norms()
),
but one could already do this from this dataset.
Additional participant-level properties like language profile
variables can be extracted using the bvq_logs()
function.
logs <- bvq_logs(responses = responses)
logs %>%
select(
child_id, time, age, lp,
starts_with("doe_")
)
Item-level properties can be consulted in the pool
dataset (See ?pool):
data("pool")
pool
#> # A tibble: 1,590 × 14
#> item language te label xsampa n_lemmas is_multiword subtlex_lemma
#> <chr> <chr> <int> <chr> <chr> <int> <lgl> <chr>
#> 1 cat_pessigol… Catalan 1 (fer… "p@.s… 1 FALSE pessigolles
#> 2 cat_abracar Catalan 2 abra… "@.B4… 1 FALSE abraçar
#> 3 cat_obrir Catalan 3 obrir "u\"B… 1 FALSE obrir
#> 4 cat_acabar Catalan 4 acab… "@.k@… 1 FALSE acabar
#> 5 cat_llancar Catalan 5 llan… "L@n\… 1 FALSE llançar
#> 6 cat_apagar Catalan 6 apag… "@.p@… 1 FALSE apagar
#> 7 cat_aprendre Catalan 7 apre… "@\"p… 1 FALSE aprendre
#> 8 cat_esgarrap… Catalan 8 esga… "@z.g… 1 FALSE esgarrapar
#> 9 cat_ajudar Catalan 9 ajud… "@.Zu… 1 FALSE ajudar
#> 10 cat_ballar Catalan 10 ball… "b@\"… 1 FALSE ballar
#> # ℹ 1,580 more rows
#> # ℹ 6 more variables: wordbank_lemma <chr>, childes_lemma <chr>,
#> # semantic_category <chr>, class <chr>, version <list>, include <lgl>
Longitudinal responses
Several participants have filled the questionnaire more than once.
All questionnaire responses included in any dataset returned by any
function in BVQ have an associated time
value. This
variable indexes how many times that specific participant has filled the
questionnaire (any version), including their last response. This allows
to track each participant’s responses across time and perform
longitudinal analyses.
By default, bvq_responses()
retrieves all responses.
Using get_longitudinal()
, we can filter what cases we want
to keep. The argument longitudinal
takes one of the
following character strings:
-
"all"
: all responses are returned -
"no"
: participants with more than one response to the questionnaire (any version) are excluded from the output -
"first"
: only the first response from each participant is returned (including responses of participant that only responded once) -
"last"
: only the most recent response from each participant is returned (including responses of participant that only responded once) -
"only"
: only responses from participants that filled the questionnaire more than once are returned.
Setting longitudinal = "only"
is especially useful to
perform repeated measures analyses. For example:
#> # A tibble: 16,077 × 8
#> child_id time item response version_list date_birth date_started
#> <chr> <dbl> <chr> <int> <chr> <date> <date>
#> 1 58298 2 cat_astronauta 3 C 2020-04-27 2022-10-08
#> 2 58298 2 cat_abella 3 C 2020-04-27 2022-10-08
#> 3 58298 2 cat_ales 3 C 2020-04-27 2022-10-08
#> 4 58298 2 cat_capo 1 C 2020-04-27 2022-10-08
#> 5 58298 2 cat_com 3 C 2020-04-27 2022-10-08
#> 6 58298 2 cat_ara 3 C 2020-04-27 2022-10-08
#> 7 58298 2 cat_anar2 3 C 2020-04-27 2022-10-08
#> 8 58298 2 cat_blanc 3 C 2020-04-27 2022-10-08
#> 9 58298 2 cat_i 3 C 2020-04-27 2022-10-08
#> 10 58298 2 cat_aigua1 3 C 2020-04-27 2022-10-08
#> # ℹ 16,067 more rows
#> # ℹ 1 more variable: date_finished <date>
Please note
The values of time
in the outcome of
bvq_participants()
and the outcome of the rest of the
functions may not be identical. This is because in
bvq_participants()
this value increases in one unit every
time a given participant is sent the questionnaire, even if they do not
end up filling it. In contrast, the value of time
in the
rest of the functions (e.g., bvq_responses()
,
bvq_logs()
) only increases when the questionnaire is
filled. Since the outcome of bvq_participants()
is mainly
intended for internal use, you don’t have to worry about this as long as
you don’t try to cross the outcomes of bvq_participants()
and the rest of the functions.