Introduction to bvq • bvq

The function bvq_responses() retrieves participants’ responses to the Barcelona Vocabulary Questionnaire (BVQ) using the formr API, and returns them along participant- and item-level information. This function returns a tidy data frame in which each row is one participant’s response to an individual item.

library(bvq)
library(dplyr)

participants <- bvq_participants()
responses <- bvq_responses(participants)

# select relevant variables
responses %>%
    select(id, time, item, response, randomisation) %>%
    filter(!is.na(response)) # drop unanswered items

## # A tibble: 34,465 × 16
##    child_id response_id  time version      version_list date_birth date_started
##    <chr>    <chr>       <dbl> <chr>        <chr>        <date>     <date>      
##  1 58298    BL1879          2 bvq-lockdown C            2020-04-27 2022-10-08  
##  2 58298    BL1879          2 bvq-lockdown C            2020-04-27 2022-10-08  
##  3 58298    BL1879          2 bvq-lockdown C            2020-04-27 2022-10-08  
##  4 58298    BL1879          2 bvq-lockdown C            2020-04-27 2022-10-08  
##  5 58298    BL1879          2 bvq-lockdown C            2020-04-27 2022-10-08  
##  6 58298    BL1879          2 bvq-lockdown C            2020-04-27 2022-10-08  
##  7 58298    BL1879          2 bvq-lockdown C            2020-04-27 2022-10-08  
##  8 58298    BL1879          2 bvq-lockdown C            2020-04-27 2022-10-08  
##  9 58298    BL1879          2 bvq-lockdown C            2020-04-27 2022-10-08  
## 10 58298    BL1879          2 bvq-lockdown C            2020-04-27 2022-10-08  
## # ℹ 34,455 more rows
## # ℹ 9 more variables: date_finished <date>, item <chr>, response <int>,
## #   sex <chr>, doe_catalan <dbl>, doe_spanish <dbl>, doe_others <dbl>,
## #   edu_parent1 <chr>, edu_parent2 <chr>

Consulting participant-level information

Participant-level properties like language profile variables can be extracted using the bvq_logs() function.

bvq_logs(participants, responses) %>% 
    select(child_id, time, age, lp, starts_with("doe_"))

## # A tibble: 38 × 7
##    child_id  time   age lp          doe_spanish doe_catalan doe_others
##    <chr>    <dbl> <dbl> <chr>             <dbl>       <dbl>      <dbl>
##  1 58298        2  29.4 Monolingual        0           1           0  
##  2 58361        1  24.4 Bilingual          0.75        0.25        0  
##  3 58298        1  24.9 Monolingual        0.1         0.9         0  
##  4 58068        2  26.2 Bilingual          0.25        0.75        0  
##  5 57177        4  30.6 Bilingual          0.2         0.7         0.1
##  6 56911        4  32.2 Monolingual        0.15        0.85        0  
##  7 57436        4  30.8 Bilingual          0.35        0.65        0  
##  8 57534        1  21.5 Monolingual        0.05        0.95        0  
##  9 57436        3  25.5 Bilingual          0.4         0.6         0  
## 10 57336        1  21.2 Bilingual          0.4         0.6         0  
## # ℹ 28 more rows

Item properties

Item-level properties can be consulted in the pool data frame (see ?pool):

data("pool")
pool

## # A tibble: 1,590 × 14
##    item          language    te label xsampa n_lemmas is_multiword subtlex_lemma
##    <chr>         <chr>    <int> <chr> <chr>     <int> <lgl>        <chr>        
##  1 cat_pessigol… Catalan      1 (fer… "p@.s…        1 FALSE        pessigolles  
##  2 cat_abracar   Catalan      2 abra… "@.B4…        1 FALSE        abraçar      
##  3 cat_obrir     Catalan      3 obrir "u\"B…        1 FALSE        obrir        
##  4 cat_acabar    Catalan      4 acab… "@.k@…        1 FALSE        acabar       
##  5 cat_llancar   Catalan      5 llan… "L@n\…        1 FALSE        llançar      
##  6 cat_apagar    Catalan      6 apag… "@.p@…        1 FALSE        apagar       
##  7 cat_aprendre  Catalan      7 apre… "@\"p…        1 FALSE        aprendre     
##  8 cat_esgarrap… Catalan      8 esga… "@z.g…        1 FALSE        esgarrapar   
##  9 cat_ajudar    Catalan      9 ajud… "@.Zu…        1 FALSE        ajudar       
## 10 cat_ballar    Catalan     10 ball… "b@\"…        1 FALSE        ballar       
## # ℹ 1,580 more rows
## # ℹ 6 more variables: wordbank_lemma <chr>, childes_lemma <chr>,
## #   semantic_category <chr>, class <chr>, version <list>, include <lgl>

Computing vocabulary sizes

The bvq_vocabulary() function allows to extract vocabulary sizes for individual responses to any of the questionnaires. It takes the output of the bvq_responses() function as an argument, and returns several measures of vocabulary size base on such data frame.

To compute vocabulary size, we first need to run bvq_responses() (although if this argument is not provided, bvq_responses() is run under the hood):

bvq_vocabulary(participants, 
               responses,
               lp, # to keep participants' language profile
               .scale = "prop" # to return estimates as proportions
)

## # A tibble: 76 × 10
##    child_id response_id type       lp    total_prop l1_prop l2_prop concept_prop
##    <chr>    <chr>       <chr>      <chr>      <dbl>   <dbl>   <dbl>        <dbl>
##  1 58298    BL1879      understan… Mono…    0.518    0.924   0.141        0.881 
##  2 58298    BL1879      produces   Mono…    0.430    0.784   0.103        0.743 
##  3 58361    BL1863      understan… Bili…    0.703    0.805   0.601        0.826 
##  4 58361    BL1863      produces   Bili…    0.476    0.573   0.379        0.617 
##  5 58298    BL1848      understan… Mono…    0.370    0.702   0.0623       0.662 
##  6 58298    BL1848      produces   Mono…    0.00985  0.0205  0            0.0189
##  7 58068    BL1833      understan… Bili…    0.737    0.850   0.633        0.846 
##  8 58068    BL1833      produces   Bili…    0.328    0.528   0.146        0.526 
##  9 57177    BL1748      understan… Bili…    0.837    0.814   0.857        0.887 
## 10 57177    BL1748      produces   Bili…    0.568    0.678   0.466        0.757 
## # ℹ 66 more rows
## # ℹ 2 more variables: te_prop <dbl>, contents <list>

Computing word acquisition norms

The function bvq::bvq_norms() computes the proportion of children in the sample that understand or produce each item, sometimes called word prevalence. This function returns the estimated probability of an average participant understanding or producing each word. The bvq_norms() function allows to condition this probability on the age, language profile or language dominance of participants, among other variables. Proportions are adjusted for zero- and one-inflation following Gelman, Hill, and Vehtari (2020).

# items we want to compute norms for
bvq_norms(participants,
          responses,
          item = c("cat_gos", "cat_gat"),
          age = c(12, 35)
)

## # A tibble: 96 × 9
##       te item    label   age type     item_dominance  .sum    .n .prop
##    <int> <chr>   <chr> <dbl> <chr>    <chr>          <int> <int> <dbl>
##  1   173 cat_gat gat      12 produces L1                 0     1 0.4  
##  2   173 cat_gat gat      12 produces L2                 0     3 0.286
##  3   173 cat_gat gat      13 produces L1                 0     1 0.4  
##  4   173 cat_gat gat      13 produces L2                 0     1 0.4  
##  5   173 cat_gat gat      14 produces L2                 0     1 0.4  
##  6   173 cat_gat gat      14 produces L1                 0     1 0.4  
##  7   173 cat_gat gat      15 produces L1                 0     1 0.4  
##  8   173 cat_gat gat      15 produces L2                 0     1 0.4  
##  9   173 cat_gat gat      17 produces L1                 0     1 0.4  
## 10   173 cat_gat gat      19 produces L2                 0     1 0.4  
## # ℹ 86 more rows