library(multilex)
my_email <- "gonzalo.garciadecastro@upf.edu"
ml_connect(google_email = my_email)
The ml_vocabulary() function extracts vocabulary sizes for individual responses to any of the questionnaires:
ml_connect()
#> [1] TRUE
p <- ml_participants()
r <- ml_responses()
ml_vocabulary(participants = p, responses = r)
#> # A tibble: 1,166 x 9
#> id time age type vocab_count_total vocab_count_dom~ vocab_count_dom~
#> <chr> <dbl> <dbl> <chr> <int> <int> <int>
#> 1 bilex~ 1 20.9 produ~ 240 112 128
#> 2 bilex~ 1 20.9 under~ 405 206 199
#> 3 bilex~ 1 17.1 produ~ 3 3 0
#> 4 bilex~ 1 17.1 under~ 114 114 0
#> 5 bilex~ 1 16.4 produ~ 11 6 5
#> 6 bilex~ 1 16.4 under~ 217 145 72
#> 7 bilex~ 1 16.4 produ~ 4 4 0
#> 8 bilex~ 1 16.4 under~ 125 124 1
#> 9 bilex~ 1 16.4 produ~ 5 2 3
#> 10 bilex~ 1 16.4 under~ 95 78 17
#> # ... with 1,156 more rows, and 2 more variables: vocab_count_conceptual <int>,
#> # vocab_count_te <int>
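In every row printed above, vocab_count_total equals vocab_count_dominance_l1 plus vocab_count_dominance_l2 (e.g., 112 + 128 = 240 in the first row). The base R sketch below copies those three columns by hand from the printed output and checks the identity. It does not call multilex, and it only covers these ten rows. Whether the identity holds for every response in the full data is an assumption this check does not establish.

```r
# Three columns of the ten rows printed above, copied by hand:
# vocab_count_total, vocab_count_dominance_l1, vocab_count_dominance_l2
printed <- data.frame(
  type  = rep(c("produces", "understands"), times = 5),
  total = c(240, 405, 3, 114, 11, 217, 4, 125, 5, 95),
  l1    = c(112, 206, 3, 114, 6, 145, 4, 124, 2, 78),
  l2    = c(128, 199, 0, 0, 5, 72, 0, 1, 3, 17)
)

# Does the total equal the sum of both languages in each row?
printed$total_is_sum <- printed$total == printed$l1 + printed$l2
all(printed$total_is_sum)
#> [1] TRUE
```

The same comparison could be run on the full tibble returned by ml_vocabulary(p, r), using the real column names.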
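Vocabulary sizes can be computed in two different scales: counts (the default) and proportions (scale = "prop").
Five modalities of vocabulary size are computed: total, dominant language (L1), non-dominant language (L2), conceptual, and translation equivalents (TE).
Vocabulary sizes are also computed for two types: comprehension ("understands") and production ("produces").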
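The difference between the conceptual and TE modalities can be illustrated with a small, entirely hypothetical item-level table. Each row is an item, concept groups the Catalan and Spanish items that share a meaning, and known marks whether the child was reported to know the item. This is a base R sketch of the definitions, not the multilex implementation. The column names concept, language and known, and the one-item-per-language-per-concept layout, are assumptions made for illustration.

```r
# Hypothetical item-level responses for one child (not real data);
# each concept has exactly one Catalan and one Spanish item
items <- data.frame(
  concept  = rep(c("dog", "cat", "water", "ball"), each = 2),
  language = rep(c("Catalan", "Spanish"), times = 4),
  known    = c(TRUE, TRUE,    # dog: known in both languages
               TRUE, FALSE,   # cat: Catalan only
               FALSE, TRUE,   # water: Spanish only
               FALSE, FALSE)  # ball: not known
)

# Number of known items per concept (0, 1 or 2)
known_per_concept <- tapply(items$known, items$concept, sum)

vocab <- c(
  total      = sum(items$known),                               # both languages
  catalan    = sum(items$known[items$language == "Catalan"]),
  spanish    = sum(items$known[items$language == "Spanish"]),
  conceptual = sum(known_per_concept >= 1),  # known in at least one language
  te         = sum(known_per_concept == 2)   # known in both languages
)
# total = 4, catalan = 2, spanish = 2, conceptual = 3, te = 1
vocab
```

For a Catalan-dominant child, catalan would play the role of dominance_l1 and spanish that of dominance_l2. Under the one-item-per-language assumption, L1 + L2 = conceptual + TE (here 2 + 2 = 3 + 1), because a concept known in both languages is counted once in conceptual and once more in TE.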
This is what the default output looks like:
library(multilex)
ml_connect()
#> [1] TRUE
p <- ml_participants()
r <- ml_responses(update = FALSE)
ml_vocabulary(participants = p, responses = r)
#> # A tibble: 1,166 x 9
#> id time age type vocab_count_total vocab_count_dom~ vocab_count_dom~
#> <chr> <dbl> <dbl> <chr> <int> <int> <int>
#> 1 bilex~ 1 20.9 produ~ 240 112 128
#> 2 bilex~ 1 20.9 under~ 405 206 199
#> 3 bilex~ 1 17.1 produ~ 3 3 0
#> 4 bilex~ 1 17.1 under~ 114 114 0
#> 5 bilex~ 1 16.4 produ~ 11 6 5
#> 6 bilex~ 1 16.4 under~ 217 145 72
#> 7 bilex~ 1 16.4 produ~ 4 4 0
#> 8 bilex~ 1 16.4 under~ 125 124 1
#> 9 bilex~ 1 16.4 produ~ 5 2 3
#> 10 bilex~ 1 16.4 under~ 95 78 17
#> # ... with 1,156 more rows, and 2 more variables: vocab_count_conceptual <int>,
#> # vocab_count_te <int>
This data frame includes two rows per response, one for comprehension and one for production, and the following columns:
id: participant ID. This ID is unique for every participant and is the same across all of that participant's responses to the questionnaires.
time: number of times this participant has completed any of the questionnaires, including this response.
age: age in months at the time of completion.
type: vocabulary size type ("understands" for comprehension, "produces" for production).
vocab_count_total: total number of items the child was reported to know, summing both languages.
vocab_count_dominance_l1: number of items the child was reported to know in their dominant language (e.g., Catalan words for a child whose language of most exposure is Catalan).
vocab_count_dominance_l2: number of items the child was reported to know in their non-dominant language (e.g., Spanish words for a child whose language of most exposure is Catalan).
vocab_count_conceptual: number of concepts for which the child knows at least one item, regardless of the language the item belongs to.
vocab_count_te: number of translation equivalents the child knows, that is, the number of concepts for which the child knows an item in each language.

This is what the output looks like when scale = "prop":
ml_vocabulary(p, r, scale = "prop")
#> # A tibble: 1,166 x 9
#> id time age type vocab_prop_total vocab_prop_domin~ vocab_prop_domi~
#> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 bilex~ 1 20.9 produ~ 0.338 0.302 0.378
#> 2 bilex~ 1 20.9 under~ 0.570 0.555 0.587
#> 3 bilex~ 1 17.1 produ~ 0.00422 0.00877 0
#> 4 bilex~ 1 17.1 under~ 0.160 0.333 0
#> 5 bilex~ 1 16.4 produ~ 0.0153 0.0173 0.0135
#> 6 bilex~ 1 16.4 under~ 0.303 0.419 0.194
#> 7 bilex~ 1 16.4 produ~ 0.00574 0.0115 0
#> 8 bilex~ 1 16.4 under~ 0.179 0.356 0.00287
#> 9 bilex~ 1 16.4 produ~ 0.00703 0.00585 0.00813
#> 10 bilex~ 1 16.4 under~ 0.134 0.228 0.0461
#> # ... with 1,156 more rows, and 2 more variables: vocab_prop_conceptual <dbl>,
#> # vocab_prop_te <dbl>
This data frame follows a similar structure to the one returned by ml_vocabulary() with default arguments, but vocabulary sizes are now expressed as proportions:
id: participant ID. This ID is unique for every participant and is the same across all of that participant's responses to the questionnaires.
time: number of times this participant has completed any of the questionnaires, including this response.
age: age in months at the time of completion.
type: vocabulary size type ("understands" for comprehension, "produces" for production).
vocab_prop_total: proportion of items the child was reported to know, summing both languages.
vocab_prop_dominance_l1: proportion of items the child was reported to know in their dominant language (e.g., Catalan words for a child whose language of most exposure is Catalan).
vocab_prop_dominance_l2: proportion of items the child was reported to know in their non-dominant language (e.g., Spanish words for a child whose language of most exposure is Catalan).
vocab_prop_conceptual: proportion of concepts for which the child knows at least one item, regardless of the language the item belongs to.
vocab_prop_te: proportion of translation equivalents the child knows, that is, of concepts for which the child knows an item in each language.

We can also ask for vocabulary sizes expressed in both scales (counts and proportions):
ml_vocabulary(p, r, scale = c("count", "prop"))
#> # A tibble: 1,166 x 14
#> id time age type vocab_count_total vocab_count_dom~ vocab_count_dom~
#> <chr> <dbl> <dbl> <chr> <int> <int> <int>
#> 1 bilex~ 1 20.9 produ~ 240 112 128
#> 2 bilex~ 1 20.9 under~ 405 206 199
#> 3 bilex~ 1 17.1 produ~ 3 3 0
#> 4 bilex~ 1 17.1 under~ 114 114 0
#> 5 bilex~ 1 16.4 produ~ 11 6 5
#> 6 bilex~ 1 16.4 under~ 217 145 72
#> 7 bilex~ 1 16.4 produ~ 4 4 0
#> 8 bilex~ 1 16.4 under~ 125 124 1
#> 9 bilex~ 1 16.4 produ~ 5 2 3
#> 10 bilex~ 1 16.4 under~ 95 78 17
#> # ... with 1,156 more rows, and 7 more variables: vocab_count_conceptual <int>,
#> # vocab_count_te <int>, vocab_prop_total <dbl>,
#> # vocab_prop_dominance_l1 <dbl>, vocab_prop_dominance_l2 <dbl>,
#> # vocab_prop_conceptual <dbl>, vocab_prop_te <dbl>
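With scale = c("count", "prop"), every modality appears twice in wide format, with column names following the pattern vocab_<scale>_<modality>. If a long format is more convenient (e.g., for plotting), the columns can be split by name. The base R sketch below uses a two-row data frame whose values are copied from the first two rows of the count and proportion outputs above; ids are omitted because they are truncated in the printout, and only the total and dominance columns are included. It assumes the column names follow the pattern shown.

```r
# First two rows of the outputs above (total and dominance columns only)
wide <- data.frame(
  type                     = c("produces", "understands"),
  vocab_count_total        = c(240, 405),
  vocab_count_dominance_l1 = c(112, 206),
  vocab_count_dominance_l2 = c(128, 199),
  vocab_prop_total         = c(0.338, 0.570),
  vocab_prop_dominance_l1  = c(0.302, 0.555),
  vocab_prop_dominance_l2  = c(0.378, 0.587)
)

vocab_cols <- grep("^vocab_", names(wide), value = TRUE)

# Build one block of rows per vocab_* column, then stack the blocks
long <- do.call(rbind, lapply(vocab_cols, function(col) {
  # "vocab_count_dominance_l1" -> c("count", "dominance", "l1")
  parts <- strsplit(sub("^vocab_", "", col), "_")[[1]]
  data.frame(
    type     = wide$type,
    scale    = parts[1],                         # "count" or "prop"
    modality = paste(parts[-1], collapse = "_"), # e.g. "dominance_l1"
    value    = wide[[col]]
  )
}))
long
```

On the real output, id, time, age and any by columns would need to be carried over in the same way. tidyr::pivot_longer() with names_pattern = "vocab_([^_]+)_(.*)" and names_to = c("scale", "modality") would do the same reshaping in one call.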
The by argument
We can also compute vocabulary sizes conditional on variables at the item or participant level, such as semantic/functional category (category), cognate status (cognate), or language profile (lp), using the by argument. Take a look at the variables included in the data frames returned by ml_participants() or in the pool of items. You can use this argument as follows:
ml_vocabulary(p, r, by = "dominance")
#> # A tibble: 1,166 x 10
#> id time age type dominance vocab_count_tot~ vocab_count_domina~
#> <chr> <dbl> <dbl> <chr> <chr> <int> <int>
#> 1 bilexico~ 1 20.9 produces Spanish 240 112
#> 2 bilexico~ 1 20.9 underst~ Spanish 405 206
#> 3 bilexico~ 1 17.1 produces Catalan 3 3
#> 4 bilexico~ 1 17.1 underst~ Catalan 114 114
#> 5 bilexico~ 1 16.4 produces Catalan 11 6
#> 6 bilexico~ 1 16.4 underst~ Catalan 217 145
#> 7 bilexico~ 1 16.4 produces Catalan 4 4
#> 8 bilexico~ 1 16.4 underst~ Catalan 125 124
#> 9 bilexico~ 1 16.4 produces Catalan 5 2
#> 10 bilexico~ 1 16.4 underst~ Catalan 95 78
#> # ... with 1,156 more rows, and 3 more variables:
#> # vocab_count_dominance_l2 <int>, vocab_count_conceptual <int>,
#> # vocab_count_te <int>
This data frame follows a similar structure to the ones above, but keeps a column for the grouping variable, here dominance, which indicates the participant's dominant language (Catalan or Spanish in the rows shown). The value of by is passed to dplyr's group_by() under the hood. As with group_by(), you can compute vocabulary sizes for combinations of variables:
ml_vocabulary(p, r, by = c("dominance", "lp"))
#> # A tibble: 1,166 x 11
#> id time age type dominance lp vocab_count_tot~ vocab_count_domi~
#> <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <int>
#> 1 bilexi~ 1 20.9 produ~ Spanish Bili~ 240 112
#> 2 bilexi~ 1 20.9 under~ Spanish Bili~ 405 206
#> 3 bilexi~ 1 17.1 produ~ Catalan Mono~ 3 3
#> 4 bilexi~ 1 17.1 under~ Catalan Mono~ 114 114
#> 5 bilexi~ 1 16.4 produ~ Catalan Mono~ 11 6
#> 6 bilexi~ 1 16.4 under~ Catalan Mono~ 217 145
#> 7 bilexi~ 1 16.4 produ~ Catalan Mono~ 4 4
#> 8 bilexi~ 1 16.4 under~ Catalan Mono~ 125 124
#> 9 bilexi~ 1 16.4 produ~ Catalan Bili~ 5 2
#> 10 bilexi~ 1 16.4 under~ Catalan Bili~ 95 78
#> # ... with 1,156 more rows, and 3 more variables:
#> # vocab_count_dominance_l2 <int>, vocab_count_conceptual <int>,
#> # vocab_count_te <int>
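Because the by columns are kept in the output, the result can be summarised per group downstream. As a base R sketch (not part of multilex), the ten rows printed for by = "dominance" can be averaged by dominance and type with aggregate(). With only ten rows this illustrates the mechanics; it is not a result about the sample.

```r
# type, dominance and vocab_count_total from the ten rows printed for by = "dominance"
shown <- data.frame(
  type              = rep(c("produces", "understands"), times = 5),
  dominance         = c("Spanish", "Spanish", rep("Catalan", 8)),
  vocab_count_total = c(240, 405, 3, 114, 11, 217, 4, 125, 5, 95)
)

# Mean total vocabulary for each dominance x type combination
res <- aggregate(vocab_count_total ~ dominance + type, data = shown, FUN = mean)
res
```

On the real output, the same summary could be written with dplyr as group_by(dominance, type) followed by summarise().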