This function generates a data frame with the vocabulary of each participant (keeping longitudinal data from the same participant in different rows). Comprehensive and productive vocabulary sizes are computed as raw counts (*_count) and as proportions (*_prop).

Usage

bvq_vocabulary(
  participants = bvq_participants(),
  responses = bvq_responses(participants),
  ...,
  .scale = "prop"
)

Arguments

participants

Participants data frame, as generated by bvq_participants().

responses

Responses data frame, as generated by bvq_responses().

...

<dynamic-dots> Unquoted name(s) of the variable(s) by which to group the data. Vocabulary metrics are calculated by aggregating responses within the groups that result from crossing the variables provided in .... These variables can refer to item properties (see pool, e.g., semantic_category) or to participant properties (see bvq_logs(), e.g., lp).

.scale

A character vector that takes the value "count" and/or "prop". If "prop" (default), vocabulary metrics are calculated as proportions. If "count", vocabulary metrics are reported as counts (number of words).
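
To make the grouping and .scale arguments more concrete, the following is a minimal base R sketch of the kind of aggregation described above. It does not call bvq_vocabulary() and is not the package's implementation: the toy data frame, the knows column, and all values are invented for illustration.

```r
# Toy responses: one row per participant x item (hypothetical data,
# not taken from the database). `knows` flags items marked as known.
responses <- data.frame(
  child_id = c("00001", "00001", "00001", "00001",
               "00002", "00002", "00002", "00002"),
  semantic_category = c("Animals", "Animals", "Food", "Food",
                        "Animals", "Animals", "Food", "Food"),
  knows = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# "count" scale: number of known items within each group
counts <- aggregate(knows ~ child_id + semantic_category,
                    data = responses, FUN = sum)

# "prop" scale: proportion of known items within each group
props <- aggregate(knows ~ child_id + semantic_category,
                   data = responses, FUN = mean)

# Requesting both scales is analogous to keeping both sets of columns
vocab <- merge(counts, props,
               by = c("child_id", "semantic_category"),
               suffixes = c("_count", "_prop"))
print(vocab)
```

Each row of vocab corresponds to one combination of participant and grouping variable, with one *_count column and one *_prop column, which mirrors the naming used in the returned data frame.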

Value

A data frame (actually, a tibble::tibble) with each participant's comprehensive and/or productive vocabulary size in each language. This data frame contains the following variables:

  • child_id: a character string with five digits indicating a participant's identifier in the database from the Laboratori de Recerca en Infància at Universitat Pompeu Fabra. This value is always the same for each participant, so that different responses from the same participant share the same child_id.

  • response_id: a character string identifying a single response to the questionnaire. This value is always unique for each response to the questionnaire, even for responses from the same participant.

  • age: a numeric value indicating the number of months elapsed since participants' birth date until they filled in the last item of their questionnaire response.

  • type: a character string indicating the vocabulary type computed: "understands" if option "Understands" was selected, and "produces" if option "Understands & Says" was selected.

  • total_count: integer indicating the number of items selected as "Understands" or "Understands and Says" across both languages.

  • l1_count: positive integer indicating the number of items selected as "Understands" or "Understands and Says" in the dominant language (L1).

  • l2_count: positive integer indicating the number of items selected as "Understands" or "Understands and Says" in the non-dominant language (L2).

  • concept_count: positive integer indicating the number of translation equivalents (a.k.a. cross-language synonyms or doublets) in which at least one of the items was selected as "Understands" or "Understands and Says". This is a measure of the number of lexicalised concepts.

  • te_count: positive integer indicating the number of translation equivalents (out of the total number of items the participant responded to) in which both items were selected as "Understands" or "Understands and Says". This is a measure of the number of lexicalised concepts.

  • total_prop: numeric value ranging from 0 to 1 (both included) indicating the proportion of items selected as "Understands" or "Understands and Says" across both languages.

  • l1_prop: numeric value ranging from 0 to 1 (both included) indicating the proportion of items selected as "Understands" or "Understands and Says" in the dominant language (L1).

  • l2_prop: numeric value ranging from 0 to 1 (both included) indicating the proportion of items selected as "Understands" or "Understands and Says" in the non-dominant language (L2).

  • concept_prop: numeric value ranging from 0 to 1 (both included) indicating the proportion of translation equivalents (a.k.a. cross-language synonyms or doublets) in which at least one of the items was selected as "Understands" or "Understands and Says". This is a measure of the number of lexicalised concepts.

  • te_prop: numeric value ranging from 0 to 1 (both included) indicating the proportion of translation equivalents (a.k.a. cross-language synonyms or doublets) in which both items were selected as "Understands" or "Understands and Says". This is a measure of the number of lexicalised concepts.

  • contents: list containing the items marked as acquired.

The specific subset of columns returned by bvq_vocabulary() depends on the elements of ... and .scale.
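
The relationship between the count and proportion columns listed above can be sketched with a small worked example. This is an illustration under stated assumptions, not the package's code: the pairs data frame and its columns (te, l1, l2) are hypothetical, every item is assumed to belong to exactly one translation-equivalent pair, and the denominator used for the proportions (the number of pairs) is an assumption made for this toy case.

```r
# Toy data for a single participant: one row per translation-equivalent
# pair (hypothetical values). `l1` and `l2` flag whether the item in the
# dominant and non-dominant language was marked as known.
pairs <- data.frame(
  te = 1:5,
  l1 = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  l2 = c(TRUE, FALSE, TRUE, FALSE, TRUE)
)

l1_count      <- sum(pairs$l1)             # items known in L1
l2_count      <- sum(pairs$l2)             # items known in L2
total_count   <- l1_count + l2_count       # items known across both languages
concept_count <- sum(pairs$l1 | pairs$l2)  # pairs with at least one item known
te_count      <- sum(pairs$l1 & pairs$l2)  # pairs with both items known

# Proportions: counts divided by the number of pairs (assumed denominator)
concept_prop <- concept_count / nrow(pairs)
te_prop      <- te_count / nrow(pairs)

print(c(total = total_count, l1 = l1_count, l2 = l2_count,
        concept = concept_count, te = te_count))
```

In this toy setting, concept_count counts a pair once regardless of how many of its items are known, whereas te_count only counts pairs known in both languages, so total_count equals concept_count plus te_count.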

Author

Gonzalo Garcia-Castro