This data frame contains information about the word-forms included in the questionnaire, together with the identifiers used to link participants’ responses to the word-forms they responded to.

items |>
  head(10) |>
  knitr::kable(digits = 2)

| te | meaning | language | item | ipa | xsampa | lv | n_phon | n_syll | syll | freq | freq_syll | list |
|---:|:--------|:---------|:-----|:----|:-------|---:|-------:|-------:|:-----|-----:|----------:|:-----|
| 115 | witch | Catalan | bruixa | ˈbɾu.ʃə | "b4u.S@ | 0.60 | 5 | 2 | b4u, S@ | 6.23 | 13.00 | A, B, C, D |
| 115 | witch | Spanish | bruja | ˈbɾu.xa | "b4u.xa | 0.60 | 5 | 2 | b4u, xa | 6.23 | 13.40 | A, B, C, D |
| 148 | bee | Catalan | abella | əˈβɛ.ʎə | @"BE.L@ | 0.20 | 5 | 3 | @, BE, L@ | 6.09 | 20.48 | A, B, C, D |
| 148 | bee | Spanish | abeja | aˈβe.xa | a"Be.xa | 0.20 | 5 | 3 | a, Be, xa | 6.09 | 21.36 | A, B, C, D |
| 149 | animal | Catalan | animals | ˈæn.ɪ.məlz | "{n.I.m@lz | 0.25 | 8 | 3 | {n, I, m@lz | 5.90 | 17.71 | D |
| 149 | animal | Spanish | animales | a.niˈma.les | a.ni"ma.les | 0.25 | 8 | 4 | a, ni, ma, les | 5.90 | 27.70 | D |
| 150 | spider | Catalan | aranya | əˈɾa.ɲə | @"4a.J@ | 0.60 | 5 | 3 | @, 4a, J@ | 6.04 | 19.55 | A |
| 150 | spider | Spanish | arana | aˈɾa.ɲa | a"4a.Ja | 0.60 | 5 | 3 | a, 4a, Ja | 6.04 | 21.31 | A |
| 153 | owl | Catalan | mussol | muˈsɔɫ | mu"sO5 | 0.20 | 5 | 2 | mu, sO5 | 5.95 | 12.57 | A, B, C, D |
| 153 | owl | Spanish | buho | ˈbu.o | "bu.o | 0.20 | 3 | 2 | bu, o | 5.95 | 13.60 | A, B, C, D |

  • te: integer that uniquely labels a translation equivalent, and is only repeated across the word-forms from Catalan and Spanish that are part of the same translation equivalent
  • meaning: character string that uniquely labels the concept associated with the word-form, and is only repeated across the word-forms from Catalan and Spanish that are part of the same translation equivalent
  • language: character string indicating the language (Catalan or Spanish) to which the word-form belongs
  • item: character string that uniquely identifies the word-form in the questionnaire, and links it to the formr item
  • ipa: phonological transcription of the word-form in IPA format, generated from its X-SAMPA transcription using the ipa::xsampa() function
  • xsampa: phonological transcription of the word-form in X-SAMPA format
  • lv: numeric value indicating the Levenshtein similarity (one minus the normalised Levenshtein distance) between the X-SAMPA phonological transcriptions of the word-form and of its translation equivalent, calculated using stringdist::stringsim() (see the Methods section in the main manuscript for more details)
  • n_phon: integer indicating the number of phonemes included in the X-SAMPA phonological transcription of the word-form
  • n_syll: integer indicating the number of syllables included in the X-SAMPA phonological transcription of the word-form
  • syll: list of character strings in which each element is a syllable included in the word-form
  • freq: numeric value indicating the lexical frequency, expressed as a Zipf score and extracted from the English CHILDES corpora
  • freq_syll: numeric value indicating the sum of the frequencies of the syllables in the word-form, expressed as counts per million tokens
  • list: character string indicating the questionnaire sub-list(s) in which the word-form appears
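
The transcription-derived columns can be illustrated with a minimal base R sketch. It is not the authors' pipeline: it uses `utils::adist()` in place of `stringdist::stringsim()`, and it assumes that the stress marker (`"`) and the syllable boundary (`.`) are removed before strings are compared. The helper names `strip_markers()`, `lv_similarity()` and `split_syllables()` are hypothetical.

```r
# Remove stress (") and syllable-boundary (.) markers from X-SAMPA strings.
# Assumption: similarity is computed on the bare segment string.
strip_markers <- function(x) gsub("[\".]", "", x)

# Levenshtein similarity: 1 - distance / length of the longer string.
lv_similarity <- function(a, b) {
  a <- strip_markers(a)
  b <- strip_markers(b)
  # Pairwise edit distance between the i-th elements of a and b
  d <- mapply(function(x, y) utils::adist(x, y)[1, 1], a, b, USE.NAMES = FALSE)
  1 - d / pmax(nchar(a), nchar(b))
}

# Split a transcription into syllables at stress and boundary markers,
# dropping the empty string produced by a leading marker.
split_syllables <- function(x) {
  lapply(strsplit(x, "[\".]"), function(s) s[nzchar(s)])
}

# Transcriptions of "witch", "bee" and "owl" from the table above
xsampa_catalan <- c("\"b4u.S@", "@\"BE.L@", "mu\"sO5")
xsampa_spanish <- c("\"b4u.xa", "a\"Be.xa", "\"bu.o")

round(lv_similarity(xsampa_catalan, xsampa_spanish), 2)  # 0.6, 0.2, 0.2
split_syllables(xsampa_catalan)                          # candidate syll values
lengths(split_syllables(xsampa_catalan))                 # 2, 3, 2
```

Under these assumptions the sketch reproduces the lv and n_syll values shown for these three translation equivalents. Counting characters is not a safe way to recover n_phon in general, because some X-SAMPA symbols span more than one character.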

This data frame contains demographic and linguistic information about participants, together with the identifiers used to link participants’ responses to their corresponding information.

participants |>
  head(10) |>
  knitr::kable(digits = 2)

| child_id | response_id | time | time_stamp | list | age | sex | lp | doe_catalan | doe_spanish | edu_parent |
|---------:|------------:|-----:|:-----------|:-----|----:|:----|:---|------------:|------------:|:-----------|
| 54531 | 872 | 1 | 2020-04-19 | bvq-short | 31.64 | male | Monolingual | 0.9 | 0.1 | University |
| 54794 | 828 | 1 | 2020-04-15 | bvq-short | 29.31 | female | Monolingual | 1.0 | 0.0 | University |
| 54828 | 826 | 1 | 2020-06-11 | bvq-short | 30.78 | male | Monolingual | 1.0 | 0.0 | University |
| 54881 | 1197 | 1 | 2020-06-03 | bvq-short | 29.21 | female | Bilingual | 0.5 | 0.5 | Complementary |
| 54939 | 952 | 1 | 2020-05-08 | bvq-short | 28.91 | female | Bilingual | 0.2 | 0.7 | University |
| 54974 | 1051 | 1 | 2020-05-23 | bvq-short | 29.24 | male | Bilingual | 0.4 | 0.6 | Vocational |
| 54978 | 940 | 1 | 2020-05-18 | bvq-short | 28.98 | female | Monolingual | 0.1 | 0.9 | University |
| 55011 | 942 | 1 | 2020-05-18 | bvq-short | 28.45 | male | Monolingual | 0.2 | 0.8 | Vocational |
| 55027 | 811 | 1 | 2020-04-13 | bvq-short | 27.14 | male | Bilingual | 0.6 | 0.3 | University |
| 55056 | 950 | 1 | 2020-05-08 | bvq-short | 27.99 | male | Monolingual | 0.1 | 0.8 | University |

  • child_id: integer that uniquely labels each participant, and is only repeated across responses to the questionnaire from the same participant
  • response_id: integer that uniquely labels questionnaire administrations, and is never repeated across questionnaire administrations or participants
  • time: integer indicating the cumulative number of times the participant has provided a valid response to the questionnaire
  • time_stamp: date on which the response to the questionnaire was recorded (the date of the last item answered)
  • list: character string indicating the questionnaire sub-list to which the participant responded, which is virtually always the same for the same participant
  • age: numeric value indicating the age of the participant when their response to the questionnaire was recorded, calculated as the difference in months between that date and the participant’s birth date
  • sex: sex of the participant (male or female)
  • lp: character string indicating the language profile of the participant (Monolingual or Bilingual), calculated from doe_catalan and doe_spanish ("Monolingual" if >=80% DoE to Catalan or Spanish, "Bilingual" otherwise)
  • doe_catalan: numeric value indicating participant’s degree of exposure (DoE) to Catalan, as reported by their caregivers
  • doe_spanish: numeric value indicating participant’s degree of exposure (DoE) to Spanish, as reported by their caregivers
  • edu_parent: factor indicating the caregivers’ maximum educational attainment
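
The rule behind lp can be written as a short base R sketch. It assumes the 80% threshold applies to the larger of the two DoE values and is inclusive, as the description of lp states. The function name `classify_lp()` and the data frame `participants_demo` are hypothetical; the values are copied from four rows of the table above.

```r
# Four participants from the table above (DoE values as proportions)
participants_demo <- data.frame(
  child_id    = c(54531, 54881, 54939, 55011),
  doe_catalan = c(0.9, 0.5, 0.2, 0.2),
  doe_spanish = c(0.1, 0.5, 0.7, 0.8)
)

# "Monolingual" if exposure to either language reaches the threshold,
# "Bilingual" otherwise. The threshold is inclusive (>=).
classify_lp <- function(doe_catalan, doe_spanish, threshold = 0.8) {
  ifelse(pmax(doe_catalan, doe_spanish) >= threshold, "Monolingual", "Bilingual")
}

participants_demo$lp <- classify_lp(
  participants_demo$doe_catalan,
  participants_demo$doe_spanish
)
participants_demo
```

For these four rows the sketch returns Monolingual, Bilingual, Bilingual and Monolingual, which matches the lp column above. The two DoE values need not sum to 1 (participant 54939 has 0.2 and 0.7), so the rule compares each value against the threshold rather than relying on one being the complement of the other.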

This data frame is the one used to fit the main model, and the model included in Appendix A. It contains participants’ responses to each item included in the sub-list of the questionnaire they responded to, together with participant- and word-level predictors of interest.

responses |>
  head(10) |>
  knitr::kable(digits = 2)
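
| child_id | response_id | time | age | age_std | te | language | meaning | item | response | lv | lv_std | freq | freq_std | n_phon | n_phon_std | doe | doe_std | exposure | exposure_std |
|---------:|------------:|-----:|----:|--------:|---:|:---------|:--------|:-----|:---------|---:|-------:|-----:|---------:|-------:|-----------:|----:|--------:|---------:|-------------:|
| 54531 | 872 | 1 | 31.64 | 1.96 | 115 | Catalan | witch | bruixa | Understands and Says | 0.60 | 0.98 | 6.23 | 1.10 | 5 | -0.21 | 0.9 | 1.36 | 5.61 | 1.46 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 115 | Spanish | witch | bruja | Understands | 0.60 | 0.98 | 6.23 | 1.10 | 5 | -0.21 | 0.1 | -1.31 | 0.62 | -1.29 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 148 | Catalan | bee | abella | Understands and Says | 0.20 | -0.58 | 6.09 | 0.31 | 5 | -0.21 | 0.9 | 1.36 | 5.48 | 1.39 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 148 | Spanish | bee | abeja | No | 0.20 | -0.58 | 6.09 | 0.31 | 5 | -0.21 | 0.1 | -1.31 | 0.61 | -1.30 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 153 | Catalan | owl | mussol | Understands and Says | 0.20 | -0.58 | 5.95 | -0.45 | 5 | -0.21 | 0.9 | 1.36 | 5.35 | 1.32 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 153 | Spanish | owl | buho | No | 0.20 | -0.58 | 5.95 | -0.45 | 3 | -1.49 | 0.1 | -1.31 | 0.59 | -1.31 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 159 | Catalan | snail | cargol | Understands and Says | 0.29 | -0.25 | 5.84 | -1.03 | 6 | 0.43 | 0.9 | 1.36 | 5.25 | 1.26 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 159 | Spanish | snail | caracol | Understands | 0.29 | -0.25 | 5.84 | -1.03 | 7 | 1.07 | 0.1 | -1.31 | 0.58 | -1.32 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 160 | Catalan | zebra | zebra | Understands | 0.60 | 0.98 | 6.13 | 0.54 | 5 | -0.21 | 0.9 | 1.36 | 5.52 | 1.41 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 160 | Spanish | zebra | cebra | No | 0.60 | 0.98 | 6.13 | 0.54 | 5 | -0.21 | 0.1 | -1.31 | 0.61 | -1.30 |
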
  • child_id: integer that uniquely labels each participant, and is only repeated across responses to the questionnaire from the same participant
  • response_id: integer that uniquely labels questionnaire administrations, and is never repeated across questionnaire administrations or participants
  • time: integer indicating the cumulative number of times the participant has provided a valid response to the questionnaire
  • age: numeric value indicating the age of the participant when their response to the questionnaire was recorded, calculated as the difference in months between that date and the participant’s birth date
  • age_std: numeric value indicating the participant’s standardised age
  • te: integer that uniquely labels a translation equivalent, and is only repeated across the word-forms from Catalan and Spanish that are part of the same translation equivalent
  • language: character string indicating the language (Catalan or Spanish) to which the word-form belongs
  • meaning: character string that uniquely labels the concept associated with the word-form, and is only repeated across the word-forms from Catalan and Spanish that are part of the same translation equivalent
  • item: character string that uniquely identifies the word-form in the questionnaire, and links it to the formr item
  • response: ordered factor indicating participant’s response to the item, which can take "No", "Understands", or "Understands and Says" as values
  • lv: numeric value indicating the Levenshtein similarity (one minus the normalised Levenshtein distance) between the X-SAMPA phonological transcriptions of the word-form and of its translation equivalent, calculated using stringdist::stringsim() (see the Methods section in the main manuscript for more details)
  • lv_std: numeric value indicating the word-form’s standardised lv
  • freq: numeric value indicating the lexical frequency, expressed as a Zipf score and extracted from the English CHILDES corpora
  • freq_std: numeric value indicating the word-form’s standardised freq
  • n_phon: integer indicating the number of phonemes included in the X-SAMPA phonological transcription of the word-form
  • n_phon_std: numeric value indicating the word-form’s standardised n_phon
  • doe: numeric value indicating participant’s degree of exposure (DoE) to the language the item belongs to
  • doe_std: numeric value indicating the participant’s standardised doe to the language the item belongs to
  • exposure: numeric value indicating the participant’s exposure score for the word-form, calculated as the product of freq and doe
  • exposure_std: numeric value indicating the participant’s standardised exposure score for the item (see the Methods section in the main manuscript for more details)
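
The response-level predictors can be sketched in base R as follows. Two assumptions are made: doe is taken from doe_catalan or doe_spanish according to the language of the item, and the `_std` columns are z-scores (centred on the mean and divided by the standard deviation). The Methods section of the main manuscript is the authority on the actual procedure. The names `responses_demo` and `standardise()` are hypothetical.

```r
# Four rows for participant 54531, with values taken from the tables above
responses_demo <- data.frame(
  item        = c("bruixa", "bruja", "abella", "abeja"),
  language    = c("Catalan", "Spanish", "Catalan", "Spanish"),
  freq        = c(6.23, 6.23, 6.09, 6.09),
  doe_catalan = 0.9,
  doe_spanish = 0.1
)

# DoE to the language the item belongs to
responses_demo$doe <- ifelse(
  responses_demo$language == "Catalan",
  responses_demo$doe_catalan,
  responses_demo$doe_spanish
)

# Exposure score: product of lexical frequency and DoE
responses_demo$exposure <- responses_demo$freq * responses_demo$doe

# Assumed standardisation: z-score over the rows supplied
standardise <- function(x) (x - mean(x)) / sd(x)
responses_demo$exposure_std <- standardise(responses_demo$exposure)

round(responses_demo$exposure, 2)  # 5.61, 0.62, 5.48, 0.61
```

The exposure values match the table above. The standardised values do not, and are not expected to, because the sketch standardises over four rows only, whereas the published values depend on the full data set.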