## Items

This data frame contains information about the word-forms included in the questionnaire, together with identifiers that link participants' responses to the word-forms they responded to.
items |> head(10) |> knitr::kable(digits = 2)
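
| te | meaning | language | item | ipa | xsampa | lv | n_phon | n_syll | syll | freq | freq_syll | list |
|---:|:--------|:---------|:-----|:----|:-------|---:|-------:|-------:|:-----|-----:|----------:|:-----|
| 115 | witch | Catalan | bruixa | ˈbɾu.ʃə | `"b4u.S@` | 0.60 | 5 | 2 | `b4u, S@` | 6.23 | 13.00 | A, B, C, D |
| 115 | witch | Spanish | bruja | ˈbɾu.xa | `"b4u.xa` | 0.60 | 5 | 2 | `b4u, xa` | 6.23 | 13.40 | A, B, C, D |
| 148 | bee | Catalan | abella | əˈβɛ.ʎə | `@"BE.L@` | 0.20 | 5 | 3 | `@, BE, L@` | 6.09 | 20.48 | A, B, C, D |
| 148 | bee | Spanish | abeja | aˈβe.xa | `a"Be.xa` | 0.20 | 5 | 3 | `a, Be, xa` | 6.09 | 21.36 | A, B, C, D |
| 149 | animal | Catalan | animals | ˈæn.ɪ.məlz | `"{n.I.m@lz` | 0.25 | 8 | 3 | `{n, I, m@lz` | 5.90 | 17.71 | D |
| 149 | animal | Spanish | animales | a.niˈma.les | `a.ni"ma.les` | 0.25 | 8 | 4 | `a, ni, ma, les` | 5.90 | 27.70 | D |
| 150 | spider | Catalan | aranya | əˈɾa.ɲə | `@"4a.J@` | 0.60 | 5 | 3 | `@, 4a, J@` | 6.04 | 19.55 | A |
| 150 | spider | Spanish | arana | aˈɾa.ɲa | `a"4a.Ja` | 0.60 | 5 | 3 | `a, 4a, Ja` | 6.04 | 21.31 | A |
| 153 | owl | Catalan | mussol | muˈsɔɫ | `mu"sO5` | 0.20 | 5 | 2 | `mu, sO5` | 5.95 | 12.57 | A, B, C, D |
| 153 | owl | Spanish | buho | ˈbu.o | `"bu.o` | 0.20 | 3 | 2 | `bu, o` | 5.95 | 13.60 | A, B, C, D |
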
* `te`: integer that uniquely labels a translation equivalent, and is only repeated across the Catalan and Spanish word-forms that are part of the same translation equivalent
* `meaning`: character string that uniquely labels the concept associated with the word-form, and is only repeated across the Catalan and Spanish word-forms that are part of the same translation equivalent
* `language`: character string indicating the language (Catalan or Spanish) to which the word-form belongs
* `item`: character string that uniquely identifies the word-form in the questionnaire, and links it to the formr item
* `ipa`: phonological transcription of the word-form in IPA format, generated from the X-SAMPA transcription of the word-form using the `ipa::xsampa()` function
* `xsampa`: phonological transcription of the word-form in X-SAMPA format
* `lv`: numeric value indicating the normalised inverse of the Levenshtein distance between the X-SAMPA phonological transcriptions of the word-form and of its translation equivalent, calculated using `stringdist::stringsim()` (see the Methods section in the main manuscript for more details)
* `n_phon`: integer indicating the number of phonemes included in the X-SAMPA phonological transcription of the word-form
* `n_syll`: integer indicating the number of syllables included in the X-SAMPA phonological transcription of the word-form
* `syll`: list of character strings in which each element is a syllable included in the word-form
* `freq`: numeric value indicating the lexical frequency of the word-form as a Zipf score, from the English CHILDES corpora
* `freq_syll`: numeric value indicating the sum of the frequencies of the syllables in the word-form, expressed as counts per million tokens
* `list`: character string indicating the questionnaire sub-list(s) in which the word-form appears
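
To illustrate how `lv` relates to the transcriptions, the sketch below recomputes it for three translation-equivalent pairs from the table above. It is only an approximation of the authors' procedure: it uses base R's `utils::adist()` rather than `stringdist::stringsim()`, and it assumes that stress marks (`"`) and syllable boundaries (`.`) are removed before the comparison and that the distance is divided by the length of the longer transcription. `lv_similarity()` is a hypothetical helper name; the Methods section of the main manuscript remains the authoritative description.

```r
# Hypothetical helper: normalised Levenshtein similarity between pairs of
# X-SAMPA transcriptions (1 = identical, 0 = no overlap at all).
lv_similarity <- function(a, b) {
  # Assumption: stress marks and syllable boundaries are not phonemes,
  # so they are stripped before comparing the transcriptions.
  strip <- function(x) gsub("[\".]", "", x)
  a <- strip(a)
  b <- strip(b)
  # Levenshtein distance for each (a[i], b[i]) pair
  d <- mapply(function(x, y) utils::adist(x, y)[1, 1], a, b, USE.NAMES = FALSE)
  # Normalise by the longer transcription and invert into a similarity
  1 - d / pmax(nchar(a), nchar(b))
}

catalan <- c('"b4u.S@', '@"BE.L@', 'mu"sO5')  # bruixa, abella, mussol
spanish <- c('"b4u.xa', 'a"Be.xa', '"bu.o')   # bruja, abeja, buho

round(lv_similarity(catalan, spanish), 2)
#> [1] 0.6 0.2 0.2
```

Under these assumptions the results agree with the `lv` values shown for the witch, bee, and owl rows.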
## Participants

This data frame contains demographic and linguistic information about participants, together with identifiers that link participants' responses to their corresponding information.
* `child_id`: integer that uniquely labels each participant, and is only repeated across responses to the questionnaire from the same participant
* `response_id`: integer that uniquely labels each questionnaire administration, and is never repeated across questionnaire administrations or participants
* `time`: integer indicating the cumulative number of times the participant has provided a valid response to the questionnaire
* `time_stamp`: date on which the response to the questionnaire was recorded (when the last item was answered)
* `list`: character string indicating the questionnaire sub-list to which the participant responded, which is virtually always the same for the same participant
* `age`: numeric value indicating the age of the participant when their response to the questionnaire was recorded, calculated as the difference in months between that date and the participant's birth date
* `lp`: character string indicating the language profile of the participant (Monolingual or Bilingual), calculated from `doe_catalan` and `doe_spanish` (`"Monolingual"` if the DoE to either Catalan or Spanish is at least 80%, `"Bilingual"` otherwise)
* `doe_catalan`: numeric value indicating the participant's degree of exposure (DoE) to Catalan, as reported by their caregivers
* `doe_spanish`: numeric value indicating the participant's degree of exposure (DoE) to Spanish, as reported by their caregivers
* `edu_parent`: factor indicating the caregivers' maximum educational attainment
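
The rule behind `lp` can be sketched in a few lines of R. This is a minimal illustration, not the authors' code: `classify_lp()` is a hypothetical helper, the exposure values are made up, and it assumes DoE is stored as a proportion between 0 and 1, so that the 80% cut-off corresponds to 0.8.

```r
# Hypothetical helper: derive the language profile from the degrees of exposure.
# Assumption: DoE values are proportions in [0, 1], so 80% is written as 0.8.
classify_lp <- function(doe_catalan, doe_spanish, threshold = 0.8) {
  # A participant is "Monolingual" if either language reaches the threshold
  ifelse(pmax(doe_catalan, doe_spanish) >= threshold, "Monolingual", "Bilingual")
}

# Made-up exposure values for three illustrative participants
lp <- classify_lp(
  doe_catalan = c(0.9, 0.5, 0.2),
  doe_spanish = c(0.1, 0.5, 0.8)
)
lp  # Monolingual, Bilingual, Monolingual
```

If DoE were stored as a percentage instead, the same helper would apply with `threshold = 80`.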
## Responses

This data frame was used to fit the main model and the model included in Appendix A. It contains participants' responses to each item included in the questionnaire sub-list they responded to, together with participant- and word-level predictors of interest.
responses |> head(10) |> knitr::kable(digits = 2)

| child_id | response_id | time | age | age_std | te | language | meaning | item | response | lv | lv_std | freq | freq_std | n_phon | n_phon_std | doe | doe_std | exposure | exposure_std |
|---:|---:|---:|---:|---:|---:|:---|:---|:---|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 54531 | 872 | 1 | 31.64 | 1.96 | 115 | Catalan | witch | bruixa | Understands and Says | 0.60 | 0.98 | 6.23 | 1.10 | 5 | -0.21 | 0.9 | 1.36 | 5.61 | 1.46 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 115 | Spanish | witch | bruja | Understands | 0.60 | 0.98 | 6.23 | 1.10 | 5 | -0.21 | 0.1 | -1.31 | 0.62 | -1.29 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 148 | Catalan | bee | abella | Understands and Says | 0.20 | -0.58 | 6.09 | 0.31 | 5 | -0.21 | 0.9 | 1.36 | 5.48 | 1.39 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 148 | Spanish | bee | abeja | No | 0.20 | -0.58 | 6.09 | 0.31 | 5 | -0.21 | 0.1 | -1.31 | 0.61 | -1.30 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 153 | Catalan | owl | mussol | Understands and Says | 0.20 | -0.58 | 5.95 | -0.45 | 5 | -0.21 | 0.9 | 1.36 | 5.35 | 1.32 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 153 | Spanish | owl | buho | No | 0.20 | -0.58 | 5.95 | -0.45 | 3 | -1.49 | 0.1 | -1.31 | 0.59 | -1.31 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 159 | Catalan | snail | cargol | Understands and Says | 0.29 | -0.25 | 5.84 | -1.03 | 6 | 0.43 | 0.9 | 1.36 | 5.25 | 1.26 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 159 | Spanish | snail | caracol | Understands | 0.29 | -0.25 | 5.84 | -1.03 | 7 | 1.07 | 0.1 | -1.31 | 0.58 | -1.32 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 160 | Catalan | zebra | zebra | Understands | 0.60 | 0.98 | 6.13 | 0.54 | 5 | -0.21 | 0.9 | 1.36 | 5.52 | 1.41 |
| 54531 | 872 | 1 | 31.64 | 1.96 | 160 | Spanish | zebra | cebra | No | 0.60 | 0.98 | 6.13 | 0.54 | 5 | -0.21 | 0.1 | -1.31 | 0.61 | -1.30 |

* `child_id`: integer that uniquely labels each participant, and is only repeated across responses to the questionnaire from the same participant
* `response_id`: integer that uniquely labels each questionnaire administration, and is never repeated across questionnaire administrations or participants
* `age`: numeric value indicating the age of the participant when their response to the questionnaire was recorded, calculated as the difference in months between that date and the participant's birth date
* `age_std`: numeric value indicating the participant's standardised `age`
* `te`: integer that uniquely labels a translation equivalent, and is only repeated across the Catalan and Spanish word-forms that are part of the same translation equivalent
* `language`: character string indicating the language (Catalan or Spanish) to which the word-form belongs
* `meaning`: character string that uniquely labels the concept associated with the word-form, and is only repeated across the Catalan and Spanish word-forms that are part of the same translation equivalent
* `item`: character string that uniquely identifies the word-form in the questionnaire, and links it to the formr item
* `response`: ordered factor indicating the participant's response to the item, which can take `"No"`, `"Understands"`, or `"Understands and Says"` as values
* `lv`: numeric value indicating the normalised inverse of the Levenshtein distance between the X-SAMPA phonological transcriptions of the word-form and of its translation equivalent, calculated using `stringdist::stringsim()` (see the Methods section in the main manuscript for more details)
* `lv_std`: numeric value indicating the word-form's standardised `lv`
* `freq`: numeric value indicating the lexical frequency of the word-form as a Zipf score, from the English CHILDES corpora
* `freq_std`: numeric value indicating the word-form's standardised `freq`
* `n_phon`: integer indicating the number of phonemes included in the X-SAMPA phonological transcription of the word-form
* `n_phon_std`: numeric value indicating the word-form's standardised `n_phon`
* `doe`: numeric value indicating the participant's degree of exposure (DoE) to the `language` the `item` belongs to
* `doe_std`: numeric value indicating the participant's standardised `doe` to the `language` the `item` belongs to
* `exposure`: numeric value indicating the participant's exposure score for the word-form, calculated as the product of `freq` and `doe`
* `exposure_std`: numeric value indicating the participant's standardised `exposure` score for the `item` (see the Methods section in the main manuscript for more details)
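
The derived predictors can be illustrated with a small sketch that rebuilds `exposure` for the two "witch" rows of the table above and shows one common way to standardise a column. The `responses_toy` data frame and the `standardise()` helper are hypothetical, and treating the `*_std` columns as z-scores (centred on the mean and divided by the standard deviation) is an assumption; the Methods section of the main manuscript describes the actual procedure.

```r
# Toy data: freq and doe for the two "witch" rows shown in the table above
responses_toy <- data.frame(
  item = c("bruixa", "bruja"),
  freq = c(6.23, 6.23),
  doe  = c(0.9, 0.1)
)

# exposure is the product of lexical frequency and degree of exposure
responses_toy$exposure <- responses_toy$freq * responses_toy$doe
round(responses_toy$exposure, 2)
#> [1] 5.61 0.62

# Assumption: *_std columns are z-scores computed over the full data set
standardise <- function(x) (x - mean(x)) / sd(x)
responses_toy$exposure_std <- standardise(responses_toy$exposure)
```

The `exposure` values match the table, but the standardised values from this toy example will not: with only two rows the mean and standard deviation differ from those of the full data set.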