An interactive account of bilingual lexical acquisition
Word learning is challenging: variability, ambiguity
Earliest evidence of word acquisition: 6 months of age
Bilingual word acquisition is more complex: more than one word-form per referent (gos → DOG ← perro)
Do bilinguals fall behind? Mixed evidence across language pairs: English-French, Catalan-Spanish, etc.
Bilingual toddlers learning two languages that share more cognates show larger vocabulary sizes
Cognate | Non-cognate |
---|---|
[cat] /ˈgat-ˈga.to/ | [dog] /ˈgos-ˈpe.ro/ |
Cognates are acquired earlier than non-cognates. Why?
Word acquisition as a continuous process of lexical consolidation: accumulation of word learning instances
For participant i and word j:
\begin{aligned} \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \cdot \text{Exposure}_i\\ \text{Frequency}_j &\sim \text{Poisson}(\lambda) \\\\ \textbf{For simulations:}~ \lambda &= 50 \end{aligned}
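The learning-instances equation can be illustrated in a few lines of Python. This is a minimal sketch, not the authors' simulation code. The helper names `sample_poisson` and `learning_instances` are hypothetical, and ages are assumed to be in months. Poisson draws use Knuth's multiplication method from the standard library; numpy's `Generator.poisson` would work just as well.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate using Knuth's multiplication method.

    Fine for moderate rates such as lam = 50: exp(-50) is about 2e-22,
    which is still representable as a double.
    """
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def learning_instances(age: float, frequency: float, exposure: float) -> float:
    """Learning instances_ij = Age_i * Frequency_j * Exposure_i."""
    return age * frequency * exposure


rng = random.Random(2023)  # fixed seed so the sketch is reproducible
LAMBDA = 50  # Poisson rate used in the simulations
frequencies = [sample_poisson(LAMBDA, rng) for _ in range(5)]  # one Frequency_j per word

# Hypothetical participant: 12 months old, 60% exposure to the words' language
for freq in frequencies:
    print(freq, learning_instances(12, freq, 0.6))
```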
Simulation 1: no parallel activation
| Catalan | Spanish |
---|---|---|
Language exposure | 60% | 40% |
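Without parallel activation, a word's age of acquisition depends only on its frequency and on exposure to its language. The sketch below is minimal and rests on three assumptions: age is continuous and in months, the threshold is 300 learning instances (as in the model-definition slides), and every word has the same hypothetical frequency of 50. `age_of_acquisition` is a hypothetical helper.

```python
THRESHOLD = 300  # learning instances needed to acquire a word (from the model slides)


def age_of_acquisition(frequency: float, exposure: float, threshold: float = THRESHOLD) -> float:
    """Age at which Age * Frequency * Exposure reaches the threshold (age treated as continuous)."""
    return threshold / (frequency * exposure)


exposure = {"Catalan": 0.6, "Spanish": 0.4}
# Hypothetical translation pairs with the same frequency in both languages
pairs = {"cognate (gat/gato)": 50, "non-cognate (gos/perro)": 50}

for pair, frequency in pairs.items():
    for language, share in exposure.items():
        print(f"{pair:24} {language:8} AoA = {age_of_acquisition(frequency, share):.1f}")
```

Both pairs get identical ages (10.0 for Catalan, 15.0 for Spanish). Cognateness never enters the equation, so only the exposure gap separates the two languages.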
Simulation 2: parallel activation
Cognate vs. non-cognate words
| Catalan | Spanish |
---|---|---|
Language exposure | 60% | 40% |
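Parallel activation can be added by letting each word also collect learning instances from its translation, weighted by cognateness. This follows the bilingual learning-instances equation in the model slides. The sketch below is minimal and makes these assumptions:

- the translation's own instances use the other language's exposure (1 − Exposure);
- both words share a hypothetical frequency of 50;
- the threshold is 300;
- cognateness is 0.50 for porta/puerta and 0.00 for taula/mesa, taken from the Levenshtein table.

The function name is hypothetical.

```python
THRESHOLD = 300  # learning instances needed to acquire a word (from the model slides)


def aoa_parallel(freq: float, freq_translation: float, exposure: float,
                 cognateness: float, threshold: float = THRESHOLD) -> float:
    """Continuous age at which direct plus borrowed learning instances reach the threshold.

    Per month of age, a word receives freq * exposure instances directly and
    cognateness * freq_translation * (1 - exposure) via its translation
    (assumed: the translation's direct instances use the other language's exposure).
    """
    rate = freq * exposure + cognateness * freq_translation * (1 - exposure)
    return threshold / rate


exposure = {"Catalan": 0.6, "Spanish": 0.4}
pairs = {"porta/puerta (cognate)": 0.50, "taula/mesa (non-cognate)": 0.00}

for pair, cognateness in pairs.items():
    for language, share in exposure.items():
        aoa = aoa_parallel(50, 50, share, cognateness)
        print(f"{pair:26} {language:8} AoA = {aoa:.2f}")
```

Under these assumptions the cognate is acquired earlier in both languages: 7.50 vs 10.00 months in Catalan, and 8.57 vs 15.00 in Spanish. The advantage is larger in the less-exposed language, because that word borrows instances from a translation heard more often.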
Data collection
Barcelona Vocabulary Questionnaire (BVQ)
Word | Understands | Understands & Says |
---|---|---|
chair | [ x ] | [ ] |
table | [ ] | [ ] |
… | [ ] | [ x ] |
Fenson et al. (1994)
138,078 item responses from 366 Catalan-Spanish bilinguals
Questionnaires per participant | 1 | 2 | 3 | 4 |
---|---|---|---|---|
Participants | 312 | 42 | 8 | 4 |
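This sketch reads the table as the number of participants who filled in the questionnaire once, twice, three or four times. That reading is an assumption, supported by the counts summing to the 366 participants reported above. Under it, the total number of questionnaire administrations follows directly:

```python
# Assumed reading of the table: participants who completed the BVQ 1, 2, 3 or 4 times
participants_by_count = {1: 312, 2: 42, 3: 8, 4: 4}

n_participants = sum(participants_by_count.values())
n_questionnaires = sum(times * n for times, n in participants_by_count.items())

print(n_participants)    # 366
print(n_questionnaires)  # 436
```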
Modelling and statistical inference
Multilevel, ordinal regression model:
Bayesian (brms/Stan): posterior probability of parameter values given the data
P(\text{model} | \text{data}) \propto P(\text{data} | \text{model}) \times P(\text{model})
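brms fits the model by MCMC sampling in Stan. The proportionality above is easier to see with a grid approximation, swapped in here purely for illustration. The example is hypothetical: a single probability θ that a word is understood, a flat prior, and 7 "understands" responses out of 10.

```python
import math


def grid_posterior(k: int, n: int, grid_size: int = 101) -> tuple[list[float], list[float]]:
    """Posterior over a grid of theta values: P(theta | data) is proportional to P(data | theta) * P(theta)."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # flat prior P(model)
    likelihood = [math.comb(n, k) * t**k * (1 - t) ** (n - k) for t in thetas]  # binomial P(data | model)
    unnormalised = [lik * p for lik, p in zip(likelihood, prior)]
    evidence = sum(unnormalised)  # normalising constant hidden by the proportionality
    return thetas, [u / evidence for u in unnormalised]


thetas, posterior = grid_posterior(k=7, n=10)
best = max(range(len(thetas)), key=posterior.__getitem__)
print(f"posterior mode: {thetas[best]:.2f}")  # posterior mode: 0.70
```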
Two-way and three-way interactions between age, exposure, and cognateness
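The outcome is ordinal: no response < understands < understands and says. This sketch assumes a cumulative-logit model, one of the ordinal families brms offers; which family the study used is not stated here. Such a model maps a linear predictor η onto category probabilities through ordered thresholds: P(Y ≤ k) = logistic(τ_k − η). Here η includes all two-way and three-way interactions between age, exposure and cognateness, in SD units. All coefficient and threshold values are hypothetical, not the estimates reported below. The multilevel (participant and item) terms are omitted.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear_predictor(age: float, exposure: float, cognateness: float, b: dict[str, float]) -> float:
    """Main effects plus all two- and three-way interactions (predictors in SD units)."""
    return (
        b["age"] * age
        + b["exposure"] * exposure
        + b["cognateness"] * cognateness
        + b["age:exposure"] * age * exposure
        + b["age:cognateness"] * age * cognateness
        + b["exposure:cognateness"] * exposure * cognateness
        + b["age:exposure:cognateness"] * age * exposure * cognateness
    )


def category_probabilities(eta: float, thresholds: list[float]) -> list[float]:
    """Cumulative logit: P(Y <= k) = logistic(tau_k - eta); thresholds must be increasing."""
    cumulative = [logistic(tau - eta) for tau in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]


# Hypothetical values, for illustration only
coefs = {"age": 1.0, "exposure": 0.5, "cognateness": 0.1, "age:exposure": 0.1,
         "age:cognateness": 0.0, "exposure:cognateness": -0.1, "age:exposure:cognateness": 0.0}
thresholds = [0.0, 1.5]

for age in (-1.0, 0.0, 1.0):  # one SD below the mean, at the mean, one SD above
    eta = linear_predictor(age, exposure=0.0, cognateness=1.0, b=coefs)
    probs = category_probabilities(eta, thresholds)
    print(age, [f"{p:.2f}" for p in probs])
```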
Predictor | Estimate | 95% HDI | p(H0) |
---|---|---|---|
Intercepts | | | |
Comprehension and Production | 0.438 | [-0.5, 0.5] | 0.088 |
Comprehension | 0.936 | [2.44, 0.95] | 0.000 |
Slopes | | | |
Age (+1 SD = 4.87 months) | 0.405 | [1.43, 0.45] | 0.000 |
Exposure (+1 SD = 1.81) | 0.233 | [0.8, 0.27] | 0.000 |
Cognateness (+1 SD = 0.26) | 0.058 | [0.06, 0.1] | 0.037 |
Length (+1 SD = 1.56 phonemes) | -0.062 | [-0.35, -0.04] | 0.000 |
Age × Exposure | 0.071 | [0.16, 0.1] | 0.000 |
Age × Cognateness | 0.014 | [0, 0.03] | 0.985 |
Exposure × Cognateness | -0.057 | [-0.28, -0.05] | 0.000 |
Age × Exposure × Cognateness | -0.018 | [-0.11, -0.01] | 0.975 |
Thanks!
Levenshtein distance: the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one string into the other
| Orthography | Phonology | String |
---|---|---|---|
Catalan | porta | /ˈpɔɾ.tə/ | pɔɾtə |
Spanish | puerta | /ˈpweɾ.ta/ | pweɾta |
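The String column appears to be the phonological transcription with slashes, stress marks and syllable boundaries removed. This minimal sketch of that normalisation assumes those are the only symbols dropped, so the tie bar in affricates such as t͡ʃ is kept. `ipa_to_string` is a hypothetical helper.

```python
# Symbols assumed to carry no segmental information for the comparison
NON_SEGMENTAL = {"/", "ˈ", "ˌ", "."}  # slashes, primary/secondary stress, syllable boundary


def ipa_to_string(transcription: str) -> str:
    """Reduce a broad IPA transcription to the plain segment string used for comparison."""
    return "".join(ch for ch in transcription if ch not in NON_SEGMENTAL)


print(ipa_to_string("/ˈpɔɾ.tə/"))   # pɔɾtə
print(ipa_to_string("/ˈpweɾ.ta/"))  # pweɾta
```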
\text{Similarity}(A, B) = 1 - \frac{\text{lev}(A, B)}{\max(\text{length}(A), \text{length}(B))}
Catalan | Spanish | Levenshtein similarity (edit distance) |
---|---|---|
porta (/ˈpɔɾ.tə/) | puerta (/ˈpweɾ.ta/) | 0.50 (3) |
taula (/ˈtaw.lə/) | mesa* (/ˈmesa/) | 0.00 (5) |
cotxe (/ˈkɔ.t͡ʃə/) | coche (/ˈkot͡ʃe/) | 0.40 (3) |
… | … | … |
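Both the distance and the normalised similarity can be computed with the standard dynamic-programming (Wagner-Fischer) algorithm, applied to the Unicode code points of the phonological strings. The function names are hypothetical. Because it works on code points, a multi-character symbol such as the affricate t͡ʃ counts as several units here, which is an assumption about segmentation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1 - lev(A, B) / max(length(A), length(B)); identical (including empty) strings score 1."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


for catalan, spanish in [("pɔɾtə", "pweɾta"), ("tawlə", "mesa")]:
    print(catalan, spanish, f"{similarity(catalan, spanish):.2f}", levenshtein(catalan, spanish))
# pɔɾtə pweɾta 0.50 3
# tawlə mesa 0.00 5
```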
For participant i and word j:
\begin{aligned} \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \\ \text{Frequency}_j &\sim \text{Poisson}(\lambda) \\ \\ \textbf{For simulations:}\\ \lambda &= 50 \\ \text{Threshold} &= 300 \\ \\ \text{Age of Acquisition}_{ij} &= \min\{\text{Age}_i : \text{Learning instances}_{ij} \geq \text{Threshold}\} \end{aligned}
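With age measured in whole months, the age of acquisition is the first month at which accumulated learning instances reach the threshold. This minimal sketch implements the monolingual formula above. It assumes monthly time steps and a hypothetical upper age limit, and the function name is hypothetical.

```python
from typing import Optional


def age_of_acquisition_months(frequency: float, threshold: float = 300,
                              max_age: int = 60) -> Optional[int]:
    """First whole month at which Age * Frequency reaches the threshold, or None if not by max_age."""
    for age in range(1, max_age + 1):
        if age * frequency >= threshold:
            return age
    return None


print(age_of_acquisition_months(50))  # 6: 6 * 50 = 300 reaches the threshold
print(age_of_acquisition_months(40))  # 8: 7 * 40 = 280 falls short, 8 * 40 = 320 does not
print(age_of_acquisition_months(2))   # None: 60 * 2 = 120 never reaches 300
```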
Including learning instances from parallel activation:
Hypothesis: word representations also receive learning instances from their translation equivalents
The amount received is proportional to the form similarity between the two words (cognateness)
\begin{aligned} \textbf{Monolinguals:} \\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \end{aligned}
\begin{aligned} \textbf{Bilinguals:} \\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \cdot \text{Exposure}_i \\ &\quad + \text{Cognateness}_j \cdot \text{Learning instances}_{ij'} \end{aligned}
where j' is the translation equivalent of word j
International Symposium of Psycholinguistics | Vitoria, 31st May, 2023