Bilingual Insights into the Initial Lexicon

The Role of Cognates in Word Acquisition

Gonzalo Garcia-Castro

PhD Defence / Departament de Medicina i Ciències de la Vida

2024-11-03


The initial lexicon

Average 20-year-old knows ~42,000 lemmas: mental lexicon

Lexical representations
Phonological, conceptual, grammatical information of known words

First lexical representations at 6-9 months

Normative trajectories of lexical development

Vocabulary size norms for 51,800 monolingual children learning 35 distinct languages (Frank et al. 2017)

Bilinguals face additional challenges, but do not lag behind



Increased complexity in linguistic context

Reduced linguistic input in each language

Increased referential ambiguity

Two overlapping codes

Split into two languages

> 2 labels per referent

Bilinguals face additional challenges, but do not lag behind

Hoff et al. (2012): bilinguals acquire words at rates similar to monolinguals

Lexical similarity modulates vocabulary growth in bilinguals

Floccia et al. (2018): CDI responses of 372 bilinguals learning English + additional language

Lexical similarity: Average phonological similarity (Levenshtein) between pairs of translations


Higher lexical similarity, larger vocabulary size

Stronger effect in the additional language (e.g., Dutch, Mandarin)

Lexical similarity modulates vocabulary acquisition in bilinguals

A cognate facilitation in lexical acquisition?

Cognates: Phonologically-similar translation equivalents

Cognate: [cat] /ˈgat/ (Catalan), /ˈga.to/ (Spanish)
Non-cognate: [dog] /ˈgos/ (Catalan), /ˈpe.ro/ (Spanish)
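One way to make "phonologically similar" concrete is a normalised Levenshtein similarity. The sketch below is illustrative only: the function names, the normalisation (one minus the distance divided by the longer form's length), and the simplified transcriptions without stress or syllable marks are assumptions, not necessarily the measure used in the dissertation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1 means identical forms (assumed normalisation)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Simplified transcriptions of the two translation pairs (an assumption).
print(similarity("gat", "gato"))  # cognate pair: one edit over four symbols
print(similarity("gos", "pero"))  # non-cognate pair: no shared alignment
```

Under these assumptions the cognate pair scores 0.75 and the non-cognate pair scores 0.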


What mechanisms support a cognate facilitation during word acquisition?

Lexical access is language non-selective in bilinguals

The present dissertation

Study 1

  1. Provide a mechanistic account for the cognateness facilitation
  2. Test predictions of the model

Under review in Child Development (R2)

Study 2

  1. Test core assumption of the model: Language non-selectivity in the initial lexicon

In preparation

Study 1

Cognate beginnings to lexical acquisition: The AMBLA model

Accumulator Model of Bilingual Lexical Acquisition (AMBLA)

  1. Accumulation of information about form-meaning mappings:

Learning instances: Exposure to a word-form that results in the accumulation of information about its meaning

  2. Age of acquisition: The infant accumulates a threshold amount of learning instances for a word-form

\[ \begin{aligned} \definecolor{myred}{RGB}{ 168, 0, 53 } \definecolor{myblue}{RGB}{ 0, 64, 168 } \definecolor{mygreen}{RGB}{0, 168, 87} \definecolor{grey}{RGB}{128, 128, 128} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ {\color{mygreen}\text{Age of Acquisition}_{ij}} &= \{\text{Age}_i \mid {\color{myred}\text{Learning instances}_{ij}} = {\color{myblue}\text{Threshold}} \}\\ {\color{myred}\text{Learning instances}_{ij}} &= \text{Age}_i \cdot \text{Freq}_j \\ \end{aligned} \]

AMBLA: Simulating monolingual word acquisition

Catalan monolingual child

  • /ˈgos/ (Catalan), 100%

Parameters:

\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \end{aligned} \]
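A minimal sketch of the monolingual case, assuming that learning instances grow linearly with age so that the age of acquisition is Threshold / Freq. The function names, the seed, and the hand-written Poisson sampler are illustrative assumptions; the units of age are whatever units Age is measured in.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def age_of_acquisition(threshold: float, freq: float) -> float:
    """Age at which Age * Freq first reaches the threshold."""
    return threshold / freq


THRESHOLD = 300
rng = random.Random(1)  # arbitrary fixed seed, for reproducibility only

# A word heard at exactly the mean frequency: 300 / 50.
print(age_of_acquisition(THRESHOLD, 50))

# Sampled frequencies: more frequent words cross the threshold earlier.
# max(1, ...) guards against a zero draw, which is vanishingly rare at lambda = 50.
freqs = [max(1, sample_poisson(50, rng)) for _ in range(5)]
for freq in sorted(freqs, reverse=True):
    print(freq, round(age_of_acquisition(THRESHOLD, freq), 2))
```

With Freq fixed at its mean of 50, the word is acquired at age 6 in this sketch.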

AMBLA: Simulating bilingual word acquisition

  1. Linguistic input divided into two languages: Catalan 60%, Spanish 40%

Exposure: Proportion of time exposed to the language of word-form \(j\)

Learning instances accumulate as a function of Exposure and Frequency.

\[ \begin{aligned} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ \text{Age of Acquisition}_{ij} &= \{\text{Age}_i \mid \text{Learning instances}_{ij} = \text{Threshold} \}\\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Freq}_j \cdot {\color{myred}\text{Exposure}_{ij}}\\ \end{aligned} \]

AMBLA: Simulating bilingual word acquisition

Catalan monolingual child

  • /ˈgos/ (Catalan), 100%

Catalan/Spanish bilingual child

  • /ˈgos/ (Catalan), 60%

  • /ˈpe.ro/ (Spanish), 40%

Parameters:

\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \end{aligned} \]
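As a worked illustration of these parameters, assume \(\text{Freq}_j\) is fixed at its expected value of 50 (an assumption; in the simulation it is drawn from the Poisson distribution). Solving \(\text{Learning instances}_{ij} = \text{Threshold}\) for age gives:

\[ \begin{aligned} \text{Age of Acquisition}_{ij} &= \frac{\text{Threshold}}{\text{Freq}_j \cdot \text{Exposure}_{ij}} \\ \text{Monolingual (Catalan, 100\%)}: \quad &\frac{300}{50 \cdot 1.0} = 6 \\ \text{Bilingual (Catalan, 60\%)}: \quad &\frac{300}{50 \cdot 0.6} = 10 \\ \text{Bilingual (Spanish, 40\%)}: \quad &\frac{300}{50 \cdot 0.4} = 15 \end{aligned} \]

Under this assumption, lower exposure delays acquisition in inverse proportion: the same word is acquired later in the lower-exposure language.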

AMBLA: Simulating a cognate facilitation

  1. Words may accumulate additional learning instances from the co-activation of their (phonologically similar) translation equivalent

Degree proportional to their phonological similarity (Cognateness)

\[ \begin{aligned} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ \text{Age of Acquisition}_{ij} &= \{\text{Age}_i \mid \text{Learning instances}_{ij} = \text{Threshold} \}\\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Freq}_j \cdot \text{Exposure}_{ij} + \\ &({\color{myred}\text{Learning instances}_{ij'} \cdot \text{Cognateness}_{j,j'}})\\ \textbf{where:} \\ {\color{myred}\text{Cognateness}_{j,j'}}&{\color{myred} = \text{Levenshtein}(j, j')} \end{aligned} \]

AMBLA: Simulating a cognate facilitation

Catalan monolingual child

  • /ˈgat/ (Catalan), 100%

Catalan/Spanish bilingual child

  • /ˈgat/ (Catalan), 60%

  • /ˈga.to/ (Spanish), 40%

Parameters:

\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \\ \text{Cognateness}_{j,j'} = 0.75 \end{aligned} \]
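A simplified sketch of how cognateness could speed up acquisition under these parameters. It makes two loudly stated assumptions that are not taken from the dissertation: Freq is fixed at the Poisson mean of 50 for both translations, and the translation contributes only the instances it gains from its own direct input (the mutual feedback between the two word-forms is ignored). The actual AMBLA simulations may resolve the cross-language term differently.

```python
def age_of_acquisition(threshold, freq, exposure,
                       exposure_translation=0.0, cognateness=0.0):
    """Age at which learning instances reach the threshold.

    Simplifying assumption: the translation adds only the instances from
    its own direct input (Age * Freq * Exposure), scaled by cognateness,
    so the accumulation rate per unit of age is constant.
    """
    rate = freq * (exposure + cognateness * exposure_translation)
    return threshold / rate


THRESHOLD, FREQ = 300, 50  # Freq fixed at the Poisson mean (assumption)

for cognateness in (0.0, 0.75):
    catalan = age_of_acquisition(THRESHOLD, FREQ, 0.6, 0.4, cognateness)
    spanish = age_of_acquisition(THRESHOLD, FREQ, 0.4, 0.6, cognateness)
    print(cognateness, round(catalan, 2), round(spanish, 2))
```

In this sketch the gain from cognateness is larger for the word in the lower-exposure language (Spanish, 40%) than for the word in the higher-exposure language (Catalan, 60%), because the lower-exposure word borrows from a translation that accumulates instances faster.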

Predictions

  1. Cognates acquired earlier than non-cognates
  2. Cognateness facilitation stronger in the lower-exposure language

Barcelona Vocabulary Questionnaire (BVQ)


  • Online, open source
  • \(\approx\) 1,600 words (800 Cat., 800 Spa.)
  • 4 sublists, random allocation

Results: Comprehension

366 participants (12-32 mo), 436 administrations \(\times\) 604 noun words

Ordinal, multilevel (Bayesian) regression model

\(p(\text{Comprehension}, \text{Production}) \sim \text{Exposure}_{ij} \cdot \text{Cognateness}_j\)
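For readers unfamiliar with ordinal multilevel models, a generic cumulative-logit formulation is sketched below. It is an illustration, not the fitted model: the ordered response categories (for example, no < understands < understands and says), the thresholds \(\tau_k\), and the participant and word intercepts \(u_i\) and \(w_j\) are assumptions.

\[ \begin{aligned} P(Y_{ij} \le k) &= \text{logit}^{-1}(\tau_k - \eta_{ij}) \\ \eta_{ij} &= \beta_1 \text{Exposure}_{ij} + \beta_2 \text{Cognateness}_j + \beta_3 (\text{Exposure}_{ij} \cdot \text{Cognateness}_j) + u_i + w_j \end{aligned} \]

In such a formulation, the interaction coefficient \(\beta_3\) captures whether the effect of cognateness changes with exposure.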

Results: Production

366 participants (12-32 mo), 436 administrations \(\times\) 604 noun words

Ordinal, multilevel (Bayesian) regression model

\(p(\text{Comprehension}, \text{Production}) \sim \text{Exposure}_{ij} \cdot \text{Cognateness}_j\)

Discussion

Earlier acquisition for cognates vs. non-cognates

Cognate facilitation moderated by exposure

Only words from the lower-exposure language benefit from cognateness

Cognateness as a candidate mechanism underlying Floccia et al.’s results

Cross-language facilitation via co-activation of phonologically similar translation equivalents

Is language non-selectivity already present in the initial lexicon?

Study 2

Developmental trajectories of bilingual spoken word recognition


Language non-selectivity in the initial lexicon



Some evidence in infants and children (e.g., Von Holzen and Mani 2012; Singh 2014)

Methodological pitfalls: “Bilingual” task

One language is task relevant, the other is covertly activated

Implicit naming task

Mani and Plunkett (2010, 2011)

  • Chance-level target looking in related trials
  • Prime-Target phonological interference
  • Implicit naming of prime pictures

Study 2: Design

Cross-language priming effects are short-lived

Change in design:

Auditory label before target-distractor images

Increased temporal proximity of prime and target

Study 2: Design

Data collection timeline

Predictions

Exp. 1: Monolinguals

Replicate within-language phonological interference from Mani and Plunkett (proof of concept)

Exp. 2: Monolinguals and bilinguals

If lexical access is language non-selective, stronger interference in cognate vs. non-cognate trials

Experiment 1: Results, Bayesian GAMMs

English monolinguals

79 participants, 89 sessions

No evidence of phonological priming

Related trials \(\approx\) Unrelated trials

Experiment 2: Results, Bayesian GAMMs

Catalan/Spanish monolinguals

77 participants, 107 sessions

No evidence of phonological priming

Related trials \(\approx\) Unrelated trials

Cognate trials \(\approx\) Non-cognate trials

Experiment 2: Results, Bayesian GAMMs

Catalan/Spanish bilinguals

78 participants, 133 sessions

No evidence of phonological priming

Related trials \(\approx\) Unrelated trials

Cognate trials \(\approx\) Non-cognate trials

Discussion


Successful spoken word recognition across ages and language profiles

No evidence of priming effects, within or across languages

Unsuccessful retrieval of prime phonological forms?

Inconclusive results, revise design

General discussion

Summary

Cognateness facilitates word acquisition in the lower-exposure language

Candidate mechanism behind bilingual vocabulary growth

AMBLA: Cross-language accumulation of learning instances

Language non-selectivity in the initial lexicon: Pending testing

Discussion

Future steps

  • The impact of cognateness in spoken word recognition: Re-analysing data from Study 2
  • More language pairs (lower overall similarity)
  • Train AMBLA model on vocabulary data

What's not in this dissertation

Backward Semantic Inhibition

The emergence of inhibitory links in the initial lexicon

Vocabulary growth through the lens of bilingualism

Data collection ongoing

What's not in this dissertation

Translation Elicitation

Levenshtein distance as a valid measure of word-level effects of phonological similarity

Monolingual participants listening to a non-native language

jtracer package

Methodological contributions

  • Sample size (N > 400)
  • Bayesian modelling: Quantifying uncertainty, stabilising statistical inference
  • Barcelona Vocabulary Questionnaire (BVQ)

bvq package

Thank you!

Appendix

Introduction: Bilingualism

Classification of participants into monolinguals and bilinguals

Introduction: Cognate contents in the aggregated vocabulary

Cognate contents in the aggregated vocabulary

Study 1: Posterior regression coefficients

Aggregated vocabularies might conceal facilitation effects

Study 1: MCMC convergence (\(\hat{R}\))

MCMC convergence for the model in Study 1

Study 2: Predictions

  • Successful spoken word recognition across groups
  • If lexical access is language non-selective, stronger interference in cognate vs. non-cognate trials

Study 2: Vocabulary size

Study 2 participant receptive vocabulary sizes across ages and language profiles

Study 2: Model convergence (Exp. 1)

MCMC convergence for model in Study 2 (Exp. 1)

Study 2: Model convergence (Exp. 2)

MCMC convergence for model in Study 2 (Exp. 2)

References

Bergelson, Elika, and Daniel Swingley. 2012. “At 6–9 Months, Human Infants Know the Meanings of Many Common Nouns.” Proceedings of the National Academy of Sciences 109 (9): 3253–58. https://doi.org/10.1073/pnas.1113380109.
Bosch, Laura, and Marta Ramon-Casas. 2014. “First Translation Equivalents in Bilingual Toddlers’ Expressive Vocabulary: Does Form Similarity Matter?” International Journal of Behavioral Development 38 (4): 317–22. https://doi.org/10.1177/0165025414532559.
Fenson, Larry, Philip S Dale, J Steven Reznick, Elizabeth Bates, Donna J Thal, Stephen J Pethick, Michael Tomasello, Carolyn B Mervis, and Joan Stiles. 1994. “Variability in Early Communicative Development.” Monographs of the Society for Research in Child Development 59 (5): 1–185. https://doi.org/10.2307/1166093.
Frank, Michael C., Mika Braginsky, Daniel Yurovsky, and Virginia A. Marchman. 2017. “Wordbank: An Open Repository for Developmental Vocabulary Data.” Journal of Child Language 44 (3): 677–94. https://doi.org/10.1017/s0305000916000209.
Kachergis, George, Virginia A. Marchman, and Michael C. Frank. 2022. “Toward a ‘Standard Model’ of Early Language Learning.” Current Directions in Psychological Science 31 (1): 20–27. https://doi.org/10.1177/09637214211057836.
Mitchell, Lori, Rachel Ka-Ying Tsui, and Krista Byers-Heinlein. 2023. “Cognates Are Advantaged over Non-Cognates in Early Bilingual Expressive Vocabulary Development.” Journal of Child Language, 1–20.
Singh, Leher. 2014. “One World, Two Languages: Cross-Language Semantic Priming in Bilingual Toddlers.” Child Development 85 (2): 755–66. https://doi.org/10.1111/cdev.12133.
Siow, Serene, Nicola A Gillen, Irina Lepadatu, Daniela S Avila-Varela, Gonzalo Garcia-Castro, Nuria Sebastian-Galles, and Kim Plunkett. 2022. “The Effect of Cognates on Bilingual Infant Vocabulary Trajectories: A Study Using Bilingual CDIs of English and One Additional Language.” In Proceedings of the 46th Annual Boston University Conference on Language Development. Boston, MA.
Tan, Alvin WM, Virginia A Marchman, and Michael C Frank. 2024. “The Role of Translation Equivalents in Bilingual Word Learning.” Developmental Science, e13476.
Tincoff, Ruth, and Peter W Jusczyk. 1999. “Some Beginnings of Word Comprehension in 6-Month-Olds.” Psychological Science 10 (2): 172–75. https://doi.org/10.1111/1467-9280.00127.
Von Holzen, Katie, and Nivedita Mani. 2012. “Language Nonselective Lexical Access in Bilingual Toddlers.” Journal of Experimental Child Psychology 113 (4): 569–86. https://doi.org/10.1016/j.jecp.2012.08.001.