Cognate beginnings to bilingual lexical acquisition

Link        Contents
Website     Instructions for reproducibility, data dictionaries, lab notes
PsyArxiv    Preprint and figures
GitHub      Code, preprint and figures
OSF         Code, preprint, and results (model outputs)
Docker      Docker image with reproducible RStudio session

Repository structure and files 📂

This repository is organised as follows:

  • data: processed data in CSV format
    • items.csv: information about words included in the analyses
    • participants.csv: information about participants
    • responses.csv: participant responses to the items. The model was fit on this dataset.
  • data-raw: raw data from the Barcelona Vocabulary Questionnaire (BVQ). This is an RDS file containing a list of data frames with all the information necessary to generate the datasets in the data/ directory.
  • docs: source code to generate the documentation site of the project (cognate-beginnings).
  • manuscript: Quarto document with the source code of the manuscript and appendix
  • R: R functions used in the targets pipeline to process and analyse the data.
    • items.R: to generate items.csv
    • models.R: to fit the Bayesian model and extract posterior draws
    • participants.R: to generate participants.csv
    • predictions.R: to generate posterior predictions from the model
    • utils.R: helper functions and wrappers used across the project
  • renv: internal settings to ensure reproducibility of the computing environment.
  • results: model outputs. This directory is empty until you run the code, which generates the files listed below.
    • fits: RDS files with the brmsfit of the Bayesian models
    • posterior: CSV files with the posterior draws of the population-level and group-level coefficients
    • predictions: CSV files with the posterior predictions
  • src: R functions to make programming tasks easier, not needed to reproduce the project.
  • Stan: Stan code of the models, as generated by brms::stancode().
  • tests: testthat scripts used to unit test the functions used across the project.

Abstract

Bilingual infants’ developmental trajectories of lexical acquisition are equivalent to those of their monolingual peers. This is remarkable, given the complexity of their linguistic input. Recent studies suggest that bilingual vocabulary growth is boosted by the number of cognates (form-similar translation equivalents) shared by the pair of languages being learned, and that this cognateness facilitation effect is driven by a stronger parallel activation of cognates during linguistic exposure, compared to non-cognates. The mechanisms behind this facilitation are still unclear. In this study, we propose an account of bilingual lexical acquisition in which parallel activation increases the rate at which children accumulate learning instances for words in both languages, even in fully monolingual situations. We predicted a stronger cognate facilitation for words to which children were exposed less frequently (low-exposure words), as they are co-activated by their translation more often than high-exposure words. We developed an extensive online vocabulary checklist, the Barcelona Vocabulary Questionnaire (BVQ), to collect vocabulary data from 366 Catalan-Spanish bilingual toddlers aged 12 to 32 months. We used Bayesian explanatory item response theory to model the acquisition trajectories of 604 Catalan and Spanish words. We found an interaction between exposure and cognateness, suggesting that cognateness facilitates the acquisition of low-exposure words, but not of mean-exposure or high-exposure words. Overall, our findings suggest that cognateness plays a key role in bilingual lexical acquisition, and provide evidence for a frequency-mediated facilitation effect driven by parallel activation.

Acknowledgements

The authors declare no conflicts of interest with regard to the funding source of this study. This study was supported by the Spanish Ministry for Science and Innovation and State Research Agency (Project PID2021-123416NB-I00 financed by MCIN/AEI/10.13039/501100011033/FEDER, UE) and the Economic and Social Research Council (ESRC) (ES/S010947/1, UK). GGC was supported by an FPI research contract (PRE2019-088165). DAV was supported by the European Union’s Horizon 2020 research and innovation program under Marie Skłodowska–Curie Grant (765556) and a postdoctoral fellowship from the Foundation for Science and Technology of Portugal (UIDB/00214/2020). IC was supported by the Investigo program funded by the European Union’s NextGenerationEU (NGEU) recovery plan. NSG was supported by an ICREA Academia award from the Catalan Institution for Research and Advanced Studies (ICREA). We are grateful to Chiara Santolin, Ege E. Özer, and the rest of the Speech Acquisition and Perception research group, and to Alicia Franco-Martínez and Cristina Rodríguez-Prada, for their helpful feedback. We thank Xavier Mayoral, Silvia Blanch, and Cristina Cuadrado for their technical support, and Cristina Dominguez and Katia Pistrin for their efforts in recruiting infants. We also thank all families and infants who participated in the experiments.