A dataset containing candidate words to be included in the questionnaires, together with some of their lexical properties. Transcriptions were (a) generated manually, (b) retrieved from Wiktionary (Catalan words), or (c) generated using TraFo. All transcriptions were manually double-checked and corrected where necessary.
pool
A data frame with 1601 rows and 20 variables:
item label, as indicated in the formr survey spreadsheets; labels are unique within and across questionnaires
index associated with translation equivalents across languages
language the item belongs to
semantic/functional category the item belongs to
functional category (verb, noun, adjective, etc.)
item label, as presented to participants in the front-end of the questionnaire; some labels are not unique within or across questionnaires
phonological transcription in IPA format
phonological transcription in IPA format, without special characters (ready to compute distance metrics)
word label, as included in the corresponding version of SUBTLEX
lexical frequency (in counts per million) retrieved from the corresponding version of SUBTLEX
lexical frequency (as a Zipf score) retrieved from the corresponding version of SUBTLEX
cognate status, manually coded
should this item be included in analyses?
which short version of the questionnaire does this item appear in?
additional comments on the item
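
The two frequency variables and the simplified transcription lend themselves to a short illustration. The sketch below is written in Python and is not part of the dataset or its package. It assumes the common definition of the Zipf scale (log10 of frequency per billion words, that is, log10 of counts per million plus 3) and uses plain Levenshtein distance as one possible distance metric; the documentation does not state how the Zipf scores were derived or which metric is intended. The function names and the example transcriptions are hypothetical.

```python
import math


def zipf_from_per_million(freq_per_million: float) -> float:
    """Convert counts per million to a Zipf score.

    Assumption: Zipf = log10(frequency per billion words)
                     = log10(counts per million) + 3.
    Any smoothing applied by a given SUBTLEX version is not modelled here.
    """
    if freq_per_million <= 0:
        raise ValueError("frequency must be positive")
    return math.log10(freq_per_million) + 3.0


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Hypothetical simplified transcriptions of a translation-equivalent pair.
print(zipf_from_per_million(100.0))
print(normalized_distance("gat", "gato"))
```

Python compares strings one code point at a time, so stress marks or combining diacritics would each count as a separate symbol. A transcription stripped of special characters avoids that, which is presumably why the simplified variable is described as ready for distance metrics.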