Pool of words — pool • multilex

A dataset of candidate words for inclusion in the questionnaires, together with some of their lexical properties. Transcriptions were (a) generated manually, (b) retrieved from Wiktionary (Catalan words), or (c) generated using TraFo. All transcriptions were manually double-checked and corrected where necessary.

pool

Format

A data frame with 1601 rows and 20 variables:

item: item label, as indicated in the formr survey spreadsheets; items are unique within and across questionnaires
te: index associated with translation equivalents across languages
language: language the item belongs to
category: semantic/functional category the item belongs to
class: functional category (verb, noun, adjective, etc.)
label: item label, as presented to participants in the front-end of the questionnaire; some labels are not unique within or across questionnaires
ipa: phonological transcription in IPA format
ipa_flat: phonological transcription in IPA format, without special characters (ready to compute distance metrics)
label_subtlex: word label, as included in the corresponding version of SUBTLEX
frequency_million: lexical frequency (in counts per million score) retrieved from the corresponding version of SUBTLEX
frequency_zipf: lexical frequency (in Zipf score) retrieved from the corresponding version of SUBTLEX
cognate: cognate status, manually coded
include: should this item be included in analyses?
version: which short version of the questionnaire does this item appear in?
comments: additional comments on the item
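
Examples

The following is a minimal sketch of how some of these columns might be used together, not code shipped with the package. It builds a small toy data frame (`toy`) that borrows a few column names from `pool`; every value in it is made up for illustration. It assumes that `include` is a logical column and that `frequency_zipf` follows the commonly used Zipf definition, log10(frequency per million) + 3; neither assumption is confirmed by this page. The distance step uses base R's `adist()` (generalized Levenshtein distance) on `ipa_flat`.

```r
# Toy stand-in for a few rows of `pool`; all values are invented.
toy <- data.frame(
  item = c("cat_gat", "spa_gato", "cat_taula", "spa_mesa"),
  te = c(1, 1, 2, 2),
  language = c("Catalan", "Spanish", "Catalan", "Spanish"),
  ipa_flat = c("gat", "gato", "tawla", "mesa"),
  frequency_million = c(10, 100, 1, 1000),
  include = c(TRUE, TRUE, TRUE, FALSE),  # assumed to be logical
  stringsAsFactors = FALSE
)

# Keep only the items flagged for inclusion in analyses.
kept <- toy[toy$include, ]

# Zipf score under the assumed definition: log10(counts per million) + 3.
kept$zipf <- log10(kept$frequency_million) + 3

# Levenshtein distance between the flat transcriptions of the two
# translation equivalents that share te == 1.
pair <- kept$ipa_flat[kept$te == 1]
lev <- as.integer(adist(pair[1], pair[2]))

print(kept[, c("item", "te", "frequency_million", "zipf")])
print(lev)
```

With the real dataset, the same steps would start from `pool` instead of `toy`, provided the column types match the assumptions above.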