A dataset containing candidate words for inclusion in the questionnaires, along with some of their lexical properties. Transcriptions were (a) generated manually, (b) retrieved from Wiktionary (Catalan words), or (c) generated using TraFo. All transcriptions have been manually double-checked and corrected where necessary.

pool

Format

A data frame with 1601 rows and 20 variables:

item

item label, as indicated in the formr survey spreadsheets; items are unique within and across questionnaires

te

index associated with translation equivalents across languages

language

language the item belongs to

category

semantic/functional category the item belongs to

class

grammatical class (verb, noun, adjective, etc.)

label

item label as presented to participants in the questionnaire front end; some labels are not unique within or across questionnaires

ipa

phonological transcription in IPA format

ipa_flat

phonological transcription in IPA format with special characters removed, ready for computing distance metrics

label_subtlex

word label, as included in the corresponding version of SUBTLEX

frequency_million

lexical frequency (counts per million words) retrieved from the corresponding version of SUBTLEX

frequency_zipf

lexical frequency (Zipf scale) retrieved from the corresponding version of SUBTLEX

cognate

cognate status, manually coded

include

should this item be included in analyses?

version

which short version of the questionnaire does this item appear in?

comments

additional comments on the item
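Because `ipa_flat` has special characters removed, transcriptions of translation equivalents (rows sharing a `te` index) can be compared symbol by symbol. The sketch below assumes plain Levenshtein (edit) distance, optionally normalised by the length of the longer string, as the distance metric; this documentation does not say which metric is intended, and the example strings are hypothetical rather than taken from `pool`.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on Unicode code points, so an IPA symbol written with a combining
    character would count as more than one position.
    """
    # prev[j] holds the distance between the prefix of a processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def normalised_distance(a: str, b: str) -> float:
    """Levenshtein distance scaled to [0, 1] by the length of the longer string."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Hypothetical ipa_flat values for a translation-equivalent pair (not from pool)
print(levenshtein("gat", "gato"))          # 1
print(normalised_distance("gat", "gato"))  # 0.25
```

Normalising by the longer string keeps distances comparable between short and long words; whether the original analyses normalise at all is not stated here.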
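`frequency_million` and `frequency_zipf` express the same quantity on two scales. As commonly defined, the Zipf score is the base-10 logarithm of the frequency per billion words, i.e. log10(frequency per million) + 3. The sketch below assumes that unsmoothed definition; some SUBTLEX releases compute Zipf from raw counts with a +1 smoothing term, so values recomputed this way may differ slightly from `frequency_zipf`, especially for low-frequency words. The function names are hypothetical.

```python
import math


def zipf_from_per_million(freq_per_million: float) -> float:
    """Zipf = log10(frequency per million words) + 3 (no smoothing assumed)."""
    if freq_per_million <= 0:
        # Without smoothing, log10(0) is undefined
        raise ValueError("frequency must be positive")
    return math.log10(freq_per_million) + 3


def per_million_from_zipf(zipf: float) -> float:
    """Inverse transformation: frequency per million = 10 ** (Zipf - 3)."""
    return 10 ** (zipf - 3)


print(zipf_from_per_million(1))       # 3.0
print(zipf_from_per_million(1000))    # 6.0
print(per_million_from_zipf(4.0))     # 10.0
```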