Probabilistic Phonology: Formal Analysis & Empirical Assessment of Two Theories
Phonology is the subdiscipline of linguistics that studies the sound systems of languages. Over the past two decades, phonology has taken a probabilistic turn: categorical data collected through fieldwork and introspection are now routinely complemented by probabilistic data from corpora and experiments. What is the correct probabilistic model of natural language phonology? We distinguish two types of models: INTRINSICALLY PROBABILISTIC MODELS, in which phonological grammars directly assign probabilities to linguistic forms, and GRAMMAR SAMPLING MODELS, which derive probabilistic behavior indirectly from the assumption that speakers maintain probability distributions over categorical grammars. In earlier work, we defended grammar sampling models on the basis of typological and learnability evidence. This conclusion has been challenged by a new generalization that favors intrinsically probabilistic models because of their apparently better fit to usage data (Hayes 2022). We pursue three main goals: (i) a theoretical goal: to examine the new generalization from the grammar sampling perspective; (ii) an empirical goal: to examine evidence from several languages in order to judge how well the new generalization holds up on new data sets; and (iii) a formal goal: to study the general properties of probabilistic frameworks in order to understand when the new generalization holds and how it relates to grammar sampling models.
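The contrast between the two model types can be made concrete with a minimal Python sketch under stated assumptions. It uses Maximum Entropy Harmonic Grammar as one illustrative intrinsically probabilistic model, where the grammar itself assigns each candidate a probability. It uses a probability distribution over strict Optimality Theory rankings as one illustrative grammar sampling model, where each sampled ranking is categorical and a candidate's probability is the total probability of the rankings that select it. Neither framework choice is taken from the abstract. The constraints `C1` to `C3`, the candidates, the violation counts, the weights, and the uniform ranking distribution are all hypothetical.

```python
import math
from itertools import permutations

# Hypothetical tableau (illustrative only): candidate -> violations per constraint.
VIOLATIONS = {
    "cand_a": {"C1": 1, "C2": 0, "C3": 0},
    "cand_b": {"C1": 0, "C2": 1, "C3": 1},
}


def maxent_probabilities(violations, weights):
    """Intrinsically probabilistic (MaxEnt HG): the grammar assigns
    P(x) proportional to exp(-sum_k w_k * C_k(x)) directly."""
    scores = {
        cand: math.exp(-sum(weights[k] * v for k, v in viol.items()))
        for cand, viol in violations.items()
    }
    total = sum(scores.values())
    return {cand: s / total for cand, s in scores.items()}


def ot_winner(violations, ranking):
    """Categorical OT: the winner has the lexicographically smallest
    violation profile under the ranking. Exact ties are broken by
    dictionary order, which is a simplifying assumption."""
    return min(violations, key=lambda c: tuple(violations[c][k] for k in ranking))


def sampling_probabilities(violations, ranking_probs):
    """Grammar sampling: each ranking is a categorical grammar, and
    P(x) is the total probability of the rankings whose winner is x."""
    probs = {cand: 0.0 for cand in violations}
    for ranking, p in ranking_probs.items():
        probs[ot_winner(violations, ranking)] += p
    return probs


if __name__ == "__main__":
    weights = {"C1": 2.0, "C2": 1.0, "C3": 0.5}  # hypothetical weights
    print(maxent_probabilities(VIOLATIONS, weights))  # cand_a ≈ 0.378, cand_b ≈ 0.622
    uniform = {r: 1 / 6 for r in permutations(["C1", "C2", "C3"])}
    print(sampling_probabilities(VIOLATIONS, uniform))  # cand_a ≈ 0.667, cand_b ≈ 0.333
```

In the sampling model, `cand_b` wins only under the two rankings that place `C1` on top, so its probability comes entirely from the distribution over grammars rather than from any single grammar. The MaxEnt probabilities, by contrast, depend on summed weighted violations within one grammar.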