Leibniz-Zentrum Allgemeine Sprachwissenschaft Leibniz-Gemeinschaft

Winter mini-course: Learning, compression, and linguistic representations in humans and machines

Organizer(s) Roni Katzir
Affiliation(s) Tel Aviv University
Start of event 03.01.2024, 11:00
End of event 05.01.2024, 16:00
Venue ZAS, Pariser Str. 1, 10719 Berlin; Room: 0.32 (Ground floor)

1. Simplicity

Date and time: Wednesday, January 3, 11-13, ZAS room 0.32, Pariser Str. 1

When ChatGPT sees a sequence of strings that all look like abbbcddd and aabccd, it sometimes guesses that the next string will end in an e. Few humans share this guess. Why is that? Relatedly, though it might not seem so: why do we find it surprising when lightning strikes the same place twice? Or when a fair coin keeps coming up heads? And should linguists care?

They should, and in this first meeting of the mini-course I will introduce Minimum Description Length (MDL), a simplicity principle that helps us think about strange and not-so-strange coincidences and about what makes some generalizations better than others. As we will see, MDL provides a possible answer to how humans learn abstract grammars from unanalyzed surface data. It also provides a way for machines to do the same.

MDL offers a view of learning as a form of compression. It considers both the size of the grammar and the size of the description of the data given the grammar, and it attempts to minimize their sum. By doing so, it guides the learner to hypotheses that balance generality against the need to fit the data. MDL appears to match subjects’ generalization patterns in a variety of tasks and is arguably less stipulative than alternative approaches that have been proposed in the literature. Moreover, in several different domains it has yielded the first implemented learners that assume both realistic linguistic theories and realistic input data. We will walk through a detailed example.
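As a minimal sketch of how this sum plays out, consider the coin from the opening paragraph: a run of twenty heads scored under two hypotheses, a fair coin (very short grammar, expensive data) and a heavily biased coin (longer grammar, cheap data). The grammar sizes below are invented placeholders standing in for whatever encoding scheme one adopts; the sketch is an illustration, not an example taken from the readings.

    import math

    def data_cost(sequence, p_heads):
        # Code length (in bits) of the data given a coin grammar that assigns
        # probability p_heads to "H" and 1 - p_heads to "T".
        return sum(-math.log2(p_heads if flip == "H" else 1 - p_heads)
                   for flip in sequence)

    FAIR_GRAMMAR = 2.0     # toy size of stating "a coin with p(H) = 1/2"
    BIASED_GRAMMAR = 10.0  # toy size of stating a specific bias, p(H) = 0.99

    data = "H" * 20  # twenty heads in a row

    print(FAIR_GRAMMAR + data_cost(data, 0.5))     # 2 + 20 = 22.0 bits
    print(BIASED_GRAMMAR + data_cost(data, 0.99))  # 10 + ~0.3 = ~10.3 bits

The biased-coin hypothesis wins the total despite its longer grammar, which is one way of seeing why a long run of heads from a supposedly fair coin strikes us as a coincidence in need of explanation.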

Reading:

2. MDL for theoretical linguistics

Date and time: Wednesday, January 3, 14-16, ZAS room 0.32, Pariser Str. 1

In the second meeting of the mini-course we will see how linguists can use MDL to compare competing architectures in cases where it is difficult to make progress based on adult grammaticality judgments alone. I will illustrate this with a phonological case study concerning constraints on underlying representations (also known as morpheme-structure constraints), which were central to early generative phonology but rejected in Optimality Theory. Evidence bearing directly on the question of whether the grammar uses constraints on URs has been scarce. I will show, however, that if children are MDL learners, then they will succeed in learning patterns such as English aspiration if they can use constraints on URs but will run into difficulties otherwise. While the case study that I will discuss is phonological, the methodology is very general, and I will outline a second case study, in semantics, where MDL allows us to extract divergent empirical predictions from two competing hypotheses from the literature concerning the representation of semantic denotations.
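To give a rough sense of the bookkeeping involved (a deliberately simplified sketch with made-up symbol counts, not the analysis presented in the lecture), one can compare the two architectures on a tiny invented lexicon: one architecture keeps aspiration out of URs and states an aspiration rule once, while the other stores aspiration in every relevant UR and needs no rule.

    # Toy surface forms, with word-initial aspiration marked by ʰ.
    surface = ["kʰæt", "tʰɪp", "pʰæt", "kʰɪt", "tʰæp", "pʰɪt", "kʰæp", "tʰɪt"]

    # Architecture A: a constraint bans aspiration from URs; the rule
    # "aspirate word-initial voiceless stops" is stated once in the grammar.
    RULE_COST = 6  # made-up cost, in symbols, of stating the rule
    cost_A = RULE_COST + sum(len(w.replace("ʰ", "")) for w in surface)

    # Architecture B: no constraint on URs, so each UR stores its aspiration
    # and no rule is needed.
    cost_B = sum(len(w) for w in surface)

    print(cost_A, cost_B)  # 6 + 8 * 3 = 30 vs. 8 * 4 = 32

Once the same regularity recurs in enough lexical items, stating it once as a rule is cheaper than storing it in every UR; in a full MDL comparison the encoding of the data given the grammar enters the calculation as well, but the same kind of length comparison is what does the work.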

Reading:

3. MDL Artificial Neural Networks: Can neural networks be more like automated scientists and less like stochastic parrots?

Date and time: Friday, January 5, 14-16, ZAS room 0.32, Pariser Str. 1

Since the mid-1980s, artificial neural networks (ANNs) have been trained almost exclusively using a particular learning method that has proven to be very useful for improving how an ANN fits its training data and that has, in turn, been instrumental in the impressive engineering successes of ANNs on linguistic tasks over the past decades. ANNs trained using this standard method are typically extremely large, require huge training corpora, and have opaque inner workings. They also generalize poorly: they fit the training data too well and fail to extract even elementary regularities. In the third and final meeting of the mini-course we will look at what happens when we replace the standard training approach for neural networks with MDL.
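In rough terms, and as a sketch of the objective rather than of any particular published implementation, the change is to the quantity being minimized: instead of optimizing fit to the training data alone, the learner minimizes the length of an encoding of the network itself plus the length of the data encoded with the network’s help.

    import math

    def network_cost(weights, bits_per_weight=8):
        # Toy encoding length of the hypothesis itself: a fixed number of bits
        # per stored weight, standing in for whatever encoding of the
        # architecture and weights one adopts.
        return bits_per_weight * len(weights)

    def data_cost(probs_of_observed_symbols):
        # Code length of the training data given the network: the negative log
        # probability the network assigns to each observed symbol, summed.
        return sum(-math.log2(p) for p in probs_of_observed_symbols)

    def mdl_score(weights, probs_of_observed_symbols):
        # The quantity an MDL learner minimizes in place of training loss
        # alone: size of the network plus size of the data given the network.
        return network_cost(weights) + data_cost(probs_of_observed_symbols)

Because the first term charges for every stored weight, a small network that captures a regularity exactly can beat a much larger one that merely approximates it, which is the intuition behind the results described next.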

What happens is that we obtain small, transparent networks that learn complex recursive patterns perfectly and from very little data. These MDL networks help illustrate just how far standard ANNs (even the most successful of them) are from what we would expect of an intelligent system that attempts to extract regularities from its input: given hundreds of billions of parameters and huge training corpora, the performance of standard ANNs is good enough to fool us on many common examples, but even then what the networks offer is a superficial approximation of the regularities, one that reveals a complete lack of understanding of what these regularities actually are. The MDL networks show us that it is possible for ANNs to learn intelligently and acquire systematic regularities perfectly from small training corpora, but that this requires a very different learning approach from the one current networks are based on.

Reading: