Colloquium 40: Adrian Alsmith – Interpreting (and shaping) generative artificial behaviour

Freddy Purcell recalls the final colloquium of the term, discussing why we often feel like we are interacting with a person when using generative AI models.

For the final colloquium of the term, we welcomed Adrian Alsmith from KCL to discuss selfhood in generative AI systems. Adrian’s work generally focusses on the philosophy of the cognitive sciences and perception, and this talk contained ideas that will hopefully find their way into a journal soon.

Generative AI, and particularly LLMs, was the target of Adrian’s talk. This is opposed to non-generative AI like spam filters, recommendation systems, or the rule-based chatbots that aim to make you lose your mind on online customer service sites. Generative AI is far more sophisticated than this, able to use language to adapt to the context presented to it and to offer a coherent response to a question. This sophistication gives the appearance of agency and intentionality, as we are used to such experiences occurring solely with other humans. Our instinct to ascribe agency to computer systems is not recent, as was demonstrated in 1966 when Joseph Weizenbaum created ELIZA. This computer program was designed to respond to conversation like a human, asking questions that could make it appear like a therapist. The effect was so convincing that Weizenbaum’s secretary would ask to be alone with ELIZA to talk about some of her most personal issues. Of course, the feeling that ELIZA conversed like a person was a complete illusion, as all the program did was swap words like “my” for “your” and find appropriate text in a look-up table in order to respond.
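To give a flavour of just how simple that mechanism is, here is a minimal ELIZA-style sketch (not Weizenbaum’s original program): a couple of hypothetical keyword rules plus pronoun reflection, roughly the pattern-matching and look-up behaviour described above.

```python
import re

# Minimal ELIZA-style sketch: keyword rules plus pronoun reflection.
# The rules and reflections below are illustrative, not Weizenbaum's originals.

REFLECTIONS = {"my": "your", "i": "you", "me": "you", "am": "are"}

RULES = [
    (r".*\bi feel (.+)", "Why do you feel {0}?"),
    (r".*\bmy (.+)", "Tell me more about your {0}."),
    (r".*", "Please go on."),  # fallback entry in the "look-up table"
]

def reflect(text):
    """Swap first-person words for second-person ones ('my' -> 'your')."""
    return " ".join(REFLECTIONS.get(word, word) for word in text.lower().split())

def eliza_respond(utterance):
    """Return the response template of the first rule that matches."""
    for pattern, template in RULES:
        match = re.match(pattern, utterance.lower())
        if match:
            return template.format(*(reflect(g) for g in match.groups()))

print(eliza_respond("I feel anxious about my job"))
# -> "Why do you feel anxious about your job?"
```

A handful of such rules is enough to produce the uncanny feeling of being listened to, which is precisely the illusion Weizenbaum’s secretary fell for.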

Describing the instinct to find personhood in generative AI as the “Elizan intuition”, Adrian argued that this intuition needs an explanation, particularly in harmful cases of AI usage. One example given was the chatbot Air Canada used in 2022, which mistakenly advised a passenger that they could claim a discount for a recent bereavement. After denying the customer a refund, Air Canada attempted to avoid liability by arguing that their chatbot was a separate legal entity for which they weren’t responsible. Adrian therefore argued that explaining the instinct to anthropomorphise AI is not only interesting, but potentially informative for debates over whether responsibility can be placed on AI. To capture this, Adrian coined the term artSelf (artificial self).

To explain what an artSelf might be, Adrian offered a technical account of AI’s functioning to ground a conception of what sort of things LLMs are. He argued that LLMs are built from weights and biases, parameters tuned during training to correct errors, which do not change once the AI is trained; that models are always constrained by context, including the information contained in previous inputs, which is weighted according to relevance when the model generates a response; and that models are disposed to answer in a range of ways that needn’t be consistent. On the latter point, Adrian described how it’s therefore impossible to properly play twenty questions with an LLM: because the model never fixes an answer in advance, it can simply change its answer whenever you guess correctly in order to keep the game going.
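As a rough illustration of those three points (an illustration only, not Adrian’s formalism), the toy sketch below uses a table of hypothetical fixed “weights”, scores candidate answers against whatever context it is given, and then samples from the resulting distribution, so the same model with the same context can still answer differently.

```python
import math
import random

# Toy illustration (not a real LLM): fixed "weights", context-conditioned
# scoring, and sampled outputs that needn't be consistent across calls.

# Hypothetical weights, frozen after "training".
WEIGHTS = {
    ("animal", "cat"): 2.0,
    ("animal", "dog"): 1.8,
    ("animal", "teapot"): 0.1,
    ("object", "teapot"): 2.0,
    ("object", "cat"): 0.3,
}

ANSWERS = ["cat", "dog", "teapot"]

def respond(context_words, temperature=1.0, seed=None):
    """Weight each candidate answer by its relevance to the context, then sample."""
    rng = random.Random(seed)
    scores = [
        sum(WEIGHTS.get((word, answer), 0.0) for word in context_words) / temperature
        for answer in ANSWERS
    ]
    # Softmax turns scores into probabilities; sampling makes replies stochastic.
    total = sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / total for s in scores]
    return rng.choices(ANSWERS, weights=probs, k=1)[0]

# Same frozen weights, same context, different samples: answers can disagree.
print([respond(["animal"], temperature=1.5, seed=i) for i in range(5)])
# A different context shifts the whole distribution of likely answers.
print([respond(["object"], temperature=1.5, seed=i) for i in range(5)])
```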

Despite the obvious differences between how AI systems and humans function, Adrian argued that it is a common phenomenon to treat an AI as a single, stable personality that endures over time, with interconnected attitudes that it acts upon rationally. All of these beliefs contribute to treating AI as a who, not a tool. But then, what sort of who would AI be?

After touring through some Dennettian accounts of the self that Adrian found to be inadequate, he settled on the idea of an instance agent, an instance being a single occasion on which we input something into the AI. The instance agent is a kind of minimal entity, composed of the AI model itself and the context fed into it, that informs the nature of our interactions with that model. However, Adrian thinks that an instance agent is insufficient for making sense of the “personality” that users can build in an AI over time and through changes to its settings. Some have considered this “personality” distinct enough to mourn the loss of an AI assistant after a system reset. Adrian therefore suggested that we think of AI as a session agent, constituted by numerous instances, with inferences between these interactions playing a role in determining responses. Adrian likened the inferences made between interactions to the sort of narratives people use to make their lives into a coherent whole.

An artSelf understood as a session agent is still very different from the selves we have. ArtSelves are very easily erased, lack embodiment, and are less consistent. However, Adrian compellingly argued that this sort of minimal agent helps to explain why we attribute personhood to AI, even if doing so is a mistake. Another benefit of this sort of view is that it exposes the difficulty of attributing blame or praise to an LLM: these models are so fragile that they may not endure long enough for such attributions to be applicable to the same entity.

The following question time focussed mostly on clarificatory questions. However, one contribution usefully pointed out that Adrian assumes human selves are drastically different from artSelves, when some perspectives might deny such a prominent difference. For example, one main point of difference Adrian discusses is that human selves endure, which Buddhist philosophy emphatically denies. This seems a valid worry for Adrian’s account, but considering philosophers are unlikely to reach any consensus on what the self is any time soon, I think it would be fair for him to set it aside for now.

I really enjoyed how Adrian brought a more technical perspective to our social interactions with AI, attempting to ground our intuitions towards these models in their functioning. The talk was also delivered in a brightly humorous and personable manner that made it all the more enjoyable. Thank you, Adrian! 
