Paper details
Number 2 - June 2024
Volume 34 - 2024
Learning abstract visual reasoning via task decomposition: A case study in Raven progressive matrices
Jakub Kwiatkowski, Krzysztof Krawiec
Abstract
Learning to perform abstract reasoning often requires decomposing the task in question into intermediate subgoals that
are not specified upfront, but need to be autonomously devised by the learner. In Raven progressive matrices (RPMs), the
task is to choose one of the available answers given a context, where both the context and answers are composite images
featuring multiple objects in various spatial arrangements. As this high-level goal is the only guidance available, learning to
solve RPMs is challenging. In this study, we propose a deep learning architecture based on the transformer blueprint which,
rather than directly making the above choice, addresses the subgoal of predicting the visual properties of individual objects
and their arrangements. The multidimensional predictions obtained in this way are then directly juxtaposed to choose the
answer. We consider a few ways in which the model parses the visual input into tokens and several regimes of masking
parts of the input in self-supervised training. In experimental assessment, the models not only outperform state-of-the-art
methods but also provide interesting insights and partial explanations about the inference. The design of the method also
makes it immune to biases that are known to be present in some RPM benchmarks.
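As an illustration of the final step described above, the following minimal sketch picks the answer whose properties agree best with the prediction for the missing panel. The abstract does not specify how the predictions are compared, so the mismatch count used here, the integer property encoding, and the name choose_answer are all assumptions made for illustration, not the authors' method.

```python
from typing import Sequence


def choose_answer(predicted: Sequence[int],
                  candidates: Sequence[Sequence[int]]) -> int:
    """Return the index of the candidate that disagrees least with the prediction.

    Assumption: every panel is summarised as a fixed-length tuple of
    categorical property codes; ties go to the earliest candidate.
    """
    def mismatches(candidate: Sequence[int]) -> int:
        # Count the properties on which prediction and candidate differ.
        return sum(p != c for p, c in zip(predicted, candidate))

    return min(range(len(candidates)), key=lambda i: mismatches(candidates[i]))


# Hypothetical encoding: (shape, size, colour, object count).
predicted = (2, 1, 4, 3)
candidates = [
    (2, 1, 0, 3),  # differs in colour
    (2, 1, 4, 3),  # matches on every property
    (0, 1, 4, 1),  # differs in shape and count
]
best = choose_answer(predicted, candidates)
```

Because the choice depends only on the predicted properties and not on regularities within the answer set, a scheme of this kind does not rely on answer-set statistics.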
Keywords
abstract visual reasoning, Raven progressive matrices, machine learning, problem decomposition