What makes an image beautiful? A new dataset enables the study of visual aesthetics with reduced interpretative interference
- Researchers from CVC and UAB have released the Minimum Semantic Content (MSC) dataset, an open resource for analysing how low-level visual features influence aesthetic perception.
- Published in Scientific Data, the work introduces a more controlled dataset for examining the relationship between visual features and aesthetic preference with less interpretative interference.
Why do some images appear beautiful while others do not? Research in empirical aesthetics has long sought to answer this question by analysing large image datasets and comparing their visual properties with people’s ratings. However, most existing datasets combine two levels of information that are difficult to disentangle: low-level visual features and high-level semantic content.
Low-level features include purely perceptual properties such as contrast, luminance, saturation, texture, symmetry, or spatial structure. In contrast, semantic content refers to the interpretation of the scene, including the presence of people, objects, actions, or cultural meaning. This combination gives rise to the well-known semantic gap, the discrepancy between the visual information contained in an image and the meaning attributed to it by human observers. This effect can introduce biases in predicting aesthetic preference, as ratings often depend more on meaning than on visual properties themselves.
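To make "low-level features" concrete, here is a minimal sketch of how three of them (mean luminance, RMS contrast, and mean saturation) could be measured directly from pixel values. It is an illustration only, not the metric set used in the study. The function name `low_level_features`, the use of Rec. 709 luma weights as a stand-in for luminance, and the choice of HSV saturation are all assumptions made for this example.

```python
import colorsys
import math


def low_level_features(pixels):
    """Compute three simple low-level features for an image.

    `pixels` is an iterable of (r, g, b) tuples with channels in [0, 1].
    Hypothetical helper for illustration; not part of the MSC software.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")

    # Approximate luminance with Rec. 709 luma weights (assumption:
    # values are used as given, without undoing gamma encoding).
    lum = [0.2126 * r + 0.7152 * g + 0.0722 * b for r, g, b in pixels]
    mean_lum = sum(lum) / len(lum)

    # RMS contrast: standard deviation of luminance across the image.
    rms_contrast = math.sqrt(sum((v - mean_lum) ** 2 for v in lum) / len(lum))

    # Saturation taken from the HSV representation of each pixel.
    sat = [colorsys.rgb_to_hsv(r, g, b)[1] for r, g, b in pixels]
    mean_sat = sum(sat) / len(sat)

    return {
        "mean_luminance": mean_lum,
        "rms_contrast": rms_contrast,
        "mean_saturation": mean_sat,
    }


if __name__ == "__main__":
    # A tiny two-pixel "image": one black pixel and one white pixel.
    print(low_level_features([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]))
```

None of these quantities depends on what the image depicts, which is what distinguishes them from semantic content.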
The new Minimum Semantic Content (MSC) dataset has been developed to address this issue and enable a more controlled study of visual aesthetics. Created by researchers from the Computer Vision Center (CVC), the Autonomous University of Barcelona (UAB), and the University of St Andrews, MSC comprises 10,426 images of natural scenes with reduced and homogenised semantic content. Each image received 100 individual aesthetic ratings from naïve observers, drawn from a pool of approximately 10,000 participants through crowdsourcing. The work has been published as a Data Descriptor in Scientific Data.
The goal of MSC is to disentangle more precisely the contribution of low-level visual features (directly measurable image properties) from the influence of high-level meaning, which tends to introduce subjective and cultural variability. To achieve this, the dataset deliberately excludes images containing people, animals, man-made objects, text, or strongly symbolic elements, focusing instead on natural scenes such as vegetation, rocks, skies, or water surfaces. According to the research team, this design enables a closer approximation to the perceptual basis of visual aesthetics by reducing interpretative influences. “Most datasets used in computational aesthetics are not neutral: they contain semantic cues and cultural meanings that can strongly influence how people judge beauty. With MSC, we aimed to create a resource that better isolates the perceptual component of aesthetic experience,” explains Dr Alejandro Párraga, researcher at CVC and UAB and author of the study.

Figure 1. Examples of images from the Minimum Semantic Content (MSC) dataset.
The dataset also includes “beautified” and “uglified” versions of a subset of images, generated using a custom image manipulation tool called Uglifier, which systematically modifies visual properties without introducing new semantic content. This approach expands the range of aesthetic preferences and improves coverage across the full spectrum of ratings. “This does not completely eliminate semantics, but it reduces and homogenises it enough to allow more precise questions about the relationship between image statistics and aesthetic preference,” adds Dr Párraga.
Validation results show that this redistribution has a substantial impact on computational analyses, explains Olivier Penacchio, also a researcher at CVC and UAB and author of the study: compared to traditional datasets, MSC alters results in 84% of the image metrics analysed, and in approximately 20% of cases, the relationship between visual features and aesthetic preference is reversed. These findings suggest that biases in dataset composition and distribution may have influenced some previous conclusions in computational aesthetics.
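What a "reversed" relationship means can be sketched in code. The example below averages the individual ratings for each image, correlates an image metric with those mean ratings in two datasets, and checks whether the sign of the correlation flips. It is a toy illustration, not the validation procedure from the paper: the use of Spearman rank correlation, the function names, and all numbers are assumptions made up for this example.

```python
import math


def mean_rating(ratings):
    """Average the individual ratings collected for one image."""
    return sum(ratings) / len(ratings)


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied group
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def relationship_reversed(rho_a, rho_b):
    """True when the two correlations have opposite signs."""
    return rho_a * rho_b < 0


if __name__ == "__main__":
    # Made-up metric values (e.g. contrast) for four images per dataset.
    metric = [0.1, 0.2, 0.3, 0.4]
    # Made-up individual ratings; real MSC images have 100 ratings each.
    dataset_a = [[2, 3], [3, 4], [4, 5], [5, 6]]
    dataset_b = [[5, 6], [4, 5], [3, 4], [2, 3]]

    rho_a = spearman(metric, [mean_rating(r) for r in dataset_a])
    rho_b = spearman(metric, [mean_rating(r) for r in dataset_b])
    print(rho_a, rho_b, relationship_reversed(rho_a, rho_b))
```

In this toy case the metric rises with preference in one dataset and falls with it in the other, so a conclusion drawn from either dataset alone would point in the opposite direction.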

Figure 2. Before-and-after example of an image modified with the Uglifier tool.
Overall, MSC provides a new foundation for studying aesthetic perception with stricter control over semantic content. By reducing and homogenising interpretative elements, the dataset enables a more precise analysis of how visual properties contribute to aesthetic preference, offering a valuable resource for investigating the perceptual mechanisms underlying visual beauty.
The dataset is openly available through the Open Science Framework and includes images, aesthetic ratings, and validation software. The images are either in the public domain or distributed under licences that allow reuse and redistribution, facilitating adoption by the research community.
Reference article: Penacchio, O., Javed, A., Raducanu, B. et al. The Minimum Semantic Content (MSC) Dataset: A Large, Balanced Resource for Computational Aesthetics Research. Sci Data 13, 470 (2026). https://doi.org/10.1038/s41597-026-06816-0