Here is a list of thesis proposals from the members of the lab. They are starting points from which a more structured work can be developed once a topic of interest is selected. Basic knowledge of deep learning and computer vision is required to engage with these topics effectively.
Keywords: 360° vision, vision-language models, panoramic scene interpretation
Abstract: This thesis adapts multimodal models and datasets to interpret 360° panoramic images paired with captions, object descriptions, and spatial questions. We collect annotated scenarios using the Insta360 Pro camera.
Supervisors: Irene Amerini (amerini@diag.uniroma1.it), Claudio Schiavella (schiavella@diag.uniroma1.it), Simone Teglia (teglia@diag.uniroma1.it)
References:
Li et al., BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, 2023.
Tran et al., 360° Depth Estimation with Pseudo Ground Truth, ICCV 2021.
Tsai et al., Multimodal Transformers for Grounding and QA, CVPR 2021.
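A possible first step for the 360° topic above, shown purely as an illustrative sketch and not as the thesis pipeline, is to extract standard perspective crops from the equirectangular panorama so that off-the-shelf vision-language models such as BLIP-2 can be applied to them. The NumPy sketch below assumes an equirectangular input image; the file name, field of view, and viewing angles are placeholders.

# Minimal sketch (assumption: equirectangular input, NumPy/Pillow only):
# extract a perspective view from a 360° panorama so that standard
# vision-language models can be run on it.
import numpy as np
from PIL import Image

def equirect_to_perspective(pano, fov_deg=90.0, yaw_deg=0.0, pitch_deg=0.0,
                            out_hw=(480, 640)):
    """Sample a pinhole-camera view (nearest neighbour) from an
    equirectangular panorama of shape (H_pano, W_pano, 3)."""
    h, w = out_hw
    ph, pw = pano.shape[:2]
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2)          # focal length in pixels

    # Ray directions in the virtual camera frame (x right, y down, z forward).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    dirs = np.stack([(u - w / 2) / f, (v - h / 2) / f,
                     np.ones_like(u, dtype=float)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate the rays by pitch (around x) and then yaw (around y).
    p, y = np.radians(pitch_deg), np.radians(yaw_deg)
    rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    dirs = dirs @ (ry @ rx).T

    # Spherical coordinates -> equirectangular pixel grid.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])            # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))           # [-pi/2, pi/2]
    cols = ((lon + np.pi) / (2 * np.pi) * (pw - 1)).astype(int)
    rows = ((lat + np.pi / 2) / np.pi * (ph - 1)).astype(int)
    return pano[rows, cols]

pano = np.asarray(Image.open("pano.jpg"))                   # hypothetical file name
view = equirect_to_perspective(pano, yaw_deg=45.0)
Image.fromarray(view).save("view_front_right.jpg")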
Keywords: information theory, entropy estimation, efficient architectures
Abstract: This thesis investigates how optimizing different parts of transformer-based models affects the quality of learned representations. Focusing on encoder or decoder embeddings in monocular depth estimation, the study compares encoder-only, decoder-only, and full-model optimization. The resulting insights can be extended to other computer vision tasks.
Supervisors: Irene Amerini (amerini@diag.uniroma1.it), Paolo Russo (russo@diag.uniroma1.it), Claudio Schiavella (schiavella@diag.uniroma1.it), Lorenzo Cirillo (cirilo@diag.uniroma1.it)
References:
Kingma et al., Auto-Encoding Variational Bayes, 2013.
Tishby et al., Deep Learning and the Information Bottleneck Principle, 2015.
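As a hedged illustration of the three training regimes compared in the abstract above (encoder-only, decoder-only, full-model optimization), the PyTorch sketch below freezes the complementary part of a toy encoder-decoder and builds the optimizer only over the trainable parameters; the network, data, and hyperparameters are placeholders, not the actual depth model.

# Minimal PyTorch sketch (toy encoder-decoder, not the actual depth network)
# of selective optimization: train only the encoder, only the decoder, or the
# full model by freezing the complementary part.
import torch
import torch.nn as nn

class ToyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                                     nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
                                     nn.ConvTranspose2d(32, 1, 2, stride=2))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def build_optimizer(model, regime="full", lr=1e-4):
    """Freeze everything outside the chosen regime ('encoder', 'decoder', or
    'full') and return an optimizer over the remaining trainable parameters."""
    for name, p in model.named_parameters():
        p.requires_grad = (regime == "full") or name.startswith(regime)
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)

model = ToyDepthNet()
optimizer = build_optimizer(model, regime="decoder")   # encoder stays frozen

x = torch.randn(2, 3, 64, 64)                          # dummy RGB batch
target = torch.rand(2, 1, 64, 64)                      # dummy depth maps
loss = nn.functional.l1_loss(model(x), target)
loss.backward()
optimizer.step()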
Keywords: attention optimization, efficient architectures, model compression
Abstract: This thesis compares two key approaches to optimizing Vision Transformers: architectural changes (e.g., attention modifications) and model compression techniques (e.g., pruning, quantization, distillation). It evaluates their interaction, robustness to compression, and the efficiency-accuracy trade-off across different networks and tasks.
Supervisors: Irene Amerini (amerini@diag.uniroma1.it), Claudio Schiavella (schiavella@diag.uniroma1.it), Lorenzo Cirillo (cirilo@diag.uniroma1.it)
References:
Papa et al., A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking, 2023.
Schiavella et al., Optimize Vision Transformer Architecture via Efficient Attention Modules: A Study on the Monocular Depth Estimation Task, 2024.
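The sketch below illustrates, on placeholder models, two of the compression techniques named in the abstract above: post-training dynamic quantization of linear layers and a standard knowledge-distillation loss. It is a minimal PyTorch example, not the benchmarking setup of the thesis.

# Minimal PyTorch sketch of two compression techniques on stand-in networks
# (MLPs used as placeholders for a large and a compact Vision Transformer).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.GELU(),
                        nn.Linear(512, 10))               # stand-in "large" model
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.GELU(),
                        nn.Linear(128, 10))               # stand-in "compact" model

# 1) Post-training dynamic quantization: weights of nn.Linear stored in int8.
quantized_student = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8)

# 2) Knowledge distillation: soften teacher/student logits with a temperature
#    and match them with KL divergence, mixed with the hard-label loss.
def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

x = torch.randn(4, 3, 32, 32)                             # dummy image batch
labels = torch.randint(0, 10, (4,))
with torch.no_grad():
    t_logits = teacher(x)
    _ = quantized_student(x)                              # int8 inference still works
loss = distillation_loss(student(x), t_logits, labels)
loss.backward()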
Keywords: diffusion models, conditional generation, auxiliary tasks
Abstract: This thesis explores how vision-based conditioning (e.g., depth, segmentation) can improve the quality and efficiency of diffusion models. The work investigates whether semantic supervision can enable smaller, faster conditional diffusion models. Experiments focus on different vision tasks.
Supervisors: Irene Amerini (amerini@diag.uniroma1.it), Claudio Schiavella (schiavella@diag.uniroma1.it)
References:
Ho et al., Denoising Diffusion Probabilistic Models, NeurIPS 2020.
Zhan et al., Conditional Image Synthesis with Diffusion Models: A Survey, 2024.
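As a minimal sketch of vision-based conditioning for the topic above (assuming a simple concatenation scheme, not the thesis design), the PyTorch snippet below runs one DDPM-style training step in which a depth map is concatenated channel-wise to the noisy image before the denoiser. The tiny convolutional denoiser and the linear noise schedule are placeholders, and the timestep embedding is omitted for brevity.

# Minimal PyTorch sketch of channel-wise conditioning in a DDPM-style setup:
# the conditioning map (e.g., depth or segmentation) is concatenated to the
# noisy image and the network is trained to predict the added noise.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)                      # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(                                  # toy stand-in for a UNet
    nn.Conv2d(3 + 1, 64, 3, padding=1), nn.SiLU(),         # +1 channel for the depth map
    nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
    nn.Conv2d(64, 3, 3, padding=1))

def training_step(x0, depth):
    """One conditional training step: predict the injected noise from the
    noisy image concatenated with its conditioning map (timestep embedding
    omitted for brevity)."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise           # q(x_t | x_0)
    pred = denoiser(torch.cat([x_t, depth], dim=1))        # conditioning by concatenation
    return nn.functional.mse_loss(pred, noise)

x0 = torch.randn(2, 3, 32, 32)                             # dummy images
depth = torch.rand(2, 1, 32, 32)                           # dummy depth maps
loss = training_step(x0, depth)
loss.backward()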
Keywords: unsupervised segmentation, green AI, low-resource vision
Abstract: This thesis proposes a lightweight computer vision pipeline to analyze silkworm farming and infer feeding states from RGB video. The method is optimized for edge devices like NVIDIA Jetson. Evaluated in real-world conditions, the approach supports agricultural automation and advances the visual understanding of dense biological systems.
In collaboration with: Tecnoseta SRL (https://www.tecnoseta.com/)
Supervisors: Irene Amerini (amerini@diag.uniroma1.it), Claudio Schiavella (schiavella@diag.uniroma1.it)
References:
Wang et al., Unsupervised Learning of Object Segmentation from Video, NeurIPS 2021.
Papa et al., A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking, 2023.
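A lightweight motion-based baseline, given here only as a hedged illustration of what could run on an edge device and not as the proposed method, is background subtraction over the RGB frames: the OpenCV sketch below computes a per-frame foreground ratio that could serve as a rough activity cue for the feeding state. The video path and parameters are placeholders.

# Minimal OpenCV sketch (illustrative baseline, not the proposed pipeline):
# MOG2 background subtraction on RGB frames yields a rough foreground mask
# whose activity over time can be tracked as a cheap motion cue.
import cv2
import numpy as np

cap = cv2.VideoCapture("silkworm_tray.mp4")                # hypothetical input video
bg = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25,
                                        detectShadows=False)

activity = []                                              # fraction of moving pixels per frame
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                                 # 0 = background, 255 = foreground
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    activity.append((mask > 0).mean())

cap.release()
print(f"mean foreground ratio: {np.mean(activity):.3f}" if activity else "no frames read")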
Keywords: dataset development, 3D simulation, green AI
Abstract: This thesis proposes the development of a 3D dataset for agricultural applications, featuring both point clouds and 3D mesh models. The dataset will support use cases such as simulation, virtual and augmented reality, and machine learning in smart farming scenarios.
In collaboration with: Tecnoseta SRL (https://www.tecnoseta.com/)
Supervisors: Irene Amerini (amerini@diag.uniroma1.it), Claudia Melis Tonti (melistonti@diag.uniroma1.it), Claudio Schiavella (schiavella@diag.uniroma1.it)
References:
Chang et al., ShapeNet: An Information-Rich 3D Model Repository, 2015.
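To illustrate how the two representations mentioned above could be paired (a minimal sketch assuming Open3D is available, not the dataset-construction pipeline), the snippet below loads a 3D mesh, samples a point cloud from its surface, and stores both; the asset names are placeholders.

# Minimal Open3D sketch (illustration only): derive the point-cloud counterpart
# of a mesh by uniform surface sampling and save both representations.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("mulberry_plant.obj")     # hypothetical asset
mesh.compute_vertex_normals()

# Uniform surface sampling gives the point-cloud counterpart of the mesh.
pcd = mesh.sample_points_uniformly(number_of_points=8192)

o3d.io.write_triangle_mesh("mulberry_plant_mesh.ply", mesh)
o3d.io.write_point_cloud("mulberry_plant_points.ply", pcd)
print(f"mesh: {len(mesh.vertices)} vertices, cloud: {len(pcd.points)} points")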