Publications

You can also find my articles on my Google Scholar profile.

Journal Articles

Learning to mask and permute visual tokens for vision transformer pre-training

Published in Computer Vision and Image Understanding, 2025

A novel approach for pre-training Vision Transformers using masking and permutation strategies.

Recommended citation: L. Baraldi, R. Amoroso, M. Cornia, A. Pilzer, R. Cucchiara (2025). "Learning to mask and permute visual tokens for vision transformer pre-training." Computer Vision and Image Understanding, 104294.
Download Paper

Conference Papers

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

Published in International Conference on Computer Vision (ICCV), 2025

A novel methodology for detecting and evaluating instruction-guided image edits.

Recommended citation: L. Baraldi, D. Bucciarelli, F. Betti, M. Cornia, L. Baraldi, et al. (2025). "What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models." arXiv preprint 2505.20405.
Download Paper

Adapt to Scarcity: Few-Shot Deepfake Detection via Low-Rank Adaptation

Published in International Conference on Pattern Recognition, 2025

A novel approach for few-shot deepfake detection using low-rank adaptation techniques.

Recommended citation: S. Cappelletti, L. Baraldi, F. Cocchi, M. Cornia, R. Cucchiara (2025). "Adapt to Scarcity: Few-Shot Deepfake Detection via Low-Rank Adaptation." International Conference on Pattern Recognition, 111-126.
Download Paper

Optimizing Resource Consumption in Diffusion Models through Hallucination Early Detection

Published in European Conference on Computer Vision Workshops (ECCVW), 2024

A method for early detection of hallucinations in diffusion models to optimize resource consumption.

Recommended citation: F. Betti, L. Baraldi, R. Cucchiara, N. Sebe (2024). "Optimizing Resource Consumption in Diffusion Models through Hallucination Early Detection." arXiv preprint arXiv:2409.10597.
Download Paper

The revolution of multimodal large language models: a survey

Published in Findings of the Association for Computational Linguistics (ACL Findings), 2024

A comprehensive survey on multimodal large language models.

Recommended citation: D. Caffagni, F. Cocchi, L. Barsellotti, N. Moratelli, S. Sarto, L. Baraldi, et al. (2024). "The revolution of multimodal large language models: a survey." arXiv preprint arXiv:2402.12451.
Download Paper

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities

Published in European Conference on Computer Vision (ECCV), 2024

This paper presents a novel approach to deepfake detection using contrastive learning.

Recommended citation:
Download Paper

Let’s ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation

Published in ACM International Conference on Multimedia, 2023

A novel approach for evaluating image generation by mimicking human cognitive behavior.

Recommended citation: F. Betti, J. Staiano, L. Baraldi, L. Baraldi, R. Cucchiara, N. Sebe (2023). "Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation." Proceedings of the 31st ACM International Conference on Multimedia, 9306-9312.
Download Paper

Unveiling the impact of image transformations on deepfake detection: An experimental analysis

Published in International Conference on Image Analysis and Processing (ICIAP), 2023

An analysis of how image transformations affect deepfake detection performance.

Recommended citation: F. Cocchi, L. Baraldi, S. Poppi, M. Cornia, L. Baraldi, R. Cucchiara (2023). "Unveiling the impact of image transformations on deepfake detection: An experimental analysis." International Conference on Image Analysis and Processing, 345-356.
Download Paper

Lorenzo Baraldi
