🧙‍♂️ MILD: Multi-Layer Diffusion Strategy for Multi-IP Aware Human Erasing

MILD Human Erasing Results

Interactive demonstration showing original images and MILD processing results. Each frame shows the input image followed by the human erasing result, showcasing our method's effectiveness across different scenarios.

Abstract

Recent years have witnessed the success of diffusion models in image customization tasks. However, existing mask-guided human erasing methods still struggle in complex scenarios such as human–human occlusion, human–object entanglement, and human–background interference, mainly due to the lack of large-scale multi-instance datasets and of effective spatial decoupling to separate foreground from background. To bridge these gaps, we curate the MILD dataset, which captures diverse poses, occlusions, and complex multi-instance interactions. We then define the Cross-Domain Attention Gap (CAG), an attention-gap metric that quantifies semantic leakage. Building on these, we propose Multi-Layer Diffusion (MILD), which decomposes the generation process into independent denoising pathways, enabling separate reconstruction of each foreground instance and the background. To enhance human-centric understanding, we introduce Human Morphology Guidance, a plug-and-play module that incorporates pose, parsing, and spatial relationships into the diffusion process to improve structural awareness and restoration quality. Additionally, we present Spatially-Modulated Attention, an adaptive mechanism that leverages spatial mask priors to modulate attention across semantic regions, further widening the CAG to minimize boundary artifacts and mitigate semantic leakage. Experiments show that MILD significantly outperforms existing methods.
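
The abstract does not give formulas for CAG or for Spatially-Modulated Attention, so the following is only an illustrative sketch under stated assumptions, not the method itself. It assumes that modulation can be approximated by subtracting a fixed bias from attention scores between tokens in different mask regions, and that an attention gap can be approximated as the attention mass a query keeps inside its own region minus the mass that leaks across regions. The names `modulated_attention`, `attention_gap`, and `bias` are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def modulated_attention(scores, region, bias=0.0):
    """Row-wise attention weights with a mask-based penalty.

    Assumption (not from the paper): `bias` is subtracted from the score of
    every query-key pair whose region labels differ, before the softmax.
    """
    weights = []
    for i, row in enumerate(scores):
        adjusted = [s - bias if region[i] != region[j] else s
                    for j, s in enumerate(row)]
        weights.append(softmax(adjusted))
    return weights

def attention_gap(weights, region):
    """Mean over queries of (same-region mass - cross-region mass).

    Assumption (not from the paper): a simplified stand-in for an
    attention-gap measure; larger values mean less cross-region leakage.
    """
    gaps = []
    for i, row in enumerate(weights):
        same = sum(w for j, w in enumerate(row) if region[j] == region[i])
        cross = 1.0 - same
        gaps.append(same - cross)
    return sum(gaps) / len(gaps)

# Toy setup: four tokens, two foreground (1) and two background (0),
# with identical raw scores so unmodulated attention is uniform.
region = [1, 1, 0, 0]
scores = [[0.0] * 4 for _ in range(4)]

uniform_gap = attention_gap(modulated_attention(scores, region), region)
modulated_gap = attention_gap(
    modulated_attention(scores, region, bias=2.0), region)

print(f"gap without modulation: {uniform_gap:.4f}")
print(f"gap with modulation:    {modulated_gap:.4f}")
```

In this toy setting the gap is zero when attention is uniform and grows once cross-region scores are penalized, which mirrors the abstract's claim that mask priors widen the gap between regions.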

Keywords: Diffusion Models, Human Erasing, Multi-IP, Computer Vision, Image Inpainting, Deep Learning

Flex Composer

Interactive demo: each of two input images is shown with selectable layer thumbnails (ip1, ip2, ip3, Background for the first; Chair, Desk, Cat, Background for the second). Clicking a layer thumbnail adds it to the canvas.
