DICE: Tuning-Free Dynamic High-Fidelity Identity Customization and Enhancement using Multi-Modal Contrastive Fusion for Consumer Devices

Research output: Contribution to journal › Article › peer-review

Abstract

High-fidelity (HiFi) identity customization with text-to-image generation has attracted wide interest from industry, consumers, researchers, and digital content creators. Such generative models can personalize images using pretrained diffusion models without extensive fine-tuning. However, because training identity customization is computationally constrained on consumer electronic devices, existing works often compromise either fidelity or the generative behavior of the original model. Furthermore, when auxiliary images are fused with the base image, existing models often degrade the identity customization. To address this, we propose Dynamic high-fidelity Identity Customization and Enhancement (DICE), which integrates a vision transformer (ViT) that extracts semantic features from both facial and non-facial images, a dynamic multi-modal contrastive fusion strategy, a denoising diffusion model, and a composite loss function. DICE leverages evolved feature extraction, multi-scale feature fusion, adaptive contrastive paths, and an adaptive composite loss to achieve high fidelity and editability with minimal refinement of the base model, even when fusing the base image with an auxiliary one. Because this tuning-free identity customization requires no retraining and shifts the computational burden to a one-time, server-side training process, it is well suited to consumers on resource-constrained electronic devices. Experiments demonstrate that DICE outperforms existing state-of-the-art methods while offering a flexible solution for personalized image generation.
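The abstract names the components of DICE but does not give their equations. Purely as an illustration, the following minimal sketch shows one common way to realize two of the ideas it mentions: a gated fusion of identity and auxiliary feature vectors, and an InfoNCE-style contrastive term added to a denoising loss in a weighted composite objective. All function names, the scalar gate, the temperature of 0.07, and the loss weights are hypothetical assumptions chosen for the example; this is not the paper's published formulation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    # Scale to unit length; a zero vector is returned unchanged (all zeros).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity of two equal-length feature vectors.
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def gated_fusion(identity: Sequence[float], auxiliary: Sequence[float], gate: float) -> Vector:
    # Hypothetical fusion: convex combination of identity and auxiliary features.
    # gate = 1.0 keeps only the identity features; gate = 0.0 keeps only auxiliary ones.
    if len(identity) != len(auxiliary):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must be in [0, 1]")
    return [gate * i + (1.0 - gate) * a for i, a in zip(identity, auxiliary)]


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]], temperature: float = 0.07) -> float:
    # InfoNCE loss: -log(exp(s_pos) / sum(exp(s_all))), with s = cosine / temperature.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


def composite_loss(denoise_loss: float, contrastive_loss: float,
                   weights: Tuple[float, float] = (1.0, 0.1)) -> float:
    # Hypothetical weighted sum of a diffusion denoising loss and a contrastive term.
    w_denoise, w_contrast = weights
    return w_denoise * denoise_loss + w_contrast * contrastive_loss


if __name__ == "__main__":
    identity = [1.0, 0.0, 0.0]   # toy "facial identity" feature
    auxiliary = [0.0, 1.0, 0.0]  # toy "auxiliary image" feature
    fused = gated_fusion(identity, auxiliary, gate=0.8)
    # Pull the fused feature toward the identity, away from the auxiliary feature.
    l_con = info_nce(fused, identity, [auxiliary])
    print(fused, l_con, composite_loss(denoise_loss=0.5, contrastive_loss=l_con))
```

In this toy setup the contrastive term stays small as long as the fused feature remains closer to the identity feature than to the auxiliary one, which mirrors the abstract's stated goal of fusing an auxiliary image without losing identity fidelity.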

Original language: English
Journal: IEEE Transactions on Consumer Electronics
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • Diffusion Models
  • Identity Customization
  • Multi-Modal Contrastive Fusion
  • Personalized Image Generation
  • Vision Transformers
