
instantX-research / InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥

InstantStyle is a general framework that employs two straightforward yet potent techniques for achieving an effective disentanglement of style and content from reference images.

Separating Content from Image. Benefiting from the good characterization of CLIP global features, the style and content of a reference image can be explicitly decoupled by subtracting the content text features from the image features. Although simple, this strategy is quite effective in mitigating content leakage.
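
As an illustration, here is a minimal sketch (not code from this repo; in the API below, the same effect is exposed through the neg_content_prompt and neg_content_scale arguments) of subtracting content text features from CLIP global image features:

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# assumption: any CLIP checkpoint works for the illustration; the actual
# adapter uses its own image encoder
clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("./assets/0.jpg")
inputs = processor(text=["a rabbit"], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    image_embed = clip.get_image_features(pixel_values=inputs["pixel_values"])
    content_embed = clip.get_text_features(input_ids=inputs["input_ids"])

# subtract the (scaled) content text features from the global image features,
# leaving an embedding that mostly carries style
style_embed = image_embed - 0.5 * content_embed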

Injecting into Style Blocks Only. Empirically, each layer of a deep network captures different semantic information. The key observation of our work is that there are two specific attention layers handling style: we find that up_blocks.0.attentions.1 captures style (color, material, atmosphere) and down_blocks.2.attentions.1 captures spatial layout (structure, composition).
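
As a rough sketch of what this means in practice (the two block names are the repo-specified identifiers; the helper itself is hypothetical, mirroring the lookups in attn_blocks.py):

style_block = "up_blocks.0.attentions.1"     # style: color, material, atmosphere
layout_block = "down_blocks.2.attentions.1"  # layout: structure, composition

def is_style_or_layout(processor_name: str) -> bool:
    # diffusers names attention processors like
    # "up_blocks.0.attentions.1.transformer_blocks.0.attn2.processor"
    return processor_name.startswith((style_block, layout_block))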

  • (2024/07/06) 🔥 We release the CSGO page for content-style composition. The code will be released soon.
  • (2024/07/01) 🔥 We release the InstantStyle-Plus report for content preservation.
  • (2024/04/29) 🔥 We support InstantStyle natively in diffusers; usage can be found HERE.
  • (2024/04/24) 🔥 InstantStyle for fast generation; find demos at InstantStyle-SDXL-Lightning and InstantStyle-Hyper-SDXL.
  • (2024/04/24) 🔥 We support HiDiffusion for generating high-resolution images; more information can be found HERE.
  • (2024/04/23) 🔥 InstantStyle is natively supported in diffusers; more information can be found HERE.
  • (2024/04/20) 🔥 InstantStyle is supported by Mikubill/sd-webui-controlnet.
  • 🔥 We add an experimental distributed inference feature. Check it out HERE.
  • (2024/04/10) 🔥 We support an online demo on ModelScope.
  • (2024/04/09) 🔥 We support an online demo on Hugging Face.
  • (2024/04/09) 🔥 We support SDXL-Inpainting; more information can be found HERE.
  • (2024/04/08) 🔥 InstantStyle is supported by AnyV2V for stylized video-to-video editing; a demo can be found HERE.
  • (2024/04/07) 🔥 We support image-based stylization; more information can be found HERE.
  • (2024/04/07) 🔥 We support an experimental version for SD1.5; more information can be found HERE.
  • (2024/04/03) 🔥 InstantStyle is supported by ComfyUI_IPAdapter_plus, developed by our co-author.
  • (2024/04/03) 🔥 We release the technical report.


Image-Based Stylized Synthesis

Comparison with Previous Works

Follow IP-Adapter to download pre-trained checkpoints from HERE.

git clone https://github.com/InstantStyle/InstantStyle.git
cd InstantStyle

# download the models
git lfs install
git clone https://huggingface.co/h94/IP-Adapter
mv IP-Adapter/models models
mv IP-Adapter/sdxl_models sdxl_models

Our method is fully compatible with IP-Adapter. Note that the feature subtraction operates only on the global image embedding, not on patch features. For SD1.5, a demo is available at infer_style_sd15.py, but we find that SD1.5 has a weaker perception and understanding of style information, so this demo is experimental only. All block names can be found in attn_blocks.py and attn_blocks_sd15.py for SDXL and SD1.5 respectively.

import torch
from diffusers import StableDiffusionXLPipeline
from PIL import Image

from ip_adapter import IPAdapterXL

base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
image_encoder_path = "sdxl_models/image_encoder"
ip_ckpt = "sdxl_models/ip-adapter_sdxl.bin"
device = "cuda"

# load SDXL pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    add_watermarker=False,
)

# reduce memory consumption
pipe.enable_vae_tiling()

# load ip-adapter
# target_blocks=["block"] for original IP-Adapter
# target_blocks=["up_blocks.0.attentions.1"] for style blocks only
# target_blocks=["up_blocks.0.attentions.1", "down_blocks.2.attentions.1"] for style+layout blocks
ip_model = IPAdapterXL(pipe, image_encoder_path, ip_ckpt, device, target_blocks=["up_blocks.0.attentions.1"])

image = "./assets/0.jpg"
image = Image.open(image)
image = image.resize((512, 512))

# generate image variations with only image prompt
images = ip_model.generate(pil_image=image,
                            prompt="a cat, masterpiece, best quality, high quality",
                            negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
                            scale=1.0,
                            guidance_scale=5,
                            num_samples=1,
                            num_inference_steps=30, 
                            seed=42,
                            #neg_content_prompt="a rabbit",
                            #neg_content_scale=0.5,
                          )

images[0].save("result.png")
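
The commented-out neg_content_prompt and neg_content_scale arguments correspond to the content-subtraction technique described above. When the content of the reference image is easy to name, you can enable them:

# suppress content leakage from the reference image by subtracting
# the text features of the described content
images = ip_model.generate(pil_image=image,
                           prompt="a cat, masterpiece, best quality, high quality",
                           negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
                           scale=1.0,
                           guidance_scale=5,
                           num_samples=1,
                           num_inference_steps=30,
                           seed=42,
                           neg_content_prompt="a rabbit",  # content in the reference image to subtract
                           neg_content_scale=0.5,          # strength of the subtraction
                          )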

InstantStyle has already been integrated into diffusers (please make sure that you install diffusers>=0.28.0.dev0), making usage much simpler. You can control the per-transformer behavior of each IP-Adapter through the set_ip_adapter_scale() method with a configuration dictionary:

from diffusers import StableDiffusionXLPipeline
from PIL import Image
import torch

# load SDXL pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    add_watermarker=False,
)

# load ip-adapter
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
pipe.enable_vae_tiling()

# configure ip-adapter scales.
scale = {
    "down": {"block_2": (0.0, 1.0)},
    "up": {"block_0": (0.0, 1.0, 0.0)},
}
pipeline.set_ip_adapter_scale(scale)

In this example, we set scale=1.0 for the IP-Adapter in the second transformer of down-part block 2 and the second transformer of up-part block 0. Note that each list's length matches the number of transformers in its block (2 in down-part block 2, 3 in up-part block 0). All other layers get a zero scale, which disables the IP-Adapter there.

With the help of set_ip_adapter_scale(), we can now reconfigure the IP-Adapter without having to reload it every time we want to test different IP-Adapter behaviors.

# for original IP-Adapter
scale = 1.0
pipe.set_ip_adapter_scale(scale)

# for style blocks only
scale = {
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipe.set_ip_adapter_scale(scale)
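
The style+layout configuration from the first example can be restored the same way:

# for style+layout blocks
scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipe.set_ip_adapter_scale(scale)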

Multiple IP-Adapter Images with Masks

You can also load multiple IP-Adapters, together with multiple IP-Adapter images with masks, for more precise layout control, just as IP-Adapter does.

from diffusers import StableDiffusionXLPipeline
from diffusers.image_processor import IPAdapterMaskProcessor
from transformers import CLIPVisionModelWithProjection
from PIL import Image
import torch

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
).to("cuda")

pipe = StableDiffusionXLPipeline.from_pretrained(
    "RunDiffusion/Juggernaut-XL-v9", torch_dtype=torch.float16, image_encoder=image_encoder, variant="fp16"
).to("cuda")

pipe.load_ip_adapter(
    ("ostris/ip-composition-adapter", "h94/IP-Adapter"),
    subfolder=("", "sdxl_models"),
    weight_name=(
        "ip_plus_composition_sdxl.safetensors",
        "ip-adapter_sdxl_vit-h.safetensors",
    ),
    image_encoder_folder=None,
)

scale_1 = {
    "down": [[0.0, 0.0, 1.0]],
    "mid": [[0.0, 0.0, 1.0]],
    "up": {"block_0": [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.0]], "block_1": [[0.0, 0.0, 1.0]]},
}
# activate the first IP-Adapter everywhere in the model,
# configure the second one for precise style control on each masked input
pipe.set_ip_adapter_scale([1.0, scale_1])

processor = IPAdapterMaskProcessor()
female_mask = Image.open("./assets/female_mask.png")
male_mask = Image.open("./assets/male_mask.png")
background_mask = Image.open("./assets/background_mask.png")
composition_mask = Image.open("./assets/composition_mask.png")
mask1 = processor.preprocess([composition_mask], height=1024, width=1024)
mask2 = processor.preprocess([female_mask, male_mask, background_mask], height=1024, width=1024)
mask2 = mask2.reshape(1, mask2.shape[0], mask2.shape[2], mask2.shape[3])  # output -> (1, 3, 1024, 1024)

ip_female_style = Image.open("./assets/ip_female_style.png")
ip_male_style = Image.open("./assets/ip_male_style.png")
ip_background = Image.open("./assets/ip_background.png")
ip_composition_image = Image.open("./assets/ip_composition_image.png")

image = pipe(
    prompt="high quality, cinematic photo, cinemascope, 35mm, film grain, highly detailed",
    negative_prompt="",
    ip_adapter_image=[ip_composition_image, [ip_female_style, ip_male_style, ip_background]],
    cross_attention_kwargs={"ip_adapter_masks": [mask1, mask2]},
    guidance_scale=6.5,
    num_inference_steps=25,
).images[0]
image.save("result.png")

High-Resolution Generation

We employ HiDiffusion to seamlessly generate high-resolution images; you can install it via pip install hidiffusion.

from hidiffusion import apply_hidiffusion, remove_hidiffusion

# reduce memory consumption
pipe.enable_vae_tiling()

# apply hidiffusion with a single line of code.
apply_hidiffusion(pipe)

...

# generate image at higher resolution
images = ip_model.generate(pil_image=image,
                           prompt="a cat, masterpiece, best quality, high quality",
                           negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
                           scale=1.0,
                           guidance_scale=5,
                           num_samples=1,
                           num_inference_steps=30, 
                           seed=42,
                           height=2048,
                           width=2048
                          )

On distributed setups, you can run inference across multiple GPUs with 🤗 Accelerate or PyTorch Distributed, which is useful for generating multiple prompts in parallel when VRAM on each GPU is limited. More information can be found HERE. Make sure you have installed diffusers from source and the latest accelerate.

max_memory = {0:"10GB", 1:"10GB"}
pipe = StableDiffusionXLPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    add_watermarker=False,
    device_map="balanced",
    max_memory=max_memory
)
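
For prompt-level parallelism, here is a minimal sketch (our assumption, based on the 🤗 Accelerate API; launch it with accelerate launch --num_processes=2 script.py):

import torch
from accelerate import PartialState
from diffusers import StableDiffusionXLPipeline

distributed_state = PartialState()
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    add_watermarker=False,
).to(distributed_state.device)

# each process receives its own slice of the prompt list
with distributed_state.split_between_processes(["a cat", "a dog"]) as prompts:
    for prompt in prompts:
        image = pipe(prompt, num_inference_steps=30).images[0]
        image.save(f"result_{distributed_state.process_index}.png")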

Start a Local Gradio Demo

Run the following command:

git clone https://github.com/InstantStyle/InstantStyle.git
cd ./InstantStyle/gradio_demo/
pip install -r requirements.txt
python app.py

Pretrained checkpoints follow the license of IP-Adapter. Users are granted the freedom to create images with this tool, but they are obligated to comply with local laws and use it responsibly. The developers do not assume any responsibility for potential misuse by users.

InstantStyle is developed by the InstantX team and is built closely on IP-Adapter, which has been unfairly compared by many other works; InstantStyle makes IP-Adapter great again. Furthermore, we acknowledge Hu Ye for his valuable discussion.

Star History

If you find InstantStyle useful for your research and applications, please cite us using the following BibTeX:

@article{wang2024instantstyleplus,
  title={InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation},
  author={Wang, Haofan and Xing, Peng and Huang, Renyuan and Ai, Hao and Wang, Qixun and Bai, Xu},
  journal={arXiv preprint arXiv:2407.00788},
  year={2024}
}

@article{wang2024instantstyle,
  title={InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation},
  author={Wang, Haofan and Wang, Qixun and Bai, Xu and Qin, Zekui and Chen, Anthony},
  journal={arXiv preprint arXiv:2404.02733},
  year={2024}
}

For any questions, feel free to contact us at haofanwang.ai@gmail.com.



