HomoDiffusion 2 HomoVision

HomoVision (I do like cheesy names... sue me :v). Unlike Homo Diffusion 2 & Homo Diffusion 2 Anime, this model aims to be as photorealistic as you can get with SD 1.5.

It merges CyberRealistic V3.1, Realistic Vision V3.0 (No VAE), Deliberate V2 and RPG V4 with HomoDiffusion 2.0.

The mix can produce images of the same quality as HomoDiffusion 2.0. There are only two differences: my training data hasn't been converted to a more stylistic/illustrative version, and because it was merged via SmoothAdd MT, it inherited the more realistic style of the models listed above.

Both styles of prompting work, i.e. anime tags and normal natural language.

Recommended VAE:

vae-ft-mse-840000-ema-pruned

General Tips

Sentence or prompt structure matters in both SD Next and InvokeAI 3.0. They handle prompts differently because of different pipelines, parsers and underlying optimisations. Additionally, depending on the WebUI and its version, prompts may be limited to 75 tokens, meaning anything above that count is either ignored completely or split up. The latest versions of InvokeAI and SD Next support up to 225 tokens.
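To make the token limit more concrete, here is a minimal Python sketch of the two behaviours mentioned above (dropping the overflow versus splitting into chunks). It is an illustration only, not how SD Next or InvokeAI actually implement it: split_into_chunks is a hypothetical helper, and the token list is a stand-in for real tokenizer output.

```python
def split_into_chunks(tokens, chunk_size=75, max_tokens=None):
    """Split a token list into chunks of at most `chunk_size` tokens.

    If `max_tokens` is given, tokens beyond that count are dropped first.
    Hypothetical helper for illustration; real UIs differ in the details.
    """
    if max_tokens is not None:
        tokens = tokens[:max_tokens]
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Stand-in for tokenizer output: a prompt that is 200 tokens long.
tokens = list(range(200))

# A UI that ignores everything above 75 tokens.
truncated = split_into_chunks(tokens, max_tokens=75)

# A UI that accepts up to 225 tokens and splits them into 75-token chunks.
chunked = split_into_chunks(tokens, max_tokens=225)

print([len(c) for c in truncated])  # [75]
print([len(c) for c in chunked])    # [75, 75, 50]
```

In the first case the last 125 tokens never reach the model, which is one reason why the tags you put at the front of a long prompt matter so much.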

For example:

If you structure your prompt as in the first example, i.e. start with descriptions of quality, lighting conditions, blurriness (Gaussian, bokeh, depth of field, aperture, focus, ISO, DSLR or mirrorless make & model and so forth), resolution properties, and compression artefacts and by-products (e.g. JPEG artefacts), and only then describe the subject (muscular/skinny/lean male/man/men, facial & body features, clothing preferences, background details), you will end up with gens like the one below.

Example no.1:

Prompt:(Realistic), (photorealistic), 4k, best quality, masterpiece, real photos, cinematic lighting, realistic shadow, volumetric lighting, highest detail, ultra-detailed, professional photography, depth of field, bokeh, detailed face, subsurface scattering, realistic hair, realistic eyes, realistic hands, detailed, intricate,  wet, bulge, ?????, from below, pov, lying on bed, ?????, (muscular male, beard, white skin, pale skin, DARKNESS), (solo male, bara, mature, short hair, stubble), handsome, solo male, male focus, cinematic lighting, realistic, detailed background, clear texture, best background, depth of field,light particles,(Balance and coordination between all things),real light and shadow, perspective, composition, adventurous, energy, exploration, contrast, experimental, unique,

But if you restructure the whole prompt and first describe the person/objects and background, then add all the other descriptors, there's a high chance you'll end up with something completely different.

Example no.1:

wet, bulge, ?????, from below, pov, lying on bed, ?????, (muscular male, beard, white skin, pale skin, DARKNESS), (solo male, bara, mature, short hair, stubble), handsome, solo male, male focus, cinematic lighting, realistic, detailed background, clear texture, best background, depth of field,light particles,(Balance and coordination between all things),real light and shadow, perspective, composition, adventurous, energy, exploration, contrast, experimental, unique, (Realistic), (photorealistic), 4k, best quality, masterpiece, real photos, cinematic lighting, realistic shadow, volumetric lighting, highest detail, ultra-detailed, professional photography, depth of field, bokeh, detailed face, subsurface scattering, realistic hair, realistic eyes, realistic hands, detailed, intricate,

Example no.2:

Prompt:
 wet, bulge, ?????, from below, pov, lying on bed, ?????, (muscular male, beard, white skin, pale skin, DARKNESS), (solo male, bara, mature, short hair, stubble), handsome, solo male, male focus, cinematic lighting, realistic hair, realistic eyes, realistic hands, detailed background, clear texture, best background, light particles, ( Balance and coordination between all things), real light and shadow, perspective, composition, adventurous, energy, exploration, contrast, experimental, unique, Realistic, photorealistic, 4k, best quality, masterpiece, real photos, cinematic lighting, realistic shadow, volumetric lighting, highest detail, ultra-detailed, professional photography, depth of field, bokeh, detailed face, subsurface scattering, detailed, intricate,

Both examples were made in InvokeAI 3.0.0 using txt2img.
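If you want to test both orderings yourself without retyping everything, the idea can be sketched in a few lines of Python. build_prompt is a hypothetical helper (it is not part of SD Next or InvokeAI), and the tag lists are shortened samples taken from the prompts above.

```python
def build_prompt(subject_tags, quality_tags, subject_first=False):
    """Join two tag groups into one comma-separated prompt.

    subject_first=False puts quality/lighting tags in front (first structure),
    subject_first=True puts the subject and background in front (second structure).
    """
    groups = [subject_tags, quality_tags] if subject_first else [quality_tags, subject_tags]
    return ", ".join(tag for group in groups for tag in group)


# Shortened samples of the tags used in the examples above.
quality = ["(Realistic)", "(photorealistic)", "4k", "best quality"]
subject = ["muscular male", "beard", "lying on bed", "detailed background"]

quality_first = build_prompt(subject, quality)
subject_first = build_prompt(subject, quality, subject_first=True)

print(quality_first)
print(subject_first)
```

Both strings contain exactly the same tags, so any difference in the output comes from the ordering alone (use the same seed and settings for a fair comparison).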

Recommended settings:

SD Next:

Step Count:

From 20 to 70

Samplers:

Euler A, DPM++ a Karras, DPM++ 2M/2S a Karras

Second Pass & Upscaler:

4x-UltraSharp with denoising strength between 0.1 and 0.3, 5 to 50 steps. For the second-pass sampler you can use any combination of the samplers mentioned above. Some of them handle adding details and textures better; others might cause extra issues like double glands, elongated limbs, missing limbs or blurriness.

Refiner:

Since SD Next lets you mix & match not only samplers but also the SDXL 0.9 refiner with SD 1.5 base models, I recommend at least playing with the new refiners to increase the fidelity of generations, get crispy details or fix some issues.

aDetailer:

Face model: either mediapipe mesh/full or yolov8s,

Detection threshold: from 0.3 to 0.45,

Step count for the aDetailer model: 12-30,

Resolution for in-painting: from 512x512 to 768x768 (this will slow down generation significantly).

Resolution:

From 512px to 1024px, with an aspect ratio of 4:3, 3:4, 16:9 or 9:16.
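For picking actual width and height values, here is a small Python sketch that turns an aspect ratio and a long-side length into pixel dimensions. resolution_for is a hypothetical helper, and snapping to a multiple of 8 is an assumption (SD 1.5 works on latents downscaled by 8); some UIs may use a coarser step such as 64.

```python
def resolution_for(aspect_w, aspect_h, long_side, multiple=8):
    """Return (width, height) whose longer side equals `long_side`.

    The shorter side follows the aspect ratio and is snapped down to
    `multiple` (assumed to be 8 here; adjust if your UI needs another step).
    """
    def snap(value):
        return max(multiple, int(round(value)) // multiple * multiple)

    if aspect_w >= aspect_h:
        width, height = long_side, long_side * aspect_h / aspect_w
    else:
        width, height = long_side * aspect_w / aspect_h, long_side
    return snap(width), snap(height)


print(resolution_for(4, 3, 1024))   # (1024, 768)
print(resolution_for(3, 4, 768))    # (576, 768)
print(resolution_for(16, 9, 1024))  # (1024, 576)
print(resolution_for(9, 16, 1024))  # (576, 1024)
```

All of these stay inside the 512px to 1024px range recommended above.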

InvokeAI 3.0

Step Count:

15-75

Samplers:

DPM++ 2M SDE Karras, DPM++ 2M Karras, DPM++ 2S, Euler Ancestral, Euler Karras, Heun, Heun Karras

Upscaler:

RealESRGAN_x4plus

Resolution:

768px up to 1280px.

Portrait, Free or Wide.

Disclaimer:

Some of the prompts are not mine. They were either found here on CivitAI or taken from users of the Unstable Discord #men-only channel.

The model, recommended resources, hints and description of each of my models may change over time as new LoRA/TI are tested and/or new versions of HD2 are released.
