hello guys this is my first article about image generation , will improve with upcoming articles//
This article is basically a basic guide + test and comparison of "t2i" image generation of new alibaba's image generation model "Z-IMAGE-TURBO" with other models consisting of-
- flux.2 -dev
- flux.2 pro
- gemini nano banana 2/pro
basic guide for "Z-IMAGE-TURBO" image generation-
1.use tag based prompting-
a. [less effective]-" mid shot image of a girl sitting in a chair on her room looking front, night time, only light source is moonlight"
b. [more effective]-" girl, sitting on chair, looking front, mid-shot, dark room, night time, moon light "

2.Chinese prompt sometimes gives slight better prompt adherence
prompt: a samurai, riding a white horse, holding a long sword, raining, forest background, cinematic, dynamic front angle

3.For me currently this settings are best ( will update if i find better)
aura flow- 3-7
sampler- euler/ dpmpp_sde
scheduler- simple/normal/beta57/ddim_uniform
steps- 6-10
cfg- 1
"as the cfg is 1 "negative prompts" doesn't make any difference"
4.keep the prompts precise, the more precise it is the better the results
5.model recognizes many celebs, famous art styles, artists, i cant show examples here but you can try it
Comparison with other models
each image is generated in 1200*1200 resolution,
generation time-
flux.2 dev fp8 = 2- 2.30 min
flux.2 pro = 20-35 sec
nano banana pro= 20-35 sec
z-image turbo= 15-25 sec
test-1 = 20yr old northeast indiana tribal lady, tribal clothes, smiling, camera angle upto chest, "olive green right eye", "caramel brown left eye", real life based image, clear, hd image, non blurry,, camera focused on eyes, sunlight refracting on face

TEST-2 = lion resting under a tree, realistic wildlife, savannah environment, sunny warm lighting, warthog beside lion, warthog alert and looking around, meerkat standing on warthog’s back, natural colors, detailed animal fur, cinematic composition, National Geographic style, high resolution, sharp focus

TEST-3 = a japanese male, sitting on chair, both legs over desk, "smoking ciggratte", back view, holding beer can on one hand, looking directly up toward camera, irritated expression, messy room, captured from a high far overhead camera angle, fish eye lens, curved edges, exaggerated perspective

TEST-4 = female ballet dancer performing in mid-air, full body visible, wearing a flowing pink ballet dress, holding a golden plate in one hand with a wine glass filled with red wine, sad emotional expression, camera focused tightly on her sad face while capturing full body motion, dramatic cinematic lighting, dynamic pose with dress fluttering, graceful movement frozen mid-leap, high detail fabric textures, realistic shadows, soft volumetric light, elegant atmosphere, high resolution, dramatic composition

TEST-5 = four portrait images of the same woman, consistent face, 1st image happy expression, 2nd image crying expression with tears, 3rd image angry expression, 4th image goofy playful expression, high detail facial features, sharp focus, natural colors, portrait style, high resolution

test-6 = person performing a complex breakdance move mid-air, extreme dynamic pose, powerful body twist, strong muscle tension, cinematic low-angle shot, dramatic motion blur on limbs, crisp focus on the dancer’s face, urban street setting with concrete floor, scattered dust kicked up from movement, intense stage-like lighting, volumetric shadows, high-contrast highlights, dynamic composition, energetic atmosphere, realistic clothing folds, detailed textures, high resolution, action-packed visual impact

TEST-7 = street skateboarder escaping police chasing him, high-speed action, cinematic urban scene, dramatic motion blur, dynamic angle, captured from a random NPC’s phone camera, slightly shaky handheld feel, late afternoon street lighting, realistic city environment, police running behind, skateboarder weaving through pedestrians, intense expressions, gritty modern look, high detail, natural colors, sharp focus on subject, sense of urgency and movement

MODEL NOTES-
1: as model is trained more on asian characters than other region persons, it many times on default creates asians in order to create different persons be "specific" about such region of person
2: while model is decent at generating atmosphere, it lacks in generating high defined atmosphere or it takes many attempts to get such result, eg. "hailstorms, windstorms, strong winds, water pressures, sun complex rays, seasonal complex atmosphere" it may generate result, but not good and precise as other models
3: as specified before its great in generating many real life characters
4: although model is good at generating non realistic/2d/anime/semi realistic/ arts, it will not directly give you exceptional results, the reason is model is trained highly on realism, the best way to generate non realistic/2d arts is to mention similar artists art style/ anime name/ studio art style (eg. ghibli studio art style, mappa studios art, marvel comic style, vincent van gogh art,)
5: prompt adherence is good at short to mid long prompts with super long prompts it may no generate accurate results so keep it mid length and precise
6: texts on image is accurate but okish in styles not that great like its image generation style
overall:
" Z-image-turbo" is a great model, with its exceptional speed and image generation quality, cant wait for loras to replace its cons" you can try it yourself here- https://www.seaart.ai/workFlowDetail/d4jsu05e878c73fgtp8g
thank you guys for reading this!!, as its a turbo model version i have not shown much of a examples, i am waiting for base model and edit image model version to release, so i can show various types of examples and my view on the model, i will update this article if i find more interesting techniques of this model,
as this is my first article i don't have much experience of article writing please leave comments if you find something interesting about " Z-image- turbo" or ways to improve my article writing,
