Training run 1, epoch 8
Dataset 1
Images
Count: 59 images
Sources:
karsh.org
Treatment:
All source images were used as-is.
Flip augmentation: enabled
Tagging
Methodology: Images were initially tagged by BLIP (model_base_caption_capfilt_large). Tags were edited by model author for correctness and consistency.
Tags were structured following this pattern:
Tag 1: "[monochrome ]photograph in karsh style of"
"monochrome" omitted if picture has colour.
Tag 2: a brief description of the subject and scene
Tag 3: "{close|medium|full|wide} shot"
"close", "medium", "full" largely used for portraiture.
"wide" largely used for landscape/street photographs, or photographs where a person was not the primary focus.
Tag order was preserved during training.
Training
Dataset replicates/epoch: 10
Epochs: 8
Batch size: 4
Checkpoint: sd_xl_base_1.0.safetensors
Network configuration: Kohya Standard (dim=16, alpha=8)
LR: 0.0001 for both U-Net and text encoder. Algorithm was "cosine_with_restarts" with 5% warmup.
Optimizer: AdamW






