Image Positioner V1
Directed Image Positioning v1

#Concept
#FLUX

Tell it where you want the images to be placed in your scene with natural language prompting. Experimental.

...

After some experimentation, it does seem to improve image coherence and positional adherence. Pics below, with notes in the images.

I have added a public dataset of PNG images with JSON metadata, which I created with Python. The images are not captioned, but the dataset should be better than anything I found online if anyone cares to join me in my experiments.
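For anyone building on the dataset, here is a minimal sketch of how one image's metadata could be stored as JSON with Python. The field names (`file`, `width`, `height`, `shapes`, `bbox`) and the sample filename are assumptions for illustration only, not the actual schema of the published dataset.

```python
import json


def make_record(filename, width, height, shapes):
    """Bundle one image's metadata into a plain dict (hypothetical schema)."""
    return {
        "file": filename,
        "width": width,
        "height": height,
        # Each shape carries a bounding box as [left, top, width, height] in pixels.
        "shapes": shapes,
    }


record = make_record(
    "sample_0001.png",  # hypothetical filename
    1024,
    1024,
    [{"shape": "circle", "color": "red", "bbox": [64, 64, 128, 128]}],
)

# Serialize to text that could be written next to the PNG as a sidecar file.
text = json.dumps(record, indent=2)
print(text)
```

Keeping positions as plain numbers means captions can be generated or regenerated later without touching the images.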

I used GeoShapes images that were hand-labeled, noting only the smaller shapes and their approximate locations in varied plain-language descriptions with no formal scheme, and then let an AI add descriptive captions after the labels. If I had a more formal dataset of positional images, I could do magic.

From ChatGPT:

That’s a clever approach! Using GeoShapes with plain language labeling for training is an innovative start, as it provides the AI a loose "positional vocabulary" to work with. This flexibility allows it to generalize placement and context, which is why you’re seeing those natural, somewhat structured compositions.

To elevate this even further, a more formal positional dataset would be incredibly powerful. Here are a few ideas for constructing such a dataset to amplify the precision and adaptability of your model:

1. Gridded Coordinate System: Create images with shapes positioned according to a defined grid (e.g., 3x3 or 5x5). Each grid cell would represent a different placement: top-left, center-right, bottom-middle, etc. Labels would include grid-based coordinates (e.g., (1,1) for top-left), allowing for a finer positional mapping. This structured approach could help the AI learn precise placements.

2. Layered Position Descriptions: Instead of just approximate descriptions, use layered terms. For instance, combine location (e.g., top-left, bottom-center), size (small, medium, large), distance from edges, and angles (rotation, tilt). This would give the model richer metadata, allowing for more specific placements, rotations, and even layering of shapes.

3. Structured Language for Composition Rules: By integrating a formalized language structure that includes rules for composition (e.g., “centered vertically, offset slightly left horizontally”), the model could learn to arrange shapes based on relational positions—how one object should sit relative to others or the background.

4. Dynamic Background and Lighting Contexts: Incorporate variations in backgrounds and lighting to teach the model how positional accuracy changes in context. For example, adding shadows based on hypothetical light sources could make positional interpretation even more robust.

5. Use of Bounding Boxes or Anchor Points: Label images with bounding boxes or anchor points for shapes, with coordinates that specify exact placements. These markers would help the AI learn exact placements rather than just "approximate" positioning, useful for precise layouts.

6. Combined Shapes in Hierarchical Order: Introducing overlapping or adjacent shapes with a hierarchy of importance (e.g., primary, secondary) would let the model understand layered compositions. For example, a small red square within a larger green square can be tagged as "centered within green square," helping the AI learn containment and layering nuances.

Once you have this kind of dataset, you could train the AI to handle much more complex layouts, orientations, and compositions. With such a structured dataset, the potential for precise layout and arrangement—whether for graphic design, illustration, or even UI composition—would be almost endless. Let me know if you’d like to discuss any of these ideas further or explore ways to implement them!
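A few of the ideas above (the grid, the layered descriptions, and the bounding boxes) could be combined roughly like this. It is only a sketch: the `Box` class, the 3x3 grid, the size thresholds, and the caption wording are all assumptions, not how this LoRA's training data was actually labeled.

```python
from dataclasses import dataclass

ROWS = ["top", "middle", "bottom"]
COLS = ["left", "center", "right"]


@dataclass
class Box:
    """Bounding box in pixels (hypothetical helper)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def grid_cell(box, img_w, img_h, n=3):
    """Return the 1-indexed (row, col) grid cell containing the box center."""
    cx = box.x + box.w / 2
    cy = box.y + box.h / 2
    col = min(int(cx * n / img_w), n - 1)
    row = min(int(cy * n / img_h), n - 1)
    return row + 1, col + 1  # (1, 1) is top-left


def size_class(box, img_w, img_h):
    """Classify the box by the fraction of image area it covers (arbitrary cutoffs)."""
    frac = (box.w * box.h) / (img_w * img_h)
    if frac < 0.05:
        return "small"
    if frac < 0.25:
        return "medium"
    return "large"


def describe(name, box, img_w, img_h):
    """Build a plain-language positional caption for one shape on a 3x3 grid."""
    row, col = grid_cell(box, img_w, img_h)
    place = "center" if (row, col) == (2, 2) else f"{ROWS[row - 1]}-{COLS[col - 1]}"
    return f"a {size_class(box, img_w, img_h)} {name} in the {place} of the image"


print(describe("red square", Box(10, 10, 100, 100), 1024, 1024))
# a small red square in the top-left of the image
```

Because the caption is derived from the bounding box, the same data can yield both the exact coordinates and the loose natural-language phrasing.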

Announcement
2024-11-14
Model published
2024-11-14
Model information updated
Model details
Type
LORA
Published
2024-11-14
Base model
Flux.1 D
Version introduction

Re-uploaded to change the name. Don't be mad if it isn't perfect or doesn't work very well lol, I don't know yet.

License scope
Source: civitai

1. This model is shared solely for learning and sharing purposes. Copyright and final interpretation are reserved by the original author.

2. Authors wishing to claim the model: contact SeaArt AI officially for authentication. We protect the rights of every author.

Creation license scope
Online image generation
Merging
Downloading allowed
Commercial license scope
Generated images may be sold or used for commercial purposes
Reselling the model, or selling it after merging, is allowed.