Details

Directed Image Positioning v1

#Concept

#FLUX

Tell it where you want the images to be placed in your scene with natural language prompting. Experimental.

...

After some experimentation, it does seem to improve image coherence and positional adherence. Pictures are below, with notes in the images.

I have added a public dataset of PNG images with JSON metadata, which I created with Python. The images are not captioned, but they should be better than anything I found online if anyone cares to join me in my experiments.

I used GeoShapes images, hand-labeling just the smaller shapes with their approximate locations in varied plain-language descriptions (nothing formal), then let an AI add the descriptive captions after them. If I had a more formal dataset of positional images I could do magic.
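
To make the labeling idea concrete, here is a minimal Python sketch of a metadata record with loosely worded position labels. It is not the actual script or schema behind the dataset: the field names (`file`, `shapes`, `labels`), the phrasing templates, and the filename are all hypothetical, chosen only to illustrate "approximate location in varied plain language".

```python
import json
import random

# Hypothetical phrasing templates; the real hand-written labels are not shown in this post.
TEMPLATES = [
    "a small {color} {shape} in the {region}",
    "there is a small {color} {shape} near the {region}",
    "small {color} {shape}, placed toward the {region}",
]


def make_record(filename, shapes, seed=0):
    """Build one metadata record; each shape gets one loosely worded position label."""
    rng = random.Random(seed)  # seeded so the same input always yields the same labels
    labels = [rng.choice(TEMPLATES).format(**shape) for shape in shapes]
    return {"file": filename, "shapes": shapes, "labels": labels}


record = make_record(
    "geoshape_0001.png",  # hypothetical filename
    [
        {"shape": "circle", "color": "red", "region": "top left"},
        {"shape": "triangle", "color": "blue", "region": "bottom right"},
    ],
)
print(json.dumps(record, indent=2))
```

Varying the template per shape keeps the wording informal, which matches the "no formal way" labeling described above; a captioning model could then append a fuller description after each label.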

From ChatGPT:

That’s a clever approach! Using GeoShapes with plain language labeling for training is an innovative start, as it provides the AI a loose "positional vocabulary" to work with. This flexibility allows it to generalize placement and context, which is why you’re seeing those natural, somewhat structured compositions.

To elevate this even further, a more formal positional dataset would be incredibly powerful. Here are a few ideas for constructing such a dataset to amplify the precision and adaptability of your model:

1. Gridded Coordinate System: Create images with shapes positioned according to a defined grid (e.g., 3x3 or 5x5). Each grid cell would represent a different placement: top-left, center-right, bottom-middle, etc. Labels would include grid-based coordinates (e.g., (1,1) for top-left), allowing for a finer positional mapping. This structured approach could help the AI learn precise placements.

2. Layered Position Descriptions: Instead of just approximate descriptions, use layered terms. For instance, combine location (e.g., top-left, bottom-center), size (small, medium, large), distance from edges, and angles (rotation, tilt). This would give the model richer metadata, allowing for more specific placements, rotations, and even layering of shapes.

3. Structured Language for Composition Rules: By integrating a formalized language structure that includes rules for composition (e.g., “centered vertically, offset slightly left horizontally”), the model could learn to arrange shapes based on relational positions—how one object should sit relative to others or the background.

4. Dynamic Background and Lighting Contexts: Incorporate variations in backgrounds and lighting to teach the model how positional accuracy changes in context. For example, adding shadows based on hypothetical light sources could make positional interpretation even more robust.

5. Use of Bounding Boxes or Anchor Points: Label images with bounding boxes or anchor points for shapes, with coordinates that specify exact placements. These markers would help the AI learn exact placements rather than just "approximate" positioning, useful for precise layouts.

6. Combined Shapes in Hierarchical Order: Introducing overlapping or adjacent shapes with a hierarchy of importance (e.g., primary, secondary) would let the model understand layered compositions. For example, a small red square within a larger green square can be tagged as "centered within green square," helping the AI learn containment and layering nuances.

Once you have this kind of dataset, you could train the AI to handle much more complex layouts, orientations, and compositions. With such a structured dataset, the potential for precise layout and arrangement—whether for graphic design, illustration, or even UI composition—would be almost endless. Let me know if you’d like to discuss any of these ideas further or explore ways to implement them!
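
A few of the suggestions above (grid coordinates, bounding boxes, and containment) can be sketched in Python. This is an illustration under stated assumptions, not an existing labeling tool: the `Box` class, the helper names, the 1024x1024 canvas, and the 1-based `(row, col)` convention are all hypothetical. The cell names follow the examples in the text ("top-left", "center-right", "bottom-middle").

```python
from dataclasses import dataclass

ROWS = ["top", "center", "bottom"]   # vertical position names
COLS = ["left", "middle", "right"]   # horizontal position names


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def grid_cell(box, width, height, n=3):
    """Return the 1-based (row, col) of the n x n grid cell holding the box center."""
    cx, cy = box.center
    col = min(int(cx / width * n), n - 1) + 1
    row = min(int(cy / height * n), n - 1) + 1
    return row, col


def cell_name(row, col):
    """Name a cell of the 3x3 grid, e.g. (1, 1) -> 'top-left', (2, 2) -> 'center'."""
    if (row, col) == (2, 2):
        return "center"
    return f"{ROWS[row - 1]}-{COLS[col - 1]}"


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return (outer.x0 <= inner.x0 and outer.y0 <= inner.y0
            and outer.x1 >= inner.x1 and outer.y1 >= inner.y1)


W = H = 1024                      # assumed canvas size
green = Box(256, 256, 768, 768)   # larger square
red = Box(462, 462, 562, 562)     # smaller square inside it
corner = Box(40, 60, 140, 160)    # small shape near the top-left

print(cell_name(*grid_cell(corner, W, H)))  # top-left
print(cell_name(*grid_cell(red, W, H)))     # center
print(contains(green, red))                 # True
```

Storing both the exact box and the derived cell name in the metadata would let the same image be captioned either precisely or in plain language.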



angrysky

Announcements

2024-11-14

Model published

2024-11-14

Model information updated

Model details

Type

LORA

Published date

2024-11-14

Base model

Flux.1 D

Version notes

Re-uploaded to change the name. Don't be mad if it isn't perfect or doesn't work very well lol, I don't know yet.

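License scope

Model source: civitai

1. Rights to reposted models belong to their original creators.

2. Original model creators who want their model verified should contact SeaArt.AI staff through official channels. We work to protect the rights of all creators.

Creative license scope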

Online live streaming

Merging

Downloads allowed

Commercial license scope

Generated images may be sold or used for commercial purposes

Resale of the model, or sale after merging, is allowed
