Paradox-01 (talk | contribs) mNo edit summary |
Paradox-01 (talk | contribs) m (→2D images) |
||
| Line 59: | Line 59: | ||
==Tools== | ==Tools== | ||
===2D images=== | ===2D images=== | ||
All | All major [https://www.nvidia.com/en-us/glossary/multimodal-large-language-models/ multimodal LLM]s such as ChatGPT, Claude, Gemini, Grok, and Mistral provide image generation capabilities. | ||
* In addition, there are specialized tools (e.g., diffusion-based systems) that offer more control and customization. However, most users already have access to at least one of these platforms and can begin generating images immediately. | |||
* For ''high-volume generation'', a paid subscription or plan is typically required to increase rate limits and output capacity. | |||
For image editing | For image editing and post-processing, dedicated graphics software such as [https://www.gimp.org/downloads/ GIMP], [https://krita.org Krita], or [https://www.adobe.com/products/photoshop.html Photoshop] is recommended. These tools allow precise control (e.g., masking, compositing, color correction) and can complement GenAI workflows. | ||
'''Beginner workflow''' | |||
( | Mini tutorials based on ChatGPT, though the steps should be similar for all MLLMs. | ||
* '''Iterative prompting''': Describe the desired result as clearly as possible: subject, perspective, colors, lighting, shadows, art style. Refine the prompt step by step, addressing undesired aspects one at a time rather than expecting a perfect result on the first attempt. | |||
* '''Avoid quality loss''': If image quality degrades after too many iterations, start over in a new chat and merge the earlier refinements into a single text prompt. | |||
* '''Reference images''': When supported, provide one or more images to guide style, composition, or subject consistency. This is often more reliable than text-only prompting. | |||
* '''Context management''': If previous prompts begin to overly influence results, start a new chat and explicitly restate the desired outcome. This keeps earlier context from biasing the result. | |||
* '''Merging / composition''': Supplying multiple images in a single prompt can help combine elements. However, repeated re-editing of generated outputs may degrade detail or introduce artifacts. | |||
* '''Batch generation''': Since outputs are probabilistic, generate multiple final candidates and select the best. | |||
* '''Post-processing workflow''': Combine the best elements using external tools (e.g., masking in Photoshop or Krita). This hybrid approach often yields higher-quality results than relying on a single generation.<!--Not for these MLLMs: | |||
* '''Consistency strategies''': When available, use features such as seeds, style references, or controlled variations to maintain visual coherence across multiple images.--> | |||
Generate, refine via text prompts, select final candidate, refine via graphic tools. | |||
Despite its limitations, this workflow is well suited to rapid prototyping and to exploring different creative directions for drafts. | |||
(Add examples here.) | |||
<!-- | |||
Specialized: Canva--> | |||
'''Advanced workflow''' | |||
AUTOMATIC1111 (a web UI for Stable Diffusion), ComfyUI | |||
===Videos=== | ===Videos=== | ||