Media creation with artificial intelligence: Difference between revisions

Revision as of 15:10, 24 March 2026

1,392 bytes added, Tuesday at 15:10

m

→‎2D images

Paradox-01

@@ Line 59: / Line 59: @@
 ==Tools==
 ===2D images===
-All big LLM applications such as ChatGPT, Claude, Gemini, Grok and Mistral support image generation. Of course there are other tools but you probably already at least one of these. Therefore you have there an account and can instantly use it for image generation. For ''mass production'' you probably need a paid subscription or "plan" so that more images can be created in a defined time frame (by that plan).
+All major [https://www.nvidia.com/en-us/glossary/multimodal-large-language-models/ multimodal LLM]s such as ChatGPT, Claude, Gemini, Grok, and Mistral provide image generation capabilities.
+* Specialized tools (e.g., diffusion-based systems) offer more control and customization. However, most users already have an account with at least one of the LLM applications above and can begin generating images immediately.
+* For ''high-volume generation'', a paid subscription or plan is typically required to increase rate limits and output capacity.
-For image editing you also want a specialized graphics tool such a [https://www.gimp.org/downloads/ gimp], [https://krita.org krita] or [https://www.adobe.com/products/photoshop.html Photoshop] (which itself has GenAI functions).
+For image editing and post-processing, dedicated graphics software such as [https://www.gimp.org/downloads/ GIMP], [https://krita.org Krita], or [https://www.adobe.com/products/photoshop.html Photoshop] is recommended. These tools allow precise control (e.g., masking, compositing, color correction) and can complement GenAI workflows.
-<!--Mini tutorials-->Easy Access: '''ChatGPT''' (The notes here may work the same for other well known LLMs.)
+'''Beginner workflow'''
-* Prompt exactly what you want (even if it just "higher quality"). Either it works or not.
-* Merging: When possible drop both images at the same time into the prompt. Re-editing an image means a loss of details.
-* The the context window gained to much control over the currently expected output, then start a new prompt that includes all the accumulated changes you want to make.
-* When you subscription allows it, output multiple final images. As every piece will be different, use have to chose the best one. You can also photoshop (merge) multiple final images to together by using masks.
-(Add some examples here.)
+Mini tutorials based on ChatGPT; the steps should be similar for other MLLMs.
+* '''Iterative prompting''': Describe the desired result as clearly as possible: subject (motif), perspective, colors, lighting, shadows, art style. Refine the prompt step by step to correct undesired aspects rather than expecting a perfect result on the first attempt.
+* '''Avoid quality loss''': If image quality degrades after too many iterations, start over with a single text prompt that combines all the accumulated changes.
+* '''Reference images''': When supported, provide one or more images to guide style, composition, or subject consistency. This is often more reliable than text-only prompting.
+* '''Context management''': If previous prompts begin to overly influence results, start a new prompt and explicitly restate the desired outcome. This prevents unintended bias from earlier context.
+* '''Merging / composition''': Supplying multiple images in a single prompt can help combine elements. However, repeated re-editing of generated outputs may degrade detail or introduce artifacts.
+* '''Batch generation''': Since outputs are probabilistic, generate multiple final candidates and select the best.
+* '''Post-processing workflow''': Combine the best elements using external tools (e.g., masking in Photoshop or Krita). This hybrid approach often yields higher-quality results than relying on a single generation.<!--Not for these MLLMs:
+* '''Consistency strategies''': When available, use features such as seeds, style references, or controlled variations to maintain visual coherence across multiple images.-->
-Specialized: Canva
+In short: generate, refine via text prompts, select a final candidate, then refine with graphics tools.
-Local solutions: AUTOMATIC1111 (aka Stable diffusion), ComfyUI
+Despite its limitations, this workflow works well for rapid prototyping and for exploring different creative directions in drafts.
+(Add examples here.)
+<!--
+Specialized: Canva-->
+'''Advanced workflow'''
+Local solutions: AUTOMATIC1111 (a web UI for Stable Diffusion), ComfyUI
 ===Videos===