
Media creation with artificial intelligence

'''GenAI systems operate probabilistically.''' Do not expect identical results when repeating prompts with the same inputs. The same text prompts may produce similar, but not identical, outputs. Therefore, in some scenarios, it can be beneficial to generate multiple results and select the most suitable candidates for your intermediate or final goal.
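Because results vary between runs, a common pattern is best-of-n selection: generate several candidates and keep the one that scores best for your goal. The Python sketch below is a toy simulation of this pattern, with a seeded random generator standing in for a real GenAI call; <code>generate</code>, <code>best_of_n</code>, and the scoring function are illustrative names, not any real API.

```python
import random

def generate(prompt, seed=None):
    """Toy stand-in for a GenAI call: the output depends on the prompt
    plus random sampling, so repeated calls differ unless seeded."""
    rng = random.Random(seed)
    variations = ["warm palette", "cool palette", "high contrast", "soft light"]
    return f"{prompt} [{rng.choice(variations)}, noise={rng.random():.3f}]"

def best_of_n(prompt, n, score):
    """Generate n candidates and keep the one the scoring function prefers."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

# Unseeded calls vary between runs; a fixed seed makes the result reproducible.
a = generate("castle at dusk", seed=42)
b = generate("castle at dusk", seed=42)
print(a == b)  # True
```

Real systems expose the same levers where supported: a seed parameter for reproducibility and an n/candidate-count parameter for best-of-n selection.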


==Sounds==
For natural voices you may want to look for emotional text-to-speech.


For professional-level fine-tuning you might want to look at editors like [https://www.celemony.com/en/melodyne/what-is-melodyne Melodyne].


==Images==
===Techniques===
Creating new content:
* Text-only prompts
* Text prompts with one or multiple references

Changing existing content:
* Expanding
* Inpainting (replacement of subsections)
* Style transfers


==Tools==
===Images===
[https://www.nvidia.com/en-us/glossary/multimodal-large-language-models/ Multimodal LLM]s and plugin-using LLMs such as ChatGPT, Copilot, Gemini, Grok and Meta AI provide image generation capabilities.
* In addition, there are specialized tools (e.g., diffusion-based systems) that offer more control and customization. However, most users already have access to at least one of these platforms and can begin generating images immediately.
For image editing and post-processing, dedicated graphics software such as [https://www.gimp.org/downloads/ GIMP], [https://krita.org Krita], or [https://www.adobe.com/products/photoshop.html Photoshop] is recommended. These tools allow precise control (e.g., masking, compositing, color correction) and can complement GenAI workflows.


====Beginner workflows====
Despite their limitations, chatbot-based workflows are still an improvement over purely manual workflows: they speed up prototyping and let you explore different creative directions for drafts. In general you will want to take these steps: generate, refine via text prompts, select final candidates, refine via graphics tools.
 
=====ChatGPT=====
* '''Iterative prompting''': Describe the desired result as clearly as possible: motif, perspective, colors, lighting, shadows, art style. Refine the prompt step by step based on undesired aspects rather than expecting a perfect result on the first attempt. Negative prompts: you can also explicitly state what you do not want.
* '''Avoid quality loss''': If the GenAI degrades image quality after too many iterations, start over in a fresh session with a single combined text prompt.
* '''Consistency strategies''': When available, use features such as seeds, style references, or controlled variations to maintain visual coherence across multiple images.-->


(Add examples here.)
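As a hedged illustration of the iterative-prompting and restart advice above, the Python sketch below shows how a session's refinement history (the strings are invented examples, not output from any real model) can be merged into one combined prompt for a fresh start after quality degradation:

```python
# Hypothetical prompt log from an iterative session.
refinements = [
    "A lighthouse on a cliff at sunset",
    "make the sky more dramatic, storm clouds",
    "watercolor style",
    "no birds",  # negative prompt: state what you do NOT want
]

def combined_prompt(history):
    """Merge an iteration history into one consolidated prompt,
    useful for restarting a fresh session after quality loss."""
    return "; ".join(history)

print(combined_prompt(refinements))
```

Pasting the combined prompt into a new chat reproduces the accumulated intent without the accumulated re-generation artifacts.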


=====Copilot=====
 
(Add examples here.)
<!--
Specialized: Canva-->


=====Gemini=====


====Advanced workflows====
AUTOMATIC1111 (a web UI for Stable Diffusion)


ComfyUI


====Expert workflows====
This would include training your own models. The idea is to give the model a neural representation of objects comparable to taking screenshots of a 3D scene, so that prompts almost never hallucinate details in the output. That way artists can reduce post-editing, as the generated outputs already reflect their own style.


Own models once more boost rapid prototyping because they reduce the need for a more complex, combined 3D-to-2D workflow.


==Videos==
'''Google Veo'''


'''Sora''' (OpenAI)
* In 2026, Sora was announced to be discontinued. It will probably just be paused for a few years until future AI chips have lowered computation costs.
==3D content==
There exist content generators that turn 2D data into 3D data by calculating plausible assumptions for the missing dimension.
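As a toy illustration of filling in a missing dimension, the Python sketch below lifts a 2D brightness grid into 3D points by treating brightness as height. This naive heuristic is purely for intuition; real 2D-to-3D generators use learned depth estimation rather than a direct brightness-to-height mapping.

```python
def heightmap_to_mesh_points(grid, scale=1.0):
    """Toy 2D-to-3D lift: treat each pixel's brightness as a height,
    a crude stand-in for inferring the missing third dimension."""
    points = []
    for y, row in enumerate(grid):
        for x, value in enumerate(row):
            points.append((x, y, value * scale))
    return points

# A 2x2 grayscale "image" becomes 4 points in 3D space.
pts = heightmap_to_mesh_points([[0.0, 0.5], [1.0, 0.25]], scale=10.0)
```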


===3D objects===
* ...
===3D animations===
* ...
==World maps==
* ...
* World generator inside Unreal Engine 5
** As of 2026, this is technically speaking still "procedural", but it is plausible to expect an LLM-driven approach in the future. LLM-driven experiments can already be found on the internet.
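To make the "procedural" distinction concrete, here is a minimal, purely illustrative terrain sketch in Python: a fixed seed always reproduces the same map, with no prompts or learned models involved. It is not related to Unreal Engine's actual generator.

```python
import random

def procedural_heightmap(width, height, seed=0):
    """Minimal procedural terrain: seeded random heights smoothed by
    averaging neighbors. Deterministic for a given seed, which is what
    distinguishes 'procedural' from prompt-driven generation."""
    rng = random.Random(seed)
    raw = [[rng.random() for _ in range(width)] for _ in range(height)]
    smooth = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            neighbors = [raw[ny][nx]
                         for ny in (y - 1, y, y + 1)
                         for nx in (x - 1, x, x + 1)
                         if 0 <= ny < height and 0 <= nx < width]
            smooth[y][x] = sum(neighbors) / len(neighbors)
    return smooth

m1 = procedural_heightmap(8, 8, seed=7)
m2 = procedural_heightmap(8, 8, seed=7)
print(m1 == m2)  # True: same seed, same world
```

An LLM-driven approach would instead map a text description to such data, trading this exact reproducibility for controllability via natural language.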