Media creation with artificial intelligence: Difference between revisions

From OniGalore
(28 intermediate revisions by the same user not shown)
Line 1: Line 1:
[[Category:Real World]]
[[Category:Real World]]
Sub pages:
* [[Media_creation_with_artificial_intelligence/Examples|Examples gallery]]
==Copyright and fair use==
==Copyright and fair use==
To understand the full picture of copyright, it is necessary to look at its real-world implementation.
To understand the full picture of copyright, it is necessary to look at its real-world implementation.
Line 10: Line 14:
* A '''space for like-minded people''' to socialize and exchange and develop ideas.
* A '''space for like-minded people''' to socialize and exchange and develop ideas.
** Such spaces were forums in the past, nowadays they are real-time communication platform like a Discord channel which is thematically decorated and helps to keep the product in '''general awareness''' over long time frames.
** Such spaces were forums in the past, nowadays they are real-time communication platform like a Discord channel which is thematically decorated and helps to keep the product in '''general awareness''' over long time frames.
** Communities pose a '''potential source for researching [https://www.tumblr.com/dragon-ball-meta/175722199189/toriyamas-official-comments-on-broly-for-the-new future products]'''.<!--//Commented out for now: Well, this is interesting but it quite lengthens the section.// The companies cannot simply "listen to the (hardcore) fans" but have to develop a profitable package that lures as many customers as possible. Nonetheless, '''fan service''' remains an important aspect which ranges from '''[https://www.latimes.com/entertainment-arts/business/story/2020-02-14/sonic-the-hedgehog-got-a-digital-makeover-after-an-internet-backlash-will-it-work respecting the original]''', placing '''easter eggs''' and '''insider jokes''' - and not ''Japanese''-like fan service which in the Western world basically translates to "sex sells".
** Communities pose a '''potential source for researching [https://www.tumblr.com/dragon-ball-meta/175722199189/toriyamas-official-comments-on-broly-for-the-new future products]'''.<!--//Commented out for now: Well, this is interesting but it quite lengthens the section.// The companies cannot simply "listen to the (hardcore) fans" but have to develop a profitable package that lures as many customers as possible. Nonetheless, '''fan service''' remains an important aspect which ranges from '''[https://www.latimes.com/entertainment-arts/business/story/2020-02-14/sonic-the-hedgehog-got-a-digital-makeover-after-an-internet-backlash-will-it-work respecting the original]''', to placing '''easter eggs''' and '''insider jokes''' - and not ''Japanese''-like fan service which in the Western world basically translates to "sex sells".
*** Besides the dependencies of the target group, it can be good choice to not exaggerate and provide consistent overall concept made from the envisioned world and its characters: [[Konoko#Design_by_Lorraine|Lorraine]]: "Lorraine McLees, Bungie artist, remembers the slight tug of war between the Chicago and Bungie West [offices]: 'Their vision of (Konoko) was more overtly -blam!- than ours. We specifically didn't want her to be just another Lara Croft, so we'd go back and forth with the California office. They'd send Konoko designs back with a bare midriff and the bottom of her breasts exposed, and we'd go back and re-clothe her. The funny thing is that the more we tried to de--blam!-ize her, the more sexy she became.")-->
*** Besides the dependencies of the target group, it can be good choice to not exaggerate and provide consistent overall concept made from the envisioned world and its characters. [[Konoko#Design_by_Lorraine|Bungie History]]: "Lorraine McLees, Bungie artist, remembers the slight tug of war between the Chicago and Bungie West [offices]: 'Their vision of (Konoko) was more overtly -blam!- than ours. We specifically didn't want her to be just another Lara Croft, so we'd go back and forth with the California office. They'd send Konoko designs back with a bare midriff and the bottom of her breasts exposed, and we'd go back and re-clothe her. The funny thing is that the more we tried to de--blam!-ize her, the more sexy she became.")-->
* Support in organizing competitions and other promotional events, including creative activities such as fan art. Cosplay contributions can increase a company's '''visibility''' at events like Gamescom.
* Support in organizing competitions and other promotional events, including creative activities such as fan art. Cosplay contributions can increase a company's '''visibility''' at events like Gamescom.
* An '''additional channel for updates''' (general information and content announcements).
* An '''additional channel for updates''' (general information and content announcements).
Line 29: Line 33:


Generative AI (GenAI) has the potential to disrupt this cost–benefit balance:
Generative AI (GenAI) has the potential to disrupt this cost–benefit balance:
* The large-scale or automated production of new content based on existing assets may conflict with the company’s interests.
* An automated large-scale production of new content based on existing assets may conflict with the company’s interests.
Considering the '''artists and technicians''' involved in creating the original work, mass-produced fan content generated at little or no cost could threaten established creative professions.
Considering the '''artists and technicians''' involved in creating the original work, mass-produced fan content generated at little or no cost could threaten established creative professions.
* An overabundance of derivative content may '''dilute attention''' and reduce consumer motivation to engage with official products.
* An overabundance of derivative content may '''dilute attention''' and reduce consumer motivation to engage with official products.
Line 44: Line 48:
You can use GenAI via websites, desktop clients, or dedicated programs that may run fully locally.
You can use GenAI via websites, desktop clients, or dedicated programs that may run fully locally.


To build your own programmatic solutions, you will either need downloadable AI models or API keys to access cloud services that perform the heavy computation remotely. With sufficient expertise, you can even build agentic AIs (such as Open Claw) that use your existing tools and carry out tasks automatically. However, caution is advised: Probabilistic AIs can hallucinate and may pose a risk to your system. Internally MCP are used for more safety but these don't compensate to have backup means. Sandboxes offer an additional layer of safety, but they can limit the usefulness of agents and introduce extra complexity, which may offset the time savings you intended to achieve.
To build your own programmatic solutions, you will either need downloadable AI models or API keys to access cloud services that perform the heavy computation remotely.
 
With sufficient expertise, you can even build agentic AIs such as Open Claw that use your existing tools and carry out tasks automatically. However, caution is advised: Probabilistic AIs can hallucinate and may pose a risk to your system. Internally, [[wp:Model_Context_Protocol|MCP]] are used for more safety but these don't compensate to have backup means. Sandboxes offer an additional layer of safety, but they can limit the usefulness of agents and introduce extra complexity, which may offset the time savings you intended to achieve.
 
'''GenAI systems operate probabilistically.''' Do not expect identical results when repeating prompts with the same inputs. The same text prompts may produce similar, but not identical, outputs. Therefore, in some scenarios, it can be beneficial to generate multiple results and select the most suitable candidates for your intermediate or final goal.
 
==Sounds==
For natural voices you may want to look for emotional text-to-speech.
 
===Voice cloners and generators===
====Elevenlabs====
* https://elevenlabs.io/
 
===Music generators===
====Suno====
* https://suno.com/create
 
====Lyria (Google Gemini)====
* https://gemini.google/overview/music-generation/
 
===Editors===
For fine-tuning on professional level you might want look at editors like [https://www.celemony.com/en/melodyne/what-is-melodyne Melodyn].
 
==Images==
===Techniques===
Creating new content.
* Text-only prompts
* Text prompts with one or multiple references
** As default, the first image serves as main canvas that gets modified by the other references. The order of references can be overridden by pieces in the text prompt. You have to be careful in describing what reference is used for what if there are multiple.
 
Changing existing content
* Expanding
* Inpainting (replacement of subsections)
* Style transfers
 
===Select best tool per use-case===
====Overview====
Right now, Gemini seems to perform best in most use-cases.
 
(Add table.)
 
====Drafting====
 
====Upscaling====
Very often you will prefer a generated style of one tool over the style of other tools. However, that shouldn't stop you from trying out other tools for other tasks. In the end a combination of generators can get you closer to the result you had in mind.
 
For instance, if you like ChatGPT for the style then maybe you still want to scale up a draft or reference image in Gemini first.
 
=====Gemini=====
* Free Users: Generally capped at 1K resolution.
* AI Plus/Pro Subscribers: Can access 2K resolution.
* AI Ultra Subscribers: Have full access to the 4K resolution toggle and downloads.
 
====Fine-drawing====


'''GenAI systems operate probabilistically.''' Do not expect identical results when repeating prompts with the same inputs. The same text prompts may produce similar, but not identical, outputs. Therefore, in some scenarios, it can be beneficial to generate multiple results and select the most suitable candidate for your intermediate or final goal.
====Generating====


===Sounds, voice acting and music===
====Coloring====
* Voice-cloning of existing or creation of new voices. For natural voices you may want to look for emotional text-to-speech.
* Music generations


===Image generation===
====Shading====
* Content generation based on:
** Text prompts
** Own drafts
** Merging (main image and references)
* Changing existing content
** Expanding
** Inpainting (replacement of subsections)
** Style transfers


===3D content generation===
====Editing and fine-tuning====
There exists content generators that turn 2D data into 3D data by calculating plausible assumptions for the missing dimension.


==Tools==
===Workflows===
===Images===
====Beginner workflows====
[https://www.nvidia.com/en-us/glossary/multimodal-large-language-models/ Multimodal LLM]s and plugins-using LLMs such as ChatGPT, Copilot, Gemini, Grok and Meta AI provide image generation capabilities.
[https://www.nvidia.com/en-us/glossary/multimodal-large-language-models/ Multimodal LLM]s and plugins-using LLMs such as ChatGPT, Copilot, Gemini, Grok and Meta AI provide image generation capabilities.
* In addition, there are specialized tools (e.g., diffusion-based systems) that offer more control and customization. However, most users already have access to at least one of these platforms and can begin generating images immediately.
* In addition, there are specialized tools (e.g., diffusion-based systems) that offer more control and customization. However, most users already have access to at least one of '''ChatBot''' and can begin generating images immediately.
* For ''high-volume generation'', a paid subscription or plan is typically required.
* For '''high-volume generation''', a '''paid subscription''' or plan is typically '''required'''.
 
'''For image editing''' and post-processing, '''dedicated graphics software''' such as [https://www.gimp.org/downloads/ GIMP], [https://krita.org Krita], or [https://www.adobe.com/products/photoshop.html Photoshop] '''is recommended'''. These tools allow precise control (e.g., '''masking, compositing, color correction''') and can '''complement GenAI workflows'''.
 
In general you will always want to takes these steps: Generate, refine via text prompts, select final candidates, refine via graphic tools.
 
In context of its limitations, chatbot-based workflows are most often nonetheless a big '''improvement over pure manual workflows''': They '''speed up prototyping''' and let you explore different creative directions for drafts.
 
When you want to work with chatbots, you effectively have to learn [[wp:prompt_engineering|prompt engineering]]: You basically learn how to write ''good prompts''.


For image editing and post-processing, dedicated graphics software such as [https://www.gimp.org/downloads/ GIMP], [https://krita.org Krita], or [https://www.adobe.com/products/photoshop.html Photoshop] is recommended. These tools allow precise control (e.g., masking, compositing, color correction) and can complement GenAI workflows.
=====General notes on image upgrading=====
For upgrading a very low quality yet important image you probably want to '''upgrade specific elements first''' so details are not hallucinated to an unacceptable level.


'''Beginner workflow'''
text prompt + low quality image + in advance updated elements used as references in text prompt = higher quality image


Mini tutorial based on ChatGPT:
=====ChatGPT=====
* '''Iterative prompting''': Describe the desired result as clearly as possible: Motive, perspective, colors, lights, shadows, art style. Refine the prompt step by step based on undesired aspects rather than expecting a perfect result on the first attempt. Negative prompts: You can also explicitly write what you don't want.  
* '''Iterative prompting''': Describe the desired result as clearly as possible: Motive, perspective, colors, lights, shadows, art style. Refine the prompt step by step based on undesired aspects rather than expecting a perfect result on the first attempt. Negative prompts: You can also explicitly write what you don't want.
** Caution: As for March 2026, you have explicitly tell ChateGPT to not crop the image in some scenarios. In that case you have to give it context like where the image should end and tell it how much "extra" space you want. Saying "don't crop the image" is to not specific enough.  
* '''Avoid quality loss''': If the GenAI degenerates the image quality because of too many iterations, try from a new start with combined text prompts.
* '''Avoid quality loss''': If the GenAI degenerates the image quality because of too many iterations, try from a new start with combined text prompts.
* '''Reference images''': When supported, provide one or more images to guide style, composition, or subject consistency. This is often more reliable than text-only prompting.
* '''Reference images''': When supported, provide one or more images to guide style, composition, or subject consistency. This is often more reliable than text-only prompting.
Line 85: Line 141:
* '''Consistency strategies''': When available, use features such as seeds, style references, or controlled variations to maintain visual coherence across multiple images.-->
* '''Consistency strategies''': When available, use features such as seeds, style references, or controlled variations to maintain visual coherence across multiple images.-->


Generate, refine via text prompts, select final candidate, refine via graphic tools.
(Add examples here.)
 
=====Copilot=====
 
=====Gemini=====


In context of its limitations, this workflow is still great for rapid prototyping and exploring different creative directions for drafts.
=====Paint (Cocreator)=====
With '''Windows 11 and 40 TOPS minimum''' you can use '''Microsoft Paint with its Cocreator module'''. The Cocreator is sometimes also named Image Creator. (The Tera Operations Per Second is usually referring to INT8 operations on AI accelerator hardware, NPUs.) Windows PCs that have the naming tag '''Copilot+''' are safe to assume to have that feature.
* You write a prompt, optionally select a style and then draw a draft that gets updated almost in real-time in a secondary panel.


(Add examples here.)
=====Photoshop (Adobe Firefly)=====
<!--
Photoshop has a build-in image generator which can use Adobe's own Firefly model as well as other ones.
Specialized: Canva-->


'''Advanced workflow'''
In Photoshop you can prompt images and immediately start editing them. Or you chose expand them first or do partial replacements.


AUTOMATIC1111 (aka Stable diffusion), ComfyUI
====Advanced workflows====
AUTOMATIC1111 (aka Stable diffusion)


'''Expert workflow'''
ComfyUI


This would include to train own models. The idea is to let the models have an mental image of objects that equals screenshot-taking from 3D so that prompts will output images that almost never include hallucinated details.
====Expert workflows====
This would include to train own models. The idea is to let the models have a neural representation of objects that equals to screenshot-taking from 3D so that prompts will output almost never hallucinate details. That way artists can reduce post-editing as the generated outputs also include there own styles.


Own models are interesting for rapid prototyping because they reduce the necessity to have a more complexer, combined 2D-3D-workflow.
Own models once more boost rapid prototyping because they reduce the necessity to have a more complexer, combined 3D-2D-workflow.


===Videos===
==Videos==
'''Google Veo'''
===Google Veo===


'''Grok''' (xAI)
===Grok (xAI)===
   
   
'''Sora''' (OpenAI)
===Sora (OpenAI)===
* In 2026, Sora was announced to be discontinued. It will be probably just paused for a few years until future AI chips have lowered computation costs.
* In 2026, Sora was announced to be discontinued. It will be probably just paused for a few years until future AI chips have lowered computation costs.
==3D content==
There exists content generators that turn 2D data into 3D data by calculating plausible assumptions for the missing dimension.


===3D objects===
===3D objects===
* ...
* ...
===3D animations===
===3D animations===
* ...
* ...
===World generators===
 
* World generator inside Unreal Engine 5 (As for 2026, this is technically speaking still "procedural" but it is plausible to expect an LLM-driven approach in the future. On the internet you can find already LLM-driven experiments.)
==World maps==
 
* ...
* ...
* World generator inside Unreal Engine 5
** As for 2026, this is technically speaking still "procedural" but it is plausible to expect an LLM-driven approach in the future. On the internet you can find already LLM-driven experiments.

Latest revision as of 14:12, 28 March 2026

Sub pages:


Copyright and fair use

To understand the full picture of copyright, it is necessary to look at its real-world implementation.

Game companies have a strong interest in not upsetting their fan base, especially organized structures such as gaming communities.

Game communities function as voluntary support structures that provide services companies would otherwise need to fund themselves. Many of these contributions can be understood as direct or indirect sales promotion. These communities typically offer:

  • First contact and general help for newbies.
  • The creation of guides, tips and tricks, and even complete playthroughs.
  • A space for like-minded people to socialize, exchange, and develop ideas.
    • In the past, such spaces were forums; nowadays they are real-time communication platforms, such as a thematically decorated Discord channel, which help keep the product in general awareness over long time frames.
    • Communities are a potential source of research for future products.
  • Support in organizing competitions and other promotional events, including creative activities such as fan art. Cosplay contributions can increase a company's visibility at events like Gamescom.
  • An additional channel for updates (general information and content announcements).
  • Bug reporting and, in some cases, bug fixing. In rare instances, community members may even contribute to maintaining source code.
  • Mods that improve replay value and thus increase overall customer satisfaction.
  • Increased likelihood that fans will purchase other games and products (merchandise) from the company.
  • A pool of trusted and engaged players who can be recruited as beta testers for new releases.

As a result, most companies also employ community managers, at least for active cash cows.

In practice, companies often tolerate limited uses of their intellectual property because they benefit from these activities. A strictly enforced copyright regime could suppress creative community contributions, reduce engagement, and ultimately harm the company itself.

However, this does not mean that copyright is overridden. In some cases, fan works may fall under fair use (depending on jurisdiction), but more often they exist within a space of informal tolerance or explicit licensing policies.

Game modifications (mods) often remain short of becoming independent games:

  • Typically, they add optional 2D and 3D content, sometimes in large quantities. However, more fundamental changes - such as new game mechanics - often require access to or modification of the game engine and are therefore limited.

Generative AI (GenAI) has the potential to disrupt this cost–benefit balance:

  • Automated large-scale production of new content based on existing assets may conflict with the company’s interests.

Considering the artists and technicians involved in creating the original work, mass-produced fan content generated at little or no cost could threaten established creative professions.

  • An overabundance of derivative content may dilute attention and reduce consumer motivation to engage with official products.

The goal should be a form of symbiotic coexistence:

As of 2026, conflicts arising from GenAI-driven mods remain largely hypothetical, but their relevance is likely to increase. In the long term, cooperative development models between companies and communities are conceivable, though this remains uncharted territory.

Possibilities and limitations

Generative artificial intelligence (GenAI) can ease and accelerate content creation. The difficulty of using or creating your own setups will continue to decrease with each newly released commercial forerunner model.

You can use GenAI via websites, desktop clients, or dedicated programs that may run fully locally.

To build your own programmatic solutions, you will either need downloadable AI models or API keys to access cloud services that perform the heavy computation remotely.

With sufficient expertise, you can even build agentic AIs (such as OpenClaw) that use your existing tools and carry out tasks automatically. However, caution is advised: probabilistic AIs can hallucinate and may pose a risk to your system. Internally, MCP (Model Context Protocol) servers are used for more safety, but they do not replace having backups. Sandboxes offer an additional layer of safety, but they can limit the usefulness of agents and introduce extra complexity, which may offset the time savings you intended to achieve.
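
The backup and sandbox advice above can be sketched in Python as follows. This is a minimal illustration, not a complete sandbox: the function names `backup_workspace` and `is_inside` are hypothetical, and the sketch assumes the agent is only meant to work inside a single project folder.

```python
import shutil
from pathlib import Path


def backup_workspace(workspace: Path, backup_root: Path) -> Path:
    """Copy the whole workspace to backup_root before an agent may touch it."""
    target = backup_root / f"{workspace.name}_backup"
    if target.exists():
        shutil.rmtree(target)  # keep exactly one fresh copy
    shutil.copytree(workspace, target)
    return target


def is_inside(workspace: Path, candidate: Path) -> bool:
    """Return True only if candidate resolves to a path within workspace."""
    root = workspace.resolve()
    resolved = candidate.resolve()
    return resolved == root or root in resolved.parents
```

A wrapper around an agent's file tools could call `backup_workspace` once before the run and reject every write for which `is_inside` returns False. Such a check only limits file paths; it does not restrict network access or program execution.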

GenAI systems operate probabilistically. Do not expect identical results when repeating prompts with the same inputs. The same text prompts may produce similar, but not identical, outputs. Therefore, in some scenarios, it can be beneficial to generate multiple results and select the most suitable candidates for your intermediate or final goal.
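
The "generate several, keep the best" idea can be sketched as follows. The generator here is a hypothetical stand-in (`fake_generate`) that returns a random score instead of calling a real service; in practice the score would come from your own judgment or an evaluation step. The fixed seed is only used to make the toy example repeatable, which real GenAI services may not offer.

```python
import random


def fake_generate(prompt: str, rng: random.Random) -> dict:
    """Hypothetical stand-in for a GenAI call; the score is random, not a real quality measure."""
    return {"prompt": prompt, "variant": rng.randrange(1_000_000), "score": rng.random()}


def best_of_n(prompt: str, n: int, keep: int, rng: random.Random) -> list:
    """Generate n candidates and return the `keep` highest-scoring ones."""
    candidates = [fake_generate(prompt, rng) for _ in range(n)]
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:keep]
```

For example, `best_of_n("castle at dusk", n=8, keep=2, rng=random.Random())` produces eight different variants and returns the two with the highest scores.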

Sounds

For natural-sounding voices, you may want to look for emotional text-to-speech.

Voice cloners and generators

ElevenLabs

Music generators

Suno

Lyria (Google Gemini)

Editors

For fine-tuning at a professional level, you might want to look at editors like Melodyne.

Images

Techniques

Creating new content

  • Text-only prompts
  • Text prompts with one or multiple references
    • By default, the first image serves as the main canvas, which is modified by the other references. The order of references can be overridden in the text prompt. If there are multiple references, describe carefully which reference is used for what.
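
One way to make the role of each reference explicit is to number the images in the order they are attached and state their purpose in the prompt. The sketch below only builds text; the helper name `describe_references` and the wording "Image N" are assumptions, and how a given tool interprets such wording may vary.

```python
def describe_references(roles):
    """Return one prompt line per attached image, numbered in attachment order."""
    lines = []
    for index, role in enumerate(roles, start=1):
        lines.append(f"Image {index}: {role}.")
    return "\n".join(lines)
```

Calling `describe_references(["main canvas, keep the composition", "use only the jacket design"])` yields two lines that can be appended to the text prompt.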

Changing existing content

  • Expanding
  • Inpainting (replacement of subsections)
  • Style transfers

Select best tool per use-case

Overview

As of March 2026, Gemini seems to perform best in most use-cases.

(Add table.)

Drafting

Upscaling

You will often prefer the style generated by one tool over that of others. However, that should not stop you from trying other tools for other tasks. In the end, a combination of generators can get you closer to the result you had in mind.

For instance, even if you prefer ChatGPT for its style, you may still want to upscale a draft or reference image in Gemini first.

Gemini
  • Free Users: Generally capped at 1K resolution.
  • AI Plus/Pro Subscribers: Can access 2K resolution.
  • AI Ultra Subscribers: Have full access to the 4K resolution toggle and downloads.

Fine-drawing

Generating

Coloring

Shading

Editing and fine-tuning

Workflows

Beginner workflows

Multimodal LLMs and plugin-using LLMs such as ChatGPT, Copilot, Gemini, Grok, and Meta AI provide image generation capabilities.

  • In addition, there are specialized tools (e.g., diffusion-based systems) that offer more control and customization. However, most users already have access to at least one chatbot and can begin generating images immediately.
  • For high-volume generation, a paid subscription or plan is typically required.

For image editing and post-processing, dedicated graphics software such as GIMP, Krita, or Photoshop is recommended. These tools allow precise control (e.g., masking, compositing, color correction) and can complement GenAI workflows.

In general, you will want to take these steps: generate, refine via text prompts, select final candidates, refine via graphics tools.

Despite their limitations, chatbot-based workflows are usually a big improvement over purely manual workflows: they speed up prototyping and let you explore different creative directions for drafts.

To work with chatbots, you effectively have to learn prompt engineering, that is, how to write good prompts.

General notes on image upgrading

When upgrading a very low-quality yet important image, you probably want to upgrade specific elements first so that details are not hallucinated to an unacceptable level.

text prompt + low-quality image + elements upgraded in advance and used as references in the text prompt = higher-quality image
ChatGPT
  • Iterative prompting: Describe the desired result as clearly as possible: motif, perspective, colors, lights, shadows, art style. Refine the prompt step by step based on undesired aspects rather than expecting a perfect result on the first attempt. Negative prompts: You can also explicitly write what you don't want.
    • Caution: As of March 2026, you have to explicitly tell ChatGPT not to crop the image in some scenarios. In that case, give it context such as where the image should end and how much "extra" space you want. Saying "don't crop the image" is not specific enough.
  • Avoid quality loss: If the GenAI degrades the image quality after too many iterations, start over with the text prompts combined into one.
  • Reference images: When supported, provide one or more images to guide style, composition, or subject consistency. This is often more reliable than text-only prompting.
  • Context management: If previous prompts begin to overly influence results, start a new prompt and explicitly restate the desired outcome. This prevents unintended bias from earlier context.
  • Merging / composition: Supplying multiple images in a single prompt can help combine elements. However, repeated re-editing of generated outputs may degrade detail or introduce artifacts.
  • Batch generation: Since outputs are probabilistic, generate multiple final candidates and select the best.
  • Post-processing workflow: Combine the best elements using external tools (e.g., masking in Photoshop or Krita). This hybrid approach often yields higher-quality results than relying on a single generation.
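
The prompting tips above can be turned into a small helper that assembles one combined prompt from the individual aspects, which is also handy when restarting with a fresh prompt. This is a minimal sketch: the function `build_prompt`, its parameter names, and the label wording are assumptions, not a format required by any chatbot.

```python
def build_prompt(subject, perspective=None, colors=None, lighting=None,
                 style=None, framing=None, avoid=()):
    """Join the described aspects into one prompt; unset aspects are skipped."""
    parts = [subject.rstrip(".") + "."]
    labeled = [
        ("Perspective", perspective),
        ("Colors", colors),
        ("Lighting", lighting),
        ("Art style", style),
        ("Framing", framing),
    ]
    for label, value in labeled:
        if value:
            parts.append(f"{label}: {value}.")
    if avoid:
        # Negative prompt: state explicitly what should not appear.
        parts.append("Do not include: " + ", ".join(avoid) + ".")
    return " ".join(parts)
```

The `framing` argument is where a concrete anti-cropping instruction would go, for example "leave 10% empty margin on every side" instead of a bare "don't crop the image".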

(Add examples here.)

Copilot
Gemini
Paint (Cocreator)

With Windows 11 and an NPU delivering at least 40 TOPS, you can use Microsoft Paint with its Cocreator module. The Cocreator is sometimes also named Image Creator. (TOPS, tera operations per second, usually refers to INT8 operations on AI accelerator hardware, i.e. NPUs.) Windows PCs labeled Copilot+ can be assumed to have this feature.

  • You write a prompt, optionally select a style and then draw a draft that gets updated almost in real-time in a secondary panel.
Photoshop (Adobe Firefly)

Photoshop has a built-in image generator that can use Adobe's own Firefly model as well as other models.

In Photoshop, you can prompt images and immediately start editing them. Alternatively, you can choose to expand them first or make partial replacements.

Advanced workflows

AUTOMATIC1111 (a web UI for Stable Diffusion)

ComfyUI

Expert workflows

This would include training your own models. The idea is to give the model a neural representation of objects that is equivalent to taking screenshots of a 3D scene, so that prompts almost never produce hallucinated details. That way, artists can reduce post-editing, as the generated outputs also reflect their own styles.

Custom models further boost rapid prototyping because they reduce the need for a more complex, combined 3D-2D workflow.

Videos

Google Veo

Grok (xAI)

Sora (OpenAI)

  • In 2026, it was announced that Sora would be discontinued. It will probably just be paused for a few years until future AI chips have lowered computation costs.

3D content

There are content generators that turn 2D data into 3D data by inferring plausible values for the missing dimension.

3D objects

  • ...

3D animations

  • ...

World maps

  • ...
  • World generator inside Unreal Engine 5
    • As of 2026, this is technically still "procedural", but an LLM-driven approach is plausible in the future. LLM-driven experiments can already be found online.