Talk:Restless Souls/Technology: Difference between revisions

Revision as of 12:59, 30 March 2026

937 bytes added, 30 March

m

no edit summary

Paradox-01

@@ Line 576: / Line 576: @@
 :: Due to current shortcomings, specialized agentic AIs are also set up to work in groups (multi-agent systems).
 * '''Physical AI''' = Physical Artificial Intelligence. Basically AI used in robots, including self-driving cars.
+:: Physical AI can be seen as an ''application'' of world models, but other approaches such as VLA (vision-language-action) models are in development. The first approach will likely appear in more general-purpose household robots (because of broader, unpredictable situations), while the latter might first dominate in industrial robots because their situations are more standardized.
 :: The general idea is: Like humans or other real organisms, AIs benefit from having an "inner world" to improve understanding and reasoning. The use of large language models (LLMs) is optional but can be a useful design choice to assist humans in directing such systems.
 :: Training from ''first-hand sensor input'' is the obvious choice, but real-world actions can be dangerous and, because they happen in real time, are ''slow'' by the standards of the computer age. Therefore, AIs are instead pre-trained in a simulation where the robot is represented by a digital twin. '''Real-world training is kept for fine-tuning.''' Modern physical AIs are, overall, multimodal.
-:: Alternatively, physical AIs are trained from video. As this provides only ''second-hand sensor input'', the reasoning capabilities are less potent.<!--// commenting out company specific information for reasons ... //
+:: Alternatively, physical AIs are trained from video. As this provides only ''second-hand sensor input''<!--3D reduced to a 2D stream, with no control and no interaction, and less rich than a native multimodal world model approach-->, the reasoning capabilities are less potent.<!--Like Sora. "Sora currently exhibits numerous limitations as a simulator. For example, it does not accurately model the physics of many basic interactions, like glass shattering." https://openai.com/index/video-generation-models-as-world-simulators/ -- "We find that across a range of current models (Sora, Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical understanding is severely limited" https://arxiv.org/html/2501.09038v1--><!--// commenting out company specific information for reasons ... //
 ::: In 2026 Sam Altman said OpenAI's next breakthrough is expected within two years, possibly meaning such a hybrid approach, which would give him something close to a world model, if not the real thing. The knowledge gained from Sora is rumored to flow into that. Google is working on a (''native'') "general purpose world model" instead. So, Genie 3 will probably perform better when it is released.-->
 :: Training with motion capture data is quick, but it often falls short of producing stable locomotion without additional training.<!--Embodied AI-->