Computers are mastering how to interpret images and audio. Increasingly, AI models incorporate not just a single sensory stream of data, such as vision, sound, or touch, but an aggregate of information from multiple sources. This approach is called multi-modal AI, and the result is a smarter algorithm.
Multi-modal AI is the ability of an AI system to process different types of data and draw relationships between them. One example is the DALL-E model from OpenAI, which generates images from textual input. By interpreting the language input, the algorithm can select visual concepts and render them as an original image.
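The core idea can be sketched in a few lines: models like DALL-E and CLIP embed text and visual concepts into a shared vector space, so related items from different modalities land near each other. The toy vectors and concept names below are illustrative stand-ins, not outputs of a real model, which would learn these embeddings jointly from large paired datasets.

```python
# Toy sketch of cross-modal matching in a shared embedding space.
# A real model learns these vectors; here they are hand-picked for illustration.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for a few visual concepts.
image_concepts = {
    "avocado": [0.9, 0.1, 0.2],
    "armchair": [0.1, 0.8, 0.3],
    "teapot": [0.2, 0.2, 0.9],
}

def match_concepts(text_embedding, concepts, top_k=2):
    """Rank visual concepts by similarity to a text prompt's embedding."""
    ranked = sorted(
        concepts,
        key=lambda name: cosine(text_embedding, concepts[name]),
        reverse=True,
    )
    return ranked[:top_k]

# A prompt like "an armchair shaped like an avocado" embeds near both
# concepts, so a generator can blend them into one novel image.
prompt_embedding = [0.6, 0.6, 0.2]
print(match_concepts(prompt_embedding, image_concepts))
# → ['armchair', 'avocado']
```

This similarity-based retrieval captures only the matching half of the story; an image generator then conditions on the blended concepts to produce the final picture.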
Mark Riedl of the Georgia Institute of Technology said that “the more concepts that a system is able to sensibly blend together, the more likely the AI system both understands the semantics of the request and can demonstrate that understanding creatively.”
Jeff Dean, AI chief at Google, said that “that whole research thread, I think, has been quite fruitful in terms of actually yielding machine learning models that do more sophisticated NLP tasks than we used to be able to do. We’d still like to be able to do much more contextual kinds of models.”