Synthetic Data and Digital Twins: A Synergistic Cycle of Continuous Improvement
The current generation of machine learning and AI algorithms requires training with data. Lots of it. The algorithms scan massive amounts of data and identify recurring patterns. Once trained, when an AI algorithm encounters similar data, it can recognize known patterns, often better than a human, and respond appropriately.
ML and AI work great when there is lots of data, but what about situations where only sparse data is available? Without sufficient data, the AI can’t be trained to the point where it can respond correctly.
When there isn’t sufficient data available, researchers can generate simulated data using a ‘digital twin’. A digital twin is a software simulation of a process, system, or object. Digital twin software might be built using computer-aided design, finite element analysis, physics engines, statistical and probabilistic techniques, and other software modeling tools. Many simulations are then run under different assumptions and environmental conditions, and the output data from the digital twin is observed and collected. In this way, simulated data can provide a large pool of training data for the AI/ML model.
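To make this concrete, here is a minimal sketch in Python of how a digital twin might be run many times under varied conditions to produce synthetic training data. The heated-tank model, its parameters, and the "overheats" label are illustrative assumptions for this sketch, not a description of any particular product or dataset.

```python
# Minimal sketch: using a simple "digital twin" simulation to generate
# synthetic training data. The tank model and its parameters are
# illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(seed=42)

def simulate_tank(ambient_temp, heater_power, duration_s, dt=1.0):
    """Toy physics model (the digital twin) of a heated tank's temperature."""
    temp = ambient_temp
    readings = []
    for _ in range(int(duration_s / dt)):
        # Heater input plus Newton's-law-style cooling, with sensor noise.
        temp += dt * (0.05 * heater_power - 0.01 * (temp - ambient_temp))
        readings.append(temp + rng.normal(0, 0.2))
    return np.array(readings)

# Run the twin many times under varied environmental conditions and
# collect the outputs as a synthetic dataset for AI/ML training.
synthetic_rows = []
for _ in range(1000):
    ambient = rng.uniform(10, 35)      # varied ambient temperature (C)
    power = rng.uniform(0.5, 5.0)      # varied heater power (kW)
    series = simulate_tank(ambient, power, duration_s=300)
    label = int(series.max() > 60)     # e.g. "overheats" vs. "safe"
    synthetic_rows.append((ambient, power, series.mean(), series.max(), label))

print(f"Generated {len(synthetic_rows)} synthetic training examples")
```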
George Brunner, VP of analytics and CTO at Acument Analytics, said that “a digital twin uses data from the physical entity to create the algorithm which then ‘models’ the physical entity. Once the digital twin AI algorithm is created it can then be used to generate synthetic data. Therefore, they can work in unison in an AI workflow cycle. Synthetic data can ‘prime the pump’ to create the initial digital twin. Data capture from the physical twin allows the digital twin to improve over time. Then the digital twin can be used to enhance the quality of Synthetic data in a cycle of continuous improvement.”
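Below is a minimal sketch of that continuous-improvement cycle, assuming a deliberately simple "digital twin" (a least-squares line fit with numpy) standing in for a real simulation. The function names, data, and numbers are hypothetical placeholders for illustration, not a reference implementation.

```python
# Minimal sketch of the cycle described above: a digital twin is fit to
# sparse real measurements, used to generate synthetic data, and refit as
# new real data arrives so later synthetic data improves in quality.
import numpy as np

rng = np.random.default_rng(0)

def collect_real_measurements(n):
    """Stand-in for data captured from the physical twin."""
    x = rng.uniform(0, 10, size=n)
    y = 3.0 * x + rng.normal(0, 1.0, size=n)  # the real process, unknown to us
    return x, y

# 1. Prime the pump: fit an initial digital twin on a small real sample.
x_real, y_real = collect_real_measurements(20)
slope, intercept = np.polyfit(x_real, y_real, deg=1)

for cycle in range(3):
    # 2. Use the twin to generate a large synthetic dataset.
    x_syn = rng.uniform(0, 10, size=5000)
    y_syn = slope * x_syn + intercept + rng.normal(0, 1.0, size=5000)
    # ... train the downstream AI/ML model on (x_syn, y_syn) here ...

    # 3. Capture more data from the physical twin and refit the digital twin,
    #    so the next round of synthetic data is higher quality.
    x_new, y_new = collect_real_measurements(20)
    x_real = np.concatenate([x_real, x_new])
    y_real = np.concatenate([y_real, y_new])
    slope, intercept = np.polyfit(x_real, y_real, deg=1)
```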
A report by Capgemini found that “digital twins provide the ideal playground to test hypotheses, train and evaluate algorithms, test transparency, and generate synthetic data and events – exploring levels of ‘smart’ that initially might even seem inapplicable to the real world.”
Vaibhav Nivargi, co-founder and chief technology officer of Moveworks, told the Wall Street Journal that “synthetic data becomes very important because we operate in a domain with limited data.” Gartner predicts that “by 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated.”
What are some examples of what synthetic data and digital twins can do?