The basic idea behind machine learning is that an artificial system learns from patterns and relationships in data. Machine learning therefore cannot be used without an adequate data basis, where "adequate" refers to the quantity, quality, relevance, and diversity of the data.
In production, however, providing the required volume of data is often a challenge, because the data must be generated in the production environment and annotated extensively. A further problem is that such data rarely covers borderline cases, which can lead the trained model to make incorrect decisions in operation. Learning directly on the real system, as reinforcement learning approaches (i.e., learning by trial and error) would require, is generally not feasible due to the high cost, time, and maintenance effort involved.
The key question is therefore: How can a machine learning model still be trained effectively if only a small amount of real-world data is available? This is where data-efficient AI comes into play. Possible solutions are highly diverse, ranging from the use of simulation environments and digital twins to data-efficient learning methods and approaches that integrate existing knowledge. Work on data-efficient AI concentrates primarily on so-called physics-informed machine learning and data-efficient reinforcement learning; a brief sketch of the former follows below.
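To make the idea of integrating existing knowledge more concrete, the following minimal sketch illustrates physics-informed learning in the spirit of a physics-informed neural network: a small network is fit to only a handful of measurements, while a residual term for a known governing equation penalizes solutions that violate the physical prior. The example assumes PyTorch; the ODE du/dt = -k*u, the constant k, the network architecture, and the training settings are illustrative assumptions, not part of the original text.

```python
import torch

torch.manual_seed(0)
k = 0.5  # assumed known decay constant (prior physical knowledge)

# Small neural network approximating the unknown function u(t)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

# Only a few "measurements" of the true solution u(t) = exp(-k t)
t_data = torch.tensor([[0.0], [1.0], [2.0]])
u_data = torch.exp(-k * t_data)

# Collocation points where only the physics (the ODE) is enforced
t_phys = torch.linspace(0.0, 4.0, 50).reshape(-1, 1).requires_grad_(True)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    # Data loss: match the few available measurements
    loss_data = torch.mean((net(t_data) - u_data) ** 2)
    # Physics loss: residual of du/dt + k*u = 0 at the collocation points
    u = net(t_phys)
    du_dt = torch.autograd.grad(u, t_phys, torch.ones_like(u), create_graph=True)[0]
    loss_phys = torch.mean((du_dt + k * u) ** 2)
    # Combined objective: the physics term compensates for the scarce data
    loss = loss_data + loss_phys
    loss.backward()
    opt.step()
```

The design choice this sketch highlights is that the physical prior enters purely through the loss function: the physics residual constrains the model on points where no measurements exist, which is what allows training to succeed with far less labeled data than a purely data-driven fit would need.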