How much data is needed for materials informatics?
The predictive accuracy and efficiency of materials informatics are governed by the amount of relevant data available. Relevance means that the selected variables, and the ranges of data collected for them, are in fact significant predictors of the desired outcome.
Once the right variables are selected, it is very important to collect a variety of values for each variable. Materials data is multidimensional in nature, combining formulations, processes and protocols. If all the data points cover similar ranges and show little variability, then adding more of them does little to improve predictability.
However, if the data is diverse enough, then even a few data points can provide superior predictability. This is why even “failed experiments” need to be recorded. Besides, one person’s “failure” might be exactly what another person is looking for.
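To make the point about diversity concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the linear "true" property, the noise level, the data ranges and the function names are hypothetical and are not taken from MaterialsZone or from any real dataset. It fits a straight line to 200 points squeezed into a narrow range, then to only 10 points spread over the full range, and compares how well each fit predicts across the whole range.

```python
import random
import statistics


def true_property(x):
    # Hypothetical ground truth: a material property that depends linearly
    # on one normalized process variable x in [0, 1].
    return 2.0 * x + 1.0


def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def full_range_rmse(slope, intercept):
    # Prediction error measured over the whole range of interest, 0 to 1.
    grid = [i / 100 for i in range(101)]
    squared = [(slope * x + intercept - true_property(x)) ** 2 for x in grid]
    return statistics.fmean(squared) ** 0.5


def mean_rmse(n_points, low, high, trials=200, noise=0.1, seed=0):
    # Average the error over many simulated campaigns so that the
    # comparison does not hinge on one lucky or unlucky draw.
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        xs = [rng.uniform(low, high) for _ in range(n_points)]
        ys = [true_property(x) + rng.gauss(0.0, noise) for x in xs]
        errors.append(full_range_rmse(*fit_line(xs, ys)))
    return statistics.fmean(errors)


# Many similar experiments: 200 points, all between 0.49 and 0.51.
narrow_rmse = mean_rmse(n_points=200, low=0.49, high=0.51)
# Few diverse experiments: 10 points spread between 0 and 1.
diverse_rmse = mean_rmse(n_points=10, low=0.0, high=1.0)

print(f"200 similar points: mean RMSE = {narrow_rmse:.3f}")
print(f"10 diverse points:  mean RMSE = {diverse_rmse:.3f}")
```

Under these assumptions, the ten diverse points give the lower average error, because a narrow range leaves the slope of the fit poorly constrained no matter how many points are added. Real materials data has many more variables and rarely behaves linearly, so treat this as an illustration of the principle rather than a sizing rule.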
MaterialsZone - the materials informatics platform - has successfully showcased its proficiency in materials data management and data analysis for determining the influential variables. Want to learn more about materials data management? 👉 Click here.
MaterialsZone also rapidly points to data gaps and guides the next experiments. This reduces tedious trial-and-error experimentation and rapidly provides a thorough understanding of the domain and better predictability within it. Want to learn more about design of experiments? 👉 Click here.