Machine learning (ML) has become a central driver of modern materials discovery, fundamentally reshaping how materials are designed, screened, and experimentally realized. This review examines recent advances in ML-accelerated materials discovery, emphasizing the progression from material representation and descriptor development toward fully autonomous experimental platforms. We discuss how increasingly sophisticated descriptors, ranging from composition-based features and structure-aware representations to ab initio–derived and learned embeddings, have improved predictive accuracy, data efficiency, and physical interpretability across diverse materials systems. Building on these representations, we trace the evolution of ML frameworks for property prediction, classification, and inverse design, with particular attention to uncertainty-aware modeling, multiobjective optimization, and explainable learning strategies that connect predictive performance with scientific insight. We also highlight the growing role of active learning and generative models in efficiently navigating vast chemical and structural spaces, enabling data-efficient exploration and hypothesis-driven discovery. At the frontier of these developments, autonomous experimental systems integrate ML with robotics to form closed-loop workflows that iteratively design, execute, and refine experiments with minimal human intervention. Applications spanning perovskites, alloys, energy materials, and nanostructures illustrate the broad impact of these approaches in overcoming traditional trial-and-error limitations. Finally, we examine persistent challenges associated with data scarcity, extrapolation, interpretability, and system integration, and outline future directions toward more robust, scalable, and sustainable autonomous materials discovery. Collectively, these advances represent a paradigm shift from passive data-driven prediction to intelligent, self-guided materials innovation.
The integration of artificial intelligence (AI) and machine learning (ML) into materials science, often referred to as materials informatics or materials AI, has accelerated the discovery, design, and optimization of advanced materials. However, materials science frequently operates in small-data and sparse-regime conditions, where datasets are limited in size (often tens to hundreds of samples), high-dimensional, imbalanced, or sparsely populated due to the high cost, time, and complexity of experimental measurements and high-fidelity simulations. This narrative review synthesizes recent advances in methods tailored to these constraints, categorizing approaches at the data-source level (e.g., literature extraction, database construction, high-throughput workflows), algorithmic level (e.g., support vector machines, Gaussian process regression, ensemble models, imbalanced learning techniques), and strategic level (e.g., active learning, transfer learning). Key assumptions underlying these methods are examined, including similarity between source and target domains for transfer learning, representativeness of initial samples and reliable uncertainty quantification in active learning, and the validity of physical priors or inductive biases in physics-informed approaches. The review also addresses inherent limits, such as risks of overfitting, poor generalization beyond the training distribution, sensitivity to data quality and noise, challenges in uncertainty calibration, and dependence on domain expertise. By highlighting successful applications in property prediction, alloy design, and perovskite optimization, this work elucidates the current capabilities and boundaries of small-data and sparse-regime learning in materials AI, guiding researchers navigating data-limited environments.
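To make the pairing of Gaussian process regression with active learning more concrete, the following is a minimal sketch of uncertainty-based sampling in a small-data setting. It is an illustration under stated assumptions rather than a method taken from any of the reviewed studies: the inputs are one-dimensional, the kernel is a squared-exponential with a fixed length scale, and all function names (`rbf`, `gp_posterior`, `next_experiment`) are hypothetical.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel for scalar inputs (assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y


def solve_lower_transpose(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(x_train, y_train, x_query, length=1.0, noise=1e-6):
    """Return a list of (mean, variance) pairs for a zero-mean GP."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transpose(L, solve_lower(L, y_train))
    posterior = []
    for xq in x_query:
        k = [rbf(xq, xt, length) for xt in x_train]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve_lower(L, k)
        var = max(rbf(xq, xq, length) - sum(vi * vi for vi in v), 0.0)
        posterior.append((mean, var))
    return posterior


def next_experiment(x_train, y_train, candidates, length=1.0):
    """Index of the candidate with the largest predictive variance."""
    post = gp_posterior(x_train, y_train, candidates, length)
    return max(range(len(candidates)), key=lambda i: post[i][1])


# Toy usage: two measured samples, three untested candidates.
x_train, y_train = [0.0, 1.0], [0.0, 1.0]
candidates = [0.1, 0.5, 3.0]
chosen = candidates[next_experiment(x_train, y_train, candidates)]
```

In this toy case the selected candidate is the one farthest from the measured samples, which also illustrates the assumption noted above: the strategy is only as good as the model's uncertainty estimates, and a poorly chosen length scale would change which experiment is proposed.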
The field of computational and data-driven materials engineering has undergone rapid evolution, driven by advancements in high-throughput computational screening, machine learning algorithms, and integrated workflows that accelerate materials discovery. This review synthesizes recent developments in materials informatics, focusing on platforms that enable efficient exploration of vast chemical spaces through automated computations and data analytics. Key areas include the application of graph neural networks and representation learning for property prediction, active learning strategies to optimize experimental feedback loops, and the integration of multimodal datasets for enhanced model accuracy. High-throughput methods have facilitated discoveries in diverse domains, such as superconductors, battery materials, and high-entropy alloys, by combining density functional theory simulations with machine learning surrogates. Autonomous laboratories and closed-loop systems represent a paradigm shift, allowing self-driving experiments that minimize human intervention while maximizing discovery efficiency. Uncertainty quantification plays a critical role in guiding these processes, ensuring reliable predictions amid sparse data. This narrative review structures the landscape into computational ecosystems, workflow integrations, and discovery outcomes, highlighting cross-study synergies. It positions the field at the cusp of scalable inverse design paradigms, where data-driven insights bridge simulation and experimentation to address grand challenges in materials science.
The field of computational and data-driven materials engineering has evolved from traditional high-throughput simulations into sophisticated ecosystems that integrate machine learning with multimodal datasets for accelerated discovery. This review synthesizes recent advancements in materials informatics, emphasizing the role of graph neural networks and deep learning in processing complex structural and property data. We examine multimodal datasets that combine experimental, computational, and textual modalities, enabling robust representation learning and uncertainty quantification. Integration frameworks are discussed, including active learning loops and multi-fidelity models that bridge simulation and experiment, addressing challenges such as data sparsity and distribution shifts. The discovery potential is highlighted through applications in property prediction, inverse design, and autonomous systems, such as identifying stable alloys and energy materials. By providing an original synthesis of these elements, this article underscores the shift toward closed-loop workflows that enhance generalizability and interpretability, while identifying gaps in handling finite-temperature stability and disordered systems. Ultimately, these approaches promise to expand the known materials space by orders of magnitude, fostering innovations in sustainable technologies.
The rapid evolution of computational and data-driven materials engineering has ushered in an era where self-driving laboratories (SDLs) promise to transform materials discovery by integrating automation, machine learning, and high-throughput experimentation into cohesive governance architectures. These architectures orchestrate the interplay between data generation, model training, and decision-making processes to enable closed-loop optimization in materials design. This review synthesizes recent advancements in SDL governance, focusing on how computational workflows—encompassing materials informatics, graph neural networks, representation learning, and uncertainty quantification—facilitate autonomous systems in addressing complex materials challenges. We examine the foundational elements of data-driven ecosystems, including multimodal datasets and simulation-experiment integration, and explore active learning strategies that balance exploration and exploitation in inverse design paradigms. Key governance components, such as orchestration platforms like ChemOS 2.0 and Bayesian active learning frameworks, are analyzed for their role in accelerating discovery cycles. By integrating perspectives from high-impact studies, we highlight how these architectures mitigate inefficiencies in traditional trial-and-error approaches, enabling scalable, reproducible materials innovation. The review positions SDL governance as a critical infrastructure for future materials engineering, emphasizing systems-level integration over isolated techniques. Ultimately, it underscores the potential of these architectures to democratize access to advanced materials development while identifying pathways for enhanced interoperability and robustness in computational ecosystems.
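As a hedged illustration of how an active learning step can balance exploration and exploitation, the sketch below scores candidates with an upper-confidence-bound (UCB) rule, one common acquisition function. It is not drawn from ChemOS 2.0 or any specific framework named above; the function name `ucb_select`, the weight `kappa`, and the toy predictions are assumptions introduced for illustration.

```python
def ucb_select(means, stds, kappa=2.0):
    """Return the index of the candidate maximizing mean + kappa * std.

    means: predicted property values (higher is better, by assumption).
    stds:  predictive standard deviations from any uncertainty-aware model.
    kappa: hypothetical trade-off weight; 0 is pure exploitation,
           larger values favor uncertain (exploratory) candidates.
    """
    if len(means) != len(stds) or not means:
        raise ValueError("means and stds must be non-empty and equal in length")
    scores = [m + kappa * s for m, s in zip(means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy predictions for three candidate materials (illustrative numbers only).
means = [1.0, 0.8, 0.2]
stds = [0.05, 0.3, 0.1]

exploit_choice = ucb_select(means, stds, kappa=0.0)  # best predicted mean
explore_choice = ucb_select(means, stds, kappa=2.0)  # rewards uncertainty
```

With `kappa=0.0` the rule picks the candidate with the highest predicted value, while `kappa=2.0` shifts the choice to a slightly worse but much less certain candidate. In a closed-loop setting, the orchestration layer would feed the measured result back into the model before the next selection.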