Generative models have emerged as transformative tools in materials science, enabling the inverse design of novel materials with tailored properties by learning from vast datasets of structures and compositions. This review synthesizes recent advancements in generative approaches, including variational autoencoders, generative adversarial networks, diffusion models, and large language models. It highlights their conceptual capabilities for accelerating discovery while addressing scientific limits such as data scarcity, synthesizability, and interpretability. By examining applications in inorganic crystals, organic molecules, and energy materials, we delineate how these models bridge computational efficiency with experimental validation, yet face challenges in generalizability and physical fidelity. Future directions emphasize hybrid physics-informed architectures and closed-loop automation to overcome current barriers and unlock sustainable materials innovation.
The integration of artificial intelligence (AI) and machine learning (ML) in materials science has revolutionized traditional approaches to material discovery, design, and application. This narrative review explores how AI models not only predict material properties but also influence scientific decision-making by providing actionable insights, optimizing experimental strategies, and enabling inverse design paradigms. Drawing on recent advancements, we examine the transition from data-driven prediction to AI-assisted decision-making, highlighting case studies in porous materials, optoelectronics, and polymeric membranes. The review addresses challenges such as data scarcity, model interpretability, and integration with experimental workflows, while proposing future directions for AI to enhance human decision-making in materials research. Ultimately, AI is positioned as a collaborative tool that augments scientific intuition, accelerating innovation in sustainable and high-performance materials.
Machine learning (ML) has become a central driver of modern materials discovery, fundamentally reshaping how materials are designed, screened, and experimentally realized. This review examines recent advances in ML-accelerated materials discovery, tracing progress from material representation and descriptor development to fully autonomous experimental platforms. We discuss how increasingly sophisticated descriptors, ranging from composition-based features and structure-aware representations to ab initio–derived and learned embeddings, have improved predictive accuracy, data efficiency, and physical interpretability across diverse materials systems. Building on this foundation, we discuss the evolution of ML frameworks for property prediction, classification, and inverse design, with particular attention to uncertainty-aware modeling, multiobjective optimization, and explainable learning strategies that bridge predictive performance with scientific insight. The review also highlights the growing role of active learning and generative models in efficiently navigating vast chemical and structural spaces, enabling data-efficient exploration and hypothesis-driven discovery. At the frontier of these developments, autonomous experimental systems integrate ML with robotics to form closed-loop workflows that iteratively design, execute, and refine experiments with minimal human intervention. Applications spanning perovskites, alloys, energy materials, and nanostructures illustrate the broad impact of these approaches in overcoming traditional trial-and-error limitations. Finally, we discuss persistent challenges associated with data scarcity, extrapolation, interpretability, and system integration, and outline future directions toward more robust, scalable, and sustainable autonomous materials discovery. Collectively, these advances represent a paradigm shift from passive data-driven prediction to intelligent, self-guided materials innovation.
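Composition-based features are commonly built by weighting tabulated elemental properties by the atomic fractions in a chemical formula. The following is a minimal sketch that assumes a tiny hand-written table of Pauling electronegativities and a simple formula parser. The parser ignores parentheses and hydrates. Real featurizers cover the full periodic table and many more properties.

```python
import re

# Illustrative lookup table (assumption): Pauling electronegativities for a few
# elements only; a real featurizer would cover the whole periodic table.
PAULING_EN = {"Li": 0.98, "O": 3.44, "Fe": 1.83, "Co": 1.88}

def parse_formula(formula: str) -> dict[str, float]:
    """Parse a simple formula such as 'Fe2O3' into {element: amount}.

    Supports element symbols followed by optional (possibly fractional) counts;
    parentheses and hydrate notation are deliberately not handled.
    """
    counts: dict[str, float] = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts

def composition_features(formula: str, table: dict[str, float]) -> dict[str, float]:
    """Fraction-weighted mean, minimum, maximum and range of one elemental property."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    values = [table[el] for el in counts]
    mean = sum(table[el] * n / total for el, n in counts.items())
    return {"mean": mean, "min": min(values), "max": max(values),
            "range": max(values) - min(values)}

# Fe2O3: mean = (2 * 1.83 + 3 * 3.44) / 5 = 2.796, range = 3.44 - 1.83 = 1.61
print(composition_features("Fe2O3", PAULING_EN))
```

To use other elemental properties, such as atomic radius or valence electron count, swap in a different lookup table. Structure-aware and learned representations replace this fixed table with features derived from atomic geometry or from trained embeddings.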
The integration of artificial intelligence (AI) and machine learning (ML) into materials science has fundamentally transformed how material properties are predicted, analyzed, and understood. While early data-driven approaches emphasized predictive accuracy and high-throughput screening, recent advances increasingly focus on interpretability and explainability, enabling AI models to contribute mechanistic scientific insight rather than functioning as opaque black boxes. This review examines the evolution of interpretable AI in materials science and highlights the transition from property prediction to explanation-driven understanding of structure–property relationships. We survey progress in machine learning frameworks that operate with limited or implicit structural information, alongside the growing use of explainable AI (XAI) techniques to uncover physically meaningful descriptors, atomic-scale interactions, and microstructural drivers of material behavior. Methods such as graph-based learning, attention mechanisms, feature attribution, and uncertainty-aware modeling are discussed for their ability to improve model reliability, expose data bias, and guide hypothesis generation. Representative applications across alloys, perovskites, organic semiconductors, and ferroelectric materials demonstrate how interpretable models have revealed governing mechanisms spanning atomic, mesoscopic, and macroscopic length scales. Beyond individual case studies, we examine persistent challenges in interpretable materials AI, including data quality, generalizability, explanation stability, and computational overhead. We argue that interpretability is not merely an auxiliary feature but a prerequisite for trustworthy and scientifically useful AI in materials research.
By synthesizing recent methodological and application-driven advances, this review positions interpretable AI as a critical enabler of mechanism-oriented discovery, experimental validation, and theory development, ultimately advancing AI from a predictive accelerator to an integral partner in scientific understanding.
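Permutation importance is one simple, model-agnostic form of feature attribution: shuffle one descriptor at a time and measure how much the prediction error grows. The sketch below is only an illustration. The toy model, the two synthetic descriptors, and the choice of mean squared error are all assumptions and are not taken from the reviewed studies.

```python
import random
from typing import Callable, Sequence

def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(
    predict: Callable[[list[float]], float],
    X: list[list[float]],
    y: list[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> list[float]:
    """Mean increase in MSE when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical "trained model": the property depends only on descriptor 0.
def toy_model(row: list[float]) -> float:
    return 2.0 * row[0]

rng = random.Random(42)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(50)]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

The toy model ignores descriptor 1, so shuffling that descriptor leaves the error unchanged and its importance is exactly zero. When descriptors are correlated, permutation importance can spread credit between them or mask it. This is one concrete source of the explanation instability discussed above.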
The integration of artificial intelligence (AI) and machine learning (ML) into materials science has accelerated the discovery and design of novel materials by enabling high-throughput prediction of properties from composition, structure, and processing parameters. However, the reliability of these predictions is frequently compromised by uncertainties stemming from limited datasets, model approximations, experimental noise, and intrinsic variability in materials systems. This narrative review synthesizes recent advances in understanding uncertainty and reliability in materials AI. It covers fundamental concepts such as aleatoric and epistemic uncertainty; methods for uncertainty quantification (UQ), including Bayesian neural networks, ensembles, and Gaussian processes; inconsistencies in terminology and language across the literature; and the downstream consequences for decision-making in materials engineering, design, and deployment. Emphasis is placed on calibration of uncertainty estimates, domain-of-applicability assessment, and risk-aware applications in safety-critical contexts such as structural alloys and energy materials. By highlighting best practices and gaps, the review advocates for standardized frameworks to build trust and facilitate industrial translation of materials AI. Key challenges include data scarcity in high-performance materials and the need for physics-informed UQ to mitigate overconfidence in extrapolative predictions. This synthesis underscores the importance of robust uncertainty handling for responsible AI deployment in materials innovation.
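Ensemble-based uncertainty can be sketched with a bootstrap ensemble. Each member is fit to a resample of the training data, and the spread of member predictions serves as a rough proxy for epistemic uncertainty. In this minimal sketch, straight-line least-squares fits stand in for the neural networks of a deep ensemble, and the training data are synthetic. The spread reflects model uncertainty only, not aleatoric measurement noise.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b

def bootstrap_ensemble(xs, ys, n_members=50, seed=0):
    """Fit one linear model per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    members = []
    while len(members) < n_members:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # a line needs at least two distinct x values
            continue
        members.append(fit_line(bx, [ys[i] for i in idx]))
    return members

def predict_with_uncertainty(members, x):
    """Ensemble mean and standard deviation (rough epistemic-uncertainty proxy)."""
    preds = [a + b * x for a, b in members]
    return statistics.fmean(preds), statistics.stdev(preds)

# Synthetic training data: one descriptor sampled in [0, 1], noisy linear property.
rng = random.Random(1)
train_x = [i / 19 for i in range(20)]
train_y = [3.0 * x + rng.gauss(0, 0.2) for x in train_x]
ensemble = bootstrap_ensemble(train_x, train_y)

for x in (0.5, 5.0):  # inside vs. far outside the training range
    mean, std = predict_with_uncertainty(ensemble, x)
    print(f"x={x}: mean={mean:.2f}, std={std:.3f}")
```

The spread is small near the centre of the training range and grows far outside it, mirroring the risk of overconfident extrapolation. The raw spread would still need calibration against held-out data before it could be trusted as an error bar.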
Materials informatics has emerged as a central paradigm in contemporary materials science, leveraging machine learning and data-driven modeling to accelerate materials discovery, optimization, and deployment. Despite substantial advances in predictive accuracy, most existing approaches remain fundamentally correlational, limiting their reliability under distribution shifts, experimental interventions, and real-world deployment scenarios. This reliance on correlation constrains scientific interpretability and undermines the capacity of AI systems to function as genuine instruments of materials reasoning. Causality offers a principled framework for overcoming these limitations by explicitly modeling cause-and-effect relationships among composition, processing, structure, and properties. This narrative review synthesizes conceptual progress in integrating causal inference into materials informatics, examining foundational causal frameworks, advances in causal discovery, hybrid causal–machine learning approaches, and emerging applications across materials domains such as nanocatalysis, ferroelectrics, and electrochemical energy storage. We critically analyze persistent challenges, including data scarcity, assumption violations, limited external validity, and computational and epistemic constraints, that currently hinder widespread adoption. Drawing exclusively on published peer-reviewed literature, the review emphasizes thematic and epistemic developments rather than algorithmic prescriptions. We argue that causality represents a structural shift in how AI systems contribute to materials science: from correlational predictors to intervention-aware, mechanism-aligned reasoning tools. By articulating future directions centered on hybrid modeling, domain-knowledge integration, and interdisciplinary collaboration, this review positions causality as a necessary foundation for robust, generalizable, and scientifically legitimate materials informatics.
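A small simulation shows why purely correlational estimates can mislead. Consider a hypothetical linear structural causal model in which annealing temperature affects both dopant level and conductivity. Here, the naive regression slope of conductivity on dopant mixes the direct and confounded pathways. Adjusting for the confounder (backdoor adjustment via multiple regression) recovers the assumed direct effect of 1.0. All variable names and coefficients are assumptions. The adjustment is valid only if every confounder is observed and the linear model is correctly specified, which is exactly the kind of assumption whose violation the review discusses.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from regressing y on x alone (the naive, correlational estimate)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def adjusted_slope(x, z, y):
    """Coefficient on x in the regression y ~ x + z (adjusting for confounder z).

    Solves the 2x2 normal equations on mean-centred data via Cramer's rule.
    """
    mx, mz, my = statistics.fmean(x), statistics.fmean(z), statistics.fmean(y)
    xc = [a - mx for a in x]
    zc = [a - mz for a in z]
    yc = [a - my for a in y]
    sxx = sum(a * a for a in xc)
    szz = sum(a * a for a in zc)
    sxz = sum(a * b for a, b in zip(xc, zc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    szy = sum(a * b for a, b in zip(zc, yc))
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)

# Hypothetical linear SCM (coefficients are assumptions):
#   anneal_T -> dopant, anneal_T -> conductivity, dopant -> conductivity (effect 1.0)
rng = random.Random(0)
n = 5000
anneal_T = [rng.gauss(0, 1) for _ in range(n)]
dopant = [0.8 * t + rng.gauss(0, 1) for t in anneal_T]
conductivity = [1.0 * d + 2.0 * t + rng.gauss(0, 0.5) for d, t in zip(dopant, anneal_T)]

print("naive slope:   ", round(ols_slope(dopant, conductivity), 2))       # biased well above 1.0
print("adjusted slope:", round(adjusted_slope(dopant, anneal_T, conductivity), 2))  # close to 1.0
```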
The integration of artificial intelligence (AI) and machine learning (ML) into materials science, often referred to as materials informatics or materials AI, has accelerated the discovery, design, and optimization of advanced materials. However, materials science frequently operates in small-data and sparse-regime conditions, where datasets are limited in size (often tens to hundreds of samples), high-dimensional, imbalanced, or sparsely populated due to the high cost, time, and complexity of experimental measurements and high-fidelity simulations. This narrative review synthesizes recent advances in methods tailored to these constraints, categorizing approaches at the data-source level (e.g., literature extraction, database construction, high-throughput workflows), algorithmic level (e.g., support vector machines, Gaussian process regression, ensemble models, imbalanced learning techniques), and strategic level (e.g., active learning, transfer learning). Key assumptions underlying these methods are examined, including similarity between source and target domains for transfer learning, representativeness of initial samples and reliable uncertainty quantification in active learning, and the validity of physical priors or inductive biases in physics-informed approaches. The review also addresses inherent limits, such as risks of overfitting, poor generalization beyond the training distribution, sensitivity to data quality and noise, challenges in uncertainty calibration, and dependence on domain expertise. By highlighting successful applications in property prediction, alloy design, and perovskite optimization, this work elucidates the current capabilities and boundaries of small-data and sparse-regime learning in materials AI, guiding researchers navigating data-limited environments.
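One active learning step with a Gaussian process regression surrogate can be sketched as follows. The surrogate computes the posterior variance at each unmeasured candidate, and the next experiment is the most uncertain candidate (pure uncertainty sampling). The kernel, its fixed hyperparameters, the noise level, and the toy measurements are assumptions. In practice, a library implementation such as scikit-learn's GaussianProcessRegressor would typically fit the hyperparameters, and the acquisition rule would often weigh uncertainty against predicted performance.

```python
import math

def rbf(a: float, b: float, length_scale: float = 0.3, variance: float = 1.0) -> float:
    """Squared-exponential (RBF) kernel on a scalar descriptor."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(K: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L: list[list[float]], b: list[float]) -> list[float]:
    """Forward substitution: solve L x = b for lower-triangular L."""
    x: list[float] = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def gp_posterior(train_x, train_y, query, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at one query point."""
    n = len(train_x)
    K = [[rbf(train_x[i], train_x[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    v = solve_lower(L, [rbf(x, query) for x in train_x])  # v = L^-1 k*
    alpha = solve_lower(L, list(train_y))                  # alpha = L^-1 y
    mean = sum(vi * ai for vi, ai in zip(v, alpha))        # k*^T K^-1 y
    var = rbf(query, query) - sum(vi * vi for vi in v)     # k** - k*^T K^-1 k*
    return mean, max(var, 0.0)

def next_experiment(train_x, train_y, candidates):
    """Uncertainty sampling: pick the candidate with the largest posterior variance."""
    return max(candidates, key=lambda c: gp_posterior(train_x, train_y, c)[1])

# Hypothetical measurements of a property versus one scalar descriptor.
measured_x = [0.0, 0.1, 0.2]
measured_y = [0.3, 0.5, 0.4]
pool = [0.15, 0.5, 2.0]  # unmeasured candidates
print(next_experiment(measured_x, measured_y, pool))  # 2.0, farthest from the data
```

The candidate at 2.0 lies far from every measurement, so its variance stays near the prior value and it is selected. The same logic shows why active learning depends on representative initial samples and a well-specified kernel. A poor length scale distorts which regions look uncertain.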
The integration of physical principles into machine learning (ML) frameworks has emerged as a transformative approach in materials science, addressing the limitations of purely data-driven models by incorporating domain knowledge to enhance predictive accuracy, generalizability, and interpretability. This narrative review explores the conceptual taxonomies of physics-integrated ML methods, their applications in materials discovery and design, and the associated challenges in data bias and ethical considerations. Drawing on recent peer-reviewed literature, we classify physics-integration strategies such as physics-informed neural networks (PINNs), hybrid models combining ML with physical simulations, and constraint-based learning, and highlight their roles in solving complex problems such as material property prediction, microstructure analysis, and phase stability. We also examine how data biases in training datasets can propagate errors and inequities in model outputs, and discuss the ethical values underpinning the use of AI in scientific research, including transparency, accountability, and societal impact. The review underscores the potential of these methods to accelerate innovation in materials science while emphasizing the need for rigorous validation and interdisciplinary collaboration. By synthesizing current advancements, this article aims to provide a foundational understanding for researchers and practitioners, paving the way for future developments in this interdisciplinary field.
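Constraint-based learning can be sketched as a data-misfit term plus a penalty for violating known physics. The example below uses a hypothetical linear stress-strain fit, penalised for predicting nonzero stress at zero strain. Physics-informed neural networks follow the same pattern but penalise differential-equation residuals at collocation points. This closed-form toy version only illustrates how a penalty weight trades data fit against the constraint. The data and the constraint are assumptions.

```python
def fit_with_soft_constraint(xs, ys, lam):
    """Fit y = a + b*x by minimising sum((a + b*x - y)^2) + lam * a^2.

    The penalty lam * a^2 softly enforces a hypothetical physical condition,
    y(0) = 0 (no stress at zero strain); lam = 0 recovers ordinary least squares.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations: [[n + lam, sx], [sx, sxx]] @ [a, b] = [sy, sxy]
    det = (n + lam) * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = ((n + lam) * sxy - sx * sy) / det
    return a, b

# Hypothetical noisy strain (x) and stress (y) measurements.
strain = [0.1, 0.2, 0.3, 0.4, 0.5]
stress = [0.25, 0.38, 0.65, 0.79, 1.02]
for lam in (0.0, 10.0, 1e6):
    a, b = fit_with_soft_constraint(strain, stress, lam)
    print(f"lam={lam:g}: intercept={a:.4f}, slope={b:.4f}")
```

As the penalty weight grows, the intercept is driven toward zero and the slope approaches the through-origin fit. If the assumed constraint is wrong, however, a large weight forces the model away from the data. This mirrors the concern that physical priors must themselves be valid.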
This review systematically examines the treatment of absence and null results in the materials machine learning literature spanning 2017–2022, drawing exclusively on a curated set of 30 peer-reviewed publications and foundational works that address publication bias, negative findings, and reproducibility challenges in data-driven materials discovery. Through a targeted search strategy across databases such as Web of Science, Scopus, and arXiv using terms including “null result,” “negative result,” “publication bias,” “file drawer,” “failed synthesis,” and “reproducibility” combined with materials informatics keywords, the analysis reveals a persistent imbalance: while successful predictions and syntheses dominate published outputs, systematic documentation of failed predictions, unsuccessful syntheses, null correlations, and abandoned model architectures remains exceedingly rare. What is currently reported tends to be limited to negative outcomes that coincidentally reveal mechanistic insights or contradict high-profile hypotheses, whereas what is systematically unreported encompasses the vast majority of unsuccessful hyperparameter searches, negative active learning campaigns, and non-discoveries that yield no novel materials meeting target criteria. The typology of absence and null results developed here identifies six distinct categories—negative predictive outcomes, null hypothesis non-rejection, failed synthesis, non-discovery, failed replication, and abandoned architecture—each carrying unique implications for scientific progress. The consequences of this non-reporting include severe overestimation of model performance, widespread redundant experimental effort, a false sense of methodological consensus across the field, and slowed overall discovery rates as potentially informative negative signals remain invisible. 
Ultimately, this review offers concrete recommendations for authors, journals, and the broader community to shift incentives toward transparent reporting of absence, thereby restoring balance to the materials AI literature and accelerating reliable data-driven discovery.
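The six-category typology could be put into practice as a small schema for structured reporting of null results. The record fields and the example entry below are a hypothetical illustration of what such a report might capture. They are not a format proposed in the review.

```python
from dataclasses import dataclass, field
from enum import Enum

class NullResultType(Enum):
    """The six categories of absence and null results named in the typology."""
    NEGATIVE_PREDICTIVE_OUTCOME = "negative predictive outcome"
    NULL_HYPOTHESIS_NON_REJECTION = "null hypothesis non-rejection"
    FAILED_SYNTHESIS = "failed synthesis"
    NON_DISCOVERY = "non-discovery"
    FAILED_REPLICATION = "failed replication"
    ABANDONED_ARCHITECTURE = "abandoned architecture"

@dataclass
class NullResultRecord:
    """Hypothetical minimal record for reporting one null or negative result."""
    category: NullResultType
    description: str
    target_criteria: str = ""
    dataset_or_material: str = ""
    tags: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Serialise to plain types, e.g. for deposition alongside a paper."""
        return {
            "category": self.category.value,
            "description": self.description,
            "target_criteria": self.target_criteria,
            "dataset_or_material": self.dataset_or_material,
            "tags": list(self.tags),
        }

# Hypothetical example entry.
record = NullResultRecord(
    category=NullResultType.NON_DISCOVERY,
    description="Active learning campaign found no candidate meeting the target.",
    target_criteria="band gap 1.1-1.5 eV",
)
print(record.to_dict())
```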
This review systematically examines the handling—or more often the neglect—of domain shift within the materials machine learning literature published between 2017 and 2023, drawing on a targeted search of peer-reviewed publications across specialized databases and journals to compile and analyze exactly 30 representative studies that span foundational overviews, application-focused works, and methodological explorations. Domain shift in materials science takes four distinct yet interrelated forms—temporal, compositional, experimental, and theoretical—each arising from the inherently heterogeneous nature of materials data sources that range from evolving laboratory protocols and diverse chemical families to inter-laboratory variations and discrepancies between computational approximations and experimental realities. Current practices reveal that explicit acknowledgment of domain shift remains rare, with the majority of papers proceeding under the default assumption of identical training and test distributions. At the same time, detection methods and adaptation strategies appear in fewer than one in five studies, leaving models vulnerable to silent degradation when deployed on real-world materials problems. The surveyed methods for handling domain shift include statistical detection techniques, domain-adversarial training frameworks, feature-alignment approaches, and shift-robust evaluation protocols, many of which have been proposed in adjacent machine-learning fields yet remain underutilized in materials contexts despite their direct relevance to property prediction and inverse design tasks. 
Collectively, these findings underscore the urgent need for standardized shift-reporting protocols, the development of materials-specific out-of-distribution benchmarks, and the integration of domain-adaptation pipelines into routine workflows, thereby elevating the reliability, generalizability, and practical utility of machine-learning models in accelerating materials discovery.
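Statistical shift detection can be as simple as comparing one descriptor's distribution in the training data with its distribution in a new deployment set. The sketch below applies the two-sample Kolmogorov-Smirnov test to a single scalar feature and uses the standard large-sample critical value. The feature values are synthetic. A realistic check would cover many features and correct for multiple comparisons. A full implementation with exact p-values is available as scipy.stats.ks_2samp.

```python
import math

def ks_statistic(sample_a: list[float], sample_b: list[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:  # step both CDFs past all values <= x
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def shift_detected(train_feature, deploy_feature, alpha=0.05):
    """Flag a shift when D exceeds the asymptotic critical value at level alpha."""
    n, m = len(train_feature), len(deploy_feature)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(train_feature, deploy_feature) > critical

# Synthetic descriptor values: deployment data from the same range vs. a shifted range.
train = [i / 100 for i in range(100)]
same = [(i + 0.5) / 100 for i in range(100)]
shifted = [0.5 + i / 100 for i in range(100)]
print(shift_detected(train, same))     # False
print(shift_detected(train, shifted))  # True
```

A univariate test like this detects only marginal shifts in individual features. Shifts in the joint distribution or in the structure-property relationship itself require the adaptation-aware evaluation protocols discussed above.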
The field of computational and data-driven materials engineering has undergone rapid evolution, driven by advancements in high-throughput computational screening, machine learning algorithms, and integrated workflows that accelerate materials discovery. This review synthesizes recent developments in materials informatics, focusing on platforms that enable efficient exploration of vast chemical spaces through automated computations and data analytics. Key areas include the application of graph neural networks and representation learning for property prediction, active learning strategies to optimize experimental feedback loops, and the integration of multimodal datasets for enhanced model accuracy. High-throughput methods have facilitated discoveries in diverse domains, such as superconductors, battery materials, and high-entropy alloys, by combining density functional theory simulations with machine learning surrogates. Autonomous laboratories and closed-loop systems represent a paradigm shift, allowing self-driving experiments that minimize human intervention while maximizing discovery efficiency. Uncertainty quantification plays a critical role in guiding these processes, supporting reliable predictions when data are sparse. This narrative review structures the landscape into computational ecosystems, workflow integrations, and discovery outcomes, highlighting cross-study synergies. It positions the field at the cusp of scalable inverse-design paradigms, where data-driven insights bridge simulation and experimentation to address grand challenges in materials science.
The field of computational and data-driven materials engineering has evolved from traditional high-throughput simulations into sophisticated ecosystems integrating machine learning with multimodal datasets for accelerated discovery. This review synthesizes recent advancements in materials informatics, emphasizing the role of graph neural networks and deep learning in processing complex structural and property data. We examine multimodal datasets that combine experimental, computational, and textual modalities, enabling robust representation learning and uncertainty quantification. Integration frameworks are discussed, including active learning loops and multi-fidelity models that bridge simulation and experiment, addressing challenges like data sparsity and distribution shifts. The discovery potential is highlighted through applications in property prediction, inverse design, and autonomous systems, such as identifying stable alloys and energy materials. By providing an original synthesis of these elements, this article underscores the shift toward closed-loop workflows that enhance generalizability and interpretability, while identifying gaps in handling finite-temperature stability and disordered systems. Ultimately, these approaches promise to expand the known materials space by orders of magnitude, fostering innovations in sustainable technologies.
The advent of data-driven approaches has revolutionized materials engineering, enabling inverse design strategies that prioritize target properties to guide material synthesis and optimization. This review synthesizes recent advancements in machine learning architectures tailored for materials informatics, including graph neural networks and representation learning frameworks that capture atomic-scale interactions and multiscale phenomena. We examine the integration of high-throughput computations with experimental workflows, highlighting closed-loop systems that incorporate active learning and uncertainty quantification to accelerate discovery. Key application domains span energy materials, metamaterials, and catalytic systems, where multimodal datasets facilitate simulation-experiment synergies. By analyzing computational ecosystems, we underscore the shift from forward modeling to inverse paradigms, emphasizing autonomous laboratories that iteratively refine hypotheses through data feedback loops. Challenges in generalizability and data scarcity are contextualized within broader systems integration, offering a cohesive perspective on how these tools reshape materials design. This narrative integrates cross-study insights to propose unified frameworks for scalable, data-centric engineering, bridging theoretical models with practical implementations in computational materials science.
The rapid evolution of computational and data-driven materials engineering has ushered in an era where self-driving laboratories (SDLs) promise to transform materials discovery by integrating automation, machine learning, and high-throughput experimentation into cohesive governance architectures. These architectures orchestrate the interplay between data generation, model training, and decision-making processes to enable closed-loop optimization in materials design. This review synthesizes recent advancements in SDL governance, focusing on how computational workflows—encompassing materials informatics, graph neural networks, representation learning, and uncertainty quantification—facilitate autonomous systems in addressing complex materials challenges. We examine the foundational elements of data-driven ecosystems, including multimodal datasets and simulation-experiment integration, and explore active learning strategies that balance exploration and exploitation in inverse design paradigms. Key governance components, such as orchestration platforms like ChemOS 2.0 and Bayesian active learning frameworks, are analyzed for their role in accelerating discovery cycles. By integrating perspectives from high-impact studies, we highlight how these architectures mitigate inefficiencies in traditional trial-and-error approaches, enabling scalable, reproducible materials innovation. The review positions SDL governance as a critical infrastructure for future materials engineering, emphasizing systems-level integration over isolated techniques. Ultimately, it underscores the potential of these architectures to democratize access to advanced materials development while identifying pathways for enhanced interoperability and robustness in computational ecosystems.
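In Bayesian active learning, the balance between exploration and exploitation is usually encoded in an acquisition function. The minimal sketch below uses the upper confidence bound (UCB), one common choice. The candidate names, predicted means, and standard deviations are hypothetical surrogate outputs. No claim is made that any particular orchestration platform uses this rule.

```python
def ucb_score(mean: float, std: float, kappa: float) -> float:
    """Upper confidence bound: predicted value plus kappa times its uncertainty."""
    return mean + kappa * std

def select_next(candidates: dict[str, tuple[float, float]], kappa: float) -> str:
    """Return the candidate with the highest UCB score (maximisation problem)."""
    return max(candidates, key=lambda name: ucb_score(*candidates[name], kappa))

# Hypothetical surrogate predictions: name -> (predicted yield, predicted std).
predictions = {
    "recipe_A": (0.82, 0.02),  # well characterised, high predicted yield
    "recipe_B": (0.70, 0.25),  # poorly explored region of the design space
}
print(select_next(predictions, kappa=0.0))  # recipe_A: pure exploitation
print(select_next(predictions, kappa=2.0))  # recipe_B: exploration-weighted
```

The single parameter kappa sets how strongly the loop favours poorly explored regions. In a closed-loop system, kappa is typically reduced as the campaign matures or set according to the experimental budget.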