The Measurement Problem in Materials Informatics: When Observing Changes in the System

Ravi Kumar; Neha Sharma; Aniket Deshmukh; Arjun Nair; Meera Pillai

Abstract

In materials informatics, the act of measuring a material property is routinely treated as a neutral act of passive observation. Yet, every measurement consumes finite resources, physically alters the sample, or reshapes the space of future measurements through model-guided selection. This paper identifies a direct analog of the quantum measurement problem within data-driven materials discovery: observation is not merely informative but constitutively changes the system being observed by depleting experimental budgets, inducing material modifications, and biasing the very distribution of data that subsequent AI models will learn. The theoretical claim advanced here is that materials informatics harbors an intrinsic measurement problem in which AI-guided measurement actively constructs rather than neutrally samples the observable landscape, thereby rendering the resulting datasets and models path-dependent on the history of prior observations. Key concepts include resource depletion, selection feedback loops, and measurement-driven evolution, all of which distinguish classical materials measurement effects from quantum collapse while sharing the core epistemic feature of non-neutrality. The implications are far-reaching for AI-guided materials discovery: autonomous laboratories must treat measurement policies as interventions rather than recordings, active-learning algorithms must internalize the cost of altering the observable world, and dataset curation protocols must document measurement history as rigorously as they document final property values. By theorizing this measurement problem, the present analysis offers a conceptual framework that reframes experiment design, model training, and discovery workflows as inherently self-referential processes in which the observer and the observed co-evolve.

Introduction

In materials informatics, the prevailing assumption is that measurement constitutes a passive window onto an independent reality: one selects a composition, synthesizes a sample, characterizes its properties, and records the outcome without thereby changing the underlying distribution of discoverable materials. Yet this assumption collapses under scrutiny. Every characterization technique requires time, energy, and material; many are destructive or consumptive; and when artificial intelligence enters the loop to recommend the next experiment, the choice of what to measure next is conditioned on what has already been measured. The act of observing, therefore, changes what is observed and, more critically, what remains observable. The present theoretical analysis formalizes this phenomenon as the measurement problem in materials informatics.

The problem manifests at three interrelated scales. First, physical measurement often irreversibly modifies the specimen itself, as when electron-beam exposure induces defect migration or when mechanical testing fractures a tensile bar. Second, the experimental budget is finite: beamtime at a synchrotron, furnace runs in a high-throughput laboratory, or the number of viable precursor batches cannot be replenished instantaneously, so each measurement reallocates future opportunity. Third, and most distinctive to the AI era, model-driven selection creates a feedback loop in which the current state of the surrogate model determines the next query point, thereby shaping the training distribution for all subsequent iterations. These three layers—consumptive alteration, resource reallocation, and algorithmic self-reference—render observation non-neutral.

Table 1 analytically decomposes the measurement problem into its core structural dimensions, clarifying how classical assumptions collapse under intervention-driven measurement dynamics.

Table 1. Structural decomposition of the measurement problem in materials informatics

Dimension	Classical assumption	Measurement problem reality	Mechanism	Effect
Observation	Passive recording	Active intervention	All three mechanisms	System co-evolution
Material state	Stable under measurement	Altered by measurement	Measurement-driven evolution	Property drift
Resource availability	Exogenous constraint	Endogenously modified	Resource depletion	Reduced future search space
Data distribution	Representative sampling	Policy-shaped construction	Selection feedback	Biased datasets
Experimental path	Order-independent	Path-dependent	Resource + feedback coupling	Irreversible trajectories
Model learning	Learns material physics	Learns physics + measurement process	Selection feedback	Embedded measurement signature
Uncertainty	Epistemic only	Epistemic + intervention-induced	All mechanisms	Meta-uncertainty

Classical materials science has long acknowledged destructive testing and sample-to-sample variability, yet the informatics paradigm has largely inherited the physicist’s ideal of repeatable, observer-independent measurement. The quantum measurement problem [1-3] provided an early warning that observation can collapse possibility into actuality, but its classical counterpart in materials discovery has remained undertheorized. Recent advances in autonomous laboratories and Bayesian optimization [4-6] have made the feedback character of measurement impossible to ignore: the very data that train the next recommendation are themselves the product of prior recommendations. This paper, therefore, theorizes the measurement problem for materials AI by tracing its conceptual ancestry, specifying its materials-specific mechanisms, stating a formal claim, deriving corollaries, and outlining practical implications. In doing so, it shifts the epistemic stance from “measurement reveals the system” to “measurement and system co-construct each other” [7, 8].

The Measurement Problem in Physics and Beyond

The quantum measurement problem, canonically articulated by Heisenberg [1], von Neumann [2], and Bohr [3], centers on the disturbance that observation imposes on the state it records: because conjugate observables do not commute, sharply measuring one renders the others indeterminate. The measuring apparatus entangles with the system, and the subsequent decoherence or collapse selects one outcome from the superposition. While materials informatics operates in a classical regime, the epistemic structure is analogous: the choice of measurement protocol selects which aspects of the material’s configuration space will be rendered visible while simultaneously rendering others inaccessible.

Beyond quantum mechanics, measurement problems appear in statistical inference, where the act of sampling alters the population parameters available for future samples, and in social science, where the presence of an observer changes the behavior being observed (the Hawthorne effect). In each case, the defining feature is non-neutrality: observation is not a transparent window but a transformative interface. In data-driven science more generally, the epistemology of measurement [9-21] has begun to recognize that instruments, protocols, and human choices embed assumptions that become reified in the resulting dataset. Parker [22] emphasizes that computer simulation and measurement are not independent; each measurement assimilates prior model assumptions, and each model update assimilates prior measurements. The materials informatics setting inherits all three lineages—quantum non-neutrality, statistical path-dependence, and socio-technical reactivity—yet adds a fourth: the closed algorithmic loop.

A measurement problem, therefore, exists whenever (i) the act of acquiring data expends an irreplaceable resource, (ii) the data acquisition process physically or chemically modifies the system, or (iii) the selection of future measurements is conditioned on the outcomes of past ones. Under these conditions, the observed distribution is no longer an unbiased sample of an external reality but a joint product of the material’s intrinsic properties and the history of the observer’s interventions. This conceptual definition supplies the bridge from quantum foundations to classical materials discovery [1, 3, 22].
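Because the definition is disjunctive, any one of the three conditions suffices. A minimal Python sketch of that logic follows; the class `MeasurementContext` and its field names are hypothetical labels for conditions (i) to (iii), not part of any established library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementContext:
    """Hypothetical description of a single data-acquisition step."""
    expends_irreplaceable_resource: bool   # condition (i)
    modifies_system: bool                  # condition (ii)
    conditioned_on_past_outcomes: bool     # condition (iii)


def active_conditions(ctx: MeasurementContext) -> list:
    """Return the labels of the conditions (i)-(iii) that hold for this step."""
    flags = [
        ("i", ctx.expends_irreplaceable_resource),
        ("ii", ctx.modifies_system),
        ("iii", ctx.conditioned_on_past_outcomes),
    ]
    return [label for label, holds in flags if holds]


def has_measurement_problem(ctx: MeasurementContext) -> bool:
    """Any single condition is sufficient under the definition above."""
    return bool(active_conditions(ctx))


# Example: TEM lamella preparation chosen by an active-learning loop
# consumes the specimen, alters it, and depends on earlier results.
tem_step = MeasurementContext(True, True, True)
print(active_conditions(tem_step))  # ['i', 'ii', 'iii']
```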

Measurement in Materials Informatics

Within materials informatics, the act of measurement departs from any notion of neutrality, not through a single mechanism but through a set of intertwined constraints that reshape both the material system and the data landscape it generates. A fundamental source of this departure lies in the physical consequences of many characterization techniques themselves. Measurement is often inseparable from transformation or consumption: transmission electron microscopy, for instance, requires sample thinning to electron transparency, irreversibly altering or destroying the very volume that might otherwise be subjected to complementary thermal or mechanical analyses. Similar dynamics arise in high-throughput combinatorial platforms, where precursor materials are expended in the process of generating data, precluding straightforward replication or reuse. Under these conditions, each observation does more than reveal information; it actively reduces the accessible region of phase space, embedding scarcity directly into the epistemic process.

This material constraint is further intensified by the economic structure within which experimentation unfolds. Resources such as synchrotron beamtime, automated synthesis infrastructure, and large-scale computational capacity are allocated in finite increments, and their deployment is inherently path-dependent. Once a sequence of measurements has been executed in a given compositional or processing regime, the associated expenditure cannot be reallocated retrospectively, even if subsequent results render those early decisions suboptimal. The trajectory through composition–process–structure space thus acquires a historical imprint, where unpromising directions nonetheless consume opportunities that might have supported alternative explorations. While this dynamic has been formalized in the literature through sequential decision frameworks, particularly in Bayesian optimization approaches [6, 13, 14], its deeper implication is often underemphasized: the experimental search space is not merely explored but progressively constrained by irreversible commitments.

The introduction of artificial intelligence into this setting adds a further layer of complexity by rendering measurement reflexive. In active-learning systems, experimental selection is guided by the model’s current representation of uncertainty or expected utility, such that each new datum is chosen because it is predicted to be maximally informative. This creates a feedback loop in which the evolving dataset is shaped by the model’s own internal assumptions, progressively concentrating observations in regions deemed valuable by prior states of the system. Consequently, the resulting data distribution departs from any notion of random or even systematically stratified sampling, instead reflecting a trajectory sculpted through iterative model-driven decisions. Studies in machine learning for materials and molecular discovery have demonstrated how this closed-loop structure accelerates exploration [7, 8]. Yet, it simultaneously introduces a measurement effect that cannot be disentangled from the learning process itself.
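The feedback loop can be made concrete with a deliberately crude sketch. The ground-truth function, the nearest-neighbour surrogate, the distance-based uncertainty proxy, and the upper-confidence-bound-style acquisition below are all illustrative assumptions rather than methods from the cited studies; the point is only that each query is a function of the data already collected.

```python
import math


def true_property(x: float) -> float:
    """Hypothetical ground-truth property along a 1-D composition axis."""
    return math.sin(3 * x) + 0.5 * x


def predict(x: float, data: dict) -> float:
    """Nearest-neighbour surrogate; ties go to the earliest-observed point."""
    nearest = min(data, key=lambda m: abs(x - m))
    return data[nearest]


def uncertainty(x: float, data: dict) -> float:
    """Crude uncertainty proxy: distance to the closest measured composition."""
    return min(abs(x - m) for m in data)


def run_campaign(start: float, grid: list, n_steps: int, kappa: float = 2.0) -> list:
    """Run an active-learning loop and return the measured compositions."""
    data = {start: true_property(start)}
    for _ in range(n_steps):
        candidates = [x for x in grid if x not in data]
        # Acquisition: predicted value plus an exploration bonus.
        # Both terms depend on what has already been measured.
        nxt = max(candidates,
                  key=lambda x: predict(x, data) + kappa * uncertainty(x, data))
        data[nxt] = true_property(nxt)
    return sorted(data)


grid = [i / 20 for i in range(21)]
print(run_campaign(0.0, grid, 2))  # [0.0, 0.55, 1.0]
print(run_campaign(1.0, grid, 2))  # [0.0, 0.5, 1.0]
```

After one query, both campaigns have measured the same two compositions, 0.0 and 1.0, in opposite orders; their next queries nevertheless differ, because the order of observation decides how the surrogate breaks the tie at x = 0.5. The effect is a toy, but it shows in miniature how the dataset depends on acquisition history and not only on the candidate pool.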

What emerges from the interaction of these dynamics is a redefinition of measurement as an intervention rather than a passive act of observation. The consumptive nature of experimental techniques, the irreversibility imposed by resource allocation, and the self-referential logic of algorithmic selection converge to ensure that each measurement reshapes both the material substrate and the statistical environment available for future inquiry. In practice, the system under investigation and the data used to model it co-evolve, such that the act of measuring becomes inseparable from the process of constructing the very landscape that subsequent measurements will inhabit [5-7].

Theoretical Claim: Observation Changes the System

The central theoretical claim advanced here is that, within materials informatics, observation cannot be treated as a neutral act that reveals an underlying, observer-independent distribution of material properties. When measurement is mediated by artificial intelligence, it instead participates in the construction of that distribution, shaping both what is observed and what remains accessible. This constructive role emerges through the convergence of three dynamics: the depletion of finite experimental resources, the physical alteration or consumption of samples during characterization, and the recursive feedback introduced by model-guided selection. The resulting dataset is therefore not a passive reflection of material reality but a trajectory-dependent artifact, contingent on the sequence of prior measurements and the decisions that governed them. In this sense, the statistical structure of observed data is inseparable from the history of its acquisition.

Clarifying the scope of this claim requires distinguishing it from the more familiar measurement problem in quantum theory. The transformation induced by measurement in materials informatics unfolds incrementally rather than instantaneously, accumulating across successive experimental steps and often across distinct physical specimens. The uncertainty that arises is not rooted in ontological indeterminacy but in epistemic limitation and resource exhaustion, as regions of composition space remain theoretically accessible yet become practically unreachable once experimental budgets are expended. A further divergence lies in the mediating role of the surrogate model, whose parameters evolve deterministically in response to incoming data, in contrast to the irreducible unpredictability of individual quantum measurement outcomes. These distinctions situate the present claim within a different conceptual register, one in which measurement effects are grounded in the operational realities of scientific practice rather than in the foundational limits of physical theory.

The force of this argument becomes particularly evident in the architecture of autonomous discovery systems, where synthesis, characterization, and model updating are integrated into a closed-loop pipeline. As articulated in visions of self-driving laboratories, each iteration generates data that immediately informs subsequent experimental choices, thereby collapsing the distinction between observation and intervention [5]. Within such systems, an initial measurement updates the surrogate model, which in turn selects the next candidate through an acquisition function; this selection is executed experimentally, consuming resources and potentially modifying the material substrate, after which the resulting data re-enter the model. The process unfolds as a continuous cycle in which no stage returns the system to an unaltered baseline. Observation and system evolution are thus entangled in a cumulative trajectory, with each step redefining the conditions under which the next becomes possible [1, 3, 5, 6].
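The cycle described above can be sketched as a loop whose state is only ever added to or drawn down. In this minimal Python sketch the callbacks `select` and `measure`, the fixed per-step cost, and the per-step exposure `dose_per_step` are hypothetical placeholders; a real platform would substitute its own acquisition function and instrument interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CampaignState:
    budget_remaining: float
    dose: dict = field(default_factory=dict)      # cumulative exposure per sample
    dataset: list = field(default_factory=list)   # (sample_id, value) in acquisition order


def run_closed_loop(state: CampaignState,
                    select: Callable[[list], str],
                    measure: Callable[[str, float], float],
                    cost_per_step: float,
                    dose_per_step: float,
                    max_iter: int) -> CampaignState:
    for _ in range(max_iter):
        if state.budget_remaining < cost_per_step:
            break                                        # budget exhausted; no refund
        sample = select(state.dataset)                   # choice conditioned on prior data
        prior_dose = state.dose.get(sample, 0.0)
        value = measure(sample, prior_dose)              # outcome may carry earlier exposure
        state.budget_remaining -= cost_per_step          # irreversible expenditure
        state.dose[sample] = prior_dose + dose_per_step  # substrate altered
        state.dataset.append((sample, value))            # data re-enter the loop
    return state


# Hypothetical policy alternating between two samples, and a signal that
# weakens by 0.1 per unit of prior exposure.
state = run_closed_loop(
    CampaignState(budget_remaining=5.0),
    select=lambda data: "A" if len(data) % 2 == 0 else "B",
    measure=lambda sample, dose: 1.0 - 0.1 * dose,
    cost_per_step=1.0, dose_per_step=1.0, max_iter=10)
print(state.dose)  # {'A': 3.0, 'B': 2.0}
```

Nothing in the loop resets `dose` or refunds `budget_remaining`, which is the sense in which no stage returns the system to an unaltered baseline.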

From this perspective, several implications follow that reshape how data and models in materials informatics should be interpreted. The sequence of measurements acquires structural significance, as each experimental decision constrains the set of configurations that remain reachable in subsequent steps. Because resources are irreversibly expended and samples may be altered or consumed, the accessible search space contracts in a manner that depends on the specific path taken, rather than on abstract sampling considerations alone. This introduces a form of path dependence that cannot be reduced to statistical variance, as alternative sequences would have produced fundamentally different trajectories through composition–process–structure space.
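A minimal illustration of this order dependence, assuming a single specimen and a hypothetical set of destructive techniques: the same three planned measurements yield different datasets depending only on their order. The technique names are illustrative labels.

```python
def execute_plan(plan: list, destructive: set) -> list:
    """Return the measurements actually completed on one specimen, in order."""
    completed = []
    for technique in plan:
        completed.append(technique)
        if technique in destructive:
            break  # specimen consumed; the rest of the plan is unreachable
    return completed


destructive = {"tensile"}
print(execute_plan(["xrd", "density", "tensile"], destructive))
# ['xrd', 'density', 'tensile']
print(execute_plan(["tensile", "xrd", "density"], destructive))
# ['tensile']
```

Both plans contain the same techniques, so the difference is not sampling noise; it is fixed entirely by the path.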

A related implication concerns the nature of dataset bias. Rather than arising solely from incomplete or uneven sampling, bias is actively constructed through the interaction between the measurement policy and the material system under study. The resulting dataset encodes not only the properties of materials but also the preferences and constraints embedded in the selection strategy. Models trained on such data inevitably internalize this joint structure, such that their predictions reflect both the underlying physical relationships and the historical logic of data acquisition. What is often treated as ground truth is therefore better understood as the trace of a particular experimental trajectory.

This shift also reframes the interpretive status of learned models themselves. When trained on data shaped by consumptive measurement, resource allocation, and algorithmic selection, the model cannot be said to represent purely intrinsic material behavior. Instead, it approximates a composite process in which physical phenomena are intertwined with observer-induced modifications, including beam effects, budget-driven truncation of exploration, and the directional biases introduced by acquisition functions. Disentangling these components remains an unresolved epistemic challenge, particularly because standard evaluation procedures rely on held-out data generated under the same measurement regime. As a result, performance metrics may inadvertently validate the model’s ability to reproduce the measurement process as much as its capacity to capture underlying physics.

Reconsidering existing active-learning literature through this lens reveals a subtle but consequential tension. While studies emphasize the efficiency gains achieved through intelligent experimental selection [6, 13], the same mechanisms that accelerate discovery also inscribe a measurement signature into the resulting dataset. The implications of this signature vary across contexts. In regimes characterized by abundant materials, non-destructive characterization, and weak model dependence, the effects may remain negligible. Under conditions involving scarce resources, beam-sensitive structures, or extended autonomous campaigns, however, the influence of measurement becomes structurally significant and cannot be treated as a secondary concern. The framework developed here thus provides a basis for identifying when measurement effects can be safely abstracted away and when they must be explicitly incorporated into both model design and interpretation.

Mechanisms of Measurement-Induced Change

The claim developed earlier is grounded in a set of mutually reinforcing mechanisms through which measurement in materials informatics ceases to function as a passive observational act and instead becomes constitutive of the system it seeks to describe. What is at stake is not a single source of distortion but a layered interaction between physical constraints and algorithmic decisions, operating across distinct yet interdependent scales. Resource depletion, selection feedback, and measurement-driven evolution each contribute to this transformation, and their convergence produces a data landscape that is inherently path-dependent and co-constructed. In contrast to the instantaneous, ontological non-neutrality associated with quantum measurement [1, 3], the effect here emerges cumulatively, unfolding through repeated interventions that are bounded by finite experimental budgets and mediated by continuously updated surrogate models. Under these conditions, observation participates in shaping both the material substrate and the statistical structure through which it is subsequently interpreted.

A key driver of this transformation lies in the irreversible consumption of experimental resources. Measurement is inseparable from expenditure: precursor materials, instrument time, energy, and even sample volume are depleted with each observation, constraining what can be explored thereafter. In high-throughput synthesis environments, for example, combinatorial libraries rely on finite material inputs that cannot be instantaneously replenished, such that each characterized composition effectively reallocates the remaining experimental budget. This introduces a form of epistemic irreversibility, where regions of composition space that remain theoretically accessible become practically unattainable once prior allocations have been committed. Although visions of autonomous discovery platforms emphasize the integration of synthesis, characterization, and model updating within closed-loop systems [5], they often treat resource limitations as external constraints rather than as factors that actively reshape the observable domain. Yet the depletion of scarce inputs—whether rare-earth precursors or limited synchrotron beamtime—does more than restrict throughput; it alters the topology of future exploration by foreclosing certain trajectories while privileging others. Measurement, in this sense, redistributes opportunity across the search space, transforming the act of data acquisition into a process that continuously redefines the boundaries of what can still be known.
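To see how finite precursors redefine what remains reachable, consider a minimal sketch in which each synthesis draws down a shared inventory. The precursor names, recipes, and masses are hypothetical and chosen only to make the effect visible; they do not describe any real synthesis route.

```python
def feasible(recipe: dict, inventory: dict) -> bool:
    """True if every precursor needed by the recipe is still in stock."""
    return all(inventory.get(p, 0.0) >= amount for p, amount in recipe.items())


def synthesize(recipe: dict, inventory: dict) -> None:
    """Consume precursors irreversibly; raise if the recipe is no longer feasible."""
    if not feasible(recipe, inventory):
        raise ValueError("insufficient precursor")
    for p, amount in recipe.items():
        inventory[p] -= amount


def reachable(recipes: dict, inventory: dict) -> list:
    """Compositions that can still be made from the remaining inventory."""
    return sorted(name for name, r in recipes.items() if feasible(r, inventory))


# Hypothetical inventory (grams) and recipes; amounts are illustrative only.
inventory = {"La2O3": 1.0, "NiO": 1.0, "Co3O4": 1.0}
recipes = {
    "LaNiO3": {"La2O3": 0.6, "NiO": 0.4},
    "LaCoO3": {"La2O3": 0.6, "Co3O4": 0.4},
    "NiCo2O4": {"NiO": 0.5, "Co3O4": 0.5},
}
print(reachable(recipes, inventory))  # ['LaCoO3', 'LaNiO3', 'NiCo2O4']
synthesize(recipes["LaNiO3"], inventory)
print(reachable(recipes, inventory))  # ['NiCo2O4']
```

After a single synthesis, LaCoO3 becomes unreachable even though it was never measured: the lanthanum it needed was spent elsewhere.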

This reconfiguration becomes even more pronounced when experimental selection is governed by artificial intelligence, introducing a recursive dynamic in which the model’s internal state directly influences the evolution of the dataset. As each new observation updates the surrogate model, the acquisition function recalibrates the criteria for subsequent measurements, directing attention toward regions expected to yield maximal informational gain. The consequence is a feedback loop in which the data distribution progressively aligns with the model’s current representation of uncertainty and value. Empirical demonstrations of closed-loop discovery through Bayesian active learning illustrate the efficiency gains achievable under such regimes [6]. At the same time, broader benchmarking efforts confirm the robustness of these strategies across diverse materials domains [13]. Yet the same mechanisms that accelerate discovery also embed a structural bias into the data-generating process. The model does not merely infer properties from a pre-existing distribution; it actively participates in constructing that distribution by determining which regions of composition–process space will be sampled next. Over successive iterations, this recursive selection narrows the observable manifold around regions deemed informative by prior model states, yielding a dataset that reflects not only the underlying physics of materials but also the evolving preferences and assumptions encoded in the learning system.

Figure 1 provides a hierarchical representation of how measurement actions propagate through mechanisms of non-neutrality to transform the material system and ultimately construct path-dependent epistemic outcomes.

Figure 1. A hierarchical representation of how measurement actions propagate through mechanisms of non-neutrality to transform the material system and ultimately construct path-dependent epistemic outcomes.

A further layer of non-neutrality emerges through the cumulative transformation of the material system under repeated observation, where measurement itself becomes a driver of physical and chemical change. Interactions that are often treated as benign or incidental—electron-beam exposure during transmission electron microscopy, thermal cycling in repeated characterization protocols, or mechanical loading in fatigue experiments—can, over time, reconfigure defect populations, alter phase stability, or modify surface chemistry in ways that are neither negligible nor random. While certain experimental frameworks explicitly leverage such interactions, as in in-situ ion-irradiation platforms designed to probe dynamic response [23-29], even ostensibly non-destructive techniques introduce incremental perturbations when applied iteratively. What appears as repeated measurement of a stable object is, in practice, an evolving sequence in which each intervention leaves a trace that conditions subsequent observations.
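The compounding effect of repeated observation can be sketched by letting each reading depend on the dose already delivered. The exponential decay law and the rate constant `k` below are assumptions made for illustration, not a model of any specific technique.

```python
import math


def observe(intrinsic: float, cumulative_dose: float, k: float) -> float:
    """Assumed first-order damage model: signal decays with accumulated dose."""
    return intrinsic * math.exp(-k * cumulative_dose)


def repeated_measurements(intrinsic: float, n: int, dose_per_obs: float, k: float) -> list:
    """Measure the 'same' specimen n times; each reading adds to its dose."""
    readings, dose = [], 0.0
    for _ in range(n):
        readings.append(observe(intrinsic, dose, k))
        dose += dose_per_obs
    return readings


readings = repeated_measurements(intrinsic=1.0, n=5, dose_per_obs=1.0, k=0.1)
mean = sum(readings) / len(readings)
print(round(mean, 3))  # 0.827, below the intrinsic value of 1.0
```

Averaging the readings as if they were independent replicates underestimates the intrinsic value, and under this assumed model the bias grows with the number of repeats.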

This introduces a temporal dimension that fundamentally distinguishes dynamic experimental regimes from static datasets. The material encountered at a later stage of an autonomous campaign is not simply a resampled instance of an unchanged system but the outcome of a trajectory shaped by prior measurement choices. The divergence between earlier and later states cannot be attributed solely to intrinsic variability; it reflects the accumulated imprint of the observer’s interventions. Under these conditions, path dependence operates in a compounded manner. Not only does the sequence of measurements matter, but the specific modalities through which those measurements are conducted also contribute to the evolving state of the system. The material becomes, in effect, co-produced by the experimental protocol that interrogates it.

When this mechanism is coupled with model-driven selection, the implications extend beyond local sample evolution to the broader structure of the data landscape. Active-learning systems may repeatedly target regions that have already undergone prior characterization, particularly if those regions are deemed informative according to the model’s current uncertainty estimates. In doing so, the model preferentially acquires data from portions of the space that have already been modified by earlier measurements, reinforcing a feedback loop in which both the sampling strategy and the material state co-evolve. The resulting representation captures not an untouched property landscape but one that has been progressively reshaped through interaction. In this light, observation in materials informatics cannot be understood as a neutral interface with an external reality. Through the combined effects of resource allocation, algorithmic selection, and measurement-induced transformation, it operates as a constitutive process that actively defines the domain it seeks to characterize, thereby providing the mechanistic basis for the central theoretical claim and clarifying why both datasets and models must be interpreted as products of this ongoing co-construction.

Figure 2 develops a hierarchical decision-structural representation of AI-guided measurement, illustrating how sequential selection processes irreversibly prune the experimental search space and generate path-dependent epistemic outcomes.

Figure 2. A hierarchical decision-structural representation of AI-guided measurement.

Relation to Existing Concepts

The measurement problem articulated here stands in a specific conceptual relation to three well-established frameworks in materials informatics: active learning, Bayesian experimental design, and representation bias. Rather than supplanting these concepts, the present analysis augments them by foregrounding the non-neutrality that each already harbors but rarely theorizes explicitly. Active learning, as developed by Kusne et al. [6], Lookman et al. [8], and Talapatra et al. [14], deliberately selects measurements to reduce model uncertainty or maximize an acquisition function. The measurement problem reveals that this deliberate selection is not a neutral optimization step, but the very engine of selection feedback; the acceleration celebrated in the literature is purchased at the price of an observer-imposed imprint on the data distribution. Active learning, therefore, does not merely accelerate discovery—it actively constructs the discoverable landscape, rendering the final model sensitive to the history of its own queries.

Table 2 consolidates the theoretical contribution by demonstrating how the measurement problem reframes foundational paradigms in materials informatics as intervention-driven rather than observational processes.

Table 2. Theoretical integration of the measurement problem with core materials informatics paradigms

Framework	Conventional interpretation	Reinterpretation under the measurement problem	Hidden assumption exposed	New theoretical implication
Active learning	Efficient sampling strategy	Mechanism of selection feedback	Neutral sampling assumption	Data distribution is constructed
Bayesian experimental design	Optimal decision-making under uncertainty	Resource-constrained intervention policy	Reversible decision tree	Irreversible pruning of possibilities
Representation bias	Historical dataset imbalance	Measurement-policy-induced bias	The dataset is pre-existing	The dataset is actively generated
Autonomous laboratories	Closed-loop acceleration	Self-referential intervention system	Measurement neutrality	Observer-system co-evolution
Surrogate modeling	Approximation of material properties	Approximation of material + measurement process	Separation of data and process	Entangled learning dynamics
Dataset curation	Recording outcomes	Recording intervention history	Static data assumption	Need for full measurement provenance

Bayesian experimental design [13, 14] formalizes measurement choice as a decision under uncertainty, treating the next experiment as an action that balances exploration and exploitation. The measurement problem adds that every such decision simultaneously expends an irreplaceable resource and potentially alters the system, thereby converting the decision tree from a static search over possibilities into a dynamic pruning of actualities. The expected utility of a measurement must now incorporate not only information gain but also the downstream cost of resource depletion and the risk of measurement-driven evolution. In this light, Bayesian optimization becomes a theory of intervention rather than pure inference.
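One way to write this augmented utility, offered as a sketch rather than a formulation taken from the cited work, is shown below. Here H_t is the measurement history, y the observed outcome, X_t the set of candidates still reachable at step t, IG the expected information gain, C the resource cost, R the risk of measurement-driven alteration, λ and μ assumed nonnegative trade-off weights, and F(x) the set of candidates foreclosed by measuring x (including x itself when the sample is consumed).

```latex
x_{t+1} = \arg\max_{x \in \mathcal{X}_t}
  \Big[ \mathbb{E}\big[\mathrm{IG}(x \mid \mathcal{H}_t)\big]
        - \lambda \, C(x)
        - \mu \, R(x \mid \mathcal{H}_t) \Big],
\qquad
\mathcal{X}_{t+1} = \mathcal{X}_t \setminus \mathcal{F}(x_{t+1}),
\qquad
\mathcal{H}_{t+1} = \mathcal{H}_t \cup \{(x_{t+1}, y_{t+1})\}
```

With λ = μ = 0 and F(x) empty, the rule reduces to ordinary information-driven selection; the update of X_t is what turns a static search over possibilities into a progressive pruning of actualities.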

Representation bias, long discussed in machine-learning literature for materials science [7, 18], typically concerns imbalances in training data arising from historical or socioeconomic factors. The measurement problem reframes such bias as an active construction bias generated by the very measurement policy employed during data acquisition. The dataset is not a pre-existing corpus that happens to be skewed; it is the cumulative trace of a sequence of observer interventions. What the measurement problem adds conceptually, therefore, is an explicit epistemology of measurement: a recognition that the non-neutrality of observation is not an artifact to be mitigated but an intrinsic feature of closed-loop discovery that must be modeled and documented. By integrating these relations, the framework supplies a unifying theoretical account that connects active learning’s algorithmic agency, Bayesian design’s decision-theoretic structure, and representation bias’s statistical consequences under a single heading—the measurement problem in materials informatics.

Implications for AI-Guided Measurement

The framework developed above implies a reconfiguration of how AI-guided measurement systems are designed, shifting the role of measurement from a neutral input channel to an explicitly managed component of the discovery process. One immediate consequence concerns the treatment of experimental resources. When measurement is understood as consumptive and path-dependent, experimental budgets can no longer be treated as abstract or externally replenishable constraints. Instead, they become endogenous to the optimization problem itself. Acquisition strategies must therefore internalize depletion as a dynamic cost, introducing penalties that scale with cumulative expenditure and reflect the diminishing flexibility of the remaining search space. Under these conditions, the objective of the system is no longer reducible to maximizing immediate information gain; it becomes a more nuanced negotiation between advancing discovery and preserving future optionality. Early measurements, in particular, acquire disproportionate significance, as they shape not only current knowledge but the set of trajectories that remain viable.
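A depletion-aware acquisition rule can be sketched as an information-gain score minus a cost term whose weight rises as the budget is spent. The 1/(1 - f) schedule, where f is the fraction of the budget already consumed, is one arbitrary choice among many, and the candidate names and numbers are hypothetical.

```python
def depletion_weight(spent: float, total: float, base: float = 1.0) -> float:
    """Penalty weight that grows without bound as the budget approaches exhaustion."""
    if not 0.0 <= spent < total:
        raise ValueError("spent must lie in [0, total)")
    return base / (1.0 - spent / total)


def score(info_gain: float, cost: float, spent: float, total: float) -> float:
    """Expected information gain minus a depletion-weighted cost."""
    return info_gain - depletion_weight(spent, total) * cost


def choose(candidates: dict, spent: float, total: float) -> str:
    """candidates maps a name to (expected information gain, cost)."""
    return max(candidates, key=lambda name: score(*candidates[name], spent, total))


# Hypothetical options: an informative but costly scan versus a cheap lab measurement.
options = {"synchrotron_scan": (3.0, 2.0), "lab_xrd": (1.0, 0.2)}
print(choose(options, spent=0.0, total=10.0))  # synchrotron_scan
print(choose(options, spent=8.0, total=10.0))  # lab_xrd
```

Early in the campaign the costly scan wins; once most of the budget is gone, the same information gain no longer justifies the same expenditure, which is one way to encode the disproportionate weight of early measurements.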

This shift also introduces a deeper requirement for models to reason about the consequences of their own interventions. It is no longer sufficient for surrogate models to quantify uncertainty solely in relation to unobserved material properties. A related implication is the need to represent uncertainty about the integrity of the data itself, particularly in light of measurement-induced transformations. Data points are not static carriers of information but may embody accumulated effects of prior experimental actions, and this introduces a second-order uncertainty that conventional frameworks do not capture. Incorporating such meta-uncertainty into the feedback loop allows acquisition functions to differentiate between regions where observed variability reflects intrinsic material behavior and those where it arises from observer-induced perturbations. In practice, this could guide the system either toward stabilizing measurements that minimize further alteration or toward targeted interventions that explicitly characterize the degree of measurement-induced evolution. The resulting models are better positioned to disentangle physical phenomena from artifacts of the experimental process.
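A simple proxy for this second-order uncertainty is to check whether repeated measurements of the same sample trend with measurement order. The sketch below assumes the readings are ordered in time and fits a least-squares slope against measurement index. A slope that is large relative to its standard error is read as observer-induced drift. Scatter without a trend is read as noise or intrinsic variability. The slope t-statistic and the threshold of 3 are swapped-in heuristics, not the authors' method, and they ignore autocorrelation and nonlinear drift.

```python
import math

def drift_statistic(values: list[float]) -> tuple[float, float]:
    """Least-squares slope of value vs. measurement index, and the slope's t-statistic."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three repeated measurements")
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    slope = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in enumerate(values)]
    s2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    if s2 == 0.0:
        # Perfectly linear data: any nonzero slope is an exact trend.
        return slope, (math.copysign(math.inf, slope) if slope != 0 else 0.0)
    return slope, slope / math.sqrt(s2 / sxx)

def classify(values: list[float], t_threshold: float = 3.0) -> str:
    """Label a repeated-measurement series as stable or drifting (heuristic)."""
    _, t = drift_statistic(values)
    return "measurement-induced drift" if abs(t) > t_threshold else "stable"

# Hypothetical repeated readings of two samples, in measurement order.
stable_sample = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1]
drifting_sample = [10.0, 10.3, 10.5, 10.8, 11.1, 11.3, 11.6]
print(classify(stable_sample))    # stable
print(classify(drifting_sample))  # measurement-induced drift
```

In an acquisition loop, the per-sample t-statistic (or a posterior over the slope) could be stored alongside the data. It could then be used either to steer away from samples that are drifting or to schedule deliberate repeat measurements that characterize the drift.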

A further extension of this perspective concerns the status of data itself, particularly in relation to documentation and reuse. If datasets are understood as products of a measurement trajectory rather than as neutral samples, then their provenance becomes indispensable to their interpretation. Recording only final property values is insufficient; what is required is a structured account of how those values were obtained. This includes the sequence of prior measurements, the protocols applied at each stage, the cumulative resource expenditure, and the state of the surrogate model at the moment each decision was made. Such metadata transforms datasets from static repositories into traceable histories, enabling downstream users to evaluate the extent to which observed patterns reflect intrinsic material properties or the contingencies of the measurement process. Without this level of detail, the risk remains that models trained on such data will inadvertently reproduce the biases embedded in their acquisition.
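The provenance fields listed in the paragraph above (the sequence of prior measurements, the protocol at each stage, cumulative expenditure, and the surrogate state at decision time) map naturally onto a per-measurement record. The sketch below shows one hypothetical schema. The class and field names, the example values, and the use of a truncated SHA-256 hash of a JSON-serializable model state as a "fingerprint" are all assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field

@dataclass
class ProvenanceRecord:
    step: int                         # position in the global measurement sequence
    sample_id: str
    protocol: str                     # protocol applied at this step
    value: float                      # measured property value
    cost: float                       # resources consumed by this measurement
    cumulative_cost: float            # total expenditure up to and including this step
    prior_steps_on_sample: list[int]  # earlier steps that touched the same sample
    model_fingerprint: str            # hash of the surrogate state at decision time

@dataclass
class MeasurementLog:
    records: list[ProvenanceRecord] = field(default_factory=list)

    def record(self, sample_id: str, protocol: str, value: float,
               cost: float, model_state: dict) -> ProvenanceRecord:
        """Append one measurement together with the context in which it was chosen."""
        prior = [r.step for r in self.records if r.sample_id == sample_id]
        previous_total = self.records[-1].cumulative_cost if self.records else 0.0
        fingerprint = hashlib.sha256(
            json.dumps(model_state, sort_keys=True).encode("utf-8")
        ).hexdigest()[:12]
        rec = ProvenanceRecord(
            step=len(self.records), sample_id=sample_id, protocol=protocol,
            value=value, cost=cost, cumulative_cost=previous_total + cost,
            prior_steps_on_sample=prior, model_fingerprint=fingerprint,
        )
        self.records.append(rec)
        return rec

    def to_json(self) -> str:
        """Serialize the full measurement history, not just final values."""
        return json.dumps([asdict(r) for r in self.records], indent=2)

# Hypothetical three-step campaign; sample S1 is measured twice.
log = MeasurementLog()
log.record("S1", "XRD scan", 4.05, cost=2.0, model_state={"n_train": 0})
log.record("S2", "XRD scan", 4.11, cost=1.5, model_state={"n_train": 1})
third = log.record("S1", "nanoindentation", 7.9, cost=5.0, model_state={"n_train": 2})
print(third.prior_steps_on_sample, third.cumulative_cost)  # [0] 8.5
```

Storing a fingerprint rather than the full model keeps each record small. A platform using this scheme would also need to archive the model states themselves so that the fingerprints can be resolved later.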

Taken together, these implications signal a broader epistemic transition in AI-driven materials discovery. Measurement is no longer treated as a passive act of recording but as an intervention that must be anticipated, monitored, and governed. By embedding awareness of resource depletion, measurement-induced change, and data provenance directly into system design, autonomous platforms can move toward a form of discovery that is not only accelerated but also reflexive. This reorientation does not diminish the role of active learning or Bayesian optimization; rather, it reframes their operation within a measurement-aware paradigm in which the act of observation is recognized as constitutive of the knowledge it produces.

Conclusion

The present theoretical analysis has identified and formalized an analog of the measurement problem within materials informatics. Claim 1 asserts that AI-guided measurement does not passively reveal an observer-independent distribution but actively constructs that distribution through resource depletion, physical sample alteration, and selection feedback. Three corollaries follow directly from this claim: the path dependence of measurement sequences, active construction bias in datasets, and the learning of the measurement process itself. The three mechanisms named in the claim, in turn, supply the concrete pathways through which non-neutrality manifests. By distinguishing these classical effects from the quantum measurement problem along ontological, temporal, and epistemic dimensions, the framework clarifies both the shared epistemic core of non-neutral observation and the materials-specific character of its realization.

The implications extend beyond conceptual clarification. Autonomous laboratories, active-learning algorithms, and dataset curation protocols must henceforth treat measurement as intervention rather than passive recording. Only measurement-aware AI systems—those that internalize resource costs, track meta-uncertainty over observer-induced changes, and document full measurement provenance—can claim epistemic responsibility for the materials they discover. The measurement problem in materials informatics is therefore not a peripheral inconvenience but a foundational feature of data-driven science in the AI era. Recognizing and modeling this feature promises to deepen our understanding of discovery itself as a co-evolutionary process between observer and observed.

Acknowledgements

None

Conflict of interest

None

Financial support

None

Ethics statement

None

References

Meyer JC, Passante G, Pollock SJ, Wilcox BR. Today’s interdisciplinary quantum information classroom: Themes from a survey of quantum information science instructors. Phys Rev Phys Educ Res. 2022;18(1):010150.

Denk R. Mathematische Grundlagen der Quantenmechanik. Springer Spektrum; 2022.

Isert C, Atz K, Jiménez-Luna J, Schneider G. QMugs, quantum mechanical properties of drug-like molecules. Sci Data. 2022;9(1):273.

Mekki-Berrada F, Xie J, Khan SA. High-throughput and high-speed absorbance measurements in microfluidic droplets using hyperspectral imaging. Chem Methods. 2022;2(5):e202100086.

Montoya JH, Aykol M, Anapolsky A, Gopal CB, Herring PK, Hummelshøj JS, et al. Toward autonomous materials research: Recent progress and future challenges. Appl Phys Rev. 2022;9(1):011405. https://doi.org/10.1063/5.0076324

Kusne AG, Yu H, Wu C, Zhang H, Hattrick-Simpers J, DeCost B, et al. On-the-fly closed-loop materials discovery via Bayesian active learning. Nat Commun. 2020;11(1):5966.

Butler KT, Davies DW, Cartwright H, Isayev O, Walsh A. Machine learning for molecular and materials science. Nature. 2018;559(7715):547-55.

Lookman T, Balachandran PV, Xue D, Yuan R. Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design. npj Comput Mater. 2019;5(1):21.

Wang A, Liang H, McDannald A, Takeuchi I, Kusne AG. Benchmarking active learning strategies for materials optimization and discovery. Oxf Open Mater Sci. 2022;2(1):itac006.

Hara K, Yamada S, Kurotani A, Chikayama E, Kikuchi J. Materials informatics approach using domain modelling for exploring structure-property relationships of polymers. Sci Rep. 2022;12(1):10558.

Shekhawat D, Vauth M, Pezoldt J. Size dependent properties of reactive materials. Inorganics. 2022;10(4):56.

Cranford S. Reducing the editorial observer effect. Matter. 2021;4(8):2571-3.

Liang Q, Gongora AE, Ren Z, Tiihonen A, Liu Z, Sun S, et al. Benchmarking the performance of Bayesian optimization across multiple experimental materials science domains. npj Comput Mater. 2021;7(1):188.

Talapatra A, Boluki S, Duong T, Qian X, Dougherty E, Arróyave R. Autonomous efficient experiment design for materials discovery with Bayesian model averaging. Phys Rev Mater. 2018;2(11):113803.

Talapatra A, Boluki S, Honarmandi P, Solomou A, Zhao G, Ghoreishi SF, et al. Experiment design frameworks for accelerated discovery of targeted materials across scales. Front Mater. 2019;6:82.

Sanchez-Lengeling B, Aspuru-Guzik A. Inverse molecular design using machine learning: Generative models for matter engineering. Science. 2018;361(6400):360-5.

Ramprasad R, Batra R, Pilania G, Mannodi-Kanakkithodi A, Kim C. Machine learning in materials informatics: Recent applications and prospects. npj Comput Mater. 2017;3(1):54.

Schmidt J, Marques MR, Botti S, Marques MA. Recent advances and applications of machine learning in solid-state materials science. npj Comput Mater. 2019;5(1):83.

Pilania G. Machine learning in materials science: From the perspective of materials informatics. Mater Today. 2021;50:1-8. https://doi.org/10.1016/j.mattod.2021.06.004

Zvyagin LS. Modern trends in the development of the concept of the soft measurements. In: System Analysis in Economics - 2018; 2018. p. 199-202.

Tal E. A model-based epistemology of measurement. In: Schlaudt C, Huber L, editors. Reasoning in measurement. London: Routledge; 2017. p. 245-65.

Parker WS. Computer simulation, measurement, and data assimilation. Br J Philos Sci. 2017;68(1):273-304. https://doi.org/10.1093/bjps/axv037

Köhler B, Kissing J, Gartsev S, Barth M, Rjelka M, Bamberg J, et al. New measurement approach for determination of residual stress and cold work at surface treated aero engine materials.

Lang E, Dennett CA, Madden N, Hattar K. The in situ ion irradiation toolbox: Time-resolved structure and property measurements. JOM. 2022;74(1):126-42.

Kaufman JG. Properties and selection of cast aluminum alloys. Materials Park (OH): ASM International; 2004.

Lu J. Seeing the weak bonding. Matter. 2019;1(2):304-5.

McGinley M, Roy S, Parameswaran SA. Absolutely stable spatiotemporal order in noisy quantum systems. Phys Rev Lett. 2022;129(9):090404.

McConnell DA, Chapman L, Czajka CD, Jones JP, Ryker KD, Wiggen J. Instructional utility and learning efficacy of common active learning strategies. J Geosci Educ. 2017;65(4):604-25.

Huang T, Pan H, Sun W, Gao H. Sine resistance network-based motion planning approach for autonomous electric vehicles in dynamic environments. IEEE Trans Transp Electrification. 2022;8(2):2862-73.

Author information

Ravi Kumar, Neha Sharma, Aniket Deshmukh, Arjun Nair & Meera Pillai contributed to this work.

Authors and affiliations

Department of AI in Materials Science, IIT Delhi, New Delhi, India
Ravi Kumar, Neha Sharma & Arjun Nair

Department of Computational Materials Engineering, IIT Bombay, Mumbai, India
Aniket Deshmukh & Meera Pillai

Corresponding author

Correspondence to Ravi Kumar

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Vancouver

Kumar R, Sharma N, Deshmukh A, Nair A, Pillai M. The Measurement Problem in Materials Informatics: When Observing Changes in the System. J Artif Intell Mater Sci. 2022;1:101.

APA

Kumar, R., Sharma, N., Deshmukh, A., Nair, A., & Pillai, M. (2022). The Measurement Problem in Materials Informatics: When Observing Changes in the System. Journal of Artificial Intelligence for Materials Science, 1, 101.


Received: 03 September 2021
Revised: 13 November 2021
Accepted: 11 December 2021
Published: 18 January 2022
Version of record: 18 January 2022

Keywords

Materials informatics; Active learning; Measurement problem; Resource depletion; Selection feedback; Measurement-induced change
