In computational and data-driven materials engineering, machine learning models are increasingly deployed for property prediction, inverse design, and autonomous discovery. The reliability of these models, however, depends on the quality of their training datasets, which often embed subtle biases introduced by the way the data were constructed. This manuscript examines the conceptual foundations of dataset construction bias in the evaluation of materials AI. It frames this bias as an epistemic problem that distorts benchmarking outcomes and impedes genuine materials discovery. We introduce the Dataset Integrity Cascade (DIC) framework, a layered conceptual model that maps data curation choices to the inference distortions they produce. The framework incorporates feedback mechanisms that show how biases propagate through representation learning, model training, and validation pipelines. Drawing on recent advances in materials informatics, graph neural networks, and uncertainty quantification, the framework highlights systemic trade-offs between dataset scale and representational fidelity. Its implications extend to high-throughput computation, closed-loop experimentation, and foundation models for science, and it suggests pathways toward more robust computational steering in materials design. Although the framework is conceptual and has not yet been empirically validated, this work underscores the need for integrative approaches that align dataset architectures with the inherent complexity of materials systems and thereby foster epistemically sound innovation.