
Comput Biol Med. 2021 Dec 11;141:105118. doi: 10.1016/j.compbiomed.2021.105118. Epub 2021 Dec 11.

Mining real-world high dimensional structured data in medicine and its use in decision support. Some different perspectives on unknowns, interdependency, and distinguishability.

Computers in biology and medicine

Barry Robson, S Boray, J Weisman

Affiliations

  1. Ingine Inc, Ohio, USA; The Dirac Foundation, Oxfordshire, UK.
  2. Ingine Inc, Ohio, USA.
  3. The Dirac Foundation, Oxfordshire, UK.

PMID: 34971979 DOI: 10.1016/j.compbiomed.2021.105118

Abstract

There are many difficulties in extracting and using knowledge for medical analytic and predictive purposes from Real-World Data, even when the data is already well structured in the manner of a large spreadsheet. Preparative curation and standardization or "normalization" of such data involves a variety of chores, but underlying them is an interrelated set of fundamental problems that can in part be dealt with automatically during the datamining and inference processes. These fundamental problems are reviewed here, and illustrated and investigated with examples. They concern the treatment of unknowns, the need to avoid independency assumptions, and the appearance of entries that may not be fully distinguished from each other. Unknowns include errors detected as implausible (e.g., out-of-range) values that are subsequently converted to unknowns. These problems are compounded by high dimensionality and by the sparse data that inevitably arises in high-dimensional datamining, even if the data is extensive. All these considerations are different aspects of incomplete information, though they also relate to problems that arise if care is not taken to avoid or ameliorate the consequences of including the same information twice or more, or of combining misleading or inconsistent information. This paper addresses these aspects from a slightly different perspective, using the Q-UEL language and inference methods based on it, and borrowing some ideas from the mathematics of quantum mechanics and information theory. It takes the view that detection and correction of probabilistic elements of knowledge subsequently used in inference need only involve testing and correction so that they satisfy certain extended notions of coherence between probabilities. This is by no means the only possible view, and it is explored here and later compared with a related notion of consistency.
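
Two of the ideas mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's Q-UEL method: the field names, plausibility ranges, function names, and tolerance are all hypothetical, and the coherence test shown is only the elementary Bayes' rule identity P(A|B)P(B) = P(B|A)P(A), not the extended notions of coherence the paper refers to.

```python
import math

# Hypothetical plausibility ranges, chosen for illustration only
# (not taken from the paper).
PLAUSIBLE_RANGES = {
    "systolic_bp": (50.0, 300.0),
    "age": (0.0, 120.0),
}


def to_unknown_if_implausible(field, value):
    """Return None (treated as unknown) for missing or out-of-range values."""
    if value is None:
        return None
    low, high = PLAUSIBLE_RANGES[field]
    return value if low <= value <= high else None


def is_coherent(p_a, p_b, p_a_given_b, p_b_given_a, tol=1e-6):
    """Check the Bayes' rule identity P(A|B) P(B) == P(B|A) P(A) within tol."""
    return math.isclose(p_a_given_b * p_b, p_b_given_a * p_a, abs_tol=tol)
```

For example, an entry of 900 for the hypothetical `systolic_bp` field would be converted to an unknown, and the probabilities P(A) = 0.2, P(B) = 0.5, P(A|B) = 0.3, P(B|A) = 0.75 would pass the check, since both products equal 0.15.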

Copyright © 2021 Elsevier Ltd. All rights reserved.

Keywords: Approximations; Assumptions; Bayes net; Bayes' rule; Clinical decision support; Coherence; Distinguishability; Hyperbolic Dirac net; Inference net; Interdependency; Real world data; Unknowns

Publication Types