Ghost data

Ghost data (Chinese: 幽灵数据) refers to data that is generally invisible to the observer.

From the perspective of data science and statistics, it often includes virtual data (such as simulation data[1] and virtual patients[2]), missing data[3], forged data[4], highly sparse data, and other similar data.

Missing data is the most familiar example. Because missingness mechanisms differ (ignorable or non-ignorable, missing at random or missing not at random, etc.), the appropriate processing methods differ as well. Ghost data also includes other kinds of invisible data, which some people can perceive and others cannot, such as survival as a digital ghost[5] and digital museums and archives.

Overview

Ghost data was first proposed by Professor Dennis Lin in a series of invited lectures[6][7][8][9]. It exists widely in all kinds of historical data, including diaries, photographs, audio recordings, videos, and even the information preserved in fossils and cultural relics. Although such relics record only some surface features, some people can perceive more of the related information than others. Those who perceive more can reconstruct parts of the original and approximately reproduce the whole. For example, in the virtual restoration of cultural relics, data scientists and cultural relic experts working together can capture more of the record and recover the information the relics carry. As technology advances, such restoration is expected to approach a perfect replica of the original.

Data type

Ghost data includes virtual data, missing data, fake data, simulated data, highly sparse data, and other similar data. Missing data is the most familiar type, and the appropriate processing method depends on the missingness mechanism. Missing data can be divided into missing at random (MAR) and missing not at random (MNAR). The main processing methods are sampling-based inference, Bayesian inference, and likelihood-based inference. Missing values also arise in experimental design, for example in a randomized complete block design; conversely, a balanced incomplete block design can be analyzed statistically as a randomized complete block design with missing data. For fake data, research on evidence gaps and counterfeit medicines in global health notes that global health research and policy have warned of the growing threat of counterfeit and substandard medicines.
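The difference between MAR and MNAR described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method taken from the cited sources: the `income` variable, the two age groups, the linear model, and the missingness probabilities (0.6 and 0.1) are all hypothetical choices made for illustration. The stratified estimate is one simple adjustment that uses the always-observed group; it stands in for, and is not the same as, the Bayesian and likelihood-based methods named in the text.

```python
import random
import statistics


def stratified_mean(rows, observed_flags):
    """Average the observed values within each group, then weight each
    group mean by that group's share of all rows (group is always observed).
    Raises statistics.StatisticsError if a group has no observed values."""
    total = len(rows)
    estimate = 0.0
    for group in (0, 1):
        size = sum(1 for g, _ in rows if g == group)
        seen = [y for (g, y), keep in zip(rows, observed_flags)
                if g == group and keep]
        estimate += (size / total) * statistics.fmean(seen)
    return estimate


def compare_missing_mechanisms(n=10_000, seed=0):
    """Simulate one variable with missing values under MAR and under MNAR."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 70)
        # Hypothetical model: income rises with age, plus Gaussian noise.
        income = 1000 + 50 * age + rng.gauss(0, 500)
        rows.append((1 if age > 45 else 0, income))
    true_mean = statistics.fmean(y for _, y in rows)

    # MAR: the chance of being missing depends only on the observed group.
    mar_flags = [rng.random() > (0.6 if g == 1 else 0.1) for g, _ in rows]
    # MNAR: the chance of being missing depends on the missing value itself.
    mnar_flags = [rng.random() > (0.6 if y > true_mean else 0.1)
                  for _, y in rows]

    def naive(flags):
        # Complete-case mean: simply ignore the missing values.
        return statistics.fmean(y for (_, y), keep in zip(rows, flags) if keep)

    return {
        "true_mean": true_mean,
        "mar_naive": naive(mar_flags),
        "mar_stratified": stratified_mean(rows, mar_flags),
        "mnar_naive": naive(mnar_flags),
        "mnar_stratified": stratified_mean(rows, mnar_flags),
    }


if __name__ == "__main__":
    for name, value in compare_missing_mechanisms().items():
        print(f"{name:16s} {value:10.1f}")
```

In this setup the complete-case mean is biased under both mechanisms. Conditioning on the observed group removes most of the bias under MAR, while under MNAR a clear bias remains, because the missingness depends on information that was never recorded.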

Application examples

(1) Simulation data. Model simulation answers "what if" questions. Examples include data generated by simulation software and by systems such as AlphaZero.

(2) Virtual data. The data does not exist in fact, yet it is visible. Examples include virtual reality and science fiction films, such as "I see what you did not see" in the movie "The Sixth Sense" and the live-die-repeat loop in "Edge of Tomorrow".

(3) Missing data. The data should exist and would be valuable, but it is absent. One example is data on aircraft lost in battle. Another is the incomplete data found in electronic health records, such as pain data for virtual patients with heart disease.

(4) Falsified (disguised) data. This is fabricated data, such as fake drug data or the "fake data" of scientific misconduct. One example is the evidence gap in global health research on counterfeit drugs[4]. Global health research and policy have warned of the increasing threat of counterfeit and substandard drugs[10].

(5) Highly sparse data. Apart from a small number of recorded values, only zeros remain, as in transaction record data in e-commerce.

(6) Other relevant data. This category concerns how to deal with data we cannot see, such as data contained in historical or cultural relics and "restored" data from sites. Examples include digital technology assisting the excavation of the Terracotta Warriors and the evidence data used by Holmes.
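The highly sparse case in example (5) can be sketched in code. The following is a minimal illustration, assuming a hypothetical list of `(user, item, quantity)` transaction records; the function names are invented for this example and do not come from any library. Only nonzero entries are stored, and every other cell of the user-by-item matrix is treated as an implicit zero.

```python
from collections import defaultdict


def to_sparse(transactions):
    """Build a dict-of-dicts purchase matrix that stores only nonzero counts.
    Repeated (user, item) records are summed."""
    matrix = defaultdict(dict)
    for user, item, quantity in transactions:
        matrix[user][item] = matrix[user].get(item, 0) + quantity
    return dict(matrix)


def lookup(matrix, user, item):
    """Return the stored count, or 0 for any cell that was never recorded."""
    return matrix.get(user, {}).get(item, 0)


def density(matrix, n_users, n_items):
    """Fraction of the full n_users x n_items matrix that is actually stored."""
    stored = sum(len(row) for row in matrix.values())
    return stored / (n_users * n_items)


if __name__ == "__main__":
    # Hypothetical records: two users out of 1000, two items out of 1000.
    records = [("u1", "a", 2), ("u1", "a", 1), ("u2", "c", 5)]
    purchases = to_sparse(records)
    print(lookup(purchases, "u1", "a"))
    print(lookup(purchases, "u3", "a"))
    print(density(purchases, n_users=1000, n_items=1000))
```

A design note: an implicit zero here means "no purchase recorded", which may or may not mean "no interest". That ambiguity is what places highly sparse data alongside missing data in this article.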

References

  1. Kaitai Fang, Dennis KJ Lin (2003). Uniform experimental design and its applications in industry. North Holland, Amsterdam.
  2. Man Xu, Jiang Shen, Haiyan Yu (2017). Big Data for Healthcare. Beijing: Machinery Industry Press.
  3. Haiyan Yu, Jingjing Chen (2020). "Gaussian mixture clustering algorithm with maximization expectation for censored data". Acta Automatica Sinica.
  4. Sarah Hodges, Emma Garnett (2020). "The ghost in the data: Evidence gaps and the problem of fake drugs in global health research". Global Public Health: 1744-1692.
  5. Eric Steinhart. "Survival as a Digital Ghost". Minds & Machines. 17: 261–271.
  6. uwaterloo. "Statistics and Actuarial Science Events 2018". uwaterloo.ca. Archived from the original on 2020-11-24. Retrieved 2018-11-15.
  7. Arizona State University. "Ghost Data". Retrieved 2019-11-07.
  8. Haiyan Yu, CQUPT. "Ghost data in the post-big data era by Professor Dennis KJ Lin". Archived from the original on 2019-07-23. Retrieved 2018-07-17.
  9. bc.njupt.edu.cn/. "'Ghost Data (phantom data)' lecture was successfully held". Nanjing University of Posts and Telecommunications School of Management. Retrieved 2019-05-28.
  10. Grau, Oliver (2017). Museum and archive on the move: changing cultural institutions in the digital era. Walter de Gruyter GmbH & Co KG.


This article "Ghost data" is from Wikipedia. The list of its authors can be seen in its history and/or on the page Edithistory:Ghost data. Articles copied from the Draft namespace on Wikipedia can be found in Wikipedia's Draft namespace, not the main one.