Data-driven astronomy
Data-driven astronomy (DDA) refers to the use of data science in astronomy. The outputs of telescopic observations and sky surveys are analyzed, filtered, and normalized using data mining and big data management approaches. The resulting data sets are then used for classification, prediction, and anomaly detection by means of advanced statistical methods, digital image processing, and machine learning. Astronomers and space scientists use the output of these processes to study and identify patterns, anomalies, and movements in outer space, and to draw conclusions that support theories and discoveries about the cosmos.
History
In 2007, the Galaxy Zoo project[1] was launched for the morphological classification[2][3] of a large number of galaxies. The project considered 900,000 images taken by the Sloan Digital Sky Survey (SDSS)[4] over the preceding seven years. The task was to study each picture of a galaxy, classify it as elliptical or spiral, and determine whether it was spinning or not. A team of astrophysicists led by Kevin Schawinski at Oxford University was in charge of the project, and Schawinski and his colleague Chris Lintott estimated that it would take such a team 3–5 years to complete the work.[5] They then came up with the idea of using machine learning and data science techniques to analyze and classify the images.[6]
Methodology
The data retrieved from sky surveys first undergo preprocessing, in which redundant records are removed and the data are filtered. Feature extraction is then performed on the filtered data set, which is passed on to further processing.[7] Some well-known sky surveys and observing facilities are listed below:
- The Palomar Digital Sky Survey (DPOSS)[8]
- The Two-Micron All Sky Survey (2MASS)[9]
- Green Bank Telescope (GBT)[10]
- The Galaxy Evolution Explorer (GALEX)[11]
- The Sloan Digital Sky Survey (SDSS)[4]
- SkyMapper Southern Sky Survey (SMSS)[12]
- The Panoramic Survey Telescope and Rapid Response System (Pan-STARRS)[13]
- The Large Synoptic Survey Telescope (LSST)[14]
- The Square Kilometre Array (SKA)[15]
The data volumes of the sky surveys listed above range from 3 TB to almost 4.6 EB.[7] The data mining tasks involved in managing and manipulating these data include classification, regression, clustering, anomaly detection, and time-series analysis. Each of these tasks can be carried out with several approaches and applications.
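The preprocessing steps described in this section (removing redundancies, filtering, and preparing a feature for later analysis) can be illustrated with a minimal Python sketch. The catalogue rows, the field names `id` and `mag`, and the magnitude limit are hypothetical and are not taken from any particular survey.

```python
from typing import Dict, List

# Hypothetical catalogue rows; field names and values are illustrative only.
Row = Dict[str, float]


def preprocess(rows: List[Row], max_mag: float) -> List[Row]:
    """Remove duplicate objects, filter faint sources, and min-max normalize."""
    # 1. Remove redundancies: keep the first row seen for each object id.
    seen = set()
    unique = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)

    # 2. Filter: drop sources fainter than the magnitude limit
    #    (a larger magnitude means a fainter source).
    kept = [r for r in unique if r["mag"] <= max_mag]
    if not kept:
        return []

    # 3. Feature preparation: rescale magnitudes to the range [0, 1].
    lo = min(r["mag"] for r in kept)
    hi = max(r["mag"] for r in kept)
    span = (hi - lo) or 1.0  # avoid division by zero when all values match
    return [{**r, "mag_norm": (r["mag"] - lo) / span} for r in kept]


catalogue = [
    {"id": 1, "mag": 14.0},
    {"id": 2, "mag": 18.0},
    {"id": 1, "mag": 14.0},  # duplicate of object 1
    {"id": 3, "mag": 23.5},  # fainter than the limit used below
    {"id": 4, "mag": 16.0},
]
clean = preprocess(catalogue, max_mag=20.0)
```

In this toy run the duplicate row and the faint source are dropped, and the three remaining objects receive a normalized magnitude between 0 and 1.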
Classification
Classification[16] is used to identify and categorize astronomical data. Examples include spectral classification, photometric classification, morphological classification, and the classification of solar activity. Commonly used classification techniques are listed below:
- Artificial neural network (ANN)
- Support vector machine (SVM)
- Learning vector quantization (LVQ)
- Decision tree
- Random forest
- k-nearest neighbors
- Naïve Bayesian networks
- Radial basis function network
- Gaussian process
- Decision table
- Alternating decision tree (ADTree)
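As an illustration of one technique from the list, the following is a minimal sketch of a k-nearest neighbors classifier in plain Python. The two features (a color index and a light-concentration value), the training points, and the labels are invented for the example and do not come from a real survey.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training points nearest to query."""
    # Sort the training samples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (color index, light concentration).
train = [
    ((0.9, 0.80), "elliptical"),
    ((1.0, 0.85), "elliptical"),
    ((0.8, 0.75), "elliptical"),
    ((0.4, 0.30), "spiral"),
    ((0.3, 0.35), "spiral"),
    ((0.5, 0.25), "spiral"),
]

label_a = knn_predict(train, (0.85, 0.80))
label_b = knn_predict(train, (0.35, 0.30))
```

With this toy data, a query near the first group is labeled `"elliptical"` and a query near the second group is labeled `"spiral"`. Real morphological classification works on far richer features than these two.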
Regression
Regression[17] is used to make predictions from the retrieved data through statistical trends and statistical modeling. In astronomy it is used, for example, to estimate photometric redshifts and to measure the physical parameters of stars.[18] The approaches are listed below:
- Artificial neural network (ANN)
- Support vector regression (SVR)
- Decision tree
- Random forest
- k-nearest neighbors regression
- Kernel regression
- Principal component regression (PCR)
- Gaussian process
- Least squares regression (LSR)
- Partial least squares regression
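A minimal sketch of ordinary least squares regression with a single predictor is given below, assuming a simple linear relation between one color index and redshift. The numbers are toy values chosen to lie exactly on a line; they are not real photometric measurements, and practical photometric redshift estimation uses several bands and more flexible models.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: color index and known redshift.
color = [0.2, 0.4, 0.6, 0.8]
redshift = [0.05, 0.10, 0.15, 0.20]

slope, intercept = fit_line(color, redshift)

# Predict the redshift of a new object from its color.
predicted = slope * 0.5 + intercept
```

For this toy data the fitted slope is 0.25 and the intercept is 0 (up to floating-point rounding), so an object with a color of 0.5 is assigned a redshift of about 0.125.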
Clustering
Clustering[19] groups objects according to a similarity measure. In astronomy it is used for classification as well as for the detection of special or rare objects. The approaches are listed below:
- Principal component analysis (PCA)
- DBSCAN
- k-means clustering
- OPTICS
- Cobweb model
- Self-organizing map (SOM)
- Expectation Maximization
- Hierarchical Clustering
- AutoClass[20]
- Gaussian Mixture Modeling (GMM)
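The following is a minimal sketch of k-means clustering in plain Python. The two-dimensional points are invented, and the centroids are initialized from the first k points purely to keep the example reproducible; practical implementations usually use randomized or k-means++ initialization.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Cluster 2-D points with Lloyd's algorithm; returns (centroids, labels)."""
    # Assumption for reproducibility: start from the first k points.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Hypothetical feature pairs forming two well-separated groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.8, 1.2), (5.2, 4.8), (4.8, 5.2)]
centroids, labels = kmeans(points, k=2)
```

Here the points near (1, 1) end up in cluster 0 and the points near (5, 5) in cluster 1. No labels are supplied in advance, which is what distinguishes clustering from the supervised classification methods above.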
Anomaly detection
Anomaly detection[21] is used to detect irregularities in a data set. In astronomy, the technique is used to detect rare or special objects. The following approaches are used:
- Principal Component Analysis (PCA)
- k-means clustering
- Expectation Maximization
- Hierarchical clustering
- One-class SVM
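As a sketch of the PCA-based approach, the example below finds the first principal axis of a two-dimensional data set in closed form and flags points that lie far from it. The data (ten points on a line plus one off-line point) and the threshold of 1.0 are hypothetical; in practice the threshold would be chosen from the distribution of residuals.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def pca_residuals(points: List[Point]) -> List[float]:
    """Distance of each 2-D point from the first principal axis."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Orientation of the direction of largest variance (2-D closed form).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Unit vector perpendicular to the principal axis.
    nx, ny = -math.sin(theta), math.cos(theta)
    return [abs((x - mean_x) * nx + (y - mean_y) * ny) for x, y in points]


# Hypothetical data: ten objects on a tight sequence plus one unusual object.
points = [(float(i), float(i)) for i in range(10)] + [(4.0, 8.0)]

residuals = pca_residuals(points)
threshold = 1.0  # illustrative cut-off, not a recommended value
anomalies = [i for i, r in enumerate(residuals) if r > threshold]
```

Only the last point, which sits well off the main sequence, exceeds the threshold. Note that the outlier itself slightly tilts the fitted axis, which is one reason robust variants are often preferred on real data.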
Time-series analysis
Time-series analysis[22] helps in analyzing trends and predicting outputs over time. It is used for trend prediction and novelty detection (the detection of previously unseen kinds of data).
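A minimal sketch of novelty detection in a time series follows. It uses a rolling-median baseline, chosen here only for illustration and not named by the source: each new measurement is compared with the median of the preceding window and flagged if it deviates by more than a threshold. The flux values, window size, and threshold are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def flag_novelties(flux: Sequence[float], window: int = 5, threshold: float = 0.5) -> List[int]:
    """Return indices whose value departs from the trailing median baseline."""
    flagged = []
    for i in range(window, len(flux)):
        # Baseline: median of the previous `window` measurements, which
        # serves as a simple prediction of the current value.
        baseline = median(flux[i - window:i])
        if abs(flux[i] - baseline) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical light curve: steady brightness with a single brief flare.
flux = [10.0, 10.1, 9.9, 10.0, 10.1, 10.0, 9.9, 12.5, 10.1, 10.0, 9.9]
novel = flag_novelties(flux)
```

In this toy light curve only the flare at index 7 is flagged. Because the baseline is a median, the flare does not distort the baseline used for the measurements that follow it.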
References
- "Zooniverse". www.zooniverse.org. Retrieved 2024-05-10.
- Cavanagh, Mitchell K.; Bekki, Kenji; Groves, Brent A. (2021-07-08). "Morphological classification of galaxies with deep learning: comparing 3-way and 4-way CNNs". Monthly Notices of the Royal Astronomical Society. 506 (1): 659–676. arXiv:2106.01571. doi:10.1093/mnras/stab1552. ISSN 0035-8711.
- Goyal, Lalit Mohan; Arora, Maanak; Pandey, Tushar; Mittal, Mamta (2020-12-01). "Morphological classification of galaxies using Conv-nets". Earth Science Informatics. 13 (4): 1427–1436. doi:10.1007/s12145-020-00526-w. ISSN 1865-0481.
- "Sloan Digital Sky Survey-V: Pioneering Panoptic Spectroscopy - SDSS-V". Retrieved 2024-05-10.
- Pati, Satavisa (2021-06-18). "How Data Science is Used in Astronomy?". Analytics Insight. Retrieved 2024-05-10.
- Baron, Dalya (2019-04-15). "Machine Learning in Astronomy: a practical overview". arXiv:1904.07248.
- Zhang, Yanxia; Zhao, Yongheng (2015-05-22). "Astronomy in the Big Data Era". Data Science Journal. 14: 11. Bibcode:2015DatSJ..14...11Z. doi:10.5334/dsj-2015-011. ISSN 1683-1470.
- "The Palomar Digital Sky Survey (DPOSS)". sites.astro.caltech.edu. Retrieved 2024-05-10.
- "IRSA - Two Micron All Sky Survey (2MASS)". irsa.ipac.caltech.edu. Retrieved 2024-05-10.
- "GBT". Green Bank Observatory. 2023-06-26. Retrieved 2024-05-10.
- "GALEX - Galaxy Evolution Explorer". www.galex.caltech.edu. Retrieved 2024-05-10.
- "SkyMapper Southern Sky Survey". skymapper.anu.edu.au. Retrieved 2024-05-10.
- "Pan-STARRS1 data archive home page - PS1 Public Archive - STScI Outerspace". outerspace.stsci.edu. Retrieved 2024-05-10.
- Large Synoptic Survey Telescope. "Rubin Observatory". Rubin Observatory. Retrieved 2024-05-10.
- "Explore | SKAO". www.skao.int. Retrieved 2024-05-10.
- Chowdhury, Shovan; Schoen, Marco P. (2020-10-02). "Research Paper Classification using Supervised Machine Learning Techniques". 2020 Intermountain Engineering, Technology and Computing (IETC). IEEE. pp. 1–6. doi:10.1109/IETC47856.2020.9249211. ISBN 978-1-7281-4291-3.
- Sarstedt, Marko; Mooi, Erik (2014). "Regression Analysis". A Concise Guide to Market Research: The Process, Data, and Methods Using IBM SPSS Statistics. Berlin, Heidelberg: Springer. pp. 193–233. doi:10.1007/978-3-642-53965-7_7. ISBN 978-3-642-53965-7. Retrieved 2024-05-10.
- "Bulletin de la Société Royale des Sciences de Liège | PoPuPS". Bulletin de la Société Royale des Sciences de Liège (in French). ISSN 0037-9565.
- Bindra, Kamalpreet; Mishra, Anuranjan (September 2017). "A detailed study of clustering algorithms". 2017 6th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). IEEE. pp. 371–376. doi:10.1109/ICRITO.2017.8342454. ISBN 978-1-5090-3012-5.
- Pizzuti, C.; Talia, D. (May 2003). "P-autoclass: scalable parallel clustering for mining large data sets". IEEE Transactions on Knowledge and Data Engineering. 15 (3): 629–641. doi:10.1109/TKDE.2003.1198395. ISSN 1041-4347.
- Thudumu, Srikanth; Branch, Philip; Jin, Jiong; Singh, Jugdutt (Jack) (2020-07-02). "A comprehensive survey of anomaly detection techniques for high dimensional big data". Journal of Big Data. 7 (1): 42. doi:10.1186/s40537-020-00320-x. hdl:10536/DRO/DU:30158643. ISSN 2196-1115.
- Weiner, Irving B., ed. (2003-04-15). Handbook of Psychology (1 ed.). Wiley. doi:10.1002/0471264385.wei0223. ISBN 978-0-471-17669-5.
This article, "Data-driven astronomy", is from Wikipedia. The list of its authors can be seen in its history and/or on the page Edithistory:Data-driven astronomy. Articles copied from the Draft namespace on Wikipedia can be found in Wikipedia's Draft namespace, not the main one.