A&A 674, A14 (2023)
https://doi.org/10.1051/0004-6361/202245591
© The Authors 2023
Astronomy & Astrophysics
Gaia Data Release 3 Special issue
Gaia Data Release 3
All-sky classification of 12.4 million variable sources into 25 classes
Lorenzo Rimoldini (1,*), Berry Holl (1,2), Panagiotis Gavras (3), Marc Audard (1,2), Joris De Ridder (4),
Nami Mowlavi (1,2), Krzysztof Nienartowicz (5), Grégory Jevardat de Fombelle (1), Isabelle Lecoeur-Taïbi (1),
Lea Karbevska (1,6), Dafydd W. Evans (7), Péter Ábrahám (8,9), Maria I. Carnerero (10), Gisella Clementini (11),
Elisa Distefano (12), Alessia Garofalo (11), Pedro García-Lario (13), Roy Gomel (14), Sergei A. Klioner (15),
Katarzyna Kruszyńska (16), Alessandro C. Lanzafame (12,17), Thomas Lebzelter (18), Gábor Marton (8),
Tsevi Mazeh (14), Roberto Molinaro (19), Aviad Panahi (14), Claudia M. Raiteri (10), Vincenzo Ripepi (19),
László Szabados (8), David Teyssier (20), Michele Trabucchi (2), Łukasz Wyrzykowski (16),
Shay Zucker (21), and Laurent Eyer (2)
(Affiliations can be found after the references)
Received 30 November 2022 / Accepted 17 December 2022
ABSTRACT
Context. Gaia DR3 contains 1.8 billion sources with G-band photometry, 1.5 billion of which also have G_BP and G_RP photometry, complemented by
positions on the sky, parallax, and proper motion. The median number of field-of-view transits in the three photometric bands is between 40 and
44 measurements per source and covers 34 months of data collection.
Aims. We pursue a classification of Galactic and extra-galactic objects that are detected as variable by Gaia across the whole sky.
Methods. Supervised machine learning (eXtreme Gradient Boosting and Random Forest) was employed to generate multi-class, binary, and meta-classifiers
that classified variable objects with photometric time series in the G, G_BP, and G_RP bands.
Results. Classification results comprise 12.4 million sources (selected from a much larger set of potential variable objects) and include about
9 million variable stars classified into 22 variability types in the Milky Way and nearby galaxies such as the Magellanic Clouds and Andromeda,
plus thousands of supernova explosions in distant galaxies, 1 million active galactic nuclei, and almost 2.5 million galaxies. The identification of
galaxies was made possible by the artificial variability of extended objects as detected by Gaia, so they were published in the galaxy_candidates
table of the Gaia DR3 archive, separate from the classifications of genuine variability (in the vari_classifier_result table). The latter contains
24 variability classes or class groups of periodic and non-periodic variables (pulsating, eclipsing, rotating, eruptive, cataclysmic, stochastic, and
microlensing), with amplitudes from a few milli-magnitudes to several magnitudes.
Key words. catalogs – galaxies: general – methods: data analysis – quasars: general – stars: variables: general
1. Introduction
Time-dependent brightness variations of celestial objects may be
caused by different phenomena: intrinsic physical changes, such
as pulsations, eruptions, and cataclysmic outbursts, or extrin-
sic reasons that depend on the direction of observation, such
as eclipsing binaries, stars rotating with spots or with ellip-
soidal shapes, and microlensing events, as shown in Fig. 1 of
Gaia Collaboration (2019). The detection of variability requires
multi-epoch observations and, depending on the signal sampling,
a certain set of classes can be identified. Gaia's sparse sampling
allows for the detection of periodic signals ranging from min-
utes to years and for medium to long-term non-periodic variabil-
ity. The chance of detection of up to approximately six-week
long transient phenomena, for example, crucially depends on
the sampling at a given location in the sky (see Appendix A in
Eyer et al. 2017), which follows from the scanning law proper-
ties (Gaia Collaboration 2016b). Although the scanning law of
Gaia was designed for astrometric goals, it allows for the identification
of a broad variety of variability types, with different
possible levels of completeness (Eyer et al. 2023).
* Corresponding author: L. Rimoldini, e-mail: Lorenzo.Rimoldini@unige.ch
As the time span of Gaia data collection progressively
increased from Gaia data release 1 (DR1) to DR2 and DR3 (14,
22, and 34 months, respectively; Gaia Collaboration 2016a, 2018,
2023c), the classified variability types increased from Cepheids
and RR Lyrae stars in a limited region of the sky (in DR1;
Eyer et al. 2017), to an all-sky [footnote 1] classification of the DR1 classes
plus long-period variables and δ Scuti or SX Phoenicis stars (in
DR2; Rimoldini et al. 2019), and 20 further variability classes
in DR3 (presented in this article and listed in Sect. 3.1.1). For
brevity, we refer to Table 1 for selected publications related to
these classes, with representatives identified in various surveys.
Machine learning is a practical tool to automate classifica-
tion tasks that involve multiple known classes and a possibly high
number of attributes to identify such classes and distinguish them
from others (e.g. see Debosscher et al. 2007; Sarro et al. 2009;
Blomme et al. 2010; Richards et al. 2011; Dubath et al. 2011).
Herein, we present how a supervised classification was applied to
Gaia DR3 data to classify variable sources into two dozen classes
(plus galaxies). In particular, we describe the details concerning
the construction of the training set and of the classifiers, the verification
of the results, and the generation of an overall classification
score. Selection procedures, parameter distributions, and assessments
of candidates are presented for each class.

[Footnote 1] Due to the scanning law coverage, only from Gaia DR2 onwards were
sufficient epochs available at all locations on the sky (see, for example,
Fig. 1 in Holl et al. 2018).

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

Some of the classification results are further processed
by specific object studies (SOSs) dedicated to single classes,
typically describing a subset of the most reliable candidates in
detail. Such single-class processing modules are available in
DR3 for active galactic nuclei (AGNs; Carnerero et al. 2023),
Cepheids (Ripepi et al. 2023), compact companions (Gomel et al.
2023), eclipsing binaries (Mowlavi et al. 2023), long-period
variables (Lebzelter et al. 2023), main-sequence oscillators
(Gaia Collaboration 2023b), planetary transits (Panahi et al.
2022), and RR Lyrae stars (Clementini et al. 2023). Other SOS
modules, such as microlensing events (Wyrzykowski et al. 2023),
short-timescale variables (see Sect. 10.12 of the Gaia DR3
documentation; Rimoldini et al. 2022), and solar-like rotation
modulation stars (Distefano et al. 2023), were executed indepen-
dently of the classification results, as they relied on their own
candidate selection. A summary of the variability results from all
modules is presented in Eyer et al. (2023).
This article is organised as follows. The classification input
data are outlined in Sect. 2; the preparation, application, and verification
of supervised learning procedures are described in Sect. 3;
the results for each class are presented in Sect. 4; and conclusions
are drawn in Sect. 5. Special training selections applied to a sub-
set of classes are detailed in Appendix A; selected classification
attributes are listed in Appendix B; additional class labels from
the literature (among the false positive classes listed in Table 3)
are defined in Appendix C; some examples of queries to facilitate
the exploitation of classification results in the Gaia archive [footnote 2] are
provided in Appendix D; and common diagrams for all classes,
including a summary of trained and classified sources, an assess-
ment of the results with respect to the literature, and sample light
curves, are presented in Appendix E. All table names in the Gaia
archive that are mentioned in the text assume the prefix gaiadr3
(as shown in Appendix D).
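As an illustration of the table-name convention just described, the following minimal sketch builds an ADQL query string against gaiadr3.vari_classifier_result. It is not one of the Appendix D queries. The helper name build_class_query is hypothetical, and the column names (source_id, best_class_name, best_class_score) and the class label "RR" are assumptions that should be checked against the archive documentation. The code only assembles text and does not contact the archive.

```python
# Hypothetical helper: assembles an ADQL query string, nothing more.
def build_class_query(class_name: str, min_score: float = 0.5, limit: int = 100) -> str:
    """Return an ADQL query for sources of one class above a score threshold.

    Assumed (not stated in this section): the columns source_id,
    best_class_name, and best_class_score of vari_classifier_result.
    """
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must lie in [0, 1]")
    # Double any single quote so the label is a valid SQL/ADQL string literal.
    safe_name = class_name.replace("'", "''")
    return (
        f"SELECT TOP {int(limit)} source_id, best_class_name, best_class_score "
        "FROM gaiadr3.vari_classifier_result "  # tables carry the gaiadr3 prefix
        f"WHERE best_class_name = '{safe_name}' "
        f"AND best_class_score >= {min_score} "
        "ORDER BY best_class_score DESC"
    )


# "RR" is an assumed class label used for illustration only.
query = build_class_query("RR", min_score=0.8, limit=10)
print(query)
```

The resulting string could be pasted into the archive's query interface or passed to a TAP client of choice.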
2. Data
As part of the Gaia variability pipeline (Eyer et al. 2023), the general
classification module received as input sources with photometric
time series in the G, G_BP, and G_RP bands (Riello et al.
2021) that had at least five field-of-view (FoV) measurements
in the G band, which were already identified as potential vari-
able sources and characterised by basic statistics and periodic-
ity parameters. Before any computation, sources and associated
epoch FoV transits were processed by the chain of operators
described in Sect. 10.2.3 of the Gaia DR3 documentation
(Rimoldini et al. 2022) and Sect. 3.1 of Eyer et al. (2023), which
selected, transformed, and cleaned time series from spurious or
doubtful observations. The balance between outlier removal and
signal preservation favoured the latter, considering that some of
the targeted variability types relied on a small number of outlier-
like measurements (such as Algol-type eclipsing binaries and
microlensing events). All time series and derived statistical num-
bers hereafter refer to these cleaned time series. The median
number of FoV measurements in the three photometric bands is
between 40 and 44 per source (Eyer et al. 2023), within a time
span of typically 900–1000 days in the G band.
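The input requirement of at least five G-band FoV measurements can be pictured with a small sketch. The source identifiers and transit counts below are invented for illustration, and select_sources is a hypothetical helper, not part of the Gaia pipeline.

```python
from statistics import median

MIN_G_TRANSITS = 5  # minimum number of G-band FoV measurements stated in the text


def select_sources(transit_counts, min_transits=MIN_G_TRANSITS):
    """Keep sources whose (cleaned) G-band time series has enough measurements."""
    return {sid: n for sid, n in transit_counts.items() if n >= min_transits}


# Invented example: source identifier -> number of cleaned G-band FoV transits.
counts = {101: 3, 102: 5, 103: 42, 104: 61}

selected = select_sources(counts)
median_transits = median(selected.values())
print(sorted(selected), median_transits)
```

With these invented counts, source 101 is dropped and the median over the remaining sources is 42, which happens to fall in the 40 to 44 range quoted above only because the numbers were chosen that way.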
While the processing of Gaia (early) DR3 photometry
included significant calibration improvements with respect to
DR2 (Riello et al. 2021), some low-level uncalibrated systematic
effects remained and their impact on epoch photometry is
described in Evans et al. (2023). Among instrumental effects,
scan-angle dependent signals were induced mainly by asymmetric
extended sources (such as barred spiral galaxies and
tidally distorted stars) and multiple close pairs (≲1″) of point-like
sources (Holl et al. 2023). Although such signals helped the
identification of galaxies from photometric variations, in general
data artefacts might interfere with the correct identification of
classes with genuine variability, especially those associated with
low signal-to-noise ratios.

[Footnote 2] https://gea.esac.esa.int/archive/
The classification of variables also employed astrometrically
derived parameters such as parallax and proper
motion (Lindegren et al. 2021b). However, Gaia DR3 astro-
physical parameters (Andrae et al. 2023; Creevey et al. 2023;
Delchambre et al. 2023; Fouesneau et al. 2023) could not be
included as they were processed in parallel and became avail-
able after the results of the variability pipeline were finalised.
A subset of classified sources were analysed in more detail
by subsequent SOS modules, typically focusing on specific
classes, as mentioned in Sect. 1. The results of all variability
modules were subject to additional source filtering before their
ingestion into the public Gaia archive (Babusiaux et al. 2023).
Statistical parameters of all the photometric time series pub-
lished in Gaia DR3 are available in the vari_summary table.
3. Method
For Gaia DR3, general classification relied on supervised
machine learning, that is, training classifiers with sources of
known variability types and applying the resulting models to
classify sources of an unknown variability type. Known vari-
ables in the literature are cross-matched with Gaia sources,
verified, selected, and characterised by attributes derived from
the Gaia data. The use of both cross-match sources and (opti-
mised) classification attributes for training was described in
Rimoldini et al. (2019) and it is not repeated herein.
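The train-then-apply workflow described above can be sketched in a few lines. The classifiers in this work are eXtreme Gradient Boosting and Random Forest; the nearest-centroid rule below is a deliberately simple stand-in chosen so that the example runs with the standard library alone. The two attributes, their values, and the class labels are invented for illustration.

```python
import math
from collections import defaultdict


def train_centroids(features, labels):
    """'Training': compute the mean attribute vector of each known class."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def classify(model, x):
    """'Application': assign the class whose centroid is closest to x."""
    return min(model, key=lambda y: math.dist(model[y], x))


# Invented training set: (log10 period [d], G amplitude [mag]) with known labels.
train_x = [(-0.3, 0.6), (-0.2, 0.8), (2.4, 1.5), (2.6, 2.0)]
train_y = ["RR", "RR", "LPV", "LPV"]

model = train_centroids(train_x, train_y)
predicted = classify(model, (2.2, 1.2))  # a source of unknown type
print(predicted)
```

A real classifier would use many more attributes and return class probabilities rather than a single label; the sketch only shows the separation between the training step and the application step.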
An extensive cross-match of Gaia sources was compiled by
Gavras et al. (2023), which provided millions of variable objects
from the literature and represented over 100 variability types. The
robustness of the cross-match method, which included astromet-
ric and photometric information in the identification of matches,
and the verification of the genuineness of literature classifi-
cations ensured the reliability of training sources (critical to
supervised classification) and of the validation of the results.
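To make the idea of a cross-match that combines astrometric and photometric information concrete, here is a minimal sketch. It is not the method of Gavras et al. (2023), which is not detailed in this section. It assumes a simple rule (nearest neighbour within a separation radius, subject to a magnitude-difference cut), and the thresholds, coordinates, and magnitudes are invented.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation from the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def cross_match(literature, gaia, max_sep_arcsec=1.0, max_dmag=0.5):
    """For each literature object, keep the nearest Gaia source passing both cuts.

    The thresholds are assumptions for illustration, not values from the paper.
    """
    matches = {}
    for name, (ra, dec, mag) in literature.items():
        best = None
        for source_id, (g_ra, g_dec, g_mag) in gaia.items():
            sep = angular_separation_arcsec(ra, dec, g_ra, g_dec)
            if sep <= max_sep_arcsec and abs(mag - g_mag) <= max_dmag:
                if best is None or sep < best[1]:
                    best = (source_id, sep)
        if best is not None:
            matches[name] = best[0]
    return matches


# Invented catalogues: name or id -> (RA [deg], Dec [deg], magnitude).
literature = {"lit_A": (10.0, -5.0, 15.2), "lit_B": (20.0, 30.0, 12.0)}
gaia = {1: (10.0001, -5.0, 15.3), 2: (10.0001, -5.0001, 18.0), 3: (20.01, 30.0, 12.1)}

matches = cross_match(literature, gaia)
print(matches)
```

In this toy case, source 2 is close to lit_A on the sky but is rejected by the magnitude cut, and lit_B has no Gaia counterpart within the radius, which illustrates why a purely positional match can be less reliable than one that also checks photometry.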
3.1. Training set
Potential training sources from the literature were vetted for each
class to ensure the correct class membership. This was repeated
for every catalogue that was deemed trustworthy for training the
class under investigation. The reliance of supervised classifica-
tion on known objects makes it vulnerable to biases from the
literature, for instance, related to their data acquisition and clas-
sification methods. Thus, in addition to class verification, the
cross-matched objects were probed in several dimensions to
identify intrinsic biases, such as limited sky coverage or appar-
ent magnitude range with respect to the ones of Gaia, in order
to prevent (or minimise) the transfer of literature selection func-
tions to the Gaia classifications.
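One way to probe a literature sample for a limited apparent magnitude range is to ask what fraction of the sources to be classified falls outside the range covered by the training sources. The sketch below is a simplified illustration of that kind of check under invented magnitudes; fraction_outside_training_range is a hypothetical helper and is not taken from the paper.

```python
def fraction_outside_training_range(training_mags, target_mags):
    """Fraction of target sources fainter or brighter than any training source."""
    lo, hi = min(training_mags), max(training_mags)
    outside = sum(1 for m in target_mags if m < lo or m > hi)
    return outside / len(target_mags)


# Invented G magnitudes for a literature training sample and for Gaia targets.
training_g = [12.1, 13.4, 14.0, 15.8, 16.5]
target_g = [11.0, 13.0, 15.0, 17.5, 19.2, 20.1]

frac = fraction_outside_training_range(training_g, target_g)
print(f"{frac:.2f} of targets lie outside the training magnitude range")
```

A large fraction would suggest that the literature selection function could be transferred to the classifications unless the training set is extended or rebalanced; the same comparison could be repeated for sky coverage or any other dimension of interest.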
3.1.1. Published classes
Since it was difficult to know a priori the full list of classes
that could be identified in Gaia DR3, the verification of liter-
ature classifications and source selection for training purposes