Simons Center for Data Analysis
We measure carbon and nitrogen abundances to ≲ 0.1 dex for 450,000 giant stars from their low-resolution (R ~ 1800) LAMOST DR2 survey spectra. We use these [C/M] and [N/M] measurements, together with empirical relations based on the APOKASC sample, to infer stellar masses and implied ages for 230,000 of these objects to 0.08 dex and 0.2 dex respectively. We use The Cannon, a data-driven approach to spectral modeling, to construct a predictive model for LAMOST spectra. Our reference set comprises 8125 stars observed in common between the APOGEE and LAMOST surveys, taking seven APOGEE DR12 labels (parameters) as ground truth: Teff, logg, [M/H], [α/M], [C/M], [N/M], and Ak. We add seven colors to the Cannon model, based on the g, r, i, J, H, K, W1, and W2 magnitudes from APASS, 2MASS, and WISE, which improves our constraints on Teff and logg by up to 20% and on Ak by up to 70%. Cross-validation of the model demonstrates that, for high-SNR objects, our inferred labels agree with the APOGEE values to within 50 K in temperature, 0.04 magnitudes in Ak, and < 0.1 dex in logg, [M/H], [C/M], [N/M], and [α/M]. We apply the model to 450,000 giants in LAMOST DR2 that have not been observed by APOGEE. This demonstrates that precise individual abundances can be measured from low-resolution spectra, and represents the largest catalog of [C/M], [N/M], masses and ages to date. As a result, we greatly increase the number and sky coverage of stars with mass and age estimates.
Neural network models of early sensory processing typically reduce the dimensionality of streaming input data. Such networks learn the principal subspace, in the sense of principal component analysis (PCA), by adjusting synaptic weights according to activity-dependent learning rules. When derived from a principled cost function these rules are nonlocal and hence biologically implausible. At the same time, biologically plausible local rules have been postulated rather than derived from a principled cost function. Here, to bridge this gap, we derive a biologically plausible network for subspace learning on streaming data by minimizing a principled cost function. In a departure from previous work, where cost was quantified by the representation, or reconstruction, error, we adopt a multidimensional scaling (MDS) cost function for streaming data. The resulting algorithm relies only on biologically plausible Hebbian and anti-Hebbian local learning rules. In a stochastic setting, synaptic weights converge to a stationary state which projects the input data onto the principal subspace. If the data are generated by a nonstationary distribution, the network can track the principal subspace. Thus, our result makes a step towards an algorithmic theory of neural computation.
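The abstract above does not state its update equations, so the following is only a minimal sketch of the general idea of a local Hebbian rule learning a principal direction from streaming data. It substitutes Oja's classical single-neuron rule for the MDS-derived Hebbian/anti-Hebbian network described in the abstract. The function names, learning rate, and synthetic data stream are all assumptions made for illustration.

```python
import numpy as np

def oja_principal_direction(X, lr=0.002, seed=0):
    """Stream the rows of X through one linear neuron trained with Oja's rule.

    NOTE: Oja's rule is a stand-in here, not the network from the abstract.
    The update uses only the input x, the output y, and the neuron's own
    weights w, which is what makes it a local learning rule.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for x in X:
        y = w @ x                    # neuron output
        w += lr * y * (x - y * w)    # Hebbian term y*x with a local decay term
    return w / np.linalg.norm(w)

def make_stream(n=5000, seed=1):
    """Hypothetical 2-D data stream with one dominant direction."""
    rng = np.random.default_rng(seed)
    direction = np.array([3.0, 4.0]) / 5.0     # true principal direction
    orthogonal = np.array([-4.0, 3.0]) / 5.0
    a = rng.normal(scale=2.0, size=n)          # large variance along direction
    b = rng.normal(scale=0.3, size=n)          # small variance across it
    return np.outer(a, direction) + np.outer(b, orthogonal), direction

if __name__ == "__main__":
    X, direction = make_stream()
    w = oja_principal_direction(X)
    print("alignment with true direction:", abs(float(w @ direction)))
```

Because each sample is used once and then discarded, the same loop would keep tracking the principal direction if the statistics of the stream drifted slowly.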
We develop a data-driven spectral model for identifying and characterizing spatially unresolved multiple-star systems and apply it to APOGEE DR13 spectra of main-sequence stars. Binaries and triples are identified as targets whose spectra can be significantly better fit by a superposition of two or three model spectra, drawn from the same isochrone, than any single-star model. From an initial sample of ~20,000 main-sequence targets, we identify ~2,500 binaries in which both the primary and secondary star contribute detectably to the spectrum, simultaneously fitting for the velocities and stellar parameters of both components. We additionally identify and fit ~200 triple systems, as well as ~700 velocity-variable systems in which the secondary does not contribute detectably to the spectrum. Our model simplifies the process of simultaneously fitting single- or multi-epoch spectra with composite models and does not depend on a velocity offset between the two components of a binary, making it sensitive to traditionally undetectable systems with periods of hundreds or thousands of years. In agreement with conventional expectations, almost all the spectrally-identified binaries with measured parallaxes fall above the main sequence in the color-magnitude diagram. We find excellent agreement between spectrally and dynamically inferred mass ratios for the ~600 binaries in which a dynamical mass ratio can be measured from multi-epoch radial velocities. We obtain full orbital solutions for 64 systems, including 14 close binaries within hierarchical triples. We make available catalogs of stellar parameters, abundances, mass ratios, and orbital parameters.
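The identification criterion in the abstract above (a superposition of two model spectra fits significantly better than any single-star model) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two-line "spectrum", its dependence on a single temperature-like parameter t, the coarse grid search, and all function names. It is not the data-driven model of the abstract, and it fits no velocities; both components sit at zero velocity offset, to show that a composite can be recognized from line strengths alone.

```python
import numpy as np

WAVE = np.linspace(0.0, 10.0, 400)   # arbitrary wavelength units (toy grid)

def single_star(t, shift=0.0):
    """Toy normalized spectrum for a star with temperature-like label t in [0, 1]."""
    depth_a = 0.5 * t                  # line that strengthens with t
    depth_b = 0.5 * (1.0 - t) ** 2     # line that weakens with t
    line = lambda centre: np.exp(-0.5 * ((WAVE - centre - shift) / 0.2) ** 2)
    return 1.0 - depth_a * line(3.0) - depth_b * line(7.0)

def binary(t1, t2, flux_ratio, shift2=0.0):
    """Flux-weighted sum of two toy spectra, renormalized to a continuum of 1."""
    total = single_star(t1) + flux_ratio * single_star(t2, shift2)
    return total / (1.0 + flux_ratio)

def chi2(data, model, sigma):
    return float(np.sum(((data - model) / sigma) ** 2))

def best_single(data, sigma, grid=np.linspace(0.0, 1.0, 51)):
    """Lowest chi-square over a grid of single-star models."""
    return min(chi2(data, single_star(t), sigma) for t in grid)

def best_binary(data, sigma, grid=np.linspace(0.0, 1.0, 11),
                ratios=(0.25, 0.5, 1.0)):
    """Lowest chi-square over a coarse grid of two-component models."""
    return min(chi2(data, binary(t1, t2, r), sigma)
               for t1 in grid for t2 in grid for r in ratios)

def delta_chi2(data, sigma):
    """Improvement of the best binary fit over the best single-star fit."""
    return best_single(data, sigma) - best_binary(data, sigma)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sigma = 0.01
    composite = binary(1.0, 0.0, 1.0) + rng.normal(scale=sigma, size=WAVE.size)
    lone_star = single_star(0.4) + rng.normal(scale=sigma, size=WAVE.size)
    print("delta chi2, true binary:", delta_chi2(composite, sigma))
    print("delta chi2, true single:", delta_chi2(lone_star, sigma))
```

In this toy the composite is detectable because the two line depths of a single star lie on a curve in t, while a blend of a hot and a cool star lands off that curve. A real analysis would need a principled threshold on the chi-square improvement rather than an eyeballed one.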
Determining the three-dimensional structure of proteins and protein complexes at atomic resolution is a fundamental task in structural biology. Over the last decade, remarkable progress has been made using "single particle" cryo-electron microscopy (cryo-EM) for this purpose. In cryo-EM, hundreds of thousands of two-dimensional images are obtained of individual copies of the same particle, each held in a thin sheet of ice at some unknown orientation. Each image corresponds to the noisy projection of the particle's electron-scattering density. The reconstruction of a high-resolution image from this data is typically formulated as a nonlinear, non-convex optimization problem for unknowns which encode the angular pose and lateral offset of each particle. Since there are hundreds of thousands of such parameters, this leads to a very CPU-intensive task, limiting both the number of particle images which can be processed and the number of independent reconstructions which can be carried out for the purpose of statistical validation. Here, we propose a deterministic method for high-resolution reconstruction that operates in an ab initio manner, that is, without the need for an initial guess. It requires a predictable and relatively modest amount of computational effort, by marching out radially in the Fourier domain from low to high frequency, increasing the resolution by a fixed increment at each step.
The Kepler Mission has discovered thousands of exoplanets and revolutionized our understanding of their population. This large, homogeneous catalog of discoveries has enabled rigorous studies of the occurrence rate of exoplanets and planetary systems as a function of their physical properties. However, transit surveys like Kepler are most sensitive to planets with orbital periods much shorter than the orbital periods of Jupiter and Saturn, the most massive planets in our Solar System. To address this deficiency, we perform a fully automated search for long-period exoplanets with only one or two transits in the archival Kepler light curves. When applied to the ~40,000 brightest Sun-like target stars, this search produces 16 long-period exoplanet candidates. Of these candidates, 6 are novel discoveries and 5 are in systems with inner short-period transiting planets. Since our method involves no human intervention, we empirically characterize the detection efficiency of our search. Based on these results, we measure the average occurrence rate of exoplanets smaller than Jupiter with orbital periods in the range 2-25 years to be 2.0 ± 0.7 planets per Sun-like star.
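A short calculation shows why planets in the 2-25 year period range appear with only one or two transits, if any. The sketch below assumes a continuous observing baseline of 4 years (roughly the length of the Kepler primary mission; this value is an assumption, not a number from the abstract), a randomly phased orbit, and a planet whose orbit is already aligned so that it transits. It ignores data gaps and the finite transit duration, and the function name is hypothetical.

```python
def transit_count_probabilities(period_yr, baseline_yr=4.0):
    """Probability of each possible number of transits within the baseline.

    For a strictly periodic event with random phase, a window of length T
    contains either floor(T / P) or floor(T / P) + 1 events, and the larger
    count occurs with probability equal to the fractional part of T / P.
    """
    ratio = baseline_yr / period_yr
    n_low = int(ratio)               # floor, since ratio is positive
    frac = ratio - n_low
    probs = {n_low: 1.0 - frac}
    if frac > 0.0:
        probs[n_low + 1] = frac
    return probs

if __name__ == "__main__":
    for period in (2.0, 3.0, 8.0, 25.0):
        print(period, "yr:", transit_count_probabilities(period))
```

Under these assumptions an 8-year orbit shows a single transit half the time and none otherwise, so a search restricted to planets with three or more transits cannot find it.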
In this era of large-scale stellar spectroscopic surveys, measurements of stellar attributes ("labels," i.e. parameters and abundances) must be made precise and consistent across surveys. Here, we demonstrate that this can be achieved by a data-driven approach to spectral modeling. With The Cannon, we transfer information from the APOGEE survey to determine precise Teff, log g, [Fe/H], and [α/M] from the spectra of 450,000 LAMOST giants. The Cannon fits a predictive model for LAMOST spectra using 9952 stars observed in common between the two surveys, taking five labels from APOGEE DR12 as ground truth: Teff, log g, [Fe/H], [α/M], and K-band extinction A_k. The model is then used to infer Teff, log g, [Fe/H], and [α/M] for 454,180 giants, 20% of the LAMOST DR2 stellar sample. These are the first [α/M] values for the full set of LAMOST giants, and the largest catalog of [α/M] for giant stars to date. Furthermore, these labels are by construction on the APOGEE label scale; for spectra with S/N > 50, cross-validation of the model yields typical uncertainties of 70 K in Teff, 0.1 in log g, 0.1 in [Fe/H], and 0.04 in [α/M], values comparable to the broadly stated, conservative APOGEE DR12 uncertainties. Thus, by using "label transfer" to tie low-resolution (LAMOST R ~ 1800) spectra to the label scale of a much higher-resolution (APOGEE R ~ 22,500) survey, we substantially reduce the inconsistencies between labels measured by the individual survey pipelines. This demonstrates that label transfer with The Cannon can successfully bring different surveys onto the same physical scale.
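The two steps described above (fit a predictive model on reference stars with known labels, then invert it for new spectra) can be sketched as follows. This is a simplified illustration, not The Cannon's actual code. It assumes a model in which the flux at each pixel is a quadratic polynomial in the labels, labels rescaled to order unity, equal weights for all pixels, and no per-pixel scatter term. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def vectorize(labels):
    """Quadratic label vector [1, l_i, l_i * l_j (i <= j)], one row per star."""
    labels = np.atleast_2d(labels)
    n, k = labels.shape
    cols = [np.ones(n)] + [labels[:, i] for i in range(k)]
    cols += [labels[:, i] * labels[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)

def train(ref_flux, ref_labels):
    """Training step: per-pixel coefficients by unweighted linear least squares."""
    theta, *_ = np.linalg.lstsq(vectorize(ref_labels), ref_flux, rcond=None)
    return theta                                   # shape (n_terms, n_pixels)

def infer(flux, theta, n_labels, n_iter=20, eps=1e-6):
    """Test step: Gauss-Newton fit of the labels, starting from zero."""
    labels = np.zeros(n_labels)
    for _ in range(n_iter):
        model = vectorize(labels)[0] @ theta
        # Forward-difference Jacobian d(model)/d(labels), shape (n_pixels, n_labels)
        jac = np.column_stack([
            (vectorize(labels + eps * np.eye(n_labels)[i])[0] @ theta - model) / eps
            for i in range(n_labels)
        ])
        step, *_ = np.linalg.lstsq(jac, flux - model, rcond=None)
        labels = labels + step
    return labels

def demo(seed=0):
    """Synthetic check: train on 200 fake reference stars, infer one new star."""
    rng = np.random.default_rng(seed)
    n_ref, n_pix, k = 200, 50, 2
    true_theta = rng.normal(size=(1 + k + k * (k + 1) // 2, n_pix))
    true_theta[1 + k:] *= 0.1                      # weak quadratic terms
    ref_labels = rng.uniform(-1.0, 1.0, size=(n_ref, k))
    ref_flux = vectorize(ref_labels) @ true_theta
    ref_flux += rng.normal(scale=0.01, size=ref_flux.shape)
    theta = train(ref_flux, ref_labels)
    truth = np.array([0.3, -0.5])
    new_flux = vectorize(truth)[0] @ true_theta + rng.normal(scale=0.01, size=n_pix)
    return infer(new_flux, theta, k), truth

if __name__ == "__main__":
    inferred, truth = demo()
    print("inferred:", inferred, "truth:", truth)
```

Because the training labels come from the reference survey, any labels inferred this way inherit that survey's scale, which is the sense in which labels are "transferred".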
We present AGNfitter, a publicly available open-source algorithm implementing a fully Bayesian Markov Chain Monte Carlo method to fit the spectral energy distributions (SEDs) of active galactic nuclei (AGN) from the sub-mm to the UV, allowing one to robustly disentangle the physical processes responsible for their emission. AGNfitter makes use of a large library of theoretical, empirical, and semi-empirical models to characterize both the nuclear and host galaxy emission simultaneously. The model consists of four physical emission components: an accretion disk, a torus of AGN-heated dust, stellar populations, and cold dust in star-forming regions. AGNfitter determines the posterior distributions of numerous parameters that govern the physics of AGN with a fully Bayesian treatment of errors and parameter degeneracies, allowing one to infer integrated luminosities, dust attenuation parameters, stellar masses, and star formation rates. We tested AGNfitter's performance on real data by fitting the SEDs of a sample of 714 X-ray selected AGN from the XMM-COSMOS survey, spectroscopically classified as Type1 (unobscured) and Type2 (obscured) AGN by their optical-UV emission lines. We find that two independent model parameters, namely the reddening of the accretion disk and the column density of the dusty torus, are good proxies for AGN obscuration, allowing us to develop a strategy for classifying AGN as Type1 or Type2, based solely on an SED-fitting analysis. Our classification scheme is in excellent agreement with the spectroscopic classification, giving a completeness fraction of ~86% and ~70%, and an efficiency of ~80% and ~77%, for Type1 and Type2 AGNs, respectively.
Despite our extensive knowledge of biophysical properties of neurons, there is no commonly accepted algorithmic theory of neuronal function. Here we explore the hypothesis that single-layer neuronal networks perform online symmetric nonnegative matrix factorization (SNMF) of the similarity matrix of the streamed data. By starting with the SNMF cost function we derive an online algorithm, which can be implemented by a biologically plausible network with local learning rules. We demonstrate that such a network performs soft clustering of the data as well as sparse feature discovery. The derived algorithm replicates many known aspects of sensory anatomy and biophysical properties of neurons, including the unipolar nature of neuronal activity and synaptic weights, local synaptic plasticity rules, and the dependence of learning rate on cumulative neuronal activity. Thus, we make a step towards an algorithmic theory of neuronal function, which should facilitate large-scale neural circuit simulations and biologically inspired artificial intelligence.
We have shown that data-driven models are effective for inferring physical attributes of stars (labels; Teff, logg, [M/H]) from spectra, even when the signal-to-noise ratio is low. Here we explore whether this is possible when the dimensionality of the label space is large (Teff, logg, and 15 abundances: C, N, O, Na, Mg, Al, Si, S, K, Ca, Ti, V, Mn, Fe, Ni) and the model is non-linear in its response to abundance and parameter changes. We adopt ideas from compressed sensing to limit overall model complexity while retaining model freedom. The model is trained with a set of 12,681 red-giant stars with high signal-to-noise spectroscopic observations and stellar parameters and abundances taken from the APOGEE Survey. We find that we can successfully train and use a model with 17 stellar labels. Validation shows that the model does a good job of inferring all 17 labels (typical abundance precision is 0.04 dex), even when we degrade the signal-to-noise by discarding ~50% of the observing time. The model dependencies make sense: the spectral derivatives with respect to abundances correlate with known atomic lines, and we identify elements belonging to atomic lines that were previously unknown. We recover (anti-)correlations in abundance labels for globular cluster stars, consistent with the literature. However, we find the intrinsic spread in globular cluster abundances is 3-4 times smaller than previously reported. We deliver 17 labels with associated errors for 87,563 red giant stars, as well as open-source code to extend this work to other spectroscopic surveys.