National Science Foundation
Grace Yang, Program Director for Statistics, Division of Mathematical Sciences
DMS in a Nutshell
[Organization chart: NSF → Directorate → Division → Core Program]
• Directorates (NSF): BIO, CISE, ENG, GEO, MPS, SBE, EHR
• Divisions (within MPS): AST, CHE, DMS, DMR, PHY
• Core programs (within DMS): Algebra & Number Theory; Analysis; Applied Math; Computational Math; Geometric Analysis/Topology/Foundations; Infrastructure; Math Biology; Statistics & Probability
CDI Seeks Transformative Multidisciplinary Research Proposals
Within or across the three thematic areas:
• Extracting Knowledge from Data: concepts and techniques for organizing, analyzing, and visualizing massive data sets. Statisticians do this; massive data sets pose challenges.
• Understanding Complexity in Natural, Built, and Social Systems: deriving fundamental insights into systems comprising multiple interacting elements at multiple scales. Example: the flow of information across the internet.
• Building Virtual Organizations: enhancing discovery and innovation by bringing people and resources together across institutional, geographical, and cultural boundaries. Example: sharing data and tools.
Cyber-enabled Discovery and Innovation (CDI)
NSF's bold five-year initiative to create revolutionary science and engineering research outcomes made possible by innovations and advances in computational thinking.
Computational thinking is defined comprehensively to encompass methods, models, algorithms, and tools. Examples: statistical methods and probabilistic models. Data mining, machine learning, and the statistics discipline have much in common; many of the tools used are statistical tools.
Mathematics/statistics plays a prominent role in CDI
Statistical challenges posed by massive data sets
Examples:
1. The np problem, in which the number of parameters (p) either grows with the number of observations (n) or p >> n. Study of random matrices, sparsity.
2. How to draw inference from observational data? How to generalize a discovery from a sample to a population? Modeling of uncertainty.
3. Validation procedures. Heterogeneity is challenging, but homogeneity is not necessarily good. Imagine there is only one data set to work on: how do we know that it is a valid data set?
4. Missing values and partially observed data. Hidden Markov processes, nonparametric analysis.
5. Study of the properties of computing algorithms, such as rates of convergence.
6. Study of manifold data: random shapes, images, models of skin, etc.
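Challenge 1 (p >> n with sparsity) can be made concrete with a small simulation. The sketch below is only an illustration, not a method taken from the slides: it assumes a linear model with a few nonzero coefficients and fits it with an L1-penalized least-squares estimate (the lasso), computed by a plain proximal-gradient loop. The sizes, the penalty `lam`, and the names `lasso_ista` and `lasso_objective` are all hypothetical choices.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink each entry of z toward zero by t (proximal map of the L1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_objective(X, y, b, lam):
    """(1/2n) * ||y - Xb||^2 + lam * ||b||_1."""
    n = X.shape[0]
    return 0.5 * np.sum((y - X @ b) ** 2) / n + lam * np.sum(np.abs(b))

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize the lasso objective by proximal gradient descent (ISTA)."""
    n, p = X.shape
    # Step size 1/L, where L is the Lipschitz constant of the smooth part.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b

# Assumed toy setting: n = 50 observations, p = 200 parameters, 3 of them nonzero.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam = 0.1
beta_hat = lasso_ista(X, y, lam)
# Indices whose estimated coefficient is clearly away from zero.
support = np.flatnonzero(np.abs(beta_hat) > 0.5)
print("estimated support:", support)
```

Ordinary least squares has no unique solution here because p > n; the L1 penalty is one common way to exploit sparsity, and other penalties or selection methods could be substituted.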
Example
• Emmanuel Candès working with doctors on imaging data
• Compressive sampling: uses a probabilistic sampling method and wavelets to reconstruct the original image from very few data points. The work draws on optimization theory, combinatorics, and random matrix theory.
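The idea behind compressive sampling can be sketched in one dimension. The example below is a toy illustration under stated assumptions, not the method used in the imaging work: the signal is assumed to be exactly sparse in a Haar wavelet basis, the measurements are random Gaussian projections, and recovery uses orthogonal matching pursuit, a simple greedy stand-in for the L1-minimization approach usually associated with this work. The sizes and the names `haar_matrix` and `omp` are hypothetical.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar wavelet transform matrix (n must be a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    averages = np.kron(h, np.array([[1.0, 1.0]]))
    details = np.kron(np.eye(n // 2), np.array([[1.0, -1.0]]))
    return np.vstack([averages, details]) / np.sqrt(2.0)

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily select k columns of Phi to explain y."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(j)
        # Refit by least squares on all columns selected so far.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    c = np.zeros(Phi.shape[1])
    c[support] = coef
    return c

# Assumed toy setting: length-512 signal, 4 nonzero wavelet coefficients,
# 100 random measurements.
rng = np.random.default_rng(1)
n, m, k = 512, 100, 4
H = haar_matrix(n)

c_true = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
c_true[idx] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
x_true = H.T @ c_true                      # signal in the sample domain

A = rng.standard_normal((m, n)) / np.sqrt(m)   # random sampling matrix
y = A @ x_true                                  # m << n measurements

c_hat = omp(A @ H.T, y, k)                 # recover the wavelet coefficients
x_hat = H.T @ c_hat                        # reconstructed signal
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print("relative reconstruction error:", rel_err)
```

Real images are only approximately sparse and measurements are noisy, so practical reconstructions rely on more robust optimization than this noiseless sketch.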
CDI initiative
• DMS currently supports seven mathematical sciences institutes:
– IMA, IPAM, MSRI
– ARCC (AIM Research Conference Center), IAS, MBI, SAMSI (resulting from the 2001-2002 competition)
• Check out what's happening at those institutes:
– http://www.mathinstitutes.org/
http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf
Emphasis on bold multidisciplinary activities through computational thinking.
Team up!
Examples
CAREER Award (Samuel Kou, Harvard): the phenomenon of subdiffusion. Conformational fluctuation of biomolecules does not agree with Brownian diffusion.
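For context, the contrast with Brownian diffusion is usually stated in terms of the mean squared displacement. The relation below is the standard textbook characterization, not a formula taken from the award; the exponent symbol $\alpha$ is introduced here for illustration.

```latex
\langle x^2(t) \rangle \propto t^{\alpha},
\qquad
\alpha = 1 \ \text{(Brownian diffusion)},
\qquad
0 < \alpha < 1 \ \text{(subdiffusion)}.
```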
FRG (C. Zhang): new statistical tools for studying brain function with event-related fMRI (efMRI). The goal is to locate the specific regions of the human brain involved when specific tasks are performed, using nonparametric statistical methods that incorporate spatial information over the entire brain.