Mathematical Sciences

School of Natural Sciences & Mathematics

Statistics Seminar S14

Apr 4

Yulia Gel

University of Texas at Dallas and University of Waterloo

Using bootstrap for statistical inference on random graphs

In this talk, we discuss a new nonparametric approach to network inference that may be viewed as a fusion of block sampling procedures for temporally and spatially dependent processes with classical network methodology. We develop estimation and uncertainty quantification procedures for the network mean degree using a “patchwork” sample and the nonparametric bootstrap, under the assumption of an unknown degree distribution. We present a data-driven cross-validation methodology for selecting an optimal patch size. We validate the new patchwork bootstrap on simulated networks with short- and long-tailed mean degree distributions, and revisit the Erdős collaboration data to illustrate the proposed methodology. This is joint work with Mary Thompson, Lilia Leticia Ramirez Ramirez and Slava Lyubchich.
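
For readers unfamiliar with the bootstrap idea mentioned in the abstract, the following is a minimal sketch of a plain percentile bootstrap for a mean degree, computed from a list of observed node degrees. It is not the patchwork bootstrap of the talk: it resamples degrees as if they were independent, which ignores network dependence, and the function name `bootstrap_mean_ci` and its parameters are hypothetical, chosen only for illustration.

```python
import random


def bootstrap_mean_ci(degrees, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of observed degrees.

    Assumption (not from the talk): degrees are resampled i.i.d. with
    replacement, which ignores any dependence between sampled nodes.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    n = len(degrees)
    if n == 0:
        raise ValueError("need at least one observed degree")

    boot_means = []
    for _ in range(n_boot):
        # Draw n degrees with replacement and record their mean.
        resample = [degrees[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(resample) / n)

    boot_means.sort()
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return sum(degrees) / n, lower, upper


if __name__ == "__main__":
    sample_degrees = [1, 2, 2, 3, 3, 3, 4, 5, 8, 12]  # made-up illustration data
    estimate, lower, upper = bootstrap_mean_ci(sample_degrees)
    print(f"mean degree {estimate:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

The sample degrees in the usage example are invented for illustration and do not come from any data set discussed in the talk.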

Apr 11

Michael Baron

University of Texas at Dallas

Bayesian and asymptotically optimal change-point detection in single and multiple channels

Classical change-point problems deal with identifying and estimating sudden changes in the distribution of observed data sequences. Analysis can be done retrospectively, for changes that occurred in the past, or sequentially, in real time, in which case one looks for a stopping time signaling an abrupt change. Change-points typically occur at random moments, and some prior information about them is often available, implying the need for Bayesian procedures.

The Bayesian sequential change-point detection problem is studied for scalar and vector-valued data, where each component can experience a sudden change. The loss function penalizes false alarms and detection delays, and the penalty increases with each missed change-point. For wide classes of stochastic processes, with or without nuisance parameters, asymptotically pointwise optimal (APO) stopping rules are obtained, translating the classical concept of Bickel and Yahav to sequential change-point detection. These APO rules are attractive because of their simple analytic form, straightforward computation, and weak assumptions.

Applications of the new methods in environmental science, finance, epidemiology, and energy disaggregation will be shown. These models often involve nuisance parameters, time-dependence, nonstationarity, and rather complex prior distributions. The proposed APO rules can operate under these conditions, achieving asymptotic optimality.

Apr 18

Jufen Chu and Sam Efromovich

University of Texas at Dallas

Hazard rate estimation for left truncated and right censored data and superefficiency in nonparametric estimation

(The first part will be presented by Jufen Chu and the second part will be presented by Sam Efromovich)

Abstract of first part:

Nonparametric estimation of a hazard rate from left truncated and right censored data is a typical situation in applications, and a number of consistent and rate-optimal estimators, under the mean integrated squared error (MISE) criterion, have been proposed. It is known that, under a mild assumption, neither truncation nor censoring affects the rate of the MISE convergence. Hence a sharp constant of the MISE convergence is needed to create a benchmark for an estimator.

This work develops the theory of sharp minimax nonparametric estimation of the hazard rate with left truncated and right censored data. It is shown how left truncation and right censoring affect the MISE. The proposed data-driven sharp minimax estimator adapts to the smoothness of the underlying hazard rate and to the unknown distributions of the truncating and censoring random variables. Performance of the proposed estimator is illustrated via analysis of simulated and real data, and for real data the nonparametric estimates are complemented by hypothesis testing and confidence bands.

Abstract of second part:

Superefficiency in nonparametric estimation and new rates under a shrinking minimax.

Apr 25

Ekaterina Smirnova

University of Texas at Dallas

Wavelet Estimation: Theory and Application to fMRI data

New theoretical results on wavelet estimation, concerning new minimax rates, adaptive estimation, and estimation of large-p-small-n cross-correlation matrices, are presented. The theoretical results and the estimators are used for the analysis of fMRI images obtained in a study of neuroplasticity. Traditionally these studies are based on averaging images over large areas in the right and left hemispheres and then finding a single cross-correlation function. It is proposed to conduct such an analysis at a voxel-to-voxel level, which immediately yields large cross-correlation matrices. The results allow us not only to conclude that during fMRI experiments there is a change in cross-correlation between the left and right hemispheres (a fact well known in the literature), but also to enrich our understanding of how neural pathways are activated at a single voxel-to-voxel level.

May 2

Bhargab Chattopadhyay

University of Texas at Dallas

Sequential Estimation of Gini Index

Economic inequality arises from inequality in the distribution of income and assets among individuals or groups within a society or region, or even between countries. For continuous evaluation of the economic policies adopted by a government, periodic computation of the Gini index for the whole country, state, or region is very important. However, not all countries can afford to collect, or do collect, household data on a relatively large scale periodically.

In order to compute the Gini index for a particular country or region at a given time, a procedure is needed that minimizes both the error of estimation and the cost of sampling without assuming any income distribution. It is well known that the error of estimation decreases as the sample size increases, which in turn increases the overall cost of sampling. Conversely, if one wants to minimize the cost of sampling, one has to use a smaller sample size, which in turn increases the error of estimation. So a procedure is required that strikes a trade-off between estimation accuracy and sampling cost.
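
As a point of reference for the quantity being estimated, here is a minimal sketch of the standard plug-in Gini index computed from a sample of incomes. It uses the sorted-values formula and makes no distributional assumption. It is not necessarily the estimator used in the talk, and it does not implement the sequential stopping procedure. The function name `gini_index` is hypothetical.

```python
def gini_index(incomes):
    """Plug-in sample Gini index from non-negative incomes.

    Uses G = 2 * sum_i(i * x_(i)) / (n * sum_i x_i) - (n + 1) / n,
    where x_(1) <= ... <= x_(n) are the sorted incomes.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with positive total income")
    if xs[0] < 0:
        raise ValueError("incomes must be non-negative")

    # Rank-weighted sum of the ordered incomes, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    print(gini_index([1, 1, 1, 1]))  # perfect equality gives 0
    print(gini_index([0, 0, 0, 1]))  # one household holds everything: (n - 1) / n
```

With this estimator, a larger sample gives a more accurate value but costs more to collect, which is the trade-off the abstract describes.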