Inhomogeneous Large-Scale Data: Maximin Effects and their Statistical Estimation
Large-scale or "big" data usually refers to scenarios with potentially
very many variables (dimension p) and very large sample size n. Such data
is most often "inhomogeneous" in nature, i.e., it consists neither of
i.i.d. realizations from a distribution nor of observations generated from a
stationary distribution. We propose a new methodology for a class of
large-scale inhomogeneous data, in terms of so-called maximin effects, which
optimize performance in the most adversarial constellation. The advocated
procedure is computationally efficient and under certain circumstances
orders of magnitude faster than standard penalized regression estimators,
and we provide statistical accuracy guarantees for scenarios where n and/or
p are large.
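
To make the idea of "optimizing performance in the most adversarial constellation" concrete, here is a minimal sketch for two groups of observations. It is not taken from the abstract: it assumes the explained-variance criterion V(b; b_g) = 2 b' Sigma b_g - b' Sigma b commonly used in the maximin-effects literature, and it relies on the fact that, under that criterion, the maximin effect is the point of smallest Sigma-norm in the convex hull of the group coefficients. The function names and the toy numbers are illustrative only.

```python
import numpy as np

def explained_variance(b, b_g, sigma):
    """Assumed criterion: V(b; b_g) = 2 b' Sigma b_g - b' Sigma b."""
    return 2.0 * b @ sigma @ b_g - b @ sigma @ b

def maximin_two_groups(b1, b2, sigma):
    """Point of minimum Sigma-norm on the segment between b1 and b2."""
    d = b1 - b2
    denom = d @ sigma @ d
    if denom == 0.0:  # both groups share the same coefficients
        return b1.copy()
    # Unconstrained minimizer of the quadratic in w, clipped to [0, 1].
    w = np.clip(-(d @ sigma @ b2) / denom, 0.0, 1.0)
    return w * b1 + (1.0 - w) * b2

# Toy setting: the groups agree on the first effect and disagree in
# sign on the second one (hypothetical numbers).
sigma = np.eye(2)
b1 = np.array([2.0, 1.0])
b2 = np.array([2.0, -1.0])

b_mm = maximin_two_groups(b1, b2, sigma)
worst = min(explained_variance(b_mm, b, sigma) for b in (b1, b2))
print(b_mm, worst)
```

In this toy setting the effect shared by both groups is retained while the effect whose sign flips between groups is shrunk to zero, which is the qualitative behavior the maximin formulation is meant to capture.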
Scalable Bayesian Model Selection Methods
Bayesian model selection faces challenges both in theory and in computation when the number of potential covariates p is large. We propose a Bayesian variable selection method for logistic regression that adapts to both the sample size n and the number of potential covariates p, with two important features. First, it has strong model selection consistency even when p is large. Second, we propose a new Gibbs sampler that does not require p^2 operations in each of its iterations. In contrast to the standard Gibbs sampler, which requires sampling from a p-dimensional multivariate normal distribution with a non-sparse covariance matrix, our new algorithm is much more scalable to high-dimensional problems, both in memory and in computational efficiency. In a simulation study, we compare the proposed method with several leading variable selection methods and show that it selects the correct model with higher probability than most competitors. The talk is based on ongoing work with Naveen Narisetty and Juan Shen.
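
The following sketch only illustrates where the cost of the standard step comes from; it is not the sampler proposed in the talk, whose details the abstract does not give. It contrasts one draw from a p-dimensional normal with a dense covariance matrix (via a Cholesky factor, which needs p^2 storage and a cubic-cost factorization) with p independent univariate draws. The dimension, the covariance matrix, and the function names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_dense(mean, cov, rng):
    """One draw from N(mean, cov) with a dense covariance matrix.

    The Cholesky factor has p^2 entries and costs O(p^3) to compute.
    """
    chol = np.linalg.cholesky(cov)
    return mean + chol @ rng.standard_normal(mean.shape[0])

def draw_independent(mean, var, rng):
    """p independent univariate normal draws: O(p) time and memory."""
    return mean + np.sqrt(var) * rng.standard_normal(mean.shape[0])

p = 200  # hypothetical number of covariates
a = rng.standard_normal((p, p))
cov = a @ a.T / p + np.eye(p)  # dense, symmetric positive definite
mean = np.zeros(p)

x_dense = draw_dense(mean, cov, rng)
x_indep = draw_independent(mean, np.diag(cov), rng)
print(cov.size, x_dense.shape, x_indep.shape)
```

The two draws do not follow the same distribution; the comparison is only meant to show how storage and work grow with p when the covariance matrix is dense.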
