Data-dependent Coreset for Large-scale, Robust, and Dynamic Machine Learning (Hu Ding)

Abstract

With the rapid development of big data, we often confront large-scale and noisy datasets in many machine learning tasks. Coresets are a popular data compression technique that has been extensively studied. However, most existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on pseudo-dimension and total sensitivity bounds that can be very high or hard to obtain. Moreover, existing coreset methods are sensitive to outliers and cannot be efficiently constructed in a dynamic environment with data insertions and deletions. In this talk, we introduce a new data-dependent framework for coreset construction, which is useful for many popular optimization objectives such as k-means/median clustering, Lasso, Ridge, logistic regression, and Gaussian mixture models. In particular, our framework can effectively deal with outliers and dynamic updates. To the best of our knowledge, this is the first robust and fully dynamic coreset construction method for these problems. Part of this work has recently appeared in ICML’20 and ICML’21.

Time

2021-06-18, 11:30-12:00

Speaker

Hu Ding, University of Science and Technology of China

Room

Guangdong Hotel Shanghai