Non-convergence to undesirable critical points by adaptive gradient methods (Xiao Wang)


Adaptive first-order methods are widely used in machine learning due to their ability to adapt to non-convex landscapes. However, their convergence guarantees are typically stated in terms of vanishing gradient norms, which leaves open the possibility of convergence to undesirable saddle points. This talk focuses on the AdaGrad family of algorithms and examines whether the method's trajectories avoid saddle points. We will see that classical techniques from dynamical systems theory play a prominent role in the analysis.
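As background for the talk, a minimal sketch of the (diagonal) AdaGrad update discussed in the abstract: each coordinate's step is scaled by the inverse square root of its accumulated squared gradients. The test function, step size, and starting point below are illustrative choices, not taken from the talk.

```python
import numpy as np

def adagrad(grad, x0, lr=0.05, eps=1e-8, n_steps=500):
    """Diagonal AdaGrad: per-coordinate steps scaled by the
    inverse square root of accumulated squared gradients."""
    x = np.asarray(x0, dtype=float)
    g_sq = np.zeros_like(x)  # running sum of squared gradients
    for _ in range(n_steps):
        g = grad(x)
        g_sq += g ** 2
        x -= lr * g / (np.sqrt(g_sq) + eps)
    return x

# Illustrative example: f(x, y) = x^2 - y^2 has a saddle at the origin.
# Starting near the stable manifold (y small), the x-coordinate shrinks
# toward the saddle while a small y-perturbation grows away from it --
# the kind of trajectory behavior the saddle-avoidance question concerns.
grad_f = lambda z: np.array([2 * z[0], -2 * z[1]])
x_final = adagrad(grad_f, [1.0, 1e-3])
```

Here the iterates drift toward the saddle along the attracting direction but are eventually repelled along the unstable one; the talk's question is whether this escape can be guaranteed for the AdaGrad family.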


2022-05-17, 13:30 - 14:30


Xiao Wang,  ITCS@SUFE


Tencent meeting ID: 861-8393-9675; PW: 123456