- 2017 (v1) Publication
We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. We study how regularization properties are controlled by the step size, the number of passes, and the mini-batch size. In particular, we consider the square loss and show that for a universal step-size choice, the...
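As a rough illustration of the setting (not the paper's exact algorithm or its step-size prescription), a multi-pass mini-batch SGD loop for the empirical square loss can be sketched as follows; the function name, the constant step size, and the synthetic data are assumptions made for the example.

```python
# Illustrative sketch: multi-pass mini-batch SGD for least squares (assumed names/values).
import numpy as np

def minibatch_sgd_least_squares(X, y, step_size=0.1, n_passes=5, batch_size=10, seed=0):
    """Run n_passes epochs of mini-batch SGD on the empirical square loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_passes):
        perm = rng.permutation(n)                      # one fresh pass over the data
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ w - yb) / len(idx)     # gradient of 0.5 * mean((Xb w - yb)^2)
            w -= step_size * grad
    return w

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.1 * rng.standard_normal(200)
w_hat = minibatch_sgd_least_squares(X, y, step_size=0.1, n_passes=5, batch_size=10)
print(np.linalg.norm(w_hat - w_true))
```

The step size, the number of passes, and the mini-batch size are the knobs whose joint regularization effect the paper analyzes.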
- 2016 (v1) Publication
We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. In particular, we consider the square loss and show that for a universal step-size choice, the number of passes acts as a regularization parameter, and optimal finite sample bounds can be achieved by...
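To illustrate the role of the number of passes as a regularization parameter, here is a minimal sketch that monitors held-out error after each pass and keeps the best iterate; the names, the step size, and the validation-based selection are assumptions for illustration (the paper derives the appropriate number of passes theoretically rather than by validation).

```python
# Illustrative sketch: the number of SGD passes as a regularization parameter,
# selected here by held-out validation error (assumed names/values).
import numpy as np

def sgd_pass(w, X, y, step_size, rng):
    """One pass of single-sample SGD on the empirical square loss."""
    for i in rng.permutation(len(y)):
        w = w - step_size * (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i.w - y_i)^2
    return w

def choose_passes_by_validation(X_tr, y_tr, X_val, y_val,
                                step_size=0.05, max_passes=50, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X_tr.shape[1])
    best_w, best_err, best_t = w, np.mean((X_val @ w - y_val) ** 2), 0
    for t in range(1, max_passes + 1):
        w = sgd_pass(w, X_tr, y_tr, step_size, rng)
        err = np.mean((X_val @ w - y_val) ** 2)
        if err < best_err:                             # fewer passes = stronger regularization
            best_w, best_err, best_t = w.copy(), err, t
    return best_w, best_t
```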
- 2016 (v1) Publication
We consider the problem of supervised learning with convex loss functions and propose a new form of iterative regularization based on the subgradient method. Unlike other regularization approaches, in iterative regularization no constraint or penalization is considered, and generalization is achieved by (early) stopping an empirical iteration....
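A minimal sketch of iterative regularization by early-stopped subgradient descent for a convex loss, using the hinge loss as one concrete example; the helper names, the step size, and the validation split used to pick the stopping time are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch: iterative regularization by early-stopped subgradient descent
# (no penalty term); hinge loss as one convex example (assumed names/values).
import numpy as np

def hinge_subgradient(w, X, y):
    """A subgradient of the mean hinge loss max(0, 1 - y * <x, w>), with y in {-1, +1}."""
    margins = y * (X @ w)
    active = (margins < 1.0).astype(float)              # examples with a nonzero subgradient
    return -(active * y) @ X / len(y)

def early_stopped_subgradient(X_tr, y_tr, X_val, y_val, step_size=0.1, max_iters=500):
    """Plain subgradient descent; the stopping time is the only regularization parameter."""
    w = np.zeros(X_tr.shape[1])
    best_w, best_err = w, np.mean(np.sign(X_val @ w) != y_val)
    for _ in range(max_iters):
        w = w - step_size * hinge_subgradient(w, X_tr, y_tr)
        err = np.mean(np.sign(X_val @ w) != y_val)
        if err < best_err:                               # (early) stopping selects the iterate
            best_w, best_err = w, err
    return best_w
```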
- 2017 (v1) Publication
In this note, we propose and study the notion of modified Fejér sequences. Within a Hilbert space setting, this property has been used to prove ergodic convergence of proximal incremental subgradient methods. Here we show that indeed it provides a unifying framework to prove convergence rates for objective function values of...
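For context, a sequence (u_t) in a Hilbert space H is Fejér monotone with respect to a set S if its distance to every point of S is nonincreasing; the modified notion studied in the note relaxes this property. The display below recalls the classical definition and gives only a schematic of such a relaxation (the perturbation term and its summability are assumptions for illustration, not the note's precise definition).

```latex
% Classical Fejér monotonicity of (u_t) with respect to a nonempty set S in a Hilbert space H:
\[
  \|u_{t+1} - u\| \;\le\; \|u_t - u\| \qquad \forall\, u \in S,\ \forall\, t \in \mathbb{N}.
\]
% A schematic relaxation with summable perturbations (illustration only; not the note's definition):
\[
  \|u_{t+1} - u\|^{2} \;\le\; \|u_t - u\|^{2} + \varepsilon_t ,
  \qquad \varepsilon_t \ge 0 , \quad \sum_{t\in\mathbb{N}} \varepsilon_t < \infty .
\]
```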