Investigation of Training Order Effects on Artificial Neural Networks for Image Recognition
Wang, Xiaotian
2018-04-02
Abstract
Stochastic gradient descent is widely used to train neural networks in machine learning, especially in deep learning. At each iteration, the algorithm selects a small fraction of the training data, called a mini-batch, and uses it to compute an approximation of the gradient of the objective function. In practice, researchers tend to use small batch sizes, and the training data fed to the network usually spans many categories and is presented in random order. The advantages of smaller mini-batches have been demonstrated quantitatively, yet there have been very few formal investigations into how the order of training data affects the training efficiency and generalizability of a neural network. To gain more insight into this problem, we investigated the effects of training order and mini-batch composition through a series of controlled experiments. In these experiments, we retrained an existing neural network model for object recognition with images from the ImageNet dataset and from a newly collected dataset called the Toy-Box dataset. We investigated the use of optimization techniques such as genetic algorithms and simulated annealing to optimize the order of the training data. We also compared training efficiency across different mini-batch compositions.
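The mini-batch update described in the abstract can be illustrated with a minimal sketch. It is not the experimental setup of this work: a small least-squares linear model stands in for the neural network, the data are synthetic, and the names `minibatch_sgd`, `random_order`, and `sorted_order` are hypothetical. The sketch only shows where the order of the training data enters the procedure.

```python
import numpy as np

def minibatch_sgd(X, y, order, batch_size=8, lr=0.1, epochs=20):
    """Least-squares linear regression trained by mini-batch SGD.

    `order` is a permutation of the sample indices (an assumption of this
    sketch); it fixes the sequence in which samples are grouped into
    mini-batches in every epoch.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            # Gradient of the mean squared error on this mini-batch only,
            # used as an approximation of the full-data gradient.
            grad = X[idx].T @ residual / len(idx)
            w -= lr * grad
    return w

# Synthetic, noise-free data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

random_order = rng.permutation(len(X))  # the usual shuffled order
sorted_order = np.argsort(y)            # a deliberately non-random order

w_random = minibatch_sgd(X, y, random_order)
w_sorted = minibatch_sgd(X, y, sorted_order)
```

Passing a different `order` changes both the sequence of updates and the composition of each mini-batch, which are the two factors the experiments vary. Comparing `w_random` and `w_sorted` after the same number of epochs gives a rough sense of how ordering can influence training on this toy problem.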