Distributed Systems for Neural Network models


Click here to go to the thesis: Distributed Systems for Neural Network models

Thesis overview: This thesis discusses the implementation of distributed systems for the evaluation of convolutional neural network models, exploiting the isolation properties of Docker containers. The results are split across two main chapters: in the first we build and analyse the performance of a cluster managed by Apache Spark for image inference starting from pre-trained models; in the second we discuss a possible implementation of a distributed multi-host, multi-GPU system for the training phase of different models, considering its scalability limits.

The exposition focuses on the principles behind the implementations rather than on the details, describing every component in a logical order with specific examples encountered in the project. Most of the work has been a non-systematic effort to interpolate between official documentation and publicly available resources, in order to build as linear a body of knowledge as possible. If this linearity shows through the essay, one of the main goals of this thesis will have been achieved, as the logical dots will be properly connected.

In the chapters we refer directly to the code where needed; every piece of code discussed in the following, as well as the documentation, can be found in the public repository https://gitlab.com/NFFA-Europe-JRA3/lciuffreda-mhpc-17-18. The results presented in the following chapters constitute an original body of work, meant to be a building block for the development of new distributed neural network systems in future projects. As such, they open new lines of investigation for several classification problems.
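As a minimal conceptual sketch of the data-parallel inference pattern described in the first chapter: the image set is partitioned across workers, and each worker applies the same pre-trained model to its partition. The sketch below uses only the Python standard library as a stand-in for the Spark cluster, and the `classify` function is a hypothetical placeholder for the real convolutional network; it is an illustration of the pattern, not the thesis's actual code.

```python
from concurrent.futures import ThreadPoolExecutor

def classify(image_id):
    # Hypothetical stand-in for a pre-trained model's forward pass:
    # in the thesis this role is played by a convolutional network
    # evaluated inside Docker containers on a Spark-managed cluster.
    # Here we simply map an image identifier to a dummy label index.
    return image_id % 3

def distributed_inference(image_ids, workers=4):
    """Apply the same 'model' to every image in parallel.

    Spark's map over a distributed collection of images follows the
    same shape: partition the data, apply the function on each worker,
    collect the results.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify, image_ids))

labels = distributed_inference(range(10))
```

In the Spark version, `pool.map` is replaced by a map over an RDD or DataFrame of images, and the model weights are shipped to (or pre-loaded on) each executor.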