# decentralizepy
decentralizepy is a framework for running distributed applications (particularly machine learning) on top of arbitrary topologies (decentralized, federated, parameter server). It was primarily conceived for evaluating research ideas on several aspects of distributed learning (communication efficiency, privacy, data heterogeneity, etc.).
## Setting up decentralizepy
- Fork the repository.
- Clone and enter your local repository.
- Check that you have `python >= 3.8`:

  ```
  python --version
  ```

- (Optional) Create and activate a virtual environment:

  ```
  python3 -m venv [venv-name]
  source [venv-name]/bin/activate
  ```

- Update pip:

  ```
  pip3 install --upgrade pip
  ```

- On Mac M1, installing `pyzmq` fails with pip. Use conda.
- Install decentralizepy for development (zsh):

  ```
  pip3 install --editable .\[dev\]
  ```

- Install decentralizepy for development (bash):

  ```
  pip3 install --editable .[dev]
  ```
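As a quick sanity check that the editable install succeeded, importing the package should work:

```
python3 -c "import decentralizepy"
```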
## Running the code
- Choose and modify one of the config files in `eval/{step,epoch}_configs`.
- Modify the dataset paths and `addresses_filepath` in the config file (see the sketch after this list).
- In `eval/run.sh`, modify the arguments as required.
- Execute `eval/run.sh` on all the machines simultaneously. A synchronization barrier mechanism at the start ensures that all processes start training together.
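For orientation, a config file might look roughly like the sketch below. Apart from `addresses_filepath`, which the steps above mention, the section and key names are illustrative assumptions; the actual files in `eval/{step,epoch}_configs` are the authoritative reference.

```ini
; Illustrative sketch only: key names other than addresses_filepath are
; assumptions; start from a real file in eval/{step,epoch}_configs.
[DATASET]
train_dir = /path/to/train/data
test_dir = /path/to/test/data

[COMMUNICATION]
; File mapping machine ids to the addresses of the participating machines.
addresses_filepath = /path/to/ip_addresses.json
```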
## Contributing
- `isort` and `black` are installed along with the package for code linting.
- While in the root directory of the repository, before committing your changes, please run:

  ```
  black .
  isort .
  ```
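To verify formatting without rewriting any files (e.g., in CI), both tools also offer check modes:

```
black --check .
isort --check-only .
```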
## Modules

- `Node`: the manager; process-level optimizations.
- `Dataset`: static.
- `Training`: heterogeneity; how much local work does each node perform?
- `Graph`: static; who are my neighbours? Topologies.
- `Mapping`: naming; the globally unique ids of the processes `<->` `(machine_id, local_rank)` (see the sketch after this list).
- `Sharing`: leverage redundancy; privacy; optimizations in model and data sharing.
- `Communication`: IPC/network level; compression; privacy; reliability.
- `Model`: the learning model.
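As an illustration of the `Mapping` module's role, here is a minimal sketch of a linear bijection between globally unique process ids and `(machine_id, local_rank)` pairs. The class name and the exact scheme are assumptions for illustration, not necessarily what decentralizepy implements.

```python
from typing import Tuple


class LinearMappingSketch:
    """Illustrative uid <-> (machine_id, local_rank) bijection."""

    def __init__(self, procs_per_machine: int):
        self.procs_per_machine = procs_per_machine

    def get_uid(self, machine_id: int, local_rank: int) -> int:
        # (machine_id, local_rank) -> globally unique id
        return machine_id * self.procs_per_machine + local_rank

    def get_machine_and_rank(self, uid: int) -> Tuple[int, int]:
        # globally unique id -> (machine_id, local_rank)
        return divmod(uid, self.procs_per_machine)


# Example: with 4 processes per machine, local rank 2 on machine 1 has uid 6.
mapping = LinearMappingSketch(procs_per_machine=4)
assert mapping.get_uid(machine_id=1, local_rank=2) == 6
assert mapping.get_machine_and_rank(6) == (1, 2)
```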