LBANN

Livermore Big Artificial Neural Network Toolkit.



Overview

Name With Owner: LLNL/lbann
Primary Language: C++
Program language: CMake (Language Count: 8)
Platform: Linux, Mac
License: Other
Release Count: 21
Last Release Name: wide_resnet50_amp_baseline (Posted on 2023-11-09 12:04:44)
First Release Name: v0.9 (Posted on 2016-07-19 13:40:23)
Created At: 2016-05-11 20:04:20
Pushed At: 2024-05-02 22:50:51
Last Commit At: 2023-11-07 17:09:58
Stargazers Count: 219
Watchers Count: 23
Fork Count: 80
Commits Count: 8.3k
Issues Count: 469
Issue Open Count: 166
Pull Requests Count: 1771
Pull Requests Open Count: 44
Pull Requests Close Count: 151

LBANN: Livermore Big Artificial Neural Network Toolkit

The Livermore Big Artificial Neural Network toolkit (LBANN) is an
open-source, HPC-centric, deep learning training framework that is
optimized to compose multiple levels of parallelism.

LBANN provides model-parallel acceleration through domain
decomposition to optimize for strong scaling of network training. It
also allows for composition of model-parallelism with both data
parallelism and ensemble training methods for training large neural
networks with massive amounts of data. LBANN is able to take advantage of
tightly-coupled accelerators, low-latency high-bandwidth networking,
and high-bandwidth parallel file systems.

LBANN supports state-of-the-art training algorithms such as
unsupervised, self-supervised, and adversarial (GAN) training methods
in addition to traditional supervised learning. It also supports
recurrent neural networks via backpropagation through time (BPTT)
training, transfer learning, and multi-model and ensemble training
methods.

Building LBANN

The preferred method for LBANN users to install LBANN is to use
Spack. After some system configuration, this should be as
straightforward as:

spack install lbann
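For readers who do not yet have Spack available, the steps might look like the following sketch. It assumes Spack's upstream repository URL and its standard `setup-env.sh` script. The `DRY_RUN` variable and the `run` and `install_lbann` helpers are illustrative names, not part of Spack or LBANN. By default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: bootstrap Spack if needed, then install LBANN.
# DRY_RUN, run, and install_lbann are illustrative, not part of Spack or LBANN.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_lbann() {
    # Fetch Spack and load its shell integration if it is not on PATH.
    if ! command -v spack >/dev/null 2>&1; then
        run git clone https://github.com/spack/spack.git
        run . spack/share/spack/setup-env.sh
    fi
    # Show the concretized build plan before committing to a long build.
    run spack spec lbann
    run spack install lbann
}

install_lbann
```

Setting `DRY_RUN=0` runs the commands for real. Inspecting the output of `spack spec lbann` first is a cheap way to check which compiler, MPI, and dependencies Spack has selected for the system.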

More detailed instructions for building and installing LBANN are
available in the main LBANN documentation.

Running LBANN

The basic template for running LBANN is:

<mpi-launcher> <mpi-options> \
    lbann <lbann-options> \
    --model=model.prototext \
    --optimizer=opt.prototext \
    --reader=data_reader.prototext

When using GPGPU accelerators, users should be aware that LBANN is
optimized for the case in which one assigns one GPU per MPI
rank. This should be borne in mind when choosing the parameters for
the MPI launcher.
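As an illustration of the one-GPU-per-rank guidance, the sketch below derives the MPI rank count from the node and GPU counts and prints a launch line. It assumes Slurm's `srun` as the MPI launcher, with its standard `--nodes`, `--ntasks-per-node`, and `--ntasks` options. The helper name `lbann_launch_cmd` and its two arguments are hypothetical, and the prototext file names are the placeholders from the template above.

```shell
#!/bin/sh
# Sketch: build an srun launch line with one MPI rank per GPU.
# lbann_launch_cmd is a hypothetical helper, not part of LBANN.
lbann_launch_cmd() {
    nodes="$1"           # number of compute nodes
    gpus_per_node="$2"   # GPUs available on each node
    # One rank per GPU: ranks per node equals GPUs per node.
    total_ranks=$((nodes * gpus_per_node))
    printf 'srun --nodes=%s --ntasks-per-node=%s --ntasks=%s lbann --model=model.prototext --optimizer=opt.prototext --reader=data_reader.prototext\n' \
        "$nodes" "$gpus_per_node" "$total_ranks"
}

# Example: 2 nodes with 4 GPUs each gives 8 ranks in total.
lbann_launch_cmd 2 4
```

The function only prints the command, so it can be inspected before being run. How ranks are bound to individual GPUs depends on the site's scheduler configuration and is not covered by this sketch.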

More details about running LBANN are available in the LBANN
documentation.

Publications

A list of publications, presentations, and posters is available in
the LBANN documentation.

Reporting issues

Issues, questions, and bugs can be raised on the GitHub issue
tracker.
