DPVS


dpvs-logo.png

Introduction

DPVS is a high performance Layer-4 load balancer based on DPDK. It's derived from Linux Virtual Server LVS and its modification alibaba/LVS.

The name DPVS comes from "DPDK-LVS".

dpvs.png

Several techniques are applied for high performance:

  • Kernel bypass (user-space implementation).
  • Share-nothing, per-CPU for key data (lockless).
  • RX steering and CPU affinity (avoids context switches).
  • Batching TX/RX.
  • Zero copy (avoids packet copies and syscalls).
  • Polling instead of interrupts.
  • Lockless messages for high-performance IPC.
  • Other techniques enhanced by DPDK.

Major features of DPVS include:

  • L4 Load Balancer, including FNAT, DR mode, etc.
  • Different scheduling algorithms like RR, WLC, WRR, etc.
  • User-space Lite IP stack (IPv4, routing, ARP, ICMP ...).
  • SNAT mode for Internet access from an internal network.
  • Support for KNI, VLAN, and Bonding for different IDC environments.
  • Security features: TCP syn-proxy, Conn-Limit, black-list.
  • QoS: Traffic Control.

DPVS feature modules are illustrated in the following picture.

modules

Quick Start

Test Environment

This quick start is tested with the environment below.

  • Linux Distribution: CentOS 7.2
  • Kernel: 3.10.0-327.el7.x86_64
  • CPU: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
  • NIC: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 03)
  • Memory: 64G with two NUMA nodes.
  • GCC: gcc version 4.8.5 20150623 (Red Hat 4.8.5-4)

Other environments should also work if DPDK works; please check dpdk.org for more info.

  • Please check this link for NICs supported by DPDK: http://dpdk.org/doc/nics.
  • Note that flow-director (fdir) is needed for Full-NAT and SNAT modes with multiple cores.
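
To compare your own machine with the test environment above, a few standard Linux commands are enough (none of them are DPVS-specific; numactl comes from the numactl package):

$ uname -r                             # kernel version
$ grep -m1 "model name" /proc/cpuinfo  # CPU model
$ lspci | grep -i ethernet             # NIC model
$ numactl --hardware                   # NUMA nodes and their memory
$ gcc --version                        # compiler version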

Clone DPVS

$ git clone https://github.com/iqiyi/dpvs.git
$ cd dpvs

Well, let's start from DPDK then.

DPDK setup

Currently, dpdk-stable-17.11.2 is used for DPVS.

You can skip this section if you are experienced with DPDK, and refer to the link for details.

$ wget https://fast.dpdk.org/rel/dpdk-17.11.2.tar.xz   # download from dpdk.org if link failed.
$ tar vxf dpdk-17.11.2.tar.xz

DPDK patches

There's a patch for the DPDK kni driver to support hardware multicast; apply it if needed (for example, to launch ospfd on a kni device).

Assume we are in the DPVS root directory and dpdk-stable-17.11.2 is under it. Please note this layout is not mandatory, just convenient.

$ cd <path-of-dpvs>
$ cp patch/dpdk-stable-17.11.2/*.patch dpdk-stable-17.11.2/
$ cd dpdk-stable-17.11.2/
$ patch -p 1 < 0001-PATCH-kni-use-netlink-event-for-multicast-driver-par.patch

Another DPDK patch fixes the checksum API for packets with IP options; it's needed by the UOA module.

$ patch -p1 < 0002-net-support-variable-IP-header-len-for-checksum-API.patch
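
If you are unsure whether a patch applies cleanly (e.g., against a slightly different DPDK tree), GNU patch can preview the result without modifying any files:

$ patch -p1 --dry-run < 0002-net-support-variable-IP-header-len-for-checksum-API.patch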

DPDK build and install

Now build DPDK and export the RTE_SDK environment variable for the DPDK application (DPVS).

$ cd dpdk-stable-17.11.2/
$ make config T=x86_64-native-linuxapp-gcc
Configuration done
$ make # or make -j40 to save time, where 40 is the cpu core number.
$ export RTE_SDK=$PWD

In our tutorial, RTE_TARGET is not set, so it defaults to "build"; DPDK libraries and header files can thus be found in dpdk-stable-17.11.2/build.

Now set up the DPDK hugepages. Our test environment is a NUMA system; for a single-node system, please refer to the link.

$ # for NUMA machine
$ echo 8192 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
$ echo 8192 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

$ mkdir /mnt/huge
$ mount -t hugetlbfs nodev /mnt/huge
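
Hugepages configured this way do not persist across reboots. As a sanity check, the allocation can be verified through standard kernel interfaces (generic Linux commands, nothing DPVS-specific):

$ grep -i huge /proc/meminfo   # HugePages_Total/HugePages_Free should reflect the values above
$ cat /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages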

Install the kernel modules and bind the NIC with the igb_uio driver. This quick start uses only one NIC; normally we use 2 for a Full-NAT cluster, and even 4 for bonding mode. Assume eth0 will be used for DPVS/DPDK, and another standalone Linux NIC, for example eth1, is kept for debugging.

$ modprobe uio
$ cd dpdk-stable-17.11.2

$ insmod build/kmod/igb_uio.ko
$ insmod build/kmod/rte_kni.ko

$ ./usertools/dpdk-devbind.py --status
$ ifconfig eth0 down  # assuming eth0 is 0000:06:00.0
$ ./usertools/dpdk-devbind.py -b igb_uio 0000:06:00.0

dpdk-devbind.py -u can be used to unbind the driver and switch the NIC back to a Linux driver like ixgbe. You can also use lspci or ethtool -i eth0 to check the NIC PCI bus-id. Please see the DPDK site for details.
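
For example, to give the port back to the kernel ixgbe driver (reusing the PCI bus-id from above):

$ ./usertools/dpdk-devbind.py -u 0000:06:00.0        # unbind from igb_uio
$ ./usertools/dpdk-devbind.py -b ixgbe 0000:06:00.0  # rebind to the Linux driver
$ ./usertools/dpdk-devbind.py --status               # confirm the new binding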

Build DPVS

It's simple, just set RTE_SDK and build it.

$ cd dpdk-stable-17.11.2/
$ export RTE_SDK=$PWD
$ cd <path-of-dpvs>

$ make # or "make -j40" to speed up.
$ make install

You may need to install dependencies first, like openssl, popt and numactl, e.g., yum install popt-devel (CentOS).
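
On CentOS, installing all three at once would look like the line below; note the -devel package names are our assumption for building from source, adjust them for your distribution:

$ yum install -y openssl-devel popt-devel numactl-devel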

Output files are installed to dpvs/bin.

$ ls bin/
dpip  dpvs  ipvsadm  keepalived

  • dpvs is the main program.
  • dpip is the tool to set IP address, route, vlan, neigh, etc.
  • ipvsadm and keepalived come from LVS; both are modified.

Launch DPVS

Now dpvs.conf must be put at /etc/dpvs.conf; just copy it from conf/dpvs.conf.single-nic.sample.

$ cp conf/dpvs.conf.single-nic.sample /etc/dpvs.conf

and start DPVS,

$ cd <path-of-dpvs>/bin
$ ./dpvs &

Check if it has started:

$ ./dpip link show
1: dpdk0: socket 0 mtu 1500 rx-queue 8 tx-queue 8
    UP 10000 Mbps full-duplex fixed-nego promisc-off
    addr A0:36:9F:9D:61:F4 OF_RX_IP_CSUM OF_TX_IP_CSUM OF_TX_TCP_CSUM OF_TX_UDP_CSUM

If you see this message, well done: DPVS is working with NIC dpdk0!

Don't worry if you see this error,

EAL: Error - exiting with code: 1
  Cause: ports in DPDK RTE (2) != ports in dpvs.conf(1)

it means the number of NICs seen by DPDK does not match /etc/dpvs.conf. Please use dpdk-devbind to adjust the NIC count or modify dpvs.conf. We'll improve this part to make DPVS "smarter" and avoid having to modify the config file when the NIC count does not match.
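
To see how many ports DPDK will claim, list the bindings again: every NIC shown under "Network devices using DPDK-compatible driver" counts as one port, and that number must match the devices configured in /etc/dpvs.conf.

$ cd dpdk-stable-17.11.2
$ ./usertools/dpdk-devbind.py --status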

Test Full-NAT Load Balancer

The test topology looks like,

fnat-single-nic

Set the VIP and Local IP (LIP, needed by Full-NAT mode) on DPVS. Let's put the commands into setup.sh. You can do some checks with ./ipvsadm -ln and ./dpip addr show.

$ cat setup.sh
VIP=192.168.100.100
LIP=192.168.100.200
RS=192.168.100.2

./dpip addr add ${VIP}/24 dev dpdk0
./ipvsadm -A -t ${VIP}:80 -s rr
./ipvsadm -a -t ${VIP}:80 -r ${RS} -b

./ipvsadm --add-laddr -z ${LIP} -t ${VIP}:80 -F dpdk0
$

$ ./setup.sh
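
Before testing from a client, it's worth running the two checks mentioned above (the exact output format may vary between versions):

$ ./dpip addr show   # the VIP 192.168.100.100/24 should now be on dpdk0
$ ./ipvsadm -ln      # a TCP service 192.168.100.100:80 (rr) with one real server should be listed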

Access the VIP from the client; it looks good!

client $ curl 192.168.100.100
Your ip:port : 192.168.100.3:56890

Configure Tutorial

More configuration examples can be found in the Tutorial Document, including:

  • WAN-to-LAN Full-NAT reverse proxy.
  • Direct Route (DR) mode setup.
  • Master/Backup model (keepalived) setup.
  • OSPF/ECMP cluster model setup.
  • SNAT mode for Internet access from internal network.
  • Virtual Devices (Bonding, VLAN, kni, ipip/GRE).
  • UOA module to get real UDP client IP/port in FNAT.
  • ... and more ...

Performance Test

Our tests show that the forwarding speed (pps) of DPVS is several times that of LVS, and as good as Google's Maglev.

performance

License

Please see the License file.

Contact Us

DPVS has been developed by the iQiYi QLB team since April 2016 and is now open-sourced. It's already widely used in iQiYi IDCs for L4 load balancing and SNAT clusters, and we plan to replace all our LVS clusters with DPVS. We are very happy that more people can get involved in this project. You are welcome to try it, report issues and submit pull requests. Please feel free to contact us through Github or email.

  • github: https://github.com/iqiyi/dpvs
  • email: qlb-devel # dev.qiyi.com (please remove the white spaces and replace # with @).