Keras Attention Layer

Keras Attention Layer (Luong and Bahdanau scores).

Attention Layer for Keras. Supports the score functions of Luong and Bahdanau.

Tested with TensorFlow 2.8, 2.9, 2.10, 2.11, 2.12, 2.13, and 2.14 (Sep 26, 2023).

Installation

PyPI

pip install attention

Attention Layer

Attention(
    units=128,
    score='luong',
    **kwargs
)

Arguments

  • units: Integer. The number of (output) units in the attention vector ($a_t$).

  • score: String. The score function $score(h_t, \bar{h}_s)$. Possible values are luong or bahdanau (see the score definitions after this list).

    • Luong's multiplicative style (Effective Approaches to Attention-based Neural Machine Translation, Luong et al., 2015).
    • Bahdanau's additive style (Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., 2014).
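
For reference, here are the two score functions as defined in the respective papers (a sketch of the math; the layer's internal parameterization may differ in minor details such as weight naming):

  • Luong (multiplicative, general form): $score(h_t, \bar{h}_s) = h_t^\top W_a \bar{h}_s$
  • Bahdanau (additive): $score(h_t, \bar{h}_s) = v_a^\top \tanh(W_1 h_t + W_2 \bar{h}_s)$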

Input shape

3D tensor with shape (batch_size, timesteps, input_dim).

Output shape

  • 2D tensor with shape (batch_size, num_units) ($a_t$).
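
As a quick sanity check, the shapes can be verified by calling the layer directly on a random tensor (a minimal sketch assuming TensorFlow 2.x eager execution; the sizes below are arbitrary):

import numpy as np
from attention import Attention

# (batch_size, timesteps, input_dim) -> (batch_size, num_units)
x = np.random.uniform(size=(8, 10, 16)).astype('float32')
a = Attention(units=32)(x)
print(a.shape)  # expected: (8, 32)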

If you want to visualize the attention weights, refer to examples/add_two_numbers.py.

Example

import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import load_model, Model

from attention import Attention


def main():
    # Dummy data. There is nothing to learn in this example.
    num_samples, time_steps, input_dim, output_dim = 100, 10, 1, 1
    data_x = np.random.uniform(size=(num_samples, time_steps, input_dim))
    data_y = np.random.uniform(size=(num_samples, output_dim))

    # Define/compile the model.
    model_input = Input(shape=(time_steps, input_dim))
    x = LSTM(64, return_sequences=True)(model_input)
    x = Attention(units=32)(x)
    x = Dense(1)(x)
    model = Model(model_input, x)
    model.compile(loss='mae', optimizer='adam')
    model.summary()

    # train.
    model.fit(data_x, data_y, epochs=10)

    # test save/reload model.
    pred1 = model.predict(data_x)
    model.save('test_model.h5')
    model_h5 = load_model('test_model.h5', custom_objects={'Attention': Attention})
    pred2 = model_h5.predict(data_x)
    np.testing.assert_almost_equal(pred1, pred2)
    print('Success.')


if __name__ == '__main__':
    main()

Other Examples

Browse examples.

Install the requirements before running the examples: pip install -r examples/examples-requirements.txt.

IMDB Dataset

In this experiment, we demonstrate that using attention yields a higher accuracy on the IMDB dataset. We consider two
LSTM networks: one with this attention layer and the other one with a fully connected layer. Both have the same number
of parameters for a fair comparison (250K).
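
Below is a minimal sketch of the two architectures being compared. The layer sizes are illustrative, not the exact ones used in the benchmark; the actual script lives in the examples folder of the repository.

from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, Dropout, Embedding, LSTM
from tensorflow.keras.models import Model

from attention import Attention

max_features, maxlen = 20000, 200  # illustrative vocabulary size and sequence length

def build_model(use_attention: bool) -> Model:
    i = Input(shape=(maxlen,))
    x = Embedding(max_features, 32)(i)
    if use_attention:
        x = LSTM(32, return_sequences=True)(x)
        x = Attention(units=32)(x)           # attention pooling over all LSTM states
    else:
        x = LSTM(32)(x)                      # last LSTM state only
        x = Dense(32, activation='relu')(x)  # fully connected baseline
    x = Dropout(0.5)(x)
    o = Dense(1, activation='sigmoid')(x)
    model = Model(i, o)
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model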

Here are the results on 10 runs. For every run, we record the max accuracy on the test set for 10 epochs.

Measure          | No Attention (250K params) | Attention (250K params)
-----------------|----------------------------|------------------------
MAX Accuracy     | 88.22                      | 88.76
AVG Accuracy     | 87.02                      | 87.62
STDDEV Accuracy  | 0.18                       | 0.14

As expected, the model with attention achieves a higher accuracy. It also reduces the variability between runs, which is a desirable property.

Adding two numbers

Let's consider the task of adding two numbers that come right after some delimiters (0 in this case):

x = [1, 2, 3, 0, 4, 5, 6, 0, 7, 8]. Result is y = 4 + 7 = 11.

The attention is expected to be the highest after the delimiters. An overview of the training is shown below, where the
top represents the attention map and the bottom the ground truth. As the training progresses, the model learns the
task and the attention map converges to the ground truth.
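
A rough sketch of how such samples could be generated is shown below (the actual data generation lives in examples/add_two_numbers.py and may differ; the function here is purely illustrative):

import numpy as np

def generate_sample(seq_len=10, delimiter=0):
    # Values are strictly positive so the delimiter (0) is unambiguous.
    x = np.random.randint(1, 10, size=seq_len).astype(float)
    # Pick two non-adjacent delimiter positions, each followed by a value.
    while True:
        i, j = sorted(np.random.choice(seq_len - 1, size=2, replace=False))
        if j - i > 1:
            break
    x[i] = x[j] = delimiter
    # The target is the sum of the two values right after the delimiters.
    y = x[i + 1] + x[j + 1]
    return x, y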

Finding max of a sequence

We consider many 1D sequences of the same length. The task is to find the maximum of each sequence.

We give the full sequence processed by the RNN layer to the attention layer. We expect the attention layer to focus on the maximum of each sequence.

After a few epochs, the attention layer converges perfectly to what we expected.
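
A minimal sketch of this setup, with arbitrary sizes (the actual script lives in the examples folder and may differ):

import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Model

from attention import Attention

num_samples, seq_len = 10000, 20
x = np.random.uniform(size=(num_samples, seq_len, 1))
y = x.max(axis=1)  # target: the maximum of each sequence

i = Input(shape=(seq_len, 1))
h = LSTM(64, return_sequences=True)(i)  # the full sequence is passed to the attention layer
h = Attention(units=32)(h)
o = Dense(1)(h)
model = Model(i, o)
model.compile(loss='mae', optimizer='adam')
model.fit(x, y, epochs=5, validation_split=0.2)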

References

  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015): https://arxiv.org/abs/1508.04025
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2014): https://arxiv.org/abs/1409.0473