
torch-twrl: Reinforcement Learning in Torch

torch-twrl is an RL framework built in Lua/Torch by Twitter.

Installation

Install torch

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh

Install torch-twrl

git clone --recursive https://github.com/twitter/torch-twrl.git
cd torch-twrl
luarocks make

Want to play in the gym?

  1. Start a virtual environment (not necessary, but it helps keep your installation clean)

  2. Download and install OpenAI Gym, gym-http-api requirements, and ffmpeg

pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install gym
pip install -r src/gym-http-api/requirements.txt
brew install ffmpeg

Works so far?

You should have everything you need:

  • Start your gym_http_server with
python src/gym-http-api/gym_http_server.py
  • In a new console window (or tab), run the example script (policy gradient agent in environment CartPole-v0)
cd examples
chmod u+x cartpole-pg.sh
./cartpole-pg.sh

This script sets the parameters for the experiment; in detail, here is what it calls:

th run.lua \
    -env 'CartPole-v0' \
    -policy categorical \
    -learningUpdate reinforce \
    -model mlp \
    -optimAlpha 0.9 \
    -timestepsPerBatch 1000 \
    -stepsizeStart 0.3 \
    -gamma 1 \
    -nHiddenLayerSize 10 \
    -gradClip 5 \
    -baselineType padTimeDepAvReturn \
    -beta 0.01 \
    -weightDecay 0 \
    -windowSize 10 \
    -nSteps 1000 \
    -nIterations 1000 \
    -video 100 \
    -optimType rmsprop \
    -verboseUpdate true \
    -uploadResults false \
    -renderAllSteps false

Your results should look something like our results on the OpenAI Gym leaderboard.

Doesn't work?

  1. Test the gym-http-api
cd src/gym-http-api/
nose2
  2. Start a Gym HTTP server in your virtual environment
python src/gym-http-api/gym_http_server.py
  3. In a new console window (or tab), run the torch-twrl tests
luarocks make; th test/test.lua

Dependencies

Testing RL development is a tricky endeavor: it requires well-established, unified baselines and a large community of active developers. The OpenAI Gym provides a great set of example environments for this purpose. Link: https://github.com/openai/gym

The OpenAI Gym is written in Python, and it expects the algorithms that interact with its environments to be written in Python as well. torch-twrl is compatible with the OpenAI Gym through OpenAI's Gym HTTP API; gym-http-api is included as a submodule of torch-twrl.
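
Concretely, a torch-twrl agent talks to gym_http_server.py over plain HTTP. The sketch below shows the general shape of that interaction from Lua; it is not torch-twrl's own client code. It assumes the luasocket and lua-cjson rocks are installed and the server is running on its default port 5000, and the route names are taken from the gym-http-api documentation (they may differ between versions).

-- Hypothetical sketch of driving CartPole-v0 through the Gym HTTP API from Lua.
-- Assumes luasocket and lua-cjson; not part of torch-twrl itself.
local http = require('socket.http')
local ltn12 = require('ltn12')
local json = require('cjson')

local base = 'http://127.0.0.1:5000'

-- POST a JSON body to a route and decode the JSON response.
local function post(path, body)
  local payload = json.encode(body or {})
  local chunks = {}
  http.request{
    url = base .. path,
    method = 'POST',
    headers = {
      ['Content-Type'] = 'application/json',
      ['Content-Length'] = tostring(#payload),
    },
    source = ltn12.source.string(payload),
    sink = ltn12.sink.table(chunks),
  }
  return json.decode(table.concat(chunks))
end

-- Create an environment instance, reset it, and take one step with action 0.
local instanceId = post('/v1/envs/', {env_id = 'CartPole-v0'}).instance_id
local firstObs = post('/v1/envs/' .. instanceId .. '/reset/').observation
local result = post('/v1/envs/' .. instanceId .. '/step/', {action = 0})
print('reward:', result.reward, 'done:', result.done)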

All Lua dependencies should be installed on your first build.

Note: if you make changes, you will need to recompile with

luarocks make

Agents

torch-twrl implements several agents; they are located in src/agents.
Agents are defined by a model, policy, and learning update.

  • Random
    • model: noModel
    • policy: random
    • learningUpdate: noLearning
  • TD(Lambda)
    • model: qFunction
    • policy: egreedy
    • learningUpdate: tdLambda - implements temporal difference (Q-learning or SARSA) learning with eligibility traces (replacing or accumulating); see the trace-update sketch after this list
  • Policy Gradient (Williams, 1992):
    • model: mlp - multilayer perceptron; final layer: tanh for continuous, softmax for discrete (see the network sketch after this list)
    • policy: stochasticModelPolicy, normal for continuous actions, categorical for discrete
    • learningUpdate: reinforce
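
As a concrete illustration of a tdLambda-style learning update, below is a generic tabular TD(lambda) update with an eligibility trace, written in plain Lua/Torch. It is a toy sketch under assumed state/action counts and hyperparameters, not torch-twrl's qFunction-based implementation; the accumulating trace is shown, with the replacing-trace alternative noted in a comment.

-- Generic tabular TD(lambda) update with an eligibility trace (SARSA-style).
-- Illustrative only; torch-twrl's tdLambda update operates on a qFunction
-- model rather than this toy table.
require 'torch'

local nStates, nActions = 16, 4              -- assumed sizes for the sketch
local alpha, gamma, lambda = 0.1, 0.99, 0.9  -- assumed hyperparameters

local Q = torch.zeros(nStates, nActions)     -- action-value estimates
local e = torch.zeros(nStates, nActions)     -- eligibility traces

-- One update for the transition (s, a, r, sNext, aNext).
local function tdLambdaUpdate(s, a, r, sNext, aNext)
  local delta = r + gamma * Q[sNext][aNext] - Q[s][a]
  e[s][a] = e[s][a] + 1     -- accumulating trace; replacing would set e[s][a] = 1
  Q:add(alpha * delta, e)   -- Q <- Q + alpha * delta * e
  e:mul(gamma * lambda)     -- decay all traces
end

tdLambdaUpdate(1, 2, 0.0, 5, 1)
print(Q[1][2])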
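
For the policy gradient agent, the following network sketch shows the kind of model an mlp with a softmax output builds for a discrete-action environment such as CartPole-v0, using the standard Torch nn package. The hidden-layer size mirrors the -nHiddenLayerSize 10 setting from the example script; the hidden nonlinearity and the state/action sizes are assumptions for illustration, not torch-twrl's actual source.

-- Minimal sketch of a discrete-action policy network: observation -> hidden
-- layer -> softmax over actions, sampled as a categorical policy.
require 'nn'

local nbStates = 4           -- CartPole-v0 observation dimension
local nbActions = 2          -- CartPole-v0 action count
local nHiddenLayerSize = 10  -- matches -nHiddenLayerSize in cartpole-pg.sh

local model = nn.Sequential()
model:add(nn.Linear(nbStates, nHiddenLayerSize))
model:add(nn.Tanh())                               -- assumed hidden nonlinearity
model:add(nn.Linear(nHiddenLayerSize, nbActions))
model:add(nn.SoftMax())                            -- softmax output for discrete actions

-- Sample an action from the categorical policy for one observation.
local obs = torch.Tensor(nbStates):uniform(-0.05, 0.05)
local probs = model:forward(obs)
local action = torch.multinomial(probs, 1)[1]
print(probs)
print('sampled action:', action)

For a continuous-action environment, the final SoftMax would be replaced by a Tanh and the policy would sample from a normal distribution instead, as noted in the agent list above.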

Important note about agent/environment compatibility:

The OpenAI Gym has many environments and is continuously growing. Some agents may be compatible with only a subset of environments. That is, an agent built for continuous action space environments may not work if the environment expects discrete action spaces.

Here is a useful table of the environments, with details on the different variables that may help to configure agents appropriately.

Testing details:

Continuous integration builds run on Travis. Testing covers LUAJIT21, LUA51, and LUA52 with both the gcc and clang compilers.

Tests are defined in the test/ directory, with a basic unit test set and a separate Gym integration test set.

Known Issues:

  • libhash does not work with LUA52, so the tile coding examples fail under LUA52.

Future Work

References

  1. Boyan, J., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. Advances in neural information processing systems, 369-376.
  2. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine learning, 3(1), 9-44.
  3. Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine learning, 22(1-3), 123-158.
  4. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. Systems, Man and Cybernetics, IEEE Transactions on, (5), 834-846.
  5. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.
  6. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4), 229-256.

License

torch-twrl is released under the MIT License. Copyright (c) 2016 Twitter, Inc.

Main metrics

Overview

  • Name With Owner: twitter-archive/torch-twrl
  • Primary Language: Lua
  • Program Languages: Lua (Language Count: 2)
  • License: MIT License

Owner Activity

  • Created At: 2016-09-01 07:10:56
  • Pushed At: 2017-05-27 16:04:02
  • Last Commit At: 2017-01-30 11:42:59
  • Release Count: 1
  • First Release Name: 0.1
  • Last Release Name: 0.1

User Engagement

  • Stargazers Count: 249
  • Watchers Count: 33
  • Fork Count: 45
  • Commits Count: 38
  • Issues Count: 20
  • Issue Open Count: 1
  • Pull Requests Count: 13
  • Pull Requests Open Count: 0
  • Pull Requests Close Count: 4