OpenAI Gym

A toolkit for developing and comparing reinforcement learning algorithms.


OpenAI Gym

Status: Maintenance (expect bug fixes and minor updates)

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to a standardized set of environments.

Gym makes no assumptions about the structure of your agent and is compatible with any numerical computation library, such as TensorFlow or Theano. You can use it from Python code, and soon from other languages.

If you're not sure where to start, we recommend beginning with the docs on our site. See also the FAQ.

A whitepaper for OpenAI Gym is available at http://arxiv.org/abs/1606.01540, and here's a BibTeX entry that you can use to cite it in a publication:

@misc{1606.01540,
  Author = {Greg Brockman and Vicki Cheung and Ludwig Pettersson and Jonas Schneider and John Schulman and Jie Tang and Wojciech Zaremba},
  Title = {OpenAI Gym},
  Year = {2016},
  Eprint = {arXiv:1606.01540},
}

Documentation

Basics

There are two basic concepts in reinforcement learning: the environment (namely, the outside world) and the agent (namely, the algorithm you are writing). The agent sends actions to the environment, and the environment replies with observations and rewards (that is, a score).

The core gym interface is Env, which is the unified environment interface. There is no interface for agents; that part is left to you. The following are the Env methods you should know:

  • reset(self): Reset the environment's state. Returns observation.
  • step(self, action): Step the environment by one timestep. Returns observation, reward, done, info.
  • render(self, mode='human'): Render one frame of the environment. The default mode will do something human friendly, such as pop up a window.
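
As a minimal illustration of these three methods (this sketch is not from the original docs; it assumes the classic CartPole-v1 environment and the pre-0.26 step API shown above), the standard agent-environment loop with a random agent looks roughly like this:

import gym

env = gym.make("CartPole-v1")
observation = env.reset()                   # reset returns the initial observation
for _ in range(1000):
    env.render()                            # pops up a window (requires pyglet)
    action = env.action_space.sample()      # random agent: sample any valid action
    observation, reward, done, info = env.step(action)
    if done:                                # episode finished, start a new one
        observation = env.reset()
env.close()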

Supported systems

We currently support Linux and OS X running Python 3.5 -- 3.8. Windows support is experimental -- algorithmic, toy_text, classic_control and atari should work on Windows (see the next section for installation instructions); nevertheless, proceed at your own risk.

Installation

You can perform a minimal install of gym with:

git clone https://github.com/openai/gym.git
cd gym
pip install -e .

If you prefer, you can do a minimal install of the packaged version directly from PyPI:

pip install gym

You'll be able to run a few environments right away:

  • algorithmic
  • toy_text
  • classic_control (you'll need pyglet to render, though)

We recommend playing with those environments at first, and then later installing the dependencies for the remaining environments.

You can also run gym on gitpod.io to play with the examples online. In the preview window you can click on the mp4 file you want to view; to view another mp4 file, just press the back button and click on it.

Installing everything

To install the full set of environments, you'll need to have some system packages installed. We'll build out the list here over time; please let us know what you end up installing on your platform. Also, take a look at the docker files (py.Dockerfile) to see the composition of our CI-tested images.

On Ubuntu 16.04 and 18.04:

apt-get install -y libglu1-mesa-dev libgl1-mesa-dev libosmesa6-dev xvfb ffmpeg curl patchelf libglfw3 libglfw3-dev cmake zlib1g zlib1g-dev swig

MuJoCo has a proprietary dependency we can't set up for you. Follow the instructions in the mujoco-py package for help. Note that we currently do not support MuJoCo 2.0 and above, so you will need a version of mujoco-py built for a lower version of MuJoCo (e.g. mujoco-py 1.50.1.0 for MuJoCo 1.5). As an alternative to mujoco-py, consider PyBullet, which uses the open-source Bullet physics engine and has no license requirement.

Once you're ready to install everything, run pip install -e '.[all]' (or pip install 'gym[all]').

Pip version

To run pip install -e '.[all]', you'll need a semi-recent pip. Please make sure your pip is at least at version 1.5.0. You can upgrade using the following: pip install --ignore-installed pip. Alternatively, you can open setup.py and install the dependencies by hand.

Rendering on a server

If you're trying to render video on a server, you'll need to connect a fake display. The easiest way to do this is by running under xvfb-run (on Ubuntu, install the xvfb package):

xvfb-run -s "-screen 0 1400x900x24" bash
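
Inside such an xvfb-run session you can then record episodes to video with the Monitor wrapper mentioned in the changelog below (a minimal sketch; the output directory name is arbitrary and ffmpeg must be installed):

import gym
from gym import wrappers

env = gym.make("CartPole-v1")
# Writes .mp4 recordings and episode statistics into ./recordings; force=True overwrites old runs
env = wrappers.Monitor(env, "./recordings", force=True)

observation = env.reset()
done = False
while not done:
    observation, reward, done, info = env.step(env.action_space.sample())
env.close()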

Installing dependencies for specific environments

If you'd like to install the dependencies for only specific environments, see setup.py. We maintain the lists of dependencies on a per-environment group basis.

Environments

See the List of Environments and the gym site.

For information on creating your own environments, see Creating your own Environments.
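
The core idea is to subclass gym.Env and implement the methods listed in the Basics section; the sketch below is illustrative only (the spaces, dynamics and reward are placeholders), so see the linked guide for the full conventions:

import gym
import numpy as np
from gym import spaces

class MyEnv(gym.Env):
    """Toy example: the agent nudges a scalar state toward a target value."""
    metadata = {"render.modes": ["human"]}

    def __init__(self):
        self.action_space = spaces.Discrete(2)  # 0 = decrement, 1 = increment
        self.observation_space = spaces.Box(low=-10.0, high=10.0, shape=(1,), dtype=np.float32)
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return np.array([self.state], dtype=np.float32)

    def step(self, action):
        self.state += 1.0 if action == 1 else -1.0
        reward = -abs(self.state - 5.0)          # closer to 5 is better
        done = abs(self.state) >= 10.0           # episode ends at the boundaries
        return np.array([self.state], dtype=np.float32), reward, done, {}

    def render(self, mode="human"):
        print("state:", self.state)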

Examples

See the examples directory.

  • Run examples/agents/random_agent.py to run a simple random agent.
  • Run examples/agents/cem.py to run an actual learning agent (using the cross-entropy method).
  • Run examples/scripts/list_envs to generate a list of all environments.

Testing

We are using pytest for tests. You can run them via:

pytest

Resources


(The first version translated by vz on 2020.07.19)

Overview

Name With Owner: openai/gym
Primary Language: Python
Program language: Shell (Language Count: 3)
Platform: Linux, Mac
License: Other
Release Count: 56
Last Release Name: 0.26.2
First Release Name: v0.7.3
Created At: 2016-04-27 14:59:16
Pushed At: 2024-05-02 16:09:06
Last Commit At: 2023-01-30 13:15:21
Stargazers Count: 34k
Watchers Count: 1.1k
Fork Count: 8.6k
Commits Count: 1.8k
Has Issues Enabled
Issues Count: 1810
Issue Open Count: 89
Pull Requests Count: 999
Pull Requests Open Count: 6
Pull Requests Close Count: 447
Has Wiki Enabled
Is Archived
Is Fork
Is Locked
Is Mirror
Is Private

Status: Maintenance (expect bug fixes and minor updates)

OpenAI Gym


OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to a standardized set of environments.

.. image:: https://travis-ci.org/openai/gym.svg?branch=master
:target: https://travis-ci.org/openai/gym

See What's New section below <#what-s-new>_

gym makes no assumptions about the structure of your agent, and is compatible with any numerical computation library, such as TensorFlow or Theano. You can use it from Python code, and soon from other languages.

If you're not sure where to start, we recommend beginning with the
docs <https://gym.openai.com/docs>_ on our site. See also the FAQ <https://github.com/openai/gym/wiki/FAQ>_.

A whitepaper for OpenAI Gym is available at http://arxiv.org/abs/1606.01540, and here's a BibTeX entry that you can use to cite it in a publication::

@misc{1606.01540,
Author = {Greg Brockman and Vicki Cheung and Ludwig Pettersson and Jonas Schneider and John Schulman and Jie Tang and Wojciech Zaremba},
Title = {OpenAI Gym},
Year = {2016},
Eprint = {arXiv:1606.01540},
}

.. contents:: Contents of this document
:depth: 2

Basics

There are two basic concepts in reinforcement learning: the
environment (namely, the outside world) and the agent (namely, the
algorithm you are writing). The agent sends actions to the
environment, and the environment replies with observations and
rewards (that is, a score).

The core gym interface is Env <https://github.com/openai/gym/blob/master/gym/core.py>_, which is
the unified environment interface. There is no interface for agents;
that part is left to you. The following are the Env methods you
should know:

  • reset(self): Reset the environment's state. Returns observation.
  • step(self, action): Step the environment by one timestep. Returns observation, reward, done, info.
  • render(self, mode='human'): Render one frame of the environment. The default mode will do something human friendly, such as pop up a window.
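
As a quick illustration (not part of the original README), you can inspect an environment's spaces and take a single step like so, assuming the classic CartPole environment:

.. code:: python

import gym

env = gym.make("CartPole-v1")
print(env.action_space)        # Discrete(2): push the cart left or right
print(env.observation_space)   # Box(4,): cart position/velocity, pole angle/velocity

observation = env.reset()                   # initial observation
action = env.action_space.sample()          # pick a random valid action
observation, reward, done, info = env.step(action)
env.close()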

Supported systems

We currently support Linux and OS X running Python 2.7 or 3.5 -- 3.7.
Windows support is experimental - algorithmic, toy_text, classic_control and atari should work on Windows (see next section for installation instructions); nevertheless, proceed at your own risk.

Installation

You can perform a minimal install of gym with:

.. code:: shell

git clone https://github.com/openai/gym.git
cd gym
pip install -e .

If you prefer, you can do a minimal install of the packaged version directly from PyPI:

.. code:: shell

pip install gym

You'll be able to run a few environments right away:

  • algorithmic
  • toy_text
  • classic_control (you'll need pyglet to render though)

We recommend playing with those environments at first, and then later
installing the dependencies for the remaining environments.

You can also run gym on gitpod.io <https://gitpod.io/#https://github.com/openai/gym/blob/master/examples/agents/cem.py>_ to play with the examples online.
In the preview window you can click on the mp4 file you want to view. If you want to view another mp4 file just press the back button and click on another mp4 file.

Installing everything

To install the full set of environments, you'll need to have some system
packages installed. We'll build out the list here over time; please let us know
what you end up installing on your platform. Also, take a look at the docker files (py.Dockerfile) to
see the composition of our CI-tested images.

On Ubuntu 16.04 and 18.04:

.. code:: shell

apt-get install -y libglu1-mesa-dev libgl1-mesa-dev libosmesa6-dev xvfb ffmpeg curl patchelf libglfw3 libglfw3-dev

MuJoCo has a proprietary dependency we can't set up for you. Follow
the
instructions <https://github.com/openai/mujoco-py#obtaining-the-binaries-and-license-key>_
in the mujoco-py package for help. As an alternative to mujoco-py, consider PyBullet <https://github.com/openai/gym/blob/master/docs/environments.md#pybullet-robotics-environments>_ which uses the open source Bullet physics engine and has no license requirement.

Once you're ready to install everything, run pip install -e '.[all]' (or pip install 'gym[all]').

Pip version

To run pip install -e '.[all]', you'll need a semi-recent pip.
Please make sure your pip is at least at version 1.5.0. You can
upgrade using the following: pip install --ignore-installed pip. Alternatively, you can open setup.py <https://github.com/openai/gym/blob/master/setup.py>_ and
install the dependencies by hand.

Rendering on a server

If you're trying to render video on a server, you'll need to connect a
fake display. The easiest way to do this is by running under
xvfb-run (on Ubuntu, install the xvfb package):

.. code:: shell

 xvfb-run -s "-screen 0 1400x900x24" bash

Installing dependencies for specific environments

If you'd like to install the dependencies for only specific
environments, see setup.py <https://github.com/openai/gym/blob/master/setup.py>_. We
maintain the lists of dependencies on a per-environment group basis.

Environments

See List of Environments <docs/environments.md>_ and the gym site <http://gym.openai.com/envs/>_.

For information on creating your own environments, see Creating your own Environments <docs/creating-environments.md>_.
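
As a rough sketch of the registration step described in that guide (the module path, class name and environment ID below are hypothetical placeholders), a custom environment is typically registered once and then created through gym.make:

.. code:: python

import gym
from gym.envs.registration import register

# "my_package.envs:MyEnv" is a placeholder for wherever your Env subclass lives
register(
    id="MyEnv-v0",
    entry_point="my_package.envs:MyEnv",
    max_episode_steps=200,   # optional: wraps the env in a TimeLimit
)

env = gym.make("MyEnv-v0")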

Examples

See the examples directory.

  • Run examples/agents/random_agent.py <https://github.com/openai/gym/blob/master/examples/agents/random_agent.py>_ to run a simple random agent.
  • Run examples/agents/cem.py <https://github.com/openai/gym/blob/master/examples/agents/cem.py>_ to run an actual learning agent (using the cross-entropy method).
  • Run examples/scripts/list_envs <https://github.com/openai/gym/blob/master/examples/scripts/list_envs>_ to generate a list of all environments.

Testing

We are using pytest <http://doc.pytest.org>_ for tests. You can run them via:

.. code:: shell

pytest

.. _See What's New section below:

What's new

  • 2020-02-09 (v 0.16.0)

    • EnvSpec API change - remove tags field (retro-active version bump, the changes are actually already in the codebase since 0.15.5 - thanks @wookayin for keeping us in check!)
  • 2020-02-03 (v0.15.6)

    • pyglet 1.4 compatibility (this time for real :))
    • Fixed the bug in BipedalWalker and BipedalWalkerHardcore, bumped version to 3 (thanks @chozabu!)
  • 2020-01-24 (v0.15.5)

    • pyglet 1.4 compatibility
    • remove python-opencv from the requirements
  • 2019-11-08 (v0.15.4)

    • Added multiple env wrappers (thanks @zuoxingdong and @hartikainen!)
    • Removed mujoco >= 2.0 support due to lack of tests
  • 2019-10-09 (v0.15.3)

    • VectorEnv modifications - unified the VectorEnv api (added reset_async, reset_wait, step_async, step_wait methods to SyncVectorEnv); more flexibility in AsyncVectorEnv workers
  • 2019-08-23 (v0.15.2)

    • More Wrappers - AtariPreprocessing, FrameStack, GrayScaleObservation, FilterObservation, FlattenDictObservationsWrapper, PixelObservationWrapper, TransformReward (thanks @zuoxingdong, @hartikainen)
    • Remove rgb_rendering_tracking logic from mujoco environments (default behavior stays the same for the -v3 environments, rgb rendering returns a view from tracking camera)
    • Velocity goal constraint for MountainCar (thanks @abhinavsagar)
    • Taxi-v2 -> Taxi-v3 (add missing wall in the map to replicate env as described in the original paper, thanks @kobotics)
  • 2019-07-26 (v0.14.0)

    • Wrapper cleanup
    • Spec-related bug fixes
    • VectorEnv fixes
  • 2019-06-21 (v0.13.1)

    • Bug fix for ALE 0.6 difficulty modes
    • Use narrow range for pyglet versions
  • 2019-06-21 (v0.13.0)

    • Upgrade to ALE 0.6 (atari-py 0.2.0) (thanks @JesseFarebro!)
  • 2019-06-21 (v0.12.6)

    • Added vectorized environments (thanks @tristandeleu!). Vectorized environment runs multiple copies of an environment in parallel. To create a vectorized version of an environment, use gym.vector.make(env_id, num_envs, **kwargs), for instance, gym.vector.make('Pong-v4',16).
  • 2019-05-28 (v0.12.5)

    • fixed Fetch-slide environment to be solvable.
  • 2019-05-24 (v0.12.4)

    • remove pyopengl dependency and use more narrow atari-py and box2d-py versions
  • 2019-03-25 (v0.12.1)

    • rgb rendering in MuJoCo locomotion -v3 environments now comes from tracking camera (so that agent does not run away from the field of view). The old behaviour can be restored by passing rgb_rendering_tracking=False kwarg. Also, a potentially breaking change!!! Wrapper class now forwards methods and attributes to wrapped env.
  • 2019-02-26 (v0.12.0)

    • release mujoco environments v3 with support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc
  • 2019-02-06 (v0.11.0)

    • remove gym.spaces.np_random common PRNG; use per-instance PRNG instead.
    • support for kwargs in gym.make
    • lots of bugfixes
  • 2018-02-28: Release of a set of new robotics environments.

  • 2018-01-25: Made some aesthetic improvements and removed unmaintained parts of gym. This may seem like a downgrade in functionality, but it is actually a long-needed cleanup in preparation for some great new things that will be released in the next month.

    • Now your Env and Wrapper subclasses should define step, reset, render, close, seed rather than underscored method names.
    • Removed the board_game, debugging, safety, parameter_tuning environments since they're not being maintained by us at OpenAI. We encourage authors and users to create new repositories for these environments.
    • Changed MultiDiscrete action space to range from [0, ..., n-1] rather than [a, ..., b-1].
    • No more render(close=True), use env-specific methods to close the rendering.
    • Removed scoreboard directory, since site doesn't exist anymore.
    • Moved gym/monitoring to gym/wrappers/monitoring
    • Add dtype to Space.
    • Not using Python's built-in logging module anymore, using gym.logger
  • 2018-01-24: All continuous control environments now use mujoco_py >= 1.50.
    Versions have been updated accordingly to -v2, e.g. HalfCheetah-v2. Performance
    should be similar (see https://github.com/openai/gym/pull/834) but there are likely
    some differences due to changes in MuJoCo.

  • 2017-06-16: Make env.spec into a property to fix a bug that occurs
    when you try to print out an unregistered Env.

  • 2017-05-13: BACKWARDS INCOMPATIBILITY: The Atari environments are now at
    v4. To keep using the old v3 environments, keep gym <= 0.8.2 and atari-py
    <= 0.0.21. Note that the v4 environments will not give identical results to
    existing v3 results, although differences are minor. The v4 environments
    incorporate the latest Arcade Learning Environment (ALE), including several
    ROM fixes, and now handle loading and saving of the emulator state. While
    seeds still ensure determinism, the effect of any given seed is not preserved
    across this upgrade because the random number generator in ALE has changed.
    The *NoFrameSkip-v4 environments should be considered the canonical Atari
    environments from now on.

  • 2017-03-05: BACKWARDS INCOMPATIBILITY: The configure method has been removed
    from Env. configure was not used by gym, but was used by some dependent
    libraries including universe. These libraries will migrate away from the
    configure method by using wrappers instead. This change is on master and will be released with 0.8.0.

  • 2016-12-27: BACKWARDS INCOMPATIBILITY: The gym monitor is now a
    wrapper. Rather than starting monitoring as
    env.monitor.start(directory), envs are now wrapped as follows:
    env = wrappers.Monitor(env, directory). This change is on master
    and will be released with 0.7.0.

  • 2016-11-1: Several experimental changes to how a running monitor interacts
    with environments. The monitor will now raise an error if reset() is called
    when the env has not returned done=True. The monitor will only record complete
    episodes where done=True. Finally, the monitor no longer calls seed() on the
    underlying env, nor does it record or upload seed information.

  • 2016-10-31: We're experimentally expanding the environment ID format
    to include an optional username.

  • 2016-09-21: Switch the Gym automated logger setup to configure the
    root logger rather than just the 'gym' logger.

  • 2016-08-17: Calling close on an env will also close the monitor
    and any rendering windows.

  • 2016-08-17: The monitor will no longer write manifest files in
    real-time, unless write_upon_reset=True is passed.

  • 2016-05-28: For controlled reproducibility, envs now support seeding
    (cf #91 and #135). The monitor records which seeds are used. We will
    soon add seed information to the display on the scoreboard.
