aquarium

Splash + HAProxy + Docker Compose

  • Owner: TeamHG-Memex/aquarium
  • License: MIT License

Aquarium

Aquarium is a cookiecutter_ template for a hassle-free
`Docker Compose`_ + Splash_ setup. Think of it as a Splash instance
with extra features and without the common pitfalls.

.. _cookiecutter: http://cookiecutter.rtfd.org
.. _Splash: https://github.com/scrapinghub/splash
.. _Docker Compose: https://docs.docker.com/compose/

Usage

First, make sure Docker and Docker Compose are installed.

Then install cookiecutter::

    pip install cookiecutter

or (on OS X + homebrew)::

    brew install cookiecutter

Then generate a folder with config files::

    cookiecutter gh:TeamHG-Memex/aquarium

With all default options it creates an ``aquarium`` folder in the current
directory. Go to this folder and start the Splash cluster::

    cd ./aquarium
    docker-compose up

Then use ``http://<host>:8050`` as a regular Splash_ instance. On Linux
``http://0.0.0.0:8050`` should work; on OS X and Windows the IP address
depends on your boot2docker or docker-machine setup.
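
As a minimal sketch, a client can talk to the cluster through Splash's ``render.html`` HTTP endpoint using only the Python standard library. The helper ``build_render_request`` is hypothetical, its defaults assume aquarium's default options (port 8050, ``user``/``userpass``, ``max_timeout`` 3600), and ``localhost`` stands in for your actual Docker host:

.. code-block:: python

    import base64
    from urllib.parse import urlencode
    from urllib.request import Request


    def build_render_request(host, target_url, user="user", password="userpass",
                             port=8050, timeout=30, max_timeout=3600):
        """Build (but do not send) a render.html request for an aquarium cluster.

        Hypothetical helper; the defaults mirror aquarium's default options.
        """
        # Fail fast on the client; Splash is expected to reject such a request.
        if timeout > max_timeout:
            raise ValueError(f"timeout {timeout} exceeds max_timeout {max_timeout}")
        query = urlencode({"url": target_url, "timeout": timeout})
        req = Request(f"http://{host}:{port}/render.html?{query}")
        # Send the auth_user / auth_password credentials as HTTP Basic Auth.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", f"Basic {token}")
        return req


    # Usage (needs a running cluster; "localhost" assumes Docker on Linux):
    #   from urllib.request import urlopen
    #   with urlopen(build_render_request("localhost", "http://example.com")) as resp:
    #       html = resp.read().decode("utf-8")

Building the request separately from sending it keeps the credentials and timeout handling in one place, so the same helper works with ``urlopen`` or any other HTTP client that accepts a URL and headers.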

Options

When generating a config, cookiecutter will ask a bunch of questions.

  • folder_name (default is "aquarium") - the name of the target folder.

  • num_splashes (default is "3") - the number of Splash instances to create.
    To utilize the full server capacity it makes sense to create slightly more
    Splash instances than CPU cores, e.g. on a 2-core machine 3 instances often
    work best.

  • splash_version (default is "3.0") - the version of the scrapinghub/splash
    Docker image.

  • auth_user (default is "user"), auth_password (default is "userpass") -
    HTTP Basic Auth credentials for Splash.

  • splash_verbosity (default is "1") - Splash log verbosity, from 0 to 5.

  • max_timeout (default is "3600") - the maximum allowed render timeout, in seconds.

  • maxrss_mb (default is "3000") - a soft memory limit, in MB. A Splash
    container is restarted after some time if it starts to use more memory
    than this value.

  • splash_slots (default is "5") - the number of Splash slots to use, i.e.
    how many render jobs run in parallel in a single Splash process.

  • stats_enabled (default is "1") - whether to enable HAProxy stats.
    If stats are enabled, visit ``http://<host>:8036`` to see the stats page.

  • stats_auth (default is "admin:adminpass") - HTTP Basic Auth credentials
    for HAProxy stats.

  • tor (default is "1") - enter 0 to disable Tor_ support. When Tor support
    is enabled, all .onion links are opened using Tor. In addition,
    there is a ``tor`` `Splash proxy profile`_ which you can use to render
    any page through Tor.

  • adblock (default is "1") - enter 0 to disable AdBlock Plus
    `request filters`_ (FIXME: this option does not work yet;
    filters are always available). By default, the following filters
    are available:

    • easylist: the default set of EasyList_ filters for English;
    • easyprivacy: EasyPrivacy filters that remove tracking scripts;
    • easylist_noadult: an EasyList variant without filters for adult domains;
    • fanboy-social: removes social media content such as Facebook Like
      buttons and other widgets;
    • fanboy-annoyance: blocks social media content, in-page pop-ups
      and other annoyances; use it to decrease loading times and declutter
      pages. fanboy-social is already included in this filter.
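
The sizing options above combine into a rough capacity estimate: each of the ``num_splashes`` instances runs up to ``splash_slots`` renders at once and may grow to about ``maxrss_mb`` before it is restarted. A minimal sketch in Python; the helper names and the "cores + 1" heuristic (chosen to match the 2 cores, 3 instances example) are assumptions, not part of aquarium:

.. code-block:: python

    import os


    def suggest_num_splashes(cpu_cores=None):
        """Suggest num_splashes as one more instance than CPU cores.

        The "+1" rule is an assumed heuristic matching the 2 cores -> 3
        instances example; benchmark your own workload before relying on it.
        """
        if cpu_cores is None:
            # os.cpu_count() may return None on some platforms.
            cpu_cores = os.cpu_count() or 1
        return cpu_cores + 1


    def cluster_capacity(num_splashes=3, splash_slots=5):
        """Upper bound on render jobs running in parallel across the cluster."""
        return num_splashes * splash_slots


    def memory_budget_mb(num_splashes=3, maxrss_mb=3000):
        """Rough memory budget; maxrss_mb is a soft limit, so brief overshoots happen."""
        return num_splashes * maxrss_mb


    if __name__ == "__main__":
        n = suggest_num_splashes()
        print(f"num_splashes={n}, parallel renders={cluster_capacity(n)}, "
              f"memory budget={memory_budget_mb(n)} MB")

With the default options this gives 3 × 5 = 15 parallel renders and a memory budget of roughly 3 × 3000 = 9000 MB, which is worth checking against the host's RAM before raising ``num_splashes``.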

.. _Tor: http://torproject.org
.. _Splash proxy profile: http://splash.readthedocs.org/en/latest/api.html#proxy-profiles
.. _request filters: http://splash.readthedocs.org/en/latest/api.html#request-filters
.. _EasyList: https://easylist.to/
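
Splash's HTTP API selects a proxy profile with the ``proxy`` argument and request filters with the comma-separated ``filters`` argument. A minimal sketch, assuming the default port and a hypothetical ``render_url`` helper, that renders a page through Tor with two of the filters listed above:

.. code-block:: python

    from urllib.parse import urlencode


    def render_url(host, target_url, proxy=None, filters=(), port=8050):
        """Build a render.html URL that selects a proxy profile and request filters.

        Hypothetical helper; Basic Auth credentials still have to be sent
        with the request.
        """
        params = {"url": target_url}
        if proxy:
            # Name of a Splash proxy profile, e.g. the "tor" profile.
            params["proxy"] = proxy
        if filters:
            # Splash expects request filter names as one comma-separated value.
            params["filters"] = ",".join(filters)
        return f"http://{host}:{port}/render.html?{urlencode(params)}"


    url = render_url("localhost", "http://example.com",
                     proxy="tor", filters=["easylist", "easyprivacy"])

When Tor support is enabled, .onion links are already opened through Tor, so the ``proxy`` argument only matters for regular sites.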

Contributing

License is MIT.


.. image:: https://hyperiongray.s3.amazonaws.com/define-hg.svg
   :target: https://www.hyperiongray.com/?pk_campaign=github&pk_kwd=aquarium
   :alt: define hyperiongray

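Key metrics

Overview

  • Name and owner: TeamHG-Memex/aquarium
  • Primary language: Python
  • Languages: Python (2 languages in total)
  • License: MIT License

Owner activity

  • Created: 2015-08-24 22:33:05
  • Last pushed: 2018-11-29 18:23:23
  • Last commit: 2018-05-29 17:29:27
  • Releases: 0

User engagement

  • Stars: 195
  • Watchers: 14
  • Forks: 39
  • Commits: 35
  • Issues: 30
  • Open issues: 24
  • Pull requests: 0
  • Open pull requests: 0
  • Closed pull requests: 1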