Aquarium
Aquarium is a cookiecuter_ template for hassle-free
Docker Compose
_ + Splash_ setup. Think of it as a Splash instance
with extra features and without common pitfalls.
.. _cookiecuter: http://cookiecutter.rtfd.org
.. _Splash: https://github.com/scrapinghub/splash
.. _Docker Compose: https://docs.docker.com/compose/
Usage
First, make sure Docker and Docker Compose are installed.
Then install cookiecutter::
pip install cookiecutter
or (on OS X + homebrew)::
brew install cookiecutter
Then generate a folder with config files::
cookiecutter gh:TeamHG-Memex/aquarium
With all default options it'll create an aquarium
folder in the current
path. Go to this folder and start the Splash cluster::
cd ./aquarium
docker-compose up
Then use http://:8050 as a regular Splash_ instance. On Linux
http://0.0.0.0:8050 should work; on OS X and Windows IP address depends on
boot2docker or docker-machine.
Options
When generating a config, cookiecutter will ask a bunch of questions.
-
folder_name (default is "aquarium")
- a name of the target folder. -
num_splashes (default is "3")
- a number of Splash instances to create.
To utilize full server capacity it makes sense to create slightly more Splash
instances than CPU cores - e.g. on 2-core machine 3 instances often
work best. -
splash_version (default is "3.0")
- a version of scrapighub/splash
Docker image. -
auth_user (default is "user")
,auth_password (default is "userpass")
- HTTP Basic Auth credentials for Splash.
-
splash_verbosity (default is "1")
- Splash log verbosity, from 0 to 5. -
max_timeout (default is "3600")
- maximum allowed timeout. -
maxrss_mb (default is "3000")
- a soft memory limit, in MB. Splash
container will be restarted after some time if it starts to use more memory
then this value. -
splash_slots (default is 5)
- a number of Splash slots to use, i.e.
how many render jobs to run in parallel in a single Splash process. -
stats_enabled (default is "1")
- whether to enable HAProxy stats.
If stats are enabled visit http://:8036 to see stats page. -
stats_auth (default is "admin:adminpass")
- HTTP Basic Auth credentials
for HAProxy stats. -
tor (default is "1")
- enter 0 to disable Tor_ support. When Tor support
is enabled, all .onion links are opened using Tor. In addition to
that, there istor
Splash proxy profile
_ which you can use to render
any page using Tor. -
adblock (default is "1")
- Enter 0 to disable AdBlock Plus
request filters
_ (FIXME: this option is not working yet;
filters are always available). By default, the following filters
are available:easylist
: default set of EasyList_ filters for English;easyprivacy
: EasyPrivacy filters remove tracking scripts;easylist_noadult
: EasyList variant without filters for adult domains;fanboy-social
: removes social media content such as the Facebook like
buttons and other widgets.fanboy-annoyance
: blocks Social Media content, in-page pop-ups
and other annoyances; use it to decrease loading times and uncluttering
pages.fanboy-social
is already included in this filter.
.. _Tor: http://torproject.org
.. _Splash proxy profile: http://splash.readthedocs.org/en/latest/api.html#proxy-profiles
.. _request filters: http://splash.readthedocs.org/en/latest/api.html#request-filters
.. _EasyList: https://easylist.to/
Contributing
- Source code: https://github.com/TeamHG-Memex/aquarium
- Bug tracker: https://github.com/TeamHG-Memex/aquarium/issues
License is MIT.
.. image:: https://hyperiongray.s3.amazonaws.com/define-hg.svg
:target: https://www.hyperiongray.com/?pk_campaign=github&pk_kwd=aquarium
:alt: define hyperiongray