docker-hadoop-spark-workbench

[EXPERIMENTAL] This repo includes deployment instructions for running HDFS/Spark inside docker containers. It also includes spark-notebook and an HDFS FileBrowser.

  • Owner: big-data-europe/docker-hadoop-spark-workbench


How to use HDFS/Spark Workbench

To start an HDFS/Spark Workbench:

    docker-compose up -d

Note: docker-compose does not work for scaling up spark-workers. For a distributed setup, see the swarm folder.

Starting workbench with Hive support

Start the services in stages. Before running each command, check that the services started by the previous command are running correctly (with docker logs servicename).

    docker-compose -f docker-compose-hive.yml up -d namenode hive-metastore-postgresql
    docker-compose -f docker-compose-hive.yml up -d datanode hive-metastore
    docker-compose -f docker-compose-hive.yml up -d hive-server
    docker-compose -f docker-compose-hive.yml up -d spark-master spark-worker spark-notebook hue
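
The "check the logs before the next command" step can be scripted. The following is a minimal sketch and not part of the repository: wait_for_log is a hypothetical helper that polls docker logs until a pattern appears or a retry limit is reached. The container name and the readiness pattern in the commented usage are assumptions; take the real values from docker ps and from the actual log output.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): poll `docker logs` until
# PATTERN shows up in the output of CONTAINER, or give up after ATTEMPTS tries.
# WAIT_INTERVAL (seconds between tries) is an assumed knob, default 2.
wait_for_log() {
    container=$1
    pattern=$2
    attempts=${3:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # docker logs may write to stderr as well, so merge both streams.
        if docker logs "$container" 2>&1 | grep -q "$pattern"; then
            return 0
        fi
        i=$((i + 1))
        sleep "${WAIT_INTERVAL:-2}"
    done
    echo "timed out waiting for '$pattern' in $container" >&2
    return 1
}

# Example usage (commented out; names and pattern are placeholders):
# docker-compose -f docker-compose-hive.yml up -d namenode hive-metastore-postgresql
# wait_for_log <namenode-container> "<readiness message>" || exit 1
# docker-compose -f docker-compose-hive.yml up -d datanode hive-metastore
```

Polling the logs only shows that a message was printed; it does not prove the service is healthy, so a manual look at the logs is still worthwhile on the first run.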

Interfaces

  • Namenode: http://localhost:50070
  • Datanode: http://localhost:50075
  • Spark-master: http://localhost:8080
  • Spark-notebook: http://localhost:9001
  • Hue (HDFS Filebrowser): http://localhost:8088/home
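
To check quickly whether the interfaces listed above respond, a small loop over the URLs is enough. This is a sketch under assumptions: the ports come from the list above, while WORKBENCH_HOST, interface_urls and check_interfaces are hypothetical names introduced here for illustration, and curl is assumed to be installed.

```shell
#!/bin/sh
# Assumed variable: host where the containers publish their ports.
WORKBENCH_HOST=${WORKBENCH_HOST:-localhost}

# Print "name url" pairs for the interfaces listed in this README.
interface_urls() {
    cat <<EOF
namenode http://$WORKBENCH_HOST:50070
datanode http://$WORKBENCH_HOST:50075
spark-master http://$WORKBENCH_HOST:8080
spark-notebook http://$WORKBENCH_HOST:9001
hue http://$WORKBENCH_HOST:8088/home
EOF
}

# Report each interface as "up" when curl receives any HTTP response
# (whatever the status code) within 5 seconds, and "DOWN" otherwise.
check_interfaces() {
    interface_urls | while read -r name url; do
        if curl -s -o /dev/null --max-time 5 "$url"; then
            echo "$name up $url"
        else
            echo "$name DOWN $url"
        fi
    done
}

# Example usage (commented out): check_interfaces
```

When Docker runs on a remote machine or in a VM, set WORKBENCH_HOST to that host's address before calling check_interfaces.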

Important

When opening Hue, you might encounter the error "NoReverseMatch: u'about' is not a registered namespace" after login. I disabled the 'about' page (which is the default one) because it caused the docker container to hang. To access Hue when you hit this error, append /home to the URI: http://docker-host-ip:8088/home

Docs

Count Example for Spark Notebooks

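    import org.apache.spark.sql.SparkSession

    // Reuse the existing session if there is one, otherwise create it.
    val spark = SparkSession
      .builder()
      .appName("Simple Count Example")
      .getOrCreate()

    // Read /data.csv as lines of text and count the lines.
    val tf = spark.read.textFile("/data.csv")
    tf.count()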
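
The count example expects a file at /data.csv. One possible way to place it there, assuming the notebook resolves that path against HDFS, is to copy a local file into the namenode container and put it into HDFS from inside. This is a sketch, not a documented step of this repository: put_in_hdfs and NAMENODE_CONTAINER are hypothetical names, and the default container name is an assumption to be checked with docker ps.

```shell
#!/bin/sh
# Assumed container name; look up the real one with `docker ps`.
NAMENODE_CONTAINER=${NAMENODE_CONTAINER:-namenode}

# Hypothetical helper: copy LOCAL_FILE into the container's /tmp,
# then upload it to HDFS_PATH, overwriting an existing file (-f).
put_in_hdfs() {
    local_file=$1
    hdfs_path=$2
    tmp_path="/tmp/$(basename "$local_file")"
    docker cp "$local_file" "$NAMENODE_CONTAINER:$tmp_path" &&
        docker exec "$NAMENODE_CONTAINER" hdfs dfs -put -f "$tmp_path" "$hdfs_path"
}

# Example usage (commented out): put_in_hdfs ./data.csv /data.csv
```

The Hue file browser listed under Interfaces is an alternative way to upload files to HDFS.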

Maintainer

  • Ivan Ermilov @earthquakesan

Note: this repository was part of the BDE H2020 EU project and is no longer actively maintained by the project participants.

Main metrics

Overview
Name With Owner: big-data-europe/docker-hadoop-spark-workbench
Primary Language: Makefile
Program Language: Shell (Language Count: 2)

Owner Activity
Created At: 2016-03-21 22:26:31
Pushed At: 2020-10-01 11:30:09
Last Commit At: 2018-10-05 17:14:39
Release Count: 0

User Engagement
Stargazers Count: 693
Watchers Count: 38
Fork Count: 377
Commits Count: 51
Issues Count: 60
Issue Open Count: 18
Pull Requests Count: 5
Pull Requests Open Count: 2
Pull Requests Close Count: 5