[PYTHON] About the problem that localhost:4040 cannot be accessed after running Spark with Docker

For some reason only Spark's Web UI (localhost:4040) would not connect. I researched quite a bit but found no solution, and I figured others who are new to this might run into the same problem, so I'm posting about it for the first time. Because the error occurred during work, I can't share the actual code; thank you for your understanding.

Environment

I'm running a Python container and a Spark container (a master with a connected worker) on the same Docker bridge network. PySpark and Java are installed in both containers.

Reference for environment construction

Bringing up one Spark master and one worker with docker-compose and running pyspark: https://qiita.com/hrkt/items/fe9b1162f7a08a07e812

Goal

I want to enter the Python container (docker exec ...), run spark-submit, and connect to localhost:4040 to check the Web UI while the job is processing.

About the problem

After startup, localhost:8080 (the master's Web UI) connects, but localhost:4040 does not.
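When debugging a symptom like this, a quick way to tell whether anything is listening on a port is a plain TCP connect check. This is a generic diagnostic sketch, not from the original post; the host and port values are just the ones relevant here:

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == '__main__':
    # 8080 is the Spark master UI; 4040 is the per-application driver UI,
    # which only exists while a job is actually running.
    for p in (8080, 4040):
        print(p, 'open' if port_open('localhost', p) else 'closed')
```

If 4040 shows as closed even during a running job, the port mapping (or the container it is mapped on) is the next thing to check.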

Solution

  1. localhost:4040 is only accessible while a Spark application is actually running, because the driver process serves that UI.
  2. I had written the `4040:4040` port mapping on the Spark container in docker-compose.yml, but it has to be written on the Python container that manages the jobs (i.e., where the driver runs). (This was the cause in my case.)
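As a sketch of the fix in docker-compose.yml: the 4040 mapping goes on the Python (driver) container, not on the Spark master. The service and image names below are placeholders for illustration, not the ones from the original setup:

```yaml
services:
  spark-master:
    image: spark-image            # placeholder image name
    ports:
      - "8080:8080"               # master Web UI
  python:
    image: python-spark-image     # placeholder; container where spark-submit runs
    ports:
      - "4040:4040"               # application Web UI is served by the driver here
```

The reason is that 4040 belongs to the Spark application's driver, and with spark-submit run inside the Python container, that is where the UI actually listens.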

Summary

I couldn't figure it out from reading Stack Overflow and the like, but when I asked a senior colleague, it was solved in one shot. I want to get stronger at infrastructure.

Bonus

I was running the sample code on Spark, but when I called df.show() I got `Initial job has not accepted any resources; check your cluster ui to ensure that workers are registered and have sufficient resources`. Thinking resources were short, I changed the memory settings, but that didn't solve it.

I borrowed the sample code from here.

Handling arguments in a PySpark script file: https://blog.amedama.jp/entry/2018/03/17/113516

Referenced sample code

from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.sql import SparkSession


def main():
    # Build a Spark application named 'example'
    conf = SparkConf()
    conf.setAppName('example')
    sc = SparkContext(conf=conf)
    spark = SparkSession(sc)
    # Run a trivial query to confirm the session works
    df = spark.sql('SELECT "Hello, World!" AS message')
    df.show()


if __name__ == '__main__':
    main()

Solution

Setting the master explicitly fixed it. With `local`, the job runs inside the driver process in the Python container, so it no longer waits for cluster workers to offer resources:

conf = SparkConf().setMaster('local')

(To run against the standalone cluster instead, the master URL would be of the form `spark://<master-host>:7077`, assuming the workers can reach the driver over the network.)

