For some reason, only Spark's Web UI (localhost:4040) wouldn't connect. I researched various things but couldn't find a solution, and I figured other people who aren't used to Spark must have hit the same problem, so this is my first post. Also, since the error occurred during work, I haven't posted the actual code. Thank you for your understanding.
I have a Python container and a Spark container (a master that a worker connects to) bridged with Docker.
PySpark and Java are installed in both containers.
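As a quick sanity check that both containers really do have PySpark and Java, something like the following can be run in each container (a trivial sketch of my own, not from the referenced articles):

# Sanity check, run inside each container.
import subprocess

import pyspark

print(pyspark.__version__)  # PySpark is importable
subprocess.run(['java', '-version'], check=True)  # Java is on the PATH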
To bring up one Spark master and one worker with docker-compose and run pyspark, I followed this article: https://qiita.com/hrkt/items/fe9b1162f7a08a07e812
What I want to do is go inside the Python container (docker exec ...), run spark-submit, and connect to localhost:4040 to check the Web UI while the job is processing.
After booting, localhost:8080 (the master's UI) connects, but for some reason 4040 (the application UI, which only exists while a job is running) cannot be reached.
Reading Stack Overflow and the like didn't get me anywhere, but when I asked a senior colleague, it was solved in one shot. I want to get stronger at infrastructure.
I was running sample code on Spark, but when I called df.show() I got `Initial job has not accepted any resources; check your cluster ui to ensure that workers are registered and have sufficient resources`. Thinking I didn't have enough resources, I changed the memory settings, but that didn't solve it.
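For reference, the failing setup looked roughly like this. It's only a sketch, not the actual work code: the master hostname spark-master and the memory values here are assumptions for illustration.

# A sketch of the failing configuration (not the actual work code).
# 'spark-master' and the memory values are assumed for illustration.
from pyspark import SparkConf

conf = SparkConf()
conf.setAppName('example')
# Submit to the standalone cluster from the Python container.
conf.setMaster('spark://spark-master:7077')
# Memory settings I tried adjusting, with no effect on the error.
conf.set('spark.executor.memory', '512m')
conf.set('spark.driver.memory', '512m')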
I borrowed the sample code from here.
Handle arguments in PySpark script files https://blog.amedama.jp/entry/2018/03/17/113516
Referenced sample code
from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.sql import SparkSession


def main():
    # Build the configuration and the contexts.
    conf = SparkConf()
    conf.setAppName('example')
    sc = SparkContext(conf=conf)
    spark = SparkSession(sc)
    # A trivial query, just to have something to show.
    df = spark.sql('SELECT "Hello, World!" AS message')
    df.show()


if __name__ == '__main__':
    main()
Solution

conf = SparkConf().setMaster('local')

In other words, run in local mode instead of against the standalone cluster. With the master set to 'local', the whole job runs in the driver process inside the Python container, so there are no cluster workers that need to accept it, and the application UI on 4040 is served from that same container.
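Applied to the sample code above, the fix looks something like this (a minimal sketch; the sc.uiWebUrl print is my own addition, just to confirm where the UI is actually being served):

from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.sql import SparkSession


def main():
    # 'local' runs the whole job in this (driver) process; no cluster workers needed.
    conf = SparkConf().setMaster('local').setAppName('example')
    sc = SparkContext(conf=conf)
    spark = SparkSession(sc)
    df = spark.sql('SELECT "Hello, World!" AS message')
    df.show()
    # Where the application UI is actually served; expect port 4040 in this container.
    print(sc.uiWebUrl)
    sc.stop()


if __name__ == '__main__':
    main()

Note that for localhost:4040 to work from the host's browser, port 4040 presumably also has to be published from the Python container in docker-compose.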