In a Dockerfile-based environment (Python 3.7.6, pyspark 2.4.5), I load PySpark as follows:
```python
from pyspark.sql import SparkSession
```

which fails with:

```
/usr/local/spark/python/pyspark/__init__.py in <module>
     49
     50 from pyspark.conf import SparkConf
---> 51 from pyspark.context import SparkContext
     52 from pyspark.rdd import RDD, RDDBarrier
     53 from pyspark.files import SparkFiles

/usr/local/spark/python/pyspark/context.py in <module>
     27 from tempfile import NamedTemporaryFile
     28
---> 29 from py4j.protocol import Py4JError
     30
     31 from pyspark import accumulators

ModuleNotFoundError: No module named 'py4j'
```
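One common cause, when Spark is installed from a tarball rather than via pip, is that the py4j distribution bundled inside Spark itself (under `$SPARK_HOME/python/lib/py4j-*-src.zip`) is not on `sys.path`. A minimal sketch of putting it there; the helper name and the `/usr/local/spark` path (taken from the traceback above) are assumptions, not something from the original post:

```python
import glob
import os
import sys

def add_bundled_py4j(spark_home):
    """Locate the py4j source zip that ships with Spark, e.g.
    $SPARK_HOME/python/lib/py4j-0.10.7-src.zip, and prepend it to
    sys.path. Returns the path that was added, or None if not found."""
    matches = glob.glob(
        os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    )
    if not matches:
        return None
    path = matches[0]
    if path not in sys.path:
        sys.path.insert(0, path)
    return path

# Usage sketch (path is an assumption based on the traceback above):
# add_bundled_py4j("/usr/local/spark")
# from pyspark.sql import SparkSession
```

Using the zip that Spark itself ships guarantees the py4j version matches what that Spark release was built against, which avoids the version conflict entirely.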
For now I work around this by running the following in the Jupyter notebook each time:

```
!pip install py4j
```

However, this produces the error below, so I would like to know how to set things up so that no error occurs.
```
ERROR: pyspark 2.4.5 has requirement py4j==0.10.7, but you'll have py4j 0.10.9.1 which is incompatible.
```
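Since pyspark 2.4.5 declares a hard requirement on py4j==0.10.7 (as the error message itself states), one way to avoid the conflict is to install that exact version in the image instead of pulling the latest py4j at notebook time. A minimal sketch, assuming a pip-based Dockerfile; the exact `RUN` line depends on how the image installs Python packages:

```dockerfile
# Pin py4j to the version pyspark 2.4.5 declares as a requirement,
# so pip does not pull in the incompatible 0.10.9.x release.
RUN pip install pyspark==2.4.5 py4j==0.10.7
```

Baking the pinned version into the image also removes the need to run `!pip install py4j` inside the notebook at all.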