Settings
Spark session
This library uses session_handler.State to provide universal access to the same Spark session for all modules. A default session is created automatically and can be accessed via the session attribute.
from replay.utils.session_handler import State
State().session
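Since State is meant to hold one shared session for all modules, repeated lookups should return the same object; a minimal check of that assumption:

from replay.utils.session_handler import State

# Both lookups go through the shared State holder, so they should
# resolve to the same SparkSession object.
assert State().session is State().session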
There is also a helper function that provides basic settings for creating a Spark session:
- replay.utils.session_handler.get_spark_session(spark_memory=None, shuffle_partitions=None, core_count=None)
Get default SparkSession
- Parameters
  - spark_memory (Optional[int]) – GB of memory allocated for Spark; 70% of RAM by default.
  - shuffle_partitions (Optional[int]) – number of partitions for Spark; triple the CPU count by default.
  - core_count (Optional[int]) – number of cores to use; -1 means all available cores. If None, the REPLAY_SPARK_CORE_COUNT environment variable is checked; if that variable is not set, -1 is used. Default: None.
- Return type
  SparkSession
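A sketch of calling the helper with explicit settings; the values below and the integer format of the environment variable are illustrative assumptions, not recommendations:

import os
from replay.utils.session_handler import get_spark_session

# Assumed: REPLAY_SPARK_CORE_COUNT holds an integer string.
# core_count is left as None, so the variable set below is consulted.
os.environ["REPLAY_SPARK_CORE_COUNT"] = "4"
session = get_spark_session(spark_memory=8, shuffle_partitions=24)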
You can pass any Spark session to State to make it available throughout the library.
from replay.utils.session_handler import get_spark_session
session = get_spark_session(2)
State(session)
- class replay.utils.session_handler.State(session=None)
All modules look for the Spark session via this class. You can put your own session here.
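For instance, a session built with plain PySpark can be registered so every module picks it up; the builder options below are illustrative, not required:

from pyspark.sql import SparkSession
from replay.utils.session_handler import State

# Build a custom session with standard PySpark (options are examples only)
custom_session = (
    SparkSession.builder
    .master("local[4]")
    .appName("replay-experiment")
    .config("spark.sql.shuffle.partitions", "16")
    .getOrCreate()
)

# Register it; modules that call State().session will now receive it
State(custom_session)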
Logging
Logger name is replay. Default level is logging.INFO.
import logging
logger = logging.getLogger("replay")
logger.setLevel(logging.DEBUG)
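Because the library logs through the standard logging module, the usual handler machinery applies; for example, mirroring its output to a file (the path and format here are illustrative):

import logging

logger = logging.getLogger("replay")

# Also write log records to a file (illustrative path and format)
handler = logging.FileHandler("replay.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
logger.addHandler(handler)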