Settings
Spark session
This library uses replay.utils.session_handler.State to provide universal access to the same Spark session for all modules.
A default session is created automatically and can be accessed via the session attribute.
from replay.utils.session_handler import State
State().session
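The object returned by State().session is a regular pyspark.sql.SparkSession, so it can be used directly. A minimal sketch (the toy DataFrame is purely illustrative):
from replay.utils.session_handler import State

spark = State().session  # default session, created automatically
log = spark.createDataFrame([(1, 10), (1, 11), (2, 10)], ["user_id", "item_id"])
log.show()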
There is also a helper function that provides basic settings for creating a Spark session:
- replay.utils.session_handler.get_spark_session(spark_memory=None, shuffle_partitions=None, core_count=None)
Get default SparkSession
- Parameters
  - spark_memory (Optional[int]) – GB of memory allocated for Spark; 70% of RAM by default.
  - shuffle_partitions (Optional[int]) – number of partitions for Spark; triple the CPU count by default.
  - core_count (Optional[int]) – number of cores to use; -1 means all available cores. If None, the REPLAY_SPARK_CORE_COUNT environment variable is checked; if it is not set either, -1 is used. Default: None.
- Return type
  SparkSession
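For instance, a session limited to 4 GB of memory with 200 shuffle partitions on 4 cores can be requested with keyword arguments (the numbers here are illustrative, not recommendations):
from replay.utils.session_handler import get_spark_session

session = get_spark_session(spark_memory=4, shuffle_partitions=200, core_count=4)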
You can pass any Spark session to State to make it available throughout the library.
from replay.utils.session_handler import get_spark_session
session = get_spark_session(2)
State(session)
- class replay.utils.session_handler.State(session=None)
All modules look for the Spark session via this class. You can put your own session here.
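Assuming State is the singleton described above, registering a session once makes it visible everywhere; a minimal sketch:
from replay.utils.session_handler import State, get_spark_session

custom_session = get_spark_session(spark_memory=2)
State(custom_session)  # register the session in the singleton

# later, in any other module:
assert State().session is custom_session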
Logging
The logger name is replay. The default level is logging.INFO.
import logging
logger = logging.getLogger("replay")
logger.setLevel(logging.DEBUG)
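Since replay logs through the standard logging module, the usual handlers and formatters apply. For example, to also write debug output to a file (standard-library calls only, nothing RePlay-specific; the file name is arbitrary):
import logging

logger = logging.getLogger("replay")
logger.setLevel(logging.DEBUG)

handler = logging.FileHandler("replay.log")  # arbitrary file name
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)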