Interface SupportsPushDownVariants
- All Superinterfaces:
Scan
A mix-in interface for Scan. Data sources can implement this interface to
support pushing down variant field access operations to the data source.
When a query accesses specific fields of a variant column (e.g., through variant_get), the optimizer can push these field accesses down to the data source. The data source can then read only the required fields from the variant column instead of the entire value, reducing I/O and improving performance.
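To make "read only the required fields" concrete, here is a minimal sketch that prunes a decoded variant value down to a set of requested field paths. It assumes, purely for illustration, that a variant value is decoded into nested Java Maps and that each access is a dot-separated path such as "a.b". The class VariantFieldPruning and its prune method are hypothetical; they do not reflect Spark's binary variant encoding or any real reader.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only: a variant value is modeled as nested Maps,
 * and each requested field is a dot-separated path such as "a.b".
 * This is not Spark's variant encoding.
 */
final class VariantFieldPruning {

    /** Returns a new map that keeps only the requested paths of {@code value}. */
    static Map<String, Object> prune(Map<String, Object> value, List<String> paths) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String path : paths) {
            copyPath(value, result, path.split("\\."), 0);
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    private static void copyPath(Map<String, Object> src, Map<String, Object> dst,
                                 String[] parts, int i) {
        Object child = src.get(parts[i]);
        if (child == null) {
            return; // Requested field is absent in this value.
        }
        if (i == parts.length - 1) {
            dst.put(parts[i], child); // End of the path: keep the whole subtree.
            return;
        }
        if (!(child instanceof Map) || dst.get(parts[i]) == child) {
            return; // Not an object, or a shorter path already kept the whole subtree.
        }
        // Create (or reuse) a pruned copy of this level and descend into it.
        Map<String, Object> next = (Map<String, Object>) dst.computeIfAbsent(
                parts[i], k -> new LinkedHashMap<String, Object>());
        copyPath((Map<String, Object>) child, next, parts, i + 1);
    }
}
```

For example, pruning {a: {b: 1, c: 2}, d: 3} with the single path "a.b" keeps only {a: {b: 1}}. A real data source would presumably make this selection while reading from storage rather than after decoding the full value; this in-memory version only shows which fields survive.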
The typical workflow is:
- The optimizer analyzes the query plan and identifies variant field accesses.
- The optimizer calls pushVariantAccess(VariantAccessInfo[]) with the access information.
- The data source validates and stores the variant access information.
- The optimizer retrieves the pushed information via pushedVariantAccess().
- The data source uses the information to optimize reading in Scan.readSchema() and in its readers.
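The optimizer-side half of this workflow (the push call followed by the read-back) can be sketched as follows. VariantAccess and VariantPushdown are hypothetical, simplified stand-ins for VariantAccessInfo and SupportsPushDownVariants so that the example compiles without Spark; the fields of VariantAccess (a column name and a list of field paths) are assumptions, not the real class's API. AcceptAllScan is a toy source that accepts everything it is offered.

```java
import java.util.List;

// Hypothetical stand-in for VariantAccessInfo: one entry per variant column.
record VariantAccess(String columnName, List<String> fieldPaths) {}

// Hypothetical stand-in for the two methods of SupportsPushDownVariants.
interface VariantPushdown {
    boolean pushVariantAccess(VariantAccess[] variantAccessInfo);

    VariantAccess[] pushedVariantAccess();
}

final class PushdownWorkflow {
    // Workflow steps 2 and 4, seen from the caller: offer the accesses,
    // then read back what the source actually accepted.
    static VariantAccess[] negotiate(VariantPushdown scan, VariantAccess[] requested) {
        if (!scan.pushVariantAccess(requested)) {
            return new VariantAccess[0]; // Nothing was pushed down.
        }
        return scan.pushedVariantAccess();
    }
}

// Toy data source for demonstration: accepts every access it is offered.
final class AcceptAllScan implements VariantPushdown {
    private VariantAccess[] pushed = new VariantAccess[0];

    @Override
    public boolean pushVariantAccess(VariantAccess[] variantAccessInfo) {
        pushed = variantAccessInfo.clone(); // Step 3: store the accepted accesses.
        return pushed.length > 0;
    }

    @Override
    public VariantAccess[] pushedVariantAccess() {
        return pushed.clone(); // Defensive copy of the stored accesses.
    }
}
```

Keeping the push and the read-back as two separate calls mirrors the workflow above: the return value only says whether anything was accepted, while pushedVariantAccess() says exactly what.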
- Since:
- 4.1.0
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.sql.connector.read.Scan
Scan.ColumnarSupportMode
Method Summary
- VariantAccessInfo[] pushedVariantAccess(): Returns the variant access information that has been pushed down to this scan.
- boolean pushVariantAccess(VariantAccessInfo[] variantAccessInfo): Pushes down variant field access information to the data source.
Methods inherited from interface org.apache.spark.sql.connector.read.Scan
columnarSupportMode, description, readSchema, reportDriverMetrics, supportedCustomMetrics, toBatch, toContinuousStream, toMicroBatchStream
Method Details
pushVariantAccess
boolean pushVariantAccess(VariantAccessInfo[] variantAccessInfo)
Pushes down variant field access information to the data source.
Implementations should validate whether the variant accesses can be pushed down based on the data source's capabilities. If some accesses cannot be pushed down, the implementation can choose to:
- Push down only the supported accesses and return true
- Reject all pushdown and return false
The implementation should store the variant access information that can be pushed down. The stored information will be retrieved later via pushedVariantAccess().
- Parameters:
- variantAccessInfo - array of variant access information, one per variant column
- Returns:
- true if at least some variant accesses were pushed down, false if none were pushed
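A minimal sketch of the first option, pushing down only the supported accesses, might look like this. It uses a hypothetical stand-in record VariantAccess instead of the real VariantAccessInfo, and the capability check (a fixed set of column names the source can prune) is an assumed placeholder for whatever the actual data source supports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for VariantAccessInfo (its fields are assumptions).
record VariantAccess(String columnName, List<String> fieldPaths) {}

/**
 * Sketch of "push down only the supported accesses and return true".
 * The prunableColumns set is a hypothetical capability check.
 */
final class PartialPushdownScan {
    private final Set<String> prunableColumns;
    private VariantAccess[] pushed = new VariantAccess[0]; // Empty until something is accepted.

    PartialPushdownScan(Set<String> prunableColumns) {
        this.prunableColumns = Set.copyOf(prunableColumns);
    }

    boolean pushVariantAccess(VariantAccess[] variantAccessInfo) {
        List<VariantAccess> accepted = new ArrayList<>();
        for (VariantAccess access : variantAccessInfo) {
            if (prunableColumns.contains(access.columnName())) {
                accepted.add(access); // Supported: keep it for the read path.
            }
        }
        // Store only the accepted subset; an empty subset means nothing was pushed.
        pushed = accepted.toArray(new VariantAccess[0]);
        return pushed.length > 0;
    }

    VariantAccess[] pushedVariantAccess() {
        return pushed.clone(); // Defensive copy so callers cannot mutate internal state.
    }
}
```

Storing the accepted subset rather than the full input is what lets pushedVariantAccess() report exactly what was accepted, and a rejected push leaves the stored array empty.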
pushedVariantAccess
VariantAccessInfo[] pushedVariantAccess()
Returns the variant access information that has been pushed down to this scan.
This method is called by the optimizer after pushVariantAccess(VariantAccessInfo[]) to retrieve the variant accesses that the data source actually accepted. The optimizer uses this information to rewrite the query plan.
If pushVariantAccess(VariantAccessInfo[]) was not called or returned false, this method should return an empty array.
- Returns:
- Array of pushed down variant access information
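The relationship between the two methods can be expressed as a small consistency check, for example in a connector's unit tests. This is a sketch under assumptions: PushdownContract is a hypothetical helper, VariantAccess is a simplified stand-in for VariantAccessInfo, and treating a null return as a violation is an interpretation of "should return an empty array" rather than a documented rule. The "not called" case can be checked by passing false for a fresh scan.

```java
import java.util.List;

// Hypothetical stand-in for VariantAccessInfo (its fields are assumptions).
record VariantAccess(String columnName, List<String> fieldPaths) {}

final class PushdownContract {
    /**
     * Checks the documented relationship between the result of
     * pushVariantAccess and the array returned by pushedVariantAccess.
     * Throws IllegalStateException if the pair is inconsistent.
     */
    static void check(boolean pushResult, VariantAccess[] pushed) {
        if (pushed == null) {
            throw new IllegalStateException("expected an empty array, got null");
        }
        if (!pushResult && pushed.length != 0) {
            throw new IllegalStateException("push returned false but accesses were reported");
        }
        if (pushResult && pushed.length == 0) {
            throw new IllegalStateException("push returned true but no accesses were reported");
        }
    }
}
```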