Interface SupportsPushDownVariantExtractions

All Superinterfaces:
ScanBuilder

@Experimental public interface SupportsPushDownVariantExtractions extends ScanBuilder
A mix-in interface for ScanBuilder. Data sources can implement this interface to support pushing down variant field extraction operations to the data source.

When variant columns are accessed with specific field extractions (e.g., variant_get, try_variant_get), the optimizer can push these extractions down to the data source. The data source can then read only the required fields from variant columns, reducing I/O and improving performance.

Each VariantExtraction in the input array represents one field extraction operation. Data sources should examine each extraction and determine which ones can be handled efficiently. The return value is a boolean array of the same length, where each element indicates whether the corresponding extraction was accepted.
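To make the index-by-index correspondence concrete, the sketch below shows one way a caller could interpret the returned array. It is an illustration only, not Spark's actual optimizer code: the class `PushdownResultSketch`, the helper `select`, and the string labels standing in for `VariantExtraction` objects are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PushdownResultSketch {

    /**
     * Returns the inputs whose flag equals `wanted`.
     * Rejects a result array whose length differs from the input length.
     */
    public static <T> List<T> select(T[] extractions, boolean[] accepted, boolean wanted) {
        if (accepted.length != extractions.length) {
            throw new IllegalArgumentException("result must have the same length as the input");
        }
        List<T> out = new ArrayList<>();
        for (int i = 0; i < extractions.length; i++) {
            // accepted[i] describes extractions[i]; position is the only link between them.
            if (accepted[i] == wanted) {
                out.add(extractions[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder labels, not real VariantExtraction objects.
        String[] extractions = {"$.a", "$.b[0]", "$.c"};
        boolean[] accepted = {true, false, true};

        System.out.println("pushed: " + select(extractions, accepted, true));
        System.out.println("left to Spark: " + select(extractions, accepted, false));
    }
}
```

Because the result is matched to the input purely by position, an implementation should not reorder, drop, or append entries.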

Since:
4.1.0
  • Method Summary

    Modifier and Type
    Method
    Description
    boolean[]
    pushVariantExtractions(VariantExtraction[] extractions)
    Pushes down variant field extractions to the data source.

    Methods inherited from interface org.apache.spark.sql.connector.read.ScanBuilder

    build
  • Method Details

    • pushVariantExtractions

      boolean[] pushVariantExtractions(VariantExtraction[] extractions)
      Pushes down variant field extractions to the data source.

      Each element in the input array represents one field extraction operation from a variant column. Data sources should examine each extraction and decide whether it can be pushed down, based on the data source's capabilities (for example, supported data types and path complexity).

      The return value is a boolean array of the same length as the input array, where each element indicates whether the corresponding extraction was accepted:

      • true: The extraction will be handled by the data source
      • false: The extraction will be handled by Spark after reading

      Data sources can choose to accept all, some, or none of the extractions. Spark will handle any extractions that are not pushed down.
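The accept-some, reject-others contract can be sketched as follows. This is a minimal, self-contained illustration rather than a real connector: `Extraction` is a hypothetical stand-in for `VariantExtraction` (whose accessors are not listed on this page), `SketchScanBuilder` does not actually extend `ScanBuilder`, and the capability rule (simple paths with a `string` or `long` result type) is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VariantPushdownSketch {

    /** Hypothetical stand-in for VariantExtraction; field names are assumptions. */
    public static final class Extraction {
        final String path;      // assumed: extraction path such as "$.user.id"
        final String typeName;  // assumed: requested result type such as "long"

        public Extraction(String path, String typeName) {
            this.path = path;
            this.typeName = typeName;
        }
    }

    /** Mimics the shape of a ScanBuilder that supports variant extraction pushdown. */
    public static final class SketchScanBuilder {
        private final List<Extraction> pushed = new ArrayList<>();

        public boolean[] pushVariantExtractions(Extraction[] extractions) {
            boolean[] accepted = new boolean[extractions.length];
            for (int i = 0; i < extractions.length; i++) {
                Extraction e = extractions[i];
                // Assumed capability rule: no array indexing in the path, and
                // only string or long result types.
                boolean simplePath = !e.path.contains("[");
                boolean supportedType =
                    e.typeName.equals("string") || e.typeName.equals("long");
                accepted[i] = simplePath && supportedType;
                if (accepted[i]) {
                    // Remembered so a later build() could read only these fields.
                    pushed.add(e);
                }
            }
            return accepted;
        }

        public List<Extraction> pushedExtractions() {
            return pushed;
        }
    }

    public static void main(String[] args) {
        SketchScanBuilder builder = new SketchScanBuilder();
        boolean[] result = builder.pushVariantExtractions(new Extraction[] {
            new Extraction("$.user.id", "long"),
            new Extraction("$.tags[0]", "string"),
            new Extraction("$.price", "decimal(10,2)")
        });
        System.out.println(Arrays.toString(result)); // prints [true, false, false]
    }
}
```

Only the first extraction is accepted here; the second is rejected for its array index and the third for its result type, so both would be left for Spark to evaluate after reading.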

      Parameters:
      extractions - Array of variant extractions, one per field extraction operation
      Returns:
      Boolean array indicating which extractions were accepted (same length as input)