Interface SupportsPushDownVariants

All Superinterfaces:
Scan

@Evolving
public interface SupportsPushDownVariants extends Scan
A mix-in interface for Scan. Data sources can implement this interface to support pushing down variant field access operations to the data source.

When variant columns are accessed with specific field extractions (e.g., variant_get), the optimizer can push these accesses down to the data source. The data source can then read only the required fields from variant columns, reducing I/O and improving performance.

The typical workflow is:

  1. Optimizer analyzes the query plan and identifies variant field accesses
  2. Optimizer calls pushVariantAccess(org.apache.spark.sql.connector.read.VariantAccessInfo[]) with the access information
  3. Data source validates and stores the variant access information
  4. Optimizer retrieves pushed information via pushedVariantAccess()
  5. Data source uses the information to optimize reading in Scan.readSchema() and in its readers

Since:
4.1.0
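
The workflow above can be sketched from the caller's side. This is a minimal, self-contained illustration and not Spark's optimizer code: AccessSketch is a hypothetical stand-in for VariantAccessInfo (reduced to a column name), VariantPushDownSketch is a hypothetical interface that mirrors only the two methods documented here, and AcceptAllScanSketch is an assumed data source that accepts every request.

```java
import java.util.Arrays;

// Hypothetical stand-in for VariantAccessInfo: only a column name, for illustration.
record AccessSketch(String columnName) {}

// Hypothetical interface mirroring the two methods documented on this page.
interface VariantPushDownSketch {
    boolean pushVariantAccess(AccessSketch[] variantAccessInfo);
    AccessSketch[] pushedVariantAccess();
}

// Assumed data source that accepts every requested access (step 3).
class AcceptAllScanSketch implements VariantPushDownSketch {
    private AccessSketch[] pushed = new AccessSketch[0];

    @Override
    public boolean pushVariantAccess(AccessSketch[] variantAccessInfo) {
        pushed = variantAccessInfo.clone();
        return pushed.length > 0;
    }

    @Override
    public AccessSketch[] pushedVariantAccess() {
        return pushed.clone();
    }
}

class WorkflowSketch {
    // Steps 2 and 4 from the caller's side: push, then read back what was accepted.
    static AccessSketch[] negotiate(VariantPushDownSketch scan, AccessSketch[] requested) {
        boolean accepted = scan.pushVariantAccess(requested);
        return accepted ? scan.pushedVariantAccess() : new AccessSketch[0];
    }

    public static void main(String[] args) {
        AccessSketch[] requested = { new AccessSketch("payload") };
        AccessSketch[] accepted = negotiate(new AcceptAllScanSketch(), requested);
        System.out.println(Arrays.toString(accepted));
    }
}
```

The caller relies only on the boolean result and the array read back afterwards, which is why an implementation has to keep the two consistent.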
  • Method Details

    • pushVariantAccess

      boolean pushVariantAccess(VariantAccessInfo[] variantAccessInfo)
      Pushes down variant field access information to the data source.

      Implementations should validate whether the variant accesses can be pushed down, based on the data source's capabilities. If some accesses cannot be pushed down, the implementation can choose to:

      • Push down only the supported accesses and return true
      • Reject all pushdown and return false

      The implementation should store the variant access information that can be pushed down. The stored information will be retrieved later via pushedVariantAccess().

      Parameters:
      variantAccessInfo - Array of variant access information, one per variant column
      Returns:
      true if at least some variant accesses were pushed down, false if none were pushed
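
The first option above (push down only the supported accesses) might look like the following sketch. It is self-contained and does not use Spark classes: RequestedAccess is a hypothetical stand-in for VariantAccessInfo, and the set of supported column names is an assumed way of modelling the data source's capabilities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for VariantAccessInfo; the real class carries more than a column name.
record RequestedAccess(String columnName) {}

class PartialPushDownSketch {
    // Assumption: the source can push down accesses only for these variant columns.
    private final Set<String> supportedColumns;
    private RequestedAccess[] pushed = new RequestedAccess[0];

    PartialPushDownSketch(Set<String> supportedColumns) {
        this.supportedColumns = supportedColumns;
    }

    boolean pushVariantAccess(RequestedAccess[] variantAccessInfo) {
        // Keep only the accesses this source can serve.
        List<RequestedAccess> accepted = new ArrayList<>();
        for (RequestedAccess info : variantAccessInfo) {
            if (supportedColumns.contains(info.columnName())) {
                accepted.add(info);
            }
        }
        // Store the accepted subset so it can be retrieved later.
        pushed = accepted.toArray(new RequestedAccess[0]);
        // True only if at least one access was accepted.
        return pushed.length > 0;
    }

    RequestedAccess[] pushedVariantAccess() {
        return pushed.clone();
    }
}
```

Storing the filtered subset, and not the original request, keeps the later read-back accurate when only part of the request is accepted.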
    • pushedVariantAccess

      VariantAccessInfo[] pushedVariantAccess()
      Returns the variant access information that has been pushed down to this scan.

      This method is called by the optimizer after pushVariantAccess(org.apache.spark.sql.connector.read.VariantAccessInfo[]) to retrieve which variant accesses the data source actually accepted. The optimizer uses this information to rewrite the query plan.

      If pushVariantAccess(org.apache.spark.sql.connector.read.VariantAccessInfo[]) was not called or returned false, this method should return an empty array.

      Returns:
      Array of pushed down variant access information
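
The empty-array rule can be illustrated with an all-or-nothing data source, that is, one that rejects the whole request if any access is unsupported. This is a sketch under stated assumptions: plain column-name strings stand in for VariantAccessInfo, and AllOrNothingScanSketch is a hypothetical class, not part of Spark.

```java
import java.util.Arrays;
import java.util.Set;

class AllOrNothingScanSketch {
    // Assumption: capabilities are modelled as a set of supported column names.
    private final Set<String> supportedColumns;
    // Starts empty, so the getter is safe to call before any push.
    private String[] pushedColumns = new String[0];

    AllOrNothingScanSketch(Set<String> supportedColumns) {
        this.supportedColumns = supportedColumns;
    }

    boolean pushVariantAccess(String[] requestedColumns) {
        boolean allSupported =
            Arrays.stream(requestedColumns).allMatch(supportedColumns::contains);
        if (requestedColumns.length == 0 || !allSupported) {
            // Rejecting: clear any earlier state so the getter returns an empty array.
            pushedColumns = new String[0];
            return false;
        }
        pushedColumns = requestedColumns.clone();
        return true;
    }

    String[] pushedVariantAccess() {
        // Never null: empty before any push and after a rejected push.
        return pushedColumns.clone();
    }
}
```

Initializing the field to an empty array and resetting it on rejection covers both cases named above without null checks on the caller's side.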