sheetwise.extractors

Compression modules for the SpreadsheetLLM framework (Enhanced).

Classes

`DataFormatAggregator`()	Implements data-format-aware aggregation for numerical cells
`InvertedIndexTranslator`()	Implements inverted-index translation with 2D block compression
`StructuralAnchorExtractor`([k])	Implements structural-anchor-based extraction for layout understanding

class sheetwise.extractors.StructuralAnchorExtractor(k=4)

Implements structural-anchor-based extraction for layout understanding.

__init__(k=4)

Initialize the extractor with k, which controls how many neighboring rows and columns are retained around each anchor.

Parameters: k (int) – Number of rows/columns to retain around each anchor point.
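To make the role of `k` concrete, here is a minimal sketch of neighborhood retention, assuming anchors are given as 0-based row (or column) positions and that each window is clamped to the sheet edges. `retained_indices` is a hypothetical helper for illustration, not part of the sheetwise API.

```python
def retained_indices(anchors, k, length):
    """Hypothetical helper: sorted positions within k of any anchor."""
    keep = set()
    for a in anchors:
        # Window [a - k, a + k], clamped to the valid range [0, length - 1]
        keep.update(range(max(0, a - k), min(length, a + k + 1)))
    return sorted(keep)


# With k=2, anchors 0 and 10 in a 20-row sheet keep rows 0..2 and 8..12.
print(retained_indices([0, 10], k=2, length=20))  # [0, 1, 2, 8, 9, 10, 11, 12]
```

Larger values of `k` keep more context around each boundary at the cost of more tokens.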

find_structural_anchors(df)

Identify heterogeneous rows and columns that serve as structural anchors.

Improvements (v2.5.1):
- Added ‘Transition Detection’, which captures boundaries between regions (e.g., Header -> Body).
- Vectorized implementation.
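The transition idea can be illustrated with a small vectorized sketch. It assumes a coarse per-cell type (empty, numeric, other) and marks a row or column as an anchor when its type signature differs from the one before it, so a header-to-body boundary or a blank separator row shows up as an anchor. `cell_type_codes`, `transition_anchors`, and `find_anchors_sketch` are hypothetical names; the library's actual heterogeneity heuristics are not documented here and are likely richer.

```python
import numpy as np
import pandas as pd


def cell_type_codes(df):
    """Coarse per-cell type: 0 = empty, 1 = numeric, 2 = anything else."""
    values = df.to_numpy(dtype=object)
    empty = pd.isna(values) | (values == "")
    numeric = df.apply(pd.to_numeric, errors="coerce").notna().to_numpy()
    return np.where(empty, 0, np.where(numeric, 1, 2))


def transition_anchors(codes):
    """Row 0 plus every row whose type signature differs from the row above."""
    if codes.shape[0] == 0:
        return np.array([], dtype=int)
    changed = (codes[1:] != codes[:-1]).any(axis=1)  # one flag per adjacent pair
    return np.concatenate(([0], np.flatnonzero(changed) + 1))


def find_anchors_sketch(df):
    codes = cell_type_codes(df)
    # Columns reuse the row logic on the transposed type grid.
    return transition_anchors(codes), transition_anchors(codes.T)


sheet = pd.DataFrame([
    ["Region", "Q1", "Q2"],
    ["North", 10, 12],
    ["South", 8, 9],
    [None, None, None],
    ["Total", 18, 21],
])
rows, cols = find_anchors_sketch(sheet)
print(rows.tolist(), cols.tolist())  # [0, 1, 3, 4] [0, 1]
```

Comparing whole rows of type codes at once, rather than looping cell by cell, is what keeps the detection vectorized.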

extract_skeleton(df)

Extract the spreadsheet skeleton by keeping only structurally important rows and columns.
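Assuming anchor positions are already known, skeleton extraction reduces to keeping every row and column within `k` positions of an anchor. The sketch below builds those masks with NumPy broadcasting and selects them with `DataFrame.iloc`; `neighborhood_mask` and `skeleton_sketch` are hypothetical helpers, whereas `extract_skeleton(df)` takes only the frame and so presumably finds its anchors internally.

```python
import numpy as np
import pandas as pd


def neighborhood_mask(anchors, k, length):
    """True at every position within k of at least one anchor."""
    anchors = np.asarray(anchors, dtype=int)
    if anchors.size == 0:
        return np.zeros(length, dtype=bool)
    positions = np.arange(length)
    # |position - anchor| for every (position, anchor) pair via broadcasting
    distance = np.abs(positions[:, None] - anchors[None, :])
    return (distance <= k).any(axis=1)


def skeleton_sketch(df, row_anchors, col_anchors, k=4):
    rows = neighborhood_mask(row_anchors, k, df.shape[0])
    cols = neighborhood_mask(col_anchors, k, df.shape[1])
    return df.iloc[rows, cols]


sheet = pd.DataFrame(np.arange(100).reshape(20, 5))
sk = skeleton_sketch(sheet, row_anchors=[0, 19], col_anchors=[0], k=1)
print(sk.shape)  # (4, 2)
```

Because `iloc` keeps the original index and column labels, each retained cell can still be traced back to its position in the full sheet.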

class sheetwise.extractors.InvertedIndexTranslator

Implements inverted-index translation with 2D block compression.

translate(df)
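A minimal sketch of inverted-index translation, assuming it maps each distinct non-empty value to the cell ranges that contain it and drops empty cells. The 2D block compression here is a simple greedy rectangle cover (grow right, then down), which is correct but not guaranteed to produce the fewest ranges. `inverted_index_sketch` and its helpers are hypothetical, and the output format of the real `translate(df)` may differ.

```python
from collections import defaultdict

import numpy as np
import pandas as pd


def column_letters(c):
    """0-based column index -> spreadsheet letters (0 -> 'A', 26 -> 'AA')."""
    letters = ""
    c += 1
    while c:
        c, rem = divmod(c - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def cell_address(r, c):
    return f"{column_letters(c)}{r + 1}"


def inverted_index_sketch(df):
    """Map each non-empty value to the rectangular ranges that hold it."""
    values = df.to_numpy(dtype=object)
    n_rows, n_cols = values.shape
    covered = np.zeros(values.shape, dtype=bool)
    index = defaultdict(list)
    for r in range(n_rows):
        for c in range(n_cols):
            v = values[r, c]
            if covered[r, c] or pd.isna(v) or (isinstance(v, str) and not v):
                continue  # already in a block, or an empty cell (empties are dropped)
            # Grow the block to the right while the value repeats...
            c_end = c
            while (c_end + 1 < n_cols and not covered[r, c_end + 1]
                   and values[r, c_end + 1] == v):
                c_end += 1
            # ...then downward while the whole row segment still matches.
            r_end = r
            while r_end + 1 < n_rows and all(
                not covered[r_end + 1, j] and values[r_end + 1, j] == v
                for j in range(c, c_end + 1)
            ):
                r_end += 1
            covered[r:r_end + 1, c:c_end + 1] = True
            start, stop = cell_address(r, c), cell_address(r_end, c_end)
            index[v].append(start if start == stop else f"{start}:{stop}")
    return dict(index)


sheet = pd.DataFrame([
    ["Item", "Status", "Status"],
    ["Pen", "Open", "Open"],
    ["Ink", "Open", "Open"],
    [None, None, "Done"],
])
print(inverted_index_sketch(sheet))
# {'Item': ['A1'], 'Status': ['B1:C1'], 'Pen': ['A2'], 'Open': ['B2:C3'], 'Ink': ['A3'], 'Done': ['C4']}
```

The four "Open" cells collapse into the single range `B2:C3`, which is where the token savings of block compression come from.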

class sheetwise.extractors.DataFormatAggregator

Implements data-format-aware aggregation for numerical cells.

aggregate(df)

Aggregate cells by data format and type, using 2D block compression for token efficiency.
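A plain DataFrame carries no Excel number-format strings, so this sketch infers a coarse format (Integer, Float, Percentage, Date) from each value and then merges same-format cells into rectangular blocks: first vertical runs within each column, then adjacent columns whose runs span identical rows. The format rules, the choice to leave text cells to other modules, and all function names are assumptions for illustration, not the documented behavior of `DataFormatAggregator.aggregate`.

```python
import datetime
import numbers
import re
from collections import defaultdict

import numpy as np
import pandas as pd

# Formats folded into blocks; text and booleans are left to other modules (assumption).
AGGREGATED = {"Integer", "Float", "Percentage", "Date"}


def format_kind(value):
    """Coarse format label inferred from a cell value; None means empty."""
    if isinstance(value, str):
        text = value.strip()
        if not text:
            return None
        for kind, pattern in (("Percentage", r"-?\d+(?:\.\d+)?%"),
                              ("Date", r"\d{4}-\d{2}-\d{2}"),
                              ("Integer", r"-?\d+"),
                              ("Float", r"-?\d*\.\d+")):
            if re.fullmatch(pattern, text):
                return kind
        return "Text"
    if pd.isna(value):
        return None
    if isinstance(value, datetime.date):  # also covers datetime and pd.Timestamp
        return "Date"
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, numbers.Integral):
        return "Integer"
    if isinstance(value, numbers.Real):
        return "Float"
    return "Text"


def column_runs(kinds, c):
    """Vertical runs (first_row, last_row, kind) of an aggregated format in column c."""
    runs, r, n_rows = [], 0, kinds.shape[0]
    while r < n_rows:
        start, kind = r, kinds[r, c]
        while r + 1 < n_rows and kinds[r + 1, c] == kind:
            r += 1
        if kind in AGGREGATED:
            runs.append((start, r, kind))
        r += 1
    return runs


def cell_range(r0, c0, r1, c1):
    def letters(c):
        out, c = "", c + 1
        while c:
            c, rem = divmod(c - 1, 26)
            out = chr(ord("A") + rem) + out
        return out
    start, stop = f"{letters(c0)}{r0 + 1}", f"{letters(c1)}{r1 + 1}"
    return start if start == stop else f"{start}:{stop}"


def aggregate_formats_sketch(df):
    values = df.to_numpy(dtype=object)
    kinds = np.empty(values.shape, dtype=object)
    for (r, c), v in np.ndenumerate(values):
        kinds[r, c] = format_kind(v)
    n_cols = kinds.shape[1]
    blocks = defaultdict(list)
    open_blocks = {}  # (first_row, last_row, kind) -> column where the block starts
    for c in range(n_cols + 1):  # the extra pass flushes blocks still open at the edge
        runs = column_runs(kinds, c) if c < n_cols else []
        current = set(runs)
        for key in list(open_blocks):
            if key not in current:  # the run does not continue into column c
                r0, r1, kind = key
                blocks[kind].append((r0, open_blocks.pop(key), r1, c - 1))
        for key in runs:
            open_blocks.setdefault(key, c)
    return {kind: [cell_range(*b) for b in sorted(bs)] for kind, bs in blocks.items()}


sheet = pd.DataFrame([
    ["Date", "Sold", "Returned", "Share"],
    ["2024-01-01", 10, 1, "5%"],
    ["2024-01-02", 12, 0, "7%"],
    ["2024-01-03", 9, 2, "4.5%"],
])
print(aggregate_formats_sketch(sheet))
# {'Date': ['A2:A4'], 'Integer': ['B2:C4'], 'Percentage': ['D2:D4']}
```

The run-then-merge strategy only joins columns whose runs cover exactly the same rows; staggered regions stay as separate blocks.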