sheetwise.extractors

Compression modules for the SpreadsheetLLM framework (enhanced).

Classes

DataFormatAggregator()

Implements data-format-aware aggregation for numerical cells

InvertedIndexTranslator()

Implements inverted-index translation with 2D block compression

StructuralAnchorExtractor([k])

Implements structural-anchor-based extraction for layout understanding

class sheetwise.extractors.StructuralAnchorExtractor(k=4)

Implements structural-anchor-based extraction for layout understanding

__init__(k=4)

Initialize the extractor. The k parameter controls how much of the neighborhood around each anchor is retained.

Parameters:

k (int) – Number of rows/columns to retain around anchor points

find_structural_anchors(df)

Identify heterogeneous rows and columns that serve as structural anchors.

Improvements (v2.5.1):

- Added ‘Transition Detection’: captures boundaries between regions (e.g., Header -> Body)
- Vectorized implementation

Return type:

Tuple[List[int], List[int]]
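
The idea of heterogeneous rows and transition detection can be illustrated with a small pandas sketch. This is not the sheetwise implementation: the names cell_kind, anchor_positions and sketch_find_anchors are hypothetical, and the rule used here (a row is an anchor if it mixes cell kinds or its kind pattern differs from the row above) is an assumption made only for illustration.

```python
import numbers
from typing import List, Tuple

import numpy as np
import pandas as pd


def cell_kind(value) -> str:
    """Coarse type label for a single cell (hypothetical helper)."""
    if pd.isna(value):
        return "empty"
    if isinstance(value, numbers.Number):
        return "number"
    return "text"


def anchor_positions(kinds: pd.DataFrame) -> List[int]:
    """Positions of rows that mix kinds or differ from the row above."""
    if kinds.empty:
        return []
    # Heterogeneous: more than one distinct kind within the row.
    mixed = kinds.nunique(axis=1) > 1
    # Transition: the kind pattern changed relative to the previous row.
    changed = (kinds != kinds.shift()).any(axis=1)
    changed.iloc[0] = False  # the first row has no predecessor
    return np.flatnonzero((mixed | changed).to_numpy()).tolist()


def sketch_find_anchors(df: pd.DataFrame) -> Tuple[List[int], List[int]]:
    """Return (row_anchors, column_anchors) as positional indices."""
    kinds = df.apply(lambda col: col.map(cell_kind))
    # Columns are handled by running the same rule on the transpose.
    return anchor_positions(kinds), anchor_positions(kinds.T)


sheet = pd.DataFrame(
    [
        ["Q1", "Q2", "Q3"],   # header: all text
        [10, 12, 9],          # body: all numbers
        [8, 9, 11],
        [7, 11, 10],
        ["n/a", 32, 30],      # summary row: mixed kinds
    ]
)
rows, cols = sketch_find_anchors(sheet)
print(rows, cols)
```

In this example the header-to-body boundary (row 1) and the mixed summary row (row 4) are reported as row anchors. Every column mixes a text header with numeric values, so all three columns are reported.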

extract_skeleton(df)

Extract spreadsheet skeleton by keeping only structurally important rows/columns.

Return type:

DataFrame
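
How k might shape the skeleton can be sketched as follows. The sketch assumes that the anchors are positional indices and that "retain around anchor points" means keeping every row or column within k positions of an anchor. The helper names neighborhood and sketch_extract_skeleton are hypothetical and do not describe the library's internals.

```python
from typing import Iterable, List

import pandas as pd


def neighborhood(anchors: Iterable[int], k: int, size: int) -> List[int]:
    """All positions within k of any anchor, clipped to [0, size)."""
    keep = set()
    for anchor in anchors:
        keep.update(range(max(0, anchor - k), min(size, anchor + k + 1)))
    return sorted(keep)


def sketch_extract_skeleton(
    df: pd.DataFrame,
    row_anchors: Iterable[int],
    col_anchors: Iterable[int],
    k: int = 4,
) -> pd.DataFrame:
    """Keep only rows and columns near an anchor (assumed behaviour)."""
    rows = neighborhood(row_anchors, k, df.shape[0])
    cols = neighborhood(col_anchors, k, df.shape[1])
    return df.iloc[rows, cols]


table = pd.DataFrame({"a": range(20), "b": range(20), "c": range(20)})
skeleton = sketch_extract_skeleton(table, row_anchors=[0, 15], col_anchors=[0], k=1)
print(skeleton.shape)  # (5, 2)
```

With k=1, the rows around anchors 0 and 15 survive (0, 1, 14, 15, 16) and the long homogeneous run between them is dropped. A larger k keeps more context at the cost of a larger skeleton.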

class sheetwise.extractors.InvertedIndexTranslator

Implements inverted-index translation with 2D block compression

translate(df)

Return type:

Dict[str, List[str]]
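
The return type suggests a mapping from cell content to the addresses where it occurs. A minimal sketch of such an inverted index is given below. It assumes A1-style addresses counted from the first data row and that empty cells are skipped, and it omits the 2D block merging step entirely. The names column_letter and sketch_inverted_index are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

import pandas as pd


def column_letter(index: int) -> str:
    """Convert a 0-based column position to spreadsheet letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def sketch_inverted_index(df: pd.DataFrame) -> Dict[str, List[str]]:
    """Map each distinct cell value (as text) to the addresses holding it."""
    index: Dict[str, List[str]] = defaultdict(list)
    # Assumption: row numbers start at 1 for the first data row.
    for r, row in enumerate(df.itertuples(index=False), start=1):
        for c, value in enumerate(row):
            if pd.isna(value):
                continue  # empty cells carry no content to index
            index[str(value)].append(f"{column_letter(c)}{r}")
    return dict(index)


answers = pd.DataFrame([["yes", "no"], ["yes", None], ["no", "yes"]])
inverted = sketch_inverted_index(answers)
print(inverted)
```

Repeated values are stored once as a key, which is where the token savings come from when a sheet contains many duplicates.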

class sheetwise.extractors.DataFormatAggregator

Implements data-format-aware aggregation for numerical cells

aggregate(df)

Aggregate cells by data format and type. Uses 2D Block Compression for token efficiency.

Return type:

Dict[str, Any]
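
Grouping cells by format rather than by value can be sketched as below. The format labels, the regular expressions and the shape of the returned dictionary are all assumptions chosen for illustration, and the block compression step is left out. The names format_label and sketch_aggregate are hypothetical.

```python
import numbers
import re
from typing import Any, Dict

import pandas as pd


def format_label(value) -> str:
    """Assign a coarse format label to one cell (illustrative rules only)."""
    if pd.isna(value):
        return "Empty"
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, numbers.Integral):
        return "Integer"
    if isinstance(value, numbers.Real):
        return "Float"
    text = str(value)
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return "Date"
    if re.fullmatch(r"-?\d+(\.\d+)?%", text):
        return "Percentage"
    return "Text"


def sketch_aggregate(df: pd.DataFrame) -> Dict[str, Any]:
    """Group non-empty cells by format label, recording count and positions."""
    summary: Dict[str, Any] = {}
    for r, row in enumerate(df.itertuples(index=False)):
        for c, value in enumerate(row):
            label = format_label(value)
            if label == "Empty":
                continue
            entry = summary.setdefault(label, {"count": 0, "cells": []})
            entry["count"] += 1
            entry["cells"].append((r, c))  # 0-based (row, column) positions
    return summary


orders = pd.DataFrame(
    {
        "date": ["2024-01-05", "2024-01-06"],
        "units": [3, 5],
        "share": ["12.5%", "40%"],
        "note": ["ok", None],
    }
)
summary = sketch_aggregate(orders)
print({label: entry["count"] for label, entry in summary.items()})
```

Replacing individual numeric values with a shared format label is what lets many distinct cells collapse into a single entry.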