sheetwise package
SpreadsheetLLM: A Python Package for Encoding Spreadsheets for Large Language Models
This package implements the key components from the SpreadsheetLLM research:
- SheetCompressor: Efficient encoding framework with three modules
- Chain of Spreadsheet: Multi-step reasoning approach
- Vanilla encoding methods with cell addresses and formats
Additional features include:
- Formula extraction and analysis
- Multi-sheet workbook support
- Advanced table detection
- Visualization tools
Based on the research paper: “SpreadsheetLLM: Encoding Spreadsheets for Large Language Models” by Microsoft Research Team
- class sheetwise.SpreadsheetLLM(compression_params=None, enable_logging=False)
Bases: object
Main class integrating all SheetWise components. Includes Offline SQL and JSON export capabilities.
- compress_and_encode_for_llm(df)
Original Markdown encoding (retained for compatibility).
- Return type:
- encode_compressed_for_llm(compressed_result)
Generate text representation (Markdown).
- Return type:
- encode_to_json(df)
Encode compressed spreadsheet data into structured JSON. Ideal for piping into other scripts or APIs.
- Return type:
- load_from_file(filepath)
Load spreadsheet from file with robust type detection. Detects file type using magic numbers (signatures) rather than extensions.
- Return type:
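The signature-based detection described for load_from_file can be illustrated with a minimal sketch. This is not the sheetwise implementation: the function name `sniff_spreadsheet_kind`, the returned labels, and the text fallback are assumptions made for illustration. Only the two byte signatures are well-known facts about ZIP and OLE2 containers.

```python
from pathlib import Path
import tempfile

# Well-known leading byte signatures (not taken from sheetwise itself).
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx/.xlsm are ZIP containers
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy .xls compound documents


def sniff_spreadsheet_kind(path):
    """Guess the container format from the first bytes, ignoring the extension."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip-based (e.g. xlsx)"
    if head.startswith(OLE2_MAGIC):
        return "ole2 (e.g. xls)"
    return "text (e.g. csv)"  # hypothetical fallback: no known binary signature


# A CSV saved under a misleading ".xlsx" name is still recognised as text.
with tempfile.TemporaryDirectory() as tmp:
    mislabeled = Path(tmp) / "report.xlsx"
    mislabeled.write_text("Year,Sales\n2023,100\n", encoding="utf-8")
    kind = sniff_spreadsheet_kind(mislabeled)
```

Reading only the first few bytes keeps the check cheap even for very large workbooks.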
- query_sql(df, sql_query, params=None)
Run a SQL query against the DataFrame using DuckDB with enhanced security.
- Parameters:
  - df (DataFrame) – The dataframe to query (registered as table ‘input_data’)
  - sql_query (str) – SQL query. Use ‘input_data’ to refer to the dataframe. Example: “SELECT * FROM input_data WHERE Year > ?”
  - params (Union[list, Dict[str, Any], None]) – Optional parameters for the query to prevent SQL injection. Supports list (for ‘?’) or dict (for ‘$name’) parameters.
- Return type:
DataFrame
- Returns:
Result as a new DataFrame
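The reason bound parameters guard against SQL injection can be shown with a short sketch. To stay runnable without extra dependencies, it swaps in the standard-library `sqlite3` module as a stand-in for DuckDB. The table name `input_data` and the `?` placeholder mirror the description above; the data is made up.

```python
import sqlite3

# Stand-in data; in sheetwise the table 'input_data' would be the DataFrame.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_data (Year INTEGER, Sales REAL)")
conn.executemany(
    "INSERT INTO input_data VALUES (?, ?)",
    [(2021, 10.0), (2022, 12.5), (2023, 15.0)],
)

# The value is bound to '?', never spliced into the SQL text.
rows = conn.execute(
    "SELECT Year, Sales FROM input_data WHERE Year > ? ORDER BY Year", (2021,)
).fetchall()

# A hostile string is compared as a plain value, so nothing extra executes.
hostile = "2021; DROP TABLE input_data"
safe_rows = conn.execute(
    "SELECT Year FROM input_data WHERE Year = ?", (hostile,)
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM input_data").fetchone()[0]
conn.close()
```

Going by the signature documented above, the equivalent sheetwise call would presumably look like `SpreadsheetLLM().query_sql(df, "SELECT * FROM input_data WHERE Year > ?", params=[2021])`. That call is inferred from the signature and is not verified here.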
- class sheetwise.SheetCompressor(k=4, use_extraction=True, use_translation=True, use_aggregation=True)
Bases: object
Main compression framework combining all three modules. Optimized for memory efficiency.
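The `use_translation` flag suggests that one of the three modules follows the inverted-index translation idea from the SpreadsheetLLM paper. That mapping is an inference from the parameter name, not something this page confirms. Under that assumption, the core idea can be sketched as follows; `invert_cells` and the sample sheet are hypothetical.

```python
from collections import defaultdict


def invert_cells(cells):
    """Map each non-empty value to the addresses holding it, dropping empty cells."""
    index = defaultdict(list)
    for address, value in cells.items():
        if value is None or value == "":
            continue  # empty cells cost tokens but carry no information
        index[str(value)].append(address)
    return dict(index)


# Hypothetical sparse sheet: repeated values collapse into a single entry.
cells = {"A1": "Region", "B1": "Status", "A2": "North", "B2": "Open",
         "A3": "South", "B3": "Open", "A4": "", "B4": None}
inverted = invert_cells(cells)
```

Keying by value rather than by address means a value repeated across many cells is written once, which is where the token savings come from on sparse or repetitive sheets.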
- class sheetwise.VanillaEncoder
Bases: object
Vanilla spreadsheet encoding with cell addresses and formats
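A minimal sketch of address-based encoding, assuming a plain list-of-rows input and an `address,value` pair format joined with `|`. Both helper names and the output layout are illustrative, not the VanillaEncoder format, and cell formats are left out for brevity.

```python
def column_letters(index):
    """Convert a zero-based column index to spreadsheet letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def encode_vanilla_rows(rows):
    """Emit one 'address,value' pair per cell, rows separated by newlines."""
    lines = []
    for r, row in enumerate(rows, start=1):
        pairs = [f"{column_letters(c)}{r},{'' if v is None else v}"
                 for c, v in enumerate(row)]
        lines.append("|".join(pairs))
    return "\n".join(lines)


encoded = encode_vanilla_rows([["Year", "Sales"], [2023, 100]])
```

Because every cell carries its own address, this style of encoding grows linearly with sheet area, which is the cost that compression is meant to reduce.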
- class sheetwise.ChainOfSpreadsheet(compressor=None)
Bases: object
Implements deterministic ‘Chain of Spreadsheet’ reasoning. Uses fuzzy matching and heuristic scoring instead of LLMs.
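One way to picture deterministic table selection by fuzzy matching is the sketch below, which uses the standard-library `difflib`. The scoring rule, the `score_table` helper, and the sample tables are assumptions made for illustration; the heuristics ChainOfSpreadsheet actually applies are not documented on this page.

```python
from difflib import SequenceMatcher


def score_table(query, headers):
    """Heuristic relevance: best fuzzy similarity between any query word and any header."""
    words = query.lower().split()
    best = 0.0
    for header in headers:
        for word in words:
            ratio = SequenceMatcher(None, word, header.lower()).ratio()
            best = max(best, ratio)
    return best


# Hypothetical detected tables, keyed by name, with their header rows.
tables = {
    "sales": ["Year", "Region", "Revenue"],
    "staff": ["Name", "Department", "Salary"],
}
query = "total revenu by region"
ranked = sorted(tables, key=lambda name: score_table(query, tables[name]), reverse=True)
```

The same query always yields the same ranking, which is the practical benefit of a heuristic over a model call.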
- class sheetwise.CellInfo(address, value, data_type, format_string=None, row=0, col=0)
Bases: object
Information about a spreadsheet cell
- __init__(address, value, data_type, format_string=None, row=0, col=0)
- class sheetwise.TableRegion(top_left, bottom_right, rows, cols, confidence=0.0)
Bases: object
Represents a detected table region in the spreadsheet
- __init__(top_left, bottom_right, rows, cols, confidence=0.0)
- sheetwise.create_realistic_spreadsheet()
Create a realistic large, sparse spreadsheet with multiple tables
- class sheetwise.FormulaParser
Bases: object
Extracts, analyzes and simplifies Excel formulas from spreadsheets. Optimized for memory usage with streaming reads.
- CELL_REF_PATTERN = re.compile('([A-Z]+[0-9]+|[A-Z]+\\:[A-Z]+|[0-9]+\\:[0-9]+|[A-Z]+[0-9]+\\:[A-Z]+[0-9]+)')
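To see what this pattern captures, the sketch below compiles the expression exactly as printed above and runs it on a few made-up formulas. Only the sample formulas are assumptions.

```python
import re

# Copied from the attribute above.
CELL_REF_PATTERN = re.compile(
    '([A-Z]+[0-9]+|[A-Z]+\\:[A-Z]+|[0-9]+\\:[0-9]+|[A-Z]+[0-9]+\\:[A-Z]+[0-9]+)'
)

single_and_range = CELL_REF_PATTERN.findall("=SUM(A1:B2)+C3")
whole_column = CELL_REF_PATTERN.findall("=SUM(A:A)")
whole_rows = CELL_REF_PATTERN.findall("=SUM(1:3)")
```

Python tries alternatives from left to right, so the single-cell alternative matches first and a rectangular range such as `A1:B2` comes back as its two corner cells rather than as one range token. Whole-column (`A:A`) and whole-row (`1:3`) references are returned intact.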
- extract_formulas(excel_path, sheet_name=None)
Extract all formulas from an Excel file using Memory-Efficient Streaming.
- class sheetwise.FormulaDependencyAnalyzer(formula_parser=None)
Bases: object
Specialized analyzer for formula dependencies.
- class sheetwise.CompressionVisualizer(enable_interactive=True)
Bases: object
Visualization tools for spreadsheet compression analysis. Now includes Interactive HTML Reports.
- create_data_density_heatmap(df, title='Data Density Heatmap')
Generate a heatmap showing data density in the spreadsheet.
- Return type:
Figure
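How "data density" is computed is not stated here. One plausible reading is the fraction of non-empty cells in each tile of the sheet, which is what a heatmap would then colour. The sketch below computes such a grid in plain Python; the `density_grid` helper, the tile size, and the sample sheet are all assumptions, and no plotting is attempted.

```python
def density_grid(rows, block=2):
    """Fraction of non-empty cells in each block x block tile of a rectangular grid."""
    n_rows, n_cols = len(rows), len(rows[0])
    grid = []
    for top in range(0, n_rows, block):
        line = []
        for left in range(0, n_cols, block):
            tile = [rows[r][c]
                    for r in range(top, min(top + block, n_rows))
                    for c in range(left, min(left + block, n_cols))]
            filled = sum(1 for v in tile if v is not None and v != "")
            line.append(filled / len(tile))
        grid.append(line)
    return grid


# Hypothetical sheet: a dense table in the top-left corner and one stray note.
sheet = [
    ["Year", "Sales", None, None],
    [2023, 100, None, None],
    [None, None, None, "note"],
    [None, None, None, None],
]
densities = density_grid(sheet)
```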
- generate_html_report(original_df, compressed_result)
Legacy static report (Backwards compatibility)
- Return type:
- class sheetwise.WorkbookManager
Bases: object
Manages multi-sheet workbooks and cross-sheet references.
This class provides utilities to:
1. Load and process entire Excel workbooks with multiple sheets
2. Handle cross-sheet references and relationships
3. Compress entire workbooks
4. Identify inter-sheet relationships
- encode_workbook_for_llm(compression_results)
Generate LLM-ready encoding of the entire workbook.
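The cross-sheet references that WorkbookManager is described as handling typically appear in formulas as `Sheet2!A1` or `'Q1 Sales'!B2`. The sketch below shows one way to collect them into a sheet-to-sheet link map. The regular expression, the `referenced_sheets` helper, and the sample formulas are assumptions and do not reflect how sheetwise does it.

```python
import re

# Assumed reference syntax: Sheet2!A1 or 'Q1 Sales'!B2 (quoted when the name has spaces).
CROSS_SHEET_REF = re.compile(r"(?:'([^']+)'|([A-Za-z0-9_]+))!\$?[A-Z]+\$?[0-9]+")


def referenced_sheets(formula):
    """Return the distinct sheet names a formula points at, in order of appearance."""
    seen = []
    for quoted, bare in CROSS_SHEET_REF.findall(formula):
        name = quoted or bare
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical formulas keyed by the sheet that contains them.
formulas = {
    "Summary": "=Sheet2!A1+'Q1 Sales'!B2+Sheet2!C3",
    "Sheet2": "=A1*2",
}
links = {sheet: referenced_sheets(f) for sheet, f in formulas.items()}
```

A map like `links` is enough to tell which sheets depend on which others, for example when deciding the order in which sheets are presented to a model.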
- class sheetwise.SmartTableDetector(min_table_size=2, max_empty_ratio=0.7, header_detection=True)
Bases: object
Advanced table detection with enhanced capabilities.
This class provides utilities to:
1. Detect multiple tables in spreadsheets
2. Identify table headers and structures
3. Classify tables by type
4. Handle complex table layouts
- __init__(min_table_size=2, max_empty_ratio=0.7, header_detection=True)
Initialize the detector.
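The constructor defaults hint at how candidate regions might be filtered. The sketch below assumes `min_table_size` is a minimum count of rows and columns and `max_empty_ratio` is the largest tolerated fraction of empty cells. These semantics are guesses from the parameter names, and `passes_thresholds` is a hypothetical helper, not part of sheetwise.

```python
def passes_thresholds(region, min_table_size=2, max_empty_ratio=0.7):
    """Accept a rectangular candidate if it is large enough and not too sparse."""
    n_rows = len(region)
    n_cols = len(region[0]) if region else 0
    if n_rows < min_table_size or n_cols < min_table_size:
        return False
    cells = [value for row in region for value in row]
    empty = sum(1 for value in cells if value is None or value == "")
    return empty / len(cells) <= max_empty_ratio


# Hypothetical candidates: one mostly filled, one almost empty.
dense = [["Year", "Sales"], [2023, 100], [2024, None]]
sparse = [["note", None, None], [None, None, None], [None, None, None]]
dense_ok = passes_thresholds(dense)
sparse_ok = passes_thresholds(sparse)
```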
- class sheetwise.TableType(value)
Bases: Enum
Types of tables that can be detected.
- DATA_TABLE = 'data_table'
- PIVOT_TABLE = 'pivot_table'
- MATRIX = 'matrix'
- FORM = 'form'
- MIXED = 'mixed'
- SPARSE = 'sparse'
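The `(value)` in the class signature is the standard `Enum` lookup call. The sketch below uses a stand-in enum that mirrors the members listed above, so that it runs without sheetwise installed. With the package available, `from sheetwise import TableType` would presumably take the place of the stand-in class.

```python
from enum import Enum


class TableType(Enum):
    """Stand-in mirroring the documented members; the real class lives in sheetwise."""
    DATA_TABLE = "data_table"
    PIVOT_TABLE = "pivot_table"
    MATRIX = "matrix"
    FORM = "form"
    MIXED = "mixed"
    SPARSE = "sparse"


# Calling the enum with a value returns the matching member.
restored = TableType("pivot_table")
# Storing .value keeps serialized output (for example JSON) free of enum objects.
serialized = [member.value for member in TableType]
```

Round-tripping through `.value` and `TableType(...)` is useful when table metadata is written out as plain strings and read back later.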
- class sheetwise.EnhancedTableRegion(top_left, bottom_right, rows, cols, confidence=1.0, table_type=TableType.DATA_TABLE, has_headers=False, header_rows=None, header_cols=None)
Bases: TableRegion
Extended table region with additional metadata.
- __init__(top_left, bottom_right, rows, cols, confidence=1.0, table_type=TableType.DATA_TABLE, has_headers=False, header_rows=None, header_cols=None)
Submodules
- sheetwise.chain module
- sheetwise.classifiers module
- sheetwise.cli module
- sheetwise.compressor module
- sheetwise.core module
  - NumpyEncoder
  - SpreadsheetLLM
    - SpreadsheetLLM.__init__()
    - SpreadsheetLLM.load_from_file()
    - SpreadsheetLLM.query_sql()
    - SpreadsheetLLM.encode_to_json()
    - SpreadsheetLLM.auto_configure()
    - SpreadsheetLLM.compress_and_encode_for_llm()
    - SpreadsheetLLM.encode_compressed_for_llm()
    - SpreadsheetLLM.process_qa_query()
    - SpreadsheetLLM.encode_vanilla()
    - SpreadsheetLLM.compress_spreadsheet()
    - SpreadsheetLLM.get_encoding_stats()
- sheetwise.data_types module
- sheetwise.detectors module
- sheetwise.encoders module
- sheetwise.extractors module
- sheetwise.formula_parser module
- sheetwise.smart_tables module
  - TableType
  - EnhancedTableRegion
    - EnhancedTableRegion.table_type
    - EnhancedTableRegion.has_headers
    - EnhancedTableRegion.header_rows
    - EnhancedTableRegion.header_cols
    - EnhancedTableRegion.confidence
    - EnhancedTableRegion.start_row
    - EnhancedTableRegion.end_row
    - EnhancedTableRegion.start_col
    - EnhancedTableRegion.end_col
    - EnhancedTableRegion.__init__()
    - EnhancedTableRegion.top_left
    - EnhancedTableRegion.bottom_right
    - EnhancedTableRegion.rows
    - EnhancedTableRegion.cols
  - SmartTableDetector
- sheetwise.utils module
- sheetwise.visualizer module
- sheetwise.workbook module