sheetwise
SpreadsheetLLM: A Python Package for Encoding Spreadsheets for Large Language Models
This package implements the key components from the SpreadsheetLLM research:
- SheetCompressor: an efficient encoding framework with three modules
- Chain of Spreadsheet: a multi-step reasoning approach
- Vanilla encoding methods with cell addresses and formats
Additional features include:
- Formula extraction and analysis
- Multi-sheet workbook support
- Advanced table detection
- Visualization tools
Based on the research paper “SpreadsheetLLM: Encoding Spreadsheets for Large Language Models” by the Microsoft Research team.
- class sheetwise.SpreadsheetLLM(compression_params=None, enable_logging=False)
Main class integrating all SheetWise components. Also provides offline SQL querying and JSON export.
- compress_and_encode_for_llm(df)
Original Markdown encoding (retained for compatibility).
- Return type:
- encode_compressed_for_llm(compressed_result)
Generate a Markdown text representation of a compression result.
- Return type:
- encode_to_json(df)
Encode the compressed spreadsheet data as structured JSON, suitable for piping into other scripts or APIs.
- Return type:
- load_from_file(filepath)
Load a spreadsheet from a file with robust type detection. The file type is detected from its magic number (file signature) rather than its extension.
- Return type:
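The magic-number approach used by load_from_file can be illustrated with a minimal sketch. The byte signatures are the standard ZIP header (used by .xlsx) and the OLE2 compound-file header (used by legacy .xls); the function name sniff_spreadsheet_type, its return labels, and the CSV fallback are hypothetical and not SheetWise's actual implementation.

```python
import os
import tempfile

# Hypothetical sketch: map leading file bytes ("magic numbers") to a file type.
# The signatures are the standard ZIP and OLE2 headers; the labels are illustrative.
SIGNATURES = {
    b"PK\x03\x04": "xlsx",  # ZIP container; .xlsx is ZIP-based (other formats are too)
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "xls",  # OLE2 compound file (legacy .xls)
}


def sniff_spreadsheet_type(path):
    """Guess a file's type from its first bytes, ignoring the extension."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return "csv"  # assumed fallback: treat unrecognized content as delimited text


if __name__ == "__main__":
    # A ZIP header inside a file named *.csv is still detected as xlsx.
    with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
        tmp.write(b"PK\x03\x04" + b"\x00" * 16)
    print(sniff_spreadsheet_type(tmp.name))  # xlsx
    os.remove(tmp.name)
```

Reading only the first eight bytes keeps the check cheap; a ZIP header alone cannot distinguish .xlsx from other ZIP-based formats, so a real loader would need further inspection.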
- query_sql(df, sql_query, params=None)
Run a SQL query against the DataFrame using DuckDB with enhanced security, including support for parameterized queries.
- Parameters:
- df (DataFrame) – The DataFrame to query (registered as the table ‘input_data’).
- sql_query (str) – The SQL query; use ‘input_data’ to refer to the DataFrame. Example: “SELECT * FROM input_data WHERE Year > ?”
- params (Union[list, Dict[str, Any], None]) – Optional query parameters, used to prevent SQL injection. A list fills ‘?’ placeholders; a dict fills ‘$name’ placeholders.
- Return type:
- Returns:
Result as a new DataFrame
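Why params matters is easiest to see by comparing a bound parameter with string formatting. query_sql itself uses DuckDB; to stay dependency-free, this sketch swaps in Python's built-in sqlite3 module, which supports the same ‘?’ positional placeholders but spells named placeholders :name rather than $name. The table name input_data mirrors the docstring; the rows are made-up illustration data.

```python
import sqlite3

# Stand-in for query_sql: sqlite3 replaces DuckDB so the sketch needs no extra packages.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE input_data (Year INTEGER, Region TEXT, Sales REAL)")
con.executemany(
    "INSERT INTO input_data VALUES (?, ?, ?)",
    [(2019, "North", 120.0), (2021, "South", 95.5), (2023, "North", 143.2)],
)

# Positional binding: the value travels separately from the SQL text.
recent = con.execute(
    "SELECT Year, Sales FROM input_data WHERE Year > ? ORDER BY Year", [2020]
).fetchall()

# Named binding: sqlite3 uses :name (the query_sql docstring describes $name for DuckDB).
one = con.execute(
    "SELECT Sales FROM input_data WHERE Year = :year", {"year": 2019}
).fetchall()

hostile = "North' OR '1'='1"
# Unsafe: splicing the string into the SQL turns it into code and matches every row.
unsafe = con.execute(
    f"SELECT COUNT(*) FROM input_data WHERE Region = '{hostile}'"
).fetchone()
# Safe: bound as a parameter, the same string is just an unusual region name.
safe = con.execute(
    "SELECT COUNT(*) FROM input_data WHERE Region = ?", [hostile]
).fetchone()

print(recent)        # [(2021, 95.5), (2023, 143.2)]
print(one)           # [(120.0,)]
print(unsafe, safe)  # (3,) (0,)
```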
- class sheetwise.SheetCompressor(k=4, use_extraction=True, use_translation=True, use_aggregation=True)
Main compression framework combining all three modules (extraction, translation, and aggregation, each toggled by the matching use_* flag). Optimized for memory efficiency.
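In the SpreadsheetLLM paper, the translation module replaces the cell grid with an inverted index that maps each value to the addresses holding it, so repeated values are written once and empty cells disappear. The sketch below shows that idea in plain Python. It is not SheetCompressor's code, the helper names col_letter and inverted_index are hypothetical, and it skips the paper's further step of merging adjacent addresses into ranges such as A2:A4.

```python
from collections import defaultdict


def col_letter(col):
    """Convert a 0-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters = ""
    col += 1
    while col:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def inverted_index(grid):
    """Hypothetical sketch: map each non-empty value to the A1 addresses holding it."""
    index = defaultdict(list)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is None or value == "":
                continue  # empty cells are dropped instead of being encoded
            index[str(value)].append(f"{col_letter(c)}{r + 1}")
    return dict(index)


grid = [
    ["Region", "Q1", "Q2"],
    ["North", 10, 10],
    ["South", "", 7],
    ["North", 3, None],
]
print(inverted_index(grid))
# {'Region': ['A1'], 'Q1': ['B1'], 'Q2': ['C1'], 'North': ['A2', 'A4'],
#  '10': ['B2', 'C2'], 'South': ['A3'], '7': ['C3'], '3': ['B4']}
```

Twelve cells shrink to eight entries here; on large, sparse, repetitive sheets the saving is much larger, which is the motivation for this encoding.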
- class sheetwise.VanillaEncoder
Vanilla spreadsheet encoding with cell addresses and formats.
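A vanilla encoding writes out every cell with its address, its value, and optionally its number format. The layout below (one ‘|’-delimited line per row, each cell written as address,value[,format]) is an assumed illustration of the style rather than VanillaEncoder's exact output; col_letter and vanilla_encode are hypothetical helpers.

```python
def col_letter(col):
    """Convert a 0-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters = ""
    col += 1
    while col:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def vanilla_encode(grid, formats=None):
    """Hypothetical sketch: one '|'-delimited line per row, cells as address,value[,format]."""
    formats = formats or {}
    lines = []
    for r, row in enumerate(grid):
        cells = []
        for c, value in enumerate(row):
            address = f"{col_letter(c)}{r + 1}"
            text = "" if value is None else str(value)
            fmt = formats.get(address)  # e.g. a number format such as "0.00"
            cells.append(f"{address},{text}" + (f",{fmt}" if fmt else ""))
        lines.append("|" + "|".join(cells) + "|")
    return "\n".join(lines)


grid = [["Year", "Sales", None], [2023, 1.5, None]]
print(vanilla_encode(grid, {"B2": "0.00"}))
# |A1,Year|B1,Sales|C1,|
# |A2,2023|B2,1.5,0.00|C2,|
```

Empty cells such as C1 and C2 still cost tokens in this style, which is why a compressed encoding pays off on large, sparse sheets.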
- class sheetwise.ChainOfSpreadsheet(compressor=None)
Implements deterministic ‘Chain of Spreadsheet’ reasoning. Uses fuzzy matching and heuristic scoring instead of LLMs.
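The kind of deterministic fuzzy-matching heuristic described for ChainOfSpreadsheet can be sketched with the standard-library difflib module. The scoring rule (sum the similarity of headers that match some question word with a ratio of at least 0.8) and the names score_table and pick_table are assumptions for illustration. The class's real scoring is not documented on this page.

```python
from difflib import SequenceMatcher


def score_table(question, headers, threshold=0.8):
    """Hypothetical heuristic: sum ratios of headers that closely match a question word."""
    words = [w.strip("?,.!").lower() for w in question.split()]
    score = 0.0
    for header in headers:
        best = max(SequenceMatcher(None, header.lower(), w).ratio() for w in words)
        if best >= threshold:  # ignore weak, likely accidental matches
            score += best
    return score


def pick_table(question, tables):
    """Return the name of the table whose headers best match the question."""
    return max(tables, key=lambda name: score_table(question, tables[name]))


tables = {
    "sales": ["Region", "Revenue", "Year"],
    "staff": ["Employee", "Department", "Salary"],
}
print(pick_table("What was total revenue by region in 2023?", tables))  # sales
```

Because matching uses similarity ratios rather than exact equality, a misspelled word such as "revnue" still scores about 0.92 against "Revenue".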
- class sheetwise.CellInfo(address, value, data_type, format_string=None, row=0, col=0)
Information about a spreadsheet cell.
- __init__(address, value, data_type, format_string=None, row=0, col=0)
- class sheetwise.TableRegion(top_left, bottom_right, rows, cols, confidence=0.0)
Represents a detected table region in the spreadsheet.
- __init__(top_left, bottom_right, rows, cols, confidence=0.0)
- sheetwise.create_realistic_spreadsheet()
Create a realistic large, sparse spreadsheet containing multiple tables.
- class sheetwise.FormulaParser
Extracts, analyzes, and simplifies Excel formulas from spreadsheets. Optimized for memory usage with streaming reads.
- CELL_REF_PATTERN = re.compile('([A-Z]+[0-9]+|[A-Z]+\\:[A-Z]+|[0-9]+\\:[0-9]+|[A-Z]+[0-9]+\\:[A-Z]+[0-9]+)')
- extract_formulas(excel_path, sheet_name=None)
Extract all formulas from an Excel file using memory-efficient streaming reads.
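Compiling CELL_REF_PATTERN with Python's re module shows how it tokenizes formulas. Because the single-cell alternative [A-Z]+[0-9]+ is listed first, a range such as A1:B3 comes back as two separate references, while column ranges (A:A) and row ranges (2:4) are matched whole. The reordered RANGE_FIRST pattern at the end is an illustrative variant, not part of the library.

```python
import re

# Same regex as FormulaParser.CELL_REF_PATTERN, written as a raw string.
CELL_REF_PATTERN = re.compile(
    r"([A-Z]+[0-9]+|[A-Z]+\:[A-Z]+|[0-9]+\:[0-9]+|[A-Z]+[0-9]+\:[A-Z]+[0-9]+)"
)

print(CELL_REF_PATTERN.findall("=SUM(A1:B3)+C5"))  # ['A1', 'B3', 'C5']
print(CELL_REF_PATTERN.findall("=SUM(A:A)"))       # ['A:A']
print(CELL_REF_PATTERN.findall("=SUM(2:4)"))       # ['2:4']

# Illustrative variant (not in the library): trying the cell-range alternative
# first keeps A1:B3 together as one reference.
RANGE_FIRST = re.compile(
    r"([A-Z]+[0-9]+\:[A-Z]+[0-9]+|[A-Z]+[0-9]+|[A-Z]+\:[A-Z]+|[0-9]+\:[0-9]+)"
)
print(RANGE_FIRST.findall("=SUM(A1:B3)+C5"))  # ['A1:B3', 'C5']
```

Regex alternation tries alternatives left to right and takes the first that matches at a given position, so the order of the alternatives, not their length, decides what is returned.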
- class sheetwise.FormulaDependencyAnalyzer(formula_parser=None)
Specialized analyzer for formula dependencies.
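A common way to analyze formula dependencies is to build a graph from each formula cell to the cells it reads, then sort it topologically; a cycle in the graph means a circular reference. The sketch below does this with graphlib from the standard library (Python 3.9+). The simplified REF pattern and the function names are hypothetical, and FormulaDependencyAnalyzer may work differently.

```python
import re
from graphlib import CycleError, TopologicalSorter

# Hypothetical, simplified single-cell reference pattern (ranges are not expanded).
REF = re.compile(r"\$?([A-Z]+)\$?([0-9]+)")


def dependency_graph(formulas):
    """Map each formula cell to the set of cells it reads."""
    return {
        cell: {col + row for col, row in REF.findall(formula)}
        for cell, formula in formulas.items()
    }


def evaluation_order(formulas):
    """Return cells so that every cell comes after the cells it depends on."""
    return list(TopologicalSorter(dependency_graph(formulas)).static_order())


def has_circular_reference(formulas):
    """True if the formulas depend on each other in a loop."""
    try:
        evaluation_order(formulas)
        return False
    except CycleError:
        return True


formulas = {"C1": "=A1+B1", "D1": "=C1*2", "E1": "=SUM(C1,D1)"}
print(evaluation_order(formulas))  # inputs first; A1 and B1 may appear in either order
print(has_circular_reference({"A1": "=B1+1", "B1": "=A1*2"}))  # True
```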
- class sheetwise.CompressionVisualizer(enable_interactive=True)
Visualization tools for spreadsheet compression analysis, including interactive HTML reports.
- create_data_density_heatmap(df, title='Data Density Heatmap')
Generate a heatmap showing data density in the spreadsheet.
- Return type:
Figure
- generate_html_report(original_df, compressed_result)
Legacy static report (retained for backward compatibility).
- Return type:
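A data-density heatmap needs a matrix recording how full each region of the sheet is. The sketch below computes such a matrix in plain Python by splitting the grid into square tiles of block rows and block columns. The tiling scheme, the block size, and the name density_grid are assumptions, and the plotting step that create_data_density_heatmap performs is omitted.

```python
def density_grid(grid, block=2):
    """Hypothetical sketch: fraction of non-empty cells in each block x block tile."""
    n_rows = len(grid)
    n_cols = max((len(row) for row in grid), default=0)
    out = []
    for r0 in range(0, n_rows, block):
        row_out = []
        for c0 in range(0, n_cols, block):
            # Short (ragged) rows are treated as padded with empty cells.
            cells = [
                grid[r][c] if c < len(grid[r]) else None
                for r in range(r0, min(r0 + block, n_rows))
                for c in range(c0, min(c0 + block, n_cols))
            ]
            filled = sum(v not in (None, "") for v in cells)
            row_out.append(filled / len(cells))
        out.append(row_out)
    return out


sheet = [
    [1, 2, None, None],
    [3, None, None, None],
    [None, None, None, "x"],
    [None, None, "y", "z"],
]
print(density_grid(sheet))  # [[0.75, 0.0], [0.0, 0.75]]
```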
- class sheetwise.WorkbookManager
Manages multi-sheet workbooks and cross-sheet references.
This class provides utilities to:
1. Load and process entire Excel workbooks with multiple sheets
2. Handle cross-sheet references and relationships
3. Compress entire workbooks
4. Identify inter-sheet relationships
- encode_workbook_for_llm(compression_results)
Generate LLM-ready encoding of the entire workbook.
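Handling cross-sheet references starts with finding them. In Excel formulas, a reference to another sheet has the form Sheet!A1; sheet names containing spaces or punctuation are wrapped in single quotes, and any apostrophe inside the name is doubled. The regex and cross_sheet_refs below are a hypothetical sketch of that detection, not WorkbookManager's implementation, and they cover only single cells and simple A1:B2 ranges.

```python
import re

# Hypothetical pattern: 'Quoted Name'!ref or BareName!ref, where ref is a cell or A1:B2 range.
CROSS_SHEET_REF = re.compile(
    r"(?:'((?:[^']|'')+)'|([A-Za-z_][\w.]*))!"
    r"(\$?[A-Z]+\$?[0-9]+(?::\$?[A-Z]+\$?[0-9]+)?)"
)


def cross_sheet_refs(formula):
    """Return (sheet_name, reference) pairs for each cross-sheet reference."""
    refs = []
    for quoted, bare, ref in CROSS_SHEET_REF.findall(formula):
        # Excel doubles apostrophes inside quoted sheet names; undo that.
        sheet = quoted.replace("''", "'") if quoted else bare
        refs.append((sheet, ref))
    return refs


print(cross_sheet_refs("=SUM('Q1 Sales'!B2:B9)+Inputs!$C$3"))
# [('Q1 Sales', 'B2:B9'), ('Inputs', '$C$3')]
```

Collecting these pairs across every sheet gives the edges of an inter-sheet relationship graph; references without a sheet prefix stay local and are ignored.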
- class sheetwise.SmartTableDetector(min_table_size=2, max_empty_ratio=0.7, header_detection=True)
Advanced table detection with header identification and table classification.
This class provides utilities to:
1. Detect multiple tables in spreadsheets
2. Identify table headers and structures
3. Classify tables by type
4. Handle complex table layouts
- __init__(min_table_size=2, max_empty_ratio=0.7, header_detection=True)
Initialize the detector.
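The constructor arguments suggest a simple acceptance test for candidate regions. Assuming that min_table_size is the minimum number of rows and columns and that max_empty_ratio is the largest tolerated fraction of empty cells (an interpretation, not documented behavior), the check might look like this; is_table_candidate is a hypothetical name.

```python
def is_table_candidate(region, min_table_size=2, max_empty_ratio=0.7):
    """Hypothetical check: accept a block that is large enough and not too sparse."""
    n_rows = len(region)
    n_cols = max((len(row) for row in region), default=0)
    # Assumed meaning of min_table_size: minimum rows AND minimum columns.
    if n_rows < min_table_size or n_cols < min_table_size:
        return False
    total = n_rows * n_cols  # short rows count as padded with empty cells
    filled = sum(1 for row in region for v in row if v not in (None, ""))
    # Assumed meaning of max_empty_ratio: largest allowed share of empty cells.
    return (total - filled) / total <= max_empty_ratio


print(is_table_candidate([["Name", "Qty"], ["Pen", 3], ["Ink", None]]))  # True
print(is_table_candidate([["x", None, None], [None, None, None], [None, None, None]]))  # False
print(is_table_candidate([["only"]]))  # False
```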
- class sheetwise.TableType(value)
Types of tables that can be detected.
- DATA_TABLE = 'data_table'
- PIVOT_TABLE = 'pivot_table'
- MATRIX = 'matrix'
- FORM = 'form'
- MIXED = 'mixed'
- SPARSE = 'sparse'
- class sheetwise.EnhancedTableRegion(top_left, bottom_right, rows, cols, confidence=1.0, table_type=TableType.DATA_TABLE, has_headers=False, header_rows=None, header_cols=None)
Extended table region with additional metadata.
- __init__(top_left, bottom_right, rows, cols, confidence=1.0, table_type=TableType.DATA_TABLE, has_headers=False, header_rows=None, header_cols=None)
Modules
- Chain of Spreadsheet reasoning implementation (Offline Edition).
- Data type classification utilities for spreadsheet cells.
- Command-line interface for SheetWise.
- Main compression framework combining all modules.
- Main SpreadsheetLLM class integrating all components (Offline Edition).
- Data types and structures used throughout the SpreadsheetLLM package.
- Table detection utilities.
- Encoding utilities for spreadsheet data.
- Compression modules for the SpreadsheetLLM framework (Enhanced).
- Formula parsing and analysis utilities for spreadsheets.
- Advanced table detection and classification utilities.
- Utility functions and demo data creation.
- Visualization utilities for spreadsheet compression.
- Multi-sheet workbook handling and cross-sheet reference management.