sheetwise.smart_tables module

Advanced table detection and classification utilities.

class sheetwise.smart_tables.TableType(value)

Bases: Enum

Types of tables that can be detected.

DATA_TABLE = 'data_table'
PIVOT_TABLE = 'pivot_table'
MATRIX = 'matrix'
FORM = 'form'
MIXED = 'mixed'
SPARSE = 'sparse'
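
Because TableType is a standard Enum, a member can be recovered from its string value and compared by identity. The sketch below uses a local mirror of the documented members so that it runs without sheetwise installed; with the library available, TableType would be imported from sheetwise.smart_tables instead. The helper parse_table_type is hypothetical and not part of the module.

```python
from enum import Enum


class TableType(Enum):
    """Local mirror of the documented members, for illustration only."""
    DATA_TABLE = "data_table"
    PIVOT_TABLE = "pivot_table"
    MATRIX = "matrix"
    FORM = "form"
    MIXED = "mixed"
    SPARSE = "sparse"


def parse_table_type(raw: str) -> TableType:
    """Hypothetical helper: map a stored string back to an enum member."""
    # Enum lookup by value raises ValueError for unknown strings.
    return TableType(raw.strip().lower())


kind = parse_table_type(" Pivot_Table ")
all_values = [member.value for member in TableType]
```

Looking a member up by value is handy when a table type has been serialized, for example in JSON, and needs to be turned back into an enum member.
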
class sheetwise.smart_tables.EnhancedTableRegion(top_left, bottom_right, rows, cols, confidence=1.0, table_type=TableType.DATA_TABLE, has_headers=False, header_rows=None, header_cols=None)

Bases: TableRegion

Extended table region with additional metadata.

table_type: TableType = 'data_table'
has_headers: bool = False
header_rows: List[int] = None
header_cols: List[int] = None
confidence: float = 1.0
property start_row: int
property end_row: int
property start_col: int
property end_col: int
__init__(top_left, bottom_right, rows, cols, confidence=1.0, table_type=TableType.DATA_TABLE, has_headers=False, header_rows=None, header_cols=None)
top_left: str
bottom_right: str
rows: range
cols: range
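
To make the shape of a region concrete, the following sketch defines a stand-in dataclass with the documented fields. It is not the library class. In particular, how start_row, end_row, start_col and end_col are computed is not stated here; the sketch assumes they are inclusive bounds derived from the rows and cols ranges, with 0-based indices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegionSketch:
    """Stand-in carrying the documented fields; not the library class."""
    top_left: str
    bottom_right: str
    rows: range
    cols: range
    confidence: float = 1.0
    table_type: str = "data_table"
    has_headers: bool = False
    header_rows: Optional[List[int]] = None
    header_cols: Optional[List[int]] = None

    # ASSUMPTION: bounds come from the ranges, and the end is inclusive.
    @property
    def start_row(self) -> int:
        return self.rows.start

    @property
    def end_row(self) -> int:
        return self.rows.stop - 1

    @property
    def start_col(self) -> int:
        return self.cols.start

    @property
    def end_col(self) -> int:
        return self.cols.stop - 1


# A region spanning B2:D5, assuming 0-based row and column indices.
region = RegionSketch(
    "B2", "D5", rows=range(1, 5), cols=range(1, 4),
    has_headers=True, header_rows=[1],
)
bounds = (region.start_row, region.end_row, region.start_col, region.end_col)
```
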
class sheetwise.smart_tables.SmartTableDetector(min_table_size=2, max_empty_ratio=0.7, header_detection=True)

Bases: object

Advanced table detection with enhanced capabilities.

This class provides utilities to:

  1. Detect multiple tables in spreadsheets
  2. Identify table headers and structures
  3. Classify tables by type
  4. Handle complex table layouts

__init__(min_table_size=2, max_empty_ratio=0.7, header_detection=True)

Initialize the detector.

Parameters:
  • min_table_size (int) – Minimum number of rows and columns a region must have to be considered a table

  • max_empty_ratio (float) – Maximum ratio of empty cells allowed in a table

  • header_detection (bool) – Whether to detect headers
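
As a rough illustration of what the two numeric thresholds express, the sketch below applies them to a plain list-of-lists grid. How the detector applies them internally is not documented here; the functions empty_ratio and passes_thresholds are hypothetical, and the rule (both dimensions at least min_table_size, empty share at most max_empty_ratio) is an assumption.

```python
from typing import Optional, Sequence

Grid = Sequence[Sequence[Optional[object]]]


def empty_ratio(block: Grid) -> float:
    """Share of cells that are None or an empty string."""
    cells = [cell for row in block for cell in row]
    if not cells:
        return 1.0  # treat a block without cells as fully empty
    empty = sum(1 for cell in cells if cell is None or cell == "")
    return empty / len(cells)


def passes_thresholds(
    block: Grid, min_table_size: int = 2, max_empty_ratio: float = 0.7
) -> bool:
    """ASSUMED rule: large enough in both dimensions and not too sparse."""
    n_rows = len(block)
    n_cols = max((len(row) for row in block), default=0)
    if n_rows < min_table_size or n_cols < min_table_size:
        return False
    return empty_ratio(block) <= max_empty_ratio


dense = [["id", "name"], [1, "a"], [2, None]]            # 1 of 6 cells empty
sparse = [[None, None, None], [None, "x", None], [None, None, None]]  # 8 of 9
single_row = [["a", "b", "c"]]

results = (
    passes_thresholds(dense),
    passes_thresholds(sparse),
    passes_thresholds(single_row),
)
```

Under these assumptions, raising max_empty_ratio admits sparser blocks, while raising min_table_size filters out small fragments such as stray labels.
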

detect_tables(df)

Detect multiple tables in a spreadsheet.

Parameters:

df (DataFrame) – Input DataFrame

Return type:

List[EnhancedTableRegion]

Returns:

List of detected enhanced table regions
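
One way to consume the returned list is to filter it by confidence and print a short summary per region. The sketch below relies only on the documented attributes (top_left, bottom_right, table_type, has_headers, confidence). The helper describe_regions and its min_confidence cut-off are hypothetical, and the sample regions are stand-in objects rather than real detector output.

```python
from types import SimpleNamespace
from typing import Iterable, List


def describe_regions(regions: Iterable, min_confidence: float = 0.5) -> List[str]:
    """Hypothetical helper: one summary line per sufficiently confident region."""
    lines = []
    for region in regions:
        if region.confidence < min_confidence:
            continue
        # Accept either an Enum member (use its value) or a plain string.
        kind = getattr(region.table_type, "value", region.table_type)
        header_note = "with headers" if region.has_headers else "no headers"
        lines.append(
            f"{region.top_left}:{region.bottom_right} {kind} "
            f"({header_note}, confidence {region.confidence:.2f})"
        )
    return lines


# Stand-ins for illustration. With the library installed, the list would
# come from SmartTableDetector().detect_tables(df) instead.
fake_regions = [
    SimpleNamespace(top_left="A1", bottom_right="C10", table_type="data_table",
                    has_headers=True, confidence=0.92),
    SimpleNamespace(top_left="E1", bottom_right="F3", table_type="form",
                    has_headers=False, confidence=0.30),
]
report = describe_regions(fake_regions)
```
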

extract_tables_to_dataframes(df)

Extract all tables from a spreadsheet into separate dataframes.

Parameters:

df (DataFrame) – Input DataFrame

Return type:

Dict[str, DataFrame]

Returns:

Dictionary mapping table names to extracted DataFrames
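
Since the result is an ordinary dictionary, it can be iterated like any other mapping. The sketch below picks the table with the most cells using each frame's shape attribute. The helper largest_table is hypothetical, the key names are made up (the naming scheme used by the library is not stated here), and the values are stand-in objects exposing only shape, as a pandas DataFrame does.

```python
from types import SimpleNamespace
from typing import Mapping, Optional


def largest_table(tables: Mapping[str, object]) -> Optional[str]:
    """Hypothetical helper: name of the table with the most cells, or None."""
    best_name = None
    best_cells = -1
    for name, frame in tables.items():
        n_rows, n_cols = frame.shape  # DataFrame.shape is (rows, columns)
        if n_rows * n_cols > best_cells:
            best_name, best_cells = name, n_rows * n_cols
    return best_name


# Stand-ins for illustration. With the library installed, the mapping would
# come from SmartTableDetector().extract_tables_to_dataframes(df) instead.
fake_tables = {
    "first": SimpleNamespace(shape=(10, 3)),
    "second": SimpleNamespace(shape=(4, 12)),
    "third": SimpleNamespace(shape=(2, 2)),
}
biggest = largest_table(fake_tables)
```
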