White Papers

Semi-structured Document Feature Extraction

This white paper examines the difficulty organizations have in processing semi-structured documents, particularly spreadsheets, whose formats are diverse and unstandardized. Spreadsheets often contain defects that go unnoticed by end users but disrupt automated processing. To address this, the paper proposes a method that classifies spreadsheet elements and produces a structured, JSON-like representation.
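To make the idea concrete, here is a minimal Python sketch of classifying spreadsheet rows and emitting a JSON-like structure. It is an illustration only, not the white paper's method: the categories ("note", "header", "data", "blank"), the heuristics, and the function names are all assumptions chosen for the example.

```python
import json


def cell_text(value):
    """Return the cell as stripped text; None becomes an empty string."""
    return "" if value is None else str(value).strip()


def classify_cell(value):
    """Label a cell as 'empty', 'number', or 'text' (hypothetical categories)."""
    text = cell_text(value)
    if text == "":
        return "empty"
    try:
        float(text.replace(",", ""))  # tolerate thousands separators
        return "number"
    except ValueError:
        return "text"


def classify_row(row):
    """Label a row as 'blank', 'note', 'header', or 'data' (assumed heuristic)."""
    kinds = [classify_cell(c) for c in row]
    if all(k == "empty" for k in kinds):
        return "blank"
    if "number" not in kinds:
        # A single text cell is treated as a title or note, several as a header.
        return "header" if kinds.count("text") > 1 else "note"
    return "data"


def sheet_to_structure(rows):
    """Build a JSON-like dict from raw rows; only the first header row is used."""
    result = {"notes": [], "header": None, "records": []}
    for row in rows:
        kind = classify_row(row)
        if kind == "note":
            result["notes"].extend(cell_text(c) for c in row if cell_text(c))
        elif kind == "header" and result["header"] is None:
            result["header"] = [cell_text(c) for c in row]
        elif kind == "data" and result["header"] is not None:
            result["records"].append(dict(zip(result["header"], row)))
    return result


def to_json(rows):
    """Serialize the classified sheet as a JSON string."""
    return json.dumps(sheet_to_structure(rows), indent=2)
```

Calling `to_json` on a list of rows (each a list of cell values) returns the structured form as text. The heuristic is deliberately naive: a data row that contains no numbers would be mistaken for a header, which is the kind of defect a real classifier has to handle.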

Click here to open the white paper

Table Layout Regular Expression - Layex

Tables are a common way to organize and present information, from scientific findings to the spreadsheets used throughout corporate environments. Extracting the data they contain is therefore an important step in any data pipeline. This white paper introduces a novel mechanism for extracting data from tables, particularly those with intricate layouts. It constructs a regular language tailored to tabular representation, with the aim of making data extraction more efficient and more accurate.
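The general idea of describing a table layout with a regular language can be sketched in Python. This is not Layex syntax and not the white paper's implementation: it uses Python's standard `re` module as a stand-in, and the one-letter row symbols (`t` for a title or note row, `h` for a header row, `d` for a data row, `e` for an empty row) are assumptions made for the example.

```python
import re


def row_symbol(row):
    """Map a row to one symbol: 'e' empty, 'd' data, 'h' header, 't' title/note."""
    cells = [c for c in row if c is not None and str(c).strip() != ""]
    if not cells:
        return "e"
    # Assumed heuristic: any numeric cell marks the row as data.
    if any(isinstance(c, (int, float)) and not isinstance(c, bool) for c in cells):
        return "d"
    return "h" if len(cells) > 1 else "t"


# Hypothetical layout: optional titles, blank rows, one header,
# one or more data rows, blank rows, optional footer notes.
LAYOUT = re.compile(r"(?P<title>t*)e*(?P<header>h)(?P<data>d+)e*(?P<footer>t*)")


def split_table(rows):
    """Return the rows for each layout part, or None if the layout does not match."""
    signature = "".join(row_symbol(r) for r in rows)
    match = LAYOUT.fullmatch(signature)
    if match is None:
        return None
    # Each symbol stands for one row, so match offsets are row indices.
    return {
        name: rows[match.start(name):match.end(name)]
        for name in ("title", "header", "data", "footer")
    }
```

Because every row is reduced to a single symbol, the positions of the matched groups map directly back to row indices, so one pattern both validates the layout and locates the header and data regions.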

Click here to open the white paper