Accurate document processing

This case study describes a comprehensive document management solution implemented for a mid-sized financial services firm. The primary goal was to improve efficiency, accuracy, and security in document handling, including the splitting, classification, and extraction of critical data.

Background

For the last 15 years, the company had struggled to process a high volume of diverse documents such as claim forms, birth certificates, expense bills, and prescriptions. Manual processing was time-consuming, error-prone, and a bottleneck in customer service.

Objective

To automate and streamline the document handling process, reducing processing time, ensuring data accuracy, and maintaining high standards of data privacy and compliance.

Challenges

Volume and Variety: Millions of documents of varying file sizes, spanning around 140 document types, each requiring different processing methods.
Accuracy and Efficiency: Ensuring high accuracy in data extraction and reducing turnaround time.
Compliance and Security: Meeting stringent data privacy standards and regulatory compliance.

Evaluation of Existing Tools

Azure Vision API: Good for initial document analysis but limited in handling complex data structures.
AWS Textract: Effective in text extraction but faced challenges with non-standard layouts.
Mindee: Excellent at specific data extraction, but limited in reading checkboxes and tables. In addition, its datastore location and SOC audit conformity were unclear.

Google Document AI

Why Google Document AI?

Versatility: Superior in handling a variety of document formats.
Accuracy: High accuracy in data extraction, even from complex layouts.
Compliance: Strong adherence to data privacy regulations and compliance standards.

Implementation Strategy

Initial Setup and Integration: Integrated Google Document AI into the existing IT infrastructure and configured the API to align with specific document types and data extraction needs.
Document Splitting: Automated the segmentation of multi-page documents into individual units based on content and layout.
Document Classification: Trained the AI model to categorize documents into predefined classes such as invoices and contracts.
Data Extraction: Extracted key information such as dates, amounts, and client details with high precision, using extraction templates customized for specific document types.
Human-in-the-Loop (HITL) Setup: Incorporated a HITL workflow for quality assurance and exception handling, and trained staff to review and validate AI-processed data, ensuring accuracy and compliance.
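
The splitting and HITL steps above can be illustrated with a minimal sketch. It is not the firm's implementation and does not call the Google Document AI API. It assumes that an upstream model has already produced a predicted label per page and a confidence score per extracted field. The names ProcessedDocument, Field, group_pages, needs_review and the 0.85 threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical cut-off; the case study does not state the value used in production.
REVIEW_THRESHOLD = 0.85


@dataclass
class Field:
    name: str          # e.g. "claim_date"
    value: str         # text as extracted by the model
    confidence: float  # model confidence in [0, 1]


@dataclass
class ProcessedDocument:
    doc_id: str
    doc_type: str           # predicted document class
    type_confidence: float  # confidence of the classification
    fields: List[Field]


def group_pages(page_labels: List[str]) -> List[Tuple[str, int, int]]:
    """Split a multi-page file into runs of consecutive pages sharing a label.

    Returns (label, first_page, last_page) tuples with 0-based inclusive indices.
    Simplification: two adjacent documents of the same type are merged into one run.
    """
    segments: List[Tuple[str, int, int]] = []
    for i, label in enumerate(page_labels):
        if segments and segments[-1][0] == label:
            prev_label, start, _ = segments[-1]
            segments[-1] = (prev_label, start, i)  # extend the current run
        else:
            segments.append((label, i, i))         # start a new run
    return segments


def needs_review(doc: ProcessedDocument,
                 threshold: float = REVIEW_THRESHOLD) -> List[str]:
    """Return the items a human should check; an empty list means auto-accept."""
    reasons: List[str] = []
    if doc.type_confidence < threshold:
        reasons.append("doc_type")
    reasons.extend(f.name for f in doc.fields if f.confidence < threshold)
    return reasons


if __name__ == "__main__":
    labels = ["claim_form", "claim_form", "prescription",
              "expense_bill", "expense_bill"]
    print(group_pages(labels))
    # [('claim_form', 0, 1), ('prescription', 2, 2), ('expense_bill', 3, 4)]

    doc = ProcessedDocument(
        doc_id="doc-001",
        doc_type="claim_form",
        type_confidence=0.97,
        fields=[Field("claim_date", "2023-04-01", 0.99),
                Field("amount", "1,250.00", 0.62)],
    )
    print(needs_review(doc))
    # ['amount']
```

Routing on a per-field threshold means reviewers see only the uncertain fields instead of the whole document. That is one plausible way for fewer documents to reach the HITL queue as model confidence improves.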

Outcome

Processing Time: Reduced document processing time by 70%, significantly increasing operational efficiency.
Accuracy: Achieved 90% accuracy in data extraction, minimizing the risk of errors.
Operational Costs: Lowered, because more documents are processed in less time and fewer documents fall into the HITL pipeline.

Lessons Learned

The importance of choosing a tool that aligns with specific business needs and compliance requirements.
The necessity of integrating HITL in AI-driven workflows to ensure quality and compliance.
The value of continuous training and model refinement to adapt to evolving document formats and business needs.