
Case Study: Automating Data Entry in On-Prem CRM with AWS Services

Consultative and engineering support for automating legacy data entry processes in an on-premise CRM.

  • S3
  • SQL
  • Lambda
  • Textract
  • Step Functions

Project Background

The client, a mid-sized firm in the financial services sector, was struggling with manual data entry processes involving scanned documents, such as quarterly accounts submissions, contracts, and compliance forms. Key data points from these documents needed to be extracted and stored in an on-prem CRM for further processing, analysis, and decision-making.

Previously, the client relied on a combination of manual entry and rudimentary OCR (Optical Character Recognition) tools to extract data from scanned documents. The manual process led to several operational inefficiencies, including delays in processing time, human errors, and significant overhead costs. Additionally, the existing OCR tools had limited accuracy and did not integrate well with their legacy on-premise CRM, leading to further workflow disruptions.



Research & Development

The primary objectives of the client were:

  1. Automating data extraction from scanned documents and importing it into their CRM.
  2. Reducing human error and shortening data processing times.
  3. Ensuring scalability to handle increasing document volumes.
  4. Providing a cost-effective solution that could scale without the need for significant infrastructure investments or specialised personnel on staff.

During the research phase, we conducted an in-depth analysis of the client’s existing data entry workflow. Key bottlenecks were identified, including slow OCR performance, limited CRM integration, and the need for significant manual intervention. The client also expressed concerns about the scalability of their current process as the volume of incoming documents increased.

To address these challenges, we recommended AWS’s serverless offerings for scalable and more accurate OCR processing. After evaluating various options, we proposed the following:

  1. Amazon Textract for OCR processing due to its ability to automatically extract text, forms, and tables from scanned documents with high accuracy.
  2. AWS Lambda for serverless execution of OCR tasks on a pay-per-use basis, with no servers to manage.
  3. Amazon S3 for storage of the scanned documents before and after OCR processing.
  4. AWS Step Functions to orchestrate the OCR workflow.
  5. Proprietary APIs to integrate with AWS and automatically push processed data into the client's on-premise CRM.

These services were selected because they integrate natively with one another within AWS and follow a pay-as-you-go pricing model, so the client paid only for the resources they used.

Architecture

The solution was designed to be entirely serverless, ensuring that it could scale effortlessly with the client’s growing document processing needs. The architecture included the following key components:

  1. Amazon S3: Scanned documents are uploaded to an S3 bucket, triggering the processing workflow.
  2. AWS Lambda: A Lambda function is triggered by the upload of a new document to S3. This function calls Amazon Textract to perform OCR on the document.
  3. Amazon Textract: Textract processes the document and extracts the required data points, including both structured and unstructured data such as names, addresses, and financial amounts.
  4. AWS Step Functions: Step Functions coordinate the workflow, handling failures, retries, and orchestrating the process of calling Textract, validating the results, and pushing the data to the client's CRM via a proprietary REST API.
  5. Proprietary APIs: The extracted data is sent via REST API calls to an on-prem Windows Server, where it is mapped to the appropriate fields in the client’s CRM system.
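
The workflow described above can be sketched as a Step Functions definition in Amazon States Language, built here as a Python dictionary. This is a minimal illustration rather than the client's actual state machine: the Lambda function names, the account ID and region in the ARNs, the retry settings, and the state names are all assumptions. It also assumes the state machine is started by the S3-triggered Lambda.

```python
import json

# Hypothetical ARN prefix (region and account ID are placeholders).
ARN_PREFIX = "arn:aws:lambda:eu-west-1:123456789012:function:"


def lambda_task(function_name, next_state=None):
    """Build a Task state that invokes a Lambda function with retries."""
    state = {
        "Type": "Task",
        "Resource": ARN_PREFIX + function_name,
        # Retry transient failures with exponential backoff (assumed values).
        "Retry": [{
            "ErrorEquals": ["States.TaskFailed", "States.Timeout"],
            "IntervalSeconds": 5,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }],
        # Errors that survive the retries are routed to a Fail state.
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "ProcessingFailed"}],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


# Hypothetical function names for the three workflow steps.
STATE_MACHINE = {
    "Comment": "OCR workflow sketch: extract, validate, push to CRM",
    "StartAt": "ExtractWithTextract",
    "States": {
        "ExtractWithTextract": lambda_task("extract-document", "ValidateFields"),
        "ValidateFields": lambda_task("validate-fields", "PushToCrm"),
        "PushToCrm": lambda_task("push-to-crm"),
        "ProcessingFailed": {
            "Type": "Fail",
            "Error": "DocumentProcessingFailed",
            "Cause": "A workflow step failed after all retries",
        },
    },
}

# JSON text that could be supplied as the state machine definition.
definition_json = json.dumps(STATE_MACHINE, indent=2)
```

Keeping retries and the failure route in the state machine, rather than inside each function, means the individual Lambda functions can stay small and focused on a single step.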

This architecture provided several benefits:

  1. Scalability: The serverless design allowed the client to process a large number of documents simultaneously without any performance degradation.
  2. Cost Efficiency: Since the chosen AWS services are pay-per-use, the client only incurred costs when documents were processed, eliminating the need for upfront infrastructure investment.
  3. Automation: The end-to-end automation reduced manual interventions and errors, and significantly shortened processing times.
  4. Security: AWS IAM roles, encryption, and controlled access to S3 buckets kept the data of the client and its customers segregated, secure, and compliant with industry standards.
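
As an illustration of the controlled access mentioned in the security point, a least-privilege IAM policy for the processing Lambda role might look like the sketch below. The bucket names and the state machine ARN are hypothetical, and the exact set of permissions the project used is not stated in this case study.

```python
import json

# Hypothetical resource names, used only for illustration.
INCOMING_BUCKET = "example-scans-incoming"
PROCESSED_BUCKET = "example-scans-processed"
STATE_MACHINE_ARN = (
    "arn:aws:states:eu-west-1:123456789012:stateMachine:example-ocr-workflow"
)

LAMBDA_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Read scanned documents from the incoming bucket only.
            "Sid": "ReadIncomingScans",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{INCOMING_BUCKET}/*",
        },
        {   # Write OCR output to the processed bucket only.
            "Sid": "WriteProcessedOutput",
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": f"arn:aws:s3:::{PROCESSED_BUCKET}/*",
        },
        {   # Textract analysis is granted on "*" because it is not tied to a named resource.
            "Sid": "RunTextract",
            "Effect": "Allow",
            "Action": ["textract:AnalyzeDocument"],
            "Resource": "*",
        },
        {   # Start the orchestration workflow, and nothing else.
            "Sid": "StartWorkflow",
            "Effect": "Allow",
            "Action": ["states:StartExecution"],
            "Resource": STATE_MACHINE_ARN,
        },
    ],
}

policy_json = json.dumps(LAMBDA_POLICY, indent=2)
```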

Deployment

The deployment was divided into several key stages, ensuring a smooth rollout:

  1. Infrastructure Setup:
    • S3 buckets were configured to store incoming scanned documents and processed outputs. Permissions were set using AWS IAM to ensure secure access.
    • The client’s proprietary API was configured to accept incoming data from AWS Lambda with appropriate OAuth authentication.
  2. Lambda Functions and Step Functions:
    • AWS Lambda functions were written to trigger OCR processing when a new document was uploaded to S3. Additional functions handled error checking, validation, and communication with the client's on-prem CRM.
    • AWS Step Functions were used to orchestrate the entire workflow, ensuring proper sequencing and error handling.
  3. On-Prem CRM Integration:
    • A custom API integration was developed to map the extracted OCR data to CRM objects.
    • Data validation scripts ensured that only accurate and relevant data was pushed into the CRM, avoiding data quality issues.
  4. Testing and Validation:
    • Extensive testing was conducted using a wide variety of scanned documents to ensure that Textract handled various document types (forms, tables, unstructured documents) effectively.
    • The solution was tested for performance and scalability by processing large batches of documents simultaneously.
  5. CI/CD Pipeline:
    • A CI/CD pipeline was established using GitHub Actions to automate deployment and ensure ongoing updates could be rolled out with minimal disruption.
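
The Lambda and CRM integration stages above can be sketched as follows. This is a simplified illustration under stated assumptions, not the delivered code: the form labels and CRM field names in CRM_FIELD_MAP are hypothetical, only WORD children are read from the Textract output, and the synchronous AnalyzeDocument call is used, whereas multi-page documents would typically need Textract's asynchronous API instead. The push to the client's REST API is omitted.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def _text(block, by_id):
    """Join the WORD children of a Textract block into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def form_fields(blocks):
    """Map each form key to its value using Textract FORMS blocks."""
    by_id = {b["Id"]: b for b in blocks}
    fields = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        values = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                values += [_text(by_id[i], by_id) for i in rel["Ids"]]
        # Strip the trailing colon that form labels often carry.
        fields[_text(block, by_id).rstrip(":").strip()] = " ".join(values)
    return fields


# Hypothetical mapping from form labels to CRM field names.
CRM_FIELD_MAP = {"Company Name": "account_name", "Total Amount": "reported_total"}


def to_crm_record(fields):
    """Reject incomplete extractions, then rename fields for the CRM."""
    missing = [label for label in CRM_FIELD_MAP if not fields.get(label)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return {CRM_FIELD_MAP[label]: fields[label] for label in CRM_FIELD_MAP}


def handler(event, context):
    """Lambda entry point: run Textract on each uploaded document."""
    import boto3  # imported lazily so the helpers can be tested offline

    textract = boto3.client("textract")
    records = []
    for bucket, key in uploaded_objects(event):
        response = textract.analyze_document(
            Document={"S3Object": {"Bucket": bucket, "Name": key}},
            FeatureTypes=["FORMS", "TABLES"],
        )
        records.append(to_crm_record(form_fields(response["Blocks"])))
    return records
```

Keeping the parsing and validation helpers free of AWS calls makes them straightforward to unit test against saved Textract responses, which suits the testing stage described above.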

Conclusion

The implementation of the serverless OCR data processing solution provided significant business value to the client. By integrating AWS's serverless services with the client's on-prem CRM, the client was able to automate a previously manual and error-prone process, resulting in:

  1. 80% reduction in processing time: Documents that previously took hours to process manually were now processed in minutes.
  2. Significant cost savings: The client experienced a 35% reduction in operational costs, primarily due to the elimination of manual data entry and the pay-per-use nature of the application.
  3. Improved data accuracy: The solution reduced human errors by automating the data extraction and entry process.
  4. Scalability: The client’s document processing capability scaled effortlessly to meet growing demands without requiring additional infrastructure.

The new architecture resolved the client’s pain points by delivering a solution that not only optimized their current workflow but also positioned them for future growth. The use of serverless technology allowed the client to focus on their core business, confident that their data processing system could grow with them.
