Worked at a Canadian startup developing smart solutions to improve healthcare administrative processes.
Overview
Developed a Document AI engine that extracts information from medical documents, simplifying tasks
for healthcare staff. The engine combines a multimodal transformer (an image encoder-decoder model) for
visual extraction with a post-processing step that uses the OpenAI API for cleaning, validation, and reformatting.
Contributions
- Design and implement entire AI engine system in Python
- Label datasets
- Train model on AWS EC2
- Develop evaluation procedure for engine performance
- Develop post processing, including integration with OpenAI API
- Develop procedure for detecting duplicate files
- Deploy AI engine & backend on AWS (EC2, SQS, S3, Lambda, DynamoDB)
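As an illustration of the duplicate-file detection item above, here is a minimal sketch that flags byte-identical files by content hash. This is an assumption about one possible approach, not a description of the actual procedure used in the project; the function names `file_digest` and `find_duplicates` are hypothetical. Exact hashing only catches identical uploads, so re-scanned or re-compressed copies of the same document would need a fuzzier comparison.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(paths):
    """Return (duplicate_path, first_seen_path) pairs for byte-identical files.

    Hypothetical helper: the first file seen with a given digest is treated
    as the original, and later files with the same digest as duplicates.
    """
    first_seen = {}
    duplicates = []
    for path in paths:
        digest = file_digest(path)
        if digest in first_seen:
            duplicates.append((path, first_seen[digest]))
        else:
            first_seen[digest] = path
    return duplicates
```

In a deployment like the one listed above, the digest could plausibly serve as a lookup key in a store such as DynamoDB so that repeat uploads are caught before processing, though that wiring is also an assumption.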
System Diagrams
AI Engine:

Backend:
Results
The AI engine is robust, effectively handling documents with high noise levels, handwriting,
and information scattered across multiple pages.
It achieves an accuracy of over 85% on
client documents, a significant improvement over the previous version's ~60%, and surpasses
general-purpose models like GPT-4o, which performs at ~75%.
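The write-up does not define how accuracy is measured. As one plausible reading, the sketch below computes field-level accuracy: the fraction of expected fields whose extracted value matches the labelled value after light normalisation. The metric definition, the function name `field_accuracy`, and the example field names are all assumptions made for illustration.

```python
def field_accuracy(predicted, expected):
    """Fraction of expected fields whose predicted value matches.

    `predicted` and `expected` are parallel lists of dicts, one dict per
    document. Assumed metric: a field counts as correct when it is present
    and equal after collapsing whitespace and ignoring case.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")

    def normalise(value):
        return " ".join(str(value).split()).casefold()

    correct = 0
    total = 0
    for pred_doc, gold_doc in zip(predicted, expected):
        for field, gold_value in gold_doc.items():
            total += 1
            if field in pred_doc and normalise(pred_doc[field]) == normalise(gold_value):
                correct += 1
    return correct / total if total else 0.0
```

A stricter or looser matching rule (for example date-aware comparison) would change the reported number, so any comparison across models only holds when the same rule is applied to each.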
The AI engine takes ~6s to process a single-page file, plus ~4s for each additional
page. End-to-end processing time (from file upload to the output appearing on the UI) is ~10s for a
single-page file.
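The per-page figures above imply a simple linear estimate of engine time. The sketch below encodes that arithmetic; the function name `estimate_engine_seconds` is hypothetical, and the linear model is an extrapolation from the two approximate figures quoted, not a measured curve.

```python
def estimate_engine_seconds(num_pages, first_page_s=6.0, extra_page_s=4.0):
    """Rough engine processing time in seconds for a file of `num_pages` pages.

    Defaults come from the approximate figures quoted above (~6s for the
    first page, ~4s for each additional page); linearity is assumed.
    """
    if num_pages < 1:
        raise ValueError("num_pages must be at least 1")
    return first_page_s + extra_page_s * (num_pages - 1)
```

Under these assumptions a 3-page file comes out at about 14s of engine time. The ~10s end-to-end figure for a single page suggests roughly 4s of overhead outside the engine, which this estimate does not include.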