AI/ML · 2025 · Featured
MetaExtract
Extract 45,000+ metadata fields from any document
Python · FastAPI · React · Docker · Tesseract OCR · PostgreSQL
Problem
Organizations need to extract structured metadata from thousands of heterogeneous documents — invoices, contracts, medical records, research papers — each with different formats and field layouts.
Approach
Built a FastAPI backend with a modular extraction pipeline. Each document type gets a dedicated parser that combines regex patterns, layout analysis, and ML classification. Results are normalized into a unified schema and served through a REST API, with a React dashboard for monitoring.
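To illustrate the per-type parser idea, here is a minimal, stdlib-only sketch. It is not the project's code. It assumes a decorator-based parser registry, uses a keyword check in `classify()` as a stand-in for the ML classifier, and leaves out layout analysis entirely. `DocumentMetadata`, `register`, `parse_invoice`, `normalize`, the field names, and the regexes are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict

# Hypothetical unified schema: every parser's output is mapped onto this shape.
@dataclass
class DocumentMetadata:
    doc_type: str
    fields: Dict[str, str] = field(default_factory=dict)

# Hypothetical registry: document type name -> dedicated parser function.
PARSERS: Dict[str, Callable[[str], Dict[str, str]]] = {}

def register(doc_type: str):
    """Decorator that registers a parser for one document type."""
    def decorator(fn: Callable[[str], Dict[str, str]]):
        PARSERS[doc_type] = fn
        return fn
    return decorator

def classify(text: str) -> str:
    # Stand-in for the ML classifier (assumption): simple keyword rules.
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "agreement" in lowered or "contract" in lowered:
        return "contract"
    return "unknown"

@register("invoice")
def parse_invoice(text: str) -> Dict[str, str]:
    # Regex-based field extraction for one hypothetical invoice layout.
    fields: Dict[str, str] = {}
    if m := re.search(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w[\w-]*)", text, re.I):
        fields["invoice_number"] = m.group(1)
    if m := re.search(r"Date\s*:?\s*(\d{2}/\d{2}/\d{4})", text, re.I):
        fields["date"] = m.group(1)
    if m := re.search(r"Total\s*:?\s*\$?([\d,]+\.\d{2})", text, re.I):
        fields["total"] = m.group(1)
    return fields

def normalize(raw: Dict[str, str]) -> Dict[str, str]:
    # Map raw strings to canonical formats: ISO 8601 dates, plain decimal amounts.
    out = dict(raw)
    if "date" in out:
        out["date"] = datetime.strptime(out["date"], "%m/%d/%Y").date().isoformat()
    if "total" in out:
        out["total"] = out["total"].replace(",", "")
    return out

def extract(text: str) -> DocumentMetadata:
    # Classify, dispatch to the registered parser (if any), then normalize.
    doc_type = classify(text)
    parser = PARSERS.get(doc_type)
    raw = parser(text) if parser else {}
    return DocumentMetadata(doc_type=doc_type, fields=normalize(raw))
```

For example, `extract("INVOICE No: INV-2041\nDate: 03/15/2025\nTotal: $1,250.00").fields` returns `{'invoice_number': 'INV-2041', 'date': '2025-03-15', 'total': '1250.00'}`. A type with no registered parser (here, `"contract"`) yields an empty field set instead of raising an error. The registry keeps each document type's logic isolated, so adding a new type only requires registering another parser.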
Result
Production-grade metadata extraction system built for document-heavy healthcare workflows.