A prominent online grocery delivery service streamlined its data validation and classification processes by implementing an AI-powered pipeline solution from Everforth Apex.
SITUATION
Our client, a leading online grocery delivery service, faced challenges with manually auditing and validating thousands of grocery product entries. This involved cross-referencing product images, web pages, and internal taxonomy documents to ensure consistency in product descriptions, categorization, and data quality. The process was labor-intensive, error-prone, and not scalable for millions of Stock Keeping Units (SKUs) and frequent catalog updates across more than 100 retail partners.
92% Accuracy In Automated Classification Validation Compared To Human Baseline
SOLUTION
Everforth Apex implemented an AI-powered pipeline to automate product data validation and classification auditing using deep learning, computer vision, natural language processing (NLP), and Google Cloud Platform (GCP)-native AI tools. The solution included:
- Image-to-Text Extraction: Optical character recognition (OCR) models extracted brand names, net weights, and product names from packaging images.
- Visual Object Detection: You Only Look Once (YOLOv8) models detected branded logos, product types, and package features.
- Textual NLP Classification: Hugging Face models, fine-tuned on grocery taxonomy datasets, classified products based on textual descriptions.
- Semantic Comparison Engine: Google’s Gemma large language model (LLM) compared OCR and YOLO outputs against expected taxonomy definitions to identify discrepancies.
- Audit Report Generation: Automated reports highlighted mismatches between image-derived data, web text content, and internal taxonomy expectations.
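To make the comparison and audit-report steps above concrete, here is a minimal sketch of how image-derived fields might be checked against a catalog entry. It is not the client's implementation: the record types, field names, and the `audit_entry` helper are hypothetical, and simple rule-based normalization stands in for the LLM-based semantic comparison described above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


# Hypothetical record types; field names are illustrative, not the client's schema.
@dataclass
class ImageEvidence:
    brand: str                      # brand name read from the packaging by OCR
    net_weight: str                 # net weight read by OCR, e.g. "500 g"
    detected_types: set[str] = field(default_factory=set)  # object-detection labels


@dataclass
class CatalogEntry:
    sku: str
    brand: str
    net_weight: str
    product_type: str


_UNIT_TO_GRAMS = {"g": 1.0, "kg": 1000.0}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def weight_in_grams(text: str) -> float | None:
    """Parse strings such as '500 g' or '0.5kg'; return None if the text is unparseable."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(kg|g)\s*", text.lower())
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * _UNIT_TO_GRAMS[unit]


def audit_entry(entry: CatalogEntry, evidence: ImageEvidence) -> list[str]:
    """Return human-readable mismatches between the catalog entry and the image evidence."""
    mismatches: list[str] = []

    if normalize(entry.brand) != normalize(evidence.brand):
        mismatches.append(f"brand: catalog={entry.brand!r} image={evidence.brand!r}")

    catalog_grams = weight_in_grams(entry.net_weight)
    image_grams = weight_in_grams(evidence.net_weight)
    if catalog_grams is None or image_grams is None:
        mismatches.append("net_weight: could not parse, needs human review")
    elif abs(catalog_grams - image_grams) > 1e-6:
        mismatches.append(
            f"net_weight: catalog={entry.net_weight!r} image={evidence.net_weight!r}"
        )

    detected = {normalize(label) for label in evidence.detected_types}
    if normalize(entry.product_type) not in detected:
        mismatches.append(
            f"product_type: catalog={entry.product_type!r} not among detected labels"
        )

    return mismatches


# Example usage with made-up data.
entry = CatalogEntry(sku="sku-1", brand="Acme", net_weight="0.5kg", product_type="cereal box")
evidence = ImageEvidence(brand="ACME ", net_weight="500 g", detected_types={"Cereal Box"})
print(audit_entry(entry, evidence))
```

Returning a list of mismatch strings per entry keeps the audit report easy to assemble, and routing unparseable values to human review rather than guessing fits the human-in-the-loop approach the case study describes.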
RESULTS
The implementation of this AI-powered solution yielded significant improvements:
- 80% reduction in manual auditing time per product entry.
- 92% accuracy in automated classification validation compared to the human baseline.
- 35% reduction in data errors caused by image-text mismatches during the first rollout.
- Continuous improvement through human-in-the-loop retraining of models.
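The case study does not say how the accuracy figure was computed. One common reading is the share of audited entries where the automated verdict matches the human auditor's verdict. The sketch below illustrates that calculation under this assumption; the `agreement_rate` function and the verdict labels are hypothetical.

```python
def agreement_rate(automated: dict[str, str], human: dict[str, str]) -> float:
    """Fraction of SKUs audited by both sides where the two verdicts agree."""
    shared = automated.keys() & human.keys()
    if not shared:
        raise ValueError("no SKUs in common between automated and human audits")
    matches = sum(1 for sku in shared if automated[sku] == human[sku])
    return matches / len(shared)


# Example usage with made-up verdicts for four SKUs.
automated = {"a": "ok", "b": "mismatch", "c": "ok", "d": "ok"}
human = {"a": "ok", "b": "mismatch", "c": "mismatch", "d": "ok"}
print(f"{agreement_rate(automated, human):.0%}")
```

Entries where the two verdicts disagree are natural candidates for the human-in-the-loop retraining mentioned above.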