GenAI-Driven Face Recognition Data Collection for a Cybersecurity Firm

SUMMARY

To enhance AI-based facial recognition, Oprimes collected more than 120,000 diverse face
images from 25+ countries and 50+ device types. This effort reduced bias, boosted model
accuracy by 15%, and improved performance under low-light and varied conditions. It also
shortened AI training time by 2–3 weeks, accelerating time-to-market.

THE CHALLENGE

  • Lack of Diversity: Limited variety in the training data caused biased facial recognition
    across ethnicities, lighting conditions, and expressions.
  • Poor Performance: Recognition degraded on low-end devices and in low-light or motion conditions.
  • Synthetic Data Limitations: Synthetic data alone was not sufficient; real-world samples were needed to avoid inaccuracies.

SOLUTION

  • Global Coverage: Focused on achieving demographic and environmental diversity through wide geographic representation.
  • Real-world Grounding: Ensured synthetic data generated via GenAI was grounded in real-world samples to maintain accuracy.
  • Human-in-the-Loop Validation: Manual checks helped improve dataset quality by 15%.

KEY OUTCOMES

  • Data Collection: More than 120,000 face images from 25+ countries, capturing a wide range of ethnicities, lighting, expressions, and devices.
  • Diverse Datasets: Included 20,000+ low-light images, 15,000+ images with accessories (glasses, masks), and 10,000+ facial expression variations.
  • Device Coverage: Data was captured on 50+ device types under diverse conditions (low light, varied angles, with accessories).
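
A coverage breakdown like the one above can be checked with a simple tally over per-image metadata. The sketch below is illustrative only and is not Oprimes' actual tooling: the record fields (`country`, `lighting`), the helper name `coverage_report`, and the `min_share` threshold are all assumptions made for the example.

```python
from collections import Counter

def coverage_report(records, field, min_share=0.05):
    """Summarize how one metadata field is distributed across a dataset.

    Returns {value: (count, share, under_represented)}, where
    under_represented is True when the value's share of all records
    falls below min_share. Field names and threshold are hypothetical.
    """
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {
        value: (n, n / total, n / total < min_share)
        for value, n in counts.items()
    }

# Hypothetical metadata for four images (not real project data).
sample = [
    {"country": "IN", "lighting": "low"},
    {"country": "IN", "lighting": "normal"},
    {"country": "BR", "lighting": "normal"},
    {"country": "DE", "lighting": "normal"},
]

report = coverage_report(sample, "lighting", min_share=0.3)
for value, (count, share, flagged) in sorted(report.items()):
    status = "UNDER-REPRESENTED" if flagged else "ok"
    print(f"{value}: {count} images ({share:.0%}) {status}")
```

Running the same report per field (country, device type, accessories, expression) would show which slices need more real-world samples before training.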