GenAI - German Automotive Command Utterances Collection with Dialect Diversity

SUMMARY

For a global automotive client, we collected and recorded 150,000 (1.5 lakh) German phrases, including both unique and repeated (duplicate) phrases. The phrases are commands used in voice-activated in-car systems and were spoken by native German speakers from different dialect regions to reflect real-world variation. All recordings were made by human speakers with natural delivery, making the data well suited to ASR (Automatic Speech Recognition) training.

THE CHALLENGE

  • Ensuring dialectal diversity among native German speakers
  • Maintaining consistency in pronunciation across duplicated phrases
  • Achieving natural, non-robotic delivery while adhering to strict phrase structure
  • Running a large-scale recording effort with quality control under tight timelines

SOLUTION

  • Curated a pool of native German speakers across multiple dialect zones
  • Set up guided recording workflows to enforce tone, clarity, and pronunciation standards
  • Performed two-level quality control (QC) on duplicate utterances, checking for variation and correctness
  • Implemented streamlined project tracking to meet aggressive delivery timelines
  • Delivered structured and labelled outputs ready for model ingestion
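
As an illustration of the duplicate-utterance QC step, the sketch below flags phrases that were recorded more than once but do not span enough dialects. It is a minimal example under stated assumptions, not the project's actual tooling: the Utterance record, its field names, the dialect labels, and the min_dialects threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    # All field names are hypothetical; the real manifest schema is not given.
    phrase_id: str   # identifier of the command text
    speaker_id: str  # identifier of the native speaker
    dialect: str     # assumed dialect-zone label
    audio_path: str  # location of the recording


def flag_low_diversity(utterances, min_dialects=2):
    """Return IDs of phrases recorded more than once in fewer than min_dialects dialects."""
    dialects = defaultdict(set)
    counts = defaultdict(int)
    for u in utterances:
        dialects[u.phrase_id].add(u.dialect)
        counts[u.phrase_id] += 1
    # Only duplicated phrases (count > 1) are subject to the diversity check.
    return sorted(
        pid for pid, n in counts.items()
        if n > 1 and len(dialects[pid]) < min_dialects
    )


# Made-up sample manifest for demonstration only.
sample = [
    Utterance("nav_home", "spk01", "bavarian", "a.wav"),
    Utterance("nav_home", "spk02", "saxon", "b.wav"),
    Utterance("ac_on", "spk01", "bavarian", "c.wav"),
    Utterance("ac_on", "spk03", "bavarian", "d.wav"),
    Utterance("radio_off", "spk02", "saxon", "e.wav"),
]
flagged = flag_low_diversity(sample)
print(flagged)
```

A check like this would typically sit in the first QC level, with a human listener reviewing pronunciation and naturalness of the flagged and sampled recordings in the second.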

KEY OUTCOMES

  • 100% human-recorded utterances from native German speakers
  • Enhanced ASR training datasets tailored for in-car voice commands
  • Achieved high accuracy and dialectal coverage across all phrase variants
  • Delivered all 150,000 (1.5 lakh) phrase recordings within the 4-month timeline