AI Service > Speech to Text > Overview
The Speech to Text (STT) service uses NHN Cloud's speech recognition and text synthesis technology to recognize input voice and convert it into text. It can be applied wherever voice needs to be converted into text, such as voice dictation, voice-based device control, and voice chatbot services.
Voice Input Guide
For more accurate speech recognition, follow the guidelines below.
Supported Specifications
- Supported formats for voice file upload: WAV, WebM, MP3, OGG, FLAC, AAC, AC3
- Maximum file size: 3MB
- Supported duration for voice file recognition: 0.36 seconds minimum, 60 seconds maximum
Recommended Specifications
- File format: WAV
- Bit depth: 16-bit
- Sample rate: 16 kHz
- Number of channels: Mono
- Voice file duration: 10 seconds
- Record in as quiet an environment as possible.
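The guide above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of the service itself. It assumes the size and duration bullets are hard limits and the WAV, 16-bit, 16 kHz, and mono bullets are recommendations. The function name `check_wav` and the reading of "3MB" as 3,000,000 bytes are assumptions made for illustration. It uses only Python's standard `wave` module, so it handles uncompressed WAV files only.

```python
import os
import wave

# Hard limits taken from the guide.
MAX_BYTES = 3 * 1000 * 1000   # assumption: "3MB" read as 3,000,000 bytes
MIN_SECONDS = 0.36
MAX_SECONDS = 60.0

# Values assumed to be recommendations rather than requirements.
RECOMMENDED_SAMPLE_WIDTH = 2   # bytes per sample, i.e. 16-bit
RECOMMENDED_SAMPLE_RATE = 16000
RECOMMENDED_CHANNELS = 1       # mono


def check_wav(path):
    """Hypothetical helper: return (errors, warnings) for a WAV file.

    errors   -> the file falls outside the size or duration limits
    warnings -> the file differs from the recommended settings
    The 10-second duration bullet is not checked, because the guide does
    not say whether it is a target or an upper bound.
    """
    errors = []
    warnings = []

    if os.path.getsize(path) > MAX_BYTES:
        errors.append("file is larger than 3MB")

    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            errors.append(
                f"duration {duration:.2f}s is outside 0.36 to 60 seconds"
            )
        if wav.getsampwidth() != RECOMMENDED_SAMPLE_WIDTH:
            warnings.append("bit depth is not 16-bit")
        if wav.getframerate() != RECOMMENDED_SAMPLE_RATE:
            warnings.append("sample rate is not 16 kHz")
        if wav.getnchannels() != RECOMMENDED_CHANNELS:
            warnings.append("audio is not mono")

    return errors, warnings
```

Calling `check_wav("sample.wav")` returns two lists. An empty `errors` list means the file is within the size and duration limits, and an empty `warnings` list means it also matches the recommended recording settings.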
Use Cases
- When you need to build automatic voice dictation (customer center consultations, subtitle creation, etc.)
- When you need to control devices by voice (IoT devices, etc.)
- When you are building a voice chatbot service
Information on Processing of Personal Information
- Consignee: NHN Cloud Corp.
- Consignment Description: Providing the Speech to Text service