AI Service > Speech to Text > Overview
The Speech to Text (STT) service uses NHN Cloud's speech recognition and text synthesis technology to recognize input speech and convert it into text. It applies to fields that require voice-to-text conversion, such as voice dictation, voice-based device control, and voice chatbot services.
Main Features
For more accurate speech recognition, follow the guidelines below.
- Supported formats for voice file upload: WAV, WebM, MP3, OGG, FLAC, AAC, AC3
- Maximum file size: 3 MB
- Supported duration for voice file recognition: minimum 0.36 seconds, maximum 60 seconds
- Recommendations
    - File format: WAV
    - Bit depth: 16-bit
    - Sample rate: 16 kHz
    - Number of channels: Mono
    - Voice file duration: 10 seconds
    - Record in as quiet an environment as possible.
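The limits and recommendations above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of the NHN Cloud API: `check_wav` is a hypothetical helper built on Python's standard `wave` module, it handles WAV files only, and it assumes that "3 MB" means 3 × 1024 × 1024 bytes, which the guide does not specify. Violations of the hard limits are reported as errors, and deviations from the recommendations are reported as warnings.

```python
import os
import wave

# Limits taken from the guide above. Interpreting "3 MB" as
# 3 * 1024 * 1024 bytes is an assumption, not a documented value.
MAX_SIZE_BYTES = 3 * 1024 * 1024
MIN_DURATION_S = 0.36
MAX_DURATION_S = 60.0


def check_wav(path):
    """Hypothetical helper: return (errors, warnings) for a WAV file."""
    errors, warnings = [], []

    # Hard limit: file size.
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        errors.append("file is larger than 3 MB")

    # Read the header fields needed for the remaining checks.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate

    # Hard limit: supported duration range.
    if not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        errors.append(f"duration {duration:.2f} s is outside 0.36-60 s")

    # Recommendations: 16-bit, 16 kHz, mono.
    if sample_width != 2:
        warnings.append(f"bit depth is {sample_width * 8}-bit, 16-bit recommended")
    if sample_rate != 16000:
        warnings.append(f"sample rate is {sample_rate} Hz, 16 kHz recommended")
    if channels != 1:
        warnings.append(f"{channels} channels, mono recommended")

    return errors, warnings
```

Under these assumptions, an uncompressed WAV file in the recommended format uses 16,000 samples × 2 bytes per second, so even a 60-second recording is roughly 1.92 MB of audio data and stays below the size limit.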
Service Targets
- When you need to build automatic voice dictation (customer center consultations, subtitle creation, etc.)
- When you need to control devices by voice (IoT devices, etc.)
- When you are building a voice chatbot service
- While using the Speech to Text service, the customer may collect or use users' personal information. In this case, the customer is obliged to comply with relevant laws such as the Personal Information Protection Act. In addition, by using this service, the customer consigns the processing of personal information to NHN Cloud. As the consignor, the customer may conclude a separate written personal information processing consignment contract with NHN Cloud, the consignee, may disclose the following in the customer's own privacy policy, and must obtain users' consent to the provision of personal information to a third party.
    - Consignee: NHN Cloud Corp.
    - Consignment description: Providing the Speech to Text service