Since 2016, Digital Nirvana has provided live streaming transcription services to leading financial institutions. Its synchronous (live streaming) ASR system supports language-specific models built for a single customer, which leads to large improvements in output quality. With this approach, we have typically seen quality improve by at least four percentage points with every update. The speech-to-text (STT) engines support both automated, ML-based training and supervised training on corrected data that the customer feeds back.
These automatic speech recognition engines can be trained specifically on data from a particular customer, region, or industry segment. Once trained, these models generate higher-quality automated transcripts than generic models. We also offer correction of automated transcripts as a service: high-quality transcripts are returned to the customer and fed back to the ASR engines. Moreover, if daily news feeds are provided to the ASR engine, it automatically updates the language model from those text documents. The acoustic models are updated periodically. Customized, client-specific models can also be developed and maintained.
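To make the idea of updating a language model from text documents concrete, here is a minimal sketch. It is not Digital Nirvana's implementation: it assumes a toy unigram model with add-one smoothing, and the class name `UnigramLanguageModel` and its methods are hypothetical. It only shows how new text, such as a news feed, can raise the probability of domain vocabulary.

```python
import re
from collections import Counter


class UnigramLanguageModel:
    """Toy unigram model with add-one smoothing (illustrative assumption only)."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def update(self, text):
        """Fold the words of a new text document into the running counts."""
        words = re.findall(r"[a-z0-9']+", text.lower())
        self.counts.update(words)
        self.total += len(words)

    def probability(self, word):
        """Add-one smoothed probability; one extra vocabulary slot covers unseen words."""
        vocab_size = len(self.counts) + 1
        return (self.counts[word.lower()] + 1) / (self.total + vocab_size)


lm = UnigramLanguageModel()
lm.update("The company reported quarterly earnings.")
before = lm.probability("ebitda")  # term not seen yet

# Hypothetical news-feed document that introduces the domain term.
lm.update("Adjusted EBITDA rose; EBITDA margin widened.")
after = lm.probability("ebitda")
```

After the second update the model assigns "ebitda" a higher probability than before. A production ASR language model would be far richer, but the feedback loop has the same shape: new text in, adjusted word likelihoods out.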
Current latency is around 25 seconds, and Digital Nirvana is working to bring it down to milliseconds so that the solution can be used for live captioning. Leading companies in the financial space use this service to stream transcripts of earnings and corporate conference calls held by Fortune 500 companies to their Terminal in real time. The service can scale up to 400 hours of live transcription per day during earnings season and scale down to fewer hours on other days of the calendar quarter.
The services are available through REST APIs.
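As a rough illustration of what a REST call to a transcription service might look like, the sketch below builds (but does not send) an HTTP request with Python's standard library. Everything specific in it is an assumption made for illustration: the base URL, the `/live-transcripts` path, the bearer-token header, and the `stream_url` and `language` fields are hypothetical and do not describe the actual API.

```python
import json
import urllib.request

# Hypothetical values: the real base URL, path, auth scheme, and JSON fields
# are defined by the service's own API documentation, not by this sketch.
BASE_URL = "https://api.example.com/v1"


def build_transcription_request(stream_url, language, api_key):
    """Build (but do not send) a POST request asking for a live transcript."""
    payload = json.dumps({"stream_url": stream_url, "language": language})
    return urllib.request.Request(
        url=f"{BASE_URL}/live-transcripts",
        data=payload.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_transcription_request(
    "rtmp://example.com/earnings-call", "en-US", "YOUR_API_KEY"
)
```

Passing `request` to `urllib.request.urlopen` would send it. The sketch stops short of that because the endpoint here is a placeholder.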