IBM has added three new advancements to the Watson Cloud platform: real-time Speaker Diarisation (beta) support available via the Watson Speech to Text API, Visual Recognition tagging with a built-in set of visual labels, and the Watson Discovery Service.
With these new capabilities, developers will be able to add intelligent visual recognition and speech-to-text features to web and mobile applications.
Speaker diarisation is a technique used in speech transcription; IBM describes it as “the algorithms used to identify and segment speech by speaker identity.” By adding speaker diarisation to the Watson Speech to Text API, developers will be able to build applications that analyse a conversation between two people and take action while it is still in progress.
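To illustrate what an application does with diarised output, here is a minimal sketch in Python. It assumes the kind of response the Speech to Text API returns when speaker labels are requested: per-word timestamps alongside a parallel list of speaker labels keyed by start time. The exact field names and the sample data are illustrative, not copied from IBM's documentation.

```python
# Sketch: grouping a diarised transcript into per-speaker turns.
# The response shape (word "timestamps" plus a "speaker_labels" list
# keyed by each word's start time) is an assumption about the Watson
# Speech to Text output; the sample data below is invented.

def group_by_speaker(response):
    """Return a list of (speaker, utterance) turns from a recognition response."""
    # Map each word's start time to the word itself.
    words = {}
    for result in response["results"]:
        for word, start, _end in result["alternatives"][0]["timestamps"]:
            words[start] = word

    # Walk the speaker labels in order, merging consecutive words
    # from the same speaker into a single turn.
    turns = []
    for label in response["speaker_labels"]:
        word = words.get(label["from"])
        if word is None:
            continue
        if turns and turns[-1][0] == label["speaker"]:
            turns[-1] = (label["speaker"], turns[-1][1] + " " + word)
        else:
            turns.append((label["speaker"], word))
    return turns

sample = {
    "results": [{
        "alternatives": [{
            "transcript": "hello how are you fine thanks",
            "timestamps": [
                ["hello", 0.0, 0.4], ["how", 0.5, 0.7], ["are", 0.7, 0.9],
                ["you", 0.9, 1.1], ["fine", 1.6, 2.0], ["thanks", 2.0, 2.4],
            ],
        }],
    }],
    "speaker_labels": [
        {"from": 0.0, "to": 0.4, "speaker": 0, "confidence": 0.6},
        {"from": 0.5, "to": 0.7, "speaker": 0, "confidence": 0.6},
        {"from": 0.7, "to": 0.9, "speaker": 0, "confidence": 0.6},
        {"from": 0.9, "to": 1.1, "speaker": 0, "confidence": 0.6},
        {"from": 1.6, "to": 2.0, "speaker": 1, "confidence": 0.7},
        {"from": 2.0, "to": 2.4, "speaker": 1, "confidence": 0.7},
    ],
}

for speaker, utterance in group_by_speaker(sample):
    print(f"Speaker {speaker}: {utterance}")
# → Speaker 0: hello how are you
# → Speaker 1: fine thanks
```

A real-time application would run this merge incrementally as interim results arrive over the streaming connection, rather than on a completed response.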
IBM has also updated Visual Recognition tagging, which now includes a built-in library of tens of thousands of visual labels, allowing the platform to recognise visual concepts such as objects, people, places, activities, and scenes. Watson Visual Recognition can identify broad visual concepts and objects in photos and understand visual scenes from context. It also offers custom training and classification capabilities.
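In practice, each classified image comes back as a list of labels with confidence scores, and an application typically keeps only the confident ones. The sketch below assumes a nested images/classifiers/classes response layout with "class" and "score" fields; both the layout and the sample values are assumptions for illustration.

```python
# Sketch: filtering Visual Recognition labels by confidence score.
# The response layout ("images" -> "classifiers" -> "classes", each
# class carrying a name and a score) is assumed, and the sample
# response below is invented for illustration.

def top_labels(response, threshold=0.5):
    """Return (label, score) pairs at or above the threshold, best first."""
    labels = []
    for image in response["images"]:
        for classifier in image["classifiers"]:
            for cls in classifier["classes"]:
                if cls["score"] >= threshold:
                    labels.append((cls["class"], cls["score"]))
    return sorted(labels, key=lambda pair: pair[1], reverse=True)

sample = {
    "images": [{
        "classifiers": [{
            "classifier_id": "default",
            "classes": [
                {"class": "beach", "score": 0.92},
                {"class": "sand", "score": 0.67},
                {"class": "indoors", "score": 0.11},
            ],
        }],
    }],
}

print(top_labels(sample))
# → [('beach', 0.92), ('sand', 0.67)]
```

Raising the threshold trades recall for precision, which matters when tags drive downstream automation rather than search.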
The Watson Discovery Service converts, normalises, and enriches streams of data so that the content can be analysed to gain insights, discover patterns, and add context. It does this using integrated Watson APIs such as the AlchemyLanguage API and the Document Conversion API. Developers can upload their own datasets to the service or use publicly available ones.
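Once documents have been enriched, pattern discovery often amounts to aggregating over the extracted metadata. The sketch below assumes enriched documents carry an "enriched_text" field with an "entities" list of type/text pairs, in the spirit of what language-analysis enrichment produces; the field names and the documents themselves are invented for illustration.

```python
# Sketch: aggregating entities across enriched documents to surface
# patterns. The "enriched_text"/"entities" field names are assumptions
# about the enrichment output, and the documents are invented.
from collections import Counter

documents = [
    {"text": "IBM adds diarisation to Watson.",
     "enriched_text": {"entities": [
         {"type": "Company", "text": "IBM"},
         {"type": "Service", "text": "Watson"}]}},
    {"text": "Watson Discovery normalises data streams.",
     "enriched_text": {"entities": [
         {"type": "Service", "text": "Watson Discovery"}]}},
]

def entity_counts(docs, entity_type):
    """Count how often each entity of the given type appears across docs."""
    counter = Counter()
    for doc in docs:
        for entity in doc["enriched_text"]["entities"]:
            if entity["type"] == entity_type:
                counter[entity["text"]] += 1
    return counter

print(entity_counts(documents, "Service"))
```

The same aggregation scales from a hand-built list to a full uploaded dataset: the enrichment step normalises heterogeneous documents into one queryable shape.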