[watsonstt] initial contribution (#12161)
* [watsonstt] initial contribution Signed-off-by: Miguel Álvarez Díez <miguelwork92@gmail.com>
This commit is contained in:
parent
99cfb65aba
commit
9a086fd6e3
|
@ -383,6 +383,7 @@
 /bundles/org.openhab.voice.pollytts/ @hillmanr
 /bundles/org.openhab.voice.porcupineks/ @GiviMAD
 /bundles/org.openhab.voice.voicerss/ @JochenHiller @lolodomo
+/bundles/org.openhab.voice.watsonstt/ @GiviMAD
 /itests/org.openhab.binding.astro.tests/ @gerrieg
 /itests/org.openhab.binding.avmfritz.tests/ @cweitkamp
 /itests/org.openhab.binding.feed.tests/ @svilenvul
@ -1906,6 +1906,11 @@
 <artifactId>org.openhab.voice.voicerss</artifactId>
 <version>${project.version}</version>
 </dependency>
+<dependency>
+<groupId>org.openhab.addons.bundles</groupId>
+<artifactId>org.openhab.voice.watsonstt</artifactId>
+<version>${project.version}</version>
+</dependency>
 </dependencies>

 </project>
@ -0,0 +1,20 @@
|
||||||
|
This content is produced and maintained by the openHAB project.
|
||||||
|
|
||||||
|
* Project home: https://www.openhab.org
|
||||||
|
|
||||||
|
== Declared Project Licenses
|
||||||
|
|
||||||
|
This program and the accompanying materials are made available under the terms
|
||||||
|
of the Eclipse Public License 2.0 which is available at
|
||||||
|
https://www.eclipse.org/legal/epl-2.0/.
|
||||||
|
|
||||||
|
== Source Code
|
||||||
|
|
||||||
|
https://github.com/openhab/openhab-addons
|
||||||
|
|
||||||
|
== Third-party Content
|
||||||
|
|
||||||
|
com.ibm.watson: speech-to-text
|
||||||
|
* License: Apache 2.0 License
|
||||||
|
* Project: https://github.com/watson-developer-cloud/java-sdk
|
||||||
|
* Source: https://github.com/watson-developer-cloud/java-sdk/tree/master/speech-to-text
|
|
@ -0,0 +1,65 @@
|
||||||
|
# IBM Watson Speech-to-Text
|
||||||
|
|
||||||
|
Watson STT Service uses the non-free IBM Watson Speech-to-Text API to transcribe audio data to text.
|
||||||
|
Be aware that using this service may incur costs on your IBM account.
|
||||||
|
You can find pricing information on [this page](https://www.ibm.com/cloud/watson-speech-to-text/pricing).
|
||||||
|
|
||||||
|
## Obtaining Credentials
|
||||||
|
|
||||||
|
Before you can use this add-on, you should create a Speech-to-Text instance in the IBM Cloud service.
|
||||||
|
|
||||||
|
* Go to the following [link](https://cloud.ibm.com/catalog/services/speech-to-text) and create the instance in your desired region.
|
||||||
|
* After the instance is created, you should be able to view its URL and API key.
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
### Authentication Configuration
|
||||||
|
|
||||||
|
Use your favorite configuration UI to edit **Settings / Other Services - IBM Watson Speech-to-Text** and set:
|
||||||
|
|
||||||
|
* **Api Key** - Api key for Speech-to-Text instance created on IBM Cloud.
|
||||||
|
* **Instance Url** - Url for Speech-to-Text instance created on IBM Cloud.
|
||||||
|
|
||||||
|
### Speech to Text Configuration
|
||||||
|
|
||||||
|
Use your favorite configuration UI to edit **Settings / Other Services - IBM Watson Speech-to-Text**:
|
||||||
|
|
||||||
|
* **Background Audio Suppression** - Use the parameter to suppress side conversations or background noise.
|
||||||
|
* **Speech Detector Sensitivity** - Use the parameter to suppress word insertions from music, coughing, and other non-speech events.
|
||||||
|
* **Inactivity Timeout** - The time in seconds after which, if only silence (no speech) is detected in the audio, the connection is closed.
|
||||||
|
* **Opt Out Logging** - By default, all IBM Watson™ services log requests and their results. Logging is done only to improve the services for future users. The logged data is not shared or made public. Enable this option to opt out of that logging.
|
||||||
|
* **No Results Message** - Message reported when the transcription produces no results.
|
||||||
|
* **Smart Formatting** - If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations. (Not available for all locales)
|
||||||
|
* **Redaction** - If true, the service redacts, or masks, numeric data from final transcripts. (Not available for all locales)
|
||||||
|
|
||||||
|
### Configuration via a text file
|
||||||
|
|
||||||
|
In case you would like to set up the service via a text file, create a new file in `$OPENHAB_ROOT/conf/services` named `watsonstt.cfg`.
|
||||||
|
|
||||||
|
Its contents should look similar to:
|
||||||
|
|
||||||
|
```
|
||||||
|
org.openhab.voice.watsonstt:apiKey=******
|
||||||
|
org.openhab.voice.watsonstt:instanceUrl=https://api.***.speech-to-text.watson.cloud.ibm.com/instances/*****
|
||||||
|
org.openhab.voice.watsonstt:backgroundAudioSuppression=0.5
|
||||||
|
org.openhab.voice.watsonstt:speechDetectorSensitivity=0.5
|
||||||
|
org.openhab.voice.watsonstt:inactivityTimeout=2
|
||||||
|
org.openhab.voice.watsonstt:optOutLogging=false
|
||||||
|
org.openhab.voice.watsonstt:smartFormatting=false
|
||||||
|
org.openhab.voice.watsonstt:redaction=false
|
||||||
|
org.openhab.voice.watsonstt:noResultsMessage="Sorry, I didn't understand you"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Default Speech-to-Text Configuration
|
||||||
|
|
||||||
|
You can set up your preferred default Speech-to-Text in the UI:
|
||||||
|
|
||||||
|
* Go to **Settings**.
|
||||||
|
* Edit **System Services - Voice**.
|
||||||
|
* Set **IBM Watson** as **Speech-to-Text**.
|
||||||
|
|
||||||
|
In case you would like to set up these settings via a text file, you can edit the file `runtime.cfg` in `$OPENHAB_ROOT/conf/services` and set the following entries:
|
||||||
|
|
||||||
|
```
|
||||||
|
org.openhab.voice:defaultSTT=watsonstt
|
||||||
|
```
|
|
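Note on the text-file configuration in the README above: each `watsonstt.cfg` entry has the shape `pid:key=value`. The following is a minimal sketch of how such lines could be reduced to the parameter names used by the service. The class `ServiceCfgSketch` and its `parse` method are hypothetical names used only for illustration; this is not how openHAB itself reads the file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of openHAB).
class ServiceCfgSketch {

    /** Collects the entries that belong to one service PID from "pid:key=value" lines. */
    static Map<String, String> parse(List<String> lines, String pid) {
        Map<String, String> result = new LinkedHashMap<>();
        String prefix = pid + ":";
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines, comments and entries of other services.
            if (trimmed.isEmpty() || trimmed.startsWith("#") || !trimmed.startsWith(prefix)) {
                continue;
            }
            int separator = trimmed.indexOf('=');
            if (separator < prefix.length()) {
                continue; // not a key=value entry
            }
            String key = trimmed.substring(prefix.length(), separator).trim();
            String value = trimmed.substring(separator + 1).trim();
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "org.openhab.voice.watsonstt:inactivityTimeout=2",
                "org.openhab.voice.watsonstt:optOutLogging=false");
        System.out.println(parse(lines, "org.openhab.voice.watsonstt"));
    }
}
```

The sketch keeps values as plain strings; converting them to numbers or booleans is left to whatever consumes the map.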
@ -0,0 +1,70 @@
|
||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0"
|
||||||
|
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
|
||||||
|
|
||||||
|
<modelVersion>4.0.0</modelVersion>
|
||||||
|
|
||||||
|
<parent>
|
||||||
|
<groupId>org.openhab.addons.bundles</groupId>
|
||||||
|
<artifactId>org.openhab.addons.reactor.bundles</artifactId>
|
||||||
|
<version>3.3.0-SNAPSHOT</version>
|
||||||
|
</parent>
|
||||||
|
|
||||||
|
<artifactId>org.openhab.voice.watsonstt</artifactId>
|
||||||
|
|
||||||
|
<name>openHAB Add-ons :: Bundles :: Voice :: IBM Watson Speech to Text</name>
|
||||||
|
<properties>
|
||||||
|
<bnd.importpackage>!android.*,!dalvik.*,!kotlin.*,sun.security.*;resolution:=optional,org.openjsse.*;resolution:=optional,org.conscrypt.*;resolution:=optional,org.bouncycastle.*;resolution:=optional,okhttp3.logging.*;resolution:=optional,com.google.gson.*;resolution:=optional,io.reactivex;resolution:=optional,okio.*;resolution:=optional,org.apache.commons.*;resolution:=optional,*</bnd.importpackage>
|
||||||
|
</properties>
|
||||||
|
<dependencies>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.ibm.watson</groupId>
|
||||||
|
<artifactId>speech-to-text</artifactId>
|
||||||
|
<version>9.3.1</version>
|
||||||
|
<scope>compile</scope>
|
||||||
|
</dependency>
|
||||||
|
<!-- sdk deps -->
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.ibm.cloud</groupId>
|
||||||
|
<artifactId>sdk-core</artifactId>
|
||||||
|
<version>9.15.0</version>
|
||||||
|
<scope>compile</scope>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.ibm.watson</groupId>
|
||||||
|
<artifactId>common</artifactId>
|
||||||
|
<version>9.3.1</version>
|
||||||
|
<scope>compile</scope>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.squareup.okhttp3</groupId>
|
||||||
|
<artifactId>okhttp</artifactId>
|
||||||
|
<version>4.9.1</version>
|
||||||
|
<scope>compile</scope>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.squareup.okhttp3</groupId>
|
||||||
|
<artifactId>okhttp-urlconnection</artifactId>
|
||||||
|
<version>4.9.1</version>
|
||||||
|
<scope>compile</scope>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>org.jetbrains.kotlin</groupId>
|
||||||
|
<artifactId>kotlin-stdlib</artifactId>
|
||||||
|
<version>1.4.10</version>
|
||||||
|
<scope>compile</scope>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.squareup.okio</groupId>
|
||||||
|
<artifactId>okio</artifactId>
|
||||||
|
<version>2.8.0</version>
|
||||||
|
<scope>compile</scope>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.google.code.gson</groupId>
|
||||||
|
<artifactId>gson</artifactId>
|
||||||
|
<version>2.8.9</version>
|
||||||
|
<scope>compile</scope>
|
||||||
|
</dependency>
|
||||||
|
</dependencies>
|
||||||
|
</project>
|
|
@ -0,0 +1,9 @@
|
||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<features name="org.openhab.voice.watsonstt-${project.version}" xmlns="http://karaf.apache.org/xmlns/features/v1.4.0">
|
||||||
|
<repository>mvn:org.openhab.core.features.karaf/org.openhab.core.features.karaf.openhab-core/${ohc.version}/xml/features</repository>
|
||||||
|
|
||||||
|
<feature name="openhab-voice-watsonstt" description="IBM Watson Speech-to-Text" version="${project.version}">
|
||||||
|
<feature>openhab-runtime-base</feature>
|
||||||
|
<bundle start-level="80">mvn:org.openhab.addons.bundles/org.openhab.voice.watsonstt/${project.version}</bundle>
|
||||||
|
</feature>
|
||||||
|
</features>
|
|
@ -0,0 +1,63 @@
|
||||||
|
/**
|
||||||
|
* Copyright (c) 2010-2022 Contributors to the openHAB project
|
||||||
|
*
|
||||||
|
* See the NOTICE file(s) distributed with this work for additional
|
||||||
|
* information.
|
||||||
|
*
|
||||||
|
* This program and the accompanying materials are made available under the
|
||||||
|
* terms of the Eclipse Public License 2.0 which is available at
|
||||||
|
* http://www.eclipse.org/legal/epl-2.0
|
||||||
|
*
|
||||||
|
* SPDX-License-Identifier: EPL-2.0
|
||||||
|
*/
|
||||||
|
package org.openhab.voice.watsonstt.internal;
|
||||||
|
|
||||||
|
import org.eclipse.jdt.annotation.NonNullByDefault;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The {@link WatsonSTTConfiguration} class contains fields mapping service configuration parameters.
|
||||||
|
*
|
||||||
|
* @author Miguel Álvarez - Initial contribution
|
||||||
|
*/
|
||||||
|
@NonNullByDefault
|
||||||
|
public class WatsonSTTConfiguration {
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Api key for Speech-to-Text instance created on IBM Cloud.
|
||||||
|
*/
|
||||||
|
public String apiKey = "";
|
||||||
|
/**
|
||||||
|
* Url for Speech-to-Text instance created on IBM Cloud.
|
||||||
|
*/
|
||||||
|
public String instanceUrl = "";
|
||||||
|
/**
|
||||||
|
* Use the parameter to suppress side conversations or background noise.
|
||||||
|
*/
|
||||||
|
public float backgroundAudioSuppression = 0f;
|
||||||
|
/**
|
||||||
|
* Use the parameter to suppress word insertions from music, coughing, and other non-speech events.
|
||||||
|
*/
|
||||||
|
public float speechDetectorSensitivity = 0.5f;
|
||||||
|
/**
|
||||||
|
* If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and
|
||||||
|
* internet addresses into more readable, conventional representations.
|
||||||
|
*/
|
||||||
|
public boolean smartFormatting = false;
|
||||||
|
/**
|
||||||
|
* If true, the service redacts, or masks, numeric data from final transcripts.
|
||||||
|
*/
|
||||||
|
public boolean redaction = false;
|
||||||
|
/**
|
||||||
|
* The time in seconds after which, if only silence (no speech) is detected in the audio, the connection is closed.
|
||||||
|
*/
|
||||||
|
public int inactivityTimeout = 3;
|
||||||
|
/**
|
||||||
|
* Message reported when the transcription produces no results.
|
||||||
|
*/
|
||||||
|
public String noResultsMessage = "No results";
|
||||||
|
/**
|
||||||
|
* By default, all IBM Watson™ services log requests and their results. Logging is done only to improve the services
|
||||||
|
* for future users. The logged data is not shared or made public.
|
||||||
|
*/
|
||||||
|
public boolean optOutLogging = true;
|
||||||
|
}
|
|
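Note on the configuration class above: the service fills it with `new Configuration(config).as(WatsonSTTConfiguration.class)`, so any parameter missing from the map keeps the default declared on the field. The following plain-Java sketch approximates that idea with reflection. It is a simplified stand-in under stated assumptions, not openHAB's implementation; `ConfigMappingSketch`, `SketchConfig` and `apply` are hypothetical names, and only a subset of the fields is reproduced.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Simplified stand-in for illustration; openHAB's Configuration.as(...) may behave differently.
class ConfigMappingSketch {

    /** Hypothetical copy of a few fields of WatsonSTTConfiguration, with the same defaults. */
    public static class SketchConfig {
        public String apiKey = "";
        public float speechDetectorSensitivity = 0.5f;
        public int inactivityTimeout = 3;
        public boolean optOutLogging = true;
    }

    /** Copies matching map entries onto the public fields; absent keys keep their defaults. */
    static SketchConfig apply(Map<String, Object> values) {
        SketchConfig target = new SketchConfig();
        try {
            for (Field field : SketchConfig.class.getFields()) {
                Object value = values.get(field.getName());
                if (value == null) {
                    continue; // keep the default declared on the field
                }
                Class<?> type = field.getType();
                if (type == float.class) {
                    field.setFloat(target, ((Number) value).floatValue());
                } else if (type == int.class) {
                    field.setInt(target, ((Number) value).intValue());
                } else if (type == boolean.class) {
                    field.setBoolean(target, (Boolean) value);
                } else {
                    field.set(target, String.valueOf(value));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        SketchConfig config = apply(Map.of("apiKey", "example", "inactivityTimeout", 2));
        System.out.println(config.inactivityTimeout + " " + config.optOutLogging);
    }
}
```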
@ -0,0 +1,43 @@
|
||||||
|
/**
|
||||||
|
* Copyright (c) 2010-2022 Contributors to the openHAB project
|
||||||
|
*
|
||||||
|
* See the NOTICE file(s) distributed with this work for additional
|
||||||
|
* information.
|
||||||
|
*
|
||||||
|
* This program and the accompanying materials are made available under the
|
||||||
|
* terms of the Eclipse Public License 2.0 which is available at
|
||||||
|
* http://www.eclipse.org/legal/epl-2.0
|
||||||
|
*
|
||||||
|
* SPDX-License-Identifier: EPL-2.0
|
||||||
|
*/
|
||||||
|
package org.openhab.voice.watsonstt.internal;
|
||||||
|
|
||||||
|
import org.eclipse.jdt.annotation.NonNullByDefault;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The {@link WatsonSTTConstants} class defines common constants, which are
|
||||||
|
* used across the whole add-on.
|
||||||
|
*
|
||||||
|
* @author Miguel Álvarez - Initial contribution
|
||||||
|
*/
|
||||||
|
@NonNullByDefault
|
||||||
|
public class WatsonSTTConstants {
|
||||||
|
/**
|
||||||
|
* Service name
|
||||||
|
*/
|
||||||
|
public static final String SERVICE_NAME = "IBM Watson";
|
||||||
|
/**
|
||||||
|
* Service id
|
||||||
|
*/
|
||||||
|
public static final String SERVICE_ID = "watsonstt";
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Service category
|
||||||
|
*/
|
||||||
|
public static final String SERVICE_CATEGORY = "voice";
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Service pid
|
||||||
|
*/
|
||||||
|
public static final String SERVICE_PID = "org.openhab." + SERVICE_CATEGORY + "." + SERVICE_ID;
|
||||||
|
}
|
|
@ -0,0 +1,310 @@
|
||||||
|
/**
|
||||||
|
* Copyright (c) 2010-2022 Contributors to the openHAB project
|
||||||
|
*
|
||||||
|
* See the NOTICE file(s) distributed with this work for additional
|
||||||
|
* information.
|
||||||
|
*
|
||||||
|
* This program and the accompanying materials are made available under the
|
||||||
|
* terms of the Eclipse Public License 2.0 which is available at
|
||||||
|
* http://www.eclipse.org/legal/epl-2.0
|
||||||
|
*
|
||||||
|
* SPDX-License-Identifier: EPL-2.0
|
||||||
|
*/
|
||||||
|
package org.openhab.voice.watsonstt.internal;
|
||||||
|
|
||||||
|
import static org.openhab.voice.watsonstt.internal.WatsonSTTConstants.*;
|
||||||
|
|
||||||
|
import java.util.List;
|
||||||
|
import java.util.Locale;
|
||||||
|
import java.util.Map;
|
||||||
|
import java.util.Set;
|
||||||
|
import java.util.concurrent.ScheduledExecutorService;
|
||||||
|
import java.util.concurrent.atomic.AtomicReference;
|
||||||
|
import java.util.stream.Collectors;
|
||||||
|
|
||||||
|
import javax.net.ssl.SSLPeerUnverifiedException;
|
||||||
|
|
||||||
|
import org.eclipse.jdt.annotation.NonNullByDefault;
|
||||||
|
import org.eclipse.jdt.annotation.Nullable;
|
||||||
|
import org.openhab.core.audio.AudioFormat;
|
||||||
|
import org.openhab.core.audio.AudioStream;
|
||||||
|
import org.openhab.core.common.ThreadPoolManager;
|
||||||
|
import org.openhab.core.config.core.ConfigurableService;
|
||||||
|
import org.openhab.core.config.core.Configuration;
|
||||||
|
import org.openhab.core.voice.RecognitionStartEvent;
|
||||||
|
import org.openhab.core.voice.RecognitionStopEvent;
|
||||||
|
import org.openhab.core.voice.STTException;
|
||||||
|
import org.openhab.core.voice.STTListener;
|
||||||
|
import org.openhab.core.voice.STTService;
|
||||||
|
import org.openhab.core.voice.STTServiceHandle;
|
||||||
|
import org.openhab.core.voice.SpeechRecognitionErrorEvent;
|
||||||
|
import org.openhab.core.voice.SpeechRecognitionEvent;
|
||||||
|
import org.osgi.framework.Constants;
|
||||||
|
import org.osgi.service.component.annotations.Activate;
|
||||||
|
import org.osgi.service.component.annotations.Component;
|
||||||
|
import org.osgi.service.component.annotations.Modified;
|
||||||
|
import org.slf4j.Logger;
|
||||||
|
import org.slf4j.LoggerFactory;
|
||||||
|
|
||||||
|
import com.ibm.cloud.sdk.core.http.HttpMediaType;
|
||||||
|
import com.ibm.cloud.sdk.core.security.IamAuthenticator;
|
||||||
|
import com.ibm.watson.speech_to_text.v1.SpeechToText;
|
||||||
|
import com.ibm.watson.speech_to_text.v1.model.RecognizeWithWebsocketsOptions;
|
||||||
|
import com.ibm.watson.speech_to_text.v1.model.SpeechRecognitionAlternative;
|
||||||
|
import com.ibm.watson.speech_to_text.v1.model.SpeechRecognitionResult;
|
||||||
|
import com.ibm.watson.speech_to_text.v1.model.SpeechRecognitionResults;
|
||||||
|
import com.ibm.watson.speech_to_text.v1.websocket.RecognizeCallback;
|
||||||
|
|
||||||
|
import okhttp3.WebSocket;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The {@link WatsonSTTService} allows using Watson as a Speech-to-Text engine.
|
||||||
|
*
|
||||||
|
* @author Miguel Álvarez - Initial contribution
|
||||||
|
*/
|
||||||
|
@NonNullByDefault
|
||||||
|
@Component(configurationPid = SERVICE_PID, property = Constants.SERVICE_PID + "=" + SERVICE_PID)
|
||||||
|
@ConfigurableService(category = SERVICE_CATEGORY, label = SERVICE_NAME
|
||||||
|
+ " Speech-to-Text", description_uri = SERVICE_CATEGORY + ":" + SERVICE_ID)
|
||||||
|
public class WatsonSTTService implements STTService {
|
||||||
|
private final Logger logger = LoggerFactory.getLogger(WatsonSTTService.class);
|
||||||
|
private final ScheduledExecutorService executor = ThreadPoolManager.getScheduledPool("OH-voice-watsonstt");
|
||||||
|
private final List<String> models = List.of("ar-AR_BroadbandModel", "de-DE_BroadbandModel", "en-AU_BroadbandModel",
|
||||||
|
"en-GB_BroadbandModel", "en-US_BroadbandModel", "es-AR_BroadbandModel", "es-CL_BroadbandModel",
|
||||||
|
"es-CO_BroadbandModel", "es-ES_BroadbandModel", "es-MX_BroadbandModel", "es-PE_BroadbandModel",
|
||||||
|
"fr-CA_BroadbandModel", "fr-FR_BroadbandModel", "it-IT_BroadbandModel", "ja-JP_BroadbandModel",
|
||||||
|
"ko-KR_BroadbandModel", "nl-NL_BroadbandModel", "pt-BR_BroadbandModel", "zh-CN_BroadbandModel");
|
||||||
|
private final Set<Locale> supportedLocales = models.stream().map(name -> name.split("_")[0])
|
||||||
|
.map(Locale::forLanguageTag).collect(Collectors.toSet());
|
||||||
|
private WatsonSTTConfiguration config = new WatsonSTTConfiguration();
|
||||||
|
|
||||||
|
@Activate
|
||||||
|
protected void activate(Map<String, Object> config) {
|
||||||
|
this.config = new Configuration(config).as(WatsonSTTConfiguration.class);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Modified
|
||||||
|
protected void modified(Map<String, Object> config) {
|
||||||
|
this.config = new Configuration(config).as(WatsonSTTConfiguration.class);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public String getId() {
|
||||||
|
return SERVICE_ID;
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public String getLabel(@Nullable Locale locale) {
|
||||||
|
return SERVICE_NAME;
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public Set<Locale> getSupportedLocales() {
|
||||||
|
return supportedLocales;
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public Set<AudioFormat> getSupportedFormats() {
|
||||||
|
return Set.of(AudioFormat.WAV, AudioFormat.OGG, new AudioFormat("OGG", "OPUS", null, null, null, null),
|
||||||
|
AudioFormat.MP3);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public STTServiceHandle recognize(STTListener sttListener, AudioStream audioStream, Locale locale, Set<String> set)
|
||||||
|
throws STTException {
|
||||||
|
if (config.apiKey.isBlank() || config.instanceUrl.isBlank()) {
|
||||||
|
throw new STTException("service is not correctly configured");
|
||||||
|
}
|
||||||
|
String contentType = getContentType(audioStream);
|
||||||
|
if (contentType == null) {
|
||||||
|
throw new STTException("Unsupported format, unable to resolve audio content type");
|
||||||
|
}
|
||||||
|
logger.debug("Content-Type: {}", contentType);
|
||||||
|
var speechToText = new SpeechToText(new IamAuthenticator.Builder().apikey(config.apiKey).build());
|
||||||
|
speechToText.setServiceUrl(config.instanceUrl);
|
||||||
|
if (config.optOutLogging) {
|
||||||
|
speechToText.setDefaultHeaders(Map.of("X-Watson-Learning-Opt-Out", "1"));
|
||||||
|
}
|
||||||
|
RecognizeWithWebsocketsOptions wsOptions = new RecognizeWithWebsocketsOptions.Builder().audio(audioStream)
|
||||||
|
.contentType(contentType).redaction(config.redaction).smartFormatting(config.smartFormatting)
|
||||||
|
.model(locale.toLanguageTag() + "_BroadbandModel").interimResults(true)
|
||||||
|
.backgroundAudioSuppression(config.backgroundAudioSuppression)
|
||||||
|
.speechDetectorSensitivity(config.speechDetectorSensitivity).inactivityTimeout(config.inactivityTimeout)
|
||||||
|
.build();
|
||||||
|
final AtomicReference<@Nullable WebSocket> socketRef = new AtomicReference<>();
|
||||||
|
var task = executor.submit(() -> {
|
||||||
|
int retries = 2;
|
||||||
|
while (retries > 0) {
|
||||||
|
try {
|
||||||
|
socketRef.set(speechToText.recognizeUsingWebSocket(wsOptions,
|
||||||
|
new TranscriptionListener(sttListener, config)));
|
||||||
|
break;
|
||||||
|
} catch (RuntimeException e) {
|
||||||
|
var cause = e.getCause();
|
||||||
|
if (cause instanceof SSLPeerUnverifiedException) {
|
||||||
|
logger.debug("Retrying on error: {}", cause.getMessage());
|
||||||
|
retries--;
|
||||||
|
} else {
|
||||||
|
var errorMessage = e.getMessage();
|
||||||
|
logger.warn("Aborting on error: {}", errorMessage);
|
||||||
|
sttListener.sttEventReceived(
|
||||||
|
new SpeechRecognitionErrorEvent(errorMessage != null ? errorMessage : "Unknown error"));
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
});
|
||||||
|
return new STTServiceHandle() {
|
||||||
|
@Override
|
||||||
|
public void abort() {
|
||||||
|
var socket = socketRef.get();
|
||||||
|
if (socket != null) {
|
||||||
|
socket.close(1000, null);
|
||||||
|
socket.cancel();
|
||||||
|
try {
|
||||||
|
Thread.sleep(100);
|
||||||
|
} catch (InterruptedException ignored) {
|
||||||
|
}
|
||||||
|
}
|
||||||
|
task.cancel(true);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
private @Nullable String getContentType(AudioStream audioStream) throws STTException {
|
||||||
|
AudioFormat format = audioStream.getFormat();
|
||||||
|
String container = format.getContainer();
|
||||||
|
String codec = format.getCodec();
|
||||||
|
if (container == null || codec == null) {
|
||||||
|
throw new STTException("Missing audio stream info");
|
||||||
|
}
|
||||||
|
Long frequency = format.getFrequency();
|
||||||
|
Integer bitDepth = format.getBitDepth();
|
||||||
|
switch (container) {
|
||||||
|
case AudioFormat.CONTAINER_WAVE:
|
||||||
|
if (AudioFormat.CODEC_PCM_SIGNED.equals(codec)) {
|
||||||
|
if (bitDepth == null || bitDepth != 16) {
|
||||||
|
return "audio/wav";
|
||||||
|
}
|
||||||
|
// rate is a required parameter for this type
|
||||||
|
if (frequency == null) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
StringBuilder contentTypeL16 = new StringBuilder(HttpMediaType.AUDIO_PCM).append(";rate=")
|
||||||
|
.append(frequency);
|
||||||
|
// these parameters are optional
|
||||||
|
Integer channels = format.getChannels();
|
||||||
|
if (channels != null) {
|
||||||
|
contentTypeL16.append(";channels=").append(channels);
|
||||||
|
}
|
||||||
|
Boolean bigEndian = format.isBigEndian();
|
||||||
|
if (bigEndian != null) {
|
||||||
|
contentTypeL16.append(";")
|
||||||
|
.append(bigEndian ? "endianness=big-endian" : "endianness=little-endian");
|
||||||
|
}
|
||||||
|
return contentTypeL16.toString();
|
||||||
|
}
|
||||||
|
case AudioFormat.CONTAINER_OGG:
|
||||||
|
switch (codec) {
|
||||||
|
case AudioFormat.CODEC_VORBIS:
|
||||||
|
return "audio/ogg;codecs=vorbis";
|
||||||
|
case "OPUS":
|
||||||
|
return "audio/ogg;codecs=opus";
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case AudioFormat.CONTAINER_NONE:
|
||||||
|
if (AudioFormat.CODEC_MP3.equals(codec)) {
|
||||||
|
return "audio/mp3";
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
private static class TranscriptionListener implements RecognizeCallback {
|
||||||
|
private final Logger logger = LoggerFactory.getLogger(TranscriptionListener.class);
|
||||||
|
private final StringBuilder transcriptBuilder = new StringBuilder();
|
||||||
|
private final STTListener sttListener;
|
||||||
|
private final WatsonSTTConfiguration config;
|
||||||
|
private float confidenceSum = 0f;
|
||||||
|
private int responseCount = 0;
|
||||||
|
private boolean disconnected = false;
|
||||||
|
|
||||||
|
public TranscriptionListener(STTListener sttListener, WatsonSTTConfiguration config) {
|
||||||
|
this.sttListener = sttListener;
|
||||||
|
this.config = config;
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void onTranscription(@Nullable SpeechRecognitionResults speechRecognitionResults) {
|
||||||
|
logger.debug("onTranscription");
|
||||||
|
if (speechRecognitionResults == null) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
speechRecognitionResults.getResults().stream().filter(SpeechRecognitionResult::isXFinal).forEach(result -> {
|
||||||
|
SpeechRecognitionAlternative alternative = result.getAlternatives().stream().findFirst().orElse(null);
|
||||||
|
if (alternative == null) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
logger.debug("onTranscription Final");
|
||||||
|
Double confidence = alternative.getConfidence();
|
||||||
|
transcriptBuilder.append(alternative.getTranscript());
|
||||||
|
confidenceSum += confidence != null ? confidence.floatValue() : 0f;
|
||||||
|
responseCount++;
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void onConnected() {
|
||||||
|
logger.debug("onConnected");
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void onError(@Nullable Exception e) {
|
||||||
|
var errorMessage = e != null ? e.getMessage() : null;
|
||||||
|
if (errorMessage != null && disconnected && errorMessage.contains("Socket closed")) {
|
||||||
|
logger.debug("Error ignored: {}", errorMessage);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
logger.warn("TranscriptionError: {}", errorMessage);
|
||||||
|
sttListener.sttEventReceived(
|
||||||
|
new SpeechRecognitionErrorEvent(errorMessage != null ? errorMessage : "Unknown error"));
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void onDisconnected() {
|
||||||
|
logger.debug("onDisconnected");
|
||||||
|
disconnected = true;
|
||||||
|
sttListener.sttEventReceived(new RecognitionStopEvent());
|
||||||
|
float averageConfidence = responseCount > 0 ? confidenceSum / (float) responseCount : 0f;
|
||||||
|
String transcript = transcriptBuilder.toString();
|
||||||
|
if (!transcript.isBlank()) {
|
||||||
|
sttListener.sttEventReceived(new SpeechRecognitionEvent(transcript, averageConfidence));
|
||||||
|
} else {
|
||||||
|
if (!config.noResultsMessage.isBlank()) {
|
||||||
|
sttListener.sttEventReceived(new SpeechRecognitionErrorEvent(config.noResultsMessage));
|
||||||
|
} else {
|
||||||
|
sttListener.sttEventReceived(new SpeechRecognitionErrorEvent("No results"));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void onInactivityTimeout(@Nullable RuntimeException e) {
|
||||||
|
if (e != null) {
|
||||||
|
logger.debug("InactivityTimeout: {}", e.getMessage());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void onListening() {
|
||||||
|
logger.debug("onListening");
|
||||||
|
sttListener.sttEventReceived(new RecognitionStartEvent());
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void onTranscriptionComplete() {
|
||||||
|
logger.debug("onTranscriptionComplete");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
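Note on `TranscriptionListener`: it accumulates the confidence of each final result and reports the average on disconnect. The following is a minimal sketch of that averaging, written so that an empty result set yields `0` rather than `NaN`. The class `ConfidenceSketch` and its `average` method are hypothetical names, and treating a missing confidence as `0` mirrors the code above.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: averages per-result confidences the way the listener accumulates them.
class ConfidenceSketch {

    /** Returns the mean confidence, counting a missing (null) confidence as 0. */
    static float average(List<Double> confidences) {
        float sum = 0f;
        int count = 0;
        for (Double confidence : confidences) {
            sum += confidence != null ? confidence.floatValue() : 0f;
            count++;
        }
        // Guard the division so that "no final results" does not produce NaN.
        return count > 0 ? sum / count : 0f;
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(0.9, 0.7)));
        System.out.println(average(List.of()));
    }
}
```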
|
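Note on `getContentType`: for 16-bit signed PCM in a WAV container the service builds a content type with a mandatory `rate` and optional `channels` and `endianness` parameters. The sketch below reproduces only that string assembly in plain Java. It assumes that `HttpMediaType.AUDIO_PCM` resolves to `audio/l16`; `L16ContentTypeSketch` and `build` are hypothetical names.

```java
// Illustrative only: mirrors the string assembly of the 16-bit PCM branch.
class L16ContentTypeSketch {

    // Assumption: this is the value of HttpMediaType.AUDIO_PCM in the IBM SDK.
    static final String AUDIO_PCM = "audio/l16";

    /** Builds the content type; channels and bigEndian may be null when unknown. */
    static String build(long rate, Integer channels, Boolean bigEndian) {
        StringBuilder contentType = new StringBuilder(AUDIO_PCM).append(";rate=").append(rate);
        if (channels != null) {
            contentType.append(";channels=").append(channels);
        }
        if (bigEndian != null) {
            contentType.append(";endianness=").append(bigEndian ? "big-endian" : "little-endian");
        }
        return contentType.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(16000, 1, false));
    }
}
```

Keeping the optional parameters out of the string when they are unknown leaves the decision to the service's defaults instead of guessing.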
|
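Note on locale handling in `WatsonSTTService`: the supported locales are derived from the model names by taking the part before the underscore, and the model for a request is built back from `locale.toLanguageTag()`. The following sketch shows both directions with a shortened model list. `ModelLocaleSketch`, `localesOf` and `modelFor` are hypothetical names used for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: round trip between Watson model names and Java locales.
class ModelLocaleSketch {

    /** Derives locales from names such as "en-US_BroadbandModel". */
    static Set<Locale> localesOf(List<String> models) {
        return models.stream()
                .map(name -> name.split("_")[0])
                .map(Locale::forLanguageTag)
                .collect(Collectors.toSet());
    }

    /** Builds the model name requested for a locale, as the service does. */
    static String modelFor(Locale locale) {
        return locale.toLanguageTag() + "_BroadbandModel";
    }

    public static void main(String[] args) {
        System.out.println(localesOf(List.of("en-US_BroadbandModel", "es-ES_BroadbandModel")));
        System.out.println(modelFor(Locale.US));
    }
}
```

One consequence worth keeping in mind: a locale without a region, such as `Locale.ENGLISH`, produces `en_BroadbandModel`, which does not appear in the service's model list.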
@ -0,0 +1,68 @@
|
||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<config-description:config-descriptions
|
||||||
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||||
|
xmlns:config-description="https://openhab.org/schemas/config-description/v1.0.0"
|
||||||
|
xsi:schemaLocation="https://openhab.org/schemas/config-description/v1.0.0
|
||||||
|
https://openhab.org/schemas/config-description-1.0.0.xsd">
|
||||||
|
|
||||||
|
<config-description uri="voice:watsonstt">
|
||||||
|
<parameter-group name="authentication">
|
||||||
|
<label>Authentication</label>
|
||||||
|
<description>Information for connection to your Watson Speech-to-Text instance.</description>
|
||||||
|
</parameter-group>
|
||||||
|
<parameter-group name="stt">
|
||||||
|
<label>STT Configuration</label>
|
||||||
|
<description>Parameters for Watson Speech-to-Text API.</description>
|
||||||
|
</parameter-group>
|
||||||
|
<parameter name="apiKey" type="text" required="true" groupName="authentication">
|
||||||
|
<label>Api Key</label>
|
||||||
|
<description>Api key for Speech-to-Text instance created on IBM Cloud.</description>
|
||||||
|
</parameter>
|
||||||
|
<parameter name="instanceUrl" type="text" required="true" groupName="authentication">
|
||||||
|
<label>Instance Url</label>
|
||||||
|
<description>Url for Speech-to-Text instance created on IBM Cloud.</description>
|
||||||
|
</parameter>
|
||||||
|
<parameter name="backgroundAudioSuppression" type="decimal" min="0" max="1" step="0.1" groupName="stt">
|
||||||
|
<label>Background Audio Suppression</label>
|
||||||
|
<description>Use the parameter to suppress side conversations or background noise.</description>
|
||||||
|
<default>0</default>
|
||||||
|
</parameter>
|
||||||
|
<parameter name="speechDetectorSensitivity" type="decimal" min="0" max="1" step="0.1" groupName="stt">
|
||||||
|
<label>Speech Detector Sensitivity</label>
|
||||||
|
<description>Use the parameter to suppress word insertions from music, coughing, and other non-speech events.</description>
|
||||||
|
<default>0.5</default>
|
||||||
|
</parameter>
|
||||||
|
<parameter name="inactivityTimeout" type="integer" unit="s" groupName="stt">
|
||||||
|
<label>Inactivity Timeout</label>
|
||||||
|
<description>The time in seconds after which, if only silence (no speech) is detected in the audio, the connection is
|
||||||
|
closed.</description>
|
||||||
|
<default>3</default>
|
||||||
|
</parameter>
|
||||||
|
<parameter name="noResultsMessage" type="text" groupName="stt">
|
||||||
|
<label>No Results Message</label>
|
||||||
|
<description>Message to be told when no transcription is done.</description>
|
||||||
|
<default>No results</default>
|
||||||
|
</parameter>
|
||||||
|
<parameter name="optOutLogging" type="boolean" groupName="stt">
|
||||||
|
<label>Opt Out Logging</label>
|
||||||
|
<description>By default, all IBM Watson™ services log requests and their results. Logging is done only to improve the
|
||||||
|
services for future users. The logged data is not shared or made public.</description>
|
||||||
|
<default>true</default>
|
||||||
|
</parameter>
|
||||||
|
<parameter name="smartFormatting" type="boolean" groupName="stt">
|
||||||
|
<label>Smart Formatting</label>
|
||||||
|
<description>If true, the service converts dates, times, series of digits and numbers, phone numbers, currency
|
||||||
|
values, and internet addresses into more readable, conventional representations. (Not available for all locales)</description>
|
||||||
|
<default>false</default>
|
||||||
|
<advanced>true</advanced>
|
||||||
|
</parameter>
|
||||||
|
<parameter name="redaction" type="boolean" groupName="stt">
|
||||||
|
<label>Redaction</label>
|
||||||
|
<description>If true, the service redacts, or masks, numeric data from final transcripts. (Not available for all
|
||||||
|
locales)</description>
|
||||||
|
<default>false</default>
|
||||||
|
<advanced>true</advanced>
|
||||||
|
</parameter>
|
||||||
|
</config-description>
|
||||||
|
|
||||||
|
</config-description:config-descriptions>
|
|
@ -0,0 +1,26 @@
|
||||||
|
voice.config.watsonstt.apiKey.label = Api Key
|
||||||
|
voice.config.watsonstt.apiKey.description = Api key for Speech-to-Text instance created on IBM Cloud.
|
||||||
|
voice.config.watsonstt.backgroundAudioSuppression.label = Background Audio Suppression
|
||||||
|
voice.config.watsonstt.backgroundAudioSuppression.description = Use the parameter to suppress side conversations or background noise.
|
||||||
|
voice.config.watsonstt.group.authentication.label = Authentication
|
||||||
|
voice.config.watsonstt.group.authentication.description = Information for connection to your Watson Speech-to-Text instance.
|
||||||
|
voice.config.watsonstt.group.stt.label = STT Configuration
|
||||||
|
voice.config.watsonstt.group.stt.description = Parameters for Watson Speech-to-Text API.
|
||||||
|
voice.config.watsonstt.inactivityTimeout.label = Inactivity Timeout
|
||||||
|
voice.config.watsonstt.inactivityTimeout.description = The time in seconds after which, if only silence (no speech) is detected in the audio, the connection is closed.
|
||||||
|
voice.config.watsonstt.instanceUrl.label = Instance Url
|
||||||
|
voice.config.watsonstt.instanceUrl.description = Url for Speech-to-Text instance created on IBM Cloud.
|
||||||
|
voice.config.watsonstt.noResultsMessage.label = No Results Message
|
||||||
|
voice.config.watsonstt.noResultsMessage.description = Message to be told when no transcription is done.
|
||||||
|
voice.config.watsonstt.optOutLogging.label = Opt Out Logging
|
||||||
|
voice.config.watsonstt.optOutLogging.description = By default, all IBM Watson™ services log requests and their results. Logging is done only to improve the services for future users. The logged data is not shared or made public.
|
||||||
|
voice.config.watsonstt.redaction.label = Redaction
|
||||||
|
voice.config.watsonstt.redaction.description = If true, the service redacts, or masks, numeric data from final transcripts. (Not available for all locales)
|
||||||
|
voice.config.watsonstt.smartFormatting.label = Smart Formatting
|
||||||
|
voice.config.watsonstt.smartFormatting.description = If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations. (Not available for all locales)
|
||||||
|
voice.config.watsonstt.speechDetectorSensitivity.label = Speech Detector Sensitivity
|
||||||
|
voice.config.watsonstt.speechDetectorSensitivity.description = Use the parameter to suppress word insertions from music, coughing, and other non-speech events.
|
||||||
|
|
||||||
|
# service
|
||||||
|
|
||||||
|
service.voice.watsonstt.label = IBM Watson Speech-to-Text
|
|
@ -401,6 +401,7 @@
 <module>org.openhab.voice.pollytts</module>
 <module>org.openhab.voice.porcupineks</module>
 <module>org.openhab.voice.voicerss</module>
+<module>org.openhab.voice.watsonstt</module>
 </modules>

 <properties>