[rustpotterks] Upgrade to version 2 (#14615)
* [rustpotter] Use version 2

Signed-off-by: Miguel Álvarez <miguelwork92@gmail.com>
parent 1786bb0eec
commit aa3229a97f
@@ -5,6 +5,11 @@ This voice service allows you to use the open source library Rustpotter as your
 
 Rustpotter provides personal on-device wake word detection. You need to generate a model for your keyword using audio samples.
 
+You can test library in your browser using these web pages:
+
+- [The spot demo](https://givimad.github.io/rustpotter-worklet-demo/), which include some example wakewords (but it's recommended to use your own).
+- [The model creation demo](https://givimad.github.io/rustpotter-create-model-demo/), it allows you to record compatible wav files and generate a wakeword file that you can test on the previous page.
+
 Important: No voice data listened by this service will be uploaded to the Cloud.
 The voice data is processed offline, locally on your openHAB server by Rustpotter.
 
@@ -12,17 +17,19 @@ The voice data is processed offline, locally on your openHAB server by Rustpotte
 
 After installing, you will be able to access the service options through the openHAB configuration page in UI (**Settings / Other Services - Rustpotter Keyword Spotter**) to edit them:
 
-* **Threshold** - Configures the detector threshold, is the min score (in range 0. to 1.) that some wake word template should obtain to trigger a detection. Defaults to 0.5.
-* **Averaged Threshold** - Configures the detector averaged threshold, is the min score (in range 0. to 1.) that the audio should obtain against a combination of the wake word templates, the detection will be aborted if this is not the case. This way it can prevent to run the comparison of the current frame against each of the wake word templates which saves cpu. If set to 0 this functionality is disabled.
-* **Eager mode** - Enables eager mode. End detection as soon as a result is over the score, instead of waiting to see if the next frame has a higher score.
-* **Noise Detection Mode** - Use build-in noise detection to reduce computation on absence of noise. Configures the difficulty to consider a frame as noise (the required noise level).
-* **Noise Detection Sensitivity** - Noise/silence ratio in the last second to consider noise is detected. Defaults to 0.5.
-* **VAD Mode** - Use a voice activity detector to reduce computation in the absence of vocal sound.
-* **VAD Sensitivity** - Voice/silence ratio in the last second to consider voice is detected.
-* **VAD Delay** - Seconds to disable the vad detector after voice is detected. Defaults to 3.
-* **Comparator Ref** - Configures the reference for the comparator used to match the samples.
-* **Comparator Band Size** - Configures the band-size for the comparator used to match the samples.
+- **Threshold** - Configures the detector threshold, is the min score (in range 0. to 1.) that some wake word template should obtain to trigger a detection. Defaults to 0.5.
+- **Averaged Threshold** - Configures the detector averaged threshold, is the min score (in range 0. to 1.) that the audio should obtain against a combination of the wake word templates, the detection will be aborted if this is not the case. This way it can prevent to run the comparison of the current frame against each of the wake word templates which saves cpu. If set to 0 this functionality is disabled.
+- **Score Mode** - Indicates how to calculate the final score.
+- **Min Scores** - Minimum number of positive scores to consider a partial detection as a detection.
+- **Comparator Ref** - Configures the reference for the comparator used to match the samples.
+- **Comparator Band Size** - Configures the band-size for the comparator used to match the samples.
+- **Gain Normalizer** - Enables an audio filter that intent to approximate the volume of the stream to a reference level.
+- **Min Gain** - Min gain applied by the gain normalizer filter.
+- **Max Gain** - Max gain applied by the gain normalizer filter.
+- **Gain Ref** - The RMS reference used by the gain-normalizer to calculate the gain applied. If unset an estimation of the wakeword level is used.
+- **Band Pass** - Enables an audio filter that attenuates frequencies outside the low cutoff and high cutoff range.
+- **Low Cutoff** - Low cutoff for the band-pass filter.
+- **High Cutoff** - High cutoff for the band-pass filter.
 
 In case you would like to setup the service via a text file, create a new file in `$OPENHAB_ROOT/conf/services` named `rustpotterks.cfg`
 
@@ -31,21 +38,24 @@ Its contents should look similar to:
 ```
 org.openhab.voice.rustpotterks:threshold=0.5
 org.openhab.voice.rustpotterks:averagedthreshold=0.2
+org.openhab.voice.rustpotterks:scoreMode=max
+org.openhab.voice.rustpotterks:minScores=5
 org.openhab.voice.rustpotterks:comparatorRef=0.22
-org.openhab.voice.rustpotterks:comparatorBandSize=6
-org.openhab.voice.rustpotterks:eagerMode=true
-org.openhab.voice.rustpotterks:noiseDetectionMode=hard
-org.openhab.voice.rustpotterks:noiseDetectionSensitivity=0.5
-org.openhab.voice.rustpotterks:vadMode=aggressive
-org.openhab.voice.rustpotterks:vadSensitivity=0.5
-org.openhab.voice.rustpotterks:vadDelay=3
+org.openhab.voice.rustpotterks:comparatorBandSize=5
+org.openhab.voice.rustpotterks:gainNormalizer=true
+org.openhab.voice.rustpotterks:minGain=0.5
+org.openhab.voice.rustpotterks:maxGain=1
+org.openhab.voice.rustpotterks:gainRef=
+org.openhab.voice.rustpotterks:bandPass=true
+org.openhab.voice.rustpotterks:lowCutoff=80
+org.openhab.voice.rustpotterks:highCutoff=400
 ```
 
 ## Magic Word Configuration
 
 The magic word to spot is gathered from your 'Voice' configuration.
 
-You can generate your own wake word model by using the [Rustpotter CLI](https://github.com/GiviMAD/rustpotter-cli).
+You can generate your own wakeword files using the [Rustpotter CLI](https://github.com/GiviMAD/rustpotter-cli).
 
 You can also download the models used as examples on the [rustpotter web demo](https://givimad.github.io/rustpotter-worklet-demo/) from [this folder](https://github.com/GiviMAD/rustpotter-worklet-demo/tree/main/static).
 
@@ -59,11 +69,11 @@ The service will only work if it's able to find the correct rpw for your magic w
 
 You can setup your preferred default keyword spotter and default magic word in the UI:
 
-* Go to **Settings**.
-* Edit **System Services - Voice**.
-* Set **Rustpotter Keyword Spotter** as **Default Keyword Spotter**.
-* Choose your preferred **Magic Word** for your setup.
-* Choose optionally your **Listening Switch** item that will be switch ON during the period when the dialog processor has spotted the keyword and is listening for commands.
+- Go to **Settings**.
+- Edit **System Services - Voice**.
+- Set **Rustpotter Keyword Spotter** as **Default Keyword Spotter**.
+- Choose your preferred **Magic Word** for your setup.
+- Choose optionally your **Listening Switch** item that will be switch ON during the period when the dialog processor has spotted the keyword and is listening for commands.
 
 In case you would like to setup these settings via a text file, you can edit the file `runtime.cfg` in `$OPENHAB_ROOT/conf/services` and set the following entries:
 

@@ -18,7 +18,7 @@
     <dependency>
       <groupId>io.github.givimad</groupId>
       <artifactId>rustpotter-java</artifactId>
-      <version>1.0.0</version>
+      <version>2.0.0</version>
     </dependency>
   </dependencies>
 </project>

@@ -13,6 +13,7 @@
 package org.openhab.voice.rustpotterks.internal;
 
 import org.eclipse.jdt.annotation.NonNullByDefault;
+import org.eclipse.jdt.annotation.Nullable;
 
 /**
  * The {@link RustpotterKSConfiguration} class contains fields mapping thing configuration parameters.
@@ -36,31 +37,13 @@ public class RustpotterKSConfiguration {
      */
     public float averagedThreshold = 0.2f;
     /**
-     * Terminate the detection as son as one result is above the score,
-     * instead of wait to see if the next frame has a higher score.
+     * Indicates how to calculate the final score.
      */
-    public boolean eagerMode = true;
+    public String scoreMode = "max";
     /**
-     * Use build-in noise detection to reduce computation on absence of noise.
-     * Configures the difficulty to consider a frame as noise (the required noise level).
+     * Minimum number of positive scores to consider a partial detection as a detection.
      */
-    public String noiseDetectionMode = "disabled";
-    /**
-     * Noise/silence ratio in the last second to consider noise is detected. Defaults to 0.5.
-     */
-    public float noiseSensitivity = 0.5f;
-    /**
-     * Seconds to disable the vad detector after voice is detected. Defaults to 3.
-     */
-    public int vadDelay = 3;
-    /**
-     * Voice/silence ratio in the last second to consider voice is detected.
-     */
-    public float vadSensitivity = 0.5f;
-    /**
-     * Use a voice activity detector to reduce computation in the absence of vocal sound.
-     */
-    public String vadMode = "disabled";
+    public int minScores = 5;
     /**
      * Configures the reference for the comparator used to match the samples.
      */
@@ -68,5 +51,35 @@ public class RustpotterKSConfiguration {
     /**
      * Configures the band-size for the comparator used to match the samples.
      */
-    public int comparatorBandSize = 6;
+    public int comparatorBandSize = 5;
+    /**
+     * Enables an audio filter that intent to approximate the volume of the stream to a reference level (RMS of the
+     * samples is used as volume measure).
+     */
+    public boolean gainNormalizer = false;
+    /**
+     * Min gain applied by the gain normalizer filter.
+     */
+    public float minGain = 0.5f;
+    /**
+     * Max gain applied by the gain normalizer filter.
+     */
+    public float maxGain = 1f;
+    /**
+     * Set the RMS reference used by the gain-normalizer to calculate the gain applied. If unset an estimation of the
+     * wakeword level is used.
+     */
+    public @Nullable Float gainRef = null;
+    /**
+     * Enables an audio filter that attenuates frequencies outside the low cutoff and high cutoff range.
+     */
+    public boolean bandPass = false;
+    /**
+     * Low cutoff for the band-pass filter.
+     */
+    public float lowCutoff = 80f;
+    /**
+     * High cutoff for the band-pass filter.
+     */
+    public float highCutoff = 400f;
 }
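
The gain normalizer fields introduced in this hunk (`gainNormalizer`, `minGain`, `maxGain`, `gainRef`) describe a filter that steers the RMS level of the stream toward a reference. The following is a minimal sketch of that idea, assuming float samples in the range [-1, 1]. The class `GainNormalizerSketch` and its methods are hypothetical names chosen for illustration and are not taken from rustpotter, whose actual filter may differ.

```java
// Hypothetical illustration only; not the rustpotter implementation.
final class GainNormalizerSketch {

    /** Root mean square of a frame, used here as the volume measure. */
    static double rms(float[] frame) {
        if (frame.length == 0) {
            return 0;
        }
        double sumSquares = 0;
        for (float sample : frame) {
            sumSquares += (double) sample * sample;
        }
        return Math.sqrt(sumSquares / frame.length);
    }

    /** Gain that would move the frame's RMS to gainRef, clamped to [minGain, maxGain]. */
    static float gainFor(float[] frame, float gainRef, float minGain, float maxGain) {
        double level = rms(frame);
        if (level == 0) {
            return 1f; // assumption: a silent frame is left unchanged
        }
        double gain = gainRef / level;
        return (float) Math.max(minGain, Math.min(maxGain, gain));
    }

    /** Returns a new frame with the gain applied to every sample. */
    static float[] apply(float[] frame, float gain) {
        float[] out = new float[frame.length];
        for (int i = 0; i < frame.length; i++) {
            out[i] = frame[i] * gain;
        }
        return out;
    }
}
```

With the defaults from this commit (`minGain = 0.5f`, `maxGain = 1f`), a sketch like this can only attenuate the stream, never amplify it, because the computed gain is capped at 1.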

@@ -17,6 +17,7 @@ import static org.openhab.voice.rustpotterks.internal.RustpotterKSConstants.*;
 import java.io.File;
 import java.io.IOException;
 import java.nio.file.Path;
+import java.util.ArrayList;
 import java.util.Locale;
 import java.util.Map;
 import java.util.Set;
@@ -38,7 +39,6 @@ import org.openhab.core.voice.KSService;
 import org.openhab.core.voice.KSServiceHandle;
 import org.openhab.core.voice.KSpottedEvent;
 import org.osgi.framework.Constants;
-import org.osgi.service.component.ComponentContext;
 import org.osgi.service.component.annotations.Activate;
 import org.osgi.service.component.annotations.Component;
 import org.osgi.service.component.annotations.Modified;
@@ -46,10 +46,10 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import io.github.givimad.rustpotter_java.Endianness;
-import io.github.givimad.rustpotter_java.NoiseDetectionMode;
-import io.github.givimad.rustpotter_java.RustpotterJava;
-import io.github.givimad.rustpotter_java.RustpotterJavaBuilder;
-import io.github.givimad.rustpotter_java.VadMode;
+import io.github.givimad.rustpotter_java.Rustpotter;
+import io.github.givimad.rustpotter_java.RustpotterBuilder;
+import io.github.givimad.rustpotter_java.SampleFormat;
+import io.github.givimad.rustpotter_java.ScoreMode;
 
 /**
  * The {@link RustpotterKSService} is a keyword spotting implementation based on rustpotter.
@@ -76,7 +76,7 @@ public class RustpotterKSService implements KSService {
     }
 
     @Activate
-    protected void activate(ComponentContext componentContext, Map<String, Object> config) {
+    protected void activate(Map<String, Object> config) {
         modified(config);
     }
 
@@ -111,7 +111,7 @@ public class RustpotterKSService implements KSService {
             throws KSException {
         logger.debug("Loading library");
         try {
-            RustpotterJava.loadLibrary();
+            Rustpotter.loadLibrary();
         } catch (IOException e) {
             throw new KSException("Unable to load rustpotter lib: " + e.getMessage());
         }
@@ -126,8 +126,13 @@ public class RustpotterKSService implements KSService {
         }
         var endianness = isBigEndian ? Endianness.BIG : Endianness.LITTLE;
         logger.debug("Audio wav spec: frequency '{}', bit depth '{}', channels '{}', '{}'", frequency, bitDepth,
-                channels, audioFormat.isBigEndian() ? "big-endian" : "little-endian");
-        RustpotterJava rustpotter = initRustpotter(frequency, bitDepth, channels, endianness);
+                channels, isBigEndian ? "big-endian" : "little-endian");
+        Rustpotter rustpotter;
+        try {
+            rustpotter = initRustpotter(frequency, bitDepth, channels, endianness);
+        } catch (Exception e) {
+            throw new KSException("Unable to configure rustpotter: " + e.getMessage(), e);
+        }
         var modelName = keyword.replaceAll("\\s", "_") + ".rpw";
         var modelPath = Path.of(RUSTPOTTER_FOLDER, modelName);
         if (!modelPath.toFile().exists()) {
@@ -141,48 +146,43 @@ public class RustpotterKSService implements KSService {
         logger.debug("Model '{}' loaded", modelPath);
         AtomicBoolean aborted = new AtomicBoolean(false);
         executor.submit(() -> processAudioStream(rustpotter, ksListener, audioStream, aborted));
-        return new KSServiceHandle() {
-            @Override
-            public void abort() {
-                logger.debug("Stopping service");
-                aborted.set(true);
-            }
+        return () -> {
+            logger.debug("Stopping service");
+            aborted.set(true);
         };
     }
 
-    private RustpotterJava initRustpotter(long frequency, int bitDepth, int channels, Endianness endianness) {
-        var rustpotterBuilder = new RustpotterJavaBuilder();
+    private Rustpotter initRustpotter(long frequency, int bitDepth, int channels, Endianness endianness)
+            throws Exception {
+        var rustpotterBuilder = new RustpotterBuilder();
         // audio configs
         rustpotterBuilder.setBitsPerSample(bitDepth);
         rustpotterBuilder.setSampleRate(frequency);
         rustpotterBuilder.setChannels(channels);
+        rustpotterBuilder.setSampleFormat(SampleFormat.INT);
         rustpotterBuilder.setEndianness(endianness);
         // detector configs
         rustpotterBuilder.setThreshold(config.threshold);
         rustpotterBuilder.setAveragedThreshold(config.averagedThreshold);
+        rustpotterBuilder.setScoreMode(getScoreMode(config.scoreMode));
+        rustpotterBuilder.setMinScores(config.minScores);
         rustpotterBuilder.setComparatorRef(config.comparatorRef);
         rustpotterBuilder.setComparatorBandSize(config.comparatorBandSize);
-        @Nullable
-        VadMode vadMode = getVADMode(config.vadMode);
-        if (vadMode != null) {
-            rustpotterBuilder.setVADMode(vadMode);
-            rustpotterBuilder.setVADSensitivity(config.vadSensitivity);
-            rustpotterBuilder.setVADDelay(config.vadDelay);
-        }
-        @Nullable
-        NoiseDetectionMode noiseDetectionMode = getNoiseMode(config.noiseDetectionMode);
-        if (noiseDetectionMode != null) {
-            rustpotterBuilder.setNoiseMode(noiseDetectionMode);
-            rustpotterBuilder.setNoiseSensitivity(config.noiseSensitivity);
-        }
-        rustpotterBuilder.setEagerMode(config.eagerMode);
+        // filter configs
+        rustpotterBuilder.setGainNormalizerEnabled(config.gainNormalizer);
+        rustpotterBuilder.setMinGain(config.minGain);
+        rustpotterBuilder.setMaxGain(config.maxGain);
+        rustpotterBuilder.setGainRef(config.gainRef);
+        rustpotterBuilder.setBandPassFilterEnabled(config.bandPass);
+        rustpotterBuilder.setBandPassLowCutoff(config.lowCutoff);
+        rustpotterBuilder.setBandPassHighCutoff(config.highCutoff);
         // init the detector
         var rustpotter = rustpotterBuilder.build();
         rustpotterBuilder.delete();
         return rustpotter;
     }
 
-    private void processAudioStream(RustpotterJava rustpotter, KSListener ksListener, AudioStream audioStream,
+    private void processAudioStream(Rustpotter rustpotter, KSListener ksListener, AudioStream audioStream,
             AtomicBoolean aborted) {
         int numBytesRead;
         var bufferSize = (int) rustpotter.getBytesPerFrame();
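
The hunk above replaces the anonymous `KSServiceHandle` with a lambda. That form compiles only if the interface exposes a single abstract method, which the removed `abort()` override suggests. The sketch below shows the same pattern in isolation: a shared `AtomicBoolean` flag is set through the handle and polled by the worker loop. `Handle`, `handleFor`, and `run` are hypothetical stand-ins and are not openHAB or rustpotter APIs.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of the abort-flag pattern; not openHAB code.
final class AbortHandleSketch {

    /** Stand-in for a single-method handle interface such as KSServiceHandle. */
    @FunctionalInterface
    interface Handle {
        void abort();
    }

    /** A lambda is enough because Handle has exactly one abstract method. */
    static Handle handleFor(AtomicBoolean aborted) {
        return () -> aborted.set(true);
    }

    /** Counts frames until the flag is set or maxFrames is reached. */
    static int run(AtomicBoolean aborted, int maxFrames) {
        int processed = 0;
        while (!aborted.get() && processed < maxFrames) {
            processed++; // a real worker would read and process audio here
        }
        return processed;
    }
}
```

Because the flag is an `AtomicBoolean`, a write from the thread calling `abort()` is visible to the worker thread without extra synchronization.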
@@ -200,10 +200,20 @@ public class RustpotterKSService implements KSService {
                     continue;
                 }
                 remaining = bufferSize;
-                var result = rustpotter.processBuffer(audioBuffer);
+                var result = rustpotter.processBytes(audioBuffer);
                 if (result.isPresent()) {
                     var detection = result.get();
-                    logger.debug("keyword '{}' detected with score {}!", detection.getName(), detection.getScore());
+                    if (logger.isDebugEnabled()) {
+                        ArrayList<String> scores = new ArrayList<>();
+                        var scoreNames = detection.getScoreNames().split("\\|\\|");
+                        var scoreValues = detection.getScores();
+                        for (var i = 0; i < Integer.min(scoreNames.length, scoreValues.length); i++) {
+                            scores.add("'" + scoreNames[i] + "': " + scoreValues[i]);
+                        }
+                        logger.debug("Detected '{}' with: Score: {}, AvgScore: {}, Count: {}, Gain: {}, Scores: {}",
+                                detection.getName(), detection.getScore(), detection.getAvgScore(),
+                                detection.getCounter(), detection.getGain(), String.join(", ", scores));
+                    }
                     detection.delete();
                     ksListener.ksEventReceived(new KSpottedEvent());
                 }
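
The new debug branch splits the score names on a literal `||` separator and pairs each name with the value at the same index. The standalone sketch below reproduces that pairing so it can be checked outside the service. It assumes the values arrive as a `float[]`, which the diff does not confirm, and `ScoreLabelSketch` with its `label` method is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mirroring the pairing logic in the hunk above.
final class ScoreLabelSketch {

    /** Pairs "||"-separated names with values; extra names or values are ignored. */
    static List<String> label(String joinedNames, float[] values) {
        // Both pipes must be escaped because split() takes a regular expression.
        String[] names = joinedNames.split("\\|\\|");
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < Math.min(names.length, values.length); i++) {
            labels.add("'" + names[i] + "': " + values[i]);
        }
        return labels;
    }
}
```

Taking the minimum of the two lengths keeps the loop safe if the native library ever returns a different number of names and scores.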
@@ -216,35 +226,27 @@ public class RustpotterKSService implements KSService {
         logger.debug("rustpotter stopped");
     }
 
-    private @Nullable VadMode getVADMode(String mode) {
+    private ScoreMode getScoreMode(String mode) {
         switch (mode) {
-            case "low-bitrate":
-                return VadMode.LOW_BITRATE;
-            case "quality":
-                return VadMode.QUALITY;
-            case "aggressive":
-                return VadMode.AGGRESSIVE;
-            case "very-aggressive":
-                return VadMode.VERY_AGGRESSIVE;
+            case "average":
+                return ScoreMode.AVG;
+            case "median":
+                return ScoreMode.MEDIAN;
+            case "p25":
+                return ScoreMode.P25;
+            case "p50":
+                return ScoreMode.P50;
+            case "p75":
+                return ScoreMode.P75;
+            case "p80":
+                return ScoreMode.P80;
+            case "p90":
+                return ScoreMode.P90;
+            case "p95":
+                return ScoreMode.P95;
+            case "max":
             default:
-                return null;
-        }
-    }
-
-    private @Nullable NoiseDetectionMode getNoiseMode(String mode) {
-        switch (mode) {
-            case "easiest":
-                return NoiseDetectionMode.EASIEST;
-            case "easy":
-                return NoiseDetectionMode.EASY;
-            case "normal":
-                return NoiseDetectionMode.NORMAL;
-            case "hard":
-                return NoiseDetectionMode.HARD;
-            case "hardest":
-                return NoiseDetectionMode.HARDEST;
-            default:
-                return null;
+                return ScoreMode.MAX;
         }
     }
 }
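
`getScoreMode` maps the configured string to a `ScoreMode` and falls back to `MAX` for any unknown value. The commit does not show how rustpotter turns the per-template scores into the final score, so the following is only a sketch of what such modes could compute. It assumes nearest-rank percentiles and the conventional median; `ScoreModeSketch`, its `Mode` enum, and `finalScore` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration; rustpotter's real aggregation may differ.
final class ScoreModeSketch {

    enum Mode { MAX, AVG, MEDIAN, P25, P50, P75, P80, P90, P95 }

    static double finalScore(Mode mode, double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        double[] sorted = scores.clone();
        Arrays.sort(sorted);
        switch (mode) {
            case AVG:
                return Arrays.stream(sorted).average().getAsDouble();
            case MEDIAN:
                return median(sorted);
            case P25:
                return percentile(sorted, 25);
            case P50:
                return percentile(sorted, 50);
            case P75:
                return percentile(sorted, 75);
            case P80:
                return percentile(sorted, 80);
            case P90:
                return percentile(sorted, 90);
            case P95:
                return percentile(sorted, 95);
            case MAX:
            default:
                return sorted[sorted.length - 1];
        }
    }

    /** Conventional median: mean of the two middle values for an even count. */
    static double median(double[] sorted) {
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** Nearest-rank percentile: the value at rank ceil(p / 100 * n), 1-based. */
    static double percentile(double[] sorted, int p) {
        int rank = (p * sorted.length + 99) / 100; // integer ceiling
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Under these assumptions, a percentile mode such as `P25` demands that most templates agree before a detection passes the threshold, whereas `MAX` needs only a single high-scoring template.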

@@ -9,13 +9,9 @@
 			<label>Wakeword Detector</label>
 			<description>Wakeword detection options.</description>
 		</parameter-group>
-		<parameter-group name="noiseDetector">
-			<label>Noise Detector</label>
-			<description>Optional noise detection options.</description>
-		</parameter-group>
-		<parameter-group name="vadDetector">
-			<label>VAD Detector</label>
-			<description>Optional voice activity detector options.</description>
+		<parameter-group name="filters">
+			<label>Audio Filters</label>
+			<description>Optional audio filter options.</description>
 		</parameter-group>
 		<parameter name="threshold" type="decimal" min="0" max="1" groupName="wakewordDetector">
 			<label>Threshold</label>
@@ -31,6 +27,27 @@
 				cpu. If set to 0 this functionality is disabled.</description>
 			<default>0.2</default>
 		</parameter>
+		<parameter name="scoreMode" type="text" groupName="wakewordDetector">
+			<label>Score Mode</label>
+			<description>Indicates how to calculate the final score.</description>
+			<default>max</default>
+			<options>
+				<option value="average">Average</option>
+				<option value="max">Max</option>
+				<option value="median">Median</option>
+				<option value="p25">P25</option>
+				<option value="p50">P50</option>
+				<option value="p75">P75</option>
+				<option value="p80">P80</option>
+				<option value="p90">P90</option>
+				<option value="p95">P95</option>
+			</options>
+		</parameter>
+		<parameter name="minScores" type="integer" groupName="wakewordDetector">
+			<label>Min Scores</label>
+			<description>Minimum number of positive scores to consider a partial detection as a detection.</description>
+			<default>5</default>
+		</parameter>
 		<parameter name="comparatorRef" type="decimal" min="0" max="1" groupName="wakewordDetector">
 			<label>Comparator Ref</label>
 			<description>Configures the reference for the comparator used to match the samples.</description>
@@ -40,58 +57,44 @@
 		<parameter name="comparatorBandSize" type="integer" groupName="wakewordDetector">
 			<label>Comparator Band Size</label>
 			<description>Configures the band-size for the comparator used to match the samples.</description>
-			<default>6</default>
+			<default>5</default>
 			<advanced>true</advanced>
 		</parameter>
-		<parameter name="eagerMode" type="boolean" groupName="wakewordDetector">
-			<label>Eager Mode</label>
-			<description>Enables eager mode. End detection as soon as a result is over the score, instead of waiting to
-				see if the
-				next frame has a higher score.</description>
-			<default>true</default>
+		<parameter name="gainNormalizer" type="boolean" groupName="filters">
+			<label>Gain Normalizer</label>
+			<description> Enables an audio filter that intent to approximate the volume of the stream to a reference level (RMS
+				of the samples is used as volume measure).</description>
+			<default>false</default>
 		</parameter>
-		<parameter name="noiseDetectionMode" type="text" groupName="noiseDetector">
-			<label>Noise Detection Mode</label>
-			<description>Use a noise detector to reduce computation in the absence of sound. Configures the difficulty to
-				consider
-				a
-				frame as noise (the required noise level).</description>
-			<default>disabled</default>
-			<options>
-				<option value="disabled">Disabled</option>
-				<option value="easiest">Easiest</option>
-				<option value="easy">Easy</option>
-				<option value="normal">Normal</option>
-				<option value="hard">Hard</option>
-				<option value="hardest">Hardest</option>
-			</options>
-		</parameter>
-		<parameter name="noiseSensitivity" type="decimal" min="0" max="1" groupName="noiseDetector">
-			<label>Noise Sensitivity</label>
-			<description>Noise/silence ratio in the last second to consider voice is detected.</description>
+		<parameter name="minGain" type="decimal" min="0.1" max="1" step="0.1" groupName="filters">
+			<label>Min Gain</label>
+			<description>Min gain applied by the gain normalizer filter.</description>
 			<default>0.5</default>
 		</parameter>
-		<parameter name="vadMode" type="text" groupName="vadDetector">
-			<label>VAD Mode</label>
-			<description>Use a vad detector to reduce computation in the absence of vocal sound.</description>
-			<default>disabled</default>
-			<options>
-				<option value="disabled">Disabled</option>
-				<option value="low-bitrate">Low Bitrate</option>
-				<option value="quality">Quality</option>
-				<option value="aggressive">Aggressive</option>
-				<option value="very-aggressive">Very Aggressive</option>
-			</options>
+		<parameter name="maxGain" type="decimal" min="0.1" max="1" step="0.1" groupName="filters">
+			<label>Max Gain</label>
+			<description>Max gain applied by the gain normalizer filter.</description>
+			<default>1</default>
 		</parameter>
-		<parameter name="vadSensitivity" type="decimal" min="0" max="1" groupName="vadDetector">
-			<label>VAD Sensitivity</label>
-			<description>Voice/silence ratio in the last second to consider voice is detected.</description>
-			<default>0.5</default>
+		<parameter name="gainRef" type="decimal" min="0" max="1" step="0.001" groupName="filters">
+			<label>Gain Ref</label>
+			<description>Set the RMS reference used by the gain-normalizer to calculate the gain applied. If unset an estimation
+				of the wakeword level is used.</description>
 		</parameter>
-		<parameter name="vadDelay" type="integer" groupName="vadDetector">
-			<label>VAD Delay</label>
-			<description>Seconds to disable the vad detector after voice is detected.</description>
-			<default>3</default>
+		<parameter name="bandPass" type="boolean" groupName="filters">
+			<label>Band Pass</label>
+			<description>Enables an audio filter that attenuates frequencies outside the low cutoff and high cutoff range.</description>
+			<default>false</default>
+		</parameter>
+		<parameter name="lowCutoff" type="decimal" min="0" groupName="filters">
+			<label>Low Cutoff</label>
+			<description>Low cutoff for the band-pass filter.</description>
+			<default>80</default>
+		</parameter>
+		<parameter name="highCutoff" type="decimal" min="0" groupName="filters">
+			<label>High Cutoff</label>
+			<description>High cutoff for the band-pass filter.</description>
+			<default>400</default>
 		</parameter>
 	</config-description>
 </config-description:config-descriptions>
|
|||||||
@ -1,40 +1,42 @@
|
|||||||
voice.config.rustpotterks.averagedThreshold.label = Averaged Threshold
|
voice.config.rustpotterks.averagedThreshold.label = Averaged Threshold
|
||||||
voice.config.rustpotterks.averagedThreshold.description = Configures the detector averaged threshold, is the min score (in range 0. to 1.) that the audio should obtain against a combination of the wake word templates, the detection will be aborted if this is not the case. This way it can prevent to run the comparison of the current frame against each of the wake word templates which saves cpu. If set to 0 this functionality is disabled.
|
voice.config.rustpotterks.averagedThreshold.description = Configures the detector averaged threshold, is the min score (in range 0. to 1.) that the audio should obtain against a combination of the wake word templates, the detection will be aborted if this is not the case. This way it can prevent to run the comparison of the current frame against each of the wake word templates which saves cpu. If set to 0 this functionality is disabled.
|
||||||
|
voice.config.rustpotterks.bandPass.label = Band Pass
|
||||||
|
voice.config.rustpotterks.bandPass.description = Enables an audio filter that attenuates frequencies outside the low cutoff and high cutoff range.
|
||||||
voice.config.rustpotterks.comparatorBandSize.label = Comparator Band Size
|
voice.config.rustpotterks.comparatorBandSize.label = Comparator Band Size
|
||||||
voice.config.rustpotterks.comparatorBandSize.description = Configures the band-size for the comparator used to match the samples.
|
voice.config.rustpotterks.comparatorBandSize.description = Configures the band-size for the comparator used to match the samples.
|
||||||
voice.config.rustpotterks.comparatorRef.label = Comparator Ref
|
voice.config.rustpotterks.comparatorRef.label = Comparator Ref
|
||||||
voice.config.rustpotterks.comparatorRef.description = Configures the reference for the comparator used to match the samples.
|
voice.config.rustpotterks.comparatorRef.description = Configures the reference for the comparator used to match the samples.
|
||||||
voice.config.rustpotterks.eagerMode.label = Eager Mode
|
voice.config.rustpotterks.gainNormalizer.label = Gain Normalizer
|
||||||
voice.config.rustpotterks.eagerMode.description = Enables eager mode. End detection as soon as a result is over the score, instead of waiting to see if the next frame has a higher score.
|
voice.config.rustpotterks.gainNormalizer.description = Enables an audio filter that intent to approximate the volume of the stream to a reference level (RMS of the samples is used as volume measure).
|
||||||
voice.config.rustpotterks.group.noiseDetector.label = Noise Detector
|
voice.config.rustpotterks.gainRef.label = Gain Ref
|
||||||
voice.config.rustpotterks.group.noiseDetector.description = Optional noise detection options.
|
voice.config.rustpotterks.gainRef.description = Set the RMS reference used by the gain-normalizer to calculate the gain applied. If unset an estimation of the wakeword level is used.
|
||||||
voice.config.rustpotterks.group.vadDetector.label = VAD Detector
|
voice.config.rustpotterks.group.filters.label = Audio Filters
|
||||||
voice.config.rustpotterks.group.vadDetector.description = Optional voice activity detector options.
|
voice.config.rustpotterks.group.filters.description = Optional audio filter options.
|
||||||
voice.config.rustpotterks.group.wakewordDetector.label = Wakeword Detector
|
voice.config.rustpotterks.group.wakewordDetector.label = Wakeword Detector
|
||||||
voice.config.rustpotterks.group.wakewordDetector.description = Wakeword detection options.
|
voice.config.rustpotterks.group.wakewordDetector.description = Wakeword detection options.
|
||||||
voice.config.rustpotterks.noiseDetectionMode.label = Noise Detection Mode
|
voice.config.rustpotterks.highCutoff.label = High Cutoff
|
||||||
voice.config.rustpotterks.noiseDetectionMode.description = Use a noise detector to reduce computation in the absence of sound. Configures the difficulty to consider a frame as noise (the required noise level).
|
voice.config.rustpotterks.highCutoff.description = High cutoff for the band-pass filter.
|
||||||
voice.config.rustpotterks.noiseDetectionMode.option.disabled = Disabled
|
voice.config.rustpotterks.lowCutoff.label = Low Cutoff
|
||||||
voice.config.rustpotterks.noiseDetectionMode.option.easiest = Easiest
|
voice.config.rustpotterks.lowCutoff.description = Low cutoff for the band-pass filter.
|
||||||
voice.config.rustpotterks.noiseDetectionMode.option.easy = Easy
|
voice.config.rustpotterks.maxGain.label = Max Gain
|
||||||
voice.config.rustpotterks.noiseDetectionMode.option.normal = Normal
|
voice.config.rustpotterks.maxGain.description = Max gain applied by the gain normalizer filter.
|
||||||
voice.config.rustpotterks.noiseDetectionMode.option.hard = Hard
|
voice.config.rustpotterks.minGain.label = Min Gain
|
||||||
voice.config.rustpotterks.noiseDetectionMode.option.hardest = Hardest
|
voice.config.rustpotterks.minGain.description = Min gain applied by the gain normalizer filter.
|
||||||
voice.config.rustpotterks.noiseSensitivity.label = Noise Sensitivity
|
voice.config.rustpotterks.minScores.label = Min Scores
|
||||||
voice.config.rustpotterks.noiseSensitivity.description = Noise/silence ratio in the last second to consider voice is detected.
|
voice.config.rustpotterks.minScores.description = Minimum number of positive scores to consider a partial detection as a detection.
|
||||||
|
voice.config.rustpotterks.scoreMode.label = Score Mode
|
||||||
|
voice.config.rustpotterks.scoreMode.description = Indicates how to calculate the final score.
|
||||||
|
voice.config.rustpotterks.scoreMode.option.average = Average
|
||||||
|
voice.config.rustpotterks.scoreMode.option.max = Max
|
||||||
|
voice.config.rustpotterks.scoreMode.option.median = Median
|
||||||
|
voice.config.rustpotterks.scoreMode.option.p25 = P25
|
||||||
|
voice.config.rustpotterks.scoreMode.option.p50 = P50
|
||||||
|
voice.config.rustpotterks.scoreMode.option.p75 = P75
|
||||||
|
voice.config.rustpotterks.scoreMode.option.p80 = P80
|
||||||
|
voice.config.rustpotterks.scoreMode.option.p90 = P90
|
||||||
|
voice.config.rustpotterks.scoreMode.option.p95 = P95
|
||||||
voice.config.rustpotterks.threshold.label = Threshold
|
voice.config.rustpotterks.threshold.label = Threshold
|
||||||
voice.config.rustpotterks.threshold.description = Configures the detector threshold, is the min score (in range 0. to 1.) that some of the wakeword templates should obtain to trigger a detection. Model defined value takes prevalence if present.
|
voice.config.rustpotterks.threshold.description = Configures the detector threshold, is the min score (in range 0. to 1.) that some of the wakeword templates should obtain to trigger a detection. Model defined value takes prevalence if present.
|
||||||
voice.config.rustpotterks.vadDelay.label = VAD Delay
|
|
||||||
voice.config.rustpotterks.vadDelay.description = Seconds to disable the vad detector after voice is detected.
|
|
||||||
voice.config.rustpotterks.vadMode.label = VAD Mode
|
|
||||||
voice.config.rustpotterks.vadMode.description = Use a vad detector to reduce computation in the absence of vocal sound.
|
|
||||||
voice.config.rustpotterks.vadMode.option.disabled = Disabled
|
|
||||||
voice.config.rustpotterks.vadMode.option.low-bitrate = Low Bitrate
|
|
||||||
voice.config.rustpotterks.vadMode.option.quality = Quality
|
|
||||||
voice.config.rustpotterks.vadMode.option.aggressive = Aggressive
|
|
||||||
voice.config.rustpotterks.vadMode.option.very-aggressive = Very Aggressive
|
|
||||||
voice.config.rustpotterks.vadSensitivity.label = VAD Sensitivity
|
|
||||||
voice.config.rustpotterks.vadSensitivity.description = Voice/silence ratio in the last second to consider voice is detected.
|
|
||||||
|
|
||||||
# service
|
# service
|
||||||
|
|
||||||
|
|||||||
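Note on the new `filters` options: the descriptions for `gainRef`, `minGain` and `maxGain` say the gain normalizer uses an RMS reference to calculate the gain applied, bounded by a min and max gain. A minimal sketch of that idea follows. It is not Rustpotter's actual implementation. The function names, the `min_gain` default, and the silent-frame handling are assumptions made for illustration. Only the `maxGain` default of 1 comes from the config description above.

```python
import math


def rms(frame):
    """Root mean square of a frame of float samples in the range [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def normalizer_gain(frame, gain_ref, min_gain=0.1, max_gain=1.0):
    """Gain that would move the frame's RMS level toward gain_ref.

    min_gain=0.1 is an assumed default; max_gain=1.0 mirrors the maxGain
    default shown in the config description.
    """
    level = rms(frame)
    # Assumption: a silent frame has no usable level, so ask for unit gain.
    target = gain_ref / level if level > 0.0 else 1.0
    # Clamp the requested gain to the configured [min_gain, max_gain] range.
    return max(min_gain, min(max_gain, target))


def apply_gain(frame, gain):
    """Scale every sample by gain, keeping the result inside [-1, 1]."""
    return [max(-1.0, min(1.0, s * gain)) for s in frame]


if __name__ == "__main__":
    frame = [0.5, -0.5, 0.5, -0.5]
    gain = normalizer_gain(frame, gain_ref=0.1)
    print(gain, apply_gain(frame, gain))
```

With `max_gain` left at 1 the filter can only attenuate, so a stream quieter than the reference passes through unchanged in this sketch.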