add read me

2026-01-09 10:28:44 +11:00
commit edaf914b73
13417 changed files with 2952119 additions and 0 deletions

openwakeword-0.4.0.dist-info/LICENSE

@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

openwakeword-0.4.0.dist-info/METADATA

@@ -0,0 +1,223 @@
Metadata-Version: 2.1
Name: openwakeword
Version: 0.4.0
Summary: An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity
Home-page: https://pypi.org/project/openwakeword
Author: David Scripka
Author-email: David Scripka <david.scripka@gmail.com>
Project-URL: Homepage, https://github.com/dscripka/openWakeWord
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: onnxruntime (<2,>=1.10.0)
Requires-Dist: tqdm (<5.0,>=4.0)
Requires-Dist: scipy (<2,>=1.3)
Requires-Dist: scikit-learn (<2,>=1)
Provides-Extra: full
Requires-Dist: mutagen (<2,>=1.46.0) ; extra == 'full'
Requires-Dist: speechbrain (<1,>=0.5.13) ; extra == 'full'
Requires-Dist: pytest (<8,>=7.2.0) ; extra == 'full'
Requires-Dist: pytest-cov (<3,>=2.10.1) ; extra == 'full'
Requires-Dist: pytest-flake8 (<2,>=1.1.1) ; extra == 'full'
Requires-Dist: pytest-mypy (<1,>=0.10.0) ; extra == 'full'
Requires-Dist: plotext (<6,>=5.2.7) ; extra == 'full'
Requires-Dist: sounddevice (<1,>=0.4.1) ; extra == 'full'
Provides-Extra: test
Requires-Dist: pytest (<8,>=7.2.0) ; extra == 'test'
Requires-Dist: pytest-cov (<3,>=2.10.1) ; extra == 'test'
Requires-Dist: pytest-flake8 (<2,>=1.1.1) ; extra == 'test'
Requires-Dist: flake8 (<4.1,>=4.0) ; extra == 'test'
Requires-Dist: pytest-mypy (<1,>=0.10.0) ; extra == 'test'
![Github CI](https://github.com/dscripka/openWakeWord/actions/workflows/tests.yml/badge.svg)
# openWakeWord
openWakeWord is an open-source wakeword library that can be used to create voice-enabled applications and interfaces. It includes pre-trained models for common words & phrases that work well in real-world environments.
# Demo
You can try an online demo of the included pre-trained models via HuggingFace Spaces [right here!](https://huggingface.co/spaces/davidscripka/openWakeWord).
Note that real-time detection of a microphone stream can occasionally behave strangely in Spaces. For the most reliable testing, perform a local installation as described below.
# Installation & Usage
Installing openWakeWord is simple and has minimal dependencies:
```
pip install openwakeword
```
To (optionally) use [Speex](https://www.speex.org/) noise suppression on Linux systems to improve performance in noisy environments, install the Speex dependencies and then the pre-built Python package (see the assets [here](https://github.com/dscripka/openWakeWord/releases/tag/v0.1.1) for all .whl versions), adjusting for your Python version and system architecture as needed.
```
sudo apt-get install libspeexdsp-dev
pip install https://github.com/dscripka/openWakeWord/releases/download/v0.1.1/speexdsp_ns-0.1.2-cp38-cp38-linux_x86_64.whl
```
Many thanks to [TeaPoly](https://github.com/TeaPoly/speexdsp-ns-python) for their Python wrapper of the Speex noise suppression libraries.
For quick local testing, clone this repository and use the included [example script](examples/detect_from_microphone.py) to try streaming detection from a local microphone. **Important note!** The model files are stored in this repo using [git-lfs](https://git-lfs.com/); make sure it is installed on your system and, if needed, run `git-lfs fetch --all` to make sure the models download correctly.
Adding openWakeWord to your own Python code requires just a few lines:
```python
from openwakeword.model import Model
# Instantiate the model
model = Model(
wakeword_model_paths=["path/to/model.onnx"], # can also leave this argument empty to load all of the included pre-trained models
)
# Get a frame of 16-bit, 16 kHz PCM audio from a file, microphone, network stream, etc.
# For the best efficiency and latency, audio frames should be multiples of 80 ms, with longer frames
# increasing overall efficiency at the cost of detection latency
frame = my_function_to_get_audio_frame()
# Get predictions for the frame
prediction = model.predict(frame)
```
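Building on that snippet, continuous detection just feeds successive frames to `predict` in a loop. Below is a minimal sketch using the optional `sounddevice` dependency; the 1280-sample chunk is 80 ms at 16 kHz per the comment above, but the exact stream settings and the dictionary structure of the returned scores are assumptions here (the maintained version is the example script linked above).
```python
# Minimal streaming sketch; see examples/detect_from_microphone.py for the maintained version.
import numpy as np
import sounddevice as sd  # optional dependency; any 16-bit, 16 kHz PCM source works

from openwakeword.model import Model

CHUNK = 1280  # 80 ms at 16 kHz, the recommended minimum frame size
model = Model()  # loads all included pre-trained models

with sd.InputStream(samplerate=16000, channels=1, dtype="int16", blocksize=CHUNK) as stream:
    while True:
        frame, _overflowed = stream.read(CHUNK)
        # predict() accepts a frame of 16-bit PCM samples; the structure of the
        # result (a model name -> score mapping) is assumed in this sketch
        for name, score in model.predict(np.squeeze(frame)).items():
            if score > 0.5:  # default activation threshold
                print(f"Detected '{name}' (score={score:.2f})")
```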
# Recommendations for Usage
## Noise Suppression and Voice Activity Detection (VAD)
While the default settings for openWakeWord will work well in many cases, there are adjustable parameters that can improve performance in some deployment scenarios.
On supported platforms (currently only x86 and Arm64 Linux), Speex noise suppression can be enabled by setting `enable_speex_noise_suppression=True` when instantiating an openWakeWord model. This can improve performance when relatively constant background noise is present.
Second, a voice activity detection (VAD) model from [Silero](https://github.com/snakers4/silero-vad) is included with openWakeWord, and can be enabled by setting the `vad_threshold` argument to a value between 0 and 1 when instantiating an openWakeWord model. This will only allow a positive prediction from openWakeWord when the VAD model simultaneously has a score above the specified threshold, which can significantly reduce false-positive activations in the presence of non-speech noise.
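Both options are enabled at instantiation. A minimal sketch, using the argument names from the paragraphs above (the threshold value itself is just a starting point to tune):
```python
from openwakeword.model import Model

model = Model(
    enable_speex_noise_suppression=True,  # supported on x86/Arm64 Linux only
    vad_threshold=0.5,  # example value; predictions also require a Silero VAD score above this
)
```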
## Threshold Scores for Activation
All of the included openWakeWord models were trained to work well with a default threshold of `0.5` for a positive prediction, but you are encouraged to determine the best threshold for your environment and use-case through testing. For certain deployments, using a lower or higher threshold in practice may result in significantly better performance.
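In code, applying a tuned threshold is just a comparison against the returned scores. A sketch, reusing the placeholder frame function from the usage snippet above; the per-model threshold values here are hypothetical:
```python
from openwakeword.model import Model

model = Model()

# Hypothetical per-model thresholds determined through testing in the target environment
thresholds = {"alexa": 0.6, "hey_jarvis": 0.4}

frame = my_function_to_get_audio_frame()  # placeholder, as in the usage snippet above
activations = {
    name: score
    for name, score in model.predict(frame).items()
    if score >= thresholds.get(name, 0.5)  # fall back to the 0.5 default
}
```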
## User-specific models
If the baseline performance of openWakeWord models is not sufficient for a given application (specifically, if the false activation rate is unacceptably high), it is possible to train [custom verifier models](docs/custom_verifier_models.md) for specific voices that act as a second-stage filter on predictions (i.e., only allow activations through that were likely spoken by a known set of voices). This can greatly improve performance, at the cost of making the openWakeWord system less likely to respond to new voices.
# Project Goals
openWakeWord has four high-level goals, which combine to (hopefully!) produce a framework that is simple to use *and* extend.
1) Be fast *enough* for real-world usage, while maintaining ease of use and development. For example, a single core of a Raspberry Pi 3 can run 15-20 openWakeWord models simultaneously in real-time. However, the models are likely still too large for less powerful systems or micro-controllers. Commercial options like [Picovoice Porcupine](https://picovoice.ai/platform/porcupine/) or [Fluent Wakeword](https://fluent.ai/products/wakeword/) are likely better suited for highly constrained hardware environments.
2) Be accurate *enough* for real-world usage. The included models typically have false-accept and false-reject rates below the annoyance threshold for the average user. This is obviously subjective, but a false-accept rate of <0.5 per hour and a false-reject rate of <5% is often reasonable in practice. See the [Performance & Evaluation](#performance-and-evaluation) section for details about how well the included models can be expected to perform in practice.
3) Have a simple model architecture and inference process. Models process a stream of audio data in 80 ms frames, and return a score between 0 and 1 for each frame indicating the confidence that a wake word/phrase has been detected. All models also have a shared feature extraction backbone, so that each additional model has only a small impact on overall system complexity and resource requirements.
4) Require **little to no manual data collection** to train new models. The included models (see the [Pre-trained Models](#pre-trained-models) section for more details) were all trained with *100% synthetic* speech generated from text-to-speech models. Training a new model is as simple as generating new clips for the target wake word/phrase and training a small model on top of the frozen shared feature extractor. See the [Training New Models](#training-new-models) section for more details.
Future releases of openWakeWord will aim to stay aligned with these goals, even when adding new functionality.
# Pre-Trained Models
openWakeWord comes with pre-trained models for common words & phrases. Currently, only English models are supported, but they should be reasonably robust across different speaker accents and pronunciations.
The table below lists each model, examples of the word/phrases it is trained to recognize, and the associated documentation page for additional detail. Many of these models are trained on multiple variations of the same word/phrase; see the individual documentation pages for each model to see all supported word & phrase variations.
| Model | Detected Speech | Documentation Page |
| ------------- | ------------- | ------------- |
| alexa | "alexa"| [docs](docs/models/alexa.md) |
| hey mycroft | "hey mycroft" | [docs](docs/models/hey_mycroft.md) |
| hey jarvis | "hey jarvis" | [docs](docs/models/hey_jarvis.md) |
| current weather | "what's the weather" | [docs](docs/models/weather.md) |
| timers | "set a 10 minute timer" | [docs](docs/models/timers.md) |
Based on the methods discussed in [performance testing](#performance-and-evaluation), each included model aims to meet the target performance criteria of <5% false-reject rates and <0.5/hour false-accept rates with appropriate threshold tuning. These levels are subjective, but hopefully are below the annoyance threshold where the average user becomes frustrated with a system that often misses intended activations and/or causes disruption by activating too frequently at undesired times. For example, at these performance levels a user could expect the model to process several hours of continuous mixed-content audio with at most a few false activations, and to have a failed intended activation in only 1/20 attempts (and a failed retry in only 1/400 attempts, since a 5% failure rate squared is 0.25%).
If you have a new wake word or phrase that you would like to see included in the next release, please open an issue, and we'll do our best to train a model! The focus of these requests and future releases will be on words and phrases with broad general usage rather than highly specific applications.
# Model Architecture
openWakeword models are composed of three separate components:
1) A pre-processing function that computes the [melspectrogram](https://pytorch.org/audio/main/generated/torchaudio.transforms.MelSpectrogram.html) of the input audio data. For openWakeword, an ONNX implementation of Torch's melspectrogram function with fixed parameters is used to enable efficient performance across devices.
2) A shared feature extraction backbone model that converts melspectrogram inputs into general-purpose speech audio embeddings. This [model](https://arxiv.org/abs/2002.01322) is provided by [Google](https://tfhub.dev/google/speech_embedding/1) as a TFHub module under an [Apache-2.0](https://opensource.org/licenses/Apache-2.0) license. For openWakeWord, this model was manually re-implemented to separate out different functionality and allow for more control of architecture modifications compared to a TFHub module. The model itself is a series of relatively simple convolutional blocks, and gains its strong performance from extensive pre-training on large amounts of data. This model is the core component of openWakeWord, and enables the strong performance that is seen even when training on fully-synthetic data.
3) A classification model that follows the shared (and frozen) feature extraction model. The structure of this classification model is arbitrary, but in practice a simple fully-connected network or 2 layer RNN works well.
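To make the data flow concrete, here is a rough sketch of how the three stages chain together with `onnxruntime`. The ONNX file names come from the packaged resources, but the tensor shapes, buffering behavior, and glue logic are assumptions; the library's own `Model` class handles all of this internally.
```python
import numpy as np
import onnxruntime as ort

# File names are from the packaged resources; everything else is illustrative.
mel = ort.InferenceSession("openwakeword/resources/models/melspectrogram.onnx")
emb = ort.InferenceSession("openwakeword/resources/models/embedding_model.onnx")
clf = ort.InferenceSession("openwakeword/resources/models/alexa_v0.1.onnx")

def run(session, x):
    # Run a single-input ONNX session without hard-coding tensor names
    return session.run(None, {session.get_inputs()[0].name: x})[0]

pcm = np.zeros((1, 1280), dtype=np.float32)  # one 80 ms frame at 16 kHz (assumed shape)
melspec = run(mel, pcm)       # 1) fixed-parameter melspectrogram
features = run(emb, melspec)  # 2) shared speech embedding (real code buffers several frames)
score = run(clf, features)    # 3) per-wakeword classifier -> confidence in [0, 1]
```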
# Performance and Evaluation
Evaluating wake word/phrase detection models is challenging, and it is often very difficult to assess how different models presented in papers or other projects will perform *when deployed* with respect to two critical metrics: false-reject rates and false-accept rates. For clarity in definitions:
A *false-reject* is when the model fails to detect an intended activation from a user.
A *false-accept* is when the model inadvertently activates when the user did not intend for it to do so.
For openWakeWord, evaluation follows two principles:
- The *false-reject* rate should be determined from wakewords/phrases that represent realistic recording environments, including those with background noise and reverberation. This can be accomplished by directly collecting data from these environments, or by simulating them with data augmentation methods.
- The *false-accept* rate should be determined from audio that represents the types of environments that would be expected for the deployed model, not just on the training/evaluation data. In practice, this means that the model should only rarely activate in error, even in the presence of hours of continuous speech and background noise.
While other wakeword evaluation standards [do exist](https://github.com/Picovoice/wake-word-benchmark), for openWakeWord it was decided that a custom evaluation would better indicate what performance users can expect for real-world deployments. Specifically:
1) *false-reject* rates are calculated from either clean recordings of the wakeword that are mixed with background noise at realistic signal-to-noise ratios (e.g., 5-10 dB) *and* reverberated with room impulse response functions (RIRs) to better simulate far-field audio, *or* manually collected data from realistic deployment environments (e.g., far-field capture with normal environment noise).
2) *false-accept* rates are determined by using the [Dinner Party Corpus](https://www.amazon.science/publications/dipco-dinner-party-corpus) dataset, which represents ~5.5 hours of far-field speech, background music, and miscellaneous noise. This dataset sets a realistic (if challenging) goal for how many false activations might occur in a similar situation.
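In code, both metrics reduce to simple counting. A sketch, assuming you have per-clip peak scores for intended activations and an activation count over continuous negative audio:
```python
def false_reject_rate(positive_scores, threshold=0.5):
    # Fraction of intended activations whose peak score never crossed the threshold
    return sum(1 for s in positive_scores if s < threshold) / len(positive_scores)

def false_accepts_per_hour(n_activations, audio_hours):
    # Spurious activations observed over continuous negative audio (e.g., the Dinner Party Corpus)
    return n_activations / audio_hours

# The targets discussed above: false_reject_rate(...) < 0.05 and false_accepts_per_hour(...) < 0.5
```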
To illustrate how openWakeWord can produce capable models, the false-accept/false-reject curve for the included `"alexa"` model is shown below along with the performance of a strong commercial competitor, [Picovoice Porcupine](https://picovoice.ai/platform/porcupine/). Other existing open-source wakeword engines (e.g., [Snowboy](https://github.com/Kitt-AI/snowboy), [PocketSphinx](https://github.com/cmusphinx/pocketsphinx), etc.) are not included as they are either no longer maintained or demonstrate performance significantly below that of Porcupine. The positive test examples used were those included in [Picovoice's](https://github.com/Picovoice/wake-word-benchmark) repository, a fantastic resource that they have freely provided to the community. Note, however, that the test data was prepared differently compared to Picovoice's implementation (see the [Alexa model documentation](docs/models/alexa.md) for more details).
![FPR/FRR curve for "alexa" pre-trained model](docs/models/images/alexa_performance_plot.png)
For at least this test data and preparation, openWakeWord produces a model that is more accurate than Porcupine.
As a second illustration, the false-accept/false-reject rate of the included `"hey mycroft"` model is shown below along with the performance of a [custom](https://picovoice.ai/docs/quick-start/porcupine-python/#custom-keywords) Picovoice Porcupine model and [Mycroft Precise](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/precise). In this case, the positive test examples were manually collected from a male speaker with a relatively neutral American English accent in realistic home recording scenarios (see the [Hey Mycroft model documentation](docs/models/hey_mycroft.md) for more details).
![FPR/FRR curve for "hey mycroft" pre-trained model](docs/models/images/hey_mycroft_performance.png)
Again, for at least this test data and preparation, openWakeWord produces a model at least as good as existing solutions.
However, it should be noted that for both of these tests the sample sizes are small, and there are issues ([1](https://github.com/Picovoice/wake-word-benchmark/issues/13), [2](https://github.com/MycroftAI/mycroft-precise/issues/237)) with the evaluation of the other libraries that suggest these results should be interpreted cautiously. As such, the only claim being made is that openWakeWord models are broadly competitive with comparable offerings. You are strongly encouraged to [test openWakeWord](#installation--usage) to determine if it will meet the requirements of your use-case.
Finally, to give evidence that the core methods behind openWakeWord (i.e., pre-trained speech embeddings and high-quality synthetic speech) are effective across a wider range of wake word/phrase structures and lengths, the table below shows the performance on the [Fluent Speech Commands](https://paperswithcode.com/sota/spoken-language-understanding-on-fluent) test set using an openWakeWord model and the baseline method shown in a [related paper by the dataset authors](https://arxiv.org/abs/1910.09463). While both models were trained on fully-synthetic data, the numbers below are likely not directly comparable due to fundamentally different data synthesis & preparation, training, and evaluation approaches. Rather, the important conclusion is that openWakeWord is a viable approach for the task of spoken language understanding (SLU).
| Model | Test Set Accuracy | Link |
| ------------- | ------------- | ------------- |
| openWakeWord | ~97.5% | NA |
| encoder-decoder | ~94.9% | [paper](https://arxiv.org/abs/1910.09463) |
If you are aware of other open-source wakeword/phrase libraries that should be added to these comparisons, or have suggestions on how to improve the evaluation more generally, please open an issue! We are eager to continue improving openWakeWord by learning how others are approaching this problem.
## Other Performance Details
### Model Robustness
Due to a combination of variability in the generated speech and the extensive pre-training from Google, openWakeWord models also demonstrate some additional performance benefits that are useful for real-world applications. In testing, three in particular have been observed.
1) The trained models seem to respond reasonably well to wakewords and phrases that are [whispered](https://en.wikipedia.org/wiki/Whispering). This is somewhat surprising behavior, as the text-to-speech models used for producing training data generally do not create synthetic speech that has acoustic qualities similar to whispering.
2) The models also respond relatively well to wakewords and phrases spoken at different speeds (within reason).
3) The models are able to handle some variability in the phrasing of a given command. This behavior was not entirely a surprise, given that [others](https://arxiv.org/abs/1904.03670) have reported similar benefits when training end-to-end spoken language understanding systems. For example, the included [pre-trained weather model](docs/models/weather.md) will typically still respond correctly to a phrase like "how is the weather today" despite not training directly on that phrase (though false-reject rates will likely be higher, on average, compared to phrases closer to the training data).
### Background Noise
While the models are trained with background noise to increase robustness, in some cases additional noise suppression can improve performance. Setting the `enable_speex_noise_suppression=True` argument during openWakeWord model initialization will use the efficient Speex noise suppression algorithm to pre-process the audio data prior to prediction. This can reduce both false-reject rates and false-accept rates, though testing in a realistic deployment environment is strongly recommended.
# Training New Models
Training new models is conceptually simple, and the entire process is demonstrated in a [tutorial notebook](notebooks/training_models.ipynb).
Fundamentally, a new model requires two data generation and collection steps:
1) Generate new training data for the desired wakeword/phrase using open-source text-to-speech systems (see [Synthetic Data Generation](docs/synthetic_data_generation.md) for more details). These models and the generation code are hosted in a separate [repository](https://github.com/dscripka/synthetic_speech_dataset_generation). The number of generated examples required can vary; a minimum of several thousand is recommended, and performance seems to increase smoothly with increasing dataset size.
2) Collect negative data (e.g., audio where the wakeword/phrase is not present) to help the model have a low false-accept rate. This also benefits from scale, and the [included models](#pre-trained-models) were all trained with ~30,000 hours of negative data representing speech, noise, and music. See the individual model documentation pages for more details on training data curation and preparation.
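As a conceptual sketch only (the tutorial notebook is the authoritative reference), the training step then amounts to fitting a small classifier on frozen embeddings of the positive and negative clips; the `embed_clips` helper and directory names below are hypothetical:
```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # scikit-learn is already a core dependency

# embed_clips() is a hypothetical helper that runs the frozen shared feature
# extractor over a directory of audio clips and returns one embedding per clip.
pos = embed_clips("generated_wakeword_clips/")  # synthetic positives from step 1
neg = embed_clips("negative_audio/")            # speech/noise/music negatives from step 2

X = np.vstack([pos, neg])
y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])

head = LogisticRegression(max_iter=1000).fit(X, y)  # simple stand-in for the small head model
```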
# Language Support
Currently, openWakeWord only supports English, primarily because the pre-trained text-to-speech models used to generate training data are all based on English datasets. It's likely that text-to-speech models trained on other languages would also work well, but non-English models & datasets are less commonly available.
Future releases may add non-English support. In particular, [Mycroft AI's Mimic 3](https://github.com/MycroftAI/mimic3-voices) TTS engine may work well to help extend support to other languages.
# License
All of the code in openWakeWord is licensed under the **Apache 2.0** license. All of the included pre-trained models are licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-nc-sa/4.0/) license due to the inclusion of datasets with unknown or restrictive licensing as part of the training data. If you are interested in pre-trained models with more permissive licensing, please raise an issue and we will try to add them to a future release.

openwakeword-0.4.0.dist-info/RECORD

@@ -0,0 +1,30 @@
openwakeword-0.4.0.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
openwakeword-0.4.0.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
openwakeword-0.4.0.dist-info/METADATA,sha256=LRFXEspYc8DTU75G_7G2ChU2Cq1gQaM473-cugd34IM,21992
openwakeword-0.4.0.dist-info/RECORD,,
openwakeword-0.4.0.dist-info/REQUESTED,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
openwakeword-0.4.0.dist-info/WHEEL,sha256=pkctZYzUS4AYVn6dJ-7367OJZivF2e8RA9b_ZBjif18,92
openwakeword-0.4.0.dist-info/top_level.txt,sha256=F-oZfkRRKwZ2slLP6lybvE-TajgtNTFjny-6jY3-gBI,13
openwakeword/__init__.py,sha256=xVeZYr_0eAoI648O4Niz_4oTgfkHKbdBxHmK_SKFDxU,1296
openwakeword/__pycache__/__init__.cpython-312.pyc,,
openwakeword/__pycache__/custom_verifier_model.cpython-312.pyc,,
openwakeword/__pycache__/data.cpython-312.pyc,,
openwakeword/__pycache__/metrics.cpython-312.pyc,,
openwakeword/__pycache__/model.cpython-312.pyc,,
openwakeword/__pycache__/utils.cpython-312.pyc,,
openwakeword/__pycache__/vad.cpython-312.pyc,,
openwakeword/custom_verifier_model.py,sha256=Ydo0QXezhH_XPufyC7qO1anigskNZJ8Vbjn0l3YnQkI,7131
openwakeword/data.py,sha256=KpcGxPBBHThQUuU7CCWPdfvDlYty5b99BT-1Wkd_Oc8,32108
openwakeword/metrics.py,sha256=4Ay_JiISOAuup5488-znRHYdim9_DecsubJ_J2WQV0Q,3756
openwakeword/model.py,sha256=h14WAQPOh-8gRjVM-yWcoNh76JLLIhmJVc8N2zU-pDY,19970
openwakeword/resources/models/alexa_v0.1.onnx,sha256=b_VmoB0SZw6NnjxZ2jJlHbFXXRcnKmAbf4o5KD37rj4,854246
openwakeword/resources/models/embedding_model.onnx,sha256=unVNs812ilJMZV6pBlXuXmBVpDuN_Sk2ahHpNxaunlE,1328103
openwakeword/resources/models/hey_jarvis_v0.1.onnx,sha256=lKE8_mAHWxMvakcufkYugSPucIYbw_tYQ0pzcS7g0ss,1271370
openwakeword/resources/models/hey_marvin_v0.1.onnx,sha256=ttS3lN3y4dbynp9FhI4khY4u3Q2BCxTgwccN2pofy_A,857691
openwakeword/resources/models/hey_mycroft_v0.1.onnx,sha256=eFvfVlWGOuR1U7I3k6oQjHsBUtSCP3hptB8tLXZZEvw,503850
openwakeword/resources/models/melspectrogram.onnx,sha256=uisOD4t7h1NposicsTNg_1O6xDbyiVzO2fR5-mXrF28,1087958
openwakeword/resources/models/silero_vad.onnx,sha256=o16_Uv085fFGmyo2FY26dhvEe5c-ozgrMYbKFbH1ryg,1807522
openwakeword/resources/models/timer_v0.1.onnx,sha256=Nx5EU1RwopJIs7jxu7uvJSXIZBf9j3XGf88CrguWJt8,1742475
openwakeword/resources/models/weather_v0.1.onnx,sha256=hEHajnRomejZaVKNW61WUc3VYwecBZYniPd3UwQfYOc,1149158
openwakeword/utils.py,sha256=ZViPt3hFHjnowSdFBbmVrKes5MbK257e9PUNOS0jVQE,18548
openwakeword/vad.py,sha256=boIWrcH7tOIc_QQuXJwVT0fDz7UZY6fPE0Y7pCFfufE,5227

openwakeword-0.4.0.dist-info/WHEEL

@@ -0,0 +1,5 @@
Wheel-Version: 1.0
Generator: bdist_wheel (0.40.0)
Root-Is-Purelib: true
Tag: py3-none-any

openwakeword-0.4.0.dist-info/top_level.txt

@@ -0,0 +1 @@
openwakeword