PocketSphinx 5.0.3 is released!

PocketSphinx 5.0.3 is now released. This is a patch release which adds support for Python 3.12 and fixes a bug in the NGramModel wrapper class.

Download source from GitHub or PyPI. Yes, these are not exactly the same file.

Install binaries for Python:

pip3 install pocketsphinx
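Once the package is installed, recognizing from a file looks roughly like the sketch below. It assumes the input is headerless 16-bit, 16 kHz, mono PCM, and that `Decoder()` with no arguments loads the default model; the helper names `decode_raw_stream` and `transcribe_file` are made up for illustration and are not part of the package.

```python
def decode_raw_stream(decoder, stream, chunk_bytes=4096):
    """Feed raw PCM from a binary stream to a decoder and return the text.

    `decoder` is expected to behave like pocketsphinx.Decoder, i.e. provide
    start_utt(), process_raw(), end_utt() and hyp().
    Returns None if the decoder produced no hypothesis.
    """
    decoder.start_utt()
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        # Positional flags are no_search=False, full_utt=False
        decoder.process_raw(chunk, False, False)
    decoder.end_utt()
    hyp = decoder.hyp()
    return hyp.hypstr if hyp is not None else None


def transcribe_file(path):
    """Hypothetical convenience wrapper: decode one raw audio file."""
    # Imported here so the helper above stays usable without the package.
    from pocketsphinx import Decoder

    decoder = Decoder()  # assumption: falls back to the bundled default model
    with open(path, "rb") as fh:
        return decode_raw_stream(decoder, fh)
```

Calling `transcribe_file("utterance.raw")` (a placeholder file name) should then return the recognized text, or `None` if nothing was recognized.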

Read the API documentation for C and for Python 3.

Pull requests and bug reports and such are welcome via https://github.com/cmusphinx/pocketsphinx.

PocketSphinx 5.0.1 is released!

PocketSphinx 5.0.1 is now released. This is a patch release which fixes a number of bugs and documentation errors in PocketSphinx 5.0.0. See the link above for more detail.

Download source from GitHub or PyPI. Yes, these are not exactly the same file.

Install binaries for Python:

pip3 install pocketsphinx

Read the API documentation for C and for Python 3.

Pull requests and bug reports and such are welcome via https://github.com/cmusphinx/pocketsphinx.

Live audio examples for Windows

Well, it turns out that people were using pocketsphinx_continuous, at least sort of. As I expected, they weren’t really using the actual pocketsphinx_continuous binary for anything useful other than recognizing from files. But, well, the code did claim to be example code, and so obviously people were using it … as example code.

…which is a perfectly sensible thing to do, and unfortunately in removing the audio support from PocketSphinx, it became considerably less useful as an example of how to do recognition from a microphone, particularly if the solution of running SoX in a subprocess isn’t an appealing one (as on Windows, for instance).

The sensible solution to this is to bring back something like pocketsphinx_continuous but explicitly in the form of example code. Adding cross-platform audio support to the library is absolutely something I will not do, but there are some other options, PortAudio foremost among them. So, here is an example of using PortAudio.

That said, wrangling external dependencies on Windows is very annoying. Using the above example may require a certain amount of path and environment wrangling to get CMake/VSCode/Visual Studio to find PortAudio. For this reason there is also now an example of using the Win32 Waveform Audio API directly.

Note that in both cases you may have quite bad results when running a “Debug” build, because Windows is very slow, and Visual C++ outputs extremely slow code when debugging is enabled.

These examples are included in the upcoming 5.0.1 release in the examples directory.
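For comparison, the SoX-in-a-subprocess approach mentioned earlier can be sketched in Python along these lines. This is only an illustration under a few assumptions: `sox` is on the PATH, `-d` selects a working default input device (on Windows this is exactly the part that tends not to be appealing), and the helper names `sox_command`, `read_chunks` and `capture` are invented for this sketch. What to do with each chunk (for instance, passing it to a decoder's `process_raw`) is left to the `on_chunk` callback.

```python
import subprocess

SAMPLE_RATE = 16000  # Hz; assumed to match the acoustic model in use


def sox_command(rate=SAMPLE_RATE):
    """Build a SoX command that writes raw 16-bit mono PCM to stdout."""
    return [
        "sox", "-q",            # quiet: no progress display on stderr
        "-d",                   # read from the default audio device
        "-r", str(rate),        # output sample rate
        "-c", "1",              # mono
        "-b", "16",             # 16 bits per sample
        "-e", "signed-integer",
        "-t", "raw",            # headerless output
        "-",                    # write to stdout
    ]


def read_chunks(stream, chunk_bytes):
    """Yield full chunk_bytes-sized blocks; a trailing partial block is dropped."""
    while True:
        buf = stream.read(chunk_bytes)
        if len(buf) < chunk_bytes:
            break
        yield buf


def capture(on_chunk, chunk_bytes=2048):
    """Run SoX and pass each audio chunk to on_chunk until SoX exits."""
    with subprocess.Popen(sox_command(), stdout=subprocess.PIPE) as proc:
        try:
            for buf in read_chunks(proc.stdout, chunk_bytes):
                on_chunk(buf)
        finally:
            proc.terminate()
```

Keeping the chunking separate from the subprocess handling means the same loop can be exercised against a file or an in-memory buffer without any audio hardware.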

SphinxTrain 5.0.0 is released!

There is also an updated release of SphinxTrain, and the acoustic modeling tutorial has been updated to reflect the new and simplified usage. Still working on the other tutorials, sorry.

To quote the release notes, this release fixes a few long-standing bugs in SphinxTrain and makes the package (hopefully) easier to use. Among other things:

  • The dependency on SphinxBase is gone, because SphinxBase is gone
  • The dependency on Sphinx3 for VTLN and force-alignment is gone (sphinx3_align is included)
  • Multi-CPU training actually works, tested on up to 64 CPUs with LibriSpeech, much easier than setting up PBS on the Clown
  • The dependency on Visual Studio for building on Windows is gone (but please just use WSL, please)
  • The dependency on Autotools is gone (CMake ain’t great but it’s much less bad)
  • There is a Dockerfile now
  • There is “continuous integration” now (sort of)
  • The -remove_silence option has been disabled by default (unlike in PocketSphinx you can still turn it on if you really want to, it might save you a bit of time in training)
  • It is not necessary to install SphinxTrain system-wide to run training
  • G2P support has been updated for the most commonly installed version of OpenFST (do not try to use any other version, because C++, that’s why)
  • Pick Decoding Model Based on Context Dependence by @Mazyod
  • Output an error message when we cannot execute a tool by @cshung
  • Make an option in config for not folding case in phonemes by @lenzo-ka
  • Use consistent shebang for python by @acgrobman
  • Add -sox flag to sphinx_fe to convert files with SoX by @dhdaines
  • Update and enable G2P code by @dhdaines
  • Librispeech training template by @dhdaines

You can download it from the release page.

Or clone it (shallowly) with git:

git clone --depth 1 --branch v5.0.0 https://github.com/cmusphinx/sphinxtrain

Pull requests and bug reports and such are welcome via https://github.com/cmusphinx/sphinxtrain.