Thursday, August 2, 2018
Sphinx4 1.0 beta4 Is Released. What's Next?
So, almost according to schedule, sphinx4 was released yesterday. Check the notes at
http://cmusphinx.sourceforge.net/2010/03/sphinx4-1-0-beta-4-released/
The most notable improvements were already discussed here, so let me try to plan what the next release will be. Trying to be realistic, I don't want to promise everything at once. Here is an attempt to forecast the next release notes.
The biggest issue with sphinx4 is actually documentation. The current poll on the CMUSphinx website clearly shows that. Personally, I sometimes think that perfect documentation will not help if the system doesn't work, but at least it will make the product attractive and easy to use. My idea is that we need more developer-level documentation - tutorials, examples, task-oriented howtos. It's unlikely we'll be able to write something good enough to serve as a textbook on speech technologies. But we need to prove the point that it's possible to build an ASR system without knowing who Welch is.
On the code side, we face the biggest challenge since sphinx4 was designed. We need to move to a multipass system. It's not just about rescoring, it's about plugging in the diarization framework from LIUM, and it's also about making sphinx4 suitable for both batch and live applications. That's a serious issue.
The reason is that the current sphinx4 architecture is flow-oriented. It's built like a single pipe of components, each passing audio to the next. This is good for live applications, but not so good for batch ones. You get into trouble when you need to split the pipe or merge it later. A batch application could benefit hugely from looking at the recording as a whole and returning to it multiple times. For example, you could estimate the noise level properly on the first pass and clean up the audio on the second. Such multipass decoding doesn't fit well into the pipe paradigm. On the other hand, changing the architecture to a purely batch one would create issues for live applications.
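To make the difference concrete, here is a minimal sketch in Python. It is not sphinx4 code; the function names, the frame representation (non-empty lists of samples) and the crude "quietest frame" noise estimate are all assumptions made up for illustration. The pipe version sees each frame once and needs its threshold up front, while the two-pass version reads the whole recording first to estimate the noise floor.

```python
from typing import Iterable, Iterator, List

Frame = List[float]  # hypothetical frame type: a non-empty list of samples


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def pipe_gate(frames: Iterable[Frame], threshold: float) -> Iterator[Frame]:
    """Pipe style: every frame is seen exactly once, in order.

    The threshold has to be known before the audio arrives.
    """
    for frame in frames:
        if frame_energy(frame) > threshold:
            yield frame
        else:
            # Treat the frame as noise and silence it.
            yield [0.0] * len(frame)


def two_pass_gate(recording: List[Frame], margin: float = 2.0) -> List[Frame]:
    """Batch style: look at the whole recording, then process it again."""
    # Pass 1: assume the quietest frame is pure noise (a crude estimate).
    noise_floor = min(frame_energy(f) for f in recording)
    # Pass 2: reuse the pipe component with the estimated threshold.
    return list(pipe_gate(recording, noise_floor * margin))


recording = [[0.1, -0.1], [1.0, -1.0], [0.1, 0.1]]
cleaned = two_pass_gate(recording)
```

Note that the second pass can still reuse the pipe component; what the pipe alone cannot do is go back to audio it has already passed on.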
So we are in trouble. We probably have to invent some combined scheme, a hybrid of the pipe and batch approaches. I was thinking about a knowledge base scheme, where information about the stream is stored in some database as processing goes on. Database cleanup policies could emulate both the pipe approach (when the database is cleaned immediately) and the batch approach (when the database is kept even across sessions). Festival utterances remind me of such a data processing scheme. Anyway, this idea is not finalized yet.
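Since the idea is not finalized, the following is only a rough sketch of what such a store with a cleanup policy might look like, again in Python rather than in sphinx4's Java. The class name `StreamStore` and the `retain` parameter are hypothetical, and keeping data across sessions is not modelled here; an unbounded in-memory store stands in for the batch case.

```python
from collections import deque
from typing import Any, Deque, List, Optional, Tuple


class StreamStore:
    """Hypothetical store of facts about a stream, with a cleanup policy.

    retain=None keeps every entry (batch-like);
    retain=1 drops the previous entry as soon as a new one arrives (pipe-like).
    """

    def __init__(self, retain: Optional[int] = None) -> None:
        # A bounded deque silently evicts the oldest entry when it is full.
        self._entries: Deque[Tuple[str, Any]] = deque(maxlen=retain)

    def put(self, kind: str, value: Any) -> None:
        """Record one piece of information, e.g. a frame or a noise estimate."""
        self._entries.append((kind, value))

    def query(self, kind: str) -> List[Any]:
        """Return all retained values of the given kind, oldest first."""
        return [value for k, value in self._entries if k == kind]


pipe_like = StreamStore(retain=1)
batch_like = StreamStore()
for i in range(3):
    pipe_like.put("frame", i)
    batch_like.put("frame", i)

pipe_view = pipe_like.query("frame")    # only the latest frame survives
batch_view = batch_like.query("frame")  # the whole history is available
```

The same components could then read from the store in both modes, and only the cleanup policy would decide whether a second pass over earlier data is possible.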
We also expect to see a lot of movement from the CMUSphinx Workshop in Dallas and from Google Summer of Code participation. I hope the issues described above, and some more interesting ones, will be resolved by the next release in August. Let's discuss the rest then!