On November 15, the fifth Kaldi technical exchange meeting was held in Beijing. Dr. Daniel Povey, known as the father of Kaldi, attended for the first time and discussed the future of the next-generation Kaldi community with developers from major Beijing Internet companies and well-known universities.
In the year since he joined Xiaomi, Daniel Povey has designed and developed the next generation of Kaldi. Next-generation Kaldi is divided into three parts: the core algorithms, training data preparation, and a collection of example scripts.
Of these, Lhotse (training data preparation) will replace all of the data preparation work in the previous Kaldi and handles metadata for various kinds of audio and text. Lhotse is not limited to Kaldi itself and is also suitable for other applications. It is written in pure Python and is easy to use.
Icefall (the example script collection) will replace the example scripts in Kaldi and become a separate sub-project. The example scripts are being separated from the core algorithms because they can grow very large and change constantly.
The core of next-generation Kaldi is reported to be "k2". k2 makes it easy for developers to implement a variety of speech recognition algorithms in PyTorch or TensorFlow, such as CTC, LF-MMI, RNN-T, and second-pass language models, and it eliminates the mismatch between training and decoding found in earlier speech recognition algorithms.
k2 also makes it easy to implement a multi-pass decoding process in which confidence increases with each pass, something that was hard to do in the past. Compared with some other speech recognition libraries, k2 is faster and more general: it can be used to model a variety of speech recognition algorithms.
Dr. Daniel Povey revealed that the core code of k2 is complete, at about 41,000 lines (mainly C++), and that version 0.1 was released this week.
According to public information, Daniel Povey is currently the chief speech scientist at Xiaomi Group. Kaldi, which he developed and maintains, integrates a variety of speech recognition models and is recognized as a cornerstone among the industry's speech recognition frameworks.