On November 15, the fifth Kaldi technical exchange meeting was held in Beijing. Dr. Daniel Povey, the creator of Kaldi, attended for the first time and discussed the future development of the next-generation Kaldi community with developers from major Beijing internet companies and well-known universities.

   In the year since he joined Xiaomi, Daniel Povey has designed and developed the next-generation Kaldi. It is divided into three parts: the core algorithms, training data preparation, and a collection of example scripts.

   Of these, Lhotse (training data preparation) will replace all of the data-preparation work in the previous Kaldi and handles the metadata for many kinds of audio and text. Lhotse is not limited to Kaldi and is also suitable for other applications. It is written in pure Python, which makes it easy to use.

   Icefall (the example script collection) will replace the example scripts in Kaldi and become a separate sub-project. The scripts are being separated from the core algorithms because the script collection can grow very large and changes constantly.

   According to reports, the core of the next-generation Kaldi is "k2". k2 makes it easy for developers to implement a variety of speech recognition algorithms in PyTorch or TensorFlow, such as CTC, LF-MMI, RNN-T, and second-pass language models, and it eliminates the mismatch between training and decoding found in earlier speech recognition algorithms.

   k2 also makes it easy to implement multi-pass decoding in which confidence increases with each pass, something that was hard to do in the past. Compared with other speech recognition libraries, k2 is faster and more general: it can be used to model a wide variety of speech recognition algorithms.
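
   As a rough illustration of the multi-pass idea, the sketch below rescores a first-pass n-best list with a second-pass language model score. It is plain Python and does not use k2's API; the names `rescore` and `toy_lm_score`, the example hypotheses, and their scores are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# A hypothesis is (text, first-pass log score). Higher is better.
Hypothesis = Tuple[str, float]


def rescore(
    nbest: List[Hypothesis],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Second pass: add a weighted language model score to each
    first-pass score and return the hypotheses sorted best-first."""
    rescored = [
        (text, score + lm_weight * lm_score(text)) for text, score in nbest
    ]
    return sorted(rescored, key=lambda hyp: hyp[1], reverse=True)


def toy_lm_score(text: str) -> float:
    """Hypothetical stand-in for a language model: a flat log penalty
    of -1.0 per word. A real second pass would use a trained model."""
    return -1.0 * len(text.split())


# Hypothetical first-pass output: the acoustically best hypothesis
# is not the one the language model prefers.
first_pass = [
    ("wreck a nice beach", -4.0),
    ("recognize speech", -4.2),
]

second_pass = rescore(first_pass, toy_lm_score)
best_text, best_score = second_pass[0]
print(best_text)
```

   In this toy setup the second pass reorders the list, so the hypothesis that ranked second after the first pass ends up on top. Further passes could be chained the same way, each one applying a more expensive scorer to a shorter list.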

   Dr. Daniel Povey revealed that the core code of k2 is complete, at about 41,000 lines of code (mainly C++), and that version 0.1 was released this week.

   According to public information, Daniel Povey is currently the chief speech scientist of Xiaomi Group. Kaldi, which he developed and maintains, integrates a variety of speech recognition models and is recognized as a cornerstone among the industry's speech recognition frameworks.