[ASR] What is Kaldi?

2025. 3. 17. 13:26 · 🤖 AI

Kaldi is an open-source software toolkit that provides the algorithms, models, and tools needed to build automatic speech recognition (ASR) systems.

 

Kaldi ASR (kaldi-asr.org) is the official site. Kaldi's code lives at https://github.com/kaldi-asr/kaldi.

 

 

Kaldi is a high-performance, flexible speech recognition toolkit: a powerful open-source platform used in both research and industry, covering everything from complex classic ASR pipelines to modern deep-learning-based models.

 

 

Features of Kaldi

  • Modern speech recognition algorithms: HMM-DNN hybrid systems with deep-learning architectures such as TDNN, LSTM, and self-attention layers
  • Feature extraction: a range of acoustic features, including MFCC, fbank (filterbank), and pitch
  • Training & decoding: training and decoding of GMM-, DNN-, TDNN-, and LSTM-based models
  • Acoustic model: various model types map acoustic feature vectors to probabilities of phonetic units (HMM states)
  • Language model: n-gram models, with RNNLM integration
  • Graph-based decoding (WFST): efficient search and decoding with Weighted Finite-State Transducers
  • Multi-speaker / multilingual support: systems can be built for many languages and speakers, with speaker adaptation (e.g., fMLLR, i-vectors)
  • End-to-end integration: recent Transformer- and attention-based models can also be brought in for research, typically through external toolkits
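Feature extraction starts by slicing the waveform into short, overlapping frames. Kaldi's default is a 25 ms window with a 10 ms shift, and by default (the snip-edges behaviour) only frames that fit entirely inside the signal are kept. The sketch below illustrates just that framing step plus a simple per-frame log energy, in plain Python. It is a toy under those assumptions: the function names are hypothetical, and it skips what Kaldi's real extractors (e.g. compute-mfcc-feats) also do, such as dithering, pre-emphasis, windowing, and the mel filterbank.

```python
import math

def num_frames(num_samples, sample_rate, frame_length_ms=25.0, frame_shift_ms=10.0):
    """Frame count with snip-edges behaviour: keep only frames fully inside the signal."""
    window = int(sample_rate * frame_length_ms / 1000)  # 400 samples at 16 kHz
    shift = int(sample_rate * frame_shift_ms / 1000)    # 160 samples at 16 kHz
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift

def frame_log_energies(samples, sample_rate, frame_length_ms=25.0, frame_shift_ms=10.0):
    """Hypothetical helper: natural-log energy of each frame (no windowing or pre-emphasis)."""
    window = int(sample_rate * frame_length_ms / 1000)
    shift = int(sample_rate * frame_shift_ms / 1000)
    energies = []
    for i in range(num_frames(len(samples), sample_rate, frame_length_ms, frame_shift_ms)):
        frame = samples[i * shift : i * shift + window]
        energy = sum(x * x for x in frame)
        # floor the energy so silent frames do not hit log(0)
        energies.append(math.log(max(energy, 1e-10)))
    return energies

sr = 16000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]  # 1 s of a 440 Hz tone
feats = frame_log_energies(tone, sr)
print(num_frames(len(tone), sr), len(feats))  # 98 98
```

One second of 16 kHz audio therefore yields 98 frames, i.e. roughly one feature vector every 10 ms; real MFCC or fbank features replace the single energy value with a vector per frame.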

 

Kaldi Architecture

| Component | Role |
|---|---|
| Feature Extraction | Converts the speech signal into features (MFCC, filterbank, etc.) |
| Acoustic Model | Maps acoustic features to probabilities of phonetic units (phonemes / HMM states) |
| Lexicon | Pronunciation dictionary that maps each word to its phoneme sequence |
| Language Model | Probability model of word sequences (n-gram, RNN, etc.) |
| Decoder | Searches a WFST-based graph to turn speech into a word sequence |
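Kaldi's lexicon is a plain text file (data/local/dict/lexicon.txt in most recipes) with one pronunciation per line: the word, followed by its phones. A word with several pronunciations simply appears on several lines. The sketch below parses that format and expands a word sequence into phones. The words, phones, and the <UNK>/SPN out-of-vocabulary entry are toy assumptions, and the helper names are hypothetical.

```python
from collections import defaultdict

def load_lexicon(lines):
    """Parse Kaldi-style lexicon lines 'WORD PH1 PH2 ...' (one pronunciation per line)."""
    lexicon = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0], fields[1:]
        lexicon[word].append(phones)
    return dict(lexicon)

def words_to_phones(words, lexicon, oov="<UNK>"):
    """Expand a word sequence using each word's first pronunciation; unknown words use `oov`."""
    phones = []
    for w in words:
        prons = lexicon.get(w) or lexicon[oov]
        phones.extend(prons[0])
    return phones

# Toy lexicon (hypothetical entries); HELLO has two pronunciations.
toy_lexicon = """\
HELLO HH AH L OW
HELLO HH EH L OW
WORLD W ER L D
<UNK> SPN
""".splitlines()

lex = load_lexicon(toy_lexicon)
print(len(lex["HELLO"]))  # 2
print(words_to_phones(["HELLO", "KALDI", "WORLD"], lex))
# ['HH', 'AH', 'L', 'OW', 'SPN', 'W', 'ER', 'L', 'D']
```

Picking only the first pronunciation keeps the sketch short; in Kaldi the lexicon is compiled into the L transducer, so all pronunciations stay available to the decoder.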

 

 

Kaldi's Training and Recognition Pipeline

[Speech data] → [Feature Extraction] → [Acoustic Model Training]
                                        ↓
                                [Decoding Graph (WFST)] ← [Language Model + Lexicon]
                                        ↓
                                [Decoding (speech → text)]
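The decoding graph combines the lexicon and the language model by WFST composition. Kaldi builds its full graph, HCLG.fst, with utils/mkgraph.sh on top of OpenFst, which also adds HMM and context transducers, determinization, and disambiguation symbols. The toy below only shows the core idea in plain Python: compose a lexicon transducer L (phones in, words out) with a grammar acceptor G (word costs), with costs adding along a path (tropical semiring), then pick the cheapest path for a phone sequence. All symbols, costs, the FST dictionary format, and the function names are hypothetical, and the composition assumes G has no input epsilons.

```python
from collections import deque

EPS = "<eps>"  # Kaldi symbol tables also reserve <eps> (id 0) for epsilon

def compose(a, b):
    """Toy WFST composition a∘b in the tropical semiring (costs add).
    Hypothetical format: {"start": s, "finals": {state: cost},
    "arcs": {state: [(ilabel, olabel, cost, next_state), ...]}}.
    Assumes b has no input epsilons, so only a's output epsilons need care."""
    start = (a["start"], b["start"])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        sa, sb = state
        out = arcs.setdefault(state, [])
        if sa in a["finals"] and sb in b["finals"]:
            finals[state] = a["finals"][sa] + b["finals"][sb]
        for ia, oa, wa, na in a["arcs"].get(sa, []):
            if oa == EPS:
                # a emits nothing here: advance a only, b stays where it is
                moves = [(ia, EPS, wa, (na, sb))]
            else:
                # a's output label must match b's input label
                moves = [(ia, ob, wa + wb, (na, nb))
                         for ib, ob, wb, nb in b["arcs"].get(sb, []) if ib == oa]
            for move in moves:
                out.append(move)
                nxt = move[3]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return {"start": start, "arcs": arcs, "finals": finals}

def best_path(fst, inputs):
    """Cheapest path consuming exactly `inputs`; returns (cost, words) or None.
    Assumes the FST has no input epsilons."""
    best = {fst["start"]: (0.0, [])}
    for sym in inputs:
        nxt = {}
        for state, (cost, words) in best.items():
            for il, ol, w, ns in fst["arcs"].get(state, []):
                if il != sym:
                    continue
                cand = (cost + w, words + ([] if ol == EPS else [ol]))
                if ns not in nxt or cand[0] < nxt[ns][0]:
                    nxt[ns] = cand
        best = nxt
    done = [(c + fst["finals"][s], ws) for s, (c, ws) in best.items() if s in fst["finals"]]
    return min(done) if done else None

# Toy lexicon L: phones in, words out; HI and HIGH are homophones (HH AY).
L = {"start": 0, "finals": {0: 0.0}, "arcs": {
    0: [("HH", "HI", 0.0, 1), ("HH", "HIGH", 0.0, 2), ("DH", "THERE", 0.0, 3)],
    1: [("AY", EPS, 0.0, 0)],
    2: [("AY", EPS, 0.0, 0)],
    3: [("EH", EPS, 0.0, 4)],
    4: [("R", EPS, 0.0, 0)],
}}
# Toy grammar G: word acceptor with made-up costs (lower cost = more likely).
G = {"start": 0, "finals": {3: 0.0}, "arcs": {
    0: [("HI", "HI", 1.0, 1), ("HIGH", "HIGH", 2.0, 2)],
    1: [("THERE", "THERE", 0.5, 3)],
    2: [("THERE", "THERE", 3.0, 3)],
}}

LG = compose(L, G)
print(best_path(LG, ["HH", "AY", "DH", "EH", "R"]))  # (1.5, ['HI', 'THERE'])
```

Because HI and HIGH share the pronunciation HH AY, only the grammar costs separate the two readings: the composed graph keeps both paths, and the search picks the cheaper one. In a real decoder the acoustic model's scores are added to these graph costs frame by frame.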

 

Example workflow

  1. Data preparation
  2. Feature extraction
  3. Acoustic model training
  4. Decoding graph preparation
  5. Decoding
  6. Evaluation
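Evaluation usually means computing the word error rate (WER), which Kaldi recipes report through compute-wer: the minimum number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal sketch of that formula follows. It assumes whitespace-tokenized transcripts, the helper names and sentences are hypothetical, and it does not reproduce compute-wer's output format or any text normalization a recipe's scoring script applies.

```python
def edit_distance(ref, hyp):
    """Minimum number of word substitutions, deletions, and insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion: reference word missing
                         cur[j - 1] + 1,          # insertion: extra hypothesis word
                         prev[j - 1] + (r != h))  # match (0) or substitution (1)
        prev = cur
    return prev[-1]

def wer(refs, hyps):
    """Corpus-level WER in percent: total errors / total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return 100.0 * errors / total

refs = ["the cat sat on the mat", "a b c"]
hyps = ["the cat sit on mat", "a x c d"]
print(f"{wer(refs, hyps):.2f}")  # 44.44  (4 errors / 9 reference words)
```

Errors are summed over the whole test set before dividing, which matches how WER is normally reported rather than averaging per-utterance rates.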

 

Pros and Cons of Kaldi

| Pros | Cons |
|---|---|
| Enables very powerful and flexible speech recognition systems | Complex configuration and a steep learning curve |
| Supports a wide range of modern acoustic models and decoders | Deep-learning / end-to-end models must be built yourself |
| Usable in both research and industry | Complex installation and build (notably C++ library dependencies) |
| Supports many languages and multiple speakers | No real GUI; command-line centric |

 

 

Installing Kaldi

Clone the project from Kaldi's GitHub repository, then build the bundled tools and the source tree.

  • Ships example recipes (under egs/) for public datasets such as LibriSpeech and TED-LIUM
  • Can also be used from Python through PyKaldi, a third-party Python wrapper

 

 

GitHub: kaldi-asr/kaldi (https://github.com/kaldi-asr/kaldi), the official location of the Kaldi project.

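git clone https://github.com/kaldi-asr/kaldi.git
cd kaldi/tools
extras/check_dependencies.sh   # reports missing system packages
make -j $(nproc)               # builds OpenFst and the other bundled tools
cd ../src
./configure --shared           # writes kaldi.mk for this machine
make depend -j $(nproc)
make -j $(nproc)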

 

 

Kaldi Usage Example

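# Example: training a speech recognition model on LibriSpeech
# Before running, check the corpus path in run.sh and the job commands in cmd.sh
# (run.pl runs jobs on the local machine instead of a grid queue).
cd kaldi/egs/librispeech/s5
./run.sh  # runs the whole workflow (data download/preparation, feature extraction, training, decoding, evaluation)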
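The first stages of a recipe like this one prepare Kaldi data directories. Each holds plain text files keyed by utterance ID: text (transcript), wav.scp (audio path or command), utt2spk (speaker of each utterance), and spk2utt (the inverse map, normally produced with utils/utt2spk_to_spk2utt.pl). The sketch below builds those four files for your own corpus. The record layout, IDs, and paths are hypothetical; speaker IDs are used as utterance-ID prefixes so that both maps sort consistently, as Kaldi's sorting requirement expects. Running utils/validate_data_dir.sh on the result is still advisable.

```python
from collections import defaultdict
from pathlib import Path

def build_data_files(records):
    """Return contents of text, wav.scp, utt2spk and spk2utt for
    hypothetical (utt_id, spk_id, wav_path, transcript) tuples."""
    records = sorted(records)  # Kaldi expects each file sorted by its first field
    spk2utt = defaultdict(list)
    files = {"text": [], "wav.scp": [], "utt2spk": []}
    for utt, spk, path, words in records:
        files["text"].append(f"{utt} {words}")
        files["wav.scp"].append(f"{utt} {path}")
        files["utt2spk"].append(f"{utt} {spk}")
        spk2utt[spk].append(utt)
    files["spk2utt"] = [f"{spk} {' '.join(utts)}" for spk, utts in sorted(spk2utt.items())]
    return {name: "\n".join(lines) + "\n" for name, lines in files.items()}

def write_data_dir(records, out_dir):
    """Write the four files into out_dir (e.g. a hypothetical data/train)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, content in build_data_files(records).items():
        (out / name).write_text(content)

demo = [  # hypothetical utterances and paths
    ("spk2-utt1", "spk2", "/data/wav/spk2-utt1.wav", "GOOD MORNING"),
    ("spk1-utt2", "spk1", "/data/wav/spk1-utt2.wav", "HELLO WORLD"),
    ("spk1-utt1", "spk1", "/data/wav/spk1-utt1.wav", "HELLO"),
]
files = build_data_files(demo)
print(files["spk2utt"], end="")
# spk1 spk1-utt1 spk1-utt2
# spk2 spk2-utt1
```

Keeping the builder separate from the writer makes the file contents easy to inspect before anything touches disk.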

To implement Kaldi-based STT over TCP:
  1. The client sends an audio stream over TCP.
  2. The server runs real-time speech recognition (STT) with Kaldi's online2 decoder.
  3. The server sends the recognized text back to the client over TCP.
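Kaldi's source tree already contains a server for this pattern, online2-tcp-nnet3-decode-faster (in src/online2bin), which reads raw 16-bit PCM audio from a TCP connection and writes recognized text back. The client side can be sketched with the Python standard library as below. It assumes a mono 16-bit WAV whose sample rate already matches the server's configuration, and that the server treats a half-closed connection as the end of the audio; the function name, chunk size, and timeout are illustrative choices, not part of Kaldi.

```python
import socket
import wave

def stream_wav(host, port, wav_path, chunk_frames=1600, timeout=30.0):
    """Hypothetical client: send a mono 16-bit WAV as raw PCM over TCP
    and return whatever text the server sends back."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        with socket.create_connection((host, port), timeout=timeout) as sock:
            while True:
                data = wav.readframes(chunk_frames)  # 1600 frames = 0.1 s at 16 kHz
                if not data:
                    break
                sock.sendall(data)  # raw little-endian samples, no WAV header
            sock.shutdown(socket.SHUT_WR)  # half-close: signals end of audio
            reply = bytearray()
            while True:
                part = sock.recv(4096)
                if not part:
                    break  # server closed the connection
                reply.extend(part)
    return reply.decode("utf-8", errors="replace")
```

Usage would look like `stream_wav("localhost", 5050, "test.wav")`, where the host, port, and file are placeholders for your own server and audio. Sending in small chunks lets the server start decoding before the whole file has arrived, while the half-close still tells it exactly where the audio ends.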
License: Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)

mxnxeonx