Slow inference on rtx 3090
Hello, S2-pro is slow on my 3090, even when using the --compile flag.
I get between 4 and 5 it/s. Is this the expected speed?
2026-03-17 02:31:40.105 | INFO | fish_speech.models.text2semantic.inference:generate_long:653 - Encoded prompt shape: torch.Size([11, 669])
1%|█ | 474/32098 [01:28<1:38:56, 5.33it/s]
2026-03-17 02:33:10.021 | INFO | fish_speech.models.text2semantic.inference:generate_long:682 - Compilation time: 89.96 seconds
2026-03-17 02:33:10.022 | INFO | fish_speech.models.text2semantic.inference:generate_long:690 - Batch 0: Generated 476 tokens in 89.96 seconds, 5.29 tokens/sec
2026-03-17 02:33:10.022 | INFO | fish_speech.models.text2semantic.inference:generate_long:694 - Bandwidth achieved: 24.14 GB/s
2026-03-17 02:33:10.023 | INFO | fish_speech.models.text2semantic.inference:generate_long:720 - GPU Memory used: 22.15 GB
=== Generation Complete! ===
Maybe I should try using Sage Attention or Flash Attention 2?
I am on Windows, btw.
Thanks in advance.
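For a sanity check, the numbers in the log above are internally consistent. A quick back-of-the-envelope (using only figures from the log, plus the 3090's published peak memory bandwidth of roughly 936 GB/s) suggests the run is nowhere near bandwidth-bound:

```python
# Cross-check the generation stats reported in the log above.
tokens = 476            # tokens generated (from the log)
seconds = 89.96         # wall time for the batch (from the log)
bandwidth_gbps = 24.14  # achieved bandwidth (from the log)

tok_per_sec = tokens / seconds
print(f"{tok_per_sec:.2f} tokens/sec")  # ~5.29, matching the log

# Effective data read per decode step:
gb_per_token = bandwidth_gbps / tok_per_sec
print(f"{gb_per_token:.2f} GB per token")

# The RTX 3090's peak memory bandwidth is ~936 GB/s, so this run uses
# only a small fraction of it -- a hint that the bottleneck is per-step
# overhead (kernel launches, Python) rather than memory bandwidth, in
# which case a different attention kernel alone may not help much.
print(f"~{bandwidth_gbps / 936:.1%} of peak bandwidth")
```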
It does have FlashAttention built in.
Yeah, I believe the minimum requirement is a 5090 with at least 24 GB of VRAM.
I had similar performance on my 3090; something on the order of 5x slower than realtime, if I recall.
Re: GSherman's comment - the 3090 does have 24 GB of VRAM. (And the 5090 has 32 GB, so - I'm not sure what "a 5090 with at least 24 GB of VRAM" refers to other than a 5090.)
I'm not sure, bro.
I get the same speed on my 3090