Not working

#6
by evanjilina - opened

Did anyone get this to work? It isn't working for me, even with the examples mentioned.

videosdk.live org

@evanjilina What issue are you facing?

Are you running the quick start example mentioned?

Hi

I tried a simple example of how a real-life conversation would go, where turn-taking would be useful. NOTE: the sentence list below is the only part I changed from the code given on the website.

sentences = [
"hey how are you doing",
"hey how are you doing today",
"hey how are you doing today i",
"hey how are you doing today i was",
"hey how are you doing today i was thinking",
"hey how are you doing today i was thinking we",
"hey how are you doing today i was thinking we could",
"hey how are you doing today i was thinking we could grab",
"hey how are you doing today i was thinking we could grab lunch"
]

The model just predicts End of Turn (EOT) for every prefix:

$ python vanilla_namo.py
Disabling PyTorch because PyTorch >= 2.1 is required but found 1.13.0+cu116
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Loading model from repo: videosdk-live/Namo-Turn-Detector-v1-Multilingual
✅ Model and tokenizer loaded successfully.
'hey how are you doing' -> End of Turn (confidence: 0.703)

'hey how are you doing today' -> End of Turn (confidence: 0.799)

'hey how are you doing today i' -> End of Turn (confidence: 0.745)

'hey how are you doing today i was' -> End of Turn (confidence: 0.650)

'hey how are you doing today i was thinking' -> End of Turn (confidence: 0.609)

'hey how are you doing today i was thinking we' -> End of Turn (confidence: 0.700)

'hey how are you doing today i was thinking we could' -> End of Turn (confidence: 0.724)

'hey how are you doing today i was thinking we could grab' -> End of Turn (confidence: 0.725)

'hey how are you doing today i was thinking we could grab lunch' -> End of Turn (confidence: 0.699)

In fact, I tried adding just one "obvious" word to the examples provided and it didn't work. For instance, sentences or turns very rarely end with "because", but that is not being captured either.

sentences = [
"They're often made with oil or sugar because", # Expected: End of Turn
"I think the next logical step is to discuss", # Expected: Not End of Turn
"What are you doing tonight for", # Expected: End of Turn
"The Revenue Act of 1862 adopted rates that increased with time", # Expected: Not End of Turn
]

✅ Model and tokenizer loaded successfully.
'They're often made with oil or sugar because' -> End of Turn (confidence: 0.572)

'I think the next logical step is to discuss' -> Not End of Turn (confidence: 0.693)

'What are you doing tonight for' -> End of Turn (confidence: 0.679)

'The Revenue Act of 1862 adopted rates that increased with time' -> Not End of Turn (confidence: 0.763)

videosdk.live org

@evanjilina I would suggest using the English-specific model if English is your only use case. Also, to get the best experience from the model, try setting thresholds so that false positives are handled well.

To try it under best-case conditions, I would recommend using it with the agent SDK. Here is a quick start example for that: https://github.com/videosdk-live/agents-quickstart/blob/main/Namo%20Turn%20Detector/main.py

If you want, you can look at the implementation with a threshold here: https://github.com/videosdk-live/agents/blob/main/videosdk-plugins/videosdk-plugins-turn-detector/videosdk/plugins/turn_detector/turn_detector_v3.py#L174
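
For anyone who wants to see the thresholding idea without the agent SDK, here is a minimal sketch. It is not the SDK's implementation (the linked file is the reference for that). The `is_end_of_turn` helper and the `0.8` threshold are assumptions made up for illustration, and the confidences are copied from the output posted earlier in this thread, assuming each one is the probability of the predicted label.

```python
# Predictions copied from the "grab lunch" run posted above:
# (text, predicted label, confidence of that label).
reported = [
    ("hey how are you doing", "End of Turn", 0.703),
    ("hey how are you doing today", "End of Turn", 0.799),
    ("hey how are you doing today i", "End of Turn", 0.745),
    ("hey how are you doing today i was", "End of Turn", 0.650),
    ("hey how are you doing today i was thinking", "End of Turn", 0.609),
    ("hey how are you doing today i was thinking we", "End of Turn", 0.700),
    ("hey how are you doing today i was thinking we could", "End of Turn", 0.724),
    ("hey how are you doing today i was thinking we could grab", "End of Turn", 0.725),
    ("hey how are you doing today i was thinking we could grab lunch", "End of Turn", 0.699),
]

# Hypothetical threshold chosen for illustration; tune it on your own data.
EOT_THRESHOLD = 0.8


def is_end_of_turn(label: str, confidence: float,
                   threshold: float = EOT_THRESHOLD) -> bool:
    """Accept an End of Turn prediction only if its confidence clears the threshold."""
    return label == "End of Turn" and confidence >= threshold


# Keep only the prefixes that would still count as End of Turn after thresholding.
accepted = [text for text, label, conf in reported if is_end_of_turn(label, conf)]
print(accepted)
```

With this threshold, none of the prefixes above is accepted as End of Turn, which removes the false positives. The complete sentence (0.699) is rejected as well, so a threshold alone does not separate the two cases here and would need tuning, possibly together with the English-specific model.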
