KnowBrainer Speech Recognition
Topic Title: Determine How Fast and Accurate Dragon Really Is?
Topic Summary: You can use DragonBench if you really want to know.
Created On: 06/08/2020 03:57 PM
 Determine How Fast and Accurate Dragon Really Is?   - R. Wilke - 06/08/2020 03:57 PM  
 Determine How Fast and Accurate Dragon Really Is?   - Ag - 06/08/2020 04:16 PM  
 Determine How Fast and Accurate Dragon Really Is?   - R. Wilke - 06/08/2020 04:31 PM  
 06/08/2020 03:57 PM

R. Wilke
Top-Tier Member

Posts: 7099
Joined: 03/04/2007

For those looking for a Dragon benchmarking utility, here is the new release of DragonBench.


Feel free to ask if you have any questions.

 

 



-------------------------



No need to buy if all you want to do is try ...

DragonCapture (30 Day Trial)
DragonCapture Manual



 06/08/2020 04:16 PM

Ag
Top-Tier Member

Posts: 402
Joined: 07/08/2019

Q: What does "Average Real-Time Factor" mean?


In a previous post you said that DragonBench can measure latency, by which I mean the time from my saying something (or its being received by the sound card) to a command being started or text being inserted. I don't see it on the screen above. Perhaps it is on a different screen? In any case, how do (or would) you measure such latency?

 

---

 

I suppose that DragonBench knows (a) the time it starts "playing" the sound file to be transcribed, (b) the duration of the clip, (c) the time the first character is received by DragonBench, and (d) the time the last character is received.

 

(c) - (a) is an actual latency, although it probably involves startup overhead.

 

(d) - ((a) + (b)) is an actual latency, I think with less startup overhead mixed in.
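
The two differences above can be written out as a minimal sketch. It assumes the four timestamps (a) to (d) are available in seconds on a common clock; the `ClipTiming` class, its field names, and the sample numbers are hypothetical and say nothing about how DragonBench actually works internally.

```python
from dataclasses import dataclass


@dataclass
class ClipTiming:
    """Hypothetical record of the four values (a)-(d), all in seconds."""
    playback_start: float    # (a) time the sound file starts "playing"
    clip_duration: float     # (b) duration of the clip
    first_char_time: float   # (c) time the first character is received
    last_char_time: float    # (d) time the last character is received


def first_char_latency(t: ClipTiming) -> float:
    """(c) - (a): includes any startup overhead."""
    return t.first_char_time - t.playback_start


def end_of_clip_latency(t: ClipTiming) -> float:
    """(d) - ((a) + (b)): delay between end of audio and last character."""
    return t.last_char_time - (t.playback_start + t.clip_duration)


# Made-up numbers for illustration only.
timing = ClipTiming(playback_start=10.0, clip_duration=30.0,
                    first_char_time=11.5, last_char_time=41.0)
print(first_char_latency(timing))   # 1.5
print(end_of_clip_latency(timing))  # 1.0
```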

 

I don't think you're going to get latencies within the actual clip, i.e. see a distribution of latencies depending on what was said. (Hmm, I suppose you could try starting and stopping the clip at different places.)

 

... ah, I see that the "detailed results" section has what I am guessing are the inter-arrival times of different words. No, that can't be true: if scores like 963/to, 9389/dtermine were milliseconds, that would be way too slow. They must be quality or confidence scores.

 

Anyway, if you did have arrival times or inter-arrival times, we could look for latency variation within the speech clip that is being transcribed. Of course, you would have to know the true segmentation, which might be onerous to obtain. It might be possible, though, to compare the fast timings against the slow-but-accurate timings to get an automated measurement.
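
A minimal sketch of that idea, assuming you already had both the true end time of each word within the clip (the segmentation) and the time each word arrived. All names and numbers here are hypothetical; nothing in this thread indicates that DragonBench exposes per-word arrival times.

```python
def per_word_latencies(playback_start, word_end_offsets, arrival_times):
    """Latency of each word: arrival time minus the moment the word
    finished playing (playback start + its end offset within the clip).
    All values are in seconds."""
    if len(word_end_offsets) != len(arrival_times):
        raise ValueError("need one arrival time per segmented word")
    return [
        arrival - (playback_start + end_offset)
        for end_offset, arrival in zip(word_end_offsets, arrival_times)
    ]


def latency_spread(latencies):
    """Crude measure of variation within the clip: max minus min."""
    return max(latencies) - min(latencies)


# Made-up values for a three-word clip.
latencies = per_word_latencies(
    playback_start=0.0,
    word_end_offsets=[0.5, 1.0, 1.5],   # assumed true segmentation
    arrival_times=[1.0, 1.25, 2.5],     # assumed observed arrivals
)
print(latencies)                  # [0.5, 0.25, 1.0]
print(latency_spread(latencies))  # 0.75
```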

 

---

 

AFAICT this is only dictation latency, not command processing latency, right? So it's not testing the interface between Dragon and KnowBrainer, although it is testing the interface between Dragon and DragonBench, which may or may not be comparable. Probably not.

 

===

 

Anyway, latency is only one metric. Accuracy is probably more important. I will probably purchase DragonBench sometime soon, trying to synchronize that purchase with the arrival of a new machine that I want to evaluate before I decide whether to keep or return it.

 

 



-------------------------

DPG15.6 (also DPI 15.3) + KB, Sennheiser MB Pro 1 UC ML, BTD 800 dongle, Windows 10 Pro, MS Surface Book 3, Intel Core i7-1065G7 CPU @ 1.3/1.5GHz (4 cores, 8 logical), GPU=NVIDIA Quadro RTX 3000 with Max-Q Design.



 06/08/2020 04:31 PM

R. Wilke
Top-Tier Member

Posts: 7099
Joined: 03/04/2007

In ASR research, "real time factor" is a common term for a measure of the recogniser's speed. If you look it up on Google, you will get numerous hits. To quote one of them, which I just found by searching for it:

How is the speed of a speech recognition system measured?

Real Time Factor is a very natural measure of a speech decoding speed that expresses how much the recogniser decodes slower than the user speaks. The latency measures the time between the end of the user speech and the time when a decoder returns the hypothesis, which is the most important speed measure for ASR.

Real-time Factor (RTF): the ratio of the speech recognition response time to the utterance duration. Usually both mean RTF (average over all utterances), and 90th percentile RTF is examined in efficiency analysis.


https://devopedia.org/speech-recognition
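
As a rough illustration of the quoted definition, here is a minimal sketch that computes the RTF per utterance and then the mean and 90th percentile. The function names, the nearest-rank percentile method, and the sample numbers are assumptions made for the example; they are not taken from DragonBench.

```python
import math


def real_time_factor(response_time_s, utterance_duration_s):
    """RTF = recognition response time / utterance duration."""
    if utterance_duration_s <= 0:
        raise ValueError("utterance duration must be positive")
    return response_time_s / utterance_duration_s


def mean(values):
    return sum(values) / len(values)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up (response time, utterance duration) pairs in seconds.
utterances = [(2.0, 4.0), (3.0, 4.0), (1.0, 4.0), (4.0, 4.0)]
rtfs = [real_time_factor(resp, dur) for resp, dur in utterances]

print(rtfs)                               # [0.5, 0.75, 0.25, 1.0]
print(mean(rtfs))                         # 0.625
print(percentile_nearest_rank(rtfs, 90))  # 1.0
```

An RTF below 1.0 means the recogniser finishes in less time than the utterance lasted; above 1.0 means it falls behind the speaker.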

DragonBench has been designed to automatically measure the RTF for dictation in real time (not for commands, however). It requires either dictating into the text box on the UI or providing an audio file to be transcribed into the UI, thus making the measurement reproducible.

In the past, I had also designed a similar application to measure command recognition latency, which, however, again required providing recorded input.

I am saying all this because I know you would expect it to measure all of your various inputs on the fly, whether they are dictation or commands, and from anywhere.





