Microsoft Speech SDK (en)
 

Finding position of word in milliseconds


 
21.04.2008 05:35 Zodiac

Hi,
I'm writing an app which generates a .wav file from some text using
SAPI.
I need to sync some pictures with this text and present it to the user
as a slide show.
I need the pictures to appear when the relevant words are spoken.
Is there any way (a SAPI API) to find out where (as an offset in
seconds) a particular word occurs in the generated audio file?

I've looked at the 'word' and 'bookmark' events, but I think they fire
with correct timing only when the audio is sent to the speakers;
setting the output stream gives me inaccurate timings.

Any ideas?
Re: Finding position of word in milliseconds
22.04.2008 05:15 jges


Hi
I am not an expert in audio file formats, but I think that
what you need is related to that subject. You would have
to insert the events from SAPI into the wav file.
If you try the TTSApp distributed with the SAPI 5.1 SDK,
you can create a wav file that keeps these events in sync
with the audio. If you replay the file with the TTSApp,
the micmouth works exactly the same as when playing
"live".

Gregorio
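
For reference, here is a minimal sketch of the arithmetic behind this, in Python. It assumes each word event comes with a byte offset into the generated audio (SAPI's SPEVENT structure has a ullAudioStreamOffset field for this) and that the .wav holds uncompressed PCM. The function names and the sample offsets are made up for illustration; they are not values produced by SAPI.

```python
import wave


def pcm_bytes_per_second(frame_rate, channels, sample_width):
    """Bytes of uncompressed PCM audio consumed per second of playback."""
    return frame_rate * channels * sample_width


def offset_to_ms(byte_offset, frame_rate, channels, sample_width):
    """Convert a byte offset into the audio data to milliseconds.

    Assumption: the offset counts audio data bytes only, not the WAV header.
    """
    return byte_offset * 1000.0 / pcm_bytes_per_second(
        frame_rate, channels, sample_width
    )


def wav_format(path_or_file):
    """Read (frame_rate, channels, sample_width) from an existing PCM .wav."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def word_times_ms(word_offsets, frame_rate, channels, sample_width):
    """Map (word, byte_offset) pairs to (word, milliseconds) pairs."""
    return [
        (word, offset_to_ms(offset, frame_rate, channels, sample_width))
        for word, offset in word_offsets
    ]


if __name__ == "__main__":
    # Hypothetical offsets for 22050 Hz, mono, 16-bit audio.
    events = [("Hello", 0), ("world", 22050)]
    for word, ms in word_times_ms(events, 22050, 1, 2):
        print(word, ms)
```

If the offsets collected while writing to the file are reliable, the slide show only needs these millisecond values and the event timing at generation time no longer matters. Whether the offsets reported for a file output stream are accurate is something to check against your own output.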
 
 