
Re: [AMIA-L] Speech To Text Technology



Captioning is done manually with a real person transcribing.

The radio transcript project I mentioned wants to be automated, and the
challenge is not just different voices, but also different audio levels
and sound qualities (remote broadcasts, ambient background noises,
telephone hook-ups, etc.)

So the challenge is even greater to 'train' a system.

Nan Rubin

-----Original Message-----
From: Association of Moving Image Archivists [mailto:AMIA-L@xxxxxxxxxxx]
On Behalf Of Albert Steg
Sent: Friday, November 30, 2007 12:23 PM
To: AMIA-L@xxxxxxxxxxx
Subject: Re: [AMIA-L] Speech To Text Technology


Rick, which software product was that?  Might it have been Naturally  
Speaking?

http://www.nuance.com/naturallyspeaking/business/

It's not surprising that voice recognition will be a much tougher nut  
to crack than text recognition -- while it's tough enough training a  
computer to see that 'a' and 'A' are the same letter, the mind  
boggles at what it would take for an automated system to transcribe a  
conversation between 2 or 3 people with differing accents and voice  
timbres -- without ever having been able to benchmark them against a  
known text.

Do captioned broadcasts actually pull this off?  I always figured  
they had a live person doing a sort of stenography job.

Albert

On Nov 30, 2007, at 11:47 AM, Rick Prelinger wrote:

> I tried it.  The off-the-shelf software is said to work well when  
> it gets input from a single speaker who has spent the time going  
> through the routine to train the program to recognize his or her  
> voice.  This involves reading a specific script into a mike for  
> 15-20 minutes.  My trials showed that I could train it to recognize  
> my own voice and thereby automate notetaking or at best the  
> production of a first-draft text, but even a sloppy typist like me  
> can produce cleaner copy.  When I presented it with different  
> voices, it produced gibberish.  I had hoped to use it to convert  
> voices heard on the radio to text, but that was beyond the scope of  
> what you can source in the non-classified world.
>
> Rick
>
>
> -- 
>
> Rick Prelinger
> Prelinger Archives    http://www.prelinger.com
> P.O. Box 590622, San Francisco, Calif. 94159-0622 USA
> footage@xxxxxxxxx



This page last changed: November 16, 2008