[EE]: Speech Recognition with Neural Network

Discussion:

Dimitris Kapousouz

2004-04-22 00:18:57 UTC

Hello dear list,

I am working on a (hobby) project, trying to implement a system
that will accept spoken orders and act on them. To be more specific,
I want a robotic arm to move up, down, left or right when I say those
words. I decided to build this system using neural networks (I have
read that they give the most promising results so far).

I thought that the network should be a feedforward network trained with
back propagation (very common for pattern recognition), but I am stuck on
one thing. NN theory says that every neuron in the network should
have a transfer function (sigmoid, pulse, logsig, linear, etc.). I don't know
which one is best for my application. Will I find this out by
experimenting, or is there some standard guiding rule?

Another question that arises is where to train the network. I know about
Matlab and fuzzytech, but is there any other software that handles NNs that I
should be sure to check out? Is there any that can also generate code for PICs
or AVRs? (My final wish is to get this whole system into some MCU, but I am not
sure yet whether there is any PIC with enough RAM and speed for such an
application. Maybe a dsPIC?)

And a last question: which sampling rate is recommended for speech
applications?

Thank you in advance; I will appreciate any comments.

Dimitris Kapousouz

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

Robert B.

2004-04-22 03:37:08 UTC

I'm gonna have to question your choice of processor for this project. In
the robotics lab here they have a neural network, but it's powered by a bank
of Pentium IVs. It's obviously a little more complicated, but as I
understand it, neural networks tend to eat up CPU time like nothing else. My
cell phone has speech recognition software that works quite well. I'm not
sure how it works, but first you record your voice, then when you repeat the
command it recognizes it and does the specified task. It also has a number
of pre-programmed functions, though any sort of non-American accent tricks
them. This leads me to believe that it is probably some sort of
hashing/matching algorithm instead of a complicated neural network. If all
you care about is getting a robotic hand to move around on command, you
might consider a simpler solution.

PICs run fast enough to sample audio at a decent rate, so that's where
I'd start: developing a PIC-specific algorithm for storing sounds.

It's just my opinion though! :-D

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics

Russell McMahon

2004-04-22 06:38:26 UTC

Post by Dimitris Kapousouz
I am working on a (hobby) project, trying to implement some kind of system
that will be able to accept orders and act accordingly depending on the
order. To be more specific I want a -robotic- arm to move
up-down-left-right, when I say these (up down left right) words. I decided
to build this system with use of neural networks (I have read that it gives
till now the most promising results).

If it's end results that you want, then I suggest that the neural network
that is liable to give the best results is the one already in your head.
Limited vocabulary, single word, speaker dependent speech recognition is
well within the capability of current relatively bottom end processors with
a little hardware - especially if you are prepared to tweak the code and
parameters to suit your application. I saw an MC6802 based application at
least up to your requirement in a magazine some 15 years ago. AFAIR they
amplified the speech signal to a square wave and sampled it. Memory fades
but they MAY have had 3 opamp bandpass amplifiers on the input - but that is
probably a memory from another commercial system available in about 1980
that yielded OK results.
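
A rough sketch of the hard-limiting idea described above, written in Python for readability rather than as 6802 code. It is not the magazine design, whose details are not given here. The 8 kHz sample rate, the 50 ms frame length and all function names are assumptions chosen only for illustration. Each sample is reduced to a single bit (the "square wave"), and the number of transitions per frame gives a crude measure of the dominant frequency.

```python
import math

def hard_limit(samples):
    """Reduce each sample to one bit: 1 if non-negative, else 0."""
    return [1 if s >= 0 else 0 for s in samples]

def zero_crossings_per_frame(bits, frame_len):
    """Count bit transitions inside each complete frame of frame_len bits."""
    counts = []
    for start in range(0, len(bits) - frame_len + 1, frame_len):
        frame = bits[start:start + frame_len]
        counts.append(sum(1 for a, b in zip(frame, frame[1:]) if a != b))
    return counts

def sine(freq_hz, rate_hz, n):
    """Test tone standing in for a microphone signal."""
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz) for t in range(n)]

RATE_HZ = 8000      # assumed sample rate
FRAME_LEN = 400     # assumed frame length: 50 ms at 8 kHz

low = zero_crossings_per_frame(hard_limit(sine(200, RATE_HZ, 800)), FRAME_LEN)
high = zero_crossings_per_frame(hard_limit(sine(1500, RATE_HZ, 800)), FRAME_LEN)
print(low, high)    # the 1500 Hz tone crosses zero far more often per frame
```

A recogniser built along these lines would compare the per-frame counts of a spoken word against counts stored when the word was trained.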

If you want the easiest results, you can buy single-IC systems which
allow about 20 words of recognition. These have about 4(?) key words which can
then be used to trigger selected secondary words, e.g. OPERATE brakes, OPERATE
motor, ... . A few tens of $US AFAIR.

A number of commercial ICs exist

eg http://www.futurlec.com/News/Philips/SpeechChip.html

RM

Dimitris Kapousouz

2004-04-22 09:12:43 UTC

As I said in my first email, I am not sure yet whether the application I was
thinking about will finally go into an MCU, so I haven't yet settled
on a processor. (I was hoping some PIC of the 18F series could
handle this.) Of course, the most time- and CPU-consuming part is the training,
which will be done on a PC. Only the weights of the NN are going to be stored
in memory for the calculations. (If I ever manage to build this NN.)
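
To make the "train on a PC, store only the weights" split concrete, here is a minimal sketch of the forward pass alone, which is the only part the target device would have to run. Everything in it is an assumption for illustration: the 4-4-4 layer sizes, the logistic sigmoid as transfer function, and above all the weights, which are hand-made placeholders rather than the result of any training. It is in Python for clarity; an MCU version would more likely use fixed-point arithmetic and a lookup table for the sigmoid.

```python
import math

LABELS = ["up", "down", "left", "right"]

def sigmoid(x):
    """Logistic transfer function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def placeholder_layer(n):
    """Hand-made weights and biases (NOT trained) for an n-to-n layer."""
    weights = [[4.0 if i == j else -2.0 for j in range(n)] for i in range(n)]
    biases = [-1.0] * n
    return weights, biases

# Hypothetical network: one hidden layer and one output layer.
NET = [placeholder_layer(4), placeholder_layer(4)]

def forward(features, net):
    """Propagate a feature vector through every layer of the network."""
    act = features
    for weights, biases in net:
        act = [sigmoid(sum(w * x for w, x in zip(row, act)) + b)
               for row, b in zip(weights, biases)]
    return act

def classify(features, net=NET, labels=LABELS):
    """Return the label of the strongest output and the raw outputs."""
    outputs = forward(features, net)
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return labels[best], outputs

word, outputs = classify([0.9, 0.1, 0.1, 0.1])
print(word, [round(o, 2) for o in outputs])
```

The cost per recognition is one multiply-accumulate per weight plus one transfer-function evaluation per neuron, so the size of the stored weight table is what determines the RAM/ROM and speed needed.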

The network and application do not have to recognize the whole human
vocabulary and occupy banks of Pentium IVs... A few words (four or five)
are enough. I had in mind the way mobile phones work, but as some
members of the list said, that may not be a NN but some other kind of
algorithm.

Anyway, thanks

Alan B. Pearce

2004-04-22 09:33:10 UTC

Post by Dimitris Kapousouz
(I was hoping some PIC of 18F series can
handle this). Of course the most time and CPU consuming part is
the training which will be held in a PC. Only the weights of the
NN are going to be put in some memory, for the calculations.
(If I ever manage to build this NN)

I suspect you would get a fair way with an 18F8x20 series device which has
128k internal ROM and up to 2MB external ROM addressing capability.

Bob Ammerman

2004-04-22 14:07:36 UTC

Check out:

http://www.sensoryinc.com

where they have many speech recognition options. I have used the RSC-364,
and it does work quite well.

Bob Ammerman
RAm Systems

Andrew Kilpatrick

2004-04-22 14:01:23 UTC

Back in the old days (well, when I was in university in the late 90s :)
I was working on an audio research project, and was thinking about a lot
of audio processing ideas. Obviously one of the ones that came to mind
was that of speech recognition, and I came up with an interesting idea,
albeit maybe not an original one.

After having played with (some years before that) an audio spectrum
analyser, I started to think about reducing the amount of information
in an audio signal, which, of course, is what a spectrum analyser does.
It gives you, let's say, 10 bands of information spaced every octave.
And each of these 10 bands might be connected to a level meter with 10
steps from no signal to full signal. Obviously there is not enough
information presented by this kind of display to recreate the audio,
but it does give a kind of view of the signal that is not possible in
any other way.
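
A software stand-in for that kind of display can be sketched as follows. This is only an illustration under stated assumptions: it uses a naive DFT in place of analog filters, an 8 kHz sample rate, a 256-sample frame, and four octave bands instead of ten (at 8 kHz only content below 4 kHz can be represented). The function names are made up for the example. Each band's energy is scaled against the strongest band and quantised to ten steps, 0 to 9.

```python
import cmath
import math

def power_spectrum(samples):
    """Naive DFT power for bins 0..n/2 (slow, but short and dependency-free)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))) ** 2
            for k in range(n // 2 + 1)]

def octave_band_levels(samples, rate_hz, centres_hz, steps=10):
    """Sum power in each octave-wide band and quantise it to `steps` levels."""
    n = len(samples)
    spectrum = power_spectrum(samples)
    energies = []
    for fc in centres_hz:
        lo, hi = fc / math.sqrt(2), fc * math.sqrt(2)   # one octave wide
        energies.append(sum(p for k, p in enumerate(spectrum)
                            if lo <= k * rate_hz / n < hi))
    peak = max(energies) or 1.0                         # avoid dividing by zero
    return [round((steps - 1) * math.sqrt(e / peak)) for e in energies]

RATE_HZ = 8000                       # assumed sample rate
CENTRES_HZ = [250, 500, 1000, 2000]  # assumed octave-spaced band centres

tone = [math.sin(2 * math.pi * 1000 * t / RATE_HZ) for t in range(256)]
levels = octave_band_levels(tone, RATE_HZ, CENTRES_HZ)
print(levels)   # a 1 kHz tone should light up only the 1000 Hz band
```

The 256 input samples collapse to four small integers per frame, which is the data reduction being described.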

And that got me thinking...

If you could so easily reduce the amount of audio information with a
bunch of analog filters, it would then be easier to process the
result in a CPU, provided that there was enough information to do
whatever task was required. And so I envisioned a system using a bunch
of filters adapted to different parts of the speech band. Some would
detect the various vowel sounds by particular weighting of the
signal in certain bands, and some would detect consonants, etc. The
filter outputs would just be slowly varying DC levels, like
very fast level meters, rather than the original waveforms themselves.
Then a CPU would just need to compare the sequence of energies measured
in the various bands against reference sequences previously recorded.
It could even deal with words it didn't know by chopping up the result
and applying it to multiple prerecorded sequences. Perhaps the analog
circuit could be replaced by a DSP, although for a prototype, I think
that it would be easier to use analog circuits.
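
The comparison step might look something like the sketch below. It is an untested idea expressed in code, not a known-good recogniser: the three bands, the four frames per word, the 0 to 9 level values and the sum-of-squared-differences measure are all assumptions, and the reference sequences are invented numbers rather than recordings. It also assumes the utterance and the references have the same number of frames, so it does no time alignment and does not attempt the "chopping up" of unknown words.

```python
def distance(seq_a, seq_b):
    """Sum of squared differences between two equal-length band-level sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same number of frames")
    return sum((a - b) ** 2
               for frame_a, frame_b in zip(seq_a, seq_b)
               for a, b in zip(frame_a, frame_b))

def recognise(utterance, templates):
    """Return the word whose stored sequence is closest to the utterance."""
    return min(templates, key=lambda word: distance(utterance, templates[word]))

# Invented reference sequences: 4 frames x 3 band levels (0..9) per word.
TEMPLATES = {
    "up":   [[1, 6, 2], [2, 8, 3], [1, 5, 2], [0, 1, 0]],
    "down": [[7, 3, 1], [8, 4, 1], [6, 2, 0], [3, 1, 0]],
    "left": [[2, 2, 7], [3, 4, 8], [2, 6, 5], [1, 3, 1]],
}

# A slightly different repetition of "up", as the level meters might report it.
heard = [[1, 5, 2], [2, 9, 3], [2, 5, 1], [0, 1, 1]]
print(recognise(heard, TEMPLATES))
```

With only a handful of words and a few dozen small integers per word, both the storage and the arithmetic are modest, which is what makes the approach attractive for a small CPU.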

So, I guess the long and the short of it is that I have no idea if
this is feasible, or if this might be how some software/hardware does
the job already, but it's worth a shot. For a hobby type project it
would be pretty easy to construct, and probably fairly easy to program.
Even just with a bunch of analog filters and level meter circuits feeding
their voltages into the analog inputs to a PIC, a nice little RS-232
output could be made to plug into a PC. Then you could write simple
software on the PC to watch the voice spectrum with a minimal amount
of coding and CPU effort.

Andrew
