[EE] Spread Spectrum Audio Watermarking

Discussion:

Harold Hallikainen

2018-02-02 17:39:25 UTC

I'm looking at sending some extremely low speed data under audio to
identify the audio. The receiver would be an Android device with internal
microphone. I'm thinking of using direct sequence spread spectrum with the
level being substantially below the audio so it would appear to be very
low level noise. But, my experience with spread spectrum is extremely
limited. I'd appreciate any pointers to information on it, especially at
audio frequencies, and any open source projects on it.
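
A minimal sketch of the direct-sequence idea described above, under loudly stated assumptions: one chip per audio sample, perfect synchronization between sender and receiver, white Gaussian noise standing in for the cover audio, and no microphone, speaker, or codec in the path. The function names (`pn_sequence`, `embed`, `detect`) and all numeric values are hypothetical and chosen only for illustration.

```python
import random

def pn_sequence(n_chips, seed=1):
    """Pseudo-random +/-1 chips; a stand-in for a proper PN code."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(n_chips)]

def embed(audio, bits, chips_per_bit, level, seed=1):
    """Add each data bit, spread over chips_per_bit samples, to the cover audio."""
    pn = pn_sequence(chips_per_bit, seed)
    out = list(audio)  # copy so the caller's cover audio is left untouched
    for i, bit in enumerate(bits):
        sign = 1 if bit else -1
        base = i * chips_per_bit
        for k, chip in enumerate(pn):
            out[base + k] += level * sign * chip
    return out

def detect(received, n_bits, chips_per_bit, seed=1):
    """Correlate each block with the same PN sequence; the sign gives the bit."""
    pn = pn_sequence(chips_per_bit, seed)
    bits = []
    for i in range(n_bits):
        base = i * chips_per_bit
        corr = sum(received[base + k] * chip for k, chip in enumerate(pn))
        bits.append(1 if corr > 0 else 0)
    return bits

if __name__ == "__main__":
    rng = random.Random(42)
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    chips = 16384
    # Cover "audio": unit-variance noise; watermark amplitude is 0.1 (20 dB lower).
    cover = [rng.gauss(0.0, 1.0) for _ in range(len(bits) * chips)]
    marked = embed(cover, bits, chips, level=0.1)
    print(detect(marked, len(bits), chips) == bits)
```

With 16384 chips per bit the processing gain is about 10*log10(16384), roughly 42 dB, which is what lets the correlator recover a watermark sitting 20 dB below the cover. Real audio is not white and a real receiver is not synchronized, so a practical system would need considerably more than this sketch shows.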

THANKS!

Harold

--
FCC Rules Updated Daily at http://www.hallikainen.com
Not sent from an iPhone.
--
http://www.piclist.com/techref/piclist PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist

Charles Craft

2018-02-02 18:20:43 UTC

https://www.bloomberg.com/news/articles/2018-02-02/here-s-why-alexa-won-t-light-up-during-amazon-s-super-bowl-ad
"The second tactic describes how a commercial itself could transmit an inaudible acoustic signal to tell Alexa to ignore its wake word."

Harold Hallikainen

2018-02-02 23:30:58 UTC

Thanks! I doubt they just use high frequency attenuation. I'm surprised
Amazon does not just identify the voices of people in the household.
Commands could be customized to the person asking. If an unrecognized
voice shows up (like from a commercial), the Echo could ask for the command
to be repeated.

Harold

Dave Tweed

2018-02-03 00:04:42 UTC

I used to work for Aris Technologies, a predecessor to Verance Corporation,
one of the leaders in this field. (https://www.verance.com/)

Making an audio watermark both *unobtrusive* and *robust* is a serious
challenge. Noise-based systems score poorly on both aspects.

Wideband noise is not really all that unobtrusive, unless the audio "cover"
signal itself is also fairly wideband, like a rock band. Solo instruments or
singers will not hide the noise well at all.

Wideband noise is also not robust. First of all, the microphone in a typical
Android device is not going to give you a high SNR to begin with, which means
you'll need to have a significant amount of power in your watermark, which
gets back to the obtrusiveness point above. Second, any kind of encoding
algorithm used to compress the audio (e.g., MP3) will be expressly designed
to ignore any wideband noise in the original signal, pretty much wiping out
your watermark.

Some of Verance's algorithms are patented; you might try searching for patents
assigned to them in order to get a feel for what the state of the art in this
area is. I know what they were doing back in 2000, but I'm not free to discuss
the details. Besides, I'm sure things have evolved significantly since then.

-- Dave Tweed

Jean-Paul Louis

2018-02-03 04:59:22 UTC

Harold,

Did you look at using CTCSS, given that the data is low speed?

Just my $0.02,

Jean-Paul
N1JPL

RussellMc

2018-02-03 05:03:16 UTC

This is not directly what you want, but given what Dave said about the
difficulty, it may help in some way.

Just under 40 years ago (!!!), for my Master's thesis project, I needed
to decode signalling tones on telephone customer calls fed from points
in the midst of the switching network in a telephone exchange.
Decoding of "real" signals in isolation was easy, but I found that speech
often produced sequences around the relevant 400 Hz frequency that imitated
signalling well enough to be falsely decoded as signals. I used a "speech
detector" which looked for energy in specific other bands which indicated
that speech and not signals was present. I could dig out a thesis copy for
more detail, but my application's specifics are almost certainly irrelevant
to you.

However, checking for the presence or absence of sound energy in certain
other bands may allow reliable signalling where detecting the signal alone
would not suffice.

Russell

Harold Hallikainen

2018-02-03 18:01:03 UTC

Thanks for all the comments!

I am familiar with Verance and have worked with Civolution on audio and
video watermarking. They are indeed sophisticated (and expensive) systems.

I've also seen a bit about the Nielsen Portable People Meter (
https://en.wikipedia.org/wiki/Portable_People_Meter ) which adds
identifying audio to broadcast stations for ratings. The spectral plots
I've seen of PPM show fairly high level tones added to the audio, but they
are psychoacoustically masked by audio at adjacent frequencies.

I've also considered CTCSS (continuous tone-coded squelch system) methods
where a continuous subaudible tone identifies the signal. A modification of
that could be subaudible FSK, where I'd be able to identify multiple signals
using the subaudible FSK and a UART. That technique is not suitable for
the normal radio use of CTCSS, where you need to identify the signal (to
unsquelch the radio) very quickly. Here, I could identify the signal in a
minute or so.
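
A rough sketch of what the transmit side of "subaudible FSK and a UART" might look like, assuming standard 8N1 UART framing and continuous-phase FSK. The tone frequencies, bit rate, amplitude, and the function names `uart_frame` and `fsk_modulate` are hypothetical choices for illustration, not values from any real system.

```python
import math

def uart_frame(byte):
    """8N1 framing: start bit (0), eight data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

def fsk_modulate(bits, sample_rate, bit_rate, f_space, f_mark, amplitude):
    """Continuous-phase FSK: a 0 bit sends f_space, a 1 bit sends f_mark."""
    samples_per_bit = int(sample_rate / bit_rate)
    phase = 0.0
    out = []
    for bit in bits:
        # Phase carries over between bits, so there is no click at bit edges.
        step = 2 * math.pi * (f_mark if bit else f_space) / sample_rate
        for _ in range(samples_per_bit):
            out.append(amplitude * math.sin(phase))
            phase += step
    return out

if __name__ == "__main__":
    # Hypothetical parameters: 2 bit/s, tones at 100 Hz and 120 Hz, low level.
    frame = uart_frame(ord("A"))
    wave = fsk_modulate(frame, 8000, 2, 100.0, 120.0, 0.05)
    print(frame, len(wave))
```

At 2 bit/s a 10-bit frame takes 5 seconds, so a short identifier would fit comfortably inside the "minute or so" budget. Whether a phone microphone and the playback speakers pass such low frequencies at all is a separate question this sketch does not address.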

On "talk off," early ham radio DTMF (dual tone multi frequency or Touch
Tone) detectors suffered from this since they only looked for the presence
of the tones. Better ones compared the in-band level to the out of band
level for the tones. Voice would have near equal in-band to out-of-band
levels while DTMF tones had high in-band signals compared to the
out-of-band. In the application I'm looking at, though, there will be a
lot of out-of-band audio.
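
The in-band versus out-of-band comparison used by the better DTMF detectors can be sketched as follows, assuming a single tone of interest and using the Goertzel recurrence to measure its energy. The 0.5 threshold and the names `goertzel_power` and `tone_present` are illustrative assumptions. As noted above, this guard suits DTMF-style signalling and would not carry over directly to a watermark buried under programme audio.

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Squared magnitude of the signal's spectrum at freq (Goertzel recurrence)."""
    coeff = 2 * math.cos(2 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def tone_present(samples, sample_rate, freq, min_fraction=0.5):
    """Accept the tone only if it holds at least min_fraction of the block energy."""
    n = len(samples)
    total = sum(x * x for x in samples)
    if total == 0:
        return False
    # Factor 2/n converts the spectral magnitude of a real tone to time-domain energy.
    in_band = 2 * goertzel_power(samples, sample_rate, freq) / n
    return in_band / total >= min_fraction

if __name__ == "__main__":
    fs, n = 8000, 800
    tone = [math.sin(2 * math.pi * 770 * i / fs) for i in range(n)]
    voice_like = [sum(math.sin(2 * math.pi * f * i / fs)
                      for f in (300, 770, 1200, 2000, 2500)) for i in range(n)]
    print(tone_present(tone, fs, 770), tone_present(voice_like, fs, 770))
```

A detector that only checked `goertzel_power` against a fixed level would trigger on both test signals, which is the "talk off" failure described above.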

Methods of detecting signals below the noise or interfering signal level
are interesting. Spread spectrum is one method. There are also the JT
methods described at https://physics.princeton.edu/pulsar/k1jt/wsjt.html .
These use frequency shift keying at a slow rate (0.372 seconds per symbol)
with multiple frequencies to get more bits per symbol. This allows data
recovery with a signal-to-noise ratio of -28 to -24 dB in a 2500 Hz
bandwidth.
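
A minimal sketch of slow multi-tone FSK of this general kind, not the actual JT protocol: one of M tones is sent per symbol, and the receiver picks the tone with the most energy. The 0.372-second symbol comes from the text above; the 8000 Hz sample rate, the 64-tone alphabet (6 bits per symbol), the 1000 Hz base tone, and all function names are assumptions made for illustration. Error-correction coding and synchronization, which real systems depend on, are left out.

```python
import math
import random

SAMPLE_RATE = 8000      # Hz, assumed
SYMBOL_SAMPLES = 2976   # 0.372 s at 8000 Hz
BASE_CYCLES = 372       # lowest tone completes 372 cycles per symbol (1000 Hz)
N_TONES = 64            # assumed alphabet size: log2(64) = 6 bits per symbol

def tone_freq(index):
    """Tones are spaced by 1/symbol-duration, which makes them orthogonal."""
    return (BASE_CYCLES + index) * SAMPLE_RATE / SYMBOL_SAMPLES

def modulate(index):
    """One symbol: a unit-amplitude sine at the tone selected by index."""
    w = 2 * math.pi * tone_freq(index) / SAMPLE_RATE
    return [math.sin(w * n) for n in range(SYMBOL_SAMPLES)]

def tone_energy(samples, index):
    """Non-coherent energy at one tone (phase of the received tone is unknown)."""
    w = 2 * math.pi * tone_freq(index) / SAMPLE_RATE
    re = sum(x * math.cos(w * n) for n, x in enumerate(samples))
    im = sum(x * math.sin(w * n) for n, x in enumerate(samples))
    return re * re + im * im

def demodulate(samples):
    """Return the index of the strongest tone in this symbol interval."""
    return max(range(N_TONES), key=lambda i: tone_energy(samples, i))

if __name__ == "__main__":
    rng = random.Random(7)
    # Noise with standard deviation 3 puts the tone well below the noise power.
    noisy = [s + rng.gauss(0.0, 3.0) for s in modulate(37)]
    print(demodulate(noisy))
```

The long symbol is what makes this work: integrating over thousands of samples narrows the effective detection bandwidth to a few hertz per tone, so most of the wideband noise is rejected.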

Thanks for the ideas!

Harold

RussellMc

2018-02-04 11:31:40 UTC

Can you do, e.g.:

- Slow-speed, data-driven phase reversal or phase modification of
frequencies within a selected narrow bandwidth.

- Signal presence/absence at the zero-crossing points of the audio, so that
a low-amplitude signal appears only at well-identified locations. This will
depend on the modulation type.

R
