The Video Microphone: Is your candy wrapper tattling on you?
This is pretty cool, guys.
Researchers at MIT can film a potted plant that you’re talking to and then recreate your voice based solely on that silent video.
It doesn’t have to be a plant. It could be a glass of water or even an empty bag of potato chips.
This seemingly amazing feat is accomplished by a special algorithm that can interpret lots of super-tiny but distinct vibrations and determine the sound that must have created them. When I say tiny, I mean tiny. We’re talking about one tenth of a micrometer, which is less than one-thousandth of a pixel…a pixel! If Superman looked at these videos, he’d have trouble seeing any of these vibrations.
The algorithm does this by looking at the gestalt, or whole image, whether it’s a plant or a candy wrapper. It then averages all those tiny vibrations and filters out the noise.
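To get a feel for why averaging over the whole image helps, here is a toy sketch in Python. It is not the researchers' algorithm; it just simulates many "pixels" that each see the same tiny vibration buried in random noise, then averages them. The function names, the 100 Hz tone, the pixel count, and the noise level are all made up for illustration.

```python
import math
import random


def simulate_pixel_signals(num_pixels, num_frames, fps, tone_hz, noise_std, seed=0):
    """Return (clean, noisy): one clean tone and a noisy copy of it per pixel.

    Every value here is hypothetical; real video measurements look different.
    """
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * tone_hz * n / fps) for n in range(num_frames)]
    noisy = [[c + rng.gauss(0, noise_std) for c in clean] for _ in range(num_pixels)]
    return clean, noisy


def average_across_pixels(noisy):
    """Average the per-pixel signals frame by frame."""
    num_pixels = len(noisy)
    return [sum(frame_values) / num_pixels for frame_values in zip(*noisy)]


def rms_error(estimate, reference):
    """Root-mean-square difference between two equal-length signals."""
    total = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    return math.sqrt(total / len(reference))


clean, noisy = simulate_pixel_signals(
    num_pixels=400, num_frames=200, fps=2000, tone_hz=100, noise_std=5.0
)
averaged = average_across_pixels(noisy)

single_err = rms_error(noisy[0], clean)  # one pixel on its own: mostly noise
avg_err = rms_error(averaged, clean)     # all pixels combined: much cleaner
print(f"single pixel error: {single_err:.2f}, averaged error: {avg_err:.2f}")
```

Because the noise is independent from pixel to pixel in this toy model, averaging N pixels shrinks it by roughly the square root of N, which is why a signal far too small to see in any one pixel can still show up in the combined result.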
Researchers used really high-speed cameras to take the video, something in the range of 2,000 to 6,000 frames per second (fps). For comparison, movie film is generally 24 fps. This kind of speed is necessary because the frame rate of the video has to be higher than the frequency of the audio signal. Strictly speaking, it has to be at least twice as high to capture a tone faithfully.
Just to be sure, they also did an experiment from 15 feet away and through soundproof glass. The result is definitely not perfect, but the words are clearly understandable. Check out this video link to hear examples and descriptions.
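The frame-rate requirement is easy to put numbers on. The sketch below uses the standard sampling rule of thumb (a camera can only capture tones up to half its frame rate) and shows what happens to a tone that is too high: it "folds down" and shows up as a lower, wrong frequency. The helper names are my own, and the 440 Hz test tone is just an example.

```python
def max_recoverable_hz(fps):
    """Highest tone a camera at this frame rate can capture without aliasing."""
    return fps / 2


def aliased_hz(tone_hz, fps):
    """Frequency a pure tone appears to have after being sampled at fps."""
    nearest_multiple = round(tone_hz / fps) * fps
    return abs(tone_hz - nearest_multiple)


for fps in (24, 60, 2000, 6000):
    print(f"{fps} fps -> tones up to {max_recoverable_hz(fps):.0f} Hz")

# A 440 Hz tone filmed at plain 60 fps would masquerade as a 20 Hz rumble.
print(aliased_hz(440, 60))
```

By this arithmetic, 2,000 to 6,000 fps covers tones up to 1,000 to 3,000 Hz, while a frame-by-frame reading of 24 or 60 fps video tops out at 12 or 30 Hz.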
Don’t worry though. You don’t need a high-end camera to pull this off. Common smartphones with 60 fps capability can also do the same thing. How is that possible? Because of a peculiarity of a technology called the rolling shutter sensor.
When you take a selfie, the resulting image (that no one wants to see btw) is built up over a brief period of time as a series of lines, one on top of another. Each line is captured at a slightly different slice of time. This, in essence, encodes information at a much higher frequency than 60 fps. Fast enough to pick up the tiny vibrations needed to reconstruct the audio like I’ve been discussing. The quality isn’t as good as with the high-speed cameras, but it can do the job.
As a surveillance tool this can obviously come in handy. Sure, it could be easy to circumvent. All you have to do is whisper in an already noisy environment. But this is the barest beginning of this technology. What will version 10.0 be like? (Full disclosure: I heard of the government using technology like this years ago to eavesdrop on conversations that were vibrating windows. Perhaps they already have version 10.0. I’ll never talk to my candy wrappers again.)
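Here is a back-of-the-envelope sketch of why reading an image line by line helps. It assumes a hypothetical sensor with 1,080 rows that spends the whole frame period reading them out evenly; real phones differ, and there is usually a pause between frames, so treat the result as an optimistic upper bound. The function names and the `readout_fraction` parameter are mine.

```python
def row_sample_rate(fps, rows):
    """Upper bound on line samples per second if rows are read back to back."""
    return fps * rows


def row_time(frame, row, fps, rows, readout_fraction=1.0):
    """Approximate capture time (seconds) of one row in one frame.

    readout_fraction is the assumed share of the frame period spent reading
    rows; 1.0 means no pause between the last row and the next frame.
    """
    line_time = readout_fraction / (fps * rows)
    return frame / fps + row * line_time


print(row_sample_rate(60, 1080))    # line samples per second
print(row_time(0, 540, 60, 1080))   # a row halfway down the first frame
```

So instead of 60 snapshots per second, the camera is effectively taking tens of thousands of thin slices per second, each a little later than the last. Each slice only sees a sliver of the scene, which fits with the quality being lower than what the high-speed cameras deliver.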
That doesn’t even matter though. It can still be used to pull sounds from video that we never dreamed of capturing before. Sounds that we didn’t even know were there.
The researchers who developed this technology have a completely different take on its potential though. They see it as a fundamentally new type of object imaging. Often we need to determine how an object responds to various forces like pressure. Instead of physically poking something, we can just blast it with sound and see what the algorithm says.
Image Credit: Video Screenshot