Throwing processing power at raw images lets smartphones and cameras do some amazing things—and the best is yet to come.
Since the birth of photography almost 180 years ago, the relationship between a photographer and a camera has remained mostly unchanged. You open a shutter and capture an image. Though you might manipulate lenses, exposures, and chemicals—or, in recent years, bits—there was a nearly one-to-one relationship between what the lens saw and what you captured. But you've likely taken thousands, if not tens of thousands, of pictures in recent years that break that relationship without knowing it.
Computational photography takes a swarm of data from images or image sensors and combines it algorithmically to produce a photo that would be impossible to capture with film photography or digital photography in its more conventional form. Image data can be assembled across time and space, producing super-real high-dynamic-range (HDR) photos—or just ones that capture both light and dark areas well. Multiple cameras' inputs can be fused into a single image, as on some Android phones and the iPhone 7 Plus, allowing for crisper or richer images in a single shot and a synthetic zoom that looks nearly as good as one produced via optical means.
But as much as computational photography has insinuated itself into all major smartphone models and some standalone digital cameras, we're still at the beginning. Google, Facebook, and others are pushing the concept further, and researchers in the field say there are plenty of new ideas circulating that will make their way into hardware—mostly as part of smartphones, the biggest platform for taking pictures and leveraging innovative imaging techniques.
The coming developments will allow 3D object capture, video capture and analysis for virtual reality and augmented reality, better real-time AR interaction, and even selfies that resemble you more closely.
In recent years, we've watched the just-good-enough cameras in smartphones become better-than-good-enough, eating the heart out of what was once a fast-growing market for point-and-shoot digital cameras. While smartphones can't beat the combination of lens, high-count sensors, and other factors that make digital single-lens reflex (DSLR) cameras the pinnacle of the market, they continue to creep up the curve, with computational photography providing some of the tricks.
When HDR first appeared in the iPhone's iOS 4.1 release in 2010, it followed a typical practice by professional and serious photographers of bracketing shots: taking multiple images manually or with automatic settings at different exposures or other settings. Before image-editing software, photographers would pick among their photos and sometimes use darkroom techniques to combine them. Photoshop and other apps could mix multiple exposures of the same space to great effect, and some iOS apps were already offering this as a feature when iOS 4.1 shipped.
Having HDR built directly into a smartphone OS transformed it from a trick into a mainstream technique, even though the early versions weren't great. (Android followed the iPhone's lead and added it as a core feature.) Apple gradually shifted from capturing three bracketed images to what photo app developers tell me is a much more elaborate set of captures and adjustments that are analyzed and fused in software to produce the HDR result.
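To make the idea of fusing bracketed captures concrete, here is a minimal sketch of one simple approach: weight each pixel by how close it sits to mid-gray, then blend the exposures with those weights. This is an illustration only, not Apple's or Android's actual pipeline, which the article does not describe. The function names, the mid-gray target of 0.5, and the sigma of 0.2 are all assumptions chosen for the example.

```python
import math

def well_exposedness(value, mid=0.5, sigma=0.2):
    """Weight a pixel value in [0, 1] by its closeness to mid-gray (assumed target)."""
    return math.exp(-((value - mid) ** 2) / (2 * sigma ** 2))

def fuse_exposures(frames):
    """Blend same-size grayscale frames, each a list of rows of floats in [0, 1]."""
    height, width = len(frames[0]), len(frames[0][0])
    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            pixels = [frame[y][x] for frame in frames]
            weights = [well_exposedness(p) for p in pixels]
            total = sum(weights)  # always positive for inputs in [0, 1]
            row.append(sum(w * p for w, p in zip(weights, pixels)) / total)
        fused.append(row)
    return fused

# Three bracketed captures of the same tiny scene: one dark area, one bright area.
under = [[0.02, 0.55]]
normal = [[0.10, 0.90]]
over = [[0.45, 1.00]]
fused = fuse_exposures([under, normal, over])
print(fused)
```

In this toy scene, the dark area ends up drawn mostly from the overexposed frame and the bright area mostly from the underexposed one, which is the basic effect HDR is after. Real implementations also have to align frames and cope with motion between captures.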
And that's where things mostly stalled for years, despite a proliferation of academic investigation. Gordon Wetzstein, a professor who leads the Stanford Computational Imaging Group, an interdisciplinary research group at Stanford University, says that of hundreds of papers in the field on computational photography, it "boils back down to one, two, maybe three different incarnations that end up being simple enough that they're actually useful." This is partly because of power constraints, phones' and cameras' form factors, and other elements that limit practical use.
Adding multiple rear-facing cameras was an idea that kicked around for quite a while. Though the first dual-camera phone to ship was the HTC One (M8), in early 2014, its abilities were ahead of the software and image-processing hardware. The potential started to be realized with the Huawei P9 (April 2016), which combined color and grayscale cameras, and Apple's iPhone 7 Plus (September 2016), which has a wide-angle and nominally telephoto pair. In both cases, the multiple cameras' images capture different aspects simultaneously, which software combines for an arguably better result.
With two cameras combined with software that performs object recognition in a scene, a system can extract depth. The iPhone 7 Plus uses this with Apple's still-in-beta Portrait mode, which fillets a subject in the foreground from all the background layers, allowing it to pleasingly blur the background and thereby create the effect known as bokeh. This look simulates the one that a photographer would previously get by using a DSLR paired with a lens with a very shallow depth of field.
Wetzstein notes the potential for the depth recognition to have an impact beyond photographic effects. By analyzing objects in a scene by depth, a two-camera system could automatically produce better pictures, building on the face, smile, and blink recognition features that are standard in cameras and smartphones today.
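The geometric half of that depth extraction can be sketched with the standard stereo relation: depth equals focal length times the distance between the cameras, divided by how far a point shifts between the two views (its disparity). The sketch below assumes two identical, perfectly aligned cameras, which is a simplification, since the iPhone 7 Plus pairs lenses of different focal lengths. The function names and sample numbers are hypothetical, and nothing here reflects Apple's actual Portrait mode implementation.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Return depth in meters from the pinhole stereo relation Z = f * B / d."""
    if disparity_px <= 0:
        # No measurable shift between views: treat the point as infinitely far.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px

def background_mask(disparities, focal_length_px, baseline_m, max_subject_depth_m):
    """Mark pixels farther than the subject as background (candidates for blur)."""
    return [
        [depth_from_disparity(focal_length_px, baseline_m, d) > max_subject_depth_m
         for d in row]
        for row in disparities
    ]

# Assumed values: 2800-pixel focal length, cameras 1 cm apart.
disparities = [[28.0, 4.0]]  # a near point and a far point
mask = background_mask(disparities, 2800, 0.01, max_subject_depth_m=2.0)
print(mask)
```

Nearby points shift a lot between the two views and distant ones barely move, so a small disparity means a large depth. A blur filter applied only where the mask is true would approximate the bokeh look.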
But if two lens/camera combos are good, surely more are better? Researchers have tested cobbled-together multi-input systems, sometimes quite elaborate, as with the Stanford Multi-Camera Array, which sported 128 separate cameras. These were fixed installations and not practical for commercial (or amateur) use.
The low cost of smartphone-size lenses may change that. Instead of relying on a single large, expensive lens, as on a DSLR, a camera could collect photos from many smaller lenses and fuse them computationally to achieve high-quality results. This is the thinking behind the L16, cited by Wetzstein as an example. It's a camera made by a company simply called Light, with 16 camera elements across three focal lengths. (The $1,700 device isn't shipping yet and its preorder allotment sold out.)
Depending on lighting and zoom factor, the L16 fires off a different combination of 10 of those lenses across three focal lengths to fuse a 52-megapixel image using a package not much bigger than a smartphone or typical digital snapshot camera. It may be a gimmick or it may be a way to pack a wallop in one's pocket; we'll know when it hits photographers' hands.
A different hardware approach brought Lytro to the market, a single-lens camera that could refocus a photograph after it was taken and produce 3D images. Lytro's technology relied on a large image sensor, the elements of which were grouped into super-pixels that allowed its software to capture a light field, effectively knowing the incoming direction of light as it hit the sensor. This light field could be reconstructed by its software later. The system never caught on in either its original prosumer or later professional model, and the company adapted its approach to VR capture hardware.
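One textbook way to refocus a light field after capture is "shift and add": treat the data as a set of views from slightly different positions, slide each view in proportion to its position, and average them. The one-dimensional toy below sketches that idea under heavy simplification. It is not Lytro's software, and the `refocus` function, the view layout, and the sample data are all invented for illustration.

```python
def refocus(views, shift_per_view):
    """Average 1D views after shifting each by shift_per_view times its offset.

    views maps an integer viewpoint offset to a list of samples of equal length.
    """
    width = len(next(iter(views.values())))
    out = []
    for x in range(width):
        samples = []
        for offset, view in views.items():
            src = x + round(shift_per_view * offset)
            if 0 <= src < width:  # skip samples that fall outside the view
                samples.append(view[src])
        out.append(sum(samples) / len(samples) if samples else 0.0)
    return out

# A single bright point that appears one sample further right in each view.
views = {
    -1: [0, 0, 1, 0, 0, 0, 0],
    0: [0, 0, 0, 1, 0, 0, 0],
    1: [0, 0, 0, 0, 1, 0, 0],
}
in_focus = refocus(views, 1)   # shifts line the point up across views
defocused = refocus(views, 0)  # no shift: the point smears over three samples
print(in_focus, defocused)
```

Choosing a different shift brings objects at a different depth into alignment, which is why the focus decision can be postponed until after the picture is taken.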
Here's a refocusable photo taken with Lytro's ultimately unsuccessful consumer camera:
Rather than capturing light fields or combining image data, some experimental efforts in the hands of developers rely on a synchronized infrared (IR) sensor that captures depth information. Google's Tango is a practical testbed for this approach, allowing depth capture via structured light and time of flight.
Structured light relies on projecting a pattern onto a scene that a sensor then reads and uses to estimate distance and surface displacement. Time of flight, by contrast, measures the time between projecting a signal and receiving its reflection, omitting a grid and providing more direct measurement. Both techniques most commonly use IR, which is invisible to the naked eye.
Microsoft's Kinect sensor add-on for the Xbox started with structured light and later shifted to time of flight. Both versions were the first mainstream uses of these techniques, but in a fixed location and for a single purpose: capturing motion for gaming and other inputs. Tango, while still a work in progress relevant to developers rather than the masses, brings the technology to mobile devices in a practical form. It's already available in Lenovo's Phab 2 Pro smartphone.
At first glance, these types of depth-finding may not seem to meet the definition of computational photography. But in effect, an IR sensor (paired with an emitter) is a camera, and it works alongside a standard photographic camera to build a depth and object map.
Any method of obtaining depth plays right into advancing augmented and virtual reality systems and making them more practical, by allowing a mobile device to better identify what's in its visual field. The more immediate benefit is for AR: Overlaying an existing scene with information requires vastly less computational power than generating VR's full-blown 3D graphics and letting people interact with that world.
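The arithmetic behind time of flight is short enough to sketch: light travels to the object and back, so the distance is the speed of light times the round-trip time, divided by two. The function names below are hypothetical, and a real sensor typically infers the timing indirectly rather than with a stopwatch, so treat this as an illustration of the principle only.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the meter

def distance_from_round_trip(round_trip_s):
    """Distance in meters to a surface, given the signal's round-trip time in seconds."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2

def round_trip_for_distance(distance_m):
    """Round-trip time in seconds for a surface at the given distance in meters."""
    return 2 * distance_m / SPEED_OF_LIGHT_M_PER_S

# A surface 1.5 meters away returns the signal in roughly 10 nanoseconds.
delay_s = round_trip_for_distance(1.5)
print(delay_s, distance_from_round_trip(delay_s))
```

The tiny delays involved hint at why the approach is demanding: telling apart surfaces a centimeter apart means resolving timing differences of well under a tenth of a nanosecond.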
Wetzstein says that structured light is a power-hungry technique because it requires the constant projection of a grid. Time of flight should have greater impact, but he says it will require years more development to make it fully capable.
3D VR photographic capture could come at some point from a combination of multiple lenses and depth perception, but probably not any time soon. Wetzstein says that although phones can capture panoramas easily enough, creating both video and stereo panoramas that can be stitched together and remain synchronized currently requires gear in the $15,000 to $30,000 range, such as that used with Facebook 360 and Google Jump, relying on more than a dozen cameras and huge apparatuses.
Besides its role in AR and VR, computational photography could help solve much more routine problems by marrying itself with computer vision (the study of machine-based perception) and machine learning (teaching machines to recognize what they perceive).
Irfan Essa, a professor at the Georgia Institute of Technology, heads the school's Interdisciplinary Research Center for Machine Learning. He says that an ever-stronger connection among those areas "has grown into more object-centric thinking." Computational photography moves beyond just capturing pixels, he says, into capturing light, which allows it to extract the geometry of a scene. "If you know where the object is and what surface it's on, you can do more with it," he says.
This helps with depth, as noted above, but also with one of the most common problems facing average smartphone owners: It's easier to take photos than manage them. "We're just capturing too many pictures," Essa says. "I take pictures at the dinner table with my family and I end up having 40 to 50 pictures." By better analyzing the contents of a scene, photo software could automatically identify the best pictures.
Some third-party apps already do this, and Apple's burst mode in its Camera app tries to detect the "best" pictures of a set taken in fast succession. But these early stabs at the idea rely on a handful of cues instead of full-blown recognition. As the photographic tech in smartphones gets better, researchers will be able to take the idea further, Essa says.
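As an example of what a single such cue might look like, the sketch below scores each frame in a burst by sharpness, measured as the sum of squared differences between neighboring pixels, and picks the highest scorer. This is a hypothetical illustration, not how Apple's burst mode or any particular app works; the function names and the choice of cue are assumptions.

```python
def sharpness(frame):
    """Score a grayscale frame (rows of floats) by squared neighbor differences."""
    total = 0.0
    for row in frame:  # horizontal neighbors
        for left, right in zip(row, row[1:]):
            total += (right - left) ** 2
    for upper, lower in zip(frame, frame[1:]):  # vertical neighbors
        for a, b in zip(upper, lower):
            total += (b - a) ** 2
    return total

def pick_sharpest(burst):
    """Return the index of the frame with the highest sharpness score."""
    return max(range(len(burst)), key=lambda i: sharpness(burst[i]))

# A soft, low-contrast frame and a crisp, high-contrast one.
blurry = [[0.4, 0.5, 0.6], [0.4, 0.5, 0.6]]
crisp = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
best = pick_sharpest([blurry, crisp])
print(best)
```

A cue like this knows nothing about who is in the picture or whether anyone blinked, which is the gap that fuller scene recognition would fill.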
Essa also expects to see improvements in color matching, tone adjustment, and selfie correction. He notes that despite the decades of work that Adobe and Kodak have put into technologies to allow the same color to appear in the same way everywhere, it's only recently that these ideas have hit the mass market. Apple's 9.7-inch iPad Pro, for instance, introduced what Apple calls "True Tone," a sensor that measures ambient light color and conditions and adjusts the display to provide a consistent set of colors to the viewer, no matter the temperature of the light in which they're using the tablet.
Better color management relies on better cameras as well as better displays, and Essa says it will ultimately produce a pipeline that computational photography will aid by integrating similar sampling technologies into the image-creation chain. He notes that skin tone is an area where the most improvement could come. "Most selfies look like crap, but they're getting better," he says.
One of the pioneering academics of computational photography, Marc Levoy, taught at Stanford, inspired and advised the founders of Lytro, and released an early iPhone app that created faux bokeh. He's now at Google, and deferred my questions to the firm's press relations department, which didn't respond to a request for an interview. This isn't unusual: Many researchers in this field have founded or joined startups or become part of teams at computer companies and dotcoms. That's a reminder that there's likely a fair amount more happening behind the scenes at smartphone makers, some of which will find its way into our hands.