How Facebook teaches photos to talk

Inside Facebook's plan to be more human

Facebook's News Feed is a feast for the eyes, filled with photos, videos and status updates.

That's not great for visually impaired individuals, so Facebook has turned to artificial intelligence to improve their experience. A blind person can now hear an audio message describing a friend's photo that shows people dancing or riding bikes.

To do so, Facebook's algorithm had to be taught what it was seeing.

Artificial intelligence is the secret sauce behind making a project like this possible. It can do everything from translating languages to understanding human speech and identifying diseases. But AI advances aren't without flaws.

Even as artificial intelligence excels, the human element -- which includes biases and oversights by those who train the system -- surfaces in alarming ways. For example, a Microsoft bot named Tay once sparked outrage when it tweeted attacks against Jews and feminists.

Dario Garcia, an artificial intelligence engineer at Facebook, is leading the project to identify what is happening in photos and read them out loud for the blind.

"If you get it wrong, the consequences are pretty bad," Garcia said. "[Our project is] not a self-driving car, where someone will die if you get it wrong. But you can give a very misleading experience to people that most likely don't have a clear way of knowing the algorithm is wrong."

Garcia's team gathered a sample of 130,000 public images that featured people. Staffers, called annotators, wrote a single line description of each photo. The images became examples that showed the AI what a photo of a person riding a bike or a horse looked like.

The team faced tricky questions. If only part of a person's body appeared in an image, Garcia and the annotators would need to discuss how that influenced the description.

"You become almost obsessed with what the current definition of a person is," Garcia said.

The conclusions of the group impacted how billions of photos are understood.

Over time, the algorithm learned what was happening in photos and developed its own captions. After caption writing was tested, some images were relabeled to correct mistakes. The AI also learned from those corrections and strengthened its predictions in what Garcia calls a virtuous cycle.

When the system launched in April 2016, it only identified objects and humans, but it has since been updated to identify 12 distinct actions in its captions.

To use the feature, a blind person needs to access Facebook with a screen-reader -- software that helps a visually impaired reader by using a speech synthesizer or braille display -- and focus on the image.


There's still room to improve. Because of the service's limitations, the National Federation of the Blind recommends that Facebook users who want blind people to have access to their photos include a detailed caption.

Matt King, a blind engineer at Facebook who contributed to the project, compares today's AI systems to machines from the 1980s that read books to the unsighted. Those machines were the size of washing machines, couldn't read fancy typefaces and required the page of the book to be clean.

"Artificial intelligence is creating a path to a world where everyone can communicate in ways they feel are most natural and can do so without leaving anyone feeling excluded," King said.

He says he's optimistic about Facebook's progress so far.

Facebook's advancements have also been helped along by Yann LeCun, the company's director of AI Research. LeCun, who joined Facebook in 2013 and is also a professor at New York University, is one of the biggest names in the AI field. He's credited with developing the convolutional neural network, a popular AI technique that has been used for years in banks and ATMs to read the numbers on checks.

Despite those advancements, LeCun knows AI still has limitations. His wife, who is French, cannot use voice recognition apps because they struggle to understand her accent.

"There's not a lot of people speaking English with a French accent," LeCun explained to CNN Tech. "It's not that [engineers] don't like French accented people. It's just that there's not much data."
