Facebook Ego4D AI Research Wants To Learn Everything You Do, What Could Go Wrong?

Facebook's artificial intelligence division has announced a new long-term research project called Ego4D, which is an ambitious effort to expand the boundaries and capabilities of AI to understand and learn our habits from a first-person perspective, via AR glasses and headsets. The hardware aspect, while critical, is secondary to the scope of what Facebook is trying to accomplish.

At face value, this is about teaching AI to perceive and understand the world through your own eyes, to help you with a variety of everyday and/or occasional tasks. That's where some kind of head gear comes into play: it serves as the first-person conduit through which the AI is fed data about your surroundings, experiences, and interactions.

"AI typically learns from photos and videos captured in third-person, but next-generation AI will need to learn from videos that show the world from the center of action. AI that understands the world from this point of view could unlock a new era of immersive experiences," Facebook explains.


It's not just Facebook that is involved with this, but a consortium of 13 universities from nine countries. They've collectively amassed north of 2,200 hours of first-person video from 700 participants moseying about their daily lives doing whatever it is they do. The universities are the ones collecting the data (at least for now), which is described as the "world's largest first-person video dataset captured in the wild with unscripted activity."

As part of this research, the overseers are focusing on five benchmark challenges...
  • Episodic memory: What happened when? (e.g., “Where did I leave my keys?”)
  • Forecasting: What am I likely to do next? (e.g., “Wait, you’ve already added salt to this recipe”)
  • Hand and object manipulation: What am I doing? (e.g., “Teach me how to play the drums”)
  • Audio-visual diarization: Who said what when? (e.g., “What was the main topic during class?”)
  • Social interaction: Who is interacting with whom? (e.g., “Help me better hear the person talking to me at this noisy restaurant”)
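
To make the first benchmark a bit more concrete, here is a minimal, purely illustrative sketch of how an episodic memory query could work mechanically: a question like "Where did I leave my keys?" is encoded as a vector and matched against per-frame embeddings of the wearer's video to find the most relevant moment. This is not Facebook's actual method; the function name, feature dimensions, and the simple NumPy cosine-similarity search are assumptions made only for illustration.

```python
import numpy as np

def find_moment(frame_embeddings: np.ndarray,
                frame_timestamps: np.ndarray,
                query_embedding: np.ndarray) -> float:
    """Return the timestamp (seconds) of the frame whose embedding best
    matches the query embedding, using cosine similarity.

    frame_embeddings : (num_frames, dim) per-frame feature vectors
    frame_timestamps : (num_frames,) capture times in seconds
    query_embedding  : (dim,) vector encoding a question such as
                       "Where did I leave my keys?"
    """
    # Normalize so the dot product becomes cosine similarity
    frames = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
    query = query_embedding / np.linalg.norm(query_embedding)

    scores = frames @ query          # similarity of each frame to the query
    best = int(np.argmax(scores))    # index of the most relevant frame
    return float(frame_timestamps[best])

# Toy usage with random stand-in features; a real system would use a
# video/text encoder trained on egocentric footage.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 256))   # 1,000 frames, 256-dim features
timestamps = np.arange(1000) * 0.5          # one frame every half second
query = rng.normal(size=256)
print(f"Most relevant moment at {find_moment(embeddings, timestamps, query):.1f}s")
```

The real research involves far more than nearest-neighbor lookups, but the sketch captures the basic idea behind "what happened when": turning a stream of first-person video into something a question can be matched against.
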
Facebook says these benchmarks and the subsequent research that goes into them will ultimately lead to the development of smarter AI assistants. How that manifests down the road is the big question. In this initial phase, cameras are capturing day-to-day activities like shopping at the grocery store, playing games while chatting with friends, taking part in group activities, and so forth.

"Ego4D makes it possible for AI to gain knowledge rooted in the physical and social world, gleaned through the first-person perspective of the people who live in it," says Kristin Grauman, lead research scientist at Facebook. "Not only will AI start to understand the world around it better, it could one day be personalized at an individual level—it could know your favorite coffee mug or guide your itinerary for your next family trip. And we're actively working on assistant-inspired research prototypes that could do just that."

Facebook's involvement raises some obvious questions—how will it benefit, and at what cost to its users? After all, this is the same social network whose former product manager has accused it of putting profit over safety, in part by optimizing content for engagement even when Facebook allegedly knows it might be harmful to the viewer. It's a claim Mark Zuckerberg vehemently denies, and the whistleblower's own patent filings might raise eyebrows.

Be that as it may, this is either exciting or scary stuff, or maybe both. Answers might come sooner rather than later—Facebook says this research could impact our lives in the "not-so-distant future."