The difference between this approach and its predecessors is that DeepMind hopes to use "dialogue in the long term for safety," says Geoffrey Irving, a safety researcher at DeepMind.
"That means we don't expect the problems we face in these models — whether misinformation or stereotyping or something else — to be obvious at first glance, and we want to talk through them in detail. And that means between machines and humans as well," he says.
DeepMind’s idea of using human preferences to optimize the way an AI model learns is not new, says Sara Hooker, who runs Cohere for AI, a nonprofit AI research lab.
"But the improvements are compelling and show clear benefits for human-guided optimization of dialogue agents in a large-language-model setting," says Hooker.
Douwe Kiela, a researcher at AI startup Hugging Face, says Sparrow is "a nice next step that follows a general trend in AI, where we are more seriously trying to improve the safety aspects of large-language-model deployments."
But there’s a lot of work to be done before these conversational AI models can be deployed in the wild.
Sparrow still makes mistakes. The model sometimes goes off topic or makes up random answers. Determined participants were also able to get the model to break its rules 8% of the time. (This is still an improvement over older models: DeepMind's previous models broke the rules three times more often than Sparrow.)
"For areas where human harm could be high if an agent answers, such as providing medical and financial advice, this may still feel to many like an unacceptably high failure rate," says Hooker. The work is also built around an English-language model, "whereas we live in a world where technology has to safely and responsibly serve many different languages," she adds.
And Kiela points out another problem: "Relying on Google for information-seeking leads to unknown biases that are hard to uncover, given that everything is closed source."