IMAGINE A beach where for decades people have enjoyed sunbathing in the buff. Suddenly one of the world’s biggest corporations takes it over and invites anyone in, declaring that thongs and mankinis are the new nudity. The naturists object, but sun-worshippers flock in anyway. That, by and large, is the situation in the world’s open-source community, where bare-it-all purists are confronting Meta, the social-media giant controlled by a mankini-clad Mark Zuckerberg.
On August 22nd the Open Source Initiative (OSI), an industry body, issued a draft set of standards defining what counts as open-source artificial intelligence (AI). It said that, to qualify, developers of AI models must make available enough information about the data they are trained on, as well as the source code and the “weights” of the internal connections within them, to make them copyable by others. Meta, which releases the weights but not the data behind its popular Llama models (and imposes various licensing restrictions), does not meet the definition. The company, meanwhile, continues to insist its models are open-source, setting the scene for a clash with the community’s purists.
Meta objects to what it sees as the OSI’s binary approach, and appears to believe that the cost and complexity of developing large language models (LLMs) means a spectrum of openness is more appropriate. It argues that only a few models comply with the OSI’s definition, none of which is state of the art.
Mr Zuckerberg’s eagerness to shape what is meant by open-source AI is understandable. Llama sets itself apart from proprietary LLMs produced by the likes of OpenAI and Google through the openness of its architecture, rather as Apple, the iPhone-maker, uses privacy as a selling-point. Since early 2023 Meta’s Llama models have been downloaded more than 300m times. As customers begin to scrutinise the cost of AI more closely, interest in open-source models is likely to grow.
Purists are pushing back against Meta’s efforts to set its own standard on the definition of open-source AI. Stefano Maffulli, head of the OSI, says Mr Zuckerberg “is really bullying the industry to follow his lead”. OLMo, a model created by the Allen Institute for AI, a non-profit based in Seattle, divulges far more than Llama. Its boss, Ali Farhadi, says of Llama models: “We love them, we celebrate them, we cherish them. They are stepping in the right direction. But they are just not open source.”
The definition of open-source AI is doubly important at a time when regulation is in flux. Mr Maffulli alleges that Meta may be “abusing” the term to take advantage of AI regulations that put a lighter burden on open-source models. Take the EU’s AI Act, which became law this month with the aim of imposing safeguards on the most powerful LLMs. It offers “exceptions” for open-source models (the bloc has many open-source developers), albeit with conflicting definitions of what that means, notes Kai Zenner, a policy adviser at the European Parliament who worked on the legislation. Or consider California’s SB 1047, a bill that aims for responsible AI development in Silicon Valley’s home state. In a letter this month, Mozilla, an open-source software group, Hugging Face, a library for LLMs, and EleutherAI, a non-profit AI research outfit, urged Senator Scott Wiener, the bill’s sponsor, to work with OSI on a precise definition of open-source AI.
Imprecision raises the risk of “open-washing”, says Mark Surman, head of the Mozilla Foundation. In contrast, a watertight definition would give developers confidence that they can use, copy and modify open-source models like Llama without being “at the whim” of Mr Zuckerberg’s goodwill. Which raises the tantalising question: will Zuck ever have the pluck to bare it all? ■