Sure, GitHub’s AI-assisted Copilot writes code for you, but is it legal or ethical?

GitHub Copilot, Microsoft’s AI pair-programming service, has been out for less than a month, but it’s already hugely popular. In projects where it is enabled, GitHub says nearly 40% of code is now written by Copilot. That’s over a million users and millions of lines of code.

The extension and its back-end service suggest code to developers directly in their editors. It supports integrated development environments (IDEs) such as Microsoft’s Visual Studio Code, Neovim, and the JetBrains IDEs. In these, the AI suggests the next line of code as the developer types.

The program can suggest complete methods and complex algorithms, as well as boilerplate code and help with unit testing. For all intents and purposes, the AI behind it acts as a pair-programming assistant. Developers are free to accept, reject, or modify Copilot’s suggestions. If you are a new programmer, Copilot can also interpret simple natural-language prompts and translate them into any of a dozen programming languages. These include Python, JavaScript, TypeScript, Ruby, and Go.
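In practice, a developer might type nothing more than a comment and a function signature, and the assistant proposes the body. The snippet below is a hand-written illustration of that workflow, not actual Copilot output; the function name and behavior are invented for the example:

```python
# The developer types this comment and signature...
# Average a list of numbers, skipping any None entries.

def average_ignoring_none(values):
    """Return the mean of the non-None values, or 0.0 if there are none."""
    # ...and the assistant suggests a body along these lines.
    filtered = [v for v in values if v is not None]
    if not filtered:
        return 0.0
    return sum(filtered) / len(filtered)

print(average_ignoring_none([1, 2, None, 3]))  # 2.0
```

The developer can then accept the suggestion as-is, edit it, or discard it entirely.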

Microsoft, GitHub, and OpenAI collaborated on building the program. It’s based on OpenAI Codex. Codex was trained on billions of lines of publicly available source code – including code in public repositories on GitHub – and on natural language, which means it can understand both programming languages and human language.

Sounds like a dream come true, doesn’t it? There’s a pretty big fly in the ointment, though. Legal questions arise as to whether OpenAI had the right to use open source code as the basis for a proprietary service. And, even if it’s legal, can Microsoft, OpenAI, and GitHub, and thus Copilot users, ethically use the code it “writes”?

According to Nat Friedman, CEO of GitHub when Copilot was released in beta, GitHub is legally in the clear because “training ML systems on public data is fair use.” But, he also noted, “IP [intellectual property] and AI will be an interesting policy discussion around the world for years to come.” You can say that again.

Others vehemently disagree. The Software Freedom Conservancy (SFC), a nonprofit that provides legal services for open source software projects, argues that Copilot’s AI was trained largely on projects hosted on GitHub, many of which are licensed under copyleft licenses. As Bradley M. Kuhn, SFC Policy Fellow and Hacker-in-Residence, declared: “Most of those projects are not in the ‘public domain’; they are licensed under Free and Open Source Software (FOSS) licenses. These licenses have requirements such as correct author attribution and, in the case of copyleft licenses, they sometimes require that works based on and/or incorporating the software be licensed under the same copyleft license as the prior work. Microsoft and GitHub have been ignoring these license requirements for over a year.”

Therefore, the SFC bites the bullet and urges developers not just to avoid using Copilot, but to stop using GitHub altogether. The group knows it won’t be easy. “Thanks to effective marketing from Microsoft and GitHub, GitHub has convinced free and open source software (FOSS) developers that GitHub is the best (and even the only) place for FOSS development. However, as a proprietary tool and trade secret, GitHub itself is the complete opposite of FOSS,” Kuhn added.

Other people land between these extremes.

For example, Stefano Maffulli, executive director of the Open Source Initiative (OSI), the organization that oversees open source licensing, understands “why so many open source developers are upset: they made their source code available for the advancement of computing and mankind. Now that code is being used to train machines to create more code – something the original developers never imagined or intended. I can see how infuriating that is for some.”

That said, Maffulli thinks, “Legally, it looks like GitHub is within its rights.” Rather than getting into the legal weeds arguing whether there’s an open source license issue or a copyright issue here, though, he sees a broader dilemma, one that affects not just open source developers.

Maffulli continued:

Copilot has exposed developers to one of the dilemmas of modern AI: the balance of rights between individuals participating in public internet and social media activities and companies using that “user-generated content” to train a new all-powerful AI. For many years, we have known that uploading our images, blog posts, and code to public internet sites meant that we would lose some control over our creations. We created standards and licenses (open source and Creative Commons, for example) to balance control and publicity between creators and society as a whole. How many of Facebook’s billions of users realized their photos and tags were being used to train a machine that could recognize them on the streets, protesting or shopping? How many of those billions would choose to participate in this public activity if they understood that they were feeding a powerful machine with an unknown reach into our private lives?

We can’t simply expect organizations to use AI with “goodwill” and “good faith” in the future, so it’s time to have a broader conversation about the impact of AI on society and on open source.

That’s a great point. Copilot is just the tip of the iceberg of a much bigger problem, and the OSI isn’t ignoring it. The organization has spent several months building a virtual event called Deep Dive: AI. The OSI hopes this will start a conversation about the legal and ethical implications of AI and what it means for AI systems to be “open source.” It includes a podcast series, to be launched soon, and a virtual conference, to be held in October 2022.

Focusing more on the legal side, well-known open source attorney and OSS Capital general partner Heather Meeker believes Copilot is legally in the clear:

People get confused when a body of text like software source code – which is a copyrighted work of authorship – is used as data by other software tools. They might think that the results produced by an AI tool are somehow “derived” from the body of text used to create it. In fact, the license terms of the original source code are probably irrelevant. AI tools that do predictive writing, by definition, suggest commonly used phrases or statements when the context makes them appropriate. That would probably fall under fair use or scènes à faire defenses against copyright infringement – if it was infringement in the first place. It is more likely that these commonly used artifacts are small snippets of code that are purely functional in nature and therefore, when used in isolation, do not enjoy copyright protection at all.

Meeker noted that even the Free Software Foundation (FSF) does not claim that what Copilot does is copyright infringement. As John A. Rothchild, professor of law at Wayne State University, and Daniel H. Rothchild, Ph.D. candidate at the University of California, Berkeley, stated in their FSF article, “use of Copilot’s output by its developer-clients is probably not infringing.” This, however, “does not absolve GitHub of wrongdoing, but rather argues that Copilot and its developer-clients are likely not infringing developers’ copyrights.” Instead, the FSF argues that Copilot is unethical because it is software as a service (SaaS).

Eben Moglen, an open source legal expert and Columbia law professor, thinks Copilot doesn’t face serious legal issues, but GitHub and OpenAI need to address some concerns.

Indeed, Moglen said, “Like photocopiers or scissors and paste, code recommendation programs may result in copyright infringement. Accordingly, parties offering such recommendation services should proceed with due regard to licensing, so that users incorporating recommended code into their projects are granularly informed of any licensing restrictions on recommended code. Ideally, users should have the ability to automatically filter recommendations to prevent unintended incorporation of code with conflicting or undesirable license terms. At this time, Copilot does not do this.”

Therefore, since many “free software programmers are not comfortable with code they have contributed to free software projects being incorporated into a GitHub code database, from which it is distributed as snippets by the Copilot recommendation engine for a fee,” Moglen said, GitHub should provide “an easy and persistent way to segregate their code from Copilot.” If GitHub doesn’t, it will have given programmers a reason to move their projects elsewhere, as the SFC suggests. In short, Moglen expects GitHub to provide a way to protect affected developers from having their code sucked into the OpenAI Codex.
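The license filtering Moglen calls for can be sketched in a few lines. This is purely hypothetical: it assumes each suggestion arrives tagged with the license of the code it was drawn from, metadata that Copilot does not actually provide today, and the field names and license set are invented for the illustration:

```python
# Hypothetical sketch of Moglen's proposed license filter.
# Assumes (counterfactually) that the recommendation service tags each
# suggestion with an SPDX-style license identifier for its source code.

ALLOWED_LICENSES = {"MIT", "BSD-3-Clause", "Apache-2.0"}  # user's policy

def filter_suggestions(suggestions, allowed=ALLOWED_LICENSES):
    """Keep only suggestions whose source license the user has approved."""
    return [s for s in suggestions if s.get("license") in allowed]

suggestions = [
    {"code": "def add(a, b): return a + b", "license": "MIT"},
    {"code": "def mul(a, b): return a * b", "license": "GPL-3.0"},
]
# Only the MIT-licensed suggestion survives a permissive-only policy.
print(filter_suggestions(suggestions))
```

A real implementation would be far harder, since it first requires tracing each generated snippet back to the licensed works it resembles, which is exactly the provenance information today’s code-generation models discard.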

So where do we go from here? Ultimately, the courts will decide. Beyond the open source and copyright issues, there are even bigger legal questions about the use of “public” data by private AI services.

As Maffulli said, “We need to better understand the needs of all stakeholders involved in AI in order to establish a new framework that will embed the values of open source in AI, providing the safeguards so that collaboration and fair competition occur at all levels of society.”

Finally, it’s worth noting that GitHub isn’t the only company using AI to help programmers. Google’s DeepMind has its own AlphaCode AI development system, Salesforce has CodeT5, and there is also the open-source PolyCoder. In short, Copilot is not the only AI coder. The question of how AI fits into programming, open source, and copyright is far more important than the simplistic take that “Microsoft is bad for open source!”
