I've been thinking about removing all open source code I have ever written from the Internet, and re-uploading it exclusively under licenses that prohibit mixing it with code generated by a statistical model.
@mcc thank you for sharing your thoughts on this. 🙂
This thread and replies have given me some things to think about.
I hope things get better.
@mcc@mastodon.social
which license would you recommend?
@mcc this is a super interesting idea. In terms of a legally coherent license as a goal, another path is to consider e.g. how the zlib "you may not use this for evil" clause caused corporate lawyers from IBM to tie themselves in knots. What I mean is there may be some value in a legally /scary/ license which might be easy enough to write.
@PsySal Preventing corporate usage through WTFPL-like licenses is pretty easy (consider this: https://mastodon.social/@mcc/112798596112926161), but preventing corporate use is not actually my goal, though it would probably be an effect of this license.
@mcc wondering whether existing "joke" licenses actually encompass that
it'd be difficult to argue that vibe coding is not doing "what the fuck you want to", so wtfpl is out
but i do think it constitutes "being a proper dick" so maybe dbad technically works. iunno
@mcc would that stop them, do you think?
@fishidwardrobe It depends on which "them".
@mcc people who like to mix your code via statistical model, i suppose.
it seems to me that these people don't pay much attention to "allowed to".
@mcc Are you willing to chase down every statistical model that ignores your stipulation? They don't seem to think Copyright laws apply to them, if they can be said to 'think'.
Given the current state of affairs, it seems very reasonable to me.
LLM advocates and their enablers will hide the fact that they are posting stolen code— they brag about this, their most widely-effective argument for LLM code being allowed in OSS is "if you tell us we can't, we'll do it and not tell you"— and attempts to track which open source projects use stolen code will be harassed off the Internet. I do not want to be part of this open source community, so the only options I seem to have are to wall myself off from it, or stop doing open source altogether.
@mcc Thank you for putting it like that. I have long had problems with "if we forbid it they're just gonna hide it" argument, but I couldn't quite place my finger on why. I know now - because if people's idea of "open" "community" is to fucking lie and feign ignorance for some personal convenience, they aren't welcome. Never were for other reasons, won't be for sloperating reasons, either.
@KFears I mean it doesn't make sense because the same is true of code which is stolen. You wouldn't introduce a rule of "note: if you have stolen this code from your workplace without authorization you must disclose it", and if such a rule were introduced you wouldn't justify it based on "well if we don't require labeling people would just do it anyway". So what they're really saying is that some kinds of code theft are acceptable and some aren't.
@mcc Yep, exactly. I'm loving your eloquence. You really put into words that which I've been struggling to formulate a retort against.
@mcc free software is based on consensus and open-minded communication. Once that communication dies, FOSS dies with it. And the communication is being actively poisoned by LLM use... I have been meaning to write a blog post about that clusterfuck (well, bad-faith communication as a whole in FOSS), i guess i should really get it done some time soon huh?
@mcc This is a feeling I've been having for a few years, and I struggle to see how the normalisation of unaccountable LLM platforms and users won't spell the end of the Free Software movement/type of F/OSS culture I care about, and that's heartbreaking on so many levels.
@mcc It makes me reluctant when I think about sharing my work. It makes me reluctant when I think about accepting contributions. Even for closed source projects, it makes me wary of collaboration because I feel like the level of micromanagement I'd need to have confidence that my work wasn't being tainted just isn't the sort of working relationship I'd like to have with others.
I don't eat fish because I'm not interested in carefully chewing for bones all the time. This feels like that.
@mcc IANAL but I think the problem that, for example, the EFF highlights is that even though your code is protected by copyright law, the "fair use" the vultures claim cannot legally be restricted in a license. A license can only give people more rights than allowed by copyright law, not fewer. If what they do is "fair use", I think open source needs to be done in some other way than relying on copyright law.
@higgins I believe you have taken my post as saying the opposite of what it said. Please read the remainder of the thread.
@mcc pretty sure it is possible to make watermarks that survive going through an LLM, especially if the thing is using RAG
@mcc abusing unicode to generate text that tokenizes weirdly for watermarking is clever and seems to work well but it also seems to be easy to defeat by simply changing the tokenizer
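(To illustrate the zero-width-character watermarking idea mentioned above, here's a minimal, hypothetical Python sketch. The `embed`/`extract` names and the bit-per-word scheme are inventions for illustration; as the post notes, a scheme like this is trivially defeated by stripping the zero-width characters or re-tokenizing the text.)

```python
# Naive watermark sketch: hide a bit string in text using zero-width
# Unicode characters appended to words. Purely illustrative -- easy to
# strip, and not a robust defense against LLM scraping.

ZW0 = "\u200b"  # ZERO WIDTH SPACE      -> encodes bit "0"
ZW1 = "\u200c"  # ZERO WIDTH NON-JOINER -> encodes bit "1"

def embed(text: str, bits: str) -> str:
    """Append one zero-width char per word, encoding `bits` in order."""
    words = text.split(" ")
    out = []
    for i, word in enumerate(words):
        zw = (ZW0 if bits[i] == "0" else ZW1) if i < len(bits) else ""
        out.append(word + zw)
    return " ".join(out)

def extract(text: str) -> str:
    """Recover the bit string from any zero-width chars present."""
    return "".join("0" if c == ZW0 else "1"
                   for c in text if c in (ZW0, ZW1))

marked = embed("the quick brown fox jumps", "1011")
assert extract(marked) == "1011"

# Stripping the zero-width chars removes the watermark entirely,
# which is exactly the weakness being discussed:
stripped = marked.replace(ZW0, "").replace(ZW1, "")
assert extract(stripped) == ""
```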
@bob I think you are trying to solve the opposite problem from the one I am trying to solve https://mastodon.social/@mcc/115873001725077622
Note: Before replying to this thread, consider whether being blocked by me is something you want.
@mcc I was about to reply with a bon mot, and then I read back.
Nah, you're totally right of course.
RE: https://mastodon.social/@mcc/115872922320160715
To explain my reasoning here—
- You cannot get LLM model creators to comply with a license or law, and they operate in secret, so you don't know they've stolen your code.
- However, people who *use* LLM code are not so lucky. They are non-anonymous. They must comply with the TOSes of their code hosts and the legal departments of their employers.
So attack the problem at the other end. People scrape and steal your code— you can't stop it. But once they do, they get *nothing else from you*.
@mcc Related: a JIRA ticket close reason of "Used AI".
@mcc how did you manage to quote a post in a reply to another post?
(I'll leave aside for now the fact that the quoted post was a parent post of the one being replied to!)
@mcc, if you can do this in a way which doesn't fall foul of the likes of the Debian Free Software Guidelines – and hopefully you can, and easily – then by all means go for it. 👍
@mcc yeah i think expedition 33 losing its game of the year award is the most visible example i've seen of the only way to stop people using AI-- once it becomes a liability to have people find out your code was AI-generated, you've eliminated almost everyone who is actually using it right now, and certainly the people who are providing the business model for the AI companies
In other words, if my use of "mixing" was not clear, I am not talking about a license which prohibits scraping for LLM models; I already release my code under such a license (the MIT license). I am talking about banning linking my code against code by people who *used* an LLM model.
To emphasize: I am not actually interested in having a discussion on this subject and will probably block without blinking if you try to start one.
@mcc I don't understand what you wrote: why does the MIT license forbid scraping for LLMs? If I understand the license correctly, it just needs to "be included in all copies or substantial portions of the Software". When we use an API, the software that runs it doesn't get shown to me. I don't understand your statement.
@mcc I'm of much the same mind.
What does the license you'd release under look like? Something brand new? An existing license with some new clauses? The GPL family seems like a bad fit for this.
@ieure (1 of 2) I think it would have to be a new license. This is problematic because I am not qualified to make a license.
@ieure (2 of 2) I believe you could make this license GPL-compatible if you modified it so that, rather than banning *all* LLMs, it allowed a cutout for models which are under a GPL-compatible license and which track the required credit/copyright notices. One could then argue that any LLM which is not following this rule is *already* GPL-incompatible, because it includes copyrighted material in violation of its license, and so violates the GPL's "no additional restrictions" clause.