@newt@tk i think there was a project to try to watermark images in ways that would confuse common diffusion models if anyone trained on that art. i kind of just laughed at the concept because the horny nerds will find a way around it.
@icedquinn@tk @newt It's a little more advanced than watermarking but yeah
It's fairly effective from what I've seen
They're just straight up poisoning the well the training data is drawn from, tweaking pixels in a way that's barely perceptible to humans (they're still working on that part) but deadly to training
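Mechanically it's an adversarial-perturbation trick. A toy sketch of the idea (real tools like Glaze/Nightshade target diffusion-model encoders and don't publish the exact recipe; the feature extractor, epsilon, and step counts here are all stand-ins picked for illustration):

```python
import torch
import torchvision.models as models

# Stand-in feature extractor; the real tools target diffusion encoders,
# this just shows the mechanics.
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
for p in net.parameters():
    p.requires_grad_(False)

def poison(image, decoy, eps=4 / 255, steps=40, lr=1 / 255):
    """Nudge pixels within +/-eps so the image's features drift toward an
    unrelated decoy's features: invisible-ish to humans, misleading to training."""
    target = net(decoy).detach()
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(net(image + delta), target)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()  # step toward the decoy's features
            delta.clamp_(-eps, eps)          # keep the change imperceptible
        delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()

# poisoned = poison(artwork, decoy)  # both (1, 3, 224, 224) tensors in [0, 1]
```

The eps bound is what keeps it "barely perceptible"; the optimization toward a decoy is what makes it "deadly to training".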
@Moon@tk @icedquinn @leyonhjelm @newt
> hand-curated
Lol, lmao
That's not possible except for the truly dedicated
You need millions of samples, each of which would need to be checked individually
@TURBORETARD9000 the amount of data you need to train from random init depends on how large the network is. you do need a "lot" of images when you make some huge chungus neural network.
for fine-tuning you can get away with less. people have done fine-tunes of VITS speech models with only minutes of samples.
the trick is all the other data that was close-ish is still in the neural network.
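roughly what a fine-tune looks like in practice. a minimal sketch, not anyone's actual recipe; the model, head size, and learning rate are placeholder choices:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# pretrained body keeps everything it already learned from the big dataset
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in net.parameters():
    p.requires_grad = False                 # freeze the millions of weights
net.fc = nn.Linear(net.fc.in_features, 10)  # tiny new head, the only part trained

opt = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def finetune_step(x, y):
    """one step on the small dataset; only the head's weights move."""
    opt.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    opt.step()
    return loss.item()
```

since the frozen body carries all the "close-ish" data, a handful of samples is enough to steer the head.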
that being said, existing models are severely overparameterized. nobody is doing stuff like the old GMDH where you grow a network that is just the right size, or Numenta's contextualized learning; they're just shitting tons of neurons out and boiling oceans.
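a very loose toy of the grow-to-size idea, GMDH-flavored but nowhere near the real polynomial-layer algorithm; random features and arbitrary numbers, just to show the stopping rule:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.sin(X @ rng.normal(size=4))
Xt, yt, Xv, yv = X[:150], y[:150], X[150:], y[150:]  # train / held-out split

def heldout_err(width):
    """fit a width-unit random-feature net, return held-out error."""
    W = np.random.default_rng(width).normal(size=(4, width))
    beta, *_ = np.linalg.lstsq(np.tanh(Xt @ W), yt, rcond=None)
    return np.mean((np.tanh(Xv @ W) @ beta - yv) ** 2)

# grow until the external (held-out) criterion stops improving
best, width = np.inf, 0
while True:
    err = heldout_err(width + 1)
    if err >= best:
        break  # adding units stopped helping: this is "the right size"
    best, width = err, width + 1
print("grown to width", width, "held-out mse", best)
```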
@Moon @TURBORETARD9000 @tk @newt all the adversarial attacks i've seen in whitepapers require you to have the model on hand to deceive it. it's entirely possible that just fine-tuning the model after the fact breaks your cheats.
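e.g. one-step FGSM only works because you hold the victim's gradients. a quick sanity check for whether a crafted perturbation survives a later fine-tune might look like this (sketch only: b is just a copy standing in for the fine-tuned model, and the input is random noise):

```python
import copy
import torch
import torchvision.models as models

a = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
b = copy.deepcopy(a)  # pretend this copy was fine-tuned after the fact

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder "image"
y = torch.tensor([0])

# one-step FGSM against a: only possible because we hold a's gradients
loss = torch.nn.functional.cross_entropy(a(x), y)
loss.backward()
x_adv = (x + (8 / 255) * x.grad.sign()).clamp(0, 1).detach()

# did the attack transfer? if b still mispredicts, the cheat survived;
# if b goes back to its clean prediction, the fine-tune broke it
print(a(x_adv).argmax().item(), b(x_adv).argmax().item())
```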