BEFORE:
AFTER:
For my money, the difference between the first image and the second is (just about) as remarkable as any of the verbal wizardry I’ve seen from Claude or ChatGPT. The visual difference between the two images is easy to see. It’s the sort of thing where, if you didn’t actually have to figure out how to transform the first into the second, you’d think it could be done in the blink of an eye, with a flick of the wrist.
And now you can, almost. With generative AI it really is almost that easy. Almost, but not quite. We’re not yet to the point where one can say, “computer, clean that up for me,” and it’ll know what you want and be able to do it. While I’m tempted to say we’re not far from it now, I don’t really know. The phrase “clean that up” is doing a lot of work in that command. I’m not sure we’re to the point where a so-called agent powered by a ginormous so-called Foundation Model can do that. That agent might have to be personally trained by the user to know what the scope of that phrase is.
The thing is, I’m not sure we need such an agent. What I actually had to do was not difficult and did not take much time, say two, three, five minutes, and this was my first time through. I had to:
- pick a tool,
- set a parameter on the tool, which took an adjustment to get it right,
- manually trace over the area I wanted altered, which took a little skill, but nothing beyond the reach of anyone capable of using Photoshop at all, and finally,
- direct Photoshop to execute the operation.
That last step took a half-minute to a minute on my machine, which is maybe three years old. It would go much quicker with a new machine. Now, if I had fifty of those to do, then yes, it would be nice to be able to get the job done with a simple command.
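If you’re curious what that kind of mask-and-fill operation looks like outside of Photoshop, here’s a rough Python sketch using OpenCV’s classical inpainting as a stand-in; Photoshop’s generative model isn’t public, and the file names and mask coordinates below are made up, but the four steps map onto it directly:

```python
# pip install opencv-python numpy
import cv2
import numpy as np

# 1. Pick a tool: classical TELEA inpainting stands in here for
#    Photoshop's generative remove, whose internals aren't public.
img = cv2.imread("before.jpg")

# 2. Set a parameter: the inpainting radius.
radius = 5

# 3. Trace over the area to be altered: in Photoshop that's a brush stroke;
#    here it's a binary mask, white over the table (coordinates are made up).
mask = np.zeros(img.shape[:2], dtype=np.uint8)
cv2.rectangle(mask, (400, 250), (650, 420), 255, thickness=-1)

# 4. Execute the operation.
result = cv2.inpaint(img, mask, radius, cv2.INPAINT_TELEA)
cv2.imwrite("after.jpg", result)
```

Classical inpainting just diffuses nearby pixels into the hole; the generative version invents plausible new content instead, which is why it can handle a whole table rather than a small scratch.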
Now, if I were a painter with the requisite draftsmanship, I could paint a picture like either of those photos. Neither of them would present a technical problem to such a craftsman. But that craftsman is unlikely to want to execute an image like the first. Why junk up the scene with that intrusive table and what looks like a piece of a bike rack in the background?
Things are a bit different for the photographer. Sure, taking a shot like the second would be as easy as taking the first, if those blasted things weren’t in the way. But they were, and that’s a big problem. I would think the problem would have been almost impossible to handle with traditional analog photography. You would have to manually paint over the intrusions. That’s not at all practical. With digital photography things are different. You can easily go in and change any pixel you want to. You could, in theory, manually edit the first image so that it comes out like the second. But it would be hellish and time-consuming. There are probably people who can and have done that sort of thing. I hope they get paid an arm and a leg for doing it. But the need for that kind of skill is now all but over.
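To make “change any pixel you want” concrete, here’s a tiny Pillow sketch; the file name and coordinates are just placeholders:

```python
from PIL import Image

img = Image.open("before.jpg").convert("RGB")

# Every pixel is an addressable (R, G, B) triple you can read and overwrite.
print(img.getpixel((100, 100)))            # whatever colour is there now
img.putpixel((100, 100), (205, 205, 200))  # repaint that one pixel, say to match pavement

img.save("edited.jpg")
```

Doing that by hand for the hundreds of thousands of pixels under that table is exactly the hellish job the generative fill spares you.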
Now, notice that green smudge at the lower right. If I were shooting this for a magazine, I’d probably have to get rid of it. That would be difficult to do and I’m not sure how well the AI would be able to paint the girl’s feet. Perhaps I’ll give it a try some time.
But I’m in no hurry. “Why not?” you ask. “Because I like it there.” “Why, pray tell?” you ask. Because it gives a sense of distance, of space. Whatever it is, probably some kind of plant, it’s between the photographer and the kids. The photographer, that’s me, likes such things. He’ll even deliberately introduce such “defects” into his photos. They’re part of his aesthetic.
In this case, however, I doubt that there was any deliberation. I saw the kids move out of the corner of my eye. So I turned and took the shot. The green blob intruded, which is fine. But so did that table, not so good.
Now, back to the underlying AI tech. As I said up top, the difference between those two images is as remarkable to me as the verbal skills of ChatGPT or Claude. But, and this is very important, you need to understand that, to a first approximation, LLM tech treats language the same way this visual tech treats visual information. LLMs treat language as though it consisted of strings of colored beads. You and I know that those colored beads are in fact letters that spell out words and the spaces and punctuation between words. The AI tech doesn’t “know” that. You and I know that those strings are words, symbols; the AI tech doesn’t know that. As far as it is concerned, you might as well feed it (images of) strings of colored beads.
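You can see the beads directly. Here’s a small Python illustration using the tiktoken library; the particular encoding is just one example of a tokenizer, not anything special:

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an encoding used by recent OpenAI models

ids = enc.encode("Why junk up the scene with that intrusive table?")
print(ids)              # a plain list of integers: the "beads" the model actually sees
print(enc.decode(ids))  # the beads map back to the sentence, but only we read it as words
```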
Remember the third step above, where I traced over the part of the image I wanted removed? That’s a prompt. Or rather, that plus the source image constitute a prompt to the AI. It then produces a new image with the changes specified in the prompt. Considered at the appropriate level of abstraction, it’s the same as prompting ChatGPT or Claude to take a sad story as input and return a happy one.
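For the programmatically inclined, here’s roughly what that image-plus-mask prompt looks like with an open-source inpainting model from the diffusers library; the model id, file names, and prompt text are illustrative, and this is not the model Photoshop itself uses:

```python
# pip install diffusers transformers torch pillow
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("before.jpg").convert("RGB").resize((512, 512))
mask = Image.open("table_mask.png").convert("L").resize((512, 512))  # white = replace this

# The "prompt" is really the whole triple: text + source image + traced mask.
result = pipe(
    prompt="empty playground pavement, no furniture",
    image=image,
    mask_image=mask,
).images[0]
result.save("after.jpg")
```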
[As Sean Connery said in that movie, “Here endeth the lesson.”]
"Now, notice that green smudge at the lower right. If I were shooting this for a magazine, I’d probably have to get rid of it. That would be difficult to do and I’m not sure how well the AI would be able to paint the girl’s feet. Perhaps I’ll give it a try some time."
20 yrs since I've used Photoshop in anger. I started with v3. V6 is just about as good as now. 1st Adobe tower built by Mac sales. 2nd Adobe tower w skybridge built by Windows sales. Now fully enshittified.
Green smudge.
Magic wand... blur 7, edge 3... always use odd numbers as even numbers don't dither properly. Assists with the jagged edges.
Mask green smudge.
Select adjacent playground and legs. Copy paste into masked area w 3 (or more) pixel blur as required.
Sharpen / Unsharpen! Seems counterintuitive. Works.
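In Python terms, that clone-feather-sharpen routine looks roughly like the Pillow sketch below; the coordinates, radii, and file names are illustrative, not the original settings:

```python
from PIL import Image, ImageDraw, ImageFilter

img = Image.open("playground.jpg").convert("RGB")

# Box over the green smudge and an adjacent clean patch of the same size
# (coordinates are made up for illustration).
smudge_box = (820, 600, 920, 700)
clean_box = (700, 600, 800, 700)
patch = img.crop(clean_box)

# Feathered mask: a white rectangle blurred with a small odd radius so the
# pasted patch dithers into its surroundings instead of leaving a hard edge.
mask = Image.new("L", patch.size, 0)
ImageDraw.Draw(mask).rectangle((3, 3, patch.size[0] - 4, patch.size[1] - 4), fill=255)
mask = mask.filter(ImageFilter.GaussianBlur(3))

img.paste(patch, smudge_box[:2], mask)

# Sharpen / unsharpen: an unsharp mask restores apparent detail after the blur.
img = img.filter(ImageFilter.UnsharpMask(radius=2, percent=120, threshold=3))
img.save("playground_retouched.jpg")
```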
After working as a system dynamics rep with a global consulting co... maths not my strong suit... I left high tech and went tree seed picking for 6 months in the NT. Luxury. A mate was a minking reveg provider. Then, published a newspaper for a year and had to do full page used car ads w 20+ cars, pics taken with any background and intrusions.
The bean counters ... newscorpse ... said production of car ads was taking too long. We did a test. Under 50 sec per car to outline, cut background and sharpen / unsharpen, balance. They shut up.
Adobe has the best and fairest AI training set, I believe. But as you say, Bill, it won't be long before extinction comes knocking. Or enshittified service.
Dipity.
Adobe had a bit of a backlash last year against changes in its terms of service. I blogged about it: https://new-savanna.blogspot.com/2024/06/adobe-backlash-what-hath-ai-wrought.html