Thanks for starting the discussion in a post!
Full disclosure: I recently spent a few weeks porting the open-sourced Linux Doom over to System 7, specifically targeting the SE/30. Community response has been generally positive, with a few negative responses in a thread on 68kmla about AI's use in making it. It's my first time using AI (in this case Claude Code) to help make a vintage Mac app, though I have made vintage Mac apps before.
Opinions follow:
## Trusting AI-generated code
Inherently, anything AI-generated is less trustworthy than something human-created. Not because AI necessarily produces worse code than the average human (it may or may not), but because the kinds of mistakes AI makes (be it in pictures, algorithms, logic, code, or bugfixes) are simply different from the mistakes humans make.
So if this matters in the application, strange failure modes might be unacceptable. Take two examples:
- A System 7 application that is a game (let’s say Doom)
If the AI's code has weird, unexpected, and unlikely failure modes compared to human code, it's probably alright. I know of no one using Doom for safety-critical decisions or workflows, so comprehending and controlling its failure cases is not paramount, and AI-induced failures are of no major concern.
- A self-flying plane codebase
Clearly a safety-critical application. The stakes for failure are tremendous. The safety argument for the overall system is underpinned by safety analyses that include things like fault trees and hazard and risk analysis. Basically, understanding and mitigating failure modes (from the software or otherwise) is a must for maintaining and justifying the safety of the system.
My point is that you certainly *require* trust in the code in #2 and you *do not require* trust in the code in #1. Therefore the bar for accepting AI code is much lower in #1 than in #2. It's not the software itself but the safety-criticality of the application that makes the difference. And I'm not saying you can't build #2 with AI code in it, but you'd want to do a lot more than vibe code, that's for sure.
Rounding back to our hobby of retro tech: it's pretty much all non-safety-critical applications (think #1 above).
OK, so that's trust. What else matters in our hobby?
## Code accessibility
Another thing AI induces in its developers is "comprehension debt." If not handled well, the developers of a project quickly lose touch with how all the code they are pumping out actually works; innate system insights are lost or never developed, and making changes becomes harder and harder as the debt grows. Take Doom-SE/30: of the code that was changed beyond the initial Linux Doom it was based on, I wrote maybe 1% by hand. The rest was done by Claude's models. If it matters, and I think it does, the 99% written by Claude was "managed" by me. It wasn't just "go" and come back 14 hours later to find the game baked to where it is now. There were issues, bugs, plans, fails, iterations, arguments with the model, etc. I do not claim this makes up for the comprehension debt, but it is different from "go" and hitting publish.
But back to code accessibility: insofar as a retro tech project hopes for, expects, or ultimately benefits from community handoff and contributions, now or in the future, it would certainly behoove the project for its codebase to be understandable, because extending it later (especially by others) depends on the code being accessible, or rather "comprehensible." So if AI-built code incurs a comprehension debt, and there is hope or desire for the code to be carried forward by the community (or heck, even by the original author after they've forgotten key details), then there is real benefit in the produced code being comprehensible.
A point here is that there are TONS of software products that do not have any source code available but whose value is still high. I don’t have the source code to SimpleText, but I’m glad the program is there and I get a ton of value from it. Same could be said about the vast majority of programs we use for our retro hobby - if for no other reason than because most of what we use is old closed-source products.
Put another way - What’s the difference between:
- High-comprehension-debt AI-based apps for our retro machines that work
- Low-comprehension-debt apps for our retro machines that work
Yes, one is ultimately better than the other in absolute terms. But if they both work, is the typical hobbyist going to sit around saying "I'm so glad the one I'm using exhibited low comprehension debt by its authors"?
Don't get me wrong: there are _certainly_ examples of open-source projects in retro tech communities where comprehension debt and code accessibility DO MATTER. For example, the BlueSCSI project, itself a fork of the ZuluSCSI project, is a community-supported endeavor. Here I'd say it's extremely important that comprehension debt be managed (AI-induced or otherwise). Using AI tools to "vibe code" this sort of project (for lack of a better term, meaning using AI coding in a way that produces high comprehension debt in the resulting code) comes at great risk, because that debt will work against the broader effort and advancement of the project in the medium and long term.
A key question others might know the answer to: how are big, famous open-source projects managing contributions when it comes to AI-assisted changes? How is Linus Torvalds handling it in the Linux kernel? He's still the final approver on major changes... Maybe someone here knows? Or @eric, how would you look at AI-assisted submissions to the BlueSCSI repo? Have you set guidelines? Just curious, not saying that you should or shouldn't.
## Other relevant considerations (I take no right/wrong stance on these, but they might feed thought and conversation)
- Alternate workflow and mindset frameworks for considering AI-assisted retro community code
- Given the documentation I produced and included in Doom-SE/30 (see PERFORMANCE_IDEAS.md, OPTIMIZATION_HISTORY.md, and the Source Code and Port manual file), one could argue that the real long-term value of the project is in having these records of HOW the port was done, WHAT optimizations were tried and their results, and FUTURE WORK. In many ways it strikes me as like a research article in academia: it didn't productize the work, but it presented the results of enough prototyping to get it working. Someone could look at the codebase and the docs I cited and advance it. Heck, with the codebase and those documents, someone could re-implement the optimizations by hand in (hopefully) a more efficient way (better software performance) while also resolving the AI trust/comprehension issues. Or in 30 years they could just feed that info in as preparation for re-doing the port from scratch (with or without AI).
Does providing that extra documentation make things better? I provided it both for that reason and to simply keep a record of what I tried. AI is not intelligent, and having records helps keep it in check when it forgets what was done in the past.
- The “Who cares, if it works?” mindset
- If you always wanted Mario on your Atari, and no one ever made it, and now you can make it with the help of AI, how much more really matters? (Possible answers follow; I claim none as my own position.)
- We don't want AI-slop programs in the retro tech community, so even if AI gets us programs and functions we've wanted and never had, I'd rather not have them if they carry AI-induced risks or issues.
- I'm glad to have it at the cost/risk of AI-induced issues, but I want it treated "differently" somehow in the community. Maybe it's a naming convention; maybe it's that it's collected differently on sites like `macintoshgarden.org` so we know it's AI-"tainted" somehow.
- Nothing else matters — we had no Mario on Atari, and now it’s here - yay.
- Note that the second sub-bullet above is interesting but brings loads of concerns. Who governs this or decides when something has "enough" AI to count? Who enforces it? What about folks who lie, or who just don't tell people they used AI?
Anyway, this post is long enough, but I was excited at the prospect of this sort of convo in our community. It's a conversation that should happen not only in coding industries and communities but throughout society. So if you're in this convo and have something to say, welcome, and note that you're thinking about questions the entire human race has to ponder and act on. Pretty cool to be part of something that large and important.