WarpSE: 25 MHz 68HC000-based accelerator for Mac SE

Zane Kaminski

Administrator
Staff member
Founder
Sep 5, 2021
372
610
93
Columbus, Ohio, USA
Forked it :)

Guess I better go grab the plus from the loft now!
You can also change the pinout of the Xilinx CPLD if you need. Just let me know what your final pinout is so I can make sure it'll fit in the CPLD with the pinout you're using.

Edit: Also no need to connect to the Mac's 16 MHz clock since it's not on the 68000. The 16 MHz clock is only used by the interface from the FSB (frontside bus i.e. fast bus) to the PDS bus. I can redesign the FSB-PDS bus bridge to only use the 8 MHz Mac clock and the 25 MHz fast CPU clock.
 

Ubik

Tinkerer
Nov 2, 2021
41
55
18
Orange County, CA
Much better than the Levco SuperMac SpeedCard I am using right now, which has the sound problem and doesn't have a fix, and which has a more bothersome method to disable the accelerator.

Absolutely fantastic, Zane!
@Zane Kaminski I agree JDW. As a fellow SpeedCard user I can't wait to see the 68000 come pretty close to SE/30 speeds, which was the Nirvana back in the day for the SE.
 

JDW

Administrator
Staff member
Founder
Sep 2, 2021
1,577
1,373
113
53
Japan
youtube.com
@Zane Kaminski I agree JDW. As a fellow SpeedCard user I can't wait to see the 68000 come pretty close to SE/30 speeds, which was the Nirvana back in the day for the SE.
The really neat thing about the SpeedCard also holds for Zane's forthcoming accelerator: either can be disabled at cold boot, allowing you to use the SE with its stock 8MHz 68000 processor.

The SE/30 is in many ways a better Mac than the SE, so most people with one of those may wonder why you should even have an SE. But the SE/30's stock configuration is a 16MHz 030. It can't be made into an 8MHz machine. The SE is a stock 8MHz 68000 machine, and if you have the older SE ROMs (compatible with only the 800K drives, not the 1.44MB drives), you basically have the equivalent of a Macintosh Plus, but without the sub-par SCSI issues of the Plus. I can even boot my SE (with older ROMs) from System 1.0! That makes it a very enjoyable machine to use when you want to run some of the oldest software. Indeed, unless you just want the hum of the old 400K disk drives, you wouldn't even need a 128K or 512K or 512Ke Mac if you have an SE with the older ROMs. And by having an accelerator in the SE too, you get SE/30 or better performance when you need it.

Of course, we won't know about software compatibility until Zane's accelerator is out, but the SpeedCard maintains compatibility with very old Mac System Software versions, so I don't see why Zane's board would not.

Anyway, for SE owners reading this, if you have older ROMs in your SE, try booting from System 1.0, which you can download here...


Note that I cannot boot my SE from System 0.85 for some reason (perhaps a ROM incompatibility), but it will boot from 1.0 and later. Naturally, my Mac512 boots from System 0.85 just fine, but there's not much practicality in using System 0.85 (the OS used by the Tour floppy), so I don't consider that much of a loss when using a Mac SE versus a Mac128 or Mac512 or MacPlus.
 

retr01

Senior Tinkerer
Jun 6, 2022
2,473
1
796
113
Utah, USA
retr01.com
System 1.0 can be played pretty much from any platform in the web browser, thanks to Infinite Mac! :)

Of course, more fun on the SE! Let's see how fast WarpSE will make System 1.0 run...

1681337753699.png
 

Ubik

@JDW and @retr01 - Great points about why an SE has value. I completely agree. I would add that the SE is far less complicated and therefore more robust and resistant to the effects of aging. I love my SE/30, but it's more complex, with a denser board and many surface mount caps that needed replacing, along with some very specialized chips that have a finite life. My barn-find SE worked right out of the dust with a new battery. I just don't worry about using the SE as a daily vintage gamer machine.
Moreover, SEs are definitely less intimidating and less expensive, with more options for the new vintage Mac restorer - esp. with the reloaded logic board on the market now and soon this new accelerator!
 

Zane Kaminski

James @JDW brings up an important point. I planned on only putting the FDHD ROM in the WarpSE but the later FDHD ROM won’t boot really early system software versions. I didn’t plan on socketing the ROM either. If your motherboard has the original ROM, you can of course disable the WarpSE to fix this but you can’t run System 1 with acceleration unless you do a time-consuming fast ROM reflash. Is this acceptable?

I could double the flash size and put both the FDHD and original ROM images in. You’d reprogram the CPLD to switch between them. But that’s almost as annoying as reprogramming the flash (or more since you have to open the case and connect WarpSE to a PC). It seems more straightforward to just use the BMoW ROMinator flash utility or something similar from us. Right? Not 100% sure here.
 

JDW

@Zane Kaminski

With my stock SE, I love the old ROMs because I love running the oldest System Software at times. I would never upgrade the ROMs if it meant I would lose that capability. And although I could swap out the two ROM chips and change the IWM to SWIM (a change of 3 chips is required to get 1.44MB functionality), it's too bothersome to do that because I would want to do the switch often.

By the way, the SWIM chip is supposedly compatible with the older ROMs. I've just not tested it. So if the story is true, then you would just leave a SWIM installed and then swap out only your 2 ROM chips. Swap in the old ROMs to give you 400K/800K drive compatibility and the ability to boot System 1.0. Then swap in newer ROMs to get 400K/800K/1.44MB drive compatibility, but boot only from System 6 and higher. It's hell to do that often, but functionally speaking, that would work.

Now if I purchase one of those mind-blowingly great WarpSE accelerator boards, and if that board has the SWIM chip or accesses the stock motherboard SWIM and also gives me the ability to perform a rather easy ACTION to swap between the ROMs, that would be outstanding. I would die to have that luxury! (Okay, well, maybe not die. But you know what I mean.) I define that ACTION as being either: (1) a software change (akin to a Control Panel change or similar), or (2) a press of the programmer's switches at cold boot. In other words, the ACTION to do the ROM swap should not require me to crack open the Mac's case at all.

For example, let's say I've got an SE with old ROMs and an IWM. I then buy a WarpSE. What would be required for me to be able to swap between Old and New SE ROMs, getting acceleration even when I am booted into System 1.0?

I assume I would need to buy a SWIM chip to swap out the older IWM chip, unless WarpSE has an integrated SWIM. As I said, someone with a SWIM chip installed on the stock SE motherboard should be able to use older ROMs or newer ROMs, and then your ability to boot System 1.0 would merely be dependent on your using the older ROMs.

In short, if WarpSE allows me to CONVENIENTLY choose between old and new ROMs, that would be ground-breaking. For example, could I just hold down both the programmer's and reset switches while powering the SE on in order to "toggle" the ROM version?

Not sure what is possible, but something EASY would be desired. Wouldn't matter if a reboot was required to swap ROMs, as it obviously would be required. So long as you don't have to open the case or use a special programmer attached to WarpSE or the motherboard, the ability to swap ROMs would be great. But if an external programmer was required, then you would need to go through the time and effort to open the case, pull the motherboard, do whatever is required to reprogram, put it all back together, run it for a while that way, then repeat all that trouble when you want to switch ROMs, which is too much work for anyone to realistically do on a regular basis. I am one of those guys who would want to switch ROM versions often, if and only if it was convenient and easy to do.

Thoughts?
 

Zane Kaminski

@JDW I have a question... have you ever used Big Mess O' Wires's ROMinator flash utility? It looks like this:
1681533062462.png


What if you have to use a similar utility to reflash the WarpSE's fast ROM? It'll take a minute or two to flash, so it's slower than a control panel setting, but you shouldn't need to open the case unless your Mac powers off or crashes during the ROM flash. If that happens, you have to open up the Mac and use the WarpSE's USB update system to fix the partial flash. Is that any good?
 

JDW

@Zane Kaminski
That kind of flashing is perfect! Anyone can easily do it, and as you said, you don't need to break apart the Mac at all.

WarpSE is turning out to be THE monumental upgrade board to have! My goodness, this will be amazing. Zane, you're absolutely brilliant!
 

Zane Kaminski

Just finished what I hoped would be the last bit of performance tuning on the WarpSE relating to RAM refresh but there is one remaining issue.

I currently have the speculative refresh feature implemented like this:
Screenshot 2023-04-15 at 5.46.49 AM.png


This RS0toRef variable gives the conditions under which a refresh is executed. The interesting part is the first term, which I've commented "refresh during first clock of non-RAM access." This piece implements the refresh latency hiding by waiting for the first clock of a non-RAM access to service a non-urgent refresh request. Adding this nets a 1% improvement in Speedometer 3 scores in the CPU and math benchmarks. Unfortunately it does not improve graphics performance in Speedometer 3. In fact, I've redesigned this RAM controller a few times, and graphics performance has decreased by 0.25% since an earlier version of the RAM controller. That decrease is small enough to ignore, but it is statistically significant.

The reason that graphics performance hasn't improved is that only three of the four clocks of a refresh can definitely be hidden by the "refresh during first clock of non-RAM access" condition. A RAM access immediately following a non-RAM access during which a refresh was executed is delayed by one clock to complete the refresh. In this case the WarpSE incurs 25% of the refresh overhead despite hiding the other three clocks out of four. Evidently when running QuickDraw, ROM accesses are more often followed by RAM accesses than in Speedometer's CPU and math benchmarks.
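To make the 25% figure concrete, here is a minimal sketch of the arithmetic. It assumes the numbers given in the post (a four-clock refresh, three clocks reliably hidden); the function names and the probability parameter are hypothetical and only illustrate the reasoning, not the actual CPLD logic.

```python
REFRESH_CLOCKS = 4  # from the post: a refresh takes four clocks
HIDDEN_CLOCKS = 3   # from the post: three of them can definitely be hidden


def visible_refresh_clocks(next_access_is_ram: bool) -> int:
    """Clocks of refresh latency the CPU actually waits for."""
    leftover = REFRESH_CLOCKS - HIDDEN_CLOCKS
    # The leftover clock only costs time if a RAM access follows immediately.
    return leftover if next_access_is_ram else 0


def expected_overhead_fraction(p_ram_follows: float) -> float:
    """Expected share of the full refresh cost still paid.

    p_ram_follows is a hypothetical probability that the access after the
    refresh-hosting non-RAM access is a RAM access.
    """
    return p_ram_follows * visible_refresh_clocks(True) / REFRESH_CLOCKS


print(expected_overhead_fraction(1.0))  # worst case: RAM always follows
```

Under this simple model, a workload where RAM accesses usually follow ROM accesses (as seems to happen in QuickDraw) keeps paying close to a quarter of the unhidden refresh cost.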

So it would be better if we could come up with a condition that predicted with reasonable accuracy when, say, two back-to-back ROM accesses were going to occur. If we started the RAM refresh during the first ROM access, the subsequent ROM access would hide the remaining one clock of refresh latency and the graphics performance would improve a little.

Any ideas? It can be dynamic to an extent, like the RAM controller could switch between two or three different refresh hiding strategies depending on whether the current one is working well.
 

Kai Robinson

TinkerDifferent Board President 2023
Staff member
Founder
Sep 2, 2021
1,165
1
1,173
113
42
Worthing, UK
The SWIM isn't a surprise - as internally the SWIM includes the new chip AND the old chip - there's just a crossbar switch to select the relevant one when needed. IWM exists inside the SWIM at all times.
 

mdeverhart

New Tinkerer
Apr 17, 2022
7
3
3
So it would be better if we could come up with a condition that predicted with reasonable accuracy when, say, two back-to-back ROM accesses were going to occur.
You could probably do a statistical analysis of the ROM to determine what percentage of instructions will touch RAM on the following access: loads, stores, operands in RAM (based on the addressing mode), etc. You could then tune a refresh strategy based on the likelihood of a ROM instruction causing a RAM access.

Or - you could cut out the analysis, implement a scheme where some percentage of ROM accesses cause an opportunistic RAM refresh, and then experimentally tune the percentage (for example, pick a denominator like 16 or 32, and then try different numerator values).
 

retr01

The SWIM isn't a surprise - as internally the SWIM includes the new chip AND the old chip - there's just a crossbar switch to select the relevant one when needed. IWM exists inside the SWIM at all times.

In a haystack among some hidden treasures at archive.org, I found the 1987 "SWIM Chip User's Reference," revised by Apple on January 11, 1988. :)

The SWIM combines two [virtually] independent disk controller chips into a single package: the IWM (Integrated Woz Machine), currently used in all Macintosh and some Apple II systems, and the ISM (Integrated Sander Machine), an independently-developed controller. The IWM hardware is capable of reading and writing disks using GCR (Group Coded Recording) encoding only. The ISM hardware is somewhat more flexible and can not only read and write GCR disks, but also the MFM (Modified Frequency Modulation) disks used in MS-DOS systems.

The "SWIM Chip Specification" document, written by E.A.B. for Apple and dated September 29, 1987, confirms the switching logic and includes some assembly code. :)

The SWIM chip has two modes of operation, the IWM mode and the ISM mode. Only one of the modes can be active at a time. There is switching logic which selects which mode is to be used.

1681574777724.png
1681574879728.png
1681574946442.png
 

retr01

Since the SWIM contains both the IWM and the ISM, the following disk formats are possible according to Macintosh Technical Note #272 and the other documents mentioned above. :)

SWIM Mode  Format     Blocks  Is TSS valid?  SD or DD?  Sides  Sectors  Tracks
IWM        400K GCR   800     Yes            SD         1      10       80
IWM        800K GCR   1600    Yes            SD         2      10       80
ISM        720K MFM   1440    Yes            SD         2      9        80
ISM        1440K MFM  2880    Yes            DD         2      18       80
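As a quick sanity check on the table above, the block counts should equal sides × sectors × tracks, and the format names should match the capacity in kilobytes. The sketch below assumes 512-byte blocks, which is not stated in the table; the names are made up for illustration.

```python
BLOCK_BYTES = 512  # assumption: one block is 512 bytes

# (SWIM mode, format, blocks, sides, sectors per track, tracks), as listed above
FORMATS = [
    ("IWM", "400K GCR", 800, 1, 10, 80),
    ("IWM", "800K GCR", 1600, 2, 10, 80),
    ("ISM", "720K MFM", 1440, 2, 9, 80),
    ("ISM", "1440K MFM", 2880, 2, 18, 80),
]


def computed_blocks(sides: int, sectors: int, tracks: int) -> int:
    """Total blocks implied by the disk geometry."""
    return sides * sectors * tracks


def capacity_kb(blocks: int) -> int:
    """Capacity in kilobytes for a given block count."""
    return blocks * BLOCK_BYTES // 1024


for mode, name, blocks, sides, sectors, tracks in FORMATS:
    ok = computed_blocks(sides, sectors, tracks) == blocks
    print(f"{mode} {name}: {capacity_kb(blocks)} KB, geometry consistent: {ok}")
```

All four rows are internally consistent under that assumption.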
 

mdeverhart

Or - you could cut out the analysis, implement a scheme where some percentage of ROM accesses cause an opportunistic RAM refresh, and then experimentally tune the percentage (for example, pick a denominator like 16 or 32, and then try different numerator values).
Thinking about this some more, you could also do dynamic adjustments to the refresh probability - every time you don’t opportunistically refresh but could have, you increase the refresh probability; every time you do opportunistically refresh and shouldn’t have, you decrease the probability. You could also move more aggressively in one direction than the other - for example, increase the probability by 1 if you could have refreshed but didn’t, decrease the probability by 2 if you do refresh and shouldn’t have.
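A rough software model of this adaptive scheme might look like the sketch below. It is only an illustration of the idea: the class name, the default numerator of 8, and the externally supplied pseudo-random `draw` are all assumptions, and a real CPLD implementation would be a small saturating counter in hardware rather than Python.

```python
class AdaptiveRefreshGate:
    """Decides whether a ROM access should trigger a speculative refresh."""

    def __init__(self, denominator: int = 16, numerator: int = 8,
                 step_up: int = 1, step_down: int = 2) -> None:
        self.denominator = denominator
        self.numerator = numerator  # refresh probability = numerator / denominator
        self.step_up = step_up
        self.step_down = step_down

    def should_refresh(self, draw: int) -> bool:
        """draw is a pseudo-random value in [0, denominator), e.g. LFSR bits."""
        return draw < self.numerator

    def missed_opportunity(self) -> None:
        """We did not refresh but could have: raise the probability."""
        self.numerator = min(self.denominator, self.numerator + self.step_up)

    def wasted_refresh(self) -> None:
        """We refreshed and should not have: lower it more aggressively."""
        self.numerator = max(0, self.numerator - self.step_down)


gate = AdaptiveRefreshGate()
gate.missed_opportunity()  # probability goes from 8/16 to 9/16
gate.wasted_refresh()      # probability goes from 9/16 to 7/16
print(gate.numerator, gate.should_refresh(draw=6))
```

The asymmetric steps (up by 1, down by 2) follow the example in the post; both values would need experimental tuning.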
 

JDW

Any ideas? It can be dynamic to an extent, like the RAM controller could switch between two or three different refresh hiding strategies depending on whether the current one is working well.
I'm afraid I am of no help here. I'm still not fully straight on the technical aspects of how vintage RAM refresh works. I only know the historical tidbits such as the SE being up to 15% faster than the older Mac Plus because the SE video circuits don't use as much processor time for RAM refresh. And on that note, the "Merlin and Mac" section of the following article about the Unitron512 is a rather interesting read...


Take special note of the last paragraph in that section which says...

The next model, the Mac SE, replaced the PALs with a custom chip and represented the first revision to the machine's timing in four years. They used the memory's "page mode" to read the video buffer two words at a time, which reduced the average number of wait states that the 68000 suffered while accessing the RAM. Other than this complex change, the memory timing was unchanged from the original design even though the memory chips now used were the 120 nS 41256 (with a cycle time of only 230 nS).
 

Zane Kaminski

every time you don’t opportunistically refresh but could have, you increase the refresh probability; every time you do opportunistically refresh and shouldn’t have, you decrease the probability. You could also move more aggressively in one direction than the other - for example, increase the probability by 1 if you could have refreshed but didn’t, decrease the probability by 2 if you do refresh and shouldn’t have.
Yes yes this is basically it. I guess this is called “global scoreboarding” in the context of CPU branch prediction. It almost works but it needs another idea before it’ll improve performance. The issue is that we need more than one speculative refresh strategy. Then we can choose which to apply based on success rate. Otherwise with just one strategy, if the RAM controller observes that speculative refresh isn’t working well, it’s just deferred until it becomes urgent and the performance impact is worse. So we need another strategy or two to switch to, and then we can track the success of each and switch dynamically.

So the current condition I have is:
The first clock of any non-RAM access

That’s okay. We could weaken it to:
Any clock of any non-RAM access

I think that’ll usually hurt more than it’ll help since sometimes a refresh will come in late in a non-RAM cycle and less of the refresh will be hidden. So in that case we should wait for the first clock of the next non-RAM access. We have like 250+ clocks until the refresh becomes urgent so I think that gives a good opportunity to refresh during the first clock of another non-RAM access.

Maybe we can split it up:
First clock of ROM access
Fourth clock of video RAM write (only occurs when posted write FIFO is full)
First clock of I/O access (VIA, SCC, IWM, etc.)

And switch based on the occurrence rate of each. But that still probably won’t help graphics because graphics is strictly ROM reads, RAM reads, and video writes. The I/O devices are hardly accessed when doing graphics and the posted write buffer should be absorbing the latency of video writes to the motherboard.

So if only we could come up with two strategies which are both useful but in different situations…

I guess we can always refresh after the first clock of an I/O read since they always take a long time to complete. But again that doesn't help graphics.
 

Zane Kaminski

I guess we should always refresh after the first clock of an I/O read since they always take a long time to complete. But again that doesn't help graphics since QuickDraw never accesses the I/O devices. Maybe there's enough room in the CPLD to "save up" some refreshes with a counter. Then when the VBL interrupt occurs, the handler must do some reads from the VIA, right? Conceivably it could even access I/O a few dozen times during the VBL IRQ. So we'd refresh repeatedly during I/O reads, increase the saved refresh counter each time, then "spend" the saved refreshes later by decreasing the counter instead of doing a refresh. This also conveniently eliminates the need for the urgent/non-urgent distinction. We might also not need scoreboarding. If we have a lot of saved up refreshes, only do more refreshes during I/O reads. Otherwise if we're low on saved refreshes, start doing refreshes during ROM reads.
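The "saved refreshes" idea can be sketched as a small credit counter. The model below is a hypothetical illustration, not the WarpSE's logic: the class and method names, the capacity, and the low-water threshold are all made up, and it ignores DRAM timing limits on how far ahead refreshes may usefully be banked.

```python
class RefreshBank:
    """Banks refreshes done during slow accesses and spends them later."""

    def __init__(self, capacity: int = 7, low_water: int = 2) -> None:
        self.capacity = capacity    # assumed counter size in the CPLD
        self.low_water = low_water  # below this, ROM reads also host refreshes
        self.saved = 0

    def on_access(self, kind: str) -> bool:
        """Return True if a hidden refresh is performed during this access."""
        if self.saved >= self.capacity:
            return False
        if kind == "io" or (kind == "rom" and self.saved < self.low_water):
            self.saved += 1
            return True
        return False

    def refresh_due(self) -> bool:
        """Return True if a real (visible) refresh is needed right now."""
        if self.saved > 0:
            self.saved -= 1  # spend a banked refresh instead
            return False
        return True


bank = RefreshBank(capacity=3, low_water=1)
for kind in ["rom", "rom", "io", "io", "io"]:
    bank.on_access(kind)
print(bank.saved)  # refreshes banked so far
```

With a full bank, several refresh periods can pass with no refresh at all, which is what removes the urgent/non-urgent distinction in this model.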
 

Zane Kaminski

Well I just shortened the refresh from taking four cycles to taking three cycles. In this arrangement, the RAM is slightly "overclocked." So I can't ship it but it gives an idea of the impact of hiding that one additional refresh cycle. With this, the latency of a refresh coinciding with ROM access is fully hidden. The performance gain wasn't great. In Speedometer 3, I got 4.178 vs 4.168 in the CPU test and 3.451 vs 3.448 in the graphics test. Disk and math were the same. So there's no point tweaking this any more for 0.2% gain in CPU and 0.08% in graphics. It'd be better to focus on something more fruitful like improving the sound slowdown tuning.
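For reference, the relative gains can be recomputed from the Speedometer 3 scores quoted above. This is just the arithmetic; the helper name is hypothetical.

```python
def percent_gain(new_score: float, old_score: float) -> float:
    """Relative improvement of new_score over old_score, in percent."""
    return (new_score / old_score - 1.0) * 100.0


cpu_gain = percent_gain(4.178, 4.168)       # CPU test scores from the post
graphics_gain = percent_gain(3.451, 3.448)  # graphics test scores from the post

print(f"CPU: {cpu_gain:.2f}%  graphics: {graphics_gain:.3f}%")
```

The results come out to roughly 0.24% for CPU and 0.087% for graphics, in line with the rounded figures in the post.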

So on the subject of sound tuning, I have added a clock gate to the final WarpSE hardware. With this, the 68k's clock can be disabled without affecting the controller CPLD's clock. This way it's possible to slow the 68k down to 7.8336 MHz exactly rather than slowing down the RAM. With this hardware it will be possible for the WarpSE to slow down to exactly Mac SE (peak) speed during sound generation, rather than this approximate slowdown I have now. The approximate slowdown should work fine but if need be, this will let me do an update and add cycle-accurate slowdown. Test units are going out with the current arrangement which is perfectly fine but this change will support the update if necessary in the future.

I've also changed the PDS bus interface slightly to allow the CPLD to mask the top two address bits A23 and A22 sent to the PDS. The significance here is that this will let us map 3.5 MB (minus screen and sound buffers) of the Mac SE's RAM into the address space at addresses $500000-$57FFFF (512 kB), $600000-$7FFFFF (2 MB minus screen and sound buffers), and $800000-$8FFFFF (1 MB). Here's a diagram showing this to make it clearer:
Screenshot 2023-04-11 at 9.55.38 PM.png

Basically we are just ANDing the A23 and A22 bits as they go to the PDS with the select signal for these areas of RAM. This redirects the accesses in the $500000-$8FFFFF range into the $000000-$3FFFFF range on the PDS so the SE's motherboard RAM gets accessed. Obviously this RAM won't be very fast compared to the WarpSE's fast RAM, so we will disable this on production boards. What I'd like to do is use this setup to patch the memory manager to allocate RAM in these areas. Nominally an OS expects a contiguous region of RAM to work in, but maybe I can extend the end of RAM to $8FFFFF and add some fake application heaps to the memory manager to reserve space for ROM, SCSI, and the screen/sound buffers. That would be great to have on System 7. The fragmentation kinda sucks but most apps use 512 kB - 1 MB so it should work okay to let you run a few more apps at once. Hopefully the memory manager can be patched in this way. Maybe someone who knows more about the internals of the Mac OS can figure out some more about the memory manager's data structures.
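The address redirection can be modeled in a few lines. This sketch assumes the three windows listed earlier in the post and that A23 and A22 are simply forced low whenever one of them is selected; the function and constant names are hypothetical, and the real logic lives in the CPLD.

```python
# RAM windows from the post (inclusive CPU address ranges)
WINDOWS = [
    (0x500000, 0x57FFFF),  # 512 kB
    (0x600000, 0x7FFFFF),  # 2 MB (minus screen and sound buffers)
    (0x800000, 0x8FFFFF),  # 1 MB
]
LOW_22_BITS = 0x3FFFFF  # keeps A21..A0, clears A23 and A22


def pds_address(cpu_addr: int) -> int:
    """Address presented on the PDS for a 24-bit CPU address."""
    cpu_addr &= 0xFFFFFF
    selected = any(lo <= cpu_addr <= hi for lo, hi in WINDOWS)
    # When a window is selected, A23 and A22 are masked to zero so the
    # access lands in the motherboard RAM range $000000-$3FFFFF.
    return cpu_addr & LOW_22_BITS if selected else cpu_addr


for addr in (0x500000, 0x600000, 0x7FFFFF, 0x800000, 0x8FFFFF):
    print(f"${addr:06X} -> ${pds_address(addr):06X}")
```

In this model the three windows land on $100000-$17FFFF, $200000-$3FFFFF, and $000000-$0FFFFF respectively, which together cover 3.5 MB of the motherboard's 4 MB.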
 