I've got a fairly complete description of the four-voice sound issue for anyone who's interested in the technical details. I keep talking about it but I never show exactly what I mean with diagrams and stuff.
As we all know, video generation in the classic Mac is based on the 15.6672 MHz system clock. This serves as the video pixel clock. Each line of video is composed of a total of 704 pixel time periods, 512 active and 192 where the display is blanked. Similarly, each frame of video is composed of 370 scanned lines, of which 342 are active and 28 where the display is blanked. We can visualize the timespan of each video frame with a diagram like this:
Code:
time --->
-----------------------------------------------
|         active video          |    blank    |
-----------------------------------------------
^                              ^ ^  ^     ^   ^
|                              | |  |     |   |
| first line of frame (line 0) | |  |     |   | last line of frame (line 369)
                               | |  |     |
                               | |  |     | VBL (vsync, vblank) ends
                               | |  |
                               | |  | VBL starts
                               | |
                               | | first inactive line (line 342)
                               |
                               | last active line (line 341)
So to be clear, the diagram above represents one frame of time or about 16.6 ms and some of the video-related things that happen in the course of a frame.
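For anyone who wants to check the timing numbers, here is a quick sketch that derives them from the figures above. The constant and variable names are just mine for illustration; only the clock rate, pixel count, and line count come from the description.

```python
# Timing derived from the figures quoted in the post.
PIXEL_CLOCK_HZ = 15_667_200   # 15.6672 MHz pixel clock
PIXELS_PER_LINE = 704         # 512 active + 192 blanked
LINES_PER_FRAME = 370         # 342 active + 28 blanked

line_time_s = PIXELS_PER_LINE / PIXEL_CLOCK_HZ
frame_time_s = line_time_s * LINES_PER_FRAME
frame_rate_hz = 1.0 / frame_time_s

print(f"line time:  {line_time_s * 1e6:.2f} us")
print(f"frame time: {frame_time_s * 1e3:.2f} ms")
print(f"frame rate: {frame_rate_hz:.2f} Hz")
```

The line time is also the interval between sound samples, since one sample goes out per line.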
Sound generation is synchronized to the video cycle: after each line, a new sample is read from the sound buffer in memory and output. The sound buffer has 370 entries (numbered 0-369), exactly the same as the number of video lines. It is interleaved byte-by-byte with the disk speed buffer (unused on 800k/1.44M drives), so although it has 370 entries, the whole thing spans 740 bytes. The sound buffer must therefore be refilled once per frame with 370 new samples. This happens as a “VBL task” after the vertical blanking interrupt. Let’s look at the code in the Mac ROM for generating 370 samples of four-voice sound:
The significant thing here is that the update starts at byte index 370 in the sound buffer, halfway into it. 185 samples are generated (half of the buffer), and then the driver goes back and generates another 185 samples starting at the beginning of the buffer. So the buffer is first filled up from the middle to the end, and then from the beginning to the middle.
Now let’s look at the beginning of the code for generating single-voice sound:
I’ve omitted the loop to compute a sample but the significant thing to notice here is that in this case, the sound driver is not adding 370 to the sound buffer pointer in order to skip 185 samples into the sound buffer. In the simpler single-voice mode, the sound driver adds 64 to the pointer in order to skip 32 samples into the buffer.
So the takeaway here is that in four-voice mode, sound generation starts at position 185 in the sound buffer, halfway into it, whereas in single-voice mode, sound generation starts at position 32 in the sound buffer, less than one tenth of the way into the buffer.
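To make the fill order concrete, here is a small sketch that turns the starting byte offset into the sequence of sample positions written. This is not the ROM code, just a model of it. The function name is made up, and I am assuming the single-voice fill wraps around the end of the buffer the same way the four-voice fill does, which the description above does not actually state.

```python
BUFFER_SAMPLES = 370
BYTES_PER_ENTRY = 2  # each sound byte is interleaved with a disk speed byte

def fill_order(start_byte_offset):
    """Sample indices in the order they are written (hypothetical model).

    Assumes the fill wraps from the end of the buffer back to the start.
    """
    start = start_byte_offset // BYTES_PER_ENTRY
    return [(start + k) % BUFFER_SAMPLES for k in range(BUFFER_SAMPLES)]

four_voice = fill_order(370)    # starts at sample 185
single_voice = fill_order(64)   # starts at sample 32

print(four_voice[:3], "...", four_voice[183:187], "...", four_voice[-3:])
print(single_voice[:3])
```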
Now we know from Andy Hertzfeld’s famous story “Sound by Monday” that the Mac spends almost 50% of its CPU time generating four-voice sound. This matches what we are seeing in the four-voice loop, which takes 160 clock cycles, or a bit over 20 microseconds at 7.8336 MHz. A sound sample is output every 45 microseconds, so yeah, that’s about half the CPU time. Now let’s make some more of those time diagrams, but this time indicating the progress of the CPU filling the sound buffer and of the Mac playing the samples out.
Here’s a diagram showing what’s happening in the video cycle when the Mac begins generating sound samples in the problematic four-voice mode:
Code:
-----------------------------------------------
|         active video          |    blank    |
-----------------------------------------------
 ####                  ^
 ^  ^                  ^
 |  |                  |
 |  |                  | sample gen start
 |  |
 |  | max output position in buffer
 |
 | min output position in buffer
I’m showing here the range of possible positions of the sound currently being output when sound samples start to be generated. We don’t know exactly how long it will take to do other VBL stuff before sound generation begins, so we don’t know exactly where in the video cycle the Mac will be when sound generation starts. Therefore the diagram indicates a fairly broad range of where in the frame (and therefore where in the sound buffer) the Mac is currently outputting from.
Now I’m gonna sort of move time forward and do the diagram again. We know that in four-voice mode, generating a sample takes about 50% of the time it takes to output one, so the generation pointer moves through the buffer about twice as fast as the output position does. So let’s do the diagram again for the future point in time where sample generation is half complete. This time I’m just gonna show the range of sound buffer output positions with hash marks and the sample generation pointer with the ^ symbol:
Code:
-----------------------------------------------
|         active video          |    blank    |
-----------------------------------------------
             ####                            ^
And now as the final sample is generated:
Code:
-----------------------------------------------
|         active video          |    blank    |
-----------------------------------------------
                       ^ ####
All these diagrams were scaled for regular 7.8336 MHz CPU speed. Were the processor just a little bit faster, the diagram for when the final sample is generated would look like this:
Code:
-----------------------------------------------
|         active video          |    blank    |
-----------------------------------------------
                  ####  ^
This is the cause of the sound problem! If the CPU generates the samples too quickly, generated samples overwrite samples which have yet to be played. This causes the sound to sort of skip slightly 60 times per second. Since it's happening so fast, it doesn't just sound like a CD or record skipping, you get this 60 Hz content and it sounds like a deep groan superimposed on the music. Even a slight increase in speed during sound generation will cause the sound generation pointer to cross over the sound playback pointer, corrupting the audio data as it is played.
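Here is a rough sketch of that race in code, for anyone who wants to play with the numbers. It is a toy model, not a measurement. It assumes the 160-cycle loop is the only cost per sample, that playback advances one sample per line, and that playback is at buffer position 20 when generation starts. That last number is a guess of mine, since the real value depends on how long the other VBL work takes. The function and parameter names are made up for illustration.

```python
PIXEL_CLOCK_HZ = 15_667_200   # video pixel clock, from the post
PIXELS_PER_LINE = 704         # one sample is played per line
BUFFER_SAMPLES = 370
GEN_START = 185               # four-voice fill starts mid-buffer
CYCLES_PER_SAMPLE = 160       # four-voice loop cost, from the post

def first_overrun(cpu_hz, play_pos_at_start=20.0):
    """Return how many samples were written before the writer first
    overwrote a slot the player had not reached yet, or None if the
    whole buffer was filled safely.

    play_pos_at_start is an assumed value, not something measured.
    """
    line_time = PIXELS_PER_LINE / PIXEL_CLOCK_HZ
    gen_time = CYCLES_PER_SAMPLE / cpu_hz
    for k in range(BUFFER_SAMPLES):
        t = (k + 1) * gen_time                  # time when sample k is stored
        play_pos = play_pos_at_start + t / line_time
        slot = GEN_START + k                    # write position, not wrapped
        # Once the writer is more than a full buffer ahead of the player,
        # it is overwriting samples that have not been played yet.
        if slot - BUFFER_SAMPLES > play_pos:
            return k
    return None

for mhz in (7.8336, 8.5, 25.0):
    print(mhz, "MHz ->", first_overrun(mhz * 1e6))
```

With these assumed numbers, the stock clock fills the whole buffer without an overrun, while the faster clocks run into unplayed samples somewhere in the second half of the fill, and the faster the clock, the earlier it happens.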
So there is a simple way to fix it. It’s a bit coarse in terms of the speed impact but it works. Whenever the sound buffer is written to, slow the CPU way down for the next 30 microseconds or so. During sound generation the slowdown will be continuously triggered, so sound generation will go at 8 MHz equivalent speed and everything else will go at 25 MHz. If no sounds ever play, the CPU never slows down and you get the full 25 MHz. If there are just one-voice sounds playing, that takes maybe 12% CPU time or so and the impact on speed is not so bad. 12% at 8 MHz and 88% at 25 MHz so that averages out to 23 MHz. But with four-voice sounds, you take a big speed hit. 50% of the CPU time is spent at 8 MHz equivalent speed and the other half at 25 MHz, averaging out to 16 MHz or so.
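The averages above are just time-weighted means of the two speeds. A quick sketch, with a made-up function name and the 8 MHz and 25 MHz figures taken from the paragraph above:

```python
def average_mhz(slow_fraction, slow_mhz=8.0, fast_mhz=25.0):
    """Time-weighted average speed when slow_fraction of the time is
    spent at the slowed-down (8 MHz equivalent) speed."""
    return slow_fraction * slow_mhz + (1.0 - slow_fraction) * fast_mhz

for label, fraction in (("no sound", 0.0), ("one voice", 0.12), ("four voice", 0.50)):
    print(f"{label}: {average_mhz(fraction):.1f} MHz")
```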
The fix works, basically confirming that we have the correct cause of the issue. It was also interesting to find the perfect slowdown speed. The closer I approached the right (slow) speed, the better the sound quality as the amount of overlap in the sound buffer decreased. We will have to test all the various programs using four-voice sound to make sure that they all sound good on the WarpSE.
The "sound slowdown" solution is okay and I think I’m going to use it in the final WarpSE, but I would like a better way to fix the problem. Maybe I will write about my better idea another day... Sorry for any typos or wordiness, I was just trying to get this out quick in case anyone is interested in exactly the reason for the sound problem.