68k ScrnBase and its alternate screen address

Relating to a 68k application

Mu0n

Active Tinkerer
Oct 29, 2021
609
560
93
Quebec
www.youtube.com
Target platform: real Mac Plus
Secondary target platform for quick bug squashing: mini-vMac

My project has been really challenging. As always, I've stumbled upon every possible bug, despite avoiding a ton more. But I feel like mastering this brings me significantly closer to my goal of a fluid game engine.

I've been juggling 3 types of events:

1) Doing some moderate DrawString + FillRect (to erase it) + loading strings from resources, on a small portion of the screen, on both screen buffers for good measure so as not to get a sudden discontinuity when the screen flips
2) Screen flipping (happens every 4 or 5 seconds at worst, usually every 10 seconds)
3) Getting PICTs from resources and drawing them on only 1 screen

The VBL task decrements one counter for each type and does nothing else. I kept it very simple so it can nope out of there quickly and let the main loop retake control.
A hard-coded array (one per type) tells when the next event occurs, in ticks; that value gets pumped into the globals that the VBL task tracks.

Anticipating problem #1: maybe getting and drawing a PICT (512x300) would take too long sometimes, so I made sure to schedule it right in the middle between screen flips.
Anticipating problem #2: the time it takes to draw and do offscreen stuff used to not be taken into account. I'd finish the task, then set the timer for the next one (this wasn't noticeable under the emulator in all-out speed mode, but very noticeable on real metal). Now, when I get the signal to perform a task, I add the next one as a pending event right away, then perform the task(s) at hand. I've even made my loop deal with only one pending task of each type per pass, accepting that there might be more than 1 per type waiting; the rest patiently wait until the next pass of the main loop. This allows a catch-up mechanism.
Anticipating problem #3: fearing the PICT events the most, I now load ALL of them into memory in advance at the start of the program. At worst this uses around 220 KB of RAM, which is not that bad. Only the draw to the main/alt screen, as offscreen manipulation, happens in the main loop now.

Still, I'm having issues getting a snappy, on-time unfolding of events. I'm tempted to just throw in the towel once more and have you guys take a look, if you're interested.
 

Crutch

Tinkerer
Jul 10, 2022
293
228
43
Chicago
Small comments:

Right, DrawPicture will be much slower than CopyBits which is much slower than direct bit blitting. If you want to be fast, by far the best thing to do is to draw your picture in an offscreen bitmap somewhere, then just direct copy those bytes into screen (or altscreen) memory when desired, preferably using an asm { } block and an unrolled loop (if targeting a 68000). Your speed up doing this vs. DrawPicture will be multiples.
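
A minimal sketch of that kind of direct copy, in portable C rather than an asm { } block so it can be checked on any compiler. It assumes full-width rows of 64 bytes (512 one-bit pixels, as on a Mac Plus) and long-word aligned, non-overlapping buffers. `BlitRows`, `kRowBytes` and `kRowLongs` are made-up names for illustration; on the real machine `dst` would point at screen or alternate-screen memory and the fixed-width types would become `unsigned long`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    kRowBytes = 64,            /* assumed: 512 pixels / 8 pixels per byte */
    kRowLongs = kRowBytes / 4  /* 16 long words per scanline */
};

/* Copy `rows` full-width scanlines from an offscreen buffer to a
   screen buffer, one long word at a time, with the 16 moves that
   make up a row written out by hand (the unrolled loop). */
static void BlitRows(uint32_t *dst, const uint32_t *src, size_t rows)
{
    assert(dst != NULL && src != NULL);
    while (rows--) {
        dst[0]  = src[0];  dst[1]  = src[1];
        dst[2]  = src[2];  dst[3]  = src[3];
        dst[4]  = src[4];  dst[5]  = src[5];
        dst[6]  = src[6];  dst[7]  = src[7];
        dst[8]  = src[8];  dst[9]  = src[9];
        dst[10] = src[10]; dst[11] = src[11];
        dst[12] = src[12]; dst[13] = src[13];
        dst[14] = src[14]; dst[15] = src[15];
        dst += kRowLongs;
        src += kRowLongs;
    }
}
```

For a 512x300 picture this would be called with `rows` set to 300, i.e. 4800 long-word moves and no QuickDraw overhead.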

Also, VBL tasks are really useful if you want to interrupt some other code to do something … if you are just setting a flag for your main loop to detect, you can skip the VBL task because the vertical retrace manager increments the Ticks lomem global on each retrace. Instead of having your main loop poll the global set by your VBL task, save a step and just have your main loop poll Ticks.
 

Mu0n

Crutch said:
Small comments:

Right, DrawPicture will be much slower than CopyBits which is much slower than direct bit blitting. If you want to be fast, by far the best thing to do is to draw your picture in an offscreen bitmap somewhere, then just direct copy those bytes into screen (or altscreen) memory when desired, preferably using an asm { } block and an unrolled loop (if targeting a 68000). Your speed up doing this vs. DrawPicture will be multiples.
Indeed, I knew that but for some bone-headed reason, the emulator spoiled me into concentrating on other problems first. I think I have to go there for good measure, so thanks for once again putting me on the right track.

Crutch said:
Also, VBL tasks are really useful if you want to interrupt some other code to do something … if you are just setting a flag for your main loop to detect, you can skip the VBL task because the vertical retrace manager increments the Ticks lomem global on each retrace. Instead of having your main loop poll the global set by your VBL task, save a step and just have your main loop poll Ticks.
What you propose, main loop only?
pass 1: TickCount polled, no events to do
pass 2: TickCount polled, spotted one long event from type 3 to do, and do it immediately
pass 3: TickCount polled (probably multiple ticks in the future), spotted 2 tasks to deal with from task type 1 and 1 task to deal with from task type 2

Task-to-do spotting would work by comparing the tick count with a hardcoded array that lists the cumulative ticks at which things should happen. If the tick count is higher than the currently pointed element, deal with it, then find out if there are yet more and catch up to them.
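
The main-loop-only variant could be sketched like this. It is only an illustration under assumptions: `Schedule` and `TakeDueEvents` are hypothetical names, the array holds cumulative tick offsets in ascending order, and `elapsedTicks` stands for the current tick count minus the tick count saved when playback started.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint32_t *dueTicks; /* cumulative tick offsets, ascending */
    size_t          count;    /* number of entries in dueTicks */
    size_t          next;     /* index of the first event not yet handled */
} Schedule;

/* Return how many events have come due by `elapsedTicks` and step
   past them. A result above 1 means the loop fell behind and the
   caller has some catching up to do. */
static size_t TakeDueEvents(Schedule *s, uint32_t elapsedTicks)
{
    size_t taken = 0;

    assert(s != NULL);
    while (s->next < s->count && s->dueTicks[s->next] <= elapsedTicks) {
        s->next++;
        taken++;
    }
    return taken;
}
```

One `Schedule` per event type would be polled on every pass of the main loop; a pass that arrives late simply gets a larger count back.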

What I do, VBL task:
type 1,2,3 counters are decremented

main loop checks for Button() to exit out,
if a counter is <= 0, set it to the next hardcoded delay right away and raise an in-loop flag to deal with it - except if you go out of bounds in the hardcoded array
do this for each of the 3 task types; this only takes a handful of cycles
continuing on the main loop, if a pending task is raised, deal with it now, probably burning a few ticks in the worst case scenario, but VBL will dutifully decrement those counters anyway

I mean, if my method works and is not broken, I'll keep using it for this application.
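
For comparison, here is a rough model of the counter-based scheme described above, with the VBL task reduced to a plain function so it can be exercised off-target. All names (`TaskTimer`, `StartTimer`, `VblTick`, `PollTimer`, `TakeOnePending`) are invented for the sketch. One deliberate difference, labeled as an assumption: the counter is re-armed with `+=` instead of being set, so ticks that elapsed past zero are carried into the next delay.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    volatile long counter; /* decremented once per tick by the VBL task */
    const long   *delays;  /* hard-coded delays between events, in ticks */
    size_t        count;
    size_t        next;    /* index of the delay to load next */
    int           armed;   /* cleared once the array is exhausted */
    int           pending; /* events flagged but not yet performed */
} TaskTimer;

static void StartTimer(TaskTimer *t, const long *delays, size_t count)
{
    assert(t != NULL && delays != NULL && count > 0);
    t->delays  = delays;
    t->count   = count;
    t->counter = delays[0];
    t->next    = 1;
    t->armed   = 1;
    t->pending = 0;
}

/* Stand-in for the VBL task: decrement and nothing else. */
static void VblTick(TaskTimer *t) { t->counter--; }

/* Main-loop check: re-arm first, then let the caller do the slow work. */
static void PollTimer(TaskTimer *t)
{
    if (!t->armed || t->counter > 0)
        return;
    t->pending++;
    if (t->next < t->count)
        t->counter += t->delays[t->next++]; /* assumption: += keeps the overshoot */
    else
        t->armed = 0;                       /* out of bounds: stop scheduling */
}

/* Perform at most one pending event of this type per loop pass. */
static int TakeOnePending(TaskTimer *t)
{
    if (t->pending == 0)
        return 0;
    t->pending--;
    return 1;
}
```

Because `PollTimer` raises at most one flag per call, a type that is several events behind catches up over consecutive passes of the main loop.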
 

YMK

Active Tinkerer
Nov 8, 2021
358
285
63
Mu0n said:
Anticipating problem #1: maybe getting and drawing a PICT (512x300) would take too long sometimes, so I made sure to schedule it right in the middle between screen flips.

To get an idea of maximum pixel throughput, see how fast you can copy screens from RAM to VRAM. That's your best case.

Any drawing operation which masks existing VRAM data is going to involve a read-modify-write cycle, which is slower than a blind write.

Worse yet, drawing a bitmap to a pixel location not aligned to a word boundary will require bit shifting.
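
To make the difference concrete, here is a small sketch of writing one 16-pixel slice of a 1-bit bitmap into a scanline. It assumes the row is stored as 16-bit words with the leftmost pixel in the most significant bit; `PutWord` is a hypothetical helper, not a Toolbox call. The aligned case is a single blind store, while the unaligned case needs two shifts and two read-modify-write accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Draw 16 pixels at horizontal position x on one scanline.
   The caller must make sure word x/16 + 1 exists when x is not
   a multiple of 16, since the pixels then straddle two words. */
static void PutWord(uint16_t *row, unsigned x, uint16_t pixels)
{
    unsigned  shift = x & 15u;
    uint16_t *p     = row + (x >> 4);

    assert(row != NULL);
    if (shift == 0) {
        *p = pixels; /* aligned: blind write, nothing is read back */
    } else {
        /* bits of each destination word that the new pixels cover */
        uint16_t leftMask  = (uint16_t)(0xFFFFu >> shift);
        uint16_t rightMask = (uint16_t)~leftMask;

        /* unaligned: shift, then read-modify-write both words */
        p[0] = (uint16_t)((p[0] & ~leftMask)  | (pixels >> shift));
        p[1] = (uint16_t)((p[1] & ~rightMask) | (pixels << (16 - shift)));
    }
}
```

Keeping sprites or text blocks on 16-pixel boundaries lets every row take the first branch.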

What type of game is your engine for? That is going to determine the optimizations you can get away with.
 

Crutch

Mu0n said:
What you propose, main loop only?
pass 1: TickCount polled, no events to do
pass 2: TickCount polled, spotted one long event from type 3 to do, and do it immediately
pass 3: TickCount polled (probably multiple ticks in the future), spotted 2 tasks to deal with from task type 1 and 1 task to deal with from task type 2

Right, all I’m saying is that writing a VBL task that decrements a counter, then having your main loop check if the counter hits zero, is exactly equivalent to just having your main loop poll Ticks and do something every N times it increments. The Vertical Retrace Manager updates Ticks at the same time that your VBL task would run. It’s certainly fine to use a VBL task if you prefer (though it could complicate debugging if it stays around after a crash … probably causing another crash).

You really only need a VBL if you need to do something other than on a when-the-main-loop-gets-around-to-it basis, like animate a spinning mouse cursor or update a status bar while you’re stuck off in a tight-loop calculation somewhere.
 

Mu0n

Crutch said:
The Vertical Retrace manager updates Ticks at the same time that your VBL task would run. It’s certainly fine to use a VBL if you prefer (though it could complicate debugging if it stays around after a crash … probably causing another crash).

You really only need a VBL if you need to do something other than on a when-the-main-loop-gets-around-to-it basis, like animate a spinning mouse cursor or update a status bar while you’re stuck off in a tight-loop calculation somewhere.
Cascading crashes is the name of my indie prog rock band
 

YMK

You still haven't told us what kind of game your engine is for, so I can only speculate.

If it isn't necessary to wipe and render the entire screen each frame, doing so puts you at a disadvantage.

Same with rendering to RAM (double buffering).

I'd bet that Prince of Persia isn't wiping the VRAM with each frame.



That said, if you must move entire screens into VRAM quickly, look into the MOVEM.L instruction.

MacPaint uses just two of them to copy 416 pixels from RAM to VRAM.
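
The 416 figure falls out of the register count: 13 registers of 32 bits each hold 13 x 32 = 416 one-bit pixels. A rough C model of the idea follows, with a local array standing in for the CPU registers. It assumes one MOVEM.L loads the registers from the source row and the other stores them to the destination; `CopyRow416`, `kRegs` and `kPixelsPerRow` are illustrative names, and real code would of course be assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    kRegs         = 13,         /* registers free to carry pixel data */
    kPixelsPerRow = kRegs * 32  /* 13 x 32 bits = 416 pixels */
};

/* Copy one 416-pixel row: fill the "registers" from the source,
   then write them all out to the destination. */
static void CopyRow416(uint32_t *dst, const uint32_t *src)
{
    uint32_t regs[kRegs]; /* stands in for the 13 CPU registers */
    size_t   i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < kRegs; i++) /* models the load: memory to registers */
        regs[i] = src[i];
    for (i = 0; i < kRegs; i++) /* models the store: registers to memory */
        dst[i] = regs[i];
}
```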
 

Crutch

(Btw I think his comment that “only 13 registers were truly available” [since you can’t use the stack pointer] isn’t strictly accurate, Atkinson could have saved the stack pointer in some fixed memory location like ApplScratch, and stuffed a sentinel value into the BitMap to tell him when to stop blitting rows. But a 448-pixel-wide window wouldn’t have left room for visually appropriate borders and the tool palette, maybe.)
 

Mu0n

YMK said:
You still haven't told us what kind of game your engine is for, so I can only speculate.

It's not a game engine yet, but it could be.

No, I was working (for now) on a sing-along type app for The Last Unicorn's theme song.
It follows the song at the syllable level. The graphics are changed with a screen flip so as to never show any weakness in the graphics speed department.

I made a thread about it, with a link to the video: https://tinkerdifferent.com/threads...ram-in-c-for-my-old-mac-plus.2313/#post-19837





With a custom beats-per-minute setting, calculated in multiples of 16.6667 ms and rounded down to an integer number of ticks. All of the lyrics adapt to this speed, which is calculated at the start, including screen changes, blockmoves, etc.
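
As a worked example of that conversion, assuming exactly 60 ticks per second (the 16.6667 ms figure above): one beat lasts 60000 / bpm milliseconds, so the tick count per beat is 3600 / bpm, rounded down by integer division. `TicksPerBeat` and `BeatStartTick` are hypothetical names, not taken from the actual program.

```c
#include <assert.h>

/* Whole ticks per beat, assuming 60 ticks per second:
   (60000 ms / bpm) / 16.6667 ms = 3600 / bpm, rounded down. */
static unsigned long TicksPerBeat(unsigned bpm)
{
    assert(bpm > 0);
    return 3600UL / bpm;
}

/* Tick offset at which beat n (0-based) starts. Because every beat
   uses the same rounded-down length, the rounding error grows with n. */
static unsigned long BeatStartTick(unsigned bpm, unsigned long n)
{
    return n * TicksPerBeat(bpm);
}
```

At 120 bpm this gives 30 ticks per beat with no rounding loss. At 70 bpm the exact value is about 51.4 ticks, so rounding down to 51 makes playback run slightly fast relative to the nominal tempo.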

 

YMK

Crutch said:
(Btw I think his comment that “only 13 registers were truly available” [since you can’t use the stack pointer] isn’t strictly accurate, Atkinson could have saved the stack pointer in some fixed memory location like ApplScratch, and stuffed a sentinel value into the BitMap to tell him when to stop blitting rows. But a 448-pixel-wide window wouldn’t have left room for visually appropriate borders and the tool palette, maybe.)

I think you're right. All but two of the address registers could be used for data. With an unrolled loop, there's not even a need for a sentinel value.