retr01

Senior Tinkerer
Jun 6, 2022
2,473
1
793
113
Utah, USA
retr01.com
@alxlab, thank you. :) (y) I would like to see a good repair book from back in the day for the SE/30. When I downloaded Larry Pina's book you uploaded, I could not find any mention of the SE/30. :confused:

I am aware of the Apple Service documentation and books by Apple. :geek:

It looks like Larry Pina did another book that is a newer edition covering the SE/30, Classic, and Classic II.

1655787024157.png
1655787076231.png

It is available to borrow for ONE LOUSY HOUR at the Internet Archive. o_O That book is what I need, but hard to find. :cautious: Does anyone have any suggestions?
 

Zane Kaminski

Administrator
Staff member
Founder
Sep 5, 2021
371
608
93
Columbus, Ohio, USA
The PIO units do their own IO on their own clock, so there's no need to use the SPI engine. It can delay X pixel clocks after detecting the end of HSYNC and then start reading on the next cycle. By default GPIO reads will be queued for the main CPU in 32-bit words in a 4 or 8 word deep FIFO and the DMA engine can be triggered after each word to fetch those pixels to RAM without involving the code running on the main CPU cores.
Hmm, may need to double the clock so there's time for a conditional JMP at the end of scanlines
Okay I have looked into the PIO and yeah, it's quite robust. But there is still an issue...

We can set the clock divider to approximately the same as the Mac's pixel clock. And then we would want to write a PIO program that does this:
1655815004948.png


But if we are running the PIO at ~15.6672 MHz, then we still can't center the PIO clock in the middle of each pixel. If only the PIO unit itself could rapidly change the clock division factor, we could run at the undivided 133 MHz speed, wait until we see HSYNC low, wait an additional fixed number of 133 MHz cycles, and then switch to ~15.6672 MHz, having waited just the right number of 133 MHz clocks to center the divided PIO clock edges right in the middle of each pixel. Then we would read in the 512 pixels and go back to 133 MHz in preparation to do it again.

Unfortunately the PIO can't change its own clock divider. Only the main CPU can do that. The PIO could raise an IRQ asking the main CPU to change the divider, but that would defeat the precise alignment I was trying to get by doing the waiting at 133 MHz: the variability in the 133-to-15.6672 MHz changeover time would wipe out the accuracy gained.
 

alxlab

Active Tinkerer
Sep 23, 2021
287
312
63
www.alxlab.com
If we wanted to use the PicoVGA library, it sounds like it would need a specific frequency for each of the different resolutions.

1655820787997.png


Maybe this will cause an issue if we need a different frequency for input and another for output?
 

Zane Kaminski

Nah, we just have to adjust the PIO divider ratio separately for each of the PIO state machines involved. There are two separate PIO units each with four PIO state machines. I think DVI output uses three or four state machines in one PIO, leaving the others free. Right?

And by the way, the way the divider works, there is no "divided system clock." Instead the clock divider gates the clock according to the programmed division ratio. So for a division ratio of 2, the PIO's clock would only be enabled every other cycle. Then there's a fractional clock divisor component where the remainder gets turned into an extra cycle of clock disablement whenever the fraction accumulates to greater than 1. Like so for a ratio of 2.5:
Screen Shot 2022-06-21 at 10.54.45 AM.png


So there is a bit of jitter with this approach since sometimes you wait N system clock cycles and sometimes you wait N+1 but no big deal I guess.
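Here is a rough Python model of that gating, in case it helps to play with ratios. It is a simplified sketch and not the RP2040's exact logic: it assumes an 8-bit fractional accumulator that stretches one PIO period by a single system clock each time it overflows. The names `divider_periods`, `int_part`, and `frac_256` are made up for illustration.

```python
def divider_periods(int_part, frac_256, count):
    """Return the length, in system clocks, of `count` consecutive PIO periods.

    Simplified model (an assumption, not the exact hardware): each PIO period
    lasts `int_part` system clocks, and an accumulator gains `frac_256`/256
    per period. When it reaches 1, that period is stretched by one extra
    system clock and the accumulator wraps.
    """
    periods = []
    acc = 0
    for _ in range(count):
        acc += frac_256
        if acc >= 256:
            acc -= 256
            periods.append(int_part + 1)
        else:
            periods.append(int_part)
    return periods


# Ratio 2.5 = 2 + 128/256: periods alternate between N and N+1 system clocks.
print(divider_periods(2, 128, 8))
```

In this model the average period still works out to the programmed ratio, but any single period is either N or N+1 system clocks long, which is the jitter mentioned above.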
 
Nov 4, 2021
126
98
28
Tucson, AZ
I've got the basic scanline timings working on a PIO.
1655971390425.png

H-Blank:
1655971402706.png


Right now the program is just NOP-ing instead of reading a pin because I don't have the reading part built out and Mu hates it when I print tons of numbers. I'm driving a side pin high while in the reading pixels loop and low during what should be the H-Blank interval just to see the timings.

Here's the PIO program, which is wrapped in a CircuitPython script:
Code:
.program vidcap
.side_set 1
; cut the bottom 7 bits off of the scanline length
; then pad it back with 0s to get a value > 31
set y, {(screen_width)>>7}
in y, 5
in null, 7
mov y, isr
; use jmp to decrement y (512 pixels: start loop at 511 down to 0)
jmp y--, skip
skip:
push noblock

; wait for hsync
hsync:
;wait 0 pin 2
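; hblank = 192 pixels = 384 cycles (2x12x16) at 2x pixel clock:
; 14 (set + delay) + 23x16 (loop) + 1 (mov) + 1 (jmp hsync from the previous line)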
set x, 22 [13]
hblank:
    jmp x-- hblank [15]


mov x, y side 0
scanline:
    ;in pins, 1 side 1
    nop side 1 ; time holder for IN
    jmp x-- scanline side 1
jmp hsync side 0
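
As a sanity check on the timing, the cycle counts in the program above can be added up in plain Python. This is only a bookkeeping sketch: it assumes one cycle per instruction plus the bracketed delay, that the two IN instructions leave 512 in the ISR (as the comments describe), and that the commented-out `wait` is not in the loop. The variable names are invented for illustration.

```python
SCREEN_WIDTH = 512      # active pixels per scanline
CYCLES_PER_PIXEL = 2    # the PIO runs at 2x the pixel clock

# y setup: drop the low 7 bits, pad 7 zero bits back in, then decrement once.
y = ((SCREEN_WIDTH >> 7) << 7) - 1

# Blanking: "set x, 22 [13]" takes 1 + 13 cycles, "jmp x-- hblank [15]"
# runs for x = 22..0 (23 passes) at 1 + 15 cycles each, then "mov x, y".
hblank_cycles = (1 + 13) + 23 * (1 + 15) + 1

# Active video: "nop" + "jmp x-- scanline" per pixel, for x = y..0.
active_cycles = (y + 1) * CYCLES_PER_PIXEL

# One more cycle for the closing "jmp hsync".
line_cycles = hblank_cycles + active_cycles + 1

print(y, hblank_cycles, active_cycles, line_cycles)
print(line_cycles / CYCLES_PER_PIXEL, "pixel periods per line")
```

The total comes to 1408 cycles, i.e. 704 pixel periods (512 active + 192 blanking), so the free-running loop matches the line length the comments aim for.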
 

Trash80toG4

Active Tinkerer
Apr 1, 2022
910
260
63
Bermuda Triangle, NC USA
But if we are running the PIO at ~15.6672 MHz, then we still can't center the PIO clock in the middle of each pixel.
Why trip at the center of the pixel? Might keying to the leading and/or trailing edges of the signal be easier for converting analog to digital input in this single-bit, black/white system?

Synchronicity would be necessary for sampling a GS signal I think, but why couldn't this particular converter's sampling be asynchronous based on a very high freq?

Totally out of my depth here, but had to ask as the only stupid question . . .
 
Nov 4, 2021
126
98
28
Tucson, AZ
Hmm, my PIO program runs at twice the pixel clock in order to have time to do the necessary looping logic, so it might actually be sampling in the middle of the pixel already. If not, it would be trivial to add a 1-cycle (1/2 pixel) delay before starting the capture loop.
 

Zane Kaminski

How can you tell during operation whether to add the delay?
 
Nov 4, 2021
126
98
28
Tucson, AZ
It shouldn't change in operation so once the timing is figured out it should be fine.
Someone with a faster scope could try changing the "side 1" on the next-to-last line to "side 0". That changes the output to a fast square wave at 2x pixel clock that goes high when sampling, so you can see how it lines up with a real signal.
 

Zane Kaminski

That’s the thing: it will be hard to constrain the alignment properly, and you will certainly have to re-decide whether to do the extra clock for alignment on a line-by-line basis. At 2x pixel clock it’s maybe doable, but there are many gotchas.

The way the system clock division works is very jittery for starters. The integer part of the division is accomplished by only enabling the PIO clock once in every N system clock cycles. The fractional part works by accumulating the fractional component each PIO cycle and then skipping an additional system clock whenever the accumulated fraction reaches 1. For a 133 MHz system clock dividing down to 31.3344 MHz, the division factor is approximately 4.245. So the PIO clock periods will usually be 4, 4, 4, then 5 system clock cycles long. But every once in a while there will be an extra 4-cycle period before the next 5, since the division factor is slightly under 4.25.

With the jittery division issue in mind, consider what happens when HSYNC goes low and we enter a line. First of all, the HSYNC transition may be concurrent with the sampling window around the clock of the Pi Pico. In this case, random data will be seen by the Pico. That is to say that although HSYNC has gone low, it was slightly too late for the Pico to see it and therefore the PIO may randomly take an additional PIO clock to see HSYNC low. Since the sample window around the Pi’s clock has some finite nonzero width, that translates to ever so slightly more than one PIO clock cycle of uncertainty in sensing the HSYNC transition.

Moreover, you cannot control the fractional clock division accumulation at the moment when the HSYNC transition is noticed by the PIO. If the HSYNC low transition could force the fractional clock divider accumulator to reset to zero, then there would be no issue, but unfortunately at the moment of HSYNC, we don’t know whether the next PIO clock period will be 4 or 5 system clock cycles long. Although small compared to the previous jitter effect of one whole PIO cycle, the fractional clock jitter directly adds to the HSYNC detection inaccuracy.

So running at 2x pixel clock and a 133 MHz system clock, we have 7 system clocks of inaccuracy sensing HSYNC (5 from the HSYNC sampling uncertainty, 2 from the fractional divider jitter), or about 52.6 nanoseconds, compared to a 63.8 ns pixel clock period.

Then we have to add in the additional skew caused by the difference in our clock and the Mac’s. Doubling the clock speed you mentioned earlier, that’s 31.32648 MHz which makes for 44,946 nanoseconds per line. The Mac takes 44,934.64 ns per line so there’s a 12 nanosecond difference. Therefore the Pico’s sample clock will start out at a particular alignment with the pixels from the Mac and then shift 12 nanoseconds by the time the line is done.

And the alignment also isn’t fixed, it will change each line because both oscillators are free-running.

Adding it all up, the inaccuracy is actually greater than a pixel clock period. So it may be hard to select the 512 active pixels in the middle of the line and it may also be hard to avoid capturing repeated pixels, etc.
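For anyone who wants to check the sums, here is the same error budget as a few lines of Python. It is only a back-of-envelope sketch using the figures quoted above; the 1408 PIO cycles per line (704 pixel periods at 2x) is an assumption about the capture loop, and the constant names are invented.

```python
SYS_CLK_MHZ = 133.0
MAC_PIXEL_CLK_MHZ = 15.6672
PIO_CLK_MHZ = 31.32648        # the 2x pixel clock figure used above
PIO_CYCLES_PER_LINE = 1408    # assumed: 704 pixel periods x 2 PIO cycles
MAC_PIXELS_PER_LINE = 704     # assumed: 512 active + 192 blanking

sys_period_ns = 1000.0 / SYS_CLK_MHZ
pixel_period_ns = 1000.0 / MAC_PIXEL_CLK_MHZ

# HSYNC detection uncertainty: 5 + 2 system clocks.
hsync_uncertainty_ns = 7 * sys_period_ns

# Drift across one line caused by the two clocks not matching exactly.
pico_line_ns = PIO_CYCLES_PER_LINE * 1000.0 / PIO_CLK_MHZ
mac_line_ns = MAC_PIXELS_PER_LINE * pixel_period_ns
drift_ns = pico_line_ns - mac_line_ns

total_ns = hsync_uncertainty_ns + drift_ns

print(f"pixel period       {pixel_period_ns:6.2f} ns")
print(f"HSYNC uncertainty  {hsync_uncertainty_ns:6.2f} ns")
print(f"drift over a line  {drift_ns:6.2f} ns")
print(f"total              {total_ns:6.2f} ns")
```

With these inputs the total lands just above one pixel period, which is why sampling at only 2x pixel clock leaves no margin.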


Please do try the experiment outputting the pixel clock though! I am curious to see such a direct representation of the issues I’m referring to.

edit: oh I forgot, you said you don’t have a fast enough scope. I can try it eventually. Gotta order a Pi Pico though.

Edit2:
So the solution is to run at a faster PIO clock frequency. The way you are doing it works for this, just do more nops or whatever between taking samples. Too bad you can only input one bit at a time this way. I wish you could instruct the PIO input shifter to take in a single bit without sending the whole word to the FIFO yet. Unfortunately we can only do 1 bit at a time so the overhead is abysmal. We can DMA from the FIFO into main RAM but the storage overhead is 8x or 32x (not sure if we can do bytes) so we wanna have the ARM process it quickly into the proper packed format. So therefore the loop to do this has to run at 15.6672 M iterations/sec or 8.5 ARM clocks per word processed from the DMA destination. Probably doable but tight.

Hmmmmm oh I guess we’re overclocking basically 2x though to do DVI. Hahah then the loop on the ARM will be much less constrained.
 
Nov 4, 2021
126
98
28
Tucson, AZ
The word doesn't get pushed into the FIFO until it's "full", where "full" can be set from 1 to 32 bits. I was figuring on leaving it at 32-bits so a scanline would be 64 bytes, aka 16 words, pushed to the ARM core.
A slight PLL tweak should provide an overclock to 125.33333 MHz (off-the-shelf Pi Picos run at 125 MHz but are listed as "up to 133 MHz"), which would make the PIO clock division an even 4 to get 31.3333 MHz. I'll see if I can tweak the PLLs from CircuitPython and do some timing tomorrow.
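A quick Python check of those numbers, as a sketch only: it assumes a 32-bit autopush threshold, takes the 125.33333 MHz figure at face value, and assumes a 704-pixel-period line (512 active + 192 blanking). The constant names are invented for illustration.

```python
ACTIVE_PIXELS = 512             # pixels captured per scanline
AUTOPUSH_THRESHOLD_BITS = 32    # bits shifted in before a word is pushed

words_per_line = ACTIVE_PIXELS // AUTOPUSH_THRESHOLD_BITS
bytes_per_line = words_per_line * 4

SYS_CLK_MHZ = 125.33333         # proposed PLL setting
PIO_DIVIDER = 4                 # integer divide, so no fractional jitter
MAC_PIXEL_CLK_MHZ = 15.6672
TARGET_PIO_MHZ = 2 * MAC_PIXEL_CLK_MHZ

pio_mhz = SYS_CLK_MHZ / PIO_DIVIDER
error_ppm = (pio_mhz - TARGET_PIO_MHZ) / TARGET_PIO_MHZ * 1e6

# Drift accumulated over one assumed 704-pixel-period line.
mac_line_ns = 704 * 1000.0 / MAC_PIXEL_CLK_MHZ
drift_ns = -error_ppm * 1e-6 * mac_line_ns   # positive: Pico falls behind

print(words_per_line, "words /", bytes_per_line, "bytes per scanline")
print(f"PIO clock {pio_mhz:.5f} MHz, error {error_ppm:.1f} ppm")
print(f"drift over one line {drift_ns:.2f} ns")
```

Under these assumptions the integer divide leaves only a nanosecond or two of drift per line, small next to a roughly 64 ns pixel.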
 

Zane Kaminski

Ah great, I missed that element of the autopush functionality in the datasheet. I thought that autopush meant to implicitly push at the end of every IN shift operation, as opposed to just when the shift count has been reached after multiple INs.

But do keep in mind that the system clock must run at 250 MHz (actually 251.75 MHz I guess) to do DVI and the Pico can only manage a 640x480 resolution so the video will be “letterboxed.” DVI doesn’t support anything like 512x384 and 1024x768 would require an unachievable 650 MHz system clock. 250 MHz is nearly 2x overclocked for the Pico but apparently it works well enough.
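The clock figures above follow from DVI's TMDS coding, which sends 10 bits per pixel on each data lane. A small Python sketch of the arithmetic, assuming (as this post does) that the Pico's system clock has to equal the serial bit rate; `dvi_bit_clock_mhz` is a made-up helper name:

```python
TMDS_BITS_PER_PIXEL = 10    # TMDS sends 10 bits per pixel on each data lane
PICO_RATED_MHZ = 133.0      # rated maximum system clock


def dvi_bit_clock_mhz(pixel_clock_mhz):
    """Serial bit rate per lane, taken here as the required system clock."""
    return pixel_clock_mhz * TMDS_BITS_PER_PIXEL


for name, pixel_clock in (("640x480", 25.175), ("1024x768", 65.0)):
    sys_clk = dvi_bit_clock_mhz(pixel_clock)
    print(f"{name}: {sys_clk:.2f} MHz, {sys_clk / PICO_RATED_MHZ:.2f}x rated clock")
```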

Also there’s the latency issue I mentioned yesterday. When you add up all of the latency for the PIO to detect the HSYNC falling transition, it’s massive, on the order of a whole pixel period. It’s also heavily variable, so you cannot adjust your software delays to completely compensate for it. The saving grace is that the latency is all in terms of system and PIO clock cycles, so if you increase the PIO clock to something like 8x pixel clock then the latency will get much smaller in terms of nanoseconds. And of course the main clock should be within 1% of 251.75 MHz so there is not that much flexibility to adjust the main clock to better match up the PIO cycle time with the Mac’s pixel clock.

Edit: should it have VGA instead of DVI? That way we can do 1024x768 (halved to 512x384 to fit the 512x342 Mac screen) and not have to overclock the Pico. DVI is just too fast for us to do anything more than 640x480.
 

Zane Kaminski

Okay I have it worked out pretty good for 1024x768 VGA.

For 1024x768, that’s a 65 MHz pixel clock. So let’s run the system clock at approximately 130 MHz. The PLL can’t always synthesize the exact frequency required, and we also want to tweak it to line up better with the Mac’s pixel or line timing. So I would impose a limitation that the Pico system clock should be within +/- 0.25% of 130 MHz. The allowable deviation could be more but a quarter of a percent should suffice.

My choice for the system clock would be 130.2336 MHz, or about 0.18% above twice the nominal 65 MHz pixel clock for 1024x768. The specific frequency is chosen to give nominally 5852 Pico clocks per line of video from the Mac. It will not be exact but the alignment will hold for a line as you originally mentioned. Alignment will have to be re-established with each HSYNC cycle.

Then the PIO would run at 8x pixel clock to minimize synchronization latency and latency variance. The division factor would be 1 + 10/256 or ~1.039. So the PIO would run most clocks but skip one clock per 25.6 to keep aligned with the middle of the pixels. That’s 7.67851 ns jitter but a pixel is like 63.8 ns wide so no big deal. And of course the PIO rate is 8x so the loop should have 8 instructions.
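These figures can be reproduced exactly with rational arithmetic. The Python below is just a worked check of the numbers above; it assumes a 704-pixel-period Mac line (512 active + 192 blanking), and the variable names are invented.

```python
from fractions import Fraction

MAC_PIXEL_CLK = Fraction(156672, 10000)   # 15.6672 MHz
SYS_CLK = Fraction(1302336, 10000)        # 130.2336 MHz
PIXEL_PERIODS_PER_LINE = 704              # assumed: 512 active + 192 blanking
OVERSAMPLE = 8                            # PIO clocks per Mac pixel

# System clocks per Mac line.
clocks_per_line = SYS_CLK / MAC_PIXEL_CLK * PIXEL_PERIODS_PER_LINE

# Divider needed so the PIO runs at 8x the Mac pixel clock.
divider = SYS_CLK / (OVERSAMPLE * MAC_PIXEL_CLK)
frac_256 = (divider - 1) * 256            # fractional part in 1/256 steps

# One system clock is skipped every this many PIO clocks.
pio_clocks_per_skip = 256 / frac_256

# Size of each skip, i.e. one system clock period.
jitter_ns = float(1000 / SYS_CLK)

print("clocks per line:", clocks_per_line)
print("divider:", float(divider), "= 1 +", frac_256, "/ 256")
print("PIO clocks per skipped clock:", float(pio_clocks_per_skip))
print(f"jitter: {jitter_ns:.5f} ns")
```

Both the clocks per line and the fractional step come out as whole numbers at this frequency, which is what makes 130.2336 MHz an attractive choice.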


I’m excited!! I haven’t redone the latency calculation but this will be as good as it gets without overclocking. I think it’ll suffice for making sure we are capturing the right pixels and right in the middle to avoid random data from capturing during the transitions.
 

Zane Kaminski

Hmm okay there is one gotcha. The issue is that the RP2040/Pico's PLL can't actually do 130.2336 MHz. 130 MHz is as close as we can get. Unfortunately at 130 MHz, it's difficult to get the divided PIO clock frequency accurate enough to not drift over one line. At 130.2336 MHz, the division ratio to get 8x pixel clock would be 1 + 10/256. But at 130 MHz, we would want to divide by 1 + 9.5/256. Unfortunately we can only go in 1/256 steps so this is impossible. If we instead choose 9/256 or 10/256 for the fractional component, there is an entire pixel clock worth of drift by the end of the video line.

No big deal though. We can choose 9/256 as the fractional divide ratio. Uncompensated, that would result in skipping one pixel by the end of the line. Fortunately the PIO state machine input loop will be 8 instructions long and most other than the IN instruction will be NOPs. We can probably replace some of those NOPs with code that waits for an extra clock per 64 pixels gathered. 64 pixels times 8x PIO rate is 512 clocks so this effectively accomplishes adding an extra 1/512 or 0.5/256 to the PIO division ratio.
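To put rough numbers on the drift and the fix, here is a small Python estimate. It is a sketch under simplifying assumptions: it uses the average PIO period (ignoring cycle-by-cycle jitter), treats the compensation as one extra PIO clock per 64 pixels, and only looks at the 512 active pixels. `drift_pixels` is a hypothetical helper, not part of any library.

```python
SYS_CLK_MHZ = 130.0
MAC_PIXEL_CLK_MHZ = 15.6672
OVERSAMPLE = 8          # PIO clocks per Mac pixel
ACTIVE_PIXELS = 512


def drift_pixels(frac_256, extra_clock_every_n_pixels=None):
    """Sample-point drift, in Mac pixels, after one line of active video.

    Positive means the capture loop finishes early (running too fast),
    negative means it finishes late (running too slow).
    """
    pixel_period_ns = 1000.0 / MAC_PIXEL_CLK_MHZ
    pio_period_ns = (1 + frac_256 / 256) * 1000.0 / SYS_CLK_MHZ
    pio_clocks = ACTIVE_PIXELS * OVERSAMPLE
    if extra_clock_every_n_pixels:
        # Software compensation: wait one extra PIO clock per N pixels.
        pio_clocks += ACTIVE_PIXELS // extra_clock_every_n_pixels
    elapsed_ns = pio_clocks * pio_period_ns
    ideal_ns = ACTIVE_PIXELS * pixel_period_ns
    return (ideal_ns - elapsed_ns) / pixel_period_ns


print(f"9/256, uncompensated:  {drift_pixels(9):+.3f} px")
print(f"10/256, uncompensated: {drift_pixels(10):+.3f} px")
print(f"9/256, +1 clock per 64 pixels: {drift_pixels(9, 64):+.3f} px")
```

By this estimate either hardware setting alone drifts by about a pixel per line, while 9/256 plus the extra clock every 64 pixels stays within a small fraction of a pixel.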
 

retr01

Algorithms! We need to get it right to accomplish desired results. :sneaky:(y)

So, 130 MHz plus .2336 MHz to make the screen very similar to the original B/W 1-bit output on the original analog sweep algorithm to the stock CRT. I can see why it is tricky.

@Zane Kaminski, is truncating .2336 MHz disastrous in terms of quality and achieving a high degree of similarity?
 

Zane Kaminski

Uncompensated, the 0.2336 MHz truncation will make the sample point drift later and later, eventually skipping a whole pixel somewhere on the screen and instead capturing the 513th pixel, which is always black. Yeah, that would be disastrous, although as we have discussed before, there are other gizmos that have an issue like this and nobody seems to really care.

The solution is to set the PIO to be slightly too fast but then to skip a cycle from time to time. This is how the hardware frequency divider works but we need to skip an extra 1 cycle for every 512. The PIO hardware divider only lets you skip in increments of 1 cycle per 256. If we kept the old PIO clock divide settings, we would be slow according to a 130/130.2336 ratio and we'd end up missing one pixel out of 512 and getting the black edge instead. So instead we set the hardware divider for the next faster increment. This will be too fast so uncompensated, we'd end up skipping the final real pixel and having one repeated pixel somewhere. To fix this, we must algorithmically skip an extra cycle from time to time in the pixel acquisition loop. This allows us to essentially emulate in software the ability to skip clock cycles in 1/512 increments, finer than the hardware alone can do. This adds more jitter but it will be fine since the Mac has a relatively low pixel rate and so the pixel period is long. A pixel period is 63.8 nanoseconds on the Mac so if we have additional jitter of one 130 MHz clock period (about 7.7 nanoseconds) that's insignificant.

The misalignment issue is like waiting in traffic watching another car's turn signal. Your car is flashing the turn signal and so is the car in front of you. They get aligned for a little bit and they're blinking together, then the inevitable frequency difference causes phase differences to accumulate and they start blinking at opposite times before it spins back around and they're going together again. This is the behavior we don't want. The other car's turn signal is like the pixels coming out of the Mac, blinking at the rate of 15.6672 MHz. Your car's turn signal is like the pixel acquisition rate of the video converter. So we set the PIO to be slightly too fast. That's like making your turn signal blink slightly faster than the other car. Then as you see the two turn signals get misaligned, you press a button that makes the next blink take a little bit longer than usual. This brings the alignment back in sync for a while. In the video converter gizmo, we know that we need to wait an extra 1/8 of a pixel period (one PIO clock) per 64 pixels captured.
 