Narcoleptic MDD Diagnosis (Solved)

helmetguy

New Tinkerer
Apr 1, 2024
22
24
3
This post is more to document my attempts at stopping my MDD from randomly going into sleep mode.

This MDD was bought from a seller who mentioned in passing that it sometimes took "a few attempts" to boot up. OK, so it has PSU issues (what else is new with MDDs). Those issues have now presumably been resolved by ATX-swapping the PSU.

Now that it has reliable power, it goes into sleep mode completely at random. The timing ranges from 45 minutes after boot to immediately after OSX loads, but averages around 8 minutes from a cold boot. Waking the system after the initial sleepy-time usually sends it straight back into sleep mode.

A look in system.log reveals this delightful little message:

Code:
localhost kernel: Power Management received emergency overtemp signal. Going to sleep.
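If you want to put numbers on how often that message turns up, a small script can pull the overtemp lines out of a copy of system.log and work out the gaps between them. This is a minimal sketch, not something from the original troubleshooting: it assumes the usual syslog-style timestamp prefix (the post only shows the line from "localhost kernel:" onwards), it's Python 3 meant to run on a modern machine rather than on the MDD itself, and the function names and sample lines are made up for illustration.

```python
import re
from datetime import datetime

# Assumed syslog-style format (the timestamp prefix is an assumption), e.g.
# "Apr  1 12:34:56 localhost kernel: Power Management received emergency overtemp signal. Going to sleep."
OVERTEMP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"Power Management received emergency overtemp signal"
)

def overtemp_events(lines, year=2024):
    """Return a datetime for every overtemp sleep message in `lines`."""
    events = []
    for line in lines:
        m = OVERTEMP_RE.match(line)
        if m:
            # Collapse the double space syslog uses before one-digit days.
            stamp = " ".join(m.group("stamp").split())
            # syslog omits the year, so a (hypothetical) default is supplied.
            events.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return events

def minutes_between(events):
    """Gap in minutes between each pair of consecutive overtemp events."""
    return [(b - a).total_seconds() / 60 for a, b in zip(events, events[1:])]

if __name__ == "__main__":
    # Made-up sample lines, purely for illustration.
    sample = [
        "Apr  1 12:00:00 localhost kernel: Power Management received emergency overtemp signal. Going to sleep.",
        "Apr  1 12:00:05 localhost loginwindow: unrelated message",
        "Apr  1 12:08:30 localhost kernel: Power Management received emergency overtemp signal. Going to sleep.",
    ]
    ev = overtemp_events(sample)
    print(len(ev), minutes_between(ev))  # 2 [8.5]
```

To use it on a real log, read the lines from a copy of /var/log/system.log and pass them to overtemp_events. Note the gaps are between consecutive sleeps, not from cold boot.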

Now, these sudden sleep events aren't preceded by the CPU fan ramping up, and the fan doesn't even spin faster than idle. This made me assume that an errant software overtemp signal was the cause, rather than something actually overheating, especially since temperature-monitoring software never showed the CPU board going above 50°C before the system went to sleep.

I tried:

- Resetting the PMU. Nope, it still randomly goes into sleep mode.

- Checking the PRAM battery. It read 3.65V on a multimeter. That can't be the cause, then.

- Booting into Open Firmware and entering "set-defaults". This was one of the instances where sleep kicked in immediately once OSX had booted afterwards.

- Changing the HDD. There are evidently two temp sensors that software can monitor on an MDD: the CPU board and the HDD. So I assumed that the HDD temp sensor was borking out. Nope, that wasn't it.

- Changing PSUs. Perhaps the (admittedly somewhat old) PSU was on the way out. Nope, that wasn't it either.

- Changing the RAM. This was an unlikely fix, but it bothered me that it had an (officially unsupported) 1GB DIMM installed, so that was swapped for 2x 256MB DIMMs. Unsurprisingly, this didn't fix the issue, and it made OSX run like treacle.

- Booting into OS9 instead, in case OSX was sending wonky overtemp signals. Nope, the random sleep still happened. Interestingly enough, 30 minutes of AHT (Apple Hardware Test) or the OSX installer doesn't cause random sleep or a hardware shutdown, but it does make the CPU fan mimic a Lamborghini.

- Re-pasting the CPUs, despite software showing them nowhere near overheating, and despite the heatsink getting quite warm (which suggests good contact with the CPUs). The thermal paste job left by whichever previous owner was already pretty good. I cleaned it off and applied a thin, spread layer of AS5 like it was 2005 and I was trying to keep a Prescott P4 cool. That didn't stop the sleepy-time. Neither did letting the AS5 cure for a couple of days.

- Running the MDD with the door open and the CPU fan propped up against the heatsink. There have been moments where opening the case during operation has been like opening a hot oven. But the random sleep persisted.

All this led me in a complete circle of assumptions. My current one is that the CPUs actually are overheating. More specifically, their temps are spiking quickly and then settling down (faster than monitoring software can catch), and those spikes are triggering the overtemp signals.
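The "faster than monitoring software can catch" idea is basically a sampling problem: if a monitor only polls the sensor every few seconds, a spike that starts and ends between two polls never shows up in its readings. Here's a toy illustration of that, with a completely made-up temperature curve and made-up polling intervals (the real polling rate of the monitoring apps isn't known here, and this isn't a model of the MDD's actual sensor):

```python
def cpu_temp(t):
    """Hypothetical CPU temperature in deg C at time t (seconds):
    45 C at idle, with a 2-second spike to 90 C starting at t = 31 s."""
    return 90.0 if 31 <= t < 33 else 45.0

def peak_seen(temp_fn, duration_s, interval_s):
    """Highest temperature a monitor polling every interval_s seconds would report."""
    return max(temp_fn(t) for t in range(0, duration_s, interval_s))

# Polling every second lands inside the spike.
fine_peak = peak_seen(cpu_temp, 60, 1)
# Polling every 5 s reads at t = 30 and t = 35, straddling the spike entirely.
coarse_peak = peak_seen(cpu_temp, 60, 5)
print(fine_peak, coarse_peak)  # 90.0 45.0
```

With the 5-second poll, the monitor would happily report 45°C the whole time even though the "CPU" hit 90°C in between readings.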

So maybe the CPUs aren't making good contact with the heatsink after all..... And then I noticed this:

[Image: IMG_2513.jpeg]


The CPU board is bowed right by CPU 1. This could mean that parts of the CPU dies aren't making good contact with the heatsink. Well then, that could very well be the cause of the overtemp signals.

So my next step is to experiment with thin thermal pads to figure out where the CPUs aren't making good contact with the heatsink. The less-squished areas of the pads should show this. And then I guess I'll go from there.....

TL;DR - if your MDD has narcolepsy, check the CPU board for warps 😢

helmetguy

Good thought. The local recycler has one that I could hopefully grab in the weekend.

Meanwhile, further testing has thrown my rapid-overheating theory out the window. Granted, the bow in the CPU board does seem to be enough to cause a difference in contact on the heatsink:


[Image: IMG_2517.jpeg]


But the "overtemp" sleep still occurs with CPU power nap (from CHUD tools) enabled and the CPU board sensor reading 33°C, way lower than in other sleep occurrences. Now I've arrived back at the "errant overtemp signal" theory, and it would suck if the sensor on the CPU board were the cause, since it's a dual 1.42GHz board.

I'm THIS 🤏 CLOSE to just commenting "(void) sleepSystem ();" out of the kernel and calling it fixed.....
 

helmetguy

Last weekend I got a chance to go down to the e-waste recycler with my MDD. No, not to drop it off (though that was tempting), but to experiment with another MDD that they had.

This other MDD had a dual 867MHz CPU board, which I chucked into my MDD to see if it would reproduce the sleep issue. After about 10 minutes it went into sleep mode. OK, phew, it wasn't the CPU board causing the issue.

The recycler then graciously let me move all of my MDD's parts, except the logic board, into their one to test. And it hummed along for 40 minutes with no issues.

That left the logic board as the suspected sleep-causer. As much as I hate ditching things without fixing them first, and as much as I hate the prospect of an MDD logic board going to scrap, I took the easy way out and traded logic boards.
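The swap testing across this thread boils down to a simple process of elimination. A rough sketch of that logic, assuming a single faulty part and that each test swapped exactly the one part listed (a simplification: the final test also put everything into the recycler's case). The SwapTest class and the part labels are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SwapTest:
    part_swapped: str   # hypothetical label for the component that was replaced
    still_sleeps: bool  # did the random overtemp sleep still happen afterwards?

def narrow_down(suspects, tests):
    """Single-fault elimination: if the sleep survives a swap, that part is cleared;
    if the sleep stops, that part becomes the only remaining suspect."""
    remaining = set(suspects)
    for test in tests:
        if test.still_sleeps:
            remaining.discard(test.part_swapped)
        else:
            remaining &= {test.part_swapped}
    return remaining

suspects = {"HDD", "PSU", "RAM", "OSX", "CPU board", "logic board"}
tests = [
    SwapTest("HDD", True),
    SwapTest("PSU", True),
    SwapTest("RAM", True),
    SwapTest("OSX", True),           # booting OS9 instead
    SwapTest("CPU board", True),     # dual 867MHz board from the recycler's MDD
    SwapTest("logic board", False),  # everything else moved onto the recycler's logic board
]
print(narrow_down(suspects, tests))  # {'logic board'}
```

The single-fault assumption is doing a lot of work here: with two flaky parts at once, a "still sleeps" result would wrongly clear a part that was also bad.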

Then it was time to change the bus speed on the "new" logic board from 133MHz to 167MHz. Huge thanks to The House of Moth for their extremely helpful PPC overclocking page, which showed where the bus speed resistors are on the logic board. They are freaking tiny. Here they are with the side of my pinky for scale:

[Image: IMG_2569.jpeg]


And here is the removed R676 resistor on the end of my tweezers (it's that black speck):

[Image: IMG_2572.jpeg]
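For anyone wondering why the bus speed needed changing: the "new" logic board came out of the recycler's dual 867MHz MDD, set up for a 133MHz bus, while this MDD's CPU board is a dual 1.42GHz one. Core clock is bus clock times the CPU multiplier, so assuming the multiplier stays fixed on the CPU board, the 1.42GHz board would presumably end up running at roughly 1.13GHz on the 133MHz bus. The multipliers below are inferred from the nominal speeds, not read off the boards:

```python
# Nominal MDD bus clocks in MHz: "133" and "167" are really 400/3 and 500/3.
BUS_133 = 400 / 3
BUS_167 = 500 / 3

# Multipliers inferred from the nominal CPU speeds (an assumption):
MULT_867 = 6.5   # 133.33 MHz x 6.5 = ~867 MHz
MULT_1420 = 8.5  # 166.67 MHz x 8.5 = ~1417 MHz, sold as 1.42GHz

def core_clock(bus_mhz, multiplier):
    """CPU core clock = bus clock x CPU multiplier."""
    return bus_mhz * multiplier

for label, bus in (("133MHz bus", BUS_133), ("167MHz bus", BUS_167)):
    print(f"dual 1.42GHz board on {label}: {core_clock(bus, MULT_1420):.0f} MHz")
# dual 1.42GHz board on 133MHz bus: 1133 MHz
# dual 1.42GHz board on 167MHz bus: 1417 MHz
```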


With the logic board cleaned up and reinstalled, and ignoring a slight false alarm, I hopefully now have a 100% working MDD. Finally... it has only taken around 6 months.

TL;DR - the sleep issue was solved by replacing the logic board.