Asynchronous HFS+ File Reads Under Classic Mode

siddhartha

New Tinkerer
Jul 24, 2024
8
2
3
This is a crosspost from the md5classic Macintosh Garden page.


I have uncovered an absolutely insane issue with Classic Mode in (at least) 10.4 Tiger. As far as I can tell, it doesn't fully support asynchronous file reads.

According to Apple, while an asynchronous read with PBReadForkAsync(FSForkIOParamPtr fsForkPtr) is executing, we can continuously poll fsForkPtr->ioResult. If it's a positive value, the task is still running; anything else means the task has completed and the data is ready to manipulate. (Documentation for PBReadForkAsync here).

The problem is that fsForkPtr->ioResult can stop updating non-deterministically. If that happens, the standard loop:

while (fsForkPtr->ioResult > 0) {};

will run infinitely.
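For anyone who wants to reproduce this without hanging the machine, here is a minimal sketch of a bounded version of that loop. It is portable C with stand-ins rather than real File Manager code: MockParamBlock, PollHook, WaitForIO and CountdownHook are all hypothetical names, and the hook only simulates a request that finishes after a few polls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ioResult field of a parameter block. */
typedef struct {
    volatile short ioResult;   /* > 0 means the request is still in progress */
} MockParamBlock;

/* Hypothetical hook called once per poll, e.g. to yield time. */
typedef void (*PollHook)(MockParamBlock *pb, void *context);

/* Poll until ioResult is no longer positive or maxPolls is used up.
   Returns 1 if the request finished, 0 if we gave up waiting. */
int WaitForIO(MockParamBlock *pb, unsigned long maxPolls,
              PollHook hook, void *context)
{
    unsigned long polls = 0;

    assert(pb != NULL);
    while (pb->ioResult > 0) {
        if (polls++ >= maxPolls) {
            return 0;           /* looks stalled: the caller decides what to do */
        }
        if (hook != NULL) {
            hook(pb, context);
        }
    }
    return 1;
}

/* Demo hook: pretends the request finishes after *context more polls. */
void CountdownHook(MockParamBlock *pb, void *context)
{
    unsigned long *remaining = (unsigned long *)context;

    if (*remaining > 0 && --(*remaining) == 0) {
        pb->ioResult = 0;       /* 0 plays the role of noErr here */
    }
}
```

Calling WaitForIO(&pb, 10, CountdownHook, &remaining) with remaining set to 3 returns 1, while a block whose ioResult never changes returns 0 after the poll budget runs out instead of spinning forever.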

HOWEVER, if you first read the file into memory with something like Resorcerer, then a subsequent asynchronous read of the file will be perfect.

I created an application (HFSPlusAsyncTest) to easily test for the bug. This archive contains both a Classic and Carbon version. The Carbon version will run fine in OS X, but will show the bug in Classic Mode. Source code for the Classic version is here.

Here's some sample output opening a file for the first time:

Code:
File: ***
Data Fork Size: 1536133 bytes
[Thread 1]      Waiting for data (1)
[Thread 1][0]   32768 bytes read. Mark now @00008000. Data: 5349542100010017
[Thread 2]      Waiting for data (1)
... (total of 10 loops)
[Thread 0]      Waited too long!
[Thread 2][1]   0 bytes read. Mark now @00000000. Data: 0
[Thread 1][2]   32768 bytes read. Mark now @00010000. Data: f14d29a5552b0f5c
[Thread 2][3]   32768 bytes read. Mark now @00018000. Data: a815036e15ff0b48
[Thread 1][4]   32768 bytes read. Mark now @00020000. Data: c7efc0fa3938c10a
[Thread 2][5]   32768 bytes read. Mark now @00028000. Data: 1c4644ef6d27de0
[Thread 1]      Waiting for data (1)
[Thread 1]      Waiting for data (1)
[Thread 1]      Waiting for data (1)
[Thread 1][6]   32768 bytes read. Mark now @00030000. Data: 9efb957338ca6667
[Thread 2]      Waiting for data (1)
... (total of 10 loops)
[Thread 0]      Waited too long!
[Thread 2][7]   32768 bytes read. Mark now @00028000. Data: 1c4644ef6d27de0
[Thread 1][8]   32768 bytes read. Mark now @00038000. Data: 79545654b2d0573
...
[Thread 1][48]  28805 bytes read. Mark now @00177085. Data: de8e3ae1191475da
[Thread 1]      ioResult was -39 (EOF)

------
And then here is the output opening the same file right after:
------
Code:
File: ***
Data Fork Size: 1536133 bytes
[Thread 1][0]   32768 bytes read. Mark now @00008000. Data: 5349542100010017
[Thread 2][1]   32768 bytes read. Mark now @00010000. Data: f14d29a5552b0f5c
[Thread 1][2]   32768 bytes read. Mark now @00018000. Data: a815036e15ff0b48
[Thread 2][3]   32768 bytes read. Mark now @00020000. Data: c7efc0fa3938c10a
[Thread 1][4]   32768 bytes read. Mark now @00028000. Data: 1c4644ef6d27de0
[Thread 2][5]   32768 bytes read. Mark now @00030000. Data: 9efb957338ca6667
[Thread 1][6]   32768 bytes read. Mark now @00038000. Data: 79545654b2d0573
[Thread 2][7]   32768 bytes read. Mark now @00040000. Data: 8099ab9ef4ea2ce4
[Thread 1][8]   32768 bytes read. Mark now @00048000. Data: 3f3c8dffdcd87bfb
...
[Thread 1][46]  28805 bytes read. Mark now @00177085. Data: de8e3ae1191475da
[Thread 1]      ioResult was -39 (EOF)

You can see in the first instance that the value 1c4644ef6d27de0 is repeated. This is because it's reusing the previous data buffer for that thread (#2). This is not good for hashing.

Anyway, all this is to say that Classic Mode is killing me. If I add in any checks to the code then it's going to slow it down considerably. I scoured the internet for anyone who has come across this bug and I couldn't find anything. Do I somehow test for Classic Mode and just use synchronous reads? Has anyone successfully used asynchronous reads in Classic Mode?
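On the question of testing for Classic Mode: as far as I know, Gestalt.h defines a gestaltMacOSCompatibilityBoxAttr selector ('bbox') whose bit 0 (gestaltMacOSCompatibilityBoxPresent) is set when running under Classic. The sketch below assumes those values and takes the Gestalt call as a function pointer so it compiles anywhere. GestaltLike, RunningInClassic and the two fakes are hypothetical names, not part of any Apple header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values from Gestalt.h: selector 'bbox', bit 0 means "present". */
#define kCompatBoxSelector   0x62626F78UL   /* 'bbox' */
#define kCompatBoxPresentBit 0

/* Hypothetical function-pointer type with the same shape as Gestalt(). */
typedef int16_t (*GestaltLike)(uint32_t selector, int32_t *response);

/* Returns 1 if the compatibility-box bit is set, 0 otherwise.
   An error from the selector is treated as "not Classic". */
int RunningInClassic(GestaltLike gestalt)
{
    int32_t response = 0;

    assert(gestalt != 0);
    if (gestalt(kCompatBoxSelector, &response) != 0) {
        return 0;
    }
    return (response & (1L << kCompatBoxPresentBit)) != 0;
}

/* Fake for testing: behaves like a system running inside Classic. */
int16_t FakeGestaltClassic(uint32_t selector, int32_t *response)
{
    if (selector != kCompatBoxSelector) {
        return -5551;                   /* gestaltUndefSelectorErr */
    }
    *response = 1L << kCompatBoxPresentBit;
    return 0;
}

/* Fake for testing: behaves like a system that lacks the selector. */
int16_t FakeGestaltNative(uint32_t selector, int32_t *response)
{
    (void)selector;
    (void)response;
    return -5551;                       /* gestaltUndefSelectorErr */
}
```

In a real build you would pass a thin wrapper around Gestalt() and fall back to synchronous reads when RunningInClassic returns 1. Whether that fallback costs much hashing speed is something I have not measured.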
 

David Cook

New Tinkerer
Jul 20, 2023
24
22
3
Inside Macintosh claims the SCSI manager is not capable of asynchronous operation. Furthermore, any call that requires a memory allocation also does not work asynchronously.

When you pre-read the file using Resorcerer, it is likely fitting into the cache. This would avoid calls to the SCSI manager and thus could operate asynchronously.

- David
 

siddhartha

David Cook said:
Inside Macintosh claims the SCSI manager is not capable of asynchronous operation. Furthermore, any call that requires a memory allocation also does not work asynchronously.

When you pre-read the file using Resorcerer, it is likely fitting into the cache. This would avoid calls to the SCSI manager and thus could operate asynchronously.

- David

As I understand it, this is a later era than that. It's not technically the SCSI Manager anymore. Everything works fine in native OS 9; it's just Classic Mode that is causing problems. I wonder what the discrepancy is between native OS 9 and the emulation layer.
 

siddhartha

I just rewrote the test program to use PBReadAsync() and the results are the same.

EDIT:

What a wild ride. I was going through Files.h and Devices.h to see if there were any other possible helper functions and there it was:
PBWaitIOComplete is a friendly way for applications to monitor
a pending asynchronous I/O operation in power-managed and
preemptive multitasking systems.

I was only able to find anything useful written about this function in the Core Services Framework Reference.

PBWaitIOComplete
Keeps the system idle until either an interrupt occurs or the specified timeout value is reached.

OSErr PBWaitIOComplete (
ParmBlkPtr paramBlock,
Duration timeout
);

Parameters
paramBlock
A pointer to a basic File Manager parameter block.

timeout
The maximum length of time you want the system to be kept idle.

Return value
A result code. See “Device Manager Result Codes” (page 423). If the timeout value is reached, returns kMPTimeoutErr.

Availability
Available in CarbonLib 1.0 and later when running Mac OS 9 or later.
Available in Mac OS X 10.0 and later.

The Catch:
This function will only work with a ParamBlockRec, not an FSForkIOParam needed for >2GB files...
 

siddhartha

To give an update, I cast an FSForkIOParamPtr to a ParmBlkPtr and passed it to PBWaitIOComplete(). No luck. I even manually changed the ioParam.ioTrap field. The behaviour is still the same as checking ioParam.ioResult: when there is a hang, it's always 1.

Stepping through PBWaitIOComplete(), I can see that there is a delay called based on the specified argument, but even writing my own manual delay and even a SystemTask() call does not help.

I would have expected PBWaitIOComplete() with the cast ParmBlkPtr to actually work, since it works beautifully with a legitimate ParamBlockRec and a call to PBReadAsync(), but I suppose more investigation is needed. It's tough to debug because any breakpoints within PBWaitIOComplete() are overwritten, and when you single-step through the code, the asynchronous read does not seem to hang.
 

joevt

Tinkerer
Mar 5, 2023
73
32
18
Why not set a completion routine in the parameter block, then use that to set a flag or to fixup ioResult?

Your wait loop is not waiting for a specific time period - it depends on how fast it can printf 10 times.

I don't see where you kill the I/O when it's taking too long. Then you reuse that param block even though the system might not be done with it? Seems wrong.

Set up a timer if you want it to kill the I/O when it's taking too long.
 

siddhartha

Yeah, it's just a quick and dirty test program. The actual application is calculating the MD5 hash and there are better sanity checks. Just to reiterate, this is only an issue running in Classic Mode as native OS 9 does not have this problem.

joevt said:
Your wait loop is not waiting for a specific time period - it depends on how fast it can printf 10 times.

I don't see where you kill the I/O when it's taking too long. Then you reuse that param block even though the system might not be done with it? Seems wrong.

Set up a timer if you want it to kill the I/O when it's taking too long.

Since I want the hashing to be as fast as possible, any timers or delays are going to be at a significant performance cost. The wait loop is crude, but I find if anything takes longer than about 4 iterations, it will ultimately be the bug and get stuck in an infinite loop. No delay will fix the problem -- you'll be waiting indefinitely. I just wanted the test program to demonstrate that "hey, this would have been an infinite loop here".

joevt said:
Why not set a completion routine in the parameter block, then use that to set a flag or to fixup ioResult?

This was something I did in fact try. The problem with setting a global flag (e.g. volatile Boolean gIsIOComplete) is that it can lead to an odd race condition where one thread is stuck, but the other thread can kick the flag true, freeing the stuck thread. When that happens the stuck thread returns zero for actualCount, which is fine as the other thread will pick up the missing data. Although this is a potential fix for whatever bug is happening, the problem is that when both threads are stuck, the flag will never be set to true.

Fixing ioResult is something that I hadn't thought about, and it does alleviate the race condition above, but, as you may have guessed from reading above, it doesn't work. A stuck thread is indeed stuck, such that it won't update ioResult on its own, nor will it invoke the completion routine.
 

joevt

Each paramblock should have its own flag - you can make it part of a metaparamblock which can hold other info about the queued I/O such as timer info or whatever.
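A rough sketch of what such a metaparamblock could look like, in portable C with a stand-in for the real parameter block. MockParamBlock, MetaParamBlock and MockCompletion are hypothetical names. The idea is just that the parameter block is the first member, so the pointer handed to the completion routine can be cast back to the enclosing struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for FSForkIOParam or ParamBlockRec. */
typedef struct {
    volatile short ioResult;
} MockParamBlock;

/* Wrapper holding per-request state next to the parameter block. */
typedef struct {
    MockParamBlock pb;          /* must stay first so the pointers coincide */
    volatile int   done;        /* completion flag for this request only */
    unsigned long  chunkIndex;  /* example of extra bookkeeping */
} MetaParamBlock;

/* Completion routine: receives the parameter block pointer and marks
   only the request that owns it. */
void MockCompletion(MockParamBlock *pb)
{
    MetaParamBlock *meta;

    assert(pb != NULL);
    meta = (MetaParamBlock *)pb;    /* valid because pb is the first member */
    meta->done = 1;
}
```

With two MetaParamBlock instances, calling MockCompletion(&a.pb) sets a.done and leaves b.done alone, so one request finishing cannot release a wait on the other.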

Does PBReadForkAsync return an error as the function result? I mean, does it return noErr all the time? I wonder if the second PBReadForkAsync actually succeeded in queuing the read in the first place.

You queue two reads both with fsAtMark. Maybe use absolute positions so there's no confusion about which part each paramblock is supposed to read.
Maybe bzero the paramblock before each read (similar to what happens with the first two reads) just to see if that has an effect.

I don't think async in classic macOS means that different threads are used. It just means that the I/O can happen while other code is running. I understand that you are queuing multiple reads so that I/O will always be happening while doing the calculations and in this case the thread # just represents which paramblock you are waiting for.
 

siddhartha

I appreciate the reply. Yeah, even using meta flags in each parameter block wouldn't help since the Completion function doesn't trigger on a stall anyway.

The ioResult is the result code of the function. The actual function itself returns void since it's an asynchronous function.

Yes, sorry about the thread terminology, it's definitely not doing simultaneous file reads.

I tried using fromMark positioning and the result was the same. It's extra logic to test for an actualCount and then increment the offset too. It's nice that it's all automatic with atMark, because even a stalled read won't increment the mark.
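For reference, the extra bookkeeping for explicit offsets can stay fairly small. This is a sketch in portable C, not the code from the test program: MockRead, ReadCursor, ArmNextChunk and RearmAfterCompletion are hypothetical names, and it ignores short reads at EOF and the question of whether a stalled block is safe to reuse.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of one queued read. */
typedef struct {
    uint64_t positionOffset;  /* absolute offset this request reads from */
    uint32_t requestCount;    /* bytes requested */
    uint32_t actualCount;     /* bytes delivered, filled in on completion */
} MockRead;

/* Shared cursor: where the next not-yet-requested chunk starts. */
typedef struct {
    uint64_t nextOffset;
} ReadCursor;

/* Point a request at the next unread chunk and advance the cursor. */
void ArmNextChunk(MockRead *r, ReadCursor *cursor)
{
    assert(r != 0 && cursor != 0);
    r->positionOffset = cursor->nextOffset;
    r->actualCount = 0;
    cursor->nextOffset += r->requestCount;
}

/* After a request completes: if it delivered nothing, keep its offset
   so the same chunk is requested again; otherwise move it along. */
void RearmAfterCompletion(MockRead *r, ReadCursor *cursor)
{
    assert(r != 0 && cursor != 0);
    if (r->actualCount == 0) {
        return;                 /* stalled read: retry the same offset */
    }
    ArmNextChunk(r, cursor);
}
```

Each block then always knows which part of the file it is responsible for, so a stalled read is retried at the same offset rather than silently skipped.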

I will try reinitializing the parameter blocks (save for the offset) on each read though, that might be something...

At this point, I'm really interested in what PBWaitIOComplete is doing because it seems to fix the issue. If there was only a way to rewrite it for HFS+ Fork functions...
 

siddhartha

I might not have time coming up to dig deeper, but I wanted to give an update.

Progress has been made.

I originally had a simple loop incrementing a counter while waiting for the asynchronous read to complete. Not only would ioResult always be ioInProgress, but it wouldn't call the Completion routine.

I threw in a call to Delay() to mimic PBWaitIOComplete(), but ioResult still would not update. However, upon further testing, this would cause the Completion routine to be called! Even calling Delay(1, NULL); is enough. All the completion routine does is set a global flag. If it does stall, it will only take one or two iterations and then it's back to reading without any delays.

There are a couple interesting things to note:

1. I think there is another race condition happening since sometimes the flag in the Completion routine won't be set, but ioResult will be 0 (i.e. completed). To avoid an unnecessary iteration, I only call Delay() if the global flag is not set and ioResult == ioInProgress.

2. The Completion routine still needs to set a global flag. I tried setting ioResult to 0 instead, but it would sometimes flip to a 1 (ioInProgress) right after and then hang the loop.

So, there are times when ioResult is 0 (done), but the completion routine is not triggered. There are other times when the routine is triggered, but ioResult is still ioInProgress. In the first case the data is there ready to go; in the second case you need to call Delay() to rescue the data.
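Written out as a stand-alone sketch, the rule is: treat the read as finished if either the flag is set or ioResult is no longer positive, and only call the delay when neither is true. The code below is portable C with mock types. MockRequest, DelayLike, WaitWithDelay and DelayThatFiresCompletion are hypothetical names, and the demo delay simply imitates the second case, where the completion routine fires but ioResult stays at 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one request: flag plus ioResult. */
typedef struct {
    volatile int   done;       /* set by the completion routine */
    volatile short ioResult;   /* > 0 means still in progress */
} MockRequest;

/* Hypothetical delay hook, standing in for Delay(1, NULL). */
typedef void (*DelayLike)(MockRequest *req);

/* Wait until the flag is set or ioResult leaves the in-progress state.
   Returns the number of delay calls needed, or -1 after maxDelays. */
int WaitWithDelay(MockRequest *req, DelayLike delay, int maxDelays)
{
    int delays = 0;

    assert(req != NULL && delay != NULL);
    while (!req->done && req->ioResult > 0) {
        if (delays++ >= maxDelays) {
            return -1;          /* both signals stuck: give up */
        }
        delay(req);
    }
    return delays;
}

/* Demo delay: the completion routine fires during the delay, but
   ioResult is left untouched at its in-progress value. */
void DelayThatFiresCompletion(MockRequest *req)
{
    req->done = 1;
}
```

A request that already has ioResult at 0 returns immediately with zero delays, and one that needs rescuing returns after a single delay even though its ioResult still reads 1.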

Wut.
 

siddhartha

Some more notes. Replacing Delay(1, NULL); with SystemTask(); does not help. I thought perhaps it would, but for large files, there are still many timeouts.

To further illustrate the bizarreness, compare these two pseudocode blocks. The first block (which checks gIOComplete AND ioResult) is the one that works, whereas the second block (which only checks ioResult, even though it's set in IOCompletionFunc) does NOT work.

C:
enum {
    kBufferSize = 0xffff
};

volatile Boolean gIOComplete = false;

pascal void IOCompletionFunc(FSForkIOParamPtr fsForkPtr) {
    /* Set our flag on a successful read */
    gIOComplete = true;
}

void Process(FSForkIOParamPtr fsOpenForkPtr) {
    FSForkIOParamPtr    currFSForkPtr, fsForkPtr[2];
    short                currFSForkIndex = 0, delayCount = 0;
   
    /* Open the file */
    PBOpenForkSync(fsOpenForkPtr);
   
    /* Setup first read */
    fsForkPtr[0]                    = (FSForkIOParamPtr)NewPtrClear(sizeof(FSForkIOParam));
    fsForkPtr[0]->ioCompletion        = NewIOCompletionUPP(IOCompletionFunc);
    fsForkPtr[0]->ioResult            = ioInProgress;
    fsForkPtr[0]->forkRefNum        = fsOpenForkPtr->forkRefNum;
    fsForkPtr[0]->positionMode        = fsAtMark;
    fsForkPtr[0]->positionOffset    = 0LL;
    fsForkPtr[0]->requestCount        = kBufferSize;
    fsForkPtr[0]->buffer            = NewPtrClear(kBufferSize);
   
    /* Do the initial read */
    PBReadForkAsync(fsForkPtr[0]);
   
    /* Setup the second read */
    fsForkPtr[1]                    = (FSForkIOParamPtr)NewPtrClear(sizeof(FSForkIOParam));
    fsForkPtr[1]->ioCompletion        = NewIOCompletionUPP(IOCompletionFunc);
    fsForkPtr[1]->ioResult            = ioInProgress;
    fsForkPtr[1]->forkRefNum        = fsOpenForkPtr->forkRefNum;
    fsForkPtr[1]->positionMode        = fsAtMark;
    fsForkPtr[1]->positionOffset    = 0LL;
    fsForkPtr[1]->requestCount        = kBufferSize;
    fsForkPtr[1]->buffer            = NewPtrClear(kBufferSize);
   
    /* Wait until the initial read finishes before adding
       the other read to the queue */
    while (!gIOComplete) {
        Delay(1, NULL);
    }
   
    /* Clear the flag */
    gIOComplete = false;
   
    /* Add the second read to the queue, which should now
       execute immediately */
    PBReadForkAsync(fsForkPtr[1]);
   
    /* Return to the initial read for processing */
    currFSForkPtr = fsForkPtr[0];
   
    while (true) {
        /* Start a fresh timeout count for each wait */
        delayCount = 0;

        /* Check the flag and only Delay() if the read
           is still in progress (ioResult is a positive value) */
        while (!gIOComplete && currFSForkPtr->ioResult > 0) {
            Delay(1, NULL);

            /* Timeout after 10 loops */
            if (++delayCount == 10) {
                printf("Waited too long!\n");
                break;
            }
        }
       
        /* Reset the flags */
        gIOComplete = false;
        currFSForkPtr->ioResult = ioInProgress;
       
        /* Queue up the next read */
        PBReadForkAsync(currFSForkPtr);
       
        /* Switch back to the previous read in the queue for processing */
        currFSForkIndex = 1 - currFSForkIndex;
        currFSForkPtr = fsForkPtr[currFSForkIndex];
    }
}

C:
enum {
    kBufferSize = 0xffff
};

volatile Boolean gIOComplete = false;

pascal void IOCompletionFunc(FSForkIOParamPtr fsForkPtr) {
    /* Set our flags on a successful read */
    gIOComplete = true;

    /*  ************************************
        CHANGE HERE
        ************************************ */
   if (fsForkPtr->ioResult >= 0) {
       fsForkPtr->ioResult = 0;
   }
}

void Process(FSForkIOParamPtr fsOpenForkPtr) {
    FSForkIOParamPtr    currFSForkPtr, fsForkPtr[2];
    short                currFSForkIndex = 0, delayCount = 0;
   
    /* Open the file */
    PBOpenForkSync(fsOpenForkPtr);
   
    /* Setup first read */
    fsForkPtr[0]                    = (FSForkIOParamPtr)NewPtrClear(sizeof(FSForkIOParam));
    fsForkPtr[0]->ioCompletion        = NewIOCompletionUPP(IOCompletionFunc);
    fsForkPtr[0]->ioResult            = ioInProgress;
    fsForkPtr[0]->forkRefNum        = fsOpenForkPtr->forkRefNum;
    fsForkPtr[0]->positionMode        = fsAtMark;
    fsForkPtr[0]->positionOffset    = 0LL;
    fsForkPtr[0]->requestCount        = kBufferSize;
    fsForkPtr[0]->buffer            = NewPtrClear(kBufferSize);
   
    /* Do the initial read */
    PBReadForkAsync(fsForkPtr[0]);
   
    /* Setup the second read */
    fsForkPtr[1]                    = (FSForkIOParamPtr)NewPtrClear(sizeof(FSForkIOParam));
    fsForkPtr[1]->ioCompletion        = NewIOCompletionUPP(IOCompletionFunc);
    fsForkPtr[1]->ioResult            = ioInProgress;
    fsForkPtr[1]->forkRefNum        = fsOpenForkPtr->forkRefNum;
    fsForkPtr[1]->positionMode        = fsAtMark;
    fsForkPtr[1]->positionOffset    = 0LL;
    fsForkPtr[1]->requestCount        = kBufferSize;
    fsForkPtr[1]->buffer            = NewPtrClear(kBufferSize);
   
    /* Wait until the initial read finishes before adding
       the other read to the queue */
    while (!gIOComplete) {
        Delay(1, NULL);
    }
   
    /* Add the second read to the queue, which should now
       execute immediately */
    PBReadForkAsync(fsForkPtr[1]);
   
    /* Return to the initial read for processing */
    currFSForkPtr = fsForkPtr[0];
   
    while (true) {
       /*   ************************************
            CHANGE HERE
            ************************************ */
        /* Start a fresh timeout count for each wait */
        delayCount = 0;

        /* Only Delay() if the read is still in progress (ioResult is a positive value) */
        while (currFSForkPtr->ioResult > 0) {
            Delay(1, NULL);

            /* Timeout after 10 loops */
            if (++delayCount == 10) {
                printf("Waited too long!\n");
                break;
            }
        }
       
        /* Reset the ioResult */
        currFSForkPtr->ioResult = ioInProgress;
       
        /* Queue up the next read */
        PBReadForkAsync(currFSForkPtr);
       
        /* Switch back to the previous read in the queue for processing */
        currFSForkIndex = 1 - currFSForkIndex;
        currFSForkPtr = fsForkPtr[currFSForkIndex];
    }
}
 