BlueSCSI V2 watchdog/timeout issues with Fastlane Z3

samaron

New Tinkerer
Sep 3, 2024
10
0
1
Hi!

I have struggled for the past few weeks to get my BlueSCSI working with my Amiga 4000 and its Fastlane Z3 SCSI card. Perhaps I have missed something simple, or deeper diagnostics are in order.

TL;DR: the computer locks up or reboots seemingly at random, and launching games through WHDLoad/iGame fails every time.

I'll list the specs as best I can first.
Amiga 4000 Dcr (A4000TX) with 3.2 ROM and WB. Super Buster rev. 11.
TF4060 CPU card (FW 4252a06, no overclock)
ZZ9000 RTG (FW 1.13 no RAM)
X-Surf 100 NIC
Prelude (replica) sound card with MPEGit
Fastlane Z3 SCSI controller (v2.2 card with v8.5 ROM, no RAM)

BlueSCSI V2 (HW 2023.10a, FW 2024.05.21) (Updated to v2024.09.15)
Samsung Pro Endurance 128GB SD card

The Fastlane Z3 uses an Emulex FAS216 SCSI chip.

Getting this computer up and running has been quite the ordeal. I am confident that the final piece, my BlueSCSI drive, will be sorted out with some assistance from the community. A lot works, but not quite everything.

Initially I struggled with random crashes that resulted in a corrupt file system. With the help of the debug option, I traced this to my brand new SanDisk SD card. Could see CRC errors and similar messages occur throughout the logs. Replacing it with a high endurance Samsung SD card sorted out the corruption. Unfortunately, strange stability problems persisted.
I looked at the basics again. Power is supplied by a Corsair RMX750, modified with a 15 ohm 50 watt resistor on the 12 volt rail to provide some load. Measuring at the BlueSCSI, I read a stable 12.06 V and 5.05 V. Both the BlueSCSI and the controller are terminated. The controller normally has traditional resistor packs, but these are currently removed in favor of an active terminator plugged into the external port. The cable is free of damage and is 40-50 cm long. The file system is PFS3aio, and I have triple checked that the DosType, Mask and similar settings are correct according to the documentation.
Since everything checks out, I started playing with the various settings that can be applied. It appears to be slightly more stable with parity off, though this could very well be a placebo effect. I tried some additional settings like quirks, system type and no prefetch, but so far nothing appears to improve the situation.
Using a different brand SD card (WD Purple) results in the same strange behavior.

Attached are some log files. All are with parity disabled, but the two whose names end with the number 2 have additional settings applied. I do see some watchdog and timeout messages at the end of the logs, where the computer locked up. The activity LED stays continuously lit for 10 seconds or so before it goes out.
The FSST log is from when I ran a program called FileSystemStressTest that generates a large amount of files in order to check for file system problems. It has passed once, but otherwise always locks up in the validation phase, not when it generates files. It does appear that WHDLoad does some burst reading and writing when launching a game.

When booting from IDE, the computer is stable and problematic software like WHDLoad works as it should. I could of course use IDE and be done with it, but I would like to get to the bottom of the SCSI issue. SCSI is also my preferred interface, as the BlueSCSI is a very elegant solution. It is also the faster option.
Using the image on the SD card in WinUAE also works as it should.
A file system check and repair program (PFS Doctor) finds no errors, both in the emulator and on actual hardware.

Any help is greatly appreciated.
 

Attachments

  • log_WHDLoad-iGame2.txt
    1.3 MB · Views: 23
  • log_WHDLoad-iGame.txt
    1.4 MB · Views: 18
  • log_WHDLoad-direkte2.txt
    661 KB · Views: 16
  • log_WHDLoad-direkte.txt
    676.6 KB · Views: 17
  • log_FSST.txt
    22.5 MB · Views: 22

samaron

Did some more testing today with reselection turned off. After getting the same results, I also turned off synchronous mode. The results did not change.

In all combinations I got either a "start_dataInTransfer() timeout waiting for previous to finish" at the end of the log, or a longer watchdog message:
[320808ms] DBG -- BUS_FREE
[320810ms] DBG ---- SELECTION: 0
[320810ms] DBG ---- MESSAGE_OUT
[320810ms] DBG ------ OUT: 0xC0
[320810ms] DBG ---- COMMAND: Read10
[320810ms] DBG ------ OUT: 0x28 0x00 0x00 0x62 0x8E 0x37
[320810ms] DBG ------ OUT: 0x00 0x00 0x6A 0x00
[320810ms] DBG ------ Read 106x512 starting at 6458935
[320810ms] DBG ---- DATA_IN, syncOffset 8 syncPeriod 31
[335440ms] --------------
[335440ms] WATCHDOG TIMEOUT, attempting bus reset
[335440ms] Platform: BlueSCSI
[335440ms] FW Version: 2024.05.21-rel May 21 2024 21:21:44
[335440ms] GPIO states: out 0x143FFEFF oe 0x0A5C8FFF in 0x353FFF01
[335440ms] scsiDev.cdb: 0x28 0x00 0x00 0x62 0x8E 0x37 0x00 0x00 0x6A 0x00 0x00 0x00
[335440ms] scsiDev.phase: 4
[335440ms] SCSI DMA state: WRITE
[335441ms] Current buffer: 0x00000600/0x0000D400, next 0x00000000 bytes
[335441ms] SyncOffset: 8 SyncPeriod 31
[335441ms] PIO Parity SM: tx_fifo 4, rx_fifo 4, pc 31, instr 0x00004037
[335441ms] PIO Data SM: tx_fifo 4, rx_fifo 4, pc 19, instr 0x00005061
[335441ms] PIO Sync SM: tx_fifo 4, rx_fifo 0, pc 21, instr 0x0000201A
[335441ms] DMA CH A: ctrl: 0x0100B011 count: 0x000001E1
[335441ms] DMA CH B: ctrl: 0x0002B809 count: 0x00000000
[335441ms] DMA CH C: ctrl: 0x01013805 count: 0x00000001
[335441ms] DMA CH D: ctrl: 0x0101C809 count: 0xFFFFFDFB
[335441ms] GPIO states: 0x353FFB01
[335441ms] STACK 0x20041E80: 0x10031C6B 0x353FFF01 0x00000009 0x13FE6B77
[335441ms] STACK 0x20041E90: 0x10031C6B 0x00000000 0x20041F00 0x00000000
[335441ms] STACK 0x20041EA0: 0x200193B0 0x0000000C 0x00000000 0x00000001
[335441ms] STACK 0x20041EB0: 0x00000000 0x00000000 0x00000000 0x1001A95D
[335441ms] STACK 0x20041EC0: 0x00000000 0x00000010 0xD0000160 0x20041F1C
[335441ms] STACK 0x20041ED0: 0x20000928 0x20005410 0x00000040 0xFFFFFFF9
[335441ms] STACK 0x20041EE0: 0x00000000 0x00000000 0xD0000160 0x00000000
[335441ms] STACK 0x20041EF0: 0x2000025F 0x10022CF7 0x100210FC 0x61000000
[335466ms] DBG ---- Total IN: 54272 OUT: 0 CHECKSUM: 23310
[335466ms] DBG -- BUS_FREE
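
For anyone reading the watchdog dump above, the numbers in it are internally consistent. Here is a minimal Python sketch, assuming the standard READ(10) layout (opcode 0x28, big-endian block address in bytes 2-5, transfer length in bytes 7-8), that decodes the CDB from the log. Reading "Current buffer: 0x00000600/0x0000D400" as bytes done out of bytes requested is an assumption on my part, and the function name is just for illustration.

```python
def decode_read10(cdb):
    """Decode a 10-byte SCSI READ(10) command descriptor block."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


# CDB bytes copied from the "scsiDev.cdb" line (first 10 bytes).
cdb = [0x28, 0x00, 0x00, 0x62, 0x8E, 0x37, 0x00, 0x00, 0x6A, 0x00]
lba, blocks = decode_read10(cdb)

SECTOR_SIZE = 512                     # from "Read 106x512" in the log
requested_bytes = blocks * SECTOR_SIZE
buffer_done = 0x600                   # assumed meaning: bytes already handled
buffer_total = 0xD400                 # assumed meaning: bytes requested
stall_ms = 335440 - 320810            # DATA_IN start to WATCHDOG TIMEOUT

print("LBA:", lba, "blocks:", blocks, "bytes:", requested_bytes)
print("buffer:", buffer_done, "of", buffer_total, "bytes")
print("stalled for", stall_ms, "ms")
```

This gives LBA 6458935 and 106 blocks, i.e. 54272 bytes, which matches both the "Read 106x512 starting at 6458935" line and the 0xD400 buffer size. The gap between the start of DATA_IN and the watchdog firing is 14630 ms, during which nothing else is logged.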

At this point the only thing I have not tried is a real hard drive. Unfortunately, the only drives I possess are later SCSI-3 drives for which I lack the appropriate adapters. I also intended to try a ZuluSCSI, but several months after ordering it still has not shipped from the vendor.

Another thing I find strange is that the speed is only 4-5 MB/s. I expected it to be higher, perhaps closer to 8 MB/s. Haven't given it much thought, but using the SCSI drive feels more sluggish than the IDE drive. With IDE I get around 4 MB/s, which is about the same, yet it feels more snappy. Thought I'd get the SCSI stable before looking into optimizing the speed, but maybe it is related.
I have an Amiga 1200 with a Blizzard PPC where I get around 7-8 MB/s with a slightly older BlueSCSI V2. Though, this SCSI controller uses a different chip than the Fastlane. As far as I can tell I have no problems on that computer.
 

samaron

Thank you for your reply, ClassicHasClass.
I don't expect a ZuluSCSI to work since the projects are very similar. However, there might be a subtle difference which does make it work.

As far as I can tell from the logs, this does appear to be the BlueSCSI V2 causing problems. None of the settings I have tried so far have helped in any significant way. Either I have missed some compatibility setting, or I am unlucky enough that it does not work correctly with my computer.

Looks like the 4000T uses the same 53C710 SCSI chip as the Blizzard PPC, which I have success with. Maybe that is compatible, but the FAS216 chip this Fastlane card uses isn't. Not exactly sure if that is possible.

A friend has a HD68 male to IDC50 male adapter I can borrow. Will try a real hard drive tomorrow.
 

samaron

Did some more testing today.

Hooked up the real hard drive, a Quantum Atlas 10K II with 36.7 GB capacity. The computer started, but would hang on the SCSI. I triple checked the jumper configuration on the drive, and it is correct: set to ID0, Narrow SCSI enabled and termination power enabled. The 68 to 50 pin adapter also contains termination. The drive does spin up and doesn't sound like a dying dolphin. It sounds healthy.

Reconnected the BlueSCSI and I reconfigured the jumpers to enable the installed 64MB RAM again. Someone claimed this was necessary for the DMA to function, although this isn't mentioned anywhere in the manual. Unsurprisingly the behavior did not change.
I then set the jumpers to disabled/no RAM again and removed the memory sticks. The sockets feel very fragile with the plastic clips, so was a bit hesitant to do this. Unfortunately the behavior did not change.

Removed the TF4060 accelerator card and installed a rev. 3.2 A3640. Installed 8MB RAM on the motherboard. Changed back to the real hard drive and turned on the computer again. This time it did boot. Set up the drive in HDToolbox and hit the "save changes to drive" button, but the format command failed afterwards. Said the drive isn't mounted. Opened HDToolbox again, and apparently it has developed Alzheimers. The drive parameters and configuration were gone. Set them again, and same result.
Changed back to the BlueSCSI. It booted into Workbench, and I tried a WHDLoad game again. It worked, surprisingly. I rebooted the computer and was greeted with error requesters from the file system. Suddenly it was damaged. Checked the BlueSCSI log file and saw nothing out of the ordinary. No CRC errors, no watchdog, no timeout message, just normal SCSI commands.

At first I thought it was the TF4060 since a game started with the A3640, but the instability remains. For some reason it was worse since the file system got borked. For good measure, I tried reformatting the SD card with the SD Association formatting tool. It failed. After a couple of attempts, it passed a full format. Booting up the computer again it was even more sluggish than usual. Opening the Workbench drawer takes a few seconds for all the icons to load.

Very strange case... I am not sure why the behavior is so erratic. It should just work. Perhaps I should pull the trigger and buy an A4091 controller card, which has a 53C710 chip. All the success stories come from people who have a controller with that chip.
 

samaron

Have done some thinking, and questioned why most old accelerator cards came with only 128MB RAM as maximum. The card I have in the computer now has 256MB. Maybe it causes problems with the DMA?

Tried to get a better understanding of how the mask works on the drive partitions. I couldn't find in-depth documentation explaining this, but the general idea is that the digits represent the address range of the memory, and the rightmost digit dictates the buffer alignment. The SCSI controller documentation states that a mask of 0xFFFFFFFC (32-bit address space aligned to longword) is to be used. My thought was to change this to 0x0FFFFFFC, which in theory should allow DMA to only function within the first 128MB. I did not take into account the 2MB chip RAM, but I assume the accelerator RAM has the highest priority and is used.
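
The effect of changing the top digit of the mask can be checked with a little arithmetic. This is a minimal Python sketch, assuming the common interpretation that a buffer is usable for a direct transfer only when its address has no bits set outside the mask. The helper names and the sample addresses are hypothetical, chosen only for illustration.

```python
DOCUMENTED_MASK = 0xFFFFFFFC  # value from the SCSI controller documentation
MODIFIED_MASK = 0x0FFFFFFC    # value tried in this post


def direct_transfer_allowed(addr, mask):
    """Assumed rule: the address may not have any bit set that the mask clears."""
    return (addr & ~mask & 0xFFFFFFFF) == 0


def window_bytes(mask):
    """Size of the address window for a contiguous mask, ignoring the two
    low alignment bits (highest permitted byte address + 1)."""
    return (mask | 0x3) + 1


# Hypothetical longword-aligned buffer addresses.
for addr in (0x08000000, 0x0FFFFFFC, 0x10000000, 0x40000000):
    print(hex(addr),
          "documented:", direct_transfer_allowed(addr, DOCUMENTED_MASK),
          "modified:", direct_transfer_allowed(addr, MODIFIED_MASK))

print("modified window:", window_bytes(MODIFIED_MASK) // 2**20, "MiB")
```

By this arithmetic, 0x0FFFFFFC admits every longword-aligned address below 0x10000000, which is a window of 256 MiB of address space rather than 128 MB. How much of the accelerator RAM actually falls inside that window depends on where the card maps its memory, which has not been established here.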

After I had edited the mask I booted into Workbench and started a WHDLoad game, and it actually started! As mentioned earlier, this would previously always crash. However, something was off. It felt more sluggish as the icons took slightly longer to load, and running a speed test I found that the performance had gone from about 5.5 MB/s to 2.1 MB/s. It also eventually crashed again, and a repair tool found errors on the partition.

It does appear I'm onto something here. Will try to find some better documentation in order to experiment further with the DMA region.
 

eric

Administrator
Staff member
Sep 2, 2021
941
1,541
93
MN
scsi.blue
Quoting samaron: "Could see CRC errors and similar messages occur throughout the logs"
Please check out our recent patch, which fixes a lot of edge cases for SD issues on some machines. Hopefully that addresses all your issues. Please let us know.

 

samaron

Thank you for your reply, eric! Will give it a try tomorrow and report back.

Perhaps I have been a bit too stubborn with trying to get SCSI to work on my Amiga 4000. If the new firmware update doesn't work, I probably should start from the beginning again with the troubleshooting. I feel it has gotten a bit chaotic, with somewhat contradictory results.
 

samaron

Flashed the new firmware today and gave it another try. Unfortunately it did not improve my situation.

The CRC errors were only an issue with the SanDisk card I had. Hasn't been a problem with the Samsung and WD cards I got.
Found a page on the wiki that states to avoid SanDisk, but had already bought the Samsung at that point in order to try a different brand. Haven't found a clear answer as to why SanDisk should be avoided.

Regardless, I did some more testing. Copied over my backup image for each attempt.
Tried all three buffer alignments with the 0x0FFFFFFX partition mask, which should limit DMA to only 128MB of the RAM. Started iGame and launched a WHDLoad game. Started up without crashing. Ran some disk speed benchmarks. Eventually it crashed again either by locking up or rebooting. Afterwards the file system had errors.
The final attempt was with the DMA mask that the SCSI card specifies in its documentation, 0xFFFFFFFC. As expected, the computer locked up when launching a WHDLoad game.
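
The three buffer alignments correspond to the low bits of the mask. A minimal Python sketch, assuming the cleared low bits of the mask are what force buffer alignment (the function name is hypothetical):

```python
def required_alignment(mask):
    """Smallest power-of-two alignment implied by the cleared low bits of the mask."""
    align = 1
    while align < 0x100000000 and not (mask & align):
        align <<= 1
    return align


# The mask values used for the attached logs.
for mask in (0x0FFFFFFF, 0x0FFFFFFE, 0x0FFFFFFC, 0xFFFFFFFC):
    print(f"0x{mask:08X} -> buffers aligned to {required_alignment(mask)} byte(s)")
```

Under that assumption, a mask ending in F accepts any byte address, E requires word (2-byte) alignment and C requires longword (4-byte) alignment, so all three 0x0FFFFFFX variants differ only in alignment and share the same upper address limit.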

Limiting the address area the DMA can access does provide some positive results, but the system remains unstable. As mentioned, it is rock solid with IDE. Another thing I notice with SCSI, as has been mentioned, is that it is noticeably more sluggish. Despite having slightly better benchmark numbers than the IDE, it is in practice slower. A drawer with many icons can take 3 seconds to display them all, while on IDE it takes less than a second. iGame takes almost 30 seconds to load on SCSI, while on IDE it takes 2-3 seconds.

I find this slow performance a bit peculiar, since SCSI should be the faster option. Is it somehow possible that the controller is faulty in such a way that it appears to work as it should, but performs slowly and corrupts data? The Samsung SD card I have is a high endurance card that is V30. I doubt it is the bottleneck.

I hooked up the real hard drive again today after I was done testing with the SD card. Today it regained its memories and had the partition information present. Not sure why. I probably forgot to reboot the computer after saving the information last time. Will copy over the system files tomorrow and see how it performs compared to the SD card.
 

Attachments

  • log_0xFFFFFFFC.txt
    3.7 MB · Views: 14
  • log_0x0FFFFFFF.txt
    5.5 MB · Views: 13
  • log_0x0FFFFFFE.txt
    1.7 MB · Views: 13
  • log_0x0FFFFFFC.txt
    2.6 MB · Views: 15

samaron

Have been doing some more testing.

Fortunately I was able to borrow a different SCSI controller card locally, an A4091. Using this card, I experience the same crash with the original partition mask value. Using my modified value provides the same positive result as earlier. At least this is repeatable.
I also tried to replicate a file system corruption, but after a whole day of file system stress testing, it remains intact without errors.
However, the performance is still very poor. I connected an IDE boot drive to the motherboard IDE controller, which is fairly slow, and it performs better. Both in benchmarks and real world usage.

Since the A4091 was working better than my controller card, I hooked up the old rotational SCSI hard drive and copied over the system. Using the real hard drive as the boot drive, the result was immediate: faster boot time, snappy icons and application launches, and benchmarks show a considerable performance increase. On the A4091, the BlueSCSI V2 got 1.9 MB/s, the IDE (PIO0) 2.1 MB/s and the old hard drive 6.5 MB/s.

I am starting to think my controller card is in fact faulty. I find it hard to believe it works 99%, but eats the file system in certain situations. But at this point I have triple checked that everything is configured correctly. It should just work. Unfortunately I do not have access to an identical card in order to compare. These are also very expensive, which makes it a bit cost prohibitive to buy one for testing.
I'm not sure what causes the poor performance. Incompatibility, I suppose? I understand the BlueSCSI is made with old Macintosh computers in mind. Perhaps these Amiga controller cards have some quirks it doesn't like. These are Zorro bus cards and not part of a CPU expansion board. They interface differently.
 

eric

All the logs you've shared have debug on, and debug will greatly decrease the speed of any operation. Can you share the ini file you have and also ensure debug is off when running tests? While there are very minor defaults to allow BlueSCSI to work out of the box on a Mac, that should have no effect on other systems: https://bluescsi.com/docs/Quirks#apple-quirks-1-default
 

samaron

Thank you for your reply, eric.

This has been a long process, with some separate issues mixed together. Easy to get a bit lost. Somehow I didn't think of disabling the debug logs. I'll certainly try that. Maybe tomorrow, or next week.

As for the ini-file, it has most of the time only had Debug=1. I tried a few other compatibility settings as mentioned earlier in the thread. These have been turned off again after I found no meaningful improvement or change with any.
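
Since a leftover Debug=1 is easy to miss, a quick check of the ini file before benchmarking may help. This is only a sketch in Python: the [SCSI] section name in the sample text and the idea that a missing Debug key means debug is off are assumptions on my part, and debug_enabled is a hypothetical helper, not part of any BlueSCSI tooling.

```python
import configparser


def debug_enabled(ini_text):
    """Return True if any section of the ini text sets Debug to a true value."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    for section in cfg.sections():
        if cfg.has_option(section, "Debug"):
            return cfg.getboolean(section, "Debug")
    return False  # assumption: no Debug key means debug logging is off


# Sample contents; the [SCSI] section name is assumed for illustration.
with_debug = "[SCSI]\nDebug=1\n"
without_debug = "[SCSI]\nDebug=0\n"

print("debug on:", debug_enabled(with_debug))
print("debug on:", debug_enabled(without_debug))
```

To use it on a real card, read the ini file from the SD card into a string and pass it to the function before running any speed test.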
 

samaron

I thank you for your suggestions, eric. I haven't achieved the end result I wanted, but at least some issues have been corrected.

Turning off the debug logs certainly improved the performance. I get about 5.4 MB/s now, and it is in general much snappier. The filesystem still becomes damaged, but I believe this must be the Fastlane controller card acting up. It works as it should with the A4091 card, after all.
I should have thought of trying without the debug logs, but... I simply lost my way with all the different tests. I left this option on as the crashes and data damage did not show in the normal log file. My thought was that it would catch something abnormal, but so far it hasn't.

The summary this far:
Random crashes caused by DMA access to memory address ranges the logic can't handle. Quoting from this eab thread: "U714 GAL does not assert _ADDRZ3 on Buster for addresses over $0FFFFFFF." Solution: Apply mask value 0x0FFFFFFF when partitioning the drive.

The poor performance was simply caused by the debug option being enabled in the ini-file.

Filesystem still becomes damaged with my Fastlane Z3 card. Why this happens I'm not sure. Could be faulty in a very strange way, or it is very picky and quirky. The latter isn't uncommon with Amiga hardware from what I have been told. Might as well get an A4091 as it appears more compatible. Would save me a lot of headache.