The following is a compilation of German material translated by Google with little to no effort in trying to manually clean it up. The thread - The PAK68/3 Accelerator - [German] contains the original text and discussion.
The creator, Holger Zimmermann, aka pakman, is still very active to this day and always willing to help users get the PAK68/3 working.
This article contains the following:
Introduction to the PAK68/3 (translated from http://www.wrsonline.de/pak3.html)
Doppel-PAK - c't Magazine Article - November 1993
Doppel-PAK, Part 2 - c't Magazine Article - December 1993
Everything Under Control - c't Magazine Article - March 1994
The extensive possibilities offered by a PAK68/3 with its extensions, as well as the reliability in operation, give the Atari systems, which have gone out of fashion, a breath of fresh air. Even if you don't take off to the 'megahertz heights' of today's technology with the PAK, this accelerator offers Atari enthusiasts from the very beginning some joy in the system.
To get an overview of the effort involved in setting up and installing a PAK, here are a few more points of reference:
The third edition of our long-running favorite PAK-68 is all about the fifth power of two: 32 MHz, 32 bits and 32 KByte cache are the important reasons for presenting the optimal 68000 accelerator for self-construction.
The PAK-68 has made many friends since its birth (not without complications) in July 1987. Initially [1] still clocked at a leisurely 8 MHz, later at 16 MHz [2] and now with clock frequencies almost at will, the processor replacement card remains the most logical way to breathe contemporary performance into an existing computer.
And in no small measure: In addition to the striking acceleration, Atari ST fans can enjoy the modern TOS versions 2.06 and (with small patches) 3.06 with 32-bit access; with the 68030 version there is also a PMMU for MultiTOS. Compact Macs particularly benefit from the three- to sixfold increase in working speed under the resource-guzzling System 7, depending on the configuration. Our aim was to remain as 68000-compatible as possible, so that critical computers (Atari with IMP chipsets, Amiga) are not left out, although there will certainly still be applications and extensions that are not 68020/030-compliant. So, before purchasing a PAK, find out whether your favorite editor also 'plays' on larger machines (Atari TT, Falcon, Mac II).
So here it is, the PAK-68/3. With almost the same dimensions as the PAK/2, a lot has happened 'under the hood': either a 68020 or 68030 CPU with a clock frequency of 16 to 33 MHz and a system clock of 8 MHz, a 68881 or 68882 FPU, 256 or 512 KB ROM and a 32-bit, 32-KB second-level cache.
When building the PAK/3, a difficult decision has to be made right from the start: should it be a 68020 or a 68030? Since there is a separate circuit board version for each processor, it is not possible to switch later. As a sort of compensation, however, all other features can be added or omitted later by exchanging the GALs. In view of the used processors that are sometimes offered very cheaply (see classified ads), we recommend the 030 version if you don't already have an old PAK-68; it has a speed advantage of around 15 percent over the 020 at the same clock frequency. Otherwise you can continue to use most of the components of the old PAK, although the GALs only if you have a suitable programming device. Of course, the processor must be able to cope with the desired clock frequency.
The state machine concept of the PAK/2 has proven itself in practice, so its core could be retained for the PAK/3. The CPU, FPU and ROMs are largely wired as in the predecessor. Synchronous clock generation using a 74F86 as a doubler for the 8 MHz system clock is also identical. The quartz oscillator was already present on the PAK/2, but was only responsible for the FPU there. With the appropriate GAL equipment, the PAK can now also be operated asynchronously to the system clock (tested up to 33 MHz); more about that in the next c't. GALs U4, U5, and U6 have largely retained their function (and names). You will still recognize a few things in the equations for U5, while hardly one stone was left standing in U4. Completely new are GAL U1, which takes over control of the local bus cycles (i.e. for ROM reads and cache hits), and GAL U2, which coordinates the cache.
The 68020 already had an instruction cache integrated into the CPU, and the 68030 added a data cache. Unfortunately, at 256 bytes the internal caches are not particularly large. Benchmarks often use short loops to test performance, and the CPU cache still does well there; in practice, however, not much of that advantage remains. The PAK/3 has therefore been given a 32 KB second-level cache, and anyone who turns it off for a test will see how right this decision was. In the maximum configuration (32 MHz plus L2 cache), typical applications run at no less than six times the 68000 speed, and some benchmarks even show double-digit performance gains. Compared to the old PAK, which delivered about 250 percent of the performance of a 68000, this is a significant improvement.
The four SRAMs (each 8 K × 8 bits, i.e. 8 Kbytes) together form a 32 Kbyte cache memory which the CPU can access very quickly and at full width. GAL U2 ensures that the data fetched from main memory is stored in the cache on the side, so to speak. Data from main memory address 1, for example, also ends up in the cache at address 1. However, since the cache only contains 32,768 bytes, data from main memory address 32,768+1 also ends up in the cache at address 1, with the previous contents of the cache being overwritten, of course. How can this possibly work?
If the CPU initiates a read access, the tag RAM is now not read out but operated in the so-called compare mode, in which it compares the externally applied address bits A15 to A21 with the stored address bits. The comparator required for this is integrated in the tag RAM and reports the result to the outside via the MATCH connection; if there is a match, a high level is obtained here. This is the sign for the CPU and its read logic that the data it is looking for is already in the cache. If the tag RAM does not report a hit (MATCH is low), then the data word in the cache belongs to another page, which means that the CPU now has to look in the main memory. In the hope that the data word that has just been fetched will be needed again in the near future, the control logic then immediately stores it in the cache.
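The address split described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the names `split_address` and `WordTags` are invented for illustration, and the model keeps one tag per 16-bit word in a dictionary rather than imitating the real tag RAM chips.

```python
CACHE_BYTES = 32 * 1024   # size of the second-level cache
TAG_MASK = 0x7F           # seven tag bits, A15..A21


def split_address(addr):
    """Split a byte address into (tag, offset inside the cache)."""
    offset = addr % CACHE_BYTES        # A0..A14 select the cache location
    tag = (addr >> 15) & TAG_MASK      # A15..A21 identify the 32 KB page
    return tag, offset


class WordTags:
    """Toy tag store (hypothetical): one stored tag per 16-bit cache word."""

    def __init__(self):
        self.tags = {}                 # word index -> tag of the owning page

    def match(self, addr):
        """True models MATCH high: the cached word belongs to this address."""
        tag, offset = split_address(addr)
        return self.tags.get(offset // 2) == tag

    def fill(self, addr):
        """Record that the word for this address was just loaded."""
        tag, offset = split_address(addr)
        self.tags[offset // 2] = tag
```

In this model, addresses 2 and 32,768+2 share the same cache location but carry different tags, so filling one makes the other miss. That is the situation the compare mode is meant to detect.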
The cache SRAMs are below the EPROM sockets. Before using SRAM sockets, try out whether the 'turrets' fit into your computer. Especially in the compact Mac it is quite tight.
Because of the 16-bit wide system bus, the cache must be managed word by word, so two tag RAMs are required for the 32-bit wide cache of the PAK. The seven address bits that the tag RAM evaluates cover 128 pages, i.e. the address space from 0 to 4 Mbytes. In the area above $40 0000, the second-level cache is ineffective, which does not imply any restrictions on the Macs and Ataris in question; the CPU has direct access to the ROM on the PAK anyway.
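The size of the cacheable range follows directly from the numbers in the text: seven tag bits address 128 pages of 32 Kbytes each, which gives 4 Mbytes ($40 0000 in hexadecimal). A short check in Python, with constant and function names chosen freely for this sketch:

```python
TAG_BITS = 7                      # address bits A15..A21
PAGE_BYTES = 32 * 1024            # one page is as large as the cache itself

PAGES = 2 ** TAG_BITS             # number of distinguishable pages
CACHEABLE_BYTES = PAGES * PAGE_BYTES


def is_cacheable(addr):
    """True if the address lies in the range the tag RAMs can describe."""
    return 0 <= addr < CACHEABLE_BYTES
```

`CACHEABLE_BYTES` comes out as 0x400000, so an address such as $3F FFFE is still covered while $40 0000 is not.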
The same plan for two versions: Except for the CPU, both new PAKs are identical. The 64-pin socket for connecting to the main board belongs on the solder side in the DIL rows that point to the middle of the board (see also the detailed photo). R1 and R2 are only fitted to the 030 version.
The eighth bit of the tag RAMs forms the valid bit and is therefore tied to V_CC. In order to declare all data in the cache invalid (e.g. in the case of DMA accesses or a reset), the tag RAMs are cleared via the /TCLR connection and all bits are thus set to low. In this way, the comparator always recognizes a cache miss (no hit) on the next read access and then reloads the cache.
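Why the valid bit is needed becomes clear in a small model: without it, a cleared entry (all zeros) would be indistinguishable from a genuine entry for page 0. The following Python sketch assumes an 8-bit tag cell with the valid bit in the top position; the class and method names are hypothetical.

```python
VALID = 0x80      # eighth bit, written as 1 whenever an entry is stored
TAG_MASK = 0x7F   # seven page bits


class TagRam:
    """Toy model of a tag RAM with a valid bit and a clear input."""

    def __init__(self, entries):
        self.cells = [0] * entries

    def write(self, index, tag):
        """Store a tag together with the valid bit."""
        self.cells[index] = VALID | (tag & TAG_MASK)

    def match(self, index, tag):
        """Compare mode: hit only if the tag agrees AND the entry is valid."""
        return self.cells[index] == (VALID | (tag & TAG_MASK))

    def clear(self):
        """Models /TCLR: every bit, including the valid bit, goes low."""
        self.cells = [0] * len(self.cells)
```

After `clear()`, even a lookup for page 0 misses, because the stored valid bit is 0 while the comparator expects 1.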
A PAK/3 with a clock frequency of 16 MHz is used to illustrate the processes. A bus cycle begins when the CPU applies the address and, after the search in the CPU's internal cache has been unsuccessful, activates /AS_20 with the falling edge of CLK16. If this is a ROM access (U6's /ROM active) and there are ROMs on the PAK (J7 on 1-2), U1 turns on the ROMs and immediately acknowledges with /DSACK0 and /DSACK1. This is recognized by the CPU with the next falling edge of CLK16, whereupon it terminates the bus cycle exactly one CLK16 cycle later by deactivating /AS_20. U1 keeps /CYC_00 high all the time, so U4 doesn't initiate access to the mainboard.
Assume there is a read access to the range $0 to $40 0000 (U6's /DRAM active) and the L2 cache is activated (/TCLR high), but at least one of the two tag RAMs does not report a hit (MAT0 or MAT2 low): in this case, /CYC_00 goes low after a short waiting time and thus starts an access to the mainboard in U4, which in principle runs as with the PAK/2. U2 now ensures that the data word that has been read is also stored in the SRAMs and, at the same time, that the associated address is retained in the corresponding tag RAM.
Except for the processor and its pinout, the 020 and 030 PAK are the same. You will therefore only find the circuit once here.
The clock doubler for synchronous 16 MHz operation was carried over from the old PAK.
The fact that the data bus on the mainboard is only 16 bits wide, but the cache is 32 bits wide, requires a closer look. If the address of a data word is not on a long word boundary (i.e. A1 = high), the following problem arises: The mainboard supplies the data word on the CPU data lines D16 to D31, but the CPU later expects it in the cache on D0 to D15. In this case, the two bus drivers on the PAK/3 become active in the truest sense of the word and solve the problem by creating a temporary cross-connection between the upper and lower data word.
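The lane swap can be illustrated with a little bit arithmetic. The sketch below is an assumption-laden model, not the GAL logic: it treats a cache entry as one 32-bit integer and takes the 16-bit word as it arrives from the mainboard on D16 to D31. The function name is made up for this example.

```python
def store_word(entry, addr, bus_word):
    """Return the 32-bit cache entry after storing one 16-bit word.

    bus_word is the value the mainboard delivers on D16..D31.
    If A1 is high, the word belongs in the lower half (D0..D15),
    so it has to be crossed over; otherwise it stays in the upper half.
    """
    bus_word &= 0xFFFF
    if addr & 0b10:                                   # A1 = high
        return (entry & 0xFFFF0000) | bus_word        # goes to D0..D15
    return (entry & 0x0000FFFF) | (bus_word << 16)    # stays on D16..D31
```

Storing 0xABCD at address $1000 and then 0x1234 at $1002 yields the long word 0xABCD1234, which is what a 32-bit read of $1000 expects.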
The cache is powerless for write accesses, however: with the write-through strategy implemented here, access to the mainboard is required in any case. If the tag RAMs report a hit, i.e. the data word belonging to this address is also in the cache, U2 takes care of updating the cache contents as well. This means that the next read access can be handled from the cache again. So-called write-back caches, which independently write their content back into main memory and thus also buffer write accesses, are much more complex to implement and in this case, because of the 16-bit bottleneck, not even particularly effective.
Thanks to multilayer technology, the circuit board almost achieves the packing density of a free beer bar. In order not to exceed the dimensions of the old PAK, the tag RAMs had to be banished under the EPROM sockets, and when closed PGA sockets are used for the processors, some support capacitors can only be fitted on the solder side. Incidentally, the CPU and FPU can be removed from PGA sockets with cup spring contacts ('precision sockets') only with brute force, if at all. Normal designs, such as those produced by AMP, are better here. To be on the safe side, you should also treat the tag RAMs to sockets; the bars of the EPROM sockets above them must then be carefully removed.
As with the old PAK, a 64-pin DIL socket on the solder side, with a plugged-in adapter socket or a double-ended SIL pin strip, establishes the connection to the mainboard. For reasons of stability, this time the surgical removal of the old 68000 CPU (if not already done) is unavoidable; a 64-pin DIL socket takes its place. In contrast to the PAK-68/2, a GAL set that allows operation with the CPU remaining on the mainboard is not provided. Amputating all 68000 legs with small side cutters has proven effective. The pins can then be easily pulled out of the circuit board individually. Anyone who tears up during this procedure out of sympathy for the suffering CPU can later solder a prosthesis in the form of a DIL socket under the legless 68000 and continue to use it for test purposes.
Incidentally, operating a switchable 68000 next to the 68020/030 is optionally possible if you run a few wires to the previously empty spare socket and use a correspondingly programmed GAL (tips follow). Unfortunately, with the best will in the world, the connections could no longer be accommodated on the circuit board, and since not everyone needs a 'before/after' option, we only provided a freely wireable empty socket for this. Alternatively, the 68000 socket on the component side of the board can also be used to accommodate MS-DOS emulators and similar extensions that use the processor signals directly.
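The write-through behaviour can be modelled very compactly. The following Python sketch is deliberately simplified: it keys cache entries by full address instead of index and tag, and the class name and the `(value, hit)` return convention are choices made for this example only.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a dict that plays main memory."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                  # address -> cached value

    def read(self, addr):
        """Return (value, hit). A miss loads the value into the cache."""
        if addr in self.lines:
            return self.lines[addr], True
        value = self.memory[addr]        # slow access to the mainboard
        self.lines[addr] = value         # keep it for the next read
        return value, False

    def write(self, addr, value):
        """Every write goes to main memory; the cache is only kept in step."""
        self.memory[addr] = value        # mainboard access in any case
        if addr in self.lines:           # hit: update the cached copy too
            self.lines[addr] = value
```

Note that a write to an address that is not cached does not allocate an entry in this model; only reads fill the cache, matching the description above.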
The use of medium-fast CPUs (20 and 25 MHz) with asynchronous clocking, i.e. when using the 32 MHz GAL set and a correspondingly designed quartz oscillator, usually leads to problems; an asynchronously running 20 MHz CPU is also hardly faster than the synchronously clocked 16 MHz version due to unavoidable waiting cycles.
In the next issue of c't you will find out what needs to be considered when adapting TOS 3.06, how the asynchronous accesses of the 32 MHz version work and how to persuade the Mac to use the FPU and the data cache. (cm)
[2] Manfred Völkel, Carsten Meyer, Tausch mit Turbo, PAK-68/2 in Mac, Atari und Amiga, c't 11/91, S. 314
[3] Manfred Völkel, Tausch mit Turbo, Tips zur PAK-68/2 im ST, c't 12/91, S. 222
[4] Manfred Völkel, Turbo für den Austauschmotor, Schneller Atari mit modifizierter PAK-68, c't 3/91, S. 282
[5] Carsten Meyer, Schnelle Freundin, Amiga mit PAK und 14 MHz, c't 9/90, S. 338
[6] Roland Beck, Hilfskraft: 68000-Prozessor trotz PAK-68/2, c't 7/92, S. 196
[7] Leo Drisis, Real-Renner, 68881-Einbindung für die PAK im Mac, c't 5/90, S. 190
[8] Carsten Meyer, Fast-PAK, 68000 und PAK-68 mit 16 MHz im Macintosh, c't 1/90, S. 194
[9] Johannes Assenbaum, Mehr CPU, PAK-68 - die Prozessor-Austausch-Karte für 68000-Rechner, c't 8/87, S. 68
Now it's getting really exciting: after a soldering spree of some 800 joints and the first tests, the PAK is waiting for clock signals in the VHF range. So that the CPU doesn't get tangled up in the jungle of asynchronous accesses and wait cycles, some cunning GAL logic is required in addition to the 32 MHz crystal.
In the 16 MHz operation discussed so far, CLK16 supplies the clock for the CPU and GALs U1, U4 and U5, obtained by clock doubling from the 8 MHz system clock. The CPU runs synchronously to the system clock. The PAK/3 also gets along excellently with a 12 MHz system clock on the Atari mainboard [6], whereby the CPU then, you guessed it, runs at 24 MHz. The only change needed is to the clock doubler circuit. In this case, 470 ohms should be provided for R41 and R43, and 15 pF for C41 and C42 (the capacitors C41 and C42 are mistakenly missing in the parts list; however, they are correctly 'weighted' in the circuit diagram).
If you wanted to operate the CPU in this way at 32 MHz with a system clock of 8 MHz (i.e. with a quadrupled clock), then a more complex clock generation, for example with the help of a PLL, would be necessary. With the PAK-68/3, a different approach is taken in 32 MHz operation: a quartz oscillator supplies the CPU, FPU and GAL U1 with their own 32 MHz, so everything runs asynchronously to the system clock. The GALs U4 and U5, responsible for the interface to the mainboard, continue to run with CLK16 and thus synchronously with the mainboard.
Even with a cache hit, there is unfortunately no way around a wait state, since the tag RAM must first be queried as to whether the desired data is actually in the cache. Unfortunately, faster tag RAMs with access times of less than 20 ns are prohibitively expensive due to their limited availability. In PCs, normal SRAMs are almost exclusively used for this task; the comparator logic is integrated in the LSI chip set. On the other hand, the SRAMs used for the cache on the PAK/3 are the same as in the PC world. Thus, at least with SRAM, access time, availability and price are not a problem.
So far there are no differences between 16 MHz and 32 MHz operation. However, since the processor signals and CYC_00 as input signals for U4 are completely asynchronous at 32 MHz, it is to be expected that the setup and hold times for U4 will occasionally be violated. It then depends on chance whether the input signals are recognized as low or high, whereby the individual registers of the GAL can decide quite differently.
To illustrate what can happen, imagine two register outputs with the exact same equations. First of all, one expects that both outputs always agree. However, if an input signal changes at the 'right' time, one register could still recognize the old value while the other already sees the new value and then changes its state. So for one clock period both outputs would deliver different results.
The completion of an access to the mainboard also looks a bit different at 32 MHz. First, U4 waits one CLK16 clock longer than at 16 MHz before asserting DSACK1. With the next falling edge of the 32 MHz clock, the CPU recognizes that the waiting is finally over. Since both clocks run asynchronously, the data is not accepted exactly with the rising edge of AS_00 when reading, but with a jitter of 30 ns. The PAK/2 has proven that this is not a problem, because there the data was always accepted by the CPU 30 ns before AS_00.
In principle, CPUCLK is not limited to 32 MHz. At lower frequencies, however, the jitter would be too great (always one CPUCLK period; at 25 MHz that is already 40 ns), which may work but is not guaranteed to. At higher clock frequencies the jitter would not be a problem, but the data bus drivers on the mainboard would be: they would not be switched off quickly enough after the access and would thus disrupt the following bus cycle. FPU accesses are most at risk because they do not require wait states.
Pins 11 (/BG), 13 (/BR) and 20 (E) are not connected to the 64-pin socket U7 on the PAK-68/3. If you want to plug in extensions that require these signals (possibly graphics extensions or emulators in the Atari), you have to add them afterwards using wire jumpers. If a 68000 is to be used as an emergency drive, these six lines (three each from the PAK and from the 68000) must be routed to a switching GAL (Listing PUK). In addition, /BR_20 (from J6) and /PAK_EN (from J5) are required on the GAL. The jumper J6 is not necessary; the switching between the processors is done with J5.
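Since the jitter always amounts to one CPUCLK period, the figures quoted in the text can be checked with a one-line conversion from frequency to period. The function name is arbitrary.

```python
def period_ns(freq_mhz):
    """Length of one clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz
```

`period_ns(32)` gives 31.25 ns, which the article rounds to 30 ns, and `period_ns(25)` gives the 40 ns mentioned for a 25 MHz CPU clock.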
There are also some programs, such as drivers for virtual RAM, that use the MMU but want to find it in exactly the same state as it was initialized by TOS 3.06 in the TT. With TOS 2.06, this MMU initialization can be made up for by a small program in the auto folder, with which such programs then also run.
Unfortunately, this leads to another (small) problem: a cold start triggered via the keyboard causes the computer to hang, because TOS 2.06 clears the entire memory and with it the MMU table. The CPU encounters a RESET instruction in the TOS startup sequence, which resets the entire hardware of the computer (including the L2 cache). However, the internal state of the CPU remains unaffected: the MMU and the two CPU caches are still active, and the CPU then promptly goes haywire.
An OR-linked Enable signal from both drives disables the cache if the Mac has problems with floppy disks.
With and without: The results (relative to the Mac Classic) in the right column were determined using the SANE accelerator GEMStart; the increase in FPU performance in the benchmark mix is clear. The Speedometer (version 3.23), on the other hand, always beats SANE in the FPU benchmarks. With a 68030-PAK, the results are on average 10 percent higher.
The more elegant alternative is to use TOS 3.06, which is already designed for a 68030. Since, in contrast to TOS 2.06, 68020 code is used in some places, there is a noticeable increase in speed.
Unfortunately, the original TOS 3.06 of the TT cannot be used on the PAK-68/3 due to different hardware requirements. This also applies to the patch presented in c't 9/92, p. 212, to which the entire MMU initialization fell victim at that time. There, a subprogram in which the CPU occasionally allows some of its colleagues to take a short breather (implemented in the TT with the TT-MFP) was simply skipped. What may have worked with the PAK/2 because of the lower speed is guaranteed to go wrong with the PAK-68/3. Therefore there will be a new patch for TOS 3.06, which we only want to release after an extensive test phase.
The only unsolved problem at the moment is MiNT's memory protection, which unfortunately hasn't been able to work on the PAK/3 yet. However, since other 68030 boards also have the same problem, the error is more likely to be found in MiNT.
With the Mac it is also possible (as with the PAK-68/2) to distribute the computer ROM over four EPROMs and plug them into the PAK. The Mac collection disk 7 contains the split utility from [3] along with all GAL sources and JEDECs. Since the Mac Classic is equipped with 512 instead of 256 Kbytes of ROM (only Apple knows why), the split program must be adjusted accordingly (variable ROMLen); 27C010 EPROMs are also required.
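What a split utility of this kind has to do can be sketched as follows. This is not the program from the collection disk; it is a minimal Python illustration that assumes plain byte interleaving, i.e. each of the four EPROMs supplies one byte lane of the 32-bit ROM bus. Which output file belongs in which socket is likewise an assumption that would have to be checked against the board.

```python
def split_rom(image, lanes=4):
    """Split a ROM image into byte-interleaved parts, one per EPROM."""
    if len(image) % lanes:
        raise ValueError("image length must be a multiple of the lane count")
    # Part i receives bytes i, i+lanes, i+2*lanes, ...
    return [image[i::lanes] for i in range(lanes)]


def join_rom(parts):
    """Inverse of split_rom, useful for verifying the split."""
    lanes = len(parts)
    out = bytearray(lanes * len(parts[0]))
    for i, part in enumerate(parts):
        out[i::lanes] = part
    return bytes(out)
```

With a 512 Kbyte image, each of the four parts is 128 Kbytes long, which is exactly the capacity of a 27C010; a 256 Kbyte image would yield four 64 Kbyte parts.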
The GAL P3_PUK is housed in the spare socket of the PAK board with some 'outdoor wiring'.
The following problem occurs with some mask versions of the IWM or SWIM (the Mac's floppy disk controller, 'Integrated Woz Machine'): the Mac does not recognize the inserted floppy disk, formats it incorrectly or reports write protection that is not there. If your copy reacts like this, you will have to disable the processor and possibly also the L2 cache during all disk accesses. This is very easy to do using two diodes that are connected as shown in the figure above. Since the Mac is a bit finicky about formatting floppy disks anyway, you can also format floppy disks with 'Disk Copy 4.2': simply create an image of a blank floppy disk and write new disks from it. This works wonders, especially if a floppy disk has already been MS-DOS-formatted.
The use of a SANE accelerator (an INIT or control panel that redirects the SANE calls to the coprocessor) is recommended, so that all 'calculating' applications benefit from the 68881/882 (if available); otherwise it would be left out whenever the current program does not address it directly. [7] describes the Radius-MATH-INIT patch, which is actually intended for the American company's accelerator cards. If you have access to CompuServe (Mac Systems Forum) or AppleLink (Software Updates), you should download the GEMStart control panel (version 2.2 or 3.0) from Total Systems; it works wonderfully with the PAK, old and new, even without a patch.
The Dhrystone test and the calculation of the Julia set show a different picture, which also corresponds to the experiences from daily practice, for example with applications like TeX. The bottom line is that the PAK/3 with 32 MHz achieves a factor of 3.5 to 4 compared to a normal ST with 8 MHz and is about twice as fast as conventional 68000 accelerators with 16 MHz.
We don't have any experience with operation in the Amiga; however, we expect positive feedback from some 'beta testers' in the near future. (cm)
[2] Jankowski/Reschke/Rabich, Atari Profibuch ST-STE-TT, 10. Auflage 1991, Sybex-Verlag
[3] Manfred Völkel, Carsten Meyer, Tausch mit Turbo, PAK-68/2 in Mac, Atari und Amiga, c't 11/91, S. 314
[4] Jürgen Knufinke, Manfred Völkel, Stufe sechs, TOS x.06 für die PAK-68/2, c't 9/92, S. 212
[5] MausNet, Gruppen ATARI.HARD, ATARI.SOFT, ATARI-EXP 29. 11. 91 - 15. 10. 93
[6] Und es geht doch, Atari mit mehr als 8 MHz, ST Computer, c't 9/92, S. 118
[7] Leo Drisis, Real-Renner, 68881-Einbindung für die PAK im Mac, c't 5/90, S. 190
Mac users have it easy when configuring their system: simply click on a control panel and set the desired parameters. So far, users of our projects MacStart and PAK-68/3 have been neglected in this respect, which is why we want to present a control panel together with detailed 'construction instructions' here.
For today's Mac user, it is certainly hard to imagine that in the beginning there was only one control panel, and not even one in today's sense: it was merely a desk accessory (DA for short). The control panel soon became too small, and in system version 4.1 it was expanded to include a scrollbar-controlled selection function. It remained in this form for Mac users up to System 6.0.x.
But everything changed with System 7: The control panel DA in its original form died out completely, from now on every cdev could call itself a control panel and have its own window of any size. But the conscientious programmer is advised to stick to the standard size so that it still fits under System 6.0.x.
The numItems parameter has only historical significance under System 7: In the old control panel DA, the cdev's dialog item list was simply appended to that of the control panel DA. Thus, all item numbers have shifted by the number of these dialog items. To compensate for this, the cdev was supplied with the number of items in the numItems parameter. Under System 7, nothing is appended there, but the numItems parameter is still dragged along for reasons of compatibility.
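The compensation described here is a simple subtraction. The following Python fragment only illustrates the arithmetic; the function name and the example numbers are invented, and a real cdev would of course do this in its own language inside the message handler.

```python
def to_local_item(item_hit, num_items):
    """Convert the item number reported by the dialog into the cdev's own.

    num_items is the count of items that belong to the hosting control
    panel and precede the cdev's items in the combined list.
    """
    return item_hit - num_items
```

If the host had, say, 9 items of its own, a click reported as item 12 would be the cdev's item 3. When nothing is prepended and the count is 0, the number passes through unchanged, so the same code works in both environments.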
If a message of the event type comes in, the commander makes it easy for himself and simply passes through the event record known from the programming of applications. There the control panel can then help itself to its heart's content and fetch what it needs to evaluate the message.
If something goes wrong during control panel operation, it can also return an error code instead of the address of the memory area. In this way, the Finder knows that the user must be informed of this error by means of an alert box and that the time has come to close the control panel as quickly as possible.
Since System 6.0.4, the Gestalt Manager has taken over the information service about the system and hardware configuration. Using a selector, you tell the Gestalt function what you would like to know. The function diligently provides information; only the existence of the PAK FPU remains undetected. The system programmers at Apple probably suspected this, because they allow the information function to be replaced for each selector. So you rewrite the routine and use it to replace the native one.
As a second task, the control panel should take over the installation of the shutdown routine for the MacStart project. This is hooked into the shutdown sequence using the shutdown manager. What this routine does with the shutdown can be read in [3]. As a little treat and therefore the third task, the processor's instruction or data cache should be able to be switched on and off, for example for speed tests or eliminating incompatibilities.
The DITL resource contains information about the number, structure and position of the dialog items. Under System 7, it is managed independently by the Finder and only passed to the control panel function for evaluating mouse clicks and keystrokes or for handling user items.
The nrct resource contains information about which rectangles should be drawn in the background of the control window. If you prefer to have your rectangles drawn using the DITL resource, you can enter the entire area of the control panel window as the rectangle here; history is also taken into account here. The coordinates of the rectangles as well as the DITL resource do not refer to the point (1,1) but to (1,89). This is because in the old control panel DA, space had to be left next to the actual control panel area for the icon list of the other available cdevs.
This is how the complete c't control panel presents itself to the user.
In order to be able to display an icon for each control panel both in the Finder and in the cdev list of the control panel DA, you need the bundle resources including accessories. The combination of resources of one type each, BNDL, FREF, and ICN#, uniquely associates each control panel file with an icon for the Finder. Under System 7, you can live without these resources if necessary.
All these resources must always have the ID -4064; the golden question mark goes to whoever came up with that pretty number. With this specification, however, the Finder always knows which resources to use when building a control panel. So that you don't have to do so much work creating the resources, we would also like to point out Apple's ResEdit, which provides very convenient input methods for all the necessary resources.
The settings are stored System-7-like in the Preferences folder. Thus, booting via network would also be possible.
The entire listing is beyond the scope of this article. That is why we have limited ourselves to the most important parts here. Building instructions for INITs can also be found in [1]. The entire source code along with the executable program is also available this time on a collection disk and in the c't mailbox. (cm)
[2] Michael Parker, Carsten Meyer: FPU-Starthilfe, INIT für PAK-ausgerüstete Macs, c't 1/92, S. 188
[3] Carsten Meyer: Startautomatik, Macintosh automatisch ein- und ausschalten, c't 10/92 S. 196
[4] Inside Macintosh I, The Resource Manager, The Dialog Manager, The File Manager, The Control Manager, Addison-Wesley 17737
[5] Inside Macintosh V: The Control Panel, The Shut Down Manager, Addison-Wesley 17719
[6] Inside Macintosh VI: Compatibility Guidelines, The Control Panel, The File Manager, The Finder Interface, Addison-Wesley 57755
[7] Holger Zimmermann: Doppel-PAK, PAK68/3 mit Cache und 68020/030 für 68000-Rechner, c't 11/93 S. 222, c't 12/93, S. 276
Introduction to the PAK68/3
What is a PAK?
The PAK68/3 is a processor exchange card for a 68000 processor. The addition /3 indicates that the PAK68, or simply called PAK for short, is already in its 3rd generation. The design and layout of the card is intended to replace the processor in an Atari ST or MegaST. An adapter is required for the 'E' models, i.e. STE and MegaSTE, since a different design of the CPU (Central Processing Unit, i.e. processor) was used here. The PAK68 is not suitable for replacing the processor of another Atari computer system, i.e. TT or Falcon, since these computers do not work on a 68000 basis.
What is that good for?
The PAK68/3 is probably the fastest and most widely used accelerator for 68000 based systems. In addition, the PAK is a public project (first published in c't-Magazine), which is still looked after and maintained by the developer Holger Zimmermann.
The extensive possibilities offered by a PAK68/3 with its extensions, as well as the reliability in operation, give the Atari systems, which have gone out of fashion, a breath of fresh air. Even if you don't take off to the 'megahertz heights' of today's technology with the PAK, this accelerator offers Atari enthusiasts from the very beginning some joy in the system.
The PAK at a glance
Here are the capabilities of the PAK68/3 in brief:
- 68030 processor, 32-bit, instruction and data cache on chip, PMMU
- 68881/882 FPU (optional)
- 32KB second-level cache (optional)
- 32-50MHz clock frequency
- EPROMs with an adapted TOS 3.06 on the PAK (optional)
- Option to switch to the 68000 CPU (optional)
- High compatibility with Atari mainboards and other hardware thanks to a clean bus interface (the PAK even runs in 68000-based Macintosh systems!)
Assembly and Installation
The PAK68/3-030 is available as a self-assembly kit, which contains the blank circuit board and some components, or as a finished device. The assembly and installation instructions are available as an HTML document. The JEDEC files for the GALs used are public, but only released for private use. If it is not possible to program GALs yourself, they can also be obtained completely programmed. If you want to get down to business here, you can take a look at our GABI project.
To get an overview of the effort involved in setting up and installing a PAK, here are a few more points of reference:
- The construction and installation of the PAK requires around 1,000 soldering points
- It may not be easy to get a 68030-50MHz processor, the same goes for the 68882 FPU.
- The processor must be removed from the mainboard and replaced with a socket. Unfortunately, this is not as easy as it sounds, especially when the CPU is in PLCC form factor (like in the 1040STE and MegaSTE).
- Under certain circumstances the PAK no longer fits into the original case (1040ST); with the FRAK/2 on top, at the latest, every Atari case becomes too small.
- The power supply may not be strong enough to supply the PAK; this is definitely the case when the PAK is used with the FRAK/2 and a graphics card.
Double PAK
PAK-68/3 with cache and 68020/030 for 68000 machines
Holger Zimmermann
The third edition of our long-running favorite PAK-68 is all about the fifth power of two: 32 MHz, 32 bits and 32 KByte cache are the important reasons for presenting the optimal 68000 accelerator for self-construction.
The PAK-68 has made many friends since its birth (not without complications) in July 1987. Initially [1] still clocked at a leisurely 8 MHz, later at 16 MHz [2] and now with clock frequencies almost at will, the processor replacement card remains the most logical way to breathe contemporary performance into an existing computer.
And not by a small margin: in addition to the striking acceleration, Atari ST fans can enjoy the modern TOS versions 2.06 and (with small patches) 3.06 with 32-bit access; with the 68030 version there is also a PMMU for MultiTOS. Compact Macs particularly benefit from the three- to sixfold increase in working speed under the resource-guzzling System 7, depending on the configuration. Our aim was to remain as 68000-compatible as possible, so that critical computers (Atari with IMP chipsets, Amiga) are not left out - although there will certainly still be applications and extensions that are not 68020/030-compliant. So, before purchasing a PAK, find out whether your favorite editor also 'plays' on larger machines (Atari TT, Falcon, Mac II).
Eco-friendly
However, a 68020 with 8 or 16 MHz is no longer quite 'state of the art'. A 68030 was therefore at the top of the wish list, followed by higher clock speeds and a second-level cache. Luckily, there are now inexpensive components that allow such desires to be implemented without significant hardware contortions and without draining your bank account. So your ST or shoebox Mac is far from being old-fashioned.
So here it is, the PAK-68/3. With almost the same dimensions as the PAK/2, a lot has happened 'under the hood': either a 68020 or 68030 CPU with a clock frequency of 16 to 33 MHz and a system clock of 8 MHz, a 68881 or 68882 FPU, 256 or 512 KB ROM and a 32-bit, 32-KB second-level cache.
When replicating the PAK/3, a difficult decision has to be made right from the start: Should it be a 68020 or a 68030? Since there is a separate circuit board version for both processors, it is not possible to change them later. As a sort of compensation, however, all other features can also be added or omitted later by exchanging the GALs. In view of the used processors that are sometimes offered very cheaply (see classified ads), if you don't already have an old PAK-68, we recommend the 030 version, which has a speed advantage of around 15 percent compared to the 020 with the same clock frequency. Otherwise you can continue to use most of the components of the old PAK, but only the GALs if you have a suitable programming device. Of course, the processor must be able to cope with the desired clock frequency.
Big plans
The circuit diagram of the 030 version of the PAK/3 basically covers the 020 as well. Since the /STERM, /CIIN, /CIOUT, /MMUDIS, /CBREQ and /CBACK connections are not available on the 68020, the corresponding lines, including the associated pull-ups and jumpers, are not required in this case. Unfortunately, the pin assignment of both CPUs is significantly different, which is only noticeable in the small print in the circuit diagram, but ultimately led to the two different board versions in the circuit board layout. Except for the GALs, R1, R2 and the CPU, both board versions are populated exactly the same.
The state machine concept of the PAK/2 has proven itself in practice, so that its core could also be retained for the PAK/3. The CPU, FPU and ROMs are largely wired like the predecessor. Synchronous clock generation using a 74F86 as a doubler for the 8 MHz system clock is also identical. The quartz oscillator was also found on the PAK/2, but was only responsible for the FPU there. With the appropriate GAL equipment, the PAK can now also be operated asynchronously to the system clock (tested up to 33 MHz); more about that in the next c't. GALs U4, U5, and U6 have largely retained their function (and names). You will still recognize a few things in the equations for U5, while hardly one stone was left unturned for U4. Completely new are GAL U1, which takes over control of the local bus cycles (i.e. for ROM reads and cache hits) and GAL U2, which coordinates the cache.
The 68020 already had a cache for instructions integrated into the CPU, and the 68030 added a data cache. Unfortunately, the internal caches are not particularly large at 256 bytes. Benchmarks often use short loops to test the performance, and the CPU cache still does well there; in practice, however, there is not much left of it. The PAK/3 has therefore been given a 32 KB second-level cache, and anyone who turns it off for a test will see how right this decision was. Typical applications run at the maximum configuration (32 MHz plus L2 cache) at six times the 68000 speed or more, and some benchmarks even show double-digit performance gains. Compared to the old PAK, which delivered about 250 percent of the performance of a 68000, this is a significant improvement.
Piggyback
Two tag RAMs, four fast SRAMs and two data bus drivers have been added for the L2 cache. The SRAMs (narrow design, 300 mil) require largely the same signals as the ROMs (width 600 mil), so that a two-storey design was the obvious choice here. This was the only way to keep the dimensions of the PAK/2. We were also able to eliminate a small blemish: the PAK no longer covers the PDS slot in the Mac SE.
The four SRAMs (each 8K × 8 bits, i.e. 8 Kbytes) together form a 32 Kbyte cache memory which the CPU can access very quickly and at full width. GAL U2 ensures that the data fetched from main memory is stored in the cache on the side, so to speak. Data from main memory address 1, for example, also ends up in the cache at address 1. However, since the cache only contains 32,768 bytes, data from the main memory address 32,768+1 also ends up in the cache at address 1, with the previous contents of the cache being overwritten, of course. How can this go well?
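The address aliasing described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a description of the GAL logic; the function names are made up for this example.

```python
CACHE_SIZE = 32 * 1024  # 32 Kbyte second-level cache (four 8K x 8 SRAMs)


def cache_offset(address: int) -> int:
    """Byte offset inside the cache that a main-memory address maps to."""
    return address % CACHE_SIZE


def page_number(address: int) -> int:
    """Number of the 32 KB main-memory page the address lies in."""
    return address // CACHE_SIZE


# Address 1 and address 32,768+1 share a cache location but lie in different pages.
for addr in (1, 32768 + 1):
    print(addr, cache_offset(addr), page_number(addr))
```

Both addresses produce the same cache offset, so the cache alone cannot tell them apart; only the page number distinguishes them.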
Tag Storage
This is where the tag RAMs come into play, which are initially also static RAMs with 8K × 8, but whose data lines are connected to the CPU address lines A15 to A21. Each time data is cached, the associated upper address bits are placed in the tag RAM. For each individual data word in the cache, there is information as to which 32-KB page of the main memory the contents of the cache belong to.
If the CPU initiates a read access, the tag RAM is now not read out but operated in the so-called compare mode, in which it compares the externally applied address bits A15 to A21 with the stored address bits. The comparator required for this is integrated in the tag RAM and reports the result to the outside via the MATCH connection; if there is a match, a high level is obtained here. This is the sign for the CPU and its read logic that the data it is looking for is already in the cache. If the tag RAM does not report a hit (MATCH is low), then the data word in the cache belongs to another page, which means that the CPU now has to look in the main memory. In the hope that the data word that has just been fetched will be needed again in the near future, the control logic then immediately stores it in the cache.
The cache SRAMs are below the EPROM sockets. Before using SRAM sockets, try out whether the 'turrets' fit into your computer. Especially in the compact Mac it is quite tight.
Because of the 16-bit wide system bus, the cache must be managed word by word, so two tag RAMs are required for the 32-bit wide cache of the PAK. The seven address bits that the tag RAM evaluates cover 128 pages, i.e. the address space from 0 to 4 Mbytes. In the area above $40 0000, the second-level cache is ineffective, which does not imply any restrictions on the Macs and Ataris in question; the CPU has direct access to the ROM on the PAK anyway.
The same plan for two versions: Except for the CPU, both new PAKs are identical. The 64-pin socket for connecting to the main board belongs on the solder side in the DIL rows that point to the middle of the board (see also the detailed photo). R1 and R2 are only fitted to the 030 version.
The eighth bit of the tag RAMs forms the valid bit and is therefore tied to VCC. In order to declare all data in the cache invalid (e.g. in the case of DMA accesses or a reset), the tag RAMs are cleared via the /TCLR connection and all bits are thus set to low. In this way, the comparator always recognizes a cache miss (no hit) on the next read access and then reloads the cache.
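The tag comparison and the valid-bit trick can be modelled roughly as follows. This is a toy model under stated assumptions: the class `TagRam` and its method names are invented for illustration, one instance stands for one of the two tag RAMs, and the entry index is assumed to come from address bits A2 to A14 (8K entries for 8K long words).

```python
ENTRIES = 8 * 1024   # 8K entries per tag RAM
PAGE_SHIFT = 15      # A15 is the lowest tag bit (32 KB pages)
TAG_MASK = 0x7F      # seven tag bits, A15..A21
VALID_BIT = 0x80     # eighth bit, tied to VCC on the real board


class TagRam:
    """Toy model of one 8K x 8 tag RAM with built-in comparator."""

    def __init__(self) -> None:
        self.cells = [0] * ENTRIES  # all low, as after /TCLR

    @staticmethod
    def _index(address: int) -> int:
        # Assumption: A2..A14 select the entry (one entry per long word).
        return (address >> 2) & (ENTRIES - 1)

    @staticmethod
    def _tag(address: int) -> int:
        # The applied value always carries the valid bit as a 1.
        return VALID_BIT | ((address >> PAGE_SHIFT) & TAG_MASK)

    def store(self, address: int) -> None:
        """Remember which page the cached word belongs to."""
        self.cells[self._index(address)] = self._tag(address)

    def match(self, address: int) -> bool:
        """Compare mode: True corresponds to MATCH high."""
        return self.cells[self._index(address)] == self._tag(address)

    def clear(self) -> None:
        """Model of /TCLR: every bit low, so nothing can match."""
        self.cells = [0] * ENTRIES
```

Because the applied value always has the eighth bit set, a cleared cell (all zeros) can never compare equal, even for page 0. That is why clearing the tag RAM is enough to invalidate the whole cache.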
UnBursted
There are no differences between the 68020 and 68030 in the course of a bus cycle on the PAK/3. The synchronous bus mode of the 68030, which once accelerated the Mac IIci so much (burst mode with /STERM), unfortunately cannot be used here; on the one hand the ROMs and tag RAMs are too slow at 32 MHz, on the other hand there would be a problem if a read access to the main memory was followed by a local read access to the PAK/3 because the data bus drivers of the mainboard (in the Mac as well as in the Atari) would still be switched on from the previous access and would thus interfere.
A PAK/3 with a clock frequency of 16 MHz is used to illustrate the processes. A bus cycle begins when the CPU applies the address and, after the search in the CPU's internal cache has been unsuccessful, activates /AS_20 with the falling edge of CLK16. If this is a ROM access (U6's /ROM active) and there are ROMs on the PAK (J7 on 1-2), U1 turns on the ROMs and immediately acknowledges with /DSACK0 and /DSACK1. This is recognized by the CPU with the next falling edge of CLK16, whereupon it terminates the bus cycle exactly one CLK16 cycle later by deactivating /AS_20. U1 keeps /CYC_00 high all the time, so U4 doesn't initiate access to the mainboard.
Assuming there is a read access to the range $0 to $40 0000 (U6's /DRAM active), the L2 cache is activated (/TCLR high), but at least one of the two tag RAMs does not report a hit (MAT0 or MAT2 to low): In this case, /CYC_00 goes to low after a short waiting time and thus starts access to the mainboard in U4, which in principle runs like with the PAK/2. U2 now ensures that the data word that has been read is also stored in the SRAMs and at the same time that the associated address is retained in the corresponding tag RAM.
Except for the processor and its pinout, the 020 and 030 PAK are the same. You will therefore only find the circuit once here.
The clock doubler for synchronous 16 MHz operation was carried over from the old PAK.
The fact that the data bus on the mainboard is only 16 bits wide, but the cache is 32 bits wide, requires a closer look. If the address of a data word is not on a long word boundary (i.e. A1 = high), the following problem arises: The mainboard supplies the data word on the CPU data lines D16 to D31, but the CPU later expects it in the cache on D0 to D15. In this case, the two bus drivers on the PAK/3 become active in the truest sense of the word and solve the problem by creating a temporary cross-connection between the upper and lower data word.
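A minimal sketch of the lane decision described above, assuming only what the paragraph states: the mainboard always delivers its 16-bit word on D16 to D31, and the word with A1 = high belongs on D0 to D15 in the 32-bit cache. The function names are hypothetical.

```python
MAINBOARD_LANES = "D16-D31"  # the 16-bit mainboard bus always uses these lines


def cache_lanes(address: int) -> str:
    """Data lines on which the 32-bit cache holds the word at this address."""
    a1 = (address >> 1) & 1
    return "D0-D15" if a1 else "D16-D31"


def needs_cross_connection(address: int) -> bool:
    """True when the bus drivers must bridge upper and lower data word."""
    return cache_lanes(address) != MAINBOARD_LANES
```

For a word on a long-word boundary (A1 = low) the lanes already agree and no cross-connection is needed.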
Ricochet
Up to this point, the PAK/3 is not a bit faster than its predecessor. But the next time this address is read, the comparator in the tag RAM recognizes a hit if the address applied and the address stored are identical. If the second tag RAM, which is responsible for the other half of the 32-bit wide cache, also signals a hit (MAT0 and MAT2 both high), then a quick look into the cache is enough for the CPU to get the desired data. In this case, /CYC_00 stays high, U4 politely holds back, and the mainboard doesn't even notice this access. Instead, U1 switches on the SRAMs, analogous to ROM access, and confirms with /DSACK0 and /DSACK1. The rest is a matter of form. Such a read access only takes three CLK16 cycles, compared to eight CLK16 cycles, which access to the mainboard takes in the best case. The CPU is presented with 32 bits from the cache at once, but without a cache, another eight cycles are required for the second word.
The cache is powerless for write accesses, however: with the write-through strategy implemented here, access to the mainboard is required in any case. If the tag RAMs report a hit, i.e. the data word belonging to this address is also in the cache, U2 takes care of updating the cache contents as well. This means that the next read access can be handled from the cache again. So-called write-back caches, which independently write their content back into the main memory and thus also buffer write access, are much more complex to implement and in this case - because of the 16-bit bottleneck - not even particularly effective.
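To get a feel for what these cycle counts mean, the following sketch estimates the average cost of a 32-bit read for a given hit rate. It uses only the figures quoted above (three CLK16 cycles for a hit, eight per 16-bit mainboard access in the best case) and ignores the short waiting time before a miss is passed on to the mainboard; the hit rates themselves are hypothetical inputs, not measurements.

```python
HIT_CYCLES = 3             # CLK16 cycles for a 32-bit read served by the cache
MAINBOARD_WORD_CYCLES = 8  # best-case CLK16 cycles per 16-bit mainboard access


def longword_read_cycles(hit_rate: float) -> float:
    """Average CLK16 cycles for a 32-bit read at the given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_cycles = 2 * MAINBOARD_WORD_CYCLES  # two word accesses on a miss
    return hit_rate * HIT_CYCLES + (1.0 - hit_rate) * miss_cycles


for rate in (0.0, 0.5, 0.9):
    print(rate, longword_read_cycles(rate))
```

Writes are left out on purpose: with write-through they always cost a mainboard access, whatever the hit rate.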
Past the CPU
DMA access is an unpleasant topic that Macs (at least the types in question here) are spared. Since the content of the main memory is manipulated bypassing the CPU, the entire cache must be declared invalid after such an action to be on the safe side. With the PAK/3, this is achieved by deleting the content of the tag RAMs, i.e. setting them to low. This ensures that the cache and main memory are always coherent. But if DMA accesses take place all the time, the performance naturally goes down. For this reason, the blitter in the Atari ST with PAK-68/3 is completely useless and should therefore be switched off or removed altogether; the new CPU is faster anyway.
Construction and Maintenance
Four configurations are possible for Mac and ST: completely bare, i.e. without cache and without ROMs (recommended for initial tests); with ROMs and without cache; with cache but without ROMs; and finally the full configuration with all the trimmings, each with the GALs that match the computer and the desired clock frequency. The options are specified in the bill of materials. In the table next to the parts list you will find all possible GAL designations. In contrast to the old PAK, all GALs are populated here even if the ROMs or the cache are absent, since these options are indicated to the PAK logic by jumpers; in the second part of this article we will go into the computer-specific details.
Thanks to multilayer technology, the circuit board almost achieves the packing density of a free beer bar; in order not to exceed the dimensions of the old PAK, the tag RAMs had to be banished under the EPROM sockets, and some decoupling capacitors can only be fitted on the solder side when closed PGA sockets are used for the processors. Incidentally, the CPU and FPU can be removed from PGA sockets with cup spring contacts ('precision sockets') only with brute force, if at all. Normal designs, such as those produced by AMP, are better here. To be on the safe side, you should also give the tag RAMs sockets; the bars of the EPROM sockets above must be carefully removed.
As with the old PAK, a 64-pin DIL socket on the solder side, with a plugged-in adapter socket or a double-ended SIL pin strip, establishes the connection to the mainboard. For reasons of stability, this time the surgical removal of the old 68000 CPU - if not already done - is unavoidable; a 64-pin DIL socket takes its place. In contrast to the PAK-68/2, a GAL set that allows operation with the CPU remaining on the mainboard is not provided. Amputating all 68000 legs with small side cutters has proven effective. The pins can then be easily pulled out of the circuit board individually. Anyone who tears up during this procedure because of the suffering CPU can later solder a prosthesis in the form of a DIL socket under the legless 68000 and continue to use it for test purposes.
Incidentally, switchable operation of the 68000 alongside the 68020/030 is optionally possible if you wire a few lines to the otherwise unused spare socket and use a correspondingly programmed GAL (tips follow). Unfortunately, with the best will in the world, the connections could no longer be accommodated on the circuit board, and since not everyone needs a 'before/after' option, we only provided a freely wireable empty socket for this. Alternatively, the 68000 socket on the component side of the board can also be used to accommodate MSDOS emulators and similar extensions that use the processor signals directly.
The use of medium-fast CPUs (20 and 25 MHz) with asynchronous clocking, i.e. when using the 32 MHz GAL set and a correspondingly designed quartz oscillator, usually leads to problems; an asynchronously running 20 MHz CPU is also hardly faster than the synchronously clocked 16 MHz version due to unavoidable waiting cycles.
In the next issue of c't you will find out what needs to be considered when adapting TOS 3.06, how the asynchronous accesses of the 32 MHz version work and how to persuade the Mac to use the FPU and the data cache. (cm)
Literature
[1] Manfred Völkel, Carsten Meyer, Tausch mit Turbo, PAK-68/2 - mehr Dampf für 68000-Rechner, c't 10/91, S. 178
[2] Manfred Völkel, Carsten Meyer, Tausch mit Turbo, PAK-68/2 in Mac, Atari und Amiga, c't 11/91, S. 314
[3] Manfred Völkel, Tausch mit Turbo, Tips zur PAK-68/2 im ST, c't 12/91, S. 222
[4] Manfred Völkel, Turbo für den Austauschmotor, Schneller Atari mit modifizierter PAK-68, c't 3/91, S. 282
[5] Carsten Meyer, Schnelle Freundin, Amiga mit PAK und 14 MHz, c't 9/90, S. 338
[6] Roland Beck, Hilfskraft: 68000-Prozessor trotz PAK-68/2, c't 7/92, S. 196
[7] Leo Drisis, Real-Renner, 68881-Einbindung für die PAK im Mac, c't 5/90, S. 190
[8] Carsten Meyer, Fast-PAK, 68000 und PAK-68 mit 16 MHz im Macintosh, c't 1/90, S. 194
[9] Johannes Assenbaum, Mehr CPU, PAK-68 - die Prozessor-Austausch-Karte für 68000-Rechner, c't 8/87, S. 68
Double PAK
PAK-68/3 with cache and 68020/030 for 68000 computers, part 2
Holger Zimmermann
Now it's getting really exciting: after the 800-point soldering spree and the first tests, the PAK is waiting for clock signals in the VHF range. So that the CPU doesn't get tangled up in the jungle of asynchronous accesses and wait cycles, a cunning GAL logic is required in addition to the 32 MHz crystal.
In the 16 MHz operation discussed so far, CLK16 supplies the clock for the CPU and GALs U1, U4 and U5, obtained by clock doubling from the 8 MHz system clock. The CPU runs synchronously to the system clock. The PAK/3 also gets along excellently with a 12 MHz system clock on the Atari mainboard [6], whereby the CPU then, you guessed it, runs at 24 MHz. The only change needed is to the clock doubler circuit. In this case, 470 ohms should be provided for R41 and R43, and 15 pF for C41 and C42 (the capacitors C41 and C42 are mistakenly missing in the parts list; however, they are correctly 'weighted' in the circuit diagram).
If you wanted to operate the CPU in this way with a system clock of 8 MHz and 32 MHz (i.e. with a quadrupled clock), then a more complex clock generation, for example with the help of a PLL, would be necessary. With the PAK-68/3, a different approach is taken in 32 MHz operation: A quartz oscillator supplies the CPU, FPU and GAL U1 with its own 32 MHz, so everything runs asynchronously to the system clock. The GALs U4 and U5, responsible for the interface to the mainboard, continue to run with CLK16 and thus synchronously with the mainboard.
In step...
All local accesses to the PAK-68/3 (cache hit, ROM and FPU) are always synchronous to CPUCLK. Communication between the CPU and FPU is the same as at 16 MHz, just twice as fast. With ROM access, the speed is limited by the access time of the EPROMs. While 120 ns EPROMs are sufficient for 0 wait states at 16 MHz, access times of less than 60 ns would be necessary at 32 MHz. Such EPROMs are not only difficult to obtain, but also uncomfortably expensive. On the PAK-68/3, U1 with the signals W1 and W2 ensures that four wait states are inserted when accessing the ROM, which means that EPROMs with up to 150 ns (but then with /CS on GND) can be used.
Even with a cache hit, there is unfortunately no way around a wait state, since the tag RAM must first be queried as to whether the desired data is actually in the cache. Unfortunately, faster tag RAMs with access times of less than 20 ns are prohibitively expensive due to their limited availability. In PCs, normal SRAMs are almost exclusively used for this task; the comparator logic is integrated in the LSI chip set. On the other hand, the SRAMs used for the cache on the PAK/3 are the same as in the PC world. Thus, at least with SRAM, access time, availability and price are not a problem.
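The effect of wait states on the time available to the EPROMs can be put into rough numbers. The sketch below assumes a local read takes three clocks without wait states (the figure given in part one for a 16 MHz access) and that each wait state adds one clock; it computes only the total bus cycle length and does not derive the 120 ns, 60 ns or 150 ns access times quoted above, since those also depend on setup and propagation delays not covered here.

```python
def bus_cycle_ns(clock_mhz: float, wait_states: int, base_clocks: int = 3) -> float:
    """Length of one bus cycle in nanoseconds.

    base_clocks=3 is an assumption taken from the three-clock local
    read described for 16 MHz operation; each wait state adds one clock.
    """
    period_ns = 1000.0 / clock_mhz
    return (base_clocks + wait_states) * period_ns


print(bus_cycle_ns(16, 0))  # ROM read at 16 MHz, no wait states
print(bus_cycle_ns(32, 0))  # the same cycle at 32 MHz is half as long
print(bus_cycle_ns(32, 4))  # four wait states stretch it again
```

With four wait states the 32 MHz cycle becomes somewhat longer than the zero-wait cycle at 16 MHz, which is consistent with slower EPROMs becoming usable again.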
...March!
As long as the mainboard is not being accessed, the CYC_00 output of U1 remains inactive and the state machine in U4 is standing still, so to speak, with the 68000 bus signals also being inactive. For write accesses, I/O accesses and cache misses, U1 activates CYC_00, whereupon U4 starts a bus cycle. U1 then has nothing to do with the rest of this bus cycle: the outputs DSACK0 and DSACK1 of U1 are switched off, and everything waits for DSACK1 from U4, whose output was switched on with CYC_00. Meanwhile, wait states rain down on the CPU.
So far there are no differences between 16 MHz and 32 MHz operation. However, since the processor signals and CYC_00 as input signals for U4 are completely asynchronous at 32 MHz, it is to be expected that the setup and hold times for U4 will occasionally be violated. It then depends on chance whether the input signals are recognized as low or high, and the individual registers of the GAL can decide quite differently.
To illustrate what can happen, imagine two register outputs with the exact same equations. First of all, one expects that both outputs always agree. However, if an input signal changes at the 'right' time, one register could still recognize the old value while the other already sees the new value and then changes its state. So for one clock period both outputs would deliver different results.
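The thought experiment can be written out as a tiny enumeration. This is purely illustrative and assumes nothing about the real GAL equations: each of two registers with identical logic is simply allowed to latch either the old or the new value of an input that changes inside the setup and hold window.

```python
from itertools import product


def sampled_outcomes(old: int, new: int) -> list:
    """All (register_a, register_b) pairs that may result from one clock edge."""
    return sorted(set(product((old, new), repeat=2)))


outcomes = sampled_outcomes(0, 1)
disagreeing = [pair for pair in outcomes if pair[0] != pair[1]]

print(outcomes)     # every combination the two registers could latch
print(disagreeing)  # the cases in which the 'identical' outputs differ
```

Two of the four possible outcomes leave the registers in disagreement for one clock period, which is exactly the situation the design has to rule out.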
Forbidden Zone
To avoid unpleasant situations arising from this, two state variables that are dependent on asynchronous input signals must never change at the same time. With the PAK-68/3 it is ensured that only one state variable changes at a time, and thus no 'bomb atmosphere' arises.
The completion of an access to the mainboard also looks a bit different at 32 MHz. First, U4 waits one CLK16 clock longer than at 16 MHz before asserting DSACK1. With the next falling edge of the 32 MHz clock, the CPU recognizes that the waiting is finally over. Since both clocks run asynchronously, the data is not accepted when reading exactly with the rising edge of AS_00, but with a jitter of 30 ns. The PAK/2 has proven that this is not a problem, because there the data was always accepted by the CPU 30 ns before AS_00.
In principle, CPUCLK is not limited to 32 MHz. At lower frequencies, however, the jitter would be too great (always one CPUCLK period, at 25 MHz that is already 40 ns), which can be good, but doesn't have to be. There would be no problems with the jitter at higher clock frequencies, but there would be problems with the data bus drivers on the mainboard, which would not be switched off quickly enough after the work was done and would thus disrupt the following bus cycle. FPU accesses are most likely to be at risk because they do not require wait states.
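Since the jitter is always one CPUCLK period, it follows directly from the clock frequency. A small sketch of that arithmetic (the function name is made up for this example):

```python
def cpuclk_period_ns(clock_mhz: float) -> float:
    """One CPUCLK period in nanoseconds, i.e. the worst-case sampling jitter."""
    return 1000.0 / clock_mhz


for mhz in (20, 25, 32):
    print(mhz, cpuclk_period_ns(mhz))
```

At 25 MHz this gives the 40 ns mentioned above, and at 32 MHz a little over 31 ns, which the text rounds to 30 ns.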
Assistant
As already mentioned in the first part of the article, the circuit board has an empty socket and a 68000 socket, with which you can easily switch between the two processors (of course not during operation).
Pins 11 (/BG), 13 (/BR) and 20 (E) are not connected to the 64-pin socket U7 on the PAK-68/3. If you want to plug in extensions that require these signals (possibly graphics extensions or emulators in the Atari), you have to add them afterwards with wire links. If a 68000 is to be used as an emergency backup, these six lines (three each from the PAK and from the 68000) must be routed to a switching GAL (Listing PUK). In addition, /BR_20 (from J6) and /PAK_EN (from J5) are required on the GAL. The jumper J6 is not necessary; the switching between the processors is done with J5.
Atari
The PAK/3-020 runs without any problems under TOS 2.06, which in principle also applies to the 030 PAK. However, since TOS 2.06 does not know about the existence of an MMU and a CPU-internal data cache, problems can arise in certain situations, for example with some disk drivers with an activated data cache. This is not noticeable in the TT because the MMU is programmed accordingly by TOS 3.06. AHDI, which also runs perfectly on a PAK/3-030 under TOS 2.06 without MMU support, shows that there is another way.
There are also some programs, such as drivers for virtual RAM, that use the MMU but want to find it in exactly the same state as it was initialized by TOS 3.06 in the TT. With TOS 2.06, this MMU initialization can be made up for by a small program in the auto folder, with which such programs then also run.
Unfortunately, this leads to another (small) problem: A cold start triggered via the keyboard causes the computer to hang up because TOS 2.06 deletes the entire memory and with it the MMU table. The CPU encounters a RESET command in the TOS startup sequence, which resets the entire hardware of the computer (including the L2 cache). However, the internal status of the CPU remains unaffected, the MMU and the two CPU caches are still active, which then promptly goes haywire.
An OR-linked Enable signal from both drives disables the cache if the Mac has problems with floppy disks.
With and without: The results (relative to the Mac Classic) in the right column were determined using the SANE accelerator GEMStart; the increase in FPU performance in the benchmark mix is clear. The Speedometer (version 3.23), on the other hand, always beats SANE in the FPU benchmarks. With a 68030-PAK, the results are on average 10 percent higher.
The more elegant alternative is to use TOS 3.06, which is already designed for a 68030. Since, in contrast to TOS 2.06, 68020 code is used in some places, there is a noticeable increase in speed.
Unfortunately, the original TOS 3.06 of the TT cannot be used on the PAK-68/3 due to different hardware requirements. This also applies to the patch presented in c't 9/92, p. 212, to which the entire MMU initialization fell victim at that time. There, a subprogram in which the CPU occasionally allows some of its colleagues to take a short breather (implemented in the TT with the TT-MFP) was simply skipped. What may have worked with the PAK/2 because of the lower speed is guaranteed to go wrong with the PAK-68/3. Therefore there will be a new patch for TOS 3.06, which we only want to release after an extensive test phase.
The only unsolved problem at the moment is MiNT's memory protection, which unfortunately hasn't been able to work on the PAK/3 yet. However, since other 68030 boards also have the same problem, the error is more likely to be found in MiNT.
Inside Mac
In principle, the PAK can be installed in all Macs with a 68000-8, including the Classic. The only trouble is that the latter (like the Atari STE) uses a 68000 in the PLCC version - and nothing can be done without an appropriate adapter. Various companies, such as MW Elektronik from Königswinter, offer this.
With the Mac it is also possible (as with the PAK-68/2) to distribute the computer ROM over four EPROMs and plug them into the PAK. The Mac collection disk 7 contains the split utility from [3] along with all GAL sources and JEDECs. Since the Mac Classic is equipped with 512 instead of 256 Kbytes of ROM (only Apple knows why), the split program must be adjusted accordingly (variable ROMLen); 27C010 EPROMs are also required.
The GAL P3_PUK is housed in the spare socket of the PAK board with some 'outdoor wiring'.
The following problem occurs with some mask versions of the IWM or SWIM (floppy disk controller of the Mac, 'Integrated Woz Machine'): The Mac does not recognize the inserted floppy disk, formats it incorrectly or indicates that write protection is not available. If your copy reacts like this, you will have to disable the processor and possibly also the L2 cache during all disk accesses. This is very easy to do using two diodes that are connected as shown in the figure above. Since the Mac is a bit finicky about formatting floppy disks anyway, you can also format floppy disks with 'Diskcopy 4.2': Simply create an image of a blank floppy disk and write new disks with it. This works wonders, especially if a floppy disk has already been MSDOS-formatted.
The use of a SANE accelerator (an INIT or control panel that redirects the SANE calls to the coprocessor) is recommended, so that all 'calculating' applications get some benefit from the 68881/882 (if available) - otherwise it would be left idle whenever the current program does not address it directly. [7] describes the Radius-MATH-INIT patch, which is actually intended for the American company's accelerator cards. If you have access to CompuServe (Mac Systems Forum) or AppleLink (Software Updates), you should download the GEMStart control panel (version 2.2 or 3.0) from Total Systems; it harmonises wonderfully with the PAK even without a patch - old and new.
Output
What is important, if you can believe a well-known German politician, is what comes out at the end - see the benchmark table below. On the one hand, it is noticeable that QuickIndex already certified the PAK/2 as having a fairly high computing power. This is due to the short program loops used there, which largely find space in the CPU's internal cache; so the step to the PAK/3 in the various expansion stages no longer seems so big. In the last column it is noticeable that a 24 MHz PAK in a 12 MHz Atari is almost as fast as the 32 MHz version with an 8 MHz bus clock, because data access to the mainboard can only take place at certain times; in between, the video controller has its turn. A faster CPU at 8 MHz thus wastes a good part of the time gained waiting for the next 'access window'. The many write accesses to the screen memory, which do not run via the L2 cache, also have a somewhat slowing effect in the GEM test.
The Dhrystone test and the calculation of the Julia set show a different picture, which also corresponds to the experiences from daily practice, for example with applications like TeX. The bottom line is that the PAK/3 with 32 MHz achieves a factor of 3.5 to 4 compared to a normal ST with 8 MHz and is about twice as fast as conventional 68000 accelerators with 16 MHz.
We don't have any experience with operation in the Amiga; however, we expect positive feedback from some 'beta testers' in the near future. (cm)
Literature
[1] Holger Zimmermann, Doppel-PAK, PAK-68/3 mit Cache und 68020/030 für 68000-Rechner, c't 11/93, S. 222
[2] Jankowski/Reschke/Rabich, Atari Profibuch ST-STE-TT, 10. Auflage 1991, Sybex-Verlag
[3] Manfred Völkel, Carsten Meyer, Tausch mit Turbo, PAK-68/2 in Mac, Atari und Amiga, c't 11/91, S. 314
[4] Jürgen Knufinke, Manfred Völkel, Stufe sechs, TOS x.06 für die PAK-68/2, c't 9/92, S. 212
[5] MausNet, Gruppen ATARI.HARD, ATARI.SOFT, ATARI-EXP 29. 11. 91 - 15. 10. 93
[6] Und es geht doch, Atari mit mehr als 8 MHz, ST Computer, c't 9/92, S. 118
[7] Leo Drisis, Real-Renner, 68881-Einbindung für die PAK im Mac, c't 5/90, S. 190
Everything Under Control
Macintosh control panel for PAK-68 and MacStart
Michael Parker
Mac users have it easy when configuring their system: simply click on a control panel and set the desired parameters. So far, users of our MacStart and PAK-68/3 projects have been left out, which is why we present a control panel here together with detailed 'construction instructions'.
For today's Mac user, it is certainly hard to imagine that in the beginning there was only one control panel, and even that was not one in today's sense: it was merely a desk accessory (DA for short). This control panel soon became too small, and in system version 4.1 it was expanded to include a scrollbar-controlled selection function. It remained in this form for Mac users up to System 6.0.x.
But everything changed with System 7: the control panel DA in its original form died out completely. From then on, every cdev could call itself a control panel and have its own window of any size. The conscientious programmer is nevertheless advised to stick to the standard size so that the panel still fits under System 6.0.x.
Command Receiver
However, one thing has remained the same since System 4.1: the mechanism. In essence, a control panel is just a command receiver that listens for a limited number of messages. These messages instruct the control panel to check whether it is suitable for the current software or hardware, to initialize itself, to react to various events such as a mouse click, and finally to clean everything up neatly again. The messages used to come from the control panel DA; since System 7 they have been sent directly by the Finder. The event messages mirror old acquaintances such as mouse click, keystroke, cut/copy/paste/clear, and the update, activate and deactivate events. So that nothing gets neglected, a control panel regularly receives a null event, which tells the recipient little more than that it now has some time for periodic work such as displaying the current time.
All Routine
The heart of a control panel is a control panel function with a strictly defined parameter list. Those parameters reveal a lot about the specifics of control panels. In addition to the message already mentioned, a pointer to a dialog item list is passed to the control panel function. This reveals that a control panel is just a special kind of dialog. A special feature is that building and drawing the dialog is not the job of the control panel itself but of the Finder. A cdev is therefore a popular way to open a window in the Finder even when there is nothing to control. In the event of a mouse click or keystroke, the control panel function is merely told in the item parameter which dialog item was activated.
The numItems parameter has only historical significance under System 7: in the old control panel DA, the cdev's dialog item list was simply appended to that of the control panel DA, so all item numbers were shifted by the number of the DA's own dialog items. To compensate for this, the cdev was given that number in the numItems parameter. Under System 7 nothing is appended any more, but the numItems parameter is still carried along for reasons of compatibility.
If a message of the event type comes in, the sender takes the easy route and simply passes through the event record familiar from application programming. The control panel can then help itself to its heart's content and fetch whatever it needs to evaluate the message.
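The message handling described above can be sketched as a plain C dispatcher. This is a simplified model and not the listing from the article: the real control panel function uses the Pascal calling convention and also receives the event record and the dialog pointer, which are left out here. The names PanelState and panel_dispatch are made up for illustration, and the order of the message codes follows the cdev message constants as documented in Inside Macintosh V, which should be checked there before use.

```c
/* Simplified message codes; order assumed to match the cdev messages
   (init, hit, close, null, update, activate, deactivate, key, "can I run?"). */
enum PanelMessage {
    MSG_INIT, MSG_HIT, MSG_CLOSE, MSG_NULL, MSG_UPDATE,
    MSG_ACTIVATE, MSG_DEACTIVATE, MSG_KEY, MSG_CAN_RUN
};

/* Hypothetical private state of the panel. */
typedef struct {
    int initialized;      /* set by MSG_INIT, cleared by MSG_CLOSE       */
    int last_local_item;  /* 1-based item number within the panel's DITL */
    int null_ticks;       /* number of null events seen so far           */
} PanelState;

/* Handle one message. 'item' is the dialog item that was hit and
   'numItems' the number of items that precede the panel's own list
   (non-zero only under the old control panel DA, 0 under System 7). */
long panel_dispatch(PanelState *st, short message, short item, short numItems)
{
    switch (message) {
    case MSG_CAN_RUN:            /* asked whether the panel fits this machine */
        return 1;
    case MSG_INIT:
        st->initialized = 1;
        st->last_local_item = 0;
        st->null_ticks = 0;
        return 0;
    case MSG_HIT:                /* undo the shift caused by the DA's own items */
        st->last_local_item = item - numItems;
        return 0;
    case MSG_NULL:               /* time for periodic work */
        st->null_ticks++;
        return 0;
    case MSG_CLOSE:
        st->initialized = 0;
        return 0;
    default:                     /* update, activate, key and so on: ignored here */
        return 0;
    }
}
```

With numItems = 9, a hit on item 12 is reported to the panel as its own item 3; under System 7, where numItems is 0, item 3 arrives unchanged.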
Give and Take
A final parameter called cdevValue handles the two-way communication between the control panel and the Finder or the old control panel DA. Once a control panel has been initialized, each call hands it in the cdevValue parameter exactly the value that it returned the last time it was called. With this ping-pong method, the control panel function can, for example, keep hold of a pointer or a handle to a memory area it has allocated.
If something goes wrong during control panel operation, it can also return an error code instead of the address of the memory area. In this way, the Finder knows that the user must be informed of the error by means of an alert box and that the time has come to close the control panel as quickly as possible.
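The ping-pong with cdevValue can be illustrated with a small stand-alone model. The sketch assumes a hypothetical Settings block and a hypothetical error code PANEL_ERR_GENERIC; the real interface passes a long and uses the error constants defined by Apple, which are not reproduced here. intptr_t is used so that the pointer round trip also works on modern 64-bit hosts.

```c
#include <stdint.h>
#include <stdlib.h>

#define PANEL_ERR_GENERIC ((intptr_t)-1)   /* hypothetical error code */

/* Hypothetical private data the panel wants to keep between calls. */
typedef struct {
    int fpu_patch;   /* 1 = announce the FPU to the system */
    int icache;      /* 1 = instruction cache on           */
    int dcache;      /* 1 = data cache on                  */
} Settings;

enum { P_INIT, P_HIT, P_CLOSE };

/* One call of the panel function. 'value' is whatever the previous call
   returned; the return value is handed back in on the next call. */
intptr_t panel_call(int message, int item, intptr_t value)
{
    Settings *s;

    switch (message) {
    case P_INIT:
        s = calloc(1, sizeof *s);
        if (s == NULL)
            return PANEL_ERR_GENERIC;  /* caller shows an alert and closes */
        s->icache = 1;
        return (intptr_t)s;            /* remember our memory block */
    case P_HIT:
        s = (Settings *)value;         /* get the block back */
        if (item == 1)
            s->fpu_patch = !s->fpu_patch;
        return value;                  /* keep passing it along */
    case P_CLOSE:
        free((Settings *)value);
        return 0;
    default:
        return value;
    }
}
```

The caller plays the role of the Finder: it stores the return value of each call and passes it in again as the last argument of the next one.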
Wishlist
The c't control panel should fulfill three tasks. First, on a Mac with a PAK (processor replacement card) [7], it should make the system understand that a floating-point processor really is installed. Every normal Mac knows this from the start; only the PAK Macs play dumb here, because the ROMs and the system do not (yet) know anything about the arithmetic coprocessor on the PAK. There is not much left to do but help out a bit: a replacement routine is needed. In [2] we presented the hasFPU INIT for this purpose, but it no longer works properly under System 7.
Since System 6.0.4, the Gestalt Manager has taken over the information service for the system and hardware configuration. Using a selector, you tell the Gestalt function what you would like to know. The function diligently provides information; only the existence of the PAK FPU remains undetected. The system programmers at Apple probably anticipated something like this, because they allow the information function to be replaced for each selector. So you write your own routine and use it to replace the native one.
As a second task, the control panel should take over the installation of the shutdown routine for the MacStart project. This routine is hooked into the shutdown sequence using the Shutdown Manager. What it does at shutdown is described in [3]. As a little treat, and therefore the third task, it should be possible to switch the processor's instruction or data cache on and off, for example for speed tests or to work around incompatibilities.
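The idea of swapping out the information function for one selector can be modelled without the Toolbox. The sketch below is not the Gestalt Manager API: the table and the functions query_selector and replace_selector are stand-ins invented for this example. The four-character code 'fpu ' and the response values 0, 1 and 2 for no FPU, 68881 and 68882 follow the Gestalt documentation as far as I know; which coprocessor sits on a given PAK is an assumption here (a 68881).

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*SelectorFn)(uint32_t selector, long *response);

#define SEL_FPU 0x66707520u          /* 'fpu ' as a four-character code */
enum { FPU_NONE = 0, FPU_68881 = 1, FPU_68882 = 2 };

/* What an unpatched system reports on a PAK Mac: no FPU. */
int native_fpu_selector(uint32_t selector, long *response)
{
    (void)selector;
    *response = FPU_NONE;
    return 0;
}

/* Replacement routine: announce the coprocessor on the PAK (assumed 68881). */
int pak_fpu_selector(uint32_t selector, long *response)
{
    (void)selector;
    *response = FPU_68881;
    return 0;
}

/* Stand-in for the system's selector table. */
static struct { uint32_t selector; SelectorFn fn; } table[] = {
    { SEL_FPU, native_fpu_selector }
};

/* Ask for information; returns 0 on success, -1 for an unknown selector. */
int query_selector(uint32_t selector, long *response)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].selector == selector)
            return table[i].fn(selector, response);
    return -1;
}

/* Install new_fn for a selector and hand back the old routine. */
int replace_selector(uint32_t selector, SelectorFn new_fn, SelectorFn *old_fn)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].selector == selector) {
            if (old_fn != NULL)
                *old_fn = table[i].fn;
            table[i].fn = new_fn;
            return 0;
        }
    }
    return -1;
}
```

Handing back the old routine lets a replacement fall back on the native answer if it ever needs to, which is the usual reason for this kind of interface.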
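Switching the caches comes down to setting or clearing bits in the processor's cache control register (CACR). The following sketch only computes the new register value; actually writing it requires the privileged movec instruction in supervisor mode and is not shown. The bit positions are those of the 68020/68030 CACR as given in Motorola's manuals, while the function name cacr_for and its parameters are made up for this example.

```c
#include <stdint.h>

#define CACR_EI 0x0001u   /* enable instruction cache (68020 and 68030) */
#define CACR_CI 0x0008u   /* clear instruction cache                    */
#define CACR_ED 0x0100u   /* enable data cache (68030 only)             */
#define CACR_CD 0x0800u   /* clear data cache (68030 only)              */

/* Compute a new CACR value from the current one. All other bits
   (burst enable, write allocate and so on) are left untouched.
   A cache that is switched on is also cleared, so that it starts
   without stale entries. */
uint32_t cacr_for(uint32_t current, int want_icache, int want_dcache,
                  int is_68030)
{
    uint32_t v = current & ~(CACR_EI | CACR_CI | CACR_ED | CACR_CD);

    if (want_icache)
        v |= CACR_EI | CACR_CI;
    if (want_dcache && is_68030)   /* the 68020 has no data cache */
        v |= CACR_ED | CACR_CD;
    return v;
}
```

For example, cacr_for(0, 1, 1, 1) yields 0x0909 on a 68030, while the same request on a 68020 yields 0x0009 because the data cache bits are left alone.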
CDEV and More
A control panel file requires a number of other resources besides the cdev resource that contains the control panel function. First, there is the mach resource, which contains just one integer word for the system configuration and one word for the hardware configuration. In principle a lot of configuration detail can be encoded in the bits of these two words, but in the Gestalt era only two settings have become established. $FFFF 0000 means: the control panel runs on all configurations, while $0000 FFFF means: ask the control panel function itself, with a special message, whether the control panel can run under this configuration.
The DITL resource contains information about the number, structure and position of the dialog items. Under System 7, it is managed independently by the Finder and only passed to the control panel function for evaluating mouse clicks and keystrokes or for handling user items.
The nrct resource specifies which rectangles should be drawn in the background of the control panel window. If you prefer to have your rectangles drawn using the DITL resource, you can enter the entire area of the control panel window as the rectangle here. History shows through here as well: the coordinates of the rectangles and of the DITL resource do not refer to the point (1,1) but to (1,89). This is because in the old control panel DA, space had to be left next to the actual control panel area for the icon list of the other available cdevs.
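The two mach settings named above can be told apart with a few lines of C. The sketch assumes the resource is four bytes long, with the system word first and both words stored big-endian as on the 68000; the type MachPolicy and the function mach_policy are names invented for illustration.

```c
#include <stdint.h>

typedef enum {
    MACH_ALWAYS,     /* $FFFF 0000: runs on every configuration       */
    MACH_ASK_PANEL,  /* $0000 FFFF: ask the control panel function    */
    MACH_BITMASK     /* anything else: individual configuration bits  */
} MachPolicy;

/* Classify the raw contents of a mach resource. */
MachPolicy mach_policy(const uint8_t raw[4])
{
    uint16_t system_word   = (uint16_t)((raw[0] << 8) | raw[1]);
    uint16_t hardware_word = (uint16_t)((raw[2] << 8) | raw[3]);

    if (system_word == 0xFFFF && hardware_word == 0x0000)
        return MACH_ALWAYS;
    if (system_word == 0x0000 && hardware_word == 0xFFFF)
        return MACH_ASK_PANEL;
    return MACH_BITMASK;
}
```

A panel that has to check for the PAK hardware itself would presumably use the second setting, so that it gets the chance to answer the question in its own code.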
This is how the complete c't control panel presents itself to the user.
To display an icon for each control panel both in the Finder and in the cdev list of the control panel DA, you need the bundle resources and their companions. The combination of one resource each of the types BNDL, FREF and ICN# uniquely associates each control panel file with an icon for the Finder. Under System 7, you can live without these resources if necessary.
All these resources must always have the ID -4064; the golden question mark goes to whoever came up with that pretty number. With this convention, however, the Finder always knows which resources to use when building a control panel. To save yourself work in creating the resources, we would also like to point out Apple's ResEdit, which provides very convenient input methods for all the necessary resources.
An INIT Is a Must
The prettiest control panel is useless if there is no code that installs the patch routines in the system; by old custom, this has to happen during system startup. There is only one way to do that: an INIT resource is required. A nice feature of control panel files is that, if they are in the right place in the System Folder, they are scanned for INIT resources during system startup. You may already have seen how an INIT is structured in [1] and [2]. Suffice it to say here that this control panel INIT does little more than load the settings, check what needs to be installed, load the appropriate routines into the system heap, and hook them into the appropriate places in the system.
The settings are stored in the Preferences folder, as is customary under System 7. Thus, booting via network would also be possible.
Building Instructions
The control panel was developed with Think C. It basically consists of four projects, each of which contributes one resource to the control panel: the control panel function, the replacement routine for FPU detection, the shutdown routine for the MacStart circuit, and an INIT that installs the latter two in the system heap at boot time. The project for the cdev resource also handles the merging of all resources into a full-fledged control panel.
The entire listing is beyond the scope of this article, which is why we have limited ourselves to the most important parts here. Building instructions for INITs can also be found in [1]. The entire source code along with the executable program is again available on a collection disk and in the c't mailbox. (cm)
Literature
[1] Leonidas Drisis: Starthilfe, c't 2/91, S. 182
[2] Michael Parker, Carsten Meyer: FPU-Starthilfe, INIT für PAK-ausgerüstete Macs, c't 1/92, S. 188
[3] Carsten Meyer: Startautomatik, Macintosh automatisch ein- und ausschalten, c't 10/92 S. 196
[4] Inside Macintosh I, The Resource Manager, The Dialog Manager, The File Manager, The Control Manager, Addison-Wesley 17737
[5] Inside Macintosh V: The Control Panel, The Shut Down Manager, Addison-Wesley 17719
[6] Inside Macintosh VI: Compatibility Guidelines, The Control Panel, The File Manager, The Finder Interface, Addison-Wesley 57755
[7] Holger Zimmermann: Doppel-PAK, PAK68/3 mit Cache und 68020/030 für 68000-Rechner, c't 11/93 S. 222, c't 12/93, S. 276