The biggest problem with parallel data transfer has always been (and it's one of the major reasons it disappeared from printers) that no matter what you do, all 8 signals do not arrive at the same time... and even a small change in the electromagnetic field surrounding and within the wires can change the time it takes for each signal to travel.
However, in your case, I think you can make the parallel work. What you'll have to do is make your parallel bus synchronous, not asynchronous. You might try having 9 lines run from one device to the other: 8 for data, one for sync. The 8 data lines can run to/from any old digital I/O pins, but run the sync line to a PORTB interrupt pin. After you write your byte to the data lines, all you'll have to do is make sure that you wait a microsecond or so (at least 5 nanoseconds per foot of transmission distance), then send your sync pulse. Or, you can write your byte and raise the sync line at the same time, then have your receiving device read the byte when the sync pulse goes low (again waiting at least 5 nanoseconds per foot before you drop it).
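Here's a rough Python sketch of why the wait before the sync pulse matters. It's only a toy simulation on a PC, not microcontroller code: the names (`settle_time_ns`, `ParallelBus`, `read_at`) and the per-line delay numbers are made up for illustration, and the 5 ns per foot figure is just the rule of thumb from above.

```python
NS_PER_FOOT = 5  # rule-of-thumb settle time per foot of cable (assumption from the post)


def settle_time_ns(cable_feet, margin_ns=0):
    """Minimum wait between writing the data lines and sending the sync pulse."""
    return NS_PER_FOOT * cable_feet + margin_ns


class ParallelBus:
    """Toy model of 8 data lines, each with its own arrival delay (skew)."""

    def __init__(self, line_delays_ns):
        if len(line_delays_ns) != 8:
            raise ValueError("need one delay per data line (8 total)")
        self.delays = list(line_delays_ns)
        self.old = 0  # byte that was on the lines before the latest write
        self.new = 0  # byte the sender most recently wrote

    def write(self, byte):
        """Sender puts a new byte on the data lines at time t = 0."""
        self.old, self.new = self.new, byte & 0xFF

    def read_at(self, t_ns):
        """What the receiver samples t_ns after the write.

        A line shows the new bit only once its own delay has elapsed;
        until then it still shows the previous bit.
        """
        value = 0
        for bit, delay in enumerate(self.delays):
            source = self.new if t_ns >= delay else self.old
            value |= source & (1 << bit)
        return value


if __name__ == "__main__":
    # Hypothetical skew on a 6 foot cable: every line settles within 30 ns.
    bus = ParallelBus([22, 30, 25, 28, 21, 29, 24, 27])
    bus.write(0xA5)
    print(hex(bus.read_at(23)))                 # sync sent too early
    print(hex(bus.read_at(settle_time_ns(6))))  # sync sent after the settle time
```

If the sync pulse shows up before the slowest line has settled, the receiver latches a mix of old and new bits. Wait out the settle time and it reads the byte that was actually sent.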
If you don't want to play around with synchronous parallel (I wish you would though, sounds like a fun project!), I like your idea of putting a bus control module on the program port... very interesting. You might make up some kind of command set to direct where the serial port on the controller connects to... kind of like a digitally controlled serial switch box. You'd just need a micro with a few UARTs to pass bytes around inside... I'm sure most low-end DSPs and probably some higher-end embedded processors would have the 3+ UARTs you'd need (unless you want to bit-bang, which is another option entirely).
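To make the switch box idea a bit more concrete, here's one way the command set could look, sketched in Python. Everything in it is an assumption on my part: the `SELECT` escape byte, the `SerialSwitch` class and the port numbering are all invented for the example, and the "ports" are just byte buffers standing in for real UARTs.

```python
SELECT = 0xFF  # hypothetical escape byte: the next byte names the destination port


class SerialSwitch:
    """Toy model of a digitally controlled serial switch box."""

    def __init__(self, num_ports=3):
        # One output buffer per downstream port (stand-ins for real UARTs).
        self.outputs = [bytearray() for _ in range(num_ports)]
        self.target = 0            # port currently receiving data
        self._expect_port = False  # True right after a SELECT byte

    def feed(self, byte):
        """Handle one byte arriving from the controller's serial port."""
        if self._expect_port:
            self._expect_port = False
            if byte == SELECT:
                # SELECT SELECT passes a literal 0xFF through as data.
                self.outputs[self.target].append(byte)
            elif byte < len(self.outputs):
                self.target = byte
            # Any other port number is ignored and the old target stays.
            return
        if byte == SELECT:
            self._expect_port = True
            return
        self.outputs[self.target].append(byte)


if __name__ == "__main__":
    sw = SerialSwitch(3)
    for b in bytes([SELECT, 1]) + b"hi" + bytes([SELECT, 2]) + b"yo":
        sw.feed(b)
    print([bytes(buf) for buf in sw.outputs])
```

The escape byte keeps the command channel and the data channel on the same wire, at the cost of having to double up any 0xFF that's meant as real data.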
Very cool ideas... keep us 'posted'!
-q