Many high-performance applications involve large data sets that are impossible to fit entirely within on-chip memories of even the largest FPGAs. As a result, they must be stored in off-chip SDRAMs and loaded onto the FPGAs as computations progress. Because of the high latency and energy consumption associated with off-chip memory accesses, it is important to develop efficient operation schedules that not only minimize latency of computations, but also the amount of data I/Os. We formulate this problem as a modified resource-constrained job scheduling problem. The problem is then solved using a list scheduling algorithm that takes advantage of the fast burst-mode access of SDRAMs. Results have shown that for large problem sizes, the performance of our algorithm is within 1% of a hand-optimized matrix-matrix multiplication implementation, with no memory overhead, and is within 0.03% of the theoretical minimum latency of an 8-by-8 cofactor matrix computation.