From research!ihnp4!shell!heller@UCB-VAX 26 85 Feb CST (Tue) 18:53:11
Received: from su-score.arpa by csnet-relay.arpa id a003277; 27 Feb 85 4:28 EST
Received: from UCB-VAX.ARPA by SU-SCORE.ARPA with TCP; Wed 27 Feb 85 01:17:39-PST
Received: by UCB-VAX.ARPA (4.24/4.42)
	id AA02630; Wed, 27 Feb 85 01:12:18 pst
From: ihnp4!shell!heller@UCB-VAX
Message-Id: <8502270912.AA02630@UCB-VAX.ARPA>
Date: 26 Feb 85 18:53:11 CST (Tue)
Received: by ihnp4.ATT.UUCP; id AA17542; 26 Feb 85 18:53:11 CST (Tue)
To: na.dis@SU-SCORE
Subject: 2D FFT's
Via:  Csnet-Relay; 27 Feb 85 5:02-EDT

In October I asked for some information on large complex 2D FFT's.
Here's a summary of the responses with some comments added.
First, the original query:

  Can anyone supply me with information or references concerning runtime
  of a 4096 x 512 complex 2-dimensional FFT on current supercomputers?
  I'm interested in actual runtime, not speculation (we do that in-house),
  and separate times for work and data manipulation if possible.  Some
  indication of your algorithm would also be appreciated.  Many thanks, 

..................................................

From: Robert Schreiber

sorry, ours isnt up yet.
	(watch for the GuilTech machine)

..................................................

From: Richard Bartels

I don't have anything to offer in answer to your question.
But I would like a few pointers into the 2-D FFT literature
(and beyond to higher dimensions) if you have the time.

        Two references on applications:
        R.H. Tatham, "Multidimensional Filtering of Seismic Data,"
        IEEE Proceedings, Oct. 1984, pp. 1357-1369

        D.W. March and A.D. Bailey,
        "A review of the two-dimensional transform and its use in seismic
        processing,"
        First Break, vol. 1, no. 1, 1983, pp. 9-21
            (If you can't find this in your library, which is likely,
	    I'll send a photocopy.  There are some color photos which
	    do not reproduce well.)

        References on algorithms:
        Elliot and Rao
        Fast Transforms; Algorithms, Analyses, Applications
        Academic Press, 1982
            (relatively little on n-D transforms)

        papers by Clive Temperton in J. Computational Physics,
	Paul Swarztrauber in Parallel Computing (new journal).

..................................................

From: mazama!stew (Stewart Levin)

For the new Amdahl/Fujitsu machine call Dr. Chani Pangali at Amdahl
(408)746-3312.  He's in charge of benchmarking the machine and 2D FFT's
are one of his tests.  I believe they were running at about 400 Megaflops
though don't quote me.  By the way he'll be at the Houston SIAM-SEG-SPE
Mathematical Computations meeting in January to present his results.
I'll be there too talking about wraparound supression in frequency domain
migrations.
	(Pangali did not attend, after all.)

..................................................

From: Roland Sweet

Bonnie Bowman of Mobil Expl. and Prod. cornered Paul Swarztrauber and me last
year wanting to know how to do 2D ffts on non-core-contained arrays.  You might
see if she has any insights now.  Her phone is (214) 573-7737 -- that is, if
you competitors can talk to each other.
	(no comment)

..................................................

From: OSCAR BUNEMAN

   I have a 2-d FFT on the MFE CRAY system which is ideal for your
purpose. If K is the total number of words to be transformed (that would
be 2NM for an N by M complex transform) then the number of cycles is
2K(logN + logM) where logs are to base 2. This is because there are
logN levels in one dimension and logM levels in the other. In each level
each word has to be fetched and stored. All the arithmetic is overlapped
with these fetches and stores. There is a ~15% overhead from 'chaining
delays'. A cycle takes 12 1/2 nanoseconds on the CRAY 1 and less on the
more recent CRAYs, such as XMP's. You can get details of this FFT from
the LIBRIS utility at MFE, Livermore. 
    Since your project involves 4 million word storage, you must be thinking 
of rather an advanced type of CRAY, unless you are considering buffering
data in and out of core (I do that in my 3-D plasma particle code). If you
do want to buffer, you may be content with packed data precision. I use
a 2:1 packer (also available through LIBRIS) to maintain typical IBM
single precision. Incidentally, for maximum speed one of the dimension
ought not to be divisible by 8. I would embed the 4096*512 complex array
in a (512,4097) complex array.

   The person looking after LIBRIS at MFE is Carol Tull. Her telephone
number is (415) 422-1556. She may know of a good way to ship LIBRIS
codes out of MFE. The LIBRIS identifier is MATHS:D:D6.
   By all means, let others know what I told you about my FFT. Can
I also sell you a purely real FAST "HARTLEY" TRANSFORM ?
	(Oscar later broadcast a description of this code.)

...........................................

Don Heller
Physics and Computer Science Dept.
Shell Development Company
P.O. Box 481
Houston, Texas 77001

713-663-2341
{ihnp4, pur-ee, ut-sally}!shell!heller



.