Title of Invention	AN APPARATUS FOR PROVIDING PARALLEL ACCESS TO DATA ELEMENTS OF A DATA VECTOR OF AN N-DIMENSIONAL ARRAY
Abstract	A memory architecture is provided to enable parallel access along any dimension of an n-dimensional data array. To enable parallel access to s consecutive data elements along any dimension, a mapping is defined that stores any s consecutive data elements of the n-dimensional data array in s parallel memory banks. This mapping is realized by a Permuter (32) and Address Generation Logic (31). The necessary and sufficient conditions for parallel data access by mapping functions are described for all combinations of (n, s). Two instances of the mapping, namely circular rotation along the 0th dimension and dyadic permutation along the 0th dimension, are discussed in detail. For these mappings, the basic architecture as well as its extensions are discussed, where the basic architecture refers to an n-dimensional data array having s data elements along each dimension, and the extension refers to a data array with m data elements along each dimension, with m an integer multiple of s.

Full Text	1. Field of Invention
The present invention is related to the field of memory architecture, more specifically, to memory architecture for an n-dimensional hyper-matrix (rectangular data array) with s data elements along each dimension.
2. Background of the Invention
The design of memory architecture for an n-dimensional rectangular data array is a well-known problem, and its scope stretches to a myriad of applications. The particular cases of parallel data access in 2- and 3-dimensional rectangular data arrays are of importance for signal processing applications. Specifically, the memory architecture for 2-dimensional data access is attractive for video, image, and graphics processing, whereas data access in 3-dimensional space is attractive for 3-dimensional graphics and video signal processing. Many image and video processing algorithms require either row-wise or column-wise access to data in a 2-dimensional data array (an image or a frame of a video sequence). The most relevant applications are lossy compression algorithms for images and video, which use 2-dimensional separable transforms such as the Discrete Cosine Transform (DCT) and the Inverse Discrete Cosine Transform (IDCT). These transforms are an integral part of the compression techniques used in widely accepted video and image compression standards such as MPEG (Moving Picture Experts Group), H.261, H.263, JPEG (Joint Photographic Experts Group), etc. In accordance with the recommendations made in these standards, each image or frame in a video sequence is divided into macroblocks, which are further divided into blocks of (8 x 8) data arrays. In the encoder, the 2D-DCT operation is applied over each (8 x 8) block, followed by quantization and entropy coding to achieve compression. In the decoder, the 2D-IDCT operation is performed after variable length decoding and de-quantization operations.
The 2D-DCT (or 2D-IDCT) is a separable transform and can be computed by performing a 1D-DCT (or 1D-IDCT) operation over all the rows of a block followed by a 1D-DCT (or 1D-IDCT) operation over all columns, or vice versa. As shown in Figure 1, after the first 1D-(I)DCT operation 12 over all the rows (or columns) of the 8 x 8 block U, the data is to be fed to the second 1D-(I)DCT block 14 in column-wise (or row-wise) fashion. This requires a memory 13 that allows both row-wise and column-wise access, because after the first 1D-(I)DCT operation 12 the data is written into memory 13 in row-wise (or column-wise) fashion, whereas for the second 1D-(I)DCT operation 14 the data is read from memory 13 in column-wise (or row-wise) fashion. For a DSP processor with a SIMD architecture having 4-data-element vectors as operands, each (8 x 8) block can be divided into four data arrays of size (4 x 4). For each row (or column) of this (8 x 8) block, two row-wise (or column-wise) accesses are required, each access fetching four consecutive elements. The present invention provides a scheme that meets this requirement. Similarly, the 3D-(I)DCT can also be computed using the 1D-(I)DCT, but in this case the transpose memory should allow parallel access to data along all three dimensions. The present invention describes a memory architecture for an n-dimensional data array allowing parallel access to data along any of the n dimensions. The problem of a 2-dimensional memory architecture allowing row-wise as well as column-wise access is not new, but to the best of the authors' knowledge there is no record of an extension of the same concept to higher dimensions.
As a solution for carrying out the 2-dimensional matrix transpose operation, several conventional transpose memories have been proposed. In U.S. Patent 5,740,340, as is understood, the memory cells are organized as an (s x s) data array.
The s rows and s columns are addressed by 2s addresses, and there is a decoder that decodes any one of the said 2s addresses and enables access to said s rows of data and said s columns of data. This solution appears quite restrictive, since it needs a special kind of 2-D memory in which any row or any column can be enabled at a time for accessing. In addition, all enabled locations are accessed at a time. So the extension of this architecture will be very complex for large data arrays that are segmented into smaller (s x s) data sub-arrays, as only a complete row (or column) of the data array is enabled, not part of it. This complexity is not addressed in the disclosure of the patent. Further, the complexity of this scheme is higher, as it involves s^2 banks as compared to s banks in the present invention. Moreover, this scheme cannot be generalized to n-dimensional data arrays.
U.S. Patent 5,481,487 appears to suggest a different memory architecture, which requires 4 parallel banks to store one (8 x 8) data array. Each bank stores one of the four quadrants of the data array, each quadrant being a (4 x 4) data array. This scheme appears to have the following restrictions:
1. Though address and data buses are provided for all four banks, not all are accessed in parallel.
2. This memory architecture is restrictive in the sense that it implements only the transpose function. If data is written in row (column) order, it can be read only in column (row) order.
3. This scheme is restricted to only one (8 x 8) block, and cannot be generalized to store larger 2-dimensional data arrays.
4. This architecture can store consecutive (8 x 8) blocks (in the same memory locations), but with the following restriction: if the first (8 x 8) block is written in row-wise (column-wise) order, then the second block must be written in column-wise (row-wise) order.
5. This scheme may not be generalized for storing n-dimensional data arrays.
In U.S.
Patent 4,603,348, a memory architecture has been described for storing a multi-dimensional array. According to this scheme, the n-dimensional array is divided into a number of divisions, which do not overlap. Each such division is defined as an n-dimensional array with 2 elements in each dimension. The number of banks in the proposed architecture is equal to the number of elements in each of these divisions. Each bank has one data element from a given division, hence enabling parallel access to all elements of a division. This scheme appears to provide access only to a division of an n-dimensional array. In contrast, the scheme disclosed in the present invention provides access to data along any given dimension.
In U.S. Patent 4,740,927, a bit-addressable memory has been proposed in which a 2-dimensional array of bits is divided into partition sectors equal in number to the parallel memory modules (banks) provided. Each memory module has a number of addresses equal to the number of bits in each partition sector. Each partition is divided into several s x s matrices, where s is the number of parallel banks. The logical placement of the bits of these matrices is such that the bits of any row or column lie in different memory modules, providing parallel access along rows and columns. However, the present invention proposes an architecture with less complex address generation logic. A particular case of the proposed architecture, referred to as memory architecture with dyadic permutation, provides an address generation logic in which the main operation is a logical EXOR operation, as against the addition operation in the address generation logic proposed in the prior art. Moreover, unlike this scheme, the invention disclosed in this document is much more generic and holds good for dimensions greater than 2 as well.
The object of the present invention is to provide a novel solution to overcome the disadvantages of the prior art.
SUMMARY OF INVENTION
Accordingly, the present invention provides an apparatus for providing parallel access to data elements of a data vector of an n-dimensional rectangular array, comprising: an address generation logic module (31), for receiving an index of a first data element of a data vector to be stored into memory banks (34) and for receiving a dimension of access, generating addresses to which to store the data elements of the data vector, the address generation logic module (31) requiring that each data element of a data vector is stored in a different memory bank to allow parallel access along any dimension of an n-dimensional rectangular array; a permuter (32), coupled to the address generation logic module (31), for receiving the address information, and receiving the index information of the first data element of the data vector, for carrying out a permutation operation on the addresses by generating a bank number of a memory bank to which to issue the addresses, the permuter (32) requiring that each data element of a data vector is stored in a different memory bank to allow parallel access along any dimension of an n-dimensional rectangular array, and for storing data elements at the generated address information and bank number corresponding to each data element of the data vector; and s memory banks (34), coupled to the permuter (32) to allow parallel access by the permuter to the s memory banks, to receive addresses to store each data element of the data vector, where s is equal to the number of data elements in a data vector.
Further, the apparatus comprises an inverse permuter (33), for receiving the index information of the first data element of a data vector and for receiving the information of a read operation, and coupled to the s memory banks (34), for performing the permutation operation on data elements, which is the inverse of the permutation operation performed by the permuter (32) on the addresses, to generate the bank numbers from which to retrieve the data elements of the data vector.
The inverse permuter (33) receives information about a write operation, and is coupled to store data to the s memory banks (34) responsive to performing the same permutation operation as the permutation operation performed by the permuter on the addresses, to generate the bank numbers to which to store the data elements of the data vector.
The address generation logic module (31) and the permuter (32) are integrated into a single module.
The address for storing each data element of a data vector to memory is generated by the address generation logic, and a memory bank number for the generated address for storing each data element of a data vector to the memory bank is generated by the permuter (32). The address for retrieving each data element of a data vector from the memory banks is generated by the address generation logic module (31), and a memory bank number for the generated address for retrieving each data element of a data vector from the memory bank is generated by the permuter (32).
The memory bank number for each data element of the data vector is generated by the inverse permuter to achieve shuffling of the data elements after retrieving the data from the memory banks.
Circular rotation is performed by the permuter and inverse permuter. The permuter and inverse permuter perform the circular rotation of the data vector by a value equal to the remainder obtained by dividing the summation of all n index coordinates of the first data element in the data vector by the number of data elements in the data vector, where n is the dimension of the (s,n)-hyper-matrix and s is the number of data elements in the data vector.
The circular rotation performed by the permuter and inverse permuter permits parallel access to all s data elements of a data vector along the dimension j which may span over two adjacent (s,n)-hyper-matrices, and the rotation parameter for the permuter is given by a value equal to the remainder obtained by dividing the summation of all index coordinates except the jth index coordinate αj of the first data element in the data vector by the number of data elements in the data vector, and the rotation parameter for the inverse permuter is given by a value equal to the remainder obtained by dividing the summation of all n index coordinates of the first data element in the data vector by the number of data elements in the data vector, where m=st and t is an integer. The address logic generator generates an address as herein described.
The permuter and inverse permuter perform dyadic permutation. The permuter and inverse permuter perform the dyadic permutation of the data vector by a parameter obtained by a bit-wise logic EXOR operation of all n index coordinates of the first data element in the data vector, where n is the dimension of the (s,n)-hyper-matrix and s is the number of data elements in the data vector.
The permutation performed by the permuter is dyadic permutation and that performed by the inverse permuter is a combination of dyadic and circular permutation, which permits parallel access to all s data elements of a data vector along the dimension j which may span over two adjacent (s,n)-hyper-matrices, and the parameter for dyadic permutation of the permuter and inverse permuter is obtained by a bit-wise logic EXOR operation of all index coordinates except the jth index coordinate αj of the first data element in the data vector, and the circular rotation parameter for the inverse permuter is given by a value equal to the remainder obtained by dividing the jth index coordinate αj of the first data element in the data vector by the number of data elements in the data vector, where m=st and t is an integer. The address logic generator generates an address as herein described.
Further, the present invention provides a method for providing parallel access to data elements of a data vector of a 2-dimensional rectangular array, comprising:
- receiving an index of a first data element of a data vector to be stored into a memory bank, a direction of access, and the data vector to be stored;
- generating an address to which to store the data element, wherein each data element of a data vector is stored at a different address to allow parallel access;
- generating a bank number of a memory bank to which to store the data element, wherein each data element of a data vector is stored in a different memory bank to allow parallel access; and
- storing all data elements at the generated address information and bank number corresponding to the data element.
The objective of the present invention is to provide a generalized framework of memory architecture for n-dimensional rectangular data arrays such that parallel access to data along any of the n dimensions is possible.
It is claimed that the memory architecture of the present invention is generic and less complex as compared to the architectures discussed in the prior art. It also overcomes the disadvantages of the prior art for 2-dimensional transpose memories. The objective of this invention is achieved by applying a simple, yet effective, method for rearranging (permuting) the elements of the data array while reading/writing data from/to the memory. This rearrangement is the distinguishing feature of this invention. The brief description of the invention is as follows.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows the implementation of the 8x8 2D-(I)DCT using the 8-point 1D-(I)DCT.
Figure 2 shows the possible accesses in the proposed memory architecture, which allows parallel access to 3 data elements (i.e., s=3) in a 3-dimensional data array.
Figure 3 shows the basic memory architecture for an n-dimensional rectangular data array with m data elements along each dimension.
Figure 4 shows the memory architecture for Case A under Circular Rotation Permutation.
Figure 5 shows the memory architecture for Case C under Circular Rotation Permutation.
Figure 6 shows the memory architecture for Case A under Dyadic Permutation.
Figure 7 shows the memory architecture for Case B under Dyadic Permutation.
Figure 8 shows the memory architecture for Case C under Dyadic Permutation.
Cases A, B and C mentioned here are defined in the section "Detailed Description of the Preferred Embodiments".
Figure 9 shows the correspondence between an index in a 4x4 matrix and the location in the memory banks.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention provides a generalized framework for memory architecture for an n-dimensional rectangular data array, which enables parallel access to data along any of the n dimensions. The underlying concept of this invention is first described in a generic sense with the aim of defining a class of architectures that includes all possible variants of this scheme.
Subsequently, a scheme is described with examples for a particular case, which apparently has the least complexity in the class of variants of that case. Before proceeding further, the following three definitions are provided to enhance the readability of the document.
1.1 Definitions
1. n-dimensional hyper-matrix: It is an n-dimensional rectangular data array. Any location of a data element in an n-dimensional hyper-matrix is given by an index [αn-1, ..., α1, α0], where αj is the index for dimension j. The data array has a single data value stored at each of its locations.
2. p-bounded n-dimensional hyper-matrix: It is an n-dimensional hyper-matrix with p data elements along each dimension, referred to as a (p,n)-hyper-matrix. The starting index of each (s,n)-hyper-matrix in an (m,n)-hyper-matrix is an integer multiple of s along each of the n dimensions.
3. Data Vector: A data vector in an (m,n)-hyper-matrix is a sequence of s data elements in the hyper-matrix along any given dimension. It is defined by the index of the starting location in the hyper-matrix and the dimension along which its elements lie.
1.2 Theory of Underlying Concept
The present invention describes a memory architecture with s parallel banks for storing all data elements of an (s,n)-hyper-matrix, so as to enable parallel access to all the s data elements of a data vector. Since s data elements are accessed in parallel, the total number of banks used in this invention is indeed the minimum. The data of the (s,n)-hyper-matrix is rearranged before it is stored in the s parallel banks of memory. The rearrangement of data ensures that all s elements of any data vector in this (s,n)-hyper-matrix are stored in different banks. Two functions, g and f, define the rearrangement described above. These functions take the index of a data element as their argument. The data element at index [αn-1, ..., α1, α0] is stored at the address given by g in the bank given by f. Functions f and g must satisfy the following conditions to ensure parallel access to all elements of any data vector. For a given value of n and s, many such rearrangements are possible.
In other words, there exists more than one distinct pair of functions (g, f) for given values of n and s. The scheme of the present invention can easily be generalized to an (m,n)-hyper-matrix, which is assumed to be divided into (s,n)-hyper-matrices (please refer to Figure 2 for illustration). The different (s,n)-hyper-matrices in the (m,n)-hyper-matrix are linearly mapped into the s parallel banks. It should be noted that for each (s,n)-hyper-matrix, the data is rearranged before it is stored in the memory. In this context, at least two variations are possible:
1. All s data elements of the accessed data vector lie within a given (s,n)-hyper-matrix.
2. The s data elements of the data vector to be accessed may span over two adjacent (s,n)-hyper-matrices.
1.3 Description of Basic Memory Architecture
The basic architecture of the proposed memory for an (m,n)-hyper-matrix is shown in Figure 3. Here, m=st and t is an integer. For storing the s data elements of any data vector into the memory banks, the n-dimensional starting index, the dimension along which the data is to be stored, and the s data elements are provided to this architecture. For reading s data elements in parallel, the n-dimensional starting index and the dimension of access are provided to the memory architecture. Based on these, the addresses for all of the s banks 34 are computed by Address Generation Logic 31 and issued to the banks 34 after carrying out a permutation (rearrangement) by Permutation Logic 32, also referred to as the Permuter, which ensures that only the required locations are accessed in each bank. Address Generation Logic is a hardware module comprising registers and an arithmetic circuit, and the Permuter comprises input and output port registers and an interconnection network circuit to route data from the input port registers to the output port registers.
The Inverse Permutation Logic 33, also referred to as the Inverse Permuter, applies the inverse of the permutation performed by Permutation Logic 32 on the addresses to data read from the memory, and the same permutation as Permutation Logic 32 to data written into the memory. The Inverse Permuter comprises input and output port registers and an interconnection switch circuit to route data from the input port registers to the output port registers. Although the Address Generation Logic 31 and Permutation Logic 32 are shown as separate blocks, they can be merged into one block. As discussed earlier, for a given value of n and s (in an (s,n)-hyper-matrix), many such rearrangements are possible. The Address Generation Logic 31, defined by the functions g and f, is different for each rearrangement. Therefore, the complexity of the Address Generation Logic 31 and Permutation Logic 32 in the architecture described in Figure 3 will also be different for different rearrangements. Among all possible rearrangements, the preferable ones are those which lead to simpler and more regular hardware. All other possible rearrangements for a given value of n and s are claimed to be different architectural realizations, though conceptually the same as discussed in the present invention. One class of such rearrangements is permutation along any one dimension. In the rest of the document, the permutation along the 0th dimension is taken up without loss of generality. For this case, the permutation and address generation logic is also provided.
1.4 Permutation Along 0th Dimension
In an (s,n)-hyper-matrix, there are s^(n-1) data vectors which are along the 0th dimension. The starting index for these data vectors is given by [αn-1, ..., α1, 0]. For this case, the s elements of any such data vector are stored at the same memory address but in different banks. The bank number for the elements of any such given data vector is obtained by using a function f, which satisfies the conditions mentioned earlier.
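The requirement stated above, that all s elements of any data vector fall in different banks, can be checked by brute force for small (s, n). The following Python fragment is a minimal sketch under stated assumptions, not the patent's logic: the helper names (`data_vectors`, `is_conflict_free`, `naive_bank`, `rotated_bank`) are hypothetical, and index tuples are written with α0 first. It contrasts a bank function that looks only at α0 with one that mixes all coordinates.

```python
from itertools import product

def data_vectors(s, n):
    """Yield (dim, vector) for every data vector of an (s,n)-hyper-matrix.

    Index tuples are written (alpha_0, alpha_1, ..., alpha_{n-1}), so
    position k of a tuple is the coordinate along dimension k.
    """
    for dim in range(n):
        # Fix all coordinates except `dim`, then sweep `dim` from 0 to s-1.
        for rest in product(range(s), repeat=n - 1):
            vec = []
            for k in range(s):
                idx = list(rest)
                idx.insert(dim, k)
                vec.append(tuple(idx))
            yield dim, vec

def is_conflict_free(bank_of, s, n):
    """True if every data vector places its s elements in s distinct banks."""
    return all(len({bank_of(idx) for idx in vec}) == s
               for _, vec in data_vectors(s, n))

def naive_bank(idx, s=3):
    # Hypothetical bank function: uses alpha_0 only (no rearrangement).
    return idx[0] % s

def rotated_bank(idx, s=3):
    # Hypothetical bank function: mixes all coordinates (sum mod s).
    return sum(idx) % s

along_dim0 = [vec for dim, vec in data_vectors(3, 3) if dim == 0]
print(len(along_dim0))                      # number of vectors along dimension 0
print(is_conflict_free(naive_bank, 3, 3))   # conflicts along dimensions 1 and 2
print(is_conflict_free(rotated_bank, 3, 3))
```

With `naive_bank`, vectors along dimension 0 are conflict-free but every vector along any other dimension lands in a single bank, which is why a rearrangement that depends on all index coordinates is needed.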
The advantage of this rearrangement is that the mapping (function g) of the elements of data vectors to addresses in the banks becomes independent of the function f. The address function g for this case satisfies Condition 1. It should be noted that α0 is not involved in the computation of the address for the banks 34, as the address is the same for all elements of these data vectors. Since α0 will be different for all elements of any such data vector, the function f will ensure that these elements are indeed stored in different banks. Hence, Condition 3 is also satisfied.
Among all rearrangements corresponding to the permutation along any given dimension, the one obtained by dyadic permutation appears to be the least complex for values of s which are an integer power of 2. For any general value of s, a circular permutation can be used. These two permutations are taken up to explain the concept for the following three scenarios.
Case A: This memory architecture is for an (s,n)-hyper-matrix and allows parallel access to all elements of any data vector.
Case B: This memory architecture is for an (m,n)-hyper-matrix which is divided into (s,n)-hyper-matrices. This memory architecture allows parallel access to all elements of any data vector which lies within a single (s,n)-hyper-matrix.
Case C: This memory architecture is for an (m,n)-hyper-matrix which is divided into (s,n)-hyper-matrices. It allows parallel access to all s data elements of any data vector which may span over two adjacent (s,n)-hyper-matrices.
It should be noted that Case B is a special case of Case C.
1.4.1 Memory Architecture for Circular Rotational Permutation
The permutation function corresponding to circular rotation is f(αn-1, ..., α1, α0) = (α0 + α1 + ... + αn-1) mod s. It is obvious from the properties of mod s addition that this function satisfies Conditions 1 and 3.
Case A: The memory architecture for this case is shown in Figure 4. The circular rotation in Inverse Permutation Logic 45 is anti-clockwise for storing data into the memory, whereas it is reversed (clockwise) for reading data from the memory.
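The circular rotation mapping for Case A can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not the disclosed hardware: the class `RotatedMemory` and its methods are hypothetical names, index tuples are written with α0 first, and the address formula (a plain linearisation of α1 to αn-1) is an assumption, since the text only establishes that the address does not depend on α0. The bank number is the sum of all coordinates mod s, as stated in the summary.

```python
from itertools import product

class RotatedMemory:
    """Toy model of s banks holding an (s,n)-hyper-matrix (Case A)."""

    def __init__(self, s, n):
        self.s, self.n = s, n
        # Each bank holds s**(n-1) words.
        self.banks = [[None] * s ** (n - 1) for _ in range(s)]

    def bank(self, idx):
        # Circular rotation: bank = (sum of all coordinates) mod s.
        return sum(idx) % self.s

    def address(self, idx):
        # Assumed linearisation over alpha_1..alpha_{n-1}; alpha_0 is unused.
        return sum(a * self.s ** (k - 1) for k, a in enumerate(idx) if k >= 1)

    def rotation(self, start, dim):
        # Rotation amount: sum of the first element's coordinates, mod s.
        first = list(start)
        first[dim] = 0
        return sum(first) % self.s

    def _vector(self, start, dim):
        # Indices of the s elements along `dim`; start[dim] itself is ignored.
        out = []
        for k in range(self.s):
            idx = list(start)
            idx[dim] = k
            out.append(tuple(idx))
        return out

    def write(self, start, dim, values):
        slots = [(self.bank(i), self.address(i)) for i in self._vector(start, dim)]
        assert len({b for b, _ in slots}) == self.s  # one access per bank
        for (b, a), v in zip(slots, values):
            self.banks[b][a] = v

    def read(self, start, dim):
        slots = [(self.bank(i), self.address(i)) for i in self._vector(start, dim)]
        assert len({b for b, _ in slots}) == self.s  # one access per bank
        return [self.banks[b][a] for b, a in slots]

# Fill a (3,3)-hyper-matrix along dimension 0, then read along dimension 2.
mem = RotatedMemory(3, 3)
for a1, a2 in product(range(3), repeat=2):
    mem.write((0, a1, a2), 0, [(k, a1, a2) for k in range(3)])
column = mem.read((1, 2, 0), 2)
print(column)
```

In this model the kth element of a data vector goes to bank (k + r) mod s, where r is the value returned by `rotation`, so routing by bank number is equivalent to circularly rotating the vector by r positions; the read path undoes that rotation by collecting the banks in element order.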
The amount of rotational shift in Inverse Permutation Logic 45 for the data read/write operation remains the same as for the addresses.
Case C: The basic architecture (shown in Figure 5) for this generalization (αi being the index along dimension i) remains the same as in Case A; only the address generation logic and the width of the address bus change. For a given value of index [...]. For the s consecutive data elements to be accessed along the jth dimension [...].
Case B: This is a specific instance of Case C. For this case, the index αj in the n-dimensional index of the first data element in the required data vector will be such that αj = 0. The index of the kth successive data element along dimension j will be equal to (αj + k). Hence, the term for address [...].
1.4.2 Memory Architecture for Dyadic Permutation
Architectures with s as an integer power of 2 are of much importance. The complexity of the hardware needed to realize all of the architectures described in Subsection 1.4.1 is reduced significantly if s is a power of 2. The advantage is obvious from the fact that the (mod s) operation is then equivalent to a logical AND operation with (s-1). Further, the dyadic permutation function gives much simpler permutation logic. Another advantage of this permutation function is that the permutation logic turns out to be the same as the inverse permutation logic. It is easy to see that this permutation function satisfies Conditions 1 and 3. [...] be treated as 0.
Case A: The proposed memory architecture is given in Figure 6. The Address Generation Logic and the Permutation Logic modules can be combined into a single module 63.
Case B: Let an (m,n)-hyper-matrix with m=2x+y be divided into (s,n)-hyper-matrices. [...] clockwise direction for the data reading operation (refer to Figure 8).
1.5 An Example of a Single 2-Dimensional Data Array with s=4
This example illustrates the above-described scheme for a 2-dimensional (4x4) matrix 91. The correspondence between locations in matrix 91 and memory banks 92 for the data elements is shown in Figure 9 for the dyadic permutation case.
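The kind of correspondence that Figure 9 depicts can be sketched in Python for s=4. This is an illustration under stated assumptions rather than a reproduction of the figure: the function names are hypothetical, the bank number is taken as the bit-wise EXOR of the two index coordinates (as stated in the summary for dyadic permutation), and the address is assumed to be the row number, so that all elements of a row share one address.

```python
S = 4  # number of banks; dyadic permutation assumes a power of two

def bank(row, col):
    # Dyadic permutation: bit-wise EXOR of the index coordinates.
    return row ^ col

def address(row, col):
    # Assumed: elements of a row (vector along dimension 0) share address = row.
    return row

def dyadic_permute(vec, p):
    # Element i moves to position i ^ p; applying it twice restores the order.
    out = [None] * len(vec)
    for i, v in enumerate(vec):
        out[i ^ p] = v
    return out

# Bank contents: banks[b][a] is the (row, col) stored in bank b at address a.
banks = [[None] * S for _ in range(S)]
for r in range(S):
    for c in range(S):
        banks[bank(r, c)][address(r, c)] = (r, c)

for b, words in enumerate(banks):
    print(f"bank {b}: {words}")

# Column access: the ith element of column c sits at address i in bank i ^ c.
col = 2
column = [banks[bank(i, col)][address(i, col)] for i in range(S)]
print(column)
```

Because EXOR is its own inverse, `dyadic_permute` serves as both the permutation and the inverse permutation, and for a power-of-two S the reduction `x % S` can be replaced by `x & (S - 1)`.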
It is apparent from the rearrangement shown that elements of the matrix 91 which lie in the same row (or column) are stored in different banks 92. Accessing the memory banks 92 involves two steps:
1. Computing the address for each bank;
2. Reordering the 4 elements after (before) reading from (writing to) the 4 banks.
The following table indicates the bank number in which the ith data element of row (or column) a lies; under dyadic permutation this is the bit-wise EXOR of i and a. It can be used to reorder the data elements for reading from or writing into the memory banks.
        i=0  i=1  i=2  i=3
a=0      0    1    2    3
a=1      1    0    3    2
a=2      2    3    0    1
a=3      3    2    1    0
It is apparent that for column-wise access, the 2-bit address of the ith data element of the column to be read from (or written into) the memory is the same as the index i. On the other hand, for row-wise access the address is the same as the row number a of the data elements.
2 Alternative Embodiments of the Invention
The following are alternative embodiments of the present invention.
1. Though the scheme has been described for only two rearrangements, all other rearrangements within the scope of the discussion in the previous section will lead to other alternative implementations of the same scheme.
2. If an application demands access along only some particular dimensions, then a minor variation of the proposed scheme will lead to a significant reduction in hardware complexity.
3. For Cases B and C discussed in Section 1.4, the number of elements along each dimension need not be the same for the larger hyper-matrix. That is, the value of m may be different for each dimension. For this alternative implementation, only the address computation for accessing any (s,n)-hyper-matrix will change; the logic to compute addresses for the elements of a data vector within an (s,n)-hyper-matrix will remain the same.
4. Though the scheme has been described for parallel access, it can also be used for accessing the data serially. This can be accomplished by issuing all the addresses corresponding to the s data elements sequentially.
Given the constraint of sequential access, some minor changes in the architecture will lead to a reduction in hardware complexity.
We claim:
1. An apparatus for providing parallel access to data elements of a data vector of an n-dimensional rectangular array, comprising: an address generation logic module (31), for receiving an index of a first data element of a data vector to be stored into memory banks (34) and for receiving a dimension of access, generating addresses to which to store the data elements of the data vector, the address generation logic module (31) requiring that each data element of a data vector is stored in a different memory bank to allow parallel access along any dimension of an n-dimensional rectangular array; a permuter (32), coupled to the address generation logic module (31), for receiving the address information, and receiving the index information of the first data element of the data vector, for carrying out a permutation operation on the addresses by generating a bank number of a memory bank to which to issue the addresses, the permuter (32) requiring that each data element of a data vector is stored in a different memory bank to allow parallel access along any dimension of an n-dimensional rectangular array, and for storing data elements at the generated address information and bank number corresponding to each data element of the data vector; and s memory banks (34), coupled to the permuter (32) to allow parallel access by the permuter to the s memory banks, to receive addresses to store each data element of the data vector, where s is equal to the number of data elements in a data vector.
2.
The apparatus as claimed in claim 1 comprising: an inverse permuter (33), for receiving the index information of the first data element of a data vector and for receiving the information of a read operation, and coupled to the s memory banks (34), for performing the permutation operation on data elements, which is the inverse of the permutation operation performed by the permuter (32) on the addresses, to generate the bank numbers from which to retrieve the data elements of the data vector.
3. The apparatus as claimed in claim 2 wherein the inverse permuter (33) receives information about a write operation, and is coupled to store data to the s memory banks (34) responsive to performing the same permutation operation as the permutation operation performed by the permuter on the addresses, to generate the bank numbers to which to store the data elements of the data vector.
4. The apparatus as claimed in claim 1 wherein the address generation logic module (31) and the permuter (32) are integrated into a single module.
5. The apparatus as claimed in claim 1 wherein the address for storing each data element of a data vector to memory is generated by the address generation logic and a memory bank number for the generated address for storing each data element of a data vector to the memory bank is generated by the permuter (32).
6. The apparatus as claimed in claim 1 wherein the address for retrieving each data element of a data vector from the memory banks is generated by the address generation logic module (31) and a memory bank number for the generated address for retrieving each data element of a data vector from the memory bank is generated by the permuter (32).
7. The apparatus as claimed in claim 1 wherein the memory bank number for each data element of the data vector is generated by the inverse permuter to achieve shuffling of the data elements after retrieving the data from the memory banks.
8.
The apparatus as claimed in claim 3 wherein the memory bank number for each data elements of the data vector is generated by inverse permuter to achieve shuffling of the data elements after retrieving the data from the memory banks. 9. The apparatus as claimed in claim 3 wherein circular rotation is performed by the permuter and inverse permuter. 10. The apparatus of claim 9 for a (s,n)-hyper matrix, wherein the permuter and inverse permuter performs the circular rotation of the data vector by value equal to remainder obtained by dividing the summation of all 'n' index coordinates of first data element in the data vector by number of data elements in the data vector, where 'n' is dimension of (s,n)-hyper matrix and s is the number of the data element in the data vector. 1. The apparatus as claimed in claim 10 wherein the address generator [2. The apparatus as claimed in claim 9 for a (m,n)-hyper-matrix, wherein the circular rotation performed by the permuter and inverse permuter permits parallel access to all s data elements of a data vector along the dimension j which may span over two adjacent (s,n)-hyper-matrices, and the rotation parameter for permuter is generated by value equal to remainder obtained by dividing the summation of all index coordinates except jth index coordinate αj of first data element in the data vector by number of data elements in the data vector, and the rotation parameter for inverse permuter is given by generated by value equal to remainder obtained by dividing the summation of all 'n' index coordinates of first data element in the data vector by number of data elements in the data vector, where m=st and t is an integer. 13. The apparatus of the claim 12 wherein the address logic generator generates an address as herein described. 14. The apparatus as claimed in claim 3 wherein the permuter and inverse permuter perform dyadic permutation. 15. 
The apparatus of claim 14 for a (s,n)-hyper matrix, wherein the permuter and inverse permuter performs the dyadic permutation of the data vector by a parameter obtained by bit-wise logic EXOR operation of all 'n' index coordinates of first data element in the data vector, where 'n' is dimension of (s,n)-hyper matrix and s is the number of the data element in the data vector. 16. The apparatus of claim 15 wherein the address generator logic generates the address for data at index [α1„.,,αn-2 α0] of (s,n)-hyper 17. The apparatus of claim 14 for a (m,n)-hyper-matrix, wherein permutation performed by the permuter is dyadic permutation and that performed by the inverse permuter is a combination of dyadic and circular permutation, which permits parallel access to all s data elements of a data vector along the dimension/ which may span over two adjacent (s,n)-hyper-matrices, and the parameter for dyadic permutation of permuter and inverse is obtained by bit-wise logic EXOR operation of all index coordinates except jth index coordinate a J of first data element in the data vector, and the circular rotation parameter for inverse permuter is given by generated by value equal to remainder obtained by dividing the jth index coordinate αj of first data element in the data vector by number of data elements in the data vector, where m=st and t is an integer. 18. The apparatus of the claim 17 wherein the address logic generator generates an address as herein described. 19. 
A method for providing parallel access to data elements of a data vector of a 2-dimensional rectangular array, comprising: receiving an index of a first data element of a data vector to be stored into a memory bank, a direction of access, and the data vector to be stored, generating an address to which to store the data element, wherein each data element of a data vector is stored at a different address to allow parallel access; generating a bank number of a memory bank to which to store the data element, wherein each data element of a data vector is stored in a different memory bank to allow parallel access; and storing all data elements at the generated address information and bank number corresponding to the data element. 20. A method for providing parallel access to data elements of a data vector of a 2-dimensional rectangular array, comprising: receiving an index of a first data element of a data vector to be retrieved from a memory bank, and the direction of access, generating an address form which to retrieve the data element, wherein each data element of a data vector is stored at a different address to allow parallel access; . generating a bank number of a memory bank from which to retrieve - the data element, wherein each data element of a data vector is stored in a different memory bank to allow parallel access; and retrieving all data elements at the generated address information and bank number to obtain the desired data vector. 21. An apparatus for providing parallel access to data elements of a data vector of an n-dimensional rectangular array substantially as herein described with reference to an as illustrated by the accompanying drawings. 22, A method for providing parallel access to data elements of a data vector of a 2-dimensional rectangular array substantially as herein described with reference to an as illustrated by the accompanying drawings.

Full Text

1. Field of Invention
The present invention is related to the field of memory architecture, more specifically, to n-dimensional hyper-matrix (rectangular data array) with s-data elements along each dimension.
2. Background of the Invention
The design of memory architecture for n-dimensional rectangular data arrays is a well-known problem, and its scope stretches to a myriad of applications. The particular cases of parallel data access in 2- and 3-dimensional rectangular data arrays are of importance for signal processing applications. Specifically, the memory architecture for 2-dimensional data access is attractive for video, image, and graphics processing, whereas data access to 3-dimensional space is attractive for 3-dimensional graphics and video signal processing.
Many image and video processing algorithms require either row-wise or column-wise access to data in a 2-dimensional data array (an image or a frame of a video sequence). The most relevant applications are lossy compression algorithms for images and video, which use 2-dimensional separable transforms such as Discrete Cosine Transform (DCT) and Inverse Discrete Cosine Transform (IDCT). These transforms are an integral part of compression techniques utilized in widely accepted video and image compression standards such as MPEG (Moving Picture Experts Group), H.261, H.263, JPEG (Joint Photographic Experts Group), etc. In accordance with the recommendations made in these standards, each image or frame in a video sequence is divided into macroblocks, which are further divided into blocks of (8 x 8) data array. In the encoder, the 2D-DCT operation is applied over this block of (8 x 8) data array followed by quantization and entropy coding to achieve compression. In the decoder, the 2D-IDCT operation is performed after variable length decoding and de-quantization operations. The 2D-DCT (or 2D-IDCT) is a separable transform and can be computed by performing the 1D-DCT (or 1D-IDCT) operation over all the rows of a block followed by the 1D-DCT (or 1D-IDCT) operation over all columns, or vice-versa.
As shown in Figure 1, after the first 1D-(I)DCT operation 12 over all the rows (or columns) of 8 x 8 block 11, the data is to be fed to second 1D-(I)DCT block 14 in column-(or row-)wise fashion. This requires a memory 13 which allows both row-wise as well as column-wise access, because after the first 1D-(I)DCT operation 12 the data is written into memory 13 in row(column)-wise fashion, whereas for second 1D-(I)DCT operation 14 data is read from the memory 13 in column-(row-)wise fashion. For a DSP processor with SIMD architecture having 4 data-elements vector as operands, each (8 x 8) block can be divided into four data arrays of size (4 x 4). For each row (or column) of this (8 x 8) block, two row-wise (or column-wise) accesses are required to be made, each access fetching four consecutive elements. The present invention provides a scheme that meets this requirement.
Similarly, the 3D-(I)DCT can also be achieved using 1D-(I)DCT but in this case transpose memory should be such that it allows the parallel access to data along all three dimensions. The present invention describes a memory architecture for n-dimensional data array allowing parallel access to data along any of the n dimensions.

The problem of 2-dimensional memory architecture allowing row-wise as well as column-wise access is not new, but to the best of the authors' knowledge there is no record of extension of the same concept to higher dimensions. As a solution to carrying out the 2-dimensional matrix transpose operation, several conventional transpose memories have been proposed.
In U.S. Patent 5,740,340, as is understood, the memory cells are organized as an (s x s) data array. The s rows and s columns are addressed by 2s addresses, and there is a decoder that decodes any one of the said 2s addresses and enables access to said s rows of data and said s columns of data. This solution appears quite restrictive, since it needs a special kind of 2-D memory in which any row or any column can be enabled at a time for accessing. In addition, all enabled locations are accessed at a time. So the extension of this architecture will be very complex for large data arrays which are segmented into smaller (s x s) data sub-arrays, as only a complete row (or column) of the data array is enabled, not part of it. The mentioned complexity is not addressed in the disclosed document of the patent. Further, the complexity of this scheme is higher as it involves s^2 banks as compared to s banks in the present invention. Moreover, this scheme cannot be generalized to n-dimensional data arrays.
The U.S. Patent 5,481,487 appears to suggest a different memory architecture, which requires 4 parallel banks to store one (8 x 8) data array. Each bank stores one of the four quadrants of the data array, each quadrant being a (4 x 4) data array. This scheme appears to have the following restrictions:
1. Though address and data buses are provided for all the four banks, not all are accessed in parallel.

2. This memory architecture is restrictive in the sense that it implements only transpose function. If data is written in row (column) order, it can be read only in column (row) order.
3. This scheme is restricted to only one (8 x 8) block, and cannot be generalized to store larger 2-dimensional data arrays.
4. This architecture can store consecutive (8 x 8) blocks (in the same memory locations) but with the following restriction. If first (8 x 8) block is written in row-wise (column-wise) order then second block must be written in column-wise (row-wise) order.
5. This scheme may not be generalized for storing n-dimensional data arrays.
In U.S. Patent 4,603,348, a memory architecture has been described for storing a multi-dimensional array. According to this scheme, the n-dimensional array is divided into a number of divisions, which do not overlap. Each such division is defined as an n-dimensional array with 2 elements in each dimension. The number of banks in the proposed architecture is equal to the number of elements in each of these divisions. Each bank has one data element from a given division, hence enabling parallel access to all elements of a division. This scheme appears to provide access only to a division of an n-dimensional array. In contrast, the scheme disclosed in the present invention provides access to data along any given dimension.
In U.S. Patent 4,740,927, a bit addressable memory has been proposed in which a 2-dimensional array of bits is divided into partition sectors equal to the number of parallel memory modules (banks) provided. Each memory module has addresses equal to the number of bits in each partition sector. Each partition is divided into several s x s matrices, where s is the number of parallel banks. The logical placement of the bits of these matrices is such that bits of any row or column lie in different memory modules, providing parallel access along row and column. However, the present invention proposes an architecture with less complex address generation logic. A particular case of the proposed architecture, referred to as memory architecture with dyadic permutation, provides an address generation logic in which the main operation is a logical EXOR operation, as against the addition operation in the address generation logic proposed in the prior art. Moreover, unlike this scheme, the invention disclosed in this document is much more generic and holds good for dimensions greater than 2 as well.
The object of the present invention is to provide a novel solution to overcome the disadvantages of the prior arts.
SUMMARY OF INVENTION
Accordingly, the present invention provides an apparatus for providing parallel access to data elements of a data vector of an n-dimensional rectangular array, comprising:
an address generation logic module (31), for receiving an index of a first data element of a data vector to be stored into memory banks (34) and for receiving a dimension of access, generating addresses to which to store the data elements of the data vector, the address generation logic module (31) requiring that each data element of a data vector is stored in a different memory bank to allow parallel access along any dimension of an n-dimensional rectangular array;
a permuter (32), coupled to the address generation logic module (31), for receiving the address information, and receiving the index information of first data element of the data vector, for carrying out permutation operation on the addresses by generating the bank numbers of the memory banks to which to issue the addresses, the permuter (32) requiring that each data element of a data vector is stored in a different memory bank to allow parallel access along any dimension of an n-dimensional rectangular array, and for storing data elements at the generated address information and bank number corresponding to the each data element of the data vector; and
s memory banks (34), coupled to the permuter (32) to allow parallel access by the permuter to the s memory banks, to receive address to store each data element of the data vector, where s is equal to the number of data elements in a data vector.
Further the apparatus comprises an inverse permuter (33), for receiving the index information of the first data element of a data vector and for receiving the information of read operation, and coupled to the s memory banks (34), for performing the permutation operation on data elements, which is inverse of the permutation operation performed by the permuter (32) on the addresses, to generate the bank numbers from which to retrieve the data elements of the data vector.
The inverse permuter (33) receives information about write operation, and is coupled to store data to the s memory banks (34) responsive to performing the same permutation operation as the permutation operation performed by the permuter on the addresses, to generate the bank numbers to which to store the data elements of the data vector.
The address generation logic module (31) and the permuter (32) are integrated into a single module.
The address for storing each data element of a data vector to memory is generated by the address generation logic module (31) and a memory bank number for the generated address for storing each data element of a data vector to the memory bank is generated by the permuter (32).
The address for retrieving each data element of a data vector from memory banks is generated by address generation logic module (31) and a memory bank number for the generated address for retrieving each data element of a data vector from the memory bank is generated by the permuter (32).
The memory bank number for each data element of the data vector is generated by the inverse permuter to achieve shuffling of the data elements after retrieving the data from the memory banks.
The circular rotation is performed by the permuter and inverse permuter.
The permuter and inverse permuter perform the circular rotation of the data vector by a value equal to the remainder obtained by dividing the summation of all 'n' index coordinates of first data element in the data vector by number of data elements in the data vector, where 'n' is dimension of (s,n)-hyper-matrix and s is the number of the data elements in the data vector.
The circular rotation performed by the permuter and inverse permuter permits parallel access to all s data elements of a data vector along the dimension j which may span over two adjacent (s,n)-hyper-matrices, and the rotation parameter for permuter is given by a value equal to the remainder obtained by dividing the summation of all index coordinates except jth index coordinate αj of first data element in the data vector by number of data elements in the data vector, and the rotation parameter for inverse permuter is given by a value equal to the remainder obtained by dividing the summation of all 'n' index coordinates of first data element in the data vector by number of data elements in the data vector, where m=st and t is an integer.
The address logic generator generates an address as herein described.
The permuter and inverse permuter perform dyadic permutation.
The permuter and inverse permuter perform the dyadic permutation of the data vector by a parameter obtained by bit-wise logic EXOR operation of all 'n' index coordinates of first data element in the data vector, where 'n' is dimension of (s,n)-hyper-matrix and s is the number of the data elements in the data vector.
The permutation performed by the permuter is dyadic permutation and that performed by the inverse permuter is a combination of dyadic and circular permutation, which permits parallel access to all s data elements of a data vector along the dimension j which may span over two adjacent (s,n)-hyper-matrices, and the parameter for dyadic permutation of permuter and inverse permuter is obtained by bit-wise logic EXOR operation of all index coordinates except jth index coordinate αj of first data element in the data vector, and the circular rotation parameter for inverse permuter is given by a value equal to the remainder obtained by dividing the jth index coordinate αj of first data element in the data vector by number of data elements in the data vector, where m=st and t is an integer.
The address logic generator generates an address as herein described.
Further, the present invention provides a method for providing parallel access to data elements of a data vector of a 2-dimensional rectangular array, comprising:
- receiving an index of a first data element of a data vector to be stored into a memory bank, a direction of access, and the data vector to be stored,

- generating an address to which to store the data element, wherein each data element of a data vector is stored at a different address to allow parallel access;
- generating a bank number of a memory bank to which to store the data element, wherein each data element of a data vector is stored in a different memory bank to allow parallel access; and
- storing all data elements at the generated address information and bank number corresponding to the data element.
The objective of the present invention is to provide a generalized framework of memory architecture for n-dimensional rectangular data arrays such that parallel access to data along any of the n dimensions is possible. It is claimed that the memory architecture of the present invention is generic and less complex as compared to architectures discussed in the prior art. It also overcomes the disadvantages of the prior art for 2-dimensional transpose memories. The objective of this invention is achieved by applying a simple, yet effective, method for rearranging (permuting) the elements of the data array while reading / writing data from/to the memory. This rearrangement is the distinguishing feature of this invention. The brief description of the invention is as follows.

BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows the implementation of 8x8 2D-(I)DCT using 8-point 1D-(I)DCT.
Figure 2 shows the possible accesses in the proposed memory architecture, which allows parallel access to 3 data elements (i.e., s=3) in a 3-dimensional data array.
Figure 3 shows the basic memory architecture for an n-dimensional rectangular data array with m data elements along each dimension.
Figure 4 shows memory architecture for Case A under Circular Rotation Permutation.
Figure 5 shows memory architecture for Case C under Circular Rotation Permutation.
Figure 6 shows memory architecture for Case A under Dyadic Permutation.
Figure 7 shows memory architecture for Case B under Dyadic Permutation.
Figure 8 shows memory architecture for Case C under Dyadic Permutation.
Figure 9 shows the correspondence between an index in 4x4 matrix and location in memory bank.
Cases A, B and C mentioned here are defined in section on "Detailed Description of the Preferred Embodiments".
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention provides a generalized framework for memory architecture for n-dimensional rectangular data array, which enables parallel access to data along any of the n dimensions. The underlying concept of this invention is first described in a generic sense with an aim to define a class of architectures, which include all possible variants of this scheme. Subsequently, a scheme is described with examples for a particular case, which, apparently, has the least complexity in the class of variants of that case. Before proceeding further, the following three definitions have been provided to enhance the readability of the document.
1.1 Definitions
1. n-dimensional hyper-matrix: It is an n-dimensional rectangular data array. Any location of a data element in an n-dimensional hyper-matrix is given by an index of n coordinates, where αj is the index for dimension j. The data array has a single data value stored at each of its locations.
2. p-bounded n-dimensional hyper-matrix: It is an n-dimensional hyper-matrix with p

index of each (s,n)-hyper-matrix in (m,n)-hyper-matrix is an integer multiple of s along each of the n-dimensions.
3. Data Vector: A data vector in a (m,n)-hyper-matrix is a sequence of s data in the hyper-matrix along any given dimension. It is defined by the index of the starting location in the hyper-matrix and the dimension along which its elements lie. A data

1.2 Theory of Underlying Concept
The present invention describes a memory architecture with s parallel banks for storing all data elements of an (s,n)-hyper-matrix, so as to enable parallel access to all the s data elements of a data vector. Since s data elements are accessed in parallel, the total number of banks used in this invention is indeed the minimum. The data of the (s,n)-hyper-matrix is rearranged before it is stored in the s parallel banks of memory. The rearrangement of data is such that it ensures that all s elements of any data vector in this (s,n)-hyper-matrix are stored in different banks.
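The requirement that all s elements of any data vector fall in different banks can be checked exhaustively for small cases. The following is a minimal sketch under stated assumptions: the function names are illustrative only, an index is represented as a tuple whose position i holds the coordinate for dimension i, and the bank function used is the circular rotation described in the claims (sum of the index coordinates modulo s). A deliberately poor mapping is included for contrast.

```python
from itertools import product


def rotation_bank(index, s):
    """Bank number under circular rotation: sum of coordinates mod s."""
    return sum(index) % s


def first_coordinate_bank(index, s):
    """A deliberately poor mapping: the bank depends on coordinate 0 only."""
    return index[0] % s


def is_conflict_free(bank_fn, s, n):
    """Check every data vector of an (s,n)-hyper-matrix along every dimension."""
    for dim in range(n):
        # Fix all coordinates except `dim`; the vector sweeps coordinate `dim`.
        for rest in product(range(s), repeat=n - 1):
            banks = set()
            for k in range(s):
                index = rest[:dim] + (k,) + rest[dim:]
                banks.add(bank_fn(index, s))
            if len(banks) != s:
                return False  # two elements of one vector share a bank
    return True


print(is_conflict_free(rotation_bank, 3, 3))          # True
print(is_conflict_free(first_coordinate_bank, 3, 3))  # False
```

The second mapping fails because every vector along dimension 1 or 2 keeps coordinate 0 fixed, so all of its elements would land in one bank.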
Two functions, g and f, define the rearrangement described above. These functions take the index of a data element as their argument. The data element at index

Functions f and g must satisfy the following conditions to ensure the parallel access to all elements of any data vector.

such rearrangements are possible. In other words, there exists more than one distinct pair of functions (g,f) for given values of n and s.
The scheme of the present invention can easily be generalized for an (m,n)-hyper-matrix, which is assumed to be divided into (s,n)-hyper-matrices (please refer to Figure 2 for illustration). The different (s,n)-hyper-matrices in the (m,n)-hyper-matrix are linearly mapped in s parallel banks. It should be noted that for each (s,n)-hyper-matrix, the data is rearranged before it is stored in the memory. In this context, at least two variations are possible.
1. All s data elements of the accessed data vector lie within a given (s,n)-hyper-matrix.
2. The s data elements of the data vector to be accessed may span over two adjacent (s,n)-hyper-matrices.
1.3 Description of Basic Memory Architecture
The basic architecture of the proposed memory for an (m,n)-hyper-matrix is shown in Figure 3. Here, m=st and t is an integer. For storing the s data elements of any data vector into the memory banks, the n-dimensional starting index, the dimension along which the data is to be stored, and the s data elements are provided to this architecture. For reading s data elements in parallel, the n-dimensional starting index and the dimension of access are provided to the memory architecture. Based on these, the addresses for all of the s banks 34 are computed by Address Generation Logic 31 and issued to the banks 34 after carrying out a permutation (rearrangement) by Permutation Logic 32, also referred to as Permuter, which ensures that only the required locations are accessed in each bank. Address Generation Logic is a hardware module comprising registers and an arithmetic circuit, and Permuter comprises input and output port registers and an interconnection network circuit to route data in input port registers to output port registers. The Inverse Permutation Logic 33, also referred to as Inverse Permuter, is inverse of Permutation Logic 32 for addresses for the data read from the memory and inverse of Permutation Logic 32 for data written into memory. Inverse Permuter comprises input and output port registers and an interconnection switch circuit to route data in input port registers to output port registers. Although the Address Generation Logic 31 and Permutation Logic 32 are shown as separate blocks, they can be merged into one block.
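The roles of the three blocks can be illustrated with a small software model for the case of a single (s,n)-hyper-matrix. This is a sketch only and not the hardware of Figure 3: the class and method names are hypothetical, the bank function is assumed to be the circular rotation of the claims, and the address function (coordinates 1 to n-1 read as base-s digits, independent of coordinate 0) is an assumption chosen so that each element gets a unique (bank, address) pair. The model visits the s banks one after another, whereas the hardware would access them in parallel.

```python
class BankedMemory:
    """Software model of s banks holding one (s,n)-hyper-matrix."""

    def __init__(self, s, n):
        self.s, self.n = s, n
        # Each bank holds s**(n-1) words.
        self.banks = [[None] * (s ** (n - 1)) for _ in range(s)]

    def _bank(self, index):
        # Permuter role (assumed circular rotation): sum of coordinates mod s.
        return sum(index) % self.s

    def _address(self, index):
        # Address generation role (assumption): coordinates 1..n-1 as base-s digits.
        return sum(a * self.s ** i for i, a in enumerate(index[1:]))

    def _indices(self, start, dim):
        # The s indices of the data vector along `dim` that begins at `start`.
        return [start[:dim] + (start[dim] + k,) + start[dim + 1:]
                for k in range(self.s)]

    def write_vector(self, start, dim, data):
        for index, value in zip(self._indices(start, dim), data):
            self.banks[self._bank(index)][self._address(index)] = value

    def read_vector(self, start, dim):
        # Inverse permuter role: results come back in vector order, not bank order.
        return [self.banks[self._bank(ix)][self._address(ix)]
                for ix in self._indices(start, dim)]


# Write a 4x4 array along dimension 0, then read it back along dimension 1.
mem = BankedMemory(4, 2)
for r in range(4):
    mem.write_vector((0, r), 0, [(k, r) for k in range(4)])
print(mem.read_vector((2, 0), 1))
```

In this model each element stores its own index as its value, so the read along dimension 1 starting at (2, 0) returns the four indices whose coordinate 0 equals 2.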

As discussed earlier, for a given value of n and s (in an (s,n)-hyper-matrix), many such rearrangements are possible. The Address Generation Logic 31 defined by the functions g and f is different for each rearrangement. Therefore, the complexity of the Address Generation Logic 31 and Permutation Logic 32 in the architecture described in Figure 3 will also be different for different rearrangements. Among all possible rearrangements, the preferable ones are those which lead to simpler and regular hardware. All other possible rearrangements for a given value of n and s are claimed to be different architectural realizations, though conceptually the same as discussed in the present invention. One class of such rearrangements is permutation along any one dimension. In the rest of the document, the permutation along the 0th dimension is taken up without loss of generality. For this case, the permutation and address generation logic is also provided.
1.4 Permutation Along 0th Dimension
In an (s,n)-hyper-matrix, there are s^(n-1) data vectors which are along the 0th dimension. The starting index of each of these data vectors has α0 = 0. For this case, the s elements of any such data vector are stored at the same memory address but in different banks. The bank number for elements of any such given data vector is obtained by using a function f, which satisfies the conditions mentioned earlier. The advantage of this rearrangement is that the mapping (function g) of the elements of data vectors to addresses in the banks becomes independent of the function f. The address

Condition 1 for function g. It should be noted that α0 is not involved in computation of the address for the banks 34, as the address is the same for all elements of these data vectors. Since α0 will be different for all elements of any such data vector, the function f will ensure that these elements are indeed stored in different banks. Hence, Condition 3 is also satisfied.

memory architecture. Among all rearrangements corresponding to the permutation along any given dimension, the one obtained by dyadic permutation appears to be the least complex for values of s which are an integer power of 2. For any general value of s, a circular permutation can be used. These two permutations are taken up to explain the concept for the following three scenarios.
Case A This memory architecture is for an (s,n)-hyper-matrix and allows parallel access to all elements of any data vector.
Case B This memory architecture is for an (m,n)-hyper-matrix which is divided into (s,n)-hyper-matrices. It allows parallel access to all elements of any data vector which lies within a single (s,n)-hyper-matrix.
Case C This memory architecture is for an (m,n)-hyper-matrix which is divided into (s,n)-hyper-matrices. It allows parallel access to all s data elements of any data vector which may span over two adjacent (s,n)-hyper-matrices. It should be noted that Case B is a special case of Case C.
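The difference between Case B and Case C can be seen by looking only at bank numbers. The sketch below assumes the circular rotation mapping (sum of the index coordinates modulo s) and uses illustrative function names; it does not model the addresses, which differ between the two adjacent (s,n)-hyper-matrices that a Case C vector may touch. It shows that s consecutive elements along any dimension still fall in s different banks even when the vector does not start on a multiple of s.

```python
def rotation_bank(index, s):
    """Bank number under circular rotation: sum of coordinates mod s."""
    return sum(index) % s


def banks_for_vector(start, dim, s):
    """Banks touched by s consecutive elements along `dim`, starting at `start`."""
    banks = []
    for k in range(s):
        index = list(start)
        index[dim] += k
        banks.append(rotation_bank(index, s))
    return banks


# Case B style access: the vector starts on a multiple of s along dimension 0.
print(banks_for_vector((4, 5), 0, 4))  # [1, 2, 3, 0]
# Case C style access: the vector starts at 2 and spans two 4x4 sub-arrays.
print(banks_for_vector((2, 5), 0, 4))  # [3, 0, 1, 2]
```

In both accesses the four bank numbers are distinct; only the order in which the banks are visited changes, which is what the rotation performed on the data has to undo.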
1.4.1 Memory Architecture for Circular Rotational Permutation
The permutation function corresponding to circular rotation is given below. It is obvious from the properties of mod s addition that this function satisfies Conditions 1 and 3.
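Consistent with the wording of the claims for an (s,n)-hyper-matrix (rotation by the remainder obtained on dividing the sum of all n index coordinates by s), the rotation amount can be written as below. This is a sketch in assumed notation: the symbols \alpha_i stand for the index coordinates and f for the bank function named earlier.

```latex
f(\alpha_{n-1}, \ldots, \alpha_1, \alpha_0) \;=\; \Bigl( \sum_{i=0}^{n-1} \alpha_i \Bigr) \bmod s
```

If one coordinate runs through s consecutive values while the others are held fixed, the sum runs through s consecutive integers, which leave s distinct remainders modulo s. The s elements of a data vector along any dimension therefore receive s different bank numbers.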

circular rotation in Inverse Permutation Logic 45 is anti-clockwise for storing data into the memory, whereas it is reverse (clockwise) for reading data from memory. The amount of rotational shift in Inverse Permutation Logic 45 for the data read/write operation remains the same as for the addresses.

Case C: The basic architecture (shown in Figure 5) for this generalization (an
being the indices
along the dimension i) remains the same as in Case A, only the address generation logic and width of the address bus changes. For a given value of index and a/=(a.-a/'). For the s consecutive data elements to be accessed along the jth

Case B: This is a specific instance of Case C. For this case, the index αj in the n-dimensional index of the first data in the required data vector will be such that αj= 0. The αk J will be equal to (ΑJ + A:) for the s successive data. Hence, the term for address

architectures with s as an integer power of 2 are of much importance. The complexity of the hardware to realize all of the architectures described in Subsection 1.4.1 gets reduced significantly if s is a power of 2. The advantage is obvious from the fact that the

and the (mod s) operation is equivalent to logical AND operation with (s-1). Further, the dyadic permutation function gives much simpler permutation logic. Another advantage of this permutation function is that the permutation logic turns out to be same as inverse permutation logic. It is easy to see that this permutation function satisfies the Conditions 1 and 3.
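The two properties stated above can be checked with a short sketch. It assumes, as in the claims, that the dyadic permutation parameter is the bit-wise EXOR of the index coordinates; the function names are illustrative only.

```python
from functools import reduce
from operator import xor


def dyadic_bank(index):
    """Bank number under dyadic permutation: bit-wise XOR of all coordinates."""
    return reduce(xor, index, 0)


def dyadic_permute(vector, p):
    """Move the element at position k to position k XOR p."""
    out = [None] * len(vector)
    for k, value in enumerate(vector):
        out[k ^ p] = value
    return out


s = 8  # a power of 2

# (mod s) equals a logical AND with (s-1) when s is a power of 2.
print(all(x % s == x & (s - 1) for x in range(1000)))  # True

# Applying the same dyadic permutation twice restores the original order,
# so the permutation logic also serves as the inverse permutation logic.
v = list("abcdefgh")
print(all(dyadic_permute(dyadic_permute(v, p), p) == v for p in range(s)))  # True

print(dyadic_permute(["a", "b", "c", "d"], 1))  # ['b', 'a', 'd', 'c']
```

The vector length must be a power of 2 and p must be smaller than that length, so that k XOR p stays within range.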

be treated as 0. The proposed memory architecture is given in Figure 6.

Address Generation Logic and the Permutation Logic modules can be combined into single module 63.
Case B: Let (m,n)-hyper-matrix with m=2x+y be divided into (s,n)-hyper-matrices.

clockwise direction for data reading operation (refer to Figure 8).
1.5 An Example of a Single 2-Dimensional Data Array with s=4
This example illustrates the above-described scheme for a 2-dimensional (4x4) matrix 91. The correspondence between location in matrix 91 and memory banks 92 for data elements is shown in Figure 9 for dyadic permutation case. It is apparent from the shown rearrangement that elements of the matrix 91, which lie in the same row (or column) are stored in different banks 92. Accessing this memory bank 92 involves two steps:
1. Computing the address for each bank;
2. Reordering of the 4 elements after (before) reading (writing) from (to) the 4 banks.

The following table indicates the bank number in which the i-th data element in the a-th row (or column) lies. This can be used to reorder the data elements for reading from or writing into the memory banks.

It is apparent that for column-wise access, the 2-bit address for each data element in the column to be read from (or written into) the memory is the same as its index i. On the other hand, for row-wise access, the address is the same as the row number a of the data element.
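The 4x4 example can be sketched in Python as follows. This is a minimal illustration, assuming that the element at (row r, column c) is stored in bank r XOR c at address r; that assumption is consistent with the addresses described above and with a dyadic permutation, but the exact layout of Figure 9 is not reproduced here. The function names (store, read_row, read_column) and the list-of-lists model of the banks are hypothetical.

```python
S = 4  # number of memory banks, equal to the data vector length


def store(matrix):
    """Place element (r, c) in bank (r XOR c) at address r (assumed mapping)."""
    banks = [[None] * S for _ in range(S)]
    for r in range(S):
        for c in range(S):
            banks[r ^ c][r] = matrix[r][c]
    return banks


def read_row(banks, a):
    """Read row a: every bank receives the same address a."""
    fetched = [banks[b][a] for b in range(S)]      # step 1: one access per bank
    return [fetched[a ^ i] for i in range(S)]      # step 2: reorder by bank number


def read_column(banks, a):
    """Read column a: element i sits at address i in bank (i XOR a)."""
    fetched = [banks[b][b ^ a] for b in range(S)]  # bank b is issued address b XOR a
    return [fetched[i ^ a] for i in range(S)]      # reorder by bank number


# Example matrix whose entries encode their own position as 10*row + column.
matrix = [[10 * r + c for c in range(S)] for r in range(S)]
banks = store(matrix)
row_2 = read_row(banks, 2)        # [20, 21, 22, 23]
column_3 = read_column(banks, 3)  # [3, 13, 23, 33]
```

In both access modes each of the 4 banks is touched exactly once, which is what allows the 4 elements of a row or of a column to be fetched in a single parallel access under this assumed mapping.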
2 Alternative Embodiments of the Invention
The following are alternate embodiments of the present invention.
1. Though the scheme has been described for only two rearrangements, all other rearrangements within the scope of the discussion in the previous section will lead to other alternative implementations of the same scheme.
2. If an application demands access along only some particular dimensions, then a minor variation of the proposed scheme will lead to a significant reduction in hardware complexity.
3. For the Cases B and C discussed in Section 0, the number of elements along each dimension need not be the same for the larger hyper-matrix. That is, the value of m may be different for each dimension. For this alternative implementation, only the address computation for accessing any (s,n)-hyper-matrix will change; the logic to compute addresses for the elements of a data vector within an (s,n)-hyper-matrix will remain the same.
4. Though the scheme has been described for parallel access, it can also be used for accessing the data serially. This can be accomplished by issuing all the addresses corresponding to the s data elements sequentially. Given the constraint of sequential access, some minor changes in the architecture will lead to a reduction in hardware complexity.

We claim:
1. An apparatus for providing parallel access to data elements of a data
vector of an n-dimensional rectangular array, comprising:
an address generation logic module (31), for receiving an index of a first data element of a data vector to be stored into memory banks (34) and for receiving a dimension of access, generating addresses to which to store the data elements of the data vector, the address generation logic module (31) requiring that each data element of a data vector is stored in a different memory bank to allow parallel access along any dimension of an n-dimensional rectangular array;
a permuter (32), coupled to the address generation logic module (31), for receiving the address information, and receiving the index information of the first data element of the data vector, for carrying out a permutation operation on the addresses by generating the bank number of a memory bank to which to issue each of the addresses, the permuter (32) requiring that each data element of a data vector is stored in a different memory bank to allow parallel access along any dimension of an n-dimensional rectangular array, and for storing data elements at the generated address information and bank number corresponding to each data element of the data vector; and
s memory banks (34), coupled to the permuter (32) to allow parallel access by the permuter to the s memory banks, to receive addresses to store each data element of the data vector, where s is equal to the number of data elements in a data vector.
2. The apparatus as claimed in claim 1 comprising:
an inverse permuter (33), for receiving the index information of the first data element of a data vector and for receiving the information of read operation, and coupled to the s memory banks (34), for performing the permutation operation on data elements, which is inverse of the permutation operation performed by the permuter (32) on the addresses, to generate the bank numbers from which to retrieve the data elements of the data vector.

3. The apparatus as claimed in claim 2 wherein the inverse permuter (33) receives information about a write operation, and is coupled to store data to the s memory banks (34) responsive to performing the same permutation operation as the permutation operation performed by the permuter on the addresses, to generate the bank numbers to which to store the data elements of the data vector.
4. The apparatus as claimed in claim 1 wherein the address generation logic module (31) and the permuter (32) are integrated into a single module.
5. The apparatus as claimed in claim 1 wherein the address for storing each data element of a data vector to memory is generated by the address generation logic module (31) and a memory bank number for the generated address for storing each data element of a data vector to the memory bank is generated by the permuter (32).
6. The apparatus as claimed in claim 1 wherein the address for retrieving each data element of a data vector from memory banks is generated by address generation logic module (31) and a memory bank number for the generated address for retrieving each data element of a data vector from the memory bank is generated by the permuter (32).
7. The apparatus as claimed in claim 1 wherein the memory bank number for each data element of the data vector is generated by the inverse permuter to achieve shuffling of the data elements after retrieving the data from the memory banks.
8. The apparatus as claimed in claim 3 wherein the memory bank number for each data element of the data vector is generated by the inverse permuter to achieve shuffling of the data elements after retrieving the data from the memory banks.
9. The apparatus as claimed in claim 3 wherein circular rotation is performed by the permuter and inverse permuter.
10. The apparatus of claim 9 for a (s,n)-hyper matrix, wherein the permuter and inverse permuter perform the circular rotation of the data vector by a value equal to the remainder obtained by dividing the summation of all 'n' index coordinates of the first data element in the data vector by the number of data elements in the data vector, where 'n' is the dimension of the (s,n)-hyper matrix and s is the number of data elements in the data vector.
11. The apparatus as claimed in claim 10 wherein the address generator

12. The apparatus as claimed in claim 9 for a (m,n)-hyper-matrix, wherein the circular rotation performed by the permuter and inverse permuter permits parallel access to all s data elements of a data vector along the dimension j which may span over two adjacent (s,n)-hyper-matrices, and the rotation parameter for the permuter is given by the value equal to the remainder obtained by dividing the summation of all index coordinates except the jth index coordinate αj of the first data element in the data vector by the number of data elements in the data vector, and the rotation parameter for the inverse permuter is given by the value equal to the remainder obtained by dividing the summation of all 'n' index coordinates of the first data element in the data vector by the number of data elements in the data vector, where m=st and t is an integer.
13. The apparatus of the claim 12 wherein the address logic generator
generates an address as herein described.
14. The apparatus as claimed in claim 3 wherein the permuter and inverse
permuter perform dyadic permutation.
15. The apparatus of claim 14 for a (s,n)-hyper matrix, wherein the
permuter and inverse permuter perform the dyadic permutation of the
data vector by a parameter obtained by a bit-wise logic EXOR operation
of all 'n' index coordinates of the first data element in the data vector,
where 'n' is the dimension of the (s,n)-hyper matrix and s is the number of
data elements in the data vector.
16. The apparatus of claim 15 wherein the address generator logic
generates the address for data at index [α1„.,,αn-2 α0] of (s,n)-hyper

17. The apparatus of claim 14 for a (m,n)-hyper-matrix, wherein the permutation performed by the permuter is dyadic permutation and that performed by the inverse permuter is a combination of dyadic and circular permutation, which permits parallel access to all s data elements of a data vector along the dimension j which may span over two adjacent (s,n)-hyper-matrices, and the parameter for the dyadic permutation of the permuter and inverse permuter is obtained by a bit-wise logic EXOR operation of all index coordinates except the jth index coordinate αj of the first data element in the data vector, and the circular rotation parameter for the inverse permuter is given by the value equal to the remainder obtained by dividing the jth index coordinate αj of the first data element in the data vector by the number of data elements in the data vector, where m=st and t is an integer.
18. The apparatus of the claim 17 wherein the address logic generator
generates an address as herein described.
19. A method for providing parallel access to data elements of a data
vector of a 2-dimensional rectangular array, comprising:
receiving an index of a first data element of a data vector to be stored into a memory bank, a direction of access, and the data vector to be stored,
generating an address to which to store the data element, wherein each data element of a data vector is stored at a different address to allow parallel access;
generating a bank number of a memory bank to which to store the data element, wherein each data element of a data vector is stored in a different memory bank to allow parallel access; and
storing all data elements at the generated address information and bank number corresponding to the data element.
20. A method for providing parallel access to data elements of a data
vector of a 2-dimensional rectangular array, comprising:
receiving an index of a first data element of a data vector to be retrieved from a memory bank, and the direction of access,
generating an address from which to retrieve the data element, wherein each data element of a data vector is stored at a different address to allow parallel access;
generating a bank number of a memory bank from which to retrieve the data element, wherein each data element of a data vector is stored in a different memory bank to allow parallel access; and
retrieving all data elements at the generated address information and bank number to obtain the desired data vector.
21. An apparatus for providing parallel access to data elements of a data
vector of an n-dimensional rectangular array substantially as herein
described with reference to and as illustrated by the accompanying drawings.
22. A method for providing parallel access to data elements of a data
vector of a 2-dimensional rectangular array substantially as herein described
with reference to and as illustrated by the accompanying drawings.



Patent Number	210181
Indian Patent Application Number	1212/MAS/1999
PG Journal Number	50/2007
Publication Date	14-Dec-2007
Grant Date	25-Sep-2007
Date of Filing	22-Dec-1999
Name of Patentee	M/S. SILICON AUTOMATION SYSTEMS LTD
Applicant Address	3008, 12TH MAIN ROAD, 8TH CROSS, HAL 2ND STAGE, INDIRANAGAR, BANGALORE - 560 008

Inventors:

#	Inventor's Name	Inventor's Address
1	JANA SOUMYA	C/O DR. D. JANA, UTTARAM, TEACHERS' HSG ESTATE , BAGHAJATIN PARK, CALCUTTA - 700 094,
2	BANSAL PANKAJ	C/O SILICON AUTOMATION SYSTEMS LTD, 3008, 12TH B MAIN, 8TH CROSS, HAL, 2ND STAGE, INDIRANAGAR , BANGALORE, 560 008,
3	SINGH BALVINDER	C/O SILICON AUTOMATION SYSTEMS LTD, 3008, 12TH B MAIN, 8TH CROSS, HAL, 2ND STAGE, INDIRANAGAR, BANGALORE - 560 008,

PCT International Classification Number	G 06 F 012/00
PCT International Application Number	N/A
PCT International Filing date	

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1			NA