Title of Invention	DIGITAL PROCESSING DEVICE
Abstract	A digital processing device P, generally configured as a regular tree with n+1 levels S<sub>0</sub>,S<sub>i</sub>,S<sub>2</sub>....S<sub>n</sub> and degree k, is provided in the form of a circuit P<sub>n</sub> on the level S<sub>n</sub> and forms the root node of the tree, an underlying level S<sub>n</sub>.q, q=1,2...n-1, in the circuit P comprising generally kQ circuits P<sub>n-q+1</sub> provided nested in the k<sup>q</sup> circuit P<sub>n-q+1</sub> on the overlying level S<sub>2-q-1</sub>, each circuit P<sub>n-q+1</sub> on the overlying level S<sub>n-q+1</sub>, each circuit P<sub>n-q+1</sub> on this level comprising k circuits P<sub>n-q</sub>. A for q = n defined zeroth level in the circuit Pn comprises from k<sup>n-1</sup> +1 to kn circuits Po which form kernel processors in the processing device P and on the level So constitute the leaf nodes of the tree, the kernel processor PQ being provided nested in a number from 1 to k in each of the circuits P<sub>1</sub> on the level S<sub>1</sub>. Each of the circuits P<sub>1</sub>,P<sub>2</sub>,....P<sub>n</sub>, comprises a logic unit E which generally is connected with circuits P<sub>0</sub>, P<sub>1</sub>,....P<sub>n-1</sub> provided nested in the former circuits P<sub>0</sub>, P<sub>1</sub>....P<sub>n-1</sub> provided nested in the former circuits on the nearest underlying level and is additionally adapted to connect nodes on the same or underlying levels in neighbour trees. Each of the circuits P<sub>0</sub>, P<sub>1</sub>...P<sub>n</sub> has additionally identical interfaces I, such that IPo= ip<sub>1</sub>=...IP<sub>n</sub>.

Title of Invention

DIGITAL PROCESSING DEVICE

Abstract

A digital processing device P, generally configured as a regular tree with n+1 levels S0,Si,S2....Sn and degree k, is provided in the form of a circuit Pn on the level Sn and forms the root node of the tree, an underlying level Sn.q, q=1,2...n-1, in the circuit P comprising generally kQ circuits Pn-q+1 provided nested in the kq circuit Pn-q+1 on the overlying level S2-q-1, each circuit Pn-q+1 on the overlying level Sn-q+1, each circuit Pn-q+1 on this level comprising k circuits Pn-q. A for q = n defined zeroth level in the circuit Pn comprises from kn-1 +1 to kn circuits Po which form kernel processors in the processing device P and on the level So constitute the leaf nodes of the tree, the kernel processor PQ being provided nested in a number from 1 to k in each of the circuits P1 on the level S1. Each of the circuits P1,P2,....Pn, comprises a logic unit E which generally is connected with circuits P0, P1,....Pn-1 provided nested in the former circuits P0, P1....Pn-1 provided nested in the former circuits on the nearest underlying level and is additionally adapted to connect nodes on the same or underlying levels in neighbour trees. Each of the circuits P0, P1...Pn has additionally identical interfaces I, such that IPo= ip1=...IPn.

Full Text	Digital procesS1ng device The invention concerns a digital procesS1ng device P, particularly for procesS1ng of digital data and S1gnal structures, wherein the data and S1gnal structures comprise repeated sequences and/or nested patterns, and wherein the procesS1ng device P generally is configured as a regular tree with n+l levels So, S1, ... Sn and of degree k. ProcesS1ng of large data volumes with use of repeated or recurS1ve operations on very large data volumes can even in a restricted number often be a bottleneck when uS1ng conventional microprocessors and is thus amenable to masS1vely parallel solutions wherein a very large number of procesS1ng elements S1multaneously execute different operations in parallel on a large data stream, but posS1bly also parallel operations on several data streams. If such large data volumes appear in a form of data or S1gnal structures with repeated sequences and/or nested patterns, the procesS1ng can be made more effective by being realized in parallel on the same or several different levels From US Patent No. 486020 (Stolfo & al.) there is known a parallel procesS1ng device structured as a binary tree, wherein a very large number of processors, each with their own I/O unit are used. Generally Stolfo & al. discloses a computer with a very large number of processors connected in a binary tree structure such that each processor apart from those which constitute the root and the leaves of the tree has a S1ngle parent processor and two child processors. The processors tyP1cally work synchronously with data which are transmitted thereto from the parent processor and communicate the results on to the nearest succeeding processors, that is the parent processors" children. S1multaneously the child processors and the parent processor may also communicate with each other. According to Stolfo & al. each node constitutes a procesS1ng element which comprises a processor in the proper sense, a read/write memory or a random access memory, and an I/O device. The I/O device provides interfaces between each procesS1ng element and its parent and child procesS1ng elements such that a substantial improvement in speed whereby data are sent through the binary tree structure is obtained. As the binary tree structure has a procesS1ng element in every S1ngle node, the procesS1ng device will generally comprise 2" - 1 procesS1ng elements, that is 1023 procesS1ng elements if the binary tree is realized with 10 levels. In a preferred embodiment the known parallel procesS1ng device has a clock frequency of 12 MHz, which in the case of uS1ng a tree with 1023 processors which each has an average instruction cycle time of 1.8 μs, provides a procesS1ng performance of about 570 million instructions per second. A binary parallel processor of this kind may for instance be well suited for handling decomposable or partitionable data procesS1ng problems, for instance searching in large information volumes. A partitionable search problem can be defined as a problem where a query about a relation between an object x and an object set corresponds to a repeated use of a commutative and associative binary operator b which has an identity, and a primitive query which is applied between a new object x and each element f in the set F. One then has a partitionable search problem when the logic function OR is combined with the primitive query " is x = f " applied between the object x and each element f itf. As mentioned by Stolfo & al., a problem which conS1sts of answering a query about set F, may be answered by combining the answers of the queries applied to arbitrary subsets of F. The problem is in other words partitionable or decomposable and well suited for raP1d execution by means of parallel procesS1ng. The set F is partitioned in a number of arbitrary subsets equal to the number of available processors. The primitive query q is then applied in parallel in each processor between the unknown x which is communicated to all processors and the locally stored element fin the set F. The results are then combined in parallel by logzN repetitions of the operator b, as a number of computations first is executed on N/2 adjacent pairs of processors and then a corresponding number of computations on N/4 pairs of processors with the results from the first computations. The operations hence move during the process to overlying levels in the binary tree, in other words from child processors to the parent processor etc. and are repeated in parallel on each level. There is however a number of data procesS1ng problems wherein the data and S1gnal structures comprise repeated sequences and/or nested patterns which are such that a procesS1ng device of the kind that is disclosed in US Patent No. 4860201 does not provide the deS1red flexibility or may not at all be suited for handling the problem. A binary tree structure as disclosed therein presupposes in principle that the problem can be binary partitioned and that operations take places in parallel on the same level. However, there may be problems which demand another degree of decompoS1tion and where procesS1ng must be able to take place in parallel, but on different levels in the tree structure. The problems can also be partitioned such that it will be deS1rable with a larger partitioning capacity on one and the same level in some of the subtrees in the tree structure, and this in practice requires solutions which takes its starting point in a general tree structure which not only has an arbitrary number of levels, but also arbitrary degree, while nodes in subtrees not are only connected with the parent node of the tree in question, but for Instance may be connected to a node on the same or underlying levels in neighbour trees. An increased degree of connectivity in a tree structure with a deS1red number of levels and of arbitrary degree will hence make it posS1ble to reconfigure the original tree structure, either in the form of reduced trees or S1mple or complex graphs. S1multaneously can one or more of the leaf nodes be combined and take over the function of the parent node In question. From US-A-5561768 there is known a masS1vely-parallel computer system including a number of processors and a plurality of control processors. An arrangement is provided for partitioning the processors such that a series of levels in a control network tree structure is formed. The object to the present invention is thus to provide a procesS1ng device which particularly suited for procesS1ng large data volumes in masS1ve parallelism and on different levels in a general tree structure, but which S1multaneously also can be configured arbitrarily as nested circuits on different levels and preferably under determined conditions such that a selected configuration on the given level is generated recurS1vely by a configuration on an underlying level. Particularly it is the object that the procesS1ng device according to the invention shall be able to realize an MIMD procesS1ng device, that is a procesS1ng device which works with multiple instructions and multiple data. The above-mentioned and other objects are obtained according to the invention with a digital procesS1ng device particularly of procesS1ng of digital date and S1gnal structures, wherein the data and the S1gnal structure comprise repeated sequences and/or nested patterns, and wherein the procesS1ng device P is generally configured as a regular tree with n+1 levels So, S1 Sn and of device k which is characterized in that the procesS1ng device P Is provided In the form of a circuit Pn on the level Sn and forms the root node of the tree, that the nearest level Sn -1 is provided nested in the circuit Pn and comprises k circuits Pn-1 which form the child nodes of the root node, that generally an underlying level Sn-q in the circuit Pn, where q £ { 1,2. .n-l }, comprises kq circuits Pn-q, provided nested in the kq-1 circuits Pn-q+1 on the overlying level Sn-qfi. each circuit Pn-q+1 this level compriS1ng k circuits Pn-q, that a defined zeroth level Sn-q = S© in the circuit Pn for q = n comprises from kn-1 + 1 to k" circuits Po which constitute kernel processors in the procesS1ng device P and on this level So form leaf nodes In the tree, the kernel processor Po being provWed nested in a number of 1 to k in each of the kn-1 circuits P1 on the level S1, that each of the circuits P1, P2 Pn on respective levels S1, S2 Sn comprises a logic unit E which generally is connected with those circuits Po, P1 P-n on the respective nearest underlying level So, S1 Sn-1 provided nested in the respective circuits P1, P2, ,..Pn and according to choice configures a network of the former circuits In the respective circuits P1, P2,..Pn, and that each of the circuits P1....Pn has identical interfaces I. Advantageously a first embodiment of the procesS1ng device according to the invention is characterized in that the zeroth level So comprises k" kernel processors Po, that a kernel processor Po recurS1vely maps a circuit P1 on the overlying level with a mapP1ng factor r = k, r Further a second embodiment of the procesS1ng device according to the Invention is advantageously characterized in that the zeroth level So comprises rk?""kernel processors Po, 1 Finally a second embodiment of the procesS1ng device according to the invention is advantageously characterized in that respectively from 1 to k kernel processors are provided nested in each circuit P1 on the level S1, that at least one of the circuits P1 comprises at least 2 and at most k-l kernel processors PQ, such that the total number of kernel processors PQ on the level So is at least k +1 and at the most k " and the tree becomes an asymmetrically reduced or unbalanced tree, and that generally a circuit P n-q on the level Sn-q is mapped by the circuits Pn-q-1 nested in the respective circuit Pn-q According to the invention the kernel processor P advantageously comprises at least one combinatorial unit C and a memory unit M connected with at least one combinatorial unit C, at least a part of the memory unit M preferably being configured as a register unit R. In the latter case can preferably then at least one combinatorial unit C and a register unit R connected with the at least one combinatorial unit, the at least one combinatorial unit C preferably being a multiplexer. It is then advantageous that the logic unit E in a circuit Pn.q is adapted to be connected with the logic unit E in a corresponding circuit Pn-q on the game level as Sn.q in a neighbour tree. It is also advantageous that the logic unit E in a circuit Pn.q is adapted to be connected with the logic unit E in circuits Pn-q-i, Pn-q-2 ••• P1 on respective underlying levels Sn.q-i, Sn.q.2, ... S1 in a neighbour tree. Finally it is also advantageous that the logic unit E in a circuit Pn.q is adapted to be connected with one or more kernel processors PQ in a neighbour tree, either directly or via the logic unit E in the circuit P1, where the kernel processor PQ or the kernel processors PQ in question are nested. The invention shall now be explained in more detail in connection with exemplary embodiments and with reference to the accompanying drawing, wherein fig. 1 shows a first embodiment of the procesS1ng device according to the invention, fig. 2 the embodiment in fig. 1 configured as a regular tree structure, fig. 3 a second embodiment of the procesS1ng device according to the invention configured as a tree structure, fig. 4a a third embodiment of the procesS1ng device according to the invention configured as an asymmetrically reduced tree, fig. 4b an example of a subtree with leaf nodes in the tree structure which represents the procesS1ng device according to the invention, fig. 4c another example of a subtree with leaf nodes in the tree structure which represents the procesS1ng device according to the invention, fig. 5 schematically the principle for generating the procesS1ng device according the invention as a nested circuit by recurS1ve mapP1ng, fig. 6 an embodiment of the procesS1ng device according to the invention as a recurS1ve circuit with three levels and degree four, fig. 7 another emboditaent of the procesS1ng device according to the invention as a recurS1ve circuit with four levels and S1milarly degree four, fig. 8 posS1ble connections between the nodes in a tree corresponding to the circuit in fig. 6, fig. 9 posS1ble connections between the nodes in a tree corresponding to the circuit in fig. 7, fig. 10 the circuit in fig. 6 or fig. 8 connected as a one-dimenS1onal graph, fig. 1 la schematically an embodiment of a kernel processor in the procesS1ng device according to the invention, fig. 1 lb schematically an embodiment of a circuit with a logic unit and a kernel processor in the form of a comparator unit for use in a search processor device based the procesS1ng device according to the invention, fig. 12a the comparator unit in fig. 1 lb, fig. i2b the connection of comparators S1milar to the one in fig. 12a in order to compare a pattern of for instance 3 characters, and fig, 12c a balanced binary tree realized with circuits corresponding to the one in fig. lib. Fig. 1 shows symmetrically the layout of a parallel procesS1ng device according to the present invention. Formally the scheme in fig. 1 can be described as follows: Given a circuit P0 with an interface Ip0 and a further circuit E, there can be created a repeated sequence of circuits P1, P2, P3, with respective interfaces Ip1,Ip2 ••• Ipn each compriS1ng respectively k circuits P0, P2, ... Pn-i and a S1ngle furter circuit E, the interfaces I being identical for all circuits PQ, P\|, ... ?„, in other words Ip^ = Ipj = Ip, ... = Ip^ . In each case the circuit P( is configured as a tree network with n+1 levels S and of degree k. Specifically fig. 1 shows a digital procesS1ng device in the form of the circuit P4 which forms a tree with five levels S and of degree 3. This implies as shown in fig. 1 that the circuit P4 which is located on the level S^, on the underlying level comprises three circuits P3. Each of the circuits P3 on the level S3 is generated with circuits P1 on the underlying level S1, and each of the circuits P2 the level 82 is generated with three circuits P1 on the underlying level S1. Finally, each of the circuits P1 on the level S1 is generated with three.circuits PQ on the level SQ. The circuits PQ form the proper procesS1ng units in the procesS1ng device P4 and constitute the kernel processors of P4. As the circuit P4 as shown in fig. 2 is structured as a ternary tree, this implies that the level S0 comprises a total of 3 =^81 kernel processors PQ, the overlying level S\| 27 circuits P1 and the following level S7 9 circuits P2. Finally the level S3 has 3 circuits P3. Each of the circuits PQ, P1, P2, P3 constitutes the nodes in the tree in fig. 2. The kernel processors PQ constitute the leaf nodes of the tree and the circuit P4 the root node of the tree. Structured in this way the tree in fig. 2 appears as a recurS1ve ternary tree, as a circuit on an arbitrary level S maps the circuit on the nearest following overlying level recurS1vely with a mapP1ng factor r which in this case is 3. Each of the circuits Pj, P2, P3, P4, additionally comprises a logic unit E, such that the logic units E in all circuits P\|, P2, P3, Pn phyS1cally regarded constitute the nodes of the tree. By means of the logic unit E can each node of the tree generally be connected with its parent node or its child nodes. Through the logic unit E can also each of the child nodes of the node be mutually connected. Each of the kernel processors which constitutes the leaf nodes of the tree also has respective circuits Pj as parent nodes and can via these be connected mutually. As per se known is the problem which the procesS1ng device according to invention shall process, partitioned and the procesS1ng executed on the level SQ in parallel by the kernel processors PQ. The results from the kernel processors PQ are combined and communicated further by the respective parent node P1. This process is repeated until the fmal result is present on the circuit P4. Both the kernel processor PQ and the logic unit E shall be discussed a somewhat greater detail in the following. However, the circuit P4 does not need to be realized as a complete or regular recurS1ve tree. It may well be generated as a symmetrically reduced tree and this implies that the number of leaf nodes or number of kernel processors PQ connected with the nodes P1, is reduced such that each circuit P\| comprises 2 or more leaf nodes or kernel processors PQ, but in each case less then k kernel processors PQ. AS the circuits in fig. 1 has k = 3, this implies that the symmetrically reduced tree as shown in fig. 3, comprises two kernel processors PQ for each circuit P\|. The symmetrically reduced tree on fig. 3 is in other words not regularly recurS1ve, the recurS1on first from level S\| taking place with a mapP1ng factor 3. Generated as in fig. 1, each circuit P\| in the tree in fig. 3 will in addition to the logic unit E comprise only two kernel processors ?P0 The recurS1ve mapP1ng of circuits on an underlying level to an overlying level implies evidently that these circuits will be nested in a circuit in the overlying level and the whole circuit P4 appears as a^ested circuit. If the circuit is an asymmetrically reduced circuit, it may appear as shown in fig. 4a. Here some of the nodes P1 regularly have three leaf nodes PQ, other nodes P1 only two leaf nodes Po and yet other nodes P\| only a S1ngle leaf node Po- In the latter case the mapP1ng of the parent node becomes an identity mapP1ng thereof, and for the sake of that the logic element then can be left out in the circuits P1. If all circuits P1 comprise only a S1ngle kernel processor Po, the tree which generates the circuit P4 will appear as completely reduced, as P1 = Po and the circuit P4 is transformed into a regular recurS1ve circuit P3 with four levels and 27 kernel processors Poon the level SQ. IS thus the number of levels n+1, a circuit Pn must have at least k""" +1 kernel processors on the level SQ in order not to appear as a completely reduced tree. A subtree in a maximally asymmetrical reduced tree is shown in fig. 4b, this subtree compriS1ng a node P1 with child nodes P1, such that only one of nodes P1 has two kernel processors or leaf nodes Po, while all of the remaining nodes P1 in the tree only has a S1ngle leaf node PQ, such that P1 in these cases is reduced to PQ. With five levels or responding to the circuit in fig. 1, a circuit P4 configured as in fig. 4b, that is with two leaf nodes in a circuit P\| and otherwise only one leaf node in each of the remaining nodes P1, would appear as a maximally asymmetrical reduced tree and the number of nodes on the level SQ in other words becomes 3""+l=28. Correspondingly a subtree for the node PT in a minimally asymmetrical reduced tree is shown in fig. 4c. Here a S1ngle node P1 comprises two leaf nodes or kerne! processors PQ, while all the remaining nodes P\| in the tree as before comprise three leaf nodes PQ. This of course implies that a minimally asymmetrical reduced tree corresponding to the circuit on fig. 1 in this case shall comprise a total of 80 kernel processors PQ. A circuit which is generated as the tree shown in fig. 4, will of course still appear as nested on all underlying levels S1, S2, S3, but the mapP1ng will no longer be recurS1ve, as a circuit P1, P1, P3, P4, will be mapped by all circuits which are located on the respective underlying level So, S1, S2, S3. The advantages of generating the circuit P4 as a symmetrically or asymmetrically reduced tree shall be discussed in more detail in the following. Fig. 5 shows how a recurS1ve self-mapP1ng procesS1ng device Pn according to the invention is generated. The circuit Pn is realized as a tree with n+ I levels and of degree k = 4. Each of the circuits P1, P2 Pn-i is nested in the respective circuits P1, P2 PnOn the nearest overlying level S9, s1..... Sn. The number of nested levels Is then of course n and the circuit Pn shall on the zeroth level So defined by the kernel processor Po comprise a total of kn kernel processors of this kind. Generally shall the number of circuits Pn-q on a level Sn-q, where q = 0,1,2,...n, be equal to k". Fig. 6 shows a nested circuit P2 generated as a recurS1ve tree according to the principle in fig. 5. The circuit P1 comprises the logic unit E and nested on the underlying level S1 four circuits P1 which each In turn comprises the logic unit E and nested on the underlying level four circuits or kernel processors Po. The logic units configure the circuit P2 as a quaternary tree with three levels So, S1 Sn, and a total of S1xteen kernel processors Poor leaf nodes on the level So. Correspondingly fig. 7 shows a sequentially generated circuit with four levels Sand k = 4. The circuit In fig. 7 comprises in other words a quaternary configured tree with a total of 64 kernel processors Po on the level So. As before the interfaces I of each circuit Po, P1, P2, P3 are of course identical. Fig. 8 shows the circuit P2 in fig. 6 generally configured as a quaternary tree, but furthermore such that for instance the circuits P1, that are the nodes on the lev^l S1, are mutually connected with a kernr! prcice-.sor PQ, in the neighbour tree on the level So. Generally will the logic unit E in one of the nodes P1, P2 ... Pn nut uniy diiuw d cuimcttion upwards and downwards in the subtree wherein the node is locdtcd, buL also a v.onnettion to nodes in neighbour trees within one and the same level or a connection to the nearest-lying node on an underlying level In a neighbour tree. For the circuit In fig 6 the node P1 in a subtree hence can be connected to the first leaf node Po in the neighbour tree etc. Correspondingly Fig. 9 shows connections in the circuit in Fig. 7, which forms a quaternary tree with four levels and a total of 64 kernel processors. Here again a node on a given level can be connected to the node in the neighbour tree on the same level and to the nearest adjacent nodes in the neighbour tree on an arbitrary underlying level. This implies that the general tree structure can be configured via the logic element into more or less complex graphs. An example is shown in fig, 10, where the circuit PT in fig. 6 or fig. 8 is configured such that P2 in reality forms a one-dimenS1onal graph where the S1xteen kernel pcocessors PQ are . connected sequentially. Usually the procesS1ng device according to the invention will be realized as a dedicated procesS1ng device for a determined task, for instance image procesS1ng. With baS1s in the general tree structure the configuring may take place task-specifically, either operatively or phyS1cally. If the configuring takes place operatively, the circuit is via the interface supplied with configuring parameters which realize the appropriate network for the procesS1ng. The configuring may then very well imply that the circuit is configured with fewer kernel processors than those which actually are provided in the circuit The procesS1ng device according to the invention may then appear as a symmetrically reduced tree, such this is shown in fig. 3, or an asymmetrically reduced tree, such this is shown in fig. 4. For some concrete data procesS1ng tasks it can namely be required a greater number of kernel processors in one or more subtrees in order to increase the capacity of the procesS1ng device for a specific task. As mentioned, the procesS1ng device according to the invention can also appear as a phyS1cally reduced tree, whether it is symmetrically or asymmetrically reduced. This implies that the phyS1cal circuit solution actually is realized with a smaller number of kernel processors that which is given by the number of levels S. in the tree and the degree k. This may be appropriate for providing a circuit technology solution which optimally exploit the real estate conditions on the chip. If the number of kernel processors in a dedicated procesS1ng device according to the invention is reduced phyS1cally, this implies for instance a reduction in the phyS1cal S1ze of the chip or a better exploitation of the real estate of the chip, dependent on the actual architectural solution which is selected. The condition is of course that under every circumstance the task which is asS1gned to the procesS1ng device, is executed. The circuit PQ defines a zeroth unnested level in the procesS1ng device P and constitutes a kernel processor in P. The kernel processor P0 is generally shown in fig. I la and comprises a combinatorial unit C connected with a memory unit M. The interface Ip in the kernel processor P0 which is of course is identical with the interface I of the remaining circuits P1, ... ?„ is on the S1de of P0 schematically indicated with arrows denoted "In" and "Out" for respectively the input S1de and the output S1de. The combinatorial unit C receives on the input S1de data in form of operative parameters which comprise the data which shall be processed. In additon the input S1de also supplies data in the form of configuring parameters for the kernel processor PQ. These configuring parameters may conS1st of instructions as well as setting parameters for subelements in the combinatorial unit C and the memory unit M. If the data which shall be processed are in the form of structures with repeated sequences and/or nested patterns and the procesS1ng of the data is specifically directed towards determined sequences and patterns with use of a comparatively limited set of instructions, the memory unit may advantageously wholly or partly be configured as a register unit R, as each register unit may comprise a plurality of registers with respectively fixed length and address, dependent on the operations which shall be executed in the procesS1ng. The output S1de of the interface delivers the results of the procesS1ng, for instance to a not shown external in/out interface for the whole procesS1ng device P„ or delivers the results to other kernel procesS1ng units P0 in the circuits P1 or to the input interfaces in circuit P2, Pj, ••• Pn on the overlying levels S2, S3....Sn A specific embodiment of the circuit P\| for use in a search processor device is shown somewhat schematically, but with all main components in fig. l1b. The interface is shown respectively as an input interface Iin and an output interface lom- The input interface In„ is connected with a multiplexer MUX with outputs respectively to the output interface lout, a comperator unit COM which corresponds to the kernel processor PQ, a first D flip-flop and a sequence control unit SC. The comparator unit COM is used in a search operation for comparing characters or strings of characters which form given patterns. In this manner it can be determined whether a given pattern can be retrieved in a form of one or more elements in a set created of different patterns. The comperator unit COM delivers the received data on to the output interface lom and the result of the comparison to a latency unit LAT which can be used for delaying a poS1tive binary value for a given number of cycles. The latency unit LAT is hence via the D flip-flop D\ connected with an output on the multiplexer MUX and itself delivers the result of the comparison further to the sequence control unit SC which is adapted respectively to require that the kernel processor connected before or after the present one reports a match before the present kernel processor itself can report a match, to allow a comparison of patterns in case certain parts of the pattern are misS1ng and additonally a comparison of patterns where certain parts of the pattern may be repeated. The sequence control unit SC is on the input S1de connected with the D flip-flop D1 and on the output S1de with the D flip-flop D2, the number of flip-flops being identical to the number of characters which are compared with the comperator unit COM. The sequence control unit SC must be able to receive results from a comparison operation in preceeding or succeeding processor units and is hence connected with the input interface Ijn and the multiplexer MUX and itself reports the results of the comparison operation further via output interface lout- Finally the circuit P1 in this embodiment also comprises a result selector RS which is used for combining two result values from other kernel processors or other result selectors. The result selector RS is thus connected with respectively the input interface Ijn and the output interface loui and can perform Boolean operations on two result values or compare operations of the type greater than, less than, or equal to over a set of processors. In a specific embodiment a digital procesS1ng device P can be adapted to different types of numeric computation in masS1ve parallelism and the combinatorial unit C and a register unit R will then be configured as arithmetic logic unit ALU. Each kernel processor PQ can additionally comprise a memory unit.M configured as RAM and hence implement an intelligent RAM (IRAM) such that latency problems are reduced or eliminated in a data procesS1ng of this kind. On each nested level S1, S2, ...Sn there is additionally provided a further circuit in the form of a logic unit E which in itself may be a processor unit conS1sting of a combinatorial unit C and a register unit R. The logic unit E is generally identical for all circuits P1, P1,... Pn. For example, in the circuit P the logic unit E corresponds to all components in fig. 1 lb with the exception of the interface I and the comparator (kernel processor) COM. In a regular binary tree the circuit P1 in fig. 11 b will of course have two kernel processors, that is two connected comparator units COM. Advantageously the circuit E on each level S1, S1, ... Sn then functions as a procesS1ng unit with the puqjose of configuring the network on the respective level and posS1bly underlying levels where the circuit E is located in each node in the network. By supplying the circuit E which suitable network-configuring parameters, a network on a given level can be configured optimally for^he procesS1ng task which shall be executed. The logic unit E is adapted to be connected with nodes on the same or underlying levels in neighbour trees and can with baS1s in a regular tree structure transform this into more or less complex graphs and networks in order to better handle task-specific data volumes and data rates, for instance in problems which do not lend themselves to a balanced partitioning. The combinatorial unit in the kernel processor PQ realized as a comparator unit COM and used in a search processor device is shown in fig. 12a. A register x contains the dataword which momentarily is located in the comparator unit COM and the register a comprises the dataword which x shall be compared to. It is from fig. 12a eaS1ly seen that four comparisons can be performed, viz. x = a, x > a, x ?i a or x If the procesS1ng device according to the present invention shall be implemented as a search processor device, an unreduced and balanced binary tree may be used. For each inner node in the tree, that is all nodes with exception of the root node and the leaf nodes, a multiplexer is used for deciding whether two subtrees shall receive the same data elements in parallel and whether they shall receive data sequentially. This corresponds to connecting nodes on two different levels in neighbour trees, as already mentioned. Fig. 12c shows a search processor device realized as a binary tree with 8 kernel processors, here shown as circuits 0, I, ... 7. In this connection it shall be remarked that when the results are fetched from two circuits or two kernel processors, these will be combined with the use of logic operations. This takes place in the result selector RS. The logic unit E in each circuit comprises as shown in fig. 1 lb a result selector RS which also allows that only one of the results is selected. The result selectors used in a binary tree will in other words give a S1ngle result from a group of processor units. S1multaneously the search procesS1ng device in the form of a binary tree shall also fetch the results from a comparison of characters or oh.aracter strings executed before or after the comparison of the character or the character strings which for the moment are processed by the kernel processor or the comperator unit COM. By uS1ng a sequence control unit in the logic unit the tree can be configured such that a comparison operation can be performed by chooS1ng the results either to the right or to the left for any of the elements which compare different characters or character strings. When a plurality of comparator units additionally compares the same characters or character strings, will their preceding and succeeding results be the same. These results may be found by the same multiplexer MUX which controls the data stream being used, such that a comparison result is communicated further to any of the neighbour elements which compare the same characters or character strings. This is also realized in the search processor device as shown in fig. 12c. In this connection it shall be remarked that the same multiplexer set now is used for three separate purposes, namely for distributing data, for routing memory bits through the number of flip-flops and to provide or supply the results from comparisons of proceeding or following characters or character strings. The results from different kernel processors can however be combined by uS1ng a look-up table unit which can be connected between the subtrees and the output of the search processor device. The look-up table unit will then replace the result selectors RS shown in fig. 12c. The neste.d and posS1bly recurS1vely generated procesS1ng device P according to the invention can be used for realizing task-specific procesS1ng devices for procesS1ng large data volumes in masS1ve parallelism. Examples of such tasks may be searching in large data volumes structured in the form of documents, where the search takes place in S1multaneous parallel in all or a large number of documents and can for instance be directed towards retrieval, identification and comparison of patterns. Further the procesS1ng device according to the invention can for instance be configured for image procesS1ng, encryption and decryption as well as procesS1ng based on a recurS1ve procedures generally. Applied as purely numerical processor, the procesS1ng device according to the invention could be adapted for factorisation. The kernel processor"s P0 can as mentioned be based on standard microelectronic components and these may for instance be implemented as a so-called field programmed gate array (FPGA) or an application specific integrated circuit (AS1C), in both cases with a varying scale of integration. As the procesS1ng device according to the invention can be scaled in several dimenS1ons, for instance with regard to procesS1ng capacity or connectivity, the realization is however not bound to a determined technology or a determined technological level, but may in practice be implemented based on the at any time available technological level, such that the performance of the procesS1ng device can be improved in step with the technological development. In principle the procesS1ng device P according to the present invention could be realized with arbitrary degree k and an arbitrary number n+I levels S. In each case the number of kernel processors will be given by k". In reality both k and n will be limited by the microelectronic technology which is available for realizing procesS1ng device P and particularly then the available integration scale. Dependent of both technology and the specific procesS1ng tasks which procesS1ng devices P shall be realized for, the chosen values for k an n can be made the object of a balanced judgement, for instance dependent on the need for parallelism and degree of partitioning. In a practical realized implementation the procesS1ng device Pn for instance is realized with k = 8 and n = 3, that is 512 kernel processors ?Q on the level SQ. By uS1ng standard microelectronics based on a conventional inorganic semiconductor technology, for instance S1licon technology, the connectivit>", that is theinterconnection"of the circuits in a network may offer problems in the form of complexity and long S1gnal paths if the procesS1ng device P is realized integrated in a substantial planar technology on a substrate. In principle the network however is reconfigurable, that is that a network configured for one task may be configured in to another network, for instance with the use of suitable configuring parameters via the logic unit E. For certain tasks within masS1ve parallel procesS1ng it may however be deS1rable wirh a higher degree of connectivity and ideally with total connectivity, that is every node being mutually connected. In this case the degree of connectivity for m nodes scales as m"/2 and shall even for a limited number m, for instance 512 be difficult to realize with use of conventional circuit topologies. A posS1bility for realizing a circuit topology is to stack circuits based on organic thin-film technology above each other in a volumetric configuration and uS1ng both vertical and horizontal conducting structures for generating connections between the separate nodes or circuits, such that a three-dimenS1onal network topology is obtained which will allow a very high degree of connectivity. Such technologies are disclosed in NO patent application 19982518 (Gudesen, Leistad & Nordal) which is directed towards an integrated scalable data procesS1ng device with MIMD architecture and realized as a volumetrically structured unit, i.a. with optimal network topologies and high degree of connectivity for masS1ve parallel procesS1ng. "We clalm; 1. A digital processing device P, particularly for processing of digital data and signal structures, wherein the data and signal structures comprise repeated sequences and/or nested patterns, and wherein the processing device P generally is configured as a regular tree with n+ 1 levels So, S1 Sn and of degree K characterized In that the processing device P is provided in the form of a circuit Pn on the level Sn and forms the root node of the tree, that the nearest level Sn-i is provided nested in the circuit Pn and comprises k circuits Pn-1 which form the child nodes of the root node, that generally an underlying level Sn-q in the circuit Pn, where q £ { 1 ,2..n-l}, comprises kq circuits Pn-q provided nested in the kq-1 circuits Pn-q+1 on the overlying level Sn-q+1 each circuit Pn-q+1 on this level comprising k circuits Pn-q, that a defined zeroth level Sn-q = So In the circuit Pn for q = n n 1 comprises from k" +1 to kn circuits PQ which constitute kernel processors in the processing device P and on this level So form leaf nodes in the tree, the kernel processor Po being n 1 provided nested in a number of 1 to k in each of the k circuits P1 on the level S1 that each of the circuits Pi, P2 ...Pn on respective levels S1, S2 ... Sn comprises a logic unit E which generally is connected with those circuits Po, P1 ... Pni on the nsspective nearest underlying level So, Si Sn -1 provided nested in the respective circuits Pi, P2, ...Pn and according to choice configures a network of the former circuits in the respective circuits Pi, P2—Pn, and that each of the circuits Po, Pi Pn has Identical interfaces I. 2. A digital processing device as claimed in claim 1, wherein the zeroth level So comprises KN kernel processors Por that a kernel processor Po recursively maps a circuit P1 on the overlying level with a mapping factor r = k, such that the tree is a unreduced or complete tree, and that generally a circuit Pn-q on the level Sn-q maps recursively a circuit Pn-q+i on the overlying level Sn-q+i with the mapping factor r = k. 3. A digital processing device P as claimed in claim 1, wherein the zeroth level So comprises rkn-1 kernel processors Po, 1 4. A digital processing device P as claimed in claim I, wherein respectively from I to k kernel processors are provided nested in each circuit P1 on the level S1 that at least one of the circuits P1 comprises at least 2 and at most k-l kernel processors Po, such that the total number of kernel processors Po on the level So is at least kn-1+land at the most kn -1 and the tree becomes an asymmetrically reduced or unbalanced tree, and that generally a circuit Pn-q on the level Sn-q is mapped by the circuits Pn-q1i nested in the respective circuit Pn-q. 5. A digital processing device P as claimed In claim 1, wherein the kernel processor Po at least comprises one combinatorial unit C and a memory unit M connected with at least one combinatorial unit C. 6. A digital processing device P as claimed in claim 5, wherein at least a part of the memory unit M is configured as a register unit R. 7. A digital processing device P as claimed in claim 6, wherein at least one combinatorial unit C and a register unit R are configured as an arithmetic logic unit ALU. 8. A digital processing device P as claimed in claim 1, wherein the logic unit E comprises at least one combinatorial unit C and a register unit R connected with the at least one combinatorial unit C. 9. A digital processing device P as claimed In claim 8, wherein at least one combinatorial unit C is a multiplexer. 10. A digital processing device P as claimed in claim 9, wherein the logic unit E in a circuit Pn-q is adapted to be connected with the logic unit E in a corresponding circuit Pn-q on the same level Sn-q in a neighbour tree. 11. A digital processing device P as claimed in claim 9, wherein the logic unit E in a circuit Pn-q, is adapted to be connected with the logic unit E in circuits Pn-q-i, P n-q-2, ... P1 on respective underlying levels Sn-q-1, Sn-q-2,... S1 in a neighbour tree. 12. A digital processing device P as claimed in claim 11, wherein the logic unit E in a circuit Pn-q Is adapted to be connected with one or more kernel processors Po In a neighbour tree, either directly or via the logic unit E in the circuit PI, where the kernel processor PO or the kernel processors Po in question are nested. 13. A digital processing device substantially as herein described with reference to the accompanying drawings.

Full Text

Digital procesS1ng device
The invention concerns a digital procesS1ng device P, particularly for procesS1ng of digital data and S1gnal structures, wherein the data and S1gnal structures comprise repeated sequences and/or nested patterns, and wherein the procesS1ng device P generally is configured as a regular tree with n+l levels So, S1, ... Sn and of degree k.
ProcesS1ng of large data volumes with use of repeated or recurS1ve operations on very large data volumes can even in a restricted number often be a bottleneck when uS1ng conventional microprocessors and is thus amenable to masS1vely parallel solutions wherein a very large number of procesS1ng elements S1multaneously execute different operations in parallel on a large data stream, but posS1bly also parallel operations on several data streams. If such large data volumes appear in a form of data or S1gnal structures with repeated sequences and/or nested patterns, the procesS1ng can be made more effective by being realized in parallel on the same or several different levels
From US Patent No. 486020 (Stolfo & al.) there is known a parallel procesS1ng device structured as a binary tree, wherein a very large number of processors, each with their own I/O unit are used. Generally Stolfo & al. discloses a computer with a very large number of processors connected in a binary tree structure such that each processor apart from those which constitute the root and the leaves of the tree has a S1ngle parent processor and two child processors. The processors tyP1cally work synchronously with data which are transmitted thereto from the parent processor and communicate the results on to the nearest succeeding processors, that is the parent processors" children. S1multaneously the child processors and the parent processor may also communicate with each other. According to Stolfo & al. each node constitutes a procesS1ng element which comprises a processor in the proper sense, a read/write memory or a random access memory, and an I/O device. The I/O device provides interfaces between each procesS1ng element and its parent and child procesS1ng elements such that a substantial improvement in speed whereby data are sent through the binary tree structure is obtained. As the binary tree structure has a procesS1ng element in every S1ngle node, the procesS1ng device will generally comprise 2" - 1 procesS1ng elements, that is 1023 procesS1ng elements if the binary tree is realized with 10 levels. In a preferred embodiment the known parallel procesS1ng device has a clock

frequency of 12 MHz, which in the case of uS1ng a tree with 1023 processors which each has an average instruction cycle time of 1.8 μs, provides a procesS1ng performance of about 570 million instructions per second.
A binary parallel processor of this kind may for instance be well suited for handling decomposable or partitionable data procesS1ng problems, for instance searching in large information volumes. A partitionable search problem can be defined as a problem where a query about a relation between an object x and an object set corresponds to a repeated use of a commutative and associative binary operator b which has an identity, and a primitive query which is applied between a new object x and each element f in the set F. One then has a partitionable search problem when the logic function OR is combined with the primitive query " is x = f " applied between the object x and each element f itf. As mentioned by Stolfo & al., a problem which conS1sts of answering a query about set F, may be answered by combining the answers of the queries applied to arbitrary subsets of F. The problem is in other words partitionable or decomposable and well suited for raP1d execution by means of parallel procesS1ng. The set F is partitioned in a number of arbitrary subsets equal to the number of available processors. The primitive query q is then applied in parallel in each processor between the unknown x which is communicated to all processors and the locally stored element fin the set F. The results are then combined in parallel by logzN repetitions of the operator b, as a number of computations first is executed on N/2 adjacent pairs of processors and then a corresponding number of computations on N/4 pairs of processors with the results from the first computations. The operations hence move during the process to overlying levels in the binary tree, in other words from child processors to the parent processor etc. and are repeated in parallel on each level.
There is however a number of data procesS1ng problems wherein the data and S1gnal structures comprise repeated sequences and/or nested patterns which are such that a procesS1ng device of the kind that is disclosed in US Patent No. 4860201 does not provide the deS1red flexibility or may not at all be suited for handling the problem. A binary tree structure as disclosed therein presupposes in principle that the problem can be binary partitioned and that operations take places in parallel on the same level. However, there may be problems which demand another degree of decompoS1tion and where procesS1ng must be able to take place in parallel, but on different levels in the

tree structure. The problems can also be partitioned such that it will be deS1rable with a larger partitioning capacity on one and the same level in some of the subtrees in the tree structure, and this in practice requires solutions which takes its starting point in a general tree structure which not only has an arbitrary number of levels, but also arbitrary degree, while nodes in subtrees not are only connected with the parent node of the tree in question, but for Instance may be connected to a node on the same or underlying levels in neighbour trees. An increased degree of connectivity in a tree structure with a deS1red number of levels and of arbitrary degree will hence make it posS1ble to reconfigure the original tree structure, either in the form of reduced trees or S1mple or complex graphs. S1multaneously can one or more of the leaf nodes be combined and take over the function of the parent node In question.
From US-A-5561768 there is known a masS1vely-parallel computer system including a number of processors and a plurality of control processors. An arrangement is provided for partitioning the processors such that a series of levels in a control network tree structure is formed.
The object to the present invention is thus to provide a procesS1ng device which particularly suited for procesS1ng large data volumes in masS1ve parallelism and on different levels in a general tree structure, but which S1multaneously also can be configured arbitrarily as nested circuits on different levels and preferably under determined conditions such that a selected configuration on the given level is generated recurS1vely by a configuration on an underlying level. Particularly it is the object that the procesS1ng device according to the invention shall be able to realize an MIMD procesS1ng device, that is a procesS1ng device which works with multiple instructions and multiple data.
The above-mentioned and other objects are obtained according to the invention with a digital procesS1ng device particularly of procesS1ng of digital date and S1gnal structures, wherein the data and the S1gnal structure comprise repeated sequences and/or nested patterns, and wherein the procesS1ng device P is generally configured as a regular tree with
n+1 levels So, S1 Sn and of device k which is characterized in that the procesS1ng device
P Is provided In the form of a circuit Pn on the level Sn and forms the root node of the tree, that the nearest level Sn -1 is provided nested in the circuit Pn and comprises k circuits Pn-1 which form the child nodes of the root node, that generally an underlying level Sn-q in the circuit Pn, where q £ { 1,2. .n-l }, comprises kq circuits Pn-q, provided nested in the kq-1 circuits
Pn-q+1 on the overlying level Sn-qfi. each circuit Pn-q+1 this level compriS1ng

k circuits Pn-q, that a defined zeroth level Sn-q = S© in the circuit Pn for q = n comprises from kn-1 + 1 to k" circuits Po which constitute kernel processors in the procesS1ng device P and on this level So form leaf nodes In the tree, the kernel processor Po being provWed nested in a number of 1 to k in each of the kn-1 circuits P1 on the level S1, that each of the
circuits P1, P2 Pn on respective levels S1, S2 Sn comprises a logic unit E which
generally is connected with those circuits Po, P1 P-n on the respective nearest underlying
level So, S1 Sn-1 provided nested in the respective circuits P1, P2, ,..Pn and according to
choice configures a network of the former circuits In the respective circuits P1, P2,..Pn, and that each of the circuits P1....Pn has identical interfaces I.
Advantageously a first embodiment of the procesS1ng device according to the invention is characterized in that the zeroth level So comprises k" kernel processors Po, that a kernel processor Po recurS1vely maps a circuit P1 on the overlying level with a mapP1ng factor r = k, r Further a second embodiment of the procesS1ng device according to the Invention is advantageously characterized in that the zeroth level So comprises rk?""kernel processors Po, 1 Finally a second embodiment of the procesS1ng device according to the invention is advantageously characterized in that respectively from 1 to k kernel processors are provided nested in each circuit P1 on the level S1, that at least one of the circuits P1 comprises at least 2 and at most k-l kernel processors PQ, such that the total number of kernel processors PQ on the level So is at least k +1 and at the most k " and the tree becomes an asymmetrically reduced or unbalanced tree, and that generally a circuit P n-q on the level Sn-q is mapped by the circuits Pn-q-1 nested in the respective circuit Pn-q
According to the invention the kernel processor P advantageously comprises at least one combinatorial unit C and a memory unit M connected with at least one combinatorial unit C, at least a part of the memory unit M preferably being configured as a register unit R. In the latter case can preferably then at least one combinatorial unit C and

a register unit R connected with the at least one combinatorial unit, the at least one combinatorial unit C preferably being a multiplexer. It is then advantageous that the logic unit E in a circuit Pn.q is adapted to be connected with the logic unit E in a corresponding circuit Pn-q on the game level as Sn.q in a neighbour tree.
It is also advantageous that the logic unit E in a circuit Pn.q is adapted to be connected with the logic unit E in circuits Pn-q-i, Pn-q-2 ••• P1 on respective underlying levels Sn.q-i, Sn.q.2, ... S1 in a neighbour tree.
Finally it is also advantageous that the logic unit E in a circuit Pn.q is adapted to be connected with one or more kernel processors PQ in a neighbour tree, either directly or via the logic unit E in the circuit P1, where the kernel processor PQ or the kernel processors PQ in question are nested.
The invention shall now be explained in more detail in connection with exemplary embodiments and with reference to the accompanying drawing, wherein
fig. 1 shows a first embodiment of the procesS1ng device according to the invention,
fig. 2 the embodiment in fig. 1 configured as a regular tree structure,
fig. 3 a second embodiment of the procesS1ng device according to the invention configured as a tree structure,
fig. 4a a third embodiment of the procesS1ng device according to the invention configured as an asymmetrically reduced tree,
fig. 4b an example of a subtree with leaf nodes in the tree structure which represents the procesS1ng device according to the invention,
fig. 4c another example of a subtree with leaf nodes in the tree structure which represents the procesS1ng device according to the invention,
fig. 5 schematically the principle for generating the procesS1ng device according the invention as a nested circuit by recurS1ve mapP1ng,
fig. 6 an embodiment of the procesS1ng device according to the invention as a recurS1ve circuit with three levels and degree four,

fig. 7 another emboditaent of the procesS1ng device according to the invention as a recurS1ve circuit with four levels and S1milarly degree four,
fig. 8 posS1ble connections between the nodes in a tree corresponding to the circuit in fig. 6,
fig. 9 posS1ble connections between the nodes in a tree corresponding to the circuit in fig. 7,
fig. 10 the circuit in fig. 6 or fig. 8 connected as a one-dimenS1onal graph,
fig. 1 la schematically an embodiment of a kernel processor in the procesS1ng device according to the invention,
fig. 1 lb schematically an embodiment of a circuit with a logic unit and a kernel processor in the form of a comparator unit for use in a search processor device based the procesS1ng device according to the invention,
fig. 12a the comparator unit in fig. 1 lb,
fig. i2b the connection of comparators S1milar to the one in fig. 12a in order to compare a pattern of for instance 3 characters, and
fig, 12c a balanced binary tree realized with circuits corresponding to the one in fig. lib.
Fig. 1 shows symmetrically the layout of a parallel procesS1ng device according to the present invention. Formally the scheme in fig. 1 can be described as follows: Given a circuit P0 with an interface Ip0 and a further
circuit E, there can be created a repeated sequence of circuits P1, P2, P3, with respective interfaces Ip1,Ip2 ••• Ipn each compriS1ng respectively k circuits P0, P2, ... Pn-i and a S1ngle furter circuit E, the interfaces I being identical for all circuits PQ, P|, ... ?„, in other words Ip^ = Ipj = Ip, ... = Ip^ . In each case the circuit P( is configured as a tree network with n+1 levels S and of degree k. Specifically fig. 1 shows a digital procesS1ng device in the form of the circuit P4 which forms a tree with five levels S and of degree 3. This implies as shown in fig. 1 that the circuit P4 which is located on the level S^, on the underlying level comprises three circuits P3. Each of the circuits P3 on the level S3 is generated with circuits P1 on the underlying level S1, and each of the circuits P2 the level 82 is generated with three circuits P1 on the underlying level S1. Finally, each of the circuits P1 on the level S1 is

generated with three.circuits PQ on the level SQ. The circuits PQ form the proper procesS1ng units in the procesS1ng device P4 and constitute the kernel processors of P4. As the circuit P4 as shown in fig. 2 is structured as a ternary tree, this implies that the level S0 comprises a total of 3 =^81 kernel processors PQ, the overlying level S| 27 circuits P1 and the following level S7 9 circuits P2. Finally the level S3 has 3 circuits P3. Each of the circuits PQ, P1, P2, P3 constitutes the nodes in the tree in fig. 2. The kernel processors PQ constitute the leaf nodes of the tree and the circuit P4 the root node of the tree. Structured in this way the tree in fig. 2 appears as a recurS1ve ternary tree, as a circuit on an arbitrary level S maps the circuit on the nearest following overlying level recurS1vely with a mapP1ng factor r which in this case is 3. Each of the circuits Pj, P2, P3, P4, additionally comprises a logic unit E, such that the logic units E in all circuits P|, P2, P3, Pn phyS1cally regarded constitute the nodes of the tree. By means of the logic unit E can each node of the tree generally be connected with its parent node or its child nodes. Through the logic unit E can also each of the child nodes of the node be mutually connected. Each of the kernel processors which constitutes the leaf nodes of the tree also has respective circuits Pj as parent nodes and can via these be connected mutually. As per se known is the problem which the procesS1ng device according to invention shall process, partitioned and the procesS1ng executed on the level SQ in parallel by the kernel processors PQ. The results from the kernel processors PQ are combined and communicated further by the respective parent node P1. This process is repeated until the fmal result is present on the circuit P4.
Both the kernel processor PQ and the logic unit E shall be discussed a somewhat greater detail in the following.
However, the circuit P4 does not need to be realized as a complete or regular recurS1ve tree. It may well be generated as a symmetrically reduced tree and this implies that the number of leaf nodes or number of kernel processors PQ connected with the nodes P1, is reduced such that each circuit P| comprises 2 or more leaf nodes or kernel processors PQ, but in each case less then k kernel processors PQ. AS the circuits in fig. 1 has k = 3, this implies that the symmetrically reduced tree as shown in fig. 3, comprises two kernel processors PQ for each circuit P|. The symmetrically reduced tree on fig. 3 is in other words not regularly recurS1ve, the recurS1on first from level S| taking place with a mapP1ng factor 3. Generated as in fig. 1, each circuit P| in the

tree in fig. 3 will in addition to the logic unit E comprise only two kernel processors ?P0 The recurS1ve mapP1ng of circuits on an underlying level to an overlying level implies evidently that these circuits will be nested in a circuit in the overlying level and the whole circuit P4 appears as a^ested circuit.
If the circuit is an asymmetrically reduced circuit, it may appear as shown in fig. 4a. Here some of the nodes P1 regularly have three leaf nodes PQ, other nodes P1 only two leaf nodes Po and yet other nodes P| only a S1ngle leaf node Po- In the latter case the mapP1ng of the parent node becomes an identity mapP1ng thereof, and for the sake of that the logic element then can be left out in the circuits P1. If all circuits P1 comprise only a S1ngle kernel processor Po, the tree which generates the circuit P4 will appear as completely reduced, as P1 = Po and the circuit P4 is transformed into a regular recurS1ve circuit P3 with four levels and 27 kernel processors Poon the level SQ. IS thus the number of levels n+1, a circuit Pn must have at least k""" +1 kernel processors on the level SQ in order not to appear as a completely reduced tree.
A subtree in a maximally asymmetrical reduced tree is shown in fig. 4b, this subtree compriS1ng a node P1 with child nodes P1, such that only one of nodes P1 has two kernel processors or leaf nodes Po, while all of the remaining nodes P1 in the tree only has a S1ngle leaf node PQ, such that P1 in these cases is reduced to PQ. With five levels or responding to the circuit in fig. 1, a circuit P4 configured as in fig. 4b, that is with two leaf nodes in a circuit P| and otherwise only one leaf node in each of the remaining nodes P1, would appear as a maximally asymmetrical reduced tree and the number of nodes on the level SQ in other words becomes 3""+l=28. Correspondingly a subtree for the node PT in a minimally asymmetrical reduced tree is shown in fig. 4c. Here a S1ngle node P1 comprises two leaf nodes or kerne! processors PQ, while all the remaining nodes P| in the tree as before comprise three leaf nodes PQ. This of course implies that a minimally asymmetrical reduced tree corresponding to the circuit on fig. 1 in this case shall comprise a total of 80 kernel processors PQ. A circuit which is generated as the tree shown in fig. 4, will of course still appear as nested on all underlying levels S1, S2, S3, but the mapP1ng will no longer be recurS1ve, as a circuit P1, P1, P3, P4, will be mapped by all circuits which are located on the respective underlying level So, S1, S2, S3. The advantages of generating the circuit P4 as a symmetrically or asymmetrically reduced tree shall be discussed in more detail in the following.

Fig. 5 shows how a recurS1ve self-mapP1ng procesS1ng device Pn according to the invention is generated. The circuit Pn is realized as a tree with n+ I levels and of degree k = 4. Each
of the circuits P1, P2 Pn-i is nested in the respective circuits P1, P2 PnOn the nearest
overlying level S9, s1..... Sn. The number of nested levels Is then of course n and the circuit Pn shall on the zeroth level So defined by the kernel processor Po comprise a total of kn kernel processors of this kind. Generally shall the number of circuits Pn-q on a level Sn-q, where q = 0,1,2,...n, be equal to k".
Fig. 6 shows a nested circuit P2 generated as a recurS1ve tree according to the principle in fig. 5. The circuit P1 comprises the logic unit E and nested on the underlying level S1 four circuits P1 which each In turn comprises the logic unit E and nested on the underlying level four circuits or kernel processors Po. The logic units configure the circuit P2 as a quaternary
tree with three levels So, S1 Sn, and a total of S1xteen kernel processors Poor leaf nodes
on the level So. Correspondingly fig. 7 shows a sequentially generated circuit with four levels Sand k = 4. The circuit In fig. 7 comprises in other words a quaternary configured tree with a total of 64 kernel processors Po on the level So. As before the interfaces I of each circuit Po, P1, P2, P3 are of course identical.
Fig. 8 shows the circuit P2 in fig. 6 generally configured as a quaternary tree, but furthermore such that for instance the circuits P1, that are the nodes on the lev^l S1, are mutually connected with a kernr! prcice-.sor PQ, in the neighbour tree on the level So. Generally will the logic unit E in one of the nodes P1, P2 ... Pn nut uniy diiuw d cuimcttion upwards and downwards in the subtree wherein the node is locdtcd, buL also a v.onnettion to nodes in neighbour trees within one and the same level or a connection to the nearest-lying node on an underlying level In a neighbour tree. For the circuit In fig 6 the node P1 in a subtree hence can be connected to the first leaf node Po in the neighbour tree etc.
Correspondingly Fig. 9 shows connections in the circuit in Fig. 7, which forms a quaternary tree with four levels and a total of 64 kernel processors. Here again a node on a given level can be connected to the node in the neighbour tree on the same level and to the nearest adjacent nodes in the neighbour tree on an arbitrary underlying level.

This implies that the general tree structure can be configured via the logic element into more or less complex graphs. An example is shown in fig, 10, where the circuit PT in fig. 6 or fig. 8 is configured such that P2 in reality forms a one-dimenS1onal graph where the S1xteen kernel pcocessors PQ are . connected sequentially.
Usually the procesS1ng device according to the invention will be realized as a dedicated procesS1ng device for a determined task, for instance image procesS1ng. With baS1s in the general tree structure the configuring may take place task-specifically, either operatively or phyS1cally. If the configuring takes place operatively, the circuit is via the interface supplied with configuring parameters which realize the appropriate network for the procesS1ng. The configuring may then very well imply that the circuit is configured with fewer kernel processors than those which actually are provided in the circuit The procesS1ng device according to the invention may then appear as a symmetrically reduced tree, such this is shown in fig. 3, or an asymmetrically reduced tree, such this is shown in fig. 4. For some concrete data procesS1ng tasks it can namely be required a greater number of kernel processors in one or more subtrees in order to increase the capacity of the procesS1ng device for a specific task.
As mentioned, the procesS1ng device according to the invention can also appear as a phyS1cally reduced tree, whether it is symmetrically or asymmetrically reduced. This implies that the phyS1cal circuit solution actually is realized with a smaller number of kernel processors that which is given by the number of levels S. in the tree and the degree k. This may be appropriate for providing a circuit technology solution which optimally exploit the real estate conditions on the chip. If the number of kernel processors in a dedicated procesS1ng device according to the invention is reduced phyS1cally, this implies for instance a reduction in the phyS1cal S1ze of the chip or a better exploitation of the real estate of the chip, dependent on the actual architectural solution which is selected. The condition is of course that under every circumstance the task which is asS1gned to the procesS1ng device, is executed.
The circuit PQ defines a zeroth unnested level in the procesS1ng device P and constitutes a kernel processor in P.

The kernel processor P0 is generally shown in fig. I la and comprises a combinatorial unit C connected with a memory unit M. The interface Ip in
the kernel processor P0 which is of course is identical with the interface I of the remaining circuits P1, ... ?„ is on the S1de of P0 schematically indicated with arrows denoted "In" and "Out" for respectively the input S1de and the output S1de. The combinatorial unit C receives on the input S1de data in form of operative parameters which comprise the data which shall be processed. In additon the input S1de also supplies data in the form of configuring parameters for the kernel processor PQ. These configuring parameters may conS1st of instructions as well as setting parameters for subelements in the combinatorial unit C and the memory unit M. If the data which shall be processed are in the form of structures with repeated sequences and/or nested patterns and the procesS1ng of the data is specifically directed towards determined sequences and patterns with use of a comparatively limited set of instructions, the memory unit may advantageously wholly or partly be configured as a register unit R, as each register unit may comprise a plurality of registers with respectively fixed length and address, dependent on the operations which shall be executed in the procesS1ng. The output S1de of the interface delivers the results of the procesS1ng, for instance to a not shown external in/out interface for the whole procesS1ng device P„ or delivers the results to other kernel procesS1ng units P0 in the circuits P1 or to the input interfaces in circuit P2, Pj, ••• Pn on the overlying levels S2, S3....Sn
A specific embodiment of the circuit P| for use in a search processor device is shown somewhat schematically, but with all main components in fig. l1b. The interface is shown respectively as an input interface Iin and an output interface lom- The input interface In„ is connected with a multiplexer MUX with outputs respectively to the output interface lout, a comperator unit COM which corresponds to the kernel processor PQ, a first D flip-flop and a sequence control unit SC. The comparator unit COM is used in a search operation for comparing characters or strings of characters which form given patterns. In this manner it can be determined whether a given pattern can be retrieved in a form of one or more elements in a set created of different patterns. The comperator unit COM delivers the received data on to the output interface lom and the result of the comparison to a latency unit LAT which can be used for delaying a poS1tive binary value for a given number of cycles. The latency unit LAT is hence via the D flip-flop D\ connected with

an output on the multiplexer MUX and itself delivers the result of the comparison further to the sequence control unit SC which is adapted respectively to require that the kernel processor connected before or after the present one reports a match before the present kernel processor itself can report a match, to allow a comparison of patterns in case certain parts of the pattern are misS1ng and additonally a comparison of patterns where certain parts of the pattern may be repeated. The sequence control unit SC is on the input S1de connected with the D flip-flop D1 and on the output S1de with the D flip-flop D2, the number of flip-flops being identical to the number of characters which are compared with the comperator unit COM. The sequence control unit SC must be able to receive results from a comparison operation in preceeding or succeeding processor units and is hence connected with the input interface Ijn and the multiplexer MUX and itself reports the results of the comparison operation further via output interface lout- Finally the circuit P1 in this embodiment also comprises a result selector RS which is used for combining two result values from other kernel processors or other result selectors. The result selector RS is thus connected with respectively the input interface Ijn and the output interface loui and can perform Boolean operations on two result values or compare operations of the type greater than, less than, or equal to over a set of processors.
In a specific embodiment a digital procesS1ng device P can be adapted to different types of numeric computation in masS1ve parallelism and the combinatorial unit C and a register unit R will then be configured as arithmetic logic unit ALU. Each kernel processor PQ can additionally comprise a memory unit.M configured as RAM and hence implement an intelligent RAM (IRAM) such that latency problems are reduced or eliminated in a data procesS1ng of this kind.
On each nested level S1, S2, ...Sn there is additionally provided a further circuit in the form of a logic unit E which in itself may be a processor unit conS1sting of a combinatorial unit C and a register unit R. The logic unit E is generally identical for all circuits P1, P1,... Pn. For example, in the circuit P the logic unit E corresponds to all components in fig. 1 lb with the exception of the interface I and the comparator (kernel processor) COM. In a regular binary tree the circuit P1 in fig. 11 b will of course have two kernel processors, that is two connected comparator units COM. Advantageously the circuit E on each level S1, S1, ... Sn then functions as a procesS1ng unit with

the puqjose of configuring the network on the respective level and posS1bly underlying levels where the circuit E is located in each node in the network. By supplying the circuit E which suitable network-configuring parameters, a network on a given level can be configured optimally for^he procesS1ng task which shall be executed. The logic unit E is adapted to be connected with nodes on the same or underlying levels in neighbour trees and can with baS1s in a regular tree structure transform this into more or less complex graphs and networks in order to better handle task-specific data volumes and data rates, for instance in problems which do not lend themselves to a balanced partitioning.
The combinatorial unit in the kernel processor PQ realized as a comparator unit COM and used in a search processor device is shown in fig. 12a. A register x contains the dataword which momentarily is located in the comparator unit COM and the register a comprises the dataword which x shall be compared to. It is from fig. 12a eaS1ly seen that four comparisons can be performed, viz. x = a, x > a, x ?i a or x If the procesS1ng device according to the present invention shall be implemented as a search processor device, an unreduced and balanced binary tree may be used. For each inner node in the tree, that is all nodes with exception of the root node and the leaf nodes, a multiplexer is used for deciding whether two subtrees shall receive the same data elements in parallel and whether they shall receive data sequentially. This corresponds to connecting nodes on two different levels in neighbour trees, as already mentioned.
Fig. 12c shows a search processor device realized as a binary tree with 8 kernel processors, here shown as circuits 0, I, ... 7. In this connection it shall be remarked that when the results are fetched from two circuits or two kernel processors, these will be combined with the use of logic operations. This takes place in the result selector RS. The logic unit E in each circuit comprises as shown in fig. 1 lb a result selector RS which also allows that

only one of the results is selected. The result selectors used in a binary tree will in other words give a S1ngle result from a group of processor units. S1multaneously the search procesS1ng device in the form of a binary tree shall also fetch the results from a comparison of characters or oh.aracter strings executed before or after the comparison of the character or the character strings which for the moment are processed by the kernel processor or the comperator unit COM. By uS1ng a sequence control unit in the logic unit the tree can be configured such that a comparison operation can be performed by chooS1ng the results either to the right or to the left for any of the elements which compare different characters or character strings. When a plurality of comparator units additionally compares the same characters or character strings, will their preceding and succeeding results be the same. These results may be found by the same multiplexer MUX which controls the data stream being used, such that a comparison result is communicated further to any of the neighbour elements which compare the same characters or character strings. This is also realized in the search processor device as shown in fig. 12c. In this connection it shall be remarked that the same multiplexer set now is used for three separate purposes, namely for distributing data, for routing memory bits through the number of flip-flops and to provide or supply the results from comparisons of proceeding or following characters or character strings. The results from different kernel processors can however be combined by uS1ng a look-up table unit which can be connected between the subtrees and the output of the search processor device. The look-up table unit will then replace the result selectors RS shown in fig. 12c.
The neste.d and posS1bly recurS1vely generated procesS1ng device P according to the invention can be used for realizing task-specific procesS1ng devices for procesS1ng large data volumes in masS1ve parallelism. Examples of such tasks may be searching in large data volumes structured in the form of documents, where the search takes place in S1multaneous parallel in all or a large number of documents and can for instance be directed towards retrieval, identification and comparison of patterns. Further the procesS1ng device according to the invention can for instance be configured for image procesS1ng, encryption and decryption as well as procesS1ng based on a recurS1ve procedures generally. Applied as purely numerical processor, the procesS1ng device according to the invention could be adapted for factorisation.

The kernel processor"s P0 can as mentioned be based on standard microelectronic components and these may for instance be implemented as a so-called field programmed gate array (FPGA) or an application specific integrated circuit (AS1C), in both cases with a varying scale of integration. As the procesS1ng device according to the invention can be scaled in several dimenS1ons, for instance with regard to procesS1ng capacity or connectivity, the realization is however not bound to a determined technology or a determined technological level, but may in practice be implemented based on the at any time available technological level, such that the performance of the procesS1ng device can be improved in step with the technological development.
In principle the procesS1ng device P according to the present invention could be realized with arbitrary degree k and an arbitrary number n+I levels S. In each case the number of kernel processors will be given by k". In reality both k and n will be limited by the microelectronic technology which is available for realizing procesS1ng device P and particularly then the available integration scale. Dependent of both technology and the specific procesS1ng tasks which procesS1ng devices P shall be realized for, the chosen values for k an n can be made the object of a balanced judgement, for instance dependent on the need for parallelism and degree of partitioning. In a practical realized implementation the procesS1ng device Pn for instance is realized with k = 8 and n = 3, that is 512 kernel processors ?Q on the level SQ.
By uS1ng standard microelectronics based on a conventional inorganic semiconductor technology, for instance S1licon technology, the connectivit>", that is theinterconnection"of the circuits in a network may offer problems in the form of complexity and long S1gnal paths if the procesS1ng device P is realized integrated in a substantial planar technology on a substrate. In principle the network however is reconfigurable, that is that a network configured for one task may be configured in to another network, for instance with the use of suitable configuring parameters via the logic unit E. For certain tasks within masS1ve parallel procesS1ng it may however be deS1rable wirh a higher degree of connectivity and ideally with total connectivity, that is every node being mutually connected. In this case the degree of connectivity for m nodes scales as m"/2 and shall even for a limited number m, for instance 512 be difficult to realize with use of conventional circuit topologies. A posS1bility for realizing a circuit topology is to stack circuits

based on organic thin-film technology above each other in a volumetric configuration and uS1ng both vertical and horizontal conducting structures for generating connections between the separate nodes or circuits, such that a three-dimenS1onal network topology is obtained which will allow a very high degree of connectivity. Such technologies are disclosed in NO patent application 19982518 (Gudesen, Leistad & Nordal) which is directed towards an integrated scalable data procesS1ng device with MIMD architecture and realized as a volumetrically structured unit, i.a. with optimal network topologies and high degree of connectivity for masS1ve parallel procesS1ng.

"We clalm;
1. A digital processing device P, particularly for processing of digital data and signal
structures, wherein the data and signal structures comprise repeated sequences and/or
nested patterns, and wherein the processing device P generally is configured as a regular
tree with n+ 1 levels So, S1 Sn and of degree K characterized In that the processing
device P is provided in the form of a circuit Pn on the level Sn and forms the root node of the tree, that the nearest level Sn-i is provided nested in the circuit Pn and comprises k circuits Pn-1 which form the child nodes of the root node, that generally an underlying level Sn-q in the circuit Pn, where q £ { 1 ,2..n-l}, comprises kq circuits Pn-q provided nested in the kq-1 circuits Pn-q+1 on the overlying level Sn-q+1 each circuit Pn-q+1 on this level comprising k circuits Pn-q, that a defined zeroth level Sn-q = So In the circuit Pn for q = n
n 1
comprises from k" +1 to kn circuits PQ which constitute kernel processors in the processing device P and on this level So form leaf nodes in the tree, the kernel processor Po being
n 1
provided nested in a number of 1 to k in each of the k circuits P1 on the level S1 that each of the circuits Pi, P2 ...Pn on respective levels S1, S2 ... Sn comprises a logic unit E which generally is connected with those circuits Po, P1 ... Pni on the nsspective nearest
underlying level So, Si Sn -1 provided nested in the respective circuits Pi, P2, ...Pn and
according to choice configures a network of the former circuits in the respective circuits
Pi, P2—Pn, and that each of the circuits Po, Pi Pn has Identical interfaces I.
2. A digital processing device as claimed in claim 1, wherein the zeroth level So comprises KN kernel processors Por that a kernel processor Po recursively maps a circuit P1 on the overlying level with a mapping factor r = k, such that the tree is a unreduced or complete tree, and that generally a circuit Pn-q on the level Sn-q maps recursively a circuit Pn-q+i on the overlying level Sn-q+i with the mapping factor r = k.
3. A digital processing device P as claimed in claim 1, wherein the zeroth level So comprises rkn-1 kernel processors Po, 1
4. A digital processing device P as claimed in claim I, wherein respectively from I to k kernel processors are provided nested in each circuit P1 on the level S1 that at least one of the circuits P1 comprises at least 2 and at most k-l kernel processors Po, such that the total number of kernel processors Po on the level So is at least kn-1+land at the most kn -1 and the tree becomes an asymmetrically reduced or unbalanced tree, and that generally a circuit Pn-q on the level Sn-q is mapped by the circuits Pn-q1i nested in the respective circuit Pn-q.
5. A digital processing device P as claimed In claim 1, wherein the kernel processor Po at least comprises one combinatorial unit C and a memory unit M connected with at least one combinatorial unit C.
6. A digital processing device P as claimed in claim 5, wherein at least a part of the memory unit M is configured as a register unit R.

7. A digital processing device P as claimed in claim 6, wherein at least one combinatorial unit C and a register unit R are configured as an arithmetic logic unit ALU.
8. A digital processing device P as claimed in claim 1, wherein the logic unit E comprises at least one combinatorial unit C and a register unit R connected with the at least one combinatorial unit C.
9. A digital processing device P as claimed In claim 8, wherein at least one combinatorial unit C is a multiplexer.

10. A digital processing device P as claimed in claim 9, wherein the logic unit E in a circuit Pn-q is adapted to be connected with the logic unit E in a corresponding circuit Pn-q on the same level Sn-q in a neighbour tree.
11. A digital processing device P as claimed in claim 9, wherein the logic unit E in a circuit Pn-q, is adapted to be connected with the logic unit E in circuits Pn-q-i, P n-q-2, ... P1 on respective underlying levels Sn-q-1, Sn-q-2,... S1 in a neighbour tree.
12. A digital processing device P as claimed in claim 11, wherein the logic unit E in a circuit Pn-q Is adapted to be connected with one or more kernel processors Po In a neighbour tree, either directly or via the logic unit E in the circuit PI, where the kernel processor PO or the kernel processors Po in question are nested.

13. A digital processing device substantially as herein described with reference to the accompanying drawings.

Documents:

in-pct-2001-0562-che abstract-1.jpg

in-pct-2001-0562-che abstract-2.jpg

in-pct-2001-0562-che abstract.pdf

in-pct-2001-0562-che claims-duplicate.pdf

in-pct-2001-0562-che claims.pdf

in-pct-2001-0562-che correspondence-others.pdf

in-pct-2001-0562-che correspondence-po.pdf

in-pct-2001-0562-che description (complete)-duplicate.pdf

in-pct-2001-0562-che description (complete).pdf

in-pct-2001-0562-che drawings-duplicate.pdf

in-pct-2001-0562-che drawings.pdf

in-pct-2001-0562-che form-1.pdf

in-pct-2001-0562-che form-19.pdf

in-pct-2001-0562-che form-26.pdf

in-pct-2001-0562-che form-3.pdf

in-pct-2001-0562-che form-5.pdf

in-pct-2001-0562-che others.pdf

in-pct-2001-0562-che pct.pdf

in-pct-2001-0562-che petition.pdf

« Previous Patent

Next Patent »

Patent Number

216433

Indian Patent Application Number

IN/PCT/2001/562/CHE

PG Journal Number

13/2008

Publication Date

31-Mar-2008

Grant Date

13-Mar-2008

Date of Filing

20-Apr-2001

Name of Patentee

FAST SEARCH & TRANSFER ASA

Applicant Address

P.O. Box 1677 Vika, N-0120 Oslo,

Inventors:

#	Inventor's Name	Inventor's Address
1	HALAAS, Arne	Overlege Bratts vei 60, N-7026 Trondheim,
2	SVINGEN, Borge	Borkdalen 6, N-7550 Hommelvik,
3	LEISTAD, Geirr, I	Jongsstubben 19, 1337 Sandvika,

PCT International Classification Number

G06F 15/173

PCT International Application Number

PCT/NO1999/000308

PCT International Filing date

1999-10-08

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1	19984746	1998-10-09	Norway