XII Reunión de Trabajo en Procesamiento de la Información y Control, October 16-18, 2007
An integrated circuit realization for a piecewise linear function
Martín Di Federico, Víctor M. Jiménez-Fernández, Pedro Marcelo Julián and Osvaldo Agamennoni†, Luis Hernández-Martínez and Arturo Sarmiento-Reyes‡
†Universidad Nacional del Sur, Bahía Blanca, Argentina. mdife@uns.edu.ar
‡Instituto Nacional de Astrofísica, Óptica y Electrónica, México. luish@inaoep.mx
Abstract— In this paper we present an integrated circuit (IC) realization of a three-dimensional piecewise linear (PWL) function. The IC was designed and fabricated in a standard 0.5 µm CMOS technology. It has three inputs, which can be applied either as analog signals or as 8-bit digital words. The output of the circuit is an 8-bit digital word that represents the value of the PWL function at the three-dimensional input. The chip architecture is programmable: the PWL function is stored in an external 4 kB RAM addressed by a 12-bit word.
Keywords— Piecewise linear function, integrated circuit realization, circuit architecture
I Introduction
In recent papers [1], [2], two different implementations (analog and mixed-signal, respectively) of the PWL approximation technique proposed by Julián et al. have been presented. In particular, the circuit architecture proposed in [2] provides a piecewise-linear input-output relationship based on a weighted sum of the so-called α-functions, which are defined over a domain partitioned into simplices. Each α-function is local, since it is nonzero only over a small number of simplices of the domain. As a consequence, the value of the approximating PWL function at any n-dimensional input vector x can be obtained by combining a limited subset of the basis functions, weighted by their corresponding coefficients [3], [4]. Moreover, all basis functions perform essentially the same operation; two basis functions differ only in the region of the domain over which they operate. Therefore, the evaluation can be done with a single function circuit block and an algorithm that shifts the inputs [5], [6], [7]. For every evaluation point, all nonzero basis functions must be evaluated, weighted and added. This principle has been adopted in the architecture of the IC presented in this paper.
II Mathematical background
Let us consider a domain S subdivided by a simplicial partition H with grid step δ. This partition produces the set of vertices

$$V_S = \{ v \in \mathbb{R}^n : v_i = -1 + m_i \delta,\ i = 1, \dots, n \} \qquad (1)$$

where the $m_i$ are integers with $0 \le m_i \le m$, and $m\delta = 2$. The grid step δ is the size of the division along every coordinate axis.
Also, let us consider the family of PWL functions defined over the simplicial partition. It constitutes a linear vector space $PWL_H[S]$ whose dimension is $q = (m + 1)^n$. Any function $F \in PWL_H[S]$ can be expressed in vector form as

$$F(x) = c^T \Lambda(x) \qquad (2)$$
where $c \in \mathbb{R}^q$ is the so-called vector of parameters, $\Lambda = [\alpha_1, \dots, \alpha_q]^T$, and $\alpha_i$, for $i = 1, \dots, q$, is a PWL basis function. In the formulation under consideration, each basis function $\alpha_i \in PWL_H[S]$ is defined as

$$\alpha_i(v_j) = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \neq j \end{cases} \qquad (3)$$
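where $v_j \in V_S$, for every $i, j = 1, \dots, q$. It has been shown in [5] that any point $x \in S$ can be decomposed as

$$x = \sum_{l=1}^{r} \mu_{i_l} v_{i_l} \qquad (4)$$

where the terms in the expansion satisfy $0 \le \mu_{i_l} \le 1$ and $v_{i_l} \in V_S$ (the vertices of the simplex that contains x) for every $l = 1, \dots, r$, with $r = n + 1$, and $\sum_{l=1}^{r} \mu_{i_l} = 1$.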
Also in [5], it has been proved that any function of the basis given by (3) satisfies

$$\alpha_{i_l}(x) = \mu_{i_l}, \ \text{for } l = 1, \dots, r; \qquad \alpha_p(x) = 0, \ \text{for } p \notin \{i_1, \dots, i_r\} \qquad (5)$$
where $x \in S$ is a point in the form of (4). The evaluation of function (2) at a point (4) gives

$$F(x) = \sum_{i=1}^{q} c_i\, \alpha_i\!\left( \sum_{l=1}^{r} \mu_{i_l} v_{i_l} \right) \qquad (6)$$
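Since the basis functions are linear inside the simplex that contains x (and the $\mu_{i_l}$ add up to one), $F(x) = \sum_{i=1}^{q} c_i \sum_{l=1}^{r} \mu_{i_l}\, \alpha_i(v_{i_l})$, and, after exchanging the summations, we have

$$F(x) = \sum_{l=1}^{r} \mu_{i_l} \sum_{i=1}^{q} c_i\, \alpha_i(v_{i_l}) \qquad (7)$$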
Finally, considering the relation given in (3), (7) reduces to

$$F(x) = \sum_{l=1}^{r} c_{i_l}\, \mu_{i_l} \qquad (8)$$
From this equation, we observe that, in order to calculate the value of F(x) at the input x, we only need to compute a weighted sum of the $r = n + 1$ parameters $c_{i_l}$, with weights $\mu_{i_l}$, for $l = 1, \dots, r$.
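Equations (4), (5) and (8) can be illustrated with a minimal Python sketch. It assumes that each grid hypercube is split into simplices by ordering the fractional parts of the coordinates, which is the partition consistent with the ramp-comparator scheme used by the chip; the exact construction is the one given in [5]. The functions simplex_decomposition and evaluate_pwl, and the parameters x0 (first grid coordinate, −1 for the vertex set (1)) and delta, are hypothetical names introduced only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vertex = Tuple[int, ...]  # grid indices (m_1, ..., m_n) of a vertex in V_S


def simplex_decomposition(x: Sequence[float], x0: float,
                          delta: float) -> List[Tuple[float, Vertex]]:
    """Return the r = n + 1 pairs (mu, vertex) of eq. (4).

    Assumption: each grid hypercube is split into simplices by sorting the
    fractional parts of the coordinates. The weights satisfy 0 <= mu <= 1
    and add up to one."""
    n = len(x)
    base: List[int] = []
    frac: List[float] = []
    for xi in x:
        t = (xi - x0) / delta          # position in grid units
        k = math.floor(t)
        base.append(k)
        frac.append(t - k)
    # Coordinates visited in decreasing order of their fractional part.
    order = sorted(range(n), key=lambda i: frac[i], reverse=True)
    f = [1.0] + [frac[i] for i in order] + [0.0]
    terms: List[Tuple[float, Vertex]] = []
    vertex = list(base)
    for step in range(n + 1):
        if step > 0:
            vertex[order[step - 1]] += 1   # move to the next simplex vertex
        terms.append((f[step] - f[step + 1], tuple(vertex)))
    return terms


def evaluate_pwl(x: Sequence[float], coeffs: Dict[Vertex, float],
                 x0: float, delta: float) -> float:
    """Eq. (8): F(x) = sum_l c_{i_l} mu_{i_l}; only n + 1 parameters are read.

    Zero-weight vertices are skipped, so a point on the upper edge of the
    grid does not need coefficients outside it."""
    return sum(mu * coeffs[v]
               for mu, v in simplex_decomposition(x, x0, delta) if mu > 0)


# Vertex values of the two-dimensional example in Section C (Table 1).
TABLE_1 = {(0, 0): 0, (0, 1): 2, (0, 2): 1, (1, 0): 0, (2, 0): 0,
           (1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 1}

if __name__ == "__main__":
    print(simplex_decomposition((1.5, 1.75), 0.0, 1.0))
    # [(0.25, (1, 1)), (0.25, (1, 2)), (0.5, (2, 2))]
    print(evaluate_pwl((1.5, 1.75), TABLE_1, 0.0, 1.0))
    # 1.25
```

For the point x = (1.5, 1.75) the sketch returns the same three vertices and weights as the numerical example of Section C, and only those three coefficients are read, which is the property the chip exploits.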
III Chip architecture
The architecture of the IC presented in this paper is a circuit implementation of eq.(8). Fig.1 is a block diagram that describes a general scheme for implementing eq.(8).
Figure 1: Block diagram architecture for implementing eq.(8) (blocks: Vertex selector, Internal vertex location, Comparator, Ramp, $\mu_i$-generator, $c_i$-Selector, $c_i\mu_i$-adder and Adder; input x, output F(x)).
References [5] and [6] reported a method for obtaining the μ parameters. It consists of a set of comparators that compare the input signals (the x vector) with a ramp. This idea has been used in the $\mu_i$-generator block of Fig.1. Notice that the input x has been decomposed into two parts: one covers the whole function domain and selects a specific vertex, while the other indicates the location of the input inside the simplex associated with the selected vertex. It is important to point out that the $\mu_i c_i$-adder adds $c_i$ a number of times proportional to $\mu_i$. Finally, the output block F(x) gives the value of the function F(·) at the input x.
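The ramp-comparator idea can be sketched in Python: sweeping a discrete ramp over one grid cell and counting, for each pattern of comparator outputs, the number of steps during which that pattern is present yields the weights μ. This is a behavioural model, not the circuit. The function name mu_from_ramp is hypothetical, and the sketch assumes that a comparator outputs 1 (select the next vertex along its axis) while the ramp is below the fractional part of its input. The weights are exact when the fractional parts are multiples of 1/steps.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def mu_from_ramp(frac: Sequence[float],
                 steps: int = 16) -> Dict[Tuple[int, ...], float]:
    """Weights mu obtained by counting ramp steps per comparator pattern.

    frac[i] in [0, 1) is the position of input i inside its grid cell.
    The returned keys are vertex offsets (0 or 1 per axis) from the lower
    corner of the cell; the values are the fraction of ramp steps."""
    counts: Counter = Counter()
    for k in range(steps):
        ramp = k / steps
        # Assumed polarity: comparator i outputs 1 while the ramp is below frac[i].
        pattern = tuple(1 if ramp < f else 0 for f in frac)
        counts[pattern] += 1
    return {pattern: c / steps for pattern, c in counts.items()}


if __name__ == "__main__":
    # Fractional parts of x = (1.5, 1.75).
    print(mu_from_ramp([0.5, 0.75]))
    # {(1, 1): 0.5, (0, 1): 0.25, (0, 0): 0.25}
```

Adding the offsets to the lower corner (1, 1) of the cell gives the vertices (2, 2), (1, 2) and (1, 1) with weights 0.5, 0.25 and 0.25, i.e., the decomposition (4). At most n + 1 distinct patterns can appear, because the comparators switch off one at a time as the ramp rises.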
A Description
Fig.2 shows the circuit implementation of the scheme of Fig.1. The output of the IC is an 8-bit digital word. In the present version of the IC, the memory is external. There are two ways to load the input values into the chip. The first is to apply three analog values at three input pins; three comparators compare the input signals with an analog ramp and latch the conversion. The second is to load the digital values directly, serially. In both cases, the inputs are stored in 8-bit registers. The four most significant bits of each input select the simplex the input belongs to, and the four least significant bits indicate the position of the input inside the simplex. The weighting coefficients $c_k$ are kept in the external memory, which is addressed with a 12-bit word.
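As a consistency check on the memory size, assume (this is not stated explicitly in the text) that each of the n = 3 inputs selects one of 16 grid positions through its four MSBs, so that $m + 1 = 16$ in $q = (m+1)^n$, and that each vertex stores one 8-bit coefficient. Then

$$q = 16^{3} = 2^{4 \cdot 3} = 2^{12} = 4096 \ \text{words of 8 bits} = 4\ \text{kB},$$

which matches the 12-bit address formed by the three 4-bit strings and the 4 kB external RAM.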
The value of the PWL function F(x) at each input x is the weighted sum of n + 1 = 4 parameter values. The four memory addresses where the coefficients $c_j$ ($j \in z$) are stored are obtained by comparing the values of a digital ramp with the four least significant bits (LSB) of the 8-bit registers. This ramp is implemented with a 4-bit digital counter. Each 12-bit address is obtained by juxtaposing n = 3 4-bit strings. The i-th string is equal to the four most significant bits (MSB) of the i-th register if the counter value is greater than or equal to the four LSB of the register; otherwise, the i-th string is the four MSB of the register plus one. The comparison between the counter and each register is done with a digital comparator. Each address is calculated by a block called Address Generator, and the weighted sum is computed with a 12-bit adder. The weighting in $\sum_{j \in z} c_j \mu_j$ comes for free, since the memory position of $c_j$ is addressed by the Address Generator for a time proportional to $\mu_j$. Therefore, the 12-bit adder alone is sufficient to perform the whole weighted sum.

B Architecture

The IC has an analog block and a digital block; they are powered from different supplies so that they can operate and be tested separately, as shown in Fig.2. The analog block consists of three A/D converters, each based on an external ramp and an OTA comparator. The analog ramp must be synchronized with the internal counter. The comparator output is used to latch the value of the counter, so that the A/D conversion is performed at the same time in the three input channels. The comparator outputs are connected to output pads and the Latch signals are connected to input pads. Therefore, an external signal can be used to latch the values in the registers. This alternative was used to obtain the experimental results of the IC.
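The synchronized ramp and counter form a single-slope (ramp-compare) A/D converter. The Python sketch below models one channel under stated assumptions: the ramp starts at v_min and rises by (v_max − v_min)/256 per clock, and the counter value is latched at the first count for which the ramp reaches the input. The function name ramp_adc and the input range are hypothetical; the range [0, 16) is chosen only so that the 8-bit code reads as the 4.4 fixed-point format used in the numerical example of Section C.

```python
def ramp_adc(v_in: float, v_min: float, v_max: float, bits: int = 8) -> int:
    """Single-slope conversion sketch (hypothetical model of one channel).

    A counter runs from 0 to 2**bits - 1 while a ramp synchronized with it
    rises from v_min in steps of (v_max - v_min) / 2**bits. The first count
    at which the ramp reaches v_in is latched, which models the comparator
    output firing the register latch. Inputs above the ramp saturate."""
    levels = 2 ** bits
    step = (v_max - v_min) / levels
    for count in range(levels):
        if v_min + count * step >= v_in:   # comparator switches here
            return count                   # value latched in the register
    return levels - 1                      # ramp never reached the input


if __name__ == "__main__":
    print(format(ramp_adc(1.5, 0.0, 16.0), "08b"))   # 00011000
    print(format(ramp_adc(1.75, 0.0, 16.0), "08b"))  # 00011100
```

Because the three channels share the same ramp and counter, the three conversions finish within the same 256-cycle sweep, which is why the text stresses that the analog ramp must be synchronized with the internal counter.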
Figure 2: Chip Architecture.
As shown in Fig.2, the digital block consists of a control block, an 8-bit counter, a 12-bit adder, and three sets of 8-bit Register, Address Generator and Comparator. Appropriately sized buffers were designed to drive the clock and clear lines.

Table I
State        EP   PROC
Nothing      1    0
Converting   0    0
Processing   0    1
B 1 Control Block
In order to perform the A/D conversion and the function evaluation, the chip has three states, called Nothing, Converting and Processing, which are encoded with two registers in the Control block. In the Nothing state, the I/O bus works as an output bus and shows the previously calculated value of the function. When the input SP is "1", the finite state machine (FSM) goes to the Converting state to perform the A/D conversion. The FSM stays in this state for 256 clock cycles and then goes to the Processing state. While the chip performs the A/D conversion, the signal that latches the counter value into the register is generated by the OTA. In the next state (Processing), it must be ensured that the latch signals do not change; otherwise, the registers would latch a new counter value. To avoid this, a multiplexer was placed at the latch input of each register; it connects the OTA output in the Converting state and sets the latch signal to "1" in the Processing state. In the Processing state, the I/O bus works as an input bus connected to the external RAM. In this state, the chip performs the 16 additions, reading the PWL parameter values from the external memory.
The two control signals EP (End of Processing) and PROC (Processing) provided by the FSM in each state are summarized in Table I.
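The control sequence can be sketched as a cycle-level Python state machine that reproduces the outputs of Table I. Two points are assumptions not stated in the text: the FSM leaves Processing after exactly 16 cycles (one sweep of the 4-bit ramp), and it then returns to Nothing, as suggested by EP (End of Processing) being "1" only in that state. The class and method names are hypothetical.

```python
from enum import Enum
from typing import Dict, Tuple


class State(Enum):
    NOTHING = "Nothing"
    CONVERTING = "Converting"
    PROCESSING = "Processing"


# (EP, PROC) in each state, as listed in Table I.
OUTPUTS: Dict[State, Tuple[int, int]] = {
    State.NOTHING: (1, 0),
    State.CONVERTING: (0, 0),
    State.PROCESSING: (0, 1),
}


class ControlFSM:
    """Cycle-level model of the control block (hypothetical names).

    Assumed: Processing lasts 16 cycles and is followed by Nothing."""

    CONVERT_CYCLES = 256   # one full sweep of the 8-bit counter
    PROCESS_CYCLES = 16    # one sweep of the 4-bit ramp (assumption)

    def __init__(self) -> None:
        self.state = State.NOTHING
        self.cycles = 0

    def clock(self, sp: int) -> Tuple[int, int]:
        """Advance one clock cycle and return (EP, PROC) of the new state."""
        if self.state is State.NOTHING:
            if sp == 1:                      # start request on SP
                self.state, self.cycles = State.CONVERTING, 0
        elif self.state is State.CONVERTING:
            self.cycles += 1
            if self.cycles == self.CONVERT_CYCLES:
                self.state, self.cycles = State.PROCESSING, 0
        else:                                # State.PROCESSING
            self.cycles += 1
            if self.cycles == self.PROCESS_CYCLES:
                self.state, self.cycles = State.NOTHING, 0
        return OUTPUTS[self.state]
```

With this model, a "1" on SP produces (EP, PROC) = (0, 0) for 256 cycles, then (0, 1) for 16 cycles, and finally (1, 0) again.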
B 2 12-Bit Adder
To produce the weighted sum needed to obtain the value of F(x), the adder adds the sixteen values read from the memory and divides the result by sixteen. Adding 16 values of 8 bits requires a 12-bit adder; the divide-by-16 operation is easily done by taking only the 8 most significant bits of the sum. Since the memory words are 8 bits wide, the 4 most significant bits of the adder's data input are connected to "0". The adder circuit comprises two modules: one calculates the carry and the other calculates the value of the sum.
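The bit widths can be checked with a short calculation. Writing $w_k$ for the k-th 8-bit word read from memory (notation introduced here) and $S_{11} \dots S_0$ for the bits of the adder output S, the worst case in which all sixteen words equal 255 still fits in 12 bits, and dividing by 16 only moves the binary point four places to the left:

$$S = \sum_{k=1}^{16} w_k \le 16\,(2^{8} - 1) = 4080 < 2^{12} = 4096, \qquad \frac{S}{16} = S_{11}S_{10}\cdots S_{4}\,.\,S_{3}S_{2}S_{1}S_{0}$$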
B 3 8-Bit Counter
The 8-bit counter is used for two different functions: to perform the A/D conversion and to sequence the 16 additions of the memory parameters ($c_i$). This block has a modular structure and works with a two-phase clock.
B 4 8-Bit Register
Each register is a master-slave register with a two-phase clock: the master reads the input data when phase one is at logic "1" and holds the data when it is at logic "0". The slave works in a similar fashion, but with the second phase.

B 5 Comparator

The comparator compares the four least significant bits of each input register with the digital ramp. Since the ramp is implemented with the 8-bit counter, the 4 LSB of the counter and the 4 LSB of the input register are connected to the comparator.
Figure 3: Comparator (inputs R0-R3 and D0-D3).
B 6 Address Generator

The Address Generator determines the memory address where the $c_i$ parameter is stored. The inputs of this circuit are the 4 MSB of the input register and the comparator output. If the comparator output is 0, the corresponding Address Generator output is directly the four most significant bits of that input register. If the comparator output is 1, the Address Generator output is the next consecutive address, i.e., the four MSB plus one.
Figure 4: Address generator (inputs IN and D0-D3, outputs S0-S3).
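The Comparator, Address Generator and 12-bit adder path can be checked against the two-dimensional example of Section C with the following Python sketch. It is a behavioural model, not the gate-level circuit. The function names are hypothetical, and the comparator polarity (output 1 while the 4-bit ramp is below the register LSBs) is an assumption chosen because it reproduces the 8/4/4 cycle counts of Fig.6. With three registers, the same code forms a 12-bit address.

```python
from typing import Dict, Sequence


def accumulate(registers: Sequence[int], memory: Dict[int, int]) -> int:
    """Sum of the 16 memory words read during one Processing phase.

    registers: 8-bit input codes (4 integer MSBs, 4 fractional LSBs).
    memory:    8-bit coefficients indexed by the juxtaposed 4-bit strings,
               first register in the most significant position."""
    acc = 0
    for count in range(16):                     # 4-bit digital ramp
        address = 0
        for reg in registers:
            msb, lsb = reg >> 4, reg & 0xF
            # Comparator output 1 (count < LSB): Address Generator adds one.
            string = msb + 1 if count < lsb else msb
            if string > 0xF:
                raise ValueError("MSB + 1 does not fit in 4 bits")
            address = (address << 4) | string
        acc = (acc + memory[address]) & 0xFFF   # 12-bit adder; 16 * 255 < 4096
    return acc


def to_fixed_point(acc: int) -> str:
    """Divide by 16: 8 MSBs as integer part, 4 LSBs as fraction."""
    return f"{acc >> 4:08b}.{acc & 0xF:04b}"


# Table 1 of Section C: memory address -> c_i.
EXAMPLE_MEMORY = {0b00000000: 0, 0b00000001: 2, 0b00000010: 1,
                  0b00010000: 0, 0b00100000: 0, 0b00010001: 1,
                  0b00010010: 2, 0b00100001: 2, 0b00100010: 1}

if __name__ == "__main__":
    acc = accumulate([0b00011000, 0b00011100], EXAMPLE_MEMORY)
    print(f"{acc:012b}", to_fixed_point(acc))   # 000000010100 00000001.0100
```

The design choice mirrored here is that no multiplier is needed: the weight of each coefficient is encoded in how many of the 16 ramp cycles its address is held, so a plain accumulator followed by a shift of the binary point yields eq. (8).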
C Numerical example
In order to clarify the mathematical background of Section II and the IC operation explained in the previous section, let us consider a hypothetical two-dimensional example. Suppose that the continuous PWL function F(x1, x2) is defined over a simplicial partition with m1 = m2 = 2 and a unit grid step (δ = 1), as depicted in Fig.5.

Figure 5: A two-dimensional PWL function.

The values of the PWL function at the vertices, $c_i = F(x_1, x_2)$, are collected and stored in the RAM memory as summarized in Table 1.

Table 1: $c_i = F(x_1, x_2)$ values.
i   Vertex    Memory address   c_i = F(x1, x2)
0   (0, 0)    00000000         0
1   (0, 1)    00000001         2
2   (0, 2)    00000010         1
3   (1, 0)    00010000         0
4   (2, 0)    00100000         0
5   (1, 1)    00010001         1
6   (1, 2)    00010010         2
7   (2, 1)    00100001         2
8   (2, 2)    00100010         1

The evaluation of F(x1, x2) at an arbitrary point, for instance x = (1.5, 1.75), is obtained as follows. As a first step, the point x is decomposed as

$$\begin{bmatrix} 1.5 \\ 1.75 \end{bmatrix} = 0.5 \begin{bmatrix} 2 \\ 2 \end{bmatrix} + 0.25 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0.25 \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Now, let us introduce the following notation:

$$X_b = [B_{MSB}.B_{LSB}] = [B_3B_2B_1B_0.B_{-1}B_{-2}B_{-3}B_{-4}]$$

where $X_b$ is an 8-bit number split into two sections separated by a point; $B_{MSB}$ and $B_{LSB}$ are the integer and fractional parts of the number, respectively, and $B_k$ is the bit of weight $2^k$, for $k \in \{3, 2, 1, 0, -1, -2, -3, -4\}$. The digital code for x is given by

$$\begin{bmatrix} 0001.1000 \\ 0001.1100 \end{bmatrix} = .1000 \begin{bmatrix} 0010 \\ 0010 \end{bmatrix} + .0100 \begin{bmatrix} 0001 \\ 0010 \end{bmatrix} + .0100 \begin{bmatrix} 0001 \\ 0001 \end{bmatrix}$$

$B_{MSB}$ and $B_{LSB}$ are, in fact, the four most and the four least significant bits of the 8-bit input register.

Notice that the decomposition of x = (1.5, 1.75) can also be rewritten as

$$\begin{bmatrix} 1.5 \\ 1.75 \end{bmatrix} = 0.5 \left( \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) + 0.25 \left( \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right) + 0.25 \left( \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right)$$

where $[1\ 1]^T$ is a simplex selector term; it corresponds to the four most significant bits of the input registers. In digital format it is given by

$$\begin{bmatrix} 0001.1000 \\ 0001.1100 \end{bmatrix} = .1000 \left( \begin{bmatrix} 0001 \\ 0001 \end{bmatrix} + \begin{bmatrix} 0001 \\ 0001 \end{bmatrix} \right) + .0100 \left( \begin{bmatrix} 0001 \\ 0001 \end{bmatrix} + \begin{bmatrix} 0000 \\ 0001 \end{bmatrix} \right) + .0100 \left( \begin{bmatrix} 0001 \\ 0001 \end{bmatrix} + \begin{bmatrix} 0000 \\ 0000 \end{bmatrix} \right)$$

In accordance with reference [6], from a purely mathematical point of view, the $\mu_{i_l}$ parameters are computed as

$$\mu_{i_3} = 0.5, \qquad \mu_{i_2} = 0.25, \qquad \mu_{i_1} = 1 - (\mu_{i_2} + \mu_{i_3}) = 1 - 0.75 = 0.25$$

From a circuit point of view, the $\mu_{i_l}$ values indicate the number of times that each parameter must be added to the sum. Fig.6 shows the two comparator outputs for this example: $\mu_{i_3}$ spans 8 of the 16 ramp cycles, while $\mu_{i_2}$ and $\mu_{i_1}$ span 4 cycles each.

Figure 6: μ parameters. The 4-bit digital ramp (marked values 0000, 1000, 1100 and 1111) is compared with the LSBs of x1 and x2; the comparator outputs define $\mu_{i_3}$, $\mu_{i_2}$ and $\mu_{i_1}$.

Finally, according to equation (8), F(x) is computed as the weighted sum of the $\mu_{i_l} c_{i_l}$ product terms, where $c_{i_l}$ is the value of F at the $i_l$-th vertex. After substituting the values of F from Table 1 at the vertices $[2\ 2]^T$, $[1\ 2]^T$ and $[1\ 1]^T$, we obtain

$$F(x_1, x_2) = 0.5(1) + 0.25(2) + 0.25(1) = 1.25$$

In digital format, the evaluation of F(·) is obtained directly from the 12-bit adder output, which adds each $c_i$ value indicated by the address generator a number of times given by $\mu_{i_l}$ (16 ramp cycles correspond to $\mu = 1$). The memory addresses of the $c_i$ values are 00100010, 00010010 and 00010001. For our example, the adder computes

$$F(x_1, x_2) = \underbrace{000000000001 + \cdots + 000000000001}_{8\ \text{terms}} + \underbrace{000000000010 + \cdots + 000000000010}_{4\ \text{terms}} + \underbrace{000000000001 + \cdots + 000000000001}_{4\ \text{terms}} = 000000010100$$

Since the adder result is scaled by 16, it must be divided by 16 to obtain the final result. This is done by taking the 8 most significant bits of the adder output as the integer part of the result and the 4 least significant bits as the fractional part. The evaluation of F(·) at the input (x1, x2) in digital format is therefore

$$F(0001.1000,\ 0001.1100) = 00000001.0100$$

that is, F = 1.25, in agreement with the value obtained above.

IV Layouts for the digital sections of the IC

In this section we present the layouts of the digital sections of the IC. The layouts were designed with the Tanner EDA Tools software. The IC was fabricated in an n-well, non-silicided 0.5 µm CMOS process with 3 metal layers and 2 poly layers. All transistors in the digital part are minimum size: the PMOS devices are 3 µm × 0.6 µm and the NMOS devices are 1.8 µm × 0.6 µm. Fig.7 shows the comparator layout, whose size is 114 µm × 57 µm. The selector layout, of size 30 µm × 114 µm, is shown in Fig.8, and the 1-bit adder block of the 12-bit adder is shown in Fig.9.

Figure 7: Comparator layout.
Figure 8: Selector layout.
Figure 9: Adder layout.
Figure 10: The IC, with its main blocks highlighted.
V Conclusions
We have presented the implementation of a simplicial PWL function evaluator. The proposed IC evaluates a three-dimensional PWL function with good accuracy, producing an 8-bit digital output. The block diagram and a detailed explanation of the chip operation have been given. The mathematical background has also been presented, and a simple two-dimensional numerical example illustrates the chip operation.
VI Acknowledgment
Dr. Víctor M. Jiménez-Fernández is grateful for the partial financial support he received from the National Institute for Astrophysics, Optics and Electronics for his postdoctoral visiting position at the Universidad Nacional del Sur, Bahía Blanca, Argentina. The authors thank Tomaso Poggi for his help in testing the chip. The authors are also grateful to the "Fundación Universidad Nacional del Sur" for the support given through PICT 2003 No. 13468.
REFERENCES
[1] M. Storace and M. Parodi, "Towards analog implementations of PWL two-dimensional non-linear functions," International Journal of Circuit Theory and Applications, vol. 33, no. 2, pp. 147-160, Mar.-Apr. 2005.
[2] M. Parodi, M. Storace, and P. Julián, "Synthesis of multiport resistors with piecewise-linear characteristics: a mixed-signal architecture," International Journal of Circuit Theory and Applications, vol. 33, no. 4, pp. 307-319, Jul.-Aug. 2005.
[3] P. Julián, A. Desages, and B. D'Amico, "Orthonormal High-Level Canonical PWL Functions with Applications to Model Reduction," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 47, pp. 702-712, May 2000.
[4] P. Julián and O. Agamennoni, "High-Level Canonical Piecewise Linear Representation Using a Simplicial Partition," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 46, pp. 463-480, Apr. 1999.
[5] P. Julián, R. Dogaru, and L. Chua, "A Piecewise-Linear Simplicial Coupling Cell for CNN Gray-Level Image Processing," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 49, pp. 904-913, Jul. 2002.
[6] P. Mandolesi, P. Julián, and A. Andreou, "A Scalable and Programmable Simplicial CNN Digital Pixel Processor Architecture," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, pp. 988-996, May 2004.
[7] M. Di Federico, P. Julián, T. Poggi, and M. Storace, "A Simplicial PWL Integrated Circuit Realization," accepted in IEEE International Symposium on Circuits and Systems (ISCAS 2007), New Orleans, USA, May 2007.