
Signal gating with filler values

If one part of a circuit is not doing anything useful in a given cycle, ideally its inputs should be frozen, i.e., remain unchanged. Various options exist for ensuring this. Transparent latches were proposed in [36]. However, the power savings obtained often do not justify the large overhead introduced by multi-bit latches. Another option is to set an input to tri-state when it is not active. Unfortunately, the output of a tri-state gate will usually drift to some mid-way voltage value if the gate is not refreshed within a short time, and such a mid-way value may cause large power consumption in downstream DPUs. Tri-state gating is therefore justified only in very limited cases, which will be addressed later. In [61], it was mentioned that freezing the inputs to zero (AND gated) or one (OR gated) also reduces SSA. To make full use of data correlations inside the datapath, we propose to freeze the inputs to a fixed (hardwired) value. We call this value the filler value $f$. For the RTL circuit in Fig. 4(b), Fig. 9 gives the names of the variables whose values are propagated from the multiplier to register Reg2 after gating in one iteration of the CDFG, under four different scenarios. Z refers to tri-state (high impedance), X indicates that the value is not yet stable or visible at the RTL, y1' is the value of variable y1 in the previous iteration, and `-' indicates that the value remains unchanged. The figure shows that gating significantly suppresses SSA.
Figure 9: Cycle-by-cycle value from the gated multiplier output to register Reg2 in Fig. 4(b).
\begin{figure}
\centering\epsfig{file=diffeq-cycle.eps,height=1.9in}
\end{figure}

\begin{algorithm}
% latex2html id marker 247
\renewedcommand{baselinestretch}{...
...^n = 0$;} \ENDIF \ENDFOR \STATE{return $f$};
\end{algorithmic}
\end{algorithm}
Algorithm 1 contains the pseudo-code for computing the filler value that minimizes interconnect switching activity. Suppose the set of variables sent by DPU $D_x$ (Sending_DPU) to all the DPUs in its output network is $V$. Let $V'$ denote the subset of $V$ that contains all the variables used by an input port $P_y$ of another DPU (Receiving_DPU_port). Variables in $V'$ are called desired variables for $P_y$. For the RTL circuit in Fig. 4(b), $V$ for the multiplier's output network is {t1,t2,t3,t4,y5,y1} and its $V'$ for register Reg2 is {t3,y1}. For variables $v\in V'$ and $w\in V-V'$, we obtain the probability $P(w,v)$ that values of $w$ and $v$ will be output consecutively, irrespective of their order. $P(w,v)$ can be computed by simulating the RTL circuit. The same simulation also yields the probabilities that the $n$th bit of $v$ is 1 and 0, respectively, when $v$ is output right before or after any variable from $V-V'$. Let us denote these probabilities by $P^n(v,1)$ and $P^n(v,0)$, respectively. Note that each value of $v$ has a lifetime of several consecutive cycles, during which $v$ holds the same value. These consecutive cycles are counted as one single occurrence of the corresponding value when calculating the probabilities, because only transitions from $w$ to $v$ will consume dynamic power. Since $w$ is not used by $P_y$ and will be replaced by the filler value $f$, we need to decide what $f$ should be in order to minimize the switching activity. For the $n$th bit of $f$, say $f^n$, a transition takes place when it differs from the corresponding bit of a desired variable that is output right before or after it. The probabilities of transition when $f^n$ is $0$ and $1$, $P^n_f(0)$ and $P^n_f(1)$, are

\begin{eqnarray*}
P^n_f(0)=\sum_{w\in V-V'}{\sum_{v\in V'}{P(w,v)P^n(v,1)}}\\
P^n_f(1)=\sum_{w\in V-V'}{\sum_{v\in V'}{P(w,v)P^n(v,0)}}
\end{eqnarray*}



If $P^n_f(0)>P^n_f(1)$, $f^n$ is set to 1, otherwise 0. Thus, we can statistically minimize the bit transition activity in the output network from DPU $D_x$ to port $P_y$ by introducing the filler value. Since the switched capacitance is highly dependent on switching activity in the physically neighboring wires (due to coupling), we cannot say that the filler value thus chosen will yield minimal spurious switched capacitance in the wires. However, it has been found to reduce spurious switched capacitance significantly. When the data are totally random, i.e., $P^n(v,1)=P^n(v,0)=0.5$, either choice of $f^n$ is equally good, so the filler value can be all 0s or all 1s, which reduces to the method in [61]. The overhead for setting an input to a fixed value is very low compared to the overhead for latches. One AND gate suffices for a bit fixed to 0 and one OR gate for a bit fixed to 1.
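The bit-wise decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation behind Algorithm 1: the function names `filler_value` and `gate`, the dictionary layout of the probabilities, and all numeric values are hypothetical, and $P^n(v,0)$ is assumed to equal $1-P^n(v,1)$. The `gate` function models the per-bit AND/OR gating behaviorally.

```python
from typing import Dict, List, Tuple


def filler_value(
    pair_prob: Dict[Tuple[str, str], float],
    bit_one_prob: Dict[str, List[float]],
    width: int,
) -> int:
    """Pick each filler bit f^n to minimize its transition probability.

    pair_prob[(w, v)]  : P(w, v), with w in V - V' and v in V'
    bit_one_prob[v][n] : P^n(v, 1); P^n(v, 0) is assumed to be 1 - P^n(v, 1)
    """
    f = 0
    for n in range(width):
        # P^n_f(0): the filler bit is 0, so it toggles when the desired bit is 1.
        p_f0 = sum(p * bit_one_prob[v][n] for (_, v), p in pair_prob.items())
        # P^n_f(1): the filler bit is 1, so it toggles when the desired bit is 0.
        p_f1 = sum(p * (1.0 - bit_one_prob[v][n]) for (_, v), p in pair_prob.items())
        if p_f0 > p_f1:
            f |= 1 << n  # set f^n = 1, otherwise leave it at 0
    return f


def gate(value: int, enable: bool, f: int, width: int) -> int:
    """Behavioral model of gating: AND gates where f^n = 0, OR gates where f^n = 1."""
    mask = (1 << width) - 1
    en = mask if enable else 0
    and_gated = value & en           # these bits are forced to 0 when disabled
    or_gated = value | (~en & mask)  # these bits are forced to 1 when disabled
    return (and_gated & ~f & mask) | (or_gated & f)


# Made-up 4-bit example: desired variables t3 and y1, undesired t1 and t2.
example_pairs = {
    ("t1", "t3"): 0.3, ("t2", "t3"): 0.2,
    ("t1", "y1"): 0.1, ("t2", "y1"): 0.4,
}
example_bits = {
    "t3": [0.9, 0.2, 0.5, 0.8],  # P^n(t3, 1) for n = 0..3
    "y1": [0.7, 0.1, 0.5, 0.3],  # P^n(y1, 1) for n = 0..3
}
example_f = filler_value(example_pairs, example_bits, width=4)
```

With these made-up numbers, bits 0 and 3 of the filler are set to 1 and bits 1 and 2 to 0. Bit 2 is a tie and falls to 0 under the rule above. When the port is enabled, `gate` passes the value through unchanged, and when it is disabled the port sees the filler value.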
Lin Zhong 2003-10-11