The IEEE 754 floating-point standard is today the most common representation of real numbers on computers. The IEEE (Institute of Electrical and Electronics Engineers) produced this standard to define floating-point representation and arithmetic; although other representations for floating-point numbers exist, the standard brought out by the IEEE has come to be known as IEEE 754. The string of significant digits is technically termed the mantissa of the number, while the scale factor is called the exponent. The general form of the representation is (-1)^S × M × 2^E, where S is the sign bit, M is the mantissa, and E is the exponent. With respect to precision and width in bits, the standard defines two groups of formats: basic and extended. The basic group is further divided into the single-precision format, which is 32 bits wide, and the double-precision format, which is 64 bits wide. The three basic components are the sign, exponent, and mantissa.
IEEE 754 Floating Point Standard:
The IEEE 754 is a floating-point standard established by the IEEE in 1985. It contains two representations for floating-point numbers—the IEEE single precision format and the IEEE double precision format. The IEEE 754 single precision representation uses 32 bits, and the double precision system uses 64 bits. Although 2’s complement representations are very common for negative numbers, the IEEE floating-point representations do not use 2’s complement for either the fraction or the exponent. The designers of IEEE 754 desired a format that was easy to sort and hence adopted a sign-magnitude system for the fractional part and a biased notation for the exponent.
The IEEE 754 floating-point formats use three sub-fields: sign, fraction, and exponent. The fractional part of the number is represented using a sign-magnitude representation in the IEEE floating-point formats; that is, there is an explicit sign bit (S) for the fraction. The sign is 0 for positive numbers and 1 for negative numbers. In binary normalized scientific notation, the leading bit before the binary point is always 1. Therefore, the designers of the IEEE format decided to make it implied, representing only the bits after the binary point. In general, the number is of the form
N = (-1)^S × (1 + F) × 2^E
where S is the sign bit,
F is the fractional part, and
E is the exponent.
The base of the exponent is 2. The base is implied; that is, it is not stored anywhere in the representation. The magnitude of the number is 1 + F because of the omitted leading 1. The term significand refers to the magnitude of the fraction and is 1 + F in the IEEE format, but the terms significand and fraction are often used interchangeably and are so used in this book. The exponent in the IEEE floating-point formats uses what is known as a biased notation. A biased representation is one in which every number is represented by the number plus a certain bias. In the IEEE single precision format, the bias is 127. Hence, if the exponent is +1, it is represented by +1 + 127 = 128; if the exponent is -2, it is represented by -2 + 127 = 125. Thus, stored exponents less than 127 indicate actual negative exponents, and stored exponents greater than 127 indicate actual positive exponents. The bias is 1023 in the double precision format. If a positive exponent becomes too large to fit in the exponent field, the situation is called overflow; if a negative exponent is too large to fit in the exponent field, the situation is called underflow.
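The biased encoding described above can be sketched in a few lines. Python is used here purely for illustration (the project's own designs are in Verilog), and the function name is invented for this example:

```python
# Biased exponent encoding used by the IEEE 754 single precision format.
BIAS = 127

def encode_exponent(actual_exponent):
    """Return the stored (biased) exponent for a normalized number."""
    stored = actual_exponent + BIAS
    # Stored values 0 and 255 are reserved for special cases, so
    # normalized numbers must map into the range 1..254.
    if not 1 <= stored <= 254:
        raise OverflowError("exponent overflow or underflow")
    return stored

print(encode_exponent(1))   # +1 is stored as 128
print(encode_exponent(-2))  # -2 is stored as 125
```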
1.2 Single Precision Floating Point Numbers:
The IEEE single precision format uses 32 bits for representing a floating-point number, divided into three subfields, as illustrated in Fig. 1.1. The first field is the sign bit for the fractional part. The next field consists of 8 bits, which are used for the exponent. The third field consists of the remaining 23 bits and is used for the fractional part.
Fig.1.1: Single-precision floating point representation.
The sign bit reflects the sign of the fraction: it is 0 for positive numbers and 1 for negative numbers. To represent a number in the IEEE single precision format, it must first be converted to normalized scientific notation with exactly one bit before the binary point, adjusting the exponent value accordingly.
The exponent representation that goes into the second field of the IEEE 754 representation is obtained by adding 127 to the actual exponent of the number when it is represented in the normalized form. Exponents in the range 1–254 are used for representing normalized floating-point numbers. Exponent values 0 and 255 are reserved for special cases, which will be discussed subsequently.
The representation for the 23-bit fraction is obtained from the normalized scientific notation by dropping the leading 1. Zero cannot be represented in this fashion; hence, it is treated as a special case (explained later). Since every number in normalized scientific notation has a leading 1, this leading 1 can be dropped so that one more bit can be packed into the significand (fraction). Thus, a 24-bit fraction can be represented using the 23 bits of the field. The designers of the IEEE formats wanted to make the best use of all the bits in the exponent and fraction fields.
1.3 Double Precision Floating Point Numbers:
The IEEE double precision format uses 64 bits for representing a floating-point number, as illustrated in Fig. 1.2. The first bit is the sign bit for the fractional part. The next 11 bits are used for the exponent, and the remaining 52 bits are used for the fractional part.
Fig.1.2: Double-precision floating point representation
As in the single precision format, the sign bit is 0 for positive numbers and 1 for
negative numbers. The exponent representation used in the second field is obtained by adding the bias value of 1023 to the actual exponent of the number in the normalized form. Exponents in the range 1–2046 are used for representing normalized floating-point
numbers. Exponent values 0 and 2047 are reserved for special cases. The representation for the 52-bit fraction is obtained from the normalized scientific notation by dropping the leading 1 and considering only the next 52 bits.
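To make the double-precision field layout concrete, the sketch below unpacks a Python float (which is an IEEE 754 double) into its three sub-fields. This is an illustration only and is not part of the original design:

```python
import struct

def double_fields(x):
    """Split an IEEE 754 double into its sign, exponent, and fraction fields."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    sign = bits >> 63                     # 1 sign bit
    exponent = (bits >> 52) & 0x7FF       # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)     # 52-bit fraction, leading 1 dropped
    return sign, exponent, fraction

# -2.5 = -1.01 (binary) x 2^1, so the sign is 1 and the
# biased exponent is 1 + 1023 = 1024.
s, e, f = double_fields(-2.5)
print(s, e, e - 1023)   # 1 1024 1
```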
Addanki Puma Ramesh, A. V. N. Tilak, and A. M. Prasad [1] describe a double precision floating-point multiplier that supports the IEEE 754 binary interchange format. The design achieved an increased operating frequency. The implemented design was verified against a single precision floating-point multiplier and the Xilinx core; it provides high speed and supports double precision, which gives more accuracy compared to single precision. The design handles overflow, underflow, and the truncation rounding mode.
Itagi Mahi P and S. S. Kerur [2] designed a pipelined floating-point arithmetic unit that performs five arithmetic operations (addition, subtraction, multiplication, division, and square root) on floating-point numbers. The IEEE 754 standard floating-point representation is used. The unit is coded in VHDL, and the same arithmetic operations were also simulated with the Xilinx IP Core Generator.
Remadevi R [3] presents the design and simulation of a floating-point multiplier that supports the IEEE 754-2008 binary interchange format. The proposed multiplier does not implement rounding and presents the significand multiplication result directly; it focuses only on the single precision normalized binary interchange format and handles the overflow and underflow cases. Rounding is omitted to give more precision when the multiplier is used in a multiply-and-accumulate (MAC) unit.
Rakesh Babu, R. Saikiran, and Sivanantham S [4] consider exceptions such as infinity, zero, and overflow in their implementation, along with rounding modes such as round to zero, round to positive infinity, round to negative infinity, and round to even. To analyze the working of the designed multiplier, a MAC unit was designed and tested.
Reshma Cherian, Nisha Thomas, and Y. Shyju [5] implemented a binary to floating-point converter in HDL, based on the IEEE 754 single precision format. The unit performs the conversion of binary inputs to the IEEE 754 32-bit format, which is then given as input to a floating-point adder/subtractor block.
Sunil Kumar Mishra, Vishakha Nandanwar, Eskinder Anteneh Ayele, and S. B. Dhok [6] describe an FPGA implementation of a single precision floating-point multiplier using high-speed compressors. For the mantissa calculation, a 24×24-bit multiplier was developed using these compressors; owing to them, the proposed multiplier attains a high maximum frequency. The results obtained with the proposed algorithm and implementation are better not only in terms of speed but also in terms of hardware used.
Gargi S. Rewatkar [7] implemented a double precision floating-point multiplier in VHDL. Such a multiplier may be used in applications such as digital signal processors, general purpose processors, controllers, and hardware accelerators.
Standard Algorithms and Architectures
Multiplication is basically a shift-and-add operation. There are, however, many variations on how to do it; some are more suitable for FPGA use than others, and some are efficient in a system such as a CPU. This section explores the varieties and attractive features of multiplication hardware.
3.1 Scaling Accumulator Multipliers:
A Scaling accumulator multiplier performs multiplication using an iterative shift-add routine. One input is presented in bit parallel form while the other is in bit serial form. Each bit in the serial input multiplies the parallel input by either 0 or 1. The parallel input is held constant while each bit of the serial input is presented. Note that the one bit multiplication either passes the parallel input unchanged or substitutes zero. The result from each bit is added to an accumulated sum. That sum is shifted one bit before the result of the next bit multiplication is added to it.
Parallel by serial algorithm.
Iterative shift add routine.
N clock cycles to complete.
Very compact design.
Serial input can be MSB or LSB first depending on direction of shift in accumulator.
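The iterative shift-add routine listed above can be modeled functionally as follows. Python is used only to illustrate the behavior (the hardware itself is a register, an adder, and a shifter), and the MSB-first variant is shown:

```python
def scaling_accumulator_multiply(parallel, serial, n):
    """Model of a scaling accumulator multiplier: 'parallel' is held
    constant while the n bits of 'serial' are presented one per cycle."""
    acc = 0
    for i in range(n - 1, -1, -1):       # serial input, MSB first
        acc <<= 1                        # shift the accumulated sum one bit
        bit = (serial >> i) & 1
        acc += parallel if bit else 0    # 1-bit multiply: pass input or zero
    return acc

print(scaling_accumulator_multiply(13, 11, 4))   # 143, after 4 "clock cycles"
```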
Fig.3.1. Scaling Accumulator Multiplication Operation.
3.2 Serial by Parallel Booth Multipliers:
The simple serial-by-parallel Booth multiplier is particularly well suited for bit-serial processors implemented in FPGAs without carry chains, because all of its routing is to nearest neighbors with the exception of the input. The serial input must be sign extended to a length equal to the sum of the lengths of the serial and parallel inputs to avoid overflow, which means this multiplier takes more clocks to complete than the scaling accumulator version. This is the structure used in the venerable TTL serial-by-parallel multiplier.
Well suited for FPGAs without fast carry logic.
Serial input LSB first.
Routing is all nearest neighbor except serial input which is broadcast.
Latency is one bit time.
Fig.3.2. Serial by Parallel Booth Multiplier Operation
3.3 Ripple Carry Array Multipliers:
A ripple carry array multiplier (also called the row ripple form) is an unrolled embodiment of the classic shift-add multiplication algorithm. The illustration shows the adder structure used to combine all the bit products in a 4×4 multiplier. The bit products are the logical AND of the bits from each input; they are shown in the form x,y in the drawing. The maximum delay is the path from either LSB input to the MSB of the product, and it is the same (ignoring routing delays) regardless of the path taken. The delay is approximately 2*n. This basic structure is simple to implement in FPGAs, but it does not make efficient use of the logic in many FPGAs and is therefore larger and slower than other implementations.
Row ripple form.
Unrolled shift-add algorithm.
Delay is proportional to N.
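Functionally, the array multiplier forms the AND bit products and sums the rows with increasing weight, i.e. the unrolled shift-add algorithm. The sketch below models that arithmetic in Python; the real structure is combinational hardware, so this is an illustration of the behavior only:

```python
def array_multiply(x, y, n=4):
    """Model of an n x n ripple carry array multiplier: each row is the
    logical AND of x with one bit of y, added in at weight 2^j."""
    product = 0
    for j in range(n):                   # one adder row per bit of y
        y_bit = (y >> j) & 1
        row = 0
        for i in range(n):               # bit product = x_i AND y_j
            row |= (((x >> i) & 1) & y_bit) << i
        product += row << j              # the row carries weight 2^j
    return product

print(array_multiply(9, 13))   # 117
```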
Fig.3.3 Ripple Carry Array Multiplier
3.4 Carry Save Array Multipliers:
Column ripple form.
Fundamentally same delay and gate count as row ripple form.
Gate level speed ups available for ASICs.
Ripple adder can be replaced with faster carry tree adder.
Regular routing pattern.
Fig.3.4 Carry Save Array Multiplier
3.5 Wallace Tree Multipliers:
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay. Rather than completely adding the partial products in pairs like the ripple adder tree does, the Wallace tree sums all the bits of the same weight in a merged tree. Usually full adders are used, so that 3 equally weighted bits are combined to produce two bits: one (the carry) with weight n+1 and the other (the sum) with weight n. Each layer of the tree therefore reduces the number of vectors by a factor of 3:2. (Another popular scheme obtains a 4:2 reduction using a different adder style that adds little delay in an ASIC implementation.) The tree has as many layers as necessary to reduce the number of vectors to two (a carry and a sum); a conventional adder then combines these to obtain the final product. The structure of the tree is shown below. For a multiplier, this tree is pruned because the input partial products are shifted by varying amounts. A Wallace tree multiplier is one that uses a Wallace tree to combine the partial products from a field of 1×n multipliers (made of AND gates). It turns out that the number of carry save adders in a Wallace tree multiplier is exactly the same as in the carry save version of the array multiplier. The Wallace tree, however, rearranges the wiring so that the partial product bits with the longest delays are wired closer to the root of the tree. This changes the delay characteristic from O(n) to O(log n) at no gate cost. Unfortunately, the nice regular routing of the array multiplier is also replaced with a rat's nest.
Fig.3.5 Wallace Tree Multiplier
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs combined using a conventional adder.
Delay is log(n).
Wallace tree multiplier uses Wallace tree to combine 1 x n partial products.
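The 3:2 reduction described above can be modeled with bitwise carry-save arithmetic: each layer replaces every three partial-product vectors with a sum vector and a carry vector, and a conventional addition finishes the job. The Python below is a behavioral sketch, not the gate-level tree:

```python
def wallace_reduce(vectors):
    """One 3:2 layer: full adders compress each group of three equally
    weighted bit vectors into a sum vector and a carry vector."""
    groups = len(vectors) // 3
    out = []
    for k in range(groups):
        a, b, c = vectors[3 * k:3 * k + 3]
        out.append(a ^ b ^ c)                           # sum bits, weight n
        out.append(((a & b) | (a & c) | (b & c)) << 1)  # carry bits, weight n+1
    out += vectors[3 * groups:]           # vectors left over from grouping
    return out

def wallace_multiply(x, y, n=4):
    # partial products from a field of 1 x n multipliers (AND gates)
    vectors = [(x << j) if (y >> j) & 1 else 0 for j in range(n)]
    while len(vectors) > 2:               # layers of 3:2 reduction
        vectors = wallace_reduce(vectors)
    return sum(vectors)                   # final conventional adder

print(wallace_multiply(9, 13))   # 117
```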
3.6 Booth Recoding:
Booth recoding is a method of reducing the number of partial products to be summed. Booth observed that when strings of '1' bits occur in the multiplier, the number of partial products can be reduced by using subtraction. For example, the multiplication of 89 by 15 shown below has four 1×n partial products that must be summed; since 15 = 16 - 1, this is equivalent to the single subtraction 89 × 16 - 89 shown in the right panel.
Fig.3.6 Booth Recoding Operation
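Booth's observation can be verified numerically. The sketch below implements radix-2 Booth recoding; the digit set {-1, 0, +1} turns the run of four 1s in 15 into a single subtraction and a single addition (the function names are invented for this illustration):

```python
def booth_recode(multiplier, width):
    """Radix-2 Booth recoding of an unsigned multiplier: each digit is
    b[i-1] - b[i], so a run of 1s becomes -1 at its start and +1 just
    past its end."""
    digits, prev = [], 0
    for i in range(width):
        bit = (multiplier >> i) & 1
        digits.append(prev - bit)
        prev = bit
    digits.append(prev)                   # digit closing a run at the top
    return digits

def booth_multiply(a, b, width):
    return sum(d * (a << i) for i, d in enumerate(booth_recode(b, width)))

print(booth_recode(15, 4))        # [-1, 0, 0, 0, 1]: 15 recodes as 16 - 1
print(booth_multiply(89, 15, 4))  # 1335 = 89 * 16 - 89
```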
Floating Point Multiplier
Given two floating-point numbers, F1 × 2^E1 and F2 × 2^E2, the product is
(F1 × 2^E1) × (F2 × 2^E2) = (F1 × F2) × 2^(E1+E2) = F × 2^E
The fraction part of the product is the product of the fractions, and the exponent part of the product is the sum of the exponents. Hence, a floating-point multiplier consists of two major components:
1. A fraction multiplier
2. An exponent adder.
The details of floating-point multiplication depend on the precise formats in which the fraction multiplication and exponent addition are performed.
Fraction multiplication can be done in many ways. If the IEEE format is used, multiplication of the magnitude can be done and then the signs can be adjusted. If 2’s complement fractions are used, one can use a fraction multiplier that handles signed 2’s complement numbers directly.
Addition of the exponents can be done with a binary adder. If the IEEE formats are directly used, the representations have to be carefully adjusted in order to obtain the correct result. For instance, if exponents of two floating-point numbers in the biased format are added, the sum contains twice the bias value. To get the correct exponent, the bias value must be subtracted from the sum.
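This bias correction can be checked with a short sketch (Python, for illustration only):

```python
BIAS = 127   # single precision exponent bias

def add_biased_exponents(e1_stored, e2_stored):
    """Adding two stored exponents includes the bias twice, so one
    bias must be subtracted to obtain the correctly biased result."""
    return e1_stored + e2_stored - BIAS

# 2^3 * 2^2 = 2^5: stored exponents 130 and 129 combine to 132 = 5 + 127.
print(add_biased_exponents(130, 129))   # 132
```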
The 2’s complement system has several interesting properties for performing arithmetic. Hence, many floating-point arithmetic units convert the IEEE notation to 2’s complement and then use the 2’s complement internally for carrying out the floating-point operations. Then the final result is converted back to IEEE standard notation.
The complete flowchart for floating-point multiplication is shown in the figure below:
Fig.4.1 Floating Point Multiplier Block Diagram.
4.1 Number Representation using Single Precision Format:
Let us represent 13.45 in the IEEE floating-point format. One can see that 0.45 is a recurring binary fraction and hence,
13.45 = 1101.01 1100 1100 1100 … … … with the bits 1100 continuing to recur.
Normalized scientific representation yields
13.45 = 1.10101 1100 1100 … × 2^3
Since the number is positive, the sign bit for the IEEE 754 representation is 0. The exponent in the biased notation will be 127+3=130, which in binary format is 10000010. The fraction is 1.10101 1100 1100 … … … (with 1100 recurring). Omitting the leading 1, the 23 bits for the fractional part are
10101 1100 1100 1100 1100 11
Thus, the 32 bits are
0 10000010 10101110011001100110011
So this is represented in the figure below:
Fig:4.2.Single Precision Floating Point Number Representation.
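The hand encoding worked out above can be cross-checked against a machine encoding. The Python sketch below packs 13.45 as single precision and splits out the three fields (an illustration only, not part of the Verilog design):

```python
import struct

# Pack 13.45 as IEEE 754 single precision, then split out the fields.
bits = struct.unpack('>I', struct.pack('>f', 13.45))[0]
sign = bits >> 31
exponent = (bits >> 23) & 0xFF
fraction = bits & 0x7FFFFF

print(sign)                      # 0
print(format(exponent, '08b'))   # 10000010 (130 = 127 + 3)
print(format(fraction, '023b'))  # 10101110011001100110011
```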
4.2 Floating Point Multiplication:
The general procedure for performing floating-point multiplication is the following:
1. Add the two exponents.
2. Multiply the two fractions (significands).
3. If the product is 0, adjust the representation to the proper representation for 0.
4. a. If the product fraction is too big, normalize by shifting it right and incrementing the exponent.
   b. If the product fraction is too small, normalize by shifting it left and decrementing the exponent.
5. If an exponent underflow or overflow occurs, generate an exception or error.
6. Round to the appropriate number of bits. If rounding resulted in loss of normalization, go to step 4 again.
One may note that, in addition to adding the exponents and multiplying the fractions, several steps such as normalizing the product, handling overflow and underflow, rounding to the appropriate number of bits, and so on need to be done. We assume that the two numbers are properly normalized to start with, and we want the final result to be normalized.
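The steps above can be sketched end to end for normalized single precision inputs. The Python below is a simplified behavioral model, not the project's Verilog: it rounds by truncation and deliberately omits the special cases (zero, infinity, NaN, subnormals) and the exception handling of steps 3 and 5:

```python
import struct

def fp32_multiply(a, b):
    """Simplified single precision multiply following the listed steps."""
    ab = struct.unpack('>I', struct.pack('>f', a))[0]
    bb = struct.unpack('>I', struct.pack('>f', b))[0]

    sign = (ab >> 31) ^ (bb >> 31)                         # sign of product
    exp = ((ab >> 23) & 0xFF) + ((bb >> 23) & 0xFF) - 127  # step 1 + bias fix
    fa = (ab & 0x7FFFFF) | 0x800000       # restore hidden leading 1
    fb = (bb & 0x7FFFFF) | 0x800000       # (24-bit significands)

    prod = fa * fb                        # step 2: 24 x 24 fraction multiply
    if prod & (1 << 47):                  # step 4: product of 1.f x 1.f lies
        prod >>= 1                        # in [1, 4); shift right if >= 2 and
        exp += 1                          # increment the exponent
    fraction = (prod >> 23) & 0x7FFFFF    # step 6: truncate, drop hidden 1

    out = (sign << 31) | (exp << 23) | fraction
    return struct.unpack('>f', struct.pack('>I', out))[0]

print(fp32_multiply(-18.0, 9.5))   # -171.0
```

For -18.0 × 9.5 the product is exact, so truncation loses nothing; a faithful implementation would add round-to-nearest-even and handle the reserved exponent values 0 and 255.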
4.4 Example Taken For Floating Point Multiplication:
Consider the example with A = -18.0 and B = 9.5; the expected product is A × B = -171.0.
The floating-point multipliers are synthesized and simulated using the Synopsys VCS software, so the designs are written in the Verilog language.
5.1 Software Specification
The floating-point multiplication structures designed in this project are described in Verilog. The synthesis of the designs and their simulation are done using the Synopsys VCS tool.
The software used for Floating Point Multiplication is
Synopsys VCS Software.
5.1.1 Software description
Simulation in system design is a procedure to verify the functionality of a module or system: the design is described as blocks and sub-modules of logic, and simulation exercises the logical connectivity among these blocks and verifies the functionality of the logic implemented with them. For simulating the designs here, the Synopsys VCS software is used as the tool.
5.1.2 Flow of Simulation
We can enable a debug flag during compilation by using the following command:
vcs -lca -debug_access+all Counter.v Counter_tb.v
Now run the compiled simulation. This should open the DVE tool automatically, where the test bench can be run fully or debugged step by step.
To do this, first select the inputs and outputs in the variable window and right-click "Add to the Waves" as before.
Then click the toolbar button with the blue arrow in braces, or press F11, to run the test bench step by step; or click the toolbar button with the blue arrow pointing downward, or press F5, to run the test bench fully.
Results and Analysis
The schematic diagram of the floating-point multiplier simulated using the Synopsys VCS software is shown in the figure below:
Fig.6.1.Schematic Diagram Of Floating Point Multiplier.
An enlarged image of the schematic diagram is shown in the figure below:
Fig.6.2.Enlarged View of Floating Point Multiplier Schematic Diagram
The output waveforms for the floating-point multiplier, simulated using the Synopsys VCS tool, are shown in the figure below:
Fig.6.3.Output Waveforms For Floating Point Multiplier
In this project, single precision and double precision floating-point multipliers were chosen because of their advantages over fixed-point multipliers. The multiplier source code, written in Verilog HDL, was synthesized and simulated using the Synopsys VCS tool.