Let's get an internal representation of a floating point number

It is near, Kon'nichiwa! This time, I will introduce how to get the internal representation of floating point numbers.

1. What is a floating point number?

Floating-point numbers are numbers that represent decimal numbers in the sign, exponent, and mantissa. In many programming languages, the standard called IEEE 754 has been adopted, and the internal representation is arranged in the order of sign part, exponent part, and mantissa part from the top.

f1-1.PNG

2. Let's programmatically get an internal representation of a floating point number

Here, we will create some programs to get the internal representation of a single precision floating point type (float type). The languages are C, C ++, C #, VB.NET and Java.

2.1 Union type method (C language, C ++)

In C language and C ++, a ** union ** consisting of one ** float type variable ** and one ** int type variable ** with the same data size as the float type is used. Since each member of the union is in the same memory area, you can get an internal representation of a float type value from an int type member by putting a real number in the float type member.

float-bin.c


#include <stdio.h>

int main( void ) {

    /*Since the data size of the float type is 32 bits, it is combined with the 32-bit integer type (int).*/
    union { float f; int i; } a;
    int i;
    a.f = -2.5f;

    printf( "%f ( %08X )\n", a.f, a.i );
    
    /*Show a column of bits*/
    for( i = 31; i >= 0; i-- ){
        printf( "%d", ( a.i >> i ) & 1 );
    }
    printf( "\n" );
    
    /*Extract the exponent part (1 bit), exponent part (8 bits), and mantissa part (23 bits).*/
    printf( "Sign part: %X\n", ( a.i >> 31 ) & 1 );
    printf( "Index part: %X\n", ( a.i >> 23 ) & 0xFF );
    printf( "Mantissa: %X\n", a.i & 0x7FFFFF );

    return 0;
}

** Execution result ** -2.500000 ( C0200000 ) 11000000001000000000000000000000 Sign: 1 Index part: 80 Mantissa: 200000

2.2 Method using class functions (C #, VB.NET, Java)

C # and VB.NET use the ** BitConverter class ** of the .NET Framework. You can use the ** BitConverter.GetBytes ** method to convert a float type value to a byte type array once, and then use the ** BitConverter.ToInt32 ** method to get the internal representation of the float type value.

float-bin.cs


using System;

public class Program{
    public static void Main(){
        
        float f = -2.5f;
        // BitConverter.The ToInt32 method takes a byte array and the first index to be converted as arguments.
        int i = BitConverter.ToInt32( BitConverter.GetBytes( f ), 0 );
        Console.WriteLine( "{0:F} ( {1:X8} )", f, i );
        
        //Show a column of bits
        for( int j = 31; j >= 0; j-- ){
            Console.Write( ( i >> j ) & 1 );
        }
        Console.WriteLine();
        
        //Extract the exponent part (1 bit), exponent part (8 bits), and mantissa part (23 bits).
        Console.WriteLine( "Sign part: {0:X}", ( i >> 31 ) & 1 );
        Console.WriteLine( "Index part: {0:X}", ( i >> 23 ) & 0xFF );
        Console.WriteLine( "Mantissa: {0:X}", i & 0x7FFFFF );
    }
}

float-bin.vb


Module Module1

    Sub Main()
		
	Dim f As Single = -2.5f
        ' BitConverter.The ToInt32 method takes a byte array and the first index to be converted as arguments.
        Dim i As Integer = BitConverter.ToInt32( BitConverter.GetBytes( f ), 0 )
        Console.WriteLine( "{0:F} ( {1:X8} )", f, i )
        
        'Show a column of bits
        Dim j As Integer
        For j = 31 To 0 Step -1
            Console.Write( ( i >> j ) And 1 )
        Next
        Console.WriteLine()
        
        'Extract the exponent part (1 bit), exponent part (8 bits), and mantissa part (23 bits).
        Console.WriteLine( "Sign part: {0:X}", ( i >> 31 ) And 1 )
        Console.WriteLine( "Index part: {0:X}", ( i >> 23 ) And &HFF )
        Console.WriteLine( "Mantissa: {0:X}", i And &H7FFFFF )

        
    End Sub

End Module

In Java, ** Float class ** is used. You can get the internal representation of a float type value with the ** Float.floatToIntBits ** method.

float-bin.java


public class Main {
    public static void main(String[] args) throws Exception {
        
        float f = -2.5f;
        int i = Float.floatToIntBits( f );
        System.out.printf( "%f ( %08X )\n", f, i );
        
        //Show a column of bits
        for( int j = 31; j >= 0; j-- ){
            System.out.printf( "%d", ( i >> j ) & 1 );
        }
        System.out.println();
        
        //Extract the exponent part (1 bit), exponent part (8 bits), and mantissa part (23 bits).
        System.out.printf( "Sign part: %X\n", ( i >> 31 ) & 1 );
        System.out.printf( "Index part: %X\n", ( i >> 23 ) & 0xFF );
        System.out.printf( "Mantissa: %X\n", i & 0x7FFFFF );
        
    }
}

2.3 Method using pointer cast conversion (C language, C ++, C #)

In C language, C ++, and C #, you can get the internal representation of a float type value by casting ** a float type pointer to an int type pointer **.

float-binptr.c


#include <stdio.h>

int main( void ) {

    float f = -2.5f;
    int i = *( ( int* )&f ), j;
    printf( "%f ( %08X )\n", f, i );
    
    /*Show a column of bits*/
    for( j = 31; j >= 0; j-- ){
        printf( "%d", ( i >> j ) & 1 );
    }
    printf( "\n" );
    
    /*Extract the exponent part (1 bit), exponent part (8 bits), and mantissa part (23 bits).*/
    printf( "Sign part: %X\n", ( i >> 31 ) & 1 );
    printf( "Index part: %X\n", ( i >> 23 ) & 0xFF );
    printf( "Mantissa: %X\n", i & 0x7FFFFF );

    return 0;
}

In C #, pointers are treated as unsafe code and must be compiled with the "** / unsafe **" option.

float-binptr.cs


using System;

public class Program {
	//When dealing with unsafe code, add the "unsafe" keyword.
	public unsafe static void Main() {

		float f = -2.5f;
		int i = *( ( int* )&f );
		Console.WriteLine( "{0:F} ( {1:X8} )", f, i );

		//Show a column of bits
		for( int j = 31; j >= 0; j-- ) {
			Console.Write( ( i >> j ) & 1 );
		}
		Console.WriteLine();

		//Extract the exponent part (1 bit), exponent part (8 bits), and mantissa part (23 bits).
		Console.WriteLine( "Sign part: {0:X}", ( i >> 31 ) & 1 );
		Console.WriteLine( "Index part: {0:X}", ( i >> 23 ) & 0xFF );
		Console.WriteLine( "Mantissa: {0:X}", i & 0x7FFFFF );
	}
}

3. Conclusion

This time, I created a program to get the internal representation of float type using some languages as an example. Of course, you can get the internal representation in the same way with double type.

For example, how about using it when you see the calculation of floating point numbers in a computer class?

See you next time!

Recommended Posts

Let's get an internal representation of a floating point number
How to get an arbitrary digit from a number of 2 or more digits! !!
What is a floating point?
Let's get rid of Devise's sickness and lead a comfortable Rails life!
A simple example of an MVC model
[Swift] How to get the number of elements in an array (super basic)