• Uncategorized

About c : Why-is-the-bss-segment-required

Question Detail

What I know is that global and static variables are stored in the .data segment, and uninitialized data are in the .bss segment. What I don’t understand is why do we have dedicated segment for uninitialized variables? If an uninitialised variable has a value assigned at run time, does the variable exist still in the .bss segment only?

In the following program, a is in the .data segment, and b is in the .bss segment; is that correct? Kindly correct me if my understanding is wrong.

#include <stdio.h>
#include <stdlib.h>

int a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9};
int b[20]; /* Uninitialized, so in the .bss and will not occupy space for 20 * sizeof (int) */

int main ()
{
   ;
}  

Also, consider following program,

#include <stdio.h>
#include <stdlib.h>
int var[10];  /* Uninitialized so in .bss */
int main ()
{
   var[0] = 20  /* **Initialized, where this 'var' will be ?** */
}

Question Answer

The reason is to reduce program size. Imagine that your C program runs on an embedded system, where the code and all constants are saved in true ROM (flash memory). In such systems, an initial “copy-down” must be executed to set all static storage duration objects, before main() is called. It will typically go like this pseudo:

for(i=0; i<all_explicitly_initialized_objects; i++)
{
  .data[i] = init_value[i];
}

memset(.bss, 
       0, 
       all_implicitly_initialized_objects);

Where .data and .bss are stored in RAM, but init_value is stored in ROM. If it had been one segment, then the ROM had to be filled up with a lot of zeroes, increasing ROM size significantly.

RAM-based executables work similarly, though of course they have no true ROM.

Also, memset is likely some very efficient inline assembler, meaning that the startup copy-down can be executed faster.

The .bss segment is an optimization. The entire .bss segment is described by a single number, probably 4 bytes or 8 bytes, that gives its size in the running process, whereas the .data section is as big as the sum of sizes of the initialized variables. Thus, the .bss makes the executables smaller and quicker to load. Otherwise, the variables could be in the .data segment with explicit initialization to zeroes; the program would be hard-pressed to tell the difference. (In detail, the address of the objects in .bss would probably be different from the address if it was in the .data segment.)

In the first program, a would be in the .data segment and b would be in the .bss segment of the executable. Once the program is loaded, the distinction becomes immaterial. At run time, b occupies 20 * sizeof(int) bytes.

In the second program, var is allocated space and the assignment in main() modifies that space. It so happens that the space for var was described in the .bss segment rather than the .data segment, but that doesn’t affect the way the program behaves when running.

From Assembly Language Step-by-Step: Programming with Linux by Jeff Duntemann, regarding the .data section:

The .data section contains data definitions of initialized data items. Initialized
data is data that has a value before the program begins running. These values
are part of the executable file. They are loaded into memory when the
executable file is loaded into memory for execution.

The important thing to remember about the .data section is that the
more initialized data items you define, the larger the executable file
will be, and the longer it will take to load it from disk into memory
when you run it.

and the .bss section:

Not all data items need to have values before the program begins running.
When you’re reading data from a disk file, for example, you need to have a
place for the data to go after it comes in from disk. Data buffers like that are
defined in the .bss section of your program. You set aside some number of
bytes for a buffer and give the buffer a name, but you don’t say what values
are to be present in the buffer.

There’s a crucial difference between data items defined in the .data
section and data items defined in the .bss section: data items in the
.data section add to the size of your executable file. Data items in
the .bss section do not. A buffer that takes up 16,000 bytes (or more,
sometimes much more) can be defined in .bss and add almost nothing
(about 50 bytes for the description) to the executable file size.

Well, first of all, those variables in your example aren’t uninitialized; C specifies that static variables not otherwise initialized are initialized to 0.

So the reason for .bss is to have smaller executables, saving space and allowing faster loading of the program, as the loader can just allocate a bunch of zeroes instead of having to copy the data from disk.

When running the program, the program loader will load .data and .bss into memory. Writes into objects residing in .data or .bss thus only go to memory, they are not flushed to the binary on disk at any point.

The System V ABI 4.1 (1997) (AKA ELF specification) also contains the answer:

.bss This section holds uninitialized data that contribute to the
program’s memory image. By definition, the system initializes the
data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.

says that the section name .bss is reserved and has special effects, in particular it occupies no file space, thus the advantage over .data.

The downside is of course that all bytes must be set to 0 when the OS puts them on memory, which is more restrictive, but a common use case, and works fine for uninitialized variables.

The SHT_NOBITS section type documentation repeats that affirmation:

sh_size This member gives the section’s size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size
bytes in the file. A section of type SHT_NOBITS may have a non-zero
size, but it occupies no space in the file.

The C standard says nothing about sections, but we can easily verify where the variable is stored in Linux with objdump and readelf, and conclude that uninitialized globals are in fact stored in the .bss. See for example this answer: What happens to a declared, uninitialized variable in C?

The wikipedia article .bss provides a nice historical explanation, given that the term is from the mid-1950’s (yippee my birthday;-).

Back in the day, every bit was precious, so any method for signalling reserved empty space, was useful. This (.bss) is the one that has stuck.

.data sections are for space that is not empty, rather it will have (your) defined values entered into it.

You may also like...

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.