Issue 250407.1: Associating allocator call sites with type information
| Author: | Jann Horn |
|---|---|
| Champion: | David Blaikie |
| Date submitted: | 2025-04-07 |
| Date revised: | 2025-07-07 |
| Date closed: | 2025-06-23 |
| Type: | Enhancement |
| Status: | Accepted |
| DWARF version: | 6 |
Background
If it were possible to query the DWARF type of a heap-allocated object given its address, or to list the addresses of all heap objects with a given DWARF type, that would be useful in several cases:
- Most obviously for heap profiling ("how many objects of type X are there, and what do their contents look like").
- For associating Intel PEBS / AMD IBS hardware performance events related to data caches with type information (answering questions like "for this struct type, how often is each member accessed, and which members are often cache-cold on access").
- For associating memory access traces with type information.
- Maybe for adding type information to ASAN use-after-free reports.
Some memory allocators already have support for recording from where
the allocator was called for each object; either by saving the
instruction address from which the allocator was called (with
__builtin_return_address(0), for example in the Linux SLUB allocator
with SLAB_STORE_USER), or by saving an entire stack trace (for example
in ASAN).
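As a rough illustration of the first approach, the sketch below wraps malloc() and stores the caller's return address in a header in front of each object. It assumes GCC or Clang (for __builtin_return_address and the noinline attribute); the names alloc_header, traced_alloc, traced_call_site and traced_free are hypothetical and do not correspond to any existing allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-object header. The union with max_align_t keeps the
 * payload that follows the header suitably aligned for any object type. */
union alloc_header {
    struct {
        void  *call_site; /* return address of the call into traced_alloc() */
        size_t size;      /* requested payload size in bytes */
    } info;
    max_align_t align;
};

/* noinline keeps the allocator a real call, so the return address
 * identifies the call instruction in the caller. */
__attribute__((noinline))
void *traced_alloc(size_t size)
{
    union alloc_header *hdr;

    if (size > SIZE_MAX - sizeof(*hdr))
        return NULL; /* header plus payload would overflow size_t */
    hdr = malloc(sizeof(*hdr) + size);
    if (hdr == NULL)
        return NULL;
    hdr->info.call_site = __builtin_return_address(0);
    hdr->info.size = size;
    return hdr + 1; /* payload starts right after the header */
}

/* Return the call site recorded for an object from traced_alloc(). */
void *traced_call_site(const void *obj)
{
    const union alloc_header *hdr = (const union alloc_header *)obj - 1;
    return hdr->info.call_site;
}

/* Free an object from traced_alloc(); NULL is accepted like free(). */
void traced_free(void *obj)
{
    if (obj != NULL)
        free((union alloc_header *)obj - 1);
}
```

A profiler built on such a wrapper would call traced_call_site() on a live object and then look the resulting address up in the debugging information.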
The missing link for associating a heap object with a DWARF type is a link between the callsite address and the DWARF type.
LLVM IR and Microsoft CodeView debuginfo already have this feature;
LLVM IR calls it "heapallocsite", CodeView calls it "S_HEAPALLOCSITE".
It may make sense to choose a similar name in DWARF
("DW_AT_heapallocsite"). An alternative that more closely aligns with
DWARF conventions might be something like "DW_AT_call_allocated_type".
[Committee feedback: DW_AT_alloc_type might provide a more general name
that could allow the attribute to be used in other places in the future,
without consuming more attribute numbering space.]
A compiler might sometimes only have unreliable indicators of the type
of an allocation (like when malloc() is called with a size argument
derived from sizeof() in a non-trivial way), so I think the compiler
should be allowed to make a reasonable guess when no reliable type
information is available.
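The difference between reliable and unreliable indicators can be sketched with a few C call sites. The types and function names below are made up for illustration, and the comments describe what a producer might plausibly infer, not what any particular compiler does.

```c
#include <stddef.h>
#include <stdlib.h>

struct point {
    int x, y;
};

struct packet {
    size_t        len;
    unsigned char payload[]; /* flexible array member */
};

/* Reliable indicator: the size is exactly sizeof(struct point) and the
 * result is used as a struct point *. */
struct point *make_point(void)
{
    return malloc(sizeof(struct point));
}

/* Less reliable indicator: the size mixes sizeof() with a run-time
 * length. A producer could reasonably guess struct packet here. */
struct packet *make_packet(size_t payload_len)
{
    struct packet *p = malloc(sizeof(*p) + payload_len);

    if (p != NULL)
        p->len = payload_len;
    return p;
}

/* No indicator: the size is an opaque byte count and the result stays
 * a void *, so a producer has little basis for naming a type. */
void *make_buffer(size_t nbytes)
{
    return malloc(nbytes);
}
```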
The attribute would primarily be used for calls to subroutines that
allocate individual objects (like C++ new or C malloc()); but it
might also be used, for example, when calling a subroutine that sets
up a type-specialized allocator instance (like
__kmem_cache_create_args() in Linux).
The attribute would mostly be attached to DW_TAG_call_site in
practice, so that recording the return address in the allocator is
sufficient. However, it could work similarly with
DW_TAG_inlined_subroutine if the allocator records the current
instruction pointer when it is inlined.
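On the consumer side, the lookup this enables can be sketched as a table search keyed by the recorded return address. The sketch assumes the consumer has already read the debugging information and flattened each annotated call site into a pair of return address (for example from the call site's DW_AT_call_return_pc) and type name, sorted by address; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One annotated call site, as a consumer might flatten it after
 * parsing the debugging information (hypothetical layout). */
struct call_site_type {
    uintptr_t   return_pc; /* address the allocator call returns to */
    const char *type_name; /* name of the type the attribute references */
};

/* Binary search over a table sorted by ascending return_pc.
 * Returns the type name, or NULL if the address has no entry. */
const char *lookup_alloc_type(const struct call_site_type *table,
                              size_t count, uintptr_t return_pc)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].return_pc == return_pc)
            return table[mid].type_name;
        if (table[mid].return_pc < return_pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}
```

An exact-match search fits the call site case, where the allocator records a return address. For an inlined allocator that records the current instruction pointer, a consumer would instead need a range lookup over the inlined subroutine's address ranges.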
If an allocator subroutine is expected to reliably record type information with this mechanism, the compiler will probably have to disable tail calls and, depending on implementation, also inlining for calls to the allocator subroutine.
Overview
Add a new DWARF attribute DW_AT_alloc_type that
can be attached to a DW_TAG_call_site (or DW_TAG_inlined_subroutine)
for calls to allocator subroutines (or subroutines with similar
semantics). The value of the attribute is essentially the type
allocated by the callee when called from the callsite.
Proposed Changes
In section "2.2 Attribute Types", Table 2.2 "Attribute names", add the following entry:
Attribute: DW_AT_alloc_type
Identifies or Specifies: Type allocated at call site
In section "7.5.4 Attribute Encodings", Table 7.5 "Attribute encodings", add the following entry:
Attribute name: DW_AT_alloc_type
Value: <next available ID>
Classes: reference
At the end of section "3.4.1 Call Site Entries", append these paragraphs:
The call site may have a DW_AT_alloc_type attribute referencing a debugging information entry for the type that the type-agnostic callee operates on when called from this call site. In particular, if the callee's primary purpose is to allocate memory, it refers to the type of the allocated object. The referenced type may be a reasonable guess if no reliable type information is available.
This attribute should only be used when either the callee is a memory allocation subroutine or the programmer has requested its use at a specific call site or for calls to a specific subroutine.
In section "3.3.8.2 Concrete Instances", before the paragraph starting
with "An inlined subroutine entry may have a DW_AT_const_expr
attribute", insert this paragraph:
An inlined subroutine entry may also have a DW_AT_alloc_type attribute referencing a debugging information entry for a type, which is interpreted in the same way as for a DW_TAG_call_site.
In Appendix A ("Attributes by Tag Value (Informative)"), Table A.1
("Attributes by tag value"), add applicable attribute
DW_AT_alloc_type to entries:
- DW_TAG_call_site
- DW_TAG_inlined_subroutine
References
- clang C/C++ attribute documentation for the CodeView version of this
- proposed LLVM implementation of DWARF heapallocsite information
- Microsoft documentation for their memory profiler built on S_HEAPALLOCSITE debuginfo
2025-05-13: Revised.
Change name to DW_AT_alloc_type.
Remove sentence about non-memory-allocating usage:
The meaning of this attribute, when attached to a subroutine whose primary purpose is not to allocate memory, is defined by the programmer.
2025-06-22: Revised.
2025-06-23: Accepted.
2025-07-07: Editorial change: in Appendix A, attribute name should be
DW_AT_alloc_type.