URI:
       tgxxint_15.html - plan9port - [fork] Plan 9 from user space
       ---
       tgxxint_15.html (11113B)
       ---
            1 <HTML>
            2 <HEAD>
            3 <!-- This HTML file has been created by texi2html 1.52
            4      from gxxint.texi on 27 August 1999 -->
            5 
            6 <TITLE>G++ internals - Mangling</TITLE>
            7 </HEAD>
            8 <BODY>
            9 Go to the <A HREF="gxxint_1.html">first</A>, <A HREF="gxxint_14.html">previous</A>, <A HREF="gxxint_16.html">next</A>, <A HREF="gxxint_16.html">last</A> section, <A HREF="gxxint_toc.html">table of contents</A>.
           10 <P><HR><P>
           11 
           12 
           13 <H2><A NAME="SEC20" HREF="gxxint_toc.html#TOC20">Function name mangling for C++ and Java</A></H2>
           14 
           15 <P>
           16 Both C++ and Java provide overloaded functions and methods,
           17 which are methods with the same name but different parameter lists.
           18 Selecting the correct version is done at compile time.
           19 Though the overloaded functions have the same name in the source code,
           20 they need to be translated into different assembler-level names,
           21 since typical assemblers and linkers cannot handle overloading.
           22 This process of encoding the parameter types with the method name
           23 into a unique name is called <EM>name mangling</EM>.  The inverse
           24 process is called <EM>demangling</EM>.
           25 
           26 </P>
           27 <P>
           28 It is convenient that C++ and Java use compatible mangling schemes,
           29 since that makes life easier for tools such as gdb, and it eases
           30 integration between C++ and Java.
           31 
           32 </P>
           33 <P>
           34 Note there is also a standard "Java Native Interface" (JNI) which
           35 implements a different calling convention, and uses a different
           36 mangling scheme.  The JNI is a rather abstract ABI so Java can call methods
           37 written in C or C++; 
           38 we are concerned here about a lower-level interface primarily
           39 intended for methods written in Java, but that can also be used for C++
           40 (and less easily C).
           41 
           42 </P>
           43 
           44 
           45 <H3><A NAME="SEC21" HREF="gxxint_toc.html#TOC21">Method name mangling</A></H3>
           46 
           47 <P>
           48 C++ mangles a method by emitting the function name, followed by <CODE>__</CODE>,
           49 followed by encodings of any method qualifiers (such as <CODE>const</CODE>),
           50 followed by the mangling of the method's class,
           51 followed by the mangling of the parameters, in order.
           52 
           53 </P>
           54 <P>
           55 For example <CODE>Foo::bar(int, long) const</CODE> is mangled
           56 as <SAMP>`bar__C3Fooil'</SAMP>.
           57 
           58 </P>
           59 <P>
           60 For a constructor, the method name is left out.
           61 That is, <CODE>Foo::Foo(int, long)</CODE> is mangled
           62 as <SAMP>`__3Fooil'</SAMP>.
           63 
           64 </P>
           65 <P>
           66 GNU Java does the same.
           67 
           68 </P>
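<P>
The ordering rule above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not code from G++:
the helper name <CODE>mangle_method</CODE> and its parameters are hypothetical,
the class is assumed to be a simple (unqualified) name, and the parameter
types are assumed to be already encoded.
</P>

```python
def mangle_method(name, class_name, param_codes, const=False, constructor=False):
    """Assemble a mangled method name (illustrative helper, not G++ source)."""
    # Constructors leave the method name out, so the result starts with "__".
    prefix = "" if constructor else name
    # 'C' is the qualifier code for a const member function.
    qualifiers = "C" if const else ""
    # A simple class name is its length followed by its characters.
    encoded_class = "%d%s" % (len(class_name), class_name)
    return prefix + "__" + qualifiers + encoded_class + "".join(param_codes)


print(mangle_method("bar", "Foo", ["i", "l"], const=True))  # bar__C3Fooil
```

<P>
Passing <CODE>constructor=True</CODE> drops the method name, so the
result begins directly with the two underscores.
</P>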
           69 
           70 
           71 <H3><A NAME="SEC22" HREF="gxxint_toc.html#TOC22">Primitive types</A></H3>
           72 
           73 <P>
           74 The C++ types <CODE>int</CODE>, <CODE>long</CODE>, <CODE>short</CODE>, <CODE>char</CODE>,
           75 and <CODE>long long</CODE> are mangled as <SAMP>`i'</SAMP>, <SAMP>`l'</SAMP>,
           76 <SAMP>`s'</SAMP>, <SAMP>`c'</SAMP>, and <SAMP>`x'</SAMP>, respectively.
           77 The corresponding unsigned types have <SAMP>`U'</SAMP> prefixed
           78 to the mangling.  The type <CODE>signed char</CODE> is mangled <SAMP>`Sc'</SAMP>.
           79 
           80 </P>
           81 <P>
           82 The C++ and Java floating-point types <CODE>float</CODE> and <CODE>double</CODE>
           83 are mangled as <SAMP>`f'</SAMP> and <SAMP>`d'</SAMP> respectively.
           84 
           85 </P>
           86 <P>
           87 The C++ <CODE>bool</CODE> type and the Java <CODE>boolean</CODE> type are
           88 mangled as <SAMP>`b'</SAMP>.
           89 
           90 </P>
           91 <P>
           92 The C++ <CODE>wchar_t</CODE> and the Java <CODE>char</CODE> types are
           93 mangled as <SAMP>`w'</SAMP>.
           94 
           95 </P>
           96 <P>
           97 The Java integral types <CODE>byte</CODE>, <CODE>short</CODE>, <CODE>int</CODE>
           98 and <CODE>long</CODE> are mangled as <SAMP>`c'</SAMP>, <SAMP>`s'</SAMP>, <SAMP>`i'</SAMP>,
           99 and <SAMP>`x'</SAMP>, respectively.
          100 
          101 </P>
          102 <P>
          103 C++ code that has included <CODE>javatypes.h</CODE> will mangle
          104 the typedefs <CODE>jbyte</CODE>, <CODE>jshort</CODE>, <CODE>jint</CODE>,
          105 and <CODE>jlong</CODE> as <SAMP>`c'</SAMP>, <SAMP>`s'</SAMP>, <SAMP>`i'</SAMP>,
          106 and <SAMP>`x'</SAMP>, respectively.  (This has not been implemented yet.)
          107 
          108 </P>
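<P>
The primitive-type codes listed in this section amount to two lookup tables,
one per language.  The following Python sketch restates them; the table and
function names are hypothetical, and type names are assumed to be spelled
out in full (for example <CODE>unsigned int</CODE>, not <CODE>unsigned</CODE>).
</P>

```python
# Codes taken from the text above; the names of these tables are illustrative.
CXX_CODES = {"int": "i", "long": "l", "short": "s", "char": "c",
             "long long": "x", "float": "f", "double": "d",
             "bool": "b", "wchar_t": "w"}
JAVA_CODES = {"byte": "c", "short": "s", "int": "i", "long": "x",
              "float": "f", "double": "d", "boolean": "b", "char": "w"}
# Only these C++ types have an unsigned counterpart with a 'U' prefix.
UNSIGNED_OK = {"int", "long", "short", "char", "long long"}


def mangle_cxx_primitive(type_name):
    """Return the mangling code of a C++ primitive type."""
    if type_name == "signed char":
        return "Sc"
    if type_name.startswith("unsigned "):
        base = type_name[len("unsigned "):]
        if base not in UNSIGNED_OK:
            raise ValueError("no unsigned form for " + base)
        return "U" + CXX_CODES[base]
    return CXX_CODES[type_name]


def mangle_java_primitive(type_name):
    """Return the mangling code of a Java primitive type."""
    return JAVA_CODES[type_name]


print(mangle_cxx_primitive("unsigned long"))  # Ul
print(mangle_java_primitive("long"))          # x
```

<P>
Note that the Java <CODE>long</CODE> shares its code with the C++
<CODE>long long</CODE>, and the Java <CODE>char</CODE> with the C++
<CODE>wchar_t</CODE>, which is why two tables are needed.
</P>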
          109 
          110 
          111 <H3><A NAME="SEC23" HREF="gxxint_toc.html#TOC23">Mangling of simple names</A></H3>
          112 
          113 <P>
          114 A simple class, package, template, or namespace name is
          115 encoded as the number of characters in the name, followed by
          116 the actual characters.  Thus the class <CODE>Foo</CODE>
          117 is encoded as <SAMP>`3Foo'</SAMP>.
          118 
          119 </P>
          120 <P>
          121 If any of the characters in the name are not alphanumeric
          122 (i.e. not one of the standard ASCII letters, digits, or '_'),
          123 or the initial character is a digit, then the name is
          124 mangled as a sequence of encoded Unicode letters.
          125 A Unicode encoding starts with a <SAMP>`U'</SAMP> to indicate
          126 that Unicode escapes are used, followed by the number of
          127 bytes used by the Unicode encoding, followed by the bytes
          128 representing the encoding.  ASCII letters and
          129 non-initial digits are encoded without change.  However, all
          130 other characters (including underscore and initial digits) are
          131 translated into a sequence starting with an underscore,
          132 followed by the big-endian 4-hex-digit lower-case encoding of the character.
          133 
          134 </P>
          135 <P>
          136 If a method name contains Unicode-escaped characters, the
          137 entire mangled method name is followed by a <SAMP>`U'</SAMP>.
          138 
          139 </P>
          140 <P>
          141 For example, the method <CODE>X\u0319::M\u002B(int)</CODE> is encoded as
          142 <SAMP>`M_002b__U6X_0319iU'</SAMP>.
          143 
          144 </P>
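<P>
The simple-name rules of this section can be sketched as follows.  This is
a minimal Python illustration, not the G++ implementation: the function
names are hypothetical, names are assumed to be non-empty, and characters
outside the 16-bit range (which need more than four hex digits) are not
handled.
</P>

```python
import string

ASCII_LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def needs_unicode_mangling(name):
    """True if the name has a non-alphanumeric character or a leading digit."""
    plain = ASCII_LETTERS | DIGITS | {"_"}
    return name[0] in DIGITS or any(ch not in plain for ch in name)


def escape_name(name):
    """Keep ASCII letters and non-initial digits; escape everything else."""
    out = []
    for index, ch in enumerate(name):
        if ch in ASCII_LETTERS or (ch in DIGITS and index > 0):
            out.append(ch)
        else:
            # Underscore followed by four lower-case hex digits.
            out.append("_%04x" % ord(ch))
    return "".join(out)


def mangle_simple_name(name):
    """Encode a simple class, package, template, or namespace name."""
    if needs_unicode_mangling(name):
        escaped = escape_name(name)
        return "U%d%s" % (len(escaped), escaped)
    return "%d%s" % (len(name), name)


print(mangle_simple_name("Foo"))       # 3Foo
print(mangle_simple_name("X\u0319"))   # U6X_0319
# Method name, class, one int parameter, then the trailing 'U' marker:
print(escape_name("M+") + "__" + mangle_simple_name("X\u0319") + "i" + "U")
```

<P>
The last line reproduces the example above: the method name is escaped
without a count, and the trailing <CODE>U</CODE> records that the method
name contains escapes.
</P>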
          145 
          146 
          147 <H3><A NAME="SEC24" HREF="gxxint_toc.html#TOC24">Pointer and reference types</A></H3>
          148 
          149 <P>
          150 A C++ pointer type is mangled as <SAMP>`P'</SAMP> followed by the
          151 mangling of the type pointed to.
          152 
          153 </P>
          154 <P>
          155 A C++ reference type is mangled as <SAMP>`R'</SAMP> followed by the
          156 mangling of the type referenced.
          157 
          158 </P>
          159 <P>
          160 A Java object reference type is equivalent
          161 to a C++ pointer parameter, so we mangle such a parameter type
          162 as <SAMP>`P'</SAMP> followed by the mangling of the class name.
          163 
          164 </P>
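<P>
Because pointer and reference codes are plain prefixes, they compose by
concatenation.  A short Python sketch, with hypothetical helper names and
simple (unqualified) class names assumed:
</P>

```python
def mangle_class(name):
    """Length-prefixed encoding of a simple class name."""
    return "%d%s" % (len(name), name)


def mangle_pointer(pointee_code):
    """'P' followed by the mangling of the type pointed to."""
    return "P" + pointee_code


def mangle_reference(referent_code):
    """'R' followed by the mangling of the type referenced."""
    return "R" + referent_code


def mangle_java_object_param(class_name):
    """A Java object reference is mangled like a C++ pointer to the class."""
    return mangle_pointer(mangle_class(class_name))


print(mangle_pointer(mangle_pointer("c")))  # PPc   (pointer to pointer to char)
print(mangle_reference("i"))                # Ri    (reference to int)
print(mangle_java_object_param("Foo"))      # P3Foo
```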
          165 
          166 
          167 <H3><A NAME="SEC25" HREF="gxxint_toc.html#TOC25">Qualified names</A></H3>
          168 
          169 <P>
          170 Both C++ and Java allow a class to be lexically nested inside another
          171 class.  C++ also supports namespaces (not yet implemented by G++).
          172 Java also supports packages.
          173 
          174 </P>
          175 <P>
          176 These are all mangled the same way:  First the letter <SAMP>`Q'</SAMP>
          177 indicates that we are emitting a qualified name.
          178 That is followed by the number of parts in the qualified name.
          179 If that number is 9 or less, it is emitted with no delimiters.
          180 Otherwise, an underscore is written before and after the count.
          181 Then follows each part of the qualified name, as described above.
          182 
          183 </P>
          184 <P>
          185 For example <CODE>Foo::\u0319::Bar</CODE> is encoded as
          186 <SAMP>`Q33FooU5_03193Bar'</SAMP>.
          187 
          188 </P>
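<P>
The counting rule for qualified names can be sketched as below.  The
function name <CODE>mangle_qualified</CODE> is hypothetical, and each part
is assumed to have been encoded already as a simple name.
</P>

```python
def mangle_qualified(encoded_parts):
    """Join already-encoded name parts into a 'Q' qualified name."""
    count = len(encoded_parts)
    # Counts above 9 are delimited by underscores; smaller ones are bare.
    count_text = "_%d_" % count if count > 9 else str(count)
    return "Q" + count_text + "".join(encoded_parts)


print(mangle_qualified(["3Foo", "U5_0319", "3Bar"]))  # Q33FooU5_03193Bar
print(mangle_qualified(["1a"] * 10))                  # Q_10_1a1a1a1a1a1a1a1a1a1a
```

<P>
The underscores around a multi-digit count keep it from running into the
length prefix of the first part.
</P>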
          189 
          190 
          191 <H3><A NAME="SEC26" HREF="gxxint_toc.html#TOC26">Templates</A></H3>
          192 
          193 <P>
          194 A class template instantiation is encoded as the letter <SAMP>`t'</SAMP>,
          195 followed by the encoding of the template name, followed by
          196 the number of template parameters, followed by the encoding of the template
          197 parameters.  If a template parameter is a type, it is written
          198 as a <SAMP>`Z'</SAMP> followed by the encoding of the type.
          199 
          200 </P>
          201 <P>
          202 A function template specialization (either an instantiation or an
          203 explicit specialization) is encoded by an <SAMP>`H'</SAMP> followed by the
          204 encoding of the template parameters, as described above, followed by 
          205 an <SAMP>`_'</SAMP>, the encoding of the argument types of the template function (not the
          206 specialization), another <SAMP>`_'</SAMP>, and the return type.  (Like the
          207 argument types, the return type is the return type of the function
          208 template, not the specialization.)  Template parameters in the argument
          209 and return types are encoded by an <SAMP>`X'</SAMP> for type parameters, or a
          210 <SAMP>`Y'</SAMP> for constant parameters, and an index indicating their position
          211 in the template parameter list declaration.
          212 
          213 </P>
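<P>
The class template rule at the start of this section can be sketched as
follows.  This covers type parameters only; the helper name and the
template name <CODE>Box</CODE> are hypothetical, and function template
specializations and parameter counts above 9 are deliberately left out.
</P>

```python
def mangle_class_template(template_name, type_arg_codes):
    """Encode a class template instantiation whose parameters are all types."""
    encoded_name = "%d%s" % (len(template_name), template_name)
    # Each type parameter is written as 'Z' followed by the encoded type.
    params = "".join("Z" + code for code in type_arg_codes)
    return "t" + encoded_name + str(len(type_arg_codes)) + params


print(mangle_class_template("Box", ["i"]))        # t3Box1Zi
print(mangle_class_template("Box", ["i", "Pc"]))  # t3Box2ZiZPc
```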
          214 
          215 
          216 <H3><A NAME="SEC27" HREF="gxxint_toc.html#TOC27">Arrays</A></H3>
          217 
          218 <P>
          219 C++ array types are mangled by emitting <SAMP>`A'</SAMP>, followed by
          220 the length of the array, followed by an <SAMP>`_'</SAMP>, followed by
          221 the mangling of the element type.  Of course, normally
          222 array parameter types decay into pointer types, so you
          223 don't see this.
          224 
          225 </P>
          226 <P>
          227 Java arrays are objects.  A Java type <CODE>T[]</CODE> is mangled
          228 as if it were the C++ type <CODE>JArray&#60;T&#62;</CODE>.
          229 For example <CODE>java.lang.String[]</CODE> is encoded as
          230 <SAMP>`Pt6JArray1ZPQ34java4lang6String'</SAMP>.
          231 
          232 </P>
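<P>
Both array rules can be sketched in Python.  The helper names are
hypothetical, element types are assumed to be already encoded, and the
C++ case follows the wording above literally (the array length is
emitted as given).
</P>

```python
def mangle_cxx_array(length, element_code):
    """'A', the array length, '_', then the element type."""
    return "A%d_%s" % (length, element_code)


def mangle_java_array(element_code):
    """Mangle Java T[] as a pointer to the one-parameter template JArray."""
    # 't' + "6JArray" + one type parameter ('Z' + element), behind a 'P'.
    return "P" + "t6JArray" + "1" + "Z" + element_code


print(mangle_cxx_array(10, "i"))                  # A10_i
print(mangle_java_array("PQ34java4lang6String"))  # Pt6JArray1ZPQ34java4lang6String
```

<P>
The element type of <CODE>java.lang.String[]</CODE> is itself an object
reference, which is why its code already starts with <CODE>P</CODE>.
</P>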
          233 
          234 
          235 <H3><A NAME="SEC28" HREF="gxxint_toc.html#TOC28">Table of demangling code characters</A></H3>
          236 
          237 <P>
          238 The following special characters are used in mangling:
          239 
          240 </P>
          241 <DL COMPACT>
          242 
          243 <DT><SAMP>`A'</SAMP>
          244 <DD>
          245 Indicates a C++ array type.
          246 
          247 <DT><SAMP>`b'</SAMP>
          248 <DD>
          249 Encodes the C++ <CODE>bool</CODE> type,
          250 and the Java <CODE>boolean</CODE> type.
          251 
          252 <DT><SAMP>`c'</SAMP>
          253 <DD>
          254 Encodes the C++ <CODE>char</CODE> type, and the Java <CODE>byte</CODE> type.
          255 
          256 <DT><SAMP>`C'</SAMP>
          257 <DD>
          258 A modifier to indicate a <CODE>const</CODE> type.
          259 Also used to indicate a <CODE>const</CODE> member function
          260 (in which case it precedes the encoding of the method's class).
          261 
          262 <DT><SAMP>`d'</SAMP>
          263 <DD>
          264 Encodes the C++ and Java <CODE>double</CODE> types.
          265 
          266 <DT><SAMP>`e'</SAMP>
          267 <DD>
          268 Indicates extra unknown arguments <CODE>...</CODE>.
          269 
          270 <DT><SAMP>`f'</SAMP>
          271 <DD>
          272 Encodes the C++ and Java <CODE>float</CODE> types.
          273 
          274 <DT><SAMP>`F'</SAMP>
          275 <DD>
          276 Used to indicate a function type.
          277 
          278 <DT><SAMP>`H'</SAMP>
          279 <DD>
          280 Used to indicate a template function.
          281 
          282 <DT><SAMP>`i'</SAMP>
          283 <DD>
          284 Encodes the C++ and Java <CODE>int</CODE> types.
          285 
          286 <DT><SAMP>`J'</SAMP>
          287 <DD>
          288 Indicates a complex type.
          289 
          290 <DT><SAMP>`l'</SAMP>
          291 <DD>
          292 Encodes the C++ <CODE>long</CODE> type.
          293 
          294 <DT><SAMP>`P'</SAMP>
          295 <DD>
          296 Indicates a pointer type.  Followed by the type pointed to.
          297 
          298 <DT><SAMP>`Q'</SAMP>
          299 <DD>
          300 Used to mangle qualified names, which arise from nested classes.
          301 Should also be used for namespaces (?).
          302 In Java used to mangle package-qualified names, and inner classes.
          303 
          304 <DT><SAMP>`r'</SAMP>
          305 <DD>
          306 Encodes the GNU C++ <CODE>long double</CODE> type.
          307 
          308 <DT><SAMP>`R'</SAMP>
          309 <DD>
          310 Indicates a reference type.  Followed by the referenced type.
          311 
          312 <DT><SAMP>`s'</SAMP>
          313 <DD>
          314 Encodes the C++ and Java <CODE>short</CODE> types.
          315 
          316 <DT><SAMP>`S'</SAMP>
          317 <DD>
          318 A modifier that indicates that the following integer type is signed.
          319 Only used with <CODE>char</CODE>.
          320 
          321 Also used as a modifier to indicate a static member function.
          322 
          323 <DT><SAMP>`t'</SAMP>
          324 <DD>
          325 Indicates a template instantiation.
          326 
          327 <DT><SAMP>`T'</SAMP>
          328 <DD>
          329 A back reference to a previously seen type.
          330 
          331 <DT><SAMP>`U'</SAMP>
          332 <DD>
          333 A modifier that indicates that the following integer type is unsigned.
          334 Also used to indicate that the following class or namespace name
          335 is encoded using Unicode-mangling.
          336 
          337 <DT><SAMP>`v'</SAMP>
          338 <DD>
          339 Encodes the C++ and Java <CODE>void</CODE> types.
          340 
          341 <DT><SAMP>`V'</SAMP>
          342 <DD>
          343 A modifier for a <CODE>volatile</CODE> type or method.
          344 
          345 <DT><SAMP>`w'</SAMP>
          346 <DD>
          347 Encodes the C++ <CODE>wchar_t</CODE> type, and the Java <CODE>char</CODE> type.
          348 
          349 <DT><SAMP>`x'</SAMP>
          350 <DD>
          351 Encodes the GNU C++ <CODE>long long</CODE> type, and the Java <CODE>long</CODE> type.
          352 
          353 <DT><SAMP>`X'</SAMP>
          354 <DD>
          355 Encodes a template type parameter, when part of a function type.
          356 
          357 <DT><SAMP>`Y'</SAMP>
          358 <DD>
          359 Encodes a template constant parameter, when part of a function type.
          360 
          361 <DT><SAMP>`Z'</SAMP>
          362 <DD>
          363 Used for template type parameters. 
          364 
          365 </DL>
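<P>
To show how the table is used in the inverse direction, here is a
deliberately tiny demangler sketch in Python.  It is an illustration only:
the function name is hypothetical, and it understands nothing beyond a
simple class name, an optional <CODE>C</CODE> qualifier, and single-letter
C++ primitive parameter codes.
</P>

```python
import re

# C++ readings of the single-letter primitive codes from the table.
PRIMITIVES = {"i": "int", "l": "long", "s": "short", "c": "char",
              "x": "long long", "f": "float", "d": "double",
              "b": "bool", "w": "wchar_t", "v": "void"}


def demangle_simple(symbol):
    """Demangle names of the form name__[C]<len><class><primitive codes>."""
    name, rest = symbol.split("__", 1)
    is_const = rest.startswith("C")
    if is_const:
        rest = rest[1:]
    match = re.match(r"\d+", rest)
    if match is None:
        raise ValueError("expected a length-prefixed class name")
    length = int(match.group())
    start = match.end()
    class_name = rest[start:start + length]
    codes = rest[start + length:]
    params = ", ".join(PRIMITIVES[code] for code in codes)
    # An empty method name means the symbol is a constructor.
    method = name or class_name
    suffix = " const" if is_const else ""
    return "%s::%s(%s)%s" % (class_name, method, params, suffix)


print(demangle_simple("bar__C3Fooil"))  # Foo::bar(int, long) const
```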
          366 
          367 <P>
          368 The letters <SAMP>`G'</SAMP>, <SAMP>`M'</SAMP>, <SAMP>`O'</SAMP>, and <SAMP>`p'</SAMP>
          369 also seem to be used for obscure purposes ...
          370 
          371 </P>
          372 <P><HR><P>
          373 Go to the <A HREF="gxxint_1.html">first</A>, <A HREF="gxxint_14.html">previous</A>, <A HREF="gxxint_16.html">next</A>, <A HREF="gxxint_16.html">last</A> section, <A HREF="gxxint_toc.html">table of contents</A>.
          374 </BODY>
          375 </HTML>