Stochastic Learnings

Random walk in the land of machine learning, algorithms, and software engineering.

Follow me on GitHub

Analyzing Linux object files

Object files in Linux come in four different flavors:

  • .o object file. This is the most basic object file format in Unix. There is usually a 1:1 correspondence between a source file, e.g. in C, C++, assembly, or Fortran.
  • .a static library. A static library in Linux is just a collection of .o object files archived into one file. That is, .o file to .a file is like .class file to .jar file in Java.
  • .so dynamic library.
  • An executable (usually in ELF format).

In this tutorial you will learn how to introspect all four types of binary files.

Dynamic dependencies

Oftentimes we need to see what other dynamic libraries a given .so file depends on. This is where ldd utility comes in handy:

$ ldd libcaffe2.so
        linux-vdso.so.1 =>  (0x00007fffc006a000)
        libc10.so => /home/sergiym/devel/libtorch/lib/libc10.so (0x00007f5381190000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f5380f80000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5380d60000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5380b40000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5380920000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5380610000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5380270000)
        libgomp-7bcb08ae.so.1 => /home/sergiym/devel/libtorch/lib/libgomp-7bcb08ae.so.1 (0x00007f5380040000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f537fc70000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5389600000)

ldd can be applied to both .so and executable files. This tool is very helpful to make sure a certain library or an application is portable, or see what dynamic dependencies are missing.

Static libraries

Static libraries in Unix are just uncompressed archives of .o files. It is possible to extract an individual object file out of archive, or to add a new file to it. For that, every Unix system as an ar command (“ar” stands for “archive”; it is very similar in usage to its tar “tape archive” cousin).

To list .o files from a static library, run ar tv (“table/verbose”), e.g.:

$ ar tv libonnx.a
rw-r--r-- 0/0  84328 Oct 22 00:20 2018 defs.cc.o
rw-r--r-- 0/0  37888 Oct 22 00:20 2018 schema.cc.o
rw-r--r-- 0/0 152472 Oct 22 00:20 2018 checker.cc.o
rw-r--r-- 0/0   9464 Oct 22 00:19 2018 assertions.cc.o
rw-r--r-- 0/0  96096 Oct 22 00:20 2018 interned_strings.cc.o
rw-r--r-- 0/0 263568 Oct 22 00:20 2018 ir_pb_converter.cc.o
rw-r--r-- 0/0   9344 Oct 22 00:20 2018 model_helpers.cc.o
rw-r--r-- 0/0  10936 Oct 22 00:19 2018 status.cc.o
rw-r--r-- 0/0  92480 Oct 22 00:20 2018 defs.cc.o
rw-r--r-- 0/0  54864 Oct 22 00:20 2018 data_type_utils.cc.o
rw-r--r-- 0/0 184264 Oct 22 00:20 2018 defs.cc.o
. . . . .

Listing symbols

nm

There are several ways to list the symbol names available in such file. The simplest one is nm command (“nm” stands for “name”). It works for all four types of object files, including executables (if their symbol table wasn’t stripped). For static libraries, nm will also print the names of .o files that contain each symbol.

A useful incantation is nm -gC --defined-only, e.g.

$ nm -gC --defined-only libcaffe2.so
0000000007f5b1e0 D backend_lib
0000000007f609c0 B __bss_start
00000000014251b0 T cblas_dcopy
0000000001422590 T cblas_dgemm_batch
0000000001422560 T cblas_dscal

. . . . .

0000000007e28780 V vtable for std::_Sp_counted_ptr_inplace<std::string, std::allocator<std::string>, (__gnu_cxx::_Lock_policy)2>
0000000007faf6e0 u onnx_torch::RegisterOneFunctionBuilder(onnx_torch::FunctionBuilder&&)::function_builder_0_status
0000000007faf430 u onnx_torch::OpSchema::all_tensor_types()::all_tensor_types
0000000007faf790 u onnx_torch::OpSchema::all_numeric_types()::all_numeric_types
0000000007faf950 u onnx_torch::OpSchema::numeric_types_for_math_reduction()::numeric_types_for_math_reduction
0000000007f60a88 u at::LegacyTypeDispatch::initForDeviceType(at::DeviceType)::cpu_once
0000000007f60a84 u at::LegacyTypeDispatch::initForDeviceType(at::DeviceType)::cuda_once
0000000007f60a80 u at::LegacyTypeDispatch::initForScalarType(at::ScalarType)::once
00000000007c6710 W at::Context::registerType(at::Backend, at::ScalarType, at::Type*)::{lambda(at::Type*)#1}::_FUN(at::Type*)

Here -C stands for demangling C++ identifiers, and -g --defined-only to print only exported symbols.

nm output are triplets of: symbol value (i.e. address), symbol type, and symbol name. Value is missing for undefined (i.e. external) symbols. man nm explains all symbol types in detail.

objdump

Another useful tool to inspect the symbol table is objdump. Comparing to nm, it can print much more useful information about each symbol and the library. Use -T to list dynamic symbols, and -C to demangle C++ names:

$ objdump -TC libcaffe2.so | head

libcaffe2.so:     file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
0000000000443f10 l    d  .init  0000000000000000              .init
0000000007dd6610 l    d  .tdata 0000000000000000              .tdata
0000000000000000      DF *UND*  0000000000000000  GLIBCXX_3.4 std::string::compare(unsigned long, unsigned long, std::string const&) const
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 dlerror
0000000000000000      DF *UND*  0000000000000000  GLIBCXX_3.4 std::string::replace(unsigned long, unsigned long, char const*, unsigned long)
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 strdup

readelf

If shared library or executable is in ELF format (which is the case in Linux), readelf command can be useful, too. -Ws options stand for printing symbols in wide format.

$ readelf -Ws libcaffe2.so | head

Symbol table '.dynsym' contains 32333 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000443f10     0 SECTION LOCAL  DEFAULT    6
     2: 0000000007dd6610     0 SECTION LOCAL  DEFAULT   14
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNKSs7compareEmmRKSs@GLIBCXX_3.4 (2)
     4: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND dlerror@GLIBC_2.2.5 (3)
     5: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSs7replaceEmmPKcm@GLIBCXX_3.4 (2)
     6: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strdup@GLIBC_2.2.5 (4)

I don’t know if readelf has options to demangle C++ names or filter symbols by type.