In response to a Quora question, I wrote the following:

Of course it is possible [to reverse engineer code without the source]. Let me give you a simple example.

I am just looking at a Linux executable that I created moments ago. I pretend that I don't have the source code. The executable file is called t.

First, let me disassemble the file:

objdump -d t > t.asm

And just in case, let me also dump any global data:

objdump -s -j .rodata t > t.dat

And now let's look at the result. The file t.asm is over 200 lines long, but most of it is just the standard C runtime startup and cleanup code.

The relevant bit is the main() function, which begins as follows:

00000000004004ec <main>:
  4004ec:   55                      push   %rbp
  4004ed:   48 89 e5                mov    %rsp,%rbp
  4004f0:   48 83 ec 10             sub    $0x10,%rsp

This is just the standard C function prologue. The caller's frame pointer is saved, a new frame is set up, and 16 bytes are reserved on the stack for local (automatic) variables by adjusting the stack pointer.

  4004f4:   c7 45 fc 02 00 00 00    movl   $0x2,-0x4(%rbp)

The value 2 is stored in the first variable, a 32-bit integer located 4 bytes below the frame pointer. So let me write down some C code that corresponds to this:

    int A; // First variable
    A = 2;
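
The operands can also be read straight from the instruction bytes c7 45 fc 02 00 00 00. The following is a minimal sketch, not a real disassembler: it assumes this one encoding (opcode c7, ModRM byte 45, one signed displacement byte, four little-endian immediate bytes), and the names insn, disp8, le32 and describe_movl are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Bytes of the instruction at 0x4004f4, copied from the listing. */
const uint8_t insn[] = { 0xc7, 0x45, 0xfc, 0x02, 0x00, 0x00, 0x00 };

/* Interpret one byte as a signed 8-bit displacement (two's complement). */
int disp8(uint8_t b)
{
    return b >= 0x80 ? (int)b - 256 : (int)b;
}

/* Assemble a 32-bit value from four little-endian bytes. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: print the operands of this one movl. */
void describe_movl(void)
{
    /* insn[2] is the displacement byte, insn[3..6] the immediate. */
    printf("store %u at %d(%%rbp)\n",
           (unsigned)le32(&insn[3]), disp8(insn[2]));
}
```

The displacement byte fc is -4 in two's complement, which is where -0x4(%rbp) comes from, and the immediate 02 00 00 00 is the 32-bit value 2.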

Continuing, I read:

  4004fb:   8b 45 fc                mov    -0x4(%rbp),%eax
  4004fe:   0f af 45 fc             imul   -0x4(%rbp),%eax
  400502:   89 45 f8                mov    %eax,-0x8(%rbp)

Here, the first variable is moved to the accumulator, multiplied by itself, and stored in a second variable (also a 32-bit integer). I'd write this as:

    int B; // Second variable

    B = A * A;
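
To make the mapping between the instructions and the C statement explicit, here is a small sketch that models the two stack slots and the accumulator as ordinary C variables. The names slot_a, slot_b, eax and run_fragment are my own; nothing like them exists in the executable.

```c
#include <stdint.h>

/* Model of the instructions from 0x4004f4 to 0x400502.
   Each statement mirrors one instruction in the listing. */
int32_t run_fragment(void)
{
    int32_t slot_a;   /* the 32-bit slot at -0x4(%rbp) */
    int32_t slot_b;   /* the 32-bit slot at -0x8(%rbp) */
    int32_t eax;      /* the accumulator */

    slot_a = 2;            /* movl $0x2,-0x4(%rbp)  */
    eax = slot_a;          /* mov  -0x4(%rbp),%eax  */
    eax = eax * slot_a;    /* imul -0x4(%rbp),%eax  */
    slot_b = eax;          /* mov  %eax,-0x8(%rbp)  */

    return slot_b;
}
```

Collapsing the temporary register away leaves exactly B = A * A.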

Carrying on:

  400505:   b8 0c 06 40 00          mov    $0x40060c,%eax

The address 0x40060c is loaded into the accumulator. What is at this address? This is where the second file I created, t.dat, comes in handy, as it contains a dump of the read-only data section:

Contents of section .rodata:
 400608 01000200 25640a00                    ....%d..

The dump starts at 0x400608, so 0x40060c is four bytes in. There I find the bytes 25 64 0a 00, which is the NUL-terminated string "%d\n".
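
The lookup amounts to subtracting the section's start address from the address in the instruction. A minimal sketch, with the eight bytes from t.dat typed in by hand; RODATA_BASE, rodata and rodata_at are hypothetical names of my own:

```c
#include <stddef.h>
#include <stdint.h>

/* Address of the first byte shown in the dump. */
#define RODATA_BASE 0x400608u

/* The eight bytes shown in t.dat. */
const uint8_t rodata[] = { 0x01, 0x00, 0x02, 0x00, 0x25, 0x64, 0x0a, 0x00 };

/* Return a pointer to the byte stored at a virtual address,
   or NULL if the address lies outside the dumped bytes. */
const char *rodata_at(uint32_t addr)
{
    if (addr < RODATA_BASE || addr - RODATA_BASE >= sizeof rodata)
        return NULL;
    return (const char *)&rodata[addr - RODATA_BASE];
}
```

Calling rodata_at(0x40060c) yields a pointer to the bytes '%', 'd', '\n' and the terminating zero.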

Going back to the code:

  40050a:   8b 55 f8                mov    -0x8(%rbp),%edx

The value of the second variable is loaded into the %edx register.

  40050d:   89 d6                   mov    %edx,%esi
  40050f:   48 89 c7                mov    %rax,%rdi
  400512:   b8 00 00 00 00          mov    $0x0,%eax
  400517:   e8 cc fe ff ff          callq  4003e8 <printf@plt>

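The two arguments are moved into the registers used for argument passing on x86-64 Linux: the address of the format string goes into %rdi and the second variable into %esi. The accumulator is cleared because printf is variadic and no vector registers carry arguments. Then the printf function is called. In other words, the original C code had to look something like this: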

    printf("%d\n", B);
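
The disassembler worked out the call target for me, but it can be checked by hand from the bytes e8 cc fe ff ff. The four bytes after the e8 opcode are a signed 32-bit little-endian displacement, counted from the address of the next instruction. A minimal sketch; call_rel32 and call_target are hypothetical names, and only this 5-byte form of the call instruction is assumed:

```c
#include <stdint.h>

/* Displacement bytes of the call at 0x400517, copied from the listing. */
const uint8_t call_rel32[4] = { 0xcc, 0xfe, 0xff, 0xff };

/* Target of a 5-byte relative call located at insn_addr.
   The displacement is relative to the address of the next instruction. */
uint64_t call_target(uint64_t insn_addr, const uint8_t rel[4])
{
    /* Assemble the little-endian displacement. */
    uint32_t raw = (uint32_t)rel[0] | (uint32_t)rel[1] << 8 |
                   (uint32_t)rel[2] << 16 | (uint32_t)rel[3] << 24;

    /* Sign-extend it to 64 bits without relying on conversion quirks. */
    int64_t disp = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                       : (int64_t)raw;

    return (uint64_t)((int64_t)(insn_addr + 5) + disp);
}
```

Here the displacement is -0x134 and the next instruction sits at 0x40051c, so the target is 0x4003e8, matching the address objdump printed for printf@plt.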


  40051c:   c9                      leaveq
  40051d:   c3                      retq
  40051e:   90                      nop
  40051f:   90                      nop

The stack frame is torn down and the main() function returns. The trailing nop instructions are just padding.

So fully reconstructed, the program reads:

    int A;
    int B;

    A = 2;
    B = A * A;
    printf("%d\n", B);

For comparison, this was my original source code:

#include <stdio.h>

void main(void)
{
    int i, j;

    i = 2;
    j = i * i;
    printf("%d\n", j);
}

Apart from the variable names (which are not preserved in the object file), the code was correctly reconstructed.

Now I am not trying to give the impression that it is always easy to reverse engineer code. On the contrary, it can be fiendishly difficult where real-world code is concerned, particularly because compiler optimizations can produce machine code that looks nothing like the original source. But it can be done.
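
As a small illustration of the optimization problem, consider the two functions below. They behave identically, and an optimizing compiler is generally free to turn the first into the second by evaluating the multiplication at compile time (constant folding). This is only a sketch: the function names are hypothetical, and I am assuming, not showing, that a given compiler and optimization level would actually do this to t.

```c
/* The computation as it appears in the source. */
int square_as_written(void)
{
    int i, j;

    i = 2;
    j = i * i;
    return j;
}

/* What the optimizer may effectively reduce it to:
   no variables, no multiplication, just the constant. */
int square_as_folded(void)
{
    return 4;
}
```

From a disassembly of the second form, there would be no way to tell that the source ever contained two variables and a multiplication.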