Mar 8 2013

Ender Dai

Secure web proxy with squid and chrome (a.k.a. A way to break the GFW)

The well-known Great Firewall of China (GFW) blocks access to several web services from China, such as Facebook, Twitter, and YouTube. Personally, I have no interest in Facebook or Twitter, and I hardly ever watch online video, so this is not a big problem for me. However, the GFW also causes the infamous TCP RST issue when using Google. This is unacceptable, since it dramatically reduces the productivity of every programmer.

Thanks to my wife, I got root access to an Ubuntu server hosted in the U.S. and was able to set up a proxy server to get around the GFW. The following paragraphs describe some key points of the procedure.

First of all, note that a traditional HTTP proxy setup will NOT work in this case. The data flowing between the web browser and the proxy server is not encrypted, so it is still exposed to the GFW's keyword-based filter. We must encrypt this channel to keep the data from being recognized by the keyword filter.

The well-known proxy software Squid can protect the data flow with SSL. However, this feature is not built into the squid packages shipped by some Linux distributions. To check whether your squid installation supports it:

$ squid -v | grep -- '--enable-ssl'

If squid was not configured with the option --enable-ssl, we have to build it from source with this configure option enabled.

The only vital directive in squid.conf for enabling SSL is https_port:

https_port 443 cert=/etc/squid3/limitedwish.org.crt key=/etc/squid3/limitedwish.org.key

This directive specifies the port number used by the SSL-secured data channel, as well as the certificate and its corresponding private key. For personal use, the certificate can be self-signed:

$ openssl genpkey -out limitedwish.org.key -algorithm rsa
$ openssl req -new -key limitedwish.org.key -out limitedwish.org.csr
$ openssl x509 -req -days 365 -in limitedwish.org.csr -signkey limitedwish.org.key -out limitedwish.org.crt

The first command generates an RSA private key. The second command creates a CSR (Certificate Signing Request) using this private key. OpenSSL will ask you for several pieces of information, the most important of which is the “Common Name”. Make sure you provide the correct FQDN of the proxy server; otherwise your browser will reject the certificate. The last command uses the CSR and the private key to generate a self-signed certificate that is valid for 365 days.

We can tune other directives in squid.conf to fit our needs, e.g. to add user authentication, but https_port is the essential one.

Now we need the browser to talk to our proxy server over the SSL-protected channel. Unfortunately, most browsers have no such feature built in (try stunnel if you prefer one of those browsers). As far as I know, the only browser with this capability is Google Chrome.

$ chrome --proxy-server=https://proxy.limitedwish.org:443

Since the certificate is self-signed, Chrome will reject it and report Error 136 (net::ERR_PROXY_CERTIFICATE_INVALID): Unknown error. We should add the newly created certificate to the trusted certificate list of the OS or the browser. On a Mac, this can be done by importing the certificate into the Keychain. Also, make sure the certificate is marked as trusted when used for SSL.

Now everything should be set. Connect to YouTube to check that it works :)

Reference

  1. Some technical overview about the GFW
  2. How to create a self-signed SSL certificate
  3. Chrome Secure Web Proxy
  4. How to pass command line arguments to Mac OSX Apps

Aug 9 2012

Ender Dai

Proxy Git

Git works over the git, ssh, http, and https protocols, and each needs a different way to be proxied. They are summarized below.

git

To proxy git over the git protocol, we need:

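  1. A working SOCKS proxy server, and
  2. BSD netcat as a helper utility. Note that the so-called traditional netcat doesn't fit our needs, as it has no proxy support.

$ cat ~/bin/socks-gateway.sh
#!/bin/bash
# git runs this script as: socks-gateway.sh <host> <port>
# Relay the connection through a SOCKS5 proxy with BSD netcat.
METHOD="-X 5 -x proxy-prc.intel.com:1080"
exec /bin/nc $METHOD "$@"

$ chmod +x ~/bin/socks-gateway.sh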

$ export GIT_PROXY_COMMAND=~/bin/socks-gateway.sh

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Cloning into 'linux'...
remote: Counting objects: 97264
....
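For the curious, the helper script only asks BSD netcat to open the connection through a SOCKS5 server. The C sketch below shows the two client messages a SOCKS5 client sends, as specified in RFC 1928, to CONNECT to a host by name: a greeting offering "no authentication", then the CONNECT request. It illustrates the protocol, not netcat's actual code; the function names are hypothetical, and it assumes the proxy requires no authentication.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers illustrating the SOCKS5 (RFC 1928) client side.
 * Each returns the number of bytes written to buf, or 0 on error. */

/* Greeting: version 5, one auth method offered, method 0x00 = no auth. */
size_t socks5_greeting(uint8_t *buf, size_t cap)
{
    if (cap < 3)
        return 0;
    buf[0] = 0x05; /* VER */
    buf[1] = 0x01; /* NMETHODS */
    buf[2] = 0x00; /* METHODS: no authentication required */
    return 3;
}

/* CONNECT request addressed by domain name (ATYP 0x03), so the
 * proxy, not the client, resolves the host name. */
size_t socks5_connect_request(uint8_t *buf, size_t cap,
                              const char *host, uint16_t port)
{
    size_t len = strlen(host);
    size_t need = 4 + 1 + len + 2;

    if (len == 0 || len > 255 || cap < need)
        return 0;
    buf[0] = 0x05;                          /* VER */
    buf[1] = 0x01;                          /* CMD: CONNECT */
    buf[2] = 0x00;                          /* RSV */
    buf[3] = 0x03;                          /* ATYP: domain name */
    buf[4] = (uint8_t)len;                  /* length-prefixed host name */
    memcpy(buf + 5, host, len);
    buf[5 + len] = (uint8_t)(port >> 8);    /* port, network byte order */
    buf[6 + len] = (uint8_t)(port & 0xff);
    return need;
}
```

For git://git.kernel.org the request would carry port 9418, the default port of the git daemon.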

http/https

Git uses curl to fetch data over http/https, so we can simply point the environment variable http_proxy or https_proxy at our proxy server and we are all set.

$ export http_proxy=http://proxy-prc.intel.com:911
$ git clone http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Cloning into 'linux'...
remote: Counting objects: 45896
...

$ export https_proxy=http://proxy-prc.intel.com:911
$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Cloning into 'linux'...
remote: Counting objects: 93630
...

Reference

Search for core.gitProxy, http.proxy and https_proxy in git-config(1) for more detailed information.

Jul 16 2012

Ender Dai

linkname, soname and realname

As you may already know, the linker option -l specifies the name of a library that your application needs at link time. For example, if we write a program that uses libjpeg, we must pass -ljpeg explicitly when we build it:

$ gcc image.c -ljpeg

The question behind this is: where does gcc find the library we specified with -ljpeg? The only clue we have is the string jpeg, a.k.a. the NAMESPEC. So we need to understand the rules the toolchain uses to find the correct library file in the filesystem. This is fairly straightforward: the toolchain searches for a file named libjpeg.so in a list of directories and uses the first one it finds. The directories searched include several standard system directories plus any that you specify with -L. We can see the directory list by passing -v to gcc when we compile our program. libjpeg.so here is the linkname of the library.
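The search rule just described can be modeled in a few lines of C. The function find_linkname() below is a hypothetical, simplified sketch: it builds the linkname lib<NAMESPEC>.so and returns the first directory in a caller-supplied list that contains it. The real linker also considers static archives (lib<NAMESPEC>.a), options such as -static, and its built-in directory list, none of which is modeled here.

```c
#include <stdio.h>
#include <unistd.h>

/* Simplified model of the link-time search for -l<namespec>:
 * try <dir>/lib<namespec>.so for each directory in order and
 * return the index of the first directory that has it, or -1.
 * The candidate path is written to out. */
int find_linkname(const char *namespec, const char *const dirs[], int ndirs,
                  char *out, size_t outlen)
{
    for (int i = 0; i < ndirs; i++) {
        int n = snprintf(out, outlen, "%s/lib%s.so", dirs[i], namespec);
        if (n < 0 || (size_t)n >= outlen)
            continue;               /* path does not fit in out; skip */
        if (access(out, F_OK) == 0)
            return i;               /* first match wins */
    }
    return -1;
}
```

With dirs = { "/usr/local/lib", "/usr/lib" }, a call like find_linkname("jpeg", dirs, 2, path, sizeof path) would report whichever directory holds libjpeg.so first, which is why the order of -L options matters.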

Now the program compiles successfully, but when we run the binary, how does the dynamic linker find the library installed on our system? One obvious way is to save the path of the library file found at link time into the binary, and let the dynamic linker load the file from that path. This is not flexible enough, though: the system that runs the application is not necessarily the one that compiled it, and the library may be installed in different directories on the two systems. So let's go a little further: we save just the linkname into the binary and let the dynamic linker search for it in a list of directories, just as the linker did when we linked our application.

This is better, but still not enough. One of the benefits of shared libraries is that we can upgrade a library (e.g. to fix a bug) without rewriting or rebuilding our application. However, different versions of a library may not be compatible. For example, our application might use an API introduced in version 6 of libjpeg, so it won't work with libjpeg prior to version 6. Different applications installed on one system may depend on different versions of a library, so it is very common for several versions of the same library to coexist on one system. We need to add some kind of version information to the linkname, e.g. libjpeg.so.1, so the dynamic linker can find the correct library. libjpeg.so.1 here is the soname of the library. We should save the soname, rather than the linkname, in the binary.

Note that not every library upgrade breaks ABI compatibility; some upgrades just fix bugs. Therefore a minor version number is also necessary, e.g. libjpeg.so.1.12, which is hereafter referred to as the realname.
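To make the three names concrete, here is a small, hypothetical C helper that derives the linkname and the soname from a realname following the lib<name>.so.<major>.<minor>... convention described above. It only manipulates strings and assumes well-formed input; it is an illustration, not how the toolchain works internally.

```c
#include <string.h>

/* Given a realname such as "libfibo.so.1.0.1", derive
 *   linkname: "libfibo.so"   (what the linker looks for with -lfibo)
 *   soname:   "libfibo.so.1" (recorded in binaries, used at run time)
 * Returns 0 on success, -1 if the name lacks ".so.<major>" or a
 * buffer is too small. */
int split_realname(const char *realname,
                   char *linkname, size_t linklen,
                   char *soname, size_t solen)
{
    const char *so = strstr(realname, ".so.");
    if (so == NULL || so[4] == '\0' || so[4] == '.')
        return -1;

    size_t base = (size_t)(so - realname) + 3;            /* up to ".so" */
    size_t major_end = base + 1 + strcspn(so + 4, ".");   /* ".so.<major>" */

    if (base >= linklen || major_end >= solen)
        return -1;
    memcpy(linkname, realname, base);
    linkname[base] = '\0';
    memcpy(soname, realname, major_end);
    soname[major_end] = '\0';
    return 0;
}
```

For libfibo.so.1.0.1 this yields libfibo.so and libfibo.so.1, the same two names that end up as symbolic links to the realname later in this article.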

Now let’s put it all together. When we create a shared library, we should specify the soname and the realname separately, since they usually differ.

$ gcc -Wall -Werror -fPIC -c fibonacci.c
$ gcc -shared -Wl,-soname,libfibo.so.1 -o libfibo.so.1.0.1 fibonacci.o
$ ls -l
-rw-r--r-- 1 ender ender  174 Jul 15 17:59 fibonacci.c
-rw-r--r-- 1 ender ender 1464 Jul 16 10:32 fibonacci.o
-rwxr-xr-x 1 ender ender 6329 Jul 16 10:32 libfibo.so.1.0.1

We can check the soname with readelf:

$ readelf -d libfibo.so.1.0.1

Dynamic section at offset 0x7a0 contains 26 entries:
Tag        Type                         Name/Value
0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
0x000000000000000e (SONAME)             Library soname: [libfibo.so.1]
0x000000000000000c (INIT)               0x560
0x000000000000000d (FINI)               0x6ec
0x0000000000000019 (INIT_ARRAY)         0x200788
0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
0x000000000000001a (FINI_ARRAY)         0x200790
0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
0x0000000000000004 (HASH)               0x1b8
0x000000006ffffef5 (GNU_HASH)           0x200
0x0000000000000005 (STRTAB)             0x378
0x0000000000000006 (SYMTAB)             0x240
0x000000000000000a (STRSZ)              186 (bytes)
0x000000000000000b (SYMENT)             24 (bytes)
0x0000000000000003 (PLTGOT)             0x2009a8
0x0000000000000002 (PLTRELSZ)           48 (bytes)
0x0000000000000014 (PLTREL)             RELA
0x0000000000000017 (JMPREL)             0x530
0x0000000000000007 (RELA)               0x470
0x0000000000000008 (RELASZ)             192 (bytes)
0x0000000000000009 (RELAENT)            24 (bytes)
0x000000006ffffffe (VERNEED)            0x450
0x000000006fffffff (VERNEEDNUM)         1
0x000000006ffffff0 (VERSYM)             0x432
0x000000006ffffff9 (RELACOUNT)          3
0x0000000000000000 (NULL)               0x0

Now let’s try to build a program that depends on libfibo:

$ gcc main.c -L. -lfibo -o fib
/usr/bin/ld: cannot find -lfibo
collect2: error: ld returned 1 exit status

This fails because the linker uses the linkname at link time, and there is no file named libfibo.so in this directory.

$ ln -s libfibo.so.1.0.1 libfibo.so
$ gcc main.c -L. -lfibo -o fib
$ ls -l
-rwxr-xr-x 1 ender ender 7330 Jul 16 10:32 fib
-rw-r--r-- 1 ender ender  174 Jul 15 17:59 fibonacci.c
-rw-r--r-- 1 ender ender 1464 Jul 16 10:32 fibonacci.o
lrwxrwxrwx 1 ender ender   16 Jul 16 10:41 libfibo.so -> libfibo.so.1.0.1
-rwxr-xr-x 1 ender ender 6329 Jul 16 10:32 libfibo.so.1.0.1
-rw-r--r-- 1 ender ender  152 Jul 15 17:41 main.c

The build succeeds, but the binary won’t run, because the dynamic linker uses the soname to locate the library file at run time:

$ ./fib
./fib: error while loading shared libraries: libfibo.so.1: cannot open shared object file: No such file or directory
$ ldd fib
        linux-vdso.so.1 =>  (0x00007fffe75ff000)
        libfibo.so.1 => not found
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff65ad78000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ff65b119000)

OK, let’s create a symbolic link for the soname:

$ ln -s libfibo.so.1.0.1 libfibo.so.1
$ ls -l
-rwxr-xr-x 1 ender ender 7330 Jul 16 10:32 fib
-rw-r--r-- 1 ender ender  174 Jul 15 17:59 fibonacci.c
-rw-r--r-- 1 ender ender 1464 Jul 16 10:32 fibonacci.o
lrwxrwxrwx 1 ender ender   16 Jul 16 10:41 libfibo.so -> libfibo.so.1.0.1
lrwxrwxrwx 1 ender ender   16 Jul 16 10:45 libfibo.so.1 -> libfibo.so.1.0.1
-rwxr-xr-x 1 ender ender 6329 Jul 16 10:32 libfibo.so.1.0.1
-rw-r--r-- 1 ender ender  152 Jul 15 17:41 main.c
$ ./fib
./fib: error while loading shared libraries: libfibo.so.1: cannot open shared object file: No such file or directory
$ ldd fib
        linux-vdso.so.1 =>  (0x00007fffe75ff000)
        libfibo.so.1 => not found
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff65ad78000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ff65b119000)

Hmm? It still fails! This is because the current directory is not in the dynamic linker’s search path. There are several ways around this. The first is the environment variable LD_LIBRARY_PATH:

$ LD_LIBRARY_PATH=. ./fib
fibonacci(1) = 1
$ LD_LIBRARY_PATH=. ldd fib
        linux-vdso.so.1 =>  (0x00007fffd71ff000)
        libfibo.so.1 => ./libfibo.so.1 (0x00007f9f55a02000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9f55663000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f9f55c05000)

The second is a little more involved. We can ask the toolchain to embed an additional search path for the dynamic linker (an RPATH) into the binary. This way the end user doesn’t need any special configuration to run the binary; however, we must rebuild our application:

$ gcc main.c -L. -lfibo -Wl,-rpath,$(pwd) -o fib
$ ./fib
fibonacci(1) = 1
$ ldd fib
        linux-vdso.so.1 =>  (0x00007fff5a5ff000)
        libfibo.so.1 => /home/ender/src/git/arsenal/library/shared/libfibo.so.1 (0x00007f0b06303000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0b05f64000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0b06506000)

To finish this article, let’s get some insight into how the dynamic linker finds the correct library:

$ LD_DEBUG=libs ./fib
      9122:     find library=libfibo.so.1 [0]; searching
      9122:      search path=/home/ender/src/git/arsenal/library/shared/tls/x86_64:/home/ender/src/git/arsenal/library/shared/tls:/home/ender/src/git/arsenal/library/shared/x86_64:/home/ender/src/git/arsenal/library/shared               (RPATH from file ./fib)
      9122:       trying file=/home/ender/src/git/arsenal/library/shared/tls/x86_64/libfibo.so.1
      9122:       trying file=/home/ender/src/git/arsenal/library/shared/tls/libfibo.so.1
      9122:       trying file=/home/ender/src/git/arsenal/library/shared/x86_64/libfibo.so.1
      9122:       trying file=/home/ender/src/git/arsenal/library/shared/libfibo.so.1
      9122:
      9122:     find library=libc.so.6 [0]; searching
      9122:      search path=/home/ender/src/git/arsenal/library/shared         (RPATH from file ./fib)
      9122:       trying file=/home/ender/src/git/arsenal/library/shared/libc.so.6
      9122:      search cache=/etc/ld.so.cache
      9122:       trying file=/lib/x86_64-linux-gnu/libc.so.6
      9122:
      9122:
      9122:     calling init: /lib64/ld-linux-x86-64.so.2
      9122:
      9122:
      9122:     calling init: /lib/x86_64-linux-gnu/libc.so.6
      9122:
      9122:
      9122:     calling init: /home/ender/src/git/arsenal/library/shared/libfibo.so.1
      9122:
      9122:
      9122:     initialize program: ./fib
      9122:
      9122:
      9122:     transferring control: ./fib
      9122:
fibonacci(1) = 1
      9122:
      9122:     calling fini: ./fib [0]
      9122:
      9122:
      9122:     calling fini: /home/ender/src/git/arsenal/library/shared/libfibo.so.1 [0]
      9122:
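The same information is also available from inside a running program. On glibc-based Linux systems, dl_iterate_phdr() (declared in <link.h>) walks the list of objects the dynamic linker has loaded. The sketch below, assuming glibc, prints the path of each loaded object and returns how many there are; it is an illustration, not part of the fib program above.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Called once per loaded object: the executable itself, the vDSO,
 * libc, the dynamic linker, and any libraries such as libfibo.so.1. */
static int visit(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *count = data;
    /* dlpi_name is an empty string for the main executable. */
    printf("%s\n", info->dlpi_name[0] ? info->dlpi_name : "(main program)");
    (*count)++;
    return 0;   /* returning 0 continues the iteration */
}

/* Print every loaded object and return how many there are. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(visit, &count);
    return count;
}
```

Calling list_loaded_objects() from main() in a program linked against libfibo should list the libfibo.so.1 path that the dynamic linker settled on, which is a handy cross-check against the LD_DEBUG output.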


Jul 3 2012

Ender Dai

Data alignment on ARM

First of all, let’s check the demo code below:

Compile and run on an ARM926EJ-S:

$ arm-linux-gcc -o tst tst.c
...
<copy to arm board>
...
# ./tst
0x80000:0x39020000
0x800:0x390200

Surprising, isn’t it? Let’s check the address of the array ReqBuffer[]:

$ arm-linux-nm tst | grep ReqBuffer
00010741 B ReqBuffer

So it’s not word-aligned. What if we force it to be aligned on a word boundary? Change line 14 to

uint8_t ReqBuffer[100] __attribute__ ((aligned (4)));

and rebuild:

$ arm-linux-gcc -o tst tst.c
$ arm-linux-nm tst | grep ReqBuffer
00010744 B ReqBuffer

# ./tst
0x80000:0x39020000
0x80000:0x39020000

Now we get the expected result, but why? We need to delve into the “ARM Architecture Reference Manual” for some insight. ARM926EJ-S is ARMv5-based, and section “A2.8 Unaligned data access” says:

Prior to ARMv6, doubleword (LDRD/STRD) accesses to memory, where the address is not doubleword-aligned, are UNPREDICTABLE. Also, data accesses to non-aligned word and halfword data are treated as aligned from the memory interface perspective. That is:

  • the address is treated as truncated, with address bits[1:0] treated as zero for word accesses, and address bit[0] treated as zero for halfword accesses.
  • load single word ARM instructions are architecturally defined to rotate right the word-aligned data transferred by a non word-aligned address one, two or three bytes depending on the value of the two least significant address bits.
  • alignment checking is defined for implementations supporting a System Control coprocessor using the A bit in CP15 register 1. When this bit is set, a Data Abort indicating an alignment fault is reported for unaligned accesses.
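The first two rules can be modeled in portable C. The hypothetical function below mimics what a pre-ARMv6 word load returns for a possibly unaligned address when alignment checking is off: it fetches the word at the address with bits[1:0] cleared, then rotates it right by 8 bits per byte of misalignment. It assumes a little-endian memory image and is a simulation for understanding, not code to run on the board.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 <= n < 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return n ? (v >> n) | (v << (32 - n)) : v;
}

/* Model of an ARMv5 LDR with alignment checking disabled.
 * mem is a little-endian memory image; addr is an offset into it. */
uint32_t armv5_ldr(const uint8_t *mem, uint32_t addr)
{
    uint32_t base = addr & ~3u;                   /* bits[1:0] treated as 0 */
    uint32_t word = (uint32_t)mem[base]
                  | (uint32_t)mem[base + 1] << 8
                  | (uint32_t)mem[base + 2] << 16
                  | (uint32_t)mem[base + 3] << 24;
    return ror32(word, 8 * (addr & 3u));          /* rotate by 0/8/16/24 */
}
```

With the word 0x11223344 stored at offset 0, reading at offsets 1, 2 and 3 gives 0x44112233, 0x33441122 and 0x22334411, the same pattern the align test prints.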

To understand the first two statements, it is best to try them out.

$ arm-linux-gcc -o align align.c

# ./align
0x10828 00 00 00 00 11 22 33 44 0x10828 11223344
0x10828 00 00 00 00 11 22 33 44 0x10829 44112233
0x10828 00 00 00 00 11 22 33 44 0x1082a 33441122
0x10828 00 00 00 00 11 22 33 44 0x1082b 22334411
0x10828 11 22 33 44 00 00 00 00 0x1082c 11223344
0x10828 11 22 33 44 00 00 00 00 0x1082d 44112233
0x10828 11 22 33 44 00 00 00 00 0x1082e 33441122
0x10828 11 22 33 44 00 00 00 00 0x1082f 22334411

As you can see, if a word pointer is not word-aligned, the value you read back is not necessarily equal to the one you wrote previously. This behavior is annoying and can lead to bugs that are hard to track down, just like the one demonstrated at the beginning of the article.
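Besides forcing alignment with __attribute__ ((aligned (4))), a portable way to avoid the problem in code that parses byte buffers such as ReqBuffer[] is to never dereference a uint32_t pointer into the buffer. The sketch below assembles a 32-bit value byte by byte, so every access is a single byte load or store and alignment no longer matters. The little-endian byte order is an assumption for illustration; memcpy() into a local uint32_t is another common idiom.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value from any address, aligned or not.
 * Only byte loads are issued, so no unaligned word access can occur. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Store a 32-bit value in little-endian order at any address. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```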

Linux provides a software workaround for this issue, built on the Data Abort exception. If the A bit of CP15 register 1 is set, a Data Abort is reported for unaligned data accesses. When the Linux kernel catches a Data Abort, it checks CP15 register 5 (Fault Status Register, FSR); if the Data Abort turns out to be caused by an unaligned data access, the kernel tries to fix up the access on the fly. In Linux kernel 2.6.28, the fixup for unaligned accesses in kernel space is mandatory, but for user-space applications the behavior can be configured via /proc/cpu/alignment. Valid settings are: 0 for ignore, 1 for warn, 2 for fixup, 3 for fixup+warn, 4 for signal, 5 for signal+warn:

# echo 0 >/proc/cpu/alignment
# ./align
0x10828 00 00 00 00 11 22 33 44 0x10828 11223344
0x10828 00 00 00 00 11 22 33 44 0x10829 44112233
0x10828 00 00 00 00 11 22 33 44 0x1082a 33441122
0x10828 00 00 00 00 11 22 33 44 0x1082b 22334411
0x10828 11 22 33 44 00 00 00 00 0x1082c 11223344
0x10828 11 22 33 44 00 00 00 00 0x1082d 44112233
0x10828 11 22 33 44 00 00 00 00 0x1082e 33441122
0x10828 11 22 33 44 00 00 00 00 0x1082f 22334411
# echo 2 >/proc/cpu/alignment
# ./align
0x10828 00 00 00 00 11 22 33 44 0x10828 11223344
0x10828 00 00 00 11 22 33 44 00 0x10829 11223344
0x10828 00 00 11 22 33 44 00 00 0x1082a 11223344
0x10828 00 11 22 33 44 00 00 00 0x1082b 11223344
0x10828 11 22 33 44 00 00 00 00 0x1082c 11223344
0x10828 22 33 44 00 00 00 00 00 0x1082d 11223344
0x10828 33 44 00 00 00 00 00 00 0x1082e 11223344
0x10828 44 00 00 00 00 00 00 00 0x1082f 11223344
# echo 4 >/proc/cpu/alignment
# ./align
0x10828 00 00 00 00 11 22 33 44 0x10828 11223344
Bus error
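The six settings listed earlier behave like three independent flags added together: 1 = warn, 2 = fixup, 4 = signal. The hypothetical decoder below spells that out. The bit layout is inferred from the listed values and, as far as I know, mirrors the UM_WARN/UM_FIXUP/UM_SIGNAL flags in the kernel's arch/arm/mm/alignment.c.

```c
#include <stdio.h>
#include <string.h>

/* Flag bits of the user-space alignment setting (values 0..5). */
enum {
    ALIGN_WARN   = 1 << 0,  /* log a message about the unaligned access */
    ALIGN_FIXUP  = 1 << 1,  /* let the kernel emulate the access */
    ALIGN_SIGNAL = 1 << 2   /* deliver a signal, e.g. the "Bus error" above */
};

/* Write a description such as "fixup+warn" into buf and return its
 * length, or -1 for values outside the documented 0..5 range. */
int describe_alignment_mode(int mode, char *buf, size_t len)
{
    if (mode < 0 || mode > 5)
        return -1;

    const char *base = (mode & ALIGN_SIGNAL) ? "signal"
                     : (mode & ALIGN_FIXUP)  ? "fixup"
                     : NULL;
    int warn = (mode & ALIGN_WARN) != 0;

    if (base == NULL)
        snprintf(buf, len, "%s", warn ? "warn" : "ignored");
    else
        snprintf(buf, len, "%s%s", base, warn ? "+warn" : "");
    return (int)strlen(buf);
}
```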

ARMv6 added support for unaligned word and halfword load and store data access.

ARMv6 introduces unaligned word and halfword load and store data access support. When this is enabled, the processor uses one or more memory accesses to generate the required transfer of adjacent bytes transparently to the programmer, apart from a potential access time penalty where the transaction crosses an IMPLEMENTATION DEFINED cache-line, bus-width or page boundary condition. Doubleword accesses must be word-aligned in this configuration.

This feature is controlled by the U bit and the A bit of CP15 register 1. Once it is enabled, the behavior seen by user-space programs is the same as with the Linux kernel fixup.

May 11 2012

Ender Dai

Get assembly / C code map with objdump

We can use objdump to disassemble a binary executable. Sometimes it is handy to have a map between the assembly and the C source code, so we can quickly locate the assembly for a specific C block. The -S and -l options of objdump provide a means to achieve that.

-S

--source

Display source code intermixed with disassembly, if possible. Implies -d.

-l

--line-numbers

Label the display (using debugging information) with the filename and source line numbers corresponding to the object code or relocs shown. Only useful with -d, -D, or -r.

As long as the application is compiled with gcc -g, we can run objdump -S -l on the binary to get the map we want.

$ arm-linux-gcc -g zero_checksum.c -c
$ arm-linux-objdump -Sl zero_checksum.o | head -40

zero_checksum.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <main>:
main():
/home/ender/src/tmp/zero_checksum.c:5
#include <stdio.h>

int
main(int argc, char **argv)
{
   0:   e1a0c00d        mov     ip, sp
   4:   e92dd800        stmdb   sp!, {fp, ip, lr, pc}
   8:   e24cb004        sub     fp, ip, #4      ; 0x4
   c:   e24dd010        sub     sp, sp, #16     ; 0x10
  10:   e50b0010        str     r0, [fp, #-16]
  14:   e50b1014        str     r1, [fp, #-20]
/home/ender/src/tmp/zero_checksum.c:7
        int i;
        unsigned char sum = 0;
  18:   e3a03000        mov     r3, #0  ; 0x0
  1c:   e54b3019        strb    r3, [fp, #-25]
/home/ender/src/tmp/zero_checksum.c:9
        signed char ssum;
        for (i=1; i<argc; i++) {
  20:   e3a03001        mov     r3, #1  ; 0x1
  24:   e50b3018        str     r3, [fp, #-24]
  28:   e51b2018        ldr     r2, [fp, #-24]
  2c:   e51b3010        ldr     r3, [fp, #-16]
  30:   e1520003        cmp     r2, r3
  34:   aa000020        bge     bc <.text+0xbc>
/home/ender/src/tmp/zero_checksum.c:10
            sum += (unsigned char)strtol(argv[i], NULL, 0);
  38:   e51b3018        ldr     r3, [fp, #-24]
  3c:   e1a02103        mov     r2, r3, lsl #2
  40:   e51b3014        ldr     r3, [fp, #-20]
  44:   e0823003        add     r3, r2, r3
  48:   e5930000        ldr     r0, [r3]