The well-known Great Firewall of China (GFW) blocks access to several
web services from within China, such as Facebook, Twitter, and YouTube.
Personally, I have no interest in using Facebook or Twitter, and I
hardly ever watch online video, so that is not much trouble for me.
However, the GFW also causes the infamous TCP RST issue when using
Google. This is unacceptable, since it dramatically reduces the
efficiency of every programmer.
Thanks to my wife, I obtained root access to an Ubuntu server hosted
in the U.S. and was able to set up a proxy server to get through the
GFW. The following paragraphs detail some key points of the procedure.
First of all, note that a traditional HTTP proxy server setup will NOT
work in this case. The data flow between the web browser and the proxy
server is not encrypted, so it is still vulnerable to the
keyword-based GFW filter. We must therefore encrypt this data channel
to prevent the data from being recognized by the GFW keyword filter.
The famous proxy software Squid can protect the data flow with SSL.
However, this feature is not built into the squid packages provided by
some Linux distributions. To check whether a squid installation
supports this feature:
$ squid -v | grep -- '--enable-ssl'
If squid was not configured with the option --enable-ssl, we should
build it from source with this configure option enabled.
The only vital directive in squid.conf for enabling SSL is https_port:
https_port 443 cert=/etc/squid3/limitedwish.org.crt key=/etc/squid3/limitedwish.org.key
This directive designates the port number used by the SSL-secured data
channel, as well as the certificate and the corresponding private
key. The certificate can be self-signed for personal use:
$ openssl genpkey -out limitedwish.org.key -algorithm rsa
$ openssl req -new -key limitedwish.org.key -out limitedwish.org.csr
$ openssl x509 -req -days 365 -in limitedwish.org.csr -signkey limitedwish.org.key -out limitedwish.org.crt
The first command generates an RSA private key. The next command
creates a CSR (Certificate Signing Request) using this private
key. OpenSSL will ask you for several pieces of information, the most
important of which is the “Common Name”. Make sure you provide the
correct FQDN of the proxy server, otherwise the certificate will be
rejected by your browser. The last command uses the CSR and the
private key to generate a self-signed certificate that is valid for
365 days.
We can tune other directives in squid.conf to fit our needs, e.g. add
user authentication. However, https_port is the most vital one.
Now we need to ask the browser to talk to our proxy server over an
SSL-protected channel. Unfortunately, most browsers have no such
feature built in (try stunnel if you prefer one of those browsers).
As far as I know, the only browser with this capability is Google
Chrome.
$ chrome --proxy-server=https://proxy.limitedwish.org:443
Since the certificate is self-signed, Chrome will reject it and report
Error 136 (net::ERR_PROXY_CERTIFICATE_INVALID): Unknown error. We
should add the newly created certificate to the trusted certificate
list of our OS or browser. On a Mac, this can be achieved by importing
the certificate into the keychain. Also, please make sure this
certificate is trusted when used for SSL.
Now it should be all set. Connect to Youtube to check if everything
works :)
Git works with the git/ssh/http/https protocols, and each of them needs
to be proxied in a different way, as summarized below.
git
To proxy git running over the git protocol, we need
- A working SOCKS proxy server, and
- BSD netcat as a helper utility. Please note that the so-called
traditional netcat doesn’t fit our needs, as it has no proxy
support.
$ cat ~/bin/socks-gateway.sh
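#!/bin/bash
# Relay the connection through a SOCKS5 proxy (-X 5) at the address given by -x.
METHOD="-X 5 -x proxy-prc.intel.com:1080"
# git passes the target host and port as arguments; forward them unchanged.
/bin/nc $METHOD "$@"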
$ export GIT_PROXY_COMMAND=~/bin/socks-gateway.sh
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Cloning into 'linux'...
remote: Counting objects: 97264
....
http/https
Git uses curl to fetch data over http/https, so we can simply point
the environment variable http_proxy or https_proxy to our proxy
server and then it is all set.
$ export http_proxy=http://proxy-prc.intel.com:911
$ git clone http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Cloning into 'linux'...
remote: Counting objects: 45896
...
$ export https_proxy=http://proxy-prc.intel.com:911
$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Cloning into 'linux'...
remote: Counting objects: 93630
...
Reference
Search for core.gitProxy, http.proxy and https_proxy in git-config(1)
for more detailed information.
As you may already know, the linker option -l is used to specify the
name of a library that your application needs when it is linked. For
example, if we write a program that uses libjpeg, we should specify
-ljpeg explicitly when we compile it.
The question is where gcc finds the library we specified with -ljpeg.
The only clue we have is the string jpeg, a.k.a. the NAMESPEC. So we
need to understand the rules the toolchain uses to locate the correct
library file in the filesystem. This is fairly straightforward: the
toolchain searches for a file named libjpeg.so in a list of
directories and uses the first one it finds. The directories searched
include several standard system directories plus any that you specify
with -L. We can check the directory list by passing the -v option to
gcc when we compile our program. libjpeg.so here is the linkname of
the library.
Now we have the program compiled successfully, but when we run the
binary, how does the dynamic linker find the library installed on our
system? One obvious way is to save the path of the library file found
at link time into the binary and let the dynamic linker load the file
from that path. This is not flexible enough, though, because the
system that runs the application is not necessarily the one that
compiled it, and the library may not be installed in the same
directory on both systems. So let’s go a little further: we save just
the linkname into the binary and let the dynamic linker search for it
in a list of directories, just as we did when linking our application.
This is better, but not enough. One of the benefits of shared
libraries is that we can upgrade a library (e.g. to fix a bug) without
rewriting or rebuilding our application. Different versions of a
library may not be compatible. For example, if our application uses an
API that was introduced in version 6 of libjpeg, it won’t work with
any libjpeg prior to version 6. Different applications installed on
one system may depend on different versions of a library, so it is
very common for several versions of the same library to coexist on one
system. We need to add some kind of version info to the linkname,
e.g. libjpeg.so.1, so that the dynamic linker can find the correct
library. libjpeg.so.1 here is the soname of the library. We should
save the soname instead of the linkname in the binary.
Note that not every library upgrade breaks ABI compatibility; some
upgrades just fix bugs. A minor version number is therefore also
needed, e.g. libjpeg.so.1.12, which is hereafter referred to as the
realname.
Now let’s put this all together. When we create a shared library, we
should specify the soname and realname separately, as they are
usually not the same.
$ gcc -Wall -Werror -fPIC -c fibonacci.c
$ gcc -shared -Wl,-soname,libfibo.so.1 -o libfibo.so.1.0.1 fibonacci.o
$ ls -l
-rw-r--r-- 1 ender ender 174 Jul 15 17:59 fibonacci.c
-rw-r--r-- 1 ender ender 1464 Jul 16 10:32 fibonacci.o
-rwxr-xr-x 1 ender ender 6329 Jul 16 10:32 libfibo.so.1.0.1
We can check the soname with readelf:
$ readelf -d libfibo.so.1.0.1
Dynamic section at offset 0x7a0 contains 26 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000e (SONAME) Library soname: [libfibo.so.1]
0x000000000000000c (INIT) 0x560
0x000000000000000d (FINI) 0x6ec
0x0000000000000019 (INIT_ARRAY) 0x200788
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x200790
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x0000000000000004 (HASH) 0x1b8
0x000000006ffffef5 (GNU_HASH) 0x200
0x0000000000000005 (STRTAB) 0x378
0x0000000000000006 (SYMTAB) 0x240
0x000000000000000a (STRSZ) 186 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000003 (PLTGOT) 0x2009a8
0x0000000000000002 (PLTRELSZ) 48 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x530
0x0000000000000007 (RELA) 0x470
0x0000000000000008 (RELASZ) 192 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0x450
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x432
0x000000006ffffff9 (RELACOUNT) 3
0x0000000000000000 (NULL) 0x0
Now let’s try to build a program which depends on libfibo:
$ gcc main.c -L. -lfibo -o fib
/usr/bin/ld: cannot find -lfibo
collect2: error: ld returned 1 exit status
This fails because the linkname is used at the linking stage. The
linker cannot find a file named libfibo.so in this case.
$ ln -s libfibo.so.1.0.1 libfibo.so
$ gcc main.c -L. -lfibo -o fib
$ ls -l
-rwxr-xr-x 1 ender ender 7330 Jul 16 10:32 fib
-rw-r--r-- 1 ender ender 174 Jul 15 17:59 fibonacci.c
-rw-r--r-- 1 ender ender 1464 Jul 16 10:32 fibonacci.o
lrwxrwxrwx 1 ender ender 16 Jul 16 10:41 libfibo.so -> libfibo.so.1.0.1
-rwxr-xr-x 1 ender ender 6329 Jul 16 10:32 libfibo.so.1.0.1
-rw-r--r-- 1 ender ender 152 Jul 15 17:41 main.c
Compilation succeeds. But the binary won’t run, because the soname is
used by the dynamic linker to find the library file when we try to run
the binary:
$ ./fib
./fib: error while loading shared libraries: libfibo.so.1: cannot open
shared object file: No such file or directory
$ ldd fib
linux-vdso.so.1 => (0x00007fffe75ff000)
libfibo.so.1 => not found
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff65ad78000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff65b119000)
OK, let’s create a symbolic link for the soname:
$ ln -s libfibo.so.1.0.1 libfibo.so.1
$ ls -l
-rwxr-xr-x 1 ender ender 7330 Jul 16 10:32 fib
-rw-r--r-- 1 ender ender 174 Jul 15 17:59 fibonacci.c
-rw-r--r-- 1 ender ender 1464 Jul 16 10:32 fibonacci.o
lrwxrwxrwx 1 ender ender 16 Jul 16 10:41 libfibo.so -> libfibo.so.1.0.1
lrwxrwxrwx 1 ender ender 16 Jul 16 10:45 libfibo.so.1 -> libfibo.so.1.0.1
-rwxr-xr-x 1 ender ender 6329 Jul 16 10:32 libfibo.so.1.0.1
-rw-r--r-- 1 ender ender 152 Jul 15 17:41 main.c
$ ./fib
./fib: error while loading shared libraries: libfibo.so.1: cannot open
shared object file: No such file or directory
$ ldd fib
linux-vdso.so.1 => (0x00007fffe75ff000)
libfibo.so.1 => not found
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff65ad78000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff65b119000)
Hmmm? It still fails! Well, this is because the current directory is
not in the dynamic linker’s search path. There are several ways to fix
this. The first is the environment variable LD_LIBRARY_PATH:
$ LD_LIBRARY_PATH=. ./fib
fibonacci(1) = 1
$ LD_LIBRARY_PATH=. ldd fib
linux-vdso.so.1 => (0x00007fffd71ff000)
libfibo.so.1 => ./libfibo.so.1 (0x00007f9f55a02000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9f55663000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9f55c05000)
The second one is a little more complex. We can ask the toolchain to
embed an additional search path for the dynamic linker into the
binary. This way the end user needs no special configuration to run
the binary, but we must rebuild our application:
$ gcc main.c -L. -lfibo -Wl,-rpath,$(pwd) -o fib
$ ./fib
fibonacci(1) = 1
$ ldd fib
linux-vdso.so.1 => (0x00007fff5a5ff000)
libfibo.so.1 => /home/ender/src/git/arsenal/library/shared/libfibo.so.1 (0x00007f0b06303000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0b05f64000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0b06506000)
To conclude this article, let’s get some insight into how the dynamic
linker finds the correct library:
$ LD_DEBUG=libs ./fib
9122: find library=libfibo.so.1 [0]; searching
9122: search path=/home/ender/src/git/arsenal/library/shared/tls/x86_64:/home/ender/src/git/arsenal/library/shared/tls:/home/ender/src/git/arsenal/library/shared/x86_64:/home/ender/src/git/arsenal/library/shared (RPATH from file ./fib)
9122: trying file=/home/ender/src/git/arsenal/library/shared/tls/x86_64/libfibo.so.1
9122: trying file=/home/ender/src/git/arsenal/library/shared/tls/libfibo.so.1
9122: trying file=/home/ender/src/git/arsenal/library/shared/x86_64/libfibo.so.1
9122: trying file=/home/ender/src/git/arsenal/library/shared/libfibo.so.1
9122:
9122: find library=libc.so.6 [0]; searching
9122: search path=/home/ender/src/git/arsenal/library/shared (RPATH from file ./fib)
9122: trying file=/home/ender/src/git/arsenal/library/shared/libc.so.6
9122: search cache=/etc/ld.so.cache
9122: trying file=/lib/x86_64-linux-gnu/libc.so.6
9122:
9122:
9122: calling init: /lib64/ld-linux-x86-64.so.2
9122:
9122:
9122: calling init: /lib/x86_64-linux-gnu/libc.so.6
9122:
9122:
9122: calling init: /home/ender/src/git/arsenal/library/shared/libfibo.so.1
9122:
9122:
9122: initialize program: ./fib
9122:
9122:
9122: transferring control: ./fib
9122:
fibonacci(1) = 1
9122:
9122: calling fini: ./fib [0]
9122:
9122:
9122: calling fini: /home/ender/src/git/arsenal/library/shared/libfibo.so.1 [0]
9122:
First of all let’s check the demo code below:
Compile and run on ARM926EJ-S:
$ arm-linux-gcc -o tst tst.c
...
<copy to arm board>
...
# ./tst
0x80000:0x39020000
0x800:0x390200
Surprising, isn’t it? Well, let’s check the address of the array
ReqBuffer[]:
$ arm-linux-nm tst | grep ReqBuffer
00010741 B ReqBuffer
So, it’s not word-aligned. What if we force it to be aligned on a word
boundary? After changing line 14 to
uint8_t ReqBuffer[100] __attribute__ ((aligned (4)));
we get:
$ arm-linux-gcc -o tst tst.c
$ arm-linux-nm tst | grep ReqBuffer
00010744 B ReqBuffer
# ./tst
0x80000:0x39020000
0x80000:0x39020000
Now we get the expected result, but why? We need to delve into the “ARM
Architecture Reference Manual” to get some insight. The ARM926EJ-S is
ARMv5-based, and section “A2.8 Unaligned data access” says:
Prior to ARMv6, doubleword (LDRD/STRD) accesses to memory, where the
address is not doubleword-aligned, are UNPREDICTABLE. Also, data
accesses to non-aligned word and halfword data are treated as aligned
from the memory interface perspective. That is:
- the address is treated as truncated, with address bits[1:0] treated
as zero for word accesses, and address bit[0] treated as zero for
halfword accesses.
- load single word ARM instructions are architecturally defined to
rotate right the word-aligned data transferred by a non
word-aligned address one, two or three bytes depending on the
value of the two least significant address bits.
- alignment checking is defined for implementations supporting a
System Control coprocessor using the A bit in CP15 register 1. When
this bit is set, a Data Abort indicating an alignment fault is
reported for unaligned accesses.
To understand the first two statements, it is best to try them out.
$ arm-linux-gcc -o align align.c
# ./align
0x10828 00 00 00 00 11 22 33 44 0x10828 11223344
0x10828 00 00 00 00 11 22 33 44 0x10829 44112233
0x10828 00 00 00 00 11 22 33 44 0x1082a 33441122
0x10828 00 00 00 00 11 22 33 44 0x1082b 22334411
0x10828 11 22 33 44 00 00 00 00 0x1082c 11223344
0x10828 11 22 33 44 00 00 00 00 0x1082d 44112233
0x10828 11 22 33 44 00 00 00 00 0x1082e 33441122
0x10828 11 22 33 44 00 00 00 00 0x1082f 22334411
As you can see, if the word pointer is not word-aligned, the value you
read back is not necessarily equal to the one you wrote earlier. This
behavior is annoying and may lead to bugs that are hard to track down,
just like the one demonstrated at the beginning of the article.
Linux provides a software workaround for this issue, implemented on
top of the Data Abort exception. If the A bit of CP15 register 1 is
set, a Data Abort is reported for unaligned data accesses. When the
Linux kernel catches a Data Abort, it checks CP15 register 5 (Fault
Status Register, FSR); if the Data Abort turns out to be caused by an
unaligned data access, the kernel tries to fix up the access on the
fly. In Linux kernel 2.6.28, the fixup for unaligned data access in
kernel space is mandatory, but for user-space applications the
behavior can be configured via /proc/cpu/alignment. Valid settings
are: 0 for ignored, 1 for warn, 2 for fixup, 3 for fixup+warn, 4 for
signal, 5 for signal+warn:
# echo 0 >/proc/cpu/alignment
# ./align
0x10828 00 00 00 00 11 22 33 44 0x10828 11223344
0x10828 00 00 00 00 11 22 33 44 0x10829 44112233
0x10828 00 00 00 00 11 22 33 44 0x1082a 33441122
0x10828 00 00 00 00 11 22 33 44 0x1082b 22334411
0x10828 11 22 33 44 00 00 00 00 0x1082c 11223344
0x10828 11 22 33 44 00 00 00 00 0x1082d 44112233
0x10828 11 22 33 44 00 00 00 00 0x1082e 33441122
0x10828 11 22 33 44 00 00 00 00 0x1082f 22334411
# echo 2 >/proc/cpu/alignment
# ./align
0x10828 00 00 00 00 11 22 33 44 0x10828 11223344
0x10828 00 00 00 11 22 33 44 00 0x10829 11223344
0x10828 00 00 11 22 33 44 00 00 0x1082a 11223344
0x10828 00 11 22 33 44 00 00 00 0x1082b 11223344
0x10828 11 22 33 44 00 00 00 00 0x1082c 11223344
0x10828 22 33 44 00 00 00 00 00 0x1082d 11223344
0x10828 33 44 00 00 00 00 00 00 0x1082e 11223344
0x10828 44 00 00 00 00 00 00 00 0x1082f 11223344
# echo 4 >/proc/cpu/alignment
# ./align
0x10828 00 00 00 00 11 22 33 44 0x10828 11223344
Bus error
ARMv6 added support for unaligned word and halfword load and store
data access.
ARMv6 introduces unaligned word and halfword load and store data
access support. When this is enabled, the processor uses one or more
memory accesses to generate the required transfer of adjacent bytes
transparently to the programmer, apart from a potential access time
penalty where the transaction crosses an IMPLEMENTATION DEFINED
cache-line, bus-width or page boundary condition. Doubleword
accesses must be word-aligned in this configuration.
The feature is configured through the U bit and A bit of CP15
register 1. Once it is enabled, the behavior seen by the end user is
the same as with the Linux kernel fixup.
We can use objdump to disassemble a binary executable. Sometimes it
is handy to have a mapping between the assembly and the C source code,
which lets us quickly locate the assembly for a specific C block. The
-S and -l flags of objdump provide a way to achieve that.
-S
--source
Display source code intermixed with disassembly, if possible. Implies -d.
-l
--line-numbers
Label the display (using debugging information) with the filename and
source line numbers corresponding to the object code or relocs shown.
Only useful with -d, -D, or -r.
As long as the application is compiled with gcc -g, we can use
objdump -S -l to disassemble the application binary and get the
mapping we want.
$ arm-linux-gcc -g zero_checksum.c -c
$ arm-linux-objdump -Sl zero_checksum.o | head -40
zero_checksum.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <main>:
main():
/home/ender/src/tmp/zero_checksum.c:5
#include <stdio.h>
int
main(int argc, char **argv)
{
0: e1a0c00d mov ip, sp
4: e92dd800 stmdb sp!, {fp, ip, lr, pc}
8: e24cb004 sub fp, ip, #4 ; 0x4
c: e24dd010 sub sp, sp, #16 ; 0x10
10: e50b0010 str r0, [fp, #-16]
14: e50b1014 str r1, [fp, #-20]
/home/ender/src/tmp/zero_checksum.c:7
int i;
unsigned char sum = 0;
18: e3a03000 mov r3, #0 ; 0x0
1c: e54b3019 strb r3, [fp, #-25]
/home/ender/src/tmp/zero_checksum.c:9
signed char ssum;
for (i=1; i<argc; i++) {
20: e3a03001 mov r3, #1 ; 0x1
24: e50b3018 str r3, [fp, #-24]
28: e51b2018 ldr r2, [fp, #-24]
2c: e51b3010 ldr r3, [fp, #-16]
30: e1520003 cmp r2, r3
34: aa000020 bge bc <.text+0xbc>
/home/ender/src/tmp/zero_checksum.c:10
sum += (unsigned char)strtol(argv[i], NULL, 0);
38: e51b3018 ldr r3, [fp, #-24]
3c: e1a02103 mov r2, r3, lsl #2
40: e51b3014 ldr r3, [fp, #-20]
44: e0823003 add r3, r2, r3
48: e5930000 ldr r0, [r3]