Hello World | Adel Zhang

Thursday, August 30, 2012

A small enum puzzle

I came across the following question on this blog:

//Question 1
//What does foo print?
#include <stdio.h>

enum MY_ENUM{
  MY_OK = 0,
  MY_NOT_OK,
};

void foo()
{
   int i = -1;
   enum MY_ENUM my_enum = MY_OK;

   if( i < my_enum) printf("I am OK!\n");
   else printf("I am NOT OK!\n");

}

What happens if we make one small change to the function above?

//Question 2
//What does foo print?
#include <stdio.h>

enum MY_ENUM{
  MY_OK = 0,
  MY_NOT_OK,
};

void foo()
{
   int i = -1;
   enum MY_ENUM my_enum = MY_OK;

   if( i < MY_OK) printf("I am OK!\n");
   else printf("I am NOT OK!\n");

}

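The answer given in the article is that question 1 prints "I am NOT OK!" while question 2 prints "I am OK!". That left me rather puzzled: why would that be?

In fact, as the article says, this is what gcc produces. But build the same programs with MSVC and both question 1 and question 2 print "I am OK!". The difference comes from the compiler: the C standard leaves the integer type behind an enumerated type implementation-defined. In gcc, when none of the enum constants (the named values listed in the enum, such as MY_OK here) is negative, a variable of the enum type (as opposed to an enum constant; my_enum here) is stored as an unsigned int. When some enum constant is negative, it is stored as a plain int. Enum constants themselves always have type int. So in question 1 the comparison i < my_enum mixes int with unsigned int, i is converted to unsigned int, and -1 turns into a huge positive value; in question 2 both operands are int. MSVC stores enum constants and enum variables as int in every case. For this particular choice I think gcc simply has too much time on its hands. What do you think?

All in all, this is not a good question. Let's laugh it off and move on.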
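
The two questions differ only in which conversion the comparison goes through. Here is a minimal sketch of that, assuming nothing beyond standard C; the helper names and the probe enum are made up for illustration and are not from the original article:

```c
/* Hypothetical helpers, not from the original article. */

/* Both operands are int: an ordinary signed comparison (question 2). */
int less_as_int(int i, int limit)
{
    return i < limit;
}

/* i is converted to unsigned int before the comparison, which is what
   happens when an int meets an unsigned operand (question 1 on gcc);
   -1 becomes UINT_MAX. */
int less_as_unsigned(int i, unsigned int limit)
{
    return (unsigned int)i < limit;
}

/* Returns 1 if this compiler gives the enum a signed underlying type,
   0 if it picks an unsigned one. */
enum probe { PROBE_ZERO = 0, PROBE_ONE };

int enum_type_is_signed(void)
{
    return (enum probe)-1 < 0;
}
```

less_as_unsigned(-1, 0) is 0 and less_as_int(-1, 0) is 1 on any conforming compiler. Going by the description above, enum_type_is_signed() would be expected to return 0 with gcc and 1 with MSVC.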

Sunday, August 26, 2012

The Java object model

Original article: http://www.codeinstructions.com/2008/12/java-objects-memory-structure.html

Update (December 18th, 2008): I've posted here an experimental library that implements Sizeof for Java.

One thing about Java that has always bothered me, given my C/C++ roots, is the lack of a way to figure out how much memory is used by an object. C++ features the sizeof operator, that lets you query the size of primitive types and also the size of objects of a given class. This operator in C and C++ is useful for pointer arithmetic, copying memory around, and IO, for example.

Java doesn't have a corresponding operator. In reality, Java doesn't need one. Size of primitive types in Java is defined in the language specification, whereas in C and C++ it depends on the platform. Java has its own IO infrastructure built around serialization. And both pointer arithmetic and bulk memory copy don't apply because Java doesn't have pointers.

But every Java developer at some point wondered how much memory is used by a Java object. The answer, it turns out, is not so simple.

The first distinction to be made is between shallow size and deep size. The shallow size of an object is the space occupied by the object alone, not taking into account the size of other objects that it references. The deep size, on the other hand, takes into account the shallow size of the object, plus the deep size of each object referenced by this object, recursively. Most of the time you will be interested in knowing the deep size of an object, but in order to know that you first need to know how to calculate the shallow size, which is what I'm going to talk about here.

One complication is that the runtime in-memory structure of Java objects is not enforced by the virtual machine specification, which means that virtual machine providers can implement them as they please. The consequence is that you can write a class, and instances of that class in one VM can occupy a different amount of memory than instances of that same class when run in another VM. Most of the world, including myself, uses the Sun HotSpot virtual machine though, which simplifies things a lot. The remainder of the discussion will focus on the 32 bit Sun JVM. I will lay down a few rules that will help explain how the JVM organizes the objects' layout in memory.

Memory layout of classes that have no instance attributes

In the Sun JVM, every object (except arrays) has a 2-word header. The first word contains the object's identity hash code plus some flags like lock state and age, and the second word contains a reference to the object's class. Also, any object is aligned to an 8 bytes granularity. This is the first rule of object memory layout:

Rule 1: every object is aligned to an 8 bytes granularity.

Now we know that if we call new Object(), we will be using 8 bytes of the heap for the two header words and nothing else, since the Object class doesn't have any fields.

Memory layout of classes that extend Object

After the 8 bytes of header, the class attributes follow. Attributes are always aligned in memory to their size. For instance, ints are aligned to a 4 byte granularity, and longs are aligned to an 8 byte granularity. There is a performance reason to do it this way: usually the cost to read a 4 bytes word from memory into a 4 bytes register of the processor is much cheaper if the word is aligned to a 4 bytes granularity.

In order to save some memory, the Sun VM doesn't lay out object's attributes in the same order they are declared. Instead, the attributes are organized in memory in the following order:

doubles and longs
ints and floats
shorts and chars
booleans and bytes
references

This scheme allows for a good optimization of memory usage. For example, imagine you declared the following class:

class MyClass {
    byte a;
    int c;
    boolean d;
    long e;
    Object f;        
}

If the JVM didn't reorder the attributes, the object memory layout would be like this:

[HEADER:  8 bytes]  8
[a:       1 byte ]  9
[padding: 3 bytes] 12
[c:       4 bytes] 16
[d:       1 byte ] 17
[padding: 7 bytes] 24
[e:       8 bytes] 32
[f:       4 bytes] 36
[padding: 4 bytes] 40

Notice that 14 bytes would have been wasted with padding and the object would use 40 bytes of memory. By reordering the objects using the rules above, the in memory structure of the object becomes:

[HEADER:  8 bytes]  8
[e:       8 bytes] 16
[c:       4 bytes] 20
[a:       1 byte ] 21
[d:       1 byte ] 22
[padding: 2 bytes] 24
[f:       4 bytes] 28
[padding: 4 bytes] 32

This time, only 6 bytes are used for padding and the object uses only 32 bytes of memory.

So here is rule 2 of object memory layout:

Rule 2: class attributes are ordered like this: first longs and doubles; then ints and floats; then chars and shorts; then bytes and booleans, and last the references. The attributes are aligned to their own granularity.

Now we know how to calculate the memory used by any instance of a class that extends Object directly. One practical example is the java.lang.Boolean class. Here is its memory layout:

[HEADER:  8 bytes]  8 
[value:   1 byte ]  9
[padding: 7 bytes] 16

An instance of the Boolean class takes 16 bytes of memory! Surprised? (Notice the padding at the end to align the object size to an 8 bytes granularity.)
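
Rules 1 and 2 can be turned into a small calculator. What follows is only a sketch under the assumptions of this article (32 bit Sun JVM, 8-byte header, 4-byte references, class extending Object directly); it is written in C, and the function name and parameters are made up for illustration:

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical helper: shallow size of an object whose class extends
   Object directly. The arguments are the number of 8-, 4-, 2- and 1-byte
   primitive fields and the number of reference fields. */
size_t shallow_size(size_t n8, size_t n4, size_t n2, size_t n1, size_t nrefs)
{
    size_t off = 8;              /* two-word header */
    off += 8 * n8;               /* longs and doubles */
    off += 4 * n4;               /* ints and floats */
    off += 2 * n2;               /* shorts and chars */
    off += 1 * n1;               /* bytes and booleans */
    if (nrefs > 0) {
        off = align_up(off, 4);  /* references sit on a 4-byte boundary */
        off += 4 * nrefs;
    }
    return align_up(off, 8);     /* rule 1: pad the object to 8 bytes */
}
```

For MyClass above, shallow_size(1, 1, 0, 2, 1) gives 32, and for Boolean, shallow_size(0, 0, 0, 1, 0) gives 16, matching the layouts shown.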

Memory layout of subclasses of other classes

The next three rules are followed by the JVM to organize the fields of classes that have superclasses. Rule 3 of object memory layout is the following:

Rule 3: Fields that belong to different classes of the hierarchy are NEVER mixed up together. Fields of the superclass come first, obeying rule 2, followed by the fields of the subclass.

Here is an example:

class A {
   long a;
   int b;
   int c;
}

class B extends A {
   long d;
}

An instance of B looks like this in memory:

[HEADER:  8 bytes]  8
[a:       8 bytes] 16
[b:       4 bytes] 20
[c:       4 bytes] 24
[d:       8 bytes] 32

The next rule is used when the fields of the superclass don't fit in a 4 bytes granularity. Here is what it says:

Rule 4: Between the last field of the superclass and the first field of the subclass there must be padding to align to a 4 bytes boundary.

Here is an example:

class A {
   byte a;
}

class B extends A {
   byte b;
}

[HEADER:  8 bytes]  8
[a:       1 byte ]  9
[padding: 3 bytes] 12
[b:       1 byte ] 13
[padding: 3 bytes] 16

Notice the 3 bytes padding after field a to align b to a 4 bytes granularity. That space is lost and cannot be used by fields of class B.

The final rule is applied to save some space when the first field of the subclass is a long or double and the parent class doesn't end in an 8 bytes boundary.

Rule 5: When the first field of a subclass is a double or long and the superclass doesn't align to an 8 bytes boundary, JVM will break rule 2 and try to put an int, then shorts, then bytes, and then references at the beginning of the space reserved to the subclass until it fills the gap.

Here is an example:

class A {
  byte a;
}

class B extends A {
  long b;
  short c;  
  byte d;
}

Here is the memory layout:

[HEADER:  8 bytes]  8
[a:       1 byte ]  9
[padding: 3 bytes] 12
[c:       2 bytes] 14
[d:       1 byte ] 15
[padding: 1 byte ] 16
[b:       8 bytes] 24

At byte 12, which is where class A 'ends', the JVM broke rule 2 and stuck a short and a byte before a long, to save 3 out of 4 bytes that would otherwise have been wasted.

Memory layout of arrays

Arrays have an extra header field that contains the value of the 'length' variable. The array elements follow, and the arrays, like any regular objects, are also aligned to an 8 bytes boundary.

Here is the layout of a byte array with 3 elements:

[HEADER:  12 bytes] 12
[[0]:      1 byte ] 13
[[1]:      1 byte ] 14
[[2]:      1 byte ] 15
[padding:  1 byte ] 16

And here is the layout of a long array with 3 elements:

[HEADER:  12 bytes] 12
[padding:  4 bytes] 16
[[0]:      8 bytes] 24
[[1]:      8 bytes] 32
[[2]:      8 bytes] 40
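
The two array layouts follow the same arithmetic. A rough sketch in C, assuming the 12-byte array header described here and that the first element is aligned to the element size; the function name is made up for illustration:

```c
#include <stddef.h>

/* Hypothetical helper: shallow size of an array on the 32 bit Sun JVM.
   elem_size is the size of one element in bytes (at least 1). */
size_t array_shallow_size(size_t elem_size, size_t length)
{
    size_t off = 12;                                      /* 8-byte header + 4-byte length */
    off = (off + elem_size - 1) / elem_size * elem_size;  /* align the first element */
    off += elem_size * length;                            /* the elements themselves */
    return (off + 7) / 8 * 8;                             /* rule 1: pad to 8 bytes */
}
```

array_shallow_size(1, 3) gives 16 for the byte array and array_shallow_size(8, 3) gives 40 for the long array, as in the two layouts above.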

Memory layout of inner classes

Non-static inner classes have an extra 'hidden' field that holds a reference to the outer class. This field is a regular reference and it follows the rule of the in memory layout of references. Inner classes, for this reason, have an extra 4 bytes cost.

Final thoughts

We have learned how to calculate the shallow size of any Java object in the 32 bit Sun JVM. Knowing how memory is structured can help you understand how much memory is used by instances of your classes.

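Sunday, August 19, 2012

The magic of XOR

A solid grasp of bit operations can make a program noticeably faster in some cases, at the price of readability and harder maintenance. Compared with AND (&) and OR (|), XOR (^) is harder to understand, and it is not obvious what clever uses it has. Below are some problems that use XOR; perhaps there is something new to learn from them.

Swap two values without a temporary variable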

  //version 1
  void swap(int &a,int &b){
      a ^= b;
      b ^= a;
      a ^= b;
  }

  //version 2
  void swap(int &a,int &b){
      a -= b;
      b += a;
      a = b-a;
  }


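/*Note that both versions share a bug: when a and b refer to the same object, the value
becomes 0. Version 2 can also overflow in a -= b, which is undefined behavior for signed
int. Some articles point out that the bit-twiddling version 1 is not necessarily faster
than using a temporary; taking readability into account, such code is even less
advisable. Treat it as entertainment.
*/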
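
If you do want to keep the XOR version around, the aliasing bug is easy to guard against. A minimal sketch in plain C with pointers instead of references; the function name is made up for illustration:

```c
/* XOR swap that is safe when both pointers refer to the same object. */
void xor_swap(int *a, int *b)
{
    if (a == b)      /* same object: x ^ x would wipe it to 0 */
        return;
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

The guard costs a comparison, which is one more reason the plain temporary-variable swap is usually the better choice.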

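A set of distinct numbers in the range 1..N is stored in an array of size N-1. Find the missing number. Only constant extra space (less than O(N)) may be used, and the time complexity must be O(n). What if two numbers are missing?

/*The obvious O(n) algorithm adds up all the numbers and subtracts the sum from N(N+1)/2;
the difference is the missing number. The trouble is that the sum may overflow. Some
suggest sum(array[i]-i) instead, but that does not really remove the possible overflow
either.

XOR solves the problem. XOR all array[i] together to get k, then XOR k with every number
from 1 to N to get k'; k' is the missing number a.

If two numbers are missing, XOR still works. Suppose the missing numbers are a and b;
then k' = a^b, obtained as above. Define lowbit(i) as the lowest set bit in the binary
representation of i; a and b must differ at bit lowbit(k'). XOR together the elements of
the original array array[1..N-2] whose lowbit(k') bit is 1 to get k'', then XOR k'' with
the numbers in 1..N whose lowbit(k') bit is 1 to get k'''. k''' is one of a and b, and
k''' ^ k' is the other.
*/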

  //Find the single missing number
  //array has length N-1, values are in 1..N, all elements are distinct
  int findmiss(int *array,int N){
      int a; // result
      int i;
      a = 0;
      for(i=0;i<N-1;i++)
          a ^= array[i];
      for(i=1;i<=N;i++)
          a ^= i;
      return a;
  }
  
  //lowbit implementation
  int lowbit(int i){
      return i & -i;
  }
  
  //Find the two missing numbers
  //array has length N-2, values are in 1..N, all elements are distinct
  //a,b receive the missing numbers
  void findmiss2(int *array,int N,int *a,int *b){
      int i,k,lb;
      k = 0;
      *a = 0,*b = 0;
      for(i=0;i<N-2;i++)
          k ^= array[i];
      for(i=1;i<=N;i++)
          k ^= i;
      /* after these two loops, k = a^b */
      lb = lowbit(k); 
      for(i=0;i<N-2;i++){
          /* parentheses needed: != binds tighter than & */
          if((array[i] & lb) != 0)
              *a ^= array[i];
          else
              *b ^= array[i];
      }
      for(i=1;i<=N;i++){
          if((i & lb) != 0)
              *a ^= i;
          else
              *b ^= i;
      }
      return ;
  }
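
A note on the bit tests in findmiss2: in C, != binds tighter than &, so a condition written as array[i] & lb != 0 is parsed as array[i] & (lb != 0) and ends up testing only the lowest bit. The sketch below contrasts the two readings; the function names are made up for illustration:

```c
/* How "x & mask != 0" is actually parsed: the comparison happens first,
   so x is ANDed with 0 or 1. */
int bit_test_as_parsed(int x, int mask)
{
    return x & (mask != 0);
}

/* What was intended: test whether x has the masked bit set. */
int bit_test_intended(int x, int mask)
{
    return (x & mask) != 0;
}
```

With x = 4 and mask = 4 the first form yields 0 and the second yields 1, so the unparenthesized test would put 4 into the wrong group.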

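Among n numbers exactly one appears an odd number of times (all the others appear an even number of times). How do you find it in linear time and constant space? What if two numbers appear an odd number of times? What about three? Or k?

/*A little thought shows that this problem has a lot in common with problem 2, so the
solutions below are similar too.

Part 1: exactly one number appears an odd number of times. XOR the n numbers of the
array from start to end; the result is the number we are after.

Part 2: two numbers appear an odd number of times. As in problem 2, XOR all elements to
get k, take lowbit(k), split the original array into two groups according to lowbit(k),
and XOR each group separately to get the two numbers.

Part 3: three numbers. This one is hard; even knowing that XOR is the angle to take, a
good method is difficult to come up with. Of course some genius always does. The key is
to find the first number, then use the result of part 2 to get the other two. To avoid
distorting the meaning, the original text is quoted as is: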

// Let s = a ^ b ^ c.  We know that s not in (a, b, c), 
// since if s == a, say, then b ^ c == 0 and b == c.  
// Let f(x) be the lowest bit where x differs from s.  
// The algorithm computes flips = f(a) ^ f(b) ^ f(c), 
// since the numbers appearing an even number of times cancel.  
// The variable flips has parity 1, so it is non-zero, 
// and lowbit(flips) is a bit where one or three of a, b, c 
// differ from s.  It can't be three (this takes some careful thought), however, so the final 
// exclusive-or includes exactly one of a, b, c.

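Part 4: k numbers. Part 3 shows that a similar method can always find the first number,
which reduces the problem to size k-1, so it can be solved recursively. The complexity
is O(k^2*n).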
*/

  // get lowest different bit
  int lowbit(int x){
      return x & ~(x - 1);
  }
  
  // Given an array of n integers, such that each number 
  // in the array appears exactly twice, except for two 
  // numbers (say a and b) which appear exactly once.
  // 
  // In O(n) time and O(1) space find a and b. 
  // e.g.
  // { 2 3 3 2 4 6 4 7 8 8 }  ---> a/b = { 6 7}
  void Find2(int seq[], int n, int& a, int& b)
  {
      ////XOR
      int xors = 0;
      for(int i = 0; i < n; i++)
          xors ^= seq[i];
  
      ////get different bit
      int diff = lowbit(xors);
  
      ////
      a = 0;
      b = 0;
      for(int i = 0; i < n; i++){
          if(diff & seq[i])
              a ^= seq[i];
          else
              b ^= seq[i];
      }
  }
  
  
  // Given an array of n integers, such that each number 
  // in the array appears exactly twice, except for three 
  // numbers (say a, b and c) which appear exactly once.
  // 
  // In O(n) time and O(1) space find a,b and c. 
  // e.g.
  // { 2 3 3 2 4 6 4 7 8 8 1 }  ---> a/b/c = { 6 7 1 }
  
  
  // Let s = a ^ b ^ c.  We know that s not in (a, b, c), 
  // since if s == a, say, then b ^ c == 0 and b == c.  
  // Let f(x) be the lowest bit where x differs from s.  
  // The algorithm computes flips = f(a) ^ f(b) ^ f(c), 
  // since the numbers appearing an even number of times cancel.  
  // The variable flips has parity 1, so it is non-zero, 
  // and lowbit(flips) is a bit where one or three of a, b, c 
  // differ from s.  It can't be three, however, so the final 
  // exclusive-or includes exactly one of a, b, c.
  
  void Find3(int seq[], int n, int& a, int& b, int& c)
  {
      ////XOR
      int xors = 0;
      for(int i = 0; i < n; i++)
          xors ^= seq[i];
  
      ////
      int flips = 0;
      for(int i = 0; i < n; i++)
          flips ^= lowbit(xors ^ seq[i]);
      flips = lowbit(flips);
  
      ////get one of three
      a = 0;
      for(int i = 0; i < n; i++){
          if(lowbit(seq[i] ^ xors) == flips)
              a ^= seq[i];
      }
  
      ////swap a with the last element of seq
      for(int i = 0; i < n; i++){
          if(a == seq[i]){
              int temp = seq[i];
              seq[i] = seq[n - 1];
              seq[n - 1] = temp;
          }
      }
  
      ////call Find2() to get b and c
      Find2(seq, n - 1, b, c);
  }

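Among n numbers exactly one appears once and every other number appears 3 times. How do you find it with an O(n) time, constant space algorithm? What if that number appears twice?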
```
/*Think in base 3 and use a ternary XOR (digit-wise addition modulo 3):  
1 xor 0 = 1
1 xor 1 = 2
1 xor 2 = 0
2 xor 0 = 2
2 xor 2 = 1
0 xor 0 = 0

See reference [4] for details
*/
```
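
The same "add without carry, modulo 3" idea can also be applied to each binary bit position instead of each ternary digit. The sketch below is that binary variant, not necessarily the method of reference [4]; the function name is made up for illustration:

```c
#include <limits.h>

/* Hypothetical helper: every value in seq appears three times except one,
   which appears once or twice. For each bit position, count how many
   elements have that bit set; the count modulo 3 is non-zero exactly when
   the odd one out has the bit. */
int find_not_thrice(const int *seq, int n)
{
    unsigned int result = 0;
    int bits = (int)(sizeof(unsigned int) * CHAR_BIT);

    for (int bit = 0; bit < bits; bit++) {
        int count = 0;
        for (int i = 0; i < n; i++)
            count += (int)(((unsigned int)seq[i] >> bit) & 1u);
        if (count % 3 != 0)
            result |= 1u << bit;
    }
    return (int)result;
}
```

The outer loop runs a fixed number of times (the width of an int), so the whole thing is still O(n) time and constant space, and it covers both the "appears once" and the "appears twice" variants.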

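Nim. There are several heaps of stones. On each turn a player picks one heap and removes any number of stones from it: at least one, at most the whole heap. The two players alternate, and whoever takes the last stone wins. Does the first player have a winning strategy?

Suppose there are n heaps holding a1,a2,a3...an stones. The first player wins exactly
when the XOR of the heap sizes is not 0, that is, when a1^a2^...^an != 0. Looks like
magic, doesn't it: a game that feels complicated is settled by XOR. Feeling dumb for not
thinking of it? Never mind. Nim was already around in the 16th century, but it was only
solved in 1901, by a Harvard professor, which shows that plenty of people in this world
are as slow as I am. For a detailed discussion see "http://en.wikipedia.org/wiki/Nim". So how should the first player move (think about how to make the XOR become 0)?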
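
To answer that closing question: let x be the XOR of all heaps; some heap h satisfies (h ^ x) < h, and shrinking that heap to h ^ x makes the total XOR 0. A minimal sketch of this standard strategy follows; the function name and parameters are made up for illustration:

```c
/* Hypothetical helper. If the position is a win for the player to move,
   stores a winning move in *heap (index) and *new_size (stones to leave
   in that heap) and returns 1. Returns 0 if the XOR of all heaps is
   already 0, in which case no winning move exists. */
int nim_winning_move(const unsigned int *heaps, int n,
                     int *heap, unsigned int *new_size)
{
    unsigned int x = 0;
    for (int i = 0; i < n; i++)
        x ^= heaps[i];
    if (x == 0)
        return 0;

    for (int i = 0; i < n; i++) {
        unsigned int target = heaps[i] ^ x;
        if (target < heaps[i]) {   /* shrinking this heap zeroes the XOR */
            *heap = i;
            *new_size = target;
            return 1;
        }
    }
    return 0;                      /* not reached when x != 0 */
}
```

For heaps {3, 4, 5} the XOR is 2, and the function reports heap 0 reduced to 1 stone, after which 1^4^5 is 0.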

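References

Friday, August 10, 2012

10 examples of the paste command

An aside

I often read the news and scroll the comments on cnbeta, browse threads on Hacker News,
pick up good reads on reddit, and now and then wander around stackoverflow. Yet I never
felt I was getting any better; it is like reading sports or gossip news, gone as soon as
it is read. Where is the crux of the problem? This blog post gave me some inspiration,
like a shot of stimulant, and rekindled my ambition to write a post a day. I am much
calmer now, though: a post a day, like seven times a night, can only be a daydream, and
procrastination is only beaten through a long and bitter struggle. So I am lowering the
bar a little: from now on, when I see a good article (difficulty varies, but given my
limited level they should mostly be easy), I will try to translate it, so that the time
spent browsing is not wasted.

10 examples of the paste command

Source: original article

In this article we will look at how to use paste through a few examples. According to
the man page, paste is used to *merge lines of files*. It is very useful both for merging
the lines of a single file and for merging the lines of several files. The article has
two parts:
1. paste examples on a single file
2. paste examples on multiple files

Let's first look at the contents of file1, the file the examples work on: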

  $ cat file1
  Linux
  Unix
  Solaris
  HPUX
  AIX

paste on a single file

1. With no options, paste on a single file behaves the same as cat

     $ paste file1
     Linux
     Unix
     Solaris
     HPUX
     AIX

2. Join all the lines of a file

     $ paste -s file1
     Linux Unix Solaris HPUX AIX

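The -s option joins all the lines of the file. Since no delimiter is specified, a tab is used by default to separate the columns.

3. Join all lines using a comma as the delimiter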

     $ paste -d, -s file1
     Linux,Unix,Solaris,HPUX,AIX

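The -d option specifies the delimiter. Using -d and -s together joins all the lines of the file into a single line.

4. Merge the lines of a file into two columns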

     $ paste - - < file1
     Linux Unix
     Solaris HPUX
     AIX

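Each "-" reads one line from standard input. Two "-" read two lines, which are pasted as two columns.

5. Merge the lines into two columns using a colon as the delimiter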

       $ paste -d':' - - < file1
       Linux:Unix
       Solaris:HPUX
       AIX:

This is the same as joining every two lines of the file.

6. Merge the lines into three columns

       $ paste - - - < file1
       Linux Unix Solaris
       HPUX AIX

7. Merge the file into three columns using two different delimiters

       $ paste -d ':,' - - - < file1
       Linux:Unix,Solaris
       HPUX:AIX,

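The -d option accepts multiple delimiters. Columns 1 and 2 are separated by ':', and columns 2 and 3 by ','.

paste on multiple files

Let's first look at the contents of file2: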

  $ cat file2
  Suse
  Fedora
  CentOS
  OEL
  Ubuntu

1. Paste two files side by side

       $ paste file1 file2
       Linux     Suse
       Unix     Fedora
       Solaris CentOS
       HPUX     OEL
       AIX     Ubuntu

Here paste merges multiple files side by side, as the example above shows.

2. Paste two files side by side using a comma as the delimiter

       $ paste -d, file1 file2
       Linux,Suse
       Unix,Fedora
       Solaris,CentOS
       HPUX,OEL
       AIX,Ubuntu

3. paste can also use standard input when working with multiple files

     $ cat file2 | paste -d, file1 -
     Linux,Suse
     Unix,Fedora
     Solaris,CentOS
     HPUX,OEL
     AIX,Ubuntu

The following does the same:

     $ cat file1 | paste -d, - file2
     Linux,Suse
     Unix,Fedora
     Solaris,CentOS
     HPUX,OEL
     AIX,Ubuntu

One more (translator's note: this one differs from the two examples above):

     $ cat file1 file2 | paste -d, - -
     Linux,Unix
     Solaris,HPUX
     AIX,Suse
     Fedora,CentOS
     OEL,Ubuntu

4. Read lines alternately from two files

     $ paste -d'\n' file1 file2
     Linux
     Suse
     Unix
     Fedora
     Solaris
     CentOS
     HPUX
     OEL
     AIX
     Ubuntu

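With a newline as the delimiter, we can read lines alternately from the two files.

Thursday, August 9, 2012

Format string vulnerabilities

It all started when someone on a BBS asked about this interview question: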

  char *p="%s";
  printf(p);

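What is wrong with this piece of code?

Then all sorts of experts showed off their skills, which opened my eyes and introduced
me to the so-called "format string vulnerability". Simply put, the behavior of formatted
output functions (printf, sprintf, snprintf and so on) is determined by the format
string: the function parses the format string and, guided by the conversion specifiers,
fetches its arguments from the stack. So when the format string is supplied by the user,
an attacker can craft one to achieve all kinds of magical effects (crashing the program,
of course, or even gaining root).

A format string is a '\0'-terminated C string containing ordinary text and conversion specifiers. For example: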

  printf("The magic number is: %d\n",1911);

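On output, the conversion specifier "%d" is replaced by the actual argument 1911. Some
arguments are passed to the formatted output function by value (those for %d and %u, for
example), while others must be pointers (such as %s).
Also note characters like '\n', called escape sequences. At compile time an escape
sequence is replaced by the actual character (since that character usually cannot be
displayed directly, it is stored as its binary value). Formatted output functions do not
recognize escape sequences; in fact escape sequences have nothing whatsoever to do with
them, but because the two always show up together, people often assume the output
functions parse them. The following code

  printf("The magic number is : \x25" "d\n",23);

also works, because the compiler replaces \x25 with "%" before printf ever sees the string. (The literal is split in two because a hex escape swallows every hex digit that follows it, so "\x25d" would be read as the single escape \x25d.)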

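Simple examples of format string problems include the question at the top of this post,
which may crash the program or print a string of garbage. Or take the following code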

  printf("%08x.%08x.%08x.%08x.%08x\n");

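which prints data from the stack. With careful construction it is even possible to read,
and even modify, arbitrary memory addresses (see the references). The references cover
this in detail; sadly my level is limited and I can barely follow them. I hope to
understand them someday (which basically means no chance).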
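
The defensive side is simple: never pass text you do not control as the format argument. A minimal sketch of that rule; the helper name is made up for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into out without ever
   treating it as a format string. Returns what snprintf returns. */
int format_untrusted(char *out, size_t size, const char *untrusted)
{
    /* The vulnerable form would be: snprintf(out, size, untrusted); */
    return snprintf(out, size, "%s", untrusted);
}
```

With a fixed "%s" format, input such as "%s%n%08x" is copied through as plain characters instead of being interpreted.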

One more thing worth mentioning is a rarely used printf conversion specifier, "%n". It is used like this:

  int i;
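  printf("hello world!\n%n",&i);//the argument for %n is a pointer to int
  printf("%d",i); // prints 13

This leaves in i the number of bytes output so far, which is 13 here. Privilege
escalation through a format string vulnerability relies on exactly this specifier. Live
and learn.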

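References
1. Exploiting Format String Vulnerabilities
2. Format string vulnerabilities (csdn blogs really are ugly!)

Wednesday, July 18, 2012

Copy and paste under emacsclient

I have grown used to running emacs in daemon mode. The benefit is obvious: I use it right in the terminal and do not have to switch back and forth between the terminal and emacs. One problem had gone unsolved for a long time, though: emacsclient cannot use the system clipboard, so copy and paste between emacsclient and other programs does not work, to the point that I would open files in gedit whenever I needed to copy and paste.

Today I finally found a way: xclip. First install xclip: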

sudo apt-get install xclip

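Then download xclip.el; judging by the name, it seems to be the work of an expert from China. Add the following lines to your .emacs file and you are done.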

(require 'xclip)
(xclip-mode 1)

Friday, June 29, 2012

Notes on "Expert C Programming"

Page 19
Two contrasting cases of type mismatch:
(1)

foo(const char **p) {}
main(int argc, char ** argv){
    foo(argv);
}

(2)

char *cp;
const char *ccp;
ccp = cp;

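Example (1) produces the warning "argument is incompatible with prototype", whereas example (2) is legal. The crux is the rule that "the type pointed to by the left pointer must have all the qualifiers of the type pointed to by the right pointer". Under this rule example (2) is legal. Example (1) is still not legal, because p points to const char * while argv points to char *, and these are two different pointer types.

Page 23
* Rules for implicit conversions
In the ANSI C standard, implicit conversions can be put informally as follows:

When an arithmetic operation is performed and the operand types differ, a conversion takes place. Types are generally converted toward higher floating-point precision and greater length; an integer is converted to signed if that loses no information, and to unsigned otherwise.
K&R C, by contrast, follows the unsigned-preserving rule: when an unsigned type is mixed with int or a smaller integer type, the result type is unsigned.
Of course, today's compilers generally conform to ANSI C. The code below gives different results under an ANSI C compiler and a K&R C compiler.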

main(){
    if(-1 < (unsigned char) 1)
        printf("-1 is less than (unsigned char)1 : ANSI semantics ");
    else
        printf("-1 NOT less than (unsigned char) 1: K&R semantics ");
}

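The bit pattern of -1 is the same either way, but in K&R C the compiler interprets it as an unsigned number, that is, as a positive one, hence "NOT less".
So, is there anything wrong with the following code?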

int array[] = {23,34,12,17,204,99,16};
#define TOTAL_ELEMENTS (sizeof(array)/sizeof(array[0]))
main(){
    int d = -1,x;
    /*...*/
    if(d <= TOTAL_ELEMENTS - 2)
        x = array[d+1];
   /*...*/
}
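
There is a problem: sizeof yields an unsigned type, so TOTAL_ELEMENTS - 2 is unsigned, d is converted to unsigned before the comparison, and -1 becomes a huge value. The test fails and array[d+1] is never read. A minimal sketch of the effect and of one possible fix; the function names are made up for illustration:

```c
#include <stddef.h>

static int array[] = {23,34,12,17,204,99,16};
#define TOTAL_ELEMENTS (sizeof(array)/sizeof(array[0]))

/* What the original test does, with the hidden conversion made explicit. */
int in_range_as_written(int d)
{
    return (size_t)d <= TOTAL_ELEMENTS - 2;
}

/* One possible fix: cast the element count so the comparison is signed. */
int in_range_fixed(int d)
{
    return d <= (int)TOTAL_ELEMENTS - 2;
}
```

in_range_as_written(-1) is 0 while in_range_fixed(-1) is 1; for non-negative d both behave the same.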

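Page 30
* A switch statement pitfall

A declaration in this position (right after the opening brace of the switch) is readily accepted by the compiler, although giving such a variable an initial value inside a switch statement is pointless, because the initialization is never executed: execution starts at the case that matches the expression.
Consider the following code: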

int main(int argc, char *argv[])
{
    switch(4){
        int i = 3;
    case 1: printf("1\n");break;
    case 2: printf("2\n");break;
    case 3: printf("3\n");break;
    case 4: printf("4 %d\n",i);break;
    default: printf("default\n");break;
    }
    return 0;
}

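The program prints "4 0" (0 on my machine; it could in fact be any value). The reason is that the switch statement jumps straight to the matching case, so none of the initialization is executed. The variable i is declared and its scope is the switch block: the declaration takes effect at compile time, while the initialization would only happen at run time. You could say a case label works like the target of a goto.
Note also that the code above fails to compile with g++. Do C++ and C differ on switch? Here they do: C++ does not allow the jump to a case label to skip over a declaration that has an initializer.

Page 38
* Operator precedence and order of evaluation
The book tells a little anecdote about & and &&. Dennis Ritchie once posted that in the early days & was not only the bitwise operator but also did the job of the logical operator && (which had not been born yet). Later && was added to separate bitwise from logical operators. Because expressions like if(a == b & c == d) already existed in large numbers, for this "historical reason" & has lower precedence than the relational operators.
Precedence only determines how operators of different precedence are grouped; C does not specify the order in which the operands of a single operator are evaluated (except for &&, ||, ?: and the comma operator). Likewise, C does not specify the order in which function arguments are evaluated.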

x = f() + g()*h()

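The only thing certain about the code above is that the multiplication happens before the addition. The order in which the operands f(), g() and h() are called is not specified: f() may be called before the multiplication or after it, or even between g() and h().
Similarly,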

printf("%d %d\n",++n,power(2,n));

is also considered bad style, because different compilers may give different results.

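Page 46
* The maximal munch strategy
This strategy says that if the next token can be read in more than one way, the compiler picks the reading that forms the longest sequence of characters. So,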

z = y+++x;

is understood as, and can only be understood as,

z = y++ + x;

But this strategy can be confusing in its own way, for example,

z = y+++++x;

which, under the ANSI C strategy, is parsed as

z = y++ ++ +x;

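so it fails to compile. Reading it as z = y++ + ++x would seem workable, but under the strategy the compiler does not read it that way.
Here is another sneaky bug. Can you spot it?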

ratio = *x/*y;
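
The catch is maximal munch again: the slash and the star that follows it are read together as the opening of a comment, so everything after *x disappears until the next comment terminator. A space (or parentheses) keeps the two characters apart. A minimal sketch, with a made-up function name:

```c
/* The space keeps the division sign and the dereference star from being
   read together as a comment opener. */
int ratio_of(const int *x, const int *y)
{
    return *x / *y;
}
```

Writing the expression as (*x)/(*y) works just as well.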


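Page 117
* Run-time data structures
The executable a.out has three segments: the text segment (text), the data segment (data) and the bss segment (bss). The text segment holds the program's instructions; the data segment holds initialized global and static variables. The name bss originally abbreviated "Block Started By Symbol", which explains nothing. Some prefer to read it as "Better Save Space", which captures its purpose better: it holds global and static variables that are not initialized. In fact the file only records the size the bss segment will need at run time; unlike the other segments, bss takes up no space in the object file (that is, a.out). When the loader brings the executable into memory, it allocates the space according to that record.
For example, declaring the global variable int arr[1000]; grows the bss segment by 4000 bytes, but a.out does not grow by 4000 bytes. If the variable is initialized instead, as in int arr[1000] = {100}; (I tried it: initializing to 0 has the same effect as not initializing), the bss segment is unchanged, the data segment grows by 4000 bytes, and a.out grows by 4000 bytes as well, which shows that the data segment does take space in a.out. The size command shows the size of each segment.
Local variables do not go into a.out; they are created at run time.
* The stack segment (p. 122)

Apart from recursive calls, a stack is not strictly necessary, because the fixed amount of space needed for local variables, parameters and return addresses is known at compile time and could be allocated in the bss segment. Early compilers for BASIC, COBOL and FORTRAN did not allow recursive function calls, so they needed no dynamic stack at run time. Allowing recursion means finding a way to let multiple instances of local variables exist at the same time while only the most recently created one is accessible, which is much like the classic definition of a stack.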

Page 170
* Building a picture from bit patterns
In C, a typical 16x16 black-and-white picture might look like this:

static unsigned short stopwatch[] = {
0x07C6,
0x1FF7,
0x383B,
0x600C,
0x600C,
0xC006,
0xC006,
0xDF06,
0xC106,
0xC106,
0x610C,
0x610C,
0x3838,
0x1FF0,
0x07C0,
0x0000
};

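As you can see, these C constants give no clue as to what the picture actually looks like. Here is a strikingly elegant set of #define definitions that lets a program build the constants so that they look like the picture on the screen.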

#define X   )*2+1
#define _   )*2
#define s   ((((((((((((((((0  /* 用于建立16位宽的图形 */
static unsigned short stopwatch[] = {
s _ _ _ _ _ X X X X X _ _ _ X X _,
s _ _ _ X X X X X X X X X _ X X X,
s _ _ X X X _ _ _ _ _ X X X _ X X,
s _ X X _ _ _ _ _ _ _ _ _ X X _ _,
s _ X X _ _ _ _ _ _ _ _ _ X X _ _,
s X X _ _ _ _ _ _ _ _ _ _ _ X X _,
s X X _ _ _ _ _ _ _ _ _ _ _ X X _,
s X X _ X X X X X _ _ _ _ _ X X _,
s X X _ _ _ _ _ X _ _ _ _ _ X X _,
s X X _ _ _ _ _ X _ _ _ _ _ X X _,
s _ X X _ _ _ _ X _ _ _ _ X X _ _,
s _ X X _ _ _ _ X _ _ _ _ X X _ _,
s _ _ X X X _ _ _ _ _ X X X _ _ _,
s _ _ _ X X X X X X X X X _ _ _ _,
s _ _ _ _ _ X X X X X _ _ _ _ _ _,
s _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _,
};

Whatever you do, don't forget to undefine these macros once the drawing is done, or they may cause a lot of trouble.
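
Here is a cut-down sketch of the same trick with the cleanup included. Only the first two rows of the picture are used, and the array and accessor names are made up for illustration:

```c
#define X   )*2+1
#define _   )*2
#define s   ((((((((((((((((0   /* 16 opening parentheses, one per column */

static const unsigned short glyph_rows[] = {
    s _ _ _ _ _ X X X X X _ _ _ X X _,
    s _ _ _ X X X X X X X X X _ X X X,
};

/* Remove the macros as soon as the picture is finished. */
#undef X
#undef _
#undef s

/* Hypothetical accessor, so the rows can be inspected elsewhere. */
unsigned short glyph_row(int i)
{
    return glyph_rows[i];
}
```

Each X or _ closes one parenthesis and shifts the running value left by one bit, so the two rows evaluate to 0x07C6 and 0x1FF7, the first two constants of the hex version.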

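Page 172
* Integer promotion
In C, a character constant has type int. So for

printf("%zu",sizeof 'A');

the output is 4, which should come as no surprise.
But the following statements are a different matter: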

char a = 'A';
char b = 'B';
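printf ( " the size of the result of a+b :%zu " ,sizeof( a+b) );

The output is also 4 (you might have expected 1).
This is because of integer promotion. Neither operand is one of the three floating-point types, so both must be of some integral type. Before determining the common target type, the compiler applies a process called integral promotion to every integral type smaller than int. Integral promotion means that char, short int and bit-field types (signed or unsigned) as well as enumeration types are promoted to int, provided int can hold all the values of the original type; otherwise they are converted to unsigned int. wchar_t and enumeration types are promoted to the smallest integer type that can represent all the values of their underlying type. Once integral promotion is done, type comparison starts again, which is the usual arithmetic conversions covered in an earlier note.
But the next statement is puzzling again: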

char a = 'A';
char b = 'B';
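printf ( " the size of the result of a++ :%zu " ,sizeof( a++) );

You might think integer promotion makes this print 4 as well, but it actually prints 1. The reason is that promotion applies to the operands of arithmetic operators such as +, whereas the result of a++ simply has the type of a, which is char.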
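
The two cases can be checked side by side. The sketch below also shows a related detail worth knowing: sizeof does not evaluate its operand, so the a++ inside it never actually increments a. The function names are made up for illustration:

```c
#include <stddef.h>

/* a + b: both operands are promoted to int before the addition. */
size_t size_of_sum(void)
{
    char a = 'A', b = 'B';
    return sizeof(a + b);
}

/* a++: the result has the type of a, which is char; nothing is promoted. */
size_t size_of_postincrement(void)
{
    char a = 'A';
    return sizeof(a++);
}

/* sizeof does not evaluate its operand, so a keeps its value. */
int value_after_sizeof(void)
{
    char a = 'A';
    (void)sizeof(a++);
    return a;
}
```

size_of_sum() equals sizeof(int), size_of_postincrement() is 1, and value_after_sizeof() still returns 'A'.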

Page 281
* Determining whether a variable is signed or unsigned
When the macro argument is a value, you can roughly do this:

#define ISUNSIGNED(a) (a >=0 && ~a >= 0)

When the macro argument is a type, you can do this:

#define ISUNSIGNED(type) ((type)0 - 1 > 0)

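Actually, because of integer promotion the first version does not work properly: with unsigned short us = 1, for example, testing us gives the wrong answer.
I could not work out a fix myself; after some searching, the following code works: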

#define ISUNSIGNED(a) ( a >= 0 && (a=~a,a >= 0? (a=~a,1):(a=~a,0)))

As for the second version, I think the book's code is also slightly off, again because of integer promotion. It can be changed to

#define ISUNSIGNED(type) ((type)- 1 > 0)

What do you think?
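
To see the difference between the two type macros, here is a small sketch, assuming unsigned short is narrower than int (true on common platforms). The macro and function names are made up so that the two versions can coexist:

```c
/* Modified version: converting -1 to an unsigned type yields that type's
   maximum value, which stays positive even after promotion to int. */
#define ISUNSIGNED_TYPE(type) ((type)-1 > 0)

/* The book's version: (unsigned short)0 is promoted to int before the
   subtraction, so the result is -1 and the test fails for narrow types. */
#define ISUNSIGNED_TYPE_BOOK(type) ((type)0 - 1 > 0)

int unsigned_short_is_unsigned(void)   { return ISUNSIGNED_TYPE(unsigned short); }
int short_is_unsigned(void)            { return ISUNSIGNED_TYPE(short); }
int unsigned_int_is_unsigned(void)     { return ISUNSIGNED_TYPE(unsigned int); }
int book_macro_on_unsigned_short(void) { return ISUNSIGNED_TYPE_BOOK(unsigned short); }
int book_macro_on_unsigned_int(void)   { return ISUNSIGNED_TYPE_BOOK(unsigned int); }
```

The book's macro gets unsigned int right but reports unsigned short as signed, while the modified macro handles both.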

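* Picking a random string from a file
The file may only be traversed sequentially, and no table of string offsets may be kept. The book gives an ingenious method.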
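
One well-known technique that fits these constraints is reservoir sampling with a reservoir of one: keep the first string, then replace the current choice with the k-th string with probability 1/k. Whether this is exactly the book's method is my assumption; the struct and function names below are made up for illustration:

```c
#include <stdlib.h>

/* Hypothetical one-element reservoir. Start with { NULL, 0 }. */
struct reservoir {
    const char *choice;   /* the string currently selected */
    long seen;            /* how many strings have been offered so far */
};

/* Offer the next string from the sequential pass. After k offers, each
   of the k strings is the current choice with probability about 1/k
   (ignoring the slight bias of rand() % k). The pointer is stored as is,
   so the caller must keep the string alive or pass a copy. */
void reservoir_offer(struct reservoir *r, const char *s)
{
    r->seen++;
    if (rand() % r->seen == 0)
        r->choice = s;
}
```

Reading a file line by line and calling reservoir_offer on a copy of each line leaves a randomly chosen line in choice once the end of the file is reached, using one pass and constant extra bookkeeping.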
