Hello World | Adel Zhang

2012年7月18日星期三

emacsclient 下的复制粘帖

我现在习惯用emacs daemon模式，好处是很明显的，直接在终端中使用，并不用来回地切换终端和emacs。但一直以来有个问题都没有解决，就是在用emacsclient时，它并不能使用系统的剪贴板，使得emacsclient与其他程序间不能复制粘帖，以至于要用到复制粘帖时都用gedit来打开文件。

今天终于找到了一个方法，就是使用xclip。首先安装xclip：

sudo apt-get install xclip

然后下载xclip.el，看名字好像是国内的牛人的作品。在.emacs文件中添加以下语句即可。

(require 'xclip)
(xclip-mode 1)

2012年6月29日星期五

《c专家编程》笔记

第19页
关于类型不匹配的两个对比案例：
（1）

foo(const char **p) {}
main(int argc, char ** argv){
    foo(arvg);
}

(2)

char *cp;
const char *ccp;
ccp = cp;

例子（1）将提示警告“参数与原型不匹配”，例子（2）却是合法的。问题的症结在于“左边指针所指向的类型必须具有右边指针所指向类型的全部限定词”。按照这个规则，例子（2）因此是合法的。但例子（1）仍然不合法，因为指针p指向的是 const char*, 指针argv指向的是char*,这是两个不同的指针类型。

第23页
＊关于隐式转换的规则
ANSI C标准中，隐式转换可以通俗地表述为：

当执行算术运算时，操作数的类型如果不同，就会发生转换。数据类型一般朝着浮点精度更高，长度更长的方向转换，整型数如果转换为signed 不会丢失信息，就转换为signed,否则转换为unsigned。
对于隐式转换，K＆R C则采用无符号优先原则，就是当一个无符号类型与int或更小的整型混合使用时，结果类型是无符号类型。
当然，现在的编译器一般都是符合ANSI C 标准的。下面的code，对于ANSI C 和 K＆R C编译器中运行结果是不一样的。

main(){
    if(-1 < (unsigned char) 1)
        printf("-1 is less than (unsigned char)1 : ANSI semantics ");
    else
        printf("-1 NOT less than (unsigned char) 1： K＆R semantics ");
}

-1的位模式是一样的，但是在K＆R C 中，编译器将其解释为无符号数，也就是变成正数，所以“NOT less”。
那么，下面这段代码有问题么？

int array[] = {23,34,12,17,204,99,16};
#define TOTAL_ELEMENTS (sizeof(array)/sizeof(array[0]))
main(){
    int d = -1,x;
    /*...*/
    if(d <= TOTAL_ELEMENTS - 2)
        x = array[d+1];
   /*...*/
}

第30页
* switch 语句的误区

在这个位置上（指的是switch的左花括号之后）声明一些变量会被编译器很自然地接受，尽管在switch语句中为这些变量加上初始值是没有什么用处的，因为它绝不会被执行——语句从匹配表达式的case开始执行。
考虑下面这段代码：

int main(int argc, char *argv[])
{
    switch(4){
        int i = 3;
    case 1: printf("1\n");break;
    case 2: printf("2\n");break;
    case 3: printf("3\n");break;
    case 4: printf("4 %d\n",i);break;
    default: printf("default\n");break;
    }
    return 0;
}

程序将输出“4 0”（在我的机器上是输出0，实际上可能是任意值）。原因就在于switch语句直接跳到匹配位置的case处开始执行，所有的初始化并不会被执行。变量i是有定义的，i的可见范围为switch语句块，变量的声明是在编译期就可见，而初始化要等到运行时。可以说case语句就相当于goto。
另外要注意的是，上面的代码在g++编译是会出错的。c++ 与 c 的switch 语句不同？

第38页
＊运算符优先级和求值顺序
书中讲了一个小八卦，关于＆和＆＆的。Dennis Ritchie 发帖说在以前＆不只是位操作符，而且承担逻辑操作符＆＆的作用（＆＆还没出生）。后来为了区分位操作符和逻辑操作符，重新加入了＆＆。因为已大量存在类似于 if(a == b & c == d)这样的表达式，由于这个“历史原因”，＆的运算符优先级低于关系运算符。
优先级只是定义不同优先级间的计算顺序，但同一运算符内的多个操作数的计算顺序C是没有规定的（除了&&, ||, ?: 和，外)。类似的，C也没有指定函数各参数的求值顺序。

x = f() + g()*h()

上面的代码唯一可以确定的是乘法会在加法之前执行。但是操作数f(),g(),h()会以什么顺序执行并没有规定，可能f()会在乘法之前调用也可能在乘法之后调用，也可能在g()和h()之间调用。
类似地，

printf("%d %d\n",++n,power(2,n));

也认为上面代码是一种不良风格的代码，因为在不同的编译器中可能会得到不同的结果。
2012-06-20 22:03:14 回应

第46页
＊ maximal munch strategy(最大一口策略）
这个策略表示如果下一个标记有超过一种的解释方案，编译器将选取能组成最长字符序列的方案。所以，

z = y+++x;

将被理解为，而且只能被理解为

z = y++ + x;

但这种策略也会再次让人迷糊，比如，

z = y+++++x;

按照ANSI C 的策略，将被解析为，

z = y++ ++ +x;

所以会编译错误。虽然当理解做 z = y++ + ++x 时貌似是可行的，但按照策略，编译器并不会这么理解。
还有一个很鸡贼的错误。还能发现的出来的么？

ratio = *x/*y;

2012-06-20 22:27:47 回应

第117页
* 运行时数据结构
可执行文件a.out有三个段：文本段（text),数据段(data),bss段(bss)。text段为可执行文件的指令，data段保存初始化后的全局变量和静态变量。bss段这个名字原意是“Block Started By Symbol"的缩写，不知所云。有人更喜欢把它解释为“Better Save Space", 这更能体现bss的作用，它保存没有初始化的全局变量或静态变量。实际上它只是记录运行时所需要的bss段的大小记录，bss段（不像其他段）并不占据目标文件(就是a.out)的任何空间。当可执行文件由loader载入内存时，再按照bss段的记录申请空间。
譬如声明 int arr[1000] ; 的全局变量, bss段将增加 4000 Byte, a.out的大小并不会因此增加 4000 Byte。但如果声明的是初始化过的变量， int arr[1000] = {100}; (实验了一下，初始化为0效果跟没初始化一样), 则 bss段没有变化， data 段将增加4000 Byte, 另外 a.out 也会增加4000 Byte，说明data段是占用a.out的空间的。可以用 size 命令查看每个段的大小。
局部变量并不进入a.out,它们在运行时创建。
＊堆栈段（p122）

除了递归调用之外，堆栈并非必需。因为在编译时可以知道局部变量、参数和返回地址所需空间的固定大小，并可以将它们分配于bss段。BASIC，COBOL和FORTRAN的早期编译器并不允许函数的递归调用，所以它们在运行时并不需要动态的堆栈。允许递归调用意味着必须找到一种方法，在同一时刻允许局部变量的多个实例存在，但只有最近被创建的那个才能被访问，这很像栈的经典定义。
2012-06-23 11:06:58 回应

第170页
* 根据位模式构筑图形
在C语言中，典型的16x16的黑白图形可能如下：

static unsigned short stopwatch[] = {
0x07C6,
0x1FF7,
0x383B,
0x600C,
0xC006,
0xC006,
0xDF06,
0xC106,
0xC106,
0x610C,
0x610C,
0x3838,
0x1FF0,
0x07C0,
0x0000
};

正如所看到的那样，这些C语言常量并未提供有关图形实际模样的任何线索。这里有一个惊人的#define 定义的优雅集合，运行程序建立常量使它们看上去像是屏幕上的图形。

#define X   )*2+1
#define _   )*2
#define s   ((((((((((((((((0  /* 用于建立16位宽的图形 */
static unsigned short stopwatch[] = {
s _ _ _ _ _ X X X X X _ _ _ X X _,
s _ _ _ X X X X X X X X X _ X X X,
s _ _ X X X _ _ _ _ _ X X X _ X X,
s _ X X _ _ _ _ _ _ _ _ _ X X _ _,
s _ X X _ _ _ _ _ _ _ _ _ X X _ _,
s X X _ _ _ _ _ _ _ _ _ _ _ X X _,
s X X _ _ _ _ _ _ _ _ _ _ _ X X _,
s X X _ X X X X X _ _ _ _ _ X X _,
s X X _ _ _ _ _ X _ _ _ _ _ X X _,
s X X _ _ _ _ _ X _ _ _ _ _ X X _,
s _ X _ _ _ _ _ X _ _ _ _ X X _ _,
s _ X _ _ _ _ _ X _ _ _ _ X X _ _,
s _ _ X X X _ _ X _ _ X X X _ _ _,
s _ _ _ X X X X X X X X X _ _ _ _,
s _ _ _ _ _ X X X X X _ _ _ _ _ _,
s _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _,
};

千万不要忘了在绘图结束后清除这些宏定义，否则可能造成很大的麻烦。
2012-06-23 21:23:28 回应

第172页
* 整型提升
在C语言中，字符常量的类型为int 型。所以对于

printf("%d",sizeof 'A');

输出为4，不应该感到诧异。
但下面的语句有所不同：

char a = 'A';
char b = 'B';
printf ( " the size of the result of a+b :%d " ,sizeof( a+b) );

输出的结果也是 4（或许之前会认为应该输出1）。
这是由于发生了整型提升。两个操作数都不是三种浮点类型之一，它们一定是某种整值类型。在确定共同的目标提升类型之前，编译器将在所有小于int的整值类型上施加一个被称为整型提升(integral promotion)的过程。整型提升就是char、short int和位段类型（无论signed或unsigned）以及枚举类型将被提升为int，前提是int 能够完整地容纳原先的数据，否则将被转换为unsigned int。wchar_t和枚举类型被提升为能够表示其底层类型(underlying type)所有值的最小整数类型。一旦整型提升执行完毕，类型比较就又一次开始，也就是普通的算术类型转换。这在之前笔记有介绍过。
但是下面这个语句又会让人困惑：

char a = 'A';
char b = 'B';
printf ( " the size of the result of a++ :%d " ,sizeof( a++) );

或许你会认为，按照整型提升输出为4. 但实际上输出为 1。原因不详。
2012-06-23 22:55:21 回应

第281页
＊判断一个变量是有符号数还是无符号数
宏参数是个变量值时，大概可以这么做：

#define ISUNSIGNED(a) (a >=0 && ~a >= 0)

如果宏参数是类型时，可以这么做：

#define ISUNSIGNED(type) ((type)0 - 1 > 0)

其实，前一个代码由于整型提升的存在，并不能正常工作，比如 unsigned short us = 1 ,对us进行测试时就会得到错误答案。
我没有想出来怎么办，搜了一下，下面的代码可以工作：

#define ISUNSIGNED(a) ( a >= 0 && (a=~a,a >= 0? (a=~a,1):(a=~a,0)))

而对于第二个代码，我也觉得书上写得似乎有点不对，原因也在于整型提升，可以改为

#define ISUNSIGNED(type) ((type)- 1 > 0)

你认为呢？

* 从文件中随机提取一个字符串
只能顺序遍历文件，并且不能使用表格来存储所有字符串的偏移位置。书中给了一NB的方法。
2012-06-26 14:38:33 回应

2012年6月20日星期三

String Reduction

来源：Tristan's Collection of Interview Questions

Problem: A string that only contains three types of letters :"a", "b", "c". Our goal is to reduce the string. If two different letters are adjacent to each other, we can reduce the two letters into another letter. The rule is like this: if we encounter "ab" (or "ba"), we reduce to "c"; if we encounter "bc" (or "cb"), we reduce to "a"; if we encounter "ca" (or "ac"), we reduce to "b". If two adjacent letters are the same, we can't reduce them. We need to find an optimal way to reduce the string as much as possible.

For example, for string "acbb", if we try to reduce "ac"first, we get "bbb". But, if we reduce "cb" first, we get "aab" and we can further make "aab" => "ac" => "b". The latter approach gives us an optimal reduction.What is the smallest string which can result by applying this operation repeatedly?

Solution: Here what really matters is the numbers of letter "a", "b" and "c". Let us name them as num_a, num_b and num_c. If two of them are zero, we can't reduce that string. Otherwise, we can reduce the string into a string that has a length of 1 or 2. If num_a, num_b and num_c are all even or odd, we can reduce to a string with length 2; If not, we can reduce to a string with length 1.

Then how to do the reduction? The detail is as follow:

if the string has a length of 3 and contains one "a", one "b" and one "c", whatever order of the three letter are in, we can only reduce the string into a string with length 2; if the string has a length of 2 and contains two different letters, we can only reduce the string into a string with length 1. Let us regard these two cases as base cases.
for a general case, we have num_a "a" , num_b "b" and num_c "c". After every reduction, the sum of num_a, num_b and num_c will decrease by 1, since we substitue two letters with one letter.
Then at each round, which two adjacent letters we choose to reduce? We try to find an adjacent pair that contains a letter which has the highest count. For example, if now, we have 3 "a", 4 "b" and 6 "c" in the string, we choose an adjacent pair that contains "c"since num_c = max(num_a, num_b, num_c) . Can we find such pair? definitely we can. If there are multiple such pairs, choose a random one. Then after we reduce this pair, num_c--, max(num_a, num_b, num_c) may decrease by 1, remain unchanged, or increase by 1. However, since max(num_a, num_b, num_c) is up-bounded by num_a + num_b + num_c. num_a + num_b + num_c is decreasing after every round, then max(num_a, num_b, num_c) will also decrease if we look at a long wrong. Therefore, by using our strategy, max(num_a, num_b, num_c) will eventually decrease to 1, which means we are encounter the base cases in step 1.
Then when the string will be reduced to a length of 1 and when to a length of 2? We observe that is num_a, num_b and num_c are all even, then after one transformation, they will become all odd; similarly, if there are originally all odd, after one transformation, they will become all even. Then according to the analysis in step 3), we know at the end, the max(num_a, num_b, num_c) will eventually decrease to 1. But, they should still be all odd at that time (since "1" is odd). Therefore, at the very end, we will have num_a = 1, num_b = 1 and num_c =1, which will eventually lead to a string of length 2. It is easy to prove that if num_a, num_b and num_c are not all even or odd, the string will be reduced to length 1.
if num_a, num_b and num_c are not all even or odd, there must be only two cases: a) two of the counters are odd and one is even b) two of the counters are even and one is odd. For example, if num_b is odd, num_a and num_c are both even. The string actually will eventually be reduced to "b".
if num_a, num_b and num_c are all even or odd, there might be multiple different final forms.

if((a == 0 && b == 0 && c == 0) ||
   (a == 0 && b == 0 && c != 0) ||
   (a == 0 && b != 0 && c == 0) ||
   (a != 0 && b == 0 && c == 0))
   {
      result = a+b+c;
   }
else if(a != 0 && b != 0 && c != 0)
   {
      if((a%2 == 0 && b%2 == 0 && c%2 == 0) ||
         (a%2 == 1 && b%2 == 1 && c%2 == 1))
           result = 2;
      else
           result = 1;
   }
else if((a == 0 && b != 0 && c != 0) || 
        (a != 0 && b == 0 && c != 0) || 
        (a != 0 && b != 0 && c == 0))
   {
      if(a%2 == 0 && b%2 == 0 && c%2 == 0)
           result = 2;
      else
           result = 1;
   }

2012年6月14日星期四

树状数组

一个基本的问题：求数组中的逆序对数目?

这也是算法导论第一章的最后一道习题。按照算法导论中的提示，该问题可以通过修改合并排序在O(nlgn)内解决。另外一种方法，是使用一种特殊的数据结构称之为“树状数组”，也可以在O(nlgn）内解决。代码如下：

#include<iostream>
#include<stdio.h>
#include<string.h>
using namespace std ;
#define MAXN 100002
#define MAX 1000002
int n,a[MAXN],c[MAX] ;
int main()
{
 int runs ;
 scanf("%d",&runs) ;
 while(runs--)
 {
  scanf("%d",&n) ;
  for(int i = 0;i < n;i++) scanf("%d",&a[i]) ;
  long long ret = 1LL * n * (n - 1) / 2 ;
  memset(c,0,sizeof c) ;
  for(int i = 0;i < n;i++)
  {
   for(int j = a[i];j > 0;j -= j & -j) ret -= c[j] ;
   for(int j = a[i];j < MAX;j += j & -j) c[j]++ ;  
  }
  cout << ret << endl ;
 }
 return 0 ;
}

下面介绍下树状数组（Binary Indexed Tree)。如果搜一下，就会发现有很多文章详细介绍树状数组,比如这篇,所以这里不再赘述。它能够高效地获取数组中连续n个数的和。概括说，树状数组通常用于解决以下问题：数组{a}中的元素可能不断地被修改，怎样才能快速地获取连续几个数的和？传统数组(共n个元素)的元素修改和连续元素求和的复杂度分别为O(1)和O(n)。树状数组通过将线性结构转换成伪树状结构（线性结构只能逐个扫描元素，而树状结构可以实现跳跃式扫描），使得修改和求和复杂度均为O(lgn)，大大提高了整体效率。 let the code talk.值得提一下的是，树状数组下标是从1开始的。

//求2^k
int lowbit(int t)
{
    return t & ( t ^ ( t - 1 ) );
}
 
//求前n项和
int sum(int end)
{
   int sum = 0;
   while(end > 0)
  {
     sum += in[end];
     end -= lowbit(end);
  }
 
  return sum;
}
 
//增加某个元素的大小
void plus(int pos, int num)
{
   while(pos <= n)
  {
     in[pos] += num;
     pos += lowbit(pos);
  }
}

2012年4月23日星期一

非越狱ios的twitter方案

其他人我不清楚，对于我自己而言，发tweet的理由是基于以下几点考虑的：

可以有个地方随时记录自己的牢骚（都不好意思说思想两字）
厌恶国内微博浓烈的商业气息和娱乐味道，还有恶心至极的实名
对于社交性并不是那么在乎（刚开始会follow些名人，但已然厌倦），当然如果能有一两枚粉丝还是会暗爽的

一个很理想的场景是有个ios客户端，闷骚时就随手一tweet。虽然Twitter已有优秀的官方客户端，但在国内糟糕的网络环境下，就只能变成摆设。VPN我是考虑过的，但是懒得也舍不得购买。至于免费VPN，就没有找到好用的，tenacy VPN 得每次查询密码，刚开始想能克服，最终还是怕了这个麻烦。网上有教程说怎么在ios中装goagent，过程极为繁琐，看起来挺美，但我的itouch没有越狱，也不想越狱（好像已经说了好多次懒了，羞愧）。总之，就这么简单的需求也难有很好的解决方法。

忽然哪天灵光一闪，为什么不用email来发tweet呢？这篇博文介绍了使用twitterfeed.com + blogger 的方法来通过email发tweet。这是我现在采纳的方法，这个方法的优点就是可以方便的发tweet，不用担心vpn又登陆不上或是私有twitter api又被封的问题，而且相应的推文是发送的blogger的，这等于做了一个备份。缺点也很明显，就是完全没有了社交性，这对于我来说真不是问题，如果有强烈的社交互动需求的，那就要重新考虑考虑拉。

如果在google中搜“twitter email”,第一个链接是tweetymail这个服务。tweetymail看起来满不错的，可以通过邮件来回复等一定的社交性，更新twitter的速度比twitterfeed快，因为它是接收到邮件就更新，而twitterfeed是查看blogger rss的输出来相应地更新twitter。tweetymail让我不能忍受（主要是我的心里作祟）的地方有两点，让我放弃了使用这个服务：1. 免费用户每月发tweet有限额，不在于说限额是否满足自己需求（实际上100条是完全够的），而在于这样的限制让我挺不舒服的。 2. 免费用户的tweet都会在tweetymail上公开，应该是在它的服务器上有存档，这尤其令我难以接受。

twitterfeed + blogger 的方案也不是很完美，因为twitterfeed好像并不能作用在有图片的邮件，只能输出邮件中的文字而会忽略图片。还好这世上还有twitpic这个服务，它支持通过邮件来发送tweet，这时邮件必须有图片附件才会被twitpic转到twitter。所以这刚好与twitterfeed珠联璧合，相得益彰。

2012年2月15日星期三

emacs输入中文

症状：无法在emacs中调出ibus，输入中文
环境: Linux 2.6.32-5-amd64 + GNU Emacs 23.4.1
Locale:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

解决：
(1)先用locale -a 查看是否安装 zh_CN.UTF-8,如果有，跳过步骤(2)
(2)运行sudo dpkg-reconfigure locales,安装zh_CN.UTF-8
(3)编辑/etc/environment文件，添加

LC_CTYPE="zh_CN.UTF-8"

重启即可

2012年2月10日星期五

euler 48: 大整数乘法

euler project 48:

The series, 1¹ + 2² + 3³ + ... + 10¹⁰ = 10405071317.

Find the last ten digits of the series, 1¹ + 2² + 3³ + ... + 1000¹⁰⁰⁰.

我是用python解的,python 的数没有大小的限制,这样就很方便,brute force.

在问题的讨论帖中,有人(grimbal)贴了他用C解决的算法。我觉得很优美，记录如下:

#define mod 10000
void ex48(){
    int n, k;
    unsigned int p2, p1, p0;
    unsigned int s2, s1, s0;
    unsigned int t2, t1, t0;

    // s = 0
    s2 = s1 = s0 = 0;
    for( n=1 ; n<=1000 ; n++ ){
        // p = 1
        p2 = p1 = 0; p0 = 1;
        for( k=1 ; k<=n ; k<<=1 );
        while( k>>=1 ){
            // p *= p
            t0 = p0*p0;
            t1 = 2*p0*p1 + t0/mod;
            t2 = 2*p0*p2 + p1*p1 + t1/mod;
            p0 = t0%mod;
            p1 = t1%mod;
            p2 = t2%mod;
            if( n&k ){
                // p *= n
                t0 = p0*n;
                t1 = p1*n + t0/mod;
                t2 = p2*n + t1/mod;
                p0 = t0%mod;
                p1 = t1%mod;
                p2 = t2%mod;
            }
       }
       // s += p
       t0 = p0+s0;
       t1 = p1+s1 + t0/mod;
       t2 = p2+s2 + t1/mod;
       s0 = t0%mod;
       s1 = t1%mod;
       s2 = t2%mod;
    }

    printf("ex48: %02d%04d%04d\n", s2%100, s1, s0);
}

订阅：博文 (Atom)