Tuesday, May 12, 2009

The Beauty of Mathematics series (数学之美系列)

Reposted from http://jun.wu.googlepages.com/beautyofmathematics

I am writing a series of essays introducing the applications of math in natural language processing, speech recognition, web search, etc. for non-technical readers. Here are the links:

0. PageRank (网页排名算法)

1. Language Models (统计语言模型)

2. Chinese word segmentation (谈谈中文分词)

3. Hidden Markov Model and its application in natural language processing (隐含马尔可夫模型)

4. Entropy - the measurement of information (怎样度量信息?)

5. Boolean algebra and search engine index (简单之美:布尔代数和搜索引擎的索引)

6. Graph theory and web crawlers (图论和网络爬虫)

7. Information theory and its applications in NLP (信息论在信息处理中的应用)

8. Fred Jelinek and modern speech and language processing (贾里尼克的故事和现代语言处理)

9. How to measure the relevance of web pages to queries (如何确定网页和查询的相关性)

10. Finite state machine and local search (有限状态机和地址识别)

11. Amit Singhal: the maker of Google's AK-47 (Google 阿卡 47 的制造者阿米特.辛格博士)

12. The Law of Cosines and news classification (余弦定理和新闻的分类)

13. Fingerprint of information and its applications (信息指纹及其应用)

14. The importance of precise mathematical modeling (谈谈数学模型的重要性)

15. Complexity and simplicity: some leading figures in natural language processing (繁与简 自然语言处理的几位精英)

16. Don't put all your eggs in one basket: the maximum entropy model, part A (不要把所有的鸡蛋放在一个篮子里 -- 谈谈最大熵模型(A))

17. Don't put all your eggs in one basket: the maximum entropy model, part B (不要把所有的鸡蛋放在一个篮子里 -- 谈谈最大熵模型(B))

18. All that glitters is not gold: search engine anti-spam (闪光的不一定是金子 谈谈搜索引擎作弊问题)

19. Matrix operations and text classification (矩阵运算和文本处理中的分类问题)

20. The godfather of NLP: Mitch Marcus (自然语言处理的教父 马库斯)

21. Bayesian networks, an extension of Markov chains (马尔可夫链的扩展 贝叶斯网络)

22. The mathematical principles of cryptography, prompted by the TV series 《暗算》 (由电视剧《暗算》所想到的 -- 谈谈密码学的数学原理)

23. How many keystrokes does it take to input a Chinese character? On Shannon's first theorem (输入一个汉字需要敲多少个键 -- 谈谈香农第一定律)

Jun Wu's home page (Chinese)

Jun Wu's home page (English)

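Wednesday, May 6, 2009

A code tour of the SunPinyin input method

http://opensolaris.org/os/project/input-method/documents/sunpinyin_code_tour_slm/

A detailed introduction to the design and source code of SunPinyin on Solaris; it also covers other input methods.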

Friday, May 1, 2009

Installing Flash Player for Firefox 3 on Ubuntu

Found the answer online:
https://lists.ubuntu.com/archives/unive ... 48531.html

I am using Ubuntu 8.04, and Firefox requires the file libnss3.so. I
searched my /usr/lib directory and found a file named "libnss3.so.1d",
then I created a link named libnss3.so to libnss3.so.1d.

I needed to create the following links in order to get Flash Player
working:

ln /usr/lib/libnss3.so.1d /usr/lib/libnss3.so

ln /usr/lib/libsmime3.so.1d /usr/lib/libsmime3.so

ln /usr/lib/libssl3.so.1d /usr/lib/libssl3.so

ln /usr/lib/libplds4.so.0d /usr/lib/libplds4.so

ln /usr/lib/libplc4.so.0d /usr/lib/libplc4.so

ln /usr/lib/libnspr4.so.0d /usr/lib/libnspr4.so

Yes, it is not a Firefox bug and also not a Flash Player bug. It is
related to the library names required to run Flash Player. Also, before
I installed Flash Player 10, Flash Player 9 was working well (but
without these libraries?).
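The linking step in the quoted message can also be scripted. Below is a minimal Python sketch, not part of the original post: the helper name `link_unversioned` is hypothetical, and it creates symbolic links rather than the hard links that plain `ln` produces. It assumes the versioned files (such as libnss3.so.1d) already exist in the given directory.

```python
import os


def link_unversioned(lib_dir, versioned_names):
    """Create NAME.so -> NAME.so.<suffix> symlinks inside lib_dir.

    Hypothetical helper for illustration. Returns the list of link
    paths that were actually created.
    """
    created = []
    for name in versioned_names:
        # "libnss3.so.1d" -> base "libnss3", suffix "1d"
        base, sep, _suffix = name.rpartition(".so.")
        if not sep:
            continue  # not a versioned shared-library name
        target = os.path.join(lib_dir, name)
        link = os.path.join(lib_dir, base + ".so")
        # Only link when the versioned file exists and nothing is in the way.
        if os.path.exists(target) and not os.path.lexists(link):
            os.symlink(name, link)  # relative symlink within lib_dir
            created.append(link)
    return created
```

Running this against /usr/lib would need root privileges, so trying it on a scratch directory first is the safer choice.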

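Thursday, March 5, 2009

Work like a UNIX engineer

How does a Unix engineer work?
In my view, the defining trait is the enormous convenience that comes from combining the command line with scripts.

Many companies now develop software in a client/server environment,
and the server side almost always runs on UNIX or Linux.

Yet many of our engineers and would-be engineers come from a Windows background.
They are long used to clicking the mouse and to intuitive GUI interaction, and their working efficiency is actually not that high.

I won't bother describing how we would do things on Windows;
let me just list how to get some common tasks done on Unix.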

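1. Don't forget her (TAR)
1) Back up my home directory:
$ tar cvf home.bak.tar /home
2) Back up and compress:
$ tar zcvf home.tar.gz /home
$ tar jcvf home.tar.bz2 /home
3) Copy the home directory to the remote machine 192.168.1.101, preserving the links and file permissions inside it:
$ tar cpf - /home | ( ssh 192.168.1.101 "cd /tmp/; tar xpf -")
4) Extract an archive stored on the remote machine 192.168.1.101 into the current directory on the local machine:
$ ssh 192.168.1.101 "cat /tmp/home.tar.bz2" | tar jxvf -
For more usage, run 'man tar' on a UNIX machine. Here is a link:
http://linux.chinaunix.net/techdoc/net/2008/12/11/1051956.shtml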
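When the backup has to run from a script rather than an interactive shell, the same "archive and compress" step can be sketched with Python's standard `tarfile` module. This is an illustrative sketch, not something from the original post: the function names `backup_dir` and `list_archive` are made up here, and the remote-copy variants (3 and 4) are not covered.

```python
import os
import tarfile


def backup_dir(src_dir, archive_path, compression="gz"):
    """Archive src_dir into archive_path.

    compression is "" (plain tar), "gz" (like tar z) or "bz2" (like tar j).
    """
    mode = "w:" + compression if compression else "w"
    with tarfile.open(archive_path, mode) as tar:
        # Store paths relative to the directory name instead of absolute paths.
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return archive_path


def list_archive(archive_path):
    """Return the member names; "r:*" lets tarfile detect the compression."""
    with tarfile.open(archive_path, "r:*") as tar:
        return tar.getnames()
```

For example, `backup_dir("/home", "home.tar.gz")` roughly corresponds to the `tar zcvf` command above, without the verbose listing.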

Friday, January 9, 2009

A tour of GData

1. Java client library http://code.google.com/intl/zh-CN/apis/gdata/client-java.html

1) Download the library and samples from http://code.google.com/p/gdata-java-client/downloads/list
2) Go over the background knowledge in 5 minutes
a. Google Data APIs Protocol: http://code.google.com/intl/zh-CN/apis/gdata/docs/2.0/basics.html
b. Javadoc of this client library: http://code.google.com/intl/zh-CN/apis/gdata/javadoc/
3) Build and run the library

Saturday, January 3, 2009

CS162 Operating Systems and Systems Programming - Lecture 1: Overview

Lecture 1 notes
Goal:
. What is an Operating System? And what is it not?
. Examples of OS design
. Why study OS?
. How does this class operate?

** Interaction is important! **
Ask Questions!

# Why study OS?
More and more low-level components
More and more devices
Software complexity keeps climbing (one bug per 30 lines of code; one month?)
Latency is not improving
Heat is a major problem

*Complexity
. How to manage complexity at all levels?
. Many issues and many tradeoffs
. Need a global view of systems(decompose into components)
. Need a global understanding of systems(applications, networks, databases, os, security, software engineering...)

Example: some Mars Rover requirements
- 20 MHz PowerPC processor, 128 MB of RAM
- Cameras, scientific instruments, batteries, solar panels, and locomotion equipment
- Many independent processes work together
- Can't hit the reset button very easily
> Must reboot itself if necessary
> Always able to receive commands from Earth
- Individual programs must not interfere
> Suppose the MUT (Martian Universal Translator Module) is buggy
> Better not crash the antenna positioning software
- All software may crash occasionally
> Automatic restart with diagnostics sent to Earth
> Periodic checkpoint of results saved?
- Certain functions are time critical
> Need to stop before hitting something
> Must track the orbit of Earth for communication

How do we tame complexity?
# Every piece of computer hardware different
- Different CPU
- Different amounts of memory, disk
- Different types of devices
- Different networking environment
Questions:
- Does the programmer need to write a single program that performs many independent activities?
- Does every program have to be altered for every piece of hardware?
- Does a faulty program crash everything?
- Does every program have access to all hardware?

OS Tool: Virtual Machine Abstraction
Application
----------------------------- Virtual Machine Interface
Operating System
----------------------------- Physical Machine Interface
Hardware

#Software Engineering Problem:
- Turn hardware/software quirks => what programmers want/need
- Optimize for convenience, utilization, security, reliability, etc...

# For any OS area (e.g. file systems, virtual memory, networking, scheduling)
- What's the hardware interface? (physical reality)
- What's the application interface? (nicer abstraction)

Software
******** instruction set *********
hardware

# Why do interfaces look the way that they do?
- History, Functionality, Stupidity, Bugs, Management
- CS152 => Machine interface
- CS160 => Human interface
- CS169 => Software engineering/management
# Should responsibilities be pushed across boundaries?
- RISC architectures, Graphical Pipeline Architectures

Course Website: http://inst.eecs.berkeley.edu/~cs162/fa08/
Webcast/Podcast: http://webcast.berkeley.edu/courses/index.php
Newsgroup: ucb.class.cs162 (use authnews.berkeley.edu)
Textbook: Operating System Concepts, 7th Edition, Silberschatz, Galvin, Gagne

Topic Coverage
1 week Fundamentals (OS structures)
1.5 weeks Process Control and Threads
2.5 weeks Synchronization and scheduling
2 weeks Protection, Address translation, Caching
1 week Demand Paging
1 week File Systems
2.5 weeks Networking and Distributed Systems
1 week Protection and Security
1 week Software Engineering
?? Advanced topics

Grading
# Rough Grade Breakdown
- Two Midterms: 15% each
- One Final: 15%
- Four Projects: 50% (i.e. 12.5% each)
- Participation: 5%
# Four Projects:
- Phase I: Build a thread system
- Phase II: Implement Multithreading
- Phase III: Caching and Virtual Memory
- Phase IV: Parallel and Distributed Systems
# Late Policy:
- Each group has 5 "slip" days
- For projects, slip days are deducted from all partners
- 10% off per day after slip days are exhausted
Group Project Simulates Industrial Environment
# Project teams have 4 or 5 members in the same discussion section
- Must work in groups in "the real world"
# Communicate with colleagues (team members)
- Communication problems are natural
- What have you done?
- What answers do you need from others?
- You must document your work!!!!
- Everyone must keep an on-line notebook
# Communicate with supervisor (TAs)
- How is the team's plan?
- Short progress reports are required:
> What's the team's game plan?
> What is each member's responsibility?

# Typical Lecture Format
[Figure: attention vs. time (minutes), with marks at 20 min, 25 min, 25 min, and "In Conclusion, ..."]
1-Minute Review
20-Minute Lecture
5-Minute Administrative Matters
25-Minute Lecture
5-Minute Break (water, stretch)
25-Minute Lecture
Instructor will come to class early & stay after to answer questions

(44 Minute)
Virtual Machines
# Software emulation of an abstract machine
- Make it look like hardware has features you want
- Programs from one hardware & OS on another one
# Programming simplicity
- Each process thinks it has all memory/CPU time
- Each process thinks it owns all devices
- Different Devices appear to have same interface
- Device Interfaces more powerful than raw hardware
> Bitmapped display => windowing system
> Ethernet card => reliable, ordered networking (TCP/IP)
# Fault Isolation
- Processes unable to directly impact other processes
- Bugs cannot crash whole machine
# Protection and Portability
- Java interface safe and stable across many platforms

Four Components of a Computer System
Hardware, Operating System, Application, User
Definition: An operating system implements a virtual machine that is (hopefully) easier and safer to program and use than the raw hardware.

What does an Operating System do?
# Silberschatz and Galvin:
"An OS is similar to a government"
- Begs the question: does a government do anything useful by itself?
# Coordinator and Traffic Cop
- Manages all resources
- Settles conflicting requests for resources
- Prevents errors and improper use of the computer
# Facilitator
- Provides facilities that everyone needs
- Standard libraries, windowing systems
- Makes application programming easier, faster, less error-prone
# Some features reflect both tasks
- E.g. the file system is needed by everyone (Facilitator)
- But the file system must be protected (Traffic Cop)

What is an Operating System, ... Really?
# Most Likely:
- Memory Management
- I/O Management
- CPU Scheduling
- Communications? (Does Email belong in OS?)
- Multitasking/multiprogramming?

#What about?
- File System?
- Multimedia Support?
- User Interfaces?
- Internet Browser?
#Is this only interesting to Academics??

# No universally accepted definition
# "Everything a vendor ships when you order an operating system" is good approximation
- But varies wildly
#"The one program running at all times on the computer" is the kernel.
- Everything else is either a system program (ships with the operating system) or an application program

What if we didn't have an OS?
# Source code => compiler => object code => hardware
# How do you get object code onto the hardware?
# How do you print out the answer?
# Once upon a time, you had to toggle in the program in binary and read out the answer from LEDs!

Simple OS: what if only one application
#Examples:
- Very early computers
- Early PCs
- Embedded controllers(elevators, cars, etc)

# OS becomes just a library of standard services
- Standard device drivers
- Interrupt handlers
- Math library
(e.g. MS-DOS)
# What about cell phones, Xboxes, etc.?
- Is this organization enough?
# Can the OS be encoded in ROM/Flash ROM?
# Does the OS have to be software?
- Can it be hardware?
- Custom chip with predefined behavior
- Are these even OSs?

More complex OS: Multiple Apps
# Full coordination and protection
- Manage interactions between different users
- Multiple programs running simultaneously
- Multiplex and protect hardware resources
> CPU, memory, I/O devices like disks, printers, etc.
Example: Protecting processes from each other
# Problem: run multiple applications in such a way that they are protected from one another
# Goal:
- Keep user programs from crashing the OS
- Keep user programs from crashing each other
- [Keep parts of the OS from crashing other parts?]
# (Some of the required) Mechanisms:
- Address Translation
- Dual Mode Operation
# Simple policy:
- Programs are not allowed to read/write memory of other programs or of Operating System

# Address Space
- A group of memory addresses usable by something
- Each program (process) and the kernel has a potentially different address space.
# Address Translation:
- Translate from virtual addresses (emitted by the CPU) into physical addresses (of memory)
- Mapping often performed in hardware by the Memory Management Unit (MMU)
         Virtual addresses          Physical addresses
CPU ---------------------> MMU ---------------------> RAM (HW)
                  (address translation)
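To make the translation step concrete, here is a small Python sketch of what an MMU does conceptually. It is a toy model under stated assumptions, not how the lecture or any real hardware implements it: the 4 KiB page size, the dictionary used as a page table, and the names `translate` and `PageFault` are all chosen here for illustration.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB), for illustration only


class PageFault(Exception):
    """Raised when a virtual page has no mapping in the page table."""


def translate(page_table, vaddr):
    """Map a virtual address to a physical address.

    page_table maps virtual page numbers to physical frame numbers.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise PageFault("no mapping for virtual page %d" % vpn)
    # The offset inside the page is unchanged by translation.
    return page_table[vpn] * PAGE_SIZE + offset


if __name__ == "__main__":
    # Two processes use the same virtual address but different page tables,
    # so they end up in different physical memory.
    table_a = {0: 7}
    table_b = {0: 3}
    print(hex(translate(table_a, 0x10)))  # 0x7010
    print(hex(translate(table_b, 0x10)))  # 0x3010
```

Because each process gets its own table, neither can name the other's physical memory, which is the protection property the notes are after.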

Dual Mode Operation
#Hardware provides at least two modes:
- "Kernel" mode (or "supervisor" or "protected")
- "User" mode: Normal programs executed
User mode ---> system call                 return ---> User mode
                    |                         ^
                    v                         |
                    +------ Kernel mode ------+
# Some instructions/ops prohibited in user mode:
- Example: cannot modify page tables in user mode
> attempt to modify => exception generated
# Transitions from user mode to kernel mode:
- System calls, interrupts, other exceptions
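The mode check described above can be mimicked in a few lines of Python. This is only a toy simulation with made-up names (`ToyCPU`, `ProtectionFault`, `sys_map_page`); real hardware enforces the mode bit itself, and a real system call enters the kernel through a trap instruction rather than an ordinary function call.

```python
class ProtectionFault(Exception):
    """Raised when user-mode code attempts a privileged operation."""


class ToyCPU:
    """Toy model of a CPU with a user/kernel mode bit (illustrative only)."""

    def __init__(self):
        self.mode = "user"
        self.page_table = {}

    def set_page_table_entry(self, vpn, frame):
        # Privileged operation: only allowed in kernel mode.
        if self.mode != "kernel":
            raise ProtectionFault("page tables can only be modified in kernel mode")
        self.page_table[vpn] = frame

    def syscall(self, handler, *args):
        # Controlled transition: switch to kernel mode, run the handler,
        # and always restore the previous mode afterwards.
        previous = self.mode
        self.mode = "kernel"
        try:
            return handler(self, *args)
        finally:
            self.mode = previous


def sys_map_page(cpu, vpn, frame):
    """Hypothetical system call handler that installs one mapping."""
    cpu.set_page_table_entry(vpn, frame)
    return frame
```

Calling `cpu.set_page_table_entry(0, 7)` directly raises `ProtectionFault`, while `cpu.syscall(sys_map_page, 0, 7)` succeeds and leaves the CPU back in user mode.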

OS Systems Principles
# OS as illusionist:
- Make hardware limitations go away
- Provide illusion of dedicated machine with infinite memory and infinite processors
# OS as government:
- Protect users from each other
- Allocate resources efficiently and fairly
# OS as complex system
- Constant tension between simplicity and functionality or performance
# OS as history teacher
- Learn from the past
- Adapt as hardware tradeoffs change

Why Study Operating Systems?
# Learn how to build complex systems
- How can you manage complexity for future projects?
# Engineering issues
- Why is the web so slow sometimes? Can you fix it?
- What features should be in the next Mars Rover?
- How do large distributed systems work? (Kazaa, etc.)
# Buying and using a personal computer
- Why do different PCs with the same CPU behave differently?
- How to choose a processor (Opteron, Itanium, Celeron, Pentium, Hexium)? [Ok, made last one up]
- Should you get Windows XP, Vista, Linux, Mac OS...?
- Why does Microsoft have such a bad name?
# Business issues:
- Should your division buy thin clients or PCs?
# Security, viruses, and worms
- What exposure do you have to worry about?

"In conclusion"
# Operating systems provide a virtual machine abstraction to handle diverse hardware
# Operating systems coordinate resources and protect users from each other
# Operating systems simplify application development by providing standard services
# Operating systems can provide an array of fault containment, fault tolerance, and fault recovery

Friday, February 9, 2007

doxygen parsing flow

http://www.stack.nl/~dimitri/doxygen/starting.html#extract_all

1. Execute the special commands in the documentation. The special commands are listed at:
http://www.stack.nl/~dimitri/doxygen/commands.html
Documentation is normally given in a documentation block: http://www.stack.nl/~dimitri/doxygen/docblocks.html#specialblock
Sometimes it can be placed elsewhere, using structural commands:
http://www.stack.nl/~dimitri/doxygen/docblocks.html#structuralcommands

2. If a line starts with some whitespace followed by one or more asterisks (*) and then optionally more whitespace, remove that whitespace and those asterisks.

3. Keep all resulting blank lines so that the documentation stays readable.

4. Create links:
1) for words that correspond to a documented class, except when the word is preceded by a '%'.
2) for members that match certain patterns, see http://www.stack.nl/~dimitri/doxygen/autolink.html

5. Interpret HTML tags, see http://www.stack.nl/~dimitri/doxygen/htmlcmds.html
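Steps 2 and 3 are easy to picture with a few lines of Python. The sketch below is only an illustration of the rule as described in these notes, not doxygen's actual implementation; the function name `strip_comment_prefix` is made up here.

```python
import re

# Leading whitespace, one or more asterisks, then optional whitespace.
_LEADING_STARS = re.compile(r"^[ \t]*\*+[ \t]*")


def strip_comment_prefix(block):
    """Remove the decorative ' * ' prefix from each line of a comment body.

    Lines without a leading asterisk are left untouched, and lines that
    become empty are kept as blank lines (step 3).
    """
    lines = block.splitlines()
    return "\n".join(_LEADING_STARS.sub("", line) for line in lines)
```

For instance, the three lines " * Brief.", " *" and " * More." come out as "Brief.", an empty line, and "More.".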

Tips:
. % can be used to suppress the automatic link for a word that corresponds to a documented class.
. Documentation can be written in a documentation block or placed elsewhere.