Alibaba Cloud Technology Community [Yunqi]

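The Art of Readable Code

These are my reading notes on The Art of Readable Code, with a few observations of my own added. I strongly recommend the book:

English edition: The Art of Readable Code
Chinese edition: 编写可读代码的艺术

Why code should be easy to understand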

“Code should be written to minimize the time it would take for someone else to understand it.”

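Some facts about day-to-day work:

Thinking before writing code, and reading code, take far more time than actually writing it
Reading code is routine, whether it is someone else's or your own; code you wrote six months ago might as well be someone else's
When code is readable, you can grasp the program's logic quickly and get to work
Fewer lines of code are not necessarily easier to understand
Readability does not conflict with efficiency, good architecture, or testability

The whole book is organized around one goal: making code more readable. Readability is also one of the key criteria for good code.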

How to name things

Pack more information into names

Use words with a precise meaning, for example download rather than get. Some possible replacements:

send -> deliver, dispatch, announce, distribute, route
find -> search, extract, locate, recover
start -> launch, create, begin, open
make -> create, set up, build, generate, compose, add, new

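Avoid generic words

Words like tmp and retval say nothing beyond "this is a temporary variable" or "this is the return value". Adding a meaningful word makes the name clear: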

tmp_file = tempfile.NamedTemporaryFile()
...
SaveData(tmp_file, ...)

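Instead of retval, use a name that says what the variable actually holds. If the accumulator is named sum_squares, a mistake like the following is easy to spot: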

sum_squares += v[i]; // Where's the "square" that we're summing? Bug!
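To make the point concrete, here is a minimal Python sketch. The function name euclidean_norm and the port to Python are my own choices for illustration. Because the accumulator is called sum_squares, every line that updates it can be checked against the name:

```python
import math


def euclidean_norm(v):
    """Return the Euclidean (L2) norm of a sequence of numbers."""
    # The name says what is accumulated: a sum of squares.
    # Writing `sum_squares += x` here would visibly contradict the name.
    sum_squares = 0.0
    for x in v:
        sum_squares += x * x
    return math.sqrt(sum_squares)
```

Had the variable been called retval, the same slip would have looked perfectly plausible.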

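In nested for loops, i and j can be just as confusing. In the following code two of the indices are swapped in the condition, and the mistake is hard to see: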

for (int i = 0; i < clubs.size(); i++)
    for (int j = 0; j < clubs[i].members.size(); j++)
        for (int k = 0; k < users.size(); k++)
            if (clubs[i].members[k] == users[j])
                cout << "user[" << j << "] is in club[" << i << "]" << endl;

Named differently, the indices are much easier to check:

if (clubs[ci].members[mi] == users[ui]) // OK. First letters match.
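The same naming scheme can be sketched in Python. The data layout is an assumption made for this example: clubs is a list of member lists and users is a list of names. The function find_memberships is hypothetical too.

```python
def find_memberships(clubs, users):
    """Return (ui, ci) pairs such that users[ui] is a member of clubs[ci].

    Assumed layout: `clubs` is a list of member lists, `users` a list of names.
    """
    memberships = []
    for ci in range(len(clubs)):
        for mi in range(len(clubs[ci])):
            for ui in range(len(users)):
                # First letters match: clubs/ci, members/mi, users/ui.
                if clubs[ci][mi] == users[ui]:
                    memberships.append((ui, ci))
    return memberships
```

With these index names, an expression such as clubs[ci][ui] looks wrong at a glance, which is the whole benefit.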

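So if you do use a generic name, have a good reason for it.

Use concrete names

CanListenOnPort is better than ServerCanStart: "can start" is vague, whereas "listen on port" says exactly what the method checks.

For a flag whose real effect is to turn on extra logging, --run_locally is less clear than --extra_logging.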

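Attach important details, such as a `_ms` unit suffix on a variable or `_raw` on an unprocessed string

If there is something important about a variable, adding a few extra words to its name makes it easier to read, for example replacing string id; // Example: "af84ef845cd8" with string hex_id;.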

Start(int delay):               delay -> delay_secs
CreateCache(int size):          size  -> size_mb
ThrottleDownload(float limit):  limit -> max_kbps
Rotate(float angle):            angle -> degrees_cw

More examples:

password -> plaintext_password
comment -> unescaped_comment
html -> html_utf8
data -> data_urlenc
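As a rough illustration of unit suffixes, here is a small Python sketch. The function wait_until_ready and its parameters are hypothetical, invented only for this example. With the unit in every name, each conversion is visible where it happens:

```python
import time


def wait_until_ready(is_ready, timeout_secs, poll_interval_ms=50):
    """Poll is_ready() until it returns True or timeout_secs have passed.

    Returns the elapsed time in milliseconds, or None on timeout.
    """
    start_secs = time.monotonic()
    while True:
        if is_ready():
            elapsed_ms = (time.monotonic() - start_secs) * 1000.0
            return elapsed_ms
        if time.monotonic() - start_secs >= timeout_secs:
            return None
        # time.sleep() expects seconds, so the conversion is explicit.
        time.sleep(poll_interval_ms / 1000.0)
```

A caller who writes wait_until_ready(check, timeout_secs=30) cannot mistake the 30 for milliseconds.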

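Use longer names for larger scopes

In a small scope, short variable names are fine. Variables used across a larger scope are better given longer names, and editor auto-completion removes most of the typing cost. For abbreviations and prefixes, prefer well-known ones (such as str). A useful test: would a new team member understand what the prefix means without asking anyone?

Use symbols such as `_` and `-` deliberately, for example by giving private variables a `_` prefix.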

var x = new DatePicker(); // DatePicker() is a class "constructor", so it starts with a capital letter
var y = pageHeight(); // pageHeight() is an ordinary function
var $all_images = $("img"); // $all_images is a jQuery object
var height = 250; // height is not
// Use different conventions for id and class
<div id="middle_column" class="main-content"> ...

Names must not be ambiguous

Before settling on a name, ask yourself whether the word could mean something else. For example:

results = Database.all_objects.filter("year <= 2011")

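Does results now hold the objects with year <= 2011, or the objects that remain after those are removed? filter does not say; select or exclude would be unambiguous.

Use `min` and `max` instead of `limit`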

CART_TOO_BIG_LIMIT = 10
if shopping_cart.num_items() >= CART_TOO_BIG_LIMIT:
    Error("Too many items in cart.")

MAX_ITEMS_IN_CART = 10
if shopping_cart.num_items() > MAX_ITEMS_IN_CART:
    Error("Too many items in cart.")

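Compare CART_TOO_BIG_LIMIT with MAX_ITEMS_IN_CART: which is better? The first leaves open whether the limit itself is allowed (the >= check rejects a cart of exactly 10 items), while the second makes it clear that 10 items are still fine.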
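A self-contained Python sketch of the MAX_ version follows. The helper cart_is_too_big is a hypothetical name introduced here. The point is that "max" reads as inclusive, so the comparison has to be a strict >:

```python
MAX_ITEMS_IN_CART = 10


def cart_is_too_big(num_items):
    """Return True only when the cart holds more than the allowed maximum."""
    # "MAX" is inclusive: a cart with exactly MAX_ITEMS_IN_CART items is fine.
    return num_items > MAX_ITEMS_IN_CART
```

Both boundary cases can then be checked directly: a cart of 10 items passes, a cart of 11 does not.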

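Use `first` and `last` for inclusive ranges

print integer_range(start=2, stop=4)
# Does this print [2,3] or [2,3,4] (or something else)?

set.PrintKeys(first="Bart", last="Maggie")

first and last are unambiguous and well suited to inclusive ranges.

Use `begin` and `end` for half-open ranges, inclusive at the start and exclusive at the end, such as [2, 9)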

PrintEventsInRange("OCT 16 12:00am", "OCT 17 12:00am")
PrintEventsInRange("OCT 16 12:00am", "OCT 16 11:59:59.9999pm")

The first call is much more comfortable to write than the second.
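Here is a minimal Python sketch of a half-open range check. The function events_in_range, the event names, and the year 2023 are all made up for illustration. Because end is exclusive, consecutive days tile without gaps or overlaps:

```python
from datetime import datetime


def events_in_range(events, begin, end):
    """Return the (timestamp, name) pairs with begin <= timestamp < end."""
    return [(t, name) for (t, name) in events if begin <= t < end]


# Hypothetical sample data.
events = [
    (datetime(2023, 10, 16, 0, 0), "midnight sync"),
    (datetime(2023, 10, 16, 23, 59, 59, 999999), "last-moment event"),
    (datetime(2023, 10, 17, 0, 0), "next day's sync"),
]

oct_16 = events_in_range(events, datetime(2023, 10, 16), datetime(2023, 10, 17))
oct_17 = events_in_range(events, datetime(2023, 10, 17), datetime(2023, 10, 18))
```

Every event lands in exactly one day, and nobody has to decide how many 9s belong in "11:59:59.9999pm".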

Naming Boolean variables

bool read_password = true;

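This is a dangerous name: does it mean the password still needs to be read, or that it has already been read? There is no way to tell, so a name such as user_is_authenticated is better. In general, adding is, has, can, or should to a Boolean name makes its meaning clearer, and a positive name is easier to read than a negated one, for example: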

SpaceLeft() --> hasSpaceLeft()
bool disable_ssl = false --> bool use_ssl = true

Match the reader's expectations

public class StatisticsCollector {
    public void addSample(double x) { ... }

    public double getMean() {
        // Iterate through all samples and return total / num_samples
    }
    ...
}

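In this example, getMean iterates over all the samples and returns their mean, so it is not the lightweight accessor that "get" usually suggests. computeMean would be a better name.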
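A rough Python counterpart is sketched below. The snake_case method names and the error raised on an empty collector are my own assumptions. The method that walks every sample is named compute_mean so that callers expect an O(n) cost:

```python
class StatisticsCollector:
    """Collects samples; method names hint at how expensive each call is."""

    def __init__(self):
        self._samples = []

    def add_sample(self, x):
        self._samples.append(x)

    def compute_mean(self):
        """Iterate over all samples (O(n)), hence 'compute' rather than 'get'."""
        if not self._samples:
            raise ValueError("no samples added yet")
        return sum(self._samples) / len(self._samples)
```

If the mean were instead maintained incrementally in add_sample, a cheap accessor with a "get"-style name would again be accurate.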

Clean formatting

Well-formatted code is pleasant to look at and therefore easier to read. Compare the following two examples:

class StatsKeeper {
public:
// A class for keeping track of a series of doubles
void Add(double d); // and methods for quick statistics about them
private: int count; /* how many so far
*/ public:
double Average();
private: double minimum;
list<double>
past_items
;double maximum;
};

And here is a cleaner version:

// A class for keeping track of a series of doubles
// and methods for quick statistics about them.
class StatsKeeper {
  public:
    void Add(double d);
    double Average();

  private:
    list<double> past_items;
    int count;  // how many so far

    double minimum;
    double maximum;
};

Make line breaks consistent and compact

This code has to be wrapped to stay within 80 characters per line, and each argument needs a comment:

public class PerformanceTester {
    public static final TcpConnectionSimulator wifi = new TcpConnectionSimulator(
        500, /* Kbps */
        80, /* millisecs latency */
        200, /* jitter */
        1 /* packet loss % */);

    public static final TcpConnectionSimulator t3_fiber = new TcpConnectionSimulator(
        45000, /* Kbps */
        10, /* millisecs latency */
        0, /* jitter */
        0 /* packet loss % */);

    public static final TcpConnectionSimulator cell = new TcpConnectionSimulator(
        100, /* Kbps */
        400, /* millisecs latency */
        250, /* jitter */
        5 /* packet loss % */);
}

For consistency, first rewrite it like this:

public class PerformanceTester {
    public static final TcpConnectionSimulator wifi =
        new TcpConnectionSimulator(
            500, /* Kbps */
            80, /* millisecs latency */
            200, /* jitter */
            1 /* packet loss % */);

    public static final TcpConnectionSimulator t3_fiber =
        new TcpConnectionSimulator(
            45000, /* Kbps */
            10, /* millisecs latency */
            0, /* jitter */
            0 /* packet loss % */);

    public static final TcpConnectionSimulator cell =
        new TcpConnectionSimulator(
            100, /* Kbps */
            400, /* millisecs latency */
            250, /* jitter */
            5 /* packet loss % */);
}

This is more consistent, but still verbose and wastes a lot of vertical space. A more compact version:

public class PerformanceTester {
    // TcpConnectionSimulator(throughput, latency, jitter, packet_loss)
    //                          [Kbps]      [ms]    [ms]    [percent]

    public static final TcpConnectionSimulator wifi =
        new TcpConnectionSimulator(500, 80, 200, 1);

    public static final TcpConnectionSimulator t3_fiber =
        new TcpConnectionSimulator(45000, 10, 0, 0);

    public static final TcpConnectionSimulator cell =
        new TcpConnectionSimulator(100, 400, 250, 5);
}

Wrap repetition in helper functions

// Turn a partial_name like "Doug Adams" into "Mr. Douglas Adams".
// If not possible, 'error' is filled with an explanation.
string ExpandFullName(DatabaseConnection dc, string partial_name, string* error);
DatabaseConnection database_connection;
string error;
assert(ExpandFullName(database_connection, "Doug Adams", &error)
== "Mr. Douglas Adams");
assert(error == "");
assert(ExpandFullName(database_connection, " Jake Brown ", &error)
== "Mr. Jacob Brown III");
assert(error == "");
assert(ExpandFullName(database_connection, "No Such Guy", &error) == "");
assert(error == "no match found");
assert(ExpandFullName(database_connection, "John", &error) == "");
assert(error == "more than one result");

The code above looks cluttered and repetitive. A helper function tidies it up:

CheckFullName("Doug Adams", "Mr. Douglas Adams", "");
CheckFullName(" Jake Brown ", "Mr. Jake Brown III", "");
CheckFullName("No Such Guy", "", "no match found");
CheckFullName("John", "", "more than one result");

void CheckFullName(string partial_name,
                   string expected_full_name,
                   string expected_error) {
    // database_connection is now a class member
    string error;
    string full_name = ExpandFullName(database_connection, partial_name, &error);
    assert(error == expected_error);
    assert(full_name == expected_full_name);
}

Column alignment

Aligning columns makes a block of code easier to scan:

CheckFullName("Doug Adams"  , "Mr. Douglas Adams" , "");
CheckFullName(" Jake Brown ", "Mr. Jake Brown III", "");
CheckFullName("No Such Guy" , ""                  , "no match found");
CheckFullName("John"        , ""                  , "more than one result");

commands[] = {
    ...
    { "timeout"      , NULL              , cmd_spec_timeout},
    { "timestamping" , &opt.timestamping , cmd_boolean},
    { "tries"        , &opt.ntry         , cmd_number_inf},
    { "useproxy"     , &opt.use_proxy    , cmd_boolean},
    { "useragent"    , NULL              , cmd_spec_useragent},
    ...
};

Organize code into blocks

class FrontendServer {
  public:
    FrontendServer();
    void ViewProfile(HttpRequest* request);
    void OpenDatabase(string location, string user);
    void SaveProfile(HttpRequest* request);
    string ExtractQueryParam(HttpRequest* request, string param);
    void ReplyOK(HttpRequest* request, string html);
    void FindFriends(HttpRequest* request);
    void ReplyNotFound(HttpRequest* request, string error);
    void CloseDatabase(string location);
    ~FrontendServer();
};

The declarations above are readable, but there is room for improvement:

class FrontendServer {
  public:
    FrontendServer();
    ~FrontendServer();

    // Handlers
    void ViewProfile(HttpRequest* request);
    void SaveProfile(HttpRequest* request);
    void FindFriends(HttpRequest* request);

    // Request/Reply Utilities
    string ExtractQueryParam(HttpRequest* request, string param);
    void ReplyOK(HttpRequest* request, string html);
    void ReplyNotFound(HttpRequest* request, string error);

    // Database Helpers
    void OpenDatabase(string location, string user);
    void CloseDatabase(string location);
};

Here is another piece of code:

# Import the user's email contacts, and match them to users in our system.
# Then display a list of those users that he/she isn't already friends with.
def suggest_new_friends(user, email_password):
    friends = user.friends()
    friend_emails = set(f.email for f in friends)
    contacts = import_contacts(user.email, email_password)
    contact_emails = set(c.email for c in contacts)
    non_friend_emails = contact_emails - friend_emails
    suggested_friends = User.objects.select(email__in=non_friend_emails)
    display['user'] = user
    display['friends'] = friends
    display['suggested_friends'] = suggested_friends
    return render("suggested_friends.html", display)

Everything runs together, which is visually tiring. Split it into blocks by purpose:

def suggest_new_friends(user, email_password):
    # Get the user's friends' email addresses.
    friends = user.friends()
    friend_emails = set(f.email for f in friends)

    # Import all email addresses from this user's email account.
    contacts = import_contacts(user.email, email_password)
    contact_emails = set(c.email for c in contacts)

    # Find matching users that they aren't already friends with.
    non_friend_emails = contact_emails - friend_emails
    suggested_friends = User.objects.select(email__in=non_friend_emails)

    # Display these lists on the page.
    display['user'] = user
    display['friends'] = friends
    display['suggested_friends'] = suggested_friends
    return render("suggested_friends.html", display)

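Making code pleasant to read takes attention while you write and a few good habits. In a team especially, a style choice such as brace placement is neither right nor wrong in itself, but ignoring the team's convention is wrong.

How to write comments

You think about many things while writing code, but in the end the reader sees only the code itself, and that extra information is lost. The purpose of a comment is to pass more of that information on to the reader.

What to comment

What not to comment

Comments like these are worthless: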

// The class definition for Account
class Account {
  public:
    // Constructor
    Account();

    // Set the profit member to a new value
    void SetProfit(double profit);

    // Return the profit from this Account
    double GetProfit();
};

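Don't comment just for the sake of commenting. A comment that merely restates the function signature adds nothing; if a comment is needed, it should supply real detail, as this one does: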

// Find a Node with the given 'name' or return NULL.
// If depth <= 0, only 'subtree' is inspected.
// If depth == N, only 'subtree' and N levels below are inspected.
Node* FindNodeInSubtree(Node* subtree, string name, int depth);

Don't comment bad names; fix the names instead

// Enforce limits on the Reply as stated in the Request,
// such as the number of items returned, or total byte size, etc.
void CleanReply(Request request, Reply reply);

Most of the comment is spent explaining what "clean" means. It is better to pick an accurate name:

// Make sure 'reply' meets the count/byte/etc. limits from the 'request'
void EnforceLimitsFromRequest(Request request, Reply reply);

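Record your thoughts

We have discussed what not to comment, so what should be commented? Comments should record the insights you had while working out how to write the code, for example: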

// Surprisingly, a binary tree was 40% faster than a hash table for this data.
// The cost of computing a hash was more than the left/right comparisons.

// This heuristic might miss a few words. That's OK; solving this 100% is hard.

// This class is getting messy. Maybe we should create a 'ResourceNode' subclass to
// help organize things.

Comments can also mark unfinished work and explain constants:

// TODO: use a faster algorithm
// TODO(dustin): handle other image formats besides JPEG
NUM_THREADS = 8 # as long as it's >= 2 * num_processors, that's good enough.
// Impose a reasonable limit - no human can read that much anyway.
const int MAX_RSS_SUBSCRIPTIONS = 1000;

Commonly used markers:

TODO : Stuff I haven't gotten around to yet
FIXME : Known-broken code here
HACK : Admittedly inelegant solution to a problem
XXX : Danger! Major problem here

Put yourself in the reader's shoes

The parts of your code that make readers stop and ask a question are the parts you should comment.

struct Recorder {
    vector<float> data;
    ...
    void Clear() {
        vector<float>().swap(data); // Huh? Why not just data.clear()?
    }
};

Many C++ programmers who see this will wonder why it does not simply call data.clear() instead of swapping with an empty vector, so that line deserves a comment:

// Force vector to relinquish its memory (look up "STL swap trick")
vector<float>().swap(data);

Advertise likely pitfalls

While writing code you may use a hack, or there may be other traps that readers need to know about. Those deserve a comment:

void SendEmail(string to, string subject, string body);

In fact this function calls an external service to send the email, and the call can time out, so that needs to be stated:

// Calls an external service to deliver email. (Times out after 1 minute.)
void SendEmail(string to, string subject, string body);

Big-picture comments

Sometimes, for clarity, a whole file needs a comment that gives the reader an overall picture:

// This file contains helper functions that provide a more convenient interface to our
// file system. It handles file permissions and other nitty-gritty details.

Summary comments

Even inside a function, a summarizing comment similar to a file-level one can help:

# Find all the items that customers purchased for themselves.
for customer_id in all_customers:
    for sale in all_sales[customer_id].sales:
        if sale.recipient == customer_id:
            ...

Or comment each step of a function as it proceeds:

def GenerateUserReport():
    # Acquire a lock for this user
    ...
    # Read user's info from the database
    ...
    # Write info to a file
    ...
    # Release the lock for this user

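Many people are reluctant to write comments, and writing good ones is indeed not easy. One option is to set aside a dedicated place in the file for comments, where you can write down anything you want to say.

Making comments precise and compact

The previous section discussed what to comment; this one discusses how. Comments matter, so they must be precise. They also take up screen space, so they must be compact.

Keep comments compact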

// The int is the CategoryType.
// The first float in the inner pair is the 'score',
// the second is the 'weight'.
typedef hash_map<int, pair<float, float> > ScoreMap;

That is too wordy. Compress it to:

// CategoryType -> (score, weight)
typedef hash_map<int, pair<float, float> > ScoreMap;

Avoid ambiguous pronouns

// Insert the data into the cache, but check if it's too big first.

Here "it's" is ambiguous: it could refer to the data or to the cache. Rewrite it as:

// Insert the data into the cache, but check if the data is too big first.

An even better fix restructures the sentence so that "it" has only one possible referent:

// If the data is small enough, insert it into the cache.

Make sentences precise and concise

# Depending on whether we've already crawled this URL before, give it a different priority.

That sentence takes effort to parse. This version is much easier to understand:

# Give higher priority to URLs we've never crawled before.

Describe a function's behavior precisely

// Return the number of lines in this file.
int CountLines(string filename) { ... }

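A function like this can leave callers puzzled, because "line" can be defined in many ways:

"": for an empty file, is that 0 lines or 1?
"hello": a single line with no newline; is the result 0 or 1?
"hello\n": 1 or 2?
"hello\n world": 1 or 2?
"hello\n\r cruel\n world\r": 2, 3, or 4?

So the comment should read: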

// Count how many newline bytes ('\n') are in the file.
int CountLines(string filename) { ... }
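To show how the precise comment settles every one of those cases, here is a minimal Python sketch. It works on bytes already read into memory rather than on a filename, and the name count_newlines is my own. It is an illustration, not the book's implementation.

```python
def count_newlines(data: bytes) -> int:
    r"""Count how many newline bytes (b'\n') are in data."""
    return data.count(b"\n")


# The ambiguous inputs from the list above, with the answer the
# "count newline bytes" definition gives for each of them.
cases = {
    b"": 0,
    b"hello": 0,
    b"hello\n": 1,
    b"hello\n world": 1,
    b"hello\n\r cruel\n world\r": 2,
}
for data, expected in cases.items():
    assert count_newlines(data) == expected
```

Once the definition is "number of '\n' bytes", none of the five inputs is open to interpretation.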

Use examples to illustrate corner cases

// Rearrange 'v' so that elements < pivot come before those >= pivot;
// Then return the largest 'i' for which v[i] < pivot (or -1 if none are < pivot)
int Partition(vector<int>* v, int pivot);

The description is precise, but adding an example makes it even better:

// ...
// Example: Partition([8 5 9 8 2], 8) might result in [5 2 | 8 9 8] and return 1
int Partition(vector<int>* v, int pivot);
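Below is one possible Python implementation that satisfies the comment. It is a sketch: the single-pass swapping scheme is my choice, and the source only gives the declaration. It also shows why the example in the comment says "might result in", since the order inside each group is not specified.

```python
def partition(v, pivot):
    """Rearrange v in place so elements < pivot come before those >= pivot.

    Returns the largest index i for which v[i] < pivot, or -1 if none are.
    """
    boundary = 0  # v[:boundary] holds the elements < pivot found so far
    for i in range(len(v)):
        if v[i] < pivot:
            v[i], v[boundary] = v[boundary], v[i]
            boundary += 1
    return boundary - 1


values = [8, 5, 9, 8, 2]
last_small_index = partition(values, 8)
```

For this input the function returns 1 and leaves values as [5, 2, 9, 8, 8]: the same split as in the comment's example, with the larger elements in a different order.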

State the real intent of your code

void DisplayProducts(list<Product> products) {
    products.sort(CompareProductByPrice);

    // Iterate through the list in reverse order
    for (list<Product>::reverse_iterator it = products.rbegin(); it != products.rend();
         ++it)
        DisplayPrice(it->price);
    ...
}

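The comment says the loop runs in reverse order, but that only restates the code and does not say what the code is for. It should be: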

// Display each price, from highest to lowest
for (list<Product>::reverse_iterator it = products.rbegin(); ... )

Comments at function calls

A call like this is bound to leave readers puzzled:

Connect(10, false);

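If the language supports named arguments, as Python does, passing them by name makes the call much clearer (in other languages, an inline comment next to each argument serves the same purpose):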

def Connect(timeout, use_encryption): ...
# Call the function using named parameters
Connect(timeout = 10, use_encryption = False)
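Python can go one step further and require the names. The sketch below uses a hypothetical connect function, and the dictionary it returns exists only so the example has something to show. The bare `*` makes both parameters keyword-only, so the unreadable positional call is rejected outright:

```python
def connect(*, timeout, use_encryption):
    """Hypothetical stand-in for Connect(); returns the settings it received."""
    return {"timeout": timeout, "use_encryption": use_encryption}


settings = connect(timeout=10, use_encryption=False)

try:
    connect(10, False)  # positional arguments are not accepted
except TypeError:
    positional_call_rejected = True
else:
    positional_call_rejected = False
```

Whether to enforce this is a design choice: it costs a few extra characters per call and guarantees that every call site documents itself.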

Use information-dense words

// This class contains a number of members that store the same information as in the
// database, but are stored here for speed. When this class is read from later, those
// members are checked first to see if they exist, and if so are returned; otherwise the
// database is read from and that data stored in those fields for next time.

The long comment above explains things clearly, but a single well-chosen term says the same thing and leaves no doubt:

// This class acts as a caching layer to the database.
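What the long comment describes can be sketched in a few lines of Python. CachingStore and FakeDatabase are both hypothetical names; the fake database exists only so the example can run and count reads.

```python
class CachingStore:
    """Acts as a caching layer in front of a database-like object."""

    def __init__(self, database):
        self._database = database  # assumed to offer a read(key) method
        self._cache = {}

    def get(self, key):
        # Check the cache first; fall back to the database and remember the result.
        if key not in self._cache:
            self._cache[key] = self._database.read(key)
        return self._cache[key]


class FakeDatabase:
    """In-memory stand-in that counts how often it is read."""

    def __init__(self, rows):
        self._rows = rows
        self.read_count = 0

    def read(self, key):
        self.read_count += 1
        return self._rows[key]


database = FakeDatabase({"alice": 30})
store = CachingStore(database)
first = store.get("alice")
second = store.get("alice")
```

The second get is served from the cache, so the database is read only once. The one-line comment "acts as a caching layer" tells the reader to expect exactly this behavior.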

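Simplifying loops and logic

Make control flow easy to read

The goal of this section is to make conditionals, loops, and other control-flow code as natural as possible, so that readers never have to stop and think or go back and re-read.

The order of arguments in conditionals

Compare these two ways of writing the same conditions: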

if (length >= 10)
while (bytes_received < bytes_expected)

if (10 <= length)
while (bytes_expected > bytes_received)

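Should the operands simply follow the direction of the comparison, or is there some other rule? There is: order them by what they mean.

Left of the operator: usually the value being examined, which changes more often
Right of the operator: usually the value it is compared against, which is closer to a constant

That explains why bytes_received < bytes_expected is easier to understand than the reverse.

The order of if/else blocks

Usually you are free to choose the order of the if/else blocks; both of the following work: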