Alibaba Cloud Technical Community [Yunqi]

The Art of Writing Readable Code

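These are my reading notes on "The Art of Readable Code", with a few of my own observations added. I strongly recommend the book:

English edition: "The Art of Readable Code"
Chinese edition: 《編寫可讀代碼的藝術》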

Why code should be easy to understand

“Code should be written to minimize the time it would take for someone else to understand it.”

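The facts of day-to-day work are:

Thinking before writing code and reading existing code take far more time than the writing itself.
Reading code is routine, whether it is someone else's or your own; code you wrote six months ago might as well be someone else's.
When code is readable, you can grasp the program's logic quickly and get to work.
Fewer lines of code are not necessarily easier to understand.
Readability does not conflict with efficiency, good architecture, or testability.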

The whole book is organized around one goal: making code more readable. Readability is also one of the key criteria of good code.

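How to name things

Pack more information into variable names

Use words with a precise meaning, for example download rather than get. Some possible replacements: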

send -> deliver, dispatch, announce, distribute, route
find -> search, extract, locate, recover
start -> launch, create, begin, open
make -> create, set up, build, generate, compose, add, new

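Avoid generic words

Words like tmp and retval carry no meaning beyond "this is a temporary variable" or "this is the return value". Adding a meaningful word makes the name clear: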

tmp_file = tempfile.NamedTemporaryFile()
...
SaveData(tmp_file, ...)

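Instead of retval, use a name that describes what the value actually represents. With the accumulator named sum_squares, the bug below is easy to spot: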

sum_squares += v[i]; // Where's the "square" that we're summing? Bug!
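
A minimal Python sketch of the corrected loop. The function name sum_of_squares and the sample values are illustrative assumptions, not taken from the book. Because the accumulator is named sum_squares, a line that added value instead of value * value would look wrong at a glance.

```python
def sum_of_squares(values):
    """Return the sum of the squares of the given numbers."""
    sum_squares = 0  # the name states exactly what is being accumulated
    for value in values:
        sum_squares += value * value  # matches the name: a square is added
    return sum_squares


print(sum_of_squares([1, 2, 3]))  # 1 + 4 + 9 = 14
```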

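In nested for loops, i, j, and k can be just as confusing. The condition below uses the wrong indices (members[k] and users[j]), but the mistake is hard to see: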

for (int i = 0; i < clubs.size(); i++)
    for (int j = 0; j < clubs[i].members.size(); j++)
        for (int k = 0; k < users.size(); k++)
            if (clubs[i].members[k] == users[j])
                cout << "user[" << j << "] is in club[" << i << "]" << endl;

With more descriptive index names, the code is much clearer:

if (clubs[ci].members[mi] == users[ui]) // OK. First letters match.

So a generic name should be used only when there is a good reason for it.
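
The nested loop can be sketched in Python with the descriptive index names. This is only an illustration: representing each club as a plain list of member names, and the function name find_memberships, are assumptions made for the sketch.

```python
def find_memberships(clubs, users):
    """Return (user_index, club_index) pairs for users who belong to a club.

    Assumption for this sketch: each club is a list of member names.
    """
    memberships = []
    for ci in range(len(clubs)):              # ci: club index
        for mi in range(len(clubs[ci])):      # mi: member index
            for ui in range(len(users)):      # ui: user index
                # The first letters line up: clubs[ci], [mi], users[ui].
                if clubs[ci][mi] == users[ui]:
                    memberships.append((ui, ci))
    return memberships


clubs = [["ann", "bob"], ["cat"]]
users = ["bob", "cat", "dan"]
print(find_memberships(clubs, users))  # [(0, 0), (1, 1)]
```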

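Use concrete names

CanListenOnPort is better than ServerCanStart: "can start" is vague, while "listen on port" says exactly what the method checks.

Likewise, the flag --run_locally is less explicit than --extra_logging when what the flag actually does is turn on extra logging.

Attach important details, such as a unit suffix like `_ms` for a variable or `_raw` for an unprocessed string

If a variable is important, a few extra words in its name make it easier to read. For example, replace string id; // Example: "af84ef845cd8" with string hex_id;.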

Start(int delay): delay -> delay_secs
CreateCache(int size): size -> size_mb
ThrottleDownload(float limit): limit -> max_kbps
Rotate(float angle): angle -> degrees_cw

More examples:

password -> plaintext_password
comment -> unescaped_comment
html -> html_utf8
data -> data_urlenc
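
A short Python sketch of how unit suffixes keep a calculation honest. The function transfer_time_secs and its parameters are hypothetical, and the sketch assumes that kbps means kilobits per second and that size_kb is in kilobytes.

```python
def transfer_time_secs(size_kb, max_kbps):
    """Seconds needed to send size_kb kilobytes at max_kbps kilobits/second.

    Assumption: 1 kilobyte = 8 kilobits; both units are named in the parameters.
    """
    size_kbits = size_kb * 8  # convert kilobytes to kilobits before dividing
    return size_kbits / max_kbps


delay_secs = transfer_time_secs(size_kb=500, max_kbps=1000)
delay_ms = delay_secs * 1000  # the suffix makes the conversion explicit
print(delay_secs, delay_ms)  # 4.0 4000.0
```

Without the suffixes, nothing in the names would warn a caller who passed a size in megabytes or a rate in bytes per second.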

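Use longer names for variables with a larger scope

In a small scope, short variable names are fine. Variables used across a larger scope should have longer names, and editor auto-completion removes most of the typing cost. For abbreviations and prefixes, prefer widely known ones (such as str). A useful test: would a new team member understand the prefix without asking anyone?

Use characters such as `_` and `-` deliberately, for example a `_` prefix for private variables.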

var x = new DatePicker(); // DatePicker() is a class "constructor" function, so it starts with a capital letter
var y = pageHeight(); // pageHeight() is an ordinary function
var $all_images = $("img"); // $all_images is a jQuery object
var height = 250; // height is not
// Write ids and classes in different styles
<div id="middle_column" class="main-content"> ...

Names must not be ambiguous

When choosing a name, first ask whether the word could be read another way. For example:

results = Database.all_objects.filter("year <= 2011")

Does the result contain the objects with year <= 2011, or the objects left after those have been filtered out?

Use `min` and `max` instead of `limit`

CART_TOO_BIG_LIMIT = 10
if shopping_cart.num_items() >= CART_TOO_BIG_LIMIT:
    Error("Too many items in cart.")

MAX_ITEMS_IN_CART = 10
if shopping_cart.num_items() > MAX_ITEMS_IN_CART:
    Error("Too many items in cart.")

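Compare CART_TOO_BIG_LIMIT and MAX_ITEMS_IN_CART in the example above. The first name does not say whether a cart of exactly 10 items is allowed, while the second makes it clear that 10 is the largest allowed size.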
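
A runnable Python sketch of the inclusive limit. The helper cart_is_too_big is a hypothetical name introduced only for this illustration.

```python
MAX_ITEMS_IN_CART = 10  # inclusive: a cart may hold up to 10 items


def cart_is_too_big(num_items):
    """Return True only when the cart exceeds the inclusive maximum."""
    return num_items > MAX_ITEMS_IN_CART


print(cart_is_too_big(10))  # False: exactly the maximum is still allowed
print(cart_is_too_big(11))  # True
```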

Use `first` and `last` for inclusive ranges

print integer_range(start=2, stop=4)
# Does this print [2,3] or [2,3,4] (or something else)?
set.PrintKeys(first="Bart", last="Maggie")

first and last are unambiguous and well suited to inclusive (closed) ranges.
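
A small Python sketch of an inclusive range. The function keys_between and the sample names are assumptions for illustration; they are not the PrintKeys API shown above.

```python
def keys_between(keys, first, last):
    """Return the sorted keys k with first <= k <= last (both ends included)."""
    return [key for key in sorted(keys) if first <= key <= last]


names = {"Bart", "Homer", "Lisa", "Maggie", "Marge"}
print(keys_between(names, first="Bart", last="Maggie"))
# ['Bart', 'Homer', 'Lisa', 'Maggie']
```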

Use `begin` and `end` for half-open ranges, such as [2, 9)

PrintEventsInRange("OCT 16 12:00am", "OCT 17 12:00am")
PrintEventsInRange("OCT 16 12:00am", "OCT 16 11:59:59.9999pm")

The first call, with an exclusive end, reads much more naturally than the second.
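
A Python sketch of the half-open convention. The function events_in_range, the event tuples, and the dates are hypothetical; only the begin <= t < end rule comes from the text.

```python
from datetime import datetime


def events_in_range(events, begin, end):
    """Return names of events with begin <= timestamp < end (half-open)."""
    return [name for timestamp, name in events if begin <= timestamp < end]


events = [
    (datetime(2023, 10, 16, 0, 0), "midnight backup"),
    (datetime(2023, 10, 16, 23, 59, 59), "late deploy"),
    (datetime(2023, 10, 17, 0, 0), "next-day backup"),
]

# A whole day is simply [Oct 16 00:00, Oct 17 00:00); no 11:59:59.9999pm needed.
oct_16 = events_in_range(events, datetime(2023, 10, 16), datetime(2023, 10, 17))
print(oct_16)  # ['midnight backup', 'late deploy']
```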

Naming boolean variables

bool read_password = true;

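This name is dangerous: does it mean the password still needs to be read, or that it has already been read? The reader cannot tell, so a name like user_is_authenticated is a better choice. In general, adding is, has, can, or should to a boolean name makes its meaning clearer, and positive names read more easily than negated ones. For example: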

SpaceLeft() --> hasSpaceLeft()
bool disable_ssl = false --> bool use_ssl = true

Match the reader's expectations

public class StatisticsCollector {
public void addSample(double x) { ... }
public double getMean() {
// Iterate through all samples and return total / num_samples
}
...
}

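In this example, getMean iterates over all the samples and returns total / num_samples, so it is not the lightweight accessor that a get method is usually expected to be. A name like computeMean would fit better.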
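
A Python sketch of the renamed method. The class mirrors the Java outline above, but the list-based storage and the error raised for an empty collector are assumptions of this sketch.

```python
class StatisticsCollector:
    """Collects samples; the mean is computed on demand."""

    def __init__(self):
        self._samples = []

    def add_sample(self, x):
        self._samples.append(x)

    def compute_mean(self):
        """Iterate over all samples; the cost grows with the sample count."""
        if not self._samples:
            raise ValueError("no samples collected")
        return sum(self._samples) / len(self._samples)


collector = StatisticsCollector()
for sample in (1.0, 2.0, 3.0):
    collector.add_sample(sample)
print(collector.compute_mean())  # 2.0
```

The compute_ prefix signals that the call does real work, so a caller is less likely to invoke it inside a tight loop.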

Clean formatting

Well-formatted code is pleasing to look at and therefore more comfortable to read. Compare the following two examples:

class StatsKeeper {
public:
// A class for keeping track of a series of doubles
void Add(double d); // and methods for quick statistics about them
private: int count; /* how many so far
*/ public:
double Average();
private: double minimum;
list<double>
past_items
;double maximum;
};

And here is a version that is pleasing to read:

// A class for keeping track of a series of doubles
// and methods for quick statistics about them.
class StatsKeeper {
public:
void Add(double d);
double Average();
private:
list<double> past_items;
int count; // how many so far
double minimum;
double maximum;
};

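Make line breaks consistent and compact

This code has to be wrapped to stay within 80 characters per line, and its arguments need explanatory comments: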

public class PerformanceTester {
public static final TcpConnectionSimulator wifi = new TcpConnectionSimulator(
500, /* Kbps */
80, /* millisecs latency */
200, /* jitter */
1 /* packet loss % */);
public static final TcpConnectionSimulator t3_fiber = new TcpConnectionSimulator(
45000, /* Kbps */
10, /* millisecs latency */
0, /* jitter */
0 /* packet loss % */);
public static final TcpConnectionSimulator cell = new TcpConnectionSimulator(
100, /* Kbps */
400, /* millisecs latency */
250, /* jitter */
5 /* packet loss % */);
}

For the sake of consistency, first rearrange it like this:

public class PerformanceTester {
    public static final TcpConnectionSimulator wifi =
        new TcpConnectionSimulator(
            500,   /* Kbps */
            80,    /* millisecs latency */
            200,   /* jitter */
            1      /* packet loss % */);

    public static final TcpConnectionSimulator t3_fiber =
        new TcpConnectionSimulator(
            45000, /* Kbps */
            10,    /* millisecs latency */
            0,     /* jitter */
            0      /* packet loss % */);

    public static final TcpConnectionSimulator cell =
        new TcpConnectionSimulator(
            100,   /* Kbps */
            400,   /* millisecs latency */
            250,   /* jitter */
            5      /* packet loss % */);
}

This is more consistent, but it is still verbose and takes up a lot of vertical space. A more compact version:

public class PerformanceTester {
    // TcpConnectionSimulator(throughput, latency, jitter, packet_loss)
    //                        [Kbps]      [ms]     [ms]    [percent]

    public static final TcpConnectionSimulator wifi =
        new TcpConnectionSimulator(500,   80,  200, 1);

    public static final TcpConnectionSimulator t3_fiber =
        new TcpConnectionSimulator(45000, 10,  0,   0);

    public static final TcpConnectionSimulator cell =
        new TcpConnectionSimulator(100,   400, 250, 5);
}

Wrap repetition in a function

// Turn a partial_name like "Doug Adams" into "Mr. Douglas Adams".
// If not possible, 'error' is filled with an explanation.
string ExpandFullName(DatabaseConnection dc, string partial_name, string* error);
DatabaseConnection database_connection;
string error;
assert(ExpandFullName(database_connection, "Doug Adams", &error)
== "Mr. Douglas Adams");
assert(error == "");
assert(ExpandFullName(database_connection, " Jake Brown ", &error)
== "Mr. Jacob Brown III");
assert(error == "");
assert(ExpandFullName(database_connection, "No Such Guy", &error) == "");
assert(error == "no match found");
assert(ExpandFullName(database_connection, "John", &error) == "");
assert(error == "more than one result");

The code above looks cluttered and is full of repetition. It can be wrapped in a helper function:

CheckFullName("Doug Adams", "Mr. Douglas Adams", "");
CheckFullName(" Jake Brown ", "Mr. Jake Brown III", "");
CheckFullName("No Such Guy", "", "no match found");
CheckFullName("John", "", "more than one result");
void CheckFullName(string partial_name,
string expected_full_name,
string expected_error) {
// database_connection is now a class member
string error;
string full_name = ExpandFullName(database_connection, partial_name, &error);
assert(error == expected_error);
assert(full_name == expected_full_name);
}

Column alignment

Aligning columns makes a block of code easier to scan:

CheckFullName("Doug Adams"  , "Mr. Douglas Adams" , "");
CheckFullName(" Jake Brown ", "Mr. Jake Brown III", "");
CheckFullName("No Such Guy" , ""                  , "no match found");
CheckFullName("John"        , ""                  , "more than one result");

commands[] = {
    ...
    { "timeout",      NULL,              cmd_spec_timeout},
    { "timestamping", &opt.timestamping, cmd_boolean},
    { "tries",        &opt.ntry,         cmd_number_inf},
    { "useproxy",     &opt.use_proxy,    cmd_boolean},
    { "useragent",    NULL,              cmd_spec_useragent},
    ...
};

Organize code into blocks

class FrontendServer {
public:
FrontendServer();
void ViewProfile(HttpRequest* request);
void OpenDatabase(string location, string user);
void SaveProfile(HttpRequest* request);
string ExtractQueryParam(HttpRequest* request, string param);
void ReplyOK(HttpRequest* request, string html);
void FindFriends(HttpRequest* request);
void ReplyNotFound(HttpRequest* request, string error);
void CloseDatabase(string location);
~FrontendServer();
};

The declarations above are readable, but they can be improved by grouping related methods:

class FrontendServer {
  public:
    FrontendServer();
    ~FrontendServer();

    // Handlers
    void ViewProfile(HttpRequest* request);
    void SaveProfile(HttpRequest* request);
    void FindFriends(HttpRequest* request);

    // Request/Reply Utilities
    string ExtractQueryParam(HttpRequest* request, string param);
    void ReplyOK(HttpRequest* request, string html);
    void ReplyNotFound(HttpRequest* request, string error);

    // Database Helpers
    void OpenDatabase(string location, string user);
    void CloseDatabase(string location);
};

Here is another piece of code:

# Import the user's email contacts, and match them to users in our system.
# Then display a list of those users that he/she isn't already friends with.
def suggest_new_friends(user, email_password):
    friends = user.friends()
    friend_emails = set(f.email for f in friends)
    contacts = import_contacts(user.email, email_password)
    contact_emails = set(c.email for c in contacts)
    non_friend_emails = contact_emails - friend_emails
    suggested_friends = User.objects.select(email__in=non_friend_emails)
    display['user'] = user
    display['friends'] = friends
    display['suggested_friends'] = suggested_friends
    return render("suggested_friends.html", display)

Everything runs together, which is visually overwhelming. Split it into blocks by purpose:

def suggest_new_friends(user, email_password):
    # Get the user's friends' email addresses.
    friends = user.friends()
    friend_emails = set(f.email for f in friends)

    # Import all email addresses from this user's email account.
    contacts = import_contacts(user.email, email_password)
    contact_emails = set(c.email for c in contacts)

    # Find matching users that they aren't already friends with.
    non_friend_emails = contact_emails - friend_emails
    suggested_friends = User.objects.select(email__in=non_friend_emails)

    # Display these lists on the page.
    display['user'] = user
    display['friends'] = friends
    display['suggested_friends'] = suggested_friends
    return render("suggested_friends.html", display)

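Making code comfortable to read takes attention while writing and a few good habits. This matters most in team work: a style choice such as brace placement is neither right nor wrong in itself, but ignoring the team's convention is wrong.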

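How to write comments

While writing code you think about a great deal, but in the end the reader sees only the code itself; the extra information is lost. The purpose of a comment is to give the reader more of that information.

Knowing what to comment

What not to comment

Comments like these are worthless: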

// The class definition for Account
class Account {
public:
// Constructor
Account();
// Set the profit member to a new value
void SetProfit(double profit);
// Return the profit from this Account
double GetProfit();
};

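Don't comment just for the sake of commenting. A comment on a function declaration earns its place only when it adds detail beyond the signature, as this one does: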

// Find a Node with the given 'name' or return NULL.
// If depth <= 0, only 'subtree' is inspected.
// If depth == N, only 'subtree' and N levels below are inspected.
Node* FindNodeInSubtree(Node* subtree, string name, int depth);

Don't comment bad names; fix the names instead

// Enforce limits on the Reply as stated in the Request,
// such as the number of items returned, or total byte size, etc.
void CleanReply(Request request, Reply reply);

Most of this comment explains what "clean" means. It is better to choose a more accurate name:

// Make sure 'reply' meets the count/byte/etc. limits from the 'request'
void EnforceLimitsFromRequest(Request request, Reply reply);

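Record your thoughts

We have covered what not to comment, so what should be commented? Comments should record the insights you had while working out how to write the code, such as these: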

// Surprisingly, a binary tree was 40% faster than a hash table for this data.
// The cost of computing a hash was more than the left/right comparisons.
// This heuristic might miss a few words. That's OK; solving this 100% is hard.
// This class is getting messy. Maybe we should create a 'ResourceNode' subclass to
// help organize things.

Comments can also mark unfinished work and explain constants:

// TODO: use a faster algorithm
// TODO(dustin): handle other image formats besides JPEG
NUM_THREADS = 8 # as long as it's >= 2 * num_processors, that's good enough.
// Impose a reasonable limit - no human can read that much anyway.
const int MAX_RSS_SUBSCRIPTIONS = 1000;

Commonly used markers:

TODO : Stuff I haven't gotten around to yet
FIXME : Known-broken code here
HACK : Admittedly inelegant solution to a problem
XXX : Danger! Major problem here

Put yourself in the reader's shoes

The parts of your code that make other people stop and wonder are the parts you should comment.

struct Recorder {
vector<float> data;
...
void Clear() {
vector<float>().swap(data); // Huh? Why not just data.clear()?
}
};

Many C++ programmers seeing this would wonder why data.clear() is not used instead of the swap with an empty vector, so a comment belongs there:

// Force vector to relinquish its memory (look up "STL swap trick")
vector<float>().swap(data);

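Warn about likely pitfalls

While writing code you may use a hack, or know of some other pitfall that readers need to be aware of. That is when a comment is needed: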

void SendEmail(string to, string subject, string body);

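This email function actually calls an external service and has a timeout, which the signature alone does not reveal, so it needs a comment: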

// Calls an external service to deliver email. (Times out after 1 minute.)
void SendEmail(string to, string subject, string body);

Big-picture comments

Sometimes, for clarity, a whole file deserves a comment that gives the reader an overall picture:

// This file contains helper functions that provide a more convenient interface to our
// file system. It handles file permissions and other nitty-gritty details.

Summary comments

Even inside a function, a comment can summarize a block in the same way a file-level comment summarizes a file:

# Find all the items that customers purchased for themselves.
for customer_id in all_customers:
    for sale in all_sales[customer_id].sales:
        if sale.recipient == customer_id:
            ...

Or write a comment for each step of the function:

def GenerateUserReport():
    # Acquire a lock for this user
    ...
    # Read user's info from the database
    ...
    # Write info to a file
    ...
    # Release the lock for this user

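Many people are reluctant to write comments, and writing good ones is indeed not easy. One option is to reserve a dedicated place in the file for comments, where you can write down anything you want to say.

Making comments precise and compact

The previous section discussed what comments should say; this one discusses how to write them. Comments matter, so they should be precise; they also take up screen space, so they should be concise.

Keep comments compact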

// The int is the CategoryType.
// The first float in the inner pair is the 'score',
// the second is the 'weight'.
typedef hash_map<int, pair<float, float> > ScoreMap;

That is too wordy. It can be compressed to:

// CategoryType -> (score, weight)
typedef hash_map<int, pair<float, float> > ScoreMap;

Avoid ambiguous pronouns

// Insert the data into the cache, but check if it's too big first.

Here "it" is ambiguous: it could refer to the data or to the cache. Rewrite the comment as:

// Insert the data into the cache, but check if the data is too big first.

An even better solution restructures the sentence so that "it" has a clear referent:

// If the data is small enough, insert it into the cache.

Keep sentences precise and concise

# Depending on whether we've already crawled this URL before, give it a different priority.

That sentence takes effort to parse. This version is much easier to understand:

# Give higher priority to URLs we've never crawled before.

Describe function behavior precisely

// Return the number of lines in this file.
int CountLines(string filename) { ... }

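A function documented like this can leave callers puzzled, because the comment allows several readings:

"" (an empty file): 0 lines or 1?
"hello": 0 lines or 1?
"hello\n": 1 line or 2?
"hello\n world": 1 line or 2?
"hello\n\r cruel\n world\r": 2, 3, or 4 lines?

So the comment should be written like this: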

// Count how many newline bytes ('\n') are in the file.
int CountLines(string filename) { ... }
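
A Python sketch that applies the precise definition to the ambiguous inputs listed above. To stay self-contained it takes the file contents as bytes instead of a filename; that choice, and the name count_newlines, are assumptions of this sketch.

```python
def count_newlines(data: bytes) -> int:
    """Count how many newline bytes (b'\\n') are in the data."""
    return data.count(b"\n")


# With "count the '\n' bytes" as the rule, every case has one answer.
print(count_newlines(b""))                          # 0
print(count_newlines(b"hello"))                     # 0
print(count_newlines(b"hello\n"))                   # 1
print(count_newlines(b"hello\n world"))             # 1
print(count_newlines(b"hello\n\r cruel\n world\r"))  # 2
```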

Use examples to illustrate corner cases

// Rearrange 'v' so that elements < pivot come before those >= pivot;
// Then return the largest 'i' for which v[i] < pivot (or -1 if none are < pivot)
int Partition(vector<int>* v, int pivot);

This description is precise, but adding an example makes it even better:

// ...
// Example: Partition([8 5 9 8 2], 8) might result in [5 2 | 8 9 8] and return 1
int Partition(vector<int>* v, int pivot);
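
One way to satisfy that contract, sketched in Python. This is a simple single-pass partition chosen for illustration, not necessarily the implementation the book has in mind; the order of elements within each side is unspecified, which is why the comment says "might result in".

```python
def partition(v, pivot):
    """Rearrange v in place so elements < pivot come before those >= pivot.

    Return the largest index i for which v[i] < pivot, or -1 if none are.
    """
    boundary = 0  # v[:boundary] holds the elements < pivot found so far
    for i in range(len(v)):
        if v[i] < pivot:
            v[boundary], v[i] = v[i], v[boundary]
            boundary += 1
    return boundary - 1


values = [8, 5, 9, 8, 2]
last_small = partition(values, 8)
print(last_small)  # 1
print(values[:last_small + 1], values[last_small + 1:])
```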

State the real intent of your code

void DisplayProducts(list<Product> products) {
    products.sort(CompareProductByPrice);
    // Iterate through the list in reverse order
    for (list<Product>::reverse_iterator it = products.rbegin(); it != products.rend();
         ++it)
        DisplayPrice(it->price);
    ...
}

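The comment says that the list is traversed in reverse order, but that only restates the code. It should state the intent instead: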

// Display each price, from highest to lowest
for (list<Product>::reverse_iterator it = products.rbegin(); ... )

Comments at function calls

A call like this is bound to leave the reader puzzled:

Connect(10, false);

With the arguments named at the call site, it reads much more clearly:

def Connect(timeout, use_encryption): ...
# Call the function using named parameters
Connect(timeout = 10, use_encryption = False)
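
Python can also enforce named arguments. The sketch below uses keyword-only parameters (the bare * in the signature); the lowercase connect function, the timeout_secs name, and its return value are hypothetical and only serve to make the example runnable.

```python
def connect(*, timeout_secs, use_encryption):
    """Return the connection settings; every argument must be passed by name."""
    return {"timeout_secs": timeout_secs, "use_encryption": use_encryption}


settings = connect(timeout_secs=10, use_encryption=False)

try:
    connect(10, False)  # positional arguments are rejected
except TypeError:
    positional_call_rejected = True
else:
    positional_call_rejected = False

print(settings, positional_call_rejected)
```

With this signature, an unexplained call such as connect(10, False) cannot be written at all.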

Use information-dense words

// This class contains a number of members that store the same information as in the
// database, but are stored here for speed. When this class is read from later, those
// members are checked first to see if they exist, and if so are returned; otherwise the
// database is read from and that data stored in those fields for next time.

The long comment above explains things clearly, but a single well-chosen term conveys the same thing without leaving any doubt:

// This class acts as a caching layer to the database.
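
What the long comment describes is the familiar read-through cache. A minimal Python sketch follows; the class names and the dict-like database stand-in are assumptions made for illustration.

```python
class CachingLayer:
    """Acts as a caching layer to the database."""

    def __init__(self, database):
        self._database = database  # assumption: any mapping-like object
        self._cache = {}

    def read(self, key):
        if key not in self._cache:
            # Cache miss: read from the database and remember the value.
            self._cache[key] = self._database[key]
        return self._cache[key]


class CountingDatabase(dict):
    """Hypothetical stand-in for a database that counts its reads."""

    reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


database = CountingDatabase(user_42="Ada")
layer = CachingLayer(database)
layer.read("user_42")
layer.read("user_42")
print(database.reads)  # 1: the second read was served from the cache
```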

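Simplifying loops and logic

Keep control flow simple

The goal of this section is to make conditionals, loops, and other control flow as natural as possible, so that the reader never has to pause to think or go back and reread.

The order of arguments in conditionals

Compare these two ways of writing the same conditions: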

if (length >= 10)
while (bytes_received < bytes_expected)

if (10 <= length)
while (bytes_expected > bytes_received)

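Should the operands follow greater-than/less-than order, or is there another guideline? There is: order them by what they mean.

Left-hand side: usually the value being examined, the one that changes more often
Right-hand side: usually the value being compared against, which is more constant

This explains why bytes_received < bytes_expected is easier to understand than the reverse.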
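
A small Python sketch of the guideline. The function read_until_complete and its chunk list are hypothetical; the point is only that the changing value (bytes_received) sits on the left and the stable target (bytes_expected) on the right.

```python
def read_until_complete(chunks, bytes_expected):
    """Consume chunks until bytes_expected bytes arrive or the chunks run out."""
    bytes_received = 0
    chunk_iter = iter(chunks)
    # Left: the value that changes. Right: the value it is compared against.
    while bytes_received < bytes_expected:
        chunk = next(chunk_iter, None)
        if chunk is None:
            break  # no more data available
        bytes_received += len(chunk)
    return bytes_received


print(read_until_complete([b"ab", b"cd", b"ef"], bytes_expected=3))  # 4
print(read_until_complete([b"ab"], bytes_expected=10))               # 2
```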

The order of if/else blocks

Usually you are free to choose the order of the if/else branches. Either of the following two forms works: