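The Art of Readable Code
These are my reading notes on The Art of Readable Code, with a few observations of my own added. I strongly recommend the book:
- English edition: The Art of Readable Code
- Chinese edition: 《編寫可讀代碼的藝術》
Why code should be easy to understand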
“Code should be written to minimize the time it would take for someone else to understand it.”
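The everyday reality of the job:
- Thinking before writing code, and reading code, take far more time than actually writing it.
- Reading code is routine, whether it is someone else's or your own; anything you wrote six months ago might as well be someone else's.
- When code is readable, you can grasp the program's logic quickly and get to work.
- Fewer lines do not necessarily make code easier to understand.
- Readability does not conflict with efficiency, good architecture, or testability.
The whole book revolves around one goal: making code more readable. Readability is also one of the key criteria for good code.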
Naming
Pack more information into names
Use words with a precise meaning, for example download rather than get. Some possible replacements:
send -> deliver, dispatch, announce, distribute, route
find -> search, extract, locate, recover
start -> launch, create, begin, open
make -> create, set up, build, generate, compose, add, new
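Avoid generic words
Names like tmp and retval say nothing beyond "this is a temporary variable" or "this is the return value". Adding a meaningful word makes them clear: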
tmp_file = tempfile.NamedTemporaryFile()
...
SaveData(tmp_file, ...)
Instead of retval, use a name that says what the value really represents:
sum_squares += v[i]; // Where's the "square" that we're summing? Bug!
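In nested for loops, i and j can be just as confusing:
for (int i = 0; i < clubs.size(); i++)
    for (int j = 0; j < clubs[i].members.size(); j++)
        for (int k = 0; k < users.size(); k++)
            if (clubs[i].members[k] == users[j])
                cout << "user[" << j << "] is in club[" << i << "]" << endl;
The indices in the if are actually swapped (it should use members[j] and users[k]), which is hard to spot with i, j and k. With more descriptive index names, a correct line is easy to verify: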
if (clubs[ci].members[mi] == users[ui]) # OK. First letters match.
So if you do use a generic name, have a good reason for it.
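A rough Python rendering of the same loop, as a sketch only: the club and user data are made up for illustration, and `find_memberships` is a hypothetical helper name. Each index carries the first letter of the collection it walks (`ci` for clubs, `mi` for members, `ui` for users), so an index used on the wrong collection would stand out.

```python
# Hypothetical sample data, invented for this sketch.
clubs = [
    {"members": ["ann", "bob"]},
    {"members": ["cat"]},
]
users = ["bob", "cat", "dan"]


def find_memberships(clubs, users):
    """Return (user_index, club_index) pairs for users who belong to a club."""
    found = []
    for ci in range(len(clubs)):
        for mi in range(len(clubs[ci]["members"])):
            for ui in range(len(users)):
                # First letters match: ci indexes clubs, mi members, ui users.
                if clubs[ci]["members"][mi] == users[ui]:
                    found.append((ui, ci))
    return found


memberships = find_memberships(clubs, users)
for ui, ci in memberships:
    print(f"user[{ui}] is in club[{ci}]")
```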
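Prefer concrete names
CanListenOnPort is better than ServerCanStart: "can start" is vague, while "listen on port" says exactly what the method is about to do.
Likewise, --run_locally is less clear than --extra_logging.
Attach important details, such as a unit suffix (_ms) or a _raw suffix for an unprocessed string
If a detail about a variable matters, a few extra words in its name make it easier to read. For example, change string id; // Example: "af84ef845cd8" to string hex_id;.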
Start(int delay): delay -> delay_secs
CreateCache(int size): size -> size_mb
ThrottleDownload(float limit): limit -> max_kbps
Rotate(float angle): angle -> degrees_cw
More examples:
password -> plaintext_password
comment -> unescaped_comment
html -> html_utf8
data -> data_urlenc
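To illustrate why unit suffixes help, here is a small Python sketch. The function `format_load_time` and its parameters are hypothetical names chosen for this example, not taken from the book. Because every name states its unit, the conversion from milliseconds to seconds is visible on the line where it happens, and printing a millisecond value with a "seconds" label would look wrong at a glance.

```python
def format_load_time(start_ms: int, end_ms: int) -> str:
    """Format the time between two millisecond timestamps as seconds."""
    elapsed_ms = end_ms - start_ms
    elapsed_secs = elapsed_ms / 1000  # explicit ms -> s conversion
    return f"Load time was: {elapsed_secs:.1f} seconds"


message = format_load_time(start_ms=1000, end_ms=3500)
print(message)
```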
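Use longer names for larger scopes
Short names are fine in a small scope. Variables used across a larger scope deserve longer names, and editor auto-completion removes most of the typing cost. For abbreviations and prefixes, stick to widely known ones (such as str). A good test: would a new team member understand the prefix without asking anyone?
Use symbols such as _ and - to carry meaning, for example a _ prefix on private variables.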
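var x = new DatePicker(); // DatePicker() is a "constructor" function, so it starts with a capital letter
var y = pageHeight(); // pageHeight() is an ordinary function
var $all_images = $("img"); // $all_images is a jQuery object
var height = 250; // height is not
// id and class use different separators (underscores vs. dashes)
<div id="middle_column" class="main-content"> ...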
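Names should not be ambiguous
Before settling on a name, ask whether the word could be read another way. For example:
results = Database.all_objects.filter("year <= 2011")
Does results now contain the objects with year <= 2011, or the ones without? The word filter does not say whether matches are kept or removed.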
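Use min and max instead of limit
CART_TOO_BIG_LIMIT = 10
if shopping_cart.num_items() >= CART_TOO_BIG_LIMIT:
    Error("Too many items in cart.")
MAX_ITEMS_IN_CART = 10
if shopping_cart.num_items() > MAX_ITEMS_IN_CART:
    Error("Too many items in cart.")
Compare CART_TOO_BIG_LIMIT and MAX_ITEMS_IN_CART in the example above: which is better? With limit it is unclear whether 10 itself is allowed (the first version rejects a cart of exactly 10 items), while max states that 10 is the largest permitted count.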
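As a quick check of the boundary, here is a small Python sketch. The `is_cart_too_big` helper is a hypothetical name used only for illustration. With `MAX_` in the constant's name, the natural comparison is `>`, so a cart holding exactly the maximum is still accepted.

```python
MAX_ITEMS_IN_CART = 10


def is_cart_too_big(num_items: int) -> bool:
    """A cart is too big only when it exceeds the maximum."""
    return num_items > MAX_ITEMS_IN_CART


at_max = is_cart_too_big(MAX_ITEMS_IN_CART)        # exactly 10 items
over_max = is_cart_too_big(MAX_ITEMS_IN_CART + 1)  # 11 items
```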
Use first and last for inclusive ranges
print integer_range(start=2, stop=4)
# Does this print [2,3] or [2,3,4] (or something else)?
set.PrintKeys(first="Bart", last="Maggie")
first and last are unambiguous and well suited to inclusive ranges.
Use begin and end for half-open ranges such as [2, 9), which include the start and exclude the end
PrintEventsInRange("OCT 16 12:00am", "OCT 17 12:00am")
PrintEventsInRange("OCT 16 12:00am", "OCT 16 11:59:59.9999pm")
The first form is much more comfortable than the second.
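A minimal Python sketch of a half-open range follows. The function `events_in_range`, the year 2023 and the sample timestamps are all assumptions made for this example. Because `end` is excluded, "all of October 16" is written with two clean midnights, and there is no need for a value like 11:59:59.9999pm.

```python
from datetime import datetime, timedelta


def events_in_range(event_times, begin, end):
    """Return the timestamps t with begin <= t < end (end is excluded)."""
    return [t for t in event_times if begin <= t < end]


# Arbitrary sample data for illustration.
oct16 = datetime(2023, 10, 16)
oct17 = oct16 + timedelta(days=1)
event_times = [
    datetime(2023, 10, 16, 0, 0),
    datetime(2023, 10, 16, 23, 59, 59, 999999),
    datetime(2023, 10, 17, 0, 0),
]

day_events = events_in_range(event_times, begin=oct16, end=oct17)
```

The midnight that starts October 17 falls outside the result, so consecutive days can be queried back to back without any event being counted twice.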
Naming Booleans
bool read_password = true;
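This name is dangerous: does it mean the password still needs to be read, or that it has already been read? There is no way to tell, so a name such as user_is_authenticated is a better choice. In general, prefixing a Boolean with is, has, can or should makes its meaning clearer, for example: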
SpaceLeft() --> hasSpaceLeft()
bool disable_ssl = false --> bool use_ssl = true
Match the reader's expectations
public class StatisticsCollector {
public void addSample(double x) { ... }
public double getMean() {
// Iterate through all samples and return total / num_samples
}
...
}
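In this example, getMean iterates over every sample to compute the mean, so it is not the lightweight accessor that a get prefix usually implies. computeMean would be a more fitting name.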
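A Python sketch of the same class, under the assumption that samples are kept in a plain list. The snake_case names are my adaptation, not the book's code. The name `compute_mean` signals that each call does real work proportional to the number of samples.

```python
class StatisticsCollector:
    """Collects samples and computes statistics over them."""

    def __init__(self):
        self._samples = []

    def add_sample(self, x: float) -> None:
        self._samples.append(x)

    def compute_mean(self) -> float:
        """Iterate over all samples (O(n) per call) and return their mean."""
        if not self._samples:
            raise ValueError("no samples collected yet")
        return sum(self._samples) / len(self._samples)


collector = StatisticsCollector()
for value in (1.0, 2.0, 3.0):
    collector.add_sample(value)
mean = collector.compute_mean()
```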
Aesthetics
Well-formatted code is pleasing to look at and much more comfortable to read. Compare the two examples below:
class StatsKeeper {
public:
// A class for keeping track of a series of doubles
void Add(double d); // and methods for quick statistics about them
private: int count; /* how many so far
*/ public:
double Average();
private: double minimum;
list<double>
past_items
;double maximum;
};
And here is a cleaner version:
// A class for keeping track of a series of doubles
// and methods for quick statistics about them.
class StatsKeeper {
  public:
    void Add(double d);
    double Average();
  private:
    list<double> past_items;
    int count;  // how many so far
    double minimum;
    double maximum;
};
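Break lines consistently and compactly
This code has to be wrapped to stay within 80 characters per line, and its arguments need explanatory comments: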
public class PerformanceTester {
public static final TcpConnectionSimulator wifi = new TcpConnectionSimulator(
500, /* Kbps */
80, /* millisecs latency */
200, /* jitter */
1 /* packet loss % */);
public static final TcpConnectionSimulator t3_fiber = new TcpConnectionSimulator(
45000, /* Kbps */
10, /* millisecs latency */
0, /* jitter */
0 /* packet loss % */);
public static final TcpConnectionSimulator cell = new TcpConnectionSimulator(
100, /* Kbps */
400, /* millisecs latency */
250, /* jitter */
5 /* packet loss % */);
}
For consistency, first rearrange it like this:
public class PerformanceTester {
public static final TcpConnectionSimulator wifi =
new TcpConnectionSimulator(
500, /* Kbps */
80, /* millisecs latency */
200, /* jitter */
1 /* packet loss % */);
public static final TcpConnectionSimulator t3_fiber =
new TcpConnectionSimulator(
45000, /* Kbps */
10, /* millisecs latency */
0, /* jitter */
0 /* packet loss % */);
public static final TcpConnectionSimulator cell =
new TcpConnectionSimulator(
100, /* Kbps */
400, /* millisecs latency */
250, /* jitter */
5 /* packet loss % */);
}
That is more consistent, but still verbose and takes up a lot of vertical space. A more compact version:
public class PerformanceTester {
// TcpConnectionSimulator(throughput, latency, jitter, packet_loss)
//                          [Kbps]      [ms]     [ms]    [percent]
public static final TcpConnectionSimulator wifi =
new TcpConnectionSimulator(500, 80, 200, 1);
public static final TcpConnectionSimulator t3_fiber =
new TcpConnectionSimulator(45000, 10, 0, 0);
public static final TcpConnectionSimulator cell =
new TcpConnectionSimulator(100, 400, 250, 5);
}
Wrap repetition in a helper function
// Turn a partial_name like "Doug Adams" into "Mr. Douglas Adams".
// If not possible, 'error' is filled with an explanation.
string ExpandFullName(DatabaseConnection dc, string partial_name, string* error);
DatabaseConnection database_connection;
string error;
assert(ExpandFullName(database_connection, "Doug Adams", &error)
== "Mr. Douglas Adams");
assert(error == "");
assert(ExpandFullName(database_connection, " Jake Brown ", &error)
== "Mr. Jacob Brown III");
assert(error == "");
assert(ExpandFullName(database_connection, "No Such Guy", &error) == "");
assert(error == "no match found");
assert(ExpandFullName(database_connection, "John", &error) == "");
assert(error == "more than one result");
The code above looks cluttered and repetitive. A helper function tidies it up:
CheckFullName("Doug Adams", "Mr. Douglas Adams", "");
CheckFullName(" Jake Brown ", "Mr. Jacob Brown III", "");
CheckFullName("No Such Guy", "", "no match found");
CheckFullName("John", "", "more than one result");
void CheckFullName(string partial_name,
string expected_full_name,
string expected_error) {
// database_connection is now a class member
string error;
string full_name = ExpandFullName(database_connection, partial_name, &error);
assert(error == expected_error);
assert(full_name == expected_full_name);
}
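The same refactoring can be sketched in Python. Everything here is illustrative: `expand_full_name` is a hypothetical in-memory stand-in for the database-backed ExpandFullName, and the name table (including the two "John" entries) is invented so the example can run on its own.

```python
# Hypothetical lookup table standing in for the database.
KNOWN_NAMES = {
    "Doug Adams": ["Mr. Douglas Adams"],
    "Jake Brown": ["Mr. Jacob Brown III"],
    "John": ["Mr. John Smith", "Mr. John Doe"],
}


def expand_full_name(partial_name):
    """Return (full_name, error); full_name is "" when expansion fails."""
    matches = KNOWN_NAMES.get(partial_name.strip(), [])
    if not matches:
        return "", "no match found"
    if len(matches) > 1:
        return "", "more than one result"
    return matches[0], ""


def check_full_name(partial_name, expected_full_name, expected_error):
    """Helper that hides the repeated call-and-assert boilerplate."""
    full_name, error = expand_full_name(partial_name)
    assert error == expected_error, (partial_name, error)
    assert full_name == expected_full_name, (partial_name, full_name)


check_full_name("Doug Adams", "Mr. Douglas Adams", "")
check_full_name(" Jake Brown ", "Mr. Jacob Brown III", "")
check_full_name("No Such Guy", "", "no match found")
check_full_name("John", "", "more than one result")
```

Each test case now fits on one line, so adding another case is cheap and the inputs and expectations are easy to compare.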
Column alignment
Aligning columns makes a block of code easier to scan:
CheckFullName("Doug Adams"  , "Mr. Douglas Adams"  , "");
CheckFullName(" Jake Brown ", "Mr. Jacob Brown III", "");
CheckFullName("No Such Guy" , ""                   , "no match found");
CheckFullName("John"        , ""                   , "more than one result");
commands[] = {
    ...
    { "timeout"      , NULL              , cmd_spec_timeout},
    { "timestamping" , &opt.timestamping , cmd_boolean},
    { "tries"        , &opt.ntry         , cmd_number_inf},
    { "useproxy"     , &opt.use_proxy    , cmd_boolean},
    { "useragent"    , NULL              , cmd_spec_useragent},
    ...
};
Organize code into blocks
class FrontendServer {
public:
FrontendServer();
void ViewProfile(HttpRequest* request);
void OpenDatabase(string location, string user);
void SaveProfile(HttpRequest* request);
string ExtractQueryParam(HttpRequest* request, string param);
void ReplyOK(HttpRequest* request, string html);
void FindFriends(HttpRequest* request);
void ReplyNotFound(HttpRequest* request, string error);
void CloseDatabase(string location);
~FrontendServer();
};
The version above is readable enough, but it can be improved:
class FrontendServer {
  public:
    FrontendServer();
    ~FrontendServer();

    // Handlers
    void ViewProfile(HttpRequest* request);
    void SaveProfile(HttpRequest* request);
    void FindFriends(HttpRequest* request);

    // Request/Reply Utilities
    string ExtractQueryParam(HttpRequest* request, string param);
    void ReplyOK(HttpRequest* request, string html);
    void ReplyNotFound(HttpRequest* request, string error);

    // Database Helpers
    void OpenDatabase(string location, string user);
    void CloseDatabase(string location);
};
Now look at another piece of code:
# Import the user's email contacts, and match them to users in our system.
# Then display a list of those users that he/she isn't already friends with.
def suggest_new_friends(user, email_password):
    friends = user.friends()
    friend_emails = set(f.email for f in friends)
    contacts = import_contacts(user.email, email_password)
    contact_emails = set(c.email for c in contacts)
    non_friend_emails = contact_emails - friend_emails
    suggested_friends = User.objects.select(email__in=non_friend_emails)
    display['user'] = user
    display['friends'] = friends
    display['suggested_friends'] = suggested_friends
    return render("suggested_friends.html", display)
Everything runs together, which is visually heavy. Split it into blocks by function:
def suggest_new_friends(user, email_password):
    # Get the user's friends' email addresses.
    friends = user.friends()
    friend_emails = set(f.email for f in friends)

    # Import all email addresses from this user's email account.
    contacts = import_contacts(user.email, email_password)
    contact_emails = set(c.email for c in contacts)

    # Find matching users that they aren't already friends with.
    non_friend_emails = contact_emails - friend_emails
    suggested_friends = User.objects.select(email__in=non_friend_emails)

    # Display these lists on the page.
    display['user'] = user
    display['friends'] = friends
    display['suggested_friends'] = suggested_friends
    return render("suggested_friends.html", display)
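Making code pleasant to read takes attention while you write and a few good habits. In a team especially, a style choice such as brace placement is neither right nor wrong in itself, but ignoring the team's convention is wrong.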
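Writing comments
You think about a great deal while writing code, but all the reader ultimately sees is the code itself; the extra information is lost. The purpose of comments is to give the reader more of that information.
Knowing what to comment
What not to comment
Comments like these are worthless: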
// The class definition for Account
class Account {
public:
// Constructor
Account();
// Set the profit member to a new value
void SetProfit(double profit);
// Return the profit from this Account
double GetProfit();
};
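Don't comment just for the sake of commenting. A comment that merely restates the signature adds nothing; if a declaration needs a comment, it should supply details the signature cannot, as this one does: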
// Find a Node with the given 'name' or return NULL.
// If depth <= 0, only 'subtree' is inspected.
// If depth == N, only 'subtree' and N levels below are inspected.
Node* FindNodeInSubtree(Node* subtree, string name, int depth);
Don't comment bad names; fix the names instead
// Enforce limits on the Reply as stated in the Request,
// such as the number of items returned, or total byte size, etc.
void CleanReply(Request request, Reply reply);
Most of the comment is spent explaining what "clean" means. Better to pick a more accurate name:
// Make sure 'reply' meets the count/byte/etc. limits from the 'request'
void EnforceLimitsFromRequest(Request request, Reply reply);
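Record your thoughts
We have covered what not to comment, so what should you comment? Comments should record the conclusions you reached while working out how to write the code, such as: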
// Surprisingly, a binary tree was 40% faster than a hash table for this data.
// The cost of computing a hash was more than the left/right comparisons.
// This heuristic might miss a few words. That's OK; solving this 100% is hard.
// This class is getting messy. Maybe we should create a 'ResourceNode' subclass to
// help organize things.
Comments can also record the state of the code and the reasoning behind constants:
// TODO: use a faster algorithm
// TODO(dustin): handle other image formats besides JPEG
NUM_THREADS = 8 # as long as it's >= 2 * num_processors, that's good enough.
// Impose a reasonable limit - no human can read that much anyway.
const int MAX_RSS_SUBSCRIPTIONS = 1000;
Commonly used markers:
TODO : Stuff I haven't gotten around to yet
FIXME : Known-broken code here
HACK : Admittedly inelegant solution to a problem
XXX : Danger! Major problem here
Put yourself in the reader's shoes
Whatever would make someone reading your code stop and wonder is exactly what you should comment.
struct Recorder {
vector<float> data;
...
void Clear() {
vector<float>().swap(data); // Huh? Why not just data.clear()?
}
};
Many C++ programmers who see this will wonder why it does not simply call data.clear() instead of swapping with an empty vector, so that spot deserves a comment:
// Force vector to relinquish its memory (look up "STL swap trick")
vector<float>().swap(data);
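Advertise likely pitfalls
While writing code you may use a hack, or know of a pitfall that readers need to be aware of. That is the time to write a comment: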
void SendEmail(string to, string subject, string body);
In reality this function calls an external service to send the email and has a timeout, so it needs a comment:
// Calls an external service to deliver email. (Times out after 1 minute.)
void SendEmail(string to, string subject, string body);
"Big picture" comments
Sometimes it helps to comment an entire file so that the reader gets an overall picture:
// This file contains helper functions that provide a more convenient interface to our
// file system. It handles file permissions and other nitty-gritty details.
Summary comments
Even inside a function, you can write summary comments similar to a file-level comment:
# Find all the items that customers purchased for themselves.
for customer_id in all_customers:
    for sale in all_sales[customer_id].sales:
        if sale.recipient == customer_id:
            ...
Or write a comment for each step of the function:
def GenerateUserReport():
    # Acquire a lock for this user
    ...
    # Read user's info from the database
    ...
    # Write info to a file
    ...
    # Release the lock for this user
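Many people are reluctant to write comments, and writing good ones is indeed not easy. One option is to set aside a place in the file for notes, where you can write down anything you want to say.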
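Making comments precise and compact
The previous section covered what to comment; this one covers how. Comments are important, so they should be precise; they also take up screen space, so they should be compact.
Keep comments compact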
// The int is the CategoryType.
// The first float in the inner pair is the 'score',
// the second is the 'weight'.
typedef hash_map<int, pair<float, float> > ScoreMap;
This is too wordy. Compress it to:
// CategoryType -> (score, weight)
typedef hash_map<int, pair<float, float> > ScoreMap;
Avoid ambiguous pronouns
// Insert the data into the cache, but check if it's too big first.
Here "it's" is ambiguous: it could refer to the data or to the cache. Rewrite the comment as:
// Insert the data into the cache, but check if the data is too big first.
Better still, restructure the sentence so that "it" has a clear referent:
// If the data is small enough, insert it into the cache.
Make sentences concise and precise
# Depending on whether we've already crawled this URL before, give it a different priority.
This sentence takes too much effort to parse. The following is much easier to understand:
# Give higher priority to URLs we've never crawled before.
Describe a function's behavior precisely
// Return the number of lines in this file.
int CountLines(string filename) { ... }
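A function documented this way leaves callers guessing, because "lines" can be interpreted in several ways:
- "" (an empty file): is that 0 lines or 1?
- "hello" (a single line with no newline): is the result 0 or 1?
- "hello\n": 1 or 2?
- "hello\n world": 1 or 2?
- "hello\n\r cruel\n world\r": 2, 3 or 4?
So the comment should read: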
// Count how many newline bytes ('\n') are in the file.
int CountLines(string filename) { ... }
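To show how the precise comment settles every case listed above, here is a Python sketch. The split into `count_newlines` (on bytes) and `count_lines` (on a file) is my own arrangement so that the examples can run without touching the file system.

```python
def count_newlines(data: bytes) -> int:
    """Count how many newline bytes are in data."""
    return data.count(b"\n")


def count_lines(filename: str) -> int:
    """Count how many newline bytes are in the file."""
    with open(filename, "rb") as f:
        return count_newlines(f.read())


# The ambiguous inputs from the list above, in the same order.
examples = [
    b"",
    b"hello",
    b"hello\n",
    b"hello\n world",
    b"hello\n\r cruel\n world\r",
]
counts = [count_newlines(e) for e in examples]
print(counts)
```

Under this definition the answers are 0, 0, 1, 1 and 2: carriage returns are ignored and a final line without a trailing newline is not counted.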
Use examples to illustrate corner cases
// Rearrange 'v' so that elements < pivot come before those >= pivot;
// Then return the largest 'i' for which v[i] < pivot (or -1 if none are < pivot)
int Partition(vector<int>* v, int pivot);
This description is precise, but adding an example makes it even better:
// ...
// Example: Partition([8 5 9 8 2], 8) might result in [5 2 | 8 9 8] and return 1
int Partition(vector<int>* v, int pivot);
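One possible implementation that satisfies this comment, sketched in Python. This is a simple single-pass scheme that I chose for illustration, not necessarily the algorithm the book has in mind. It produces a different (but equally valid) arrangement from the one in the example comment, which is why that comment says "might result in".

```python
def partition(v, pivot):
    """Rearrange v in place so elements < pivot come before those >= pivot.

    Return the largest index i for which v[i] < pivot, or -1 if no
    element is < pivot.
    """
    boundary = 0  # v[:boundary] holds the elements found to be < pivot
    for i in range(len(v)):
        if v[i] < pivot:
            v[boundary], v[i] = v[i], v[boundary]
            boundary += 1
    return boundary - 1


values = [8, 5, 9, 8, 2]
last_small_index = partition(values, 8)
print(values, last_small_index)
```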
State the intent of your code
void DisplayProducts(list<Product> products) {
products.sort(CompareProductByPrice);
// Iterate through the list in reverse order
for (list<Product>::reverse_iterator it = products.rbegin(); it != products.rend();
++it)
DisplayPrice(it->price);
...
}
The comment only restates that the loop runs in reverse order; it does not say what that achieves. State the intent instead:
// Display each price, from highest to lowest
for (list<Product>::reverse_iterator it = products.rbegin(); ... )
Comments at function calls
A call like this is baffling:
Connect(10, false);
Naming the arguments at the call site makes it much clearer:
def Connect(timeout, use_encryption): ...
# Call the function using named parameters
Connect(timeout = 10, use_encryption = False)
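In Python, this convention can even be enforced. The sketch below uses keyword-only parameters (the bare `*` in the signature); `connect` here is a hypothetical stand-in that just returns its settings, so that the effect is easy to observe.

```python
def connect(*, timeout, use_encryption):
    """Hypothetical stand-in for Connect; returns the settings it received."""
    return {"timeout": timeout, "use_encryption": use_encryption}


# Callers must name every argument.
settings = connect(timeout=10, use_encryption=False)

# A bare positional call such as connect(10, False) is rejected.
try:
    connect(10, False)
    positional_call_rejected = False
except TypeError:
    positional_call_rejected = True
```

The trade-off is slightly longer call sites in exchange for calls that cannot be written in the unreadable form.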
Use information-dense words
// This class contains a number of members that store the same information as in the
// database, but are stored here for speed. When this class is read from later, those
// members are checked first to see if they exist, and if so are returned; otherwise the
// database is read from and that data stored in those fields for next time.
The long comment above is perfectly clear, but a single well-chosen term says the same thing and leaves no doubt:
// This class acts as a caching layer to the database.
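Simplifying loops and logic
Making control flow easy to read
The goal of this section is to make conditionals, loops and other control flow as natural as possible, so that readers do not have to pause and think or go back and reread.
The order of arguments in conditionals
Compare these two ways of writing conditions: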
if (length >= 10)
while (bytes_received < bytes_expected)
if (10 <= length)
while (bytes_expected > bytes_received)
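Should the operands follow greater-than/less-than order, or is there some other rule? There is: order them by role.
- Left-hand side: usually the value being examined, the one that changes more.
- Right-hand side: usually the value being compared against, which is more constant.
This explains why bytes_received < bytes_expected is easier to understand than the reverse.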
The order of if/else blocks
You are usually free to choose the order of the if/else blocks; both of the following are acceptable:
if (a == b) {
// Case One ...<
Last updated: 2017-04-03 21:30:11