

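The Art of Readable Code

These are my reading notes on "The Art of Readable Code", plus a few observations of my own. I highly recommend the book.

Why code should be easy to understand

“Code should be written to minimize the time it would take for someone else to understand it.”

Some facts of day-to-day work:

  • Thinking before writing code and reading existing code take far more time than the writing itself
  • Reading code is routine, whether it is someone else's or your own; code you wrote six months ago might as well be someone else's
  • When code is readable, you can grasp the program's logic quickly and get to work
  • Fewer lines of code are not necessarily easier to understand
  • Readability does not conflict with efficiency, good architecture, or testability

The whole book is built around one goal: making code more readable. Readability is also one of the key criteria for good code.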

How to name things

Pack more information into names

Use words with a precise meaning, for example download rather than get. Consider these alternatives:

    send  -> deliver, dispatch, announce, distribute, route
    find  -> search, extract, locate, recover
    start -> launch, create, begin, open
    make  -> create, set up, build, generate, compose, add, new

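Avoid generic words

Names like tmp and retval say nothing beyond "this is a temporary variable" or "this is the return value". Adding a meaningful word makes the purpose clear:

    tmp_file = tempfile.NamedTemporaryFile()
    ...
    SaveData(tmp_file, ...)

Instead of retval, use a name that describes what the value actually represents. With a descriptive name, a bug like this one stands out:

    sum_squares += v[i];  // Where's the "square" that we're summing? Bug!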
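
As a small Python sketch of the same idea (the function name sum_of_squares and its input are made up for illustration), a descriptive accumulator name lets the reader check each line against its stated intent:

```python
def sum_of_squares(values):
    """Return the sum of the squares of the given numbers."""
    sum_squares = 0.0
    for value in values:
        # The name promises squares, so adding a bare `value` here
        # would look wrong at a glance.
        sum_squares += value * value
    return sum_squares


print(sum_of_squares([1, 2, 3]))  # 1 + 4 + 9 = 14.0
```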

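In nested for loops, the indices i, j, and k can be just as confusing:

    for (int i = 0; i < clubs.size(); i++)
        for (int j = 0; j < clubs[i].members.size(); j++)
            for (int k = 0; k < users.size(); k++)
                if (clubs[i].members[k] == users[j])
                    cout << "user[" << j << "] is in club[" << i << "]" << endl;

The indices in the if statement are swapped (j runs over members and k over users, so it should read members[j] and users[k]), but that is hard to spot. With more descriptive index names, a mismatch is obvious at a glance:

    if (clubs[ci].members[mi] == users[ui])  // OK. First letters match.

So you should have a good reason before using a generic name.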

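Use concrete names

CanListenOnPort is better than ServerCanStart: "can start" is vague, while "listen on port" says exactly what the method checks.

Likewise, --run_locally is less clear than --extra_logging.

Attach important details, such as a unit suffix like _ms, or _raw for an unprocessed string

If some detail about a variable is important, spending a few extra characters in the name to encode it makes the code easier to read. For example, replace string id; // Example: "af84ef845cd8" with string hex_id;

    Start(int delay)                 delay -> delay_secs
    CreateCache(int size)            size  -> size_mb
    ThrottleDownload(float limit)    limit -> max_kbps
    Rotate(float angle)              angle -> degrees_cw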
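
A minimal Python sketch of why the unit suffix helps; the helper names secs_to_ms and is_slow and the 200 ms threshold are assumptions made for this example, not something from the book. With the unit in every name, a seconds value passed where milliseconds are expected is easy to notice:

```python
def secs_to_ms(duration_secs):
    """Convert a duration in seconds to milliseconds."""
    return duration_secs * 1000.0


def is_slow(elapsed_ms, threshold_ms=200.0):
    """Both values are in milliseconds, as their names say."""
    return elapsed_ms > threshold_ms


elapsed_secs = 0.35
# Passing elapsed_secs straight to is_slow() would visibly mix units.
print(is_slow(secs_to_ms(elapsed_secs)))  # True
```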

More examples:

    password -> plaintext_password
    comment  -> unescaped_comment
    html     -> html_utf8
    data     -> data_urlenc

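Use longer names for larger scopes

Short names are fine in a small scope. A variable used across a larger scope deserves a longer name, and editor auto-completion keeps the extra typing to a minimum. For abbreviations and prefixes, stick to widely known ones (such as str). One test: would a new team member understand the abbreviation without asking anyone?

Use characters such as _ and - deliberately to convey meaning, for example a _ prefix on private members.

    var x = new DatePicker();  // DatePicker() is a "constructor" function, so it starts with a capital
    var y = pageHeight();      // pageHeight() is an ordinary function

    var $all_images = $("img");  // $all_images is a jQuery object
    var height = 250;            // height is not

    // Write ids and classes differently: underscores in ids, hyphens in classes
    <div id="middle_column" class="main-content"> ...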

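Names must not be ambiguous

Before settling on a name, ask whether the word could be read some other way. For example:

    results = Database.all_objects.filter("year <= 2011")

Does results now contain the objects with year <= 2011, or the ones left over after those were filtered out? The name filter does not say.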

Use min and max instead of limit

    CART_TOO_BIG_LIMIT = 10
    if shopping_cart.num_items() >= CART_TOO_BIG_LIMIT:
        Error("Too many items in cart.")

    MAX_ITEMS_IN_CART = 10
    if shopping_cart.num_items() > MAX_ITEMS_IN_CART:
        Error("Too many items in cart.")

Compare CART_TOO_BIG_LIMIT with MAX_ITEMS_IN_CART. "Limit" does not say whether a cart of exactly 10 items is allowed; "max" makes it clear that 10 is the largest allowed value.
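
A small runnable sketch of the second variant, assuming a hypothetical helper cart_is_too_big; the point is that the boundary case (exactly 10 items) behaves the way the name MAX_ITEMS_IN_CART suggests:

```python
MAX_ITEMS_IN_CART = 10  # inclusive: a cart may hold up to 10 items


def cart_is_too_big(num_items):
    """Return True only when the cart exceeds the maximum."""
    return num_items > MAX_ITEMS_IN_CART


print(cart_is_too_big(10))  # False
print(cart_is_too_big(11))  # True
```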

Use first and last for inclusive ranges

    print integer_range(start=2, stop=4)
    # Does this print [2,3] or [2,3,4] (or something else)?

    set.PrintKeys(first="Bart", last="Maggie")

The meaning of first and last is unambiguous, which makes them a good fit for inclusive (closed) ranges.

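Use begin and end for half-open ranges (inclusive start, exclusive end, such as [2, 9))

    PrintEventsInRange("OCT 16 12:00am", "OCT 17 12:00am")

    PrintEventsInRange("OCT 16 12:00am", "OCT 16 11:59:59.9999pm")

The first call is much easier to write and read than the second.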
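
A minimal Python sketch of a half-open range check; the function events_in_range, the event names, and the year are invented for this example. Because end is exclusive, "all of October 16" is simply the range from midnight to the next midnight:

```python
from datetime import datetime


def events_in_range(events, begin, end):
    """Return names of events with begin <= timestamp < end (half-open)."""
    return [name for name, timestamp in events if begin <= timestamp < end]


events = [
    ("backup", datetime(2023, 10, 16, 0, 0)),
    ("deploy", datetime(2023, 10, 16, 23, 59, 59)),
    ("report", datetime(2023, 10, 17, 0, 0)),
]

oct_16 = events_in_range(events, datetime(2023, 10, 16), datetime(2023, 10, 17))
print(oct_16)  # ['backup', 'deploy']
```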

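Naming Boolean variables

    bool read_password = true;

This name is dangerous: does it mean the password still needs to be read, or that it has already been read? There is no way to tell, so a name such as user_is_authenticated is a better choice. In general, adding is, has, can, or should to a Boolean name makes its meaning clearer, and positive names read better than negated ones:

    SpaceLeft()               -->  hasSpaceLeft()
    bool disable_ssl = false  -->  bool use_ssl = true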

Match users' expectations

    public class StatisticsCollector {
        public void addSample(double x) { ... }
        public double getMean() {
            // Iterate through all samples and return total / num_samples
        }
        ...
    }

Here getMean iterates over every sample to compute the average, so it is not the lightweight accessor that a "get" method is usually expected to be. computeMean would be a more fitting name.
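
A Python sketch of the renamed method (a loose translation of the Java outline above, with the storage details assumed). The name compute_mean signals that each call walks all samples:

```python
class StatisticsCollector:
    def __init__(self):
        self._samples = []

    def add_sample(self, x):
        self._samples.append(x)

    def compute_mean(self):
        """Walk every sample on each call: O(n), and the name says so."""
        if not self._samples:
            raise ValueError("no samples added yet")
        return sum(self._samples) / len(self._samples)


collector = StatisticsCollector()
for sample in (1.0, 2.0, 3.0):
    collector.add_sample(sample)
print(collector.compute_mean())  # 2.0
```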

Aesthetics

Well-formatted code is pleasing to look at and correspondingly easier to read. Compare the two examples below:

    class StatsKeeper {
    public:
    // A class for keeping track of a series of doubles
    void Add(double d); // and methods for quick statistics about them
    private: int count; /* how many so far
    */ public:
    double Average();
    private: double minimum;
    list<double>
    past_items
    ;double maximum;
    };

And here is the cleaner version:

    // A class for keeping track of a series of doubles
    // and methods for quick statistics about them.
    class StatsKeeper {
      public:
        void Add(double d);
        double Average();
      private:
        list<double> past_items;
        int count;  // how many so far
        double minimum;
        double maximum;
    };

Make line breaks consistent and compact

The code below needs line breaks to stay within 80 characters per line, and its arguments need explanatory comments:

    public class PerformanceTester {
        public static final TcpConnectionSimulator wifi = new TcpConnectionSimulator(
            500, /* Kbps */
            80, /* millisecs latency */
            200, /* jitter */
            1 /* packet loss % */);

        public static final TcpConnectionSimulator t3_fiber = new TcpConnectionSimulator(
            45000, /* Kbps */
            10, /* millisecs latency */
            0, /* jitter */
            0 /* packet loss % */);

        public static final TcpConnectionSimulator cell = new TcpConnectionSimulator(
            100, /* Kbps */
            400, /* millisecs latency */
            250, /* jitter */
            5 /* packet loss % */);
    }

With consistency in mind, first tidy it up like this:

    public class PerformanceTester {
        public static final TcpConnectionSimulator wifi =
            new TcpConnectionSimulator(
                500, /* Kbps */
                80, /* millisecs latency */
                200, /* jitter */
                1 /* packet loss % */);

        public static final TcpConnectionSimulator t3_fiber =
            new TcpConnectionSimulator(
                45000, /* Kbps */
                10, /* millisecs latency */
                0, /* jitter */
                0 /* packet loss % */);

        public static final TcpConnectionSimulator cell =
            new TcpConnectionSimulator(
                100, /* Kbps */
                400, /* millisecs latency */
                250, /* jitter */
                5 /* packet loss % */);
    }

That is more consistent, but still too verbose and takes up a lot of vertical space. Moving the parameter comments to the top lets each call fit on one line:

    public class PerformanceTester {
        // TcpConnectionSimulator(throughput, latency, jitter, packet_loss)
        //                          [Kbps]     [ms]     [ms]   [percent]
        public static final TcpConnectionSimulator wifi =
            new TcpConnectionSimulator(500, 80, 200, 1);

        public static final TcpConnectionSimulator t3_fiber =
            new TcpConnectionSimulator(45000, 10, 0, 0);

        public static final TcpConnectionSimulator cell =
            new TcpConnectionSimulator(100, 400, 250, 5);
    }

Use helper functions

    // Turn a partial_name like "Doug Adams" into "Mr. Douglas Adams".
    // If not possible, 'error' is filled with an explanation.
    string ExpandFullName(DatabaseConnection dc, string partial_name, string* error);

    DatabaseConnection database_connection;
    string error;
    assert(ExpandFullName(database_connection, "Doug Adams", &error)
           == "Mr. Douglas Adams");
    assert(error == "");
    assert(ExpandFullName(database_connection, " Jake Brown ", &error)
           == "Mr. Jacob Brown III");
    assert(error == "");
    assert(ExpandFullName(database_connection, "No Such Guy", &error) == "");
    assert(error == "no match found");
    assert(ExpandFullName(database_connection, "John", &error) == "");
    assert(error == "more than one result");

This code looks messy and is full of repetition. A helper function cleans it up:

    CheckFullName("Doug Adams", "Mr. Douglas Adams", "");
    CheckFullName(" Jake Brown ", "Mr. Jacob Brown III", "");
    CheckFullName("No Such Guy", "", "no match found");
    CheckFullName("John", "", "more than one result");

    void CheckFullName(string partial_name,
                       string expected_full_name,
                       string expected_error) {
        // database_connection is now a class member
        string error;
        string full_name = ExpandFullName(database_connection, partial_name, &error);
        assert(error == expected_error);
        assert(full_name == expected_full_name);
    }
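
The same refactoring can be sketched in Python. Everything here is a stand-in: expand_full_name and its tiny in-memory name table are hypothetical replacements for the real database lookup, included only so the helper check_full_name has something to run against:

```python
# Hypothetical in-memory replacement for the database used above.
_KNOWN_NAMES = {
    "doug adams": ["Mr. Douglas Adams"],
    "jake brown": ["Mr. Jacob Brown III"],
    "john": ["Mr. John Smith", "Dr. John Doe"],
}


def expand_full_name(partial_name):
    """Return (full_name, error); full_name is "" when the lookup fails."""
    matches = _KNOWN_NAMES.get(partial_name.strip().lower(), [])
    if not matches:
        return "", "no match found"
    if len(matches) > 1:
        return "", "more than one result"
    return matches[0], ""


def check_full_name(partial_name, expected_full_name, expected_error):
    """Test helper: one call replaces a lookup plus two assertions."""
    full_name, error = expand_full_name(partial_name)
    assert error == expected_error, (partial_name, error)
    assert full_name == expected_full_name, (partial_name, full_name)


check_full_name("Doug Adams", "Mr. Douglas Adams", "")
check_full_name(" Jake Brown ", "Mr. Jacob Brown III", "")
check_full_name("No Such Guy", "", "no match found")
check_full_name("John", "", "more than one result")
```

Each test case now fits on one line, so adding another case costs one line rather than three.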

Column alignment

Aligning code into columns makes it easier on the eyes:

    CheckFullName("Doug Adams"  , "Mr. Douglas Adams"  , "");
    CheckFullName(" Jake Brown ", "Mr. Jacob Brown III", "");
    CheckFullName("No Such Guy" , ""                   , "no match found");
    CheckFullName("John"        , ""                   , "more than one result");

    commands[] = {
        ...
        { "timeout",      NULL,              cmd_spec_timeout},
        { "timestamping", &opt.timestamping, cmd_boolean},
        { "tries",        &opt.ntry,         cmd_number_inf},
        { "useproxy",     &opt.use_proxy,    cmd_boolean},
        { "useragent",    NULL,              cmd_spec_useragent},
        ...
    };

Organize code into blocks

    class FrontendServer {
      public:
        FrontendServer();
        void ViewProfile(HttpRequest* request);
        void OpenDatabase(string location, string user);
        void SaveProfile(HttpRequest* request);
        string ExtractQueryParam(HttpRequest* request, string param);
        void ReplyOK(HttpRequest* request, string html);
        void FindFriends(HttpRequest* request);
        void ReplyNotFound(HttpRequest* request, string error);
        void CloseDatabase(string location);
        ~FrontendServer();
    };

The code above is readable enough, but it can be improved by grouping related declarations:

    class FrontendServer {
      public:
        FrontendServer();
        ~FrontendServer();

        // Handlers
        void ViewProfile(HttpRequest* request);
        void SaveProfile(HttpRequest* request);
        void FindFriends(HttpRequest* request);

        // Request/Reply Utilities
        string ExtractQueryParam(HttpRequest* request, string param);
        void ReplyOK(HttpRequest* request, string html);
        void ReplyNotFound(HttpRequest* request, string error);

        // Database Helpers
        void OpenDatabase(string location, string user);
        void CloseDatabase(string location);
    };

Here is another piece of code:

    # Import the user's email contacts, and match them to users in our system.
    # Then display a list of those users that he/she isn't already friends with.
    def suggest_new_friends(user, email_password):
        friends = user.friends()
        friend_emails = set(f.email for f in friends)
        contacts = import_contacts(user.email, email_password)
        contact_emails = set(c.email for c in contacts)
        non_friend_emails = contact_emails - friend_emails
        suggested_friends = User.objects.select(email__in=non_friend_emails)
        display['user'] = user
        display['friends'] = friends
        display['suggested_friends'] = suggested_friends
        return render("suggested_friends.html", display)

Everything runs together, which is tiring to read. Split it into blocks by function:

    def suggest_new_friends(user, email_password):
        # Get the user's friends' email addresses.
        friends = user.friends()
        friend_emails = set(f.email for f in friends)

        # Import all email addresses from this user's email account.
        contacts = import_contacts(user.email, email_password)
        contact_emails = set(c.email for c in contacts)

        # Find matching users that they aren't already friends with.
        non_friend_emails = contact_emails - friend_emails
        suggested_friends = User.objects.select(email__in=non_friend_emails)

        # Display these lists on the page.
        display['user'] = user
        display['friends'] = friends
        display['suggested_friends'] = suggested_friends

        return render("suggested_friends.html", display)

Making code pleasant to read takes attention while you write and a few good habits. This matters most in a team: a style choice such as brace placement has no right or wrong answer, but ignoring the team's conventions is wrong.

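How to write comments

While writing code you have a lot of context in your head, but the reader ends up with only the code itself; everything else is lost. The purpose of comments is to pass that extra information on to the reader.

What to comment

What not to comment

Comments like these are worthless:

    // The class definition for Account
    class Account {
      public:
        // Constructor
        Account();
        // Set the profit member to a new value
        void SetProfit(double profit);
        // Return the profit from this Account
        double GetProfit();
    };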

Don't comment just for the sake of commenting. A comment on a declaration should add details that the signature alone does not reveal, as this one does:

    // Find a Node with the given 'name' or return NULL.
    // If depth <= 0, only 'subtree' is inspected.
    // If depth == N, only 'subtree' and N levels below are inspected.
    Node* FindNodeInSubtree(Node* subtree, string name, int depth);

Don't comment bad names; fix the names instead

    // Enforce limits on the Reply as stated in the Request,
    // such as the number of items returned, or total byte size, etc.
    void CleanReply(Request request, Reply reply);

Most of the comment is spent explaining what "clean" means. It is better to choose an accurate name:

    // Make sure 'reply' meets the count/byte/etc. limits from the 'request'
    void EnforceLimitsFromRequest(Request request, Reply reply);

Record your thoughts

So much for what not to comment. What should you comment? Comments should record the insights you had while working out how to write the code, for example:

    // Surprisingly, a binary tree was 40% faster than a hash table for this data.
    // The cost of computing a hash was more than the left/right comparisons.

    // This heuristic might miss a few words. That's OK; solving this 100% is hard.

    // This class is getting messy. Maybe we should create a 'ResourceNode' subclass to
    // help organize things.

Comments can also record unfinished work and explain constants:

    // TODO: use a faster algorithm
    // TODO(dustin): handle other image formats besides JPEG

    NUM_THREADS = 8 # as long as it's >= 2 * num_processors, that's good enough.

    // Impose a reasonable limit - no human can read that much anyway.
    const int MAX_RSS_SUBSCRIPTIONS = 1000;

Commonly used markers:

    TODO  : Stuff I haven't gotten around to yet
    FIXME : Known-broken code here
    HACK  : Admittedly inelegant solution to a problem
    XXX   : Danger! Major problem here

Put yourself in the reader's shoes

Whatever makes other people stop and wonder when they read your code is what you should comment.

    struct Recorder {
        vector<float> data;
        ...
        void Clear() {
            vector<float>().swap(data);  // Huh? Why not just data.clear()?
        }
    };

Many C++ programmers who see this will wonder why the code does not simply call data.clear() instead of swapping with an empty vector, so that line deserves a comment:

    // Force vector to relinquish its memory (look up "STL swap trick")
    vector<float>().swap(data);

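Point out likely pitfalls

If your code relies on a hack, or has some other trap that readers need to know about, say so in a comment:

    void SendEmail(string to, string subject, string body);

In reality this function calls an external service to send the email, with a timeout, so it needs a comment:

    // Calls an external service to deliver email. (Times out after 1 minute.)
    void SendEmail(string to, string subject, string body);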

Big-picture comments

Sometimes it helps to comment a whole file so that the reader gets an overall picture:

    // This file contains helper functions that provide a more convenient interface to our
    // file system. It handles file permissions and other nitty-gritty details.

Summary comments

Even inside a function, a comment can summarize a block the way a file comment summarizes a file:

    # Find all the items that customers purchased for themselves.
    for customer_id in all_customers:
        for sale in all_sales[customer_id].sales:
            if sale.recipient == customer_id:
                ...

Or write a comment for each step of the function:

    def GenerateUserReport():
        # Acquire a lock for this user
        ...
        # Read user's info from the database
        ...
        # Write info to a file
        ...
        # Release the lock for this user

Many people are reluctant to write comments, and writing good ones is indeed not easy. One option is to set aside a dedicated place in the file for comments, where you can write down whatever you want to say.

Comments should be concise and precise

The previous section covered what to comment; this one covers how. Comments matter, so they should be precise; they also take up screen space, so they should be compact.

Keep comments compact

    // The int is the CategoryType.
    // The first float in the inner pair is the 'score',
    // the second is the 'weight'.
    typedef hash_map<int, pair<float, float> > ScoreMap;

That is too wordy. Compress it as far as possible:

    // CategoryType -> (score, weight)
    typedef hash_map<int, pair<float, float> > ScoreMap;

Avoid ambiguous pronouns

    // Insert the data into the cache, but check if it's too big first.

Here "it" is ambiguous: it could refer to the data or to the cache. Rewrite it as:

    // Insert the data into the cache, but check if the data is too big first.

An even better fix restructures the sentence so that "it" has exactly one possible referent:

    // If the data is small enough, insert it into the cache.

Polish sloppy sentences

    # Depending on whether we've already crawled this URL before, give it a different priority.

This sentence takes effort to understand. The following version is much easier:

    # Give higher priority to URLs we've never crawled before.

Describe function behavior precisely

    // Return the number of lines in this file.
    int CountLines(string filename) { ... }

A caller may be left guessing, because "line" can be interpreted in several ways:

  • "" (an empty file): 0 lines or 1?
  • "hello": a single line, but is the result 0 or 1?
  • "hello\n": 1 or 2?
  • "hello\n world": 1 or 2?
  • "hello\n\r cruel\n world\r": 2, 3, or 4?

So the comment should read:

    // Count how many newline bytes ('\n') are in the file.
    int CountLines(string filename) { ... }
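
A minimal Python sketch of that definition, assuming the file contents are already in memory as bytes (the name count_newlines is made up here). Running it over the cases listed above shows that the precise wording settles every one of them:

```python
def count_newlines(data: bytes) -> int:
    """Count how many newline bytes (b'\\n') are in data."""
    return data.count(b"\n")


samples = [
    b"",
    b"hello",
    b"hello\n",
    b"hello\n world",
    b"hello\n\r cruel\n world\r",
]
for sample in samples:
    print(repr(sample), count_newlines(sample))
# The counts are 0, 0, 1, 1, 2: carriage returns are not counted.
```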

Use examples to illustrate corner cases

    // Rearrange 'v' so that elements < pivot come before those >= pivot;
    // Then return the largest 'i' for which v[i] < pivot (or -1 if none are < pivot)
    int Partition(vector<int>* v, int pivot);

This description is precise, but adding an example makes it even better:

    // ...
    // Example: Partition([8 5 9 8 2], 8) might result in [5 2 | 8 9 8] and return 1
    int Partition(vector<int>* v, int pivot);
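
One possible implementation that satisfies this comment, sketched in Python. It is only one of many valid ones, which is why the comment says "might result in": the order within each side is not specified, and this version happens to produce a different arrangement than the one in the comment.

```python
def partition(v, pivot):
    """Rearrange v in place so elements < pivot precede those >= pivot.

    Returns the largest index i with v[i] < pivot, or -1 if none are < pivot.
    """
    boundary = 0  # v[:boundary] holds the elements < pivot found so far
    for i in range(len(v)):
        if v[i] < pivot:
            v[i], v[boundary] = v[boundary], v[i]
            boundary += 1
    return boundary - 1


values = [8, 5, 9, 8, 2]
last_small = partition(values, 8)
print(values, last_small)  # [5, 2, 9, 8, 8] 1
```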

State the real intent of your code

    void DisplayProducts(list<Product> products) {
        products.sort(CompareProductByPrice);
        // Iterate through the list in reverse order
        for (list<Product>::reverse_iterator it = products.rbegin(); it != products.rend();
                ++it)
            DisplayPrice(it->price);
        ...
    }

The comment says the list is traversed in reverse, but that only restates the code and does not say why. It should describe the intent:

    // Display each price, from highest to lowest
    for (list<Product>::reverse_iterator it = products.rbegin(); ... )

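Naming arguments at the call site

A call like this is bound to leave the reader puzzled:

    Connect(10, false);

In a language with named arguments, such as Python, spelling out the parameter names at the call site makes it much clearer:

    def Connect(timeout, use_encryption): ...

    # Call the function using named parameters
    Connect(timeout = 10, use_encryption = False)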

Use information-dense words

    // This class contains a number of members that store the same information as in the
    // database, but are stored here for speed. When this class is read from later, those
    // members are checked first to see if they exist, and if so are returned; otherwise the
    // database is read from and that data stored in those fields for next time.

This long comment explains things clearly, but a single well-known term says the same thing and leaves no doubt:

    // This class acts as a caching layer to the database.
   

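Simplifying loops and logic

Keep control flow simple

The goal of this section is to make conditionals, loops, and other control-flow code as natural as possible, so that the reader never has to stop and think or go back and reread.

Order of arguments in conditionals

Compare these two ways of writing the same conditions:

    if (length >= 10)
    while (bytes_received < bytes_expected)

    if (10 <= length)
    while (bytes_expected > bytes_received)

Should you always follow a fixed greater-than/less-than order, or is there some other guideline? There is: order the operands by their role.

  • Left-hand side: usually the value being tested, the one that changes more
  • Right-hand side: usually the value it is compared against, which is more constant

This explains why bytes_received < bytes_expected is easier to understand than the reverse.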
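
A short Python sketch of the guideline in a loop; the function receive and its chunked input are assumptions for illustration. The changing value bytes_received sits on the left of the comparison and the stable bytes_expected on the right:

```python
def receive(chunks, bytes_expected):
    """Consume chunks until at least bytes_expected bytes have arrived."""
    chunk_iter = iter(chunks)
    bytes_received = 0
    # Changing value on the left, stable value on the right.
    while bytes_received < bytes_expected:
        chunk = next(chunk_iter, None)
        if chunk is None:
            break  # the source ran dry before everything arrived
        bytes_received += len(chunk)
    return bytes_received


print(receive([b"ab", b"cd", b"ef"], bytes_expected=3))  # 4
```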

The order of if/else blocks

You are usually free to choose the order of an if/else; both of the following work:

    if (a == b) {
        // Case One ...

    Last updated: 2017-04-03 21:30:11
