Descendant Selectors: A Closer Look at the Power of the CSS Descendant Selector

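In CSS, the descendant selector is a powerful selector type: it matches elements nested anywhere inside a given ancestor, including children, their children, and so on. (The child selector, written with >, matches direct children only.) With descendant selectors you can build complex styles and state precisely which elements a rule applies to. This article looks at the advantages of descendant selectors and some practical ways to use them.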

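Advantages:

Precise selection: descendant selectors give you good control over which elements are styled, because they only match elements nested inside a specific ancestor.
More complex styles: descendant selectors make more complex styling possible, because they can reach nested elements directly.
Combination with other selectors: descendant selectors can be combined with other CSS selectors (such as class, ID, and attribute selectors) to give your site or application finer-grained selection and a clearer style structure.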

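Practical uses:

Reducing repeated code by chaining basic selectors: by nesting one selector inside another, you can target more complex elements with less code and keep your CSS compact.
Conditional styles: descendant selectors can express conditional styles, for example for particular list items or table cells, which makes the page more customizable.
Styling nested structures: descendant selectors can be used for nested patterns, for example to style nested lists or to set margins on child elements inside a parent element.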

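Key points when writing CSS with descendant selectors:

Understand the depth: a descendant selector matches at any nesting depth, so you need a clear picture of your HTML structure.
Avoid unintended matches: a descendant selector can match more elements than you expect, so styles may spread to elements you did not mean to style.
Scalability and maintainability: to keep CSS scalable and maintainable, prefer class selectors over long selector chains.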

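Summary: throughout the browser's rendering process (page initialization, user actions that change styles, animations that change styles, and so on), reflow and repaint have a large impact on web performance, especially on mobile pages. When designing a page we should therefore keep reflow and repaint to a minimum.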

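What are reflow and repaint (original link: http://www.cnblogs.com/Peng2014/p/4687218.html)

reflow: when the style of a child element changes in a way that directly affects its parent and possibly many ancestors (and siblings), the browser has to lay out all elements related to that child again. This process is called reflow.

reflow is almost impossible to avoid. Popular UI effects such as collapsing and expanding a tree (which is really showing and hiding elements) all cause a reflow. Mouse hover, click, and so on: whenever such an action changes the occupied area, positioning, margins, or similar properties of some elements, the browser re-renders their contents, their surroundings, or even the whole page. We usually cannot predict which part of the page the browser will reflow, because the parts all affect each other.

repaint: if only properties that do not affect the layout around or inside an element change, such as background color, text color, or border color, the browser only repaints. A repaint is noticeably faster than a reflow.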

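The following cases trigger a reflow:

1: Resizing the window

2: Changing the font size

3: Changing content, for example when the user types into an input field

4: Activating a pseudo-class such as :hover

5: Manipulating the class attribute

6: Manipulating the DOM from script

7: Reading offsetWidth and offsetHeight

8: Setting the style attribute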

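So what should we watch out for to reduce reflow?

1: Do not change a child element's style through its parent; change the child's style directly, and try not to affect the size of its parent and siblings when you do.

2: Style elements through classes wherever possible rather than through individual style assignments.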

var bstyle = document.body.style; // cache
 
bstyle.padding = "20px"; // reflow, repaint
bstyle.border = "10px solid red"; // another reflow and repaint
 
bstyle.color = "blue"; // repaint
bstyle.backgroundColor = "#fad"; // repaint
 
bstyle.fontSize = "2em"; // reflow, repaint
 
// new DOM element - reflow, repaint
document.body.appendChild(document.createTextNode('dude!'));

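Optimized version of the code above: define the styles once in CSS and apply them with a single class change.

/* CSS */
.b-class {
  padding: 20px;
  color: blue;
  border: 10px solid red;
  background-color: #fad;
  font-size: 2em;
}
// JavaScript (jQuery)
$div.addClass("b-class");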

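3: For animated elements and components that reflow often, take them out of the normal flow by setting their position to fixed or absolute.

4: Trade smoothness for speed. For example, moving an animation 1 pixel at a time is the smoothest, but it triggers reflow so often that the CPU is quickly saturated. Moving 3 pixels at a time is much cheaper.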

5: Another reason not to use tables for layout is that once one element in a table triggers a reflow, all the other elements in the table reflow too. Where a table is appropriate, set table-layout to fixed so that the table can be rendered row by row, which also limits the scope of a reflow.

6: Do not use expression() in CSS.

7: Reduce unnecessary DOM depth. A change at one level of the DOM tree can cause changes at every level, up to the root and down to the children of the changed node, so a lot of time is spent on reflow.

8: Avoid unnecessarily complex CSS selectors, especially descendant selectors, because matching them costs more CPU.

9: Avoid frequently adding, modifying, and removing elements, since each change may reflow the page. Detach the DOM node, do the complex work in memory, and then put it back on the page.

For example, to add div.second inside div.first and div.third inside div.second:

var $divS = $("<div class='second'></div>");
$("div.first").append($divS); // reflow
var $divT = $("<div class='third'></div>");
$divS.append($divT); // reflow

Optimized:

var $divS = $("<div class='second'></div>");
var $divT = $("<div class='third'></div>");
$divS.append($divT);
$("div.first").append($divS); // reflow

Or:

var $divF = $("div.first");
var $divS = $("<div class='second'></div>");
$divS.hide();
$divF.append($divS);
var $divT = $("<div class='third'></div>");
$divS.append($divT);
$divS.show(); // reflow

10: Reading any of offsetTop, offsetLeft, offsetWidth, offsetHeight, scrollTop/Left/Width/Height, or clientTop/Left/Width/Height forces the browser to reflow. Group these reads together to reduce the number of reflows.

If you need to read and use these values often, cache them first, for example:

var windowHeight = window.innerHeight; // reflow
for (var i = 0; i < 10; i++) {
  $body.height(windowHeight++);
  // ... a series of operations on windowHeight ...
}

Written down for work, as a study record.

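1. Starting from a production issue

Recently we saw a number of hangs in production inside [HMDConfigManager remoteConfigWithAppID:].

1.1 Initial analysis

The main thread's stack shows that the lock involved is a read-write lock.

Looking at the child threads that held the lock, we found a variety of situations, and almost all of them were executing normally: some were opening a file, some were in read, some were running NSUserDefaults methods, and so on.

The threads involved all carried the QOS:BACKGROUND label. Overall, the child thread holding the lock was still making progress, but not enough time was left for the main thread. Why would these child threads take so long while holding the lock that the main thread hit the 8 s hang threshold? Either the work really is that slow, or a priority inversion occurred.

1.2 Solutions

In this case, the low-priority thread that holds the read-write lock is not scheduled for a long time (or is preempted again once scheduled, or is scheduled too late), while the high-priority thread cannot acquire the lock and stays blocked. To the user, the main thread appears to be deadlocked. iOS 8 introduced QualityOfService, a concept similar to thread priority: depending on the QualityOfService value, the system allocates different amounts of CPU time, network resources, disk resources, and so on, so we can use it to set the priority of a queue.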

1.2.1 Option 1: remove the priority setting on the NSOperationQueue

In the Threading Programming Guide, Apple gives this warning:

Important: It is generally a good idea to leave the priorities of your threads at their default values. Increasing the priorities of some threads also increases the likelihood of starvation among lower-priority threads. If your application contains high-priority and low-priority threads that must interact with each other, the starvation of lower-priority threads may block other threads and create performance bottlenecks.

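Apple's advice is not to change thread priorities casually, especially when high- and low-priority threads compete for a shared critical resource. Deleting the code that sets the priority therefore solves the problem.

1.2.2 Option 2: temporarily change the thread priority

The pthread_rwlock_rdlock(3pthread) manual page contains the following note: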

Realtime applications may encounter priority inversion when using read-write locks. The problem occurs when a high priority thread "locks" a read-write lock that is about to be "unlocked" by a low priority thread, but the low priority thread is preempted by a medium priority thread. This scenario leads to priority inversion; a high priority thread is blocked by lower priority threads for an unlimited period of time. During system design, realtime programmers must take into account the possibility of this kind of priority inversion. They can deal with it in a number of ways, such as by having critical sections that are guarded by read-write locks execute at a high priority, so that a thread cannot be preempted while executing in its critical section.

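Although the note targets real-time systems, it still offers useful guidance. Following it, we changed the problematic code: once a thread has acquired _rwlock through pthread_rwlock_wrlock, its priority is raised temporarily, and after it releases _rwlock its original priority is restored.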

- (id)remoteConfigWithAppID:(NSString *)appID
{
    .......
    pthread_rwlock_rdlock(&_rwlock);
    HMDHeimdallrConfig *result = ....... // get existing config
    pthread_rwlock_unlock(&_rwlock);
    
    if(result == nil) {
        result = [[HMDHeimdallrConfig alloc] init]; // make a new config
        pthread_rwlock_wrlock(&_rwlock);
        
        qos_class_t oldQos = qos_class_self();
        BOOL needRecover = NO;
        
        // Temporarily raise the thread priority
        if (_enablePriorityInversionProtection && oldQos < QOS_CLASS_USER_INTERACTIVE) {
            int ret = pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0);
            needRecover = (ret == 0);
        }
            
        ......

        pthread_rwlock_unlock(&_rwlock);
        
        // Restore the thread priority
        if (_enablePriorityInversionProtection && needRecover) {
            pthread_set_qos_class_self_np(oldQos, 0);
        }
    }
    
    return result;
}

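Note that only the pthread API works here; the API provided by NSThread does not.

1.3 Demo verification

To check whether manually adjusting the thread priority helps, we ran a local experiment: 2000 operations are created (to keep the CPU busy) with NSQualityOfServiceUserInitiated, the operations whose index is divisible by 100 are switched to NSQualityOfServiceBackground, every operation runs the same time-consuming task, and the elapsed time of the selected operations (those with j % 100 == 0) is recorded.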

for (int j = 0; j < 2000; ++j) {
    NSOperationQueue *operation = [[NSOperationQueue alloc] init];
    operation.maxConcurrentOperationCount = 1;
    operation.qualityOfService = NSQualityOfServiceUserInitiated;
    
    // Block 1: lower the priority of every 100th operation
    // if (j % 100 == 0) {
    //    operation.qualityOfService = NSQualityOfServiceBackground;
    // }
    // Block 1
    
    [operation addOperationWithBlock:^{
        // Block 2: manually raise the priority inside the task
        // qos_class_t oldQos = qos_class_self();
        // pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
        // Block 2
        
        NSTimeInterval start = CFAbsoluteTimeGetCurrent();
        double sum = 0;
        for (int i = 0; i < 100000; ++i) {
            sum += sin(i) + cos(i) + sin(i*2) + cos(i*2);
        }
        start = CFAbsoluteTimeGetCurrent() - start;
        if (j % 100 == 0) {
            printf("%.8f\n", start * 1000); // elapsed time in milliseconds
        }
        
        // Block 2: restore the previous priority
        // pthread_set_qos_class_self_np(oldQos, 0);
        // Block 2
    }];
}

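The statistics are shown in the table below:

We can see that:

Under normal conditions, the average time per task is 11.8190561 ms;
When the operation is set to low priority, the time rises sharply to 94.70210189 ms;
When the operation is set to low priority and its original priority is restored manually inside the block, the time drops sharply to 15.04005137 ms (still higher than the normal case; the reader may want to think about why).

The demo shows that manually adjusting the priority greatly reduces the total time of the low-priority tasks, so when they hold a lock, the main thread is blocked for a shorter time.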

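1.4 Results in production

The fix was verified in two phases:

Phase 1, marked by the first red box: starting March 6, on version 19.7, the number of hangs dropped considerably. The main reason is that the queue being waited on in the stack changed from QOS:BACKGROUND to com.apple.root.default-qos, that is, the queue priority was raised from QOS_CLASS_BACKGROUND to QOS_CLASS_DEFAULT. This amounts to Option 1, using the default priority.
Phase 2, marked by the second red box: verification started on April 24 on version 20.3. So far the effect is not obvious. One likely reason is that the demo raises the priority from QOS_CLASS_BACKGROUND to QOS_CLASS_USER_INITIATED, whereas in production the priority is raised from the default QOS_CLASS_DEFAULT to QOS_CLASS_USER_INITIATED.

a. The Mach-level priority of QOS_CLASS_BACKGROUND is 4;

b. The Mach-level priority of QOS_CLASS_DEFAULT is 31;

c. The Mach-level priority of QOS_CLASS_USER_INITIATED is 37.

So the improvement in production is comparatively limited.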

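2. Understanding priority inversion in depth

Does every lock require manually raising the priority of the thread that holds it, as above? Does the system adjust thread priorities automatically? If such a mechanism exists, does it cover every kind of lock? Answering these questions requires a thorough understanding of priority inversion.

2.1 What is priority inversion?

Priority inversion is the situation in which a synchronization resource is held by a lower-priority process/thread, a higher-priority process/thread competes for it and fails to acquire it, and as a result the higher-priority process/thread is scheduled later than it should be. Depending on the kind of blocking, priority inversion is divided into bounded priority inversion and unbounded priority inversion.

The figures from Introduction to RTOS - Solution to Part 11 are used for illustration here.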

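2.1.1 Bounded priority inversion

As the figure shows, the high-priority task (Task H) is blocked by the low-priority task (Task L) that holds the lock. The blocking time depends on how long the low-priority task stays in the critical section (how long it holds the lock), hence the name bounded priority inversion. As long as Task L holds the lock, Task H stays blocked; the low-priority task runs ahead of the high-priority task, and the priorities are inverted.

A task here can also be read as a thread.

2.1.2 Unbounded priority inversion

While Task L holds the lock, if a medium-priority task (Task M) interrupts Task L, the bounded inversion becomes unbounded: once Task M has preempted Task L's CPU, it may block Task H for an arbitrarily long time (and there may be more than one Task M).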

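2.2 Common approaches to priority inversion

There are currently two ways to solve unbounded priority inversion: the priority ceiling protocol and priority inheritance.

2.2.1 Priority ceiling protocol

In the priority ceiling scheme, the system associates each critical resource with a ceiling priority. When a task enters the critical section, the system passes this ceiling priority to the task, so that it has the highest priority among the tasks that use the resource; when the task leaves the critical section, the system immediately restores its normal priority. This ensures that priority inversion cannot occur. The ceiling priority is the highest priority among all tasks that need the critical resource.

As the figure shows, the lock's ceiling priority is 3. When Task L holds the lock, its priority is raised to 3, the same as Task H. This prevents Task M (priority 2) from running until Task L and Task H no longer need the lock.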
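The ceiling rule can be sketched in a few lines of C. This is only an illustration of the arithmetic, not how any real scheduler is implemented: the demo_task type and both function names are made up for this example, and a larger number means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record for illustration; larger number = higher priority. */
typedef struct {
    const char *name;
    int base_priority;
    int uses_resource;   /* 1 if the task ever locks the shared resource */
} demo_task;

/* Ceiling of a resource: the highest base priority among the tasks that use it. */
int resource_ceiling(const demo_task *tasks, size_t count) {
    assert(tasks != NULL || count == 0);
    int ceiling = 0;
    for (size_t i = 0; i < count; ++i) {
        if (tasks[i].uses_resource && tasks[i].base_priority > ceiling) {
            ceiling = tasks[i].base_priority;
        }
    }
    return ceiling;
}

/* Priority a task runs at: the ceiling while it holds the lock,
 * its own base priority otherwise. */
int effective_priority(const demo_task *task, int holds_lock, int ceiling) {
    assert(task != NULL);
    if (holds_lock && ceiling > task->base_priority) {
        return ceiling;
    }
    return task->base_priority;
}
```

With the three tasks from the figure (L = 1, M = 2, H = 3, and only L and H using the lock), resource_ceiling gives 3, so Task L runs at priority 3 while it holds the lock and Task M (priority 2) cannot preempt it.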

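2.2.2 Priority inheritance

In the priority inheritance scheme, the rough idea is: when a high-priority task tries to acquire a lock that is held by a low-priority task, the priority of the high-priority thread is temporarily transferred to the low-priority thread holding the lock, so that the low-priority thread can run sooner and release the resource. After the resource is released, the original priority is restored.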
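To make the difference concrete, here is a toy tick-based simulation, assuming a single CPU and a strict fixed-priority preemptive scheduler. Everything in it is hypothetical (the function name, the tick model, the arrival times): Task L takes the lock at tick 0, and Task H (which immediately blocks on the lock) and Task M both arrive at tick 1.

```c
#include <assert.h>

enum { PRIO_L = 1, PRIO_M = 2, PRIO_H = 3 };

/* Returns how many ticks Task H stays blocked on the lock.
 * cs_ticks: length of L's critical section, m_work: ticks of work M has,
 * inherit: non-zero to let L inherit H's priority while H is waiting. */
int h_blocked_ticks(int cs_ticks, int m_work, int inherit) {
    assert(cs_ticks >= 1 && m_work >= 0);
    int l_remaining = cs_ticks;   /* ticks L still needs inside the critical section */
    int m_remaining = m_work;
    int tick = 0;
    while (l_remaining > 0) {
        int others_arrived = (tick >= 1);              /* H is waiting, M is ready */
        int l_prio = (inherit && others_arrived) ? PRIO_H : PRIO_L;
        if (others_arrived && m_remaining > 0 && PRIO_M > l_prio) {
            --m_remaining;                             /* M preempts L */
        } else {
            --l_remaining;                             /* L makes progress */
        }
        ++tick;
    }
    return tick - 1;   /* H arrived at tick 1 and gets the lock when L finishes */
}
```

In this model, h_blocked_ticks(3, 5, 0) is 7 and h_blocked_ticks(3, 50, 0) is 52: without inheritance, H's wait grows with however much work M has. With inheritance both calls return 2, the remainder of L's critical section, which is the bounded part that inheritance cannot remove.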

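Both the priority ceiling protocol and priority inheritance restore the low-priority task's priority when the lock is released. Note also that both methods only prevent unbounded priority inversion; they cannot prevent bounded priority inversion (Task H must wait until Task L has left the critical section, and this inversion is unavoidable).

Bounded priority inversion can be avoided or mitigated in the following ways:

Shorten the critical section, which reduces the time lost to bounded priority inversion;
Avoid critical resources that can block high-priority tasks;
Use a dedicated queue to manage the resource instead of a lock.

Priority inheritance must be transitive. For example, suppose T1 is blocked on a resource held by T2, and T2 is in turn blocked on a resource held by T3. If T1's priority is higher than that of T2 and T3, T3 must inherit T1's priority through T2. Otherwise another thread T4, with a priority higher than T2 and T3 but lower than T1, would preempt T3 and cause a priority inversion with respect to T1. The priority a thread inherits must therefore be the highest priority among all threads it blocks, directly or indirectly.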
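The transitive rule can be sketched as follows. This is a minimal illustration under assumed data structures, not kernel code: demo_thread, NO_THREAD, and inherited_priority are invented names, and each thread is assumed to wait on at most one other thread.

```c
#include <assert.h>
#include <stddef.h>

#define NO_THREAD (-1)

/* Hypothetical thread record; larger number = higher priority. */
typedef struct {
    int base_priority;
    int blocked_on;   /* index of the thread holding the resource we wait for, or NO_THREAD */
} demo_thread;

/* Effective priority of thread idx: the maximum of its own base priority and
 * the base priority of every thread that is directly or indirectly blocked on it. */
int inherited_priority(const demo_thread *threads, size_t count, size_t idx) {
    assert(threads != NULL && idx < count);
    int best = threads[idx].base_priority;
    for (size_t i = 0; i < count; ++i) {
        /* Follow thread i's blocking chain; if it reaches idx, i donates its priority. */
        int cur = threads[i].blocked_on;
        size_t hops = 0;   /* guards against cycles in the blocking graph */
        while (cur != NO_THREAD && hops < count) {
            if ((size_t)cur == idx) {
                if (threads[i].base_priority > best) {
                    best = threads[i].base_priority;
                }
                break;
            }
            cur = threads[cur].blocked_on;
            ++hops;
        }
    }
    return best;
}
```

For the example above, with T1 (priority 4) blocked on T2 (priority 2), T2 blocked on T3 (priority 1), and T4 (priority 3) not blocked, both T2 and T3 end up with an effective priority of 4, so T4 can no longer preempt T3.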

3. How to avoid priority inversion?

3.1 QoS propagation

iOS mainly uses the following two mechanisms to propagate QoS between threads (or queues):

Mechanism 1: dispatch_async

dispatch_async() automatically propagates the QoS from the calling thread, though it will translate User Interactive to User Initiated to avoid assigning that priority to non-main threads.
Captured at time of block submission, translate user interactive to user initiated. Used if destination queue does not have a QoS and does not lower the QoS (ex dispatch_async back to the main thread).

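Mechanism 2: XPC-based inter-process communication (IPC)

The system's QoS propagation rules are fairly complex and mainly take the following into account:

The QoS of the current thread
For a dispatch_block created with dispatch_block_create(), the flags passed when the block was created
The QoS of the target queue or thread of the dispatch_async or IPC call

The scheduler uses this information to decide at which priority the block runs.

If no other thread is waiting synchronously for the block, the block runs at the priority described above.
If one thread waits synchronously for another, the scheduler adjusts the running priority of the thread as needed.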

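3.2 How is the priority inversion avoidance mechanism triggered?

If the current thread is blocked waiting for an operation (say block1) running on some thread (thread 1), and the system knows the thread on which block1 runs (the owner), the system resolves the inversion by raising the priority of the relevant thread. Conversely, if the system does not know which thread block1 runs on, it cannot know whose priority to raise and cannot resolve the inversion.

The system APIs that record owner information are:

pthread mutex, os_unfair_lock, and the higher-level APIs built on them:
a. dispatch_once is implemented on top of os_unfair_lock;
b. NSLock, NSRecursiveLock, @synchronized, and similar are implemented on top of pthread mutex.
dispatch_sync, dispatch_wait
xpc_connection_send_message_with_reply_sync

Using these APIs lets the system enable its priority inversion avoidance mechanism when an inversion occurs.

3.3 Verifying the basic APIs

Next we verify the basic system APIs mentioned above.

Test environment: simulator, iOS 15.2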

3.3.1 pthread mutex

The pthread mutex data structure pthread_mutex_s has an m_tid field dedicated to recording the ID of the thread that holds the lock.

// types_internal.h
struct pthread_mutex_s {
        long sig;
        _pthread_lock lock;
        union {
                uint32_t value;
                struct pthread_mutex_options_s options;
        } mtxopts;
        int16_t prioceiling;
        int16_t priority;
#if defined(__LP64__)
        uint32_t _pad;
#endif
        union {
                struct {
                        uint32_t m_tid[2]; // thread id of thread that has mutex locked
                        uint32_t m_seq[2]; // mutex sequence id
                        uint32_t m_mis[2]; // for misaligned locks m_tid/m_seq will span into here
                } psynch;
                struct _pthread_mutex_ulock_s ulock;
        };
#if defined(__LP64__)
        uint32_t _reserved[4];
#else
        uint32_t _reserved[1];
#endif
};

Let us verify with code whether the thread priority gets boosted.

// printThreadPriority prints the priority information of the current thread
void printThreadPriority() {
  thread_t cur_thread = mach_thread_self();
  mach_msg_type_number_t thread_info_count = THREAD_INFO_MAX;
  thread_info_data_t thinfo;
  kern_return_t kr = thread_info(cur_thread, THREAD_EXTENDED_INFO, (thread_info_t)thinfo, &thread_info_count);
  // Release the port reference taken by mach_thread_self() only after its last use
  mach_port_deallocate(mach_task_self(), cur_thread);
  if (kr != KERN_SUCCESS) {
    return;
  }
  thread_extended_info_t extend_info = (thread_extended_info_t)thinfo;
  printf("pth_priority: %d, pth_curpri: %d, pth_maxpriority: %d\n", extend_info->pth_priority, extend_info->pth_curpri, extend_info->pth_maxpriority);
}

First lock and sleep on a child thread, then request the lock from the main thread.

dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
  printf("begin : \n");
  printThreadPriority();
  printf("queue before lock \n");
  pthread_mutex_lock(&_lock); // make sure the background queue gets the lock first
  printf("queue lock \n");
  printThreadPriority();
  dispatch_async(dispatch_get_main_queue(), ^{
    printf("before main lock\n");
    pthread_mutex_lock(&_lock);
    printf("in main lock\n");
    pthread_mutex_unlock(&_lock);
    printf("after main unlock\n");
  });
  sleep(10);
  printThreadPriority();
  printf("queue unlock\n");
  pthread_mutex_unlock(&_lock);
  printf("queue after unlock\n");
});

begin : 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock 
queue lock 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
queue unlock
in main lock
after main unlock
queue after unlock

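The low-priority child thread acquires the lock first, at priority 4; when the main thread requests the lock, the child thread's priority is raised to 47.

3.3.2 os_unfair_lock

os_unfair_lock replaces OSSpinLock and fixes its priority inversion problem. A thread waiting for an os_unfair_lock sleeps, switching from user mode to kernel mode, instead of busy-waiting. os_unfair_lock stores the owner's thread ID inside the lock, and waiters donate their priority to the owner, which avoids priority inversion. Let us verify: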

  dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    printf("begin : \n");
    printThreadPriority();
    printf("queue before lock \n");
    os_unfair_lock_lock(&_unfair_lock); // make sure the background queue gets the lock first
    printf("queue lock \n");
    printThreadPriority();
    dispatch_async(dispatch_get_main_queue(), ^{
      printf("before main lock\n");
      os_unfair_lock_lock(&_unfair_lock);
      printf("in main lock\n");
      os_unfair_lock_unlock(&_unfair_lock);
      printf("after main unlock\n");
    });
    sleep(10);
    printThreadPriority();
    printf("queue unlock\n");
    os_unfair_lock_unlock(&_unfair_lock);
    printf("queue after unlock\n");
  });

begin : 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock 
queue lock 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
queue unlock
in main lock
after main unlock
queue after unlock

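The result matches pthread mutex.

3.3.3 pthread_rwlock_t

The pthread_rwlock_init documentation has this note:

Caveats: Beware of priority inversion when using read-write locks. A high-priority thread may be blocked waiting on a read-write lock locked by a low-priority thread. The microkernel has no knowledge of read-write locks, and therefore can't boost the low-priority thread to prevent the priority inversion.

In short, the kernel is unaware of read-write locks and cannot raise the priority of the low-priority thread, so priority inversion cannot be avoided. Looking at the definition, however, pthread_rwlock_s contains an rw_tid field that records the thread holding the write lock. This raises the question: why can pthread_rwlock_s not avoid priority inversion even though it has owner information?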

struct pthread_rwlock_s {
        long sig;
        _pthread_lock lock;
        uint32_t
                unused:29,
                misalign:1,
                pshared:2;
        uint32_t rw_flags;
#if defined(__LP64__)
        uint32_t _pad;
#endif
        uint32_t rw_tid[2]; // thread id of thread that has exclusive (write) lock
        uint32_t rw_seq[4]; // rw sequence id (at 128-bit aligned boundary)
        uint32_t rw_mis[4]; // for misaligned locks rw_seq will span into here
#if defined(__LP64__)
        uint32_t _reserved[34];
#else
        uint32_t _reserved[18];
#endif
};

The discussion at https://news.ycombinator.com/item?id=21751269 mentions:

xnu supports priority inheritance through "turnstiles", a kernel-internal mechanism which is used by default by a number of locking primitives (list at [1]), including normal pthread mutexes (though not read-write locks [2]), as well as the os_unfair_lock API (via the ulock syscalls). With pthread mutexes, you can actually explicitly request priority inheritance by calling pthread_mutexattr_setprotocol [3] with PTHREAD_PRIO_INHERIT; the Apple implementation supports it, but currently ignores the protocol setting and just gives all mutexes priority inheritance.

In short: XNU implements priority inheritance with the turnstile kernel mechanism, which is applied to pthread mutex and os_unfair_lock.
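The quote mentions that priority inheritance can be requested explicitly through pthread_mutexattr_setprotocol with PTHREAD_PRIO_INHERIT. A minimal sketch using only standard POSIX calls follows; the helper name init_priority_inheriting_mutex is made up for this example, and according to the quote Apple's implementation currently gives every mutex priority inheritance regardless of this setting, so on iOS the call mainly documents intent.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose the POSIX mutex protocol API on strict compilers */
#endif
#include <assert.h>
#include <pthread.h>

/* Initialise `mutex` with the priority-inheritance protocol.
 * Returns 0 on success, or the pthread error code of the failing call. */
int init_priority_inheriting_mutex(pthread_mutex_t *mutex) {
    assert(mutex != NULL);
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0) {
        rc = pthread_mutex_init(mutex, &attr);
    }
    pthread_mutexattr_destroy(&attr);   /* the attribute object is no longer needed */
    return rc;
}
```

There is no equivalent attribute for pthread_rwlock_t, which fits the observation in this section that read-write locks get no automatic boost.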

Following this lead, we found a call to _kwq_use_turnstile in ksyn_wait. The accompanying comment is rather cautious about read-write locks, adding "at least sometimes":

pthread mutexes and rwlocks both (at least sometimes) know their owner and can use turnstiles. Otherwise, we pass NULL as the tstore to the shims so they wait on the global waitq.

// libpthread/kern/kern_synch.c
int
ksyn_wait(ksyn_wait_queue_t kwq, kwq_queue_type_t kqi, uint32_t lockseq,
                int fit, uint64_t abstime, uint16_t kwe_flags,
                thread_continue_t continuation, block_hint_t block_hint)
{
        thread_t th = current_thread();
        uthread_t uth = pthread_kern->get_bsdthread_info(th);
        struct turnstile **tstore = NULL;
        int res;

        assert(continuation != THREAD_CONTINUE_NULL);

        ksyn_waitq_element_t kwe = pthread_kern->uthread_get_uukwe(uth);
        bzero(kwe, sizeof(*kwe));
        kwe->kwe_count = 1;
        kwe->kwe_lockseq = lockseq & PTHRW_COUNT_MASK;
        kwe->kwe_state = KWE_THREAD_INWAIT;
        kwe->kwe_uth = uth;
        kwe->kwe_thread = th;
        kwe->kwe_flags = kwe_flags;

        res = ksyn_queue_insert(kwq, kqi, kwe, lockseq, fit);
        if (res != 0) {
                //panic("psynch_rw_wrlock: failed to enqueue\n"); // XXX
                ksyn_wqunlock(kwq);
                return res;
        }

        PTHREAD_TRACE(psynch_mutex_kwqwait, kwq->kw_addr, kwq->kw_inqueue,
                        kwq->kw_prepost.count, kwq->kw_intr.count);

        if (_kwq_use_turnstile(kwq)) {
                 // pthread mutexes and rwlocks both (at least sometimes) know their                
                 // owner and can use turnstiles. Otherwise, we pass NULL as the                
                 // tstore to the shims so they wait on the global waitq.                 
                tstore = &kwq->kw_turnstile;
        }
        ......
}

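The definition of _kwq_use_turnstile is more candid: turnstiles are used for priority inversion protection only for KSYN_WQTYPE_MTX, while the type of a read-write lock is KSYN_WQTYPE_RWLOCK. Read-write locks therefore do not use turnstiles and cannot avoid priority inversion.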

#define KSYN_WQTYPE_MTX         0x01
#define KSYN_WQTYPE_CVAR        0x02
#define KSYN_WQTYPE_RWLOCK      0x04
#define KSYN_WQTYPE_SEMA        0x08

static inline bool
_kwq_use_turnstile(ksyn_wait_queue_t kwq)
{
        // If we had writer-owner information from the
        // rwlock then we could use the turnstile to push on it. For now, only
        // plain mutexes use it.
        return (_kwq_type(kwq) == KSYN_WQTYPE_MTX);
}

_pthread_find_owner also shows that the owner of a read-write lock is reported as 0:

void
_pthread_find_owner(thread_t thread,
                struct stackshot_thread_waitinfo * waitinfo)
{
        ksyn_wait_queue_t kwq = _pthread_get_thread_kwq(thread);
        switch (waitinfo->wait_type) {
                case kThreadWaitPThreadMutex:
                        assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_MTX);
                        waitinfo->owner  = thread_tid(kwq->kw_owner);
                        waitinfo->context = kwq->kw_addr;
                        break;
                /* Owner of rwlock not stored in kernel space due to races. Punt
                 * and hope that the userspace address is helpful enough. */
                case kThreadWaitPThreadRWLockRead:
                case kThreadWaitPThreadRWLockWrite:
                        assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_RWLOCK);
                        waitinfo->owner  = 0;
                        waitinfo->context = kwq->kw_addr;
                        break;
                /* Condvars don't have owners, so just give the userspace address. */
                case kThreadWaitPThreadCondVar:
                        assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_CVAR);
                        waitinfo->owner  = 0;
                        waitinfo->context = kwq->kw_addr;
                        break;
                case kThreadWaitNone:
                default:
                        waitinfo->owner = 0;
                        waitinfo->context = 0;
                        break;
        }
}

Switching the lock to a read-write lock, we verify the reasoning above:

pthread_rwlock_init(&_rwlock, NULL);
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
  printf("begin : \n");
  printThreadPriority();
  printf("queue before lock \n");
  pthread_rwlock_rdlock(&_rwlock); // make sure the background queue gets the lock first
  printf("queue lock \n");
  printThreadPriority();
  dispatch_async(dispatch_get_main_queue(), ^{
    printf("before main lock\n");
    pthread_rwlock_wrlock(&_rwlock);
    printf("in main lock\n");
    pthread_rwlock_unlock(&_rwlock);
    printf("after main unlock\n");
  });
  sleep(10);
  printThreadPriority();
  printf("queue unlock\n");
  pthread_rwlock_unlock(&_rwlock);
  printf("queue after unlock\n");
});

begin : 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock 
queue lock 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue unlock
queue after unlock
in main lock
after main unlock

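As shown, no priority boost happens with the read-write lock.

3.3.4 dispatch_sync

This API is familiar to most readers, so we verify it directly:

// the current thread is the main thread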
dispatch_queue_attr_t qosAttribute = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_BACKGROUND, 0);
_queue = dispatch_queue_create("com.demo.test", qosAttribute);
printThreadPriority();
dispatch_async(_queue, ^{
    printf("dispatch_async before dispatch_sync : \n");
    printThreadPriority();
});
dispatch_sync(_queue, ^{
    printf("dispatch_sync: \n");
    printThreadPriority();
});
dispatch_async(_queue, ^{
    printf("dispatch_async after dispatch_sync: \n");
    printThreadPriority();
});

pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63 
dispatch_async before dispatch_sync : 
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
dispatch_sync: 
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
dispatch_async after dispatch_sync: 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63

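_queue is a low-priority queue (QOS_CLASS_BACKGROUND). The task enqueued by dispatch_sync, and the task enqueued by dispatch_async before it, are both raised to the higher priority 47 (the same as the main thread), while the last dispatch_async task runs at priority 4.

3.3.5 dispatch_wait

// the current thread is the main thread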
dispatch_queue_attr_t qosAttribute = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_BACKGROUND, 0);
_queue = dispatch_queue_create("com.demo.test", qosAttribute);
printf("main thread\n");
printThreadPriority();
dispatch_block_t block = dispatch_block_create(DISPATCH_BLOCK_INHERIT_QOS_CLASS, ^{
    printf("sub thread\n");
    sleep(2);
    printThreadPriority();
});
dispatch_async(_queue, block);
dispatch_wait(block, DISPATCH_TIME_FOREVER);

_queue is a low-priority queue (QOS_CLASS_BACKGROUND). When the main thread waits with dispatch_wait, the output is as follows, and the low-priority task is raised to priority 47:

main thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
sub thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63

If dispatch_wait(block, DISPATCH_TIME_FOREVER) is commented out, the output is:

main thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
sub thread
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63

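Note that dispatch_wait is a macro (a C11 generic selection), or otherwise an entry-point function. It accepts three parameter types: dispatch_block_t, dispatch_group_t, and dispatch_semaphore_t. What is meant here, however, is dispatch_block_wait: only dispatch_block_wait adjusts priorities to avoid priority inversion.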

intptr_t
dispatch_wait(void *object, dispatch_time_t timeout);
#if __has_extension(c_generic_selections)
#define dispatch_wait(object, timeout) \
                _Generic((object), \
                        dispatch_block_t:dispatch_block_wait, \
                        dispatch_group_t:dispatch_group_wait, \
                        dispatch_semaphore_t:dispatch_semaphore_wait \
                )((object),(timeout))
#endif

3.4 The mysterious semaphore

3.4.1 dispatch_semaphore

Our understanding of dispatch_semaphore used to be shallow, and we often equated a binary semaphore with a mutex. Investigation showed, however, that dispatch_semaphore has no notion of QoS and does not record the thread currently holding the semaphore (the owner), so when a high-priority thread waits on it, the kernel cannot know which thread's scheduling priority (QoS) to raise. If the holder has a lower priority than other threads, the high-priority waiter may wait indefinitely. The article Mutex vs Semaphore: What's the Difference? compares the two in detail.

Semaphores are for signaling (same as condition variables, events) while mutexes are for mutual exclusion. Technically, you can also use semaphores for mutual exclusion (a mutex can be thought as a binary semaphore) but you really shouldn't.
Right, but libdispatch doesn't have a mutex. It has semaphores and queues. So if you're trying to use libdispatch and you don't want the closure-based aspect of queues, you might be tempted to use a semaphore instead. Don't do that, use os_unfair_lock or pthread_mutex (or a higher-level construct like NSLock) instead.

These are warnings: dispatch_semaphore is quite dangerous and must be used with great care.

Apple's official demo illustrates the point:

__block NSString *taskName = nil;
dispatch_semaphore_t sema = dispatch_semaphore_create(0); 
[self.connection.remoteObjectProxy requestCurrentTaskName:^(NSString *task) { 
     taskName = task; 
     dispatch_semaphore_signal(sema); 
}]; 
dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER); 
return taskName;

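Suppose this code runs on the main thread, so the current thread's priority is QOS_CLASS_USER_INTERACTIVE;
Because the work is dispatched asynchronously from the main thread, the QoS of the asynchronous task queue is raised to QOS_CLASS_USER_INITIATED;
The main thread is blocked on the semaphore sema, while the asynchronous task responsible for signaling it runs at QOS_CLASS_USER_INITIATED, lower than the main thread's QOS_CLASS_USER_INTERACTIVE, so a priority inversion may occur.

It is worth mentioning that Clang has a static analysis check specifically for this case: https://github.com/llvm-mirror/clang/blob/master/lib/StaticAnalyzer/Checkers/GCDAntipatternChecker.cpp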

static auto findGCDAntiPatternWithSemaphore() -> decltype(compoundStmt()) {

  const char *SemaphoreBinding = "semaphore_name";
  auto SemaphoreCreateM = callExpr(allOf(
      callsName("dispatch_semaphore_create"),
      hasArgument(0, ignoringParenCasts(integerLiteral(equals(0))))));

  auto SemaphoreBindingM = anyOf(
      forEachDescendant(
          varDecl(hasDescendant(SemaphoreCreateM)).bind(SemaphoreBinding)),
      forEachDescendant(binaryOperator(bindAssignmentToDecl(SemaphoreBinding),
                     hasRHS(SemaphoreCreateM))));

  auto HasBlockArgumentM = hasAnyArgument(hasType(
            hasCanonicalType(blockPointerType())
            ));

  auto ArgCallsSignalM = hasAnyArgument(stmt(hasDescendant(callExpr(
          allOf(
              callsName("dispatch_semaphore_signal"),
              equalsBoundArgDecl(0, SemaphoreBinding)
              )))));

  auto HasBlockAndCallsSignalM = allOf(HasBlockArgumentM, ArgCallsSignalM);

  auto HasBlockCallingSignalM =
    forEachDescendant(
      stmt(anyOf(
        callExpr(HasBlockAndCallsSignalM),
        objcMessageExpr(HasBlockAndCallsSignalM)
           )));

  auto SemaphoreWaitM = forEachDescendant(
    callExpr(
      allOf(
        callsName("dispatch_semaphore_wait"),
        equalsBoundArgDecl(0, SemaphoreBinding)
      )
    ).bind(WarnAtNode));

  return compoundStmt(
      SemaphoreBindingM, HasBlockCallingSignalM, SemaphoreWaitM);
}

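To use this check, simply enable it in the Xcode settings:

In addition, dispatch_group is similar to a semaphore: when enter() is called, there is no way to know who will call leave(), so the system cannot identify an owner either, and likewise no priority boost takes place.

3.4.2 A first-hand semaphore hang

dispatch_semaphore left a deep impression on the author, who once wrote code like this: use a semaphore to wait synchronously on the main thread for the camera authorization result.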

__block BOOL auth = NO;
dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
[KTAuthorizeService requestAuthorizationWithType:KTPermissionsTypeCamera completionHandler:^(BOOL allow) {
  auth = allow;
  dispatch_semaphore_signal(semaphore);
}];
dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);

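After release this stayed at the top of the hang list for a long time. It was baffling at the time; only after learning that semaphores cannot avoid priority inversion did it finally make sense.

This kind of problem is usually solved in one of two ways:

1. Use a synchronous API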

BOOL auth = [KTAuthorizeService authorizationWithType:KTPermissionsTypeCamera];
// do something next

2. Use an asynchronous callback and do not wait on the current thread

[KTAuthorizeService requestAuthorizationWithType:KTPermissionsTypeCamera completionHandler:^(BOOL allow) {
    BOOL auth = allow;
    // do something next via callback
}];

4. A few concepts

4.1 turnstile

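As mentioned earlier, XNU uses turnstiles for priority inheritance; here is a brief description of the mechanism. The XNU kernel contains a large number of synchronization objects (for example lck_mtx_t). To solve priority inversion, each synchronization object would need a separate data structure holding a lot of information, such as the queue of threads blocked on it. If every synchronization object allocated such a structure, a great deal of memory would be wasted.

To solve this, XNU adopts turnstiles, a very space-efficient solution. It rests on the fact that a thread cannot be blocked on more than one synchronization object at the same time. This allows every synchronization object to keep just a pointer to a turnstile and to allocate one only when needed; the turnstile holds everything required to operate on the synchronization object, such as the queue of blocked threads and a pointer to the thread that owns the object. Turnstiles are allocated dynamically from a pool whose size grows with the number of threads allocated in the system, so the total number of turnstiles is always at most the number of threads, which keeps the count under control. The turnstile is allocated by the first thread that blocks on the synchronization object, and when no more threads are blocked on it, the turnstile is released back to the pool.

The turnstile data structure is as follows: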

struct turnstile {
    struct waitq                  ts_waitq;              /* waitq embedded in turnstile */
    turnstile_inheritor_t         ts_inheritor;          /* thread/turnstile inheriting the priority (IL, WL) */
    union {
        struct turnstile_list ts_free_turnstiles;    /* turnstile free list (IL) */
        SLIST_ENTRY(turnstile) ts_free_elm;          /* turnstile free list element (IL) */
    };
    struct priority_queue_sched_max ts_inheritor_queue;    /* Queue of turnstile with us as an inheritor (WL) */
    union {
        struct priority_queue_entry_sched ts_inheritor_links;    /* Inheritor queue links */
        struct mpsc_queue_chain   ts_deallocate_link;    /* thread deallocate link */
    };
    SLIST_ENTRY(turnstile)        ts_htable_link;        /* linkage for turnstile in global hash table */
    uintptr_t                     ts_proprietor;         /* hash key lookup turnstile (IL) */
    os_refcnt_t                   ts_refcount;           /* reference count for turnstiles */
    _Atomic uint32_t              ts_type_gencount;      /* gen count used for priority chaining (IL), type of turnstile (IL) */
    uint32_t                      ts_port_ref;           /* number of explicit refs from ports on send turnstile */
    turnstile_update_flags_t      ts_inheritor_flags;    /* flags for turnstile inheritor (IL, WL) */
    uint8_t                       ts_priority;           /* priority of turnstile (WL) */

#if DEVELOPMENT || DEBUG
    uint8_t                       ts_state;              /* current state of turnstile (IL) */
    queue_chain_t                 ts_global_elm;         /* global turnstile chain */
    thread_t                      ts_thread;             /* thread the turnstile is attached to */
    thread_t                      ts_prev_thread;        /* thread the turnstile was attached before donation */
#endif
};

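4.2 Priority values

Some priority values appeared during verification. They are explained here with the help of "Mac OS X and iOS Internals": the values in the experiments are all Mach-level values for user threads.

User thread priorities are 0~63:
a. NSQualityOfServiceBackground: Mach-level priority 4;
b. NSQualityOfServiceUtility: Mach-level priority 20;
c. NSQualityOfServiceDefault: Mach-level priority 31;
d. NSQualityOfServiceUserInitiated: Mach-level priority 37;
e. NSQualityOfServiceUserInteractive: Mach-level priority 47.
Kernel thread priorities are 80~95;
Real-time system thread priorities are 96~127;
64~79 is reserved for system use.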
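The mapping above is easy to keep at hand as a small lookup table. The sketch below only encodes the numbers listed in this section; the type, the table, and the two helper functions are hypothetical names for illustration and are not part of any system API.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *qos_name;
    int mach_priority;
} qos_priority_entry;

/* Mach-level priorities of user threads, as listed in the text above. */
static const qos_priority_entry kQosPriorities[] = {
    {"Background", 4},
    {"Utility", 20},
    {"Default", 31},
    {"UserInitiated", 37},
    {"UserInteractive", 47},
};

/* Returns the Mach-level priority for a QoS name, or -1 if the name is unknown. */
int mach_priority_for_qos(const char *qos_name) {
    assert(qos_name != NULL);
    size_t n = sizeof(kQosPriorities) / sizeof(kQosPriorities[0]);
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(kQosPriorities[i].qos_name, qos_name) == 0) {
            return kQosPriorities[i].mach_priority;
        }
    }
    return -1;
}

/* Classifies a Mach priority into the bands described above. */
const char *priority_band(int mach_priority) {
    if (mach_priority < 0 || mach_priority > 127) return "invalid";
    if (mach_priority <= 63) return "user";
    if (mach_priority <= 79) return "reserved";
    if (mach_priority <= 95) return "kernel";
    return "realtime";
}
```

The table also makes the observation from section 1.4 easy to check: going from Default to UserInitiated is a step of 37 - 31 = 6, while going from Background to UserInitiated is a step of 37 - 4 = 33.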

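5. Summary

This article described the concepts behind priority inversion and the usual approaches to solving it, and examined several iOS locks in detail. With this understanding, unnecessary priority inversions can be avoided, which in turn helps prevent hangs. The ByteDance APM team also monitors thread priorities in order to detect and prevent priority inversion.

6. References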

WWDC18 What's New in LLVM - actorsfit
https://developer.apple.com/videos/play/wwdc2015/718
https://developer.apple.com/forums/thread/124155
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Multithreading/CreatingThreads/CreatingThreads.html
https://developer.apple.com/library/archive/documentation/Performance/Conceptual/EnergyGuide-iOS/PrioritizeWorkWithQoS.html
https://github.com/llvm-mirror/clang/blob/google/stable/lib/StaticAnalyzer/Checkers/GCDAntipatternChecker.cpp
Don't use dispatch semaphores where mutexes (or dispatch queues) would suffice
Concurrency Problems Written by Scott Grosch
https://www.jianshu.com/p/af64e05de503
https://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_rwlock_wrlock.html
iOS 中各種“鎖”的理解及應用
不再安全的 OSSpinLock
https://blog.actorsfit.com/a?ID=00001-499b1c8e-8a7f-4960-a1c1-c8e2f42c08c6
https://objccn.io/issue-2-1/#Priority-Inversion
Introduction to RTOS - Solution to Part 11 (Priority Inversion)
https://threadreaderapp.com/thread/1229999590482444288.html#
深入理解 iOS 中的鎖
Threads can infect each other with their low priority

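7. About us

The ByteDance APM platform team works on improving the performance and stability of all products across the group. Its technology stack covers iOS / Android / Server / Web / Hybrid / PC / games / mini programs, and its work includes, but is not limited to, performance and stability monitoring, troubleshooting, deep optimization, and regression prevention. In the long run the team hopes to contribute more constructive techniques for problem discovery and deep optimization to the industry.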
