# 10.Class-cache\_t

之前进行类的结构的分析的时候，我们还有一个重要的东西没有进行研究，就是 `cache_t`。

## 一、cache\_t 的基本结构

和之前一样，我们在源码环境下编译调试。

回顾一下class的基础结构：

|    -   |     ISA    | superClass | cache |  bits  |
| :----: | :--------: | :--------: | :---: | :----: |
|   说明   | ISA（结构体指针） |  父类（结构体指针） |  方法缓存 | 类的具体信息 |
| 大小（字节） |      8     |      8     |   16  |    8   |

### 1.1 源码

其核心数据结构如下

```C++
struct cache_t {

// <<<< 主要数据结构
private:
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask;
    union {
        struct {
            explicit_atomic<mask_t>    _maybeMask;
#if __LP64__
            uint16_t                   _flags;
#endif
            uint16_t                   _occupied;
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache;
    };
    ...
};
```

* \_bucketsAndMaybeMask
  * \_bucketsAndMaybeMask is a buckets\_t pointer
  * 它是一个 `buckets_t` 类型的指针
* \_maybeMask
  * \_maybeMask is the buckets mask
* \_occupied
* \_originalPreoptCache

### 1.2 LLDB调试

```C++
(lldb) p/x RYModel.class
(Class) $10 = 0x00000001000087c8 RYModel

// 平移取 cache
(lldb) p (cache_t *)(0x00000001000087c8 + 0x10)
(cache_t *) $11 = 0x00000001000087d8
(lldb) p $11
(cache_t *) $11 = 0x00000001000087d8

// 取出 cache 内容
(lldb) p *$11
(cache_t) $12 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 4314911728
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic<unsigned int> = {
          Value = 3
        }
      }
      _flags = 32820
      _occupied = 1
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0001803400000003
      }
    }
  }
}
```

> 我们无法直接输出相关成员内容了，为了继续输出相关属性我们继续看源码

#### 相关get方法

```C++
public:
    // The following four fields are public for objcdt's use only.
    // objcdt reaches into fields while the process is suspended
    // hence doesn't care for locks and pesky little details like this
    // and can safely use these.
    unsigned capacity() const;// 容量
    struct bucket_t *buckets() const;
    Class cls() const;

#if CONFIG_USE_PREOPT_CACHES
    const preopt_cache_t *preopt_cache() const;
#endif

    mask_t occupied() const;// 缓存数量
```

我们使用 `buckets()` 方法继续输出调试

```C++
(lldb) p $12.buckets()[0]
(bucket_t) $13 = {
  _sel = {
    std::__1::atomic<objc_selector *> = (null) {
      Value = nil
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 0
    }
  }
}
(lldb) p $12.buckets()[1]
(bucket_t) $14 = {
  _sel = {
    std::__1::atomic<objc_selector *> = (null) {
      Value = nil
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 0
    }
  }
}
(lldb) p $12.buckets()[2]
(bucket_t) $15 = {
  _sel = {
    std::__1::atomic<objc_selector *> = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 48856
    }
  }
}
(lldb) p $15.sel()
(SEL) $16 = "dosomething"

(lldb) p $15.imp(nil, RYModel.class)
(IMP) $17 = 0x0000000100003910 (KCObjcBuild`-[RYModel dosomething])
(lldb) 
```

这里你会发现用到了 `$12.buckets()[2]` 的下标是2，0和1都是空的。因为这里 `buckets` 不是数组，是哈希表的结构。

### 1.3、bucket\_t的数据结构

包含了 `SEL` 和 `IMP`

```C++
struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__
    explicit_atomic<uintptr_t> _imp;
    explicit_atomic<SEL> _sel;
#else
    explicit_atomic<SEL> _sel;
    explicit_atomic<uintptr_t> _imp;
#endif
...
};
```

## 二、自己实现 cache\_t 数据结构

> 使用LLDB进行调试感觉到太麻烦了，为了更方便的进行调试我们尝试对数据结构进行模仿。

```C++
#import <objc/runtime.h>

typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits

struct lw_bucket_t {
    SEL _sel;
    IMP _imp;
};
struct lw_cache_t {
    struct lw_bucket_t *_bukets;
    mask_t    _maybeMask;
    uint16_t  _flags;
    uint16_t  _occupied;
};

struct lw_class_data_bits_t {
    uintptr_t bits;
};

// cache class
struct lw_objc_class {
    Class isa;
    Class superclass;
    struct lw_cache_t cache;
    struct lw_class_data_bits_t bits;
};
```

### 2.1 调试代码

```C++
RYModel *obj = [RYModel alloc];
[obj dosomethingA];
[obj dosomethingB];
struct lw_objc_class *lw_class = (__bridge struct lw_objc_class *)(RYModel.class);
NSLog(@"缓存数量：%hu - 容量：%u",lw_class->cache._occupied,lw_class->cache._maybeMask);

for (mask_t i = 0; i<lw_class->cache._maybeMask; i++) {
    struct lw_bucket_t bucket = lw_class->cache._bukets[i];
    NSLog(@"SEL:%@ - IMP:%pf",NSStringFromSelector(bucket._sel),bucket._imp);
}
```

输出：

```C++
-[RYModel dosomethingA]
-[RYModel dosomethingB]
缓存数量：2 - 容量：3
SEL:dosomethingA - IMP:0xb300f
SEL:dosomethingB - IMP:0xb330f
EL:(null) - IMP:0x0f
```

### 2.2 扩容

我们调整一下代码

```C++
[obj dosomethingA];
[obj dosomethingB];
[obj dosomethingC];

输出：
2021-06-27 16:00:51.939958+0800 LWCacheT[73396:867351] -[RYModel dosomethingA]
2021-06-27 16:00:51.940352+0800 LWCacheT[73396:867351] -[RYModel dosomethingB]
2021-06-27 16:00:51.940441+0800 LWCacheT[73396:867351] -[RYModel dosomethingC]
2021-06-27 16:00:51.940602+0800 LWCacheT[73396:867351] 缓存数量：1 - 容量：7
2021-06-27 16:00:51.940668+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.940742+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.940780+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.940814+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.940900+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.941025+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.941124+0800 LWCacheT[73396:867351] SEL:dosomethingC - IMP:0xbcd8f
```

我们发现这里进行了扩容。这里又是什么逻辑呢？

## 三、insert 与扩容

为了更好的理解扩容，我们先了解一下方法缓存是怎么插入的。

### 3.1 核心源码解读

```C++
void cache_t::insert(SEL sel, IMP imp, id receiver)
{

    ...

    // Use the cache as-is if until we exceed our expected fill ratio.
    // 计数 + 1
    mask_t newOccupied = occupied() + 1;
    unsigned oldCapacity = capacity(), capacity = oldCapacity;
    if (slowpath(isConstantEmptyCache())) {
        // 判空，没有缓存创建缓存，根据架构不同，通过位运算初始化一个一定大小的容器
        // Cache is read-only. Replace it.
        if (!capacity) capacity = INIT_CACHE_SIZE;
        reallocate(oldCapacity, capacity, /* freeOld */false);
    }
    else if (fastpath(newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity))) {
        // Cache is less than 3/4 or 7/8 full. Use it as-is.
        // 未达到扩容临界点
    }
#if CACHE_ALLOW_FULL_UTILIZATION
    else if (capacity <= FULL_UTILIZATION_CACHE_SIZE && newOccupied + CACHE_END_MARKER <= capacity) {
        // Allow 100% cache utilization for small buckets. Use it as-is.

        // Allow 100% cache utilization for smaller cache sizes. This has the same
        // advantages and disadvantages as the fill ratio. A very large percentage
        // of caches end up with very few entries and the worst case of collision
        // chains in small tables is relatively small.
        // NOTE: objc_msgSend properly handles a cache lookup with a full cache.

        // 
    }
#endif
    else {
      // 扩容操作
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
        if (capacity > MAX_CACHE_SIZE) {
            capacity = MAX_CACHE_SIZE;
        }
        reallocate(oldCapacity, capacity, true);
    }

    bucket_t *b = buckets();
    mask_t m = capacity - 1;
    mask_t begin = cache_hash(sel, m);
    mask_t i = begin;

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot.
    // 循环找到合适的下标
    do {
        if (fastpath(b[i].sel() == 0)) {
            incrementOccupied();
            /// 插入逻辑
            b[i].set<Atomic, Encoded>(b, sel, imp, cls());
            return;
        }
        if (b[i].sel() == sel) {
            // The entry was added to the cache by some other thread
            // before we grabbed the cacheUpdateLock.
            return;
        }
    } while (fastpath((i = cache_next(i, m)) != begin));

    // 未找到合适的下标 crash
    bad_cache(receiver, (SEL)sel);
#endif // !DEBUG_TASK_THREADS
}

```

#### a. INIT\_CACHE\_SIZE 初始化缓存

通过源码发现，这里是通过位运算进行的缓存空间大小初始化。

如，当前我们的是 `1 << 2 = 4`。同时我们还能发现最大值为 `1 << 16`。

```C++
/* Initial cache bucket count. INIT_CACHE_SIZE must be a power of two. */
enum {
#if CACHE_END_MARKER || (__arm64__ && !__LP64__)
    // When we have a cache end marker it fills a bucket slot, so having a
    // initial cache size of 2 buckets would not be efficient when one of the
    // slots is always filled with the end marker. So start with a cache size
    // 4 buckets.
    INIT_CACHE_SIZE_LOG2 = 2,
#else
    // Allow an initial bucket size of 2 buckets, since a large number of
    // classes, especially metaclasses, have very few imps, and we support
    // the ability to fill 100% of the cache before resizing.
    INIT_CACHE_SIZE_LOG2 = 1,
#endif
    INIT_CACHE_SIZE      = (1 << INIT_CACHE_SIZE_LOG2),
    MAX_CACHE_SIZE_LOG2  = 16,
    MAX_CACHE_SIZE       = (1 << MAX_CACHE_SIZE_LOG2),
    FULL_UTILIZATION_CACHE_SIZE_LOG2 = 3,
    FULL_UTILIZATION_CACHE_SIZE = (1 << FULL_UTILIZATION_CACHE_SIZE_LOG2),
};
```

#### b. 扩容临界点

`newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity)`

> // Cache is less than 3/4 or 7/8 full. Use it as-is.
>
> > 真机为 7/8

```C++
// objc_msgSend has few registers available.
// Cache scan increments and wraps at special end-marking bucket.
#define CACHE_END_MARKER 1

// Historical fill ratio of 75% (since the new objc runtime was introduced).
static inline mask_t cache_fill_ratio(mask_t capacity) {
    return capacity * 3 / 4;
}

#elif __arm64__ && !__LP64__
#define CACHE_END_MARKER 0

// Historical fill ratio of 75% (since the new objc runtime was introduced).
static inline mask_t cache_fill_ratio(mask_t capacity) {
    return capacity * 3 / 4;
}

#elif __arm64__ && __LP64__

// objc_msgSend has lots of registers available.
// Cache scan decrements. No end marker needed.
#define CACHE_END_MARKER 0

// Allow 87.5% fill ratio in the fast path for all cache sizes.
// Increasing the cache fill ratio reduces the fragmentation and wasted space
// in imp-caches at the cost of potentially increasing the average lookup of
// a selector in imp-caches by increasing collision chains. Another potential
// change is that cache table resizes / resets happen at different moments.
static inline mask_t cache_fill_ratio(mask_t capacity) {
    return capacity * 7 / 8;
}
```

#### c. 扩容操作

> 扩容按 `x2` 进行扩容

```C++
capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
if (capacity > MAX_CACHE_SIZE) {
    capacity = MAX_CACHE_SIZE;
}
reallocate(oldCapacity, capacity, true);
```

#### d. 扩容后

我们看一下触发扩容的那次Log：

```C++
2021-06-27 16:00:51.940602+0800 LWCacheT[73396:867351] 缓存数量：1 - 容量：7
2021-06-27 16:00:51.940668+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.940742+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.940780+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.940814+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.940900+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.941025+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0f
2021-06-27 16:00:51.941124+0800 LWCacheT[73396:867351] SEL:dosomethingC - IMP:0xbcd8f
```

这次扩容后，我们发现。除了最新的，之前的缓存都没有了。

这是因为 `扩容` 并不是真正意义上的扩展之前的空间，而是重新开辟一块空间。

### 3.2 cache\_hash 计算哈希下标

```C++
static inline mask_t cache_hash(SEL sel, mask_t mask) 
{
    uintptr_t value = (uintptr_t)sel;
#if CONFIG_USE_PREOPT_CACHES
    value ^= value >> 7;// 真机 & m1
#endif
    return (mask_t)(value & mask);
}
```

### 3.3 cache\_next 寻找插入位置

```C++
#if CACHE_END_MARKER
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return (i+1) & mask; // 向后找
}
#elif __arm64__
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return i ? i-1 : mask; // 反复横跳
}
#else
#error unexpected configuration
#endif
```

### 3.4 set 差入逻辑

```C++
template<Atomicity atomicity, IMPEncoding impEncoding>
void bucket_t::set(bucket_t *base, SEL newSel, IMP newImp, Class cls)
{
...
    // objc_msgSend uses sel and imp with no locks.
    // It is safe for objc_msgSend to see new imp but NULL sel
    // (It will get a cache miss but not dispatch to the wrong place.)
    // It is unsafe for objc_msgSend to see old imp and new sel.
    // Therefore we write new imp, wait a lot, then write new sel.
    
    /// 对 IMP 进行编码操作
    uintptr_t newIMP = (impEncoding == Encoded
                        ? encodeImp(base, newImp, newSel, cls)
                        : (uintptr_t)newImp);
...
  保存
}
```

将方法插入缓存，并不是直接插入的。而是进行了编码。`(uintptr_t)newImp ^ (uintptr_t)cls;`

### 3.5 流程图

![cache\_t](/files/OUIzazk9j2v5WxqLEEPh)

## 关于类型转换的思考

这里使用自己定义的类型进行转换，感觉是不是很懵，感觉这也行？

我们抛开代码，思考一下：我们为什么要定义各种数据结构？为了能读写数据。

而能正确读写数据的前提是，计算机能认识这些数据。而我们定义的数据结构，就是计算机认识他们的基准。

只要两个数据结构的内存结构一致，那么进行转换就没有问题。

## 参考

[HashMap的负载因子为什么默认是0.75](https://blog.csdn.net/penghao_1/article/details/107631820)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://ryukiedev.gitbook.io/wiki/ios/di-ceng/10.class-cache_t.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
