# 10. Class - cache\_t

When we analyzed the structure of a class earlier, one important member was left unexplored: `cache_t`.

## 1. The Basic Structure of cache\_t

As before, we build and debug against the objc4 source.

First, a quick review of the basic layout of a class:

|      -       |         ISA          |         superClass          |    cache     |     bits      |
| :----------: | :------------------: | :-------------------------: | :----------: | :-----------: |
| Description  | isa (struct pointer) | superclass (struct pointer) | method cache | class details |
| Size (bytes) |          8           |              8              |      16      |       8       |

### 1.1 Source Code

Its core data structure is as follows:

```C++
struct cache_t {

// <<<< primary data members
private:
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask;
    union {
        struct {
            explicit_atomic<mask_t>    _maybeMask;
#if __LP64__
            uint16_t                   _flags;
#endif
            uint16_t                   _occupied;
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache;
    };
    ...
};
```

* `_bucketsAndMaybeMask`
  * holds the `bucket_t *` buckets pointer; on some configurations the mask is packed into its high bits
* `_maybeMask`
  * the buckets mask (capacity - 1)
* `_occupied`
  * the number of occupied (cached) slots
* `_originalPreoptCache`
  * the preoptimized cache from the shared cache; it shares storage (a union) with the three fields above

### 1.2 Debugging with LLDB

```C++
(lldb) p/x RYModel.class
(Class) $10 = 0x00000001000087c8 RYModel

// offset by 0x10 (past isa and superclass) to reach cache
(lldb) p (cache_t *)(0x00000001000087c8 + 0x10)
(cache_t *) $11 = 0x00000001000087d8
(lldb) p $11
(cache_t *) $11 = 0x00000001000087d8

// dereference to read the cache contents
(lldb) p *$11
(cache_t) $12 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 4314911728
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic<unsigned int> = {
          Value = 3
        }
      }
      _flags = 32820
      _occupied = 1
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0001803400000003
      }
    }
  }
}
```

> The members are private, so we can't print them directly anymore; to keep inspecting them, let's go back to the source for accessor methods.

#### Related Getter Methods

```C++
public:
    // The following four fields are public for objcdt's use only.
    // objcdt reaches into fields while the process is suspended
    // hence doesn't care for locks and pesky little details like this
    // and can safely use these.
    unsigned capacity() const; // capacity
    struct bucket_t *buckets() const;
    Class cls() const;

#if CONFIG_USE_PREOPT_CACHES
    const preopt_cache_t *preopt_cache() const;
#endif

    mask_t occupied() const; // number of cached entries
```

We can keep debugging with the `buckets()` method:

```C++
(lldb) p $12.buckets()[0]
(bucket_t) $13 = {
  _sel = {
    std::__1::atomic<objc_selector *> = (null) {
      Value = nil
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 0
    }
  }
}
(lldb) p $12.buckets()[1]
(bucket_t) $14 = {
  _sel = {
    std::__1::atomic<objc_selector *> = (null) {
      Value = nil
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 0
    }
  }
}
(lldb) p $12.buckets()[2]
(bucket_t) $15 = {
  _sel = {
    std::__1::atomic<objc_selector *> = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 48856
    }
  }
}
(lldb) p $15.sel()
(SEL) $16 = "dosomething"

(lldb) p $15.imp(nil, RYModel.class)
(IMP) $17 = 0x0000000100003910 (KCObjcBuild`-[RYModel dosomething])
(lldb) 
```

Notice that the cached method shows up at index 2 in `$12.buckets()[2]`, while indexes 0 and 1 are empty. That's because `buckets` is not a plain array but a hash table: the slot is determined by hashing the selector.

### 1.3 The bucket\_t Data Structure

A bucket contains a `SEL` and an `IMP`:

```C++
struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__
    explicit_atomic<uintptr_t> _imp;
    explicit_atomic<SEL> _sel;
#else
    explicit_atomic<SEL> _sel;
    explicit_atomic<uintptr_t> _imp;
#endif
...
};
```

## 2. Implementing the cache\_t Data Structure Ourselves

> Debugging with LLDB gets tedious; to make inspection easier, let's mimic the data structures with our own definitions.

```C++
#import <objc/runtime.h>

typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits

struct lw_bucket_t {
    SEL _sel;
    IMP _imp;
};
struct lw_cache_t {
    struct lw_bucket_t *_bukets;
    mask_t    _maybeMask;
    uint16_t  _flags;
    uint16_t  _occupied;
};

struct lw_class_data_bits_t {
    uintptr_t bits;
};

// cache class
struct lw_objc_class {
    Class isa;
    Class superclass;
    struct lw_cache_t cache;
    struct lw_class_data_bits_t bits;
};
```

### 2.1 Debug Code

```C++
RYModel *obj = [RYModel alloc];
[obj dosomethingA];
[obj dosomethingB];
struct lw_objc_class *lw_class = (__bridge struct lw_objc_class *)(RYModel.class);
NSLog(@"occupied: %hu - mask: %u", lw_class->cache._occupied, lw_class->cache._maybeMask);

for (mask_t i = 0; i < lw_class->cache._maybeMask; i++) {
    struct lw_bucket_t bucket = lw_class->cache._bukets[i];
    NSLog(@"SEL:%@ - IMP:%p", NSStringFromSelector(bucket._sel), bucket._imp);
}
```

Output:

```C++
-[RYModel dosomethingA]
-[RYModel dosomethingB]
occupied: 2 - mask: 3
SEL:dosomethingA - IMP:0xb300
SEL:dosomethingB - IMP:0xb330
SEL:(null) - IMP:0x0
```

### 2.2 Expansion

Let's adjust the calls a bit:

```C++
[obj dosomethingA];
[obj dosomethingB];
[obj dosomethingC];
```

Output:

```C++
2021-06-27 16:00:51.939958+0800 LWCacheT[73396:867351] -[RYModel dosomethingA]
2021-06-27 16:00:51.940352+0800 LWCacheT[73396:867351] -[RYModel dosomethingB]
2021-06-27 16:00:51.940441+0800 LWCacheT[73396:867351] -[RYModel dosomethingC]
2021-06-27 16:00:51.940602+0800 LWCacheT[73396:867351] occupied: 1 - mask: 7
2021-06-27 16:00:51.940668+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.940742+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.940780+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.940814+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.940900+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.941025+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.941124+0800 LWCacheT[73396:867351] SEL:dosomethingC - IMP:0xbcd8
```

We can see the cache was expanded here. What logic drives that?

## 3. insert and Expansion

To understand expansion better, let's first look at how a method gets inserted into the cache.

### 3.1 Reading the Core Source

```C++
void cache_t::insert(SEL sel, IMP imp, id receiver)
{

    ...

    // Use the cache as-is until we exceed our expected fill ratio.
    // occupied count + 1
    mask_t newOccupied = occupied() + 1;
    unsigned oldCapacity = capacity(), capacity = oldCapacity;
    if (slowpath(isConstantEmptyCache())) {
        // Empty cache: allocate one. The initial capacity depends on the architecture and is computed with bit shifts.
        // Cache is read-only. Replace it.
        if (!capacity) capacity = INIT_CACHE_SIZE;
        reallocate(oldCapacity, capacity, /* freeOld */false);
    }
    else if (fastpath(newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity))) {
        // Cache is less than 3/4 or 7/8 full. Use it as-is.
        // Below the expansion threshold: keep the current buckets
    }
#if CACHE_ALLOW_FULL_UTILIZATION
    else if (capacity <= FULL_UTILIZATION_CACHE_SIZE && newOccupied + CACHE_END_MARKER <= capacity) {
        // Allow 100% cache utilization for small buckets. Use it as-is.

        // Allow 100% cache utilization for smaller cache sizes. This has the same
        // advantages and disadvantages as the fill ratio. A very large percentage
        // of caches end up with very few entries and the worst case of collision
        // chains in small tables is relatively small.
        // NOTE: objc_msgSend properly handles a cache lookup with a full cache.

    }
#endif
    else {
        // Expand: double the capacity
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
        if (capacity > MAX_CACHE_SIZE) {
            capacity = MAX_CACHE_SIZE;
        }
        reallocate(oldCapacity, capacity, true);
    }

    bucket_t *b = buckets();
    mask_t m = capacity - 1;
    mask_t begin = cache_hash(sel, m);
    mask_t i = begin;

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot.
    // Probe until we find a suitable slot
    do {
        if (fastpath(b[i].sel() == 0)) {
            incrementOccupied();
            // insert the new entry
            b[i].set<Atomic, Encoded>(b, sel, imp, cls());
            return;
        }
        if (b[i].sel() == sel) {
            // The entry was added to the cache by some other thread
            // before we grabbed the cacheUpdateLock.
            return;
        }
    } while (fastpath((i = cache_next(i, m)) != begin));

    // No usable slot was found: crash
    bad_cache(receiver, (SEL)sel);
#endif // !DEBUG_TASK_THREADS
}

```

#### a. INIT\_CACHE\_SIZE: Initial Cache Size

Looking at the source, the initial cache size is computed with bit shifts.

In our case it is `1 << 2 = 4`; we can also see the maximum is `1 << 16 = 65536`.

```C++
/* Initial cache bucket count. INIT_CACHE_SIZE must be a power of two. */
enum {
#if CACHE_END_MARKER || (__arm64__ && !__LP64__)
    // When we have a cache end marker it fills a bucket slot, so having a
    // initial cache size of 2 buckets would not be efficient when one of the
    // slots is always filled with the end marker. So start with a cache size
    // 4 buckets.
    INIT_CACHE_SIZE_LOG2 = 2,
#else
    // Allow an initial bucket size of 2 buckets, since a large number of
    // classes, especially metaclasses, have very few imps, and we support
    // the ability to fill 100% of the cache before resizing.
    INIT_CACHE_SIZE_LOG2 = 1,
#endif
    INIT_CACHE_SIZE      = (1 << INIT_CACHE_SIZE_LOG2),
    MAX_CACHE_SIZE_LOG2  = 16,
    MAX_CACHE_SIZE       = (1 << MAX_CACHE_SIZE_LOG2),
    FULL_UTILIZATION_CACHE_SIZE_LOG2 = 3,
    FULL_UTILIZATION_CACHE_SIZE = (1 << FULL_UTILIZATION_CACHE_SIZE_LOG2),
};
```

#### b. The Expansion Threshold

`newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity)`

> // Cache is less than 3/4 or 7/8 full. Use it as-is.
>
> > 7/8 on real (arm64) devices

```C++
#if __arm__  ||  __x86_64__  ||  __i386__

// objc_msgSend has few registers available.
// Cache scan increments and wraps at special end-marking bucket.
#define CACHE_END_MARKER 1

// Historical fill ratio of 75% (since the new objc runtime was introduced).
static inline mask_t cache_fill_ratio(mask_t capacity) {
    return capacity * 3 / 4;
}

#elif __arm64__ && !__LP64__
#define CACHE_END_MARKER 0

// Historical fill ratio of 75% (since the new objc runtime was introduced).
static inline mask_t cache_fill_ratio(mask_t capacity) {
    return capacity * 3 / 4;
}

#elif __arm64__ && __LP64__

// objc_msgSend has lots of registers available.
// Cache scan decrements. No end marker needed.
#define CACHE_END_MARKER 0

// Allow 87.5% fill ratio in the fast path for all cache sizes.
// Increasing the cache fill ratio reduces the fragmentation and wasted space
// in imp-caches at the cost of potentially increasing the average lookup of
// a selector in imp-caches by increasing collision chains. Another potential
// change is that cache table resizes / resets happen at different moments.
static inline mask_t cache_fill_ratio(mask_t capacity) {
    return capacity * 7 / 8;
}

#endif
```

#### c. The Expansion Operation

> Expansion doubles the capacity (`x2`), capped at `MAX_CACHE_SIZE`.

```C++
capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
if (capacity > MAX_CACHE_SIZE) {
    capacity = MAX_CACHE_SIZE;
}
reallocate(oldCapacity, capacity, true);
```

#### d. After Expansion

Let's take another look at the log from the run that triggered the expansion:

```C++
2021-06-27 16:00:51.940602+0800 LWCacheT[73396:867351] occupied: 1 - mask: 7
2021-06-27 16:00:51.940668+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.940742+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.940780+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.940814+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.940900+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.941025+0800 LWCacheT[73396:867351] SEL:(null) - IMP:0x0
2021-06-27 16:00:51.941124+0800 LWCacheT[73396:867351] SEL:dosomethingC - IMP:0xbcd8
```

After this expansion, all of the previous cache entries are gone except the newest one.

This is because "expansion" does not actually grow the existing buffer; it allocates a brand-new block of memory and discards the old buckets, and only then inserts the method that triggered the resize.

### 3.2 cache\_hash: Computing the Hash Index

```C++
static inline mask_t cache_hash(SEL sel, mask_t mask) 
{
    uintptr_t value = (uintptr_t)sel;
#if CONFIG_USE_PREOPT_CACHES
    value ^= value >> 7; // real devices & Apple silicon
#endif
    return (mask_t)(value & mask);
}
```

### 3.3 cache\_next: Finding the Insert Position

```C++
#if CACHE_END_MARKER
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return (i+1) & mask; // probe forward
}
#elif __arm64__
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return i ? i-1 : mask; // probe backward, wrapping from 0 to mask
}
#else
#error unexpected configuration
#endif
```

### 3.4 set: The Insert Logic

```C++
template<Atomicity atomicity, IMPEncoding impEncoding>
void bucket_t::set(bucket_t *base, SEL newSel, IMP newImp, Class cls)
{
...
    // objc_msgSend uses sel and imp with no locks.
    // It is safe for objc_msgSend to see new imp but NULL sel
    // (It will get a cache miss but not dispatch to the wrong place.)
    // It is unsafe for objc_msgSend to see old imp and new sel.
    // Therefore we write new imp, wait a lot, then write new sel.
    
    // encode the IMP
    uintptr_t newIMP = (impEncoding == Encoded
                        ? encodeImp(base, newImp, newSel, cls)
                        : (uintptr_t)newImp);
...
    // finally, store the new imp and then the new sel
}
```

A method is not inserted into the cache as a raw pointer; the IMP is encoded first: `(uintptr_t)newImp ^ (uintptr_t)cls`.

### 3.5 Flowchart

![cache\_t](https://4193904735-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MI8JgbGh3U6X_oedqkm%2Fuploads%2Fgit-blob-01da036359a0b0d42780e51fe955dc0cc315e061%2Fcache_t.png?alt=media)

## Thoughts on Type Casting

Casting to types we defined ourselves may feel dubious. Can we really do that?

Set the code aside and think about it: why do we define data structures at all? So that we can read and write data.

And the precondition for reading and writing data correctly is that the computer can interpret the bytes; the data structures we define are exactly the blueprint it uses to interpret them.

As long as two data structures have the same memory layout, casting between them works.

## References

[Why HashMap's default load factor is 0.75](https://blog.csdn.net/penghao_1/article/details/107631820)
