Aggregator

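How do you keep removable storage media secure? The 威努特 removable-media security inspection station has the answer

威努特工控安全

7 months 2 weeks ago

Build secure business workflows for users, and improve the overall security and data-protection capability of the enterprise network environment.

R&D security pain points in digital transformation

我的安全视界观

7 months 2 weeks ago

This is the opening article of the "Deep Dive into R&D Security" series. It describes how the working models and methods of R&D security are being iterated and upgraded during digital transformation and, from the perspective of building an R&D security program, summarizes three typical problems that are particularly hard.

Sayrud: I didn't want to keep writing CRUD, so I finished the project I started at 18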

E99p1ant

7 months 3 weeks ago

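A dream at 18

I remember the summer I was 18, right after the gaokao. I had barely relaxed at home for a few days before my dad started pushing me to find a summer job. I had no idea what work even meant back then, so I threw together a résumé and sent it all over 58 同城. The next day I arranged an in-person chat with one company, and somehow actually talked my way into a work-from-home part-time job. (Which later turned out to be seriously unreliable.)

The job was roughly developing WeChat mini programs. At the time I only had a bit of self-taught mini program experience plus some PHP CodeIgniter backend experience, which was just about enough to handle their requirements. I even submitted a PR to a mini program frontend component library on GitHub. (Looking back at the code I wrote then, it really is a wreck: misaligned frontend UI, SQL injection all over the backend. Definitely dark history.)

Before university started, over those two summer months, I built two WeChat mini programs for them. Since every time I had to write functionally similar backends with CodeIgniter, my younger self wondered whether the MVC pieces (the Model operating the database, the Controller handling logic, the View returning the response) could be packaged as an online service, where I could just click around a graphical web page to create tables, validate forms, define API endpoints, and so on.

Encouraged by this genius idea, I wrote WeBake in PHP, intended for quickly building WeChat mini program backends. I thought I was building something nobody had built before, and was quietly pleased with myself. Then, the night before I started university, I happened upon a promotion on Zhihu for a SaaS product of the same kind, doing exactly what I was doing and already commercialized. Only then did I learn that plenty of companies in the industry were already doing this. I was crushed that night, and WeBake was, naturally, abandoned.

What happened next is well known: WeChat later released「微信云开发」(WeChat Cloud Development), a one-stop backend solution, and the official product simply killed the fan-made ones. Then the "LowCode" concept took off, LeanCloud was acquired by 心动游戏, and products like AirTable abroad and 黑帕云 and 维格表 Vika in China became popular... As for the mini program backend SaaS that crushed me back then, there is hardly a trace of it left on the internet.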

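Finishing what I started

In 2021 I read Hooopo's article Let's clone a Leancloud, which shows how to implement LeanCloud-like Schemaless Tables on Postgres. I was amazed; I hadn't realized Postgres views and the JSON data type could be used like this. I followed the article and wrote a small demo in Go, and it was indeed fun. But with no concrete need for it, that demo just sat in my GitHub.

This year I dropped WordPress and rebuilt this blog with Hugo, but never found a static-blog comment widget that met my needs, so I decided to reinvent the wheel and write one. But isn't the backend of a comment service, just like a guestbook, nothing but basic, mindless CRUD? I don't want to mindlessly write CRUD in Go anymore! What if I abstract the requirement one level up and write a "low-code data platform" directly? Sounds kind of interesting...?

And so Sayrud was born.

Schemaless

Schemaless is usually machine-translated into Chinese as「无模式」, which leaves people baffled. Let's take it step by step.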

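First, in a database context, a schema can be understood simply as the definition of a table's structure. Say I have a students table with three columns (student number, name, class), and the student number is the primary key... all of that is the schema. In a relational database, we write SQL to define the table:

CREATE TABLE students (no TEXT PRIMARY KEY, name TEXT, class TEXT);

Later the requirements change and we need a new column for the date of birth, so we write SQL to alter the table: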

ALTER TABLE students ADD COLUMN birth_date DATE;

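If this happens often, it gets annoying. And in a real project we would also have to write database migration SQL and run the migration in production. Clever you has probably thought: just use MongoDB! To add a column, add a field to the JSON document; there is no such thing as "table structure". With the concept of table structure gone, the schema is gone. In English the suffix -less means "without", hence the word Schemaless. In short: like MongoDB, you are not bound by table definitions and can add fields whenever you want.

Many Schemaless products on the market use MongoDB as their backend. But given the Hooopo article mentioned earlier, plus my love for Postgres, I decided to take a different path and implement it on Postgres.

When we write a backend, we first create tables, define which fields they have, and finally insert rows. In Sayrud these map to three tables: sl_tables, sl_fields, and sl_records. (The structures listed below omit project grouping and the fields that come from gorm.Model.)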

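sl_tables: Schemaless tables

Field            Type (Go)        Description
name             string           Table name, used by programs
desc             string           Table display name, shown to people in the frontend
increment_index  int64            Current auto-increment ID

sl_fields: Schemaless fields

Field            Type (Go)        Description
sl_table_id      int64            Which table the field belongs to
name             string           Field name
label            string           Field display name, shown to people in the frontend
type             string           Field type: int, text, bool, float, timestamp, reference, generated, etc.
options          json.RawMessage  Extra field attributes such as default value and constraints
position         int              Position of the field within the table

sl_records: Schemaless data

Field            Type (Go)        Description
sl_table_id      int64            Which table the record belongs to
data             json.RawMessage  Data stored as JSON; the key is the field ID, the value is the field value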

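Then the magic happens. Following Hooopo's article, we create a view for each Schemaless table. Here is an example SQL definition of such a view:

[Figure: SQL definition of a view]

Thanks to Postgres's strong JSON support, we can extract the JSON field values from sl_records as content and build a "table". It looks like this:

[Figure: the resulting view, queried like a table]

When a user queries data in a Schemaless table, we simply query this view. As far as GORM is concerned, this is just like querying an ordinary table; it has no idea the data was pieced together and extracted from three tables. Even more magically, when you delete a row from this view, the corresponding row in the underlying sl_records table is deleted too! Postgres can link the two because simple views over a single table are automatically updatable.

In the implementation, the SQL that creates the view has to be built dynamically. Identifiers such as field and table names cannot be passed as parameters of a prepared statement, so to avoid potential SQL injection I use the github.com/tj/go-pg-escape library to escape field and table names.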
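The original view definition was an image, so the following is only a minimal sketch of how such a statement could be assembled. It assumes one view column per field, extracted with the ->> operator as in the fragment quoted later in the post. The names Field and BuildViewSQL are hypothetical, and the two quoting helpers are hand-written stand-ins for what the post delegates to go-pg-escape; they assume standard_conforming_strings is on.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps a Postgres identifier in double quotes and doubles any
// embedded double quote.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// quoteLiteral wraps a string literal in single quotes and doubles any
// embedded single quote.
func quoteLiteral(s string) string {
	return `'` + strings.ReplaceAll(s, `'`, `''`) + `'`
}

// Field is a hypothetical, trimmed-down view of one sl_fields row.
type Field struct {
	ID   string // key inside sl_records.data
	Name string // column name exposed by the view
}

// BuildViewSQL assembles a CREATE VIEW statement that exposes each JSON key
// of sl_records.data as a column, restricted to one Schemaless table.
func BuildViewSQL(schema, table string, tableID int64, fields []Field) string {
	cols := make([]string, 0, len(fields))
	for _, f := range fields {
		cols = append(cols, fmt.Sprintf("sl_records.data ->> %s AS %s",
			quoteLiteral(f.ID), quoteIdent(f.Name)))
	}
	return fmt.Sprintf(
		"CREATE OR REPLACE VIEW %s.%s AS SELECT %s FROM sl_records WHERE sl_records.sl_table_id = %d",
		quoteIdent(schema), quoteIdent(table), strings.Join(cols, ", "), tableID)
}

func main() {
	fmt.Println(BuildViewSQL("tenant1", "comments", 7, []Field{
		{ID: "YXSQhESl", Name: "email"},
	}))
}
```

A view of this shape selects from a single table without aggregates, which is what lets Postgres route a DELETE on the view to sl_records.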

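As Hooopo's article suggests, I create these views in a separate Postgres schema, apart from the default public schema, which also serves as a simple form of multi-tenancy.

Watch out for a pitfall! I once read the article《我们使用 Postgres 构建多租户 SaaS 服务时踩的坑》("Pitfalls we hit building a multi-tenant SaaS service on Postgres"). It points out that when Postgres schemas are used for multi-tenancy and every schema contains the same table structure, changing the table structure across all schemas at once causes performance problems. That scenario does not apply here, so the issue can be ignored.

Reference columns, generated columns, and field constraints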

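When building a blog comment backend, we need to support replying to other people's comments, which means the data contains references. We would normally add a parent_comment_id column to the comments table to store the parent comment's ID. In terms of Schemaless field types, this calls for a reference type.

My design: when a field's type is reference, its value stores the UID of the referenced record, and the field's extra options attribute records which column is actually displayed, as shown in the figure:

[Figure: options of a reference field]

When generating the view, Postgres's json_build_object is used to build the JSON displayed for reference fields. (Once again: Postgres really is powerful!) In that JSON, u is the unique UID of the referenced record, so the frontend can locate that record, and v is the referenced record's display field, shown to users in the frontend table.

In real blog comments, a comment cannot be its own parent. In other words, we need to constrain the value a reference field may point to. I added a constraint attribute to reference fields, where users can enter a JavaScript expression to define the constraint. The expression returns true / false to indicate whether validation passes. Under the hood this uses goja, a JavaScript engine library for Go. I pass the current record into the JavaScript runtime as $this and the referenced record as $that, so for the requirement above we just write $this.uid !== $that.uid to ensure a comment's parent is not itself.

Besides referencing other comments, blog comments also need to show the commenter's avatar. The usual approach is to fetch the Gravatar avatar from the commenter's email address: lowercase the whole address, compute its MD5 hash, and append the hash to https://gravatar.com/avatar/ or another mirror's address. In Postgres this is easy with generated columns: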

CREATE TABLE comments ( email TEXT, email_md5 TEXT GENERATED ALWAYS AS (md5(lower(email))) STORED );

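But what about a Schemaless Table? My first idea was to hook up the JavaScript engine as with field constraints, and run the JavaScript expression when data is added to compute the generated column's value. But that has a problem: if the JavaScript expression is modified, the whole table has to be recomputed and rewritten, which is unacceptable.

In the end I chose to let users write a Postgres SQL fragment that is used as the generated column's definition when the view is created, like this one from the view definition figure earlier: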

md5(lower(sl_records.data ->> 'YXSQhESl'::text)) AS email_md5,

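But if users can write raw SQL, and that SQL is concatenated into the view definition, I would be injected to pieces! Even with a blacklist filtering special characters and keywords, someone could come up with a bypass I don't know about. So I use the auxten/postgresql-parser library (Bytebase uses it too) to parse the user's SQL into an AST, then walk every node in the tree and reject the submission if it contains a UNION, a JOIN, or a call to a function outside the whitelist. If someone bypasses the library's parsing rules and gets past my check, that is equivalent to finding a bug in CockroachDB (this AST parser was extracted from the CockroachDB source), and I will happily go claim a CVE with it. 😂

In the actual code, since postgresql-parser can only parse complete SQL statements while the user enters a fragment like md5(lower(email)), I prepend a SELECT to the input before parsing. A field name like email, given without any context, is parsed as a *tree.UnresolvedName node. I need to replace the value of these nodes with a JSON accessor such as sl_records.data ->> 'YXSQhESl'::text, but modifying the node directly produces: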

md5(lower("sl_records.data ->> 'YXSQhESl'::text"))

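The whole thing gets wrapped in double quotes, so Postgres parses it as a single column name. I couldn't find a way to change the node's properties inside Walk, so I ended up with a rather ugly hack: wrap the replacement in a separator on both sides, then find the separator in the final generated SQL and remove it together with the adjacent " quote. (Which reminds me of PHP deserialization character escaping...)

The final implementation is roughly as follows. The function whitelist currently only allows a handful of hash and string functions. I also wrote quite a few unit tests for this function's safety. Hopefully there are no holes...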

var whiteFunctions = []string{
	"md5", "sha1", "sha256", "sha512",
	"concat", "substring", "substr", "length", "lower", "upper",
}

func SterilizeExpression(ctx context.Context, input string, allowFields map[string]string) (string, error) {
	w := &walk.AstWalker{
		Fn: func(ctx interface{}, node interface{}) (stop bool) {
			switch v := node.(type) {
			...
			case *tree.UnresolvedName:
				inputFields = append(inputFields, v.String())
				// HACK: We add separator to get the field name.
				v.Parts[0] = "!<----!" + allowFields[v.Parts[0]] + "!---->!"
			...
			return false
		},
	}
	...

	// Remove the separator.
	sql = strings.ReplaceAll(sql, `"!<----!`, "")
	sql = strings.ReplaceAll(sql, `!---->!"`, "")
	return sql, nil
}

API design

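With the Schemaless implementation covered, let's look at how custom API endpoints are implemented. I'll go straight to the frontend editing page and walk through it piece by piece.

[Figure: API endpoint editing page]

Following Pocketbase, which I have used before, I divide endpoints into five types: LIST, VIEW, CREATE, UPDATE, and DELETE. Note that these have nothing to do with HTTP verbs or database DML operations; they are business-level definitions. LIST returns multiple records, VIEW fetches a single record, CREATE adds data, UPDATE modifies data, and DELETE removes data.

Just as we define routes when writing a backend, each API endpoint has its request method and path. Each endpoint also declares the fields it accepts from the GET query and the POST body. Besides an English parameter name, each field needs a human-readable label, which is shown in validation error messages.

Then we pick a Schemaless table as the data source (I remember Dreamweaver calling this a "recordset") and map the incoming parameters to the table's fields, which completes the data operation. As for the request as a whole, we may want rate limiting or a captcha before it starts, and a notification email or a WebHook after it ends, so configurable route middleware is needed as well.

Two parts are worth discussing here: the filter rules for the data source, and configuring route middleware by drag and drop in the frontend.

Filter DSL

Our endpoints often need something like ?id=1 to select a specific record; to be precise, this comes up in the LIST, VIEW, UPDATE, and DELETE types. In code, all CRUD operations on Schemaless tables end up as SQL built and executed by GORM. "Filtering" corresponds to WHERE in a query, and to the Where method in GORM. Once the user has edited a filter in the frontend, it must be "translated" into a GORM Where condition (a variable of type clause.Expression).

I designed a JSON format for expressing Where conditions. A condition is one of two kinds. Unary operators take one or zero arguments, such as the literal true or the negation NOT xxxx. Binary operators are the common ones that take two arguments, such as xxx AND xxx and xxx LIKE xxx.

We define an Operator struct that records the operation type of the current WHERE condition (Type), the argument of a unary operator (Value), and the left and right operands of a binary operator (Left and Right). Note that the left and right operands can themselves be conditions, so building the WHERE clause requires recursive parsing.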

type Operator struct {
	Type  OperatorType    `json:"t"`
	Value json.RawMessage `json:"v,omitempty"`
	Left  *Operator       `json:"l,omitempty"`
	Right *Operator       `json:"r,omitempty"`
}

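The operators are listed below. The binary operators at the top all correspond to SQL operations, while the unary operators at the bottom include two special ones, FIELD and LITERAL. FIELD is resolved to a field of the Schemaless table, and the content of LITERAL is evaluated in the JavaScript engine, with the request's query and body parsed and injected into the JavaScript runtime. A LITERAL whose value is $request.query.id gives you the value of the id query parameter.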

const (
	// Binary operators
	OperatorTypeAnd          OperatorType = "AND"
	OperatorTypeOr           OperatorType = "OR"
	OperatorTypeNotEqual     OperatorType = "<>"
	OperatorTypeEqual        OperatorType = "="
	OperatorTypeGreater      OperatorType = ">"
	OperatorTypeLess         OperatorType = "<"
	OperatorTypeGreaterEqual OperatorType = ">="
	OperatorTypeLessEqual    OperatorType = "<="
	OperatorTypeLike         OperatorType = "LIKE"
	OperatorTypeIn           OperatorType = "IN"

	// Unary operators
	OperatorTypeNot     OperatorType = "NOT"
	OperatorTypeField   OperatorType = "FIELD"
	OperatorTypeLiteral OperatorType = "LITERAL"
)

Take the Filter shown in the frontend screenshot above:

{
  "l": { "t": "FIELD", "v": "raw" },
  "r": { "t": "LITERAL", "v": "$request.query.raw" },
  "t": "="
}

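We parse from the outermost level: apply = to the left and right operands. The left operand is the table's raw field, and the right operand is $request.query.raw, i.e. the query parameter raw. So this whole long structure ends up in the Go code as something like: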

query.Where("raw = ?", ctx.Query["raw"])

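Very elegant, and very safe. For now, though, the frontend Filter is just a text box where you type the Filter JSON yourself; later it will become a fully graphical point-and-click component. (I estimated it would be hard to write, so I'm putting it off for now 🕊)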

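Drag-and-drop route middleware in the frontend

For route middleware, my idea from the start was to package common features as modules that can simply be dragged into place in the frontend. The main data-operation logic is the main middleware, which cannot be removed; the others can be arranged freely.

The backend implementation is simple. Anyone who has read the source of any Go web framework knows it: the same old "onion model" that has been explained to death. Put plainly, it loops over the slice of middleware with for, and as soon as one of them has written a response (ctx.ResponseWriter().Written() is true), the whole request returns. I won't pad the post by pasting the code here.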
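Since the post leaves the code out, here is a minimal sketch of the loop as described. Context, Middleware, and RunChain are hypothetical types invented for this example; the real project uses its web framework's context and checks ctx.ResponseWriter().Written().

```go
package main

import "fmt"

// Context is a toy request context that remembers whether a response was
// written.
type Context struct {
	written bool
	Body    string
}

func (c *Context) Write(body string) { c.Body = body; c.written = true }
func (c *Context) Written() bool     { return c.written }

// Middleware is one named step of the chain.
type Middleware struct {
	Name string
	Run  func(*Context)
}

// RunChain runs the middleware in order and stops as soon as one of them
// has written a response. It returns the names of the steps that ran.
func RunChain(ctx *Context, chain []Middleware) []string {
	var ran []string
	for _, m := range chain {
		m.Run(ctx)
		ran = append(ran, m.Name)
		if ctx.Written() {
			break
		}
	}
	return ran
}

func main() {
	chain := []Middleware{
		{Name: "ratelimit", Run: func(c *Context) {}}, // lets the request through
		{Name: "main", Run: func(c *Context) { c.Write("ok") }},
	}
	ctx := &Context{}
	fmt.Println(RunChain(ctx, chain), ctx.Body)
}
```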

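For the frontend I used the vue3-smooth-dnd library. I compared several Vue drag-and-drop libraries, and this one seemed to have the smoothest animation, plus automatic snapping. I'm very happy with the result:

[Figure: drag-and-drop middleware editor]

I drew the middleware nodes myself: a gray background, with a long thin div behind them acting as the line of the flow. Hovering over a middleware node opens a popup for configuring its parameters. This uses TDesign's Popup component directly, with a Card component inside to stretch the popup to size.

A few final words

Sayrud is now initially complete and deployed. It fully covers my need for a static-blog comment backend; all that is left is hooking up the frontend I wrote! (The comment widget I developed is not live yet, so what you see now is still the ugly, hard-to-use Waline.)

You may also have noticed that the endpoint editing page has a "response format" textarea. It is empty because I haven't found a concise way to define JSON data structures, so for now the response structure of each endpoint is hard-coded. If you have a good idea for this, please let me know.

The project took about a month to develop; I wrote a bit after work whenever I had time. (After work, mind you. At work I diligently put in my 8+ hours and practically live at the Goose Factory.) Because development time was fragmented, and I was sometimes sleepy and not thinking clearly when I got home, I often ended up overturning the previous day's design the next day. But after much stumbling it finally got done! Since it exists purely to meet my own needs, and I haven't yet systematically reviewed and tested its backend field validation, I won't open the site to the public for now. And something that could be lightly customized and used to make a quick buck is certainly not something I'll open-source either.

All in all, Sayrud fulfills the dream I had at 18: I finally built what I had in mind back then. You may have noticed that the project's name is rather suggestive too. Say-RUD sounds like CRUD, which also hints at my plans for the project's future. Hehe 😝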

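Computer security practice: to truly know a thing, you must do it yourself

安全研究GoSSIP

7 months 3 weeks ago

What is learned on paper always feels shallow; to truly know a thing, you must do it yourself

Design patterns: the Single Responsibility Principle

CodeAnalyzer Ultra

7 months 3 weeks ago

The Single Responsibility Principle in design patterns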

Rob T. Lee Chicago's Lurie Children's Hospital RANSOMWARE ATTACK

SANS Digital Forensics and Incident Response

7 months 3 weeks ago

SANS Digital Forensics and Incident Response

Boss, security is not a cost center!!!

天御攻防实验室

7 months 3 weeks ago

No rat race, no incidents!

Breaking V8 Sandbox with Trusted Pointer Table

2019's blog

7 months 3 weeks ago

I recently submitted my academic paper to NDSS 2025, so now it's time to take a break. Right after the deadline came HITCON CTF 2024, so why not take a look as a break? I really haven't played CTF for quite a long time. :)

During the two days, I spent my effort on the V8 Sandbox challenge. I actually haven't worked on V8 for a while. It seems the sandbox feature has now been enabled and incorporated into the bug bounty program, so this was also a chance for me to catch up on recent progress in V8 security. :)

0x00 Abstract

The challenge provides two exploitation primitives: writing a 64-bit value to the entry of the trusted pointer table and leaking the base address of PIE. We can use the first primitive to fake a WasmExportedFunctionData instance, allowing us to set rip to an 8-byte value in the trusted memory region. To control the content in this region, we leverage the immediate number arguments of the bytecode instruction AddSmi.ExtraWide. We can set the rip to point to the immediate numbers in the RWX page, whose address can be leaked by the second primitive, to execute our shellcode.

0x01 Trusted Pointer Table

According to the design documentation of the V8 Sandbox, the trusted pointer table stores references to trusted objects outside the sandbox. If a V8 object inside the sandbox needs to reference a trusted object, it stores a reference (i.e., an index) to an entry of the trusted pointer table. The entry stores a pointer to the trusted object along with a tag value.

Some example entries in the table are shown below. The high 16 bits are the tag, and the low 48 bits are the address of the V8 object. In the example below, we run job on an entry containing a pointer to a WasmExportedFunctionData instance. Another thing to note is that the first 0x2000 indices lie in a read-only page, which seems to be unused.
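The entry layout just described (16-bit tag on top, 48-bit address below) can be sketched in Go. SplitEntry is a hypothetical helper, and the sample value is the +0x0020 entry from the dump in this section.

```go
package main

import "fmt"

// SplitEntry separates a trusted pointer table entry into its tag (high 16
// bits) and object address (low 48 bits).
func SplitEntry(entry uint64) (tag uint16, addr uint64) {
	return uint16(entry >> 48), entry & 0x0000FFFFFFFFFFFF
}

func main() {
	tag, addr := SplitEntry(0x002d158f00040321)
	fmt.Printf("tag=%#x addr=%#x\n", tag, addr) // prints "tag=0x2d addr=0x158f00040321"
}
```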

gef➤ tel 0x007fff54010000
0x007fff54010000│+0x0000: 0x1b158f00040061 ("a"?)
0x007fff54010008│+0x0008: 0x001b158f00040181
0x007fff54010010│+0x0010: 0x001e158f0004021d
0x007fff54010018│+0x0018: 0x002b158f000402fd
0x007fff54010020│+0x0020: 0x002d158f00040321
0x007fff54010028│+0x0028: 0x001b158f00040369
0x007fff54010030│+0x0030: 0x001b158f000c0011
0x007fff54010038│+0x0038: 0x001b158f000403ad
0x007fff54010040│+0x0040: 0x80000000002009 ("\t "?)
0x007fff54010048│+0x0048: 0x8000000000200a ("\n "?)
gef➤ job 0x158f00040321 # Entry `+0x0020:`, low 48 bits are address.
0x158f00040321: [WasmExportedFunctionData]
 - map: 0x336300001e15 <Map[56](WASM_EXPORTED_FUNCTION_DATA_TYPE)>
 - func_ref: 0x336300199ed9 <Other heap object (WASM_FUNC_REF_TYPE)>
 - internal: 0x158f000402fd <Other heap object (WASM_INTERNAL_FUNCTION_TYPE)>
 - wrapper_code: 0x33630003c1b9 <Code BUILTIN JSToWasmWrapper>
 - js_promise_flags: 10
 - instance_data: 0x158f0004021d <Other heap object (WASM_TRUSTED_INSTANCE_DATA_TYPE)>
 - function_index: 0
 - signature: 0x555557fe5ea0
 - wrapper_budget: 1000

0x02 Faking Object

In this challenge, we can rewrite a table entry to an arbitrary value. After some trial and error, it seems that the only entry that can lead to exploitation is the WasmExportedFunctionData shown above. I also spent a long time on other WebAssembly object entries, but none of them worked. For example, I tried WasmInternalFunction, which is simpler and contains the WebAssembly JIT code address directly (e.g., using WebAssembly.Table or calling the victim function from WebAssembly), but the JIT code address referenced by this entry never seems to be used. The main reason I spent so long is that I didn't know Sandbox.base can provide the base address of the sandboxed memory region. As a result, I wasted a lot of time trying to fake an object using JIT immediate numbers on the RWX page, whose address can be leaked from the provided PIE address, and that is pretty hard, especially when the instance is as complex as WasmExportedFunctionData. In the end, my teammate provided a PoC that pointed me to this leak.

The Sandbox instance provides many exploitation primitives, including arbitrary read and write in the sandboxed memory region. Therefore, we can easily fake a WasmExportedFunctionData instance in the sandboxed memory region, for example using a double array, and obtain the address of the faked object. We can simply copy the content of the real WasmExportedFunctionData, except that we want the internal field to point to the address containing our controlled data. In addition, the signature field should point to a buffer filled with zeros.

0x03 Controlling Trusted Region Content

The good news is that controlling the internal field lets us control rip, because a code address is read from the internal pointer and loaded into rip. The bad news is that the field is only an offset into the trusted memory region. In other words, we can only control the low 32 bits of the pointer, while the high 32 bits are fixed to the base address of the trusted memory region.

Therefore, to control rip, we must somehow control some memory content in the trusted memory region, 8 bytes to be specific. However, most objects with controllable content live not in the trusted memory region but in the sandboxed memory region. After some investigation, we found that the arguments of AddSmi.ExtraWide can give us such memory control. Specifically, the opcode takes two arguments: the first is an immediate number we can control, and the second is an index. I am not sure exactly what the second argument is, but I noticed that it equals the number of AddSmi instructions appearing earlier in the function. Therefore, if we insert x AddSmi instructions before AddSmi.ExtraWide, the value will be x. Using this approach, we can control 6 bytes in the trusted region, with the two following bytes being zero. An example of AddSmi.ExtraWide is shown below. The bytes that can be used to set rip are in parentheses.
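To make the operand layout concrete, the eight bytes in parentheses in this section's AddSmi.ExtraWide example can be decoded as little-endian values. DecodeOperands is a hypothetical helper written for this note. It shows that the two 32-bit operands, read together as one 64-bit word, form a pointer-sized value whose top two bytes are zero.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// DecodeOperands reads the 8 operand bytes of AddSmi.ExtraWide as a signed
// 32-bit immediate, a 32-bit index, and as a single 64-bit little-endian
// word (the value rip would be loaded with).
func DecodeOperands(b []byte) (imm int32, index uint32, asPointer uint64) {
	imm = int32(binary.LittleEndian.Uint32(b[0:4]))
	index = binary.LittleEndian.Uint32(b[4:8])
	asPointer = binary.LittleEndian.Uint64(b[0:8])
	return
}

func main() {
	operands := []byte{0xc2, 0x12, 0x00, 0xe0, 0x55, 0x55, 0x00, 0x00}
	imm, index, ptr := DecodeOperands(operands)
	// prints "imm=-536866110 index=21845 ptr=0x5555e00012c2"
	fmt.Printf("imm=%d index=%d ptr=%#x\n", imm, index, ptr)
}
```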

01 47 (c2 12 00 e0 55 55 00 00) AddSmi.ExtraWide [-536866110], [21845]

0x04 Executing Shellcode

Finally, we can set rip to our shellcode. We can use an approach similar to the one I used previously to construct the shellcode inside the immediate numbers of a JIT-compiled JavaScript function. Fortunately, copying that function directly still gave the correct shellcode, since the offsets between the immediate numbers have stayed the same after two years. The difference is that the JIT-compiled JavaScript function is now stored in an RWX page whose address is not very random given the high 32 bits of the PIE base address, so with such a leak we can easily set rip to the shellcode.

It seems that the offset of the shellcode relative to the RWX page can differ across platforms. The problem does not appear in the original exploit that I used in the CTF, but it does appear in the simplified exploit that I prepared for this write-up. I am not actually sure why.

Finally, see the exploit here.

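Will AI become a god that controls the world, or the electricity that powers everything?

表图

7 months 3 weeks ago

This article reviews the history of AI, from symbolic processing to deep learning, and analyzes the limitations of large language models and the different strategies of the tech giants in AI. The future of AI is full of uncertainty, but its potential to augment human capabilities is enormous.

Summer special | Privacy computing summer camp, registration now open!

安全研究GoSSIP

7 months 3 weeks ago

An upgraded team-study format, with multiple privacy computing topics to choose from!

Filler post

青衣十三楼飞花堂

7 months 3 weeks ago

This post is meaningless, pure filler

Liu Hongshan: large models are reshaping the security product experience

小迪随笔

7 months 3 weeks ago

A Children's Day gift for my daughter: a speech by 虎妞's dad

Weekly threat intelligence report (7.8~7.14)

微步在线研究响应中心

7 months 3 weeks ago

The week's intelligence at a glance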

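The JuiceFS CSI workflow illustrated: what happens behind the scenes when K8s creates a pod with a PV (2024)

ARTHURCHIAO'S BLOG

7 months 3 weeks ago

JuiceFS is a distributed file system built on top of object storage (S3, Ceph, OSS, etc.). Put simply,

Object storage: can only be used in a key/value fashion;
File system: the files and directories we see every day, supporting file operations such as ls/cat/find/truncate.

From a high level, this post walks through what the k8s and juicefs components do behind the scenes in the JuiceFS CSI solution when a pod with a PV is created and when the pod subsequently reads and writes the PV, as a quick way to understand the K8s CSI mechanism and the basic working principles of JuiceFS.

Given my limited expertise and maintenance time, this post inevitably contains errors or outdated content; please use your own judgment. Spread knowledge, respect labor, be over eighteen, and credit the source when reposting.

1 Background
2 What the k8s and juicefs components do when a pod using a PV is created
3 How a business pod reads and writes a juicefs volume
4 Summary
References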

1 Background

A few basics; skip ahead if you already have the background.

1.1 K8s CSI (Container Storage Interface)

The Container Storage Interface (CSI) is a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems (COs) like Kubernetes.

https://kubernetes-csi.github.io/docs/

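CSI is a container storage mechanism supported by K8s. It is highly extensible: a storage solution only has to implement a set of interfaces defined by the spec to be integrated into k8s and provide storage.

Generally, a storage solution deploys a service called the "CSI plugin" on every node, and kubelet calls this plugin while creating containers that use a PV. Note the difference, though:

The K8s network plugin (CNI plugin) is an executable; dropping it into /opt/cni/bin/ is enough, and kubelet runs the executable directly when setting up pod networking;
The K8s storage plugin (CSI plugin) is a service (in some sense, "agent" is an easier way to think of it), and kubelet calls it over gRPC when initializing a PV.

1.2 FUSE (Filesystem in Userspace)

FUSE is a userspace file system mechanism that makes it very convenient for users to develop their own file systems.

Rather than drawing a new diagram, I borrow lxcfs (unrelated to juicefs, but also a FUSE file system) to show how FUSE basically works: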


Fig. lxcfs/fuse workflow: how a read operation is handled [2]

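JuiceFS implements a userspace file system based on FUSE.

The following is lightly edited from the community documentation:

Traditionally, implementing a FUSE file system requires the Linux libfuse library, which offers two kinds of API:

high-level API: based on file names and paths.

libfuse emulates a VFS tree internally and exposes path-based APIs.

This suits systems whose metadata is itself exposed through path-based APIs, such as HDFS or S3. If the metadata is itself an inode-based directory tree, the inode → path → inode conversion hurts performance.

low-level API: based on inodes. The kernel VFS interacts with the FUSE library through the low-level API.

JuiceFS organizes its metadata by inode, so it is implemented with the low-level API (relying on go-fuse rather than libfuse), which is simple, natural, and performs well.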

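1.3 The three working modes of JuiceFS

JuiceFS can work or be deployed in several ways:

Process mount mode

The JuiceFS client runs inside the CSI Node plugin container, and every JuiceFS PV that needs to be mounted is mounted as a process inside that container.

CSI mode, which comes in two variants:
1. mountpod mode: on each node, the CSI plugin dynamically creates a caretaker pod for every PV used by a local pod.
  - The mount pod is per-PV rather than per-business-pod: if several business pods on a node use the same PV, there is only one mount pod, as the figure below shows.
    
    Fig. JuiceFS as K8s CSI solution: workflow when a business pod is created (JuiceFS mountpod mode).
  - The mount pod contains the juicefs client and performs juicefs reads and writes on behalf of business pods. To make the name more self-explanatory, the rest of this post calls the mount pod the dynamic client pod, or client pod.
  - This is the default working mode of JuiceFS CSI.
  - FUSE requires the mount pod to be privileged.
  - A client pod restart makes reads and writes unavailable to business pods for a while, but they resume once the client pod has recovered.
2. CSI sidecar mode: a sidecar container is created for every business pod that uses a juicefs PV.
  - The sidecar is per-pod.
  - Note that the sidecar is not created by the JuiceFS plugin. The CSI Controller registers a webhook to watch for container changes; when a pod is created, the webhook automatically injects a sidecar into the pod YAML, much like Istio injecting an Envoy container into pods.
  - If the sidecar restarts, the business pod has to be rebuilt to recover.
  - It also depends on FUSE, so the sidecar needs to be privileged. Every sidecar can then see all devices on the node, which is risky, so this mode is not recommended.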

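1.4 Summary

With this background, let's look at what the k8s and juicefs components do when a business pod that requires a mounted PV is created in k8s.

2 What the k8s and juicefs components do when a pod using a PV is created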

Fig. JuiceFS as K8s CSI solution: workflow when a business pod is created (JuiceFS mountpod mode).

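Step 1: kubelet starts and watches pod resource changes in the cluster

As the k8s agent on every node, kubelet watches pod resource changes across the whole k8s cluster after it starts. Specifically, whenever a pod create/update/delete event occurs in kube-apiserver, kubelet receives it immediately.

Step 2: kubelet receives the business pod creation event and starts creating the pod

After receiving a pod create event, kubelet first checks whether the pod falls under its responsibility (whether nodeName in the spec is this node); if so, it starts creating the pod.

Step 2.1 Creating the business pod: initialization

kubelet.INFO has fairly detailed logs: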

10:05:57.410 Receiving a new pod "pod1(<pod1-id>)"
10:05:57.411 SyncLoop (ADD, "api"): "pod1(<pod1-id>)"
10:05:57.411 Needs to allocate 2 "nvidia.com/gpu" for pod "<pod1-id>" container "container1"
10:05:57.411 Needs to allocate 1 "our-corp.com/ip" for pod "<pod1-id>" container "container1"
10:05:57.413 Cgroup has some missing paths: [/sys/fs/cgroup/pids/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/systemd/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/cpuset/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/memory/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/hugetlb/kubepods/burstable/pod<pod1-id>]
10:05:57.413 Cgroup has some missing paths: [/sys/fs/cgroup/memory/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/systemd/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/hugetlb/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/pids/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/cpuset/kubepods/burstable/pod<pod1-id>]
10:05:57.413 Cgroup has some missing paths: [/sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/pids/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/cpuset/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/systemd/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/memory/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod<pod1-id> /sys/fs/cgroup/hugetlb/kubepods/burstable/pod<pod1-id>]
10:05:57.415 Using factory "raw" for container "/kubepods/burstable/pod<pod1-id>"
10:05:57.415 Added container: "/kubepods/burstable/pod<pod1-id>" (aliases: [], namespace: "")
10:05:57.419 Waiting for volumes to attach and mount for pod "pod1(<pod1-id>)"
10:05:57.432 SyncLoop (RECONCILE, "api"): "pod1(<pod1-id>)"
10:05:57.471 Added volume "meminfo" (volSpec="meminfo") for pod "<pod1-id>" to desired state.
10:05:57.471 Added volume "cpuinfo" (volSpec="cpuinfo") for pod "<pod1-id>" to desired state.
10:05:57.471 Added volume "stat" (volSpec="stat") for pod "<pod1-id>" to desired state.
10:05:57.480 Added volume "share-dir" (volSpec="pvc-6ee43741-29b1-4aa0-98d3-5413764d36b1") for pod "<pod1-id>" to desired state.
10:05:57.484 Added volume "data-dir" (volSpec="juicefs-volume1-pv") for pod "<pod1-id>" to desired state.
...

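We can see that it handles the resources the pod needs one by one:

Devices, e.g. GPUs;
IP address;
cgroup resource isolation configuration;
volumes.

This post focuses on volume resources.

Step 2.2 Handling the volumes the pod depends on

The logs above show that the business pod declares several volumes to be mounted. They fall into a few types:

hostpath: mounts a node path directly into the container;
lxcfs: used to solve the resource view problem [2];
dynamic/static PV

The JuiceFS volume in this post is of the PV type. Let's continue with the kubelet logs: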

# kubelet.INFO
10:05:57.509 operationExecutor.VerifyControllerAttachedVolume started for volume "xxx"
10:05:57.611 Starting operationExecutor.MountVolume for volume "xxx" (UniqueName: "kubernetes.io/host-path/<pod1-id>-xxx") pod "pod1" (UID: "<pod1-id>")
10:05:57.611 operationExecutor.MountVolume started for volume "juicefs-volume1-pv" (UniqueName: "kubernetes.io/csi/csi.juicefs.com^juicefs-volume1-pv") pod "pod1" (UID: "<pod1-id>")
10:05:57.611 kubernetes.io/csi: mounter.GetPath generated [/var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount]
10:05:57.611 kubernetes.io/csi: created path successfully [/var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv]
10:05:57.611 kubernetes.io/csi: saving volume data file [/var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/vol_data.json]
10:05:57.611 kubernetes.io/csi: volume data file saved successfully [/var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/vol_data.json]
10:05:57.613 MountVolume.MountDevice succeeded for volume "juicefs-volume1-pv" (UniqueName: "kubernetes.io/csi/csi.juicefs.com^juicefs-volume1-pv") pod "pod1" (UID: "<pod1-id>") device mount path "/var/lib/k8s/kubelet/plugins/kubernetes.io/csi/pv/juicefs-volume1-pv/globalmount"
10:05:57.616 kubernetes.io/csi: mounter.GetPath generated [/var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount]
10:05:57.616 kubernetes.io/csi: Mounter.SetUpAt(/var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount)
10:05:57.616 kubernetes.io/csi: created target path successfully [/var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount]
10:05:57.618 kubernetes.io/csi: calling NodePublishVolume rpc [volid=juicefs-volume1-pv,target_path=/var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount]
10:05:57.713 Starting operationExecutor.MountVolume for volume "juicefs-volume1-pv" (UniqueName: "kubernetes.io/csi/csi.juicefs.com^juicefs-volume1-pv") pod "pod1" (UID: "<pod1-id>")
...
10:05:59.506 kubernetes.io/csi: mounter.SetUp successfully requested NodePublish [/var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount]
10:05:59.506 MountVolume.SetUp succeeded for volume "juicefs-volume1-pv" (UniqueName: "kubernetes.io/csi/csi.juicefs.com^juicefs-volume1-pv") pod "pod1" (UID: "<pod1-id>")
10:05:59.506 kubernetes.io/csi: mounter.GetPath generated [/var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount]

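For each volume, the following run in order:

operationExecutor.VerifyControllerAttachedVolume(), which performs some checks;
operationExecutor.MountVolume(), which mounts the given volume into the container directory;
For CSI storage, the CSI plugin's NodePublishVolume() is also called to initialize the corresponding PV; JuiceFS works this way.

kubelet then keeps checking whether all volumes have been mounted, and does not proceed to the next step (creating the sandbox container) until they are.

Step 3: kubelet --> CSI plugin (juicefs): set up the PV

Let's look further at the logic by which the node CSI plugin initializes the PV mount. The call stack: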

         gRPC NodePublishVolume()
kubelet ---------------------------> juicefs node plugin (also called "driver", etc)

Step 4: What the JuiceFS CSI plugin does

Let's look at the JuiceFS CSI node plugin logs, here directly on the machine:

(node) $ docker logs --timestamps k8s_juicefs-plugin_juicefs-csi-node-xxx | grep juicefs-volume1
10:05:57.619 NodePublishVolume: volume_id is juicefs-volume1-pv
10:05:57.619 NodePublishVolume: creating dir /var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount
10:05:57.620 ceFormat cmd: [/usr/local/bin/juicefs format --storage=OSS --bucket=xx --access-key=xx --secret-key=${secretkey} --token=${token} ${metaurl} juicefs-volume1]
10:05:57.874 Format output is juicefs <INFO>: Meta address: tikv://node1:2379,node2:2379,node3:2379/juicefs-volume1
10:05:57.874 cefs[1983] <INFO>: Data use oss://<bucket>/juicefs-volume1/
10:05:57.875 Mount: mounting "tikv://node1:2379,node2:2379,node3:2379/juicefs-volume1" at "/jfs/juicefs-volume1-pv" with options [token=xx]
10:05:57.884 createOrAddRef: Need to create pod juicefs-node1-juicefs-volume1-pv.
10:05:57.891 createOrAddRed: GetMountPodPVC juicefs-volume1-pv, err: %!s(<nil>)
10:05:57.891 ceMount: mount tikv://node1:2379,node2:2379,node3:2379/juicefs-volume1 at /jfs/juicefs-volume1-pv
10:05:57.978 createOrUpdateSecret: juicefs-node1-juicefs-volume1-pv-secret, juicefs-system
10:05:59.500 waitUtilPodReady: Pod juicefs-node1-juicefs-volume1-pv is successful
10:05:59.500 NodePublishVolume: binding /jfs/juicefs-volume1-pv at /var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount with options []
10:05:59.505 NodePublishVolume: mounted juicefs-volume1-pv at /var/lib/k8s/kubelet/pods/<pod1-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount with options []

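As we can see, NodePublishVolume() was indeed executed. Each CSI plugin implements this method on its own, so what it does depends heavily on the storage solution. Next, let's see exactly what the JuiceFS plugin does.

Step 4.1 Create the mount path for the pod's PV and initialize the volume

With the default configuration, each pod has a corresponding storage path on the node:

(node) $ ll /var/lib/k8s/kubelet/pods/<pod-id>
containers/
etc-hosts
plugins/
volumes/

The juicefs plugin creates a subdirectory and mount point for the PV inside the volumes/ directory above:

/var/lib/k8s/kubelet/pods/{pod1-id}/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount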
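The layout of that path can be expressed as a small helper. CSIMountPath is a hypothetical name used only for illustration; the kubelet root /var/lib/k8s/kubelet is the one seen in this environment's logs and differs between clusters.

```go
package main

import (
	"fmt"
	"path"
)

// CSIMountPath builds the per-pod mount point of a CSI volume following the
// layout seen in the kubelet logs:
// <kubelet root>/pods/<pod id>/volumes/kubernetes.io~csi/<pv name>/mount
func CSIMountPath(kubeletRoot, podID, pvName string) string {
	return path.Join(kubeletRoot, "pods", podID, "volumes", "kubernetes.io~csi", pvName, "mount")
}

func main() {
	fmt.Println(CSIMountPath("/var/lib/k8s/kubelet", "<pod1-id>", "juicefs-volume1-pv"))
}
```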

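It then formats the volume with the juicefs command-line tool:

$ /usr/local/bin/juicefs format --storage=OSS --bucket=xx --access-key=xx --secret-key=${secretkey} --token=${token} ${metaurl} juicefs-volume1

For example, if JuiceFS is backed by Alibaba Cloud OSS, the parameters above are the Alibaba Cloud bucket address and access keys.

Step 4.2 Write the volume mount information to the MetaServer

In addition, the mount information is synced to the JuiceFS MetaServer, which here is TiKV. I won't go into detail: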

Fig. JuiceFS as K8s CSI solution: workflow when a business pod is created (JuiceFS mountpod mode).

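Step 4.3 JuiceFS plugin: create a client pod if none exists

The JuiceFS CSI plugin checks whether a client pod for this PV already exists on the node. If not, it creates one; if so, nothing needs to be created.

When the last business pod using a given PV on the node is destroyed, the corresponding client pod is automatically deleted by the juicefs CSI plugin as well.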
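The create-if-missing and delete-on-last-release behavior amounts to per-PV reference counting. The sketch below is a conceptual model only, not JuiceFS source code: ClientPods, AddRef, and Release are hypothetical names, and the pod name format is inferred from the names seen in the logs (juicefs-node1-juicefs-volume1-pv).

```go
package main

import "fmt"

// ClientPods tracks, per PV, which business pods on one node use it.
type ClientPods struct {
	node string
	refs map[string]map[string]bool // PV name -> set of business pod IDs
}

func NewClientPods(node string) *ClientPods {
	return &ClientPods{node: node, refs: map[string]map[string]bool{}}
}

// PodName derives the client pod name for a PV on this node.
func (c *ClientPods) PodName(pv string) string {
	return "juicefs-" + c.node + "-" + pv
}

// AddRef records that podID uses pv. created is true when no client pod
// existed for this PV yet, i.e. one has to be created now.
func (c *ClientPods) AddRef(pv, podID string) (name string, created bool) {
	users, ok := c.refs[pv]
	if !ok {
		users = map[string]bool{}
		c.refs[pv] = users
	}
	users[podID] = true
	return c.PodName(pv), !ok
}

// Release drops podID's reference. deleted is true when it was the last
// user, i.e. the client pod can be removed.
func (c *ClientPods) Release(pv, podID string) (deleted bool) {
	users, ok := c.refs[pv]
	if !ok {
		return false
	}
	delete(users, podID)
	if len(users) == 0 {
		delete(c.refs, pv)
		return true
	}
	return false
}

func main() {
	pods := NewClientPods("node1")
	fmt.Println(pods.AddRef("juicefs-volume1-pv", "pod1")) // first user: create
	fmt.Println(pods.AddRef("juicefs-volume1-pv", "pod2")) // second user: reuse
	fmt.Println(pods.Release("juicefs-volume1-pv", "pod1"))
	fmt.Println(pods.Release("juicefs-volume1-pv", "pod2")) // last user: delete
}
```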

Our environment uses the dynamic client pod mode, so we see the following logs:

(node) $ docker logs --timestamps <csi plugin container> | grep ...
10:05:57.884 createOrAddRef: Need to create pod juicefs-node1-juicefs-volume1-pv.
10:05:57.891 createOrAddRed: GetMountPodPVC juicefs-volume1-pv, err: %!s(<nil>)
10:05:57.891 ceMount: mount tikv://node1:2379,node2:2379,node3:2379/juicefs-volume1 at /jfs/juicefs-volume1-pv
10:05:57.978 createOrUpdateSecret: juicefs-node1-juicefs-volume1-pv-secret, juicefs-system
10:05:59.500 waitUtilPodReady:

The JuiceFS node plugin creates a dynamic client pod named juicefs-{node}-{volume}-pv in k8s.

Fig. JuiceFS as K8s CSI solution: workflow when a business pod is created (JuiceFS mountpod mode).

Step 5: kubelet observes the client pod creation event

At this point kubelet has not finished creating the business pod, and the juicefs client pod that "serves" it now shows up "asking to be created" too:

(node) $ grep juicefs-<node>-<volume>-pv /var/log/kubernetes/kubelet.INFO | grep "received "
10:05:58.288 SyncPod received new pod "juicefs-node1-volume1-pv_juicefs-system", will create a sandbox for it

So next comes the process of creating the juicefs dynamic client pod.

Supplies go ahead of the troops. Until the juicefs client pod is ready, the business pod cannot read or write the juicefs volume even if it is up.

Step 6: kubelet creates the client pod

Creating the client pod works much like creating the business pod, but this pod is simpler, so we skip the details and assume it comes up directly.

Let's check the process running inside the client pod:

(node) $ dk top k8s_jfs-mount_juicefs-node1-juicefs-volume1-pv-xx
/bin/mount.juicefs ${metaurl} /jfs/juicefs-volume1-pv -o enable-xattr,no-bgjob,allow_other,token=xxx,metrics=0.0.0.0:9567

/bin/mount.juicefs is really just an alias pointing to the juicefs executable:

(pod) $ ls -ahl /bin/mount.juicefs
/bin/mount.juicefs -> /usr/local/bin/juicefs

Step 7: client pod initialization and FUSE mount

Let's see what the client pod did:

root@node:~ # dk top k8s_jfs-mount_juicefs-node1-juicefs-volume1-pv-xx
<INFO>: Meta address: tikv://node1:2379,node2:2379,node3:2379/juicefs-volume1
<INFO>: Data use oss://<oss-bucket>/juicefs-volume1/
<INFO>: Disk cache (/var/jfsCache/<id>/): capacity (10240 MB), free ratio (10%), max pending pages (15)
<INFO>: Create session 667 OK with version: admin-1.2.1+2022-12-22.34c7e973
<INFO>: listen on 0.0.0.0:9567
<INFO>: Mounting volume juicefs-volume1 at /jfs/juicefs-volume1-pv ...
<INFO>: OK, juicefs-volume1 is ready at /jfs/juicefs-volume1-pv

It initializes the local volume configuration;
It interacts with the MetaServer;
It exposes prometheus metrics;
Using juicefs's own mount implementation (the /bin/mount.juicefs seen earlier), it mounts the volume at /jfs/juicefs-volume1-pv, which by default corresponds to /var/lib/juicefs/volume/juicefs-volume1-pv.

At this point the following mount entries are visible on the node:

(node) $ cat /proc/mounts | grep JuiceFS:juicefs-volume1
JuiceFS:juicefs-volume1 /var/lib/juicefs/volume/juicefs-volume1-pv fuse.juicefs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
JuiceFS:juicefs-volume1 /var/lib/k8s/kubelet/pods/<pod-id>/volumes/kubernetes.io~csi/juicefs-volume1-pv/mount fuse.juicefs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0

We can see these are fuse.juicefs mounts. If you have forgotten how FUSE basically works, here is lxcfs again as a quick reminder:

Fig. lxcfs/fuse workflow: how a read operation is handled [2]

Once this dynamic client pod has been created, read and write operations from the business pod (which does not exist yet at this point) all enter the FUSE module and are then forwarded to the userspace juicefs client for handling. The juicefs client implements the corresponding read and write methods for each kind of object store.
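The /proc/mounts check from this step can also be done programmatically. The sketch below parses mount table text and keeps the fuse.juicefs entries; Mount and JuiceFSMounts are hypothetical names, and the sample input in main is shortened from the output shown in this step.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Mount holds the first three columns of a /proc/mounts line.
type Mount struct {
	Source, Target, FSType string
}

// JuiceFSMounts returns every entry whose file system type is fuse.juicefs.
func JuiceFSMounts(procMounts string) []Mount {
	var out []Mount
	sc := bufio.NewScanner(strings.NewReader(procMounts))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) >= 3 && f[2] == "fuse.juicefs" {
			out = append(out, Mount{Source: f[0], Target: f[1], FSType: f[2]})
		}
	}
	return out
}

func main() {
	sample := "proc /proc proc rw,nosuid 0 0\n" +
		"JuiceFS:juicefs-volume1 /var/lib/juicefs/volume/juicefs-volume1-pv fuse.juicefs rw,relatime 0 0\n"
	for _, m := range JuiceFSMounts(sample) {
		fmt.Println(m.Source, "->", m.Target)
	}
}
```

On a real node the input would come from reading /proc/mounts rather than from a string literal.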

Step 8: kubelet creates the business pod: the remaining steps

At this point all volumes the pod depends on have been handled, and kubelet prints a log line:

# kubelet.INFO
10:06:06.119 All volumes are attached and mounted for pod "pod1(<pod1-id>)"

Creation of the business pod can now continue:

# kubelet.INFO
10:06:06.119 No sandbox for pod "pod1(<pod1-id>)" can be found. Need to start a new one
10:06:06.119 Creating PodSandbox for pod "pod1(<pod1-id>)"
10:06:06.849 Created PodSandbox "885c3a" for pod "pod1(<pod1-id>)"
...

Summary

For a more detailed account of pod creation, see [1].

3 How a business pod reads and writes a juicefs volume

The juicefs dynamic client pod is created before the business pod, so once the business pod is up it can read and write the juicefs PV (volume) directly.

Fig. JuiceFS as K8s CSI solution: workflow when a business pod reads/writes (JuiceFS mountpod mode).

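The process can be roughly divided into four steps.

Step 1: the pod reads or writes files (R/W operations)

For example, enter the volume path inside the pod (e.g. cd /data/juicefs-pv-dir/) and run operations such as ls and find.

Step 2: the R/W request is hooked by the FUSE module and handed to the juicefs client

Two official diagrams give a brief explanation [3]; they also reveal some information about steps 3 and 4 that follow:

Read operations:

Fig. JuiceFS Internals: read operations.

Write operations:

Fig. JuiceFS Internals: write operations.

Step 3: the juicefs client pod reads (file or directory) metadata from the meta server

The diagrams above already reveal parts of JuiceFS's metadata design, such as chunks, slices, and blocks. During reads and writes, the client exchanges the relevant metadata with the MetaServer.

Step 4: the juicefs client pod reads and writes files in the object store

In this step the client reads and writes files in an object store such as S3.

4 Summary

That is the flow of creating a pod with a PV, and of that pod reading and writing the PV, when JuiceFS is used as the k8s CSI plugin. For reasons of space many details were omitted; if you are interested, see the references.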

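References

[1] Source code walkthrough: what happens behind the scenes when K8s creates a pod (series) (2021)
[2] How Linux containers work under the hood: from 500 lines of C to a production-grade container runtime (2023)
[3] Official documentation: read/write request processing flow, juicefs.com
[4] kubernetes-csi.github.io/docs/, K8s CSI documentation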

Crooks Steal Phone, SMS Records for Nearly All AT&T Customers

Krebs on Security

7 months 3 weeks ago

AT&T Corp. disclosed today that a new data breach has exposed phone call and text message records for roughly 110 million people -- nearly all of its customers. AT&T said it delayed disclosing the incident in response to "national security and public safety concerns," noting that some of the records included data that could be used to determine where a call was made or text message sent. AT&T also acknowledged the customer records were exposed in a cloud database that was protected only by a username and password (no multi-factor authentication needed).

BrianKrebs