[{"data":1,"prerenderedAt":162},["ShallowReactive",2],{"article-17":3},{"id":4,"title":5,"body":6,"create":151,"description":12,"extension":152,"labels":153,"locked":155,"meta":156,"navigation":157,"path":158,"seo":159,"stem":160,"update":151,"__hash__":161},"articles/article/17.md","Building a .git Directory Leak Recovery Tool in Rust",{"type":7,"value":8,"toc":148},"minimark",[9,13,24,32,42,49,63,69,80,90,96,103,117,120,126,141],[10,11,12],"p",{},"The previous article showed, in principle, that every file in a Git repository can be recovered from a .git/ directory exposed by a website. In practice, though, I only tested against an ordinary repository, and the leaked directory itself returned 403 Forbidden when requested in a browser.",[10,14,15,16,23],{},"These files can be extracted by hand, but a larger Git repository contains so many of them that manual work is slow and error-prone. Since the procedure is known and consists of repetitive, regular steps, the downloads can be automated in code. I have been hooked on Rust lately, so this tool is written in Rust as well. For an earlier tool I had used the git2 library to operate on Git repositories directly, which spared me from implementing that logic myself and made development easy. This time, however, I hit a wall: git2 does not fit this tool at all, so I chose the ",[17,18,22],"a",{"href":19,"rel":20},"https://github.com/GitoxideLabs/gitoxide",[21],"nofollow","gitoxide"," library instead. It splits Git's low-level operations into many small crates, which suits this project well.",[10,25,26,27,31],{},"As the previous article established, we first need the branch names from the repository's config file before we can look up each branch's object ID. The ",[28,29,30],"code",{},"gix-config"," crate can parse that file; its parser emits a stream of events, defined as follows:",[33,34,40],"pre",{"className":35,"code":37,"language":38,"meta":39},[36],"language-rust","pub enum Event\u003C'a> {\n    /// A comment with a comment tag and the comment itself. Note that the\n    /// comment itself may contain additional whitespace and comment markers\n    /// at the beginning, like `# comment` or `; comment`.\n    Comment(Comment\u003C'a>),\n    /// A section header containing the section name and a subsection, if it\n    /// exists. For instance, `remote \"origin\"` is parsed to `remote` as section\n    /// name and `origin` as subsection name.\n    SectionHeader(section::Header\u003C'a>),\n    /// A name to a value in a section, like `url` in `remote.origin.url`.\n    SectionValueName(section::ValueName\u003C'a>),\n    /// A completed value. 
This may be any single-line string, including the empty string\n    /// if an implicit boolean value is used.\n    /// Note that these values may contain spaces and any special character. This value is\n    /// also unprocessed, so it may contain double quotes that should be\n    /// [normalized][crate::value::normalize()] before interpretation.\n    Value(Cow\u003C'a, BStr>),\n    /// Represents any token used to signify a newline character. On Unix\n    /// platforms, this is typically just `\\n`, but can be any valid newline\n    /// *sequence*. Multiple newlines (such as `\\n\\n`) will be merged as a single\n    /// newline event containing a string of multiple newline characters.\n    Newline(Cow\u003C'a, BStr>),\n    /// Any value that isn't completed. This occurs when the value is continued\n    /// onto the next line by ending it with a backslash.\n    /// A [`Newline`][Self::Newline] event is guaranteed after, followed by\n    /// either a ValueDone, a Whitespace, or another ValueNotDone.\n    ValueNotDone(Cow\u003C'a, BStr>),\n    /// The last line of a value which was continued onto another line.\n    /// With this it's possible to obtain the complete value by concatenating\n    /// the prior [`ValueNotDone`][Self::ValueNotDone] events.\n    ValueDone(Cow\u003C'a, BStr>),\n    /// A continuous section of insignificant whitespace.\n    ///\n    /// Note that values with internal whitespace will not be separated by this event,\n    /// hence interior whitespace there is always part of the value.\n    Whitespace(Cow\u003C'a, BStr>),\n    /// This event is emitted when the parser counters a valid `=` character\n    /// separating the key and value.\n    /// This event is necessary as it eliminates the ambiguity for whitespace\n    /// events between a key and value event.\n    
KeyValueSeparator,\n}\n","rust","",[28,41,37],{"__ignoreMap":39},[10,43,44,45,48],{},"Because a branch name is the subsection name that follows the section name, the only event we need to watch for is ",[28,46,47],{},"SectionHeader",".",[10,50,51,52,55,56,58,59,62],{},"Next, the tool needs to parse commit objects. The ",[28,53,54],{},"gix-object"," crate parses the object files in a repository; a commit object, for example, is decoded by ",[28,57,54],{}," into a ",[28,60,61],{},"CommitRef"," struct, defined as follows:",[33,64,67],{"className":65,"code":66,"language":38,"meta":39},[36],"pub struct CommitRef\u003C'a> {\n    /// HEX hash of tree object we point to. Usually 40 bytes long.\n    ///\n    /// Use [`tree()`](CommitRef::tree()) to obtain a decoded version of it.\n    #[cfg_attr(feature = \"serde\", serde(borrow))]\n    pub tree: &'a BStr,\n    /// HEX hash of each parent commit. Empty for first commit in repository.\n    pub parents: SmallVec\u003C[&'a BStr; 1]>,\n    /// Who wrote this commit. Name and email might contain whitespace and are not trimmed to ensure round-tripping.\n    ///\n    /// Use the [`author()`](CommitRef::author()) method to received a trimmed version of it.\n    pub author: gix_actor::SignatureRef\u003C'a>,\n    /// Who committed this commit. 
Name and email might contain whitespace and are not trimmed to ensure round-tripping.\n    ///\n    /// Use the [`committer()`](CommitRef::committer()) method to received a trimmed version of it.\n    ///\n    /// This may be different from the `author` in case the author couldn't write to the repository themselves and\n    /// is commonly encountered with contributed commits.\n    pub committer: gix_actor::SignatureRef\u003C'a>,\n    /// The name of the message encoding, otherwise [UTF-8 should be assumed](https://github.com/git/git/blob/e67fbf927dfdf13d0b21dc6ea15dc3c7ef448ea0/commit.c#L1493:L1493).\n    pub encoding: Option\u003C&'a BStr>,\n    /// The commit message documenting the change.\n    pub message: &'a BStr,\n    /// Extra header fields, in order of them being encountered, made accessible with the iterator returned by [`extra_headers()`](CommitRef::extra_headers()).\n    pub extra_headers: Vec\u003C(&'a BStr, Cow\u003C'a, BStr>)>,\n}\n",[28,68,66],{"__ignoreMap":39},[10,70,71,72,75,76,79],{},"The fields that matter to us here are ",[28,73,74],{},"tree"," and ",[28,77,78],{},"parents",". The tree object ID lets us reconstruct the commit's directory structure and the files it contains, and the parent commits let us walk back through the history to find the ID of every commit object reachable from the branch.",[10,81,82,83,85,86,89],{},"A file object in Git, that is a blob object, is simply the file's contents compressed with zlib (after a short header giving the object type and size), so handling a blob only requires inflating it and writing the contents to the right path. Tree objects, on the other hand, also need ",[28,84,54],{},", which parses a tree object into a ",[28,87,88],{},"TreeRef"," struct, defined as follows:",[33,91,94],{"className":92,"code":93,"language":38,"meta":39},[36],"pub struct TreeRef\u003C'a> {\n    /// The directories and files contained in this tree.\n    ///\n    /// Beware that the sort order isn't *quite* by name, so one may bisect only with a [`tree::EntryRef`] to handle ordering correctly.\n    #[cfg_attr(feature = \"serde\", serde(borrow))]\n    pub entries: Vec\u003Ctree::EntryRef\u003C'a>>,\n}\n\npub struct EntryRef\u003C'a> {\n    /// The kind of object to which `oid` is pointing.\n    pub mode: tree::EntryMode,\n    /// The name of the file in the parent tree.\n    pub filename: &'a BStr,\n    /// The id of the object 
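representing the entry.\n    // TODO: figure out how these should be called. id or oid? It's inconsistent around the codebase.\n    //       Answer: make it 'id', as in `git2`\n    #[cfg_attr(feature = \"serde\", serde(borrow))]\n    pub oid: &'a gix_hash::oid,\n}\n\n",[28,95,93],{"__ignoreMap":39},[10,97,98,99,102],{},"Here, the ",[28,100,101],{},"mode"," field indicates whether the entry is a directory or a file, and oid is the object's ID.",[10,104,105,106,109,110,112,113,116],{},"Finally, do not forget the ",[28,107,108],{},"index"," file; parsing the ",[28,111,108],{}," file requires the ",[28,114,115],{},"gix-index"," crate.",[10,118,119],{},"With the building blocks covered, here is the main code logic in outline:",[33,121,124],{"className":122,"code":123,"language":38,"meta":39},[36],"// Collect the names of all branches in the repository.\nfn get_branches(){\n    // Fetch the raw bytes of the config file.\n    // Parse the bytes with gix_config::parse::from_bytes.\n    // Format each branch name as a 'refs/heads/\u003Cbranch>' path.\n    // Return the formatted paths as an array of strings.\n}\n// Parse a commit object.\nfn dump_commit(commit_sha1){\n    // Fetch the raw bytes of the commit object.\n    // Parse the bytes with CommitRef::from_bytes.\n    // Process the parsed CommitRef struct.\n    // Collect the array of parent commit object IDs.\n    // Read the tree object ID.\n    // Return the parent commit object IDs and the tree object ID.\n}\n// Parse a tree object.\nfn dump_tree(tree_sha1){\n    // Fetch the raw bytes of the tree object.\n    // Parse the bytes with TreeRef::from_bytes.\n    // Walk the parsed TreeRef struct and collect the sub-tree IDs and blob IDs.\n    // Return the sub-tree object IDs and the blob object IDs.\n}\n",[28,125,123],{"__ignoreMap":39},[10,127,128,129,134,135,140],{},"The full implementation is in ",[17,130,133],{"href":131,"rel":132},"https://github.com/ttdly/my_safe_tools/blob/main/core/crates/remote-git-dump/src/lib.rs",[21],"this"," source file. It only handles one level of objects at a time and returns detailed, formatted information, so that callers can define their own parsing flow. My own flow is written in ",[17,136,139],{"href":137,"rel":138},"https://github.com/ttdly/my_safe_tools/blob/main/core/crates/remote-git-dump/src/example.rs#L76",[21],"this function",", for reference only.",[10,142,143,144,147],{},"The sub-crates that the ",[17,145,22],{"href":19,"rel":146},[21]," project provides are fairly low-level, and at first I could not work out from its documentation how to build what I needed. I asked an AI 
for help, but the answers were not very satisfying either. Later, while browsing the repository, I found that it contains a large number of test cases, and those tests are what finally showed me how to implement the features I wanted.",{"title":39,"searchDepth":149,"depth":149,"links":150},2,[],"2025-08-15T05:24:53.000Z","md",[38,154],"安全",false,{},true,"/article/17",{"title":5,"description":12},"article/17","w7sgDhSWDgoW7yL9RXjvNcF4GLHMd3EtHulLBPropHo",1755235549196]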