Notes on finding large files in Git history

2018-09-22  Wavky
  1. Check whether the repo size looks abnormal
du -sh
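Running du -sh at the repo root also counts the working tree. As a rough sketch, the helper below (repo_size is a hypothetical name, not part of the original post) reports the size of the .git directory alone, followed by Git's own object statistics from git count-objects, assuming a Git recent enough to support -C and -H.

```shell
#!/bin/sh
# Hypothetical helper: show how much space the repository data itself takes.
# Usage: repo_size [path-to-repo]   (defaults to the current directory)
repo_size() {
    repo="${1:-.}"
    # Size of the .git directory only, excluding checked-out files
    du -sh "$repo/.git"
    # Git's own counters; "size-pack" is the total size of all pack files
    git -C "$repo" count-objects -vH
}
```

A size-pack value far larger than the checked-out source tree is usually the hint that something big is buried in history.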
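  2. Run this script to search the repo's commit history for large files (the 10 largest, in descending order of size; sizes are in kB, and the SHA1 is the blob's ID)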

git_find_big.sh

#!/bin/bash
#set -x 

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see http://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n';

# list all objects including their size, sort by size, take top 10
objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`

echo "All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file."

output="size,pack,SHA,location"
for y in $objects
do
    # extract the size in bytes
    size=$((`echo $y | cut -f 5 -d ' '`/1024))
    # extract the compressed size in bytes
    compressedSize=$((`echo $y | cut -f 6 -d ' '`/1024))
    # extract the SHA
    sha=`echo $y | cut -f 1 -d ' '`
    # find the object's location in the repository tree
    other=`git rev-list --all --objects | grep $sha`
    #lineBreak=`echo -e "\n"`
    output="${output}\n${size},${compressedSize},${other}"
done

echo -e $output | column -t -s ', '
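The script above only inspects pack files, so objects that are still loose are not listed. As an alternative sketch (not the approach of the script above), the function below asks git cat-file for the size of every object reachable from any ref. The name largest_blobs is made up for this example, and the sizes it prints are uncompressed bytes rather than kB.

```shell
#!/bin/sh
# Hypothetical helper: list the N largest blobs reachable from any ref.
# Output columns: SHA1, uncompressed size in bytes, path
# Usage: largest_blobs [N]   (defaults to 10)
largest_blobs() {
    count="${1:-10}"
    # rev-list prints "<sha> <path>"; cat-file echoes the path back via %(rest)
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |
        sort -k3,3nr |
        head -n "$count" |
        cut -d ' ' -f 2-
}
```

Because it walks every reachable object, this can take a while on a large repository, but it does not depend on the layout of .git/objects/pack.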
  3. Find the commits that contain a file, given its blob SHA1 (pass the SHA1 as the first argument)

git_find_blob_commit.sh

#!/bin/sh
obj_name="$1"
shift
# tformat: ends every record with a newline, so the loop also sees the oldest commit
git log "$@" --pretty=tformat:'%T %h %s' \
| while read -r tree commit subject ; do
    if git ls-tree -r "$tree" | grep -q "$obj_name" ; then
        echo "$commit" "$subject"
    fi
done

git - Which commit has this blob? - Stack Overflow
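Newer versions of Git (2.16 and later, as far as I know) can answer a similar question without a loop, using the --find-object option of git log. The sketch below is an alternative to the script above, not a drop-in replacement: it lists only the commits where the blob was added or removed, rather than every commit whose tree contains it. The function name commits_touching_blob is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list commits that add or remove the given blob.
# Usage: commits_touching_blob <blob-sha1> [extra git-log options]
commits_touching_blob() {
    blob="$1"
    shift
    # --find-object keeps commits where the number of occurrences of the blob changes
    git log --all --find-object="$blob" --format='%h %s' "$@"
}
```

The commit that introduced the blob and the one that deleted it are usually all that is needed to decide how to rewrite history.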

Removing a specific file from all history

Use the filter-branch command; for reference, see Gitリポジトリをメンテナンスして軽量化する
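As a minimal sketch of what such a filter-branch run might look like: the first function below rewrites every branch and tag so that one path no longer appears in any commit, and the second drops the backups that filter-branch leaves behind so the space can actually be reclaimed. Both function names are hypothetical, and the path is assumed not to contain a single quote. This rewrites commit IDs, so try it on a fresh clone first.

```shell
#!/bin/sh
# Hypothetical helper: remove one path from every commit on all branches and tags.
# Usage: purge_path <path-in-repo>
purge_path() {
    target="$1"
    # --index-filter edits each commit's index without checking files out
    git filter-branch --force --index-filter \
        "git rm -q --cached --ignore-unmatch -- '$target'" \
        --prune-empty --tag-name-filter cat -- --all
}

# Hypothetical helper: delete the refs/original backups and unreachable objects.
drop_rewritten_backups() {
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do git update-ref -d "$ref"; done
    git reflog expire --expire=now --all
    git gc --prune=now --quiet
}
```

For example, purge_path path/to/large.file followed by drop_rewritten_backups, where path/to/large.file is a placeholder. Other clones still hold the old objects until they are re-cloned or cleaned the same way.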
