Remove large files from the repo history


## Before you start
This document lists commands for removing specific elements from the repository history. If you want to remove elements that share certain properties (minimum size, name pattern, etc.), you can use the BFG Repo-Cleaner instead (less powerful, but a lot faster).
Related link: https://rtyley.github.io/bfg-repo-cleaner/

## Identify the large objects in the repo
List the SHA identifier and path of every object in the repo:

    $ git rev-list --objects --all | sort -k 2 > allfileshas.txt

Get a list of the files ordered by the size (from biggest to smallest):

    $ git gc && git verify-pack -v .git/objects/pack/pack-*.idx | grep -E "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt

The file generated in the previous step includes only the SHA values to identify each file. Now, we need to include the file name/path for each entry:

    $ for SHA in $(cut -f 1 -d ' ' < bigobjects.txt); do
    echo $(grep "$SHA" bigobjects.txt) $(grep "$SHA" allfileshas.txt) | awk '{print $1,$3,$7}' >> bigtosmall.txt
    done
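
As an alternative sketch, a similar size-ordered listing can be produced in a single pipeline with `git cat-file --batch-check`, without the intermediate files. The helper name `list_largest_blobs` is hypothetical and not part of the original procedure; sizes are object sizes in bytes.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref.
# Output columns: size in bytes, blob SHA, path.
list_largest_blobs() {
    count="${1:-10}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob" { size = $3; sha = $2; $1 = $2 = $3 = ""; sub(/^ +/, ""); print size, sha, $0 }' |
        sort -k 1,1 -n -r |
        head -n "$count"
}
```

Run inside the repository, `list_largest_blobs 20` would print the twenty largest blobs, biggest first.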

## Filter the repository history
* Clone the repo to get a clean copy.
* Check which big files are in the history.
* Remove the big files/folders with the _git filter-branch_ command:


    $ git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch PATH-TO-FILE-TO-BE-REMOVED' --prune-empty --tag-name-filter cat -- --all
    # Note: to remove a folder, add '-r' after 'git rm':
    #   '... git rm -r --cached ...'

Where 'PATH-TO-FILE-TO-BE-REMOVED' is the path to the file or folder you want to remove.
* [OPTIONAL] Add your file with sensitive or big data to .gitignore to ensure that you don't accidentally commit it again.
* Double-check that you've removed everything you wanted to from your repository's history, and that all of your branches are checked out.
* After all the changes are validated (ideally after some time), the garbage must be collected and erased with:


    $ git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
    $ git reflog expire --expire=now --all
    $ git gc --prune=now
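
The double-check and cleanup steps above can be sketched as two small helpers. Both names (`path_in_history`, `pack_size_kib`) are hypothetical. The first skips the `refs/original/` backups that _git filter-branch_ leaves behind, so it can be run before they are deleted; the second reads the `size-pack` field of `git count-objects -v`, which is reported in KiB, so the value can be compared before and after `git gc`.

```shell
# Hypothetical helper: succeed (exit 0) if any commit reachable from a
# branch, tag, or other ref still touches the given path. The backup refs
# under refs/original/ are skipped.
path_in_history() {
    [ -n "$(git log --exclude='refs/original/*' --all --format=%H -- "$1" | head -n 1)" ]
}

# Hypothetical helper: print the on-disk size of all packfiles in KiB.
pack_size_kib() {
    git count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}
```

For example, `path_in_history PATH-TO-FILE-TO-BE-REMOVED && echo "still in history"` flags a path that was deleted from the working tree but not yet purged from the commits.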

## Upload changes to the remote
* Once the repository is in the desired final state, force-push all the branches to the remote:


    $ git push origin --force --all

* To update the tagged releases, force-push the tags as well:


    $ git push origin --force --tags

* Tell all collaborators to rebase, NOT MERGE, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging.
* [OPTIONAL] If you want to prune the data on the server, go to the location of the repo on the server and run:


    $ git reflog expire --expire=now --all
    $ git gc --aggressive --prune=now

**IMPORTANT**: If you need to upload branches to other servers and they are not present in the current (pruned) clone, DO NOT PULL changes from the remote. Instead, check out only the branches you need, and that's it.

## Tell your partners to sync their local repos
They cannot pull the changes (this could be catastrophic), but there is a safe way to synchronize the repos.
For those with extra commits:

    $ cd MY_LOCAL_GIT_REPO
    $ git fetch origin
    $ git rebase
    $ git reflog expire --expire=now --all
    $ git gc --aggressive --prune=now

For those with no extra data (Warning: this option erases all unpushed data):

    $ cd MY_LOCAL_GIT_REPO
    $ git fetch origin
    # WARNING: can destroy unpublished data!
    $ git reset --hard origin/master
    $ git reflog expire --expire=now --all
    $ git gc --aggressive --prune=now
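
To decide which of the two recipes above applies, a collaborator can first count the local commits that are missing from the remote-tracking branch. This is a minimal sketch: the helper name `unpushed_commit_count` is hypothetical, and `origin/master` is assumed as the default to match the commands above.

```shell
# Hypothetical helper: count commits on HEAD that are missing from the
# given remote-tracking branch (default: origin/master, an assumption).
# Run it BEFORE `git fetch`, while the remote-tracking branch still points
# at the old history; after the fetch every old commit would be counted.
unpushed_commit_count() {
    git rev-list --count "${1:-origin/master}..HEAD"
}
```

A result of `0` suggests the `git reset --hard` recipe loses no commits (uncommitted work is still discarded); any other value calls for the rebase recipe.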

---

## References:
https://help.github.com/articles/removing-sensitive-data-from-a-repository/
http://naleid.com/blog/2012/01/17/finding-and-purging-big-files-from-git-history
http://blog.ostermiller.org/git-remove-from-history
https://git-scm.com/docs/git-filter-branch
