Git is a version control system. What is a version control system? It is a system with functions such as recording the change history of files and retrieving earlier versions. Here, "change history" refers to a series of records kept in chronological order.
Besides Git, there are other version control systems such as CVS and Subversion.
First, let's look at how Git works as a store for content. Start by creating an empty Git repository:
% mkdir repo1
% cd repo1
% git init
Initialized empty Git repository in repo1/.git/
Create a file called readme.txt and store its contents in the repository, then overwrite the file with different contents and store those as well.
% echo aaa > readme.txt
% git hash-object -w readme.txt
72943a16fb2c8f38f9dde202b7a70ccc19c52f34
% echo bbb > readme.txt
% git hash-object -w readme.txt
f761ec192d9f0dca3329044b96ebdb12839dbff6
% rm -f readme.txt
The stored contents can be retrieved with git cat-file, using the string printed by hash-object as the key.
% git cat-file -p 72943a16fb2c8f38f9dde202b7a70ccc19c52f34 > readme.txt
% cat readme.txt
aaa
% git cat-file -p f761ec192d9f0dca3329044b96ebdb12839dbff6 > readme.txt
% cat readme.txt
bbb
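The round trip above (store contents, get a key back, use the key to retrieve the contents) is what makes Git a content-addressable store. Here is a minimal in-memory sketch of the idea in Python 3. The key format, "blob <size>\0" followed by the contents, is the one verified with hash-object.py later in this article; the class name BlobStore is hypothetical, and nothing is written to .git/objects.

```python
import hashlib


class BlobStore:
    """Hypothetical in-memory store keyed the way Git keys blobs."""

    def __init__(self):
        self._objects = {}  # hex SHA-1 key -> raw contents

    @staticmethod
    def key_for(contents: bytes) -> str:
        # SHA-1 over a "blob <size>\0" header followed by the contents
        return hashlib.sha1(b"blob %d\0" % len(contents) + contents).hexdigest()

    def put(self, contents: bytes) -> str:
        key = self.key_for(contents)
        self._objects[key] = contents  # same contents, same key: no duplicates
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = BlobStore()
key_a = store.put(b"aaa\n")  # what "echo aaa > readme.txt" writes
key_b = store.put(b"bbb\n")
print(key_a, key_b)
print(store.get(key_a))
```

Running it should print the same two keys that git hash-object printed above.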
What should be noted is that file contents can be stored and then retrieved by key. This alone does not yet give us a change history, but before looking at that, let's confirm that the "mysterious string" is the SHA1 hash of the file contents (plus a little extra).
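hash-object.py
import hashlib
import sys

# Usage: python hash-object.py FILE
if len(sys.argv) != 2:
    print("usage: %s file" % sys.argv[0])
    sys.exit(1)

try:
    # Read raw bytes: Git hashes bytes, and the size in the header is a byte count
    with open(sys.argv[1], "rb") as f:
        data = f.read()
except OSError:
    print("open %s failed" % sys.argv[1])
    sys.exit(1)

# The key is SHA1("blob <size>" + "\0" + contents)
sha1 = hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()
print(sha1)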
This Python program computes the SHA1 hash of the concatenation of the string "blob <size in bytes>", a NUL byte (\0), and the file contents, and prints it in hexadecimal. Applying it to the two contents above:
% echo aaa | python hash-object.py /dev/stdin
72943a16fb2c8f38f9dde202b7a70ccc19c52f34
% echo bbb | python hash-object.py /dev/stdin
f761ec192d9f0dca3329044b96ebdb12839dbff6
The results match the keys used to store the contents earlier; in other words, each content is stored under the SHA1 hash of the file contents plus a little extra. Now, where in the repository are the contents actually stored?
% find .git/objects -type f
.git/objects/72/943a16fb2c8f38f9dde202b7a70ccc19c52f34
.git/objects/f7/61ec192d9f0dca3329044b96ebdb12839dbff6
Each object is stored at a path whose directory name (the first two hex digits) and file name (the remaining 38) together form its SHA1 hash. If you compute the SHA1 of these files themselves,
% sha1sum `find .git/objects -type f`
cf6e4f80cfae36e20ae7eb1a90919ca48f59514b .git/objects/72/943a16fb2c8f38f9dde202b7a70ccc19c52f34
cdb05607e2e073287a81a908564d9d901ccdd687 .git/objects/f7/61ec192d9f0dca3329044b96ebdb12839dbff6
the values are different. This is because the contents are stored compressed with zlib. The following program decompresses an object file before hashing it:
decompress_sha1.py
import hashlib
import sys
import zlib

if len(sys.argv) != 2:
    print("usage: %s git_object_file" % sys.argv[0])
    sys.exit(1)

path = sys.argv[1]
try:
    # Loose objects are zlib-compressed binary files
    with open(path, "rb") as f:
        raw = f.read()
except OSError:
    print("open %s failed" % path)
    sys.exit(1)

# Decompressed, the data is "blob <size>\0<contents>", so its hash is the object ID
data = zlib.decompress(raw)
sha1 = hashlib.sha1(data).hexdigest()
print("%s: %s" % (path, sha1))
Computing the hash after decompression with this program gives:
% for i in `find .git/objects -type f`; do python ../decompress_sha1.py $i; done
.git/objects/72/943a16fb2c8f38f9dde202b7a70ccc19c52f34: 72943a16fb2c8f38f9dde202b7a70ccc19c52f34
.git/objects/f7/61ec192d9f0dca3329044b96ebdb12839dbff6: f761ec192d9f0dca3329044b96ebdb12839dbff6
The hashes now match (and since the hashes match, the contents can be expected to match as well).
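Combining the observations so far, writing a loose blob comes down to three steps: prepend the "blob <size>\0" header, name the object by the SHA1 of the result, and store the zlib-compressed bytes under objects/<first two hex digits>/<remaining 38>. A minimal Python 3 sketch, assuming loose objects only (no pack files); write_blob is a hypothetical name, not part of Git, and the demo writes into a temporary directory rather than a real repository.

```python
import hashlib
import os
import tempfile
import zlib


def write_blob(objects_dir: str, contents: bytes) -> str:
    """Hypothetical helper that stores a blob the way git hash-object -w does."""
    raw = b"blob %d\0" % len(contents) + contents
    oid = hashlib.sha1(raw).hexdigest()
    # First two hex digits become the directory, the other 38 the file name
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    if not os.path.exists(path):  # same ID means same contents: nothing to do
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(raw))  # loose objects are zlib-compressed
    return oid


with tempfile.TemporaryDirectory() as objects_dir:
    oid = write_blob(objects_dir, b"aaa\n")
    print(oid)
    for root, _, files in os.walk(objects_dir):
        for name in files:
            print(os.path.relpath(os.path.join(root, name), objects_dir))
```

Because only the decompressed bytes are hashed, the compression settings do not affect the object ID; the compressed file itself may differ from the one Git writes.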
We have seen how file contents are stored under .git/objects/. Git also stores information such as file names and commit log messages in files under .git/objects/; these files are called Git objects.
Right after the repository is created, no objects are stored.
% mkdir repo2
% cd repo2
% git init
Initialized empty Git repository in repo2/.git/
% ls .git
HEAD config hooks/ objects/
branches/ description info/ refs/
% find .git/objects -type f
Let's add a file to the staging area with git add.
% echo aaa > readme.txt
% git add readme.txt
% find .git/objects -type f
.git/objects/72/943a16fb2c8f38f9dde202b7a70ccc19c52f34
% ls .git
HEAD config hooks/ info/ refs/
branches/ description index objects/
One object has been added and a file called index has been created. You can check the contents of the Git object with git cat-file.
% git cat-file -t 729
fatal: Not a valid object name 729
% git cat-file -t 7294
blob
% git cat-file -s 7294
4
% wc -c readme.txt
4 readme.txt
% git cat-file -p 7294
aaa
% cat readme.txt
aaa
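Everything cat-file -t, -s and -p report can be read directly from the decompressed object: the header gives the type and size, and everything after the first NUL byte is the content. A minimal Python 3 sketch, assuming the "<type> <size>\0<content>" layout inside a zlib stream seen earlier and loose objects only; read_object is a hypothetical name, and unlike cat-file it needs the full 40-digit ID.

```python
import hashlib
import os
import tempfile
import zlib


def read_object(objects_dir: str, oid: str):
    """Hypothetical loose-object reader: returns (type, size, content).

    Roughly what git cat-file -t, -s and -p report, but it needs the full
    40-digit ID and ignores pack files.
    """
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as f:
        data = zlib.decompress(f.read())
    header, _, content = data.partition(b"\0")  # header is "<type> <size>"
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("size mismatch: %d != %d" % (int(size), len(content)))
    return obj_type.decode(), int(size), content


# Demo: write one loose object by hand into a temporary directory, then read it
with tempfile.TemporaryDirectory() as objects_dir:
    raw = b"blob 4\0aaa\n"
    oid = hashlib.sha1(raw).hexdigest()
    os.makedirs(os.path.join(objects_dir, oid[:2]))
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "wb") as f:
        f.write(zlib.compress(raw))
    result = read_object(objects_dir, oid)

print(result)
```

Inside repo2, read_object(".git/objects", "72943a16fb2c8f38f9dde202b7a70ccc19c52f34") should likewise return the type, size and contents shown by the three cat-file calls.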
Using cat-file, -t shows the object type: it is a blob object, which stores the contents of a file as we saw in the previous section. -s shows the size and -p shows the contents, and both match the actual file. (An abbreviated hash must be at least four hex digits long, which is why 729 was rejected.)
Next, let's write out the information contained in the index as an object.
% git write-tree
580c73c39691399d09ad01152ad0a691ce80bccf
% find .git/objects -type f
.git/objects/58/0c73c39691399d09ad01152ad0a691ce80bccf
.git/objects/72/943a16fb2c8f38f9dde202b7a70ccc19c52f34
% git cat-file -t 580c
tree
% git cat-file -p 580c
100644 blob 72943a16fb2c8f38f9dde202b7a70ccc19c52f34 readme.txt
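At this point, you can see that a tree object, 580c, has been stored, and that it records the file name readme.txt along with the hash of the blob holding its contents.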
Next, create a directory and a file under it and try git add.
% mkdir tmp
% echo bbb > tmp/bbb.txt
% git add tmp/bbb.txt
% find .git/objects -type f
.git/objects/58/0c73c39691399d09ad01152ad0a691ce80bccf
.git/objects/72/943a16fb2c8f38f9dde202b7a70ccc19c52f34
.git/objects/f7/61ec192d9f0dca3329044b96ebdb12839dbff6
% git cat-file -t f761
blob
% git cat-file -p f761
bbb
The newly added object is a blob containing the contents of bbb.txt. Writing the index out as an object again in this state:
% git write-tree
6434b2415497a42647800c7e828038a2fb6fbbaf
% find .git/objects -type f
.git/objects/58/0c73c39691399d09ad01152ad0a691ce80bccf
.git/objects/5c/40d98927de9cdb27df5b3a7bd4f7ee95dbfc85
.git/objects/64/34b2415497a42647800c7e828038a2fb6fbbaf
.git/objects/72/943a16fb2c8f38f9dde202b7a70ccc19c52f34
.git/objects/f7/61ec192d9f0dca3329044b96ebdb12839dbff6
% git cat-file -t 6434
tree
% git cat-file -p 6434
100644 blob 72943a16fb2c8f38f9dde202b7a70ccc19c52f34 readme.txt
040000 tree 5c40d98927de9cdb27df5b3a7bd4f7ee95dbfc85 tmp
% git cat-file -t 5c40
tree
% git cat-file -p 5c40
100644 blob f761ec192d9f0dca3329044b96ebdb12839dbff6 bbb.txt
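Here, the tree 580c we saw earlier remains intact. A new tree, 6434, was created for the root directory, and it refers to another tree, 5c40, which represents the tmp directory. So given a single root tree (580c or 6434), you can identify the set of files at a certain point in time.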
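Because a tree can point to other trees, one root hash is enough to enumerate every file in a snapshot. A minimal Python 3 sketch of that traversal over an in-memory table {hex SHA1: (type, content)}, assuming the tree entry layout that parse_tree.py decodes later in this article (mode, space, name, NUL, raw 20-byte SHA1). The helper names store, tree_entry, parse_tree_entries and flatten are hypothetical, and the table is rebuilt by hand instead of being read from .git/objects.

```python
import hashlib


def store(objects, obj_type, content):
    """Add an object to the in-memory table and return its hex SHA-1."""
    oid = hashlib.sha1(b"%s %d\0" % (obj_type, len(content)) + content).hexdigest()
    objects[oid] = (obj_type, content)
    return oid


def tree_entry(mode, name, oid):
    # "<mode> <name>\0" followed by the raw 20-byte SHA-1
    return b"%s %s\0" % (mode, name) + bytes.fromhex(oid)


def parse_tree_entries(content):
    """Yield (mode, name, hex_oid) for each entry of a tree object's content."""
    ptr = 0
    while ptr < len(content):
        nul = content.index(b"\0", ptr)
        mode, name = content[ptr:nul].split(b" ", 1)
        yield mode.decode(), name.decode(), content[nul + 1:nul + 21].hex()
        ptr = nul + 21


def flatten(objects, tree_oid, prefix=""):
    """Map every file path reachable from tree_oid to its blob hash."""
    files = {}
    for mode, name, oid in parse_tree_entries(objects[tree_oid][1]):
        path = prefix + name
        if mode == "40000":  # subdirectory: recurse into the subtree
            files.update(flatten(objects, oid, path + "/"))
        else:
            files[path] = oid
    return files


# Rebuild the repo2 snapshot by hand
objects = {}
readme = store(objects, b"blob", b"aaa\n")
bbb = store(objects, b"blob", b"bbb\n")
tmp_tree = store(objects, b"tree", tree_entry(b"100644", b"bbb.txt", bbb))
root = store(objects, b"tree",
             tree_entry(b"100644", b"readme.txt", readme)
             + tree_entry(b"40000", b"tmp", tmp_tree))
print(flatten(objects, root))
```

The two hand-built trees should come out as 6434 and 5c40, matching git write-tree, and the result should list readme.txt and tmp/bbb.txt with their blob hashes.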
While we're at it, here is a parser for tree objects.
parse_tree.py
import hashlib
import sys
import zlib

if len(sys.argv) != 2:
    print("usage: %s git_object_file" % sys.argv[0])
    sys.exit(1)

try:
    # Loose objects are zlib-compressed binary files
    with open(sys.argv[1], "rb") as f:
        data = zlib.decompress(f.read())
except OSError:
    print("open %s failed" % sys.argv[1])
    sys.exit(1)

# Header: "<type> <size>\0"
eoh = data.find(b"\0")
if eoh < 0:
    print("no end of header")
    sys.exit(1)
t, n = data[:eoh].split(b" ")
if len(data) - eoh - 1 != int(n):
    print("size mismatch %d,%d" % (len(data) - eoh - 1, int(n)))
    sys.exit(1)
if t != b"tree":
    print("not tree: %s" % t.decode())
    sys.exit(1)

dsize = hashlib.sha1().digest_size  # a raw SHA1 digest is 20 bytes
ptr = eoh + 1
# Each entry: "<mode> <name>\0" followed by the raw SHA1 of the referenced object
while ptr < len(data):
    eorh = data.find(b"\0", ptr)
    if eorh < 0:
        print("no end of reference header")
        sys.exit(1)
    mode, name = data[ptr:eorh].split(b" ", 1)  # names may contain spaces
    sha1_ = data[eorh + 1:eorh + 1 + dsize].hex()
    print("%s (%6s) %s" % (sha1_, mode.decode(), name.decode()))
    ptr = eorh + 1 + dsize
% python parse_tree.py .git/objects/64/34b2415497a42647800c7e828038a2fb6fbbaf
72943a16fb2c8f38f9dde202b7a70ccc19c52f34 (100644) readme.txt
5c40d98927de9cdb27df5b3a7bd4f7ee95dbfc85 ( 40000) tmp
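Like a blob, a tree object is stored as zlib-compressed data consisting of a header ("tree", a space, and the content size), a NUL byte, and the content.
The content part is a repetition of entries, each made up of the mode and the name separated by a space, a NUL byte, and the 20-byte binary SHA1 of the referenced blob or tree.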
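Going the other way, a tree object can be built from a list of entries. A minimal Python 3 sketch, assuming the entry layout just described; make_tree is a hypothetical name. One detail not visible in the examples above is ordering: Git sorts entries by name, comparing a subdirectory as if its name ended in "/", and the sketch follows that rule.

```python
import hashlib


def make_tree(entries):
    """Serialize (mode, name, hex_oid) entries into a tree object.

    Returns (object_id, uncompressed_object_bytes).
    """
    def sort_key(entry):
        mode, name, _ = entry
        # Directories (mode 40000) sort as though their name ended with "/"
        return name + b"/" if mode == b"40000" else name

    body = b"".join(
        b"%s %s\0" % (mode, name) + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=sort_key)
    )
    raw = b"tree %d\0" % len(body) + body
    return hashlib.sha1(raw).hexdigest(), raw


README = "72943a16fb2c8f38f9dde202b7a70ccc19c52f34"  # blob for "aaa\n"
TMP = "5c40d98927de9cdb27df5b3a7bd4f7ee95dbfc85"     # tree holding bbb.txt

# Entries may be given in any order; make_tree sorts them
root_oid, _ = make_tree([(b"40000", b"tmp", TMP), (b"100644", b"readme.txt", README)])
print(root_oid)
```

The printed ID should be the same 6434b24... that git write-tree produced above.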
Now that we've seen two types of objects, blob and tree, let's finally look at the commit object.
Let's create a commit that references the tree object 580c.
% git commit-tree -m "initial commit" 580c
7a5c786478f17fd96b385c725c95d10fa74e4576
% ls .git/objects/7a/5c786478f17fd96b385c725c95d10fa74e4576
.git/objects/7a/5c786478f17fd96b385c725c95d10fa74e4576
% git cat-file -t 7a5c
commit
% git cat-file -p 7a5c
tree 580c73c39691399d09ad01152ad0a691ce80bccf
author Yoichi Nakayama <[email protected]> 1447772602 +0900
committer Yoichi Nakayama <[email protected]> 1447772602 +0900

initial commit
Next, let's create a commit that references the tree object 6434, with the commit object 7a5c as its parent.
% git commit-tree -p 7a5c -m "second commit" 6434
88470d975c1875e2e03a46877c13dde9ed2fd1ea
% ls .git/objects/88/470d975c1875e2e03a46877c13dde9ed2fd1ea
.git/objects/88/470d975c1875e2e03a46877c13dde9ed2fd1ea
% git cat-file -t 8847
commit
% git cat-file -p 8847
tree 6434b2415497a42647800c7e828038a2fb6fbbaf
parent 7a5c786478f17fd96b385c725c95d10fa74e4576
author Yoichi Nakayama <[email protected]> 1447772754 +0900
committer Yoichi Nakayama <[email protected]> 1447772754 +0900

second commit
If you write the hash of this commit object into master, the branch that HEAD refers to, you can see the history with git log.
% cat .git/HEAD
ref: refs/heads/master
% echo 88470d975c1875e2e03a46877c13dde9ed2fd1ea > .git/refs/heads/master
% git log
commit 88470d975c1875e2e03a46877c13dde9ed2fd1ea
Author: Yoichi Nakayama <[email protected]>
Date:   Wed Nov 18 00:05:54 2015 +0900

    second commit

commit 7a5c786478f17fd96b385c725c95d10fa74e4576
Author: Yoichi Nakayama <[email protected]>
Date:   Wed Nov 18 00:03:22 2015 +0900

    initial commit
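Roughly speaking, git log here follows two kinds of pointers: HEAD names a ref, the ref file holds a commit hash, and each commit's parent line leads to the previous commit. A minimal Python 3 sketch of that walk, assuming loose refs only (no packed refs), first parents only, no multi-line headers such as signatures, and a caller-supplied function that returns decompressed commit bodies. resolve_head, parse_commit and log are hypothetical names, and the demo uses a temporary directory with hand-written commit bodies keyed by made-up IDs rather than real repository data.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Follow HEAD to a commit hash (loose refs only)."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):  # symbolic ref, e.g. "ref: refs/heads/master"
        with open(os.path.join(git_dir, head[len("ref: "):])) as f:
            return f.read().strip()
    return head  # detached HEAD: the file holds the hash itself


def parse_commit(content):
    """Split a commit body into header fields and the message."""
    header, _, message = content.partition(b"\n\n")
    fields = {}
    for line in header.split(b"\n"):
        key, _, value = line.partition(b" ")
        fields.setdefault(key.decode(), []).append(value.decode())
    return fields, message.decode()


def log(start, load_commit):
    """Yield (oid, message) from start back through first parents."""
    oid = start
    while oid:
        fields, message = parse_commit(load_commit(oid))
        yield oid, message.strip()
        oid = fields.get("parent", [None])[0]


# Demo: hand-written commit bodies keyed by made-up IDs "c1" and "c2"
commits = {
    "c1": b"tree 580c73c39691399d09ad01152ad0a691ce80bccf\n\ninitial commit\n",
    "c2": b"tree 6434b2415497a42647800c7e828038a2fb6fbbaf\nparent c1\n\nsecond commit\n",
}
with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write("c2\n")
    head = resolve_head(git_dir)

history = list(log(head, lambda oid: commits[oid]))
print(history)
```

This is also why writing a hash into .git/refs/heads/master was enough: once HEAD resolves to 8847, its parent line leads to 7a5c.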
This is the same history you would normally see after running git commit. You can also pass the hashes of two commit objects to git diff to see the differences between them.
% git diff 7a5c 8847
diff --git a/tmp/bbb.txt b/tmp/bbb.txt
new file mode 100644
index 0000000..f761ec1
--- /dev/null
+++ b/tmp/bbb.txt
@@ -0,0 +1 @@
+bbb
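At the object level, the first step of this diff is comparing the two snapshots path by path: tmp/bbb.txt is absent from 580c and present in 6434, so it is reported as a new file, and the index line 0000000..f761ec1 pairs the all-zero "no object" placeholder with the new blob's abbreviated hash. A minimal Python 3 sketch of that comparison, assuming both snapshots are already flattened into {path: blob hash} dicts; diff_snapshots is a hypothetical name, and it only classifies paths rather than producing line-level hunks as git diff does.

```python
def diff_snapshots(old, new):
    """Classify paths as added, deleted or modified between two snapshots.

    old and new map a path to the hex SHA-1 of its blob.
    """
    added = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    # Same path, different blob hash: the contents changed
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, deleted, modified


# Snapshots of trees 580c and 6434, as listed by cat-file earlier
snap_580c = {"readme.txt": "72943a16fb2c8f38f9dde202b7a70ccc19c52f34"}
snap_6434 = {
    "readme.txt": "72943a16fb2c8f38f9dde202b7a70ccc19c52f34",
    "tmp/bbb.txt": "f761ec192d9f0dca3329044b96ebdb12839dbff6",
}
print(diff_snapshots(snap_580c, snap_6434))
```

Since identical contents always get identical hashes, unchanged files such as readme.txt can be skipped without reading their blobs.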
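The data structure of a commit object is the same as that of a blob object, except that the header starts with "commit". Its content consists of a tree line naming the root tree, zero or more parent lines, author and committer lines (name, e-mail address, Unix timestamp and time zone offset), a blank line, and finally the commit message.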
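Reproducing that layout takes only a few lines. A minimal Python 3 sketch, assuming the author and committer are the same person with the same timestamp and time zone, and that there is no signature or other extra header; make_commit is a hypothetical name. Since the author's e-mail address does not appear in the transcripts above, the IDs it produces cannot be compared with 7a5c or 8847, so the demo uses a placeholder identity.

```python
import hashlib


def make_commit(tree, parents, ident, timestamp, tz, message):
    """Serialize a commit object; returns (object_id, uncompressed bytes).

    ident is "Name <email>"; timestamp is seconds since the Unix epoch.
    """
    lines = ["tree %s" % tree]
    lines += ["parent %s" % p for p in parents]  # no parents for a root commit
    lines.append("author %s %d %s" % (ident, timestamp, tz))
    lines.append("committer %s %d %s" % (ident, timestamp, tz))
    # A blank line separates the headers from the message
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    raw = b"commit %d\0" % len(body) + body
    return hashlib.sha1(raw).hexdigest(), raw


# Placeholder identity (hypothetical); the real one is not shown in the article
ident = "Example User <user@example.com>"
first, _ = make_commit("580c73c39691399d09ad01152ad0a691ce80bccf", [], ident,
                       1447772602, "+0900", "initial commit")
second, raw = make_commit("6434b2415497a42647800c7e828038a2fb6fbbaf", [first], ident,
                          1447772754, "+0900", "second commit")
print(raw.partition(b"\0")[2].decode("utf-8"))
```

Because the parent's ID is part of the hashed bytes, changing anything in an earlier commit would change the IDs of every commit after it.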