index reading #294

Merged
merged 57 commits into from
Jan 12, 2022
eab421c
first research on index reading (#293)
Byron Jan 8, 2022
3040857
notes on how test indices have been created (#293)
Byron Jan 8, 2022
bce67d8
preempt the eventual need for a worktree implementation (#293)
Byron Jan 8, 2022
b3ee7c6
update changelog (#293)
Byron Jan 8, 2022
ddb1bf4
Release git-worktree v0.0.0
Byron Jan 8, 2022
aa60fdf
base setup for index testing (#293)
Byron Jan 8, 2022
b481f13
The realization that FileBuffer really shouldn't be used anymore (#293)
Byron Jan 9, 2022
4dec3ea
git-ref uses memmap2 (#293)
Byron Jan 9, 2022
0c946f5
use memmap2 in git-commitgraph (#293)
Byron Jan 9, 2022
fbfea28
git-index uses memmap2 (#293)
Byron Jan 9, 2022
d9011c7
git-pack uses `memmap2` instead of `filebuffer` (#293)
Byron Jan 9, 2022
5a68d2f
thanks clippy
Byron Jan 9, 2022
494ed46
refactor (#293)
Byron Jan 9, 2022
826ca0c
first stab at basic index file parsing (#293)
Byron Jan 9, 2022
c526811
remove byteorder dependency from git-commitgraph (#293)
Byron Jan 9, 2022
4122306
remove byteorder from git-pack (#293)
Byron Jan 9, 2022
5c731f8
parse index header (#293)
Byron Jan 9, 2022
068c716
first step towards reading the EOIE extension (#293)
Byron Jan 9, 2022
9b28b18
refactor (#293)
Byron Jan 9, 2022
79ca582
right before implementing a traversal over extension chunks (#293)
Byron Jan 9, 2022
591511a
thanks clippy
Byron Jan 9, 2022
9ffd523
Another big step, even though EOIE checksum is still bugged (#293)
Byron Jan 9, 2022
cc33752
Fix counting issue, checksum matches now (#293)
Byron Jan 9, 2022
9fdd34b
refactor (#293)
Byron Jan 9, 2022
8acd65b
Write down some idea for a db system I want
Byron Jan 10, 2022
d4b3a07
refactor (#293)
Byron Jan 10, 2022
c17240d
the first actual assetion (#293)
Byron Jan 10, 2022
07e8fb2
refactor (#293)
Byron Jan 10, 2022
49fcb6f
Get closer to implementing a simple TREE extension decoding (#293)
Byron Jan 10, 2022
a2ea498
parse TREE chunk (#293)
Byron Jan 10, 2022
e7e0679
Prepare a more complex test for tree parsing, requires entry parsing …
Byron Jan 10, 2022
5526020
thanks clippy
Byron Jan 10, 2022
620d2e6
Extensions are optional, and so is their iteration (#293)
Byron Jan 10, 2022
53e2d75
Most of the entry decoding, name is still missing (#293)
Byron Jan 10, 2022
01036ad
a step towards pasing V2 paths (#293)
Byron Jan 10, 2022
0a03f19
All code needed to load extensions… (#293)
Byron Jan 11, 2022
da556b0
Use correct post-header slice when parsing entries (#293)
Byron Jan 11, 2022
77a062c
Now with counting of consumed bytes in extensions (#293)
Byron Jan 11, 2022
f865ef6
The first test to validate an entry (#293)
Byron Jan 11, 2022
f477032
thanks clippy
Byron Jan 11, 2022
273853f
more thorough tests for more complex repo with more entries (#293)
Byron Jan 11, 2022
b8400ed
feat: decoding of variable int numbers (#293).
Byron Jan 11, 2022
52e3c6f
Adapt to changes in git-features: use var-int decoding from there (#293)
Byron Jan 11, 2022
7558844
Assure we are right about the leb64 buffer needed for a 64 bit int (#…
Byron Jan 11, 2022
06640e3
parse V4 delta-paths (#293)
Byron Jan 11, 2022
6f04f8b
refactor (#293)
Byron Jan 11, 2022
35bdee4
Basic IEOT parsing (#293)
Byron Jan 11, 2022
99d7224
cleanup (#293)
Byron Jan 11, 2022
30de988
prepare decode options for better control of threads (#293)
Byron Jan 11, 2022
a22cb0f
single and multi-threaded index tests (#293)
Byron Jan 11, 2022
ca095ed
feat: Make a scope-like abstraction available (#293)
Byron Jan 12, 2022
6fea17d
Frame for using the new 'scoped threads' feature in git-features (#293)
Byron Jan 12, 2022
de84a3a
parallel loading of entries right before reducing them (#293)
Byron Jan 12, 2022
cb7e4e7
feat: Add InOrderIter to 'parallel' module (#293)
Byron Jan 12, 2022
7721b5f
Use InOrderIter from git-features (#293)
Byron Jan 12, 2022
e3977fe
fix build (#293)
Byron Jan 12, 2022
995994a
Aggregation for index entries loaded in parallel (#293)
Byron Jan 12, 2022
42 changes: 37 additions & 5 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -115,6 +115,7 @@ members = [
"git-diff",
"git-traverse",
"git-index",
"git-worktree",
"git-packetline",
"git-transport",
"git-protocol",
3 changes: 3 additions & 0 deletions Makefile
@@ -82,6 +82,7 @@ check: ## Build all code in suitable configurations
&& cargo check
cd git-object && cargo check --all-features \
&& cargo check --features verbose-object-parsing-errors
cd git-index && cargo check --features serde1
cd git-actor && cargo check --features serde1
cd git-pack && cargo check --features serde1 \
&& cargo check --features pack-cache-lru-static \
@@ -139,6 +140,8 @@ unit-tests: ## run all unit tests
cd git-object && cargo test && cargo test --features verbose-object-parsing-errors
cd git-pack && cargo test --features internal-testing-to-avoid-being-run-by-cargo-test-all \
&& cargo test --features "internal-testing-git-features-parallel"
cd git-index && cargo test --features internal-testing-to-avoid-being-run-by-cargo-test-all \
&& cargo test --features "internal-testing-git-features-parallel"
cd git-packetline && cargo test \
&& cargo test --features blocking-io,maybe-async/is_sync --test blocking-packetline \
&& cargo test --features "async-io" --test async-packetline
23 changes: 15 additions & 8 deletions README.md
@@ -83,9 +83,9 @@ Follow linked crate name for detailed status. Please note that all crates follow
* [git-repository](https://github.com/Byron/gitoxide/blob/main/crate-status.md#git-repository)
* `gitoxide-core`
* **very early**
* **idea**
* [git-index](https://github.com/Byron/gitoxide/blob/main/crate-status.md#git-index)
* git-status
* **idea**
* [git-worktree](https://github.com/Byron/gitoxide/blob/main/crate-status.md#git-worktree)
* [git-tui](https://github.com/Byron/gitoxide/blob/main/crate-status.md#git-tui)
* [git-bundle](https://github.com/Byron/gitoxide/blob/main/crate-status.md#git-bundle)

@@ -240,14 +240,9 @@ Provide a CLI to for the most basic user journey:

* [ ] `gix tool open-remote` open the URL of the remote, possibly after applying known transformations to go from `ssh` to `https`.
* [ ] Open up SQL for git using [sqlite virtual tables](https://github.com/rusqlite/rusqlite/blob/master/tests/vtab.rs). Check out gitqlite
as well. What would an MVP look like? Maybe even something that could ship with gitoxide.
as well. What would an MVP look like? Maybe even something that could ship with gitoxide. See [this go implementation as example](https://github.com/filhodanuvem/gitql).
* [ ] A truly awesome history rewriter which makes it easy to understand what happened while avoiding all pitfalls. Think BFG, but more awesome, if that's possible.
* [ ] `git-tui` should learn a lot from [fossil-scm] regarding the presentation of data. Maybe [this](https://github.com/Lutetium-Vanadium/requestty/) can be used for prompts. Probably [magit] has a lot to offer, too.
* [ ] Can markdown be used as database so issue-trackers along with meta-data could just be markdown files which are mostly human-editable? Could user interfaces
be meta-data aware and just hide the meta-data chunks which are now editable in the GUI itself? Doing this would make conflicts easier to resolve than an `sqlite`
database.
* ~~A git-backend for `sqlite` which should allow embedding sqlite databases into git repositories, which in turn can be used for bug-trackers, wikis or other
features, making for a fully distributed github like experience, maybe.~~

### Ideas for Spin-Offs

@@ -259,6 +254,18 @@ Provide a CLI to for the most basic user journey:
* [ ] A [syncthing] like client/server application. This is to demonstrate how lower-level crates can be combined into custom applications that use
only part of git's technology to achieve their very own thing. Watch out for big file support, multi-device cross-syncing, the possibility for
untrusted destinations using full-encryption, case-insensitive and sensitive filesystems, and extended file attributes as well as ignore files.
* An event-based database that uses commit messages to store deltas, while occasionally aggregating the actual state in a tree. Of course it's distributed by nature, allowing
people to work offline.
- It's abstracted to completely hide the actual data model behind it, allowing for all kinds of things to be implemented on top.
- Commits probably need a nanosecond component for the timestamp, which can be added via custom header field.
- Having all changes recorded allows for perfect merging, on the client or on the server, while keeping a natural audit log, which makes it useful for mission-critical
databases in business.
* **Applications**
- Can markdown be used as database so issue-trackers along with meta-data could just be markdown files which are mostly human-editable? Could user interfaces
be meta-data aware and just hide the meta-data chunks which are now editable in the GUI itself? Doing this would make conflicts easier to resolve than an `sqlite`
database.
- A time tracker - simple data, very likely naturally conflict free, and interesting to see it in terms of teams or companies using it with maybe GitHub as Backing for authentication.
- How about supporting multiple different trackers, as in different remotes?

[syncthing]: https://github.com/syncthing/syncthing
[fossil-scm]: https://www.fossil-scm.org
31 changes: 28 additions & 3 deletions crate-status.md
@@ -206,10 +206,35 @@ Check out the [performance discussion][git-traverse-performance] as well.
* [x] API documentation
* [ ] Some examples

### git-worktree
* handle the working tree/checkout
* manage multiple worktrees
* deal with exclude specifications, like .gitignore and other exclude files.

### git-index
* read and write a git-index file
* non-sparse
* sparse (search for [`sparse index` here](https://github.blog/2021-08-16-highlights-from-git-2-33/))
* read
* [ ] V2
* [ ] V3
* [ ] V4
* optional threading
* [ ] concurrent loading of index extensions
* [ ] threaded cache entry reading
* `stat` update
* [ ] optional threaded `stat` based on thread_cost (aka preload)
* [ ] handling of `.gitignore` and system file exclude configuration
* [ ] handle potential races
* extensions
* [ ] TREE for speeding up tree generation
* [ ] REUC resolving undo
* [ ] UNTR untracked cache
* [ ] FSMN file system monitor cache V1 and V2
* [ ] EOIE end of index entry
* [ ] IEOT index entry offset table
* [ ] link base indices to take information from, split index
* [ ] sdir sparse directory entries
* additional support
* [ ] non-sparse
* [ ] sparse (search for [`sparse index` here](https://github.blog/2021-08-16-highlights-from-git-2-33/))
* add and remove entries
* [x] API documentation
* [ ] Some examples
3 changes: 2 additions & 1 deletion etc/check-package-size.sh
@@ -18,12 +18,13 @@ echo "in root: gitoxide CLI"
#indent cargo diet -n --package-size-limit 25KB - fails right now because of dotted profile.dev.package
(enter cargo-smart-release && indent cargo diet -n --package-size-limit 85KB)
(enter git-actor && indent cargo diet -n --package-size-limit 5KB)
(enter git-index && indent cargo diet -n --package-size-limit 15KB)
(enter git-tempfile && indent cargo diet -n --package-size-limit 25KB)
(enter git-lock && indent cargo diet -n --package-size-limit 15KB)
(enter git-config && indent cargo diet -n --package-size-limit 65KB)
(enter git-hash && indent cargo diet -n --package-size-limit 10KB)
(enter git-chunk && indent cargo diet -n --package-size-limit 10KB)
(enter git-features && indent cargo diet -n --package-size-limit 35KB)
(enter git-features && indent cargo diet -n --package-size-limit 40KB)
(enter git-ref && indent cargo diet -n --package-size-limit 50KB)
(enter git-diff && indent cargo diet -n --package-size-limit 10KB)
(enter git-traverse && indent cargo diet -n --package-size-limit 10KB)
3 changes: 1 addition & 2 deletions git-commitgraph/Cargo.toml
@@ -21,8 +21,7 @@ git-hash = { version ="^0.8.0", path = "../git-hash" }
git-chunk = { version ="^0.2.0", path = "../git-chunk" }

bstr = { version = "0.2.13", default-features = false, features = ["std"] }
byteorder = "1.2.3"
filebuffer = "0.4.0"
memmap2 = "0.5.0"
serde = { version = "1.0.114", optional = true, default-features = false, features = ["derive"] }
thiserror = "1.0.26"

18 changes: 11 additions & 7 deletions git-commitgraph/src/file/commit.rs
@@ -5,8 +5,6 @@ use std::{
slice::Chunks,
};

use byteorder::{BigEndian, ByteOrder};

use crate::{
file::{self, File, EXTENDED_EDGES_MASK, LAST_EXTENDED_EDGE_MASK, NO_PARENT},
graph,
@@ -38,17 +36,23 @@ pub struct Commit<'a> {
root_tree_id: &'a git_hash::oid,
}

#[inline]
fn read_u32(b: &[u8]) -> u32 {
u32::from_be_bytes(b.try_into().unwrap())
}

impl<'a> Commit<'a> {
pub(crate) fn new(file: &'a File, pos: file::Position) -> Self {
let bytes = file.commit_data_bytes(pos);
Commit {
file,
pos,
root_tree_id: git_hash::oid::from_bytes_unchecked(&bytes[..file.hash_len]),
parent1: ParentEdge::from_raw(BigEndian::read_u32(&bytes[file.hash_len..][..4])),
parent2: ParentEdge::from_raw(BigEndian::read_u32(&bytes[file.hash_len + 4..][..4])),
generation: BigEndian::read_u32(&bytes[file.hash_len + 8..][..4]) >> 2,
commit_timestamp: BigEndian::read_u64(&bytes[file.hash_len + 8..][..8]) & 0x0003_ffff_ffff,
parent1: ParentEdge::from_raw(read_u32(&bytes[file.hash_len..][..4])),
parent2: ParentEdge::from_raw(read_u32(&bytes[file.hash_len + 4..][..4])),
generation: read_u32(&bytes[file.hash_len + 8..][..4]) >> 2,
commit_timestamp: u64::from_be_bytes(bytes[file.hash_len + 8..][..8].try_into().unwrap())
& 0x0003_ffff_ffff,
}
}

@@ -173,7 +177,7 @@ impl<'a> Iterator for ParentIterator<'a> {
},
ParentIteratorState::Extra(mut chunks) => {
if let Some(chunk) = chunks.next() {
let extra_edge = BigEndian::read_u32(chunk);
let extra_edge = read_u32(chunk);
match ExtraEdge::from_raw(extra_edge) {
ExtraEdge::Internal(pos) => {
self.state = ParentIteratorState::Extra(chunks);
Expand Down
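The change above swaps `byteorder` for the standard library's `from_be_bytes`. As a minimal sketch of the same decoding outside the crate, the block below splits the 8-byte generation/timestamp field the way `Commit::new` does. The helper name `generation_and_timestamp` and the sample values are assumptions made for illustration; only `read_u32`, the `>> 2` shift, and the `0x0003_ffff_ffff` mask come from the diff.

```rust
use std::convert::TryInto;

/// Read a big-endian u32 from a slice of exactly four bytes, as in the diff.
fn read_u32(b: &[u8]) -> u32 {
    u32::from_be_bytes(b.try_into().expect("caller passes exactly 4 bytes"))
}

/// Hypothetical helper (not part of git-commitgraph): decode the 8-byte field
/// holding the generation in its upper 30 bits and the commit timestamp in
/// its lower 34 bits.
fn generation_and_timestamp(field: &[u8; 8]) -> (u32, u64) {
    // The first four bytes carry the generation plus the two top timestamp bits.
    let generation = read_u32(&field[..4]) >> 2;
    // The full eight bytes, masked to the lower 34 bits, give the timestamp.
    let timestamp = u64::from_be_bytes(*field) & 0x0003_ffff_ffff;
    (generation, timestamp)
}

fn main() {
    // Synthetic input: generation 5, timestamp 1_641_945_600 (made-up values).
    let raw: u64 = (5u64 << 34) | 1_641_945_600;
    let (generation, timestamp) = generation_and_timestamp(&raw.to_be_bytes());
    assert_eq!(generation, 5);
    assert_eq!(timestamp, 1_641_945_600);
}
```

Both reads start at the same offset because the two values share the field; the shift and the mask separate them.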
21 changes: 14 additions & 7 deletions git-commitgraph/src/file/init.rs
@@ -4,8 +4,7 @@ use std::{
};

use bstr::ByteSlice;
use byteorder::{BigEndian, ByteOrder};
use filebuffer::FileBuffer;
use memmap2::Mmap;

use crate::file::{
ChunkId, File, BASE_GRAPHS_LIST_CHUNK_ID, COMMIT_DATA_CHUNK_ID, COMMIT_DATA_ENTRY_SIZE_SANS_HASH,
@@ -66,10 +65,18 @@ impl TryFrom<&Path> for File {
type Error = Error;

fn try_from(path: &Path) -> Result<Self, Self::Error> {
let data = FileBuffer::open(path).map_err(|e| Error::Io {
err: e,
path: path.to_owned(),
})?;
let data = std::fs::File::open(path)
.and_then(|file| {
// SAFETY: we have to take the risk of somebody changing the file underneath. Git never writes into the same file.
#[allow(unsafe_code)]
unsafe {
Mmap::map(&file)
}
})
.map_err(|e| Error::Io {
err: e,
path: path.to_owned(),
})?;
let data_size = data.len();
if data_size < MIN_FILE_SIZE {
return Err(Error::Corrupt(
@@ -241,7 +248,7 @@ impl TryFrom<&Path> for File {
fn read_fan(d: &[u8]) -> ([u32; FAN_LEN], usize) {
let mut fan = [0; FAN_LEN];
for (c, f) in d.chunks(4).zip(fan.iter_mut()) {
*f = BigEndian::read_u32(c);
*f = u32::from_be_bytes(c.try_into().unwrap());
}
(fan, FAN_LEN * 4)
}
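For readers who want to try the fan-table decoding from `read_fan` in isolation, here is a standalone sketch using only the standard library. It assumes `FAN_LEN` is 256 (the constant's value is not shown in this diff) and uses `chunks_exact` rather than the `chunks` call in the source, so a trailing partial chunk is skipped instead of causing a panic in the conversion. The input data is synthetic.

```rust
use std::convert::TryInto;

// Assumption: one fan-out entry per possible first byte of an object id.
const FAN_LEN: usize = 256;

/// Decode up to FAN_LEN big-endian u32 values from `d` and return them along
/// with the number of bytes a full fan table occupies.
fn read_fan(d: &[u8]) -> ([u32; FAN_LEN], usize) {
    let mut fan = [0u32; FAN_LEN];
    for (chunk, slot) in d.chunks_exact(4).zip(fan.iter_mut()) {
        // chunks_exact guarantees four bytes, so the conversion cannot fail.
        *slot = u32::from_be_bytes(chunk.try_into().expect("4-byte chunk"));
    }
    (fan, FAN_LEN * 4)
}

fn main() {
    // Build a synthetic table in which entry i holds the value i * 2.
    let mut data = Vec::with_capacity(FAN_LEN * 4);
    for i in 0..FAN_LEN as u32 {
        data.extend_from_slice(&(i * 2).to_be_bytes());
    }
    let (fan, consumed) = read_fan(&data);
    assert_eq!(consumed, 1024);
    assert_eq!(fan[0], 0);
    assert_eq!(fan[255], 510);
}
```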
4 changes: 2 additions & 2 deletions git-commitgraph/src/file/mod.rs
@@ -6,7 +6,7 @@ use std::{
path::PathBuf,
};

use filebuffer::FileBuffer;
use memmap2::Mmap;

pub use self::{commit::Commit, init::Error};

@@ -42,7 +42,7 @@ pub struct File {
base_graph_count: u8,
base_graphs_list_offset: Option<usize>,
commit_data_offset: usize,
data: FileBuffer,
data: Mmap,
extra_edges_list_range: Option<Range<usize>>,
fan: [u32; FAN_LEN],
oid_lookup_offset: usize,
3 changes: 1 addition & 2 deletions git-commitgraph/src/lib.rs
@@ -7,8 +7,7 @@
//! As generating the full commit graph from scratch can take some time, git may write new commits
//! to separate [files][file::File] instead of overwriting the original file.
//! Eventually, git will merge these files together as the number of files grows.
#![forbid(unsafe_code)]
#![deny(rust_2018_idioms, missing_docs)]
#![deny(unsafe_code, rust_2018_idioms, missing_docs)]

pub mod file;
pub mod graph;