Book representation #371

Michael-F-Bryan · 2017-07-02T23:35:54Z

This PR adds a new internal representation for the book, plus functionality for loading that book using the Summary we got when parsing SUMMARY.md.

It is part of a larger issue (#359) which is working towards refactoring the Book structure and decouple it from the other bits like rendering and configuration.

Michael-F-Bryan · 2017-07-03T10:15:21Z

Believe it or not but this PR is pretty much complete. It only took 2 or 3 commits (pretty sure the first commit is 8310cce) to recursively walk the Summary and load each chapter from disk. I also added an integration test and it can successfully load the example book, plus looking through the JSON representation it looks correct.

@azerupi or @budziq would you be able to have a look through #360 before merging this one. I branched directly from there instead of master, so this PR contains a lot of commits from that PR. When you merge it the history should clean itself up.

budziq · 2017-07-03T11:13:29Z

Wow that s a lot of work @Michael-F-Bryan !

But would you be willing to squash the commits somewhat?
The commits represent your internal work in progress and I'm not sure if any of them is self contained enough for a per commit review.
ATM the review will be quite hard for instance I cannot say if the code you have introduced has been plugged anywhere into the mdbook binary. I have not started to review yet just due to sheer number of internal WIP / refactor / lint fix commits (as opposed to rather manageable final size of the PR).

Michael-F-Bryan · 2017-07-03T11:42:06Z

@azerupi, sure thing. Most of the commits are checkpoints I've made along the way and #360 will probably be easier to review as a single unit.

I'll squash #360 to make that easier to review and merge, then once we're all happy with that I can clean up the history for this commit.

Would you like me to post a small summary as a comment on #360 so you know what it's meant to do? I haven't tied any of this code into the actual mdbook binary, so the mdbook::loader module isn't being called/used by anything other than integration tests yet.

budziq · 2017-07-03T12:07:14Z

Would you like me to post a small summary as a comment on #360 so you know what it's meant to do?

I'm not sure if its worth the effort to keep two separate PR's Why not incorporate all of the changes to #360 change it's title and close this one? I would advise against merging it until it's actually used anyway.

As for the comment/summary. Yes please do! And please add descriptive information to the squashed commit.

Michael-F-Bryan · 2017-07-03T12:45:16Z

Why not incorporate all of the changes to #360 change it's title and close this one?

Yeah, that's a good idea. I was thinking of keeping the two separate because they are logically separate (one parses SUMMARY.md, the other turns it into a Book), but merging them will probably be less work for you guys.

I'm not sure if I want to hold out on merging until it's actually used though. Integrating the new loader module with the rest of mdbook will take a bit of work and probably increase the amount of code to review to 1500-2000 lines.

Anyway, here's a rough summary of the work I've done so far:

Summary Module

Added a private submodule called mdbook::loader::summary which contains all the code for parsing SUMMARY.md
A Summary contains a title (optional), then some prefix, numbered, and suffix chapters (technically Vec<SummaryItem>)
A SummaryItem is either a Link (i.e. link to a chapter), or a separator
A Link contains the chapter name, its location relative to the book's src/ directory, and a list of nested SummaryItems
The SummaryParser (a state machine parser) uses pulldown_cmark to turn the SUMMARY.md string into a stream of Events, it then iterates over those events changing its behaviour depending on the current state,
- The states are Start, PrefixChapters, NestedChapters(u32) (the u32 represents your nesting level, because lists can contain lists), SuffixChapters, and End
- Each state will read the appropriate link and build up the Summary, skipping any events which aren't a link, horizontal rule (separator), or a list

Book Module

Created another private submodule, mdbook::loader::book
This submodule contains the data types representing a Book
- For now the Book just contains a list of BookItems (either chapters or separators)
A Chapter contains its name, contents (as one long string), an optional section number (only numbered chapters have numbers, obviously), and any nested chapters
There's a function for loading a single Chapter from disk using it's associated Link entry from the SUMMARY.md
Another function builds up the Book by recursively visiting all Links and separators in the Summary and joining them into a single Vec<SummaryItem>. This is the only non-dumb-data-type item which is actually exported from the book module

Loader Module

The mdbook::loader module contains a Loader which ties together the functions and data types exposed by the summary and book modules into a more convenient API

Tests

Added a heap of tests, about half the PR is actually just unit and integration tests, in particular:
Added one or two unit tests for each state in the parser's state machine
Made sure you can load from disk by writing some files to a temporary directory
Added integration tests for parsing a dummy SUMMARY.md then asserting the result is exactly what we expected
Made sure the Loader can load the entire example-book from disk and doesn't crash or hit an error
Increased test coverage from 34.4% to 47.7% (as reported by cargo kcov) 😃

I've made sure everything has documentation (with #![deny(missing_docs)]), so if you check out a local copy of my book-representation branch you can open up the docs for a more high-level view of how loading works and what the data types are.

I feel like I'm exposing more than I'd like to here because all the data types (Summary, Book, etc) are completely public. I originally did that because it's meant to be a plain old data tree which people can inspect to modify and browse in plugins or when making their own Renderer, so let me know if things could be made private.

I'm also trying out the new rustfmt-nightly and have that set to automatically reformat on save, so if there are any style issues don't be shy about criticizing 😜

Sorry for the wall of words and the massive PR, in retrospect I really should have broken this up into much smaller pieces!

EDIT: Forgot to mention all the tests

budziq · 2017-07-03T13:09:36Z

@Michael-F-Bryan thanks for the summary. It will help with the review 👍

I'm not sure if I want to hold out on merging until it's actually used though.

I might just be me but I subscribe to the line of thought that anything that is not used or does not implement a required usecase should be immediately purged from the code base (mostly because I work on very large legacy codebases day to day and hate code that was "just left because it might be used someday" with heat of thousand suns) 😉 These things tend to quickly bitrot add complexity, reduce readability and maintainability.

On the other hand actually plugging your changes, seeing how they interact with rest of the codebase and replace the old versions will let you iterate and minimise the design if required.

Integrating the new loader module with the rest of mdbook will take a bit of work

No rush. I'll gladly wait to see how the final api will turn out 👍

and probably increase the amount of code to review to 1500-2000 lines.

As long as the changes are relatively self contained in commits this number is not that staggering. But First thing that comes to my mind is that you might be trying to make the design too broad. How about enumerating the actual usecases that are not covered by the legacy code and problems that you are trying to address in a short list here?

Michael-F-Bryan · 2017-07-04T08:24:40Z

These things tend to quickly bitrot add complexity, reduce readability and maintainability.

When you put it that way it makes a lot of sense!

First thing that comes to my mind is that you might be trying to make the design too broad

I was worried about that too, but over half the code is actually tests to ensure the implementation does what I expect it to. I've found doing unit and integration tests like that also helps quite a bit with the overall API ergonomics.

I originally wanted to do this refactoring because I was trying to make a PDF backend and found the existing Renderer and MdBook components too complicated and coupled to the HTML backend, so I'm trying to make sure the API will be easy for others to build plugins and renderers on top of.

I've tried to make sure the loader module in general is quite self-contained. As far as I'm concerned, its only responsibility is to load some in-memory representation of a book into memory, using the SUMMARY.md as a guide. As such, it shouldn't really need to interact with any other parts of the system.

Michael-F-Bryan · 2017-07-04T08:29:33Z

Here are a couple use cases I was thinking of:

MathJax Plugin Author

I want to have easy access to all of the source text from each chapter in a book. I should be able to implement a Visitor trait (which looks something like the syn::visit::Visitor) and only have to override a single method to do my mathjax substitution.

struct MathJaxPreprocessor;

impl Visitor for MathJaxPreprocessor {
  fn visit_chapter_content(&mut self, content: &mut String) {
    let pattern = Regex::new(...).unwrap();
    content = pattern.replace_all(...);
  }
}

PDF Renderer Author

As a renderer author I want complete access to all facets of the book, preferably in some sort of AST-like structure so I can walk the AST, generating a LaTeX document in memory which I can then compile to PDF. Similar to the plugin author I would like some sort of Visitor trait which I can implement, then I can create a mdbook-pdf crate which looks something like this:

use pdf::Renderer;
use mdbook::{MdBook, Visitor};

fn main() {
  let renderer = Renderer::new(...);
  let mut md = MdBook::new("/path/to/book/root");  // Get the actual path using `clap`
  md.set_renderer("pdf", Box::new(renderer));  // takes a boxed `Visitor`

  md.build().unwrap();
}

MdBook Developer

It'd be nice to structure the process of building a book as a pipeline of transformations, with each step manipulating a tree-like Book. kinda like how you'd do multiple passes and graph manipulations in a compiler. the Book would be a really dumb data object which lets you access its attributes and manipulate them. Each step can then be configured based on the contents of Book.toml. This should make the entire process easier to reason about and cleaner overall.

The Book should be completely (or as much as possible) decoupled from any particular renderer and allow me to make changes without breaking other people's code.

Thanks for asking for a couple use cases. Even as I was writing them up I figured out a nicer way to expose the Book's contents in a way that still allows it to evolve.

budziq · 2017-07-04T11:38:41Z

@Michael-F-Bryan thanks for the writeup. Here are some of my thoughts:

I originally wanted to do this refactoring because I was trying to make a PDF backend and found the existing Renderer and MdBook components too complicated and coupled to the HTML backend,

Yep that is a serious problem currently and major roadblock in supporting other renderers 👍

MathJax Plugin

This will probably the hardest one to tackle as MathJax is a superset of of Some LaTeX with math specific plugins + MathML (so the output of MathJax on html + MathJax.js will most likely be largely incompatible with pdf one that would be LaTeX based). I suspect that we cannot do this cleanly and it might be easier to have separate incompatible plugins for Math Symbols for each renderer (otherwise we might have a MathJax to image converter).

I would strongly advise to implement a second renderer (the more stripped and unfeatureful as possible, certainly lacking MathJax support) along this PR to actually show what is needed from this PR and most likely we will arrive at a simpler design which we might always rewrite again if need arises.

I want to have easy access to all of the source text from each chapter in a book. I should be able to implement a Visitor trait
...
AST-like structure so I can walk the AST,

Ar we sure that we need a design of such caliber (yet)? The book is a relatively flat sequential structure (summary is actually needed only to establish viewing order and render TOC).
Most likely there will be no need for actual AST representation for each page until we stumble on a real roadblock in implementation that would make implementation impossible otherwise.
Most likely html, epub, mobi/azw renderers would never need and AST (as these are html based and would be handled by pulldown-cmark possibly without need for knowledge about MarkDown AST)
If a renderer plugin would like to have access to MarkDown AST I would suggest it just uses the pulldown-cmark (we would not be able to cater to all possible renderer needs with our simplied AST anyway so lets not try to and delegat the work to the more qualified crate)

I would rather see a minimal pdf renderer along with major decoupling and simplification than a more complicated (and arguably over engineered) MdBook design without actual renderer implementation that warrants it.

md.set_renderer("pdf", Box::new(renderer)); // takes a boxed Visitor

I would suggest avoiding the visitor pattern if possible as this will needlessly complicate the design without much benefit. The question is which element should drive the iteration (the book or the renderer). Arguably the renderer will have a more domain knowledge about its output and requirements and will be able to make better choices about items and data from the book it would like to visit/revisit.

Most likely making the book contents iterable and accessible in some sane fashion from the renderer would make the design easier (Granted there would be some minor code duplication between renderers that we might extract into separate logic if we have more than 1-2 renderers 😉 )

book as a pipeline of transformations,

This is a nice idea and would be quite easily implemented with some simple imperative design that would boil down to following pseudocode. Please do not take it as a real suggestion buth rather another point of view for discussion:

  let mut md = MdBook::new("/path/to/book/root");
  // for renderer agnostic preprocessing that works on bare markdown (like the `{{#playpen}}` and upcoming `{{#include}}` )
  for prep in preprocessors {
      // possibly plugins here too
       prep.process(md)?;
  }
  for renderer in renderers {
       renderer.register_plugins(ctx); // get the plugins from some global context - 
       // run renderer dependent plugins and preprocessing internally in render implementation
       render.render(md).build()?;
  }

Michael-F-Bryan · 2017-07-04T12:16:31Z

Are we sure that we need a design of such caliber (yet)? The book is a relatively flat sequential structure (summary is actually needed only to establish viewing order and render TOC).

I wasn't planning on making it too complex, to me a book is just a collection of chapters, and each chapter looks roughly like this:

struct Chapter {
  name: String,
  content: String,
  number: Option<SectionNumber>, // SectionNumber is just a newtype'd Vec<u32>
  nested_chapters: Vec<Chapter>,  // because chapters can have sub-chapters
}

I probably shouldn't have used the word "AST", what I was meaning is a graph where each node is a Chapter and each Chapter may have sub-chapters. The Visitor just gives you a convenient trait for all plugins and renderers to implement, allowing the MdBook to accept any boxed Visitor so people can do their thing. It also means users don't have to make sure they implement all the recursion manually, they can rely on Visitor's default impls for stuff they don't care about.

This is roughly the design we already have, except I've replaced the PathBuf with a plain String containing the content. Presumably we were storing the PathBuf so you can open a File instead of holding everything in memory. That feels like premature optimisation to me and just holding everything in memory (it's a couple hundred k of text at most) makes things a lot simpler for everyone and means you are no longer bound to the filesystem (i.e. tests can be done entirely in-memory instead of in a tempdir).

budziq · 2017-07-04T13:04:20Z

number: Option, // SectionNumber is just a newtype'd Vec

Would we be able to use this data except for formatting a string prefix for chapter name? I don't see a real possibility for it in case of visitor approach. In case of non visitor approach we might use these for some wacky indexing/traversal if need be (but please see my next comment)

I was meaning is a graph where each node is a Chapter and each Chapter may have sub-chapters.

I hope that we are not planning to create any cycles . I guess that we could use a tree structure. We might even get away with a plain list/vec representing the tree traversal and I would suggest a vec as a first POC to findout if we need anything more sophisticated.

It also means users don't have to make sure they implement all the recursion manually, they can rely on Visitor's default impls for stuff they don't care about.

Visitor approach will most likely make implementing navigation a lot harder.
implementing recursion by hand can be avoided by simply providing an iterator representing tree traversal (possibly double ended one). This is the approach I would advocate
please see my previous comment about the fact that we do not know if some renderer will not need a non sequential traversal imposed by visitor. (Nontrivial cases would require direct node indexing but we would need to encounter such case before adding such api)

Presumably we were storing the PathBuf so you can open a File instead of holding everything in memory.

Might be. I'm not opposed to storing everything in memory. We are looking at trivial amounts of data anyway.

Anyhow I would suggest that this effort would be driven by the needs of a budding pdf (or any other) renderer. Then we would be able to iterate on the design to find the cleanest and most maintainable one.

Michael-F-Bryan · 2017-07-04T13:19:12Z

I've started an initial PDF renderer here. It actually works and took about 5 minutes to write up, I just need to break the markdown up and convert it to LaTeX but that problem is kinda orthogonal to the book's representation.

I've attached the outputted document after running it over the example book (just knock off the trailing ".txt", apparently github doesn't like attaching "*.tex" documents).

book-example.tex.txt

budziq · 2017-07-04T13:53:43Z

Very cool 👍 (although I would call it LaTex renderer which might be even more awesome amd versatile in its own right)
I venture that using iterator instead of visitor would make the implementation even shorter 😄

Also the second important question is if we bundle some additional renderrers with mdbook or do we split (even on repo level). I guess that I would start with approach similar to ripgrep. One canonical repo with several crates. Then you could add this WIP rendererr to this PR making the design much easier to reason. And then experiment with split to different repos.

Michael-F-Bryan · 2017-07-04T13:57:48Z

Would we be able to use this data except for formatting a string prefix for chapter name

I think that's the only reason we need it. MdBook makes a distinction between prefix chapters, numbered chapters, and suffix chapters so I thought we'd still need it. I prefer having a newtype'd list of numbers though because I spent forever trying to figure out what the string in mdbook::book::BookItem::Chapter(String, Chapter) was. With a newtype you can add a trivial Display impl, plus get the extra context of knowing what it is and not needing to do string parsing to get the elements.

I hope that we are not planning to create any cycles . I guess that we could use a tree structure.

Yep, Rust's borrowing doesn't like graphs with cycles (you end up with multiple mutable references), plus a tree structure seems to be the logical way of representing a book because your book almost always reflects how you've laid things out on the file system.

I venture that using iterator instead of visitor would make the implementation even shorter

Haha, I'll implement a Book::iter() function which gives you an iterator over all of the elements. I guess if anyone wants to iterate over the chapters in a Book they can implement it themselves, like I did in the pdf renderer.

The visitor pattern may be a premature optimisation/generalisation on my part because I've been working on a toy language and that's about the only (feasible) way of inspecting and manipulating your AST 😜

Michael-F-Bryan · 2017-07-07T13:28:39Z

Sorry it's taken me so long to get back to this! I got the go ahead to start working on a Rust library/DLL which will be integrated into a product at work so I've been a tad distracted.

I've squashed the commits to make the history a bit easier to navigate, plus I implemented a depth-first iterator over the items in a book as per popular demand 😉 I reckon my implementation may be a bit nicer than the existing one, so that's a bonus too.

Should my next step be to try and start integrating Book into the MdBook struct?

Michael-F-Bryan · 2017-07-07T14:52:24Z

Ugh, apparently Appveyor's "stable" target isn't actually the stable release, so even though struct field shorthands are stable (as of 1.17) the build fails trying to compile the unicode-bidi dependency.

From what I can tell the dependency chain is hyper > url > idna > unicode-bidi, but I'm not sure what I can do to fix it on my end...

EDIT: Fixed. Rust was being installed directly from static.rust-lang.org and that's probably gone stale, so I'm using rustup instead.

budziq · 2017-07-07T19:43:53Z

Thanks a lot @Michael-F-Bryan . I would put the last commit azerupi@092b445 in a separate PR as it is not really related to rest of the work and could be merged immediately. I'll try to give a review of rest of the PR shortly

budziq

Hi, here are some of my random thoughts.

I find the code very clean and very good craftsmanship 👍 ! But I have a feeling that Summary and Loader over more than a little over-complicated for relatively simple task at hand while the Book seams ok. Could we pare down the implementation and strip the number of entities to the bare minimum? Currently It's few times longer and more complex while performing exactly the same tasks (not counting the enormous amount of tests which I adore ❤️ ).

Don't get me wrong. I love the refactoring and separation for the most part (I haven't done a serious code review yet as Imho the top level organization could do some more thinking)

And I would love to see the code actually used in action replacing the original implementations (I also think that the actual removal and replacement of original implementations should be part of this PR) once it would be pared down.

budziq · 2017-07-07T20:01:18Z

src/loader/book.rs

@@ -0,0 +1,322 @@
+#![allow(missing_docs, unused_variables, unused_imports, dead_code)]


I would not add any of these pragmas. If anything I would add most of these in deny mode

Oops! I added those during development to decrease the signal-to-noise ratio of the compiler output. Looks like I forgot to remove it afterwards.

budziq · 2017-07-07T20:27:28Z

src/lib.rs

-extern crate log;
+extern crate pretty_assertions;
+#[cfg(test)]
+extern crate tempdir;


do we need it here? It's used only in loader/book.rs tests mod

I found the pretty_assertions crate invaluable when you try to debug assertions where the assert!() is comparing two complex structures. However you have a point in that it's not really worth adding yet another dependency for a feature which is only used 3 or 4 times... I think I'll remove it and see how I go without it.

actually I meant tempdir here. pretty_assertions cannot be imported anywhere else due to macros.

Ah okay, what would you prefer me to do here? Do you think I should try to remove the tempdir dependency, rearrange the imports, or even move the extern crate statement to loader/book.rs?

tempdir is very handy in tests so I guess that we could make the import it localized to them as long as it's used only in one suite. If you plan to use it elsewere (or in other suites) then its fine where it is.

budziq · 2017-07-07T20:46:38Z

src/loader/book.rs

+///
+/// You need to pass in the book's source directory because all the links in
+/// `SUMMARY.md` give the chapter locations relative to it.
+pub fn load_book_from_disk<P: AsRef<Path>>(summary: &Summary, src_dir: P) -> Result<Book> {


i would suggest renaming the method to just load_book

budziq · 2017-07-07T20:56:17Z

src/loader/summary.rs

@@ -0,0 +1,726 @@
+#![allow(dead_code, unused_variables)]


I would not add these pragmas

budziq · 2017-07-07T21:00:14Z

src/loader/book.rs

+
+/// Enum representing any type of item which can be added to a book.
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
+pub enum BookItem {


I'm not sure why we need separation between Chapter, SummaryItem and BookItem

The extra level of indirection from BookItem is necessary because the SUMMARY.md format says your book is (conceptually) a list of Chapters and section separators.

The difference between BookItem and SummaryItem is because summary parsing is a separate process to loading a book. Parsing the summary should just need to take in your SUMMARY.md as a string and not need to touch the filesystem or load chapters into memory. That's why Summary is a list of SummaryItems which are either section separators or Links (e.g. - [Introduction](./src/intro.md)"). Hence the distinct Link entity instead of reusing Chapter.

budziq · 2017-07-07T21:40:31Z

src/loader/summary.rs

+        .iter_mut()
+        .enumerate()
+        .filter_map(|(i, item)| item.maybe_link_mut().map(|l| (i, l)))
+        .rev()


wouldn't just last() work here?

That's what I originally tried too. What you are trying to do is get the last chapter from the sub_items so you can append the new Chapter to it, but what happens when you call last() and the last item is a Separator instead of a Chapter?

I agree that it's not overly pretty or easy to understand, I'll see if I can pull it out into a method.

Also, maybe_link_mut() feels like a bit of a hack so I don't need to write a match statement inside a closure. It needs to go...

budziq · 2017-07-07T21:43:58Z

src/loader/summary.rs

+///
+/// This is roughly the equivalent of `[Some section](./path/to/file.md)`.
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
+pub struct Link {


do we need another entity?

You'll probably want to look at my previous comment for more info, but basically a Chapter and a Link have a very different purpose even if they both look quite similar. One is a chapter and its contents loaded from disk (with all the IO and logic that entails), whereas the other represents a single entry in the SUMMARY.md file.

budziq · 2017-07-07T21:44:37Z

src/loader/summary.rs

+/// Given a particular level (e.g. 3), go that many levels down the `Link`'s
+/// nested items then append the provided item to the last `Link` in the
+/// list.
+fn push_item_at_nesting_level(links: &mut Vec<SummaryItem>, mut item: SummaryItem, level: usize, mut section_number: SectionNumber)


this signature is quite awkward. Why not make this a method over summary?

budziq · 2017-07-07T22:04:08Z

appveyor.yml

-  - rust-%RUST_VERSION%-%TARGET%.exe /VERYSILENT /NORESTART /DIR="C:\Program Files (x86)\Rust"
-  - SET PATH=%PATH%;C:\Program Files (x86)\Rust\bin
-  - rustc -V
+  - ps: >-


please separate it to a different PR. This is orthogonal and is most likely to be merged much earlier than the rest of this PR.

Done, I pulled it out into #373 so it won't be held up by this PR.

Michael-F-Bryan · 2017-07-08T06:24:44Z

@budziq I definitely agree with what you are saying about over-complicating things and introducing loads of entities which may or may not even be necessary. I've had a bit of a re-think about what I'm doing and which things

I've completely removed the mdbook::loader::summary exports from mdbook's public interface (summary parsing is just an implementation detail of loading a book) and replaced the Loader struct with a single load_book() function because it doesn't really pull its weight. This removes a lot of the exposed entities while exporting the one bit of functionality we actually care about (i.e. the load_book function).

I think a lot of the complication comes about because I've considered parsing SUMMARY.md and loading a Book from disk to be two orthogonal tasks, whereas the original implementation combined both into a single function. This means I'm creating an intermediate tree structure (Summary, with its SummaryItem and Link) with all the extra complexity that entails, plus the bit which loads the book then needs to traverse this tree.

I'd kinda like to leave the SummaryParser in the summary submodule because parsing is one of those things which is hard to do well and requires a structured approach. Otherwise it'll turn into a code monster which has lots of hard to find bugs and implicit state and will make you very sad in the long run. At the very start of the PR I was just trying to clean up the existing implementation and slightly alter the Book structure but eventually gave up because of exactly those reasons.

Out of curiosity, I just went through and did a really rough count of lines of executable code excluding tests and docs after incorporating some of the feedback you've given me. This PR would replace 320 lines (230 from mdbook::parse and 90 from mdbook::book::bookitem) with about 450 lines (300 lines in mdbook::loader::summary, 120 from mdbook::loader::book, and 20 from the top level mdbook::loader). Obviously you'll want to take those numbers with a big pinch of salt because I'll be biased in what I call "executable lines of code", but they're some ballpark figures for how much change this PR is proposing.

budziq · 2017-07-08T09:36:31Z

Out of curiosity, I just went through and did a really rough count of lines of executable code excluding tests and docs after incorporating some of the feedback you've given me.

So lets take these out of the clean room! I very much like to see these lines replacing the original and how these interact with rest of the codebase in this PR (removing the old implementations would help show how all of it comes together).

Michael-F-Bryan · 2017-07-08T11:25:16Z

@budziq I started integrating this PR in and surprisingly it merged quite nicely. I just needed to changed book::BookItem imports into loader::BookItem then fix a couple compilation errors and everything fell into place 😁

The only non-trivial issue is that MDBook::new() creates a book in a potentially uninitialised state, deferring checking whether a book is valid and loading it until you manually call parse_summary() which will mutate the book's state or throw an error. My Book doesn't like this because you can only construct it using the load_book() function (which requires everything to be valid), you can't create an unintialised Book then update it later on.

@budziq and @azerupi what are your thoughts on MDBook::new() checking the book directory up front and returning a Result<MDBook> so that any time you have a MDBook you know it's already been initialized and everything is correct? Then the MDBook::init() method will turn into a second constructor that also initializes everything on disk and returns your new MDBook (or an error if initialization failed for whatever reason).

I'm also a little scared because once the compiler was satisfied all the tests passed... The reason I'm worried is I know I've at least broken inter-chapter links, so how many other things could I have accidentally broken and not know about? 😨

budziq · 2017-07-08T11:35:58Z

what are your thoughts on MDBook::new()

I would make a builder type

Michael-F-Bryan · 2017-07-15T04:14:46Z

I'm thinking I might need to back up a commit or two and rethink how I'm merging this into the main library. I'd prefer to try and keep the changes to MDBook as minimal as possible because this PR is mainly about the book's internal representation, whereas as I started integrating the builder in, I started creeping into configuration and overall design and broke a lot of stuff.

It'll be a bit easier when #374 lands because then I can use those tests to make sure I haven't accidentally introduced regressions.

budziq · 2017-07-15T07:43:57Z

Sorry :( I was swamped last week and I'm on vacation for the next two weeks too. I'll try to get back on track as soon as I return!

…

On 15 July 2017 at 06:14, Michael Bryan ***@***.***> wrote: I'm thinking I might need to back up a commit or two and rethink how I'm merging this into the main library. I'd prefer to try and keep the changes to MDBook as minimal as possible because this PR is mainly about the book's internal representation, whereas as I started integrating the builder in, I started creeping into configuration and overall design and broke a lot of stuff. It'll be a bit easier when #374 <https://github.com/azerupi/mdBook/pull/374> lands because then I can use those tests to make sure I haven't accidentally introduced regressions. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <https://github.com/azerupi/mdBook/pull/371#issuecomment-315508299>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AANfSA26iPA7qTZxeCzrKlohSF0jLfRTks5sODy3gaJpZM4OLuRh> .

Michael-F-Bryan · 2017-07-15T09:23:02Z

No stress. As I think you said at some point earlier on, I'd rather take a little longer and put some thought into the design than rush things 😁

I'm thinking the next victim for refactoring will probably be the configuration system. It feels like it's threaded through the entire codebase as it's evolved and features have been added, so it may need some TLC before we can do things like plugins and alternative renderers.

From the [pull request comment][pr], here's a rough summary of what was done in the squashed commits. --- \# Summary Parser - Added a private submodule called `mdbook::loader::summary` which contains all the code for parsing `SUMMARY.md` - A `Summary` contains a title (optional), then some prefix, numbered, and suffix chapters (technically `Vec<SummaryItem>`) - A `SummaryItem` is either a `Link` (i.e. link to a chapter), or a separator - A `Link` contains the chapter name, its location relative to the book's `src/` directory, and a list of nested `SummaryItems` - The `SummaryParser` (a state machine-based parser) uses `pulldown_cmark` to turn the `SUMMARY.md` string into a stream of `Events`, it then iterates over those events changing its behaviour depending on the current state, - The states are `Start`, `PrefixChapters`, `NestedChapters(u32)` (the `u32` represents your nesting level, because lists can contain lists), `SuffixChapters`, and `End` - Each state will read the appropriate link and build up the `Summary`, skipping any events which aren't a link, horizontal rule (separator), or a list \# Loader - Created a basic loader which can be used to load the `SUMMARY.md` in a directory. \# Tests - Added a couple unit tests for each state in the parser's state machine - Added integration tests for parsing a dummy SUMMARY.md then asserting the result is exactly what we expected [pr]: https://github.com/azerupi/mdBook/pull/371#issuecomment-312636102

This is a squashed commit. It roughly encompasses the following changes. --- \# Book - Created another private submodule, mdbook::loader::book - This submodule contains the data types representing a Book - For now the Book just contains a list of BookItems (either chapters or separators) - A Chapter contains its name, contents (as one long string), an optional section number (only numbered chapters have numbers, obviously), and any nested chapters - There's a function for loading a single Chapter from disk using it's associated Link entry from the SUMMARY.md - Another function builds up the Book by recursively visiting all Links and separators in the Summary and joining them into a single Vec<SummaryItem>. This is the only non-dumb-data-type item which is actually exported from the book module \# Loader - Made the loader use the book::load_book_from_disk function for loading a book in the loader's directory. \# Tests - Made sure you can load from disk by writing some files to a temporary directory - Made sure the Loader can load the entire example-book from disk and doesn't crash or hit an error - Increased test coverage from 34.4% to 47.7% (as reported by cargo kcov)

Michael-F-Bryan · 2017-08-02T16:59:53Z

Hey @budziq and @azerupi I've finally found time to get back to this PR 🎉

I feel like I'm going to need to make a couple major breaking changes to the way MDBook earlier than I originally intended. A lot of the configuration and initialization seems to be threaded through the type and that makes it really difficult to change anything without breaking the world. For example, the MDBook is currently used somewhat as a builder already to let you alter different bits of the configuration and relies on having an uninitialized state until you call the parse_config() method.

I'll probably need to refactor it to be less coupled to the configuration and rest of the system, and so you can only ever get an instance of MDBook if it's in a valid state ("RAII").

To help get a better idea of how things should look,

What is the overall purpose of the MDBook type?
Do you want to be able to change the configuration by calling methods on MDBook, or should you instead do all configuration up front then pass the BookConfig in?
Can you just pass the underlying Book (the in-memory representation of a book's contents) and BookConfig structs to some Renderer, skipping the need for a MDBook altogether? Then presumably testing is just another Renderer and the mdbook init command is just a function which takes a config.

azerupi · 2017-08-05T11:43:25Z

What is the overall purpose of the MDBook type?

When I started, it was supposed to be the "facade" of the library. The type you use to manipulate the book. I am not attached to it per se, so if there is a better way to structure the code it's worth discussing. :)

Do you want to be able to change the configuration by calling methods on MDBook, or should you instead do all configuration up front then pass the BookConfig in?

Most of the time, configuration will be done once. But I would like to keep the ability to change the configuration on the fly. It doesn't mean the MDBook type needs a method for every configuration option though. I can for example imagine a method for returning a struct representing the global config (renderer independent options), another to return one (or multiple) registered renderers holding their own configuration, etc.

Can you just pass the underlying Book (the in-memory representation of a book's contents) and BookConfig structs to some Renderer, skipping the need for a MDBook altogether?

I think the MDBook type is more higher level. It will contain a list of books (multi-language), a configuration, a list of renderers, a list of plugins, etc. When we build, MDBook will figure out what to call and when to call it in the process. Does that make any sense?

I'm open to other ideas, it's just what I have come up with at this time :)

testing is just another Renderer

I like that idea! 👍

Michael-F-Bryan · 2017-08-12T07:17:26Z

When I started, it was supposed to be the "facade" of the library. The type you use to manipulate the book.

I think it would be better in the long run if you can still load a book into memory and render it manually, so MDBook is just a convenience wrapper and not the pseudo-god object it is at the moment. I don't want to go and rewrite the MDBook convenience wrapper though because that'll result in loads of changes and breaking the entire world.

What do you think the best thing for me to do from here is? I've currently got a nicely decoupled way of loading the book from disk, now I need to convert the current system to using that instead of the old book representation. The issue is the current system assumes you can have a partially uninitialized state and be able to change all sorts of configuration things on the fly, while the way I've done things, you load the book from disk and shouldn't be able to change configuration options like the source directory afterwards because that'd result in an in inconsistent state.

Sorry it's taking so long to finish off this PR! I was really enthusiastic about making the internal representation for a book less coupled and "nicer" to start off with, but when I started trying to integrate it with the MDBook type and encountered a lot of friction trying to integrate it in I started to lose motivation. It's kinda demoralizing when a single trivial-looking change can result in 30+ compilation errors spread out over the entire crate from rendering to configuration or even the main mdbook binary :(

budziq · 2017-08-16T13:14:02Z

@Michael-F-Bryan

Sorry for late replies. It's pretty hectic lately. IMHO It would be best if MDBook was just a dumb iterable datastructure that happens to be renderrable and would not provide any functions for loading or composing or setters by itself. These things would be better left for some builder type (or types if there would be some mutually exclusive build variants) otherwise we'll always get inconsistent state

Sorry it's taking so long to finish off this PR!

Don't sweat it you are doing important work and doing great job at it 👍 There is no rush.

when I started trying to integrate it with the MDBook type and encountered a lot of friction trying to integrate

Yep. Largish refactors like this one tend to be on the painful side. Usually it is easier to refactor and integrate organically along the way instead of transplanting in one go.

It's kinda demoralizing when a single trivial-looking change can result in 30+ compilation errors

On the bright side I've looked on such things as a todo list given for free by the compiler. I always shudder at larger python code rewrites if test coverage is not great (end even if coverage is good it tends to require soaking most of the code to my frontal lobes before we even starte ;) ).

azerupi · 2017-08-16T14:43:20Z

It would be best if MDBook was just a dumb iterable datastructure that happens to be renderrable

@budziq, I am not sure I agree with you here. I don't have much time to go into details, but the way I see it is that we indeed need a book struct that is iterable but it is not the MDBook struct.

The reason why I think that is that if we want to support multi-lang books we will need to easily duplicate those iterable book structures without duplicating all the common metadata. So I think we need an overarching structure (which was the MDBook struct) containing the different books, configurations, etc.

If I understand your stream of thoughts correctly, you want to dissociate the configuration from the book data as much as possible? That doesn't sound like a bad idea, but at some point we need something to orchestrate the whole, how do you envision that?

budziq · 2017-08-16T20:02:01Z

The reason why I think that is that if we want to support multi-lang books we will need to easily duplicate those iterable book structures without duplicating all the common metadata.

Right. I keep forgetting about the planned multilang support as It has been outside of my requirements (for now).

Well I kind of like the idea of Rendition/Variant/Translation (naming things is hard ;)) which would represent a single book (as in single language variant) with all the qualities that I've described previously (simple renderable collection). Then we could turn have a higher level entity spanning multiple variants. Which in turn would not be much else than a container and mapping between summaries, which may or may not be renderable (html one might be but it might make less sense for other renderers).

I'm still unclear if we would actually need any mapping between book translation summaries at all, except for some lints about translations diverging from the original. @azerupi do you have a use-case in mind in regard to the common metadata? Most likely I'm missing something there.

Michael-F-Bryan · 2017-08-17T00:47:41Z

To me, it sounds like there are three general structures at play here. First there's a Book, this would roughly translate to the contents of your src/ directory and the markdown files inside (or a single language, if the book is multilingual), then there's some Config` struct which encapsulates all configuration.

On top of those two there'll be a MDBook which orchestrates the entire rendering process and lets you tie in the configuration with rendering and possibly also how books are loaded from disk if it's multilingual.

It sounds like the MDBook and Book structs may have been confused at some point (naming things is hard!).

Is there a issue/PR for multilingual support I can read through to get up to speed with how we're wanting to do it? From what I've seen of the MDBook struct so far, there isn't (yet) any concrete code for dealing with multiple versions of a book in different languages.

azerupi · 2017-08-17T07:40:32Z

@Michael-F-Bryan Yes, that is mostly how I envisioned things :)

Is there a issue/PR for multilingual support I can read through to get up to speed

There isn't anything complete. The best I can give you is issue #146 where I described how I imagined the different structs. The related PR #147 that never got merged, could be a good starting point.

azerupi · 2017-08-17T07:45:23Z

I'm still unclear if we would actually need any mapping between book translation summaries at all

I think we should leave that to the renderer. Like you said, only few renderers will need mapping between languages (mainly the HTML one), so it makes sense to let the renderers implement it to suit their needs.

azerupi · 2017-08-20T09:32:25Z

@Michael-F-Bryan I propose that we try to integrate some part of this into master already.
The part I am most interested in is the book structures. So I propose that we try to integrate some mixture of the structs in this PR and #147 without all the other refactorings to begin and see where that gets us.

I am pushing this specific part to move one step closer to multi-lingual books. Does that sound ok? If you don't feel like making a new PR I can definitely take care of it. I have some free time today and this week in general :)

Michael-F-Bryan · 2017-08-21T03:49:18Z

I had a couple hours between classes so I decided to finish this off. I've rolled back to when I finished the Book (before trying to integrate it into MDBook) and redone the integration, massively scaling back the number of changes.

I've made a PR (#409) and all the tests pass (woohoo! 🎉) so it looks like everything should work. @budziq and @azerupi should we start the reviewing process now?

From the [pull request comment][pr], here's a rough summary of what was done in the squashed commits. --- \# Summary Parser - Added a private submodule called `mdbook::loader::summary` which contains all the code for parsing `SUMMARY.md` - A `Summary` contains a title (optional), then some prefix, numbered, and suffix chapters (technically `Vec<SummaryItem>`) - A `SummaryItem` is either a `Link` (i.e. link to a chapter), or a separator - A `Link` contains the chapter name, its location relative to the book's `src/` directory, and a list of nested `SummaryItems` - The `SummaryParser` (a state machine-based parser) uses `pulldown_cmark` to turn the `SUMMARY.md` string into a stream of `Events`, it then iterates over those events changing its behaviour depending on the current state, - The states are `Start`, `PrefixChapters`, `NestedChapters(u32)` (the `u32` represents your nesting level, because lists can contain lists), `SuffixChapters`, and `End` - Each state will read the appropriate link and build up the `Summary`, skipping any events which aren't a link, horizontal rule (separator), or a list \# Loader - Created a basic loader which can be used to load the `SUMMARY.md` in a directory. \# Tests - Added a couple unit tests for each state in the parser's state machine - Added integration tests for parsing a dummy SUMMARY.md then asserting the result is exactly what we expected [pr]: https://github.com/azerupi/mdBook/pull/371#issuecomment-312636102

Michael-F-Bryan · 2017-10-04T12:01:40Z

I'm closing this in favor of #409.

Michael-F-Bryan changed the title ~~Book representation~~ [WIP] Book representation Jul 2, 2017

Michael-F-Bryan changed the title ~~[WIP] Book representation~~ Book representation Jul 3, 2017

Michael-F-Bryan mentioned this pull request Jul 3, 2017

Initial Book loader and summary parsing #360

Closed

Michael-F-Bryan force-pushed the book-representation branch from f3ed090 to da03672 Compare July 7, 2017 13:20

budziq suggested changes Jul 7, 2017

View reviewed changes

budziq reviewed Jul 7, 2017

View reviewed changes

Michael-F-Bryan mentioned this pull request Jul 8, 2017

Updated appveyor.yml to use rustup for installing Rust #373

Merged

Michael-F-Bryan force-pushed the book-representation branch from 092b445 to da03672 Compare July 8, 2017 05:47

Michael-F-Bryan added 9 commits August 3, 2017 00:30

Added a depth-first chapter iterator

2670510

Started integrating some of the feedback from budziq

773784a

Started cleaning up some of the lints and warnings

01301d9

Integrated the new Book structure into MDBook

65a60bf

Updated the binary to use the new signature

ed83e0e

Started working on a MDBookBuilder

92f0fa0

Updated the documentation

77fe87e

Michael-F-Bryan force-pushed the book-representation branch from 8ff7664 to 77fe87e Compare August 2, 2017 17:32

Michael-F-Bryan mentioned this pull request Aug 21, 2017

Book representation - Attempt #2 #409

Closed

Michael-F-Bryan closed this Oct 4, 2017

Michael-F-Bryan deleted the book-representation branch December 22, 2017 17:50

		@@ -0,0 +1,322 @@
		#![allow(missing_docs, unused_variables, unused_imports, dead_code)]

Book representation #371

Book representation #371

Conversation

Michael-F-Bryan commented Jul 2, 2017 • edited Loading

Michael-F-Bryan commented Jul 3, 2017 • edited Loading

budziq commented Jul 3, 2017

Michael-F-Bryan commented Jul 3, 2017

budziq commented Jul 3, 2017

Michael-F-Bryan commented Jul 3, 2017 • edited Loading

Summary Module

Book Module

Loader Module

Tests

budziq commented Jul 3, 2017 • edited Loading

Michael-F-Bryan commented Jul 4, 2017

Michael-F-Bryan commented Jul 4, 2017

MathJax Plugin Author

PDF Renderer Author

MdBook Developer

budziq commented Jul 4, 2017 • edited Loading

Michael-F-Bryan commented Jul 4, 2017

budziq commented Jul 4, 2017 • edited Loading

Michael-F-Bryan commented Jul 4, 2017

budziq commented Jul 4, 2017

Michael-F-Bryan commented Jul 4, 2017

Michael-F-Bryan commented Jul 7, 2017 • edited Loading

Michael-F-Bryan commented Jul 7, 2017 • edited Loading

budziq commented Jul 7, 2017

budziq left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Michael-F-Bryan Jul 8, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Michael-F-Bryan commented Jul 8, 2017

budziq commented Jul 8, 2017

Michael-F-Bryan commented Jul 8, 2017

budziq commented Jul 8, 2017

Michael-F-Bryan commented Jul 15, 2017

budziq commented Jul 15, 2017 via email

Michael-F-Bryan commented Jul 15, 2017

Michael-F-Bryan commented Aug 2, 2017

azerupi commented Aug 5, 2017

Michael-F-Bryan commented Aug 12, 2017

budziq commented Aug 16, 2017

azerupi commented Aug 16, 2017

budziq commented Aug 16, 2017

Michael-F-Bryan commented Aug 17, 2017

azerupi commented Aug 17, 2017

azerupi commented Aug 17, 2017

azerupi commented Aug 20, 2017

Michael-F-Bryan commented Aug 21, 2017 • edited Loading

Michael-F-Bryan commented Oct 4, 2017

Michael-F-Bryan commented Jul 2, 2017 •

edited

Loading

Michael-F-Bryan commented Jul 3, 2017 •

edited

Loading

Michael-F-Bryan commented Jul 3, 2017 •

edited

Loading

budziq commented Jul 3, 2017 •

edited

Loading

budziq commented Jul 4, 2017 •

edited

Loading

budziq commented Jul 4, 2017 •

edited

Loading

Michael-F-Bryan commented Jul 7, 2017 •

edited

Loading

Michael-F-Bryan commented Jul 7, 2017 •

edited

Loading

Michael-F-Bryan Jul 8, 2017 •

edited

Loading

Michael-F-Bryan commented Aug 21, 2017 •

edited

Loading