Book representation - Attempt #2 #409

Michael-F-Bryan · 2017-08-21T03:48:03Z

This is a cut down version of #371.

It breaks up the process of parsing the SUMMARY.md and loading the book from disk into two discrete stages. This also removes the dependency on a file system so the book can be constructed entirely in memory (chapter content is attached to the Chapter as one big string).
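Because each chapter carries its content as a `String`, a book can be assembled without touching the disk. Below is a minimal sketch assuming simplified versions of the types in this PR. The `content`, `number`, and `sub_items` fields mirror the `src/book/book.rs` hunks quoted in this thread. `name`, `Book::sections`, `Chapter::new`, and `chapter_names` are illustrative assumptions.

```rust
/// A dotted section number such as 1.2.3, stored as its components.
#[derive(Debug, Clone, PartialEq)]
pub struct SectionNumber(pub Vec<u32>);

#[derive(Debug, Clone, PartialEq)]
pub struct Chapter {
    /// Hypothetical: the chapter's title as shown in the sidebar.
    pub name: String,
    /// The chapter's contents.
    pub content: String,
    /// The chapter's section number, if it has one.
    pub number: Option<SectionNumber>,
    /// Nested items.
    pub sub_items: Vec<BookItem>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum BookItem {
    Chapter(Chapter),
    Separator,
}

/// Hypothetical top-level container.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Book {
    pub sections: Vec<BookItem>,
}

impl Chapter {
    /// Builds a chapter from in-memory strings; no file system involved.
    pub fn new(name: &str, content: &str, number: Option<Vec<u32>>) -> Self {
        Chapter {
            name: name.to_string(),
            content: content.to_string(),
            number: number.map(SectionNumber),
            sub_items: Vec::new(),
        }
    }
}

/// Collects chapter names depth-first, parents before their children.
pub fn chapter_names(items: &[BookItem]) -> Vec<String> {
    let mut names = Vec::new();
    for item in items {
        if let BookItem::Chapter(ch) = item {
            names.push(ch.name.clone());
            names.extend(chapter_names(&ch.sub_items));
        }
    }
    names
}

fn main() {
    let mut intro = Chapter::new("Introduction", "# Introduction\n", Some(vec![1]));
    intro.sub_items.push(BookItem::Chapter(Chapter::new(
        "Getting Started",
        "# Getting Started\n",
        Some(vec![1, 1]),
    )));

    let book = Book {
        sections: vec![
            // Unnumbered chapter: no section number.
            BookItem::Chapter(Chapter::new("Preface", "Hello!\n", None)),
            BookItem::Chapter(intro),
            BookItem::Separator,
        ],
    };

    // prints ["Preface", "Introduction", "Getting Started"]
    println!("{:?}", chapter_names(&book.sections));
}
```

Since nothing here depends on a file system, tests can build books like this directly instead of writing fixture files to a temporary directory.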

azerupi

I went through the PR a first time and left a couple of comments :)
One of the comments is hidden by GitHub because it was attached to old code that has been deleted.

azerupi · 2017-08-21T09:33:20Z

Cargo.toml

-    "book-example/*",
-    "src/theme/stylus",
-]
+version = "0.0.22-pre"


Why has everything been re-ordered in the Cargo.toml?
Did you use something like cargo-edit?

Could we roll back to the previous order? It made more sense to me and this adds noise to the diff :)

Yeah I'm going to do that. I used cargo-edit to add a dependency and forgot that it rewrites your entire Cargo.toml, removing all comments and shuffling things around.

azerupi · 2017-08-21T09:43:46Z

src/book/book.rs

+    /// The chapter's contents.
+    pub content: String,
+    /// The chapter's section number, if it has one.
+    pub number: Option<SectionNumber>,


From what I have seen so far, I assume that unnumbered "frontmatter" and "backmatter" chapters are distinguished from the others just by the absence of a number here?

I wonder if it is not better to encode this into the BookItem or the Book directly as was considered in #146

pub struct Book { metadata: BookMetadata, preface: Vec<Chapter>, chapters: Vec<Chapter>, appendix: Vec<Chapter>, }

Doing this has the advantage of not requiring the renderers to check the number constantly and keep state. What do you think?

I wanted to join the old BookItem::Affix and BookItem::Chapter variants into one thing because they were essentially identical apart from the presence of a section number.

Do you think I should go back to making a BookItem something like this?

enum BookItem { Affix(Chapter), NumberedChapter(SectionNumber, Chapter), Separator, }

I'm not sure; you are right that they are almost identical. The issue here is that we introduce yet another check: first we need to match on the BookItem, then we need to see if number is Some(_).

I kind of liked the original distinction:

enum BookItem { Affix(Chapter), NumberedChapter(SectionNumber, Chapter), Separator, }

The current solution with Option<SectionNumber> is a little implicit, but I cannot find anything wrong with it.

The variant with separate lists is also nice, but it enforces that the non-numbered chapters always come before and after the numbered chapters. I would prefer to leave that decision to the users.


That is already a constraint currently. Would you let the user intersperse unnumbered chapters in between numbered chapters? Does numbering restart after that?

That's a very good point, it's probably better to adjust the BookItem type and the Book struct so that the preface, numbered chapters, and appendices are three separate lists.

Hmm... How would you deal with the types then? From what I can see, you want the ability to have only bare chapters (i.e. without chapter numbers) and separators in the appendices and preface, but then only numbered chapters and separators in the numbered chapters section. If we use the BookItem definition which contains Affix, NumberedChapter, and Separator then in either the numbered section or the outsides you're going to be doing a match where one variant is technically unreachable!(). Would we then need two enums?
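To make the two-enum question concrete, here is a rough sketch of the three-list layout with one item type per list. All names (`AffixItem`, `NumberedItem`, `affix_names`, `toc`) are hypothetical. The payoff is that no match needs an `unreachable!()` arm; the cost is an extra enum and some duplicated handling.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct SectionNumber(pub Vec<u32>);

#[derive(Debug, Clone, PartialEq)]
pub struct Chapter {
    pub name: String,
    pub content: String,
}

/// Preface and appendix items: bare chapters and separators only.
#[derive(Debug, Clone, PartialEq)]
pub enum AffixItem {
    Chapter(Chapter),
    Separator,
}

/// Numbered-section items: every chapter must carry a section number.
#[derive(Debug, Clone, PartialEq)]
pub enum NumberedItem {
    Chapter(SectionNumber, Chapter),
    Separator,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct Book {
    pub preface: Vec<AffixItem>,
    pub chapters: Vec<NumberedItem>,
    pub appendix: Vec<AffixItem>,
}

/// Exhaustive without an `unreachable!()` arm: each list can only hold
/// the variants that make sense for it.
fn affix_names(items: &[AffixItem]) -> Vec<String> {
    items
        .iter()
        .filter_map(|item| match item {
            AffixItem::Chapter(ch) => Some(ch.name.clone()),
            AffixItem::Separator => None,
        })
        .collect()
}

/// Flattens the book into table-of-contents lines.
pub fn toc(book: &Book) -> Vec<String> {
    let mut lines = affix_names(&book.preface);
    for item in &book.chapters {
        match item {
            NumberedItem::Chapter(SectionNumber(parts), ch) => {
                let dotted: Vec<String> = parts.iter().map(|p| p.to_string()).collect();
                lines.push(format!("{}. {}", dotted.join("."), ch.name));
            }
            NumberedItem::Separator => {}
        }
    }
    lines.extend(affix_names(&book.appendix));
    lines
}

fn main() {
    let ch = |name: &str| Chapter { name: name.to_string(), content: String::new() };
    let book = Book {
        preface: vec![AffixItem::Chapter(ch("Foreword"))],
        chapters: vec![
            NumberedItem::Chapter(SectionNumber(vec![1]), ch("Intro")),
            NumberedItem::Separator,
            NumberedItem::Chapter(SectionNumber(vec![2]), ch("Usage")),
        ],
        appendix: vec![AffixItem::Chapter(ch("Glossary"))],
    };
    for line in toc(&book) {
        println!("{}", line);
    }
}
```

As pointed out earlier in the thread, this layout also hard-codes that unnumbered chapters only appear before or after the numbered ones.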

It almost feels like in an attempt to avoid the Option<SectionNumber> problem we've gone and brought up an even bigger one...

That is already a constraint currently.

Yeah after acquainting myself with this part of code I see it now.

Would you let the user intersperse unnumbered chapters in between numbered chapters? Does numbering restart after that?

I would, but only if anyone actually needed such a feature. I see no reason to invent artificial use-cases, so let's just forget about my previous comment ;)

azerupi · 2017-08-21T09:45:01Z

src/book/book.rs

+    /// The chapter's section number, if it has one.
+    pub number: Option<SectionNumber>,
+    /// Nested items.
+    pub sub_items: Vec<BookItem>,


To me, chapters can only contain sub-chapters. Separators need to be on the root level, so we can change BookItem here to Chapter.

That makes sense, I'll change the sub_items thing to be a Vec<Chapter>. Should we be making those sorts of decisions for people though? What if someone wanted to have three levels of chapters (going down to sub-sub-chapters) or if they wanted to separate the first couple sub-chapters in a chapter from the rest?

What if someone wanted to ...

I am not sure I understand what you are saying. You mean adding separators between sub-chapters? That seems a bit strange, no?

Sorry, I thought you were referring more to the fact that you can technically have a book with as many levels as you want, and were asking to constrain a book to having just the top level chapters and sub-chapters.

I agree separators between sub-chapters seems a bit strange, so I'll see if I can make the parser prevent that.

azerupi · 2017-08-21T10:37:25Z

src/book/mod.rs

-        // parse SUMMARY.md, and create the missing item related file
-        self.parse_summary()?;
-
-        debug!("[*]: constructing paths for missing files");


It seems like this is gone? This is a feature that should definitely not go away. When running init or build and the summary contains files that do not exist, they are created. This allows someone to plan out the whole structure of a book in the SUMMARY.md and get it auto-generated.

I have used this a couple of days ago when I got a bug report and a snippet of a SUMMARY.md, having the files auto-generated instead of having to create them manually was very useful.

That's interesting. I've actually found that feature to be annoying in the past because if you accidentally make a typo in your SUMMARY.md or run mdbook build in the wrong directory it'll silently succeed and generate stubs for the files. When really it should let you know your SUMMARY.md is incorrect so you can manually fix it.

If you're running mdbook init, doesn't that imply that you don't already have a SUMMARY.md file though? So it doesn't really make sense to look for one in that case, when init should just be generating a stub book to get you going... That was my logic anyway. But it's definitely open to discussion.

What about adding a switch to the command line program so you can do mdbook build --create-stubs to tell mdbook to parse the SUMMARY.md then create dummy files for any chapters which don't exist?

I just feel creating the stubs without the user explicitly asking for it is a bit too implicit/magic, which is why I left out the feature when merging the Book stuff in.

if you accidentally make a typo in your SUMMARY.md or run mdbook build in the wrong directory it'll silently succeed and generate stubs for the files.

Maybe we should remove this behaviour from the build command but leave it in the init.

If you're running mdbook init, doesn't that imply that you don't already have a SUMMARY.md file though?

Not necessarily; I think of init as a way to generate the missing files. From that perspective it makes sense. But as you pointed out, it may not align with everyone's intuition.

I am ok with having this as an opt-in with an extra argument. I'm not sure about the name of the argument though --create-chapters, --gen-chapters, ...

@budziq do you have an opinion about this?

I am ok with having this as an opt-in with an extra argument. I'm not sure about the name of the argument though --create-chapters, --gen-chapters, ...
budziq do you have an opinion about this?

@azerupi sorry for lagging with the review (I'll try to make some time on Monday).
Regarding automatic generation of chapters: in terms of an API, I would suggest an additional argument for a builder type. And in terms of actual mdbook binary I would go with additional option as you suggested.

sorry for lagging with the review

No worries, do not feel forced to review anything if you don't have the time :)

And in terms of actual mdbook binary I would go with additional option as you suggested.

Ok, good! I think everyone is in agreement here, so I propose that we make this opt-in behind a CLI flag available for both build and init.

Sounds good. I have a feeling the MDBook helper struct might need a bit of restructuring at some point (perhaps converting it into a builder?) so auto-generating missing bits would just be a case of setting a flag or calling a method.

For now should I just add a boolean flag to the MDBook and a method for setting that flag, then work it into the build() and init() methods?

Yes, sounds good :)
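A minimal sketch of the agreed opt-in behaviour, assuming a cut-down, hypothetical `MDBook`. The `create_missing` field name matches the `book.create_missing` line in the later `src/bin/build.rs` hunk; `with_create_missing` and `create_missing_chapters` are illustrative names, not mdBook's API.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::PathBuf;

/// Hypothetical, cut-down stand-in for the `MDBook` helper.
pub struct MDBook {
    /// Directory the chapter paths in SUMMARY.md are relative to.
    pub src: PathBuf,
    /// Opt-in: create stubs for chapters listed in SUMMARY.md but missing
    /// on disk. Off by default so a typo is reported, not silently "fixed".
    pub create_missing: bool,
}

impl MDBook {
    pub fn new<P: Into<PathBuf>>(src: P) -> Self {
        MDBook { src: src.into(), create_missing: false }
    }

    /// Builder-style setter, e.g. fed from a CLI flag.
    pub fn with_create_missing(mut self, create_missing: bool) -> Self {
        self.create_missing = create_missing;
        self
    }

    /// Checks each `(name, path)` pair from the summary. Missing files are
    /// stubbed out when the flag is set and reported as errors otherwise.
    /// Returns the paths that were created.
    pub fn create_missing_chapters(&self, chapters: &[(&str, &str)]) -> io::Result<Vec<PathBuf>> {
        let mut created = Vec::new();
        for &(name, rel_path) in chapters {
            let path = self.src.join(rel_path);
            if path.exists() {
                continue;
            }
            if !self.create_missing {
                return Err(io::Error::new(
                    io::ErrorKind::NotFound,
                    format!("chapter file not found: {}", path.display()),
                ));
            }
            if let Some(parent) = path.parent() {
                fs::create_dir_all(parent)?;
            }
            let mut f = File::create(&path)?;
            writeln!(f, "# {}", name)?;
            created.push(path);
        }
        Ok(created)
    }
}

fn main() -> io::Result<()> {
    let src = std::env::temp_dir().join("mdbook-create-missing-demo");
    let book = MDBook::new(src).with_create_missing(true);
    let created = book.create_missing_chapters(&[("Chapter 1", "chapter_1.md")])?;
    println!("created: {:?}", created);
    Ok(())
}
```

With the flag off, a mistyped path in SUMMARY.md surfaces as a `NotFound` error instead of a new empty chapter.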

azerupi · 2017-08-25T13:08:44Z

src/book/book.rs

+}
+
+
+#[cfg(test)]


In another project, I have successfully placed tests in their own files. In this case we would have a file src/book/tests.rs containing all the tests. This clearly separates the tests from the code and reduces the size of the files, especially if we have a lot of tests.

True, but it also splits them up from the module they're testing so then you'd have issues where you can't directly test the private methods, wouldn't it?

From what I've seen it's common practice to put unit tests in the same file as the things they're testing instead of moving them out into their own files, so that's why I originally put the tests where they are.

you'd have issues where you can't directly test the private methods

Yeah I don't remember the rules exactly. Sub-modules can access private methods but I am not sure about sibling modules.

From what I've seen it's common practice to put unit tests in the same file as the things they're testing

Certainly, but when the number of lines of tests largely exceed the number of lines of code, I am not necessarily fond of this convention. But this is not a blocker for the PR, I don't mind if we merge it like this and see if we can improve the situation in the future if it becomes unwieldy.

we can always decide to split the tests along the lines of public interface / implementation details. Impl detail tests could live next to the code under test and the "smoke tests" of the public interface could be moved into a separate file.
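On the visibility question: Rust makes a private item visible in the module that defines it and in all of that module's descendants. A test file can therefore reach private functions only if it is declared as a child of the module under test. A file like `src/book/tests.rs` declared from `src/book/mod.rs` would be a sibling of `book.rs` and `summary.rs`, so it would only see items marked `pub` or `pub(super)`. Integration tests under the crate's `tests/` directory only see the public API. The sketch below uses inline modules with made-up names, and plain functions instead of `#[test]`, so it runs as a binary.

```rust
mod book {
    // Stands in for src/book/mod.rs, which declares the submodules.

    pub mod summary {
        /// Private: only visible inside `summary` and its descendants.
        fn parse_line(line: &str) -> usize {
            line.len()
        }

        /// `pub(super)`: also visible everywhere inside the parent `book`.
        pub(super) fn count_lines(text: &str) -> usize {
            text.lines().count()
        }

        pub fn total_len(text: &str) -> usize {
            text.lines().map(parse_line).sum()
        }

        // A child module (e.g. declared as `#[cfg(test)] mod tests;`)
        // can call private items through `super::`.
        pub mod child_tests {
            pub fn check() -> bool {
                super::parse_line("abc") == 3
            }
        }
    }

    // A sibling of `summary`, like src/book/tests.rs declared from mod.rs.
    pub mod sibling_tests {
        pub fn check() -> bool {
            // `super::summary::parse_line` would be rejected here (E0603).
            super::summary::count_lines("a\nb") == 2 && super::summary::total_len("ab\nc") == 3
        }
    }
}

fn main() {
    println!("child test ok: {}", book::summary::child_tests::check());
    println!("sibling test ok: {}", book::sibling_tests::check());
}
```

This fits the split suggested above: tests of private details stay in a child module, while smoke tests of the public interface can live anywhere.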

budziq

I had no time to look into summary.rs, but the rest looks really solid 👍

budziq · 2017-08-29T13:45:05Z

src/bin/build.rs

@@ -24,8 +24,8 @@ pub fn execute(args: &ArgMatches) -> Result<()> {
        None => book,
    };

-    if args.is_present("no-create") {
-        book.create_missing = false;
+    if args.is_present("create") {


how about?

book.create_missing = args.is_present("create");

budziq · 2017-08-29T13:46:00Z

Cargo.toml

@@ -43,6 +43,10 @@ ws = { version = "0.7", optional = true}
 [build-dependencies]
 error-chain = "0.10"

+[dev-dependencies]
+pretty_assertions = "*"


I would advise against wildcard dependencies.

budziq · 2017-08-29T14:09:40Z

src/book/book.rs

+    /// The chapter's contents.
+    pub content: String,
+    /// The chapter's section number, if it has one.
+    pub number: Option<SectionNumber>,



budziq · 2017-08-29T14:22:27Z

src/book/book.rs

+}
+
+
+#[cfg(test)]



budziq · 2017-08-29T14:25:37Z

src/book/mod.rs

                writeln!(f, "# Summary")?;
-                writeln!(f, "")?;
+                writeln!(f)?;


how about writing it all in one go with one call?

Also, I would suggest storing these strings in one place as constants instead of hardcoding the values throughout the code. This will make them easier to change and inspect later on.

I like this idea. I've pulled them out into a const string at the very top so we can change them easily later on.
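A minimal sketch of both suggestions: the stub text lives in one constant and is written with a single `write_all` call. The constant name `SUMMARY_STUB`, the helper `write_summary_stub`, and the exact stub contents are assumptions for illustration.

```rust
use std::io::{self, Write};

/// Hypothetical default SUMMARY.md written by `mdbook init`. Keeping it in
/// one constant makes it easy to inspect and change.
const SUMMARY_STUB: &str = "# Summary\n\n- [Chapter 1](./chapter_1.md)\n";

/// Writes the whole stub in one call instead of several `writeln!`s.
fn write_summary_stub<W: Write>(mut out: W) -> io::Result<()> {
    out.write_all(SUMMARY_STUB.as_bytes())
}

fn main() -> io::Result<()> {
    // Any `Write` implementor works: a `File` in practice, stdout here.
    write_summary_stub(io::stdout())
}
```

Taking a generic `Write` also lets tests write into a `Vec<u8>` and compare against the constant.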

budziq · 2017-08-29T14:38:17Z

src/book/mod.rs

-                    writeln!(f, "# {}", ch.name)?;
-                }
+                let mut f = File::create(&ch_1)?;
+                writeln!(f, "# Chapter 1")?;


init is more than a little bloated. Maybe we should refactor some of its code into separate methods, e.g. create_missing_*?

Good idea, the init method looks a lot cleaner now.

budziq · 2017-08-30T07:32:22Z

src/book/summary.rs

+    }
+
+    fn step_suffix(&mut self, event: Event<'a>) -> Result<()> {
+        // FIXME: This has been copy/pasted from step_prefix. make DRY.


How about passing the state (State::PrefixChapters or State::SuffixChapters) to step_prefix, renaming it to something like step_affix, and removing the copy-pasted code?
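A rough sketch of that refactor, assuming heavily simplified stand-ins: the real parser consumes `pulldown_cmark` events and has more states, so `Event`, `SummaryItem`, `SummaryParser`, and the transition rules here are hypothetical. The point is only that the state is passed in and selects which list to fill, so the prefix and suffix logic is written once.

```rust
/// Hypothetical, simplified stand-ins for the pulldown_cmark events the
/// real parser consumes.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Link { name: String, location: String },
    /// A horizontal rule, i.e. a separator.
    Rule,
    /// Start of a list: in the prefix this is where numbered chapters begin.
    StartList,
    /// Anything the affix states skip over.
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    PrefixChapters,
    NumberedChapters,
    SuffixChapters,
}

#[derive(Debug, Clone, PartialEq)]
pub enum SummaryItem {
    Link { name: String, location: String },
    Separator,
}

#[derive(Debug, Default)]
pub struct SummaryParser {
    pub prefix: Vec<SummaryItem>,
    pub suffix: Vec<SummaryItem>,
}

impl SummaryParser {
    /// Shared step for the prefix and suffix states. `state` selects the
    /// list to fill; the returned value is the parser's next state.
    pub fn step_affix(&mut self, state: State, event: Event) -> Result<State, String> {
        let items = match state {
            State::PrefixChapters => &mut self.prefix,
            State::SuffixChapters => &mut self.suffix,
            other => return Err(format!("step_affix called in state {:?}", other)),
        };

        match event {
            Event::Link { name, location } => items.push(SummaryItem::Link { name, location }),
            Event::Rule => items.push(SummaryItem::Separator),
            // Assumption: a list in the prefix starts the numbered chapters,
            // while a list in the suffix is treated as an error.
            Event::StartList if state == State::PrefixChapters => return Ok(State::NumberedChapters),
            Event::StartList => return Err("unexpected list in suffix chapters".to_string()),
            Event::Other => {}
        }
        Ok(state)
    }
}

fn main() {
    let link = |n: &str| Event::Link { name: n.to_string(), location: format!("{}.md", n) };
    let mut parser = SummaryParser::default();

    let mut state = State::PrefixChapters;
    for event in vec![link("intro"), Event::Rule, Event::StartList] {
        state = parser.step_affix(state, event).unwrap();
    }
    println!("after prefix: {:?}, items: {:?}", state, parser.prefix);

    // Suffix chapters go through exactly the same code path.
    let state = parser.step_affix(State::SuffixChapters, link("contributors")).unwrap();
    println!("after suffix: {:?}, items: {:?}", state, parser.suffix);
}
```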

budziq

I've read through summary.rs. Apart from minor nits, the code looks OK, but running it with book-example yields a visual regression in the summary sidebar presentation.

Additional styling was added to SUMMARY.md to check if backticks are still respected

- [Rust `Library`](lib/lib.md)

(Screenshots of the sidebar: git master vs. this PR)

budziq · 2017-08-30T07:35:52Z

src/book/summary.rs

+    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
+        let dotted_number: String = self.0
+            .iter()
+            .map(|i| format!("{}", i))


how about using to_string

budziq · 2017-08-30T07:37:06Z

src/book/summary.rs

+
+        for (input, should_be) in inputs {
+            let section_number = SectionNumber(input);
+            let string_repr = format!("{}", section_number);


to_string should do the trick
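Applying both `to_string` suggestions, a `Display` impl and its test could look roughly like this. It assumes `SectionNumber` wraps a `Vec<u32>` (as `self.0.iter()` in the hunk suggests) and already emits the trailing dot that the sidebar fix further down settles on.

```rust
use std::fmt::{self, Display, Formatter};

/// A section number such as 1.2.3, stored as its components.
#[derive(Debug, Clone, PartialEq)]
pub struct SectionNumber(pub Vec<u32>);

impl Display for SectionNumber {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        // `to_string` replaces `format!("{}", i)` for each component.
        let dotted: Vec<String> = self.0.iter().map(|i| i.to_string()).collect();
        // Trailing dot included, e.g. "1.2.3.".
        write!(f, "{}.", dotted.join("."))
    }
}

fn main() {
    let cases = vec![(vec![1], "1."), (vec![1, 2, 3], "1.2.3.")];
    for (input, should_be) in cases {
        // `to_string` comes for free from the `Display` impl.
        let string_repr = SectionNumber(input).to_string();
        assert_eq!(string_repr, should_be);
    }
    println!("all section numbers formatted as expected");
}
```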

Michael-F-Bryan · 2017-08-31T10:04:18Z

@budziq I incorporated the latest round of feedback and was wondering why you hadn't replied yet... Turns out I forgot to push my changes to GitHub 😆

I noticed that styling issue when testing on my machine too. Do you know what I might have messed up in the rendering stage for the CSS to not be applied properly? If we can figure out the exact class/text which is missing I'll be able to add it as part of our suite of integration tests.

EDIT: I figured out what happened. In the sidebar the ul tag which contains the links is opened and immediately closed, and then it goes through and adds all the li elements.

<div id="sidebar" class="sidebar">
  <ul class="chapter"></ul>
  <li><a href="README.html" class="active"><strong>1</strong> mdBook</a></li>
    ...
  <li><a href="misc/contributors.html">Contributors</a></li>
</div>

Michael-F-Bryan · 2017-09-02T10:52:01Z

I figured out the cause of the rendering bug. I formatted section numbers as `1.2.3`, while the `mdbook::renderer::html_handlebars::helpers::toc::RenderToc` helper assumes they'll be formatted as `1.2.3.` (with a trailing dot) and counts the number of dots to figure out the nesting level. The missing trailing dot causes an off-by-one error, and the TOC's `ul` tag gets closed too early.

Once I adjusted SectionNumber's Display impl to be the same as what RenderToc assumes, everything renders as normal.
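A toy model of the failure, assuming a renderer that (as described above for RenderToc) opens the outer `ul` up front and derives the nesting level from the number of dots. This is an illustrative re-implementation, not the helper's actual code; `level_from_dots` and `render_toc` are made-up names.

```rust
/// Nesting level inferred by counting dots, e.g. "1.2." -> 2.
fn level_from_dots(section: &str) -> usize {
    section.matches('.').count()
}

/// Toy TOC renderer: the outer list is opened up front (level 1), and
/// `<ul>`/`</ul>` tags are emitted whenever the dot-derived level changes.
fn render_toc(sections: &[&str]) -> String {
    let mut html = String::from("<ul class=\"chapter\">");
    let mut current = 1;
    for s in sections {
        let level = level_from_dots(s);
        while current < level {
            html.push_str("<ul>");
            current += 1;
        }
        while current > level {
            html.push_str("</ul>");
            current -= 1;
        }
        html.push_str(&format!("<li>{}</li>", s));
    }
    while current > 0 {
        html.push_str("</ul>");
        current -= 1;
    }
    html
}

fn main() {
    // Trailing dots: lists nest correctly.
    println!("{}", render_toc(&["1.", "1.1.", "2."]));
    // No trailing dots: top-level entries look like level 0, so the outer
    // list is closed before the first <li>.
    println!("{}", render_toc(&["1", "1.1", "2"]));
}
```

The second call starts with `<ul class="chapter"></ul><li>1</li>`, the same shape as the broken sidebar HTML quoted above.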

@budziq and @azerupi now that I've pretty much finished incorporating the new Book representation in, what's next? I can't think of much else that needs to be done other than some manual testing to make sure there are no regressions.

Also, to help pick up issues like this in the future, I'll see if I can find a Rust library along the lines of Python's beautifulsoup (a really ergonomic library for inspecting an HTML document) and make a PR to incorporate it into our integration tests.

budziq · 2017-09-24T22:43:12Z

@Michael-F-Bryan Darn, I've completely missed this thread sorry! I'll try to look into this PR more next week when I'm away on RustFest.

I'll see if I can find a rust library along the lines of Python's beautifulsoup

That would be the select crate :)

Michael-F-Bryan · 2017-09-25T15:36:43Z

Darn, I've completely missed this thread sorry!

I've installed this PR on my own computer and started using it on projects to try it out. So far I haven't really found any issues apart from 7dbeebc where mdbook build would always overwrite your stuff with the stub mdbook init files.

It looks like the bulk of this PR is done, now we just need to review and refactor if necessary.

From the [pull request comment][pr], here's a rough summary of what was done in the squashed commits.

---

# Summary Parser

- Added a private submodule called `mdbook::loader::summary` which contains all the code for parsing `SUMMARY.md`
- A `Summary` contains a title (optional), then some prefix, numbered, and suffix chapters (technically `Vec<SummaryItem>`)
- A `SummaryItem` is either a `Link` (i.e. a link to a chapter) or a separator
- A `Link` contains the chapter name, its location relative to the book's `src/` directory, and a list of nested `SummaryItem`s
- The `SummaryParser` (a state machine-based parser) uses `pulldown_cmark` to turn the `SUMMARY.md` string into a stream of `Event`s, then iterates over those events, changing its behaviour depending on the current state
- The states are `Start`, `PrefixChapters`, `NestedChapters(u32)` (the `u32` represents your nesting level, because lists can contain lists), `SuffixChapters`, and `End`
- Each state will read the appropriate link and build up the `Summary`, skipping any events which aren't a link, horizontal rule (separator), or a list

# Loader

- Created a basic loader which can be used to load the `SUMMARY.md` in a directory.

# Tests

- Added a couple of unit tests for each state in the parser's state machine
- Added integration tests for parsing a dummy SUMMARY.md, then asserting the result is exactly what we expected

[pr]: https://github.com/azerupi/mdBook/pull/371#issuecomment-312636102
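A sketch of the summary data types as listed in that commit message. Field names such as `prefix_chapters` and `nested_items`, and the `link_count` helper, are assumptions; only the overall shape comes from the description.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Summary {
    pub title: Option<String>,
    pub prefix_chapters: Vec<SummaryItem>,
    pub numbered_chapters: Vec<SummaryItem>,
    pub suffix_chapters: Vec<SummaryItem>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum SummaryItem {
    Link(Link),
    Separator,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Link {
    pub name: String,
    /// Relative to the book's `src/` directory.
    pub location: PathBuf,
    pub nested_items: Vec<SummaryItem>,
}

impl Summary {
    /// Counts every link, including nested ones, across all three sections.
    pub fn link_count(&self) -> usize {
        fn count(items: &[SummaryItem]) -> usize {
            items
                .iter()
                .map(|item| match item {
                    SummaryItem::Link(link) => 1 + count(&link.nested_items),
                    SummaryItem::Separator => 0,
                })
                .sum()
        }
        count(&self.prefix_chapters) + count(&self.numbered_chapters) + count(&self.suffix_chapters)
    }
}

fn main() {
    let link = |name: &str, path: &str, nested: Vec<SummaryItem>| {
        SummaryItem::Link(Link { name: name.into(), location: PathBuf::from(path), nested_items: nested })
    };
    let summary = Summary {
        title: Some("Summary".to_string()),
        prefix_chapters: vec![link("Introduction", "README.md", vec![])],
        numbered_chapters: vec![
            link("mdBook", "mdbook.md", vec![link("Installation", "cli/install.md", vec![])]),
            SummaryItem::Separator,
        ],
        suffix_chapters: vec![link("Contributors", "misc/contributors.md", vec![])],
    };
    println!("{} links", summary.link_count());
}
```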

This is a squashed commit. It roughly encompasses the following changes.

---

# Book

- Created another private submodule, `mdbook::loader::book`
- This submodule contains the data types representing a Book
- For now the Book just contains a list of `BookItem`s (either chapters or separators)
- A Chapter contains its name, contents (as one long string), an optional section number (only numbered chapters have numbers, obviously), and any nested chapters
- There's a function for loading a single Chapter from disk using its associated Link entry from the SUMMARY.md
- Another function builds up the Book by recursively visiting all Links and separators in the Summary and joining them into a single `Vec<SummaryItem>`. This is the only non-dumb-data-type item which is actually exported from the book module

# Loader

- Made the loader use the `book::load_book_from_disk` function for loading a book in the loader's directory.

# Tests

- Made sure you can load from disk by writing some files to a temporary directory
- Made sure the Loader can load the entire book-example from disk and doesn't crash or hit an error
- Increased test coverage from 34.4% to 47.7% (as reported by cargo kcov)
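A minimal sketch of the recursive loading step, assuming cut-down types (section numbers are omitted for brevity) and a pluggable `load` function in place of direct disk access, which keeps the book constructible entirely in memory. `load_chapter`, `load_book`, and the `HashMap`-backed loader are hypothetical; on disk, something like `std::fs::read_to_string` (with its error mapped to a `String`) could sit behind `load`. Nested separators are skipped, in line with the earlier agreement that separators belong at the root level.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, PartialEq)]
pub enum SummaryItem {
    Link { name: String, location: PathBuf, nested: Vec<SummaryItem> },
    Separator,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Chapter {
    pub name: String,
    pub content: String,
    pub sub_items: Vec<Chapter>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum BookItem {
    Chapter(Chapter),
    Separator,
}

/// Loads one chapter and, recursively, its nested chapters.
fn load_chapter<F>(name: &str, location: &Path, nested: &[SummaryItem], load: &F) -> Result<Chapter, String>
where
    F: Fn(&Path) -> Result<String, String>,
{
    let content = load(location)?;
    let mut sub_items = Vec::new();
    for item in nested {
        // Separators between sub-chapters are ignored.
        if let SummaryItem::Link { name, location, nested } = item {
            sub_items.push(load_chapter(name, location, nested, load)?);
        }
    }
    Ok(Chapter { name: name.to_string(), content, sub_items })
}

/// Visits every top-level link and separator, producing the book's items.
pub fn load_book<F>(items: &[SummaryItem], load: &F) -> Result<Vec<BookItem>, String>
where
    F: Fn(&Path) -> Result<String, String>,
{
    items
        .iter()
        .map(|item| match item {
            SummaryItem::Link { name, location, nested } => {
                load_chapter(name, location, nested, load).map(BookItem::Chapter)
            }
            SummaryItem::Separator => Ok(BookItem::Separator),
        })
        .collect()
}

fn main() {
    // An in-memory "file system", so nothing touches the disk.
    let mut files = HashMap::new();
    files.insert(PathBuf::from("intro.md"), "# Intro".to_string());
    files.insert(PathBuf::from("intro/details.md"), "# Details".to_string());
    let load = |path: &Path| files.get(path).cloned().ok_or_else(|| format!("missing: {}", path.display()));

    let summary = vec![
        SummaryItem::Link {
            name: "Intro".into(),
            location: "intro.md".into(),
            nested: vec![SummaryItem::Link {
                name: "Details".into(),
                location: "intro/details.md".into(),
                nested: vec![],
            }],
        },
        SummaryItem::Separator,
    ];

    println!("{:#?}", load_book(&summary, &load));
}
```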

Michael-F-Bryan · 2017-09-30T09:21:16Z

@budziq and @azerupi, I've got this installed locally and it seems to work fine. What are your thoughts on starting the merge process?

There are still one or two unanswered questions (e.g. breaking book sections out into preface, numbered_chapters, and appendices), but there's no reason why they can't be fixed in follow-up issues.

Michael-F-Bryan · 2017-11-18T14:54:52Z

Due to excessive bitrot I've cherry-picked a lot of the changes from this PR and will be closing this in favour of the new version (#491).

Michael-F-Bryan mentioned this pull request Aug 21, 2017

Book representation #371

Closed

azerupi reviewed Aug 25, 2017


budziq reviewed Aug 29, 2017


budziq reviewed Aug 30, 2017


budziq suggested changes Aug 30, 2017


Michael-F-Bryan mentioned this pull request Sep 3, 2017

Regression tests #422

Merged

Michael-F-Bryan added 20 commits September 30, 2017 15:52

Added a depth-first chapter iterator

b925c7c

Started integrating some of the feedback from budziq

4202ead

Started cleaning up some of the lints and warnings

02d1d93

Deleted previous bookitem.md and moved loader contents across

ce6dbd6

Moved most of MDBook over to using the new Book format

7ca198a

Removed old bookitem.md, now everyone uses the correct BookItem

c4da845

Unit tests now all pass

d39352a

Everything compiles and all the tests pass.

3a51e4a

Reverted some churn (cheers cargo-edit) and added myself to contributors

7821835

Removed an unused function

1826fbd

Fixed up Cargo.toml

b530b67

Added a flag to create missing files

af8a548

Made non-existent file creation opt-in instead of opt-out

98a8ce9

Incorporated Budziq's feedback

f8cec4c

mdbook init will stub out chapters if SUMMARY already exists

e89e6a0

Reduced some of the code duplication

227f406

Fixed the rendering bug

1bd26fb

Fixed mdbook build overwriting your book with the mdbook init stuff

2bdca9e

Michael-F-Bryan force-pushed the book-representation-2 branch from 7dbeebc to 2bdca9e on September 30, 2017 08:10

Updated the hrefs in the integration tests

e9370c9

Michael-F-Bryan mentioned this pull request Oct 1, 2017

Spaghetti Time! #458

Open

6 tasks

This was referenced Nov 11, 2017

Allow for non-link summary items #483

Open

Making configuration more flexible #457

Merged

Michael-F-Bryan added this to the 0.1.0 milestone Nov 12, 2017

This was referenced Nov 18, 2017

Mdbook should translate internal references #408

Closed

Book representation - Attempt 3 #491

Merged

Michael-F-Bryan closed this Nov 18, 2017

Michael-F-Bryan deleted the book-representation-2 branch December 22, 2017 17:50
