Skip to content

Book representation - Attempt #2 #409

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

Michael-F-Bryan
Copy link
Contributor

This is a cut down version of #371.

It breaks up the process of parsing the SUMMARY.md and loading the book from disk into two discrete stages. This also removes the dependency on a file system so the book can be constructed entirely in memory (chapter content is attached to the Chapter as one big string).

Copy link
Contributor

@azerupi azerupi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went trough the PR a first time and left a couple of comments :)
One of the comments is hidden by GitHub because it was attached to old code that has been deleted.

Cargo.toml Outdated
"book-example/*",
"src/theme/stylus",
]
version = "0.0.22-pre"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has everything been re-ordered in the Cargo.toml?
Did you use something like cargo-edit?

Could we roll back to the previous order? It made more sense to me and this adds noise to the diff :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm going to do that. I used cargo-edit to add a dependency and forgot that it rewrites your entire Cargo.toml, removing all comments and shuffling things around.

/// The chapter's contents.
pub content: String,
/// The chapter's section number, if it has one.
pub number: Option<SectionNumber>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I have seen so far, I assume that unnumbered "frontmatter" and "backmatter" chapters are distinguished from the others just by the absence of a number here?

I wonder if it is not better to encode this into the BookItem or the Book directly as was considered in #146

pub struct Book {
    metadata: BookMetadata,

    preface: Vec<Chapter>,
    chapters: Vec<Chapter>,
    appendix: Vec<Chapter>,
}

Doing this has the advantage of not requiring the renderers to check the number constantly and keep state. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to join the old BookItem::Affix and BookItem::Chapter variants into one thing because they were essentially identical apart from the presence of a section number.

Do you think I should go back to making a BookItem something like this?

enum BookItem {
  Affix(Chapter),
  NumberedChapter(SectionNumber, Chapter),
  Separator,
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, you are right that they are almost identical. The issue here is that we introduce yet another check. First we need to match on the BookItem then we need to see if number is Some(_).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of liked the original distinction for

enum BookItem {
  Affix(Chapter),
  NumberedChapter(SectionNumber, Chapter),
  Separator,
}

current solution with Option<SectionNumber> is a little implicit but I cannot find anything wrong with it.

the variant with separate lists is also nice but enforces that the non numbered chapters should be always pre and post the numbered chapters. I would prefer to leave the decision to the users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variant with separate lists is also nice but enforces that the non numbered chapters should be always pre and post the numbered chapters. I would prefer to leave the decision to the users.

That is already a constraint currently. Would you let the user intersperse unnumbered chapters in between numbered chapters? Does numbering restart after that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point, it's probably better to adjust the BookItem type and the Book struct so that the preface, numbered chapters, and appendices are three separate lists.

Hmm... How would you deal with the types then? From what I can see, you want the ability to have only bare chapters (i.e. without chapter numbers) and separators in the appendices and preface, but then only numbered chapters and separators in the numbered chapters section. If we use the BookItem definition which contains Affix, NumberedChapter, and Separator then in either the numbered section or the outsides you're going to be doing a match where one variant is technically unreachable!(). Would we then need two enums?

It almost feels like in an attempt to avoid the Option<SectionNumber> problem we've gone and brought up an even bigger one...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is already a constraint currently.

Yeah after acquainting myself with this part of code I see it now.

Would you let the user intersperse unnumbered chapters in between numbered chapters? Does numbering restart after that?

I would but only if anyone would need such feature. I see no reason to invent artificial use-cases so lets just forget about my previous comment ;)

/// The chapter's section number, if it has one.
pub number: Option<SectionNumber>,
/// Nested items.
pub sub_items: Vec<BookItem>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, chapters can only contain sub-chapters. Separators need to be on the root level so we can change BookItem here to Chapter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I'll change the sub_items thing to be a Vec<Chapter>. Should we be making those sorts of decisions for people though? What if someone wanted to have three levels of chapters (going down to sub-sub-chapters) or if they wanted to separate the first couple sub-chapters in a chapter from the rest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if someone wanted to ...

I am not sure I understand what you are saying. You mean adding separators between sub-chapters? That seems a bit strange, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I thought you were referring more to the fact that you can technically have a book with as many levels as you want, and were asking to constrain a book to having just the top level chapters and sub-chapters.

I agree separators between sub-chapters seems a bit strange, so I'll see if I can make the parser prevent that.

src/book/mod.rs Outdated
// parse SUMMARY.md, and create the missing item related file
self.parse_summary()?;

debug!("[*]: constructing paths for missing files");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this is gone? This is a feature that should definitely not go away. When running init or build and the summary contains files that do not exist, they are created. This allows someone to plan out the whole structure of a book in the SUMMARY.md and get it auto-generated.

I have used this a couple of days ago when I got a bug report and a snippet of a SUMMARY.md, having the files auto-generated instead of having to create them manually was very useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting. I've actually found that feature to be annoying in the past because if you accidentally make a typo in your SUMMARY.md or run mdbook build in the wrong directory it'll silently succeed and generate stubs for the files. When really it should let you know your SUMMARY.md is incorrect so you can manually fix it.

If you're running mdbook init, doesn't that imply that you don't already have a SUMMARY.md file though? So it doesn't really make sense to look for one in that case, when init should just be generating a stub book to get you going... That was my logic anyway. But it's definitely open to discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding a switch to the command line program so you can do mdbook build --create-stubs to tell mdbook to parse the SUMMARY.md then create dummy files for any chapters which don't exist?

I just feel creating the stubs without the user explicitly asking for it is a bit too implicit/magic, which is why I left out the feature when merging the Book stuff in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you accidentally make a typo in your SUMMARY.md or run mdbook build in the wrong directory it'll silently succeed and generate stubs for the files.

Maybe we should remove this behaviour from the build command but leave it in the init.

If you're running mdbook init, doesn't that imply that you don't already have a SUMMARY.md file though?

Not necessarily, I think of init as a way to generate the missing files. In that perspective it makes sense. But as you pointed out, it may not align with everyones intuition.

I am ok with having this as an opt-in with an extra argument. I'm not sure about the name of the argument though --create-chapters, --gen-chapters, ...

@budziq do you have an opinion about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with having this as an opt-in with an extra argument. I'm not sure about the name of the argument though --create-chapters, --gen-chapters, ...
budziq do you have an opinion about this?

@azerupi sorry for lagging with the review (I'll try to make some time on Monday).
In regard to automatic generation of chapters. In terms of an API, I would suggest additional argument for a builder type. And in terms of actual mdbook binary I would go with additional option as you suggested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for lagging with the review

No worries, do not feel forced to review anything if you don't have the time :)

And in terms of actual mdbook binary I would go with additional option as you suggested.

Ok, good! I think everyone is in agreement here, so I propose that we make this opt-in behind a CLI flag both available for build and init

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I have a feeling the MDBook helper struct might need a bit of restructuring at some point (perhaps converting it into a builder?) so auto-generating missing bits would just be a case of setting a flag or calling a method.

For now should I just add a boolean flag to the MDBook and a method for setting that flag, then work it into the build() and init() methods?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sounds good :)

}


#[cfg(test)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another project, I have successfully placed tests in their own files. In this case we would have a file src/book/tests.rs containing all the tests. This clearly separates the tests from the code and reduces the size of the files, especially if we have a lot of tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but it also splits them up from the module they're testing so then you'd have issues where you can't directly test the private methods, wouldn't it?

From what I've seen it's common practice to put unit tests in the same file as the things they're testing instead of moving them out into their own files, so that's why I originally put the tests where they are.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'd have issues where you can't directly test the private methods

Yeah I don't remember the rules exactly. Sub-modules can access private methods but I am not sure about sibling modules.

From what I've seen it's common practice to put unit tests in the same file as the things they're testing

Certainly, but when the number of lines of tests largely exceed the number of lines of code, I am not necessarily fond of this convention. But this is not a blocker for the PR, I don't mind if we merge it like this and see if we can improve the situation in the future if it becomes unwieldy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can always decide to split the tests along the lines of public interface / implementation details. Impl detail tests could live next to the code under test and the "smoke tests" of the public interface could be moved into a separate file.

Copy link
Contributor

@budziq budziq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had no time to time to look into summary.rs but the rest looks really solid 👍

src/bin/build.rs Outdated
@@ -24,8 +24,8 @@ pub fn execute(args: &ArgMatches) -> Result<()> {
None => book,
};

if args.is_present("no-create") {
book.create_missing = false;
if args.is_present("create") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about?

book.create_missing = args.is_present("create");

Cargo.toml Outdated
@@ -43,6 +43,10 @@ ws = { version = "0.7", optional = true}
[build-dependencies]
error-chain = "0.10"

[dev-dependencies]
pretty_assertions = "*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest against wildcard dependencies

/// The chapter's contents.
pub content: String,
/// The chapter's section number, if it has one.
pub number: Option<SectionNumber>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of liked the original distinction for

enum BookItem {
  Affix(Chapter),
  NumberedChapter(SectionNumber, Chapter),
  Separator,
}

current solution with Option<SectionNumber> is a little implicit but I cannot find anything wrong with it.

the variant with separate lists is also nice but enforces that the non numbered chapters should be always pre and post the numbered chapters. I would prefer to leave the decision to the users.

}


#[cfg(test)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can always decide to split the tests along the lines of public interface / implementation details. Impl detail tests could live next to the code under test and the "smoke tests" of the public interface could be moved into a separate file.

src/book/mod.rs Outdated
writeln!(f, "# Summary")?;
writeln!(f, "")?;
writeln!(f)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about writing it all in one go with one call?

Also I would suggest having these strings stored somewhere in one place as constants instead hardcoding the values along the code This will make it easier to change and inspect later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea. I've pulled them out into a const string at the very top so we can change them easily later on.

src/book/mod.rs Outdated
writeln!(f, "# {}", ch.name)?;
}
let mut f = File::create(&ch_1)?;
writeln!(f, "# Chapter 1")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init is more than a little bloated. I'm not sure if we should not refactor some of its code into separate methods ie create_missing_*

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, the init method looks a lot cleaner now.

}

fn step_suffix(&mut self, event: Event<'a>) -> Result<()> {
// FIXME: This has been copy/pasted from step_prefix. make DRY.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about passing state State::PrefixChapters|State::SuffixChapters to step_prefix (renaming it to something like step_affix) and remove the copy pasted code?

Copy link
Contributor

@budziq budziq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've red through summary.rs. Sans minor nits The code looks ok but running it with book-example yelds visual regression in summary sidebar presentation.

Additional styling was added to SUMMARY.md to check if backticks are still respected

- [Rust `Library`](lib/lib.md)

git master

ok

this PR

faulty

fn fmt(&self, f: &mut Formatter) -> fmt::Result {
let dotted_number: String = self.0
.iter()
.map(|i| format!("{}", i))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about using to_string


for (input, should_be) in inputs {
let section_number = SectionNumber(input);
let string_repr = format!("{}", section_number);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_string should do the trick

@Michael-F-Bryan
Copy link
Contributor Author

Michael-F-Bryan commented Aug 31, 2017

@budziq I incorporated the latest round of feedback and was wondering why you hadn't replied yet... Turns out I forgot to push my changes to GitHub 😆

I noticed that styling issue when testing on my machine too. Do you know what I might have messed up in the rendering stage for the CSS to not be applied properly? If we can figure out the exact class/text which is missing I'll be able to add it as part of our suite of integration tests.

EDIT: I figured out what happened. In the sidebar the ul tag which contains the links is opened and immediately closed, and then it goes through and adds all the li elements.

<div id="sidebar" class="sidebar">
  <ul class="chapter"></ul>
  <li><a href="README.html" class="active"><strong>1</strong> mdBook</a></li>
    ...
  <li><a href="misc/contributors.html">Contributors</a></li>
</div>

@Michael-F-Bryan
Copy link
Contributor Author

I figured out the cause of the rendering bug. I wrote a section number as 1.2.3, while the mdbook::renderer::html_handlebars::helpers::toc::RenderToc helper assumes it'll be 1.2.3. and counts the number of dots to figure out the nesting level. The missing trailing dot means you have an off-by-one error and the TOC's ul tag gets closed too early.

Once I adjusted SectionNumber's Display impl to be the same as what RenderToc assumes, everything renders as normal.

@budziq and @azerupi now that I've pretty much finished incorporating the new Book representation in, what's next? I can't think of much else that needs to be done other than some manual testing to make sure there are no regressions.


Also, to help pick up issues like this in the future I'll see if I can find a rust library along the lines of Python's beautifulsoup (a really ergonomic library for inspecting a HTML document) and make a PR to incorporate it into our integration tests.

@Michael-F-Bryan Michael-F-Bryan mentioned this pull request Sep 3, 2017
@budziq
Copy link
Contributor

budziq commented Sep 24, 2017

@Michael-F-Bryan Darn, I've completely missed this thread sorry! I'll try to look into this PR more next week when I'm away on RustFest.

I'll see if I can find a rust library along the lines of Python's beautifulsoup

That would be the select crate :)

@Michael-F-Bryan
Copy link
Contributor Author

Darn, I've completely missed this thread sorry!

I've installed this PR on my own computer and started using it on projects to try it out. So far I haven't really found any issues apart from 7dbeebc where mdbook build would always overwrite your stuff with the stub mdbook init files.

It looks like the bulk of this PR is done, now we just need to review and refactor if necessary.

From the [pull request comment][pr], here's a rough summary of what
was done in the squashed commits.

---

\# Summary Parser

- Added a private submodule called `mdbook::loader::summary` which
contains all the code for parsing `SUMMARY.md`

- A `Summary` contains a title (optional), then some prefix, numbered,
and suffix chapters (technically `Vec<SummaryItem>`)

- A `SummaryItem` is either a `Link` (i.e. link to a chapter), or a
separator

- A `Link` contains the chapter name, its location relative to the
book's `src/` directory, and a list of nested `SummaryItems`

- The `SummaryParser` (a state machine-based parser) uses
`pulldown_cmark` to turn the `SUMMARY.md` string into a stream of
`Events`, it then iterates over those events changing its behaviour
depending on the current state,

  - The states are `Start`, `PrefixChapters`, `NestedChapters(u32)` (the
  `u32` represents your nesting level, because lists can contain lists),
  `SuffixChapters`, and `End`

  - Each state will read the appropriate link and build up the
  `Summary`, skipping any events which aren't a link, horizontal rule
  (separator), or a list

\# Loader

- Created a basic loader which can be used to load the `SUMMARY.md` in
a directory.

\# Tests

- Added a couple unit tests for each state in the parser's state
machine

- Added integration tests for parsing a dummy SUMMARY.md then asserting
the result is exactly what we expected

[pr]: https://github.com/azerupi/mdBook/pull/371#issuecomment-312636102
This is a squashed commit. It roughly encompasses the following changes.

---

\# Book

- Created another private submodule, mdbook::loader::book

- This submodule contains the data types representing a Book

- For now the Book just contains a list of BookItems (either chapters or
separators)

- A Chapter contains its name, contents (as one long string), an
optional section number (only numbered chapters have numbers,
obviously), and any nested chapters

- There's a function for loading a single Chapter from disk using it's
associated Link entry from the SUMMARY.md

- Another function builds up the Book by recursively visiting all Links
and separators in the Summary and joining them into a single
Vec<SummaryItem>. This is the only non-dumb-data-type item which is
actually exported from the book module

\# Loader

- Made the loader use the book::load_book_from_disk function for
loading a book in the loader's directory.

\# Tests

- Made sure you can load from disk by writing some files to a temporary
directory

- Made sure the Loader can load the entire example-book from disk and
doesn't crash or hit an error

- Increased test coverage from 34.4% to 47.7% (as reported by cargo
kcov)
@Michael-F-Bryan
Copy link
Contributor Author

Michael-F-Bryan commented Sep 30, 2017

@budziq and @azerupi, I've got this installed locally and it seems to work fine. What are your thoughts on starting the merge process?

There are still one or two unanswered questions (e.g. breaking book sections out into preface, numbered_chapters, and appendices), but there's no reason why they can't be fixed in follow-up issues.

@Michael-F-Bryan
Copy link
Contributor Author

Due to excessive bitrot I've cherry-picked a lot of the changes from this PR and will be closing this in favour of the new version (#491).

@Michael-F-Bryan Michael-F-Bryan deleted the book-representation-2 branch December 22, 2017 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants