-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Book representation - Attempt #2 #409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Book representation - Attempt #2 #409
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went trough the PR a first time and left a couple of comments :)
One of the comments is hidden by GitHub because it was attached to old code that has been deleted.
Cargo.toml
Outdated
"book-example/*", | ||
"src/theme/stylus", | ||
] | ||
version = "0.0.22-pre" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why has everything been re-ordered in the Cargo.toml?
Did you use something like cargo-edit?
Could we roll back to the previous order? It made more sense to me and this adds noise to the diff :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm going to do that. I used cargo-edit to add a dependency and forgot that it rewrites your entire Cargo.toml
, removing all comments and shuffling things around.
/// The chapter's contents. | ||
pub content: String, | ||
/// The chapter's section number, if it has one. | ||
pub number: Option<SectionNumber>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I have seen so far, I assume that unnumbered "frontmatter" and "backmatter" chapters are distinguished from the others just by the absence of a number here?
I wonder if it is not better to encode this into the BookItem
or the Book
directly as was considered in #146
pub struct Book {
metadata: BookMetadata,
preface: Vec<Chapter>,
chapters: Vec<Chapter>,
appendix: Vec<Chapter>,
}
Doing this has the advantage of not requiring the renderers to check the number constantly and keep state. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to join the old BookItem::Affix
and BookItem::Chapter
variants into one thing because they were essentially identical apart from the presence of a section number.
Do you think I should go back to making a BookItem
something like this?
enum BookItem {
Affix(Chapter),
NumberedChapter(SectionNumber, Chapter),
Separator,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, you are right that they are almost identical. The issue here is that we introduce yet another check. First we need to match on the BookItem
then we need to see if number
is Some(_)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of liked the original distinction for
enum BookItem {
Affix(Chapter),
NumberedChapter(SectionNumber, Chapter),
Separator,
}
current solution with Option<SectionNumber>
is a little implicit but I cannot find anything wrong with it.
the variant with separate lists is also nice but enforces that the non numbered chapters should be always pre and post the numbered chapters. I would prefer to leave the decision to the users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the variant with separate lists is also nice but enforces that the non numbered chapters should be always pre and post the numbered chapters. I would prefer to leave the decision to the users.
That is already a constraint currently. Would you let the user intersperse unnumbered chapters in between numbered chapters? Does numbering restart after that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very good point, it's probably better to adjust the BookItem
type and the Book
struct so that the preface, numbered chapters, and appendices are three separate lists.
Hmm... How would you deal with the types then? From what I can see, you want the ability to have only bare chapters (i.e. without chapter numbers) and separators in the appendices and preface, but then only numbered chapters and separators in the numbered chapters section. If we use the BookItem
definition which contains Affix
, NumberedChapter
, and Separator
then in either the numbered section or the outsides you're going to be doing a match where one variant is technically unreachable!()
. Would we then need two enums?
It almost feels like in an attempt to avoid the Option<SectionNumber>
problem we've gone and brought up an even bigger one...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is already a constraint currently.
Yeah after acquainting myself with this part of code I see it now.
Would you let the user intersperse unnumbered chapters in between numbered chapters? Does numbering restart after that?
I would but only if anyone would need such feature. I see no reason to invent artificial use-cases so lets just forget about my previous comment ;)
/// The chapter's section number, if it has one. | ||
pub number: Option<SectionNumber>, | ||
/// Nested items. | ||
pub sub_items: Vec<BookItem>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, chapters can only contain sub-chapters. Separators need to be on the root level so we can change BookItem
here to Chapter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, I'll change the sub_items
thing to be a Vec<Chapter>
. Should we be making those sorts of decisions for people though? What if someone wanted to have three levels of chapters (going down to sub-sub-chapters) or if they wanted to separate the first couple sub-chapters in a chapter from the rest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if someone wanted to ...
I am not sure I understand what you are saying. You mean adding separators between sub-chapters? That seems a bit strange, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I thought you were referring more to the fact that you can technically have a book with as many levels as you want, and were asking to constrain a book to having just the top level chapters and sub-chapters.
I agree separators between sub-chapters seems a bit strange, so I'll see if I can make the parser prevent that.
src/book/mod.rs
Outdated
// parse SUMMARY.md, and create the missing item related file | ||
self.parse_summary()?; | ||
|
||
debug!("[*]: constructing paths for missing files"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this is gone? This is a feature that should definitely not go away. When running init
or build
and the summary contains files that do not exist, they are created. This allows someone to plan out the whole structure of a book in the SUMMARY.md
and get it auto-generated.
I have used this a couple of days ago when I got a bug report and a snippet of a SUMMARY.md
, having the files auto-generated instead of having to create them manually was very useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting. I've actually found that feature to be annoying in the past because if you accidentally make a typo in your SUMMARY.md
or run mdbook build
in the wrong directory it'll silently succeed and generate stubs for the files. When really it should let you know your SUMMARY.md
is incorrect so you can manually fix it.
If you're running mdbook init
, doesn't that imply that you don't already have a SUMMARY.md
file though? So it doesn't really make sense to look for one in that case, when init
should just be generating a stub book to get you going... That was my logic anyway. But it's definitely open to discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about adding a switch to the command line program so you can do mdbook build --create-stubs
to tell mdbook
to parse the SUMMARY.md
then create dummy files for any chapters which don't exist?
I just feel creating the stubs without the user explicitly asking for it is a bit too implicit/magic, which is why I left out the feature when merging the Book
stuff in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you accidentally make a typo in your SUMMARY.md or run mdbook build in the wrong directory it'll silently succeed and generate stubs for the files.
Maybe we should remove this behaviour from the build
command but leave it in the init
.
If you're running mdbook init, doesn't that imply that you don't already have a SUMMARY.md file though?
Not necessarily, I think of init
as a way to generate the missing files. In that perspective it makes sense. But as you pointed out, it may not align with everyones intuition.
I am ok with having this as an opt-in with an extra argument. I'm not sure about the name of the argument though --create-chapters
, --gen-chapters
, ...
@budziq do you have an opinion about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with having this as an opt-in with an extra argument. I'm not sure about the name of the argument though --create-chapters, --gen-chapters, ...
budziq do you have an opinion about this?
@azerupi sorry for lagging with the review (I'll try to make some time on Monday).
In regard to automatic generation of chapters. In terms of an API, I would suggest additional argument for a builder type. And in terms of actual mdbook binary I would go with additional option as you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for lagging with the review
No worries, do not feel forced to review anything if you don't have the time :)
And in terms of actual mdbook binary I would go with additional option as you suggested.
Ok, good! I think everyone is in agreement here, so I propose that we make this opt-in behind a CLI flag both available for build
and init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I have a feeling the MDBook
helper struct might need a bit of restructuring at some point (perhaps converting it into a builder?) so auto-generating missing bits would just be a case of setting a flag or calling a method.
For now should I just add a boolean flag to the MDBook
and a method for setting that flag, then work it into the build()
and init()
methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sounds good :)
} | ||
|
||
|
||
#[cfg(test)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In another project, I have successfully placed tests in their own files. In this case we would have a file src/book/tests.rs
containing all the tests. This clearly separates the tests from the code and reduces the size of the files, especially if we have a lot of tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but it also splits them up from the module they're testing so then you'd have issues where you can't directly test the private methods, wouldn't it?
From what I've seen it's common practice to put unit tests in the same file as the things they're testing instead of moving them out into their own files, so that's why I originally put the tests where they are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you'd have issues where you can't directly test the private methods
Yeah I don't remember the rules exactly. Sub-modules can access private methods but I am not sure about sibling modules.
From what I've seen it's common practice to put unit tests in the same file as the things they're testing
Certainly, but when the number of lines of tests largely exceed the number of lines of code, I am not necessarily fond of this convention. But this is not a blocker for the PR, I don't mind if we merge it like this and see if we can improve the situation in the future if it becomes unwieldy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can always decide to split the tests along the lines of public interface / implementation details. Impl detail tests could live next to the code under test and the "smoke tests" of the public interface could be moved into a separate file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had no time to time to look into summary.rs but the rest looks really solid 👍
src/bin/build.rs
Outdated
@@ -24,8 +24,8 @@ pub fn execute(args: &ArgMatches) -> Result<()> { | |||
None => book, | |||
}; | |||
|
|||
if args.is_present("no-create") { | |||
book.create_missing = false; | |||
if args.is_present("create") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about?
book.create_missing = args.is_present("create");
Cargo.toml
Outdated
@@ -43,6 +43,10 @@ ws = { version = "0.7", optional = true} | |||
[build-dependencies] | |||
error-chain = "0.10" | |||
|
|||
[dev-dependencies] | |||
pretty_assertions = "*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest against wildcard dependencies
/// The chapter's contents. | ||
pub content: String, | ||
/// The chapter's section number, if it has one. | ||
pub number: Option<SectionNumber>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of liked the original distinction for
enum BookItem {
Affix(Chapter),
NumberedChapter(SectionNumber, Chapter),
Separator,
}
current solution with Option<SectionNumber>
is a little implicit but I cannot find anything wrong with it.
the variant with separate lists is also nice but enforces that the non numbered chapters should be always pre and post the numbered chapters. I would prefer to leave the decision to the users.
} | ||
|
||
|
||
#[cfg(test)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can always decide to split the tests along the lines of public interface / implementation details. Impl detail tests could live next to the code under test and the "smoke tests" of the public interface could be moved into a separate file.
src/book/mod.rs
Outdated
writeln!(f, "# Summary")?; | ||
writeln!(f, "")?; | ||
writeln!(f)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about writing it all in one go with one call?
Also I would suggest having these strings stored somewhere in one place as constants instead hardcoding the values along the code This will make it easier to change and inspect later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. I've pulled them out into a const
string at the very top so we can change them easily later on.
src/book/mod.rs
Outdated
writeln!(f, "# {}", ch.name)?; | ||
} | ||
let mut f = File::create(&ch_1)?; | ||
writeln!(f, "# Chapter 1")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
init
is more than a little bloated. I'm not sure if we should not refactor some of its code into separate methods ie create_missing_*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, the init
method looks a lot cleaner now.
src/book/summary.rs
Outdated
} | ||
|
||
fn step_suffix(&mut self, event: Event<'a>) -> Result<()> { | ||
// FIXME: This has been copy/pasted from step_prefix. make DRY. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about passing state State::PrefixChapters
|State::SuffixChapters
to step_prefix
(renaming it to something like step_affix
) and remove the copy pasted code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/book/summary.rs
Outdated
fn fmt(&self, f: &mut Formatter) -> fmt::Result { | ||
let dotted_number: String = self.0 | ||
.iter() | ||
.map(|i| format!("{}", i)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about using to_string
src/book/summary.rs
Outdated
|
||
for (input, should_be) in inputs { | ||
let section_number = SectionNumber(input); | ||
let string_repr = format!("{}", section_number); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_string
should do the trick
@budziq I incorporated the latest round of feedback and was wondering why you hadn't replied yet... Turns out I forgot to push my changes to GitHub 😆 I noticed that styling issue when testing on my machine too. Do you know what I might have messed up in the rendering stage for the CSS to not be applied properly? If we can figure out the exact class/text which is missing I'll be able to add it as part of our suite of integration tests. EDIT: I figured out what happened. In the sidebar the <div id="sidebar" class="sidebar">
<ul class="chapter"></ul>
<li><a href="README.html" class="active"><strong>1</strong> mdBook</a></li>
...
<li><a href="misc/contributors.html">Contributors</a></li>
</div> |
I figured out the cause of the rendering bug. I wrote a section number as Once I adjusted @budziq and @azerupi now that I've pretty much finished incorporating the new Also, to help pick up issues like this in the future I'll see if I can find a rust library along the lines of Python's beautifulsoup (a really ergonomic library for inspecting a HTML document) and make a PR to incorporate it into our integration tests. |
@Michael-F-Bryan Darn, I've completely missed this thread sorry! I'll try to look into this PR more next week when I'm away on RustFest.
That would be the select crate :) |
I've installed this PR on my own computer and started using it on projects to try it out. So far I haven't really found any issues apart from 7dbeebc where It looks like the bulk of this PR is done, now we just need to review and refactor if necessary. |
From the [pull request comment][pr], here's a rough summary of what was done in the squashed commits. --- \# Summary Parser - Added a private submodule called `mdbook::loader::summary` which contains all the code for parsing `SUMMARY.md` - A `Summary` contains a title (optional), then some prefix, numbered, and suffix chapters (technically `Vec<SummaryItem>`) - A `SummaryItem` is either a `Link` (i.e. link to a chapter), or a separator - A `Link` contains the chapter name, its location relative to the book's `src/` directory, and a list of nested `SummaryItems` - The `SummaryParser` (a state machine-based parser) uses `pulldown_cmark` to turn the `SUMMARY.md` string into a stream of `Events`, it then iterates over those events changing its behaviour depending on the current state, - The states are `Start`, `PrefixChapters`, `NestedChapters(u32)` (the `u32` represents your nesting level, because lists can contain lists), `SuffixChapters`, and `End` - Each state will read the appropriate link and build up the `Summary`, skipping any events which aren't a link, horizontal rule (separator), or a list \# Loader - Created a basic loader which can be used to load the `SUMMARY.md` in a directory. \# Tests - Added a couple unit tests for each state in the parser's state machine - Added integration tests for parsing a dummy SUMMARY.md then asserting the result is exactly what we expected [pr]: https://github.com/azerupi/mdBook/pull/371#issuecomment-312636102
This is a squashed commit. It roughly encompasses the following changes. --- \# Book - Created another private submodule, mdbook::loader::book - This submodule contains the data types representing a Book - For now the Book just contains a list of BookItems (either chapters or separators) - A Chapter contains its name, contents (as one long string), an optional section number (only numbered chapters have numbers, obviously), and any nested chapters - There's a function for loading a single Chapter from disk using it's associated Link entry from the SUMMARY.md - Another function builds up the Book by recursively visiting all Links and separators in the Summary and joining them into a single Vec<SummaryItem>. This is the only non-dumb-data-type item which is actually exported from the book module \# Loader - Made the loader use the book::load_book_from_disk function for loading a book in the loader's directory. \# Tests - Made sure you can load from disk by writing some files to a temporary directory - Made sure the Loader can load the entire example-book from disk and doesn't crash or hit an error - Increased test coverage from 34.4% to 47.7% (as reported by cargo kcov)
7dbeebc
to
2bdca9e
Compare
@budziq and @azerupi, I've got this installed locally and it seems to work fine. What are your thoughts on starting the merge process? There are still one or two unanswered questions (e.g. breaking book sections out into |
Due to excessive bitrot I've cherry-picked a lot of the changes from this PR and will be closing this in favour of the new version (#491). |
This is a cut down version of #371.
It breaks up the process of parsing the
SUMMARY.md
and loading the book from disk into two discrete stages. This also removes the dependency on a file system so the book can be constructed entirely in memory (chapter content is attached to theChapter
as one big string).