Add complete preprocessor example #629

Merged
merged 7 commits into from
Feb 24, 2018

Conversation

@Byron (Member) commented Feb 21, 2018

As I recently implemented my own preprocessor (see termbook), I learned how hard it can be to get started with the current level of documentation, and with the current infrastructure.

In order to achieve what I wanted, I created a crate which allows serializing pulldown-cmark events back to a string representation that is very close to the original (events are lossy to some extent).

I thought it would be a waste just to keep it for myself and thought I should add a paragraph to the preprocessor chapter of the book and add an example to lower the bar for others to add their own preprocessors.

Please let me know what I should change to make it possible to merge.

@Michael-F-Bryan (Contributor) commented

Oh wow, I'm really pleased to see other people using mdbook and the preprocessor for their own projects! 😁

> I learned how hard it can be to get started with the current level of documentation, and with the current infrastructure.

Sorry if it feels difficult to use at the moment. We thought we'd start off simple with the preprocessor/alternate renderer architecture and then expand things as we implement things in real-life and get feedback from the community. As such, we're definitely open to suggestions and helping solve any pain points you may be experiencing!

@Michael-F-Bryan (Contributor) left a comment

I really like the idea behind this example. At less than 100 lines it's simple enough for people to wrap their heads around, yet still quite relevant and something which isn't unreasonable for people to want to do.

I made a couple comments. They're mostly stylistic small things, but overall I'm really happy with this PR 👍

extern crate pulldown_cmark;
extern crate pulldown_cmark_to_cmark;

// This program removes all forms of emphasis from the markdown of the book.
Contributor

Wouldn't this be better suited as a crate-level comment (//! ...)?

Member Author

Great idea! I totally forgot these exist!

let mut res: Option<_> = None;
let mut num_removed_items = 0;
book.for_each_mut(|item: &mut BookItem| {
    if let &Some(Err(_)) = &res {
Contributor

Just seeing this makes me think it'd be nicer to make the Book::sections field public and let people recursively walk the book's contents. The reason I originally wanted to have a for_each_mut() method is to prevent accidental iterator invalidation-like issues (e.g. if I add a nested item to some chapter, should we visit that too?) or forgetting to recurse.

Looking at how painful it is to continue on or return early, this approach may be a bit too restrictive... Thoughts?

Member Author

Indeed! These were my thoughts too. I was also wondering why these fields are hidden; they make interacting with the book more difficult on both sides. Then again, I am a sworn enemy of data hiding in cases where the internal state is fully understood.

My suggestion would be to just make the state of the mdbook public, as there is no 'invalid' state thanks to Rust's type system. There are no special invariants that would have to be maintained by methods, for example.

As a side note: I noticed that it's currently impossible to get rid of the default renderer, which is an inconvenience for termbook play, for example, as it will always generate HTML docs as a side effect.
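
To make the trade-off discussed in this thread concrete, here is a minimal sketch. `Book`, `BookItem` and `Chapter` below are simplified, hypothetical stand-ins rather than mdbook's real types, and the "strip asterisks" step is only a placeholder for real work. The first function shows the workaround a closure-based `for_each_mut` forces when an error should end processing; the second shows the same logic with direct access to the items, where `?` returns early.

```rust
// Hypothetical stand-ins for mdbook's `Book`, `BookItem` and `Chapter`.
struct Chapter {
    name: String,
    content: String,
    sub_items: Vec<BookItem>,
}

enum BookItem {
    Chapter(Chapter),
    Separator,
}

struct Book {
    sections: Vec<BookItem>,
}

impl Book {
    /// Closure-based traversal: depth-first, parents before children.
    fn for_each_mut<F: FnMut(&mut BookItem)>(&mut self, mut func: F) {
        fn walk<F: FnMut(&mut BookItem)>(items: &mut [BookItem], func: &mut F) {
            for item in items.iter_mut() {
                func(&mut *item);
                if let BookItem::Chapter(ch) = item {
                    walk(&mut ch.sub_items, &mut *func);
                }
            }
        }
        walk(&mut self.sections, &mut func);
    }
}

/// The closure returns `()`, so the first error is parked in a captured
/// variable and every later call has to check it before doing any work.
fn strip_via_closure(book: &mut Book) -> Result<usize, String> {
    let mut first_error: Option<String> = None;
    let mut processed = 0;
    book.for_each_mut(|item| {
        if first_error.is_some() {
            return; // the traversal itself cannot be stopped, only skipped
        }
        if let BookItem::Chapter(ch) = item {
            if ch.content.is_empty() {
                first_error = Some(format!("chapter '{}' is empty", ch.name));
            } else {
                ch.content = ch.content.replace('*', "");
                processed += 1;
            }
        }
    });
    match first_error {
        Some(err) => Err(err),
        None => Ok(processed),
    }
}

/// With direct access to the items, `?` ends the walk at the first error.
fn strip_via_walk(items: &mut [BookItem]) -> Result<usize, String> {
    let mut processed = 0;
    for item in items.iter_mut() {
        if let BookItem::Chapter(ch) = item {
            if ch.content.is_empty() {
                return Err(format!("chapter '{}' is empty", ch.name));
            }
            ch.content = ch.content.replace('*', "");
            processed += 1 + strip_via_walk(&mut ch.sub_items)?;
        }
    }
    Ok(processed)
}

fn chapter(name: &str, content: &str, sub_items: Vec<BookItem>) -> BookItem {
    BookItem::Chapter(Chapter { name: name.into(), content: content.into(), sub_items })
}

fn sample() -> Book {
    let empty = chapter("Empty", "", vec![]);
    Book { sections: vec![chapter("Intro", "some *bold* text", vec![empty]), BookItem::Separator] }
}

fn main() {
    let (mut a, mut b) = (sample(), sample());
    println!("{:?}", strip_via_closure(&mut a));
    println!("{:?}", strip_via_walk(&mut b.sections));
}
```

The price of the second form is the one the reviewer names: the caller is now responsible for remembering to recurse into `sub_items`.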

        "md-links-to-html-links"
    }

    fn run(&self, _ctx: &PreprocessorContext, book: &mut Book) -> ::std::result::Result<(), Error> {
Contributor

If you glob-import mdbook's errors (use mdbook::errors::*), you won't need to fully qualify ::std::result::Result<(), Error> here. It's a personal style thing, but I tend to find fully qualified names less readable.

Member Author

Done! Except that I just import Result as I think spelling it out is preferable if the item count is low.
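
For readers who have not seen the pattern, the suggestion relies on the errors module exporting a `Result` alias with the error type already filled in. A minimal sketch follows; the `errors` module here is a hypothetical stand-in for `mdbook::errors`, and its `Error` type is invented for illustration.

```rust
// Hypothetical stand-in for `mdbook::errors`; the real `Error` type differs.
mod errors {
    #[derive(Debug, PartialEq)]
    pub struct Error(pub String);

    /// Alias with the error type fixed, so callers can write `Result<T>`.
    pub type Result<T> = ::std::result::Result<T, Error>;
}

// Import only the names that are used instead of globbing the module.
use self::errors::{Error, Result};

// Without the alias: fn run(..) -> ::std::result::Result<(), Error>
fn run(chapter_count: usize) -> Result<()> {
    if chapter_count == 0 {
        return Err(Error("book has no chapters".to_string()));
    }
    Ok(())
}

fn main() {
    println!("{:?}", run(3));
    println!("{:?}", run(0));
}
```

Importing `Result` by name shadows the prelude's two-parameter `Result` in that file, which is why the short signature works.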

});
cmark(events, &mut buf, None)
});
if let &Some(Ok(_)) = &res {
Contributor

I believe you don't need to borrow res here if you aren't using the thing inside it (the _).

Member Author

Absolutely true! I changed things around to ... something so different this doesn't apply anymore.
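
The reviewer's point can be checked in isolation. In this small sketch the error type is a `String` (an assumption; the PR uses mdbook's `Error`), chosen because it is not `Copy`. A pattern made only of wildcards binds nothing, so matching on `res` by value does not move out of it, and `res` can be matched a second time.

```rust
/// Classifies an optional result without borrowing it in the patterns.
fn describe(res: Option<Result<(), String>>) -> &'static str {
    let mut outcome = "not run";
    // `_` binds nothing, so this match leaves `res` intact...
    if let Some(Ok(_)) = res {
        outcome = "ok";
    }
    // ...which is why `res` can still be used here.
    if let Some(Err(_)) = res {
        outcome = "failed";
    }
    outcome
}

fn main() {
    println!("{}", describe(Some(Ok(()))));
    println!("{}", describe(Some(Err("boom".to_string()))));
    println!("{}", describe(None));
}
```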

}
}

fn do_it(book: OsString) -> ::std::result::Result<(), Error> {
Contributor

Just out of curiosity, what was the reasoning for explicitly taking an OsString here? It's not every day you see OsString being used outside of an FFI/OS context.

We also load books from the file system, so I usually take a &Path or even P: Into<PathBuf> or P: AsRef<Path> like in File::open().

Member Author

Arguments are natively stored as OsStrings, which have arbitrary encoding. Paths also assume no encoding, as they are interpreted by the OS.
Thus I generally don't convert program arguments to String if they should end up in a Path, as this would unnecessarily attempt to decode them as UTF-8, which might fail.
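
Both positions in this thread can be combined, as the following sketch suggests: read the argument with `std::env::args_os` so it is never decoded, and let the function accept `P: AsRef<Path>` as `File::open` does. The `book_toml` helper is hypothetical and exists only for this illustration.

```rust
use std::env;
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical helper: accepts anything path-like, as `File::open` does.
fn book_toml<P: AsRef<Path>>(root: P) -> PathBuf {
    root.as_ref().join("book.toml")
}

fn main() {
    // `args_os` yields `OsString`s and accepts non-UTF-8 arguments, whereas
    // `env::args` panics if an argument is not valid Unicode.
    let root: OsString = env::args_os()
        .nth(1)
        .unwrap_or_else(|| OsString::from("."));

    // `OsString` implements `AsRef<Path>`, so no decoding step is needed.
    println!("{}", book_toml(&root).display());
}
```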

chapter.content = buf;
}
}
});
Contributor

Can we add a newline in here so the code is a little less crowded?

Member Author

The code changed a lot, please re-evaluate. Otherwise, anytime :)!

        return;
    }
    if let BookItem::Chapter(ref mut chapter) = *item {
        eprintln!("{}: processing chapter '{}'", self.name(), chapter.name);
Contributor

I'd probably pull this closure body out into its own function (e.g. remove_emphasis(ch: &mut Chapter)). That way the example is easier for people to read and wrap their heads around when they're trying to understand using mdbook as a library.

Member Author

I agree! I pulled out a part of it. Because of the shared state and the hackery with the result, pulling it out at the highest level would make things even more complicated.
However, I think I found a sweet spot that actually makes it better, and I hope you will feel similarly.

## A complete Example

The magic happens within the `run(...)` method of the `Preprocessor` trait implementation.
Contributor

It's probably a good idea to add a link to the Preprocessor trait definition here.

Member Author

I agree! For now I chose to link to the version directly, without wildcards.

@Byron (Member, Author) commented Feb 23, 2018

> Sorry if it feels difficult to use at the moment.

I didn't mean to emphasize the bad; I am actually really happy with mdbook and the value it provides. All I tried to say is: right now, it's not as easy in that regard, but here I am to make it better. I can imagine a bunch of changes and contributions to make termbook nicer and easier to plug in, and am happy to help.

Besides, please have a look once more - I hope to have addressed all the things that came up during the review.

@Byron (Member, Author) commented Feb 23, 2018

Oh, and by the way: I love that you like this PR!
termbook wouldn't have been possible without mdbook, and I truly love the capabilities it provides to developers. After all, it's really easy to implement whatever you need.

For me a viable next step would be to improve the Preprocessor-Journey by making chapter access easier, and allowing plugin-programs to be used, too.

Ideally, termbook could just be a bunch of programs that are configured in book.toml, and I also think it's totally possible to get there.

@Michael-F-Bryan Michael-F-Bryan merged commit bb043ef into rust-lang:master Feb 24, 2018
@Michael-F-Bryan (Contributor) commented

> Ideally, termbook could just be a bunch of programs that are configured in book.toml, and I also think it's totally possible to get there.

When I first had a look at the termbook repository and skimmed through the source code I got the feeling that it could quite easily be implemented as a couple preprocessors and a backend for the "playback" functionality.

If you want to propose changes to how chapters are accessed, then I'd be more than happy to hear what you propose! I've found shelling out to a user-defined program for rendering a book to be quite a nice way to do things. Your renderer can then pull in the mdbook crate to deserialize a RenderContext from stdin and have full access to the book and configuration file.

In the long term I'd like pre-processors to be somewhat similar, although I'm stumped on a couple things, for instance some preprocessors will only be relevant when combined with a specific backend, and so on.

@Byron (Member, Author) commented Feb 24, 2018

> I've found shelling out to a user-defined program for rendering a book to be quite a nice way to do things. Your renderer can then pull in the mdbook crate to deserialize a RenderContext from stdin and have full access to the book and configuration file.

It's a lovely idea! I thought that preprocessor programs would be the same, except that they output the possibly changed book to stdout, making it mutable.
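
As a rough illustration of that shape, here is a sketch of a filter program that reads its input from stdin and writes the possibly changed result to stdout. Everything about the wire format is an assumption: the thread only says renderers receive a serialized RenderContext on stdin, and no protocol for preprocessors is defined here, so this sketch passes plain chapter text instead. `de_emphasize` is a naive, hypothetical transformation that does not parse Markdown.

```rust
use std::io::{self, Read, Write};

/// Naive stand-in for a real transformation: drops `*` and `_` characters
/// without parsing Markdown, so it also hits literal underscores.
fn de_emphasize(text: &str) -> String {
    text.chars().filter(|c| *c != '*' && *c != '_').collect()
}

/// Reads the whole input, transforms it, and writes the result.
/// Generic over reader and writer so it can be exercised without a process.
fn run<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut text = String::new();
    input.read_to_string(&mut text)?;
    output.write_all(de_emphasize(&text).as_bytes())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    run(stdin.lock(), stdout.lock())
}
```

Keeping `run` generic over `Read` and `Write` means the same logic can be driven from in-memory buffers in tests and from real pipes when invoked as a plugin.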

> If you want to propose changes to how chapters are accessed, then I'd be more than happy to hear what you propose!

To me it seems the lowest-hanging fruit is to return a Result in the for_each_mut loop. That wouldn't even break anybody.
The next step could be to just allow direct access to the chapters (and preprocessors and renderers) - as I said, I was unable to not have the default html renderer after loading a book.

> [...] I'm stumped on a couple things, for instance some preprocessors will only be relevant when combined with a specific backend, and so on.

To me it's alright to expect the user to configure everything correctly in the book.toml, and to expect preprocessors and renderers to not just panic if they don't find what they expect.
I wouldn't add complexity to the implementation by trying to remove all foot-guns, but start thinking about solving issues when they are made public via issues or PRs (and the first victims roll into the ER ;)).

Michael-F-Bryan added a commit that referenced this pull request Apr 7, 2018
Drive-by refactoring because the de-emphasize preprocessor was broken
Ruin0x11 pushed a commit to Ruin0x11/mdBook that referenced this pull request Aug 30, 2020
* First version of preprocessor example, with quicli

It seems it's not worth it right now.

* Remove quicli, just to simplify everything

* Finish de-emphasise example

* Finish preprocessor example in book

* Rename preprocessor type

* Apply changes requested in review

* Update preprocessor docs with latest code

[skip CI]