Skip to content

move percent_encoding to its own crate #347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ travis-ci = { repository = "servo/rust-url" }
appveyor = { repository = "servo/rust-url" }

[workspace]
members = [".", "idna", "url_serde"]
members = [".", "idna", "percent_encoding", "url_serde"]

[[test]]
name = "unit"
Expand All @@ -44,5 +44,6 @@ encoding = {version = "0.2", optional = true}
heapsize = {version = ">=0.1.1, <0.4", optional = true}
idna = { version = "0.1.0", path = "./idna" }
matches = "0.1"
percent_encoding = { version = "1.0.0", path = "./percent_encoding" }
rustc-serialize = {version = "0.3", optional = true}
serde = {version = ">=0.6.1, <0.9", optional = true}
21 changes: 21 additions & 0 deletions percent_encoding/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
[package]
name = "percent_encoding"
version = "1.0.0"
authors = ["The rust-url developers"]
description = "Percent encoding and decoding"
repository = "https://github.com/servo/rust-url/"
license = "MIT/Apache-2.0"

[lib]
doctest = false
test = false

[[test]]
name = "tests"
harness = false

[dev-dependencies]
rustc-test = "0.1"
rustc-serialize = "0.3"

[dependencies]
28 changes: 23 additions & 5 deletions src/percent_encoding.rs → percent_encoding/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

use encoding;
use std::ascii::AsciiExt;
use std::borrow::Cow;
use std::fmt;
Expand Down Expand Up @@ -44,8 +43,8 @@ pub trait EncodeSet: Clone {
/// =======
///
/// ```rust
/// #[macro_use] extern crate url;
/// use url::percent_encoding::{utf8_percent_encode, SIMPLE_ENCODE_SET};
/// #[macro_use] extern crate percent_encoding;
/// use percent_encoding::{utf8_percent_encode, SIMPLE_ENCODE_SET};
/// define_encode_set! {
/// /// This encode set is used in the URL parser for query strings.
/// pub QUERY_ENCODE_SET = [SIMPLE_ENCODE_SET] | {' ', '"', '#', '<', '>'}
Expand All @@ -62,7 +61,7 @@ macro_rules! define_encode_set {
#[allow(non_camel_case_types)]
pub struct $name;

impl $crate::percent_encoding::EncodeSet for $name {
impl $crate::EncodeSet for $name {
#[inline]
fn contains(&self, byte: u8) -> bool {
match byte as char {
Expand Down Expand Up @@ -339,6 +338,25 @@ impl<'a> PercentDecode<'a> {
/// Invalid UTF-8 percent-encoded byte sequences will be replaced � U+FFFD,
/// the replacement character.
pub fn decode_utf8_lossy(self) -> Cow<'a, str> {
encoding::decode_utf8_lossy(self.clone().into())
decode_utf8_lossy(self.clone().into())
}
}

fn decode_utf8_lossy(input: Cow<[u8]>) -> Cow<str> {
match input {
Cow::Borrowed(bytes) => String::from_utf8_lossy(bytes),
Cow::Owned(bytes) => {
let raw_utf8: *const [u8];
match String::from_utf8_lossy(&bytes) {
Cow::Borrowed(utf8) => raw_utf8 = utf8.as_bytes(),
Cow::Owned(s) => return s.into(),
}
// from_utf8_lossy returned a borrow of `bytes` unchanged.
debug_assert!(raw_utf8 == &*bytes as *const [u8]);
// Reuse the existing `Vec` allocation.
unsafe { String::from_utf8_unchecked(bytes) }.into()
}
}
}


47 changes: 46 additions & 1 deletion src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,7 @@ assert_eq!(css_url.as_str(), "http://servo.github.io/rust-url/main.css")
#[cfg(feature="heapsize")] #[macro_use] extern crate heapsize;

pub extern crate idna;
pub extern crate percent_encoding;

use encoding::EncodingOverride;
#[cfg(feature = "heapsize")] use heapsize::HeapSizeOf;
Expand Down Expand Up @@ -131,7 +132,6 @@ mod parser;
mod slicing;

pub mod form_urlencoded;
pub mod percent_encoding;
pub mod quirks;

/// A parsed URL record.
Expand Down Expand Up @@ -1881,3 +1881,48 @@ impl<'a> Drop for UrlQuery<'a> {
self.url.restore_already_parsed_fragment(self.fragment.take())
}
}


/// Define a new struct
/// that implements the [`EncodeSet`](percent_encoding/trait.EncodeSet.html) trait,
/// for use in [`percent_decode()`](percent_encoding/fn.percent_encode.html)
/// and related functions.
///
/// Parameters are characters to include in the set in addition to those of the base set.
/// See [encode sets specification](http://url.spec.whatwg.org/#simple-encode-set).
///
/// Example
/// =======
///
/// ```rust
/// #[macro_use] extern crate url;
/// use url::percent_encoding::{utf8_percent_encode, SIMPLE_ENCODE_SET};
/// define_encode_set! {
/// /// This encode set is used in the URL parser for query strings.
/// pub QUERY_ENCODE_SET = [SIMPLE_ENCODE_SET] | {' ', '"', '#', '<', '>'}
/// }
/// # fn main() {
/// assert_eq!(utf8_percent_encode("foo bar", QUERY_ENCODE_SET).collect::<String>(), "foo%20bar");
/// # }
/// ```
#[macro_export]
macro_rules! define_encode_set {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a way to reexport a macro from a different crate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the only way on current stable Rust is pub use some_crate::*;, which of course also re-exports other things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's #[macro_reexport], which is unstable, and there's a the glob trick, which I'm still not certain is on purpose, but still, it re-exports too many things, so duplicating the macro seems required...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nox you said in IRC you prefer the glob trick. To reiterate, that requires a pub use percent_encoding::* inside the url root. That would suddenly make all the things normally in that module/crate pollute the url namespace. Suddenly there is url::SIMPLE_ENCODE_SET, besides url::percent_encoding::SIMPLE_ENCODE_SET and friends.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just put the macro in a submodule of percent_encoding and glob import that module from both percent_encoding and url?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's possible, but puts another public (useless) module in the percent_encoding crate. Of all the possible options, I find just redefining the macro in url to be the best.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given both crates are in the same repository, IMO it is better to just make a #[doc(hidden)] module and avoid the duplication of the macro, but I'll let @SimonSapin the final say.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When macros 2.0 arrive (already major pieces in nightly!), they can be named explicit, so it could eventually just become in url pub use percent_encoding::define_encode_set;.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I don't think these two crates can really afford to rely on hot new stuff, given how central it is to the ecosystem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can cross that bridge when we get there, but this crate has already made upgrades to 1.15, and other than using the newest the day after, it seems to be fine to upgrade to newer version.

($(#[$attr: meta])* pub $name: ident = [$base_set: expr] | {$($ch: pat),*}) => {
$(#[$attr])*
#[derive(Copy, Clone)]
#[allow(non_camel_case_types)]
pub struct $name;

impl $crate::percent_encoding::EncodeSet for $name {
#[inline]
fn contains(&self, byte: u8) -> bool {
match byte as char {
$(
$ch => true,
)*
_ => $base_set.contains(byte)
}
}
}
}
}