Proposed TcpStream::open and TcpStream::open_timeout #13919

Merged
merged 4 commits into from
May 13, 2014

Conversation

thomaslee
Contributor

Been meaning to try my hand at something like this for a while, and noticed something similar mentioned as part of #13537. The suggestion on the original ticket is to use TcpStream::open(&str) to pass in a host + port string, but seems a little cleaner to pass in host and port separately -- so a signature like TcpStream::open(&str, u16).

Also means we can use std::io::net::addrinfo directly instead of using e.g. liburl to parse the host+port pair from a string.

One outstanding issue in this PR that I'm not entirely sure how to address: in open_timeout, the timeout_ms will apply for every A record we find associated with a hostname -- probably not the intended behavior, but I didn't want to waste my time on elaborate alternatives until the general idea was a-OKed. :)

Anyway, perhaps there are other reasons for us to prefer the original proposed syntax, but thought I'd get some thoughts on this. Maybe there are some solid reasons to prefer using liburl to do this stuff.

@alexcrichton
Member

I've avoided adding this to the standard library because I think that doing it may require a significant amount of code. I think the implementation here is fine, it's just not what I would quite expect. I would expect this to be a pure convenience method which only takes a string argument, which is the address to open up (or bind to). Taking a string and a number for a port I'm not sure is getting us much closer to the goal.

Additionally, the semantics around DNS resolution are a little tricky here, especially in the timeout case. The timeout being supplied is not an overall timeout, but rather only for one connect operation, not the DNS lookup, nor the whole batch of connects (if there are many). It's close to the best that we can do, but I think that filling this out more will require a good deal more code (just guessing, though).

All in all, I've avoided doing this because I think doing it as taking one string argument and handling weird edge cases always seemed like a good chunk of code that probably didn't belong in the standard library itself.

That being said, this is incredibly useful, I hate having to type out SocketAddr and Ipv4Addr all the time! Perhaps for now (until libstd is reorganized a bit), this functionality could go in liburl? Here's some things that I think would need to be touched up one way or another:

  • The timeout semantics aren't documented and may not be obvious to some.
  • The semantics of how connecting to a named host actually works should be documented (each ip is tried in turn).
  • This functionality of using just a string I'd also like to have for both binding a TCP socket and UDP socket.
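
The "each ip is tried in turn" behaviour and its per-attempt timeout could be sketched roughly as follows. This uses today's `std::net` API rather than the 2014 `std::io::net` API under review, and `connect_each` is a hypothetical helper name, not something from this PR.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Try each address in order, giving every attempt its own `per_attempt`
/// timeout. Returns the first stream that connects, or the last error seen.
/// The worst-case total wait is therefore `addrs.len() * per_attempt`,
/// which is exactly the surprise discussed above.
fn connect_each(addrs: &[SocketAddr], per_attempt: Duration) -> io::Result<TcpStream> {
    let mut last_err = io::Error::new(io::ErrorKind::InvalidInput, "no addresses to try");
    for addr in addrs {
        match TcpStream::connect_timeout(addr, per_attempt) {
            Ok(stream) => return Ok(stream),
            Err(e) => last_err = e, // remember the failure, move to the next address
        }
    }
    Err(last_err)
}
```

Note that the DNS lookup that produced `addrs` is not covered by any timeout in this sketch.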

@thomaslee
Contributor Author

> I would expect this to be a pure convenience method which only takes a string argument,
> which is the address to open up (or bind to). Taking a string and a number for a port
> I'm not sure is getting us much closer to the goal.

IMO (and as you later observe) the most annoying part is setting up the SocketAddr jazz. I find the string-only API a little surprising. With the exception of Go, most mainstream languages I've seen tend to separate host+port for this sort of convenience method:

Ruby, Python (the host+port is a tuple here), [Java](http://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#Socket%28java.lang.String, int%29), C#

Not that we need to blindly follow that convention, I guess I just don't see the gap between the convention-friendly TcpStream::open("google.com", 80) and what you're proposing. :)

Happy to move things to liburl if that seems more appropriate -- I don't feel that strongly about it -- but given we're talking about potentially moving it out of libstd to satisfy the proposed interface I thought I might offer up this alternative.

Anyway, let me know if you're still set on the single string arg for open/bind & a move to liburl -- easy enough either way. I assume you don't want to move the entirety of TcpStream into liburl -- imagine something like url::open("google.com:80")?

> Additionally, the semantics around DNS resolution are a little tricky here, especially in the
> timeout case. The timeout being supplied is not an overall timeout, but rather only for one
> connect operation, not the DNS lookup, nor the whole batch of connects (if there are many).
> It's close to the best that we can do, but I think that filling this out more will require a good
> deal more code (just guessing, though).

I totally agree with all this. We could subtract the DNS lookup time from the remaining connect timeout, but then if the first connect fails & we fall back to a second A record we're left with ambiguity wrt what the "correct" behavior should be.
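
One way to picture the "subtract the time already spent" idea is a shared deadline from which every step draws. The sketch below is only one possible policy among the ambiguous ones mentioned above; `remaining` and `per_attempt_budget` are hypothetical helpers written against today's `std::time`, not part of this PR.

```rust
use std::time::{Duration, Instant};

/// Time left before `deadline`, or `None` once it has been reached or passed.
fn remaining(deadline: Instant, now: Instant) -> Option<Duration> {
    deadline.checked_duration_since(now).filter(|d| !d.is_zero())
}

/// One possible policy: split whatever is left evenly across the
/// addresses that have not been tried yet.
fn per_attempt_budget(left: Duration, attempts_left: u32) -> Option<Duration> {
    if attempts_left == 0 {
        None
    } else {
        Some(left / attempts_left)
    }
}
```

A caller would take `Instant::now()` before the DNS lookup, then consult `remaining` before each connect attempt and give up once it returns `None`.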

Options for dealing with this in other languages vary -- Ruby offers no timeout support out of the box & seems to rely on things like select to handle timeouts, Python ignores the DNS aspect entirely & assumes the given timeout is for the connect call.

The more I think about it, the more I'd be just as happy to ditch the open_timeout implementation for now & maybe look at our options there another time.

@alexcrichton
Member

You have convinced me, a string + port doesn't sound that bad at all! Perhaps in that case this could live in libstd for now, seeing how the implementation is small enough. I think it would be cool to have something like url::open one day, but I'm not entirely sure how that would work...

Could you add some tests for opening a stream with hosts that look like "127.0.0.1" or "[::1]", etc? I'm curious what happens when you pass IPs in. Additionally, let's mark the function as experimental for now, I'm not entirely sure if it's in the right location for forever.
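
For a rough idea of what happens when IP literals are passed in, here is how today's `std::net` parsers treat these forms. This is an assumption-laden illustration: the 2014 `from_str` implementations may have behaved differently, and `parse_ip` / `parse_socket_addr` are hypothetical wrappers.

```rust
use std::net::{IpAddr, SocketAddr};

/// Parse a bare IP literal such as "127.0.0.1" or "::1".
fn parse_ip(host: &str) -> Option<IpAddr> {
    host.parse().ok()
}

/// Parse a combined literal such as "127.0.0.1:80" or "[::1]:80".
/// The square brackets belong to this form only, not to a bare IP.
fn parse_socket_addr(s: &str) -> Option<SocketAddr> {
    s.parse().ok()
}
```

With a separate port argument, the bracketed "[::1]" spelling is not a valid host in this sketch; the bare "::1" is.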

I also agree with removing open_timeout for now. If it's necessary, you can manually resolve the host and then use connect_timeout for now.

@thomaslee
Contributor Author

Glad you came around :) Assuming there are no problems with the Travis CI builds, I think this is what TcpStream::open looks like for now.

Any thoughts on what the bind equivalent should be called? Proposals: open, start, bind_addr_port, bind_str ... nothing really grabs me as "obvious" except for bind, and that's already taken :)

@thomaslee
Contributor Author

Guess we could rename the more-difficult-to-use bind to bind_addr and call the "string+int" version bind?

e.g.

let mut listener1 = TcpListener::bind("0.0.0.0", 8080);
// vs.
let mut listener2 = TcpListener::bind_addr(SocketAddr{...})

/// `host` can be a hostname or IP address string.
#[experimental = "this function may eventually move to another module"]
pub fn open(host: &str, port: u16) -> IoResult<TcpStream> {
match get_host_addresses(host) {
Member

This may be better expressed (less indentation) via:

let addresses = try!(get_host_addresses(host));

@alexcrichton
Member

How about listen? I'm a little worried about having two functions with different names that do very similar things, however.

@thomaslee
Contributor Author

No worse than open/connect I guess. Patch incoming shortly, though in writing the test I wonder if maybe listen should live in & return a TcpAcceptor rather than TcpListener? Otherwise you wind up with rubbish like this:

let mut listener = TcpListener::listen("0.0.0.0", 8080);
let mut acceptor = listener.listen(); // listen *again*?
// ...

Instead, maybe something like this?

let mut acceptor = TcpAcceptor::listen("0.0.0.0", 8080); // or TcpAcceptor::bind(...)?
// ...

Though then you can't get at TcpListener::socket_name(), which is a problem if you pass in zero for the port number ... hrm.
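
The port-zero concern can be illustrated with today's `std::net`, where `local_addr()` plays the role that `socket_name()` plays in this discussion. `bind_ephemeral` is a hypothetical helper, and the modern `TcpListener` already listens after `bind`, unlike the listener/acceptor split being debated here.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Port 0 asks the OS to pick any free port.
const ANY_PORT: u16 = 0;

/// Bind to an OS-assigned port and report the address actually in use.
/// Without a way to query it, the caller would never learn the port.
fn bind_ephemeral(host: &str) -> io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind((host, ANY_PORT))?;
    let addr = listener.local_addr()?;
    Ok((listener, addr))
}
```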

@alexcrichton
Member

Hm that is a good point, ::listen followed by .listen is a little silly. I don't think that we should short circuit the listener => acceptor transition just yet, but we could also provide socket_name on acceptors if we really need to.

One route we could go is to add foo and foo_addr methods, but it's a bit unfortunate to have that sort of duplication, and I would like to try to avoid it.

Thinking out loud here for a second, perhaps we could only provide support for foo(&str, u16). The semantics of connect could be that it first attempts to parse as a SocketAddr, falling back to getaddrinfo. The semantics of bind would be it attempts to parse a SocketAddr, returning an error if that fails. We'd have to commit to these apis, but it may be worth thinking about them carefully rather than duplicating the functionality?
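
A minimal sketch of those two semantics, written against today's `std::net` instead of the 2014 API: `addr_for_bind` and `addrs_for_connect` are hypothetical names, and the resolver call via `ToSocketAddrs` stands in for `getaddrinfo`.

```rust
use std::io;
use std::net::{IpAddr, SocketAddr, ToSocketAddrs};

/// `bind` semantics: accept only an IP literal, error otherwise.
fn addr_for_bind(host: &str, port: u16) -> io::Result<SocketAddr> {
    let ip: IpAddr = host.parse().map_err(|_| {
        io::Error::new(io::ErrorKind::InvalidInput, "bind requires an IP address literal")
    })?;
    Ok(SocketAddr::new(ip, port))
}

/// `connect` semantics: try the literal first, fall back to a resolver lookup.
fn addrs_for_connect(host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
    match addr_for_bind(host, port) {
        Ok(addr) => Ok(vec![addr]), // literal parsed, no lookup needed
        Err(_) => Ok((host, port).to_socket_addrs()?.collect()),
    }
}
```

Under this sketch, binding to a hostname such as "localhost" is rejected, while connecting to it triggers a lookup.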

@thomaslee
Contributor Author

Oh to have overloading eh? :) It'd be bind(SocketAddr) + bind(&str, u16) and this conversation would be a whole lot shorter.

I can kind of imagine the struct SocketAddr version of foo coming back one day (e.g. folks making their own DNS calls because maybe they want to control DNS lookup timeouts (!)). Without it, they'd have to pay the overhead of converting from SocketAddr to str and back again inside foo(&str, u16).

All that said, I feel like foo(&str, u16) is a nicer API for most use cases if I had to pick one right now.

Another random idea: TcpListener::from_socket_addr/TcpListener::from_host_port? Same deal for TcpStream. A little wordy, but maybe clearer. Naming this stuff is hard because I'm not sure we have conventions for this sort of thing that feel Rust-y yet.

@thomaslee
Contributor Author

Ugh, then connect_timeout becomes from_socket_addr_timeout ...

@thomaslee
Contributor Author

Huh, I actually don't mind the from_XXX stuff either:

iotest!(fn listen_ip4_localhost() {
    let socket_addr = next_test_ip4();
    let ip_str = socket_addr.ip.to_str();
    let port = socket_addr.port;
    let listener = TcpListener::from_host_port(ip_str, port);
    let mut acceptor = listener.listen();

    spawn(proc() {
        let mut stream = TcpStream::from_host_port("localhost", port);
        stream.write([144]).unwrap();
    });

    let mut stream = acceptor.accept();
    let mut buf = [0];
    stream.read(buf).unwrap();
    assert!(buf[0] == 144);
})

iotest!(fn open_localhost() {
    let addr = next_test_ip4();
    let mut acceptor = TcpListener::from_socket_addr(addr).listen();

    spawn(proc() {
        let mut stream = TcpStream::from_host_port("localhost", addr.port);
        stream.write([64]).unwrap();
    });

    let mut stream = acceptor.accept();
    let mut buf = [0];
    stream.read(buf).unwrap();
    assert!(buf[0] == 64);
})

Imagine that would break tests everywhere, but y'know :)

@alexcrichton
Member

I think that the one downside of the &str, u16 api is that it's tougher to use if you've got a SocketAddr, but I imagine that it's pretty rare to have a SocketAddr. I'm not that worried about the perf because opening a connection to a remote client is already quite expensive.

@thomaslee
Contributor Author

Sure. So to be clear, I should proceed with replacing the ::connect and ::bind APIs using SocketAddr with &str, u16? Likely to be quite a few test breakages, so I'd like to be sure before I embark down that path. :)

@alexcrichton
Member

@thomaslee, I discussed a bit with some other folks, and we're confident this is the best way to go!

As part of this change, can you make sure the commit message outlines the changes made as well? See https://mail.mozilla.org/pipermail/rust-dev/2014-April/009543.html for more info.

@thomaslee
Contributor Author

Sweet, see you on the other side. Shall I squash the commits from earlier in this PR, or do you want the history?

@alexcrichton
Member

Squashing is ok, the comments should have enough history. Thanks again, and let me know if you need any help!

@thomaslee
Contributor Author

Making good progress, but what should I do with connect_timeout? It's going to have the same issues with timeouts+DNS that we discussed earlier.

@alexcrichton
Member

Let's leave it taking SocketAddr for now. It's #[experimental] already, and subject to change.

@thomaslee
Contributor Author

Ack, looks like a rebase got messed up. See if I can address that ...

@thomaslee
Contributor Author

@alexcrichton here you go, let me know how you feel about this. There's similar stuff in udp:: that we should probably look at too, but maybe that's a separate PR.

/// `host` can be a hostname or IP address string. If no error is
/// encountered, then `Ok(stream)` is returned.
pub fn connect(host: &str, port: u16) -> IoResult<TcpStream> {
let addresses = try!(get_host_addresses(host));
Member

In order to avoid the syscall, could this first try from_str into an IpAddr? If that succeeds, then we know we don't need to go call getaddrinfo.

@alexcrichton
Member

This looks fantastic, thanks @thomaslee! I agree that the same treatment needs to happen with the UDP bindings, but if you're running low on time, I can take care of that as well!

@thomaslee
Contributor Author

Oh I'm happy to do it unless you're particularly keen to jump on it. Standard libs + compiler is my best avenue for learning the language a little more thoroughly. :) Thanks for the comments, I'll get that sorted out.

@thomaslee
Contributor Author

@alexcrichton here you go

@thomaslee
Contributor Author

@alexcrichton curious -- looks like src/test/run-pass/tcp-stress.rs is failing on mac-64. Any reason why this would run on mac-64 but not on linux-64? I'm not seeing any failures locally, though I think I can see why it would fail if it did try to build & run that test. I'll see if I can fix that, but I imagine bors will need another nudge.

@alexcrichton
Member

Could you rebase the fix into the original commit as well? It looks like it'll fix the problem, though!

@thomaslee
Contributor Author

@alexcrichton done -- let's see if this one does any better :)

bors added a commit that referenced this pull request May 11, 2014
…en, r=alexcrichton

@alexcrichton
Member

Let's see if it was a fluke.

@alexcrichton
Member

I suspect that getaddrinfo bindings on android may be broken somehow. I'll try to take a look into this today.

@alexcrichton
Member

Aha! It looks like the definition of struct addrinfo is wrong on android. This patch should fix things up:

diff --git a/src/liblibc/lib.rs b/src/liblibc/lib.rs
index e2b647f..446f753 100644
--- a/src/liblibc/lib.rs
+++ b/src/liblibc/lib.rs
@@ -431,8 +431,17 @@ pub mod types {
                     pub ai_socktype: c_int,
                     pub ai_protocol: c_int,
                     pub ai_addrlen: socklen_t,
+
+                    #[cfg(target_os = "linux")]
                     pub ai_addr: *sockaddr,
+                    #[cfg(target_os = "linux")]
+                    pub ai_canonname: *c_char,
+
+                    #[cfg(target_os = "android")]
                     pub ai_canonname: *c_char,
+                    #[cfg(target_os = "android")]
+                    pub ai_addr: *sockaddr,
+
                     pub ai_next: *addrinfo,
                 }
                 pub struct sockaddr_un {
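
To see why the field order in the patch above matters, the following sketch compares two `#[repr(C)]` layouts that differ only in the order of `ai_addr` and `ai_canonname`. The struct names, the `u32` length field, and the `*const u8` pointer stand-ins are simplifications assumed for illustration; they are not the liblibc definitions.

```rust
use std::os::raw::{c_char, c_int};
use std::ptr;

#[repr(C)]
struct AddrinfoLinux {
    ai_flags: c_int,
    ai_family: c_int,
    ai_socktype: c_int,
    ai_protocol: c_int,
    ai_addrlen: u32,             // stand-in for socklen_t
    ai_addr: *const u8,          // stand-in for *sockaddr
    ai_canonname: *const c_char,
    ai_next: *const u8,          // stand-in for *addrinfo
}

#[repr(C)]
struct AddrinfoAndroid {
    ai_flags: c_int,
    ai_family: c_int,
    ai_socktype: c_int,
    ai_protocol: c_int,
    ai_addrlen: u32,
    ai_canonname: *const c_char, // swapped relative to the Linux layout
    ai_addr: *const u8,
    ai_next: *const u8,
}

/// Byte offset of `field` from the start of `base`.
fn offset_of_field<T, F>(base: &T, field: &F) -> usize {
    (field as *const F as usize) - (base as *const T as usize)
}

/// Returns (offset of ai_addr, offset of ai_canonname) for the Linux layout.
fn linux_offsets() -> (usize, usize) {
    let v = AddrinfoLinux {
        ai_flags: 0, ai_family: 0, ai_socktype: 0, ai_protocol: 0, ai_addrlen: 0,
        ai_addr: ptr::null(), ai_canonname: ptr::null(), ai_next: ptr::null(),
    };
    (offset_of_field(&v, &v.ai_addr), offset_of_field(&v, &v.ai_canonname))
}

/// Returns (offset of ai_addr, offset of ai_canonname) for the Android layout.
fn android_offsets() -> (usize, usize) {
    let v = AddrinfoAndroid {
        ai_flags: 0, ai_family: 0, ai_socktype: 0, ai_protocol: 0, ai_addrlen: 0,
        ai_canonname: ptr::null(), ai_addr: ptr::null(), ai_next: ptr::null(),
    };
    (offset_of_field(&v, &v.ai_addr), offset_of_field(&v, &v.ai_canonname))
}
```

Reading a struct filled in by the C library through the wrong layout would hand back the canonical-name pointer where the socket address is expected, which would be consistent with the failures suspected above.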

thomaslee added 4 commits May 12, 2014 21:41
Prior to this commit, TcpStream::connect and TcpListener::bind took a
single SocketAddr argument. This worked well enough, but the API felt a
little too "low level" for most simple use cases.

A great example is connecting to rust-lang.org on port 80. Rust users would
need to:

  1. resolve the IP address of rust-lang.org using
     io::net::addrinfo::get_host_addresses.

  2. check for errors

  3. if all went well, use the returned IP address and the port number
     to construct a SocketAddr

  4. pass this SocketAddr to TcpStream::connect.

I'm modifying the type signature of TcpStream::connect and
TcpListener::bind so that the API is a little easier to use.

TcpStream::connect now accepts two arguments: a string describing the
host/IP of the host we wish to connect to, and a u16 representing the
remote port number.

Similarly, TcpListener::bind has been modified to take two arguments:
a string describing the local interface address (e.g. "0.0.0.0" or
"127.0.0.1") and a u16 port number.

Here's how to port your Rust code to use the new TcpStream::connect API:

  // old ::connect API
  let addr = SocketAddr{ip: Ipv4Addr(127, 0, 0, 1), port: 8080};
  let stream = TcpStream::connect(addr).unwrap();

  // new ::connect API (minimal change)
  let addr = SocketAddr{ip: Ipv4Addr(127, 0, 0, 1), port: 8080};
  let stream = TcpStream::connect(addr.ip.to_str(), addr.port).unwrap();

  // new ::connect API (more compact)
  let stream = TcpStream::connect("127.0.0.1", 8080).unwrap();

  // new ::connect API (hostname)
  let stream = TcpStream::connect("rust-lang.org", 80);

Similarly, for TcpListener::bind:

  // old ::bind API
  let addr = SocketAddr{ip: Ipv4Addr(0, 0, 0, 0), port: 8080};
  let mut acceptor = TcpListener::bind(addr).listen();

  // new ::bind API (minimal change)
  let addr = SocketAddr{ip: Ipv4Addr(0, 0, 0, 0), port: 8080};
  let mut acceptor = TcpListener::bind(addr.ip.to_str(), addr.port).listen();

  // new ::bind API (more compact)
  let mut acceptor = TcpListener::bind("0.0.0.0", 8080).listen();

[breaking-change]
Fall back to get_host_addresses to try a DNS lookup if we can't
parse it as an IP address.
@thomaslee
Contributor Author

@alexcrichton I assume you intend for me to apply that patch to this PR? Rebased from master & done in 218d01e.

bors added a commit that referenced this pull request May 13, 2014
…en, r=alexcrichton

@bors bors closed this May 13, 2014
@bors bors merged commit 218d01e into rust-lang:master May 13, 2014
@thomaslee thomaslee deleted the thomaslee_proposed_tcpstream_open branch May 13, 2014 07:33
@yuriks
Contributor

yuriks commented May 13, 2014

A bit late to the party, but this change seems pretty bad to me. The addr version was removed entirely, and now you have to keep slinging strings around, which has much more cognitive overload with lifetimes than a simple copyable SocketAddr. How often do you really connect to a hardcoded remote endpoint that optimizing the string-directly-in-the-call case is important?

(To be clear, I'm mostly advocating that the version that takes SocketAddr be brought back, probably as open_addr as mentioned in the discussion.)

@alexcrichton
Member

When a function is taking &str then there's not a whole lot of lifetimes involved because nothing is being chained through, any lifetime will satisfy the requirements.
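
The lifetime point can be made concrete with a small sketch in today's Rust. `describe` is a hypothetical stand-in for a `(&str, u16)` style function: it borrows the host only for the duration of the call, so a literal, a borrowed `String`, and a temporary are all accepted.

```rust
/// Borrows `host` only while the call runs; nothing in the return type
/// refers to it, so any lifetime satisfies the signature.
fn describe(host: &str, port: u16) -> String {
    format!("{}:{}", host, port)
}

fn callers() -> Vec<String> {
    let owned = String::from("example.com");
    let from_literal = describe("127.0.0.1", 8080); // &'static str
    let from_owned = describe(&owned, 80); // borrow of a local String
    let from_temp = describe(&format!("{}.local", "printer"), 631); // temporary
    vec![from_literal, from_owned, from_temp]
}
```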

It's not that connecting to a hardcoded endpoint is uncommon; I found it much easier to deal with the string representations of ips rather than the SocketAddr type itself. I believe this case to be important because it does not require duplication in the API, which isn't all that necessary. The open_addr implementation would probably look a lot like open(addr.ip.to_str(), addr.port), which seems less clean to me due to the duplication in the API.

@zargony
Contributor

zargony commented May 16, 2014

I also feel strongly about keeping SocketAddr methods. I gave strings a try for a few days (because I'm used to using SocketAddr), but I still don't like it much. SocketAddr/IpAddr is a perfect data structure for holding an ip address. It represents a distinct address, is comparable, can be shown without extra allocations (fmt::Show) and can be easily parsed from a string (FromStr). I think a system-level language like Rust should provide a socket interface that uses these native data types.

I understand that strings are convenient to use (especially with user interactions) and I like the idea of a method that parses a string, does dns lookups and connects to one address - but imo this method shouldn't hide the real connect method using a native type.

E.g. in a lib I'm currently working on, there's some p2p stuff that keeps list of peers, exchanges them, connects them and so on. After this PR, I changed 9 connect calls. 7 of them ended up with connect(addr.ip.to_str(), addr.port) (which is awful, since it allocates a new string just to write the SocketAddr to it and connect runs a parser that gets a SocketAddr back out of it).
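
The round trip described here can be sketched in today's Rust as follows. `round_trip` is a hypothetical helper showing what a caller holding a `SocketAddr` has to do for a `(&str, u16)` API and what that API then does internally; the method names are the modern `std::net` ones, not the 2014 field accesses.

```rust
use std::net::{IpAddr, SocketAddr};

/// Format the IP into a freshly allocated string, then parse it straight
/// back, ending up with the address the caller already had.
fn round_trip(addr: SocketAddr) -> Option<SocketAddr> {
    let host: String = addr.ip().to_string(); // heap allocation
    let ip: IpAddr = host.parse().ok()?; // parsed back out again
    Some(SocketAddr::new(ip, addr.port()))
}
```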

I would very much welcome it if Rust kept methods using SocketAddr for system-level-like access and provided convenience methods with string parameters without taking away the possibility to use SocketAddr. How about this:

// TcpStream convenience methods (internally calls connect)
fn open(host: &str, port: u16) -> IoResult<TcpStream>
fn open_timeout(host: &str, port: u16, timeout_ms: u64) -> IoResult<TcpStream>

// TcpStream raw methods
fn connect(addr: &SocketAddr) -> IoResult<TcpStream>
fn connect_timeout(addr: &SocketAddr, timeout_ms: u64) -> IoResult<TcpStream>

// TcpListener convenience methods (internally calls bind and listen and loops accept)
fn open(host: &str, port: u16, f: |TcpStream|) -> IoResult<()>

// TcpListener raw methods
fn bind(addr: &SocketAddr) -> IoResult<TcpListener>

@alexcrichton
Member

I agree that this isn't necessarily perfect, but I believe this situation to be the best of both worlds. Duplication in the API is unfortunate because you'll constantly forget whether to call open/connect or bind/open with what you have, and it also just leads to a general explosion of related methods. I generally find that reducing the API is one of the most important concerns.

The other downside you pointed out, allocating strings, I don't consider that large of a problem. When compared with opening a TCP socket or binding a TCP socket, I don't think that the allocation will show up that high in the profiles.

@huonw
Member

huonw commented May 17, 2014

Could we use a trait implemented for SocketAddr and (&str, u16)?

(I have little context for this, so this may be a complete nonsense suggestion.)

@alexcrichton
Member

A trait for overloading is a possible route to take, but it's unclear how much better in terms of documentation that would be (a one-off trait for one method with unknown implementors when you first see it).

I would, however, prefer overloading via a trait to method duplication.

@zargony
Contributor

zargony commented May 17, 2014

trait ToSocketAddr could be implemented by SocketAddr, (&str, u16) and &str, which would allow using connect(addr), connect(("example.com", 80)) as well as connect("example.com:80"). It can be used for bind as well, and possibly others (udp, url library, http client library) too.

Documentation would be at a single place, no method/functionality duplication. The generic parameter of connect or bind could look strange to newcomers, but would be easy to understand if an example states the possible uses.
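
For illustration, today's standard library has a trait of this shape, `std::net::ToSocketAddrs`, implemented for `SocketAddr`, `(&str, u16)` and `&str` among others. The sketch below shows a single generic function accepting all three forms; `first_addr` is a hypothetical helper, and nothing here describes the API as it stood in this PR.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Accept any of the three input forms and return the first address
/// they resolve to. IP literals are parsed without a DNS lookup.
fn first_addr<A: ToSocketAddrs>(target: A) -> io::Result<SocketAddr> {
    target
        .to_socket_addrs()?
        .next()
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no addresses"))
}
```

A `connect` or `bind` written against such a trait needs only one method and one place for documentation, at the cost of a generic parameter in the signature.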

@emberian
Member

@zargony I'm in favor of that.

cc @Seldaek, for the url::open bits above.

@thomaslee
Contributor Author

Happy to make another pass using traits if nobody else is running with it. Does seem like the best of both worlds.

@Seldaek
Contributor

Seldaek commented May 19, 2014

I'd tend to agree that either the APIs should be explicit or they should take in various inputs (if technically feasible with that ToSocketAddr trait). I started working on a universal open method a while back though it didn't get very far due to lack of time, but you can see it at https://gist.github.com/Seldaek/73aefd28ca5ff5655bac - I think having explicit internal libs + a "magic" open() function wrapping them for ease of use with strings is at least a better solution than just supporting string inputs.
