-
Notifications
You must be signed in to change notification settings - Fork 643
Allow multiple keywords in crate search #1543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow multiple keywords in crate search #1543
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @pwoolcoc! It looks like you need to run cargo +stable fmt
. Also, your collecting-hashmap
crate includes edition = "2018"
which is failing on CI in stable. We deploy with stable, so if you're not using any edition features you'll want to remove that for the next 6 weeks.
src/controllers/krate/search.rs
Outdated
query = query.filter( | ||
crates::id.eq_any( | ||
crates_keywords::table | ||
.select(crates_keywords::crate_id) | ||
.inner_join(keywords::table) | ||
.filter(::lower(keywords::keyword).eq(::lower(kw))), | ||
.filter(::lower(keywords::keyword).eq(any(names))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #1461 it sounds like the intention is to have the search return only results that include all keywords. I think that means any
would be replaced with all
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually would have expected searching for multiple keywords to be any
, not all
-- which maybe raises the question of what is the general expectation here, if this should have another name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked the OP and they said they expected it to be the intersect operation, not the union...but for some reason changing any
to all
doesn't "just work" like I was hoping it would so I'm still investigating.
I agree that my intuition was that sending multiple keywords to the search endpoint would be an any
operation, so I'm open to suggestions about where to put this...probably in src/controllers/keyword.rs
? maybe a new endpoint there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm split on this. I think my initial intuition was any
, but that doesn't fit with OP's use case which I think is also reasonable. I guess when I search, I expect results that match all my search terms (although lately Google sometimes seems insistent that I really don't want to include a particular term).
I think its also worth considering how this looks in the URL. Currently, this uses multiple keyword
parameters. Maybe a space separated list under a single keyword
parameter would be slightly better. Or a new keywords
or all_keywords
parameter.
So I guess calling all
doesn't work directly here because of the join, since a crates_keywords
row can't possibly match multiple keywords at the same time. @sgrif how would this be expressed in Diesel if we want only crates that match all keywords? I've seen some SQL examples that would involve joining, grouping by crate, and then HAVING COUNT (DISTINCT crate.keyword_id) = {keyword_count}
. That seems a bit... complicated, but I can't think of anything simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't use a having clause for this in PG personally. I'd do something like
.filter(
crate_keywords::table
.inner_join(keywords::table)
.filter(crate_keywords::crate_id.eq(crates::id))
.select(array_agg(keywords::keyword))
.single_value()
.contains(names)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I know what array_agg
does in postgres, but there doesn't seem to be a way to use it through diesel. Should I just drop down to sql_query
for that? Or is array_agg
in another crate that I just haven't been able to find?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array_agg
is an aggregate function that returns all values it collects as an array. You can declare any SQL function for use with Diesel's query builder by using sql_function!
I've been slowly working on moving this pull request forward by changing the query to return crates that have all the keywords specified, rather than any. So far all I've been successful at accomplishing is merging in master to resolve the merge conflicts and adding a failing test for the ALL functionality. I've been attempting to turn the pseudocode sketch in this comment into a working diesel query, but I have been unsuccessful. I have a runnable reproduction of the issue extracted into a smaller codebase showing the set of diesel queries I've tried along with notes on why they didn't work in this repo. I would drop into raw sql, but I don't want to lose the automatic escaping of the queried keywords to prevent against SQLI. I don't know how to generate just an escaped SQL string and insert it into a raw SQL query. I'm now at a complete loss of how to proceed, and I think this PR is going to need your help to unblock, @sgrif. |
316002d
to
d86c231
Compare
I've rebased and added a working query. I do still have concerns that this isn't necessarily the expected behavior when passing multiple |
☔ The latest upstream changes (presumably #1763) made this pull request unmergeable. Please resolve the merge conflicts. |
What's the status on this effort? |
The backend API requests are working, but there is no way to interact with this via the frontend. I'm going to make a param named |
d86c231
to
77ae1b8
Compare
r? @jtgeibel because both sgrif and i have now contributed code to this PR |
This commit allows multiple `keyword=` parameters in the querystring to enable searching for all crates under the specified keywords Towards rust-lang#1461
Diesel has a bug where array expression methods aren't implemented on nullable expressions, so we can't just call Diesel's `.contains` method until the fix lands. I've added a workaround for the time being. This results in a runtime error since `keyword` was defined as `varchar`, which coerces to `text`, but arrays do not. I've changed the column type to address this. The keywords table is small enough and read infrequently enough that I'm not concerned about the exclusive lock in this migration.
Pass along any specified value for the all_keywords parameter to the API request to search crates. This lets us manually construct URLs in the browser like: - /search?all_keywords=foo+bar - /search?all_keywords=foo+bar&q=test but there is no form field in the search form as yet; eventually, we'll need an "advanced search" form or similar.
77ae1b8
to
9a3485b
Compare
@jtgeibel - Can you review this PR? It doesn't have any conflicts (so far) and looks good to me. |
LGTM! Sorry it took so long for me to review. Amazingly, no merge conflicts so... @bors r+ |
📌 Commit 9a3485b has been approved by |
…tgeibel Allow multiple keywords in crate search This PR allows URLs like crates.io/search?all_keywords=gotham+middleware to enable linking to a search that returns crates that have all keywords specified in the parameter. Towards #1461
☀️ Test successful - checks-travis |
This PR allows URLs like crates.io/search?all_keywords=gotham+middleware to enable linking to a search that returns crates that have all keywords specified in the parameter.
Towards #1461