Skip to content

Allow multiple keywords in crate search #1543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 4, 2020

Conversation

pwoolcoc
Copy link

@pwoolcoc pwoolcoc commented Oct 28, 2018

This PR allows URLs like crates.io/search?all_keywords=gotham+middleware to enable linking to a search that returns crates that have all keywords specified in the parameter.

Towards #1461

Copy link
Member

@jtgeibel jtgeibel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @pwoolcoc! It looks like you need to run cargo +stable fmt. Also, your collecting-hashmap crate includes edition = "2018" which is failing on CI in stable. We deploy with stable, so if you're not using any edition features you'll want to remove that for the next 6 weeks.

query = query.filter(
crates::id.eq_any(
crates_keywords::table
.select(crates_keywords::crate_id)
.inner_join(keywords::table)
.filter(::lower(keywords::keyword).eq(::lower(kw))),
.filter(::lower(keywords::keyword).eq(any(names))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #1461 it sounds like the intention is to have the search return only results that include all keywords. I think that means any would be replaced with all here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually would have expected searching for multiple keywords to be any, not all -- which maybe raises the question of what is the general expectation here, if this should have another name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked the OP and they said they expected it to be the intersect operation, not the union...but for some reason changing any to all doesn't "just work" like I was hoping it would so I'm still investigating.

I agree that my intuition was that sending multiple keywords to the search endpoint would be an any operation, so I'm open to suggestions about where to put this...probably in src/controllers/keyword.rs? maybe a new endpoint there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm split on this. I think my initial intuition was any, but that doesn't fit with OP's use case which I think is also reasonable. I guess when I search, I expect results that match all my search terms (although lately Google sometimes seems insistent that I really don't want to include a particular term).

I think its also worth considering how this looks in the URL. Currently, this uses multiple keyword parameters. Maybe a space separated list under a single keyword parameter would be slightly better. Or a new keywords or all_keywords parameter.

So I guess calling all doesn't work directly here because of the join, since a crates_keywords row can't possibly match multiple keywords at the same time. @sgrif how would this be expressed in Diesel if we want only crates that match all keywords? I've seen some SQL examples that would involve joining, grouping by crate, and then HAVING COUNT (DISTINCT crate.keyword_id) = {keyword_count}. That seems a bit... complicated, but I can't think of anything simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't use a having clause for this in PG personally. I'd do something like

.filter(
    crate_keywords::table
        .inner_join(keywords::table)
        .filter(crate_keywords::crate_id.eq(crates::id))
        .select(array_agg(keywords::keyword))
        .single_value()
        .contains(names)
)

Copy link
Author

@pwoolcoc pwoolcoc Nov 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I know what array_agg does in postgres, but there doesn't seem to be a way to use it through diesel. Should I just drop down to sql_query for that? Or is array_agg in another crate that I just haven't been able to find?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_agg is an aggregate function that returns all values it collects as an array. You can declare any SQL function for use with Diesel's query builder by using sql_function!

@carols10cents
Copy link
Member

I've been slowly working on moving this pull request forward by changing the query to return crates that have all the keywords specified, rather than any. So far all I've been successful at accomplishing is merging in master to resolve the merge conflicts and adding a failing test for the ALL functionality.

I've been attempting to turn the pseudocode sketch in this comment into a working diesel query, but I have been unsuccessful. I have a runnable reproduction of the issue extracted into a smaller codebase showing the set of diesel queries I've tried along with notes on why they didn't work in this repo.

I would drop into raw sql, but I don't want to lose the automatic escaping of the queried keywords to prevent against SQLI. I don't know how to generate just an escaped SQL string and insert it into a raw SQL query.

I'm now at a complete loss of how to proceed, and I think this PR is going to need your help to unblock, @sgrif.

@sgrif sgrif force-pushed the allow-multiple-keywords-in-search branch from 316002d to d86c231 Compare June 14, 2019 18:11
@sgrif
Copy link
Contributor

sgrif commented Jun 14, 2019

I've rebased and added a working query. I do still have concerns that this isn't necessarily the expected behavior when passing multiple keyword params. I agree with @jtgeibel that we may want to introduce a new param for this.

@sgrif sgrif removed their assignment Jun 14, 2019
@bors
Copy link
Contributor

bors commented Jul 16, 2019

☔ The latest upstream changes (presumably #1763) made this pull request unmergeable. Please resolve the merge conflicts.

@locks
Copy link
Contributor

locks commented Sep 4, 2019

What's the status on this effort?

@carols10cents
Copy link
Member

carols10cents commented Oct 21, 2019

The backend API requests are working, but there is no way to interact with this via the frontend. I'm going to make a param named all_keywords so that at least we can manually create links that use this, but eventually I think we're going to need an advanced search form.

@carols10cents
Copy link
Member

r? @jtgeibel because both sgrif and i have now contributed code to this PR

Paul Woolcock and others added 5 commits November 1, 2019 21:12
This commit allows multiple `keyword=` parameters in the querystring to
enable searching for all crates under the specified keywords

Towards rust-lang#1461
Diesel has a bug where array expression methods aren't implemented on
nullable expressions, so we can't just call Diesel's `.contains` method
until the fix lands. I've added a workaround for the time being. This
results in a runtime error since `keyword` was defined as `varchar`,
which coerces to `text`, but arrays do not. I've changed the column type
to address this. The keywords table is small enough and read
infrequently enough that I'm not concerned about the exclusive lock in
this migration.
Pass along any specified value for the all_keywords parameter to the
API request to search crates.

This lets us manually construct URLs in the browser like:

- /search?all_keywords=foo+bar
- /search?all_keywords=foo+bar&q=test

but there is no form field in the search form as yet; eventually, we'll
need an "advanced search" form or similar.
@carols10cents carols10cents force-pushed the allow-multiple-keywords-in-search branch from 77ae1b8 to 9a3485b Compare November 2, 2019 01:12
@kzys
Copy link
Contributor

kzys commented Nov 19, 2019

@jtgeibel - Can you review this PR? It doesn't have any conflicts (so far) and looks good to me.

@jtgeibel
Copy link
Member

jtgeibel commented Jan 4, 2020

LGTM! Sorry it took so long for me to review. Amazingly, no merge conflicts so...

@bors r+

@bors
Copy link
Contributor

bors commented Jan 4, 2020

📌 Commit 9a3485b has been approved by jtgeibel

@bors
Copy link
Contributor

bors commented Jan 4, 2020

⌛ Testing commit 9a3485b with merge 20a589e...

bors added a commit that referenced this pull request Jan 4, 2020
…tgeibel

Allow multiple keywords in crate search

This PR allows URLs like crates.io/search?all_keywords=gotham+middleware to enable linking to a search that returns crates that have all keywords specified in the parameter.

Towards #1461
@bors
Copy link
Contributor

bors commented Jan 4, 2020

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing 20a589e to master...

@bors bors merged commit 9a3485b into rust-lang:master Jan 4, 2020
bors added a commit that referenced this pull request Dec 28, 2021
tests/search: Add test case for query parameter forwarding

This should ensure to not break the functionality that was added in #1543 when working on implementing #491.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants