-
Notifications
You must be signed in to change notification settings - Fork 80
Tantivy 0.19.2 #67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tantivy 0.19.2 #67
Conversation
Add Cargo.lock to the repo Set the git-fetch-with-cli cargo flag so that we can override fetch settings Renaming .cargo/config to .cargo/config.toml Adding github-quiq-sh cargo registry Point dependencies at our github-quiq-sh registry Trying to resolve this build issue, pointing pyo3-build-config at our github-quiq-sh registry SER-21487: Enable support for all standard Tantivy languages plus Chinese + Japanese in tantivy-py SER-21487: Use uname rather than UNAME in the Makefile SER-21487: Fix document date handling SER-23013: Upgrade Tantivy and other dependencies
|
Tantivy 0.19.1 version fixes so many bugs vs the current 0.17. The incremental indexes were increasing index size exponentially till v0.17. Thanks a lot @pdavies011010 for adding support for the newer version. |
|
Thanks @pratyushmittal ! I'm admittedly not a Rust expert so please let me know if there's anything I should do to help get this merged in, the changes should probably looked over by somebody with a little more familiarity with Rust, but all the tests do pass and I have been using this version successfully for my needs in python. One thing that's hopefully okay is I switched the makefile to use maturin for the build, that's the way that has been working best for me to create builds wheels for multiple python3 versions, if not okay I can back that change out. |
src/schemabuilder.rs
Outdated
| let fast = match fast { | ||
| Some(f) => { | ||
| let f = f.to_lowercase(); | ||
| match f.as_ref() { | ||
| "single" => Some(schema::Cardinality::SingleValue), | ||
| "multi" => Some(schema::Cardinality::MultiValues), | ||
| _ => return Err(exceptions::PyValueError::new_err( | ||
| "Invalid index option, valid choices are: 'multivalue' and 'singlevalue'" | ||
| )), | ||
| } | ||
| } | ||
| None => None, | ||
| }; | ||
| opts = opts.set_fast(fast.unwrap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let fast = match fast { | |
| Some(f) => { | |
| let f = f.to_lowercase(); | |
| match f.as_ref() { | |
| "single" => Some(schema::Cardinality::SingleValue), | |
| "multi" => Some(schema::Cardinality::MultiValues), | |
| _ => return Err(exceptions::PyValueError::new_err( | |
| "Invalid index option, valid choices are: 'multivalue' and 'singlevalue'" | |
| )), | |
| } | |
| } | |
| None => None, | |
| }; | |
| opts = opts.set_fast(fast.unwrap()); | |
| let cardinality = match fast { | |
| Some(f) => { | |
| let f = f.to_lowercase(); | |
| match f.as_ref() { | |
| "single" => Some(schema::Cardinality::SingleValue), | |
| "multi" => Some(schema::Cardinality::MultiValues), | |
| _ => return Err(exceptions::PyValueError::new_err( | |
| "Invalid index option, valid choices are: 'multivalue' and 'singlevalue'" | |
| )), | |
| } | |
| } | |
| None => None, | |
| }; | |
| if fast.is_some(){ | |
| opts = opts.set_fast(fast.unwrap()); | |
| } |
I tried these changes out with some sample projects and this seems to fail in case fast is 'None', as
pyo3_runtime.PanicException: called 'Option::unwrap()' on a None value.
Apart from this looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also you can format rust code using
rustfmt --check src/*.rs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Sidhant29 Good finds, I updated to put in the fix you suggested, and applied rustfmt.
|
tantivy |
|
@PSeitz Updated to Tantivy 0.19.2 |
|
@ChillFish8 Can you have a look? |
ChillFish8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good overall, just a few small things to sort.
Cargo.toml
Outdated
| pyo3-build-config = { version = "0.18.0" } | ||
|
|
||
| [dependencies] | ||
| chrono = "0.4.19" | ||
| tantivy = "0.17" | ||
| itertools = "0.10.3" | ||
| futures = "0.3.21" | ||
| serde_json = "1.0.64" | ||
| chrono = { version = "0.4.23" } | ||
| tantivy = { version = "0.19.2" } | ||
| itertools = { version = "0.10.5" } | ||
| futures = { version = "0.3.26" } | ||
| serde_json = { version = "1.0.91" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this change is worth it (using version = xxx) we don't use any of the features so we can just use the standard dep = "x.x.x" format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
rustfmt.toml
Outdated
| @@ -1 +1,2 @@ | |||
| max_width = 80 | |||
| edition = "2018" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed. It'll read the cargo file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it will be read from Cargo.toml if you use cargo fmt, but not if you run rustfmt directly. If we want to standardize around using cargo fmt I can update the Makefile to use it instead of rustfmt, but otherwise I think we want to leave this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think standardising everything to use cargo would be a good idea 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K, updated the Makefile and removed this line
src/document.rs
Outdated
| d.into_utc().year(), | ||
| d.into_utc().month() as u8, | ||
| d.into_utc().day() as u8, | ||
| d.into_utc().hour() as u8, | ||
| d.into_utc().minute() as u8, | ||
| d.into_utc().second() as u8, | ||
| d.into_utc().microsecond() as u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably easier here to have once into_utc() call and then work on that result for all the values rather than calling it every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/document.rs
Outdated
| Value::Bool(b) => format!("{}", b), | ||
| Value::IpAddr(i) => format!("{}", *i), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, clippy now recommends using the new formatting method format!("{b}") for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linted with clippy
src/document.rs
Outdated
| return Python::with_gil(|py| -> PyResult<Vec<PyObject>> { | ||
| self.get_all(py, field_name) | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needless return
| return Python::with_gil(|py| -> PyResult<Vec<PyObject>> { | |
| self.get_all(py, field_name) | |
| }); | |
| Python::with_gil(|py| -> PyResult<Vec<PyObject>> { | |
| self.get_all(py, field_name) | |
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/index.rs
Outdated
| let mut analyzers = HashMap::new(); | ||
| analyzers.insert("ar_stem", Language::Arabic); | ||
| analyzers.insert("da_stem", Language::Danish); | ||
| analyzers.insert("nl_stem", Language::Dutch); | ||
| analyzers.insert("fi_stem", Language::Finnish); | ||
| analyzers.insert("fr_stem", Language::French); | ||
| analyzers.insert("de_stem", Language::German); | ||
| analyzers.insert("el_stem", Language::Greek); | ||
| analyzers.insert("hu_stem", Language::Hungarian); | ||
| analyzers.insert("it_stem", Language::Italian); | ||
| analyzers.insert("no_stem", Language::Norwegian); | ||
| analyzers.insert("pt_stem", Language::Portuguese); | ||
| analyzers.insert("ro_stem", Language::Romanian); | ||
| analyzers.insert("ru_stem", Language::Russian); | ||
| analyzers.insert("es_stem", Language::Spanish); | ||
| analyzers.insert("sv_stem", Language::Swedish); | ||
| analyzers.insert("ta_stem", Language::Tamil); | ||
| analyzers.insert("tr_stem", Language::Turkish); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to be a hashmap here, an array would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
@ChillFish8 Cleaned things up per your suggestions, thanks! |
|
Looks good 👍 Just need to run fmt to fix the CI and then this can be merged |
|
Whoops, the CI config was also set to use rustfmt instead of |
* Tantivy 0.19.2 (#67) * Adding __init__.py file to the tantivy folder to make maturin happy Add Cargo.lock to the repo Set the git-fetch-with-cli cargo flag so that we can override fetch settings Renaming .cargo/config to .cargo/config.toml Adding github-quiq-sh cargo registry Point dependencies at our github-quiq-sh registry Trying to resolve this build issue, pointing pyo3-build-config at our github-quiq-sh registry SER-21487: Enable support for all standard Tantivy languages plus Chinese + Japanese in tantivy-py SER-21487: Use uname rather than UNAME in the Makefile SER-21487: Fix document date handling SER-23013: Upgrade Tantivy and other dependencies * Upgrade to Tantivy 0.19.1 * Apply rustfmt and fix bug when fast option = None * Upgrade to tantivy-0.19.2 * Standardize around using 'cargo fmt' rather than 'rustfmt' * Reverting to old style dependencies * Linting with clippy * Switching out hashmap for defining tokenizers for an array, and adding test for Spanish indexing * Use cargo fmt instead of rustfmt on the Lint ci step * Add python release build * workflow dispatch * simple * add release * fix publish pipeline * update maturin args * test * maturin config * build * maturin * build(deps): bump step-security/harden-runner from 1.4.4 to 2.0.0 Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 1.4.4 to 2.0.0. - [Release notes](https://github.com/step-security/harden-runner/releases) - [Commits](step-security/harden-runner@74b568e...ebacdc2) --- updated-dependencies: - dependency-name: step-security/harden-runner dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> * build(deps): bump actions/checkout Bumps [actions/checkout](https://github.com/actions/checkout) from d171c3b028d844f2bf14e9fdec0c58114451e4bf to 61b9e3751b92087fd0b06925ba6dd6314e06f089. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@d171c3b...61b9e37) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * build(deps): bump alexellis/upload-assets from 0.2.2 to 0.4.0 Bumps [alexellis/upload-assets](https://github.com/alexellis/upload-assets) from 0.2.2 to 0.4.0. - [Release notes](https://github.com/alexellis/upload-assets/releases) - [Commits](alexellis/upload-assets@eaab147...259de51) --- updated-dependencies: - dependency-name: alexellis/upload-assets dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * build(deps): bump messense/maturin-action from 1.28.3 to 1.34.0 Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.28.3 to 1.34.0. - [Release notes](https://github.com/messense/maturin-action/releases) - [Commits](PyO3/maturin-action@20111a7...7208c29) --- updated-dependencies: - dependency-name: messense/maturin-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * build(deps): bump actions-rs/toolchain Bumps [actions-rs/toolchain](https://github.com/actions-rs/toolchain) from 63eb9591781c46a70274cb3ebdf190fce92702e8 to 16499b5e05bf2e26879000db0c1d13f7e13fa3af. - [Release notes](https://github.com/actions-rs/toolchain/releases) - [Changelog](https://github.com/actions-rs/toolchain/blob/master/CHANGELOG.md) - [Commits](actions-rs/toolchain@63eb959...16499b5) --- updated-dependencies: - dependency-name: actions-rs/toolchain dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * testing * Update publish.yaml * Update publish.yaml * Update publish.yaml * build(deps): bump actions/upload-artifact from 3.1.0 to 3.1.2 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3.1.0 to 3.1.2. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@3cea537...0b7f8ab) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> * build(deps): bump Swatinem/rust-cache from 2.0.0 to 2.2.0 Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.0.0 to 2.2.0. - [Release notes](https://github.com/Swatinem/rust-cache/releases) - [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md) - [Commits](Swatinem/rust-cache@6720f05...359a70e) --- updated-dependencies: - dependency-name: Swatinem/rust-cache dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * build(deps): bump messense/maturin-action from 1.34.0 to 1.35.0 Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.34.0 to 1.35.0. - [Release notes](https://github.com/messense/maturin-action/releases) - [Commits](PyO3/maturin-action@7208c29...ac0a1ec) --- updated-dependencies: - dependency-name: messense/maturin-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * build(deps): bump actions/setup-python from 4.2.0 to 4.5.0 Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4.2.0 to 4.5.0. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v4.2.0...d27e3f3) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * build(deps): bump step-security/harden-runner from 2.0.0 to 2.1.0 Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.0.0 to 2.1.0. - [Release notes](https://github.com/step-security/harden-runner/releases) - [Commits](step-security/harden-runner@ebacdc2...18bf8ad) --- updated-dependencies: - dependency-name: step-security/harden-runner dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * Added float fields to schema with tests (#36) * Added float fields to schema with tests * Fixed typo * [StepSecurity] Apply security best practices (#38) Signed-off-by: StepSecurity Bot <[email protected]> Signed-off-by: StepSecurity Bot <[email protected]> * Harden CI (#39) * Harden the egress and add dependabot cargo * delete file * harden codeql * build(deps): bump messense/maturin-action from 1.35.0 to 1.35.2 (#40) Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.35.0 to 1.35.2. - [Release notes](https://github.com/messense/maturin-action/releases) - [Commits](PyO3/maturin-action@ac0a1ec...7559b9d) --- updated-dependencies: - dependency-name: messense/maturin-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps): bump github/codeql-action from 2.1.39 to 2.2.1 (#41) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.39 to 2.2.1. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@a34ca99...3ebbd71) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * feat: create custom kapiche tokenizer (#42) * Added a custom Kapiche tokenizer that is inline with the current Tokenizer in Kapiche. * Lint fixes * build(deps): bump messense/maturin-action from 1.35.2 to 1.36.0 (#47) Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.35.2 to 1.36.0. - [Release notes](https://github.com/messense/maturin-action/releases) - [Commits](PyO3/maturin-action@7559b9d...7c85798) --- updated-dependencies: - dependency-name: messense/maturin-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Tantivy_0.19.1_upgrade (#48) * Adding __init__.py file to the tantivy folder to make maturin happy Add Cargo.lock to the repo Set the git-fetch-with-cli cargo flag so that we can override fetch settings Renaming .cargo/config to .cargo/config.toml Adding github-quiq-sh cargo registry Point dependencies at our github-quiq-sh registry Trying to resolve this build issue, pointing pyo3-build-config at our github-quiq-sh registry SER-21487: Enable support for all standard Tantivy languages plus Chinese + Japanese in tantivy-py SER-21487: Use uname rather than UNAME in the Makefile SER-21487: Fix document date handling SER-23013: Upgrade Tantivy and other dependencies * Upgrade to Tantivy 0.19.1 * Added changes and fixed issues * Formatting fixes --------- Co-authored-by: Phill Mell-Davies <[email protected]> * build(deps): bump pyo3-build-config from 0.18.0 to 0.18.1 (#49) Bumps [pyo3-build-config](https://github.com/pyo3/pyo3) from 0.18.0 to 0.18.1. - [Release notes](https://github.com/pyo3/pyo3/releases) - [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md) - [Commits](PyO3/pyo3@v0.18.0...v0.18.1) --- updated-dependencies: - dependency-name: pyo3-build-config dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps): bump tantivy from 0.19.1 to 0.19.2 (#53) Bumps [tantivy](https://github.com/quickwit-oss/tantivy) from 0.19.1 to 0.19.2. - [Release notes](https://github.com/quickwit-oss/tantivy/releases) - [Changelog](https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md) - [Commits](quickwit-oss/tantivy@0.19.1...0.19.2) --- updated-dependencies: - dependency-name: tantivy dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps): bump github/codeql-action from 2.2.1 to 2.2.4 (#50) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.1 to 2.2.4. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@3ebbd71...17573ee) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps): bump serde_json from 1.0.92 to 1.0.93 (#51) Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.92 to 1.0.93. - [Release notes](https://github.com/serde-rs/json/releases) - [Commits](serde-rs/json@v1.0.92...v1.0.93) --- updated-dependencies: - dependency-name: serde_json dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update Cargo.lock * Update Makefile --------- Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: StepSecurity Bot <[email protected]> Co-authored-by: Phill Mell-Davies <[email protected]> Co-authored-by: Cam Parry <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Cameron <[email protected]> Co-authored-by: StepSecurity Bot <[email protected]> Co-authored-by: Phill Mell-Davies <[email protected]>
Upgrade all Cargo dependencies, including upgrade to Tantivy 0.19.2
Build in customer tokenizers so other supported languages can be used