Skip to content

Conversation

@pdavies011010
Copy link
Contributor

@pdavies011010 pdavies011010 commented Feb 5, 2023

Upgrade all Cargo dependencies, including upgrade to Tantivy 0.19.2
Build in customer tokenizers so other supported languages can be used

Add Cargo.lock to the repo

Set the git-fetch-with-cli cargo flag so that we can override fetch settings

Renaming .cargo/config to .cargo/config.toml

Adding github-quiq-sh cargo registry

Point dependencies at our github-quiq-sh registry

Trying to resolve this build issue, pointing pyo3-build-config at our github-quiq-sh registry

SER-21487: Enable support for all standard Tantivy languages plus Chinese + Japanese in tantivy-py

SER-21487: Use uname rather than UNAME in the Makefile

SER-21487: Fix document date handling

SER-23013: Upgrade Tantivy and other dependencies
@pratyushmittal
Copy link
Contributor

Tantivy 0.19.1 version fixes so many bugs vs the current 0.17. The incremental indexes were increasing index size exponentially till v0.17.

Thanks a lot @pdavies011010 for adding support for the newer version.

@pdavies011010
Copy link
Contributor Author

pdavies011010 commented Feb 7, 2023

Thanks @pratyushmittal ! I'm admittedly not a Rust expert so please let me know if there's anything I should do to help get this merged in, the changes should probably looked over by somebody with a little more familiarity with Rust, but all the tests do pass and I have been using this version successfully for my needs in python. One thing that's hopefully okay is I switched the makefile to use maturin for the build, that's the way that has been working best for me to create builds wheels for multiple python3 versions, if not okay I can back that change out.

Comment on lines 214 to 227
let fast = match fast {
Some(f) => {
let f = f.to_lowercase();
match f.as_ref() {
"single" => Some(schema::Cardinality::SingleValue),
"multi" => Some(schema::Cardinality::MultiValues),
_ => return Err(exceptions::PyValueError::new_err(
"Invalid index option, valid choices are: 'multivalue' and 'singlevalue'"
)),
}
}
None => None,
};
opts = opts.set_fast(fast.unwrap());
Copy link
Collaborator

@Sidhant29 Sidhant29 Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let fast = match fast {
Some(f) => {
let f = f.to_lowercase();
match f.as_ref() {
"single" => Some(schema::Cardinality::SingleValue),
"multi" => Some(schema::Cardinality::MultiValues),
_ => return Err(exceptions::PyValueError::new_err(
"Invalid index option, valid choices are: 'multivalue' and 'singlevalue'"
)),
}
}
None => None,
};
opts = opts.set_fast(fast.unwrap());
let cardinality = match fast {
Some(f) => {
let f = f.to_lowercase();
match f.as_ref() {
"single" => Some(schema::Cardinality::SingleValue),
"multi" => Some(schema::Cardinality::MultiValues),
_ => return Err(exceptions::PyValueError::new_err(
"Invalid index option, valid choices are: 'multivalue' and 'singlevalue'"
)),
}
}
None => None,
};
if fast.is_some(){
opts = opts.set_fast(fast.unwrap());
}

I tried these changes out with some sample projects and this seems to fail in case fast is 'None', as
pyo3_runtime.PanicException: called 'Option::unwrap()' on a None value.

Apart from this looks good!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also you can format rust code using
rustfmt --check src/*.rs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sidhant29 Good finds, I updated to put in the fix you suggested, and applied rustfmt.

@PSeitz
Copy link
Contributor

PSeitz commented Feb 10, 2023

tantivy 0.19.2 has been released

@pdavies011010
Copy link
Contributor Author

@PSeitz Updated to Tantivy 0.19.2

@pdavies011010 pdavies011010 changed the title Tantivy 0.19.1 Tantivy 0.19.2 Feb 10, 2023
This was referenced Feb 12, 2023
@PSeitz
Copy link
Contributor

PSeitz commented Feb 12, 2023

@ChillFish8 Can you have a look?

Copy link
Collaborator

@ChillFish8 ChillFish8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good overall, just a few small things to sort.

Cargo.toml Outdated
Comment on lines 14 to 21
pyo3-build-config = { version = "0.18.0" }

[dependencies]
chrono = "0.4.19"
tantivy = "0.17"
itertools = "0.10.3"
futures = "0.3.21"
serde_json = "1.0.64"
chrono = { version = "0.4.23" }
tantivy = { version = "0.19.2" }
itertools = { version = "0.10.5" }
futures = { version = "0.3.26" }
serde_json = { version = "1.0.91" }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this change is worth it (using version = xxx) we don't use any of the features so we can just use the standard dep = "x.x.x" format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

rustfmt.toml Outdated
@@ -1 +1,2 @@
max_width = 80
edition = "2018"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed. It'll read the cargo file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it will be read from Cargo.toml if you use cargo fmt, but not if you run rustfmt directly. If we want to standardize around using cargo fmt I can update the Makefile to use it instead of rustfmt, but otherwise I think we want to leave this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think standardising everything to use cargo would be a good idea 😃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K, updated the Makefile and removed this line

src/document.rs Outdated
Comment on lines 61 to 67
d.into_utc().year(),
d.into_utc().month() as u8,
d.into_utc().day() as u8,
d.into_utc().hour() as u8,
d.into_utc().minute() as u8,
d.into_utc().second() as u8,
d.into_utc().microsecond() as u32,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably easier here to have once into_utc() call and then work on that result for all the values rather than calling it every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/document.rs Outdated
Comment on lines 100 to 101
Value::Bool(b) => format!("{}", b),
Value::IpAddr(i) => format!("{}", *i),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, clippy now recommends using the new formatting method format!("{b}") for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linted with clippy

src/document.rs Outdated
Comment on lines 398 to 400
return Python::with_gil(|py| -> PyResult<Vec<PyObject>> {
self.get_all(py, field_name)
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needless return

Suggested change
return Python::with_gil(|py| -> PyResult<Vec<PyObject>> {
self.get_all(py, field_name)
});
Python::with_gil(|py| -> PyResult<Vec<PyObject>> {
self.get_all(py, field_name)
})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/index.rs Outdated
Comment on lines 374 to 391
let mut analyzers = HashMap::new();
analyzers.insert("ar_stem", Language::Arabic);
analyzers.insert("da_stem", Language::Danish);
analyzers.insert("nl_stem", Language::Dutch);
analyzers.insert("fi_stem", Language::Finnish);
analyzers.insert("fr_stem", Language::French);
analyzers.insert("de_stem", Language::German);
analyzers.insert("el_stem", Language::Greek);
analyzers.insert("hu_stem", Language::Hungarian);
analyzers.insert("it_stem", Language::Italian);
analyzers.insert("no_stem", Language::Norwegian);
analyzers.insert("pt_stem", Language::Portuguese);
analyzers.insert("ro_stem", Language::Romanian);
analyzers.insert("ru_stem", Language::Russian);
analyzers.insert("es_stem", Language::Spanish);
analyzers.insert("sv_stem", Language::Swedish);
analyzers.insert("ta_stem", Language::Tamil);
analyzers.insert("tr_stem", Language::Turkish);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be a hashmap here, an array would be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@pdavies011010
Copy link
Contributor Author

@ChillFish8 Cleaned things up per your suggestions, thanks!

@ChillFish8
Copy link
Collaborator

Looks good 👍 Just need to run fmt to fix the CI and then this can be merged

@pdavies011010
Copy link
Contributor Author

Whoops, the CI config was also set to use rustfmt instead of cargo fmt, which is why the Lint step failed. I updated it (hopefully successfully) to use cargo fmt instead.

@ChillFish8 ChillFish8 merged commit 164adc8 into quickwit-oss:master Feb 14, 2023
@pdavies011010 pdavies011010 deleted the tantivy-0.19.1 branch February 14, 2023 14:48
wallies referenced this pull request in Kapiche/tantivy-py Feb 15, 2023
* Tantivy 0.19.2 (#67)

* Adding __init__.py file to the tantivy folder to make maturin happy

Add Cargo.lock to the repo

Set the git-fetch-with-cli cargo flag so that we can override fetch settings

Renaming .cargo/config to .cargo/config.toml

Adding github-quiq-sh cargo registry

Point dependencies at our github-quiq-sh registry

Trying to resolve this build issue, pointing pyo3-build-config at our github-quiq-sh registry

SER-21487: Enable support for all standard Tantivy languages plus Chinese + Japanese in tantivy-py

SER-21487: Use uname rather than UNAME in the Makefile

SER-21487: Fix document date handling

SER-23013: Upgrade Tantivy and other dependencies

* Upgrade to Tantivy 0.19.1

* Apply rustfmt and fix bug when fast option = None

* Upgrade to tantivy-0.19.2

* Standardize around using 'cargo fmt' rather than 'rustfmt'

* Reverting to old style dependencies

* Linting with clippy

* Switching out hashmap for defining tokenizers for an array, and adding test for Spanish indexing

* Use cargo fmt instead of rustfmt on the Lint ci step

* Add python release build

* workflow dispatch

* simple

* add release

* fix publish pipeline

* update maturin args

* test

* maturin config

* build

* maturin

* build(deps): bump step-security/harden-runner from 1.4.4 to 2.0.0

Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 1.4.4 to 2.0.0.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](step-security/harden-runner@74b568e...ebacdc2)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump actions/checkout

Bumps [actions/checkout](https://github.com/actions/checkout) from d171c3b028d844f2bf14e9fdec0c58114451e4bf to 61b9e3751b92087fd0b06925ba6dd6314e06f089.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@d171c3b...61b9e37)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump alexellis/upload-assets from 0.2.2 to 0.4.0

Bumps [alexellis/upload-assets](https://github.com/alexellis/upload-assets) from 0.2.2 to 0.4.0.
- [Release notes](https://github.com/alexellis/upload-assets/releases)
- [Commits](alexellis/upload-assets@eaab147...259de51)

---
updated-dependencies:
- dependency-name: alexellis/upload-assets
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump messense/maturin-action from 1.28.3 to 1.34.0

Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.28.3 to 1.34.0.
- [Release notes](https://github.com/messense/maturin-action/releases)
- [Commits](PyO3/maturin-action@20111a7...7208c29)

---
updated-dependencies:
- dependency-name: messense/maturin-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump actions-rs/toolchain

Bumps [actions-rs/toolchain](https://github.com/actions-rs/toolchain) from 63eb9591781c46a70274cb3ebdf190fce92702e8 to 16499b5e05bf2e26879000db0c1d13f7e13fa3af.
- [Release notes](https://github.com/actions-rs/toolchain/releases)
- [Changelog](https://github.com/actions-rs/toolchain/blob/master/CHANGELOG.md)
- [Commits](actions-rs/toolchain@63eb959...16499b5)

---
updated-dependencies:
- dependency-name: actions-rs/toolchain
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* testing

* Update publish.yaml

* Update publish.yaml

* Update publish.yaml

* build(deps): bump actions/upload-artifact from 3.1.0 to 3.1.2

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3.1.0 to 3.1.2.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@3cea537...0b7f8ab)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump Swatinem/rust-cache from 2.0.0 to 2.2.0

Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.0.0 to 2.2.0.
- [Release notes](https://github.com/Swatinem/rust-cache/releases)
- [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md)
- [Commits](Swatinem/rust-cache@6720f05...359a70e)

---
updated-dependencies:
- dependency-name: Swatinem/rust-cache
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump messense/maturin-action from 1.34.0 to 1.35.0

Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.34.0 to 1.35.0.
- [Release notes](https://github.com/messense/maturin-action/releases)
- [Commits](PyO3/maturin-action@7208c29...ac0a1ec)

---
updated-dependencies:
- dependency-name: messense/maturin-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump actions/setup-python from 4.2.0 to 4.5.0

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4.2.0 to 4.5.0.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v4.2.0...d27e3f3)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* build(deps): bump step-security/harden-runner from 2.0.0 to 2.1.0

Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.0.0 to 2.1.0.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](step-security/harden-runner@ebacdc2...18bf8ad)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Added float fields to schema with tests (#36)

* Added float fields to schema with tests

* Fixed typo

* [StepSecurity] Apply security best practices (#38)

Signed-off-by: StepSecurity Bot <[email protected]>

Signed-off-by: StepSecurity Bot <[email protected]>

* Harden CI (#39)

* Harden the egress and add dependabot cargo

* delete file

* harden codeql

* build(deps): bump messense/maturin-action from 1.35.0 to 1.35.2 (#40)

Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.35.0 to 1.35.2.
- [Release notes](https://github.com/messense/maturin-action/releases)
- [Commits](PyO3/maturin-action@ac0a1ec...7559b9d)

---
updated-dependencies:
- dependency-name: messense/maturin-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github/codeql-action from 2.1.39 to 2.2.1 (#41)

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.39 to 2.2.1.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@a34ca99...3ebbd71)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat: create custom kapiche tokenizer (#42)

* Added a custom Kapiche tokenizer that is inline with the current Tokenizer in Kapiche.

* Lint fixes

* build(deps): bump messense/maturin-action from 1.35.2 to 1.36.0 (#47)

Bumps [messense/maturin-action](https://github.com/messense/maturin-action) from 1.35.2 to 1.36.0.
- [Release notes](https://github.com/messense/maturin-action/releases)
- [Commits](PyO3/maturin-action@7559b9d...7c85798)

---
updated-dependencies:
- dependency-name: messense/maturin-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Tantivy_0.19.1_upgrade (#48)

* Adding __init__.py file to the tantivy folder to make maturin happy

Add Cargo.lock to the repo

Set the git-fetch-with-cli cargo flag so that we can override fetch settings

Renaming .cargo/config to .cargo/config.toml

Adding github-quiq-sh cargo registry

Point dependencies at our github-quiq-sh registry

Trying to resolve this build issue, pointing pyo3-build-config at our github-quiq-sh registry

SER-21487: Enable support for all standard Tantivy languages plus Chinese + Japanese in tantivy-py

SER-21487: Use uname rather than UNAME in the Makefile

SER-21487: Fix document date handling

SER-23013: Upgrade Tantivy and other dependencies

* Upgrade to Tantivy 0.19.1

* Added changes and fixed issues

* Formatting fixes

---------

Co-authored-by: Phill Mell-Davies <[email protected]>

* build(deps): bump pyo3-build-config from 0.18.0 to 0.18.1 (#49)

Bumps [pyo3-build-config](https://github.com/pyo3/pyo3) from 0.18.0 to 0.18.1.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](PyO3/pyo3@v0.18.0...v0.18.1)

---
updated-dependencies:
- dependency-name: pyo3-build-config
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump tantivy from 0.19.1 to 0.19.2 (#53)

Bumps [tantivy](https://github.com/quickwit-oss/tantivy) from 0.19.1 to 0.19.2.
- [Release notes](https://github.com/quickwit-oss/tantivy/releases)
- [Changelog](https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md)
- [Commits](quickwit-oss/tantivy@0.19.1...0.19.2)

---
updated-dependencies:
- dependency-name: tantivy
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github/codeql-action from 2.2.1 to 2.2.4 (#50)

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.1 to 2.2.4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@3ebbd71...17573ee)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump serde_json from 1.0.92 to 1.0.93 (#51)

Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.92 to 1.0.93.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](serde-rs/json@v1.0.92...v1.0.93)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update Cargo.lock

* Update Makefile

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: StepSecurity Bot <[email protected]>
Co-authored-by: Phill Mell-Davies <[email protected]>
Co-authored-by: Cam Parry <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Cameron <[email protected]>
Co-authored-by: StepSecurity Bot <[email protected]>
Co-authored-by: Phill Mell-Davies <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants