
Presubmission: harmonize-wq #132

Closed

Submitting Author: Justin Bousquin (@jbousquin)
Package Name: harmonize-wq
One-Line Description of Package: Standardize, clean and wrangle Water Quality Portal data into more analytic-ready formats
Repository Link (if existing): https://github.com/USEPA/harmonize-wq


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:
    The Water Quality Portal (WQP), a cooperative service of the USGS, the US EPA, and the National Water Quality Monitoring Council, is a data warehouse that provides access, in a common format, to data stored in large water quality databases. Tools exist both for publishing data to WQP and for retrieving data from it; harmonize-wq focuses on the retrieved data: (1) cleaning it to ensure it meets the required quality standards and (2) wrangling it into a more analytic-ready format. Although this has been done in many individual projects, standardized tools for the task could make it less time-intensive, more consistent, and more reproducible.
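    To make the two steps concrete, here is a minimal stdlib-only sketch; it is not harmonize-wq's API. It assumes WQP-style result columns (MonitoringLocationIdentifier, ActivityStartDate, CharacteristicName, ResultMeasureValue, ResultMeasure/MeasureUnitCode, ResultSampleFractionText). The functions `clean` and `wrangle` and the conversion table `TO_MG_PER_L` are hypothetical and stand in for the pint-based unit handling the package actually uses. Cleaning drops non-numeric values and converts everything to one unit. Wrangling pivots long results into one row per site and date, keyed on characteristic plus sample fraction so that, for example, dissolved and total nitrogen are not merged.

```python
"""Hypothetical sketch of the clean-then-wrangle steps; not harmonize-wq's API.

Column names follow WQP result conventions; the unit table is an illustrative
stand-in for pint-based unit handling.
"""

# Hypothetical conversion factors to a single target unit (mg/l).
TO_MG_PER_L = {"mg/l": 1.0, "mg/L": 1.0, "ug/l": 0.001, "ug/L": 0.001}


def clean(records):
    """Keep rows with a numeric value in a known unit, converted to mg/l."""
    cleaned = []
    for row in records:
        try:
            value = float(row["ResultMeasureValue"])
        except (TypeError, ValueError):
            continue  # drop non-numeric results such as "ND" or None
        factor = TO_MG_PER_L.get(row["ResultMeasure/MeasureUnitCode"])
        if factor is None:
            continue  # unknown unit; a real tool should flag it, not drop it silently
        cleaned.append({**row, "ResultMeasureValue": value * factor,
                        "ResultMeasure/MeasureUnitCode": "mg/l"})
    return cleaned


def wrangle(records):
    """Pivot to one row per (site, date), one column per characteristic + fraction.

    Keying on the sample fraction keeps, e.g., dissolved and total nitrogen
    in separate columns instead of letting one overwrite the other.
    """
    wide = {}
    for row in records:
        key = (row["MonitoringLocationIdentifier"], row["ActivityStartDate"])
        fraction = row.get("ResultSampleFractionText") or "Unknown"
        column = f"{row['CharacteristicName']}_{fraction}"
        wide.setdefault(key, {})[column] = row["ResultMeasureValue"]
    return wide


def _result(name, value, unit, fraction=None):
    """Build one made-up WQP-style result row for SITE-1 on 2020-06-01."""
    return {"MonitoringLocationIdentifier": "SITE-1", "ActivityStartDate": "2020-06-01",
            "CharacteristicName": name, "ResultMeasureValue": value,
            "ResultMeasure/MeasureUnitCode": unit, "ResultSampleFractionText": fraction}


raw = [
    _result("Phosphorus", "50", "ug/l", "Total"),
    _result("Nitrogen", "0.8", "mg/l", "Dissolved"),
    _result("Nitrogen", "1.2", "mg/L", "Total"),
    _result("Nitrogen", "ND", "mg/l", "Total"),  # non-detect: dropped by clean()
]
print(wrangle(clean(raw)))
```

    In this sketch, repeated results for the same site, date, and column silently overwrite one another. A real harmonization step would need an explicit rule for such duplicates, for example aggregating them or flagging them.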

Community Partnerships

We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our
    scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo
- [ ] Unsure/Other (explain below)
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
    The package has some limited geospatial functionality (it leverages geopandas) to handle where samples were taken and to build retrieval queries from an area-of-interest geometry. It likewise includes some data validation (e.g., checking metadata consistency) and visualization, but these are intentionally limited.

  • Who is the target audience and what are the scientific applications of this package?
    Water quality domain experts trying to synthesize the data available for a stream, bay, estuary, etc. More standardized data cleaning and wrangling allows outputs to be integrated into other tools in the water quality data pipeline, e.g., dashboards for visualization (Beck et al., 2021) or decision support tools (Booth et al., 2011).

  • Are there other Python packages that accomplish similar things? If so, how does yours differ?
    None in Python to my knowledge; a similar package exists in R: USEPA/TADA.

  • Any other questions or issues we should be aware of:
    I would like to take advantage of the partnership with JOSS, but paper.md and the documentation need to clear internal review before the package can be submitted. The package uses USGS's dataretrieval to retrieve the data and pint for unit handling. Current development (open branches) focuses on demonstrations in Jupyter notebooks, examples in docstrings, and expanded handling of sample fraction in combination with similar characteristicName values (e.g., dissolved vs. filtered nitrogen). Very open to ideas for making it more maintainable!
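    The limited geospatial role described above has two parts: building a retrieval query from an area-of-interest geometry and then handling where samples were taken. Both can be illustrated without geopandas. This is a minimal sketch under several assumptions. The AOI is a plain list of (lon, lat) vertices. The WQP web services accept a bounding box as `bBox=west,south,east,north`. The query uses the polygon's extent, which geopandas exposes as `total_bounds`. A ray-casting test then discards returned sites that fall inside the box but outside the polygon. The names `aoi_to_bbox`, `bbox_query`, and `point_in_polygon` are hypothetical and not part of harmonize-wq, and the coordinates are made up.

```python
"""Hypothetical sketch: AOI polygon -> bounding-box query -> point-in-polygon filter."""


def aoi_to_bbox(coords):
    """Return (west, south, east, north) for a list of (lon, lat) vertices."""
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    return min(lons), min(lats), max(lons), max(lats)


def bbox_query(coords):
    """Format the extent as query parameters; "bBox" is an assumed WQP parameter name."""
    west, south, east, north = aoi_to_bbox(coords)
    return {"bBox": f"{west},{south},{east},{north}"}


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count edge crossings of a ray cast from the point eastward."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # crossing lies east of the point
                inside = not inside
    return inside


# Made-up AOI and site locations.
aoi = [(-71.5, 41.4), (-71.2, 41.5), (-71.3, 41.8), (-71.6, 41.7)]
sites = {"SITE-1": (-71.4, 41.6), "SITE-2": (-71.59, 41.41)}

# SITE-2 lies inside the bounding box but outside the polygon, so it is dropped.
kept = [name for name, (lon, lat) in sites.items() if point_in_polygon(lon, lat, aoi)]
print(bbox_query(aoi), kept)
```

    Querying by extent and filtering afterwards keeps the request simple, at the cost of over-fetching. In practice, geopandas (e.g., a spatial join) would handle the filtering step along with edge cases such as points exactly on the boundary or multipart geometries.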

P.S. Have feedback/comments about our review process? Leave a comment here
