Presubmission Inquiry for sciform (float -> string scientific formatting) #114

Closed

Submitting Author: Justin Gerber (@jagerber48)
Package Name: sciform
One-Line Description of Package: Provides extended functionality for formatting floats into strings according to scientific standards
Repository Link (if existing): https://github.com/jagerber48/sciform


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does: sciform is used to convert python float objects into strings according to a variety of user-selected scientific formatting options, including fixed-point notation and decimal and binary scientific and engineering notations. Where possible, formatting follows documented standards such as those published by BIPM or IEC. sciform provides certain options, such as engineering notation, well-controlled significant figure rounding, and separator customization, which are not provided by the python built-in format specification mini-language (FSML). In addition, sciform provides functionality for formatting pairs of floats as value +/- uncertainty pairs according to a variety of scientific standards.
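
To make the gap in the built-in FSML concrete, here is a minimal sketch of engineering notation with significant-figure rounding, written with the standard library only. The helper name `to_engineering` is hypothetical and this is not sciform's API or implementation; it only illustrates the kind of output described above, and it assumes a finite input.

```python
import math
from decimal import Decimal


def to_engineering(value: float, sig_figs: int = 3) -> str:
    """Hypothetical helper (not part of sciform): engineering notation
    with a fixed number of significant figures."""
    if not math.isfinite(value):
        raise ValueError("this sketch only handles finite values")
    # Let the built-in 'e' presentation type do the sig-fig rounding.
    mantissa_str, exp_str = f"{value:.{sig_figs - 1}e}".split("e")
    exp = int(exp_str)
    eng_exp = 3 * (exp // 3)   # largest multiple of 3 not above exp
    shift = exp - eng_exp      # 0, 1, or 2 places to move the point
    # Shift the decimal point exactly, avoiding float round-off.
    mantissa = Decimal(mantissa_str).scaleb(shift)
    decimals = max(sig_figs - 1 - shift, 0)
    return f"{mantissa:.{decimals}f}e{eng_exp:+03d}"


print(f"{123456:.2e}")             # built-in: 1.23e+05
print(to_engineering(123456))      # 123e+03
print(to_engineering(0.00012345))  # 123e-06
```

The built-in `e` presentation type always normalizes the mantissa to one leading digit, so the exponent cannot be constrained to a multiple of 3 without post-processing like this.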

Community Partnerships

We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our
    scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo
- [ ] Unsure/Other (explain below)
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
    Sciform allows for improved formatting of floats into strings according to scientific standards. These strings will be output to terminals, plots, data documents, text documents, and possibly more. Making the displayed strings more readable as per scientific standards improves the visualization of "printed number" data.

There are no existing community partnerships for this project, though there may be opportunities for education around significant figures and uncertainty.

  • Who is the target audience and what are the scientific applications of this package?
    Any scientist who uses python is in the potential target audience for this package, but especially those who are concerned with displaying data values in a way that is commensurate with the corresponding uncertainties. Most scientists likely use the python built-in string formatting for this purpose, but there are some shortcomings to python built-in formatting. Scientists who seek more formatting features could consider sciform.

  • Are there other Python packages that accomplish similar things? If so, how does yours differ?
    Yes, there are similar packages.

  1. Python built-in string formatting mini language (https://docs.python.org/3/library/string.html#format-specification-mini-language). sciform includes its own string formatting mini language closely based on the built in one, but with some differences. Notably sciform includes well-controlled significant figure formatting, engineering notation, binary formatting, SI/IEC prefix substitution, digit grouping and decimal symbol options (helpful for a diversity of locales), exponent value coercion, as well as value +/- uncertainty formatting functionality.
  2. The uncertainties package (https://pythonhosted.org/uncertainties/). sciform was heavily motivated by this package. This package has sophisticated statistical handling of value +/- uncertainty pairs, handling error propagation and simulation under the hood. In addition, it has its own extension of the mini language for formatting value +/- uncertainty pairs. sciform has more formatting functionality than the uncertainties package including, especially, engineering notation, grouping separator controls, and prefix substitution. sciform is also a much lighter-weight dependency than the uncertainties package. This may be desirable when a user wants to format strings but doesn't need the full statistical machinery of the uncertainties package.
  3. The prefixed package (https://github.com/Rockhopper-Technologies/prefixed). sciform was also motivated by the prefixed package. This package provides a sort of engineering notation where exponents are rounded to multiples of 3, and then exponents are always replaced with their corresponding SI prefix. The prefixed package is a more conservative extension of the built-in formatting language. sciform includes more functionality, including engineering notation without prefix substitution and more grouping/decimal symbol control. sciform also includes global configuration options for handling optional SI prefixes such as c, d, da, and h.
  4. The sigfig package (https://sigfig.readthedocs.io/en/latest/). The sigfig package has similar functionality to sciform, including sig fig rounding, separator control, and value +/- uncertainty formatting, including some features that are only forthcoming in sciform. sigfig does not currently support binary formatting. sigfig also does not provide a format specification mini language for formatting floats. Rather, floats are formatted using an overload of the built-in round function, which I find to be slightly awkward compared to a Formatter object or function.
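
For readers unfamiliar with value +/- uncertainty formatting, the sketch below shows the basic idea that these packages share: round the uncertainty to a chosen number of significant figures, then round the value to the same decimal place. The function name `format_val_unc` and its parameters are hypothetical; this is a standard-library illustration under those assumptions, not the API of sciform, uncertainties, or sigfig.

```python
from decimal import Decimal


def format_val_unc(value: float, unc: float, unc_sig_figs: int = 1) -> str:
    """Hypothetical helper: round `unc` to `unc_sig_figs` significant
    figures and round `value` to the same decimal place."""
    if unc <= 0:
        raise ValueError("this sketch requires a positive uncertainty")
    unc_dec = Decimal(repr(unc))
    # adjusted() is the power of ten of the most significant digit.
    lowest_place = unc_dec.adjusted() - (unc_sig_figs - 1)
    quantum = Decimal(1).scaleb(lowest_place)
    # quantize() uses the context rounding mode (half-even by default).
    unc_rounded = unc_dec.quantize(quantum)
    val_rounded = Decimal(repr(value)).quantize(quantum)
    return f"{val_rounded:f} +/- {unc_rounded:f}"


print(format_val_unc(123.456, 0.789))                  # 123.5 +/- 0.8
print(format_val_unc(123.456, 0.789, unc_sig_figs=2))  # 123.46 +/- 0.79
```

One known limitation of this sketch: when rounding bumps the uncertainty up to the next power of ten (for example 0.096 at one significant figure), the result is shown with one more digit than requested. A real formatter would need to handle that case.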
  • Any other questions or issues we should be aware of:
    Much of the code is still a work in progress. I'm still working on documenting the existing features, more unit tests are necessary for existing features, and the value +/- uncertainty features are still young and not thoroughly tested. I have important ideas in mind for more value +/- uncertainty formatting features. But I would say the core of the package is in place. One glaring gap for this package is support for Decimal numbers rather than float numbers. I would like to add that functionality after the functionality for formatting floats is stable.

This package is very new and has 1 user so far. Me. But I've been kicking around code for this sort of formatting for quite some time now and think many others would find it useful. Having a small authoritative package for this sort of formatting could be useful for the scientific community. There is also some interest in getting some of these features into the python built-in string formatting feature set, which would be very useful. Having a package like this could be a stepping stone towards that. See https://discuss.python.org/t/new-format-specifiers-for-string-formatting-of-floats-with-si-and-iec-prefixes/26914/46. Though I do note that sciform's format specification mini language is intentionally not 100% backwards compatible with the built-in format specification mini language, so it would not be a top candidate for that role.

I'm also not very experienced when it comes to contributing to open source software. This is one of my first forays into that world, so I am learning as I go.

P.S. Have feedback/comments about our review process? Leave a comment here
