Add BSRN format reader to iotools #1015

Closed
wholmgren opened this issue Aug 3, 2020 · 6 comments · Fixed by #1145
Labels
io solarfx2 DOE SETO Solar Forecasting 2 / Solar Forecast Arbiter
Milestone

Comments

@wholmgren
Member

I need a parser for the NASA Langley CAPABLE BSRN site. Given the importance and the quality of the BSRN sites, I expect that others would benefit from this parser as well.

Does anyone have experience with this site or have a parser for the format?

https://capable.larc.nasa.gov/data/

https://cove.larc.nasa.gov/BSRN/LRC49/

Data from December 2014 to present.

1 month of data per file. Appears to be uploaded in the first few days of the following month. 1 minute intervals.

Another fun fixed width file. Entries like:

  1  987   1003 -99.9 -999 -999    975 -99.9 -999 -999
             72 -99.9 -999 -999    287 -99.9 -999 -999     19.9  37.2 1026
  1  988   1006 -99.9 -999 -999    977 -99.9 -999 -999
             72 -99.9 -999 -999    290 -99.9 -999 -999     19.8  36.9 1026
  1 1438     22 -99.9 -999 -999      0 -99.9 -999 -999
             21 -99.9 -999 -999    307 -99.9 -999 -999     17.7  57.9 1023
  1 1439     21 -99.9 -999 -999      0 -99.9 -999 -999
             20 -99.9 -999 -999    307 -99.9 -999 -999     17.6  56.8 1023

The first number is the day of the month. The second is the minute of the day. Times appear to be in UTC.
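A minimal sketch of that timestamp rule, using only the standard library. The function name `record_time` is hypothetical, and the year and month are passed in by the caller because the data lines shown above do not carry them:

```python
from datetime import datetime, timedelta, timezone


def record_time(year, month, day, minute_of_day):
    """Return the UTC timestamp for a (day of month, minute of day) pair."""
    # Assumes the times are UTC, as they appear to be.
    midnight = datetime(year, month, day, tzinfo=timezone.utc)
    return midnight + timedelta(minutes=minute_of_day)


# Hypothetical year/month; day 1 and minute 987 come from the first sample row.
print(record_time(2020, 7, 1, 987).isoformat())
```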

I believe the ordering is:

  • CM22 pyranometer GHI (upper left)
  • CM31 pyranometer DHI (lower left)
  • CH1 pyrheliometer DNI (upper right)
  • PIR infrared (lower right)

I'd probably read the file into a DataFrame without meaningful columns, split it into two DataFrames using .iloc[::2] and .iloc[1::2], parse the date and time information into an index, then stitch the data back together.
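The approach above could be sketched roughly as follows. This is an illustration, not the parser that was eventually merged. The column names, the reading of the three fields after each mean as std/min/max, the trailing temperature/humidity/pressure labels, and the treatment of -999 and -99.9 as missing-value codes are all assumptions based on the guessed ordering. `parse_records` is a hypothetical name, and the year and month must be supplied by the caller.

```python
import numpy as np
import pandas as pd

# First two records from the sample above.
SAMPLE = """\
  1  987   1003 -99.9 -999 -999    975 -99.9 -999 -999
             72 -99.9 -999 -999    287 -99.9 -999 -999     19.9  37.2 1026
  1  988   1006 -99.9 -999 -999    977 -99.9 -999 -999
             72 -99.9 -999 -999    290 -99.9 -999 -999     19.8  36.9 1026
"""

# Assumed column names for the first and second line of each record.
UPPER = ["day", "minute",
         "ghi", "ghi_std", "ghi_min", "ghi_max",
         "dni", "dni_std", "dni_min", "dni_max"]
LOWER = ["dhi", "dhi_std", "dhi_min", "dhi_max",
         "lwd", "lwd_std", "lwd_min", "lwd_max",
         "temp_air", "relative_humidity", "pressure"]


def parse_records(text, year, month):
    """Stitch two-line records into one row per minute."""
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    raw = pd.DataFrame(rows)  # no meaningful column names yet

    # Even rows hold the first line of each record, odd rows the second.
    upper = raw.iloc[::2, :len(UPPER)].reset_index(drop=True)
    lower = raw.iloc[1::2, :len(LOWER)].reset_index(drop=True)
    upper.columns = UPPER
    lower.columns = LOWER

    # Day of month and minute of day become a UTC index.
    start = pd.Timestamp(year=year, month=month, day=1, tz="UTC")
    index = (start
             + pd.to_timedelta(upper["day"] - 1, unit="D")
             + pd.to_timedelta(upper["minute"], unit="min"))

    data = pd.concat([upper.drop(columns=["day", "minute"]), lower], axis=1)
    data.index = pd.DatetimeIndex(index)
    # Assumed missing-value codes.
    return data.replace([-999.0, -99.9], np.nan)


data = parse_records(SAMPLE, year=2020, month=7)  # hypothetical year/month
print(data[["ghi", "dni", "dhi", "lwd"]])
```

Resetting the index on both halves before `pd.concat` is what lets the two lines of each record align row by row.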

CAPABLE Site Coordinates:
Latitude: 37.1038
Longitude: -76.3872
Elevation: 3 m ASL

cross post from SolarArbiter/solarforecastarbiter-core#541

@wholmgren wholmgren added solarfx2 DOE SETO Solar Forecasting 2 / Solar Forecast Arbiter io labels Aug 3, 2020
@kandersolar
Member

Is that dataset different from the BSRN-formatted LRC dataset provided through pangaea? For example: https://doi.pangaea.de/10.1594/PANGAEA.913689

If it's the same dataset, I'd vote for adding a parser for the standard BSRN format instead because the BSRN format is used for many other stations as well. I actually thought pvlib already had a read_bsrn function but I guess not. The BSRN format is pretty straightforward -- a metadata header followed by nice TSV data.
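As a rough illustration of "metadata header followed by TSV", the sketch below splits such a file into header lines and data rows with the standard library. The miniature sample is made up and is not real BSRN data. The `/* ... */` header delimiters and the column labels are assumptions about the layout, and `split_header_and_rows` is a hypothetical helper.

```python
import csv

# Made-up miniature file, NOT real BSRN data. The "/* ... */" header
# delimiters and the column labels are assumptions about the layout.
SAMPLE = (
    "/* DATA DESCRIPTION:\n"
    "Event(s):\tLRC\n"
    "*/\n"
    "Date/Time\tSWD [W/m**2]\tDIF [W/m**2]\n"
    "2020-07-01T16:27\t1003\t72\n"
    "2020-07-01T16:28\t1006\t72\n"
)


def split_header_and_rows(text):
    """Return (metadata lines, list of row dicts) from a header + TSV file."""
    lines = text.splitlines()
    # The metadata block is assumed to end at a line containing only "*/".
    end = next(i for i, line in enumerate(lines) if line.strip() == "*/")
    header = lines[1:end]
    rows = list(csv.DictReader(lines[end + 1:], delimiter="\t"))
    return header, rows


header, rows = split_header_and_rows(SAMPLE)
print(header)
print(rows[0])
```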

@wholmgren
Member Author

Thanks @kanderso-nrel that sounds like a better idea. I created an account but I still don't have permission to download the tsv file. I also tried to download the file via ftp with wget but got a "login incorrect" message. Do you know anything about getting access to more data?

Do you know if the data available over ftp is in tsv format too? It appears that only the ftp files have a regular naming scheme. I'd like to automate the fetch and parsing so I'd prefer a regular naming scheme to random DOIs.

@kandersolar
Member

> I created an account but I still don't have permission to download the tsv file. I also tried to download the file via ftp with wget but got a "login incorrect" message.

Ah, there is a username and password (and for whatever reason, the account you can create yourself doesn't work). I think there is just a global login that everyone uses -- I don't feel comfortable sharing it publicly, but you can email Amelie Driemel for it: https://bsrn.awi.de/?id=393.

> Do you know if the data available over ftp is in tsv format too?

Looks like the FTP files are in the same format as your example, which I think is called "station-to-archive" format: https://bsrn.awi.de/data/station-to-archive-file-format/

> I'd like to automate the fetch and parsing so I'd prefer a regular naming scheme to random DOIs.

I wrote a scraper a while back that uses the pangaea search function to list the datasets I wanted: https://www.pangaea.de/?q=project%3Alabel%3ABSRN+%2Bevent%3Alabel%3ALRC+%2Bcitation%3ABasic+-guidelines

Fetching data from the FTP archive would have been cleaner. I don't think I knew about the FTP archive back then. So maybe for your use case, implementing the more complex "station-to-archive" format would be better. Seems like the choice is a trade-off between nicer data format and easier file retrieval. Side note: I assume you'll want to be fetching the data automatically in the future, but if you just want historical BSRN data, I have local copies of all the US station data and can share if you want.

Possibly helpful links:

@AdamRJensen
Member

AdamRJensen commented Oct 17, 2020

Hi @wholmgren

The file (.dat) in the second link you refer to is indeed in the "station-to-archive" file format used by the BSRN. It is described in detail in the BSRN Technical Plan and briefly on their website: https://bsrn.awi.de/data/station-to-archive-file-format/

As you noted, the file format is not very user friendly, as data for each timestamp is split over two lines (probably due to archaic restrictions).

The easiest way I have found to access them is through BSRN's FTP server: https://bsrn.awi.de/data/data-retrieval-via-ftp/

I have written a function to parse 'station-to-archive' files (read_bsrn) and a function to get bsrn files from the ftp-server (get_bsrn): https://github.com/AdamRJensen/BSRN/blob/main/bsrn_v3.ipynb
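For readers who want a feel for what an automated FTP fetch might involve, here is a minimal standard-library sketch. It is not the code from the linked notebook. The filename pattern (lowercase station abbreviation, then two-digit month and two-digit year, then `.dat.gz`) and the one-directory-per-station layout are assumptions. The host, username, and password are left as parameters because the credentials have to be requested from BSRN. Both function names are hypothetical.

```python
import io
from datetime import date
from ftplib import FTP


def monthly_filenames(station, start, end):
    """List one assumed filename per month from start to end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # Assumed pattern: station abbreviation + MMYY + ".dat.gz".
        names.append(f"{station.lower()}{month:02d}{year % 100:02d}.dat.gz")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def fetch_file(host, user, password, station, filename):
    """Download one file into memory and return its raw bytes."""
    buffer = io.BytesIO()
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(station.lower())  # assumed: one directory per station
        ftp.retrbinary(f"RETR {filename}", buffer.write)
    return buffer.getvalue()


# Only the filename generation runs here; no network access is attempted.
print(monthly_filenames("LRC", date(2014, 12, 1), date(2015, 2, 1)))
```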

I would be happy to get some feedback on the functions and contribute them to pvlib.

@wholmgren
Member Author

@AdamRJensen thanks, the functions in the notebook look like a great start and we'd welcome the pull request!

@AdamRJensen
Member

@wholmgren I have rewritten the function a bit to make it simpler and tested it on a few thousand BSRN files. It's my first pull request, so perhaps you could review it and tell me if I am missing something?

@wholmgren wholmgren changed the title Add NASA Langley CAPABLE BSRN site to iotools Add BSRN format reader to iotools Jan 26, 2021
@wholmgren wholmgren added this to the 0.9.0 milestone Jan 26, 2021
wholmgren added a commit that referenced this issue Feb 11, 2021
* Add bsrn file to read bsrn files

Related to issue #1015.

* simplified read_bsrn function

Simplified how the start and end line of the data is determined. Improved documentation, e.g. moved constants outside of function.

* Simplified selection of rows in read_bsrn

* Added read_bsrn to api.rst

* Delete 2021_01_16_read_bsrn_pull_request_v2.py

* Improved format, e.g removed trailing white spaces

* Fixed spacing issues

* Update v0.9.0.rst

* Add iotools.bsrn and import read_bsrn

* Split multiple lines to obey 75 character limit

* Corrected indentation

* Fixed indentation again

* Remove bsrn email in description

Co-authored-by: Cliff Hansen <[email protected]>

* Correct COL_SPEC variable

The previous values in the COL_SPEC variables were not all correct, leading to incorrect parsing of the data.

* Changed air_temperature to temp_air

* Add test_bsrn file

File is not complete, as I'm awaiting permission from BSRN to upload test file

* Reference to FTP updated

* Add zipped bsrn test file

* Update test filename

* Get file month/year from file instead of filename

Previously the month and year of the file were determined from the filename. This has now been changed such that the month/year is found from within the file's metadata section (second line).

* Fixed formatting/stickler issues

* Fixed formatting/stickler issues

* Fixed formatting/stickler issues

* Fix to test_format_index

* Refactored file opening and utc localization

* Fixed indentation issue

* Fixed hyperlink

* Fixed doc error

Air temperature was listed as air_temperature in the docstring instead of temp_air.

* Handle file start date explicitly

Co-authored-by: Will Holmgren <[email protected]>

* Correct pytest fixture magic

Co-authored-by: Will Holmgren <[email protected]>

* Fix indentation broken by previous commit

* Correct Dataframe to DataFrame in doc string

* Add offset to line num after explicitly handling start date

* Update test_bsrn.py

* Added compression='infer', fixed end line number issue

* Fixed test issue

* Changed timedelta unit from min to minute

* Add files via upload

All logical records after LR0100 have been removed to reduce space (be below 25 MB), but also to test the functionality of files with few logical records.

* Changed to_timedelta unit from 'minute' to 'T'

* Updated test to cover unzipped and zipped files

* Removed error causing blank line in test file

* Change to Unix end of line character from file by wholmgren

* Remove extra line at end of file

* Fix typo in bsrn.py doc string

Co-authored-by: Kevin Anderson <[email protected]>

Co-authored-by: Cliff Hansen <[email protected]>
Co-authored-by: Will Holmgren <[email protected]>
Co-authored-by: Kevin Anderson <[email protected]>
3 participants