Skip to content

DataEntry and BlobStore Options #550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jul 28, 2020
Merged

DataEntry and BlobStore Options #550

merged 14 commits into from
Jul 28, 2020

Conversation

Affie
Copy link
Member

@Affie Affie commented Jul 22, 2020

Note: I gave functions new names to avoid having to deprecate everything. It's also temporary in a different file structure to make it easier.

There are basically 2 design concepts: BlobStoreEntry vs <: AbstractDataEntry
BlobStoreEntry has a generic entry with the blobstores for every type.
<: AbstractDataEntry has a custom type with no blobstore.

  • BlobStoreEntry is the general data entry that goes with blob stores.
    • FolderStore
  • Other AbstractDataEntry types (used without blobstores)
    • FileDataEntry
    • InMemoryDataEntry - save anything, persist nothing.
  • DataBlob CRUD on <: AbstractDataEntry and AbstractDataStore
  • Data CRUD on <: AbstractDataEntry and AbstractDataStore

TODO

  • Store stores in DFG? - @GearsAD Got this 👍
  • Clean up
  • Remove old/deprecate
  • Maybe choose one EDIT: keep both for now, but focus on blobstore
  • Some test and all methods Expand tests and methods
  • Separate PR - perhaps Mongo blob CRUD - @GearsAD Yep separate is good. Opened issue to track: Implement Mongo blob store #552

@Affie Affie added the data: entry=>blob Previously bigdata, suggested over `smalldata` label Jul 22, 2020
@Affie Affie added this to the v0.10.0 milestone Jul 22, 2020
@Affie Affie force-pushed the maint/20Q3/datablob_concepts branch from 2d038ca to 91c1245 Compare July 22, 2020 21:24
@codecov
Copy link

codecov bot commented Jul 22, 2020

Codecov Report

Merging #550 into master will increase coverage by 5.34%.
The diff coverage is 69.64%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #550      +/-   ##
==========================================
+ Coverage   74.14%   79.49%   +5.34%     
==========================================
  Files          34       33       -1     
  Lines        2054     2126      +72     
==========================================
+ Hits         1523     1690     +167     
+ Misses        531      436      -95     
Impacted Files Coverage Δ
src/Deprecated.jl 0.00% <0.00%> (ø)
src/DistributedFactorGraphs.jl 100.00% <ø> (ø)
src/services/Serialization.jl 82.35% <ø> (ø)
src/services/AbstractDFG.jl 84.63% <50.00%> (+0.61%) ⬆️
src/DataBlobs/services/BlobStores.jl 60.37% <60.37%> (ø)
src/DataBlobs/services/FileDataEntryBlob.jl 83.33% <83.33%> (ø)
src/DataBlobs/services/AbstractDataEntries.jl 95.45% <92.30%> (+3.14%) ⬆️
src/DataBlobs/entities/AbstractDataEntries.jl 100.00% <100.00%> (ø)
src/DataBlobs/services/DataEntryBlob.jl 100.00% <100.00%> (ø)
src/DataBlobs/services/InMemoryDataEntryBlob.jl 100.00% <100.00%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ce22180...2407a9e. Read the comment docs.

@Affie Affie marked this pull request as ready for review July 27, 2020 17:56
@Affie Affie requested review from dehann and GearsAD and removed request for dehann July 27, 2020 20:50
mydata = Dict(:soundBite => randn(Float32, 10000), :meta => "something about lazy foxes and fences.")

# store the data
addDataEntry!( fg, :x0, datastore, :SOUND_DATA, "application/json", Vector{UInt8}(JSON2.write( mydata )) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be new API addData!?

Related

addDataEntry!, addData!, fetchData, fetchDataEntryElement
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we deprecating this, what is the direct replacement for this documented example in the new API -- specifically the combo?

addDataEntry!( fg, :x0, datastore, :SOUND_DATA, "application/json", Vector{UInt8}(JSON2.write( mydata ))  )
entry, rawData = fetchData(fg, :x0, datastore, :SOUND_DATA)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what default to use yet. For persistence: the FolderStore, for quick and easy (not persisted or copied to the cloud) I like the in-memory data entry.

We currently have:

dataset1 = rand(UInt8, 1000)

ade,adb = addData!(InMemoryDataEntry, dfg, :x1, :random, dataset1)
#or
ade,adb = addData!(FileDataEntry, dfg, :x1, :random, "/tmp/dfgFilestore", dataset1)
#or
ds = FolderStore{Vector{UInt8}}(:filestore, "/tmp/dfgFilestore")
addBlobStore!(dfg, ds)
ade,adb = addData!(dfg, :filestore, :x1, :random, dataset1)

The docs are a bit behind and we need good examples. The main purpose of PR was to have a clean API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct replacement should be (if the store is not saved in the DFG):

blobstore = FolderStore{Vector{UInt8}}(:filestore, "/tmp/dfgFilestore")
addData!(fg, blobstore, :x0, :SOUND_DATA, Vector{UInt8}(JSON2.write( mydata )), mimeType="application/json")

rawData, entry = getData(fg, blobstore, :x0, :SOUND_DATA)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for changing the parameter order. The DFG and BlobStore are mutated so they should go first and there were ambiguities with dispatch.

The idea is to only need the CRUD verbs getData/addData!/updateData!/deleteData! and not have other verbs such as fetch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRUD API

  • Verbs: get,add,update,delete
  • Nouns: Data, DataBlob, DataEntry

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha thanks! This helps

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for changing the parameter order.

Totally fine, I'm just trying to follow

@Affie Affie added the API label Jul 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API breaking change data: entry=>blob Previously bigdata, suggested over `smalldata` standardization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use UUIDs in FileDataStore DataEntry should be refactored CRC SHA checksum big data blobs to ensure immutability
2 participants