Resolve Blob to an existing ArrayBuffer #143

Closed
Zhang-Junzhi opened this issue Dec 6, 2019 · 16 comments

Comments

@Zhang-Junzhi

Currently, a Blob can only be resolved to a newly created ArrayBuffer.

Sometimes it would be much more efficient if a Blob could be resolved directly into an existing ArrayBuffer (provided the ArrayBuffer is large enough). For example, the contents of a very large file could be read directly into the ArrayBuffer of a WASM memory. Without this feature, we need to first call File.arrayBuffer() to resolve the large contents to a newly created ArrayBuffer, and then copy them into the WASM memory.
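
A minimal sketch of the two-step approach described above, for illustration only. The helper name `copyBlobIntoMemory` and its `offset` parameter are hypothetical; the sketch assumes the destination is a `WebAssembly.Memory` and uses `Blob.arrayBuffer()`, which `File` inherits.

```javascript
// Hypothetical helper: copy a Blob into WebAssembly memory at `offset`.
async function copyBlobIntoMemory(blob, memory, offset = 0) {
  if (offset + blob.size > memory.buffer.byteLength) {
    throw new RangeError('Blob does not fit in the target memory');
  }
  // Write 1: the whole Blob is materialised in a newly created ArrayBuffer.
  const temporary = await blob.arrayBuffer();
  // Write 2: the same bytes are copied again, into WASM linear memory.
  new Uint8Array(memory.buffer, offset, blob.size).set(new Uint8Array(temporary));
  return blob.size;
}
```

For a large file, the temporary ArrayBuffer is the cost in question: it holds a second full copy of the data until it is garbage collected.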

@annevk
Member

annevk commented Dec 6, 2019

Where is resolve defined?

What would the exact semantics of this be?

@Zhang-Junzhi
Author

Zhang-Junzhi commented Dec 6, 2019

Sorry, to clarify the definition:

"Resolve" here means returning a Promise that resolves with the contents of the blob as binary data in an ArrayBuffer, as Blob.arrayBuffer() does. FileReader.readAsArrayBuffer() yields the same data, delivered through events rather than a Promise.

@annevk
Member

annevk commented Dec 6, 2019

Would there be a single write for all bytes?

@Zhang-Junzhi
Author

AFAIK, there is currently no way to resolve a Blob to an existing ArrayBuffer (I would be happy to be wrong).

If I use WebAssembly and want to read the contents of a file, I need to first call File.arrayBuffer() to resolve the large contents to a newly created ArrayBuffer, and then copy them into the WASM memory. This means every byte is written twice.

@annevk
Member

annevk commented Dec 6, 2019

I understand, I was basically asking if the writing would be done similarly to https://encoding.spec.whatwg.org/#dom-textencoder-encodeinto. So unless you're multi-threaded and use SharedArrayBuffer, you cannot observe a partially filled buffer.
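
For reference, a minimal sketch of the `encodeInto()` behaviour mentioned above: the caller supplies the destination, and the call reports how much it read and wrote. The variable names are illustrative, and this shows only the existing `TextEncoder` API, not a Blob equivalent.

```javascript
const encoder = new TextEncoder();
// A pre-allocated destination; only part of it is offered to the encoder.
const existing = new Uint8Array(16);
const { read, written } = encoder.encodeInto('h\u00e9llo', existing.subarray(4));
// `read` counts UTF-16 code units consumed, `written` counts bytes produced.
console.log(read, written); // logs: 5 6
```

The write happens synchronously within the call, so single-threaded script never sees the destination half filled.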

Also, if we're doing this we should probably do it by changing the Body mixin in Fetch.

@Zhang-Junzhi
Author

> I understand, I was basically asking if the writing would be done similarly to https://encoding.spec.whatwg.org/#dom-textencoder-encodeinto. So unless you're multi-threaded and use SharedArrayBuffer, you cannot observe a partially filled buffer.
>
> Also, if we're doing this we should probably do it by changing the Body mixin in Fetch.

I am not sure which specific approach should resolve this; I only wanted to raise the efficiency problem in my use case.

@Zhang-Junzhi
Author

Zhang-Junzhi commented Dec 7, 2019

> I understand, I was basically asking if the writing would be done similarly to https://encoding.spec.whatwg.org/#dom-textencoder-encodeinto. So unless you're multi-threaded and use SharedArrayBuffer, you cannot observe a partially filled buffer.

Did you mean that I can use SharedArrayBuffer together with encodeInto across multiple threads, so that WASM can start working early while the buffer is still being copied into WASM memory, as a workaround to reduce latency?

But consider reading a 100MB file that is not in a streaming format: until the whole 100MB is available in WASM memory, partially ready content has little value. In that case, SharedArrayBuffer + encodeInto still doesn't help much.

@mkruisselbrink
Collaborator

If I understand what is being requested correctly, I believe you can mostly do that already. I.e. to read a Blob into a script-supplied (pre-allocated) ArrayBuffer, you can call Blob.stream() to get a ReadableStream, and then use a ReadableStreamBYOBReader to read from that stream into a script-supplied array buffer.
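
A rough sketch of that suggestion, under stated assumptions. The helper names `readBlobInto` and `copyChunksInto` are hypothetical. The sketch assumes the engine exposes `Blob.stream()` as a byte stream; where `getReader({ mode: 'byob' })` throws, it falls back to a default reader and copies chunk by chunk. Note that a BYOB read transfers the supplied buffer, so the caller must continue with the buffer that is returned. As far as I know, the buffer behind a `WebAssembly.Memory` cannot be transferred, so this sketch targets a plain ArrayBuffer.

```javascript
// Hypothetical helper: read `blob` into the caller's ArrayBuffer at `offset`.
// Resolves with the buffer to use afterwards and the number of bytes read.
async function readBlobInto(blob, target, offset = 0) {
  const end = offset + blob.size;
  if (end > target.byteLength) {
    throw new RangeError('Blob does not fit in the target buffer');
  }
  const stream = blob.stream();
  let reader;
  try {
    reader = stream.getReader({ mode: 'byob' });
  } catch {
    // Assumption: this engine's Blob stream is not a byte stream.
    return copyChunksInto(stream, target, offset);
  }
  let buffer = target;
  let position = offset;
  while (position < end) {
    // Each read transfers `buffer`; continue with the one handed back.
    const { value, done } = await reader.read(
      new Uint8Array(buffer, position, end - position));
    if (value) {
      buffer = value.buffer;
      position += value.byteLength;
    }
    if (done) break;
  }
  reader.releaseLock();
  return { buffer, bytesRead: position - offset };
}

// Fallback: default reader, one extra copy per chunk.
async function copyChunksInto(stream, target, offset) {
  const reader = stream.getReader();
  const out = new Uint8Array(target);
  let position = offset;
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    out.set(value, position);
    position += value.byteLength;
  }
  return { buffer: target, bytesRead: position - offset };
}
```

A caller would write something like `const { buffer } = await readBlobInto(file, new ArrayBuffer(file.size))` and use `buffer` from then on, since the original reference may have been detached.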

@Zhang-Junzhi
Author

Good method! Thanks for your reply. That is a concrete way to achieve my goal (though it seems none of the major browsers has implemented ReadableStreamBYOBReader yet).

@Zhang-Junzhi
Author

One more issue to take this topic further:

If WASM runs in a worker thread, two copies still seem unavoidable, since File blobs can only be used in the window thread: the WASM memory buffer cannot be detached and transferred to the window thread, and File objects in the window thread cannot be used in the worker thread.

@annevk
Member

annevk commented Dec 10, 2019

You can message a File object to a worker. If that's not sufficient for some reason it might be best to open a separate issue for that as it seems different from reading a blob into an existing buffer.
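
A minimal sketch of what messaging a File to a worker could look like. The helper names `sendFileToWorker` and `handleLoadFile`, and the `{ type, file }` message shape, are hypothetical. The sketch assumes the worker owns the `WebAssembly.Memory`; it does not show whether cloning the File copies its bytes.

```javascript
// Main thread (hypothetical helper): hand the File object itself to the worker.
function sendFileToWorker(worker, file) {
  worker.postMessage({ type: 'load-file', file });
}

// Worker side (hypothetical handler): read the received File or Blob into
// memory the worker already owns, such as the buffer of a WebAssembly.Memory.
async function handleLoadFile(data, memory, offset = 0) {
  const bytes = new Uint8Array(await data.file.arrayBuffer());
  new Uint8Array(memory.buffer, offset, bytes.byteLength).set(bytes);
  return bytes.byteLength;
}

// In a real worker script this would be wired up roughly as:
//   self.onmessage = (event) => handleLoadFile(event.data, wasmMemory);
```

With this arrangement the read happens inside the worker, so the bytes never have to pass through a buffer owned by the window thread.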

@Zhang-Junzhi
Author

> You can message a File object to a worker.

Oops, I didn't know that. Thanks for pointing it out.

After checking the definition of File and of StructuredSerializeInternal in the HTML spec, I realised I had misunderstood File.

> If that's not sufficient for some reason it might be best to open a separate issue for that as it seems different from reading a blob into an existing buffer.

It is still connected to the topic "read a Blob into an allocated ArrayBuffer", as a special case of it, so I decided to post it in the same issue.

@jimmywarting

jimmywarting commented Jan 9, 2020

FYI, a ReadableStream is also transferable with postMessage():

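try {
  const readable = new ReadableStream()
  const mc = new MessageChannel()
  // Throws where readable streams are not transferable
  mc.port1.postMessage(readable, [readable])
  // Reaching this line means transferable readable streams are supported
} catch (err) {
  // Transferable readable streams are not supported
}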

But I think it only works in Chrome right now...

@annevk
Member

annevk commented Jan 9, 2020

Let's dupe this into #83.

annevk closed this as completed on Jan 9, 2020

@surferjeff

> > You can message a File object to a worker.
>
> Oops, I didn't know that. Thanks for pointing it out.
>
> After checking the definition of File and of StructuredSerializeInternal in the HTML spec, I realised I had misunderstood File.
>
> > If that's not sufficient for some reason it might be best to open a separate issue for that as it seems different from reading a blob into an existing buffer.
>
> It is still connected to the topic "read a Blob into an allocated ArrayBuffer", as a special case of it, so I decided to post it in the same issue.

Would someone be willing to confirm that the following line doesn't copy the underlying file contents?

    ctx.postMessage(file);

If I understand @Zhang-Junzhi's comments correctly, they conclude that a file's contents aren't copied into another buffer when the file is passed via postMessage().

I read the two documents linked to above and don't understand them well enough to come to the same conclusion.

Context: People like to drop huge files (~1GB) into my web application, and I need to post them to workers. Copying buffers exceeds Chrome's memory limits:

DataCloneError
Failed to execute 'postMessage' on 'Worker': Data cannot be cloned, out of memory.

@annevk
Member

annevk commented May 21, 2022

@surferjeff the spec currently does say that it makes a full copy, but I don't think that's correct. Could you file a new issue on that?
