
Two RFCs entailing changes to PowerShell's internals. #12


Merged: 3 commits, Aug 25, 2016

Conversation

OneWingedShark
Contributor

@OneWingedShark OneWingedShark commented Aug 19, 2016

Separating the interface (command-line) more completely from the underlying process-executor allows for more uniform handling of processes. That would enable the use of non-text, typed streams in inter-process communication, which in turn will increase reliability and eliminate an entire class of vulnerabilities.



@msftclas

Hi @OneWingedShark, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla.microsoft.com.

TTYL, MSBOT;

@msftclas

@OneWingedShark, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, MSBOT;

@lzybkr
Contributor

lzybkr commented Aug 20, 2016

PowerShell implements strongly typed IPC which is used in PowerShell remoting over PSRP as well as local IPC between PowerShell processes (via the powershell command line parameter -OutputFormat XML). You can read more about the format here.

@jpsnover

jpsnover commented Aug 20, 2016

Thanks for the RFC - I appreciate your contribution. We definitely need to do something in this area - I've been calling that a move to a microservices model.

@lzybkr points out - we do have a strongly typed IPC. That said, the design center for our mechanism is to talk to a PS engine and set of cmdlets.
I think the RFC adds value in pointing out that we need something similar but for a much smaller design center - I'm thinking module or set of modules.

For me, the heart of the problem is that certain modules may require runtimes which conflict. Having that separation will give many of the benefits you are looking for here, but also, if we get the IPC contract right, we open ourselves up to being able to support completely different runtime systems.

The weight/overhead of such a mechanism should be dramatically lower - it is essentially just implementing the cmdlet contract w/serialization/deserialization.

This is the heart of what I refer to as shifting to a microservices architecture.
We are going to have to do this to deal with PowerShell Core needing to invoke modules which will only work with Windows PowerShell.

If you think about it - we already do this. This is how WMI-based cmdlets are implemented. We just need to generalize that model.

Thanks again!
jps[MSFT]

@OneWingedShark
Contributor Author

@lzybkr -- XML is a pretty bad serialization format for computer-interchange efficiency because of all the computational energy needed to parse it and its rather bloated textual format; even considered purely as a serialization/deserialization method it is deficient in that, w/o a DTD [or similar functionality], the validity of the data is much more difficult to check.

Erik Naggum details how the usage of ASN.1 took 1/200th the processing-time and 75% of the memory to replicate the in-memory structure (serialize/deserialize) in this newsgroup posting. (Directly relevant in that XML is a subset of SGML.)

As an example, the XML serialization you referenced defines a (True) boolean's serialization as "<B>true</B>" (eleven bytes); in contrast, the ASN.1 DER encoding of the same is 16#01_01_FF# (three bytes) -- that's a lot of bloat (roughly 367% of the DER size).
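To make the size comparison concrete, here is a minimal Python sketch. It assumes the XML form of the boolean is the element `<B>true</B>` encoded as ASCII, and it builds the DER bytes for BOOLEAN TRUE by hand (tag 0x01, length 0x01, value 0xFF) rather than using an ASN.1 library. The names `xml_true`, `der_true` and `size_ratio` are purely illustrative.

```python
# Compare the size of a single boolean in the two encodings discussed above.
# Assumption: the XML element is "<B>true</B>" in ASCII; the DER bytes are hand-built.
xml_true = "<B>true</B>".encode("ascii")
der_true = bytes([0x01, 0x01, 0xFF])  # tag BOOLEAN, length 1, value TRUE


def size_ratio(a: bytes, b: bytes) -> float:
    """Return how many times larger `a` is than `b`."""
    return len(a) / len(b)


if __name__ == "__main__":
    print(len(xml_true), len(der_true), round(size_ratio(xml_true, der_true), 2))
```

This only measures encoded size for one value; it says nothing about parse cost, which is the other half of the argument.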

What would be nice is if, in addition to ensuring that the type is respected, the constraints were enforced as well, as exemplified below:

-- Provided as standard [sub]types, Natural and Positive include constraints on the
-- acceptable values of the type in addition to the "supertype".
Subtype Natural is Integer range 0..Integer'Last;
Subtype Positive is Natural range 1..Natural'Last;
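For readers who don't know Ada, the same idea (a value must satisfy its range constraint, not just its base type) can be approximated at a deserialization boundary. The following is a rough Python sketch, not how Ada or PowerShell implements it. `RangeSubtype`, `INTEGER_LAST`, `NATURAL` and `POSITIVE` are hypothetical names, and a 32-bit `Integer'Last` is assumed.

```python
from dataclasses import dataclass

INTEGER_LAST = 2**31 - 1  # assumption: a 32-bit Integer'Last


@dataclass(frozen=True)
class RangeSubtype:
    """A named integer range, loosely mirroring an Ada range-constrained subtype."""
    name: str
    first: int
    last: int

    def check(self, value: int) -> int:
        # Reject values of the wrong base type (bool is excluded on purpose).
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{self.name}: expected an integer, got {type(value).__name__}")
        # Reject values outside the declared range (Ada would raise Constraint_Error).
        if not self.first <= value <= self.last:
            raise ValueError(f"{self.name}: {value} not in {self.first}..{self.last}")
        return value


NATURAL = RangeSubtype("Natural", 0, INTEGER_LAST)
POSITIVE = RangeSubtype("Positive", 1, INTEGER_LAST)
```

A receiver would call something like `POSITIVE.check(n)` on every decoded field, so that an out-of-range value is rejected at the boundary instead of deeper in the pipeline.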

@jpsnover -- WRT the IPC contract, why not something like Ada's task interface types?

type Stream_Handle is not null access Root_Stream_Type'Class; -- or something.
type IProcess is task interface;

-- For creating we need the two streams, one for the data, the other for the options.
Function Create( Data, Options : Stream_Handle ) return IProcess is abstract;
-- Run is obvious.
Procedure Run( Thread : IProcess ) is abstract;
-- Done returns the stream-handle of a stream containing its data-output.
Function Done( Thread : IProcess ) return Stream_Handle is abstract;

Then the execution core could run the conceptual command "A | B | C" which would result in an array/tree/list of execution where:
A:= ( Get_Data(Ax), Get_Options(Ax) ); B:= ( A.Done, Get_Options(Bx) ); C:= ( B.Done, Get_Options(Cx) )
{Xx being the internal handle the execution core associated the options/data with.}
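The chaining can also be sketched outside Ada. The Python below is only an illustration of the shape of the idea: each stage is created from a data stream and an options stream, is run, and then hands its output stream to the next stage. `Process`, `Upper`, `Reverse` and `run_pipeline` are invented for this sketch, and in-memory `io.BytesIO` objects stand in for whatever typed streams the real executor would use.

```python
import io
from abc import ABC, abstractmethod


class Process(ABC):
    """Hypothetical stand-in for the IProcess interface: Create, Run, Done."""

    def __init__(self, data: io.BytesIO, options: io.BytesIO):
        self.data = data
        self.options = options
        self.output = io.BytesIO()

    @abstractmethod
    def run(self) -> None:
        """Consume self.data and write results to self.output."""

    def done(self) -> io.BytesIO:
        # Rewind so the next stage reads the output from the start.
        self.output.seek(0)
        return self.output


class Upper(Process):
    def run(self) -> None:
        self.output.write(self.data.read().upper())


class Reverse(Process):
    def run(self) -> None:
        self.output.write(self.data.read()[::-1])


def run_pipeline(source: bytes, stages) -> bytes:
    """Run stages left to right, feeding each stage's done() stream to the next."""
    stream = io.BytesIO(source)
    for stage_cls, options in stages:
        proc = stage_cls(stream, io.BytesIO(options))
        proc.run()
        stream = proc.done()
    return stream.read()
```

Here `run_pipeline(b"abc", [(Upper, b""), (Reverse, b"")])` plays the role of "Upper | Reverse". The sketch runs stages sequentially, whereas Ada tasks would run concurrently.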

...or am I misunderstanding your problem?

WRT runtimes, are you talking language runtimes or (given the DOTNET) differing versions of the Common Language Runtime?

If the latter, it seems more like a DOTNET issue (GAC messiness?), and the correct solution would be the use of a database indexing the actual function off of the tuple (function-profile, version), with version defaulting to the latest. Another benefit could be saved space, as functions unchanged between versions wouldn't need a new entry; lookups would instead go through a "get latest lower than or equal to X" retrieval-function. (Given the namespaced nature of assemblies, perhaps a hierarchical database would be better than a relational one.) -- It would be tricky to do, but getting it under control would make things easier in the future.
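The "get latest lower than or equal to X" retrieval can be sketched in a few lines. This is a toy in-memory model in Python, not a proposal for the actual storage: `FunctionStore` and its methods are hypothetical, versions are assumed to be comparable tuples such as `(2, 0)`, and the "function-profile" is just a string key.

```python
import bisect


class FunctionStore:
    """Toy index from (function-profile, version) to an implementation."""

    def __init__(self):
        self._versions = {}  # profile -> sorted list of versions that have an entry
        self._bodies = {}    # (profile, version) -> implementation

    def add(self, profile, version, body):
        versions = self._versions.setdefault(profile, [])
        if (profile, version) not in self._bodies:
            bisect.insort(versions, version)  # keep the version list sorted
        self._bodies[(profile, version)] = body

    def get(self, profile, version=None):
        versions = self._versions.get(profile, [])
        if not versions:
            raise KeyError(profile)
        if version is None:
            chosen = versions[-1]  # default: the latest version
        else:
            # Index of the first stored version strictly greater than the request.
            idx = bisect.bisect_right(versions, version)
            if idx == 0:
                raise KeyError((profile, version))  # nothing at or below the request
            chosen = versions[idx - 1]
        return self._bodies[(profile, chosen)]
```

With entries at versions `(1, 0)` and `(3, 0)`, a request for `(2, 5)` resolves to the `(1, 0)` entry, which is how unchanged functions avoid needing a new row per release.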

Taking it to its more extreme levels you wouldn't even have assemblies, but rather an 'assembly' would be a view grouping together types, functions, procedures, etc, broken down into semantically meaningful constructs (possibly with preservation of comments) that are associated with the CIL-object (or machine-code) that is the result of compiling the deconstructed code. -- Though admittedly this would be more useful for compilers and package-manager systems than managing the GAC.

@lzybkr lzybkr merged commit c107dc1 into PowerShell:master Aug 25, 2016