
Commit 4d5096f

Use IRBuilder in the binary parser
IRBuilder is a utility for turning arbitrary valid streams of Wasm instructions into valid Binaryen IR. It is already used in the text parser, so now use it in the binary parser as well. Since the IRBuilder API for building each instruction requires only the information that the binary and text formats include as immediates to that instruction, the parser is now much simpler than before. In particular, it does not need to manage a stack of instructions to figure out what the children of each expression should be; IRBuilder handles this instead.

There are some differences between the IR constructed by IRBuilder and the IR the binary parser constructed before this change. Most importantly, IRBuilder generates better multivalue code because it avoids eagerly breaking up multivalue results into individual components that might need to be immediately reassembled into a tuple. It also parses try-delegate more correctly, allowing the delegate to target arbitrary labels, not just other `try`s. There are also a couple of superficial differences in the generated label and scratch local names.

There are two remaining bugs. First, support for creating DWARF location spans is missing because IRBuilder does not have an API for that yet (source map locations work fine). Second, IRBuilder generates pops inside nameless blocks in some circumstances involving stacky code. This is currently an IR validation error, so #6950 will have to be resolved before this can land.

This change also makes the binary parser significantly slower (by about 50%). The lowest-hanging performance fruit seems to be tracking branch targets in IRBuilder to avoid having to scan for branches when finalizing blocks.
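To illustrate why the parser no longer needs its own expression stack, here is a minimal sketch of a stack-consuming builder. `MiniBuilder`, `Expr`, and `show` are hypothetical names invented for this example, not Binaryen's IRBuilder API, which also handles control flow, types, unreachable code, and multivalue results that this sketch ignores. Each `make*` call receives only the instruction's immediates, and `std::optional<std::string>` stands in for an error-carrying `Result<>` type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified expression node (not Binaryen's IR).
struct Expr {
  std::string op;                           // e.g. "i32.const", "i32.add"
  int64_t imm = 0;                          // immediate, if the op has one
  std::vector<std::unique_ptr<Expr>> kids;  // children in operand order
};

// Print a tree in folded, s-expression-like form.
std::string show(const Expr& e) {
  std::string s = "(" + e.op;
  if (e.op == "i32.const" || e.op == "local.get") {
    s += " " + std::to_string(e.imm);
  }
  for (const auto& k : e.kids) {
    s += " " + show(*k);
  }
  return s + ")";
}

// Hypothetical mini builder: the caller passes immediates only, and the
// builder pops each instruction's children off its own value stack.
// std::nullopt means success; otherwise the optional holds an error message.
class MiniBuilder {
  std::vector<std::unique_ptr<Expr>> stack;

  std::optional<std::string> push(const std::string& op, int64_t imm,
                                  std::size_t arity) {
    if (stack.size() < arity) {
      return "stack underflow in " + op;
    }
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->imm = imm;
    // The children are the top `arity` values, oldest (leftmost) first.
    for (std::size_t i = stack.size() - arity; i < stack.size(); ++i) {
      e->kids.push_back(std::move(stack[i]));
    }
    stack.resize(stack.size() - arity);
    stack.push_back(std::move(e));
    return std::nullopt;
  }

public:
  std::optional<std::string> makeConst(int64_t v) {
    return push("i32.const", v, 0);
  }
  std::optional<std::string> makeLocalGet(int64_t index) {
    return push("local.get", index, 0);
  }
  std::optional<std::string> makeBinary(const std::string& op) {
    return push(op, 0, 2);
  }
  // Hand back the finished tree, or nullptr unless exactly one value remains.
  std::unique_ptr<Expr> build() {
    if (stack.size() != 1) {
      return nullptr;
    }
    auto root = std::move(stack.back());
    stack.pop_back();
    return root;
  }
};
```

Feeding the stream `local.get 0`, `i32.const 1`, `i32.add`, `i32.const 2`, `i32.mul` in order (via `makeLocalGet(0)`, `makeConst(1)`, `makeBinary("i32.add")`, `makeConst(2)`, `makeBinary("i32.mul")`) makes `build()` return the tree `(i32.mul (i32.add (local.get 0) (i32.const 1)) (i32.const 2))`; the caller never decides which values become which children.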
1 parent 098bd4f commit 4d5096f


59 files changed, with 3928 additions and 6858 deletions.

src/wasm-binary.h

Lines changed: 9 additions & 181 deletions
@@ -29,6 +29,7 @@
 #include "ir/module-utils.h"
 #include "parsing.h"
 #include "wasm-builder.h"
+#include "wasm-ir-builder.h"
 #include "wasm-traversal.h"
 #include "wasm-validator.h"
 #include "wasm.h"
@@ -1543,10 +1544,6 @@ class WasmBinaryReader {
   Signature getSignatureByTypeIndex(Index index);
   Signature getSignatureByFunctionIndex(Index index);
 
-  size_t nextLabel;
-
-  Name getNextLabel();
-
   // We read functions, globals, etc. before we know their final names, so we
   // need to backpatch the names later. Map the original names to their indices
   // so we can find the final names based on index.
@@ -1559,81 +1556,27 @@ class WasmBinaryReader {
   std::unordered_map<Name, Index> elemIndices;
 
   Function* currFunction = nullptr;
-  // before we see a function (like global init expressions), there is no end of
-  // function to check
-  Index endOfFunction = -1;
 
   std::map<Index, Name> elemTables;
 
-  // Throws a parsing error if we are not in a function context
-  void requireFunctionContext(const char* error);
-
   void readFunctions();
   void readVars();
 
+  [[nodiscard]] Result<> readInst();
+
   std::map<Export*, Index> exportIndices;
   std::vector<std::unique_ptr<Export>> exportOrder;
   void readExports();
 
   // The strings in the strings section (which are referred to by StringConst).
   std::vector<Name> strings;
   void readStrings();
+  Name getIndexedString();
 
   Expression* readExpression();
   void readGlobals();
 
-  struct BreakTarget {
-    Name name;
-    Type type;
-    BreakTarget(Name name, Type type) : name(name), type(type) {}
-  };
-  std::vector<BreakTarget> breakStack;
-  // the names that breaks target. this lets us know if a block has breaks to it
-  // or not.
-  std::unordered_set<Name> breakTargetNames;
-  // the names that delegates target.
-  std::unordered_set<Name> exceptionTargetNames;
-
-  std::vector<Expression*> expressionStack;
-
-  // Control flow structure parsing: these have not just the normal binary
-  // data for an instruction, but also some bytes later on like "end" or "else".
-  // We must be aware of the connection between those things, for debug info.
-  std::vector<Expression*> controlFlowStack;
-
-  // Called when we parse the beginning of a control flow structure.
-  void startControlFlow(Expression* curr);
-
-  // set when we know code is unreachable in the sense of the wasm spec: we are
-  // in a block and after an unreachable element. this helps parse stacky wasm
-  // code, which can be unsuitable for our IR when unreachable.
-  bool unreachableInTheWasmSense;
-
-  // set when the current code being processed will not be emitted in the
-  // output, which is the case when it is literally unreachable, for example,
-  // (block $a
-  //  (unreachable)
-  //  (block $b
-  //   ;; code here is reachable in the wasm sense, even though $b as a whole
-  //   ;; is not
-  //   (unreachable)
-  //   ;; code here is unreachable in the wasm sense
-  //  )
-  // )
-  bool willBeIgnored;
-
-  BinaryConsts::ASTNodes lastSeparator = BinaryConsts::End;
-
-  // process a block-type scope, until an end or else marker, or the end of the
-  // function
-  void processExpressions();
-  void skipUnreachableCode();
-
-  void pushExpression(Expression* curr);
-  Expression* popExpression();
-  Expression* popNonVoidExpression();
-  Expression* popTuple(size_t numElems);
-  Expression* popTypedExpression(Type type);
+  IRBuilder builder;
 
   void validateBinary(); // validations that cannot be performed on the Module
   void processNames();
@@ -1661,127 +1604,12 @@ class WasmBinaryReader {
   void readNextDebugLocation();
   void readSourceMapHeader();
 
-  // AST reading
-  int depth = 0; // only for debugging
-
-  BinaryConsts::ASTNodes readExpression(Expression*& curr);
-  void pushBlockElements(Block* curr, Type type, size_t start);
-  void visitBlock(Block* curr);
-
-  // Gets a block of expressions. If it's just one, return that singleton.
-  Expression* getBlockOrSingleton(Type type);
-
-  BreakTarget getBreakTarget(int32_t offset);
-  Name getExceptionTargetName(int32_t offset);
-
   Index readMemoryAccess(Address& alignment, Address& offset);
+  std::tuple<Name, Address, Address> getMemarg();
 
-  void visitIf(If* curr);
-  void visitLoop(Loop* curr);
-  void visitBreak(Break* curr, uint8_t code);
-  void visitSwitch(Switch* curr);
-  void visitCall(Call* curr);
-  void visitCallIndirect(CallIndirect* curr);
-  void visitLocalGet(LocalGet* curr);
-  void visitLocalSet(LocalSet* curr, uint8_t code);
-  void visitGlobalGet(GlobalGet* curr);
-  void visitGlobalSet(GlobalSet* curr);
-  bool maybeVisitLoad(Expression*& out,
-                      uint8_t code,
-                      std::optional<BinaryConsts::ASTNodes> prefix);
-  bool maybeVisitStore(Expression*& out,
-                       uint8_t code,
-                       std::optional<BinaryConsts::ASTNodes> prefix);
-  bool maybeVisitNontrappingTrunc(Expression*& out, uint32_t code);
-  bool maybeVisitAtomicRMW(Expression*& out, uint8_t code);
-  bool maybeVisitAtomicCmpxchg(Expression*& out, uint8_t code);
-  bool maybeVisitAtomicWait(Expression*& out, uint8_t code);
-  bool maybeVisitAtomicNotify(Expression*& out, uint8_t code);
-  bool maybeVisitAtomicFence(Expression*& out, uint8_t code);
-  bool maybeVisitConst(Expression*& out, uint8_t code);
-  bool maybeVisitUnary(Expression*& out, uint8_t code);
-  bool maybeVisitBinary(Expression*& out, uint8_t code);
-  bool maybeVisitTruncSat(Expression*& out, uint32_t code);
-  bool maybeVisitSIMDBinary(Expression*& out, uint32_t code);
-  bool maybeVisitSIMDUnary(Expression*& out, uint32_t code);
-  bool maybeVisitSIMDConst(Expression*& out, uint32_t code);
-  bool maybeVisitSIMDStore(Expression*& out, uint32_t code);
-  bool maybeVisitSIMDExtract(Expression*& out, uint32_t code);
-  bool maybeVisitSIMDReplace(Expression*& out, uint32_t code);
-  bool maybeVisitSIMDShuffle(Expression*& out, uint32_t code);
-  bool maybeVisitSIMDTernary(Expression*& out, uint32_t code);
-  bool maybeVisitSIMDShift(Expression*& out, uint32_t code);
-  bool maybeVisitSIMDLoad(Expression*& out, uint32_t code);
-  bool maybeVisitSIMDLoadStoreLane(Expression*& out, uint32_t code);
-  bool maybeVisitMemoryInit(Expression*& out, uint32_t code);
-  bool maybeVisitDataDrop(Expression*& out, uint32_t code);
-  bool maybeVisitMemoryCopy(Expression*& out, uint32_t code);
-  bool maybeVisitMemoryFill(Expression*& out, uint32_t code);
-  bool maybeVisitTableSize(Expression*& out, uint32_t code);
-  bool maybeVisitTableGrow(Expression*& out, uint32_t code);
-  bool maybeVisitTableFill(Expression*& out, uint32_t code);
-  bool maybeVisitTableCopy(Expression*& out, uint32_t code);
-  bool maybeVisitTableInit(Expression*& out, uint32_t code);
-  bool maybeVisitRefI31(Expression*& out, uint32_t code);
-  bool maybeVisitI31Get(Expression*& out, uint32_t code);
-  bool maybeVisitRefTest(Expression*& out, uint32_t code);
-  bool maybeVisitRefCast(Expression*& out, uint32_t code);
-  bool maybeVisitBrOn(Expression*& out, uint32_t code);
-  bool maybeVisitStructNew(Expression*& out, uint32_t code);
-  bool maybeVisitStructGet(Expression*& out, uint32_t code);
-  bool maybeVisitStructSet(Expression*& out, uint32_t code);
-  bool maybeVisitArrayNewData(Expression*& out, uint32_t code);
-  bool maybeVisitArrayNewElem(Expression*& out, uint32_t code);
-  bool maybeVisitArrayNewFixed(Expression*& out, uint32_t code);
-  bool maybeVisitArrayGet(Expression*& out, uint32_t code);
-  bool maybeVisitArraySet(Expression*& out, uint32_t code);
-  bool maybeVisitArrayLen(Expression*& out, uint32_t code);
-  bool maybeVisitArrayCopy(Expression*& out, uint32_t code);
-  bool maybeVisitArrayFill(Expression*& out, uint32_t code);
-  bool maybeVisitArrayInit(Expression*& out, uint32_t code);
-  bool maybeVisitStringNew(Expression*& out, uint32_t code);
-  bool maybeVisitStringAsWTF16(Expression*& out, uint32_t code);
-  bool maybeVisitStringConst(Expression*& out, uint32_t code);
-  bool maybeVisitStringMeasure(Expression*& out, uint32_t code);
-  bool maybeVisitStringEncode(Expression*& out, uint32_t code);
-  bool maybeVisitStringConcat(Expression*& out, uint32_t code);
-  bool maybeVisitStringEq(Expression*& out, uint32_t code);
-  bool maybeVisitStringWTF16Get(Expression*& out, uint32_t code);
-  bool maybeVisitStringSliceWTF(Expression*& out, uint32_t code);
-  void visitSelect(Select* curr, uint8_t code);
-  void visitReturn(Return* curr);
-  void visitMemorySize(MemorySize* curr);
-  void visitMemoryGrow(MemoryGrow* curr);
-  void visitNop(Nop* curr);
-  void visitUnreachable(Unreachable* curr);
-  void visitDrop(Drop* curr);
-  void visitRefNull(RefNull* curr);
-  void visitRefIsNull(RefIsNull* curr);
-  void visitRefFunc(RefFunc* curr);
-  void visitRefEq(RefEq* curr);
-  void visitTableGet(TableGet* curr);
-  void visitTableSet(TableSet* curr);
-  void visitTryOrTryInBlock(Expression*& out);
-  void visitTryTable(TryTable* curr);
-  void visitThrow(Throw* curr);
-  void visitRethrow(Rethrow* curr);
-  void visitThrowRef(ThrowRef* curr);
-  void visitCallRef(CallRef* curr);
-  void visitRefAsCast(RefCast* curr, uint32_t code);
-  void visitRefAs(RefAs* curr, uint8_t code);
-  void visitContNew(ContNew* curr);
-  void visitContBind(ContBind* curr);
-  void visitResume(Resume* curr);
-  void visitSuspend(Suspend* curr);
-
-  [[noreturn]] void throwError(std::string text);
-
-  // Struct/Array instructions have an unnecessary heap type that is just for
-  // validation (except for the case of unreachability, but that's not a problem
-  // anyhow, we can ignore it there). That is, we also have a reference typed
-  // child from which we can infer the type anyhow, and we just need to check
-  // that type is the same.
-  void validateHeapTypeUsingChild(Expression* child, HeapType heapType);
+  [[noreturn]] void throwError(std::string text) {
+    throw ParseException(text, 0, pos);
+  }
 
 private:
   bool hasDWARFSections();
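The header now declares a single `readInst()` in place of the per-instruction visitor methods, plus a `getMemarg()` helper returning a `std::tuple<Name, Address, Address>`. As a rough illustration of what decoding a memory-access immediate involves, the hypothetical `decodeMemarg` below reads the alignment field, an optional memory index, and the offset, following the multi-memory encoding in which bit 6 of the alignment field signals that an explicit memory index follows. It is not Binaryen's implementation: it returns a memory index rather than a `Name`, converts the log2 alignment to a byte count, and the tuple element order is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <tuple>
#include <vector>

// Decode an unsigned LEB128 value starting at `pos`, advancing `pos`.
uint64_t readULEB(const std::vector<uint8_t>& in, std::size_t& pos) {
  uint64_t result = 0;
  unsigned shift = 0;
  while (true) {
    if (pos >= in.size()) {
      throw std::runtime_error("unexpected end of input");
    }
    if (shift >= 64) {
      throw std::runtime_error("LEB128 value too long");
    }
    uint8_t byte = in[pos++];
    result |= uint64_t(byte & 0x7f) << shift;
    if (!(byte & 0x80)) {
      return result;
    }
    shift += 7;
  }
}

// Hypothetical memarg decoder: returns (memory index, alignment in bytes,
// offset). Bit 6 of the alignment field means a memory index follows;
// otherwise memory 0 is implied.
std::tuple<uint32_t, uint64_t, uint64_t>
decodeMemarg(const std::vector<uint8_t>& in, std::size_t& pos) {
  uint64_t flags = readULEB(in, pos);
  uint32_t memIndex = 0;
  if (flags & (uint64_t(1) << 6)) {
    memIndex = uint32_t(readULEB(in, pos));
    flags &= ~(uint64_t(1) << 6);
  }
  // What remains must be a log2 alignment below 2^6.
  if (flags >= 64) {
    throw std::runtime_error("alignment exponent too large");
  }
  uint64_t alignBytes = uint64_t(1) << flags;
  uint64_t offset = readULEB(in, pos);
  return {memIndex, alignBytes, offset};
}
```

For example, the bytes `0x02 0x10` decode to memory 0, 4-byte alignment, offset 16, while `0x43 0x01 0x80 0x01` decode to memory 1, 8-byte alignment, offset 128.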

src/wasm-ir-builder.h

Lines changed: 4 additions & 0 deletions
@@ -48,6 +48,10 @@ class IRBuilder : public UnifiedExpressionVisitor<IRBuilder, Result<>> {
   // of instructions after this is called.
   [[nodiscard]] Result<Expression*> build();
 
+  // If the IRBuilder is empty, then it's ready to parse a new self-contained
+  // sequence of instructions.
+  [[nodiscard]] bool empty() { return scopeStack.empty(); }
+
   // Call visit() on an existing Expression with its non-child fields
   // initialized to initialize the child fields and refinalize it.
   [[nodiscard]] Result<> visit(Expression*);
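The new `empty()` method reports whether IRBuilder's scope stack is empty. The sketch below uses a hypothetical `ScopeStack` (not Binaryen code) to show how such a stack could do three things: tell a reader that a self-contained instruction sequence is complete, resolve a relative label depth to a name without restricting the target's scope kind (the more permissive behavior the commit message describes for try-delegate), and record branch targets as they are resolved, which is the kind of tracking the commit message suggests could avoid scanning for branches when finalizing blocks. The label naming scheme is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record for one open control-flow scope.
struct Scope {
  std::string kind;       // e.g. "func", "block", "loop", "try"
  std::string label;      // generated label name
  bool targeted = false;  // set once a branch resolves to this scope
};

class ScopeStack {
  std::vector<Scope> scopes;
  unsigned nextLabel = 0;

public:
  // True when no scope is open, i.e. the previous sequence is complete.
  bool empty() const { return scopes.empty(); }

  void push(const std::string& kind) {
    scopes.push_back(Scope{kind, "label$" + std::to_string(nextLabel++), false});
  }

  // Resolve a relative depth (0 = innermost) to a label. Any scope kind is
  // accepted. The target is marked so that finalizing it needs no scan.
  std::optional<std::string> resolve(uint32_t depth) {
    if (depth >= scopes.size()) {
      return std::nullopt;
    }
    Scope& target = scopes[scopes.size() - 1 - depth];
    target.targeted = true;
    return target.label;
  }

  // Close the innermost scope, reporting whether anything branched to it,
  // or std::nullopt if there was no open scope.
  std::optional<bool> end() {
    if (scopes.empty()) {
      return std::nullopt;
    }
    bool targeted = scopes.back().targeted;
    scopes.pop_back();
    return targeted;
  }
};
```

A reader built on this idea could push a `"func"` scope, feed instructions until the matching `end()` closes it, and stop once `empty()` returns true. Because `end()` already knows whether the scope was targeted, a block with no incoming branches could be finalized without walking its contents.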
