Skip to content

Transfrom languages to use the base property#3991

Merged
DmitrySharabin merged 67 commits into
v2from
transform-languages
Jul 25, 2025
Merged

Transfrom languages to use the base property#3991
DmitrySharabin merged 67 commits into
v2from
transform-languages

Conversation

@DmitrySharabin
Copy link
Copy Markdown
Member

@DmitrySharabin DmitrySharabin commented Jul 21, 2025

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jul 21, 2025

No JS Changes

Generated by 🚫 dangerJS against eb97fb9

@DmitrySharabin DmitrySharabin marked this pull request as ready for review July 22, 2025 12:20
@DmitrySharabin DmitrySharabin requested a review from LeaVerou July 22, 2025 12:42
Comment thread src/languages/actionscript.ts Outdated
delete actionscript['parameter'];
delete actionscript['literal-property'];
delete base!['parameter'];
delete base!['literal-property'];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this delete these from the actual base language, rather than this one?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can we somehow tell TS that if the language has a base property, then base is not null so we don't have to use base! everywhere?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this delete these from the actual base language, rather than this one?

Nope, we pass a clone of the base language as the value of base. See line 173 in #3990 (the base PR):

const baseGrammar = base && cloneGrammar(required(base.id), base.id);

Copy link
Copy Markdown
Member Author

@DmitrySharabin DmitrySharabin Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can we somehow tell TS that if the language has a base property, then base is not null so we don't have to use base! everywhere?

It bothers me as well. I'll take a closer look.

@LeaVerou
Copy link
Copy Markdown
Member

Thanks. Let's see what your TS research brings up.
Some other questions:

  • Are there any languages where this type of rewrite produced worse code?
  • Are there any languages that you couldn't transform to use base even though conceptually they do use a base language?

@DmitrySharabin DmitrySharabin force-pushed the transform-languages branch 3 times, most recently from 04a6940 to 51f75ea Compare July 23, 2025 18:12
@DmitrySharabin
Copy link
Copy Markdown
Member Author

DmitrySharabin commented Jul 23, 2025

Let's see what your TS research brings up.

Okay, I seem to have found a way to distinguish (on the level of TS) between cases where a language definition has a base property and those where it doesn't. 🎉 See #3990.

Are there any languages where this type of rewrite produced worse code?

Can't think of any. But there are a couple of languages where the grammar function returns an empty object (e.g., cilkc, cilkcpp, etc.), which looks a bit weird. However, before, those languages extended the base language with nothing (an empty object). So it's just the same bizarre thing, but in a different place. 😅

@DmitrySharabin
Copy link
Copy Markdown
Member Author

DmitrySharabin commented Jul 23, 2025

Are there any languages that you couldn't transform to use base even though conceptually they do use a base language?

There are some. For example, coffeescript. The transformation is rather straightforward, but one group of tests (inline-javascript_feature) still fails. And I don't understand (yet) why. 🤷‍♂️

In cpp, we use the language we are (partially) defined (by extending the base language and modifying it on the way):

const baseInside = { ...cpp };
I don't know how to address that case either.

In tt2, the language has only one rule—tt2 but inside, it's the extended version of the base language. Techicaly, we shouldn't extend the value returned from the grammar() function of this language and use it as-is. I'd say, this language definition is an edge case for the concept of a base language.

@DmitrySharabin
Copy link
Copy Markdown
Member Author

For example, coffeescript. The transformation is rather straightforward, but one group of tests (inline-javascript_feature) still fails. And I don't understand (yet) why. 🤷‍♂️

Some observations. The coffescript grammar defines an inline-javascript token. I compared how this token looks in the resolved grammar in the source and transformed languages. And it turned out that now (not transformed), coffeescript['inline-javascript'].script.inside looks exactly the same as it's defined, i.e.,
inside: 'javascript'.

However, after I transformed the language to use the base property, the same part of the resolved grammar looks different. Its inside property's value is an object—(resolved) grammar of the javascript language, and, for some reason, it acts differently when it is being tested. It seems to me that's the reason why the inline-javascript_feature fails.

@LeaVerou
Copy link
Copy Markdown
Member

LeaVerou commented Jul 25, 2025

Let's see what your TS research brings up.

Okay, I seem to have found a way to distinguish (on the level of TS) between cases where a language definition has a base property and those where it doesn't. 🎉 See #3990.

Yay! Though this complexity makes me wonder once more whether we should consider switching back to JS + JSDoc. We keep spending time jumping through TS hoops. I’m also worried how the TS rewrite will affect performance, since it's a fairly significant abstraction.
Do you have a sense of how much work it would take? I wonder if in the long term it could save us time. 🤔

Are there any languages where this type of rewrite produced worse code?

Can't think of any. But there are a couple of languages where the grammar function returns an empty object (e.g., cilkc, cilkcpp, etc.), which looks a bit weird. However, before, those languages extended the base language with nothing (an empty object). So it's just the same bizarre thing, but in a different place. 😅

They shouldn't have to. Returning undefined (what happens when there is no return) should just work.

@LeaVerou
Copy link
Copy Markdown
Member

For example, coffeescript. The transformation is rather straightforward, but one group of tests (inline-javascript_feature) still fails. And I don't understand (yet) why. 🤷‍♂️

Some observations. The coffescript grammar defines an inline-javascript token. I compared how this token looks in the resolved grammar in the source and transformed languages. And it turned out that now (not transformed), coffeescript['inline-javascript'].script.inside looks exactly the same as it's defined, i.e., inside: 'javascript'.

However, after I transformed the language to use the base property, the same part of the resolved grammar looks different. Its inside property's value is an object—(resolved) grammar of the javascript language, and, for some reason, it acts differently when it is being tested. It seems to me that's the reason why the inline-javascript_feature fails.

I wonder if we end up accidentally modifying an existing object when we think we're working with a clone.

@DmitrySharabin
Copy link
Copy Markdown
Member Author

For example, coffeescript. The transformation is rather straightforward, but one group of tests (inline-javascript_feature) still fails. And I don't understand (yet) why. 🤷‍♂️

Some observations. The coffescript grammar defines an inline-javascript token. I compared how this token looks in the resolved grammar in the source and transformed languages. And it turned out that now (not transformed), coffeescript['inline-javascript'].script.inside looks exactly the same as it's defined, i.e., inside: 'javascript'.
However, after I transformed the language to use the base property, the same part of the resolved grammar looks different. Its inside property's value is an object—(resolved) grammar of the javascript language, and, for some reason, it acts differently when it is being tested. It seems to me that's the reason why the inline-javascript_feature fails.

I wonder if we end up accidentally modifying an existing object when we think we're working with a clone.

I also think we might. However, I can't pinpoint exactly where it happens—something to explore in the next iteration.

Base automatically changed from base-property to v2 July 25, 2025 15:11
@netlify
Copy link
Copy Markdown

netlify Bot commented Jul 25, 2025

Deploy Preview for dev-prismjs-com ready!

Name Link
🔨 Latest commit eb97fb9
🔍 Latest deploy log https://app.netlify.com/projects/dev-prismjs-com/deploys/6883a3bb1a5b200008d5694d
😎 Deploy Preview https://deploy-preview-3991--dev-prismjs-com.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@DmitrySharabin DmitrySharabin merged commit 328f579 into v2 Jul 25, 2025
14 of 15 checks passed
@DmitrySharabin DmitrySharabin deleted the transform-languages branch July 25, 2025 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants