Custom analysis for packages which are tools #3657

mit-mit · 2020-05-26T11:35:11Z

Some of our analysis guidelines make no sense for packages that are tools (e.g. stagehand). For example, here's an issue about a tool losing points for not having an example file: dart-archive/stagehand#638 (comment)

isoos · 2020-05-26T17:31:50Z

I think the "Usage" section of package:stagehand's readme does contain content similar to a typical package's Example tab: https://pub.dev/packages/stagehand#usage
Similarly, it has an Installing section that is a subset of our Installing tab.

Maybe we should do a top-level analysis of the readme, and if a section has a recognized title (e.g. installing, setup, usage, example use), then we can use that and not require the separate file?

ramyak-mehra · 2021-03-14T18:31:30Z

Hey, I would like to work on this. Could you point me in a direction from where I could get started?

isoos · 2021-03-15T09:45:52Z

@ramyak-mehra: we are using package:markdown to parse the .md file content. It has an AST, and it would be nice to process that AST to extract the hierarchical section structure of a document. From there we could do not only a table of contents but also this Usage extraction.
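
To illustrate the table-of-contents idea, here is a minimal sketch of what could be done once the headings are known. `Heading` and `tableOfContents` are hypothetical names, and `Heading` is only a stand-in for the heading elements that package:markdown would produce; this is not pub.dev code.

```dart
/// A heading reduced to its level (1-6) and text.
/// Hypothetical stand-in for a heading node from the parsed markdown AST.
class Heading {
  final int level;
  final String title;
  const Heading(this.level, this.title);
}

/// Renders headings as an indented list, indenting each entry relative to
/// the shallowest heading level present in the document.
List<String> tableOfContents(List<Heading> headings) {
  if (headings.isEmpty) return [];
  final minLevel =
      headings.map((h) => h.level).reduce((a, b) => a < b ? a : b);
  return [
    for (final h in headings) '${'  ' * (h.level - minLevel)}- ${h.title}',
  ];
}
```

Indenting by level difference keeps a document that starts at h2 flush left, at the cost of a double indent when a level is skipped.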

ramyak-mehra · 2021-03-15T22:57:13Z

@isoos So far I have come up with something like this (not refined):

import 'package:markdown/markdown.dart';

const _recognizedTitles = ['installing', 'setup', 'usage', 'example'];

void main() {
  final document = Document();
  final markdown = ''; // README content goes here.
  final lines = markdown.replaceAll('\r\n', '\n').split('\n');
  final htmlLines =
      HtmlRenderer().render(document.parseLines(lines)).split('\n');
  print(_extract(htmlLines));
}

// Returns true if any rendered line mentions a recognized title.
// A `return` inside a `forEach` callback only exits the callback,
// so `any` is used instead.
bool _extract(List<String> htmlLines) => htmlLines.any(_checkIfTitle);

// Case-insensitive substring check against the recognized titles.
bool _checkIfTitle(String content) {
  final lower = content.toLowerCase();
  return _recognizedTitles.any((title) => lower.contains(title));
}

We can use this here
We should also check whether the matched title is actually a heading, probably using a regex.

isoos · 2021-03-16T09:32:50Z

@ramyak-mehra: Code like this may work for a large amount of text content, but in general we try to recognise the structure from the parsed syntax tree. One example of such processing is the current changelog updater code:
https://github.com/dart-lang/pub-dev/blob/master/app/lib/shared/markdown.dart#L322-L358

We would like to see generic processing similar to that, which would extract the hierarchical structure of the markdown (in typed classes) and then decide the content extraction based on that structure.

ramyak-mehra · 2021-03-16T09:42:36Z

@isoos If I understand correctly, we should have some kind of iterable or list, in the hierarchical order of the markdown, whose elements are typed classes (different classes for heading, paragraph, etc.), and we can make the decision from that?

isoos · 2021-03-16T14:25:48Z

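@ramyak-mehra: I'm thinking more of a tree, like:

// Assumes: import 'package:markdown/markdown.dart' as markdown;
class Section {
  final int level;
  final markdown.Node titleNode;
  final List<markdown.Node> contentNodes;
  final List<Section> children;

  Section({required this.level, required this.titleNode,
      required this.contentNodes, List<Section>? children})
      : children = children ?? [];
}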

Maybe further methods to extract the text content of titleNode and also to format contentNodes + optionally children to HTML.

ramyak-mehra · 2021-03-18T13:37:06Z

@isoos I was doing something like this github gist. It's probably not the best approach. I also found this node visitor, but I was not sure if it's the right way to go; I explored it a bit but was unable to fully understand it.

isoos · 2021-03-18T19:17:05Z

@ramyak-mehra: from a quick look, I think this code is at a very early stage, and possibly won't handle a use case like this:

## section-2

Content of section-2.

#### section-4

Content of section-4.

Which should result in the structure of:

Section(level: 2, titleNode: <... /*section-2*/ ...>, contentNodes: <...>, children: [
  Section(level: 4, titleNode: <... /*section-4*/ ...>, contentNodes: <...>),
]);

As you can see, the level is not the level of the tree node, but rather the level of the section title (e.g. h2 in html will be level: 2). Also, the sections should contain their logical content embedded...
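
The nesting described above can be sketched with a stack of open sections. This is an illustrative sketch, not pub.dev code: `Block` is a hypothetical stand-in for a parsed markdown node, `buildSections` is an assumed name, and the synthetic level-0 root is a design choice of this sketch.

```dart
/// Hypothetical stand-in for a parsed markdown node; real code would work
/// on the node types from package:markdown instead.
class Block {
  /// Heading level (1-6) for headings, null for any other block.
  final int? headingLevel;
  final String text;
  const Block(this.text, {this.headingLevel});
}

class Section {
  /// Level of the section title (h2 gives 2), not the depth in the tree.
  final int level;
  final Block? title; // null only for the synthetic root
  final List<Block> content = [];
  final List<Section> children = [];
  Section(this.level, this.title);
}

/// Builds a section tree from a flat list of blocks.
Section buildSections(List<Block> blocks) {
  final root = Section(0, null);
  // Stack of currently open sections, outermost first.
  final open = <Section>[root];
  for (final block in blocks) {
    final level = block.headingLevel;
    if (level == null) {
      // Non-heading content belongs to the innermost open section.
      open.last.content.add(block);
      continue;
    }
    // Close every open section at the same or a deeper title level.
    // The root has level 0, so it is never popped.
    while (open.last.level >= level) {
      open.removeLast();
    }
    final section = Section(level, block);
    open.last.children.add(section);
    open.add(section);
  }
  return root;
}
```

With this approach an h4 that directly follows an h2 becomes a child of that h2 while keeping level 4, and any content that appears before the first heading ends up in the root's content list.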

ramyak-mehra · 2021-03-18T19:22:09Z

It was just a starting point for me to move forward. I have one doubt: for an h1 section, are only the h2 sections its children, or can h2, h3, h4 ... h6 all be children?

ramyak-mehra · 2021-04-25T00:32:43Z

@isoos I wrote this script to make sections from parsed markdown: gist
It breaks when content is found before any heading. How should that case be handled? Also, what would be the next steps?
Analyze the titleNodes for specific keywords? What would the keywords be?
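
As a rough illustration of the keyword step, here is a sketch of a depth-first search over a section tree. Everything in it is assumed for illustration: `Section` is simplified to a string title, `findMatching` is a hypothetical name, and the keyword list is a placeholder loosely based on the titles suggested earlier in this thread, since the real list is still an open question.

```dart
/// Simplified stand-in for the section tree discussed in this thread;
/// titles are plain strings instead of markdown nodes.
class Section {
  /// Null for a synthetic root that holds content before the first heading.
  final String? title;
  final List<Section> children;
  const Section(this.title, [this.children = const []]);
}

/// Placeholder keywords (an assumption, not an agreed-upon list).
const keywords = ['install', 'setup', 'usage', 'example'];

/// Returns every section whose title contains a keyword, in document order.
List<Section> findMatching(Section section) {
  final title = section.title?.toLowerCase();
  return [
    if (title != null && keywords.any((k) => title.contains(k))) section,
    for (final child in section.children) ...findMatching(child),
  ];
}
```

Substring matching is deliberately loose here ("Installing" matches "install"); a stricter variant could compare whole normalized titles instead.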

sigurdm · 2024-11-14T13:43:28Z

It will be hard to detect this reliably and automatically. We probably want people to declare this in the pubspec somehow.

Maybe something like

package_kind: library|dev|global

If we know a package is a "dev" tool, the install page could suggest installing it as a dev-dependency. If it is "global", we can show how to global-install it.

To aid discovery, we could do a publish validation (or pana scoring) that suggests declaring these if your bin/ directory is non-empty.

Can a package be both a library and a tool? Probably.
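
A small sketch of how the install page and the validation might use such a declaration. Note that `package_kind` is only a proposal here and does not exist in pubspec today; `PackageKind`, `installHint` and `shouldSuggestKind` are hypothetical names. The kinds are modelled as a set because a package may be both a library and a tool.

```dart
/// Values of the proposed (not yet existing) `package_kind` pubspec field.
enum PackageKind { library, dev, global }

/// Returns the install command the install page could show for one kind.
String installHint(String name, PackageKind kind) {
  switch (kind) {
    case PackageKind.library:
      return 'dart pub add $name';
    case PackageKind.dev:
      return 'dart pub add dev:$name';
    case PackageKind.global:
      return 'dart pub global activate $name';
  }
}

/// Sketch of the validation idea: nudge authors to declare a kind when the
/// package ships executables in bin/ but declares nothing.
bool shouldSuggestKind({
  required bool hasBinFiles,
  required Set<PackageKind> declared,
}) =>
    hasBinFiles && declared.isEmpty;
```

For a package declaring both `library` and `global`, the page could simply list the hint for each declared kind.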

jonasfj added the P2 medium label Jun 2, 2020

jonasfj added this to the Backlog milestone Jun 2, 2020

Levi-Lesches mentioned this issue Dec 2, 2024

First pass at a Table of Contents generator from markdown #8348
