Skip to content

Update gather functionality to support 0.4.1 of python-program-analysis #8219

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 46 commits into from
Nov 5, 2019
Merged
Show file tree
Hide file tree
Changes from 44 commits
Commits
Show all changes
46 commits
Select commit Hold shift + click to select a range
704135a
Main changes for update
greazer Oct 25, 2019
d803bdb
Gather working (using 0.4.1 of python analysis)
greazer Oct 25, 2019
ed0ebd2
Prep for PR
greazer Oct 25, 2019
dfc4319
Implement gather for native editor (first pass)
greazer Oct 26, 2019
a807f70
Merge remote-tracking branch 'origin/master' into dev/jimgries/update…
greazer Oct 26, 2019
6b4e1ea
Fix cell rangification
greazer Oct 26, 2019
b827c1a
Cleanup
greazer Oct 26, 2019
30f1fcc
Fix error
greazer Oct 27, 2019
338361f
Support knowng cell has been exec'd
greazer Oct 28, 2019
af781c1
Bug fix and show gathered notebook
greazer Oct 28, 2019
4e2d305
Make strings localizable
greazer Oct 28, 2019
4a01d0a
executedInCurrentKernal => executeKernalId
greazer Oct 29, 2019
7b13a51
Add ConnectedToNotebook
greazer Oct 29, 2019
839a75d
Simplify logic to make use of cellVM
greazer Oct 29, 2019
0e1b8fc
Attempt to fix multi-nb gather (failed)
greazer Oct 30, 2019
a0094e5
Support separate notebook gathers
greazer Oct 31, 2019
03df1b0
Merge master
greazer Oct 31, 2019
b58fe0e
PR adjustments
greazer Oct 31, 2019
c96acd6
More PR updates
greazer Oct 31, 2019
643e17c
More PR updates
greazer Oct 31, 2019
6a3ca5a
cleanup
greazer Oct 31, 2019
5039dee
Yet another PR update
greazer Oct 31, 2019
17001d6
Fixups
greazer Nov 1, 2019
9282b31
Default gather support to true
greazer Nov 1, 2019
2f0b7a7
Update
greazer Nov 1, 2019
92ac7f9
Gather cleaned up and working
greazer Nov 1, 2019
dbfacb3
String updates
greazer Nov 1, 2019
6cedfde
Update strings, make gatherToScript
greazer Nov 1, 2019
d9696d6
merge master
greazer Nov 1, 2019
b2ee0ca
Merge remote-tracking branch 'origin/master' into dev/jimgries/update…
greazer Nov 1, 2019
d11f679
fix compile errors (why don't I see these?)
greazer Nov 1, 2019
db2600f
Fix executionslicer
greazer Nov 2, 2019
d33ae55
Fix unit test
greazer Nov 2, 2019
ff1a43d
Fix use of gather
greazer Nov 2, 2019
0dc45b2
Fix most of the unit tests
greazer Nov 3, 2019
b380304
Disabling failing tests (for now)
greazer Nov 3, 2019
549d4d1
Fix remaining Unit Tests
greazer Nov 3, 2019
7bc674c
Fix functional tests
greazer Nov 4, 2019
c5eb676
Show file in gathered nb
greazer Nov 4, 2019
19fe23e
Add a note about gather being conservative.
greazer Nov 4, 2019
6d92142
Don't use the ICell.file property to get nb name
greazer Nov 4, 2019
0bfb438
Remove unecessary gather APIs
greazer Nov 4, 2019
361bd23
Put back old algorithm for generating code lenses
rchiodo Nov 4, 2019
27c79c2
Fixup type
rchiodo Nov 4, 2019
21eed9d
Gather doesn't work on markdown
rchiodo Nov 4, 2019
38aea3c
Turn off gather in markdown for interactive too
rchiodo Nov 4, 2019
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion build/ci/postInstall.js
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ var constants_1 = require("../constants");
* The solution is to modify the type definition file after `npm install`.
*/
function fixJupyterLabDTSFiles() {
var relativePath = path.join('node_modules', '@jupyterlab', 'coreutils', 'lib', 'settingregistry.d.ts');
var relativePath = path.join('node_modules', '@jupyterlab', 'services', 'node_modules', '@jupyterlab', 'coreutils', 'lib', 'settingregistry.d.ts');
var filePath = path.join(constants_1.ExtensionRootDir, relativePath);
if (!fs.existsSync(filePath)) {
throw new Error("Type Definition file from JupyterLab not found '" + filePath + "' (pvsc post install script)");
Expand Down
511 changes: 58 additions & 453 deletions package-lock.json

Large diffs are not rendered by default.

11 changes: 9 additions & 2 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -1693,9 +1693,15 @@
"scope": "resource"
},
"python.dataScience.enableGather": {
"type": "boolean",
"default": true,
"description": "Enable code gather for executed cells. For a gathered cell, that cell and only the code it depends on will be exported to a new notebook.",
"scope": "resource"
},
"python.dataScience.gatherToScript": {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems weird. Why not just always open a notebook if coming from a notebook and always open a file if coming from a file? For interactive window users, they'd never find this command and would be put off by us opening a notebook

"type": "boolean",
"default": false,
"description": "Enable code gathering for single cells in the interactive window (experimental).",
"description": "Gather code to a python script rather than a notebook.",
"scope": "resource"
},
"python.dataScience.codeLenses": {
Expand Down Expand Up @@ -2679,7 +2685,8 @@
"dependencies": {
"@blueprintjs/select": "3.10.0",
"@jupyterlab/services": "^3.2.1",
"@msrvida/python-program-analysis": "^0.2.6",
"@jupyterlab/coreutils": "^3.1.0",
"@msrvida/python-program-analysis": "^0.4.1",
"ansi-regex": "^4.1.0",
"arch": "^2.1.0",
"azure-storage": "^2.10.3",
Expand Down
4 changes: 3 additions & 1 deletion package.nls.json
Original file line number Diff line number Diff line change
Expand Up @@ -378,5 +378,7 @@
"DataScience.notebookNotFound" : "python -m jupyter notebook --version is not running",
"DataScience.findJupyterCommandProgress" : "Active interpreter does not support {0}. Searching for the best available interpreter.",
"DataScience.findJupyterCommandProgressCheckInterpreter": "Checking {0}.",
"DataScience.findJupyterCommandProgressSearchCurrentPath": "Searching current path."
"DataScience.findJupyterCommandProgressSearchCurrentPath": "Searching current path.",
"DataScience.gatheredScriptDescription": "# This file contains only the code required to produce the results of the gathered cell.\n",
"DataScience.gatheredNotebookDescriptionInMarkdown": "# Gathered Notebook\nGenerated from ```{0}```\n\nThis notebook contains only the code and cells required to produce the same results as the gathered cell.\n\nPlease note that the python analysis is quite conservative, so if it is unsure whether a line of code is necessary for execution, it will err on the side of including it."
}
1 change: 1 addition & 0 deletions src/client/common/types.ts
Original file line number Diff line number Diff line change
Expand Up @@ -324,6 +324,7 @@ export interface IDataScienceSettings {
collapseCellInputCodeByDefault: boolean;
maxOutputSize: number;
enableGather?: boolean;
gatherToScript?: boolean;
gatherRules?: IGatherRule[];
sendSelectionToInteractiveWindow: boolean;
markdownRegularExpression: string;
Expand Down
2 changes: 2 additions & 0 deletions src/client/common/utils/localize.ts
Original file line number Diff line number Diff line change
Expand Up @@ -294,6 +294,8 @@ export namespace DataScience {
export const findJupyterCommandProgress = localize('DataScience.findJupyterCommandProgress', 'Active interpreter does not support {0}. Searching for the best available interpreter.');
export const findJupyterCommandProgressCheckInterpreter = localize('DataScience.findJupyterCommandProgressCheckInterpreter', 'Checking {0}.');
export const findJupyterCommandProgressSearchCurrentPath = localize('DataScience.findJupyterCommandProgressSearchCurrentPath', 'Searching current path.');
export const gatheredScriptDescription = localize('DataScience.gatheredScriptDescription', '# This file contains the minimal amount of code required to produce the code cell gathered.\n');
export const gatheredNotebookDescriptionInMarkdown = localize('DataScience.gatheredNotebookDescriptionInMarkdown', '## This notebook was generated for a cell gathered from {0}.');
}

export namespace DebugConfigStrings {
Expand Down
59 changes: 52 additions & 7 deletions src/client/datascience/cellFactory.ts
Original file line number Diff line number Diff line change
Expand Up @@ -107,10 +107,56 @@ export function hasCells(document: TextDocument, settings?: IDataScienceSettings
return false;
}

export function generateCellRanges(document: TextDocument, settings?: IDataScienceSettings): { range: Range; title: string; cell_type: string }[] {
// CellRange is used as the basis for creating new ICells. We only use it in this file.
interface ICellRange {
range: Range;
title: string;
cell_type: string;
}

export function generateCellsFromString(source: string, settings?: IDataScienceSettings): ICell[] {
const lines: string[] = source.splitLines({ trim: false, removeEmptyEntries: false });

// Find all the lines that start a cell
const matcher = new CellMatcher(settings);
const starts: { startLine: number; title: string; code: string; cell_type: string }[] = [];
let currentCode: string | undefined;
for (let index = 0; index < lines.length; index += 1) {
const line = lines[index];
if (matcher.isCell(line)) {
if (starts.length > 0 && currentCode) {
const previousCell = starts[starts.length - 1];
previousCell.code = currentCode;
}
const results = matcher.exec(line);
if (results !== undefined) {
starts.push({
startLine: index + 1,
title: results,
cell_type: matcher.getCellType(line),
code: ''
});
}
currentCode = undefined;
}
currentCode = currentCode ? `${currentCode}\n${line}` : line;
}

if (starts.length >= 1 && currentCode) {
const previousCell = starts[starts.length - 1];
previousCell.code = currentCode;
}

// For each one, get its text and turn it into a cell
return Array.prototype.concat(...starts.map(s => {
return generateCells(settings, s.code, '', s.startLine, false, uuid());
}));
}

export function generateCellRangesFromDocument(document: TextDocument, settings?: IDataScienceSettings): ICellRange[] {
// Implmentation of getCells here based on Don's Jupyter extension work
const matcher = new CellMatcher(settings);
const cells : { range: Range; title: string; cell_type: string }[] = [];
const cells: ICellRange[] = [];
for (let index = 0; index < document.lineCount; index += 1) {
const line = document.lineAt(index);
if (matcher.isCell(line.text)) {
Expand Down Expand Up @@ -140,12 +186,11 @@ export function generateCellRanges(document: TextDocument, settings?: IDataScien
}

export function generateCellsFromDocument(document: TextDocument, settings?: IDataScienceSettings): ICell[] {
// Get our ranges. They'll determine our cells
const ranges = generateCellRanges(document, settings);
const ranges = generateCellRangesFromDocument(document, settings);

// For each one, get its text and turn it into a cell
return Array.prototype.concat(...ranges.map(r => {
const code = document.getText(r.range);
return generateCells(settings, code, document.fileName, r.range.start.line, false, uuid());
return Array.prototype.concat(...ranges.map(cr => {
const code = document.getText(cr.range);
return generateCells(settings, code, '', cr.range.start.line, false, uuid());
}));
}
1 change: 1 addition & 0 deletions src/client/datascience/constants.ts
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,7 @@ export enum Telemetry {
NotebookOpenCount = 'DATASCIENCE.NATIVE.NOTEBOOK_OPEN_COUNT',
NotebookOpenTime = 'DS_INTERNAL.NATIVE.NOTEBOOK_OPEN_TIME',
SessionIdleTimeout = 'DATASCIENCE.JUPYTER_IDLE_TIMEOUT',
NotebookExecutionActivated = 'DATASCIENCE.NOTEBOOK.EXECUTION.ACTIVATED',
JupyterNotInstalledErrorShown = 'DATASCIENCE.JUPYTER_NOT_INSTALLED_ERROR_SHOWN',
UserInstalledJupyter = 'DATASCIENCE.USER_INSTALLED_JUPYTER',
UserDidNotInstallJupyter = 'DATASCIENCE.USER_DID_NOT_INSTALL_JUPYTER'
Expand Down
4 changes: 2 additions & 2 deletions src/client/datascience/editor-integration/codeLensFactory.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ import { IFileSystem } from '../../common/platform/types';
import { IConfigurationService } from '../../common/types';
import * as localize from '../../common/utils/localize';
import { noop } from '../../common/utils/misc';
import { generateCellRanges } from '../cellFactory';
import { generateCellRangesFromDocument } from '../cellFactory';
import { CodeLensCommands, Commands } from '../constants';
import { InteractiveWindowMessages } from '../interactive-common/interactiveWindowTypes';
import { ICell, ICellHashProvider, ICodeLensFactory, IFileHashes, IInteractiveWindowListener } from '../types';
Expand Down Expand Up @@ -63,7 +63,7 @@ export class CodeLensFactory implements ICodeLensFactory, IInteractiveWindowList
}

public createCodeLenses(document: TextDocument): CodeLens[] {
const ranges = generateCellRanges(document, this.configService.getSettings().datascience);
const ranges = generateCellRangesFromDocument(document, this.configService.getSettings().datascience);
const commands = this.enumerateCommands();
const hashes = this.configService.getSettings().datascience.addGotoCodeLenses ? this.hashProvider.getHashes() : [];
const codeLenses: CodeLens[] = [];
Expand Down
4 changes: 2 additions & 2 deletions src/client/datascience/editor-integration/decorator.ts
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ import { IExtensionSingleActivationService } from '../../activation/types';
import { IDocumentManager } from '../../common/application/types';
import { PYTHON_LANGUAGE } from '../../common/constants';
import { IConfigurationService, IDisposable, IDisposableRegistry } from '../../common/types';
import { generateCellRanges } from '../cellFactory';
import { generateCellRangesFromDocument } from '../cellFactory';

@injectable()
export class Decorator implements IExtensionSingleActivationService, IDisposable {
Expand Down Expand Up @@ -100,7 +100,7 @@ export class Decorator implements IExtensionSingleActivationService, IDisposable
const settings = this.configuration.getSettings().datascience;
if (settings.decorateCells && settings.enabled) {
// Find all of the cells
const cells = generateCellRanges(editor.document, this.configuration.getSettings().datascience);
const cells = generateCellRangesFromDocument(editor.document, this.configuration.getSettings().datascience);

// Find the range for our active cell.
const currentRange = cells.map(c => c.range).filter(r => r.contains(editor.selection.anchor));
Expand Down
68 changes: 20 additions & 48 deletions src/client/datascience/gather/gather.ts
Original file line number Diff line number Diff line change
@@ -1,29 +1,22 @@
import { DataflowAnalyzer } from '@msrvida/python-program-analysis';
import { Cell as ICell, LogCell } from '@msrvida/python-program-analysis/dist/es5/cell';
import { CellSlice } from '@msrvida/python-program-analysis/dist/es5/cellslice';
import { ExecutionLogSlicer } from '@msrvida/python-program-analysis/dist/es5/log-slicer';
import { CellSlice, DataflowAnalyzer, ExecutionLogSlicer } from '@msrvida/python-program-analysis';
import { Cell as IGatherCell } from '@msrvida/python-program-analysis/dist/es5/cell';

import { inject, injectable } from 'inversify';
// tslint:disable-next-line: no-require-imports
import cloneDeep = require('lodash/cloneDeep');
import { IApplicationShell, ICommandManager } from '../../common/application/types';
import { traceInfo } from '../../common/logger';
import { IConfigurationService, IDisposableRegistry } from '../../common/types';
import * as localize from '../../common/utils/localize';
// tslint:disable-next-line: no-duplicate-imports
import { Common } from '../../common/utils/localize';
import { noop } from '../../common/utils/misc';
import { CellMatcher } from '../cellMatcher';
import { concatMultilineStringInput } from '../common';
import { Identifiers } from '../constants';
import { CellState, ICell as IVscCell, IGatherExecution, INotebookExecutionLogger } from '../types';
import { CellState, ICell as IVscCell, IGatherExecution } from '../types';

/**
* An adapter class to wrap the code gathering functionality from [microsoft/python-program-analysis](https://www.npmjs.com/package/@msrvida/python-program-analysis).
*/
@injectable()
export class GatherExecution implements IGatherExecution, INotebookExecutionLogger {
private _executionSlicer: ExecutionLogSlicer;
export class GatherExecution implements IGatherExecution {
private _executionSlicer: ExecutionLogSlicer<IGatherCell>;
private dataflowAnalyzer: DataflowAnalyzer;
private _enabled: boolean;

Expand All @@ -44,52 +37,36 @@ export class GatherExecution implements IGatherExecution, INotebookExecutionLogg

traceInfo('Gathering tools have been activated');
}
public logExecution(vscCell: IVscCell): void {
const gatherCell = convertVscToGatherCell(vscCell);
Copy link
Member

@IanMatthewHuff IanMatthewHuff Nov 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm a bit nervous that we lost our cloneDeep here. Technically convert should not change the vscCell, but I'm worried someone else might tweak it. Was it a perf issue? #ByDesign

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked with Rich, we'll take care of this before calling postExecute, so disregard this.


In reply to: 342290179 [](ancestors = 342290179)


public async preExecute(_vscCell: IVscCell, _silent: boolean): Promise<void> {
// This function is just implemented here for compliance with the INotebookExecutionLogger interface
noop();
if (gatherCell) {
this._executionSlicer.logExecution(gatherCell);
}
}

public async postExecute(vscCell: IVscCell, _silent: boolean): Promise<void> {
if (this._enabled) {
// Don't log if vscCell.data.source is an empty string or if it was
// silently executed. Original Jupyter extension also does this.
if (vscCell.data.source !== '' && !_silent) {
// First make a copy of this cell, as we are going to modify it
const cloneCell: IVscCell = cloneDeep(vscCell);

// Strip first line marker. We can't do this at JupyterServer.executeCodeObservable because it messes up hashing
const cellMatcher = new CellMatcher(this.configService.getSettings().datascience);
cloneCell.data.source = cellMatcher.stripFirstMarker(concatMultilineStringInput(vscCell.data.source));

// Convert IVscCell to IGatherCell
const cell = convertVscToGatherCell(cloneCell) as LogCell;

// Call internal logging method
this._executionSlicer.logExecution(cell);
}
}
public async resetLog(): Promise<void> {
this._executionSlicer.reset();
}

/**
* For a given code cell, returns a string representing a program containing all the code it depends on.
*/
public gatherCode(vscCell: IVscCell): string {
// sliceAllExecutions does a lookup based on executionEventId
const cell = convertVscToGatherCell(vscCell);
if (cell === undefined) {
const gatherCell = convertVscToGatherCell(vscCell);
if (!gatherCell) {
return '';
}

// Get the default cell marker as we need to replace #%% with it.
const defaultCellMarker = this.configService.getSettings().datascience.defaultCellMarker || Identifiers.DefaultCodeCellMarker;

// Call internal slice method
const slices = this._executionSlicer.sliceAllExecutions(cell);
const slices = this._executionSlicer.sliceAllExecutions(gatherCell.persistentId);
const program = slices.length > 0 ? slices[0].cellSlices.reduce(concat, '').replace(/#%%/g, defaultCellMarker) : '';

// Add a comment at the top of the file explaining what gather does
const descriptor = '# This file contains the minimal amount of code required to produce the code cell you gathered.\n';
const descriptor = localize.DataScience.gatheredScriptDescription();
return descriptor.concat(program);
}

Expand Down Expand Up @@ -122,7 +99,7 @@ export class GatherExecution implements IGatherExecution, INotebookExecutionLogg
/**
* Accumulator to concatenate cell slices for a sliced program, preserving cell structures.
*/
function concat(existingText: string, newText: CellSlice) {
function concat(existingText: string, newText: CellSlice): string {
// Include our cell marker so that cell slices are preserved
return `${existingText}#%%\n${newText.textSliceLines}\n\n`;
}
Expand All @@ -131,14 +108,11 @@ function concat(existingText: string, newText: CellSlice) {
* This is called to convert VS Code ICells to Gather ICells for logging.
* @param cell A cell object conforming to the VS Code cell interface
*/
function convertVscToGatherCell(cell: IVscCell): ICell | undefined {
function convertVscToGatherCell(cell: IVscCell): IGatherCell | undefined {
// This should always be true since we only want to log code cells. Putting this here so types match for outputs property
if (cell.data.cell_type === 'code') {
const result: ICell = {
const result: IGatherCell = {
// tslint:disable-next-line no-unnecessary-local-variable
id: cell.id,
gathered: false,
dirty: false,
text: cell.data.source,

// This may need to change for native notebook support since in the original Gather code this refers to the number of times that this same cell was executed
Expand All @@ -147,9 +121,7 @@ function convertVscToGatherCell(cell: IVscCell): ICell | undefined {

// This may need to change for native notebook support, since this is intended to persist in the metadata for a notebook that is saved and then re-loaded
persistentId: cell.id,
outputs: cell.data.outputs,
hasError: cell.state === CellState.error,
is_cell: true
hasError: cell.state === CellState.error
// tslint:disable-next-line: no-any
} as any;
return result;
Expand Down
Loading