induhiu/NLP-Keyword-Extraction-and-Analysis---Forkaia-Talent-Showcase

Talent Snapshot: Keyword Extraction and Analysis

Description

My first attempt at keyword extraction and analysis, using the spaCy, pandas and NumPy libraries together with Xu Liang's implementation of TextRank, which was a great help.

Dataset used

The dataset contains FORKAIA intern responses to their application questions. It could not be made publicly available.

Main procedure

I split the project into two parts: extraction and analysis. The extraction code is in "keyword_extraction.py" and the collected keywords are in "Keywords.xlsx". The main idea in the extraction was to remove stopwords and use the TextRank algorithm to identify keywords that could later be compared to roles of interest during the analysis. This procedure was used for every section except education. For education keywords, I collected data on the different types of degrees and checked for keywords related to that data.
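As an illustration of the extraction idea, here is a minimal, dependency-free sketch of TextRank keyword ranking. It is not the code in "keyword_extraction.py" and not Xu Liang's implementation; the function name `textrank_keywords`, the tiny stopword list, the window size and the damping factor are all assumptions chosen for the example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption); a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "i", "for", "with", "on", "is", "my", "at"}


def textrank_keywords(text, top_k=5, window=4, damping=0.85, iterations=50):
    """Rank the non-stopword tokens of `text` with a basic TextRank (hypothetical helper)."""
    # Lowercase, keep alphabetic tokens only, and drop stopwords.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

    # Undirected co-occurrence graph: link words that appear within `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style power iteration: each word shares its score among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    answer = "I managed marketing campaigns and led the marketing team for social media marketing"
    print(textrank_keywords(answer, top_k=3))
```

Words that co-occur with many other words end up with the highest scores, which is why a repeated, well-connected term in a response tends to surface as a keyword.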

The analysis code is in "keyword_analysis.py" and the results are in "Snapshot Figures and Analysis.xlsx". For the analysis, I used spaCy's most_similar function with the large model to compare the extracted keywords to a given role of interest and gauge their similarity. If the keywords were similar to the role, the person was counted as holding that role within FORKAIA.
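The counting step can be sketched roughly as follows. This is not the code in "keyword_analysis.py": it uses plain cosine similarity (the measure spaCy's vector similarity is based on) over placeholder vectors instead of calling spaCy, so that it runs without a model download. The function names, the toy three-dimensional vectors and the 0.6 threshold are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def matches_role(keyword_vectors, role_vector, threshold=0.6):
    """True if any of a person's keyword vectors is close enough to the role vector."""
    return any(cosine_similarity(vec, role_vector) >= threshold for vec in keyword_vectors)


def count_role_holders(people, role_vector, threshold=0.6):
    """Count how many people have at least one keyword similar to the role."""
    return sum(matches_role(vectors, role_vector, threshold) for vectors in people.values())


if __name__ == "__main__":
    # Toy vectors standing in for word embeddings (made up for this example).
    role = [1.0, 0.0, 0.2]
    people = {
        "person_a": [[0.9, 0.1, 0.3]],
        "person_b": [[0.0, 1.0, 0.0]],
    }
    print(count_role_holders(people, role))
```

In a real run the vectors would come from the word-embedding model rather than being written by hand, and the threshold would need tuning against the data.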

Results

Below are the results I obtained:
