This is a tutorial that help you practice MySQL syntax and interactive with nodejs.
In this repos, we will use WordNet as our source files. And create tables and read them from disk to database.
./dictstore all source files which were downloaded from WordNet.- however, some files are very big, if you want to get a priview of them, you can checkout
./preview. This will store top 20 lines of source files. ./nodejsstore scripts written by nodejs which can help you read those files to mysql.
- You should have MySQL installed and understand basic syntax.
- Familiar with nodejs.
at first, we need get a overview about wordnet and our database.
Of course, we need create database in MySQL to store our tables. Open you MySQL shell, and execute follow commands.
CREATE DATABASE wordnet;
USE wordnet;In English, each word may have many deformation.
Such as farther -> far further -> far.
In adj.exc verb.exc adv.exc and noun.exc, files store those relationship, you can checkout those preview. So we need to create a table to store those information.
So what column should we have?
Well, we need a deformation field and a normal filed to save those. And then , you can add an extra filed to indicate it is a adjective or verb. But I personally think it is not necessary.
So we can do like this:
CREATE TABLE IF NOT EXISTS deformation(
deformation VARCHAR(30) NOT NULL,
normal VARCHAR(30) NOT NULL,
UNIQUE KEY (deformation, normal));
Yeah, we just create our table. If we want to insert some data to this table we may use:
INSERT INTO deformation(deformation, normal)
VALUES ('artier', 'arty');Now , you can checkout ./nodejs/README.md#deformation to implement this will nodejs.
index.noun index.adj index.verb and index.adv give each word an ID, each line of those files look like this:
frisch n 3 1 @ 3 0 10983172 10983007 10982870
frischtell us which word it isnindicate is a noun. inindex.noun, all of them isn; inindex.adv, all of them isr.3indicates it has three means; index of then are last three numbers.1 @ 3 0is useless.10983172 10983007 10982870is the indexes of this word in data.noun or data.*.
So, in this case, we may need three fields:
- word: store out word, such as
frisch - class: indicates it is a verb or adjective
- means_index: such as
10983172 10983007 10982870. Because MySQL can not store array, so we need combine them as a string.
CREATE TABLE IF NOT EXISTS word_index (
word VARCHAR(300) NOT NULL,
class CHAR(1) NOT NULL,
means_index VARCHAR(3000) NOT NULL,
UNIQUE KEY (word, class));I set max size of means_index to 3000 because there is a complex word which is break, It has many means.
INSERT INTO word_index(word, class, means_index)
VALUES ('frisch', 'n', '10983172 10983007 10982870');Now this is our most valueful part, the real means of any word. There are stored at data.noun data.verb ...
this is a random line of source file.
00001740 03 n 01 entity 0 003 ~ 00001930 n 0000 ~ 00002137 n 0000 ~ 04424418 n 0000 | that which is perceived or known or inferred to have its own distinct existence (living or nonliving)
00001740is id of a wordentityis word itself- and there are many complex field, we don't care
- after symbol
|, is means of those verb, so we may write sql like this.
CREATE TABLE IF NOT EXISTS word (
word VARCHAR(300) NOT NULL,
class CHAR(1) NOT NULL,
means VARCHAR(6000) NOT NULL);
INSERT INTO word(word, class, means_index)
VALUES ('entity', 'n', 'that which is perceived or known or inferred to have its own distinct existence (living or nonliving)');Ok, we just analysis the table fields and some SQL statement. Now you can checkout ./nodejs, let's implement them!!!