Database

Our project will contain the following C++ classes:

B Plus Tree class - Supporting necessary B Plus tree operations
- In the leaf node, each word will be map to a FilePointer object ( index file name, offset, numOfDupKey ).
  - Essentially, this is the <k, rid-listapproach and thus there will be no duplicate keys in the B Plus Tree
- Depending on the insertion of keywords, overflow page might be created in the index file, but these overlow pages is transparent to th e B Plus Tree.
TreeNode class
- A small class representing nodes in the BPlus Tree
- A node can be either internal node or leaf node depending on the isLeaf field
FileOperation class - Wrapper class for various C style file operations
- Might be merged to other classes in actual imlementation.
FilePointer class - A small class containing index info for a target word
- Will be used in the leaf node of the B Plus tree
Console class
- A while loop console reads in and parse commands as well as calling appropriate methods.
Index class - Performs the inital index building procedure on a large text file
- Mostly static methods
Database class - The main class of the project, containing instances of the B Plus Tree class, Console class.
- Rollback and Concurrency will be implemented here.
  - Current proposal is to use timestamp to implement rollback and concurrency.
- After deleting a keyword, an index page might contain empty records. We will NOT remove this page since removing it will destory the index structre.

requirements

Index File

First need to create an application to generate the index file for this data. You should implement support for the following commands:
- build index [text file] [index file] -p [page size]
  - text file: name of all document to be indexed. 1 document per line.
  - index file: name of the index file to be stored on disk
  - page size: name page or block size, default is 8096 bytes.

Your index must store the mapping between a word and the document name(s) that the given word maps to.

Supported Features

load [index file]

Loads a B+ index.

merge [index file]

Merge the current index file with the second index file, and update the index file on the disk.

quit

Close the console.

insert [document name]

Insert the word:document name pair into the index.

delete [document name]

Removes a document with the given name from the index.

count [keyword]

Counts a keyword by printing the number of documents that contain this keyword.

search [keyword1, keyword2, ...]

Search multiple key words. Prints the name(s) of documents that contain all of these keywords.

printpath [keyword]

Prints the search path (the id of pages that are accessed when we search the keyword in the B+ tree).

page [i]

Displays the contents of the ith page.

range [keyword1, keyword2]

Range query. Print all of the keywords between keyword1 and keyword2 where keyword1 < keyword2.

In addition to any output, we also print

    1. the time it takes to perform the operation,
    2. number of pages accessed, including pages in buffer,
    3. number of pages read from or written to disk.

ShaoyiZhang/Database

Database

requirements

Index File

Supported Features