This project aimed to analyze the component-level architecture of the TypeScript compiler.

License: MIT

INFO 443 Project 2 -- TypeScript

TypeScript Logo
Figure 1: The Logo of TypeScript

Project Overview

Introduction

About the Codebase

The codebase we examine in this project is TypeScript, a programming language that builds on JavaScript by adding syntax for type annotations and a compiler that checks them. Developers can use TypeScript to create their own applications, either client- or server-side.

In this project, we will only focus on the compiler portion of the TypeScript codebase.

Fun Facts:

  • The TypeScript compiler is self-hosting: it is written in TypeScript and compiles itself.
  • TypeScript is 11 years old, and has been rewritten twice.

Authors/Maintainers

This software was developed by Microsoft (a large company) and is currently maintained by the company's employees (628 contributors). Additionally, outside contributions are accepted as long as they follow Microsoft's Code of Conduct, and a number of maintainers appear to be in charge of approving these commits.

Learn more about the system

Official Github | Official Documentations | Basic Syntax | Compiler Details

Team Members

INFO 443 project authors: Alex Gherman, Henry Bao, Lisi Case, & Patrick Cheng

Development View

Components

Quick overview of the compilation process [1].

The process starts with preprocessing. The preprocessor figures out what files should be included in the compilation by following references (`/// <reference path=... />` tags, `require` and `import` statements).
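The reference-following step can be illustrated with a small, self-contained sketch. It is a toy model only: the in-memory file map, the simplified regular expressions, and the names `findReferences` and `collectFiles` are assumptions made for illustration and are not taken from the compiler's source.

```typescript
// Toy model of reference following. The file map, regexes, and function names
// are illustrative assumptions, not the compiler's actual preprocessing code.
type FileMap = Record<string, string>;

// Find the files a source text refers to via triple-slash references,
// import statements, and require calls (simplified patterns).
function findReferences(sourceText: string): string[] {
    const patterns = [
        /\/\/\/\s*<reference\s+path="([^"]+)"\s*\/>/g,
        /import\s+[^;]*?from\s+"([^"]+)"/g,
        /require\("([^"]+)"\)/g,
    ];
    const found: string[] = [];
    for (const pattern of patterns) {
        let match: RegExpExecArray | null;
        while ((match = pattern.exec(sourceText)) !== null) {
            found.push(match[1]);
        }
    }
    return found;
}

// Starting from the root files, follow references until no new file appears.
function collectFiles(roots: string[], files: FileMap): string[] {
    const included: string[] = [];
    const pending = roots.slice();
    while (pending.length > 0) {
        const fileName = pending.pop() as string;
        if (included.indexOf(fileName) !== -1 || !(fileName in files)) continue;
        included.push(fileName);
        for (const reference of findReferences(files[fileName])) pending.push(reference);
    }
    return included.sort();
}

const exampleFiles: FileMap = {
    "main.ts": 'import { helper } from "util.ts";\n/// <reference path="globals.d.ts" />',
    "util.ts": 'const shim = require("shim.ts");',
    "globals.d.ts": "declare const VERSION: string;",
    "shim.ts": "",
    "unused.ts": "",
};

// Only files reachable from main.ts are included; unused.ts is left out.
console.log(collectFiles(["main.ts"], exampleFiles));
```

The real preprocessor also resolves module names to paths on disk, which this sketch skips by using file names directly as references.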

The parser then generates AST Nodes. These are just an abstract representation of the user input in a tree format. A SourceFile object represents an AST for a given file with some additional information like the file name and source text.

The binder then passes over the AST nodes and generates and binds Symbols. One Symbol is created for each named entity. One subtlety is that several declaration nodes can name the same entity. That means that sometimes different Nodes will have the same Symbol, and each Symbol keeps track of its declaration Nodes. For example, a class and a namespace with the same name can merge and will have the same Symbol. The binder also handles scopes and makes sure that each Symbol is created in the correct enclosing scope.
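As a rough illustration of how several declaration nodes can share one Symbol, the sketch below binds a flat list of declarations into a symbol table. The types `DeclarationNode` and `SymbolEntry` and the function `bindDeclarations` are hypothetical and far simpler than the compiler's own; scopes are ignored entirely.

```typescript
// Toy model of declaration merging. Names and shapes are hypothetical and much
// simpler than the compiler's real Node and Symbol types; scopes are ignored.
interface DeclarationNode {
    kind: "class" | "namespace" | "function" | "variable";
    name: string;
}

interface SymbolEntry {
    name: string;
    declarations: DeclarationNode[];
}

type SymbolTable = Record<string, SymbolEntry>;

// Create one symbol per named entity. A later declaration with the same name
// is attached to the existing symbol instead of creating a new one.
function bindDeclarations(nodes: DeclarationNode[]): SymbolTable {
    const table: SymbolTable = Object.create(null);
    for (const node of nodes) {
        const existing = table[node.name];
        if (existing) {
            existing.declarations.push(node);
        } else {
            table[node.name] = { name: node.name, declarations: [node] };
        }
    }
    return table;
}

const boundSymbols = bindDeclarations([
    { kind: "class", name: "Shape" },
    { kind: "namespace", name: "Shape" },
    { kind: "function", name: "draw" },
]);

// Three declaration nodes produce two symbols; "Shape" tracks both of its declarations.
console.log(Object.keys(boundSymbols).length);
console.log(boundSymbols["Shape"].declarations.map((d) => d.kind));
```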

Generating a SourceFile (along with its Symbols) is done through calling the createSourceFile API.

So far, Symbols represent named entities as seen within a single file, but declarations from multiple files can merge, so the next step is to build a global view of all files in the compilation by building a Program.

A Program is a collection of SourceFiles and a set of CompilerOptions. A Program is created by calling the createProgram API.

From a Program instance a TypeChecker can be created. TypeChecker is the core of the TypeScript type system. It is the part responsible for figuring out relationships between Symbols from different files, assigning Types to Symbols, and generating any semantic Diagnostics (i.e. errors).

The first thing a TypeChecker will do is to consolidate all the Symbols from different SourceFiles into a single view, and build a single Symbol Table by "merging" any common Symbols (e.g. namespaces spanning multiple files).

After initializing the original state, the TypeChecker is ready to answer any questions about the program. Such "questions" might be:

  • What is the Symbol for this Node?
  • What is the Type of this Symbol?
  • What Symbols are visible in this portion of the AST?
  • What are the available Signatures for a function declaration?
  • What errors should be reported for a file?

The TypeChecker computes everything lazily; it only "resolves" the necessary information to answer a question. The checker will only examine Nodes/Symbols/Types that contribute to the question at hand and will not attempt to examine additional entities.
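The lazy behavior described above can be sketched with a memoized resolver. This is a toy model under stated assumptions: "types" are plain strings, an initializer is either a literal or the name of another symbol, and `createLazyChecker` and `getTypeOfSymbol` are hypothetical names unrelated to the real TypeChecker API.

```typescript
// Toy model of lazy resolution. Types are plain strings and all names are
// hypothetical; cyclic initializers are not handled in this sketch.
type Declarations = Record<string, { initializer: string }>;

function createLazyChecker(declarations: Declarations) {
    const cache: Record<string, string> = Object.create(null);
    const resolved: string[] = []; // records which symbols were actually examined

    function getTypeOfSymbol(symbolName: string): string {
        if (symbolName in cache) return cache[symbolName];
        const declaration = declarations[symbolName];
        if (!declaration) return "unknown";
        resolved.push(symbolName);
        const init = declaration.initializer;
        let type: string;
        if (/^\d+$/.test(init)) type = "number";
        else if (/^".*"$/.test(init)) type = "string";
        else type = getTypeOfSymbol(init); // the initializer names another symbol
        cache[symbolName] = type;
        return type;
    }

    return { getTypeOfSymbol, resolved };
}

const lazyChecker = createLazyChecker({
    a: { initializer: "1" },
    b: { initializer: "a" },
    c: { initializer: '"text"' },
});

// Asking about b resolves b and then a; c is never examined.
console.log(lazyChecker.getTypeOfSymbol("b"));
console.log(lazyChecker.resolved);
```

Asking the same question a second time is answered from the cache, so no symbol is examined twice.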

An Emitter can also be created from a given Program. The Emitter is responsible for generating the desired output for a given SourceFile; this includes .js, .jsx, .d.ts, and .js.map outputs.

Table of the main components of TypeScript Compiler

Component: Program.ts
Role & Relationship: Core compiler
Description:
  • Initializes and runs the necessary files involved in the compiler

Component: Parser.ts
Role & Relationship: Creates a syntax tree. Depends on: Scanner
Description:
  • Makes tokens from scanner into a tree with more meaning
  • Knows when the right JavaScript is in the wrong context

Component: Scanner.ts
Role & Relationship: Internally used and created by Parser
Description:
  • Converts text into syntax tokens
  • Knows only if the JavaScript itself is wrong

Component: Checker.ts
Role & Relationship: Checks the syntax tree. Depends on: Binder
Description:
  • Provides most of the compiler diagnostics
  • Checks each node of the syntax tree

Component: Binder.ts
Role & Relationship: Internally used by Checker
Description:
  • Turns syntax into symbols
  • Sets up flow containers based on code scope with flow conditionals within
  • Sees a full syntax tree

Component: Emitter.ts
Role & Relationship: Creates files. Depends on: Transformer
Description:
  • Takes a syntax tree and turns it into a file/files
  • Prints the tree into .js, .ts, and other file types

Component: Transformer.ts
Role & Relationship: Used by Emitter
Description:
  • Takes a syntax tree and transforms it in various ways to match TypeScript configurations
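The difference between what the scanner knows (whether the text itself is wrong) and what the parser knows (whether valid tokens appear in the wrong context) can be shown with a toy scanner and parser for expressions such as `1 + 2`. The token kinds, node shapes, and error messages below are assumptions for illustration and do not mirror scanner.ts or parser.ts.

```typescript
// Toy scanner and parser for expressions such as "1 + 2". Token kinds and
// error messages are illustrative assumptions, not the compiler's own.
type Token = { kind: "number" | "plus"; text: string };
type Expr = { kind: "literal"; value: number } | { kind: "add"; left: Expr; right: Expr };

// The scanner only looks at characters: it rejects text that cannot form a token.
function scan(text: string): Token[] {
    const tokens: Token[] = [];
    const pattern = /\s*(\d+|\+|\S)/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
        const lexeme = match[1];
        if (/^\d+$/.test(lexeme)) tokens.push({ kind: "number", text: lexeme });
        else if (lexeme === "+") tokens.push({ kind: "plus", text: lexeme });
        else throw new Error("Scanner: unexpected character '" + lexeme + "'");
    }
    return tokens;
}

// The parser looks at context: every token is valid, but their order may not be.
function parse(tokens: Token[]): Expr {
    let position = 0;
    function parseNumber(): Expr {
        const token = tokens[position];
        if (!token || token.kind !== "number") {
            throw new Error("Parser: expected a number at token " + position);
        }
        position++;
        return { kind: "literal", value: Number(token.text) };
    }
    let expr = parseNumber();
    while (position < tokens.length) {
        if (tokens[position].kind !== "plus") {
            throw new Error("Parser: expected '+' at token " + position);
        }
        position++;
        expr = { kind: "add", left: expr, right: parseNumber() };
    }
    return expr;
}

console.log(JSON.stringify(parse(scan("1 + 2"))));

// The first input fails in the scanner, the second in the parser.
for (const badInput of ["1 # 2", "1 + + 2"]) {
    try {
        parse(scan(badInput));
    } catch (error) {
        console.log((error as Error).message);
    }
}
```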

System Organization Diagram

TypeScript UML Structure Diagram
Figure 2: TypeScript UML Structure Diagram

Dependencies

| Other Libraries | Version |
| --- | --- |
| node | latest |
| azure-devops-node-api | 11.0.1 |
| browser-resolve | 1.11.2 |
| eslint | 7.12.1 |
| gulp | 4.0.0 |
| typescript | 4.5.5 |

The full list of dependencies is as follows:

    "@octokit/rest": "latest",
    "@types/browserify": "latest",
    "@types/chai": "latest",
    "@types/convert-source-map": "latest",
    "@types/glob": "latest",
    "@types/gulp": "^4.0.9",
    "@types/gulp-concat": "latest",
    "@types/gulp-newer": "latest",
    "@types/gulp-rename": "0.0.33",
    "@types/gulp-sourcemaps": "0.0.32",
    "@types/jake": "latest",
    "@types/merge2": "latest",
    "@types/microsoft__typescript-etw": "latest",
    "@types/minimatch": "latest",
    "@types/minimist": "latest",
    "@types/mkdirp": "latest",
    "@types/mocha": "latest",
    "@types/ms": "latest",
    "@types/node": "latest",
    "@types/node-fetch": "^2.3.4",
    "@types/q": "latest",
    "@types/source-map-support": "latest",
    "@types/through2": "latest",
    "@types/xml2js": "^0.4.0",
    "@typescript-eslint/eslint-plugin": "^4.28.0",
    "@typescript-eslint/experimental-utils": "^4.28.0",
    "@typescript-eslint/parser": "^4.28.0",
    "async": "latest",
    "azure-devops-node-api": "^11.0.1",
    "browser-resolve": "^1.11.2",
    "browserify": "latest",
    "chai": "latest",
    "chalk": "^4.1.2",
    "convert-source-map": "latest",
    "del": "5.1.0",
    "diff": "^4.0.2",
    "eslint": "7.12.1",
    "eslint-formatter-autolinkable-stylish": "1.1.4",
    "eslint-plugin-import": "2.22.1",
    "eslint-plugin-jsdoc": "30.7.6",
    "eslint-plugin-no-null": "1.0.2",
    "fancy-log": "latest",
    "fs-extra": "^9.0.0",
    "glob": "latest",
    "gulp": "^4.0.0",
    "gulp-concat": "latest",
    "gulp-insert": "latest",
    "gulp-newer": "latest",
    "gulp-rename": "latest",
    "gulp-sourcemaps": "latest",
    "merge2": "latest",
    "minimist": "latest",
    "mkdirp": "latest",
    "mocha": "latest",
    "mocha-fivemat-progress-reporter": "latest",
    "ms": "^2.1.3",
    "node-fetch": "^2.6.1",
    "plugin-error": "latest",
    "pretty-hrtime": "^1.0.3",
    "prex": "^0.4.3",
    "q": "latest",
    "source-map-support": "latest",
    "through2": "latest",
    "typescript": "^4.5.5",
    "vinyl": "latest",
    "vinyl-sourcemaps-apply": "latest",
    "xml2js": "^0.4.19"
    

| Components | Dependencies |
| --- | --- |
| program.ts | types.ts, parser.ts, scanner.ts, checker.ts, emitter.ts, transformer.ts |
| parser.ts | types.ts, scanner.ts |
| scanner.ts | types.ts |
| checker.ts | types.ts, binder.ts |
| binder.ts | types.ts |
| emitter.ts | types.ts, transformer.ts |
| transformer.ts | types.ts |
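One way to read the component dependencies listed above is as a directed graph. The sketch below encodes them as data and derives an order in which every file appears after the files it depends on. The empty entry for types.ts is an assumption (the list gives no dependencies for it), and `dependencyOrder` is a generic depth-first topological sort written for this illustration, not something the compiler itself runs.

```typescript
// Dependency data copied from the component list above. The empty entry for
// types.ts is an assumption, since no dependencies are listed for it.
const componentDependencies: Record<string, string[]> = {
    "program.ts": ["types.ts", "parser.ts", "scanner.ts", "checker.ts", "emitter.ts", "transformer.ts"],
    "parser.ts": ["types.ts", "scanner.ts"],
    "scanner.ts": ["types.ts"],
    "checker.ts": ["types.ts", "binder.ts"],
    "binder.ts": ["types.ts"],
    "emitter.ts": ["types.ts", "transformer.ts"],
    "transformer.ts": ["types.ts"],
    "types.ts": [],
};

// Return the components so that every file appears after the files it depends on.
function dependencyOrder(graph: Record<string, string[]>): string[] {
    const order: string[] = [];
    const state: Record<string, "visiting" | "done"> = Object.create(null);

    function visit(component: string): void {
        if (state[component] === "done") return;
        if (state[component] === "visiting") throw new Error("Cycle through " + component);
        state[component] = "visiting";
        for (const dependency of graph[component] || []) visit(dependency);
        state[component] = "done";
        order.push(component);
    }

    Object.keys(graph).forEach((component) => visit(component));
    return order;
}

// types.ts comes first because everything depends on it; program.ts comes last.
console.log(dependencyOrder(componentDependencies));
```

The fact that the sort succeeds without reporting a cycle reflects that the listed dependencies form a layered, acyclic structure.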

Source Code Structure (Codeline Model)

Codeline Model
Figure 3: Codeline Model

Testing & Configuration

Tests can be run manually from the TypeScript directory using Gulp tools after installing some dependencies.

Directory

Change to the TypeScript directory:

cd TypeScript

Dependencies

Install Gulp tools and dev dependencies:

npm install -g gulp
npm ci

Tests

Run tests for the compiler using the following command:

gulp runtests --runner=compiler          # Run tests for the compiler suite.

Additional commands to run tests of your choice and record results are as follows:

gulp tests                               # Build the test infrastructure using the built compiler.
gulp runtests                            # Run tests using the built compiler and test infrastructure.
gulp runtests --runner=<runnerName>      # Run tests for a specific suite (e.g., conformance, compiler, fourslash, project, user, and docker).
                                         # Note: You'll need to have the docker executable in your system path for the docker runner to work,
                                         # although we are not focusing on docker at the moment.
gulp runtests --tests=<testPath>         # Run a specific test.
gulp runtests-parallel                   # Like runtests, but split across multiple threads. Uses a number of threads equal to the system
                                         # core count by default. Use --workers=<number> to adjust this.
gulp baseline-accept                     # Replace the baseline test results with the results obtained from gulp runtests.
gulp lint                                # Runs eslint on the TypeScript source.
gulp help                                # List the above commands.

Applied Perspective: Evolution

Perspective Introduction

From the evolution perspective, we consider the ability of the system to be flexible in the face of inevitable change after deployment, and whether this flexibility is balanced against the cost of providing it. Here, evolution means the process of dealing with all of the possible types of change during the system development life cycle. This perspective is therefore relevant to most large-scale information systems.

Concerns

Dimensions of Change

Evolution in the context of TypeScript has to take several dimensions of change into account, since TypeScript is intended to be a flexible system that accepts change through contributions and even entire rewrites. Functional evolution is a major dimension: because TypeScript is constantly evolving and filling in gaps, the functionality of the system and of the subsystems it employs must be adaptable. Integration evolution is another important facet. TypeScript must remain compatible with all the systems it interacts with, as well as with its output target, JavaScript. As JavaScript-related systems evolve, TypeScript has to evolve with them, which puts pressure on the system.

Changes Driven by External Factors

Because TypeScript is built on JavaScript, it has to keep pace with every change to JavaScript in order to stay consistent with it.

TypeScript is a superset of JavaScript, where any JavaScript is also TypeScript, and the compiler layer in particular compiles TypeScript into JavaScript. When additional features get added to JavaScript (in the form of ECMAScript standards), the TypeScript compiler needs to be able to understand code using these updates in order to accurately verify said code. In fact, tests are built into TypeScript to make sure it conforms to these standards. This way, it's clear when TS needs to be modified so that users can continue using all the new functionality. (Source)

Staying up to date is important not only from the perspective of supporting users, but also from a business perspective. Although TypeScript is not for profit, Microsoft would presumably like to retain a wide market of users and be connected to a larger community of developers. Supporting new functionality from JavaScript helps TypeScript stay on the radar and continue being used.

Preservation of Knowledge

Up-to-date preservation of project knowledge, for example through documentation or a shared platform for communication, is vital for an open-source project like TypeScript. However, this is not an easy task to tackle. A project can allocate only so many resources to a software system that is continuously growing in scope and complexity. In addition, because contributors to open-source projects come and go, memories fade, and what was once known can become unknown. Recovering that knowledge later costs additional time and effort. Preservation of knowledge might not be a concern for TypeScript yet, but it will be more challenging to carry out once the development state becomes more settled. Thus, TypeScript should have a more robust automated system that records new changes and updates.

Activities

Characterize the Evolution Needs

Type: Functional Evolution

Magnitude: TypeScript supports the existing functionality of JavaScript and adds types on top of it. Types in turn allow TypeScript to offer additional features, such as compile-time checking and other features related to error detection.

Likelihood: Most of the stand-alone functionality for the compiler has been implemented, so future updates are likely to be small.

Timescale: These updates are not specific, but rather vague needs for changes sometime in the future depending on how techniques and algorithms for identifying and handling errors develop (as well as the second type of evolution described below).

Type: Integration Evolution

Magnitude: Since TypeScript is a superset of JavaScript, any JavaScript program is a valid TypeScript program, and TypeScript must therefore be able to understand and support all of JavaScript. Additionally, it supports JavaScript frameworks such as React.

Likelihood: Because JavaScript is still regularly being developed, TypeScript will continue to be developed as well.

Timescale: These updates, while not necessarily specified in advance, appear on a short time frame multiple times a year, dependent on when JavaScript makes its own updates.

Assess Ease of Evolution

Assessing Functional Evolution

Looking at TypeScript's history of evolution, and considering the robust functionality that has already been implemented in TypeScript as a language, the changes required to add further functionality would likely be fairly low-risk and inexpensive in time and cost. As an open-source project managed by Microsoft, this aspect of TypeScript's evolution is well accounted for. This can be seen in TypeScript's incremental updates, which have occurred every couple of months since its 1.0 release in 2014. The more recent functional changes have been described as system improvements and more narrowly scoped increases in functionality. This indicates that future changes will likely be similarly specific, small, and unobtrusive.

Assessing Integration Evolution

Integration is a higher-risk area of evolution for TypeScript because of its dependence on JavaScript updates. TypeScript cannot evolve to support a JavaScript change until that change has been released. This relationship leaves room for a period in which JavaScript has changed significantly while TypeScript has not yet made the changes necessary to support it, leading to errors in TypeScript's functionality. However, weighing this risk against the need for evolution, closing the gaps between JavaScript and TypeScript updates is a necessary risk to assume in order to maintain the core functionality of the program.

Styles & Patterns Used

Architectural Style

The goal of the TypeScript compiler is to transform a series of .js, .ts, .json, and .d.ts files into .js, .d.ts, and .js.map files accordingly. Viewed at a higher level, the compiler takes the source files through a sequential, linear process that follows the Pipe & Filter architectural style.

The TypeScript compiler can be generalized into 5 different operations:

| Name | Operation |
| --- | --- |
| Scanner | Reads text from left to right and creates syntax tokens |
| Parser | Examines tokens from the scanner and creates syntax trees accordingly |
| Binder | Cycles through syntax trees and links symbols |
| Checker | Compares symbols across all syntax trees and gives diagnostics |
| Emitter | Takes syntax trees and emits .js, .d.ts files |

Architecture Diagram of the TypeScript Compiler
Figure 4: Architecture Diagram of the TypeScript Compiler
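The Pipe & Filter style itself can be sketched in a few lines: each stage is a filter that consumes the previous stage's output, and a pipe connects two filters. The stages below (`scanWords`, `parseWords`, `emitNodes`) are trivial, hypothetical stand-ins and do not reflect what the real scanner, parser, or emitter do.

```typescript
// A filter takes the previous stage's output and produces the next stage's input.
type Filter<In, Out> = (input: In) => Out;

// Connect two filters with a pipe: the output of the first feeds the second.
function pipe<A, B, C>(first: Filter<A, B>, second: Filter<B, C>): Filter<A, C> {
    return (input) => second(first(input));
}

// Hypothetical stand-ins for the real stages; each does a trivial amount of work.
type WordNode = { kind: "number" | "identifier"; text: string };

const scanWords: Filter<string, string[]> = (text) =>
    text.split(/\s+/).filter((word) => word.length > 0);

const parseWords: Filter<string[], WordNode[]> = (words) =>
    words.map((text): WordNode => ({ kind: /^\d+$/.test(text) ? "number" : "identifier", text }));

const emitNodes: Filter<WordNode[], string> = (nodes) =>
    nodes.map((node) => node.kind + ":" + node.text).join(" ");

// The whole pipeline is itself a filter from source text to output text.
const compileToy = pipe(pipe(scanWords, parseWords), emitNodes);

console.log(compileToy("x 42")); // prints "identifier:x number:42"
```

Because every stage has the same shape, a stage can be replaced or a new one inserted without changing the others, which is the main appeal of this style.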

Software Design Patterns

Name: Adapter
Context: TypeScript/src/compiler/transformer.ts
Problem: TypeScript needs to be able to emit files in various compatible formats, such as JavaScript. The syntax trees created by TypeScript are compatible with TypeScript, not with other languages that a user may want to emit a file to, so the syntax tree must be altered before being passed on to the Emitter.
Solution: The transformer uses the Adapter pattern to transform a TypeScript-compatible syntax tree into a JavaScript-compatible syntax tree, which can then be further utilized by the user through whatever means they would like. The Adapter is a pattern that allows incompatible interfaces to collaborate, and as such, the transformer acts as an Adapter from TypeScript to other languages.

Name: Abstract Factory
Context: BaseNodeFactory @ line 8 in TypeScript/src/compiler/factory/baseNodeFactory.ts
Problem: TypeScript needs to be able to create a syntax tree for parsing JavaScript, which is composed of nodes of a variety of predefined types representing different parts of syntax.
Solution: To abstract out this creation, TypeScript uses a BaseNodeFactory to abstract out creating a group of nodes. This way, clients of this class don't need to worry about the nitty-gritty details of all of the ObjectAllocator's Node constructors and can create instances of nodes more simply and within a shared context.

Name: Builder
Context: builder.ts, builderPublic.ts, builderState.ts, and builderStatePublic.ts in TypeScript/src/compiler/
Problem: TypeScript needs to create different complex objects for a variety of tasks, such as managing program state changes and creating diagnostic functions and watch functions. These complex objects all have the same construction process.
Solution: The builder files provide the other components with a large number of builder interfaces to unify and simplify the process of creating complex objects, allowing clients to create different complex objects using the same construction process.

Name: Visitor
Context: NodeVisitor() @ line 8028 in TypeScript/src/compiler/types.ts
Problem: In TypeScript, each node represents a fundamental building block. These nodes need to be identified and transformed in a flexible manner so that new behaviors can be added to existing functionality without violating the Single Responsibility Principle.
Solution: The visitor interface declares a set of visiting methods that correspond to the accepted node that is passed in. The functions then process and possibly transform the node. This pattern permits nodes to be operated on without the caller figuring out their concrete classes.
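To make the Visitor entry more concrete, here is a minimal sketch of a function-style visitor over a toy syntax tree. The node shapes and the names `ToyVisitor`, `visitChildren`, and `foldConstants` are hypothetical; the compiler's own visitor utilities are considerably richer.

```typescript
// Toy AST and visitor. The node shapes and function names are hypothetical.
type ToyNode =
    | { kind: "number"; value: number }
    | { kind: "add"; left: ToyNode; right: ToyNode };

// A visitor receives a node and returns the node that should replace it.
type ToyVisitor = (node: ToyNode) => ToyNode;

// Apply the visitor to each child, rebuilding the parent only if a child changed.
function visitChildren(node: ToyNode, visitor: ToyVisitor): ToyNode {
    if (node.kind !== "add") return node;
    const left = visitor(node.left);
    const right = visitor(node.right);
    if (left === node.left && right === node.right) return node;
    return { kind: "add", left, right };
}

// New behavior (constant folding) is added without touching the node definitions.
const foldConstants: ToyVisitor = (node) => {
    const visited = visitChildren(node, foldConstants);
    if (visited.kind === "add" && visited.left.kind === "number" && visited.right.kind === "number") {
        return { kind: "number", value: visited.left.value + visited.right.value };
    }
    return visited;
};

// (1 + 2) + 3 folds to the single literal 6.
const toyTree: ToyNode = {
    kind: "add",
    left: { kind: "add", left: { kind: "number", value: 1 }, right: { kind: "number", value: 2 } },
    right: { kind: "number", value: 3 },
};
console.log(foldConstants(toyTree));
```

A second transformation could be written as another `ToyVisitor` and reuse `visitChildren` unchanged, which is the flexibility the pattern is meant to provide.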

Architectural Assessment

Principle: Single Responsibility Principle
Definition: An element should be responsible to one and only one actor.
Examples:
  1. The parser component is used to build syntax trees. In TypeScript/src/compiler/parser.ts, there are many functions that are each responsible for a small step of the syntax tree building process, for example the function findHighestListElementThatStartsAtPosition @ line 9230.
  2. TypeScript/src/compiler/sys.ts @ line 97 getCustomLevels()
Discussion: Complex functionality is regularly broken down into small steps. As the first example shows, methods are often designed to accomplish very specific and narrow tasks. However, the getCustomLevels() function is an example of a Single Responsibility Principle violation. The method gets the custom level, but it also contains code to set the custom level, thus giving it two separate purposes.

Principle: Open-Closed Principle
Definition: Elements should be open for extension but closed for modification.
Examples:
  1. Function parseTag() @ line 7989 in TypeScript/src/compiler/parser.ts
  2. The function createMissingNode() @ line 1804 in TypeScript/src/compiler/parser.ts
Discussion: The parseTag() function violates the Open-Closed Principle as it contains several specific functions for certain tags rather than abstracting these functions to a higher level. If a new type needed to be accommodated, this would require modification of existing code.
The createMissingNode() function violates the Open-Closed Principle as it uses a conditional expression to distinguish different SyntaxKinds rather than encapsulating that logic behind abstract functions. If a new SyntaxKind needed to be accommodated, this would require modification of existing code.

Principle: Interface Segregation Principle
Definition: Clients should not be forced to implement interfaces they do not use. Interfaces should not have methods that their clients do not need.
Examples:
  1. TypeScript/src/compiler/types.ts - property hasTrailingComma in two interfaces @ lines 1013 ~ 1021: MutableNodeArray and NodeArray
  2. TypeScript/src/compiler/types.ts property autoGenerateFlags @ line 1137 - Identifier interface, and @ line 1151 GeneratedIdentifier interface
Discussion: The TypeScript codebase uses interface variations to offer reduced/additional functionality for property modification. In the first example, there are two specific node arrays that allow clients to focus on the specialized functionality that fits their needs. Rather than being forced to use a mutable node array when the array's hasTrailingComma property will never be changed, a client can use the NodeArray interface instead, where that property is readonly and thus does not expose functionality that the client doesn't need. Similarly, in the second example, the autoGenerateFlags property may or may not be a read-only property. This allows a client to be intentional in choosing an interface whose functionality best reflects their goal for an object.

Principle: Principle of Separation of Concerns
Definition: Organize software into separate elements that are as independent as possible.
Examples:
  1. under TypeScript/src:
    • compiler/
    • debug/
    • server/
    • services/
    • ...
  2. under TypeScript/src/compiler/:
    • parser.ts
    • scanner.ts
    • checker.ts
    • binder.ts
    • emitter.ts
    • transformer.ts
Discussion: The entire TypeScript codebase is broken up into distinct modules, and the same can be said of the contents of those modules, including our focus, the compiler. This helps create an architecture that can be analyzed at various levels of detail, significantly assisting with comprehension as well as code organization.

Principle: Principle of Least Knowledge (Law of Demeter)
Definition: An object should never know the internal details of other objects.
Examples:
  1. TypeScript/src/compiler/utilities.ts @ line 4906
    getModifierFlagsWorker() returns property of node
  2. TypeScript/src/compiler/binder.ts @ line 18
    getModuleInstanceState() returns node.body property
  3. TypeScript/src/compiler/corePublic.ts
    • line 56: set method for ESMap
    • line 42: get key for ReadonlyESMap
Discussion: Throughout the codebase, getter and setter methods are used to allow clients to interact with the properties of a class or other object indirectly. This helps with encapsulation and abstraction, making sure that other objects don't have access to the internal details of a given object.
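The Interface Segregation discussion above, a read-only interface next to a mutable one, can be illustrated with a simplified pair of interfaces. `ReadonlyToyArray`, `MutableToyArray`, and the two helper functions are hypothetical stand-ins for illustration and are not the actual NodeArray and MutableNodeArray declarations.

```typescript
// Simplified stand-ins for a read-only / mutable interface pair.
// The element type and helper functions are hypothetical.
interface ReadonlyToyArray {
    readonly items: readonly string[];
    readonly hasTrailingComma: boolean;
}

interface MutableToyArray {
    items: string[];
    hasTrailingComma: boolean;
}

// A printer only reads, so it asks for the narrower read-only view.
function printList(list: ReadonlyToyArray): string {
    return list.items.join(", ") + (list.hasTrailingComma ? "," : "");
}

// A parser builds the list, so it needs the mutable view.
function markTrailingComma(list: MutableToyArray): void {
    list.hasTrailingComma = true;
}

const toyList: MutableToyArray = { items: ["a", "b"], hasTrailingComma: false };
markTrailingComma(toyList);

// A mutable list can be passed wherever the read-only view is expected.
console.log(printList(toyList)); // prints "a, b,"
```

Inside `printList`, an assignment such as `list.hasTrailingComma = false` would be rejected by the type checker, so the reading client cannot accidentally depend on mutation.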

System Improvement

System Repo Fork: https://github.com/henry-bao/TypeScript

Location: TypeScript/src/compiler/parser.ts @ line 7989, parseTag()
Problem: This function violates the Open-Closed Principle, as discussed earlier in our report. It contains several tag-specific functions that are not abstracted out, so if more types were added to TypeScript and needed to be accounted for, the function would have to be modified rather than extended.
Refactoring: First, we extracted these tag-specific functions into a separate function. This was necessary to address the abstraction problem of the parseTag function so that these tag-specific functions could be contained in a separate entity, making parseTag better closed to modification. Next, we created a dictionary containing the individual cases originally encapsulated by the switch within the parseTag function. The helper function containing the original extraction can then reference this dictionary in order to reduce redundancy.

Location: TypeScript/src/compiler/parser.ts @ line 8015, parseSimpleTagAllocator()
Problem: The cases within the switch are repetitive. Each of these repetitive calls varies by only one parameter, so it is bad practice to have a separate case for each one when the repetition could be abstracted for readability and overall simplicity of the code.
Refactoring: We created a dictionary that takes a tagName from parseSimpleTagAllocator and selects the specific function to pass as the second parameter of parseSimpleTag. Ultimately, this leaves parseSimpleTagAllocator with no switch and only a single parseSimpleTag call that uses the dictionary as a helper.

Location: TypeScript/src/compiler/parser.ts @ line 1804, createMissingNode()
Problem: This function used a nested ternary operator to create nodes during the TypeScript compiler's code parsing process. This made the code hard to read and introduced some redundancies (e.g., the multiple kind === comparisons).
Refactoring: The nested ternary operator was replaced with a combination of if and switch statements. This solution provides a clearer code structure for each SyntaxKind condition.

Location: TypeScript/src/compiler/parser.ts @ line 9002, visitNode()
Problem: This function relies on many lines of comments to explain code that is itself hard to read. The comments explain a conditional statement and the action the code takes accordingly. This is a poor use of comments, because good code should be self-explanatory.
Refactoring: We extracted a chunk of the code into a function checkChildrenIntersections() to better encapsulate the code.

We also extracted the conditional statement into a constant value named intersectsChangeRange for better readability.

Now we can delete the comments as we made the code more readable.
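The dictionary-based refactoring described above for the tag-parsing switch can be sketched in general terms as follows. This is not the code from our fork: the tag names, the `ToyTag` shape, and the functions `parseToyTag` and `parseToyTagByName` are hypothetical and only illustrate replacing a repetitive switch with a lookup table of factory functions.

```typescript
// Illustrative sketch only; names and shapes are hypothetical, not from parser.ts.
type ToyTag = { kind: string; tagName: string };
type TagFactory = (tagName: string) => ToyTag;

// Build a factory that differs from its siblings only in the kind it produces.
const makeFactory = (kind: string): TagFactory => (tagName) => ({ kind, tagName });

// The dictionary replaces one switch case per tag. Supporting a new simple tag
// means adding an entry here instead of editing the parsing function.
const tagFactories: Record<string, TagFactory> = {
    "public": makeFactory("PublicTag"),
    "private": makeFactory("PrivateTag"),
    "readonly": makeFactory("ReadonlyTag"),
};

// Stand-in for the shared parsing routine that takes the factory as a parameter.
function parseToyTag(tagName: string, createTag: TagFactory): ToyTag {
    return createTag(tagName);
}

// No switch remains: a single call that uses the dictionary as a helper.
function parseToyTagByName(tagName: string): ToyTag {
    const createTag = Object.prototype.hasOwnProperty.call(tagFactories, tagName)
        ? tagFactories[tagName]
        : makeFactory("UnknownTag");
    return parseToyTag(tagName, createTag);
}

console.log(parseToyTagByName("private"));
console.log(parseToyTagByName("custom"));
```

The `hasOwnProperty` check keeps inherited names such as `toString` from being mistaken for dictionary entries.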

Footnotes

  1. TypeScript Compiler Notes by Microsoft