/42cursus-minishell

A custom shell program emulating basic bash functionalities

Primary LanguageC

ascii-text-art (3)

Minishell

Table of Contents

  1. Overview
  2. Project Challenges
  3. Team Development Steps for Minishell
    1. Initial Planning and Setup
    2. Research and Design Phase
      1. Shell Operations
      2. External Functions
    3. Parsing and Input Management
      1. Command Reading
      2. Steps for building the parser and AST (Abstract Syntax Tree)
        1. Designing a Syntax Error Checker
        2. Syntax Error Checker Function
        3. Tokenization
        4. Token Structure
        5. Tokenization Strategy
        6. Tokenization Function
        7. Parsing
        8. AST Node Structure
        9. Parsing Function
        10. AST Visualization
    4. Command Execution
    5. Built-in Commands Implementation
    6. Signal Handling
    7. Environment Variable Expansion
    8. Testing and Debugging
  4. Project Structure
  5. Team Members
  6. Resources
  7. Acknowledgements

Overview

Our Minishell project represents a collaborative effort to build a custom shell program in C, inspired by the functionalities of bash, the Bourne Again SHell. As part of the 42 School curriculum, this project served as a unique opportunity for us to deepen our collective understanding of system processes, file descriptors, and the complexities of command-line interfaces. Through the development of our own shell, we engaged in a comprehensive exploration of Unix-based systems, gaining hands-on experience in process control, command interpretation, and environmental variable management.

Project Challenges

  • Parsing User Input: Accurately parsing user input, including handling spaces, quotes, and special characters, while distinguishing between command arguments and options.
  • Executing Commands: Implementing logic to search for and execute the right executable based on the PATH environment variable or a specified path, and managing execution of built-in commands versus external commands.
  • Signal Handling: Correctly handling Unix signals such as SIGINT (ctrl-C), SIGQUIT (ctrl-\), and EOF (ctrl-D), and ensuring the shell behaves similarly to bash in response to these signals.
  • Input/Output Redirection and Pipes: Implementing input and output redirection (<, >, >>, <<) and pipes (|) to allow for command chaining and data redirection, which involves managing file descriptors and process communication.
  • Environment Variable Expansion: Managing environment variables and supporting their expansion within commands, including the special case of $? to represent the exit status of the most recently executed command.
  • Memory Management: Ensuring efficient memory management throughout the shell, including preventing memory leaks especially in the context of the readline function and dynamically allocated resources.
  • Built-in Commands Implementation: Creating internal implementations of several built-in commands (echo, cd, pwd, export, unset, env, exit) that behave consistently with their bash counterparts.
  • Concurrency and Process Management: Handling concurrency through process creation and management, using system calls like fork, execve, wait, and pipe, and ensuring robust process control and signal handling.
  • Error Handling: Developing comprehensive error handling strategies to deal with invalid commands, permissions issues, nonexistent files, and other runtime errors.

Team Development Steps for Minishell

Step 1: Initial Planning and Setup

  • Repository Setup: Collaboratively set up the GitHub repository, ensuring a clear directory structure and branch strategy.
  • Makefile Creation: Makefile that includes rules for all, clean, fclean, re
  • Set up libft libray

Step 2: Research and Design Phase

This phase was about understanding the shell's operations, researching the external functions allowed, and dividing them among ourselves to research and explain their usage to the team.

Shell Operations

shell

terminals

External Functions

Reviewed the external functions allowed, dividing them among ourselves to research and explain their usage to the team.

Readline Functions:

Function Description
readline Reads a line from the standard input and returns it.
rl_clear_history Clears the readline history list.
rl_on_new_line Prepares readline for reading input on a new line.
rl_replace_line Replaces the content of the readline current line buffer.
rl_redisplay Updates the display to reflect changes to the input line.
add_history Adds the most recent input to the readline history list.

Standard I/O Functions:

Function Description
printf Outputs formatted data to stdout.

Memory Allocation Functions:

Function Description
malloc Allocates specified bytes of heap memory.
free Deallocates previously allocated memory.

File I/O Functions:

Function Description
write Writes data to a file descriptor.
access Checks calling process's permissions for a file or directory.
open Opens a file or device, returning a file descriptor.
read Reads data from a file descriptor into a buffer.
close Closes a previously opened file descriptor.

Process Control Functions:

Function Description
fork Creates a new process by duplicating the calling process.
wait Suspends execution of the calling process until one of its children terminates.
waitpid Waits for a specific child process to change state.
wait3 Waits for any child process to change state.
wait4 Waits for a specific child process to change state.
signal Handles or ignores signals sent to the process.
sigaction Handles or ignores signals sent to the process.
sigemptyset Initializes and adds signals to a signal set.
sigaddset Initializes and adds signals to a signal set.
kill Sends a signal to a process or a group of processes.
exit Terminates the calling process.

Directory Functions:

Function Description
getcwd Gets the current working directory.
chdir Changes the current working directory.
stat Returns information about a file or a file descriptor.
lstat Returns information about a file or a file descriptor.
fstat Returns information about a file or a file descriptor.
unlink Removes a link to a file.
execve Replaces the current process image with a new process image.

File Descriptor Functions:

Function Description
dup Duplicates a file descriptor.
dup2 Duplicates a file descriptor.
pipe Creates a pipe for inter-process communication.

Directory Functions:

Function Description
opendir Manages directory streams.
readdir Manages directory streams.
closedir Manages directory streams.

Error Handling Functions:

Function Description
strerror Returns a pointer to the textual representation of an error code.
perror Returns a pointer to the textual representation of an error code.

Terminal Functions:

Function Description
isatty Tests whether a file descriptor refers to a terminal.
ttyname Returns the name of the terminal associated with a file descriptor.
ttyslot Returns the name of the terminal associated with a file descriptor.
ioctl Controls device-specific input/output operations.
getenv Returns the value of an environment variable.
tcsetattr Sets and gets terminal attributes.
tcgetattr Sets and gets terminal attributes.
tgetent Terminal handling functions from the termcap library.
tgetflag Terminal handling functions from the termcap library.
tgetnum Terminal handling functions from the termcap library.
tgetstr Terminal handling functions from the termcap library.
tgoto Terminal handling functions from the termcap library.
tput Terminal handling functions from the termcap library.

Step 3: Parsing and Input Management

Command Reading

Readline Library: Implemented readline and integrated add_history ( GNU Readline)

brew install readline

Add the following to the Makefile:

READLINE_INCLUDE = $(shell brew --prefix readline)/include
READLINE_LIB = $(shell brew --prefix readline)/lib
INCLUDES = -I./includes -I./lib/libft -I$(READLINE_INCLUDE)
#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>

int main(void)
{
    char *input;

    while (1)
    {
        input = readline("minishell$ ");
        if (!input)
            break;
        if (*input)
            add_history(input);
        printf("Input: %s\n", input);
        free(input);
    }
    return 0;
}

readline(3) - Linux manual page

Steps for building the parser and AST (Abstract Syntax Tree)

  • Syntax Error Checking: This involves verifying whether the input string adheres to the shell's syntax rules. It checks for unclosed quotes, and misuse of redirection or pipe symbols. Syntax error checking ensures that the input can be correctly interpreted and executed.

    bash-quotes

  • Tokenization: This step breaks the input string into meaningful pieces, known as tokens. Tokens can be commands, arguments, redirection operators (<, >, >>, <<), pipe symbols (|), and environment variable identifiers. Tokenization simplifies the parsing process by converting the input string into a format that's easier to analyze.

  • Parsing: During parsing, tokens are analyzed to understand their syntactical relationship. This step involves constructing a representation of the input that reflects the user's intention. Depending on the complexity of the shell, this could mean building an abstract syntax tree (AST) or a simpler structure.

  • AST Construction:

    • Commands: Nodes in the AST represent commands along with their arguments. These nodes are fundamental to understanding what actions the shell needs to perform.
    • Redirections: Redirection nodes are created to represent input and output redirections. These nodes are attached to command nodes to modify how the commands read their input or write their output.
    • Pipelines: When a pipe is encountered, a pipeline node links command nodes together, indicating that the output of one command serves as the input to another.
    • Environment Variable Expansion: This can be handled as part of command parsing or immediately before command execution. It involves replacing environment variable identifiers with their corresponding values.
  • Execution AST: After the AST is built, it is traversed to execute the commands it represents. This involves:

    • Executing built-in commands directly within the shell.
    • Launching external commands by creating new processes.
    • Setting up redirections as specified by the redirection nodes.
    • Managing pipelines by connecting the stdout of one command to the stdin of the next.

a. Designing a Syntax Error Checker

the syntax error checker will be responsible for identifying and reporting syntax errors in the user input.

  1. Unclosed Quotes: verify that all quotes are properly closed.
  2. Misplaced Operators: Detect if pipes | are used incorrectly, such as being at the start or end of the input, or if multiple pipes are used consecutively.
  3. Logical Operators: Detect logical operators such as && and || and report them as not supported.
  4. Invalid Redirections: Detect invalid redirections, such as multiple consecutive redirections or redirections at the start or end of the input.

b. Syntax Error Checker Function

syntax_checker.c:

  • syntax_error_checker Function: Iterates through the input string, checking for syntax errors and reporting them if found.
  • has_unclosed_quotes Function: Checks for unclosed quotes in the input string.
  • has_invalid_redirections Function: Detects invalid redirections, such as multiple consecutive redirections or redirections at the start or end of the input.
  • has_misplaced_operators Function: Detects misplaced pipes and redirections.
  • has_logical_operators Function: Detects logical operators such as && and || and reports them as not supported.

c. Tokenization

The goal of the tokenization process is to break down the input string into a series of tokens that the parser can easily understand. These tokens represent commands, arguments, pipes, redirections, and other elements of the shell syntax.

  • Quotations: Distinguishing between single (') and double (") quotes.
  • Redirections: Recognizing input (<), output (>), append (>>), and here-documen(<<) redirections.
  • Pipes (|): Splitting commands to be executed in a pipeline.
  • Environment variables: Expanding variables starting with $.
  • Command separation: Identifying commands and their arguments.

d. Token Structure

// Token type enumeration
typedef enum e_token_type
{
    TOKEN_WORD,      // For commands and arguments
    TOKEN_PIPE,      // For '|'
    TOKEN_REDIR_IN,  // For '<'
    TOKEN_REDIR_OUT, // For '>'
    TOKEN_REDIR_APPEND, // For '>>'
    TOKEN_REDIR_HEREDOC, // For '<<'
    TOKEN_ENV_VAR, // For environment variables
}   t_token_type;

// Token structure
typedef struct s_token
{
    t_token_type type;
    char        *value;
    struct s_token *next;
}   t_token;

e. Tokenization Strategy

  1. Whitespace Handling: Skip whitespace outside quotes to separate commands and arguments.
  2. Quoting: Correctly handle single (') and double quotes ("), preserving the text exactly as is within single quotes and allowing for variable expansion and escaped characters within double quotes.
  3. Redirections and Pipes: Detect >, >>, <, <<, and |, treating them as separate tokens while managing any adjacent whitespace.

f. Tokenization Function

The tokenization function will iterate through the input string, identifying and categorizing segments according to the shell syntax.

tokenization.c:

  • tokenize_input Function: Iterates through the input, creating tokens for words separated by spaces.
  • handle_special_chars Function: Handles the special characters in the input string.
  • handle_word Function: Handles the words in the input string.
  • print_tokens Function: Prints the tokens to verify the tokenization process.

Example:

> ls -l | wc -l > output.txt | ls > output2.txt
Token:  ls                    | Type:  WORD                
--------------------------------------------------
Token:  -l                    | Type:  WORD                
--------------------------------------------------
Token:  |                     | Type:  PIPE                
--------------------------------------------------
Token:  wc                    | Type:  WORD                
--------------------------------------------------
Token:  -l                    | Type:  WORD                
--------------------------------------------------
Token:  >                     | Type:  REDIRECT_OUT        
--------------------------------------------------
Token:  output.txt            | Type:  WORD                
--------------------------------------------------
Token:  |                     | Type:  PIPE                
--------------------------------------------------
Token:  ls                    | Type:  WORD                
--------------------------------------------------
Token:  >                     | Type:  REDIRECT_OUT        
--------------------------------------------------
Token:  output2.txt           | Type:  WORD                
--------------------------------------------------

g. Parsing

The parsing process involves analyzing the tokens to understand their syntactical relationship. This step constructs a representation of the input that reflects the user's intention.

  • Command Parsing: Parsing commands and their arguments, creating command nodes in the AST.
  • Pipeline Parsing: Parsing pipeline tokens, creating pipeline nodes in the AST.
  • Redirection Parsing: Parsing redirection tokens, creating redirection nodes in the AST.
  • File Node Creation: Creating a file node for redirections in the AST.

h .AST Node Structure

The AST node structure will represent the input string in a way that reflects the user's intention. The AST will be composed of nodes that represent commands, arguments, redirections, and pipelines.

typedef struct s_ast_node
{
    t_node type type;
    char *args;
    struct s_ast_node *left;
    struct s_ast_node *right;
}   t_ast_node;

i. Parsing Function

The parsing function will iterate through the tokens, building an abstract syntax tree (AST) that represents the input string.

parse.c:

  • parse_tokens Function: Iterates through the tokens, building an abstract syntax tree (AST) that represents the input string.
  • parse_command Function: Parses a command and its arguments, creating a command node in the AST.
  • parse_pipeline Function: Parses pipeline tokens, creating pipeline nodes in the AST.
  • parse_redirection Function: Parses redirection tokens, creating redirection nodes in the AST.
  • create_file_node Function: Creates a file node for redirections in the AST.

j. AST Visualization

generate_ast_diagram/generate_ast_diagram.c:

  • generate_ast_diagram Function: Generates a visual representation of the AST, showing the structure of the input string.

Example:

ls -l | wc -l > output.txt | ls > output2.txt

graphviz

GraphvizOnline

Step 4: Command Execution

  • Built-in Commands: Implementing internal versions of several built-in commands (echo, cd, pwd, export, unset, env, exit) that behave consistently with their bash counterparts.

    builtins

  • External Commands: Implementing logic to search for and execute the right executable based on the PATH environment variable or a specified path.

    path

  • Process Creation: Using system calls like fork, execve, wait, and pipe to manage process creation and execution.

    processes (1)

  • Redirection and Pipes: Implementing input and output redirection (<, >, >>, <<) and pipes (|) to allow for command chaining and data redirection.

    redirects pipes bash-pipes

Step 5: Built-in Commands Implementation

builtins

  • echo: Outputs the arguments passed to it.

  • cd: Changes the current working directory.

  • pwd: Prints the current working directory.

  • export: Sets environment variables.

  • unset: Unsets environment variables

  • env: Prints the environment variables.

    environment-variables

  • exit: Exits the shell.

Step 6: Signal Handling

signals

  • SIGINT: Handling the ctrl-C signal to interrupt the shell's execution.
  • SIGQUIT: Handling the ctrl-\ signal to quit the shell.
  • EOF: Handling the ctrl-D signal to exit the shell.

Step 7: Environment Variable Expansion

  • Environment Variable Expansion: Managing environment variables and supporting their expansion within commands, including the special case of $? to represent the exit status of the most recently executed command.

    parameter-expansion

Step 8: Testing and Debugging

Project Structure

├── includes
│   └── minishell.h
├── lib
│   └── libft
├── Makefile
└── src
    ├── 01_input_validation
    ├── 02_tokenization
    ├── 03_parsing
    ├── 04_execution
    ├── 05_builtin
    ├── 06_signals
    ├── 07_env_var_expansion
    ├── main.c
    └── utils

Team Members

Resources

Acknowledgements

Thanks to reda ghouzraf, MTRX, Nasreddine hanafi, Khalid zerri for their help and support during the project.