wujr5/c-and-cpp-language-learning

Computer Science: week10 Lab Report 6

wujr5 opened this issue · 23 comments

wujr5 commented

Computer Science: week10 C String and Functions

Description

The C programming language has a set of functions implementing operations on strings (character strings and byte strings) in its standard library. Various operations, such as copying, concatenation, tokenization and searching are supported. For character strings, the standard library uses the convention that strings are null-terminated: a string of n characters is represented as an array of n + 1 elements, the last of which is a "NUL" character.

The only support for strings in the programming language proper is that the compiler translates quoted string constants into null-terminated strings.

From Wikipedia
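The n + 1 convention in the quoted passage can be seen directly in a small sketch. The names `countChars`, `kSample`, and `kSampleBytes` are hypothetical and exist only for this illustration; they are not part of the assignment.

```cpp
// Count characters up to, but not including, the terminating '\0'.
// countChars is a hypothetical helper used only for illustration.
unsigned int countChars(const char s[]) {
    unsigned int n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// A literal with 3 visible characters occupies 4 array elements:
// 'a', 'b', 'c', and the terminating '\0'.
const char kSample[] = "abc";
const unsigned int kSampleBytes = sizeof(kSample);  // n + 1 elements
```

Counting stops at the first '\0', so any characters stored after an embedded terminator are not part of the string.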

Strings are very important in programming. All input and output are strings, and any kind of data, including int, double, and so on, can be translated into a "string form". In other words, an integer or a double can be represented simply as a string.

Strings in the C programming language are implemented as a character array whose length is n + 1 (where n is the valid length of the string). So operations on strings in C can be treated as operations on arrays. The standard C header <string.h> provides some basic C string operations.

This header file defines several functions to manipulate C strings and arrays.

Functions

Copying:

memcpy: Copy block of memory (function)
memmove: Move block of memory (function)
strcpy: Copy string (function)
strncpy: Copy characters from string (function)

Concatenation:

strcat: Concatenate strings (function)
strncat: Append characters from string (function)

Comparison:

memcmp: Compare two blocks of memory (function)
strcmp: Compare two strings (function)
strcoll: Compare two strings using locale (function)
strncmp: Compare characters of two strings (function)
strxfrm: Transform string using locale (function)

Searching:

memchr: Locate character in block of memory (function)
strchr: Locate first occurrence of character in string (function)
strcspn: Get span until character in string (function)
strpbrk: Locate characters in string (function)
strrchr: Locate last occurrence of character in string (function)
strspn: Get span of character set in string (function)
strstr: Locate substring (function)
strtok: Split string into tokens (function)

Other:

memset: Fill block of memory (function)
strerror: Get pointer to error message string (function)
strlen: Get string length (function)

Macros

NULL: Null pointer (macro)

Types

size_t: Unsigned integral type (type)

From cplusplus.com

Separate source file and header file

In the header file, we write only the declarations of the functions and constants.
In the source file, we implement the functions (the function definitions).
For example:

func.h

#ifndef FUNC_H_INCLUDED
#define FUNC_H_INCLUDED
int plus(int a, int b);
#endif //FUNC_H_INCLUDED

func.cpp

#include "func.h"
int plus(int a, int b) {
    return a+b;
}

Basic Assignment (90pts):

This time your task is not simply to use the functions. For this assignment, you are given several function prototypes, and your job is to implement the functions according to the descriptions and samples.

#ifndef STRING_H_INCLUDED
#define STRING_H_INCLUDED
/*
   Note: the samples illustrate the function logic only; they are not exact syntax.
*/

/*
    function: myStrlen
    @description: Calculate the length of the string
    @input: const string s
    @output: the length of the string
    @sample input: "abcde\0abcde"
    @sample output: 5
    @Notice: Function myStrlen() should check whether the input array is NULL.
    If so, you should return 0. Note that strlen() in string.h does not check for NULL arrays.
*/
unsigned int myStrlen(const char s[]);

/*
    function: myStrcpy
    @description: copy one string to another
    @input: const string source
    @output: string destination
    @sample input: "a" "abc"
    @sample output: "abc"
    @Notice: You cannot assign the array directly; doing so will cause a runtime error.
*/
char * myStrcpy(char destination[], const char source[]);

/*
    function: myStrcat
    @input: const string source
    @output: string destination
    @sample input: "abcde" "abc"
    @sample output: "abcdeabc"
    @Notice: You cannot assign the array directly; doing so will cause a runtime error.
*/
char * myStrcat(char destination[], const char source[]);

/*
    function: myStrcmp
    @input: const string1 and const string2
    @output: if string1 is equal to string2, output 0
             if string1 is greater than string2, output an integer > 0
             if string2 is greater than string1, output an integer < 0
    @sample input: "abc" "abc"
    @sample output: 0
*/
int myStrcmp(const char str1[], const char str2[]);

/*
    function: MyStrfind
    @input: const string1 and const string2
    @output: if str2 is a substring of str1, output the index in str1 of its first occurrence
             otherwise, output -1, which indicates that it cannot be found
    @sample input: "abcde" "cde"
    @sample output: 2
*/
int MyStrfind(const char str1[], const char str2[]);

/*
    function: LeftRotateString
    @input: string buff, and an integer n giving the number of leading chars to move
    @output: the first n chars are moved to the end of the string
    @sample input: "abcdefg" 4
    @sample output: "efgabcd"
*/
void LeftRotateString(char *buff, int n);

/*
    function: myParseInt
    @input: a const string
    @output: an integer parsed from the string
    @sample input: "123"
    @sample output: 123
    @Notice: Pay attention to extreme conditions such as:
   1 NULL array input: the input is an array, and the program will crash if it accesses a NULL array,
       so check whether the array is NULL before using it.
   2 The sign: an integer may start with a '+' or '-' in addition to digits,
       so if the first character is '-',
       the resulting value must be converted to a negative integer.
   3 Illegal characters: the input string may contain characters that are not digits.
     Whenever you encounter such an illegal character, the program should stop converting.
   4 Integer overflow: the number is given in string form,
      so a very long input string is likely to overflow. Set the return
      value to INT_MAX or INT_MIN in that situation.
*/
int myParseInt(const char str[]);

/*
    function: myStrcontain
    @input: const string1 and const string2
    @output: a boolean value. If all chars in str2 are in str1, output true
             else output false
    @sample input1: "ABCD" "BAD"
    @sample output1: 1
    @sample input2: "ABCD" "BCE"
    @sample output2: 0
*/
int myStrcontain(const char str1[], const char str2[]);

#endif // STRING_H_INCLUDED

Requirements:

  1. You need to write a file myString.cpp containing the implementations of all the function prototypes.

  2. In myString.cpp you may not include any other standard C or C++ library; the file should follow this layout:

    #include "myString.h"
    #ifndef NULL
    #define NULL 0
    #endif // NULL
    
    /*code*/
    /*function definitions*/
    /*code end*/
  3. You may not change the function prototypes.

  4. Black-box testing will be applied again in this experiment. Please check your file names three times.

  5. Please do not add a main function to your file; add one only while you are testing.

Deep Thinking(10pts)

  1. For the function char * myStrcat(char destination[], const char source[]);, the array destination already holds the result of the function, so why does the function also return a char *? Why not void myStrcat(char destination[], const char source[]);? (Microsoft interview question)
  2. Many functions in the C standard library <string.h> do not check for NULL arrays and do not care about the size of the array. Why? And why does strnlen_s in C11 provide a size check? (Google interview question)
  3. We have a small C++ program:
#include <iostream>
using namespace std;

int main() {
    char s1[100] = "abcdefg";
    char * s2 = "abcdefg";
    s1[0] = 'z';
    s2[0] = 'z';

    return 0;
}

What will happen when we run the program and why? What if we remove s2[0] = 'z';?
  4. Why separate the source file and the header file? Think about the reasons. (Microsoft interview question)
  5. How can we improve LeftRotateString or myStrcontain with other algorithms? (optional question, Microsoft interview question)
     Hint: string reversal can be applied to LeftRotateString, and a hash table can be used for myStrcontain.
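As a rough illustration of the string-reversal hint, rotating left by n can be done with three in-place reversals: reverse the first n chars, reverse the rest, then reverse the whole string. The names `reverseRange` and `rotateLeftByReversal` are hypothetical, and the handling of NULL, n <= 0, and n >= length below is an assumption, since the assignment does not specify those cases.

```cpp
// Reverse buff[lo..hi] in place (both ends inclusive).
static void reverseRange(char *buff, int lo, int hi) {
    while (lo < hi) {
        char tmp = buff[lo];
        buff[lo] = buff[hi];
        buff[hi] = tmp;
        ++lo;
        --hi;
    }
}

// Move the first n chars of buff to the end using three reversals.
// Hypothetical helper; edge-case behavior is an assumption.
void rotateLeftByReversal(char *buff, int n) {
    if (buff == 0) return;
    int len = 0;
    while (buff[len] != '\0') ++len;
    if (len == 0 || n <= 0) return;
    n %= len;                        // assumption: rotating by len is a no-op
    if (n == 0) return;
    reverseRange(buff, 0, n - 1);    // "abcdefg" -> "dcbaefg" for n = 4
    reverseRange(buff, n, len - 1);  //           -> "dcbagfe"
    reverseRange(buff, 0, len - 1);  //           -> "efgabcd"
}
```

This uses O(1) extra space and touches each character a constant number of times, whereas shifting the string one position at a time, n times, costs O(n * length).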

Report:

Just answer the questions in the Deep Thinking part.

Submit (Please pay attention to this part)

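Submission: the class study representative collects the assignments and then emails them to me.

Submission method: FTP

Address: ftp://172.18.182.75/
Remote directory: Experiment/计科班/Experiment6
Naming convention: 13331314_叶嘉祺_EX6_v0.zip
Note: v0 is the version number of your submission. The first submission is v0, the second is v1, and so on.
Note: the files must be packaged and compressed in ZIP format.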

--13331314_叶嘉祺_EX6.zip
  |--myString.cpp
  |--report.pdf

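Note: the archive must not contain any folders; it may contain only a single level of files.

Late or make-up submissions: -5% of the total score for each day late.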

Deadline

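December 10, 18:00

Finally

1. The file names must be exactly right.
2. You first need to learn how to write a function library. That is the first task: how to write the .h file and the .cpp file separately, and then call the functions declared in the .h file from the main function. Put simply, you need to understand the mechanism behind #include.
3. For the .h file and the .cpp file, please follow the conventions in the requirements strictly. Do not change the given function prototypes or the .h file, and understand why not a single character may be changed. How do the system architect (the engineer who writes the .h file), the implementing engineer (who writes the .cpp file), the test engineer (who writes main.cpp for testing), and the user (who calls the library functions) cooperate and communicate with one another?
4. Please try writing a main function yourself first for testing (separate the files, then write it).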

Teacher, may I refer to the code of other libc implementations?

wujr5 commented

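@Icenowy First of all, just call me a senior classmate, haha. It's best to implement it yourself first and only then look at the library code; you'll understand it more deeply that way. If you go straight to the reference code, you won't get as much out of it as you would by writing it yourself step by step.

Actually, I've found a lot of interesting things while reading other libc implementations... uclibc's str functions mainly use pointers (that's still fairly normal), while glibc's strlen actually speeds things up by checking one uint32_t at a time instead of one char... and many libc implementations chose to write the str functions in assembly rather than C.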

SgLy commented

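@wujr5 Is there an efficiency requirement for myStrfind? Do we need to write KMP?

Do you think that's likely? (I can't even write KMP myself.)
Besides, at least in uclibc and BSD libc, strstr is the naive algorithm, and msvcrt is only slightly optimized.
As far as I know, among the mainstream libc implementations only glibc uses KMP.
But glibc's optimizations are on another level... As I said before, you may not even be able to read glibc's strlen (I admit I can't).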
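For reference, the naive approach mentioned in this comment can be sketched as follows. This is only an illustration, not the code of any particular libc; the name `naiveFind` is hypothetical, and returning -1 for NULL input and 0 for an empty pattern are assumptions that the assignment does not state.

```cpp
// Return the index of the first occurrence of pattern in text, or -1.
// naiveFind is a hypothetical name; an empty pattern is assumed to match at 0,
// and NULL input is assumed to yield -1.
int naiveFind(const char text[], const char pattern[]) {
    if (text == 0 || pattern == 0) return -1;
    for (int i = 0; ; ++i) {
        int j = 0;
        // Compare pattern against text starting at position i.
        while (pattern[j] != '\0' && text[i + j] == pattern[j]) ++j;
        if (pattern[j] == '\0') return i;    // whole pattern matched at i
        if (text[i + j] == '\0') return -1;  // text ran out before a match
    }
}
```

The worst case is O(length of text * length of pattern), which is the cost that KMP avoids by never re-reading text characters.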

SgLy commented

@Icenowy OK... but I've still decided to write a KMP (

@SgLy What a show-off... Teach me how to write KMP!

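@wujr5 There is a contradiction in this assignment: it later says to use INT_MAX, but including other system headers is not allowed, so no header provides INT_MAX.
INT_MAX is provided by <limits.h>.
I suggest adding #include <limits.h> to myString.h.

Note: limits.h is platform-specific, because different platforms have different versions of limits.h.

@wujr5 Also, writing a standard-conforming implementation of myStrcontain requires limits.h as well.

Of course, you could also state in the assignment that int is assumed to be 4 bytes and char 1 byte.

However, these things really are compiler-specific (for example, C compilers on DOS, gcc for the 8051, IAR, etc. use a 2-byte int, because an 8-bit microcontroller can't afford a 4-byte int).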
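One possible way to size a lookup table for myStrcontain without limits.h is sketched below. It relies on the fact that converting -1 to unsigned char yields the largest unsigned char value, and it assumes unsigned char is narrower than unsigned int. The names `kCharValues` and `containsAllChars` are hypothetical, and returning 0 for NULL input is an assumption.

```cpp
// Number of distinct unsigned char values, computed without <limits.h>:
// (unsigned char)-1 is the largest value an unsigned char can hold.
// Assumption: unsigned char is narrower than unsigned int, so + 1u cannot wrap.
const unsigned int kCharValues = (unsigned int)((unsigned char)-1) + 1u;

// Return 1 if every char of str2 occurs somewhere in str1, else 0.
// containsAllChars is a hypothetical name; NULL input is assumed to yield 0.
int containsAllChars(const char str1[], const char str2[]) {
    if (str1 == 0 || str2 == 0) return 0;
    bool seen[kCharValues] = {false};
    // Mark every character value that appears in str1.
    for (int i = 0; str1[i] != '\0'; ++i) {
        seen[(unsigned char)str1[i]] = true;
    }
    // Every character of str2 must have been marked.
    for (int i = 0; str2[i] != '\0'; ++i) {
        if (!seen[(unsigned char)str2[i]]) return 0;
    }
    return 1;
}
```

The table acts as a direct-address hash set, so the check runs in time proportional to the combined length of the two strings.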

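Can I call the functions I wrote as long as the header file and the source file are in the same folder?

@Icenowy There's no need; you can perfectly well compute the sizes of int and char, as well as INT_MAX.

Even if they can be computed: first, that is unnecessary in a real program; second, it hurts efficiency; third, if malloc is not allowed, you cannot allocate dynamic memory, so the computing approach cannot be used and the values must be fixed at the preprocessing stage.
Also, if the judging environment is one of i686-pc-linux-gnu, x86_64-linux-gnu, i686-pc-mingw32, or x86_64-pc-mingw64, then sizeof(int) is always 4 and sizeof(char) is always 1.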

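Also, limits.h doesn't really count as a library; it should be regarded as a language feature that the standard requires the compiler to provide (on Linux, limits.h lives in /usr/lib/gcc/$TRIPLET/include/ rather than /usr/include/, which also shows this). So including it should be allowed (a standard libc also needs limits.h to implement its functionality, because limits.h exists before libc and is already fixed when the compiler is built).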

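@Icenowy The assignment says you may not include it, so you may not; there's no need to agonize over whether to include limits.h. Besides, you can compute the values you need entirely at the preprocessing stage; there's no need at all to compute them on every run.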
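One way to obtain such limits without any header and without run-time cost is sketched below. It uses ordinary constant expressions that the compiler evaluates, rather than the preprocessor or sizeof. The names `kIntMax` and `kIntMin` are hypothetical, and the sketch assumes that int is two's complement with no padding bits, which holds on mainstream platforms.

```cpp
// Assumption: int is two's complement and has no padding bits, so the
// largest int equals the largest unsigned int shifted right by one bit.
const int kIntMax = (int)(~0u >> 1);  // all value bits set, sign bit clear
const int kIntMin = -kIntMax - 1;     // written this way to avoid overflow
```

On a platform with a 32-bit int these evaluate to 2147483647 and -2147483648.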

At least the standard doesn't allow it.

sizeof cannot be used at the preprocessing stage.

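@yyh14353191 First, the header file, the library source file, and the test file need to be in the same directory.
Second, if you use an IDE, the library source file (without main) and the test source file (with main) must be in the same project (also, Dev-Cpp's project support is terrible).
If you compile from the command line, the format is:
g++ myString.cpp test.cpp -o test
In other words, the test source file and the library source file must be compiled in the same command.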

SgLy commented

@wujr5 What should myParseInt return when it encounters an illegal string?

SgLy commented

@Icenowy KMP is so long to write and so easy to get wrong... on second thought, forget it (there's no extra credit anyway

Hahaha

It should stop parsing (libc's atoi stops at the first non-digit position).
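A minimal sketch of that stop-at-the-first-non-digit behavior, combined with the clamping on overflow that the assignment describes, might look like this. The name `parseIntSketch` is hypothetical. Returning 0 for NULL input or for a string with no digits, and not skipping leading whitespace, are assumptions; the limits are computed under the assumption of a two's complement int without padding bits.

```cpp
// parseIntSketch is a hypothetical name. Assumptions: NULL yields 0,
// leading whitespace is not skipped, and int is two's complement
// without padding bits.
int parseIntSketch(const char str[]) {
    const int kMax = (int)(~0u >> 1);
    const int kMin = -kMax - 1;
    if (str == 0) return 0;
    int i = 0;
    bool negative = false;
    if (str[i] == '+' || str[i] == '-') {
        negative = (str[i] == '-');
        ++i;
    }
    // Accumulate as a non-positive number so that kMin is representable.
    int value = 0;
    for (; str[i] >= '0' && str[i] <= '9'; ++i) {  // stop at first non-digit
        int digit = str[i] - '0';
        // Would value * 10 - digit fall below kMin? Check before computing.
        if (value < (kMin + digit) / 10) {
            return negative ? kMin : kMax;
        }
        value = value * 10 - digit;
    }
    if (negative) return value;
    return (value == kMin) ? kMax : -value;  // +2147483648 clamps to kMax
}
```

Checking the bound before each multiplication keeps the arithmetic inside the range of int, so the clamp never depends on signed overflow, which is undefined behavior.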

By the way, is there a required exact value for the positive and negative integers output here?

Does this mean we should return MAX or MIN when the data overflows? Why does it go wrong as soon as I return it?