6.S081-Lab: Xv6 and Unix utilities 操作系统实验分享

2830

2022.06.18

2022.06.19

发布于未知归属地

学习操作系统的时候，买了一本人民邮电出版社的-Operating Systems: Three Easy Pieces 中译版（操作系统导论）。
从书中学到了很多关于操作系统的思想，出现一个问题，思考策略，并比较策略之间的利弊。整本书读起来十分有趣，当然我不是在这里打广告。
当时只是学习了理论知识，书中所给的练习以及后面的实验都没有做过，当时在附录后面看到：xv6 实验，一直没来做。
这些天，从网上找到了 xv6 实验对应的课程 6.S081，课程的 lab 是重点。
这个 Lab 主要是 Shell 的一些命令实现，做完之后会体会到：原来这些命令是这样子的！
希望我的文章能给大家带来些帮助！

一些代码的分析会需要一些 Linux 知识，有使用过 shell 的经验会更好。
做实验的同时，也需要研读下课程提供的参考书，这里我选择了原文阅读。
感兴趣的小伙伴，可以一起来做做！环境配置放在这里了 xv6内核实验环境搭建

Xv6-Lab-utilities

sleep (easy)

Some hints:

Before you start coding, read Chapter 1 of the xv6 book.

Look at some of the other programs in user/ (e.g., user/echo.c, user/grep.c, and user/rm.c) to see how you can obtain the command-line arguments passed to a program.

If the user forgets to pass an argument, sleep should print an error message.

The command-line argument is passed as a string; you can convert it to an integer using atoi (see user/ulib.c).

Use the system call sleep.

See kernel/sysproc.c for the xv6 kernel code that implements the sleep system call (look for sys_sleep), user/user.h for the C definition of sleep callable from a user program, and user/usys.S for the assembler code that jumps from user code into the kernel for sleep.

Make sure main calls exit() in order to exit your program.

Add your sleep program to UPROGS in Makefile; once you've done that, make qemu will compile your program and you'll be able to run it from the xv6 shell.

Look at Kernighan and Ritchie's book The C programming language (second edition) (K&R) to learn about C.

user 目录下的其他程序

按照实验指导给的步骤，先看看可执行命令对应的代码是怎样的，再添加自己编写的程序。

echo.c

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int main(int argc, char *argv[])
{
  int i;

  for(i = 1; i < argc; i++){
    write(1, argv[i], strlen(argv[i]));
    if(i + 1 < argc){
      write(1, " ", 1);
    } else {
      write(1, "\n", 1);
    }
  }
  exit(0);
}

rm.c

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int
main(int argc, char *argv[])
{
  int i;

  if(argc < 2){
    fprintf(2, "Usage: rm files...\n");
    exit(1);
  }

  for(i = 1; i < argc; i++){
    if(unlink(argv[i]) < 0){
      fprintf(2, "rm: %s failed to delete\n", argv[i]);
      break;
    }
  }

  exit(0);
}

这些程序都是这么做的：

执行的操作是从 int main 函数开始的
通过 argc（参数数量）和 argv（参数值数组）获取调用的参数
程序正常结束时，调用 exit(0) 退出表示调用成功，调用 exit(1) 退出表示调用失败，这个很重要！！！

我们可以猜测这么一个过程：在命令行输入某个命令时，会查找这个命令对应的可执行文件，并尝试去运行；程序通过 argc、argv 来获取命令参数。

更进一步，通过查看 xv6 book

我们知道 argv 的第一个参数（即argv[0]）存放着程序名称。

Most programs ignore the first element of the argument array, which is conventionally the name of the program

book-riscv-rev2.pdf#P12

我们还知道 shell 是通过 fork + exec 两个系统调用的配合来实现命令的执行。

The main loop reads a line of input from the user with getcmd. Then it calls fork, which creates a copy of the shell process...If exec succeeds then the child will execute instructions from echo instead of runcmd.

这个时候就大胆的看 xv6 中的 shell 源码，不过在这里就不多说了，后续有机会再好好分析。
user/sh.c

int
main(void)
{
  static char buf[100];
  int fd;

  // Ensure that three file descriptors are open.
  while((fd = open("console", O_RDWR)) >= 0){
    if(fd >= 3){
      close(fd);
      break;
    }
  }

  // Read and run input commands.
  while(getcmd(buf, sizeof(buf)) >= 0){
    if(buf[0] == 'c' && buf[1] == 'd' && buf[2] == ' '){
      // Chdir must be called by the parent, not the child.
      buf[strlen(buf)-1] = 0;  // chop \n
      if(chdir(buf+3) < 0)
        fprintf(2, "cannot cd %s\n", buf+3);
      continue;
    }
    if(fork1() == 0)
      runcmd(parsecmd(buf));
    wait(0);
  }
  exit(0);
}

实验指导让我们往 Makefile 的 UPROGS 段添加自己的程序名，看看这个地方放着什么：

UPROGS=\
	$U/_cat\
	$U/_echo\
	$U/_forktest\
	$U/_grep\
	$U/_init\
	$U/_kill\
	$U/_ln\
	$U/_ls\
	$U/_mkdir\
	$U/_rm\
	$U/_sh\
	$U/_stressfs\
	$U/_usertests\
	$U/_grind\
	$U/_wc\
	$U/_zombie\

cat、echo、grep...都是可执行命令，可以想象的到，编译内核时会根据这个内容来添加对应的命令。

系统调用 sleep 学习

sleep 系统调用接受一个 int 类型的参数，即要睡眠的 ticks。

int sleep(int);

用户程序调用内核 sleep 的汇编代码

可见，调用 sleep 相当于调用 SYS_sleep

.global sleep
sleep:
 li a7, SYS_sleep
 ecall
 ret

sleep 的内核实现

uint64
sys_sleep(void)
{
  int n;
  uint ticks0;
	

  if(argint(0, &n) < 0)
    return -1;

  acquire(&tickslock);
  ticks0 = ticks;
  while(ticks - ticks0 < n){
    if(myproc()->killed){
      release(&tickslock);
      return -1;
    }
    sleep(&ticks, &tickslock);
  }
  release(&tickslock);
  return 0;
}

获取 trapframe 中调用进程传入的第一个参数，即 ticks
使用自旋锁 tickslock 确保只有一个进程在while 循环代码中执行
记录起始 ticks，当前 ticks 减去起始 ticks 小于给定的 n时，会先判断进程的状态；如果进程在睡眠中途被 kill 则会释放锁并退出，否则调用 sleep 睡眠

实现 sleep

我们要实现的命令类似于这样

$ sleep 10
(nothing happens for a little while)
$

由于命令行的参数都是字符串，而我们需要 int 参数来调用 sleep，因此需要转换函数 atoi。

atoi 函数原型

int atoi(const char*);

user/sleep.c

//
// Created by 悠一木碧 on 2022/6/15.
//
#include "../kernel/types.h"
#include "../kernel/stat.h"
#include "../user/user.h"

int
main(int argc, char *argv[]) {

    if (argc != 2) {
	fprintf(2, "Usage: sleep ticks\n");
	exit(1);
    }

    int n = atoi(argv[1]);
    sleep(n);

    exit(0);
}

Makefile

...
...
UPROGS=\
	$U/_cat\
	$U/_echo\
	$U/_forktest\
	$U/_grep\
	$U/_init\
	$U/_kill\
	$U/_ln\
	$U/_ls\
	$U/_mkdir\
	$U/_rm\
	$U/_sh\
	$U/_stressfs\
	$U/_usertests\
	$U/_grind\
	$U/_wc\
	$U/_zombie\
	$U/_sleep\
...
...

测试结果

pingpong (easy)

Write a program that uses UNIX system calls to ''ping-pong'' a byte between two processes over a pair of pipes, one for each direction. The parent should send a byte to the child; the child should print ": received ping", where is its process ID, write the byte on the pipe to the parent, and exit; the parent should read the byte from the child, print ": received pong", and exit. Your solution should be in the file user/pingpong.c.

Some hints:

Use pipe to create a pipe.

Use fork to create a child.

Use read to read from the pipe, and write to write to the pipe.

Use getpid to find the process ID of the calling process.

Add the program to UPROGS in Makefile.

User programs on xv6 have a limited set of library functions available to them. You can see the list in user/user.h; the source (other than for system calls) is in user/ulib.c, user/printf.c, and user/umalloc.c.

Run the program from the xv6 shell and it should produce the following output:
    $ make qemu
    ...
    init: starting sh
    $ pingpong
    4: received ping
    3: received pong
    $
  
Your solution is correct if your program exchanges a byte between two processes and produces output as shown above.

系统调用 pipe 学习

A pipe is a small kernel buffer exposed to processes as a pair of file descriptors, one for reading and one for writing. Writing data to one end of the pipe makes that data available for reading from the other end of the pipe.

book-riscv-rev2.pdf#P15

pipe 系统用创建两个文件描述符，并将其放置在给定的数组 arr 中；其中arr[0] 用于read，arr[1]用于write。当向arr[1]写入数据时，可以从arr[0] 读取。

The program calls pipe.... After fork, both parent and child have file descriptors referring to the pipe.

book-riscv-rev2.pdf#P16

Two file descriptors share an offset if they were derived from the same original file descriptor by a sequence of fork and dup calls.

book-riscv-rev2.pdf#P15

需要注意的是，当进程在调用 pipe 之后再调用 fork 创建子进程，父子进程都会有这两个文件描述符（file desciptor）的引用，且共享读写的 offset。
这里提到的东西会很多，比如文件描述符的引用计数，涉及到了文件系统组织，读写的 offset 听上去很绕口；学习过操作系统和 IO 的同学可能会知道这里指什么。不过这里，我们也不细研究，只探讨书中所说的是否正确。

测试

为了测试书中所说的两个功能，我们使用 pipe 系统调用创建管道，并使用 fork 创建子进程，父子进程都通过 p[1] 写入，测试它们的写入是否共享 offset，并测试是否能从 p[0] 读取。

user/test.c

//
// Created by 悠一木碧 on 2022/6/15.
//
#include "../kernel/types.h"
#include "../kernel/stat.h"
#include "../user/user.h"

int
main(int argc, char *argv[])
{
    int p[2];
    pipe(p);

    int ret;
    if ((ret = fork()) == 0) {
        close(p[0]);
        const char *str = "hello ";
        int len = strlen(str);
        int n = write(p[1], str, len);
        int pid = getpid();
        if (n != len) {
            fprintf(2, "process %d: write %d bytes, but expect to write %d bytes!\n", pid, n, len);
            exit(1);
        }
        fprintf(2, "process %d: write %d bytes\n", pid, n);
        close(p[1]);
    } else if (ret > 0) {
        wait(0);
        const char *str = "world!";
        int len = strlen(str);
        int n = write(p[1], str, len);
        int pid = getpid();
        if (len != n) {
            fprintf(2, "process %d: write %d bytes, but expect to write %d bytes!\n", pid, n, len);
            exit(1);
        }
        fprintf(2, "process %d: write %d bytes\n", pid, n);
        close(p[1]);

        char buf[1024];
        while ((n = read(p[0], buf, 1023)) > 0) {
            buf[n] = '\0';
            fprintf(1, "process %d: read %d bytes, str is %s\n", pid, n, buf);
        }
        if (n < 0) {
            fprintf(2, "read error!\n");
            exit(1);
        }
        close(p[0]);
    } else {
        fprintf(2, "fork failed!\n");
        exit(1);
    }

    exit(0);
}

Makefile

为了简便，Makefile 只给新增的内容

UPROGS=\
    ...
    ...
    ...
    $U/_test\

运行结果

输出说明测试正确。

book-riscv-rev2.pdf#P16 代码学习

    int p[2];
    char *argv[2];
    argv[0] = "wc";
    argv[1] = 0;
    pipe(p);
	
    if(fork() == 0) {
        // child process
        close(0);
        dup(p[0]);
        close(p[0]);
        close(p[1]);
        exec("/bin/wc", argv);
    } else {
        // parent process
        close(p[0]);
        write(p[1], "hello world\n", 12);
        close(p[1]);
    }

父子进程都会拥有对数组 p 文件描述符的引用，也就是说这两个文件描述符的引用计数为 2。

观察上面的代码可以发现：

数组argv 是参数值，其中 argv[0] 存放的是程序名称 wc，argv[1] 存放的是程序的第一个参数值 0，给指针赋 0，相当于 NULL 。

Most programs ignore the first element of the argument array, which is conventionally the name of the program.

book-riscv-rev2.pdf#P12

子进程做了什么

子进程先释放了文件描述符 0 的引用，即标准输入，此时文件描述符 0 是可用的。
调用了 dup[p[0]]，即将最小可用的文件描述符指向 p[0]。
释放所有的文件描述符引用。
调用 exec("/bin/wc", argv)，使用给定参数 argv 装载 /bin/wc 程序。

父进程做了什么

向管道 p[1] 写入 hello world 并关闭所有文件描述符的引用；至此，p 数组的文件描述符引用都被释放。

wc程序会做什么

学习 Linux 命令的时候，相信很多人都会接触到这个命令，它会输出一个文件的行数，单词数，字节数...（有问题还请纠正！！！）。
查看user/wc.c 的源代码：

char buf[512];

void
wc(int fd, char *name)
{
  int i, n;
  int l, w, c, inword;

  l = w = c = 0;
  inword = 0;
  while((n = read(fd, buf, sizeof(buf))) > 0){
    for(i=0; i<n; i++){
      c++;
      if(buf[i] == '\n')
        l++;
      if(strchr(" \r\t\n\v", buf[i]))
        inword = 0;
      else if(!inword){
        w++;
        inword = 1;
      }
    }
  }
  if(n < 0){
    printf("wc: read error\n");
    exit(1);
  }
  // 根据上面的代码，不难猜出这四个输出的含义
  printf("%d %d %d %s\n", l, w, c, name);
}

int
main(int argc, char *argv[])
{
  int fd, i;
  printf("there are %d args\n", argc);
  if(argc <= 1){
    wc(0, "");
    exit(0);
  }

  for(i = 1; i < argc; i++){
    if((fd = open(argv[i], 0)) < 0){
      printf("wc: cannot open %s\n", argv[i]);
      exit(1);
    }
    wc(fd, argv[i]);
    close(fd);
  }
  exit(0);
}

wc依据给定参数的个数，来决定 wc 的操作

参数数量少于 2 个时，调用 wc(0, "")，表示从标准输入中读取
遍历 argv，调用 open 获取对应的文件描述符，再调用 wc 从最后一个 fd 读取。

还需要明确一点：系统调用 read 会等待数据或者是文件尾的到来，而文件符的引用都被释放时，就相当于到达文件尾。

a read on a pipe waits for either data to be written or for all file descriptors referring to the write end to be closed;

book-riscv-rev2.pdf#P16

这对使用 while 循环 read 的退出十分重要。

按照上面的说明，这个程序的输出应当是统计 Hello World\n 字符串的结果；现在，我们继续编写程序测试一下。

测试

user/test.c

//
// Created by 悠一木碧 on 2022/6/15.
//
#include "../kernel/types.h"
#include "../kernel/stat.h"
#include "../user/user.h"

int
main(int argc, char *argv[])
{
    int p[2];
    pipe(p);
	
    char *args[1];
    args[0] = "wc";

    int ret;
    if ((ret = fork()) == 0) {
        close(0);
        dup(p[0]);
        close(p[0]);
        close(p[1]);
        exec("wc", args);
    } else if (ret > 0) {
        close(p[0]);
        write(p[1], "hello world\n", 12);
        close(p[1]);
    } else {
        fprintf(2, "fork failed!\n");
        exit(1);
    }

    exit(0);
}

需要注意的是，运行环境是 xv6 内核，因此没有 /bin 目录，我们直接指定加载 wc 程序即可。

由于 argv[1] = 0，相当于只含有一个有效参数，因此程序只传递一个参数 args。

为了让程序 wc 提供更清晰的信息，我们修改 user/wc.c

user/wc.c

int
main(int argc, char *argv[])
{
  int fd, i;
  printf("there are %d args\n", argc);
  if(argc <= 1){
    wc(0, "");
    exit(0);
  }
  // 省略其他代码
  ...
  ...
}

运行结果

在运行 test 程序后，wc 打印输出的结果是 1行，2单词，12字节长度，符合预期。

这次运行 test 后，不会出现命令提示符 $，具体原因可以研究下 user/sh.c源码，因为 test.c 使用了 exec 导致的。

在完成测试后，将 wc.c 程序恢复成原来的代码。

实现 pingpong

回到正题，学习完 pipe 的原理和使用后，我们使用 pipe 来完成父子进程间的简单通信，具体原理就是使用管道来进行写入和读取。

user/pingpong.c

//
// Created by 悠一木碧 on 2022/6/15.
//
#include "../kernel/types.h"
#include "../kernel/stat.h"
#include "../user/user.h"

int main(int argc, char *argv[])
{
    if (argc > 1) {
	fprintf(2, "args ignored!\n");
    }

    int pp[2];
    int cp[2];
    pipe(pp);
    pipe(cp);

    char *p_write = "ping";
    char *c_write = "pong";
    int ret;
    if ((ret = fork()) == 0) {
        // pp[1] 是父进程写入的 fd，子进程用不到
        // cp[0] 是父进程读取的 fd，子进程用不到
        close(pp[1]);
        close(cp[0]);
		int size = strlen(p_write) + 1;
		char buf[size];
		int n;
		while ((n = read(pp[0], buf, size)) > 0) {
            fprintf(1, "%d: received %s\n", getpid(), buf);
		}
        if (n < 0) {
            fprintf(2, "read error!\n");
            exit(1);
        }
//        printf("%d: read %d bytes\n", getpid(), n);
        // 读取完成，释放对应文件描述符
        close(pp[0]);
        n = write(cp[1], c_write, strlen(c_write));
//        printf("%d: write %d bytes\n", getpid(), n);
        close(cp[1]);
    } else if (ret > 0) {
        // 同上
        close(pp[0]);
        close(cp[1]);
        int n = write(pp[1], p_write, strlen(p_write));
//        printf("%d: write %d bytes\n", getpid(), n);
        close(pp[1]);
        int size = strlen(c_write) + 1;
        char buf[size];
        while ((n = read(cp[0], buf, size)) > 0) {
            fprintf(1, "%d: received %s\n", getpid(), buf);
        }
        if ((n = read(cp[0], buf, size)) < 0) {
            fprintf(2, "read error!\n");
            exit(1);
        }
        close(cp[0]);
//        printf("%d: read %d bytes\n", getpid(), n);
    } else {
        fprintf(2, "fork failed!\n");
    }

    exit(0);
}

Makefile

UPROGS=\
    ...
    ...
    ...
    $U/_pingpong\

测试结果

测试结果说明程序功能正确。

primes (hard)

Write a concurrent version of prime sieve using pipes. This idea is due to Doug McIlroy, inventor of Unix pipes. The picture halfway down this page and the surrounding text explain how to do it. Your solution should be in the file user/primes.c.

Your goal is to use pipe and fork to set up the pipeline. The first process feeds the numbers 2 through 35 into the pipeline. For each prime number, you will arrange to create one process that reads from its left neighbor over a pipe and writes to its right neighbor over another pipe. Since xv6 has limited number of file descriptors and processes, the first process can stop at 35.

Some hints:

Be careful to close file descriptors that a process doesn't need, because otherwise your program will run xv6 out of resources before the first process reaches 35.

Once the first process reaches 35, it should wait until the entire pipeline terminates, including all children, grandchildren, &c. Thus the main primes process should only exit after all the output has been printed, and after all the other primes processes have exited.

Hint: read returns zero when the write-side of a pipe is closed.

It's simplest to directly write 32-bit (4-byte) ints to the pipes, rather than using formatted ASCII I/O.

You should create the processes in the pipeline only as they are needed.

Add the program to UPROGS in Makefile.

Your solution is correct if it implements a pipe-based sieve and produces the following output:
    $ make qemu
    ...
    init: starting sh
    $ primes
    prime 2
    prime 3
    prime 5
    prime 7
    prime 11
    prime 13
    prime 17
    prime 19
    prime 23
    prime 29
    prime 31
    $

实验分析

这个实验要求我们使用管道来筛选素数，且要求过程和下图类似：

伪代码

p = get a number from left neighbor
print p
loop:
    n = get a number from left neighbor
    if (p does not divide n)

假设管道中有数字 2,3,4....n，进程读取第一个数 p并将其输出，然后继续读取数字n，如果 n 不能被 p 整除，则将其传递给下一个进程。

经过上面的实验，我们可以使用 pipe来实现两个进程间的通信。这个实验则要求通信能够传递下去，很容易想到递归创建管道和子进程来做到这一点。

实现 primes

根据上面的描述，可以得到的思路如下：

使用 pipe 创建管道，然后调用fork创建子进程
父进程向管道写入数字，随后释放该文件描述符的引用，用于表示文件尾
子进程从管道读取第一个数字并将其输出
子进程重复步骤1

细节

递归需要有终止条件，当子进程发现管道中没有数字时，停止递归；即 read 返回 0 时，说明需要停止递归。

第一个进程负责向管道中写入初始数据（2~35），不同于后续子进程，第一个进程需要释放的文件描述符只有1个；因此第一个进程不在递归范围内。

父进程需要等待子进程完成后再退出，这个可以用 wait() 实现。

user/primes.c

//
// Created by 悠一木碧 on 2022/6/16.
//
#include "../kernel/types.h"
#include "../kernel/stat.h"
#include "../user/user.h"

void childFilter(int fd) {
    int buf[1];
    int ret;
    if (0 == (read(fd, buf, 4))) {
        close(fd);
        exit(0);
    }
    int p = *buf;
    fprintf(1, "prime %d\n", p);

    int newFd[2];
    pipe(newFd);
    if ((ret = fork()) == 0) {
        close(newFd[1]);
        childFilter(newFd[0]);
    } else if (ret > 0) {
        close(newFd[0]);
        while ((ret = read(fd, buf, 4)) > 0) {
            if ((*buf) % p != 0) {
                if ((write(newFd[1], buf, 4)) != 4) {
                    fprintf(2, "write error!\n");
                    exit(1);
                }
            }
        }
        if (ret < 0) {
            fprintf(2, "read error!\n");
            exit(1);
        }
        close(fd);
        close(newFd[1]);
        wait(0);
    } else {
        fprintf(2, "fork failed!\n");
        exit(1);
    }
}

int main(int argc, char *argv[])
{
    int p[2];
    pipe(p);

    int ret;
    if ((ret = fork()) == 0) {
        close(p[1]);
        childFilter(p[0]);
    } else if (ret > 0) {
        close(p[0]);
        for (int i = 2; i <= 35; ++i) {
            if (4 != write(p[1], &i, 4)) {
                fprintf(2, "write error!\n");
                exit(1);
            }
        }
        close(p[1]);
        wait(0);
    } else {
        fprintf(2, "fork failed!\n");
        exit(1);
    }

    exit(0);
}

Makefile

UPROGS=\
    ...
    ...
    ...
    $U/_primes\

测试结果

测试结果说明正确

find (moderate)

Write a simple version of the UNIX find program: find all the files in a directory tree with a specific name. Your solution should be in the file user/find.c.

Some hints:

Look at user/ls.c to see how to read directories.

Use recursion to allow find to descend into sub-directories.

Don't recurse into "." and "..".

Changes to the file system persist across runs of qemu; to get a clean file system run make clean and then make qemu.

You'll need to use C strings. Have a look at K&R (the C book), for example Section 5.5.

Note that == does not compare strings like in Python. Use strcmp() instead.

Add the program to UPROGS in Makefile.

Your solution is correct if produces the following output (when the file system contains the files b and a/b):
    $ make qemu
    ...
    init: starting sh
    $ echo > b
    $ mkdir a
    $ echo > a/b
    $ find . b
    ./b
    ./a/b
    $ 

实验分析

本实验要求我们参照 user/ls.c 程序读取目录的实现 user/find.c，我们先看看 user/ls.c 是怎么实现的。

find 程序分析

user/find.c#main

int
main(int argc, char *argv[])
{
  int i;
    
  if(argc < 2){
    ls(".");
    exit(0);
  }
  for(i=1; i<argc; i++)
    ls(argv[i]);
  exit(0);
}

main 函数是 ls 程序的入口；可以看见，当不指定目录时，ls默认当前目录。另外，ls 是可以指定多个目录的。

接下来，看看关键的 ls 函数：

user/ls.c#ls

void
ls(char *path)
{
  char buf[512], *p;
  int fd;
  struct dirent de;
  struct stat st;

  // ls 程序首先调用 `open`，0 对应于`kernel/fcntl.h` 中 `define O_RDONLY  0x000`，也就是以只读模式打开对应文件，获取文件描述符 `fd`
  if((fd = open(path, 0)) < 0){
    fprintf(2, "ls: cannot open %s\n", path);
    return;
  }

  // 利用获得的 `fd` 调用 `fstat` 获取文件对应的状态信息，其中就包含了文件类型 `type`
  if(fstat(fd, &st) < 0){
    fprintf(2, "ls: cannot stat %s\n", path);
    close(fd);
    return;
  }

  // 这里只法处理文件类型和目录类型
  switch(st.type){
  case T_FILE:
    printf("%s %d %d %l\n", fmtname(path), st.type, st.ino, st.size);
    break;
  
  case T_DIR:
    // 接下来用 buf 存储 path + filename，目录项的 DIRSIZ 规定了文件名称最大长度，在这里判断 buf 是否能够装下
    if(strlen(path) + 1 + DIRSIZ + 1 > sizeof buf){
      printf("ls: path too long\n");
      break;
    }
    // 先将当前目录 path 复制到 buf
    strcpy(buf, path);
    p = buf+strlen(buf);
    *p++ = '/';
    // 读取目录项 dirent
    while(read(fd, &de, sizeof(de)) == sizeof(de)){
      if(de.inum == 0)
        continue;
      // p 指针指向了 path/ 后一个位置，因此这里相当于将 filename 附加到 buf
      memmove(p, de.name, DIRSIZ);
      // 设置字符串结尾，0 为什么是字符串结尾？原因可查看源码 user/ulib.c#strlen 函数的实现
      p[DIRSIZ] = 0;
      if(stat(buf, &st) < 0){
        printf("ls: cannot stat %s\n", buf);
        continue;
      }
      printf("%s %d %d %d\n", fmtname(buf), st.type, st.ino, st.size);
    }
    break;
  }
  close(fd);
}

函数中还调用了 fmtname，这个函数返回 char*，来看看它是什么作用：

user/ls.c#fmtname

char*
fmtname(char *path)
{
  static char buf[DIRSIZ+1];
  char *p;

  // 找到 path 最后一个 / 分隔符的后一个字符的位置
  for(p=path+strlen(path); p >= path && *p != '/'; p--)
    ;
  p++;

  if(strlen(p) >= DIRSIZ)
    return p;
  // 将最后一个 / 后面的内容复制到 buf，并在其后添加空格
  memmove(buf, p, strlen(p));
  memset(buf+strlen(p), ' ', DIRSIZ-strlen(p));
  return buf;
}

结合 ls 函数，ls函数利用 fmtname获取当前目录的文件名称（截断了前缀路径，只保留文件名），然后将其输出。

A path like /a/b/c refers to the file or directory named c inside the directory named b inside the directory named a in the root directory /.

book-riscv-rev2.pdf#P17

这里用书中例子：文件 a/b/c，则当前目录为 /a/b，此时的 buf 内容为 /a/b/c，调用 fmtname 的返回值是 c，即文件名称。

总结

ls 支持指定多个目录，没有指定时，使用当前目录。
ls 只是打开对应的目录，并不会查找子目录。

实现 find

结合上面对 ls 的分析，我们的 find 输出的是绝对路径，而不是当前目录下的文件名称，因此不需要使用上述的 fmtname 函数。

find 需要对指定的目录的子目录进行查找，因此我们需要对目录进行递归。
find 输出的是当前文件目录匹配的文件的绝对路径。
find 没有指定目录时，默认使用当前目录。
find 应当先遍历当前目录下的子文件并输出符合条件的文件路径，然后再进行子目录查找；要做到这一点，需要有一个数据结构来存储。

user/find.c

//
// Created by 悠一木碧 on 2022/6/17.
//

#include "../kernel/types.h"
#include "../kernel/stat.h"
#include "../user/user.h"
#include "../kernel/fs.h"
#include "../kernel/fcntl.h"
#include "lib/util.h"

const int BUF_SIZE = 512;

void find(const char *path, const char *regex) {
    struct stat st;
    int fd;
    if ((fd = open(path, O_RDONLY)) < 0) {
        fprintf(2, "find: cannot open %s\n", path);
        return;
    }
    if (fstat(fd, &st) < 0) {
        fprintf(2, "find: cannot stat %s\n", path);
        close(fd);
        return;
    }
    short type = st.type;
    if (type == T_DIR) {
        int pathLen = strlen(path);
        if (pathLen + 1 + DIRSIZ + 1 > BUF_SIZE) {
            fprintf(2, "find: path too long! The path length is %d chars\n", pathLen);
        } else {
            char buf[BUF_SIZE];
            strcpy(buf, path);
            char *p = buf + pathLen;
            *p = '/';
            ++p;
            *p = 0;
            struct dirent d_entry;
            struct str_list p_head;
            p_head.next = 0;
            struct str_list *pre = &p_head;
            while (read(fd, &d_entry, sizeof(d_entry)) == sizeof(d_entry)) {
                // 从目录中读取对应的 d_entry
                // 忽略 inode = 0 的无效目录项
                if (d_entry.inum == 0) {
                    continue;
                }
                // 将目录下的文件名称附加在当前目录下，如果文件名和查找名称相同，则输出
                memmove(p, d_entry.name, DIRSIZ);
                p[DIRSIZ] = 0;
                // 忽略文件名称不匹配的文件
                if (strcmp(d_entry.name, regex) == 0) {
                    fprintf(1, "%s\n", buf);
                }
                // 读取对应文件状态信息
                if (stat(buf, &st) < 0) {
                    fprintf(2, "find: cannot stat %s\n", path);
                    close(fd);
                    return;
                }
                // 只对子目录进行递归，且不对 .(当前目录) 和 ..(父目录) 进行递归
                // 将需要递归的子目录暂存至 list 中
                if (st.type == T_DIR && strcmp(".", d_entry.name) != 0 && strcmp("..", d_entry.name) != 0) {
                    struct str_list *cur = malloc(sizeof(p_head));
                    strcpy(cur->str, buf);
                    cur->next = 0;
                    pre->next = cur;
                    pre = cur;
                }
            }
            struct str_list *head = p_head.next;
            struct str_list *tmp;
            while (head != 0) {
                tmp = head;
                find(head->str, regex);
                head = head->next;
                // 防止内存泄漏
                free(tmp);
            }
        }
    } else {
        fprintf(2, "path: %s is not a directory!\n", path);
    }
    close(fd);
}

int
main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(2, "usage: find path filename.\n");
        exit(1);
    }
    char *path;
    char *regx;
    path = (argc < 3) ? "." : argv[1];
    regx = (argc < 3) ? argv[1] : argv[2];
    find(path, regx);

    exit(0);
}

Makefile

UPROGS=\
    ...
    ...
    ...
    $U/_find\

测试结果

测试结果表明，功能正确。

xargs (moderate)

Write a simple version of the UNIX xargs program: read lines from the standard input and run a command for each line, supplying the line as arguments to the command. Your solution should be in the file user/xargs.c.

The following example illustrates xarg's behavior:
    $ echo hello too | xargs echo bye
    bye hello too
    $
Note that the command here is "echo bye" and the additional arguments are "hello too", making the command "echo bye hello too", which outputs "bye hello too".

Please note that xargs on UNIX makes an optimization where it will feed more than argument to the command at a time. We don't expect you to make this optimization. To make xargs on UNIX behave the way we want it to for this lab, please run it with the -n option set to 1. For instance
    $ echo "1\n2" | xargs -n 1 echo line
    line 1
    line 2
    $
Some hints:

Use fork and exec to invoke the command on each line of input. Use wait in the parent to wait for the child to complete the command.

To read individual lines of input, read a character at a time until a newline ('\n') appears.

kernel/param.h declares MAXARG, which may be useful if you need to declare an argv array.

Add the program to UPROGS in Makefile.

Changes to the file system persist across runs of qemu; to get a clean file system run make clean and then make qemu.

xargs, find, and grep combine well:
  $ find . b | xargs grep hello
will run "grep hello" on each file named b in the directories below ".".

To test your solution for xargs, run the shell script xargstest.sh. Your solution is correct if it produces the following output:
  $ make qemu
  ...
  init: starting sh
  $ sh < xargstest.sh
  $ $ $ $ $ $ hello
  hello
  hello
  $ $   
You may have to go back and fix bugs in your find program. The output has many $ because the xv6 shell doesn't realize it is processing commands from a file instead of from the console, and prints a $ for each command in the file.

实验分析

开始编写 xargs 程序前，得先明确 xargs 的行为。

xargs 从 argv 中获取要执行的命令，以及命令的参数
xargs 从标准输入流中依次读取一行，将其作为命令的额外参数
xargs 有 -n 可选项，用于指定一次性读取附加多少行的额外参数传递给命令，不要求实现这个，实现依次读取 1 行并执行命令即可

上述示例中，是 shell启动了两个命令，并利用了管道。

xargs 调用 fork创建子进程，并利用参数调用 exec 执行命令；xargs 很明显利用了管道读取左边（left）命令的输出作为参数，然后附加到已有的参数。

比如在 shell 中执行下列命令：

echo hello too

很明显，这个输出是在标准输出（1）。

而 xargs 不能从标准输出读取，能从标准输入读取（0），因此 shell 利用管道帮我们做了输出重定向：

user/sh.c#runcmd

// Execute cmd.  Never returns.
void
runcmd(struct cmd *cmd)
{
  int p[2];
  struct backcmd *bcmd;
  struct execcmd *ecmd;
  struct listcmd *lcmd;
  struct pipecmd *pcmd;
  struct redircmd *rcmd;

  if(cmd == 0)
    exit(1);

  switch(cmd->type){
  default:
    panic("runcmd");

  case EXEC:
     ...
     ...
    break;

  case REDIR:
     ...
     ...
    break;

  case LIST:
     ...
     ...
    break;

  case PIPE:
    pcmd = (struct pipecmd*)cmd;
    if(pipe(p) < 0)
      panic("pipe");
    if(fork1() == 0){
      close(1);
      dup(p[1]);
      close(p[0]);
      close(p[1]);
      runcmd(pcmd->left);
    }
    if(fork1() == 0){
      close(0);
      dup(p[0]);
      close(p[0]);
      close(p[1]);
      runcmd(pcmd->right);
    }
    close(p[0]);
    close(p[1]);
    wait(0);
    wait(0);
    break;

  case BACK:
     ...
     ...
    break;
  }
  exit(0);
}

sh.c 中根据 command 的类型，来执行对应的操作。管道类型的指令时，利用了之前提到的 pipe 系统调用创建管道。还利用到了 dup 系统调用（使最小可用的文件描述符指向所给 fd）。

先调用的 fork1 执行管道左侧的命令，并将标准输出关闭(1)，使其指向 p[1]
后调用的 fork2 执行管道右侧的命令，并将标准输入关闭(0)，使其指向 p[0]

总结： 根据前面管道的知识，即使 close(p[0])，p[0] 也被 0 所重定向，因此完成了输出重定向和输入重定向，left 命令的输出就是 right 命令的输入！
这里贴张图：

实现 xargs

思路：

首先，xargs 的参数是在 main 函数的 argv 中，argv[0] 是程序名称，argv[1] 及其后面的元素才是真正的参数。
xargs 接受一个命令名称，指定要执行的命令，因此当参数数量 argc < 2 时，说明命令有错误。
其次，xargs 接收一个 -n 选项，这个选项如果有，则是在 argv[1] ，通过 strcmp 比较判断 xargs 命令是否带了选项，同时判断参数个数是否合法。
上述步骤处理完成后，我们可以得到命令名称 cmd 、参数行数 line 和 argv 中提供的额外参数 exec_args；接下来要从上一个命令的输出中读取额外参数并将其附加到 exec_args，为了确定从哪个位置开始放参数，我们使用 arg_idx 指定。
在 xargs 函数中进行真正的读取操作，这里选用标准输入流（0）。
实验指导我们以 \n 分隔每行，真正实验过程中，读 “1\n2” 只能读到两个连续的字符 '\' 和 'n'；另外，echo 程序在输出的末尾添加 '\n' 换行符

user/echo.c

#include "../kernel/types.h"
#include "../kernel/stat.h"
#include "../user/user.h"

int
main(int argc, char *argv[])
{
  int i;

  for(i = 1; i < argc; i++){
    write(1, argv[i], strlen(argv[i]));
    if(i + 1 < argc){
      write(1, " ", 1);
    } else {
      write(1, "\n", 1);
    }
  }
  exit(0);
}

所以读取标准输入流时的最后，要么直接读到输入流末尾（即 read 返回 0）；要么在这之前会读取到换行符。因此 xargs 中会以换行符终止一行的参数读取。
user/xargs.c

//
// Created by 悠一木碧 on 2022/6/18.
//
#include "../kernel/types.h"
#include "../user/user.h"
#include "../kernel/param.h"

const char *line_option = "-n";

void run_cmd_per_line(char *cmd, char *exec_args[]) {
    int ret;
    if ((ret = fork()) == 0) {
        if (exec(cmd, exec_args) == -1) {
            fprintf(1, "xargs: there is no command like %s!\n", cmd);
            exit(1);
        }
    } else if (ret > 0) {
        wait(0);
    } else {
        fprintf(2, "fork failed!\n");
        exit(1);
    }
}

void xargs(char *cmd, int line, char *exec_args[], int arg_idx) {

    char c;
    if (0 == read(0, &c, 1)) {
        run_cmd_per_line(cmd, exec_args);
        return;
    }
    int ret = 1;
    int cnt = 0;
    int j = arg_idx;
    do {
        if (c == '"' || c == '\n') {
            continue;
        }
        if (cnt == 0) {
            j = arg_idx;
        }
        char *buf = malloc(512);
        int i = 0;
        buf[i++] = c;
        char pre_c = c;

        while ((ret = read(0, &c, 1)) == 1 && c != '\n') {
            if (c == '"') {
                continue;
            }
            if (pre_c == '\\' && c == 'n') {
                buf[i - 1] = 0;
                break;
            }
            pre_c = c;
            buf[i++] = c;
        }
        if (ret == 0 || c == '\n') {
            buf[i] = 0;
        }
        exec_args[j++] = buf;
        ++cnt;
        if (ret == 0 || cnt >= line) {
            exec_args[j] = 0;
            run_cmd_per_line(cmd, exec_args);
            for (int k = arg_idx; k < j; ++k) {
                free(exec_args[k]);
            }
            cnt = 0;
        }
    } while (ret != 0 && 0 != (read(0, &c, 1)));

    if (cnt > 0) {
        exec_args[j] = 0;
        run_cmd_per_line(cmd, exec_args);
        for (int k = arg_idx; k < j; ++k) {
            free(exec_args[k]);
        }
    }
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(2, "Usage: xargs [-n lines] command [args]\n");
        exit(1);
    }
    int cmd_idx = 1;
    int arg_beg = 2;
    int line = 1;
    if (strcmp(argv[1], line_option) == 0) {
        if (argc < 4) {
            fprintf(2, "Usage: xargs [-n lines] command [args]\n");
            exit(1);
        }
        line = atoi(argv[2]);
        if (line < 0) {
            line = 1;
        }
        cmd_idx = 3;
        arg_beg = 4;
    }

    char *cmd = argv[cmd_idx];
    char *exec_args[MAXARG];
    int arg_idx = 0;
    exec_args[arg_idx++] = cmd;
    for (; arg_beg < argc; ++arg_beg) {
        exec_args[arg_idx++] = argv[arg_beg];
    }
//    fprintf(1, "===========debug-info-begin===========\n");
//    fprintf(1, "line: %d\n", line);
//    fprintf(1, "cmd: %s\n", cmd);
//    fprintf(1, "exec_args: ");
//    for (int i = 0; i < arg_idx - 1; ++i) {
//        fprintf(1, "%s,", exec_args[i]);
//    }
//    if (arg_idx - 1 >= 0) {
//        fprintf(1, "%s", exec_args[arg_idx - 1]);
//    }
//    fprintf(1, "\n");
//    fprintf(1, "===========debug-info-ends===========\n");
    exec_args[arg_idx] = 0;
    xargs(cmd, line, exec_args, arg_idx);

    exit(0);
}

xargs.c 实现了选项 -n，用于指定一次性附加多少行的参数到命令中。

Makefile

UPROGS=\
    ...
    ...
    ...
    $U/_xargs\

测试结果

测试结果表明，程序功能正确。

结语

这是我在 xv6-labs-2020 目录下使用 make grade 命令测试分数的结果，整个实验花费了10个多小时，从结果上来说还是不错的。整个实验给我的感受也挺深的，C语言很难debug是真。