值类型
When you are a Bear of Very Little Brain, and you Think of Things, you find sometimes that a Thing which seemed very Thingish inside you is quite different when it gets out into the open and has other people looking at it.
你要是一只脑子很小的熊,当你想事情的时候,你会发现,有时在你心里看起来很像回事的事情,当它展示出来,让别人看着它的时候,就完全不同了。
A. A. Milne, Winnie-the-Pooh A. A. 米尔恩, 小熊维尼
The past few chapters were huge, packed full of complex techniques and pages of code. In this chapter, there’s only one new concept to learn and a scattering of straightforward code. You’ve earned a respite. 前面的几章篇幅很长,充满了复杂的技术和一页又一页的代码。在本章中,只需要学习一个新概念和一些简单的代码。你获得了喘息的机会。
Lox is dynamically typed. A single variable can
hold a Boolean, number, or string at different points in time. At least, that’s
the idea. Right now, in clox, all values are numbers. By the end of the chapter,
it will also support Booleans and nil
. While those aren’t super interesting,
they force us to figure out how our value representation can dynamically handle
different types.
Lox是动态类型的。一个变量可以在不同的时间点持有布尔值、数字或字符串。至少,我们的想法是如此。现在,在clox中,所有的值都是数字。到本章结束时,它还将支持布尔值和nil
。虽然这些不是特别有趣,但它们迫使我们弄清楚值表示如何动态地处理不同类型。
18 . 1带标签的联合体
The nice thing about working in C is that we can build our data structures from the raw bits up. The bad thing is that we have to do that. C doesn’t give you much for free at compile time and even less at runtime. As far as C is concerned, the universe is an undifferentiated array of bytes. It’s up to us to decide how many of those bytes to use and what they mean. 使用C语言工作的好处是,我们可以从最基础的比特位开始构建数据结构。坏处是,我们必须这样做。C语言在编译时并没有提供多少免费的东西,在运行时就更少了。对C语言来说,宇宙是一个无差别的字节数组。由我们来决定使用多少个字节以及它们的含义。
In order to choose a value representation, we need to answer two key questions: 为了选择一种值的表示形式,我们需要先回答两个关键问题:
-
How do we represent the type of a value? If you try to, say, multiply a number by
true
, we need to detect that error at runtime and report it. In order to do that, we need to be able to tell what a value’s type is.我们如何表示一个值的类型? 比如说,如果你将一个数字乘以
true
,我们需要在运行时检测到这个错误并报告它。为此,我们需要知道值的类型是什么? -
How do we store the value itself? We need to not only be able to tell that three is a number, but that it’s different from the number four. I know, seems obvious, right? But we’re operating at a level where it’s good to spell these things out.
我们如何存储该值本身? 我们不仅要能分辨出3是一个数字,还要能分辨出它与4是不同的。我知道,这是显而易见的对吧?但是在我们所讨论的层面,最好把这些事情说清楚。
Since we’re not just designing this language but building it ourselves, when answering these two questions we also have to keep in mind the implementer’s eternal quest: to do it efficiently. 因为我们不仅仅是设计这门语言,还要自己构建它,所以在回答这两个问题时,我们还必须牢记实现者们永恒的追求:高效地完成它。
Language hackers over the years have come up with a variety of clever ways to pack the above information into as few bits as possible. For now, we’ll start with the simplest, classic solution: a tagged union. A value contains two parts: a type “tag”, and a payload for the actual value. To store the value’s type, we define an enum for each kind of value the VM supports. 多年来,语言黑客们想出了各种巧妙的方法,将上述信息打包成尽可能少的比特。现在,我们将从最简单、最经典的解决方案开始:带标签的联合体。一个值包含两个部分:一个类型“标签”,和一个实际值的有效载荷。为了存储值的类型,我们要为虚拟机支持的每一种值定义一个枚举。
#include "common.h"
typedef enum { VAL_BOOL, VAL_NIL, VAL_NUMBER, } ValueType;
typedef double Value;
For now, we have only a couple of cases, but this will grow as we add strings,
functions, and classes to clox. In addition to the type, we also need to store
the data for the value—the double
for a number, true
or false
for a
Boolean. We could define a struct with fields for each possible type.
现在,我们只有这几种情况,但随着我们向clox中添加字符串、函数和类,这里也会越来越多。除了类型之外,我们还需要存储值的数据—数字是double
值,Boolean是true
或false
。我们可以定义一个结构体,其中包含每种可能的类型所对应的字段。
But this is a waste of memory. A value can’t simultaneously be both a number and a Boolean. So at any point in time, only one of those fields will be used. C lets you optimize this by defining a union. A union looks like a struct except that all of its fields overlap in memory. 但这是对内存的一种浪费。一个值不可能同时是数字和布尔值。所以在任何时候,这些字段中只有一个会被使用。C语言中允许定义联合体来优化这一点。联合体看起来很像是结构体,区别在于其中的所有字段在内存中是重叠的。
The size of a union is the size of its largest field. Since the fields all reuse the same bits, you have to be very careful when working with them. If you store data using one field and then access it using another, you will reinterpret what the underlying bits mean. 联合体的大小就是其最大字段的大小。由于这些字段都复用相同的比特位,你在使用它们时必须要非常小心。如果你使用一个字段存储数据,然后用另一个字段访问数据,那你需要重新解释底层比特位的含义。
As the name “tagged union” implies, our new value representation combines these two parts into a single struct. 顾名思义,“带标签的联合体”说明,我们新的值表示形式中将这两部分合并成一个结构体。
} ValueType;
add after enum ValueType
replace 1 line
typedef struct { ValueType type; union { bool boolean; double number; } as; } Value;
typedef struct {
There’s a field for the type tag, and then a second field containing the union of all of the underlying values. On a 64-bit machine with a typical C compiler, the layout looks like this: 有一个字段用作类型标签,然后是第二个字段,一个包含所有底层值的联合体。在使用典型的C语言编译器的64位机器上,布局看起来如下:
The four-byte type tag comes first, then the union. Most architectures prefer values be aligned to their size. Since the union field contains an eight-byte double, the compiler adds four bytes of padding after the type field to keep that double on the nearest eight-byte boundary. That means we’re effectively spending eight bytes on the type tag, which only needs to represent a number between zero and three. We could stuff the enum in a smaller size, but all that would do is increase the padding. 首先是4字节的类型标签,然后是联合体。大多数体系结构都喜欢将值与它们的字长对齐。由于联合体字段中包含一个8字节的double值,所以编译器在类型字段后添加了4个字节的填充,以使该double值保持在最近的8字节边界上。这意味着我们实际在类型标签上花费了8个字节,而它只需要表示0到3之间的数字。我们可以把枚举放在一个占位更少的变量中,但这样做只会增加填充量。
So our Values are 16 bytes, which seems a little large. We’ll improve it later. In the meantime, they’re still small enough to store on the C stack and pass around by value. Lox’s semantics allow that because the only types we support so far are immutable. If we pass a copy of a Value containing the number three to some function, we don’t need to worry about the caller seeing modifications to the value. You can’t “modify” three. It’s three forever. 所以我们的Value是16个字节,这似乎有点大。我们稍后会改进它。同时,它们也足够小,可以存储在C语言的堆栈中,并按值传递。Lox的语义允许这样做,因为到目前为止我们只支持不可变类型。如果我们把一个包含数字3的Value的副本传递给某个函数,我们不需要担心调用者会看到对该值的修改。你不能“修改”3,它永远都是3。
18 . 2Lox的值和C语言的值
That’s our new value representation, but we aren’t done. Right now, the rest of
clox assumes Value is an alias for double
. We have code that does a straight C
cast from one to the other. That code is all broken now. So sad.
这就是我们新的值表示形式,但是我们还没有做完。现在,clox的其余部分都假定了Value是double
的别名。我们有一些代码是直接用C语言将一个值转换为另一个值。这些代码现在都被破坏了,好伤心。
With our new representation, a Value can contain a double, but it’s not equivalent to it. There is a mandatory conversion step to get from one to the other. We need to go through the code and insert those conversions to get clox working again. 在我们新的表示形式中,Value可以包含一个double值,但它并不等同于double类型。有一个强制性的转换步骤可以实现从一个值到另一个值的转换。我们需要遍历代码并插入这些转换步骤,以使clox重新工作。
We’ll implement these conversions as a handful of macros, one for each type and operation. First, to promote a native C value to a clox Value: 我们会用少量的宏来实现这些转换,每个宏对应一个类型和操作。首先,将原生的C值转换为clox Value:
} Value;
add after struct Value
#define BOOL_VAL(value) ((Value){VAL_BOOL, {.boolean = value}}) #define NIL_VAL ((Value){VAL_NIL, {.number = 0}}) #define NUMBER_VAL(value) ((Value){VAL_NUMBER, {.number = value}})
typedef struct {
Each one of these takes a C value of the appropriate type and produces a Value that has the correct type tag and contains the underlying value. This hoists statically typed values up into clox’s dynamically typed universe. In order to do anything with a Value, though, we need to unpack it and get the C value back out. 其中每个宏都接收一个适当类型的C值,并生成一个Value,其具有正确类型标签并包含底层的值。这就把静态类型的值提升到了clox的动态类型的世界。但是为了能对Value做任何操作,我们需要将其拆包并取出对应的C值。
} Value;
add after struct Value
#define AS_BOOL(value) ((value).as.boolean) #define AS_NUMBER(value) ((value).as.number)
#define BOOL_VAL(value) ((Value){VAL_BOOL, {.boolean = value}})
These macros go in the opposite direction. Given a Value of the right type, they unwrap it and return the corresponding raw C value. The “right type” part is important! These macros directly access the union fields. If we were to do something like: 这些宏的作用是反方向的。给定一个正确类型的Value,它们会将其解包并返回对应的原始C值。“正确类型”很重要!这些宏会直接访问联合体字段。如果我们要这样做:
Value value = BOOL_VAL(true); double number = AS_NUMBER(value);
Then we may open a smoldering portal to the Shadow Realm. It’s not safe to use
any of the AS_
macros unless we know the Value contains the appropriate type.
To that end, we define a last few macros to check a Value’s type.
那我们可能会打开一个通往暗影王国的阴燃之门。除非我们知道Value包含适当的类型,否则使用任何的AS_
宏都是不安全的。为此,我们定义最后几个宏来检查Value的类型。
} Value;
add after struct Value
#define IS_BOOL(value) ((value).type == VAL_BOOL) #define IS_NIL(value) ((value).type == VAL_NIL) #define IS_NUMBER(value) ((value).type == VAL_NUMBER)
#define AS_BOOL(value) ((value).as.boolean)
These macros return true
if the Value has that
type. Any time we call one of the AS_
macros, we need to guard it behind a
call to one of these first. With these eight macros, we can now safely shuttle
data between Lox’s dynamic world and C’s static one.
如果Value具有对应类型,这些宏会返回true
。每当我们调用一个AS_
宏时,我们都需要保证首先调用了这些宏。有了这8个宏,我们现在可以安全地在Lox的动态世界和C的静态世界之间传输数据了。
18 . 3动态类型数字
We’ve got our value representation and the tools to convert to and from it. All that’s left to get clox running again is to grind through the code and fix every place where data moves across that boundary. This is one of those sections of the book that isn’t exactly mind-blowing, but I promised I’d show you every single line of code, so here we are. 我们已经有了值的表示形式和转换的工具。要想让clox重新运行起来,剩下的工作就是仔细检查代码,修复每个数据跨边界传递的地方。这是本书中不太让人兴奋的章节之一,但我保证会给你展示每一行代码,所以我们开始吧。
The first values we create are the constants generated when we compile number literals. After we convert the lexeme to a C double, we simply wrap it in a Value before storing it in the constant table. 我们创建的第一个值是在编译数值字面量时生成的常量。在我们将词素转换为C语言的double之后,我们简单地将其包装在一个Value中,然后再存储到常量表中。
double value = strtod(parser.previous.start, NULL);
in number()
replace 1 line
emitConstant(NUMBER_VAL(value));
}
Over in the runtime, we have a function to print values. 在运行时,我们有一个函数来打印值。
void printValue(Value value) {
in printValue()
replace 1 line
printf("%g", AS_NUMBER(value));
}
Right before we send the Value to printf()
, we unwrap it and extract the
double value. We’ll revisit this function shortly to add the other types, but
let’s get our existing code working first.
在我们将Value发送给printf()
之前,我们将其拆装并提取出double值。我们很快会重新回顾这个函数并添加其它类型,但是我们先让现有的代码工作起来。
18 . 3 . 1一元取负与运行时错误
The next simplest operation is unary negation. It pops a value off the stack, negates it, and pushes the result. Now that we have other types of values, we can’t assume the operand is a number anymore. The user could just as well do: 接下来最简单的操作是一元取负。它会从栈中弹出一个值,对其取负,并将结果压入栈。现在我们有了其它类型的值,我们不能再假设操作数是一个数字。用户也可以这样做:
print -false; // Uh...
We need to handle that gracefully, which means it’s time for runtime errors. Before performing an operation that requires a certain type, we need to make sure the Value is that type. 我们需要优雅地处理这个问题,这意味着是时候讨论运行时错误了。在执行需要特定类型的操作之前,我们需要确保Value是该类型。
For unary negation, the check looks like this: 对于一元取负来说,检查是这样的:
case OP_DIVIDE: BINARY_OP(/); break;
in run()
replace 1 line
case OP_NEGATE: if (!IS_NUMBER(peek(0))) { runtimeError("Operand must be a number."); return INTERPRET_RUNTIME_ERROR; } push(NUMBER_VAL(-AS_NUMBER(pop()))); break;
case OP_RETURN: {
First, we check to see if the Value on top of the stack is a number. If it’s not, we report the runtime error and stop the interpreter. Otherwise, we keep going. Only after this validation do we unwrap the operand, negate it, wrap the result and push it. 首先,我们检查栈顶的Value是否是一个数字。如果不是,则报告运行时错误并停止解释器。否则,我们就继续运行。只有在验证之后,我们才会拆装操作数,取负,将结果封装并压入栈。
To access the Value, we use a new little function. 为了访问Value,我们使用一个新的小函数。
add after pop()
static Value peek(int distance) { return vm.stackTop[-1 - distance]; }
It returns a Value from the stack but doesn’t pop it.
The distance
argument is how far down from the top of the stack to look: zero
is the top, one is one slot down, etc.
它从堆栈中返回一个Value,但是并不弹出它。distance
参数是指要从堆栈顶部向下看多远:0是栈顶,1是下一个槽,以此类推。
We report the runtime error using a new function that we’ll get a lot of mileage out of over the remainder of the book. 我们使用一个新函数来报告运行时错误,在本书的剩余部分,我们会从中得到很多的好处。
add after resetStack()
static void runtimeError(const char* format, ...) { va_list args; va_start(args, format); vfprintf(stderr, format, args); va_end(args); fputs("\n", stderr); size_t instruction = vm.ip - vm.chunk->code - 1; int line = vm.chunk->lines[instruction]; fprintf(stderr, "[line %d] in script\n", line); resetStack(); }
You’ve certainly called variadic functions—ones that take a varying number
of arguments—in C before: printf()
is one. But you may not have defined
your own. This book isn’t a C tutorial, so I’ll
skim over it here, but basically the ...
and va_list
stuff let us pass an
arbitrary number of arguments to runtimeError()
. It forwards those on to
vfprintf()
, which is the flavor of printf()
that takes an explicit
va_list
.
你以前肯定在C语言中调用过变参函数—接受不同数量参数的函数:printf()
就是其中之一。但你可能还没定义过自己的变参函数。这本书不是C语言教程,所以我在这里略过了,但是基本上是...
和va_list
让我们可以向runtimeError()
传递任意数量的参数。它将这些参数转发给vfprintf()
,这是printf()
的一个变体,需要一个显式地va_list
。
Callers can pass a format string to runtimeError()
followed by a number of
arguments, just like they can when calling printf()
directly. runtimeError()
then formats and prints those arguments. We won’t take advantage of that in this
chapter, but later chapters will produce formatted runtime error messages that
contain other data.
调用者可以向runtimeError()
传入一个格式化字符串,后跟一些参数,就像他们直接调用printf()
一样。然后runtimeError()
格式化并打印这些参数。在本章中我们不会利用这一点,但后面的章节中将生成包含其它数据的格式化运行时错误信息。
After we show the hopefully helpful error message, we tell the user which line of their code was being executed when the error occurred. Since we left the tokens behind in the compiler, we look up the line in the debug information compiled into the chunk. If our compiler did its job right, that corresponds to the line of source code that the bytecode was compiled from. 在显示了希望有帮助的错误信息之后,我们还会告诉用户,当错误发生时正在执行代码中的哪一行。因为我们在编译器中留下了标识,所以我们可以从编译到字节码块中的调试信息中查找行号。如果我们的编译器正确完成了它的工作,就能对应到字节码被编译出来的那一行源代码。
We look into the chunk’s debug line array using the current bytecode instruction
index minus one. That’s because the interpreter advances past each instruction
before executing it. So, at the point that we call runtimeError()
, the failed
instruction is the previous one.
我们使用当前字节码指令索引减1来查看字节码块的调试行数组。这是因为解释器在之前每条指令之前都会向前推进。所以,当我们调用 runtimeError()
,失败的指令就是前一条。
In order to use va_list
and the macros for working with it, we need to bring
in a standard header.
为了使用va_list
和相关的宏,我们需要引入一个标准头文件。
add to top of file
#include <stdarg.h>
#include <stdio.h>
With this, our VM can not only do the right thing when we negate numbers (like it used to before we broke it), but it also gracefully handles erroneous attempts to negate other types (which we don’t have yet, but still). 有了它,我们的虚拟机不仅可以在对数字取负时正确执行(原本就会这样做),而且还可以优雅地处理对其它类型取负的错误尝试(目前还没有,但仍然存在)。
18 . 3 . 2二元数字运算符
We have our runtime error machinery in place now, so fixing the binary operators
is easier even though they’re more complex. We support four binary operators
today: +
, -
, *
, and /
. The only difference between them is which
underlying C operator they use. To minimize redundant code between the four
operators, we wrapped up the commonality in a big preprocessor macro that takes
the operator token as a parameter.
我们现在已经有了运行时错误机制,所以修复二元运算符更容易,尽管它们更复杂。现在我们支持四种二元运算符:+
、-
、*
和/
。它们之间唯一的区别就是使用的是哪种底层C运算符。为了尽量减少这四个运算符之间的冗余代码,我们将它们的共性封装在一个大的预处理宏中,该宏以运算符标识作为参数。
That macro seemed like overkill a few chapters ago, but we get the benefit from it today. It lets us add the necessary type checking and conversions in one place. 这个宏在前几章中似乎是多余的,但现在我们却从中受益。它让我们可以在某个地方添加必要的类型检查和转换。
#define READ_CONSTANT() (vm.chunk->constants.values[READ_BYTE()])
in run()
replace 6 lines
#define BINARY_OP(valueType, op) \ do { \ if (!IS_NUMBER(peek(0)) || !IS_NUMBER(peek(1))) { \ runtimeError("Operands must be numbers."); \ return INTERPRET_RUNTIME_ERROR; \ } \ double b = AS_NUMBER(pop()); \ double a = AS_NUMBER(pop()); \ push(valueType(a op b)); \ } while (false)
for (;;) {
Yeah, I realize that’s a monster of a macro. It’s not what I’d normally consider good C practice, but let’s roll with it. The changes are similar to what we did for unary negate. First, we check that the two operands are both numbers. If either isn’t, we report a runtime error and yank the ejection seat lever. 是的,我知道这是一个巨大的宏。这不是我通常认为的好的C语言实践,但我们还是用它吧。这些调整与我们对一元取负所做的相似。首先,我们检查两个操作数是否都是数字。如果其中一个不是,我们就报告一个运行时错误,并拉下弹射座椅手柄。
If the operands are fine, we pop them both and unwrap them. Then we apply the
given operator, wrap the result, and push it back on the stack. Note that we
don’t wrap the result by directly using NUMBER_VAL()
. Instead, the wrapper to
use is passed in as a macro parameter. For our
existing arithmetic operators, the result is a number, so we pass in the
NUMBER_VAL
macro.
如果操作数都没有问题,我们就把它们都弹出栈并进行拆装。然后我们应用给定的运算符,包装结果并将其压回栈中。注意,我们没有直接使用NUMBER_VAL()
来包装结果。相反,我们要使用的包装器是作为宏参数传入的。对于我们现有的数字运算符来说,结果是一个数字,所以我们传入NUMBER_VAL
宏。
}
in run()
replace 4 lines
case OP_ADD: BINARY_OP(NUMBER_VAL, +); break; case OP_SUBTRACT: BINARY_OP(NUMBER_VAL, -); break; case OP_MULTIPLY: BINARY_OP(NUMBER_VAL, *); break; case OP_DIVIDE: BINARY_OP(NUMBER_VAL, /); break;
case OP_NEGATE:
Soon, I’ll show you why we made the wrapping macro an argument. 很快,我就会告诉你为什么我们要将包装宏作为参数。
18 . 4两个新类型
All of our existing clox code is back in working order. Finally, it’s time to add some new types. We’ve got a running numeric calculator that now does a number of pointless paranoid runtime type checks. We can represent other types internally, but there’s no way for a user’s program to ever create a Value of one of those types. 我们现有的所有clox代码都恢复正常工作了。最后,是时候添加一些新类型了。我们有一个正在运行的数字计算器,它现在做了一些毫无意义的偏执的运行时类型检查。我们可以在内部表示其它类型,但用户的程序无法创建这些类型的Value。
Not until now, that is. We’ll start by adding compiler support for the three new
literals: true
, false
, and nil
. They’re all pretty simple, so we’ll do all
three in a single batch.
现在还不能。首先,我们向编译器添加对三个新字面量的支持:true
、false
、nil
。它们都很简单,所以我们一次性完成这三个。
With number literals, we had to deal with the fact that there are billions of
possible numeric values. We attended to that by storing the literal’s value in
the chunk’s constant table and emitting a bytecode instruction that simply
loaded that constant. We could do the same thing for the new types. We’d store,
say, true
, in the constant table, and use an OP_CONSTANT
to read it out.
对于数字字面量,我们要面对这样一个事实:有数十亿个可能的数字值。为此,我们将字面量的值保存在字节码块的常量表中,并生成一个加载该常量的字节码指令。我们可以对这些新类型做同样的事。我们在常量表中存储值,比如true
,并使用OP_CONSTANT
来读取它。
But given that there are literally (heh) only three possible values we need to worry about with these new types, it’s gratuitous—and slow!—to waste a two-byte instruction and a constant table entry on them. Instead, we’ll define three dedicated instructions to push each of these literals on the stack. 但是考虑到这些新类型实际上只有三种可能的值,这样做是没有必要的—而且速度很慢!—浪费了一个两字节的指令和常量表中的一个项。相反,我们会定义三个专用指令来将这些字面量压入栈中。
OP_CONSTANT,
in enum OpCode
OP_NIL, OP_TRUE, OP_FALSE,
OP_ADD,
Our scanner already treats true
, false
, and nil
as keywords, so we can
skip right to the parser. With our table-based Pratt parser, we just need to
slot parser functions into the rows associated with those keyword token types.
We’ll use the same function in all three slots. Here:
我们的扫描器已经将true
、false
和nil
视为关键字,所以我们可以直接调到解析器。对于我们这个基于表格的Pratt解析器,只需要将解析器函数插入到与这些关键字标识类型相对应的行中。我们会在三个槽中使用相同的函数。这里:
[TOKEN_ELSE] = {NULL, NULL, PREC_NONE},
replace 1 line
[TOKEN_FALSE] = {literal, NULL, PREC_NONE},
[TOKEN_FOR] = {NULL, NULL, PREC_NONE},
Here: 这里:
[TOKEN_THIS] = {NULL, NULL, PREC_NONE},
replace 1 line
[TOKEN_TRUE] = {literal, NULL, PREC_NONE},
[TOKEN_VAR] = {NULL, NULL, PREC_NONE},
And here: 还有这里:
[TOKEN_IF] = {NULL, NULL, PREC_NONE},
replace 1 line
[TOKEN_NIL] = {literal, NULL, PREC_NONE},
[TOKEN_OR] = {NULL, NULL, PREC_NONE},
When the parser encounters false
, nil
, or true
, in prefix position, it
calls this new parser function:
当解析器在前缀位置遇到false
、nil
或 true
时,它会调用这个新的解析器函数:
add after binary()
static void literal() { switch (parser.previous.type) { case TOKEN_FALSE: emitByte(OP_FALSE); break; case TOKEN_NIL: emitByte(OP_NIL); break; case TOKEN_TRUE: emitByte(OP_TRUE); break; default: return; // Unreachable. } }
Since parsePrecedence()
has already consumed the keyword token, all we need to
do is output the proper instruction. We figure that
out based on the type of token we parsed. Our front end can now compile Boolean
and nil literals to bytecode. Moving down the execution pipeline, we reach the
interpreter.
因为parsePrecedence()
已经消耗了关键字标识,我们需要做的就是输出正确的指令。我们根据解析出的标识的类型来确定指令。我们的前端现在可以将布尔值和nil
字面量编译为字节码。沿着执行管道向下移动,我们就到了解释器。
case OP_CONSTANT: { Value constant = READ_CONSTANT(); push(constant); break; }
in run()
case OP_NIL: push(NIL_VAL); break; case OP_TRUE: push(BOOL_VAL(true)); break; case OP_FALSE: push(BOOL_VAL(false)); break;
case OP_ADD: BINARY_OP(NUMBER_VAL, +); break;
This is pretty self-explanatory. Each instruction summons the appropriate value and pushes it onto the stack. We shouldn’t forget our disassembler either. 这一点是不言而喻的。每条指令都会召唤出相应的值并将其压入堆栈。我们也不能忘记反汇编程序。
case OP_CONSTANT: return constantInstruction("OP_CONSTANT", chunk, offset);
in disassembleInstruction()
case OP_NIL: return simpleInstruction("OP_NIL", offset); case OP_TRUE: return simpleInstruction("OP_TRUE", offset); case OP_FALSE: return simpleInstruction("OP_FALSE", offset);
case OP_ADD:
With this in place, we can run this Earth-shattering program: 有了这些,我们就可以运行这个惊天动地的程序:
true
Except that when the interpreter tries to print the result, it blows up. We need
to extend printValue()
to handle the new types too:
只是当解释器试图打印结果时,就崩溃了。我们也需要扩展printValue()
来处理新类型:
void printValue(Value value) {
in printValue()
replace 1 line
switch (value.type) { case VAL_BOOL: printf(AS_BOOL(value) ? "true" : "false"); break; case VAL_NIL: printf("nil"); break; case VAL_NUMBER: printf("%g", AS_NUMBER(value)); break; }
}
There we go! Now we have some new types. They just aren’t very useful yet. Aside
from the literals, you can’t really do anything with them. It will be a while
before nil
comes into play, but we can start putting Booleans to work in the
logical operators.
我们继续!现在我们有了一些新的类型,只是它们目前还不是很有用。除了字面量之外,你无法真正对其做任何事。还需要一段时间nil
才会发挥作用,但我们可以先让布尔值在逻辑运算符中发挥作用。
18 . 4 . 1逻辑非和false
The simplest logical operator is our old exclamatory friend unary not. 最简单的逻辑运算符是我们充满感叹意味的老朋友一元取非。
print !true; // "false"
This new operation gets a new instruction. 这个新操作会有一条新指令。
OP_DIVIDE,
in enum OpCode
OP_NOT,
OP_NEGATE,
We can reuse the unary()
parser function we wrote for unary negation to
compile a not expression. We just need to slot it into the parsing table.
我们可以重用为一元取负所写的解析函数来编译一个逻辑非表达式。我们只需要将其插入到解析表格中。
[TOKEN_STAR] = {NULL, binary, PREC_FACTOR},
replace 1 line
[TOKEN_BANG] = {unary, NULL, PREC_NONE},
[TOKEN_BANG_EQUAL] = {NULL, NULL, PREC_NONE},
Because I knew we were going to do this, the unary()
function already has a
switch on the token type to figure out which bytecode instruction to output. We
merely add another case.
因为我之前已知道我们要这样做,unary()
函数已经有了关于标识类型的switch语句,来判断要输出哪个字节码指令。我们只需要增加一个分支即可。
switch (operatorType) {
in unary()
case TOKEN_BANG: emitByte(OP_NOT); break;
case TOKEN_MINUS: emitByte(OP_NEGATE); break; default: return; // Unreachable. }
That’s it for the front end. Let’s head over to the VM and conjure this instruction into life. 前端就这样了。让我们去虚拟机那里,并将这个指令变成现实。
case OP_DIVIDE: BINARY_OP(NUMBER_VAL, /); break;
in run()
case OP_NOT: push(BOOL_VAL(isFalsey(pop()))); break;
case OP_NEGATE:
Like our previous unary operator, it pops the one operand, performs the
operation, and pushes the result. And, as we did there, we have to worry about
dynamic typing. Taking the logical not of true
is easy, but there’s nothing
preventing an unruly programmer from writing something like this:
跟之前的一元运算符一样,它会弹出一个操作数,执行操作,并将结果压入栈中。正如我们所做的那样,我们必须考虑动态类型。对true
进行逻辑取非很容易,但没什么能阻止一个不守规矩的程序员写出这样的东西:
print !nil;
For unary minus, we made it an error to negate anything that isn’t a number. But Lox, like most scripting languages, is more
permissive when it comes to !
and other contexts where a Boolean is expected.
The rule for how other types are handled is called “falsiness”, and we implement
it here:
对于一元取负,我们把对任何非数字的东西进行取负当作一个错误。但是Lox,像大多数脚本语言一样,在涉及到!
和其它期望出现布尔值的情况下,是比较宽容的。处理其它类型的规则被称为“falsiness”,我们在这里实现它:
add after peek()
static bool isFalsey(Value value) { return IS_NIL(value) || (IS_BOOL(value) && !AS_BOOL(value)); }
Lox follows Ruby in that nil
and false
are falsey and every other value
behaves like true
. We’ve got a new instruction we can generate, so we also
need to be able to ungenerate it in the disassembler.
Lox遵循Ruby的规定,nil
和false
是假的,其它的值都表现为true
。我们已经有了一条可以生成的新指令,所以我们也需要能够在反汇编程序中反生成它。
case OP_DIVIDE: return simpleInstruction("OP_DIVIDE", offset);
in disassembleInstruction()
case OP_NOT: return simpleInstruction("OP_NOT", offset);
case OP_NEGATE:
18 . 4 . 2相等与比较运算符
That wasn’t too bad. Let’s keep the momentum going and knock out the equality
and comparison operators too: ==
, !=
, <
, >
, <=
, and >=
. That covers
all of the operators that return Boolean results except the logical operators
and
and or
. Since those need to short-circuit (basically do a little
control flow) we aren’t ready for them yet.
还不算太糟。让我们继续保持这种势头,搞定相等与比较运算符: ==
,!=
,<
,>
,<=
和>=
。这涵盖了所有会返回布尔值的运算符,除了逻辑运算符and
和or
。因为这些运算符需要短路计算(基本上是做一个小小的控制流),我们还没准备好。
Here are the new instructions for those operators: 下面是这些运算符对应的新指令:
OP_FALSE,
in enum OpCode
OP_EQUAL, OP_GREATER, OP_LESS,
OP_ADD,
Wait, only three? What about !=
, <=
, and >=
? We could create instructions
for those too. Honestly, the VM would execute faster if we did, so we should
do that if the goal is performance.
等一下,只有三个?那!=
、<=
和>=
呢?我们也可以为它们创建指令。老实说,如果我们这样做,虚拟机的执行速度会更快。所以如果我们的目标是追求性能,那就应该这样做。
But my main goal is to teach you about bytecode compilers. I want you to start internalizing the idea that the bytecode instructions don’t need to closely follow the user’s source code. The VM has total freedom to use whatever instruction set and code sequences it wants as long as they have the right user-visible behavior. 但我的主要目标是教你有关字节码编译器的知识。我想要你开始内化一个想法:字节码指令不需要紧跟用户的源代码。虚拟机可以完全自由地使用它想要的任何指令集和代码序列,只要它们有正确的用户可见的行为。
The expression a != b
has the same semantics as !(a == b)
, so the compiler
is free to compile the former as if it were the latter. Instead of a dedicated
OP_NOT_EQUAL
instruction, it can output an OP_EQUAL
followed by an OP_NOT
.
Likewise, a <= b
is the same as !(a > b)
and a >= b
is !(a < b)
. Thus, we only need three new instructions.
表达式a!=b
与!(a==b)
具有相同的语义,所以编译器可以自由地编译前者,就好像它是后者一样。它可以输出一条OP_EQUAL
指令,之后是一条OP_NOT
,而不是一条专用的OP_NOT_EQUAL
指令。同样地,a<=b
与!(a>b)
相同,而a>=b
与!(a<b)
相同,所以我们只需要三条新指令。
Over in the parser, though, we do have six new operators to slot into the parse
table. We use the same binary()
parser function from before. Here’s the row
for !=
:
不过,在解析器中,我们确实有6个新的操作符要加入到解析表中。我们使用与之前相同的binary()
解析函数。下面是!=
对应的行:
[TOKEN_BANG] = {unary, NULL, PREC_NONE},
replace 1 line
[TOKEN_BANG_EQUAL] = {NULL, binary, PREC_EQUALITY},
[TOKEN_EQUAL] = {NULL, NULL, PREC_NONE},
The remaining five operators are a little farther down in the table. 其余五个运算符在表的最下方。
[TOKEN_EQUAL] = {NULL, NULL, PREC_NONE},
replace 5 lines
[TOKEN_EQUAL_EQUAL] = {NULL, binary, PREC_EQUALITY}, [TOKEN_GREATER] = {NULL, binary, PREC_COMPARISON}, [TOKEN_GREATER_EQUAL] = {NULL, binary, PREC_COMPARISON}, [TOKEN_LESS] = {NULL, binary, PREC_COMPARISON}, [TOKEN_LESS_EQUAL] = {NULL, binary, PREC_COMPARISON},
[TOKEN_IDENTIFIER] = {NULL, NULL, PREC_NONE},
Inside binary()
we already have a switch to generate the right bytecode for
each token type. We add cases for the six new operators.
在binary()
中,我们已经有了一个switch语句,为每种标识类型生成正确的字节码。我们为这六个新运算符添加分支。
switch (operatorType) {
in binary()
case TOKEN_BANG_EQUAL: emitBytes(OP_EQUAL, OP_NOT); break; case TOKEN_EQUAL_EQUAL: emitByte(OP_EQUAL); break; case TOKEN_GREATER: emitByte(OP_GREATER); break; case TOKEN_GREATER_EQUAL: emitBytes(OP_LESS, OP_NOT); break; case TOKEN_LESS: emitByte(OP_LESS); break; case TOKEN_LESS_EQUAL: emitBytes(OP_GREATER, OP_NOT); break;
case TOKEN_PLUS: emitByte(OP_ADD); break;
The ==
, <
, and >
operators output a single instruction. The others output
a pair of instructions, one to evalute the inverse operation, and then an
OP_NOT
to flip the result. Six operators for the price of three instructions!
==
、<
和>
运算符输出单个指令。其它运算符则输出一对指令,一条用于计算逆运算,然后用OP_NOT
来反转结果。仅仅使用三种指令就表达出了六种运算符的效果!
That means over in the VM, our job is simpler. Equality is the most general operation. 这意味着在虚拟机中,我们的工作更简单了。相等是最普遍的操作。
case OP_FALSE: push(BOOL_VAL(false)); break;
in run()
case OP_EQUAL: { Value b = pop(); Value a = pop(); push(BOOL_VAL(valuesEqual(a, b))); break; }
case OP_ADD: BINARY_OP(NUMBER_VAL, +); break;
You can evaluate ==
on any pair of objects, even objects of different types.
There’s enough complexity that it makes sense to shunt that logic over to a
separate function. That function always returns a C bool
, so we can safely
wrap the result in a BOOL_VAL
. The function relates to Values, so it lives
over in the “value” module.
你可以对任意一对对象执行==
,即使这些对象是不同类型的。这有足够的复杂性,所以有必要把这个逻辑分流到一个单独的函数中。这个函数会一个C语言的bool
值,所以我们可以安全地把结果包装在一个BOLL_VAL
中。这个函数与Value有关,所以它位于“value”模块中。
} ValueArray;
add after struct ValueArray
bool valuesEqual(Value a, Value b);
void initValueArray(ValueArray* array);
And here’s the implementation: 下面是实现:
add after printValue()
bool valuesEqual(Value a, Value b) { if (a.type != b.type) return false; switch (a.type) { case VAL_BOOL: return AS_BOOL(a) == AS_BOOL(b); case VAL_NIL: return true; case VAL_NUMBER: return AS_NUMBER(a) == AS_NUMBER(b); default: return false; // Unreachable. } }
First, we check the types. If the Values have different types, they are definitely not equal. Otherwise, we unwrap the two Values and compare them directly. 首先,我们检查类型。如果两个Value的类型不同,它们肯定不相等。否则,我们就把这两个Value拆装并直接进行比较。
For each value type, we have a separate case that handles comparing the value
itself. Given how similar the cases are, you might wonder why we can’t simply
memcmp()
the two Value structs and be done with it. The problem is that
because of padding and different-sized union fields, a Value contains unused
bits. C gives no guarantee about what is in those, so it’s possible that two
equal Values actually differ in memory that isn’t used.
对于每一种值类型,我们都有一个单独的case分支来处理值本身的比较。考虑到这些分支的相似性,你可能会想,为什么我们不能简单地对两个Value结构体进行memcmp()
,然后就可以了。问题在于,因为填充以及联合体字段的大小不同,Value中会包含无用的比特位。C语言不能保证这些值是什么,所以两个相同的Value在未使用的内存中可能是完全不同的。
(You wouldn’t believe how much pain I went through before learning this fact.) (你无法想象在了解这个事实之前我经历了多少痛苦。)
Anyway, as we add more types to clox, this function will grow new cases. For now, these three are sufficient. The other comparison operators are easier since they work only on numbers. 总之,随着我们向clox中添加更多的类型,这个函数也会增加更多的case分支。就目前而言,这三个已经足够了。其它的比较运算符更简单,因为它们只处理数字。
push(BOOL_VAL(valuesEqual(a, b))); break; }
in run()
case OP_GREATER: BINARY_OP(BOOL_VAL, >); break; case OP_LESS: BINARY_OP(BOOL_VAL, <); break;
case OP_ADD: BINARY_OP(NUMBER_VAL, +); break;
We already extended the BINARY_OP
macro to handle operators that return
non-numeric types. Now we get to use that. We pass in BOOL_VAL
since the
result value type is Boolean. Otherwise, it’s no different from plus or minus.
我们已经扩展了BINARY_OP
宏,来处理返回非数字类型的运算符。现在我们要用到它了。因为结果值类型是布尔型,所以我们传入BOOL_VAL
。除此之外,这与加减运算没有区别。
As always, the coda to today’s aria is disassembling the new instructions. 与往常一样,今天的咏叹调的尾声是对新指令进行反汇编。
case OP_FALSE: return simpleInstruction("OP_FALSE", offset);
in disassembleInstruction()
case OP_EQUAL: return simpleInstruction("OP_EQUAL", offset); case OP_GREATER: return simpleInstruction("OP_GREATER", offset); case OP_LESS: return simpleInstruction("OP_LESS", offset);
case OP_ADD:
With that, our numeric calculator has become something closer to a general expression evaluator. Fire up clox and type in: 这样一来,我们的数字计算器就变得更接近于一个通用的表达式求值器。启动clox并输入:
!(5 - 4 > 3 * 2 == !nil)
OK, I’ll admit that’s maybe not the most useful expression, but we’re making progress. We have one missing built-in type with its own literal form: strings. Those are much more complex because strings can vary in size. That tiny difference turns out to have implications so large that we give strings their very own chapter. 好吧,我承认这可能不是最有用的表达式,但我们正在取得进展。我们还缺少一种自带字面量形式的内置类型:字符串。它们要复杂得多,因为字符串的大小可以不同。这个微小的差异会产生巨大的影响,以至于我们给字符串单独开了一章。
Challenges
-
We could reduce our binary operators even further than we did here. Which other instructions can you eliminate, and how would the compiler cope with their absence? 我们可以进一步简化二元操作符。还有哪些指令可以取消,编译器如何应对这些指令的缺失?
-
Conversely, we can improve the speed of our bytecode VM by adding more specific instructions that correspond to higher-level operations. What instructions would you define to speed up the kind of user code we added support for in this chapter? 相反,我们可以通过添加更多对应于高级操作的专用指令来提高字节码虚拟机的速度。你会定义什么指令来加速我们在本章中添加的那种用户代码?