Reading the Node.js Source to Understand the CJS Module System in Depth

Posted 2022-11-21 on the official blog of Alibaba's Taoxi technology team


This article explores the Node.js source code to build a detailed understanding of how CJS modules are loaded.

You probably already know how to load a module in Node.js:

const fs = require('fs');
const express = require('express');
const anotherModule = require('./another-module');

That's right: `require` is the API for loading CJS modules. But V8 itself has no CJS module system, so how does Node use `require` to find and load a module? We will answer that by reading the source.

The Node.js code we read comes from the v17.x line:

git HEAD: 881174e016d6c27b20c70111e6eae2296b6c6293

Code: https://github.com/nodejs/node/tree/881174e016d6c27b20c70111e6eae2296b6c6293

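Built-in modules

To understand how `require` works, we first need to see how built-in modules get loaded into Node. This covers public modules such as 'fs', 'path', and 'child_process', as well as internal modules that user code cannot reference. With the source checked out, we start reading at Node's startup.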

Node's `main` function is in src/node_main.cc. It starts a Node instance by calling the `node::Start` API:

src/node_main.cc source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/src/node_main.cc#L105

node::Start source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/src/node.cc#L1134

int Start(int argc, char** argv) {
  InitializationResult result = InitializeOncePerProcess(argc, argv);
  if (result.early_return) {
    return result.exit_code;
  }

  {
    Isolate::CreateParams params;
    const std::vector<size_t>* indices = nullptr;
    const EnvSerializeInfo* env_info = nullptr;
    bool use_node_snapshot =
        per_process::cli_options->per_isolate->node_snapshot;
    if (use_node_snapshot) {
      v8::StartupData* blob = NodeMainInstance::GetEmbeddedSnapshotBlob();
      if (blob != nullptr) {
        params.snapshot_blob = blob;
        indices = NodeMainInstance::GetIsolateDataIndices();
        env_info = NodeMainInstance::GetEnvSerializeInfo();
      }
    }
    uv_loop_configure(uv_default_loop(), UV_METRICS_IDLE_TIME);

    NodeMainInstance main_instance(&params,
                                   uv_default_loop(),
                                   per_process::v8_platform.Platform(),
                                   result.args,
                                   result.exec_args,
                                   indices);
    result.exit_code = main_instance.Run(env_info);
  }

  TearDownOncePerProcess();
  return result.exit_code;
}
This code sets up the event loop (the default libuv loop). It then creates a NodeMainInstance named main_instance and calls its Run method:

Run source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/src/node_main_instance.cc#L127

int NodeMainInstance::Run(const EnvSerializeInfo* env_info) {
  Locker locker(isolate_);
  Isolate::Scope isolate_scope(isolate_);
  HandleScope handle_scope(isolate_);

  int exit_code = 0;
  DeleteFnPtr<Environment, FreeEnvironment> env =
      CreateMainEnvironment(&exit_code, env_info);
  CHECK_NOT_NULL(env);

  Context::Scope context_scope(env->context());
  Run(&exit_code, env.get());
  return exit_code;
}

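The Run method calls CreateMainEnvironment to create and initialize the environment. The excerpt below is actually CreateEnvironment, the embedder API in src/api/environment.cc. CreateMainEnvironment performs the same essential steps for the main instance: it constructs an Environment and calls RunBootstrapping.

CreateMainEnvironment source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/src/node_main_instance.cc#L170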

Environment* CreateEnvironment(
    IsolateData* isolate_data,
    Local<Context> context,
    const std::vector<std::string>& args,
    const std::vector<std::string>& exec_args,
    EnvironmentFlags::Flags flags,
    ThreadId thread_id,
    std::unique_ptr<InspectorParentHandle> inspector_parent_handle) {
  Isolate* isolate = context->GetIsolate();
  HandleScope handle_scope(isolate);
  Context::Scope context_scope(context);
  // TODO(addaleax): This is a much better place for parsing per-Environment
  // options than the global parse call.
  Environment* env = new Environment(
      isolate_data, context, args, exec_args, nullptr, flags, thread_id);
#if HAVE_INSPECTOR
  if (inspector_parent_handle) {
    env->InitializeInspector(
        std::move(static_cast<InspectorParentHandleImpl*>(
            inspector_parent_handle.get())->impl));
  } else {
    env->InitializeInspector({});
  }
#endif

  if (env->RunBootstrapping().IsEmpty()) {
    FreeEnvironment(env);
    return nullptr;
  }

  return env;
}
It creates an Environment object named env and calls its RunBootstrapping method:

RunBootstrapping source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/src/node.cc#L398

MaybeLocal<Value> Environment::RunBootstrapping() {
  EscapableHandleScope scope(isolate_);

  CHECK(!has_run_bootstrapping_code());

  if (BootstrapInternalLoaders().IsEmpty()) {
    return MaybeLocal<Value>();
  }

  Local<Value> result;
  if (!BootstrapNode().ToLocal(&result)) {
    return MaybeLocal<Value>();
  }

  // Make sure that no request or handle is created during bootstrap -
  // if necessary those should be done in pre-execution.
  // Usually, doing so would trigger the checks present in the ReqWrap and
  // HandleWrap classes, so this is only a consistency check.
  CHECK(req_wrap_queue()->IsEmpty());
  CHECK(handle_wrap_queue()->IsEmpty());

  DoneBootstrapping();

  return scope.Escape(result);
}
BootstrapInternalLoaders carries out a crucial step in Node's module loading:

It wraps and executes internal/bootstrap/loaders.js and gets back three things. The first is `nativeModuleRequire`, the function that loads built-in JS modules. The second is `internalBinding`, which loads built-in C++ modules. The third is `NativeModule`, a small module system used only for built-in modules.

BootstrapInternalLoaders source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/src/node.cc#L298

internal/bootstrap/loaders.js source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/bootstrap/loaders.js#L326

nativeModuleRequire source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/bootstrap/loaders.js#L326

nativeModuleRequire source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/bootstrap/loaders.js#L332

internalBinding source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/bootstrap/loaders.js#L164

function nativeModuleRequire(id) {
  if (id === loaderId) {
    return loaderExports;
  }

  const mod = NativeModule.map.get(id);
  // Can't load the internal errors module from here, have to use a raw error.
  // eslint-disable-next-line no-restricted-syntax
  if (!mod) throw new TypeError(`Missing internal module '${id}'`);
  return mod.compileForInternalLoader();
}

const loaderExports = {
  internalBinding,
  NativeModule,
  require: nativeModuleRequire
};

return loaderExports;

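Note that this require function is used only to load built-in modules. User modules never go through it. This explains why printing require('module')._cache lists every user module but none of the built-ins such as fs: the two kinds of modules are loaded and cached in different ways.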
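You can observe the split between the two caches from user code. The sketch below writes a hypothetical throwaway file, `another-module.js`, into a temporary directory, so it assumes that directory is writable. It then requires that file and checks which entries show up in `Module._cache`. Keep in mind that `Module._cache` is underscore-prefixed and not a stable public API. The documented alias is `require.cache`.

```javascript
// Compare the user-module cache (Module._cache, exposed as require.cache)
// with built-in modules such as fs, which are tracked elsewhere.
const fs = require('fs');
const os = require('os');
const path = require('path');
const Module = require('module');

// Write a hypothetical throwaway user module into a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-cache-'));
const userFile = path.join(dir, 'another-module.js');
fs.writeFileSync(userFile, 'module.exports = { answer: 42 };\n');

// This goes through Module._load, so it is cached under its resolved filename.
const { answer } = require(userFile);
const cacheKey = require.resolve(userFile);

const cachedKeys = Object.keys(Module._cache);
const userModuleCached = cachedKeys.includes(cacheKey);
// Built-ins such as fs, os, path, and module were required above,
// yet none of them appears as a key in Module._cache.
const builtinsInCache = Module.builtinModules.filter(
  (id) => cachedKeys.includes(id) || cachedKeys.includes(`node:${id}`)
);

console.log(answer, userModuleCached, builtinsInCache); // 42 true []

fs.rmSync(dir, { recursive: true, force: true });
```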

User modules

Now let's turn back to NodeMainInstance::Run:

NodeMainInstance::Run source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/src/node_main_instance.cc#L127

int NodeMainInstance::Run(const EnvSerializeInfo* env_info) {
  Locker locker(isolate_);
  Isolate::Scope isolate_scope(isolate_);
  HandleScope handle_scope(isolate_);

  int exit_code = 0;
  DeleteFnPtr<Environment, FreeEnvironment> env =
      CreateMainEnvironment(&exit_code, env_info);
  CHECK_NOT_NULL(env);

  Context::Scope context_scope(env->context());
  Run(&exit_code, env.get());
  return exit_code;
}
CreateMainEnvironment has now given us an env object. This Environment instance already has a module system, NativeModule, for managing built-in modules.

Execution then moves into the other overload of Run:

Overload source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/src/node_main_instance.cc#L142

void NodeMainInstance::Run(int* exit_code, Environment* env) {
  if (*exit_code == 0) {
    LoadEnvironment(env, StartExecutionCallback{});

    *exit_code = SpinEventLoop(env).FromMaybe(1);
  }

  ResetStdio();

  // TODO(addaleax): Neither NODE_SHARED_MODE nor HAVE_INSPECTOR really
  // make sense here.
#if HAVE_INSPECTOR && defined(__POSIX__) && !defined(NODE_SHARED_MODE)
  struct sigaction act;
  memset(&act, 0, sizeof(act));
  for (unsigned nr = 1; nr < kMaxSignal; nr += 1) {
    if (nr == SIGKILL || nr == SIGSTOP || nr == SIGPROF)
      continue;
    act.sa_handler = (nr == SIGPIPE) ? SIG_IGN : SIG_DFL;
    CHECK_EQ(0, sigaction(nr, &act, nullptr));
  }
#endif

#if defined(LEAK_SANITIZER)
  __lsan_do_leak_check();
#endif
}

This overload calls LoadEnvironment:

LoadEnvironment source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/src/api/environment.cc#L403

MaybeLocal<Value> LoadEnvironment(
    Environment* env,
    StartExecutionCallback cb) {
  env->InitializeLibuv();
  env->InitializeDiagnostics();

  return StartExecution(env, cb);
}
LoadEnvironment then calls StartExecution:

StartExecution source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/src/node.cc#L455

MaybeLocal<Value> StartExecution(Environment* env, StartExecutionCallback cb) {
  // Other execution modes are omitted. We only follow the `node index.js`
  // case, which is all we need to understand the module system.
  if (!first_argv.empty() && first_argv != "-") {
    return StartExecution(env, "internal/main/run_main_module");
  }
  // ...
}

In the call StartExecution(env, "internal/main/run_main_module"), Node wraps the code of lib/internal/main/run_main_module.js in a function. It passes in the require function that loaders.js exported earlier and then runs the wrapped code:

lib/internal/main/run_main_module.js source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/main/run_main_module.js

'use strict';

const {
  prepareMainThreadExecution
} = require('internal/bootstrap/pre_execution');

prepareMainThreadExecution(true);

markBootstrapComplete();

// Note: this loads the module through the ESM loader if the module is
// determined to be an ES module. This hangs from the CJS module loader
// because we currently allow monkey-patching of the module loaders
// in the preloaded scripts through require('module').
// runMain here might be monkey-patched by users in --require.
// XXX: the monkey-patchability here should probably be deprecated.
require('internal/modules/cjs/loader').Module.runMain(process.argv[1]);

As pseudocode, "wrapping the code in a function and passing in require" looks like this:

(function(require, /* other parameters */) {
  // The contents of internal/main/run_main_module.js go here
})(nativeModuleRequire, /* other arguments */);

In other words, run_main_module.js uses the built-in require to load the Module object exported by lib/internal/modules/cjs/loader.js, and then calls its `runMain` method. However, loader.js itself defines no runMain function. The method is attached to the Module object in lib/internal/bootstrap/pre_execution.js:

lib/internal/modules/cjs/loader.js source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/modules/cjs/loader.js#L172

lib/internal/bootstrap/pre_execution.js source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/bootstrap/pre_execution.js#L428

function initializeCJSLoader() {
  const CJSLoader = require('internal/modules/cjs/loader');
  if (!noGlobalSearchPaths) {
    CJSLoader.Module._initPaths();
  }
  // TODO(joyeecheung): deprecate this in favor of a proper hook?
  CJSLoader.Module.runMain =
    require('internal/modules/run_main').executeUserEntryPoint;
}
The executeUserEntryPoint method is defined in lib/internal/modules/run_main.js:

lib/internal/modules/run_main.js source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/modules/run_main.js#L74

function executeUserEntryPoint(main = process.argv[1]) {
  const resolvedMain = resolveMainPath(main);
  const useESMLoader = shouldUseESMLoader(resolvedMain);
  if (useESMLoader) {
    runMainESM(resolvedMain || main);
  } else {
    // Module._load is the monkey-patchable CJS module loader.
    Module._load(main, null, true);
  }
}
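In executeUserEntryPoint, the parameter main is the entry file passed on the command line (index.js in `node index.js`). The else branch shows that index.js, as a CJS module, is loaded by Module._load. So what does _load do? It is the most important function in CJS module loading and is worth reading carefully: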

// `_load` checks the cache for the requested file:
// 1. If the module is already cached, return the cached exports object.
// 2. If it is a built-in module, get its exports by calling
//    `NativeModule.prototype.compileForPublicLoader()`. compileForPublicLoader
//    uses an allowlist, so only the exports of public built-in modules are returned.
// 3. Otherwise, create a new Module object, save it to the cache, load the
//    file through it, and return its exports.

// request: the requested module, e.g. `fs`, `./another-module`, '@pipcook/core'
// parent: the parent module. If `a.js` calls `require('b.js')`, request is 'b.js'
//         and parent is the Module object for `a.js`
// isMain: `true` for the entry file, `false` for every other module
Module._load = function(request, parent, isMain) {
  let relResolveCacheIdentifier;
  if (parent) {
    debug('Module._load REQUEST %s parent: %s', request, parent.id);
    // relativeResolveCache caches resolved paths. When modules in the parent's
    // directory request this module again, the real path is found directly
    // instead of being searched for via _resolveFilename.
    relResolveCacheIdentifier = `${parent.path}\x00${request}`;
    const filename = relativeResolveCache[relResolveCacheIdentifier];
    if (filename !== undefined) {
      const cachedModule = Module._cache[filename];
      if (cachedModule !== undefined) {
        updateChildren(parent, cachedModule, true);
        if (!cachedModule.loaded)
          return getExportsForCircularRequire(cachedModule);
        return cachedModule.exports;
      }
      delete relativeResolveCache[relResolveCacheIdentifier];
    }
  }
  // Resolve the module's file path; throws if the module cannot be found
  const filename = Module._resolveFilename(request, parent, isMain);
  // A `node:`-prefixed built-in module is loaded from `NativeModule`
  if (StringPrototypeStartsWith(filename, 'node:')) {
    // Slice 'node:' prefix
    const id = StringPrototypeSlice(filename, 5);

    const module = loadNativeModule(id, request);
    if (!module?.canBeRequiredByUsers) {
      throw new ERR_UNKNOWN_BUILTIN_MODULE(filename);
    }

    return module.exports;
  }
  // On a cache hit, push this module into the parent's `children` field
  const cachedModule = Module._cache[filename];
  if (cachedModule !== undefined) {
    updateChildren(parent, cachedModule, true);
    // Handle circular requires
    if (!cachedModule.loaded) {
      const parseCachedModule = cjsParseCache.get(cachedModule);
      if (!parseCachedModule || parseCachedModule.loaded)
        return getExportsForCircularRequire(cachedModule);
      parseCachedModule.loaded = true;
    } else {
      return cachedModule.exports;
    }
  }
  // Try to load it as a built-in module
  const mod = loadNativeModule(filename, request);
  if (mod?.canBeRequiredByUsers) return mod.exports;

  // Don't call updateChildren(), Module constructor already does.
  const module = cachedModule || new Module(filename, parent);

  if (isMain) {
    process.mainModule = module;
    module.id = '.';
  }
  // Add the module object to the cache
  Module._cache[filename] = module;
  if (parent !== undefined) {
    relativeResolveCache[relResolveCacheIdentifier] = filename;
  }

  // Try to load the module. If loading fails, remove the module object from
  // the cache and from the parent's `children`.
  let threw = true;
  try {
    module.load(filename);
    threw = false;
  } finally {
    if (threw) {
      delete Module._cache[filename];
      if (parent !== undefined) {
        delete relativeResolveCache[relResolveCacheIdentifier];
        const children = parent?.children;
        if (ArrayIsArray(children)) {
          const index = ArrayPrototypeIndexOf(children, module);
          if (index !== -1) {
            ArrayPrototypeSplice(children, index, 1);
          }
        }
      }
    } else if (module.exports &&
               !isProxy(module.exports) &&
               ObjectGetPrototypeOf(module.exports) ===
                 CircularRequirePrototypeWarningProxy) {
      ObjectSetPrototypeOf(module.exports, ObjectPrototype);
    }
  }
  // Return the exports object
  return module.exports;
};
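The cache-related core of `_load` can be distilled into a few lines. This is a minimal sketch, not Node's implementation. Filenames are assumed to be already resolved, built-ins are ignored, and the "file system" is a hypothetical in-memory `Map`. It keeps the three behaviors that matter most:

- a cache hit returns the cached exports;
- a new module is cached *before* its code runs, which is why a circular require sees partial exports;
- a module whose code throws is evicted from the cache.

```javascript
// Hypothetical in-memory "file system": resolved filename -> source text.
const files = new Map([
  ['/app/index.js', "const b = require('/app/b.js');\nmodule.exports = 'index sees ' + b.name;"],
  ['/app/b.js', "exports.name = 'b';"],
  ['/app/a.js', "exports.early = 1;\nconst c = require('/app/c.js');\nexports.late = 2;\nexports.cSawEarly = c.sawEarly;"],
  ['/app/c.js', "const a = require('/app/a.js');\nexports.sawEarly = a.early;\nexports.sawLate = a.late;"],
  ['/app/bad.js', "throw new Error('boom');"],
]);

const cache = Object.create(null); // plays the role of Module._cache

function load(filename) {
  // Cache hit: return the exports, which may still be partial during a circular require.
  const cached = cache[filename];
  if (cached !== undefined) return cached.exports;

  const source = files.get(filename);
  if (source === undefined) throw new Error(`Cannot find module '${filename}'`);

  // Cache miss: create the module and cache it *before* running its code.
  const mod = { id: filename, exports: {}, loaded: false };
  cache[filename] = mod;

  let threw = true;
  try {
    const wrapper = new Function('exports', 'require', 'module', source);
    wrapper.call(mod.exports, mod.exports, load, mod);
    mod.loaded = true;
    threw = false;
  } finally {
    // On failure, drop the half-initialized module so a later require retries.
    if (threw) delete cache[filename];
  }
  return mod.exports;
}

console.log(load('/app/index.js')); // index sees b
console.log(load('/app/a.js'));     // { early: 1, late: 2, cSawEarly: 1 }
```

In the circular case, c.js runs while a.js is only half done. It therefore sees `a.early` but not `a.late`. Node's real `_load` adds resolution, built-ins, `children` bookkeeping, and the circular-require warning proxy on top of this skeleton.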

Module.prototype.load carries out the actual loading of a module:

load source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/modules/cjs/loader.js#L963

Module.prototype.load = function(filename) {
  debug('load %j for module %j', filename, this.id);

  assert(!this.loaded);
  this.filename = filename;
  this.paths = Module._nodeModulePaths(path.dirname(filename));

  const extension = findLongestRegisteredExtension(filename);
  // allow .mjs to be overridden
  if (StringPrototypeEndsWith(filename, '.mjs') && !Module._extensions['.mjs'])
    throw new ERR_REQUIRE_ESM(filename, true);

  Module._extensions[extension](this, filename);
  this.loaded = true;

  const esmLoader = asyncESM.esmLoader;
  // Create module entry at load time to snapshot exports correctly
  const exports = this.exports;
  // Preemptively cache
  if ((module?.module === undefined ||
       module.module.getStatus() < kEvaluated) &&
      !esmLoader.cjsCache.has(this))
    esmLoader.cjsCache.set(this, exports);
};
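The `this.paths` line in `load` computes the list of `node_modules` directories that later bare-name lookups from this module will search. Below is a simplified, POSIX-only sketch of that computation based on my reading of `Module._nodeModulePaths`. `nodeModulePathsSketch` is a hypothetical name, and Windows drive and UNC handling is deliberately left out. The walk starts at the module's directory and goes up to the root, appending `node_modules` at each level, but it never appends `node_modules` to a directory that is itself named `node_modules`.

```javascript
const path = require('path');

// Simplified, POSIX-only sketch of the directory list stored in module.paths.
function nodeModulePathsSketch(from) {
  from = path.posix.resolve(from);
  if (from === '/') return ['/node_modules'];

  const parts = from.split('/').slice(1); // drop the empty segment before the leading '/'
  const paths = [];
  for (let i = parts.length; i > 0; i--) {
    // Skip levels whose own name is node_modules (no node_modules/node_modules).
    if (parts[i - 1] === 'node_modules') continue;
    paths.push('/' + parts.slice(0, i).join('/') + '/node_modules');
  }
  paths.push('/node_modules'); // the root is always searched last
  return paths;
}

const lookup = nodeModulePathsSketch('/a/b/node_modules/c');
// returns ['/a/b/node_modules/c/node_modules', '/a/b/node_modules', '/a/node_modules', '/node_modules']
```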

The actual loading happens in Module._extensions[extension](this, filename);, and the loading strategy depends on the file extension:

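.js: reads the file with fs.readFileSync and wraps its contents in the wrapper below. Note that the require passed into the wrapper is built on Module.prototype.require, not on the built-in modules' require.
```
const wrapper = [
  '(function (exports, require, module, __filename, __dirname) { ',
  '\n});',
];
```
.json: reads the file with fs.readFileSync and parses its contents into an object.
.node: opens the Node addon with process.dlopen.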

.js handler source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/modules/cjs/loader.js#L1104

.json handler source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/modules/cjs/loader.js#L1152

.node handler source: https://github.com/nodejs/node/blob/881174e016d6c27b20c70111e6eae2296b6c6293/lib/internal/modules/cjs/loader.js#L1170
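The extension handlers can be imitated with public APIs. `Module.wrap` returns source text wrapped in the same wrapper strings. The sketch below wraps a string, compiles it with `vm.runInThisContext`, and calls it with the five wrapper arguments, using `module.exports` as `this`. It also dispatches on the extension through a small table. `runWrapped`, `handlers`, and `loadBySource` are hypothetical names. Sources are passed in as strings instead of being read from disk, and the real `.js` handler compiles differently (for example, via `vm.compileFunction` when the wrapper has not been patched).

```javascript
const path = require('path');
const vm = require('vm');
const Module = require('module');

// Run CommonJS source text the way the '.js' handler conceptually does.
function runWrapped(source, filename, requireFn) {
  const wrapped = Module.wrap(source); // wrapper[0] + source + wrapper[1]
  const compiled = vm.runInThisContext(wrapped, { filename });
  const mod = { exports: {} };
  // Inside a CJS module, `this` is module.exports.
  compiled.call(mod.exports, mod.exports, requireFn, mod, filename, path.dirname(filename));
  return mod.exports;
}

// Hypothetical dispatch table keyed by extension, in the spirit of Module._extensions.
const handlers = {
  '.js': (source, filename) => runWrapped(source, filename, require),
  '.json': (source) => JSON.parse(source),
};

function loadBySource(filename, source) {
  // Unknown extensions fall back to the '.js' handler, as Node's loader does.
  const handler = handlers[path.extname(filename)] || handlers['.js'];
  return handler(source, filename);
}

console.log(loadBySource('/app/data.json', '{"answer": 42}').answer); // 42
console.log(loadBySource('/app/self.js', 'exports.same = this === exports;').same); // true
```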

Module.prototype.require, in turn, also loads modules by calling the static method Module._load:

Module.prototype.require = function(id) {
  validateString(id, 'id');
  if (id === '') {
    throw new ERR_INVALID_ARG_VALUE('id', id,
                                    'must be a non-empty string');
  }
  requireDepth++;
  try {
    return Module._load(id, this, /* isMain */ false);
  } finally {
    requireDepth--;
  }
};
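Module.prototype.require looks up `Module._load` on every call, so replacing that property intercepts every ordinary `require()`. This is the monkey-patching point that the comment in executeUserEntryPoint refers to. The sketch below records the requests that reach `_load` and restores the original afterwards. It also shows the empty-string check that happens before `_load` is reached. `Module._load` is an underscore-prefixed internal and is not guaranteed to stay stable across Node versions.

```javascript
const Module = require('module');

// Record every request that reaches Module._load without changing behavior.
const seen = [];
const originalLoad = Module._load;
Module._load = function (request, parent, isMain) {
  seen.push(request);
  return originalLoad.apply(this, arguments);
};

try {
  require('path');    // Module.prototype.require -> Module._load
  require('node:os');
} finally {
  Module._load = originalLoad; // always restore the original loader
}

console.log(seen); // includes 'path' and 'node:os'

// require() validates its argument before it ever calls Module._load.
let code;
try {
  require('');
} catch (err) {
  code = err.code;
}
console.log(code); // ERR_INVALID_ARG_VALUE
```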

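At this point, the CJS module loading process is essentially clear:

Node initializes and sets up NativeModule (alongside internalBinding) to load all built-in JS and C++ modules.
Node runs the built-in module internal/main/run_main_module.
run_main_module brings in the user-facing module system, the Module class from internal/modules/cjs/loader.
Module._load loads the entry file. During loading, the wrapper passes in a require built on Module.prototype.require, together with module, exports, and the other wrapper arguments. This lets the entry file require its dependencies normally, and the whole dependency tree is loaded recursively.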
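The recursion in the last step leaves a visible trace. Each Module records the modules it required in `module.children`, which the Module constructor populates through `updateChildren`. The sketch below writes two hypothetical throwaway files, `a.js` and `b.js`, into a temporary directory (so it assumes that directory is writable), requires `a.js`, and inspects the tree.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Two hypothetical throwaway modules: a.js requires ./b.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-tree-'));
const aFile = path.join(dir, 'a.js');
const bFile = path.join(dir, 'b.js');
fs.writeFileSync(bFile, "module.exports = 'b';\n");
fs.writeFileSync(aFile, "module.exports = 'a+' + require('./b');\n");

const result = require(aFile); // a.js is loaded, and it loads b.js in turn

// The cached Module for a.js lists b.js among its children.
const aModule = require.cache[require.resolve(aFile)];
const childIds = aModule.children.map((child) => child.id);
const bLoaded = childIds.includes(require.resolve(bFile));

console.log(result, bLoaded); // a+b true

fs.rmSync(dir, { recursive: true, force: true });
```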

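Once the full CJS loading flow is clear, we can follow the same chain to read other parts of the code, such as how global variables are initialized and how ES modules are managed, and build a deeper understanding of how Node works internally.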

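Team introduction

We are the Marketing & Interaction team of Alibaba's Taobao & Tmall Technology (formerly the Channel and D2C Intelligence team). We are also the core team for the intelligence track of the Alibaba Economy Front-end Committee. We belong to Taobao & Tmall Technology, a team that aims to be the world's most business-savvy technology innovator. It includes Taobao Technology, Tmall Technology, and other teams and businesses, and it carries both business and technology in its DNA.

We work at Alibaba's Xixi Campus in Hangzhou.
Our aim is "to build the smart and international front-end team that best understands AI, with the romance of a poet and the rigor of a scientist," and our mission is "making business innovation more efficient through front-end intelligence."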


Author | Zhou Feiyu (Moumou)

Editor | Chengzijun
