Streaming Video with WebRTC in Java

贝贝猫技术分享 2019-10-17

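Introduction

Lately my main job has been building a feature for remotely controlling phones, with the audio and video carried over WebRTC. Our phones are all managed by an Agent server that is directly connected to them, the Agent service is written in Java, and there was no suitable Java WebRTC library on the market, so I wrote a library on top of Google's open-source code that calls WebRTC Native through JNI. In a previous article I covered how I compiled WebRTC^[1]. In this one I'll share how I use WebRTC from Java, along with the changes I made to WebRTC for our business needs. To be honest, when I first started on this work it was a real slog: I had not written C code in a long time, I did not know the WebRTC Native APIs, not many people use WebRTC, and documentation is scarce. So I first worked through how WebRTC is used in Javascript^[2] to get a rough feel for the Native APIs, and also consulted the NodeJS implementation^[3]. When I hit a problem I went to Google's WebRTC-Discuss forum^[4], and if none of that turned up a solution, I read through all the code related to the feature I wanted =。=. Looking back at the code after the whole feature was finished, none of it feels hard any more, and I can only sigh at how green I was back then.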

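Introducing the Native APIs

If you are about to do similar work, I think the most important thing is to get familiar with the overall flow of the Native APIs first. Once you lay it out, you will find that the whole process is actually quite simple: just eight major steps. I'll briefly go through these eight steps first and then describe, step by step, exactly what I did. The Native API flow: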

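1. Create the three threads WebRTC works on through the Native APIs: the Worker Thread, the Network Thread, and the Signaling Thread.
   If, like me, you need a custom audio capture module and custom codec implementations, they also have to be initialized in this step.
2. Create the PeerConnectionFactory. This factory is the origin of everything that follows: connections as well as audio/video capture are created through it.
3. Create the PeerConnection. Here you can set connection parameters, such as which ICE server to use and what the TCP/UDP network policy is.
   If, like me, you need to restrict which ports are used, you have to supply a custom PortAllocator.
4. Create the Audio/VideoSource. Capture parameters can be specified when creating the AudioSource; the VideoSource takes a VideoCapturer object as its argument.
   If, like me, you need to supply the video frames yourself, you have to implement a custom VideoCapturer.
5. Create the Audio/VideoTrackInterface from the Audio/VideoSource created in the previous step. This object represents the audio/video capture process.
6. Create the MediaStreamInterface and add the Audio/VideoTracks created in the previous step to it. This object represents the transport channel.
7. Add the MediaStream created in the previous step to the PeerConnection created in step 3.
8. The PeerConnection notifies the user of the current connection state and similar events through Observer callbacks. We use these callbacks together with the PeerConnection API to exchange SDP and ICE Candidates with the other peer.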

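Of these eight steps, the first two are specific to the Native APIs; the rest are basically the same as using WebRTC on the Web. It was in the Native-specific parts that I ran into the most pitfalls, so let me walk through in detail how I established a connection between a Java service and other clients through the Native APIs.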

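JNI vs. JNA

As you probably know, calling C++ code from Java requires JNI or JNA. How do the two differ, and which one fits our scenario? The figure above shows how JNI is used, and you can see that it involves many tedious steps. We first define the interface in Java code, then use a tool^[5] to generate the corresponding C header files, then implement those interfaces in C and compile them into a shared library, and finally load that library in the JVM so that the C code can be called. JNA is much simpler by comparison: there is no glue library to write, because it provides an API that calls into an existing dynamic library directly, which saves a lot of work. On the face of it JNA beats JNI hands down and would be the obvious choice for this job. In my scenario, however, JNA has a couple of fatal problems, so I could only use JNI.

Why not JNA

JNA is designed around Java calling plain C functions, and its callback support is limited to C function pointers. Many of the PeerConnection APIs report back through C++ Observer classes, so the native side has to call back into a Java ObserverWrapper, which is far more natural with JNI.
Calling a dynamic library through JNA costs slightly more than calling it through JNI. I have not measured how large the overhead is, but since every video frame has to be passed from Java to C, we want this path to be as fast as possible.

Now that we have settled on JNI, let me describe what I actually did.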

Code structure

Java code structure

script/build-header-files.sh: a script that generates the C header files corresponding to the Java interfaces I wrote.

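    #!/usr/bin/env bash
    # List the compiled classes of the core package, turn each file name into a
    # fully qualified class name, and pass them to javah to generate the JNI headers.
    # Note: javah was removed in JDK 10; on newer JDKs `javac -h` does the same job.
    ls -l ../path/to/rtc4j/core | grep ^- | awk '{print $9}' |
    sed 's/\.class$//' |
    sed 's/^/package.name.of.core./' |
    xargs javah -classpath ../target/classes -d ../../cpp/src/jni/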

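src/XXX/core/: this package is the core of the library. It mainly contains the audio capturer, the video capturer, the callback interfaces needed while connecting, and the wrappers for the core WebRTC classes: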

RTC -> webrtc::PeerConnectionFactoryInterface
PeerConnection -> webrtc::PeerConnectionInterface
DataChannel -> webrtc::DataChannelInterface

src/XXX/model/: defines the POJOs used by the core classes
src/XXX/utils/: implements loading the shared library on the Java side for the different platforms

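C++ code structure

The C++ code structure is also fairly simple and maps essentially one-to-one onto the Java interfaces.

src/jni/: C header files generated automatically from the Java interfaces, plus type utilities for working with Java
src/media/: classes for audio/video capture and for custom encoding

The audio part implements a custom AudioDeviceModule, which is injected when the PeerConnectionFactory is created
The video part implements a custom VideoCapturer, which is injected when the VideoSource is created
H264 video encoding and decoding uses libx264 and h264_nvenc (Nvidia acceleration) as provided by FFMPEG; this code is injected when the PeerConnectionFactory is created

src/rtc/: implementations of the Java wrapper interfaces
src/rtc/network: defines my own SocketFactory, which is how the port restriction is achieved; it is injected when the PeerConnection is created

The Java code is comparatively simple, just a shell around the Native APIs, and a fair amount of the C++ code is only a thin wrapper over the underlying WebRTC lib. I'll pass over those parts quickly and focus on the harder bones to chew.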

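Pulling the required libraries into C++

I set up the whole C++ project with CMake. It uses libwebrtc^[6], FFMPEG^[7] (for video encoding), and libjpeg-turbo^[8] (for converting the images obtained from JavaVideoCapturer to YUV). The CMake file is as follows: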

   cmake_minimum_required(VERSION 3.8)
   project(rtc)
   set(CMAKE_CXX_STANDARD 11)


   if (APPLE)
       set(CMAKE_CXX_FLAGS "-fno-rtti -pthread") # Flags needed by the WebRTC library
   elseif (UNIX)
       # Apart from the first two (-fno-rtti -pthread), all of these flags are needed by FFMPEG
       set(CMAKE_CXX_FLAGS "-fno-rtti -pthread -lva -lva-drm -lva-x11 -llzma -lX11 -lz -ldl -ltheoraenc -ltheoradec")
       set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,-Bsymbolic")
   endif()


   include(./CMakeModules/FindFFMPEG.cmake) # Pull in FFMPEG
   include(./CMakeModules/FindLibJpegTurbo.cmake) # Pull in Jpeg-Turbo


   if (CMAKE_SYSTEM_NAME MATCHES "Linux") # Compile definitions the C++ code uses to tell the platforms apart
       set_property(DIRECTORY APPEND PROPERTY COMPILE_DEFINITIONS WEBRTC_LINUX)
   elseif(CMAKE_SYSTEM_NAME MATCHES "Darwin")
       set_property(DIRECTORY APPEND PROPERTY COMPILE_DEFINITIONS WEBRTC_MAC)
   endif()


   find_package(LibWebRTC REQUIRED) # Pull in WebRTC
   find_package(JNI REQUIRED) # Pull in JNI
   include_directories(${Java_INCLUDE_PATH}) # JNI headers
   include_directories(${Java_INCLUDE_PATH2}) # JNI headers
   include(${LIBWEBRTC_USE_FILE}) # WebRTC headers
   include_directories("src")
   include_directories(${CMAKE_CURRENT_BINARY_DIR})
   include_directories(${TURBO_INCLUDE_DIRS}) # Jpeg-Turbo headers


   file(GLOB_RECURSE SOURCES *.cpp) # Sources to compile
   file(GLOB_RECURSE HEADERS *.h) # Headers of the sources to compile


   add_library(rtc SHARED ${SOURCES} ${HEADERS}) # Build the shared library
   target_include_directories(rtc PRIVATE ${TURBO_INCLUDE_DIRS} ${FFMPEG_INCLUDE_DIRS})
   target_link_libraries(rtc PRIVATE ${TURBO_LIBRARIES} ${FFMPEG_LIBRARIES} ${LIBWEBRTC_LIBRARIES}) # Link the libraries

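I stepped into quite a few pitfalls when pulling in these libraries, especially FFMPEG, so here is a brief account.

Building FFMPEG

To build FFMPEG on Linux I mostly followed the official Guide^[9], with a few changes:

a. If there is an enable-shared switch, make sure it is turned on; the official Guide^[10] disables it throughout.
b. Always compile with "-fPIC", otherwise linking on Linux fails with the error shown below. A shared object may be loaded at a different address in each process. If its instructions use absolute addresses or addresses in other modules, those addresses have to be adjusted at load time according to where the relevant modules ended up, that is, patched so that they resolve correctly in that process. The segments patched this way can no longer be shared between processes as a single copy in physical memory; each process needs its own copy. The -fPIC option exists so that processes using the same shared object can share as much physical memory as possible: it pulls every access to an absolute or external address out of the code, so the code segment stays identical across processes and can be shared.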

   /usr/bin/ld: test.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
   test.o: could not read symbols: Bad value
   collect2: ld returned 1 exit status

c. If you also need Nvidia support, see the official Guide^[11].
d. Finally, here is the command I ended up using to build FFMPEG:

   PATH="$HOME/bin:$PATH" PKG_CONFIG_PATH="$HOME/ffmpeg_build/lib/pkgconfig" ./configure \
     --prefix="$HOME/ffmpeg_build" \
     --pkg-config-flags="--static" \
     --extra-cflags="-I$HOME/ffmpeg_build/include" \
     --extra-ldflags="-L$HOME/ffmpeg_build/lib" \
     --extra-libs=-lpthread \
     --extra-libs=-lm \
     --bindir="$HOME/bin" \
     --enable-gpl \
     --enable-libfdk_aac \
     --enable-libfreetype \
     --enable-libmp3lame \
     --enable-libopus \
     --enable-libvorbis \
     --enable-libvpx \
     --enable-libx264 \
     --enable-libx265 \
     --enable-nonfree \
     --extra-cflags=-I/usr/local/cuda/include/ \
     --extra-ldflags=-L/usr/local/cuda/lib64 \
     --enable-shared \
     --cc="gcc -m64 -fPIC" \
     --enable-nvenc \
     --enable-cuda \
     --enable-cuvid \
     --enable-libnpp

Installing FFMPEG on a Mac is simple and blunt: one command installs a build with every option enabled.

    brew install ffmpeg $(brew options ffmpeg | grep -vE '\s' | grep -- '--with-' | tr '\n' ' ')

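Installing libjpeg-turbo

Because this library is fairly simple, I just downloaded a prebuilt version^[12].

Pulling in Turbo and FFMPEG

The two libraries are pulled in very similarly, so I'll use the simpler FindLibJpegTurbo.cmake as the example; the FFMPEG one just has to look for more underlying dependencies.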

   # Try to find the libjpeg-turbo libraries and headers
   #
   # TURBO_INCLUDE_DIRS
   # TURBO_LIBRARIES
   # TURBO_FOUND


   # Find header files
   FIND_PATH(
       TURBO_INCLUDE_DIRS turbojpeg.h
       opt/libjpeg-turbo/include/
   )


   # Find the static libraries (PATHS is the keyword FIND_LIBRARY expects)
   FIND_LIBRARY(
       TURBO_LIBRARY
       NAMES libturbojpeg.a
       PATHS opt/libjpeg-turbo/lib64
   )


   FIND_LIBRARY(
       JPEG_LIBRARY
       NAMES libjpeg.a
       PATHS opt/libjpeg-turbo/lib64
   )


   IF (TURBO_LIBRARY)
       SET(TURBO_FOUND TRUE)
   ENDIF ()


   # Publish the libraries only when both the library and the headers were found
   IF (TURBO_FOUND AND TURBO_INCLUDE_DIRS)
       SET(TURBO_LIBRARIES ${TURBO_LIBRARY} ${JPEG_LIBRARY})
       MESSAGE(STATUS "Found Turbo library: ${TURBO_LIBRARIES}, ${TURBO_INCLUDE_DIRS}")
   ELSE ()
       MESSAGE(STATUS "Not found Turbo library")
   ENDIF ()

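With that, all the preparation is finally done. Let's look at how the Native APIs are actually called.

Using the Native APIs

Creating the PeerConnectionFactory

As mentioned in the Native APIs overview, WebRTC has three main threads that handle its various tasks, so we first create them through the API. As an aside, the thread library WebRTC ships is really powerful; you could even use it as a general cross-platform threading library. If I get the chance, I'll write a separate article about how it is implemented. Back to the topic: the key point when creating the threads is that the NetworkThread has to be created with the CreateWithSocketServer method.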

   void RTC::InitThreads() {
       signaling_thread = rtc::Thread::Create();
       signaling_thread->SetName("signaling", nullptr);
       RTC_CHECK(signaling_thread->Start()) << "Failed to start thread";
       WEBRTC_LOG("Original socket server used.", INFO);
       worker_thread = rtc::Thread::Create();
       worker_thread->SetName("worker", nullptr);
       RTC_CHECK(worker_thread->Start()) << "Failed to start thread";
       network_thread = rtc::Thread::CreateWithSocketServer();
       network_thread->SetName("network", nullptr);
       RTC_CHECK(network_thread->Start()) << "Failed to start thread";
   }

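In addition, if like me you have special audio capture requirements, you need to implement your own AudioDeviceModule. One thing to note is that the AudioDeviceModule must be created on the worker thread, and it must also be released on the worker thread.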

   void RTC::Init(jobject audio_capturer, jobject video_capturer) { // Initialize the PeerConnectionFactory
       this->video_capturer = video_capturer;
       InitThreads(); // Initialize the threads
       audio_device_module = worker_thread->Invoke<rtc::scoped_refptr<webrtc::AudioDeviceModule>>(
               RTC_FROM_HERE,
               rtc::Bind(
                       &RTC::InitJavaAudioDeviceModule,
                       this,
                       audio_capturer)); // Initialize the AudioDeviceModule on the worker thread
       WEBRTC_LOG("After fake audio device module.", INFO);
       InitFactory();
   }


   // AudioDeviceModule that gets its audio data from Java; its implementation is covered in detail later
   rtc::scoped_refptr<webrtc::AudioDeviceModule> RTC::InitJavaAudioDeviceModule(jobject audio_capturer) {
       RTC_DCHECK(worker_thread.get() == rtc::Thread::Current());
       WEBRTC_LOG("Create fake audio device module.", INFO);
       auto result = new rtc::RefCountedObject<FakeAudioDeviceModule>(
               FakeAudioDeviceModule::CreateJavaCapturerWrapper(audio_capturer),
               FakeAudioDeviceModule::CreateDiscardRenderer(44100));
       WEBRTC_LOG("Create fake audio device module finished.", INFO);
       is_connect_to_audio_card = true;
       return result;
   }


   ...
   // Releasing the AudioDeviceModule
   worker_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::ReleaseAudioDeviceModule, this));
   ...


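   // audio_device_module is stored as an rtc::RefCountedObject behind a reference-counted pointer. When the reference count drops to 0 the instance's destructor is called automatically, so assigning nullptr is all we need to do here.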
   void RTC::ReleaseAudioDeviceModule() {
       RTC_DCHECK(worker_thread.get() == rtc::Thread::Current());
       audio_device_module = nullptr;
   }
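The rule that the AudioDeviceModule is created and released on the worker thread is carried out above with `worker_thread->Invoke<T>(...)`: the caller blocks while the callable runs on the target thread, then receives its result. As a rough, library-free illustration of that pattern (not WebRTC's actual implementation), the sketch below builds a tiny task thread on the C++ standard library. `TaskThread` and its `Invoke` are hypothetical names used only here.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a WebRTC-style thread: it owns one OS thread and
// runs queued tasks on it, one at a time.
class TaskThread {
 public:
  TaskThread() : worker_([this] { Run(); }) {}

  ~TaskThread() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  std::thread::id id() const { return worker_.get_id(); }

  // Runs `fn` on the owned thread, blocks the caller until it has finished,
  // and hands back its result.
  template <typename F>
  auto Invoke(F fn) -> decltype(fn()) {
    using R = decltype(fn());
    // Calling this from the owned thread would wait on its own queue forever.
    assert(std::this_thread::get_id() != worker_.get_id());
    auto task = std::make_shared<std::packaged_task<R()>>(std::move(fn));
    std::future<R> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.emplace_back([task] { (*task)(); });
    }
    cv_.notify_one();
    return result.get();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> next;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to run
        next = std::move(tasks_.front());
        tasks_.pop_front();
      }
      next();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the other members exist before it starts
};
```

Creating an object inside `Invoke` and later dropping it inside another `Invoke` on the same `TaskThread` keeps both operations on one thread, which is the property the AudioDeviceModule code relies on.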

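With the three key threads and the AudioDeviceModule in place, the PeerConnectionFactory can be created. Because of business requirements I have some port restrictions, so I also initialize the pieces for those here; they are used later when creating the PortAllocator. At this point you may wonder why video capture and audio capture are not injected in the same place. You are not alone, I wonder too =。=. I even think the SocketFactory should be handed to the PeerConnectionFactory to manage, so that you would not have to create a PortAllocator yourself every time you create a PeerConnection.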

   void RTC::InitFactory() {
       // Create a SocketFactory with port and IP restrictions
       socket_factory.reset(
               new rtc::SocketFactoryWrapper(network_thread.get(), this->white_private_ip_prefix, this->min_port,
                                             this->max_port));
       network_manager.reset(new rtc::BasicNetworkManager());
       // My own video encoder is used here; I'll cover it in detail later as well
       peer_connection_factory = webrtc::CreatePeerConnectionFactory(
               network_thread.get(), worker_thread.get(), signaling_thread.get(), audio_device_module,
               webrtc::CreateBuiltinAudioEncoderFactory(), webrtc::CreateBuiltinAudioDecoderFactory(),
               CreateVideoEncoderFactory(hardware_accelerate), CreateVideoDecoderFactory(),
               nullptr, nullptr);
   }

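Admittedly, quite a few interface designs in the PeerConnectionFactory creation path differ from what I would have expected. That is probably because my use case is not a conventional one, which makes the WebRTC interfaces feel less than handy. In any case, the PeerConnectionFactory is now up. To recap, the flow is: create the threads -> create the audio capture module -> create the EncoderFactory -> instantiate the PeerConnectionFactory.

Creating the PeerConnection

With the PeerConnectionFactory we can now create connections. In this step we need to supply the ICE server information, and I use the SocketFactory created in the previous step to build a PortAllocator, which is how the port restriction is achieved. I also call a PeerConnection API in this step to cap the maximum transmission bitrate.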

   // Create the PeerConnection
   PeerConnection *
   RTC::CreatePeerConnection(PeerConnectionObserver *peerConnectionObserver, std::string uri,
                             std::string username, std::string password, int max_bit_rate) {
       // Pass in the ICE server information
       webrtc::PeerConnectionInterface::RTCConfiguration configuration;
       webrtc::PeerConnectionInterface::IceServer ice_server;
       ice_server.uri = std::move(uri);
       ice_server.username = std::move(username);
       ice_server.password = std::move(password);
       configuration.servers.push_back(ice_server);
       // Disable TCP candidates
       configuration.tcp_candidate_policy = webrtc::PeerConnectionInterface::TcpCandidatePolicy::kTcpCandidatePolicyDisabled;
       // Reduce audio latency
       configuration.audio_jitter_buffer_fast_accelerate = true;
       // Build a PortAllocator from the SocketFactory created earlier to restrict the ports
       std::unique_ptr<cricket::PortAllocator> port_allocator(
               new cricket::BasicPortAllocator(network_manager.get(), socket_factory.get()));
       port_allocator->SetPortRange(this->min_port, this->max_port);
       // Create the PeerConnection and cap the bitrate
       return new PeerConnection(peer_connection_factory->CreatePeerConnection(
               configuration, std::move(port_allocator), nullptr, peerConnectionObserver), peerConnectionObserver,
                                 is_connect_to_audio_card, max_bit_rate);
   }


   // Call the API to cap the bitrate
   void PeerConnection::ChangeBitrate(int bitrate) {
       auto bit_rate_setting = webrtc::BitrateSettings();
       bit_rate_setting.min_bitrate_bps = 30000;
       bit_rate_setting.max_bitrate_bps = bitrate;
       bit_rate_setting.start_bitrate_bps = bitrate;
       this->peer_connection->SetBitrate(bit_rate_setting);
   }
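One detail worth watching in ChangeBitrate above: the minimum is fixed at 30000 bps while the maximum comes from the caller, so a requested maximum below 30000 would produce min > max. How WebRTC reacts to such input is not covered here; a defensive option is to normalise the three values first. The sketch below shows one way to do that, using a hypothetical `BitrateLimits` struct and `MakeBitrateLimits` helper that are not part of WebRTC.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical plain struct mirroring the three fields set in ChangeBitrate.
struct BitrateLimits {
  int min_bps;
  int start_bps;
  int max_bps;
};

// Builds limits that always satisfy min <= start <= max.
// `floor_bps` plays the role of the fixed 30000 bps minimum used above.
inline BitrateLimits MakeBitrateLimits(int requested_max_bps, int floor_bps = 30000) {
  assert(floor_bps > 0);
  BitrateLimits limits;
  limits.min_bps = floor_bps;
  limits.max_bps = std::max(requested_max_bps, floor_bps);  // never below the floor
  limits.start_bps = limits.max_bps;  // start at the cap, as the code above does
  return limits;
}
```

The resulting values could then be copied into webrtc::BitrateSettings exactly as ChangeBitrate already does.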

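Creating the Audio/VideoSource

In this step we use the PeerConnectionFactory API to create the Audio/VideoSource. When creating the AudioSource we can specify some audio parameters, and when creating the VideoSource we have to specify a VideoCapturer. Note that the VideoCapturer must be created on the Signaling Thread.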

   ...
   // Create the Audio/VideoSource
   audio_source = rtc->CreateAudioSource(GetAudioOptions());
   video_source = rtc->CreateVideoSource(rtc->CreateFakeVideoCapturerInSignalingThread());
   ...


   // Get the default audio configuration
   cricket::AudioOptions PeerConnection::GetAudioOptions() {
       cricket::AudioOptions options;
       options.audio_jitter_buffer_fast_accelerate = absl::optional<bool>(true);
       options.audio_jitter_buffer_max_packets = absl::optional<int>(10);
       options.echo_cancellation = absl::optional<bool>(false);
       options.auto_gain_control = absl::optional<bool>(false);
       options.noise_suppression = absl::optional<bool>(false);
       options.highpass_filter = absl::optional<bool>(false);
       options.stereo_swapping = absl::optional<bool>(false);
       options.typing_detection = absl::optional<bool>(false);
       options.experimental_agc = absl::optional<bool>(false);
       options.extended_filter_aec = absl::optional<bool>(false);
       options.delay_agnostic_aec = absl::optional<bool>(false);
       options.experimental_ns = absl::optional<bool>(false);
       options.residual_echo_detector = absl::optional<bool>(false);
       options.audio_network_adaptor = absl::optional<bool>(true);
       return options;
   }


   // Create the AudioSource
   rtc::scoped_refptr<webrtc::AudioSourceInterface> RTC::CreateAudioSource(const cricket::AudioOptions &options) {
       return peer_connection_factory->CreateAudioSource(options);
   }


   // Create the VideoCapturer on the Signaling Thread
   FakeVideoCapturer *RTC::CreateFakeVideoCapturerInSignalingThread() {
       if (video_capturer) {
           return signaling_thread->Invoke<FakeVideoCapturer *>(RTC_FROM_HERE,
                                                                rtc::Bind(&RTC::CreateFakeVideoCapturer, this,
                                                                          video_capturer));
       } else {
           return nullptr;
       }
   }

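Creating the Audio/VideoTrack

This step is comparatively simple: take the Source created in the previous step as an argument, add a name, and you get an Audio/VideoTrack. This API also belongs to the PeerConnectionFactory.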

   ...
   // Create the Audio/VideoTrack
   video_track = rtc->CreateVideoTrack("video_track", video_source.get());
   audio_track = rtc->CreateAudioTrack("audio_track", audio_source);
   ...


   // Create the VideoSource
   rtc::scoped_refptr<webrtc::VideoTrackSourceInterface> RTC::CreateVideoSource(cricket::VideoCapturer *capturer) {
       return peer_connection_factory->CreateVideoSource(capturer);
   }


   // Create the VideoTrack
   rtc::scoped_refptr<webrtc::VideoTrackInterface> RTC::CreateVideoTrack(const std::string &label,
                                                                         webrtc::VideoTrackSourceInterface *source) {
       return peer_connection_factory->CreateVideoTrack(label, source);
   }

Creating the LocalMediaStream

Call the PeerConnectionFactory API to create a LocalMediaStream, add the Audio/VideoTracks created earlier to it, and finally add the stream to the PeerConnection.

   ...
   // Create the LocalMediaStream
   transport_stream = rtc->CreateLocalMediaStream("stream");
   // Add the Audio/VideoTracks
   transport_stream->AddTrack(video_track);
   transport_stream->AddTrack(audio_track);
   // Add the Stream to the PeerConnection
   peer_connection->AddStream(transport_stream);
   ...

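Creating the Data Channel

Creating a Data Channel is much simpler than setting up the audio/video transport above: a single PeerConnection API call creates it. You can specify some options at creation time, which mainly constrain the reliability of the Data Channel. Note that on the client a Data Channel is represented by two objects, one for the local end and one for the remote end. The local DataChannel object is obtained from CreateDataChannel, and the remote one from the PeerConnection's OnDataChannel callback. To send data, call the DataChannel's Send API; when the remote end sends data, the OnMessage callback is triggered.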
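The comments on webrtc::DataChannelInit state a few rules for these options: maxRetransmitTime and maxRetransmits cannot both be set, an externally negotiated channel needs an explicit id, and a channel becomes unreliable once either retransmission limit is set. The sketch below encodes just those rules so that a configuration can be checked before it is handed to CreateDataChannel. `ChannelOptions`, `IsReliable` and `Validate` are hypothetical names that mirror the relevant fields and are not part of WebRTC.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the DataChannelInit fields that matter for
// reliability; -1 means "unset", matching the defaults of the real struct.
struct ChannelOptions {
  bool ordered = true;
  int maxRetransmitTime = -1;  // milliseconds
  int maxRetransmits = -1;
  bool negotiated = false;
  int id = -1;
};

// A channel is reliable only when neither retransmission limit is set.
inline bool IsReliable(const ChannelOptions& o) {
  return o.maxRetransmitTime == -1 && o.maxRetransmits == -1;
}

// Returns an empty string when the options are consistent, otherwise a short
// description of the first rule that is broken.
inline std::string Validate(const ChannelOptions& o) {
  if (o.maxRetransmitTime != -1 && o.maxRetransmits != -1)
    return "maxRetransmitTime and maxRetransmits cannot both be set";
  if (o.negotiated && o.id == -1)
    return "an externally negotiated channel needs an explicit id";
  return "";
}
```

For a low-latency channel, for example, one might set `ordered = false` and `maxRetransmits = 0`; `Validate` accepts that combination and `IsReliable` reports it as unreliable.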

   // Create the Data Channel
   DataChannel *
   PeerConnection::CreateDataChannel(std::string label, webrtc::DataChannelInit config, DataChannelObserver *observer) {
       rtc::scoped_refptr<webrtc::DataChannelInterface> data_channel = peer_connection->CreateDataChannel(label, &config);
       data_channel->RegisterObserver(observer);
       return new DataChannel(data_channel, observer);
   }


   // Configurable options
   struct DataChannelInit {
     // Deprecated. Reliability is assumed, and channel will be unreliable if
     // maxRetransmitTime or MaxRetransmits is set.
     bool reliable = false;

     // True if ordered delivery is required.
     bool ordered = true;

     // The max period of time in milliseconds in which retransmissions will be
     // sent. After this time, no more retransmissions will be sent. -1 if unset.
     //
     // Cannot be set along with |maxRetransmits|.
     int maxRetransmitTime = -1;

     // The max number of retransmissions. -1 if unset.
     //
     // Cannot be set along with |maxRetransmitTime|.
     int maxRetransmits = -1;

     // This is set by the application and opaque to the WebRTC implementation.
     std::string protocol;

     // True if the channel has been externally negotiated and we do not send an
     // in-band signalling in the form of an "open" message. If this is true, |id|
     // below must be set; otherwise it should be unset and will be negotiated
     // in-band.
     bool negotiated = false;

     // The stream id, or SID, for SCTP data channels. -1 if unset (see above).
     int id = -1;
   };


   // Send data
   void DataChannel::Send(webrtc::DataBuffer &data_buffer) {
       data_channel->Send(data_buffer);
   }


   // Message received.
   void OnMessage(const webrtc::DataBuffer &buffer) override {
       // When C++ calls back into Java, the current thread must first be attached to the JVM
       JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
       jbyteArray jbyte_array = CHAR_POINTER_2_J_BYTE_ARRAY(env, buffer.data.cdata(),
                                                            static_cast<int>(buffer.data.size()));
       jclass data_buffer = GET_DATA_BUFFER_CLASS();
       jmethodID init_method = env->GetMethodID(data_buffer, "<init>", "([BZ)V");
       jobject data_buffer_object = env->NewObject(data_buffer, init_method,
                                                   jbyte_array,
                                                   buffer.binary);
       jclass observer_class = env->GetObjectClass(java_observer);
       jmethodID java_event_method = env->GetMethodID(observer_class, "onMessage",
                                                      "(Lpackage/name/of/rtc4j/model/DataBuffer;)V");
       // Look up the callback method and invoke it
       env->CallVoidMethod(java_observer, java_event_method, data_buffer_object);
       // Release the references involved
       env->ReleaseByteArrayElements(jbyte_array, env->GetByteArrayElements(jbyte_array, nullptr), JNI_ABORT);
       env->DeleteLocalRef(data_buffer_object);
       env->DeleteLocalRef(observer_class);
   }


   // Attach the C++ thread to the JVM as a Java thread
   JNIEnv *ATTACH_CURRENT_THREAD_IF_NEEDED() {
       JNIEnv *jni = GetEnv();
       if (jni)
           return jni;
       JavaVMAttachArgs args;
       args.version = JNI_VERSION_1_8;
       args.group = nullptr;
       args.name = const_cast<char *>("JNI-RTC");
       // Deal with difference in signatures between Oracle's jni.h and Android's.
   #ifdef _JAVASOFT_JNI_H_  // Oracle's jni.h violates the JNI spec!
       void *env = nullptr;
   #else
       JNIEnv* env = nullptr;
   #endif
       RTC_CHECK(!g_java_vm->AttachCurrentThread(&env, &args)) << "Failed to attach thread";
       RTC_CHECK(env) << "AttachCurrentThread handed back NULL!";
       jni = reinterpret_cast<JNIEnv *>(env);
       return jni;
   }


   JNIEnv *GetEnv() {
       void *env = nullptr;
       jint status = g_java_vm->GetEnv(&env, JNI_VERSION_1_8);
       RTC_CHECK(((env != nullptr) && (status == JNI_OK)) ||
                 ((env == nullptr) && (status == JNI_EDETACHED)))
           << "Unexpected GetEnv return: " << status << ":" << env;
       return reinterpret_cast<JNIEnv *>(env);
   }


   // Detach the current C++ thread from its Java thread
   void DETACH_CURRENT_THREAD_IF_NEEDED() {
       // This function only runs on threads where |g_jni_ptr| is non-NULL, meaning
       // we were responsible for originally attaching the thread, so are responsible
       // for detaching it now.  However, because some JVM implementations (notably
       // Oracle's http://goo.gl/eHApYT) also use the pthread_key_create mechanism,
       // the JVMs accounting info for this thread may already be wiped out by the
       // time this is called. Thus it may appear we are already detached even though
       // it was our responsibility to detach!  Oh well.
       if (!GetEnv())
           return;
       jint status = g_java_vm->DetachCurrentThread();
       RTC_CHECK(status == JNI_OK) << "Failed to detach thread: " << status;
       RTC_CHECK(!GetEnv()) << "Detaching was a successful no-op???";
   }

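This step introduced some code for attaching and detaching threads, which I think deserves a brief explanation. As mentioned earlier, WebRTC has three main threads, the Worker Thread, the Network Thread, and the Signaling Thread, and WebRTC's callbacks are executed on these threads. They are independent threads created by our C++ code, and unlike code that Java calls into, such threads cannot simply get hold of a JNIEnv. As an example, take the following code: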

   public class Widget {
   private native void nativeMethod();
   }

The function declaration generated for it in the native header looks like this:

   JNIEXPORT void JNICALL
   Java_xxxxx_nativeMethod(JNIEnv *env, jobject instance);

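As you can see, the first parameter of this declaration is the JNIEnv, through which we can call Java code in a reflection-like manner. A thread created independently in C++ has no JNIEnv associated with it. If you want to call Java code from such a thread, you must first attach it to the JVM as a Java thread with JavaVM::AttachCurrentThread, which gives you a JNIEnv. Note that calling AttachCurrentThread on a thread that is already attached to the JavaVM has no effect. If your thread is already attached, you can also obtain the JNIEnv by calling JavaVM::GetEnv; if it is not attached, this function returns JNI_EDETACHED. Finally, when the thread no longer needs to call Java code, call DetachCurrentThread to release it.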
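The attach-if-needed and detach-only-if-we-attached logic described above can also be packaged as a scope guard. The sketch below shows that idea against a mock, so that it compiles without <jni.h>: `FakeVm` and `FakeEnv` are hypothetical stand-ins whose three methods play the roles of JavaVM::GetEnv, JavaVM::AttachCurrentThread and JavaVM::DetachCurrentThread, and `ScopedAttach` is likewise a name invented for this example. The mock tracks a single thread only.

```cpp
#include <cassert>

// Hypothetical stand-in for a JNIEnv.
struct FakeEnv {};

// Hypothetical stand-in for the JVM's bookkeeping of one native thread.
class FakeVm {
 public:
  // Returns the env of the thread, or nullptr while it is detached.
  FakeEnv* GetEnv() { return attached_ ? &env_ : nullptr; }
  FakeEnv* AttachCurrentThread() {
    attached_ = true;
    ++attach_calls;
    return &env_;
  }
  void DetachCurrentThread() {
    attached_ = false;
    ++detach_calls;
  }

  int attach_calls = 0;
  int detach_calls = 0;

 private:
  bool attached_ = false;
  FakeEnv env_;
};

// Attaches on construction if the thread is not attached yet, and detaches on
// destruction only when this guard did the attaching.
template <typename Vm, typename Env>
class ScopedAttach {
 public:
  explicit ScopedAttach(Vm& vm) : vm_(vm), env_(vm.GetEnv()), owns_(false) {
    if (!env_) {
      env_ = vm_.AttachCurrentThread();
      owns_ = true;
    }
  }
  ~ScopedAttach() {
    if (owns_) vm_.DetachCurrentThread();
  }
  ScopedAttach(const ScopedAttach&) = delete;
  ScopedAttach& operator=(const ScopedAttach&) = delete;

  Env* env() const { return env_; }

 private:
  Vm& vm_;
  Env* env_;
  bool owns_;
};
```

A guard like this suits short-lived native threads. For long-lived threads such as WebRTC's, attaching once and detaching at shutdown, as the code above does, avoids paying for an attach and detach around every callback.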

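Establishing the connection with PeerConnection

Once the Stream has been added to the PeerConnection, what remains is to use the PeerConnection's APIs and callbacks to establish a connection with the other client. The main APIs involved here are CreateOffer, CreateAnswer, SetLocalDescription, and SetRemoteDescription. When calling CreateOffer or CreateAnswer we need to state whether this client accepts Audio/Video from the other client. In my scenario the Java server only ever pushes audio and video to other clients, so I set ReceiveAudio/Video to false in both.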

   void PeerConnection::CreateAnswer(jobject java_observer) {
       create_session_observer->SetGlobalJavaObserver(java_observer, "answer");
       auto options = webrtc::PeerConnectionInterface::RTCOfferAnswerOptions();
       options.offer_to_receive_audio = false;
       options.offer_to_receive_video = false;
       peer_connection->CreateAnswer(create_session_observer, options);
   }


   void PeerConnection::CreateOffer(jobject java_observer) {
       create_session_observer->SetGlobalJavaObserver(java_observer, "offer");
       auto options = webrtc::PeerConnectionInterface::RTCOfferAnswerOptions();
       options.offer_to_receive_audio = false;
       options.offer_to_receive_video = false;
       peer_connection->CreateOffer(create_session_observer, options);
   }


   webrtc::SdpParseError PeerConnection::SetLocalDescription(JNIEnv *env, jobject sdp) {
       webrtc::SdpParseError error;
       webrtc::SessionDescriptionInterface *session_description(
               webrtc::CreateSessionDescription(GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("type")),
                                                GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("sdp")), &error));
       peer_connection->SetLocalDescription(set_session_description_observer, session_description);
       return error;
   }


   webrtc::SdpParseError PeerConnection::SetRemoteDescription(JNIEnv *env, jobject sdp) {
       webrtc::SdpParseError error;
       webrtc::SessionDescriptionInterface *session_description(
               webrtc::CreateSessionDescription(GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("type")),
                                                GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("sdp")), &error));
       peer_connection->SetRemoteDescription(set_session_description_observer, session_description);
       return error;
   }

On the Java side I generally exchange SDP like this:

   // After the Stream has been added to the PeerConnection
   sessionRTCMap.get(headerAccessor.getSessionId()).getPeerConnection().createOffer(sdp -> executor.submit(() -> {
       try {
           sessionRTCMap.get(headerAccessor.getSessionId()).getPeerConnection().setLocalDescription(sdp);
           sendMessage(headerAccessor.getSessionId(), SDP_DESTINATION, sdp);
       } catch (Exception e) {
           log.error("{}", e);
       }
   }));


   // After receiving the Answer SDP from the remote end
   SessionDescription sessionDescription = JSON.parseObject((String) requestResponse.getData(), SessionDescription.class);
   sessionRTCMap.get(headerAccessor.getSessionId()).getPeerConnection().setRemoteDescription(sessionDescription);

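At this point, under normal circumstances, the connection is up. Next I'll cover how I release all the related resources, to round off the normal use case. This part has plenty of pitfalls too: because I was unfamiliar with how WebRTC manages pointers, I kept running into leaks and accesses through invalid pointers. It still hurts to think about T.T.

Releasing all related resources

Let's start from the release procedure in Java and walk through the whole process.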

   public void releaseResource() {
       lock.lock();
       try {
           if (videoDataChannel != null) { // If a DataChannel is in use, release the remote DataChannel object first
               videoDataChannel.close();
               videoDataChannel = null;
           }
           log.info("Release remote video data channel");
           if (localVideoDataChannel != null) { // If a DataChannel is in use, release the local DataChannel object next
               localVideoDataChannel.close();
               localVideoDataChannel = null;
           }
           log.info("Release local video data channel");
           if (peerConnection != null) { // Release the PeerConnection object
               peerConnection.close();
               peerConnection = null;
           }
           log.info("Release peer connection");
           if (rtc != null) { // Release the PeerConnectionFactory and related objects
               rtc.close();
           }
           log.info("Release rtc");
       } catch (Exception ignored) {
       }finally {
           destroyed = true;
           lock.unlock();
       }
   }

Then the corresponding release code in C++:



   DataChannel::~DataChannel() {
       data_channel->UnregisterObserver(); // First unregister the observer that was registered
       delete data_channel_observer; // Destroy the observer object
       data_channel->Close(); // Close the Data Channel
       //rtc::scoped_refptr<webrtc::DataChannelInterface> data_channel; (Created by webrtc::PeerConnectionInterface::CreateDataChannel)
       data_channel = nullptr; // Destroy the Data Channel object (counted pointer)
   }




   PeerConnection::~PeerConnection() {
       peer_connection->Close(); // Close the PeerConnection
       //rtc::scoped_refptr<webrtc::PeerConnectionInterface> peer_connection; (Created by webrtc::PeerConnectionFactoryInterface::CreatePeerConnection)
       peer_connection = nullptr; // Destroy the PeerConnection object (counted pointer)
       delete peer_connection_observer; // Destroy the observer that was used
       delete set_session_description_observer; // Destroy the observer that was used
       delete create_session_observer; // Destroy the observer that was used
   }


   RTC::~RTC() {
       //rtc::scoped_refptr<webrtc::PeerConnectionFactoryInterface> peer_connection_factory; (Created by webrtc::CreatePeerConnectionFactory)
       peer_connection_factory = nullptr; // Release the PeerConnectionFactory
       WEBRTC_LOG("Destroy peer connection factory", INFO);
       worker_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::ReleaseAudioDeviceModule, this)); // Release the AudioDeviceModule on the Worker Thread, because that is where it was created
       signaling_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::DetachCurrentThread, this)); // Detach signaling thread
       worker_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::DetachCurrentThread, this)); // Detach worker thread
       network_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::DetachCurrentThread, this)); // Detach network thread
       worker_thread->Stop(); // Stop the thread
       signaling_thread->Stop(); // Stop the thread
       network_thread->Stop(); // Stop the thread
       worker_thread.reset(); // Destroy the thread (smart pointer)
       signaling_thread.reset(); // Destroy the thread (smart pointer)
       network_thread.reset(); // Destroy the thread (smart pointer)
       network_manager = nullptr; // Destroy the Network Manager (smart pointer)
       socket_factory = nullptr; // Destroy the Socket Factory (smart pointer)
       WEBRTC_LOG("Stop threads", INFO);
       if (video_capturer) {
           JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
           env->DeleteGlobalRef(video_capturer); // Drop the reference to the Java VideoCapturer object; it is a global reference that I stored in the RTC class via env->NewGlobalRef(video_capturer)
           // The Java reference to the AudioCapturer is not dropped here because I keep that reference inside the AudioDeviceModule
       }
   }
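Several of the destructors above release an object simply by assigning nullptr to a counted pointer. The sketch below illustrates why that is enough, and why it does not destroy the object while someone else still holds a reference. It uses std::shared_ptr as a stand-in: rtc::scoped_refptr keeps the count inside the object itself (intrusive counting), but the behaviour shown here is the same idea. `Module` and `Owner` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource that records its own destruction in a caller-owned flag.
class Module {
 public:
  explicit Module(bool* destroyed) : destroyed_(destroyed) {}
  ~Module() { *destroyed_ = true; }

 private:
  bool* destroyed_;
};

// Hypothetical owner that, like the RTC class above, keeps the module behind a
// counted pointer and releases it by assigning nullptr.
class Owner {
 public:
  explicit Owner(std::shared_ptr<Module> module) : module_(std::move(module)) {}

  // Drops this owner's reference only; the module is destroyed when the last
  // reference anywhere goes away.
  void Release() { module_ = nullptr; }

 private:
  std::shared_ptr<Module> module_;
};
```

This is also why the order of release matters in the code above: an object that is still referenced elsewhere outlives the assignment, and a raw pointer kept to an object whose last reference has been dropped becomes dangling.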

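At this point, if you only deal with the normal WebRTC use cases, I think you now know how to call WebRTC's Native APIs from Java. What follows are the API changes I made for my business scenario. If you are interested in that part too, read on.

Additional content

Capturing audio data from Java

Interface overview

When describing how to create the PeerConnectionFactory, we mentioned the AudioDeviceModule interface; this is what WebRTC uses to capture audio data. By implementing this interface we inject our custom audio capture module into WebRTC. Let's first take a quick look at what the interface contains.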

   // Only the key parts are kept here
   class AudioDeviceModule : public rtc::RefCountInterface {
    public:


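     // This callback is the key to audio capture: when new audio data is available, it has to be wrapped in the right form and passed on through this callback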
     // Full-duplex transportation of PCM audio
     virtual int32_t RegisterAudioCallback(AudioTransport* audioCallback) = 0;


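     // List all available audio input/output devices. Since we proxy the whole audio capture (output) module, these functions only need to report a single device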
     // Device enumeration
     virtual int16_t PlayoutDevices() = 0;
     virtual int16_t RecordingDevices() = 0;
     virtual int32_t PlayoutDeviceName(uint16_t index,
                                       char name[kAdmMaxDeviceNameSize],
                                       char guid[kAdmMaxGuidSize]) = 0;
     virtual int32_t RecordingDeviceName(uint16_t index,
                                         char name[kAdmMaxDeviceNameSize],
                                         char guid[kAdmMaxGuidSize]) = 0;


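     // When audio capture or playout is needed, the upper layer uses the functions below to pick the device. Because the functions above report only one device, the upper layer will only ever use that one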
     // Device selection
     virtual int32_t SetPlayoutDevice(uint16_t index) = 0;
     virtual int32_t SetPlayoutDevice(WindowsDeviceType device) = 0;
     virtual int32_t SetRecordingDevice(uint16_t index) = 0;
     virtual int32_t SetRecordingDevice(WindowsDeviceType device) = 0;


     // Initialization
     // Audio transport initialization
     virtual int32_t PlayoutIsAvailable(bool* available) = 0;
     virtual int32_t InitPlayout() = 0;
     virtual bool PlayoutIsInitialized() const = 0;
     virtual int32_t RecordingIsAvailable(bool* available) = 0;
     virtual int32_t InitRecording() = 0;
     virtual bool RecordingIsInitialized() const = 0;


     // Start/stop recording and playout
     // Audio transport control
     virtual int32_t StartPlayout() = 0;
     virtual int32_t StopPlayout() = 0;
     virtual bool Playing() const = 0;
     virtual int32_t StartRecording() = 0;
     virtual int32_t StopRecording() = 0;
     virtual bool Recording() const = 0;


     // The rest relates to audio playout, which I did not use
     // Audio mixer initialization
     virtual int32_t InitSpeaker() = 0;
     virtual bool SpeakerIsInitialized() const = 0;
     virtual int32_t InitMicrophone() = 0;
     virtual bool MicrophoneIsInitialized() const = 0;


     // Speaker volume controls
     virtual int32_t SpeakerVolumeIsAvailable(bool* available) = 0;
     virtual int32_t SetSpeakerVolume(uint32_t volume) = 0;
     virtual int32_t SpeakerVolume(uint32_t* volume) const = 0;
     virtual int32_t MaxSpeakerVolume(uint32_t* maxVolume) const = 0;
     virtual int32_t MinSpeakerVolume(uint32_t* minVolume) const = 0;


     // Microphone volume controls
     virtual int32_t MicrophoneVolumeIsAvailable(bool* available) = 0;
     virtual int32_t SetMicrophoneVolume(uint32_t volume) = 0;
     virtual int32_t MicrophoneVolume(uint32_t* volume) const = 0;
     virtual int32_t MaxMicrophoneVolume(uint32_t* maxVolume) const = 0;
     virtual int32_t MinMicrophoneVolume(uint32_t* minVolume) const = 0;


     // Speaker mute control
     virtual int32_t SpeakerMuteIsAvailable(bool* available) = 0;
     virtual int32_t SetSpeakerMute(bool enable) = 0;
     virtual int32_t SpeakerMute(bool* enabled) const = 0;


     // Microphone mute control
     virtual int32_t MicrophoneMuteIsAvailable(bool* available) = 0;
     virtual int32_t SetMicrophoneMute(bool enable) = 0;
     virtual int32_t MicrophoneMute(bool* enabled) const = 0;


     // Multi-channel support
     // Stereo support
     virtual int32_t StereoPlayoutIsAvailable(bool* available) const = 0;
     virtual int32_t SetStereoPlayout(bool enable) = 0;
     virtual int32_t StereoPlayout(bool* enabled) const = 0;
     virtual int32_t StereoRecordingIsAvailable(bool* available) const = 0;
     virtual int32_t SetStereoRecording(bool enable) = 0;
     virtual int32_t StereoRecording(bool* enabled) const = 0;


     // Playout delay
     virtual int32_t PlayoutDelay(uint16_t* delayMS) const = 0;


   };

Implementation

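After skimming AudioDeviceModule, you probably already have an idea of the approach. Since I only need audio capture, I implemented just a few of these methods. In short, I create a thread inside the AudioDeviceModule; once StartRecording is called, that thread calls into Java at a fixed rate to fetch PCM audio data and then hands the data up through the registered callback. The core of my implementation follows.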

   // First, I defined two lower-level interfaces that correspond to the interfaces on the Java side
   class Capturer {
       public:
           virtual bool isJavaWrapper() {
               return false;
           }


           virtual ~Capturer() {}


           // Returns the sampling frequency in Hz of the audio data that this
           // capturer produces.
           virtual int SamplingFrequency() = 0;


           // Replaces the contents of |buffer| with 10ms of captured audio data
           // (see FakeAudioDevice::SamplesPerFrame). Returns true if the capturer can
           // keep producing data, or false when the capture finishes.
           virtual bool Capture(rtc::BufferT<int16_t> *buffer) = 0;
   };


   class Renderer {
       public:
           virtual ~Renderer() {}


           // Returns the sampling frequency in Hz of the audio data that this
           // renderer receives.
           virtual int SamplingFrequency() const = 0;


           // Renders the passed audio data and returns true if the renderer wants
           // to keep receiving data, or false otherwise.
           virtual bool Render(rtc::ArrayView<const int16_t> data) = 0;
   };


   // The implementations of these two interfaces are as follows
   class JavaAudioCapturerWrapper final : public FakeAudioDeviceModule::Capturer {
       public:


           // The constructor stores the global reference to the Java audio capturer and looks up the methods it needs
           JavaAudioCapturerWrapper(jobject audio_capturer)
                   : java_audio_capturer(audio_capturer) {
               WEBRTC_LOG("Instance java audio capturer wrapper.", INFO);
               JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
               audio_capture_class = env->GetObjectClass(java_audio_capturer);
               sampling_frequency_method = env->GetMethodID(audio_capture_class, "samplingFrequency", "()I");
               capture_method = env->GetMethodID(audio_capture_class, "capture", "(I)Ljava/nio/ByteBuffer;");
               WEBRTC_LOG("Instance java audio capturer wrapper end.", INFO);
           }


           // The destructor releases the Java references
           ~JavaAudioCapturerWrapper() {
               JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
               if (audio_capture_class != nullptr) {
                   env->DeleteLocalRef(audio_capture_class);
                   audio_capture_class = nullptr;
               }
               if (java_audio_capturer) {
                   env->DeleteGlobalRef(java_audio_capturer);
                   java_audio_capturer = nullptr;
               }
           }


           bool isJavaWrapper() override {
               return true;
           }


           // Calls the Java method to get the sampling rate; the value is cached after the first call
           int SamplingFrequency() override {
               if (sampling_frequency_in_hz == 0) {
                   JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
                   this->sampling_frequency_in_hz = env->CallIntMethod(java_audio_capturer, sampling_frequency_method);
               }
               return sampling_frequency_in_hz;
           }


           // Calls the Java method to fetch PCM data. Note that the data must be 16-bit little-endian PCM
           bool Capture(rtc::BufferT<int16_t> *buffer) override {
               buffer->SetData(
                       FakeAudioDeviceModule::SamplesPerFrame(SamplingFrequency()), // computes the size of the data buffer
                       [&](rtc::ArrayView<int16_t> data) { // receives a block of the size given by the previous argument
                           JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
                           size_t length = 0;
                           // The Java side works in bytes, hence size * 2; the Java method takes an int
                           jobject audio_data_buffer = env->CallObjectMethod(java_audio_capturer, capture_method,
                                                                             static_cast<jint>(data.size() * 2));
                           if (audio_data_buffer != nullptr) {
                               void *audio_data_address = env->GetDirectBufferAddress(audio_data_buffer);
                               jlong audio_data_size = env->GetDirectBufferCapacity(audio_data_buffer);
                               if (audio_data_address != nullptr && audio_data_size > 0) {
                                   // int16 is 2 bytes; never copy more than the destination holds (std::min needs <algorithm>)
                                   length = std::min((size_t) audio_data_size / 2, data.size());
                                   memcpy(data.data(), audio_data_address, length * 2);
                               }
                               env->DeleteLocalRef(audio_data_buffer);
                           }
                           return length;
                       });
               return buffer->size() == buffer->capacity();
           }


       private:
           jobject java_audio_capturer;
           jclass audio_capture_class;
           jmethodID sampling_frequency_method;
           jmethodID capture_method;
           int sampling_frequency_in_hz = 0;
   };


   constexpr int kFrameLengthMs = 10; // capture one frame every 10 ms
   constexpr int kFramesPerSecond = 1000 / kFrameLengthMs; // frames captured per second


   size_t FakeAudioDeviceModule::SamplesPerFrame(int sampling_frequency_in_hz) {
       return rtc::CheckedDivExact(sampling_frequency_in_hz, kFramesPerSecond);
   }


   // The renderer does nothing at all ^.^
   class DiscardRenderer final : public FakeAudioDeviceModule::Renderer {
   public:
       explicit DiscardRenderer(int sampling_frequency_in_hz)
               : sampling_frequency_in_hz_(sampling_frequency_in_hz) {}


       int SamplingFrequency() const override {
           return sampling_frequency_in_hz_;
       }


       bool Render(rtc::ArrayView<const int16_t>) override {
           return true;
       }


   private:
       int sampling_frequency_in_hz_;
   };


   // Next comes the core of the AudioDeviceModule implementation. I use WebRTC's EventTimerWrapper and its cross-platform thread class to call the Java capture function periodically
   std::unique_ptr<webrtc::EventTimerWrapper> tick_;
   rtc::PlatformThread thread_;


   // Constructor
   FakeAudioDeviceModule::FakeAudioDeviceModule(std::unique_ptr<Capturer> capturer,
                                                std::unique_ptr<Renderer> renderer,
                                                float speed)
           : capturer_(std::move(capturer)),
             renderer_(std::move(renderer)),
             speed_(speed),
             audio_callback_(nullptr),
             rendering_(false),
             capturing_(false),
             done_rendering_(true, true),
             done_capturing_(true, true),
             tick_(webrtc::EventTimerWrapper::Create()),
             thread_(FakeAudioDeviceModule::Run, this, "FakeAudioDeviceModule") {
   }


   // Mainly sets rendering_ to true
   int32_t FakeAudioDeviceModule::StartPlayout() {
       rtc::CritScope cs(&lock_);
       RTC_CHECK(renderer_);
       rendering_ = true;
       done_rendering_.Reset();
       return 0;
   }


   // Mainly sets rendering_ to false
   int32_t FakeAudioDeviceModule::StopPlayout() {
       rtc::CritScope cs(&lock_);
       rendering_ = false;
       done_rendering_.Set();
       return 0;
   }


   // Mainly sets capturing_ to true
   int32_t FakeAudioDeviceModule::StartRecording() {
       rtc::CritScope cs(&lock_);
       WEBRTC_LOG("Start audio recording", INFO);
       RTC_CHECK(capturer_);
       capturing_ = true;
       done_capturing_.Reset();
       return 0;
   }


   // Mainly sets capturing_ to false
   int32_t FakeAudioDeviceModule::StopRecording() {
       rtc::CritScope cs(&lock_);
       WEBRTC_LOG("Stop audio recording", INFO);
       capturing_ = false;
       done_capturing_.Set();
       return 0;
   }


   // Sets the EventTimer period and starts the thread
   int32_t FakeAudioDeviceModule::Init() {
       RTC_CHECK(tick_->StartTimer(true, kFrameLengthMs / speed_));
       thread_.Start();
       thread_.SetPriority(rtc::kHighPriority);
       return 0;
   }


   // Stores the upper layer's audio callback; we use it later to hand over audio data
   int32_t FakeAudioDeviceModule::RegisterAudioCallback(webrtc::AudioTransport *callback) {
       rtc::CritScope cs(&lock_);
       RTC_DCHECK(callback || audio_callback_);
       audio_callback_ = callback;
       return 0;
   }


   bool FakeAudioDeviceModule::Run(void *obj) {
       static_cast<FakeAudioDeviceModule *>(obj)->ProcessAudio();
       return true;
   }


   void FakeAudioDeviceModule::ProcessAudio() {
       {
           rtc::CritScope cs(&lock_);
           if (needDetachJvm) {
               WEBRTC_LOG("In audio device module process audio", INFO);
           }
           auto start = std::chrono::steady_clock::now();
           if (capturing_) {
               // Capture 10ms of audio. 2 bytes per sample.
               // Fetch the audio data from the capturer
               const bool keep_capturing = capturer_->Capture(&recording_buffer_);
               uint32_t new_mic_level;
               if (keep_capturing) {
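                   // Hand the audio data over through the callback. The arguments include: the data, the number of samples, bytes per sample, number of channels, sampling rate, delay, etc.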
                   audio_callback_->RecordedDataIsAvailable(
                           recording_buffer_.data(), recording_buffer_.size(), 2, 1,
                           static_cast<const uint32_t>(capturer_->SamplingFrequency()), 0, 0, 0, false, new_mic_level);
               }
               // If there is no more audio data, stop capturing
               if (!keep_capturing) {
                   capturing_ = false;
                   done_capturing_.Set();
               }
           }
           if (rendering_) {
               size_t samples_out;
               int64_t elapsed_time_ms;
               int64_t ntp_time_ms;
               const int sampling_frequency = renderer_->SamplingFrequency();
               // Pull audio data from the upper layer
               audio_callback_->NeedMorePlayData(
                       SamplesPerFrame(sampling_frequency), 2, 1, static_cast<const uint32_t>(sampling_frequency),
                       playout_buffer_.data(), samples_out, &elapsed_time_ms, &ntp_time_ms);
               // Play (render) the audio data
               const bool keep_rendering = renderer_->Render(
                       rtc::ArrayView<const int16_t>(playout_buffer_.data(), samples_out));
               if (!keep_rendering) {
                   rendering_ = false;
                   done_rendering_.Set();
               }
           }
           auto end = std::chrono::steady_clock::now();
           auto diff = std::chrono::duration<double, std::milli>(end - start).count();
           if (diff > kFrameLengthMs) {
               WEBRTC_LOG("JNI capture audio data timeout, real capture time is " + std::to_string(diff) + " ms", DEBUG);
           }
           // If the AudioDeviceModule is about to be destroyed, detach this thread from the JVM
           if (capturer_->isJavaWrapper() && needDetachJvm && !detached2Jvm) {
               DETACH_CURRENT_THREAD_IF_NEEDED();
               detached2Jvm = true;
           } else if (needDetachJvm) {
               detached2Jvm = true;
           }
       }
       // Block until the timer fires; once 10 ms have elapsed, the next round of audio processing starts
       tick_->Wait(WEBRTC_EVENT_INFINITE);
   }


   // Destructor
   FakeAudioDeviceModule::~FakeAudioDeviceModule() {
       WEBRTC_LOG("In audio device module FakeAudioDeviceModule", INFO);
       StopPlayout(); // stop playout
       StopRecording(); // stop capture
       needDetachJvm = true; // ask the worker thread to detach
       while (!detached2Jvm) { // wait until the worker thread has detached
       }
       WEBRTC_LOG("In audio device module after detached2Jvm", INFO);
       thread_.Stop(); // stop the thread
       WEBRTC_LOG("In audio device module ~FakeAudioDeviceModule finished", INFO);
   }
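
The destructor above waits for the worker in an empty `while (!detached2Jvm)` loop. As a point of comparison, here is a minimal sketch of the same handshake written with the standard library only. It is not the article's code: `TickWorker`, `on_tick`, and `on_exit` are hypothetical names, `std::thread` stands in for `rtc::PlatformThread`, and `on_exit` stands in for detaching the thread from the JVM.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical helper, not part of WebRTC: calls `on_tick` roughly once per
// `period`, then calls `on_exit` on the worker thread before it terminates.
class TickWorker {
public:
    TickWorker(std::chrono::milliseconds period,
               std::function<void()> on_tick,
               std::function<void()> on_exit)
            : period_(period),
              on_tick_(std::move(on_tick)),
              on_exit_(std::move(on_exit)),
              stop_(false) {
        assert(on_tick_ && on_exit_);
        thread_ = std::thread([this] { Loop(); });
    }

    ~TickWorker() {
        stop_.store(true);  // ask the worker to finish
        thread_.join();     // sleeps until on_exit has run, no spinning
    }

private:
    void Loop() {
        while (!stop_.load()) {
            on_tick_();
            std::this_thread::sleep_for(period_);  // simple pacing, drifts slightly
        }
        on_exit_();  // runs on the worker thread, e.g. a JVM detach
    }

    std::chrono::milliseconds period_;
    std::function<void()> on_tick_;
    std::function<void()> on_exit_;
    std::atomic<bool> stop_;
    std::thread thread_;
};
```

The flag is a `std::atomic<bool>` so the write in the destructor is visible to the worker, and `join()` replaces the busy loop, so the destroying thread does not burn CPU while it waits.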

By the way, on the Java side I use direct memory (a direct ByteBuffer) to pass the audio data, mainly because it reduces memory copies.
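
Because the Java side hands over raw bytes, both sides have to agree on the frame size and the byte order. The following is a small sketch of that arithmetic, assuming mono 16-bit PCM and 10 ms frames as in the code above. `BytesPerFrame` and `DecodeLittleEndianPcm16` are hypothetical helpers written for illustration and are not part of WebRTC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kFrameLengthMs = 10;

// Number of 16-bit samples in one 10 ms mono frame.
inline std::size_t SamplesPerFrame(int sampling_frequency_in_hz) {
    assert(sampling_frequency_in_hz > 0 &&
           (sampling_frequency_in_hz * kFrameLengthMs) % 1000 == 0);
    return static_cast<std::size_t>(sampling_frequency_in_hz) * kFrameLengthMs / 1000;
}

// Number of bytes the Java side has to supply for one frame.
inline std::size_t BytesPerFrame(int sampling_frequency_in_hz) {
    return SamplesPerFrame(sampling_frequency_in_hz) * sizeof(std::int16_t);
}

// Decodes little-endian 16-bit PCM regardless of the host byte order.
inline std::vector<std::int16_t> DecodeLittleEndianPcm16(const std::uint8_t *bytes,
                                                         std::size_t byte_count) {
    std::vector<std::int16_t> samples(byte_count / 2);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const unsigned lo = bytes[2 * i];
        const unsigned hi = bytes[2 * i + 1];
        samples[i] = static_cast<std::int16_t>(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return samples;
}
```

At 48 kHz this gives 480 samples, or 960 bytes, per frame. One thing to watch on the Java side: a `ByteBuffer` is big-endian by default, so if the samples are written with `putShort`, the buffer would need `order(ByteOrder.LITTLE_ENDIAN)` to match.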

Capturing Video Data from Java

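Capturing video data from Java is very similar to capturing audio. The difference is that the video capture module is injected when the VideoSource is created. One more point to note is that the VideoCapturer must be created on the Signaling Thread.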

   ...
   video_source = rtc->CreateVideoSource(rtc->CreateFakeVideoCapturerInSignalingThread());
   ...


   FakeVideoCapturer *RTC::CreateFakeVideoCapturerInSignalingThread() {
       if (video_capturer) {
           return signaling_thread->Invoke<FakeVideoCapturer *>(RTC_FROM_HERE,
                                                                rtc::Bind(&RTC::CreateFakeVideoCapturer, this,
                                                                          video_capturer));
       } else {
           return nullptr;
       }
   }
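
The `signaling_thread->Invoke<...>(...)` call above runs a function on another thread and blocks until its result is available. To make that pattern concrete, here is a minimal standard-library sketch. `TaskThread` is a hypothetical stand-in written for illustration; it is not `rtc::Thread` and does not reproduce its full behavior.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a signaling thread: owns one worker thread and
// runs every submitted task on it, in submission order.
class TaskThread {
public:
    TaskThread() : worker_([this] { Loop(); }) {}

    ~TaskThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        wake_.notify_one();
        worker_.join();
    }

    // Runs `fn` on the worker thread and blocks the caller until it returns.
    template <typename Fn>
    auto Invoke(Fn fn) -> decltype(fn()) {
        // Calling Invoke from the worker itself would deadlock in this sketch.
        assert(std::this_thread::get_id() != worker_.get_id());
        using Result = decltype(fn());
        auto task = std::make_shared<std::packaged_task<Result()>>(std::move(fn));
        std::future<Result> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.emplace_back([task] { (*task)(); });
        }
        wake_.notify_one();
        return result.get();
    }

private:
    void Loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return quit_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // quit requested and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> tasks_;
    bool quit_ = false;
    std::thread worker_;  // declared last: Loop() uses the members above
};
```

With this shape, an object created inside `Invoke` is constructed on the worker thread, which is the property the article relies on when it creates the VideoCapturer on the Signaling Thread.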

The VideoCapturer interface does not require much from us either: the essentials are the main loop, start, and stop. Here is my implementation.

   // Constructor
   FakeVideoCapturer::FakeVideoCapturer(jobject video_capturer)
           : running_(false),
             video_capturer(video_capturer),
             is_screen_cast(false),
             ticker(webrtc::EventTimerWrapper::Create()),
             thread(FakeVideoCapturer::Run, this, "FakeVideoCapturer") {
       // Cache the Java methods that will be used
       JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
       video_capture_class = env->GetObjectClass(video_capturer);
       get_width_method = env->GetMethodID(video_capture_class, "getWidth", "()I");
       get_height_method = env->GetMethodID(video_capture_class, "getHeight", "()I");
       get_fps_method = env->GetMethodID(video_capture_class, "getFps", "()I");
       capture_method = env->GetMethodID(video_capture_class, "capture", "()Lpackage/name/of/rtc4j/model/VideoFrame;");
       width = env->CallIntMethod(video_capturer, get_width_method);
       previous_width = width;
       height = env->CallIntMethod(video_capturer, get_height_method);
       previous_height = height;
       fps = env->CallIntMethod(video_capturer, get_fps_method);
       // Declare the format of the delivered data: YUV420 (I420)
       static const cricket::VideoFormat formats[] = {
               {width, height, cricket::VideoFormat::FpsToInterval(fps), cricket::FOURCC_I420}
       };
       SetSupportedFormats({&formats[0], &formats[arraysize(formats)]});
       // Set the main-loop interval according to the FPS reported by Java
       RTC_CHECK(ticker->StartTimer(true, rtc::kNumMillisecsPerSec / fps));
       thread.Start();
       thread.SetPriority(rtc::kHighPriority);
       // The Java side sends JPEG images, so I decompress them with libjpeg-turbo and convert them to YUV420
       decompress_handle = tjInitDecompress();
       WEBRTC_LOG("Create fake video capturer, " + std::to_string(width) + ", " + std::to_string(height), INFO);
   }


   // Destructor
   FakeVideoCapturer::~FakeVideoCapturer() {
       thread.Stop();
       SignalDestroyed(this);
       // Release Java resources
       JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
       if (video_capture_class != nullptr) {
           env->DeleteLocalRef(video_capture_class);
           video_capture_class = nullptr;
       }
       // Release the decompressor
       if (decompress_handle) {
           if (tjDestroy(decompress_handle) != 0) {
               WEBRTC_LOG("Release decompress handle failed, reason is: " + std::string(tjGetErrorStr2(decompress_handle)),
                          ERROR);
           }
       }
       WEBRTC_LOG("Free fake video capturer", INFO);
   }


   bool FakeVideoCapturer::Run(void *obj) {
       static_cast<FakeVideoCapturer *>(obj)->CaptureFrame();
       return true;
   }


   void FakeVideoCapturer::CaptureFrame() {
       {
           rtc::CritScope cs(&lock_);
           if (running_) {
               int64_t t0 = rtc::TimeMicros();
               JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
               // Fetch the image for each frame from the Java side
               jobject java_video_frame = env->CallObjectMethod(video_capturer, capture_method);
               if (java_video_frame == nullptr) { // if the returned image is null, deliver an all-black frame
                   rtc::scoped_refptr<webrtc::I420Buffer> buffer = webrtc::I420Buffer::Create(previous_width,
                                                                                              previous_height);
                   webrtc::I420Buffer::SetBlack(buffer);
                   OnFrame(webrtc::VideoFrame(buffer, (webrtc::VideoRotation) previous_rotation, t0), previous_width,
                           previous_height);
                   return;
               }
               // Java uses direct memory to pass the image
               jobject java_data_buffer = env->CallObjectMethod(java_video_frame, GET_VIDEO_FRAME_BUFFER_GETTER_METHOD());
               auto data_buffer = (unsigned char *) env->GetDirectBufferAddress(java_data_buffer);
               auto length = (unsigned long) env->CallIntMethod(java_video_frame, GET_VIDEO_FRAME_LENGTH_GETTER_METHOD());
               int rotation = env->CallIntMethod(java_video_frame, GET_VIDEO_FRAME_ROTATION_GETTER_METHOD());
               int width;
               int height;
               // Parse the JPEG header to get the width and height
               tjDecompressHeader(decompress_handle, data_buffer, length, &width, &height);
               previous_width = width;
               previous_height = height;
               previous_rotation = rotation;
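               // Decompress into YUV420 with 32-byte-aligned strides and deliver it. 32-byte alignment is used because it makes encoding more efficient; in addition, VideoToolbox encoding on macOS requires 32-byte alignment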
               rtc::scoped_refptr<webrtc::I420Buffer> buffer =
                       webrtc::I420Buffer::Create(width, height,
                                                  width % 32 == 0 ? width : width / 32 * 32 + 32,
                                                  (width / 2) % 32 == 0 ? (width / 2) : (width / 2) / 32 * 32 + 32,
                                                  (width / 2) % 32 == 0 ? (width / 2) : (width / 2) / 32 * 32 + 32);
               uint8_t *planes[] = {buffer->MutableDataY(), buffer->MutableDataU(), buffer->MutableDataV()};
               int strides[] = {buffer->StrideY(), buffer->StrideU(), buffer->StrideV()};
               tjDecompressToYUVPlanes(decompress_handle, data_buffer, length, planes, width, strides, height,
                                       TJFLAG_FASTDCT | TJFLAG_NOREALLOC);
               env->DeleteLocalRef(java_data_buffer);
               env->DeleteLocalRef(java_video_frame);
               // OnFrame is the interface that hands the data to WebRTC
               OnFrame(webrtc::VideoFrame(buffer, (webrtc::VideoRotation) rotation, t0), width, height);
           }
       }
       ticker->Wait(WEBRTC_EVENT_INFINITE);
   }


   // Start
   cricket::CaptureState FakeVideoCapturer::Start(
           const cricket::VideoFormat &format) {
       //SetCaptureFormat(&format); This will cause crash in CentOS
       running_ = true;
       SetCaptureState(cricket::CS_RUNNING);
       WEBRTC_LOG("Start fake video capturing", INFO);
       return cricket::CS_RUNNING;
   }


   // Stop
   void FakeVideoCapturer::Stop() {
       running_ = false;
       //SetCaptureFormat(nullptr); This will cause crash in CentOS
       SetCaptureState(cricket::CS_STOPPED);
       WEBRTC_LOG("Stop fake video capturing", INFO);
   }


   // YUV420
   bool FakeVideoCapturer::GetPreferredFourccs(std::vector<uint32_t> *fourccs) {
       fourccs->push_back(cricket::FOURCC_I420);
       return true;
   }


   // Delegate to the default implementation
   void FakeVideoCapturer::AddOrUpdateSink(rtc::VideoSinkInterface<webrtc::VideoFrame> *sink,
                                           const rtc::VideoSinkWants &wants) {
       cricket::VideoCapturer::AddOrUpdateSink(sink, wants);
   }


   void FakeVideoCapturer::RemoveSink(rtc::VideoSinkInterface<webrtc::VideoFrame> *sink) {
       cricket::VideoCapturer::RemoveSink(sink);
   }



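That concludes how to obtain audio and video data from the Java side. As you can see, it is not that hard. Consider this a starting point; I hope my implementation helps you understand this part of the flow more quickly.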
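
The 32-byte stride rounding that CaptureFrame spells out three times with the `%` and `/` operators can be expressed as one small helper. This is a sketch only: `AlignUp`, `I420Strides`, and `AlignedI420Strides` are hypothetical names. One deliberate difference from the code above is that the chroma width is rounded up, `(width + 1) / 2`, so odd widths are covered too, whereas the original uses `width / 2`.

```cpp
#include <cassert>

// Rounds `value` up to the next multiple of `alignment`.
constexpr int AlignUp(int value, int alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

struct I420Strides {
    int y;
    int u;
    int v;
};

// Strides for an I420 frame whose rows are padded to `alignment` bytes.
inline I420Strides AlignedI420Strides(int width, int alignment = 32) {
    assert(width > 0 && alignment > 0);
    const int chroma_width = (width + 1) / 2;  // chroma planes are half width, rounded up
    return {AlignUp(width, alignment),
            AlignUp(chroma_width, alignment),
            AlignUp(chroma_width, alignment)};
}
```

For a 1080-pixel-wide frame this yields a luma stride of 1088 and chroma strides of 544, while a width that is already a multiple of 32, such as 640, is left unchanged.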

Restricting Connection Ports

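Let's recap the complete flow for port restriction described earlier. When creating the PeerConnectionFactory, we instantiated a SocketFactory and a default NetworkManager. Then, when creating the PeerConnection, we used these two instances to create a PortAllocator and injected it into the PeerConnection. In this flow, the code that actually enforces the port restriction lives in the SocketFactory, although the PortAllocator API is used as well. You may wonder: the PortAllocator already has an API for limiting the port range, so why is the SocketFactory needed?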

   std::unique_ptr<cricket::PortAllocator> port_allocator(
           new cricket::BasicPortAllocator(network_manager.get(), socket_factory.get()));
   port_allocator->SetPortRange(this->min_port, this->max_port); // the PortAllocator API for restricting ports

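At first I also set the ports only through this API, but I found that it still requested ports outside the range for other purposes. So in the end I overrode the SocketFactory and rejected every request for a disallowed port. In addition, our servers have some subnet IPs that cannot be used, so I handle that in the SocketFactory too. My implementation is as follows: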

   rtc::AsyncPacketSocket *
   rtc::SocketFactoryWrapper::CreateUdpSocket(const rtc::SocketAddress &local_address, uint16_t min_port,
                                              uint16_t max_port) {
       // Reject port ranges outside the allowed range
       if (min_port < this->min_port || max_port > this->max_port) {
           WEBRTC_LOG("Create udp socket cancelled, port out of range, expect port range is:" +
                      std::to_string(this->min_port) + "->" + std::to_string(this->max_port)
                      + ", parameter port range is: " + std::to_string(min_port) + "->" + std::to_string(max_port),
                      LogLevel::INFO);
           return nullptr;
       }
       // Reject disallowed IPs: public addresses pass, private ones only if they match the whitelisted prefix
       if (!local_address.IsPrivateIP() || local_address.HostAsURIString().find(this->white_private_ip_prefix) == 0) {
           rtc::AsyncPacketSocket *result = BasicPacketSocketFactory::CreateUdpSocket(local_address, min_port, max_port);
           if (result == nullptr) { // the underlying factory could not create the socket, so there is nothing to log
               WEBRTC_LOG("Create udp socket failed, min port is:" + std::to_string(min_port) + ", max port is: " +
                          std::to_string(max_port), LogLevel::INFO);
               return nullptr;
           }
           const auto *address = static_cast<const void *>(result);
           std::stringstream ss;
           ss << address;
           WEBRTC_LOG("Create udp socket, min port is:" + std::to_string(min_port) + ", max port is: " +
                      std::to_string(max_port) + ", result is: " + result->GetLocalAddress().ToString() + "->" +
                      result->GetRemoteAddress().ToString() + ", new socket address is: " + ss.str(), LogLevel::INFO);


           return result;
       } else {
           WEBRTC_LOG("Create udp socket cancelled, this ip is not in white list:" + local_address.HostAsURIString(),
                      LogLevel::INFO);
           return nullptr;
       }
   }
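
The two checks in CreateUdpSocket can be pulled out into plain functions, which makes the policy easy to exercise without opening any sockets. The sketch below assumes that approach. `SocketPolicy` and the three functions are hypothetical names, and the caller is assumed to pass in the result of `IsPrivateIP()` and the host string.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical policy object mirroring the members used by SocketFactoryWrapper.
struct SocketPolicy {
    std::uint16_t min_port;
    std::uint16_t max_port;
    std::string white_private_ip_prefix;  // for example "10.1."
};

// True when the requested range lies inside the allowed range.
inline bool PortRangeAllowed(const SocketPolicy &policy,
                             std::uint16_t min_port, std::uint16_t max_port) {
    assert(policy.min_port <= policy.max_port);
    return !(min_port < policy.min_port || max_port > policy.max_port);
}

// Public addresses always pass; private ones only with the whitelisted prefix.
inline bool AddressAllowed(const SocketPolicy &policy,
                           bool is_private_ip, const std::string &host) {
    const std::string &prefix = policy.white_private_ip_prefix;
    return !is_private_ip || host.compare(0, prefix.size(), prefix) == 0;
}

inline bool ShouldCreateSocket(const SocketPolicy &policy,
                               std::uint16_t min_port, std::uint16_t max_port,
                               bool is_private_ip, const std::string &host) {
    return PortRangeAllowed(policy, min_port, max_port) &&
           AddressAllowed(policy, is_private_ip, host);
}
```

Note that with this policy a request for the range 0 to 0 is rejected whenever the configured minimum is above 0, which matches the intent described above of refusing every request that falls outside the configured range.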

Custom Video Encoding

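As you may know, WebRTC uses VP8 for encoding by default, and the common view is that VP8 is not as good as H264. In addition, older versions of Safari do not support VP8, so when communicating with Safari, WebRTC encodes video with OpenH264, which in turn is less efficient than libx264. My improvements to the encoding part therefore focus on: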

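Replacing the default codec with H264
Encoding video with libx264 through FFmpeg, and using the GPU for acceleration (h264_nvenc) when the host has a capable GPU
Supporting changes to the transmission bitrate at runtime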
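
The second item amounts to a preference order with a fallback: try the GPU encoder first when acceleration is requested, otherwise or on failure use libx264. A minimal sketch of that selection logic follows. `PickEncoder` and the `is_available` callback are hypothetical; in real code the callback would presumably wrap a lookup such as `avcodec_find_encoder_by_name(name) != nullptr`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Returns the first encoder name for which `is_available` returns true,
// or an empty string when none of the candidates is available.
inline std::string PickEncoder(bool hardware_accelerate,
                               const std::function<bool(const std::string &)> &is_available) {
    assert(is_available);
    std::vector<std::string> candidates;
    if (hardware_accelerate) {
        candidates.push_back("h264_nvenc");  // GPU encoder, tried first
    }
    candidates.push_back("libx264");         // software fallback
    for (const std::string &name : candidates) {
        if (is_available(name)) {
            return name;
        }
    }
    return std::string();
}
```

Keeping the lookup behind a callback means the ordering can be tested on a machine without a GPU or even without FFmpeg installed.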

Replacing the Default Codec

Replacing the default codec with H264 is fairly simple. We only need to override GetSupportedFormats of the VideoEncoderFactory:

   // Returns a list of supported video formats in order of preference, to use
   // for signaling etc.
   std::vector<webrtc::SdpVideoFormat> GetSupportedFormats() const override {
       return GetAllSupportedFormats();
   }


   // Here I declare support for H264 only, with the NonInterleaved packetization mode
   std::vector<webrtc::SdpVideoFormat> GetAllSupportedFormats() {
       std::vector<webrtc::SdpVideoFormat> supported_codecs;
       supported_codecs.emplace_back(CreateH264Format(webrtc::H264::kProfileBaseline, webrtc::H264::kLevel3_1, "1"));
       return supported_codecs;
   }


   webrtc::SdpVideoFormat CreateH264Format(webrtc::H264::Profile profile,
                                           webrtc::H264::Level level,
                                           const std::string &packetization_mode) {
       const absl::optional<std::string> profile_string =
               webrtc::H264::ProfileLevelIdToString(webrtc::H264::ProfileLevelId(profile, level));
       RTC_CHECK(profile_string);
       return webrtc::SdpVideoFormat(cricket::kH264CodecName,
                                     {{cricket::kH264FmtpProfileLevelId,        *profile_string},
                                      {cricket::kH264FmtpLevelAsymmetryAllowed, "1"},
                                      {cricket::kH264FmtpPacketizationMode,     packetization_mode}});
   }


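
To see what CreateH264Format ends up advertising, it helps to look at the resulting `fmtp` parameters. The sketch below rebuilds them by hand for the Baseline profile only. `BaselineProfileLevelId` and `H264Fmtp` are hypothetical helpers, not WebRTC APIs, and they ignore special cases such as level 1b. The assumption is that `profile-level-id` is three hex bytes: `profile_idc` (0x42 for Baseline), the constraint flags (0x00 here), and `level_idc` (the level times ten).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper covering only the Baseline profile with no constraint
// flags; `level_idc` is the level times ten, for example 31 for level 3.1.
inline std::string BaselineProfileLevelId(int level_idc) {
    assert(level_idc > 0 && level_idc <= 0xff);
    char buf[7];  // six hex digits plus the terminating NUL
    std::snprintf(buf, sizeof(buf), "42%02x%02x", 0u, static_cast<unsigned>(level_idc));
    return buf;
}

// Builds the fmtp parameter string with the keys in alphabetical order.
inline std::string H264Fmtp(int level_idc, int packetization_mode) {
    return "level-asymmetry-allowed=1;packetization-mode=" +
           std::to_string(packetization_mode) +
           ";profile-level-id=" + BaselineProfileLevelId(level_idc);
}
```

For Baseline at level 3.1 with packetization mode 1, this produces `level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f`, which is the kind of line to look for in the SDP when checking that the factory override took effect.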

Implementing the Encoder

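Next comes the FFmpeg-based implementation of the VideoEncoder interface. For FFmpeg usage I mainly followed the official example^[13]. Let's take a brief look at which VideoEncoder methods we need to implement: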

   FFmpegH264EncoderImpl(const cricket::VideoCodec &codec, bool hardware_accelerate);


   ~FFmpegH264EncoderImpl() override;


   // |max_payload_size| is ignored.
   // The following members of |codec_settings| are used. The rest are ignored.
   // - codecType (must be kVideoCodecH264)
   // - targetBitrate
   // - maxFramerate
   // - width
   // - height
   // Initializes the encoder
   int32_t InitEncode(const webrtc::VideoCodec *codec_settings,
                      int32_t number_of_cores,
                      size_t max_payload_size) override;


   // Releases resources
   int32_t Release() override;


   // When encoding of a frame finishes, the encoded frame is handed over through this callback
   int32_t RegisterEncodeCompleteCallback(
           webrtc::EncodedImageCallback *callback) override;


   // Called by WebRTC's own rate controller, which adjusts the bitrate according to current network conditions
   int32_t SetRateAllocation(const webrtc::VideoBitrateAllocation &bitrate_allocation,
                             uint32_t framerate) override;


   // The result of encoding - an EncodedImage and RTPFragmentationHeader - are
   // passed to the encode complete callback.
   int32_t Encode(const webrtc::VideoFrame &frame,
                  const webrtc::CodecSpecificInfo *codec_specific_info,
                  const std::vector<webrtc::FrameType> *frame_types) override;

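While implementing this interface I referred to WebRTC's official OpenH264 encoder. Note that WebRTC supports Simulcast, so there may be several encoder instances here, one per stream. Because this part is fairly complex, I will walk through my implementation step by step. Let me first introduce the struct and member variables I defined: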

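   // This struct holds all resources belonging to one encoder instance
   typedef struct {
       AVCodec *codec = nullptr;        // the codec (encoder) instance
       AVFrame *frame = nullptr;        // pixel data after decoding / before encoding
       AVCodecContext *context = nullptr;    // codec context holding the codec's parameter settings
       AVPacket *pkt = nullptr;        // packet structure containing the encoded bitstream data
   } CodecCtx;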


   // Encoder instances
   std::vector<CodecCtx *> encoders_;
   // Encoder parameters
   std::vector<LayerConfig> configurations_;
   // Encoded images
   std::vector<webrtc::EncodedImage> encoded_images_;
   // Buffers backing the encoded images
   std::vector<std::unique_ptr<uint8_t[]>> encoded_image_buffers_;
   // Encoding-related configuration
   webrtc::VideoCodec codec_;
   webrtc::H264PacketizationMode packetization_mode_;
   size_t max_payload_size_;
   int32_t number_of_cores_;
   // Callback invoked when encoding completes
   webrtc::EncodedImageCallback *encoded_image_callback_;

The constructor is fairly simple: it stores the packetization mode and reserves space:

   FFmpegH264EncoderImpl::FFmpegH264EncoderImpl(const cricket::VideoCodec &codec, bool hardware)
           : packetization_mode_(webrtc::H264PacketizationMode::SingleNalUnit),
             max_payload_size_(0),
             hardware_accelerate(hardware),
             number_of_cores_(0),
             encoded_image_callback_(nullptr),
             has_reported_init_(false),
             has_reported_error_(false) {
       RTC_CHECK(cricket::CodecNamesEq(codec.name, cricket::kH264CodecName));
       std::string packetization_mode_string;
       if (codec.GetParam(cricket::kH264FmtpPacketizationMode,
                          &packetization_mode_string) &&
           packetization_mode_string == "1") {
           packetization_mode_ = webrtc::H264PacketizationMode::NonInterleaved;
       }
       encoded_images_.reserve(webrtc::kMaxSimulcastStreams);
       encoded_image_buffers_.reserve(webrtc::kMaxSimulcastStreams);
       encoders_.reserve(webrtc::kMaxSimulcastStreams);
       configurations_.reserve(webrtc::kMaxSimulcastStreams);
   }

Then comes the crucial encoder initialization. Here I first run some sanity checks and then create an encoder instance for each stream:

   int32_t FFmpegH264EncoderImpl::InitEncode(const webrtc::VideoCodec *inst,
                                             int32_t number_of_cores,
                                             size_t max_payload_size) {
       ReportInit();
       if (!inst || inst->codecType != webrtc::kVideoCodecH264) {
           ReportError();
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
       }
       if (inst->maxFramerate == 0) {
           ReportError();
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
       }
       if (inst->width < 1 || inst->height < 1) {
           ReportError();
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
       }


       int32_t release_ret = Release();
       if (release_ret != WEBRTC_VIDEO_CODEC_OK) {
           ReportError();
           return release_ret;
       }


       int number_of_streams = webrtc::SimulcastUtility::NumberOfSimulcastStreams(*inst);
       bool doing_simulcast = (number_of_streams > 1);


       if (doing_simulcast && (!webrtc::SimulcastUtility::ValidSimulcastResolutions(
               *inst, number_of_streams) ||
                               !webrtc::SimulcastUtility::ValidSimulcastTemporalLayers(
                                       *inst, number_of_streams))) {
           return WEBRTC_VIDEO_CODEC_ERR_SIMULCAST_PARAMETERS_NOT_SUPPORTED;
       }
       encoded_images_.resize(static_cast<unsigned long>(number_of_streams));
       encoded_image_buffers_.resize(static_cast<unsigned long>(number_of_streams));
       encoders_.resize(static_cast<unsigned long>(number_of_streams));
       configurations_.resize(static_cast<unsigned long>(number_of_streams));
       for (int i = 0; i < number_of_streams; i++) {
           encoders_[i] = new CodecCtx();
       }
       number_of_cores_ = number_of_cores;
       max_payload_size_ = max_payload_size;
       codec_ = *inst;


       // Code expects simulcastStream resolutions to be correct, make sure they are
       // filled even when there are no simulcast layers.
       if (codec_.numberOfSimulcastStreams == 0) {
           codec_.simulcastStream[0].width = codec_.width;
           codec_.simulcastStream[0].height = codec_.height;
       }


       for (int i = 0, idx = number_of_streams - 1; i < number_of_streams;
            ++i, --idx) {
           // Temporal layers still not supported.
           if (inst->simulcastStream[i].numberOfTemporalLayers > 1) {
               Release();
               return WEBRTC_VIDEO_CODEC_ERR_SIMULCAST_PARAMETERS_NOT_SUPPORTED;
           }




           // Set internal settings from codec_settings
           configurations_[i].simulcast_idx = idx;
           configurations_[i].sending = false;
           configurations_[i].width = codec_.simulcastStream[idx].width;
           configurations_[i].height = codec_.simulcastStream[idx].height;
           configurations_[i].max_frame_rate = static_cast<float>(codec_.maxFramerate);
           configurations_[i].frame_dropping_on = codec_.H264()->frameDroppingOn;
           configurations_[i].key_frame_interval = codec_.H264()->keyFrameInterval;


           // Codec_settings uses kbits/second; encoder uses bits/second.
           configurations_[i].max_bps = codec_.maxBitrate * 1000;
           configurations_[i].target_bps = codec_.startBitrate * 1000;
           if (!OpenEncoder(encoders_[i], configurations_[i])) {
               Release();
               ReportError();
               return WEBRTC_VIDEO_CODEC_ERROR;
           }
           // Initialize encoded image. Default buffer size: size of unencoded data.
           encoded_images_[i]._size =
                   CalcBufferSize(webrtc::VideoType::kI420, codec_.simulcastStream[idx].width,
                                  codec_.simulcastStream[idx].height);
           encoded_images_[i]._buffer = new uint8_t[encoded_images_[i]._size];
           encoded_image_buffers_[i].reset(encoded_images_[i]._buffer);
           encoded_images_[i]._completeFrame = true;
           encoded_images_[i]._encodedWidth = codec_.simulcastStream[idx].width;
           encoded_images_[i]._encodedHeight = codec_.simulcastStream[idx].height;
           encoded_images_[i]._length = 0;
       }


       webrtc::SimulcastRateAllocator init_allocator(codec_);
       webrtc::BitrateAllocation allocation = init_allocator.GetAllocation(
               codec_.startBitrate * 1000, codec_.maxFramerate);
       return SetRateAllocation(allocation, codec_.maxFramerate);
   }


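   // OpenEncoder creates the encoder. One subtle point in this function: the AVFrame image buffer must be allocated with 32-byte alignment, as mentioned earlier when we captured the image data.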
   bool FFmpegH264EncoderImpl::OpenEncoder(FFmpegH264EncoderImpl::CodecCtx *ctx, H264Encoder::LayerConfig &config) {
       int ret;
       /* find the H.264 encoder: try h264_nvenc first (Linux only), then fall back to libx264 */
   #ifdef WEBRTC_LINUX
       if (hardware_accelerate) {
           ctx->codec = avcodec_find_encoder_by_name("h264_nvenc");
       }
   #endif
       if (!ctx->codec) {
           ctx->codec = avcodec_find_encoder_by_name("libx264");
       }
       if (!ctx->codec) {
           WEBRTC_LOG("Codec not found", ERROR);
           return false;
       }
       WEBRTC_LOG("Open encoder: " + std::string(ctx->codec->name) + ", and generate frame, packet", INFO);


       ctx->context = avcodec_alloc_context3(ctx->codec);
       if (!ctx->context) {
           WEBRTC_LOG("Could not allocate video codec context", ERROR);
           return false;
       }
       config.target_bps = config.max_bps;
       SetContext(ctx, config, true);
       /* open it */
       ret = avcodec_open2(ctx->context, ctx->codec, nullptr);
       if (ret < 0) {
           WEBRTC_LOG("Could not open codec, error code:" + std::to_string(ret), ERROR);
           avcodec_free_context(&(ctx->context));
           return false;
       }


       ctx->frame = av_frame_alloc();
       if (!ctx->frame) {
           WEBRTC_LOG("Could not allocate video frame", ERROR);
           return false;
       }
       ctx->frame->format = ctx->context->pix_fmt;
       ctx->frame->width = ctx->context->width;
       ctx->frame->height = ctx->context->height;
       ctx->frame->color_range = ctx->context->color_range;
       /* the image can be allocated by any means and av_image_alloc() is
        * just the most convenient way if av_malloc() is to be used */
       ret = av_image_alloc(ctx->frame->data, ctx->frame->linesize, ctx->context->width, ctx->context->height,
                            ctx->context->pix_fmt, 32);
       if (ret < 0) {
           WEBRTC_LOG("Could not allocate raw picture buffer", ERROR);
           return false;
       }
       ctx->frame->pts = 1;
       ctx->pkt = av_packet_alloc();
       return true;
   }


   // Set the parameters of the FFmpeg encoder
   void FFmpegH264EncoderImpl::SetContext(CodecCtx *ctx, H264Encoder::LayerConfig &config, bool init) {
       if (init) {
           AVRational rational = {1, 25};
           ctx->context->time_base = rational;
           ctx->context->max_b_frames = 0;
           ctx->context->pix_fmt = AV_PIX_FMT_YUV420P;
           ctx->context->codec_type = AVMEDIA_TYPE_VIDEO;
           ctx->context->codec_id = AV_CODEC_ID_H264;
           ctx->context->gop_size = config.key_frame_interval;
           ctx->context->color_range = AVCOL_RANGE_JPEG;
           // Set two options that make encoding faster
           if (std::string(ctx->codec->name) == "libx264") {
               av_opt_set(ctx->context->priv_data, "preset", "ultrafast", 0);
               av_opt_set(ctx->context->priv_data, "tune", "zerolatency", 0);
           }
           av_log_set_level(AV_LOG_ERROR);
           WEBRTC_LOG("Init bitrate: " + std::to_string(config.target_bps), INFO);
       } else {
           WEBRTC_LOG("Change bitrate: " + std::to_string(config.target_bps), INFO);
       }
       config.key_frame_request = true;
       ctx->context->width = config.width;
       ctx->context->height = config.height;


       ctx->context->bit_rate = config.target_bps * 0.7;
       ctx->context->rc_max_rate = config.target_bps * 0.85;
       ctx->context->rc_min_rate = config.target_bps * 0.1;
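       ctx->context->rc_buffer_size = config.target_bps * 2; // changing rc_buffer_size triggers libx264's rate-control update; without it the settings above do not take effect
   #ifdef WEBRTC_LINUX
       if (std::string(ctx->codec->name) == "h264_nvenc") { // in the spirit of Java reflection, reach into the private data to set the h264_nvenc bitrate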
           NvencContext* nvenc_ctx = (NvencContext*)ctx->context->priv_data;
           nvenc_ctx->encode_config.rcParams.averageBitRate = ctx->context->bit_rate;
           nvenc_ctx->encode_config.rcParams.maxBitRate = ctx->context->rc_max_rate;
           return;
       }
   #endif
   }
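
The rate-control arithmetic at the end of SetContext can be tried out on its own. The following is a minimal sketch, not the encoder's actual code: `RateParams` and `DeriveRateParams` are hypothetical names introduced here, and the struct only mirrors the four `AVCodecContext` fields that SetContext writes. It uses integer arithmetic, so a result may differ by one bit per second from the floating-point multiplication in the listing above.

```cpp
#include <cstdint>

// Hypothetical plain struct mirroring the four AVCodecContext fields that
// SetContext writes. It is not an FFmpeg type.
struct RateParams {
    int64_t bit_rate;        // average bitrate handed to the encoder
    int64_t rc_max_rate;     // upper bound for the rate controller
    int64_t rc_min_rate;     // lower bound for the rate controller
    int64_t rc_buffer_size;  // rate-control buffer size
};

// Derives the encoder settings from the target bitrate (bits per second)
// with the same factors as SetContext: 0.7, 0.85, 0.1 and 2.
// Assumption: integer math instead of the double multiplication in the listing.
RateParams DeriveRateParams(uint32_t target_bps) {
    const int64_t t = target_bps;  // widen first so the products cannot overflow
    return RateParams{t * 70 / 100, t * 85 / 100, t * 10 / 100, t * 2};
}
```

With a target of 1 Mbit/s this yields an average of 700 kbit/s, a ceiling of 850 kbit/s, a floor of 100 kbit/s and a 2 Mbit buffer, so the encoder always aims somewhat below what WebRTC allows.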

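The last few lines of SetContext deal with setting the encoder bitrate dynamically. This is probably the most hardcore part of the whole encoder setup, and it is exactly how I implemented runtime bitrate control for libx264 and h264_nvenc. With the big chunk on encoder initialization out of the way, let's relax a little and look at two simple interfaces first: one registers the encode-complete callback, the other is where WebRTC's rate-control module injects its decisions. As mentioned earlier, WebRTC sets the encoding bitrate according to network conditions.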

   int32_t FFmpegH264EncoderImpl::RegisterEncodeCompleteCallback(
           webrtc::EncodedImageCallback *callback) {
       encoded_image_callback_ = callback;
       return WEBRTC_VIDEO_CODEC_OK;
   }


   int32_t FFmpegH264EncoderImpl::SetRateAllocation(
           const webrtc::BitrateAllocation &bitrate,
           uint32_t new_framerate) {
       if (encoders_.empty())
           return WEBRTC_VIDEO_CODEC_UNINITIALIZED;


       if (new_framerate < 1)
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;


       if (bitrate.get_sum_bps() == 0) {
           // Encoder paused, turn off all encoding.
           for (auto &configuration : configurations_)
               configuration.SetStreamState(false);
           return WEBRTC_VIDEO_CODEC_OK;
       }


       // At this point, bitrate allocation should already match codec settings.
       if (codec_.maxBitrate > 0)
           RTC_DCHECK_LE(bitrate.get_sum_kbps(), codec_.maxBitrate);
       RTC_DCHECK_GE(bitrate.get_sum_kbps(), codec_.minBitrate);
       if (codec_.numberOfSimulcastStreams > 0)
           RTC_DCHECK_GE(bitrate.get_sum_kbps(), codec_.simulcastStream[0].minBitrate);


       codec_.maxFramerate = new_framerate;


       size_t stream_idx = encoders_.size() - 1;
       for (size_t i = 0; i < encoders_.size(); ++i, --stream_idx) {
           // Update layer config.
           configurations_[i].target_bps = bitrate.GetSpatialLayerSum(stream_idx);
           configurations_[i].max_frame_rate = static_cast<float>(new_framerate);


           if (configurations_[i].target_bps) {
               configurations_[i].SetStreamState(true);
               SetContext(encoders_[i], configurations_[i], false);
           } else {
               configurations_[i].SetStreamState(false);
           }
       }


       return WEBRTC_VIDEO_CODEC_OK;
   }
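
One detail of SetRateAllocation that is easy to miss is the index reversal: encoder `i` is paired with simulcast stream `n - 1 - i`, and a layer that receives zero bits is switched off rather than reconfigured. A minimal sketch of just that mapping, assuming the per-stream bitrates arrive as a plain vector (the real code reads them from `webrtc::BitrateAllocation`); `LayerState` and `ApplyAllocation` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the encoder's LayerConfig.
struct LayerState {
    size_t simulcast_idx;  // which simulcast stream this encoder serves
    uint32_t target_bps;   // bitrate allocated to that stream
    bool sending;          // false when the stream got no bits
};

// per_stream_bps[k] is the bitrate allocated to simulcast stream k.
// Encoders are stored in the opposite order, so encoder i gets stream n-1-i.
std::vector<LayerState> ApplyAllocation(const std::vector<uint32_t>& per_stream_bps) {
    const size_t n = per_stream_bps.size();
    std::vector<LayerState> layers(n);
    for (size_t i = 0; i < n; ++i) {
        const size_t stream_idx = n - 1 - i;
        layers[i].simulcast_idx = stream_idx;
        layers[i].target_bps = per_stream_bps[stream_idx];
        layers[i].sending = per_stream_bps[stream_idx] > 0;
    }
    return layers;
}
```

An all-zero allocation turns every layer off, which corresponds to the "encoder paused" branch in the listing above.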

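Break's over. Let's look at the last tough nut to crack: yes, the encoding process itself. It looks simple, but there is a big pitfall hiding in it.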

   int32_t FFmpegH264EncoderImpl::Encode(const webrtc::VideoFrame &input_frame,
                                         const webrtc::CodecSpecificInfo *codec_specific_info,
                                         const std::vector<webrtc::FrameType> *frame_types) {
       // Some routine checks first
       if (encoders_.empty()) {
           ReportError();
           return WEBRTC_VIDEO_CODEC_UNINITIALIZED;
       }
       if (!encoded_image_callback_) {
           RTC_LOG(LS_WARNING)
               << "InitEncode() has been called, but a callback function "
               << "has not been set with RegisterEncodeCompleteCallback()";
           ReportError();
           return WEBRTC_VIDEO_CODEC_UNINITIALIZED;
       }


       // Get the video frame
       webrtc::I420BufferInterface *frame_buffer = (webrtc::I420BufferInterface *) input_frame.video_frame_buffer().get();
       // Check whether the next frame must be a key frame; a key frame is usually requested after a bitrate change
       bool send_key_frame = false;
       for (auto &configuration : configurations_) {
           if (configuration.key_frame_request && configuration.sending) {
               send_key_frame = true;
               break;
           }
       }
       if (!send_key_frame && frame_types) {
           for (size_t i = 0; i < frame_types->size() && i < configurations_.size();
                ++i) {
               if ((*frame_types)[i] == webrtc::kVideoFrameKey && configurations_[i].sending) {
                   send_key_frame = true;
                   break;
               }
           }
       }


       RTC_DCHECK_EQ(configurations_[0].width, frame_buffer->width());
       RTC_DCHECK_EQ(configurations_[0].height, frame_buffer->height());


       // Encode image for each layer.
       for (size_t i = 0; i < encoders_.size(); ++i) {
           // EncodeFrame input.
           copyFrame(encoders_[i]->frame, frame_buffer);
           if (!configurations_[i].sending) {
               continue;
           }
           if (frame_types != nullptr) {
               // Skip frame?
               if ((*frame_types)[i] == webrtc::kEmptyFrame) {
                   continue;
               }
           }
           // Make the encoder emit a key frame
           if (send_key_frame || encoders_[i]->frame->pts % configurations_[i].key_frame_interval == 0) {
               // Force an intra frame and consume the pending key frame request.
               // (If every frame is a key frame we get lag/delays.)
               encoders_[i]->frame->key_frame = 1;
               encoders_[i]->frame->pict_type = AV_PICTURE_TYPE_I;
               configurations_[i].key_frame_request = false;
           } else {
               encoders_[i]->frame->key_frame = 0;
               encoders_[i]->frame->pict_type = AV_PICTURE_TYPE_P;
           }


           // Encode!
           int enc_ret;
           // Feed the picture to the encoder
           enc_ret = avcodec_send_frame(encoders_[i]->context, encoders_[i]->frame);
           if (enc_ret != 0) {
               WEBRTC_LOG("FFMPEG send frame failed, returned " + std::to_string(enc_ret), ERROR);
               ReportError();
               return WEBRTC_VIDEO_CODEC_ERROR;
           }
           encoders_[i]->frame->pts++;
           while (enc_ret >= 0) {
               // Receive an encoded packet from the encoder
               enc_ret = avcodec_receive_packet(encoders_[i]->context, encoders_[i]->pkt);
               if (enc_ret == AVERROR(EAGAIN) || enc_ret == AVERROR_EOF) {
                   break;
               } else if (enc_ret < 0) {
                   WEBRTC_LOG("FFMPEG receive frame failed, returned " + std::to_string(enc_ret), ERROR);
                   ReportError();
                   return WEBRTC_VIDEO_CODEC_ERROR;
               }


               // Convert the encoder output into the frame type WebRTC expects
               encoded_images_[i]._encodedWidth = static_cast<uint32_t>(configurations_[i].width);
               encoded_images_[i]._encodedHeight = static_cast<uint32_t>(configurations_[i].height);
               encoded_images_[i].SetTimestamp(input_frame.timestamp());
               encoded_images_[i].ntp_time_ms_ = input_frame.ntp_time_ms();
               encoded_images_[i].capture_time_ms_ = input_frame.render_time_ms();
               encoded_images_[i].rotation_ = input_frame.rotation();
               encoded_images_[i].content_type_ =
                       (codec_.mode == webrtc::VideoCodecMode::kScreensharing)
                       ? webrtc::VideoContentType::SCREENSHARE
                       : webrtc::VideoContentType::UNSPECIFIED;
               encoded_images_[i].timing_.flags = webrtc::VideoSendTiming::kInvalid;
               encoded_images_[i]._frameType = ConvertToVideoFrameType(encoders_[i]->frame);


               // Split encoded image up into fragments. This also updates
               // |encoded_image_|.
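               // This is the big pitfall mentioned above: in FFmpeg's output a NALU may be prefixed with 0001,
               // but it may also be prefixed with 001, while WebRTC only recognizes NALUs prefixed with 0001.
               // So next I process the encoder output and generate an RTP fragmentation header that describes the frame's data.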
               webrtc::RTPFragmentationHeader frag_header;
               RtpFragmentize(&encoded_images_[i], &encoded_image_buffers_[i], *frame_buffer, encoders_[i]->pkt,
                              &frag_header);
               av_packet_unref(encoders_[i]->pkt);
               // Encoder can skip frames to save bandwidth in which case
               // |encoded_images_[i]._length| == 0.
               if (encoded_images_[i]._length > 0) {
                   // Parse QP.
                   h264_bitstream_parser_.ParseBitstream(encoded_images_[i]._buffer,
                                                         encoded_images_[i]._length);
                   h264_bitstream_parser_.GetLastSliceQp(&encoded_images_[i].qp_);


                   // Deliver encoded image.
                   webrtc::CodecSpecificInfo codec_specific;
                   codec_specific.codecType = webrtc::kVideoCodecH264;
                   codec_specific.codecSpecific.H264.packetization_mode =
                           packetization_mode_;
                   codec_specific.codecSpecific.H264.simulcast_idx = static_cast<uint8_t>(configurations_[i].simulcast_idx);
                   encoded_image_callback_->OnEncodedImage(encoded_images_[i],
                                                           &codec_specific, &frag_header);
               }
           }
       }


       return WEBRTC_VIDEO_CODEC_OK;
   }
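
The key frame decision inside Encode boils down to one small predicate: a pending request wins, otherwise a key frame is emitted every `key_frame_interval` frames. A minimal sketch, with `ShouldSendKeyFrame` as a hypothetical helper name. Unlike the listing above, it guards against a non-positive interval, because taking a remainder by zero is undefined behavior in C++; treating such an interval as "no periodic key frames" is an assumption made for this sketch.

```cpp
#include <cstdint>

// Returns true when the frame with presentation index `pts` should be encoded
// as a key frame.
//   key_frame_requested: a request is pending (for example after a bitrate change)
//   key_frame_interval:  distance between periodic key frames, in frames
bool ShouldSendKeyFrame(bool key_frame_requested, int64_t pts, int key_frame_interval) {
    if (key_frame_requested) {
        return true;
    }
    if (key_frame_interval <= 0) {
        return false;  // assumption: a non-positive interval disables periodic key frames
    }
    return pts % key_frame_interval == 0;
}
```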

Below is the process of converting the NAL units and extracting the RTP header information:

   // Helper method used by FFmpegH264EncoderImpl::Encode.
   // Copies the encoded bytes from |packet| to |encoded_image| and updates the
   // fragmentation information of |frag_header|. The |encoded_image->_buffer| may
   // be deleted and reallocated if a bigger buffer is required.
   //
   // After FFmpeg encoding, the encoded bytes are stored in |packet| as a
   // sequence of "NAL units". Each NAL unit starts with either the three-byte
   // start code {0,0,1} or the four-byte start code {0,0,0,1}. Every NAL unit is
   // copied to the |encoded_image->_buffer| behind a four-byte start code, and the
   // |frag_header| is updated to point to each fragment, with offsets and lengths
   // set as to exclude the start codes.
   void FFmpegH264EncoderImpl::RtpFragmentize(webrtc::EncodedImage *encoded_image,
                                              std::unique_ptr<uint8_t[]> *encoded_image_buffer,
                                              const webrtc::VideoFrameBuffer &frame_buffer, AVPacket *packet,
                                              webrtc::RTPFragmentationHeader *frag_header) {
       std::list<int> data_start_index;
       std::list<int> data_length;
       int payload_length = 0;
       // Handle both 001 and 0001 start codes: walk through all NALUs and record where each NALU's data starts and how long it is
       for (int i = 2; i < packet->size; i++) {
           if (i > 2
               && packet->data[i - 3] == start_code[0]
               && packet->data[i - 2] == start_code[1]
               && packet->data[i - 1] == start_code[2]
               && packet->data[i] == start_code[3]) {
               if (!data_start_index.empty()) {
                   data_length.push_back((i - 3 - data_start_index.back()));
               }
               data_start_index.push_back(i + 1);
           } else if (packet->data[i - 2] == start_code[1] &&
                      packet->data[i - 1] == start_code[2] &&
                      packet->data[i] == start_code[3]) {
               if (!data_start_index.empty()) {
                   data_length.push_back((i - 2 - data_start_index.back()));
               }
               data_start_index.push_back(i + 1);
           }
       }
       if (!data_start_index.empty()) {
           data_length.push_back((packet->size - data_start_index.back()));
       }


       for (auto &it : data_length) {
           payload_length += it;
       }
       // Calculate minimum buffer size required to hold encoded data.
       auto required_size = payload_length + data_start_index.size() * 4;
       if (encoded_image->_size < required_size) {
           // Increase buffer size. Allocate enough to hold an unencoded image, this
           // should be more than enough to hold any encoded data of future frames of
           // the same size (avoiding possible future reallocation due to variations in
           // required size).
           encoded_image->_size = CalcBufferSize(
                   webrtc::VideoType::kI420, frame_buffer.width(), frame_buffer.height());
           if (encoded_image->_size < required_size) {
               // Encoded data > unencoded data. Allocate required bytes.
               WEBRTC_LOG("Encoding produced more bytes than the original image data! Original bytes: " +
                          std::to_string(encoded_image->_size) + ", encoded bytes: " + std::to_string(required_size) + ".",
                          WARNING);
               encoded_image->_size = required_size;
           }
           encoded_image->_buffer = new uint8_t[encoded_image->_size];
           encoded_image_buffer->reset(encoded_image->_buffer);
       }
       // Iterate layers and NAL units, note each NAL unit as a fragment and copy
       // the data to |encoded_image->_buffer|.
       int index = 0;
       encoded_image->_length = 0;
       frag_header->VerifyAndAllocateFragmentationHeader(data_start_index.size());
       for (auto it_start = data_start_index.begin(), it_length = data_length.begin();
            it_start != data_start_index.end(); ++it_start, ++it_length, ++index) {
           memcpy(encoded_image->_buffer + encoded_image->_length, start_code, sizeof(start_code));
           encoded_image->_length += sizeof(start_code);
           frag_header->fragmentationOffset[index] = encoded_image->_length;
           memcpy(encoded_image->_buffer + encoded_image->_length, packet->data + *it_start,
                  static_cast<size_t>(*it_length));
           encoded_image->_length += *it_length;
           frag_header->fragmentationLength[index] = static_cast<size_t>(*it_length);
       }
   }
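
The start-code normalization that RtpFragmentize performs can be illustrated without any WebRTC or FFmpeg types. The following is a minimal sketch under that simplification: `Fragment`, `NormalizedFrame` and `NormalizeStartCodes` are hypothetical names, and `std::vector` stands in for the `AVPacket` input and the `EncodedImage` output. It relies on the H.264 byte-stream property that the sequence 00 00 01 never appears inside a NALU payload, because the encoder inserts emulation prevention bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: position of one NALU payload inside the normalized buffer.
struct Fragment {
    size_t offset;  // first payload byte, just after the start code
    size_t length;  // payload length, start code excluded
};

struct NormalizedFrame {
    std::vector<uint8_t> data;        // every NALU prefixed with 00 00 00 01
    std::vector<Fragment> fragments;  // one entry per NALU
};

NormalizedFrame NormalizeStartCodes(const std::vector<uint8_t>& in) {
    struct Mark {
        size_t code_pos;     // first byte of the start code
        size_t payload_pos;  // first byte after the start code
    };
    std::vector<Mark> marks;
    // Pass 1: locate every 00 00 01. A zero byte directly in front of it
    // belongs to the start code, which makes it the four-byte form.
    for (size_t i = 0; i + 2 < in.size();) {
        if (in[i] == 0 && in[i + 1] == 0 && in[i + 2] == 1) {
            const size_t code_pos = (i > 0 && in[i - 1] == 0) ? i - 1 : i;
            marks.push_back({code_pos, i + 3});
            i += 3;
        } else {
            ++i;
        }
    }
    // Pass 2: copy each payload behind a four-byte start code and record
    // where the payload landed.
    static const uint8_t kStartCode[4] = {0, 0, 0, 1};
    NormalizedFrame out;
    for (size_t k = 0; k < marks.size(); ++k) {
        const size_t begin = marks[k].payload_pos;
        const size_t end = (k + 1 < marks.size()) ? marks[k + 1].code_pos : in.size();
        out.data.insert(out.data.end(), kStartCode, kStartCode + 4);
        out.fragments.push_back({out.data.size(), end - begin});
        out.data.insert(out.data.end(),
                        in.begin() + static_cast<std::ptrdiff_t>(begin),
                        in.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return out;
}
```

For the input `00 00 01 67 AA 00 00 00 01 68 BB CC`, the sketch produces two fragments: the first at offset 4 with length 2, the second at offset 10 with length 3, and an output buffer of 13 bytes. The `offset` and `length` pairs play the role of `fragmentationOffset` and `fragmentationLength` in the listing above.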

Finally, the very simple process of releasing the encoder:

   int32_t FFmpegH264EncoderImpl::Release() {
       while (!encoders_.empty()) {
           CodecCtx *encoder = encoders_.back();
           CloseEncoder(encoder);
           encoders_.pop_back();
       }
       configurations_.clear();
       encoded_images_.clear();
       encoded_image_buffers_.clear();
       return WEBRTC_VIDEO_CODEC_OK;
   }


   void FFmpegH264EncoderImpl::CloseEncoder(FFmpegH264EncoderImpl::CodecCtx *ctx) {
       if (ctx) {
           if (ctx->context) {
               avcodec_close(ctx->context);
               avcodec_free_context(&(ctx->context));
           }
           if (ctx->frame) {
               av_frame_free(&(ctx->frame));
           }
           if (ctx->pkt) {
               av_packet_free(&(ctx->pkt));
           }
           WEBRTC_LOG("Close encoder context and release context, frame, packet", INFO);
           delete ctx;
       }
   }
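
CloseEncoder releases each resource by hand. An alternative way to express the same ownership, shown here only as a sketch and not as what the library does, is `std::unique_ptr` with custom deleters, so that every resource is released automatically and null members are skipped. `FakeContext`, `FakeFrame`, `FakePacket`, `EncoderHandle` and `ReleaseLog` are hypothetical stand-ins so the example compiles without FFmpeg; in real code the deleters would call `avcodec_free_context`, `av_frame_free` and `av_packet_free` instead.

```cpp
#include <memory>
#include <string>
#include <vector>

// Records release calls so the order can be inspected; stands in for logging.
std::vector<std::string>& ReleaseLog() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-ins for AVCodecContext, AVFrame and AVPacket.
struct FakeContext {};
struct FakeFrame {};
struct FakePacket {};

struct ContextDeleter {
    void operator()(FakeContext* p) const { ReleaseLog().push_back("context"); delete p; }
};
struct FrameDeleter {
    void operator()(FakeFrame* p) const { ReleaseLog().push_back("frame"); delete p; }
};
struct PacketDeleter {
    void operator()(FakePacket* p) const { ReleaseLog().push_back("packet"); delete p; }
};

// Members are destroyed in reverse declaration order: pkt, frame, context.
// A deleter is never invoked for a member that is still null.
struct EncoderHandle {
    std::unique_ptr<FakeContext, ContextDeleter> context;
    std::unique_ptr<FakeFrame, FrameDeleter> frame;
    std::unique_ptr<FakePacket, PacketDeleter> pkt;
};
```

With this layout, a handle that only got as far as allocating its context and frame releases exactly those two when it goes out of scope.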

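That wraps up my experience of using WebRTC; I hope it helps. If you made it all the way here, I genuinely think that was no small feat. At one point I felt this article was too long and covered too much. But the parts are tightly interlinked, and I was afraid that splitting them up would break the train of thought. So I built the article around one regular usage flow, introduced my changes along the way, and finally described my modifications to the WebRTC Native APIs in detail as add-ons. I have also only recently started writing articles to share my experience, so my wording may sometimes fall short; please bear with me. If you spot anything I got wrong, please leave a comment and I will fix it as promptly as I can.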

References

[1]http://www.cnblogs.com/lanxuezaipiao/p/3635556.html

[2]https://www.cnblogs.com/cswuyg/p/3830703.html

[3]http://blog.guorongfei.com/2017/01/24/android-jni-tips-md/

[4]https://github.com/FFmpeg/FFmpeg

Links

[1]

Compiling WebRTC: /编译WebRTC库

[2]

Using WebRTC in JavaScript: https://webrtc.github.io/samples/

[3]

The Node.js implementation: https://github.com/node-webrtc/node-webrtc

[4]

WebRTC-Discuss: https://groups.google.com/forum/#!forum/discuss-webrtc

[5]

Tool: https://docs.oracle.com/javase/7/docs/technotes/tools/windows/javah.html

[6]

libwebrtc: https://github.com/BeiKeJieDeLiuLangMao/libwebrtc-m70

[7]

FFMPEG: https://ffmpeg.org/

[8]

libjpeg-turbo: https://libjpeg-turbo.org/

[9]

Guide: https://trac.ffmpeg.org/wiki/CompilationGuide/Centos#FFmpeg

[10]