介绍
Aliware
Java 场景
Aliware
在介绍前先了解下 ChaosBlade Java 场景的注入流程。
ChaosBlade 执行 prepare 命令,触发 sandbox 对目标 JVM 挂载 Java agent。 ChaosBlade 执行 create 命令,触发 sandbox 对目标 JVM 进行字节码增强,从而达到故障注入的目的。
Prepare(挂载)阶段优化
Aliware
现象
本地模拟一个简单的 HTTP 服务,控制其 CPU Idle 在 50% 左右,当执行 blade prepare jvm --pid 挂载 agent 后,发现 CPU 空闲率迅速下降,并且下降的幅度较大。在生产中进行故障注入有可能会直接让 Idle 掉低从而触发告警:
定位
优化
private void lazyLoadPlugin(ModelSpec modelSpec, Model model) throws ExperimentException {
PluginLifecycleListener listener = ManagerFactory.getListenerManager().getPluginLifecycleListener();
if (listener == null) {
throw new ExperimentException("can get plugin listener");
}
PluginBeans pluginBeans = ManagerFactory.getPluginManager().getPlugins(modelSpec.getTarget());
if (pluginBeans == null) {
throw new ExperimentException("can get plugin bean");
}
if (pluginBeans.isLoad()) {
return;
}
listener.add(pluginBean);
ManagerFactory.getPluginManager().setLoad(pluginBeans, modelSpec.getTarget());
}
详细代码 PR: https://github.com/ChaosBlade-io/ChaosBlade-exec-jvm/pull/233
改进后效果
Create(注入)阶段优化
Aliware
Dubbo 故障优化
问题描述
优化
./blade create dubbo throwCustomException --provider --exception Java.lang.Exception --service org.apache.dubbo.UserProvider --methodname GetUser
private void lazyLoadPlugin(ModelSpec modelSpec, Model model) throws ExperimentException {
// ...... 省略
for (PluginBean pluginBean : pluginBeans.getPluginBeans()) {
String flag = model.getMatcher().get(pluginBean.getName());
if ("true".equalsIgnoreCase(flag)) {
listener.add(pluginBean);
break;
}
listener.add(pluginBean);
}
// ...... 省略
}
}
相关 PR: https://github.com/ChaosBlade-io/ChaosBlade-exec-jvm/pull/267
自定义脚本故障优化
问题描述
./blade c jvm script --classname com.example.xxx.HelloController --methodname Hello --script-content .....
问题排查
Stack Trace is:
Java.lang.Thread.State: RUNNABLE
at Java.util.zip.ZipFile.getEntryTime(Native Method)
at Java.util.zip.ZipFile.getZipEntry(ZipFile.Java:586)
at Java.util.zip.ZipFile.access$900(ZipFile.Java:60)
at Java.util.zip.ZipFile$ZipEntryIterator.next(ZipFile.Java:539)
- locked <0x00000006c0a57670> (a sun.net.www.protocol.jar.URLJarFile)
at Java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.Java:514)
at Java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.Java:495)
at Java.util.jar.JarFile$JarEntryIterator.next(JarFile.Java:258)
at Java.util.jar.JarFile$JarEntryIterator.nextElement(JarFile.Java:267)
at Java.util.jar.JarFile$JarEntryIterator.nextElement(JarFile.Java:248)
at com.alibaba.ChaosBlade.exec.plugin.jvm.script.Java.JavaCodeScriptEngine$InMemoryJavaFileManager.processJar(JavaCodeScriptEngine.Java:421)
at com.alibaba.ChaosBlade.exec.plugin.jvm.script.Java.JavaCodeScriptEngine$InMemoryJavaFileManager.listUnder(JavaCodeScriptEngine.Java:401)
at com.alibaba.ChaosBlade.exec.plugin.jvm.script.Java.JavaCodeScriptEngine$InMemoryJavaFileManager.find(JavaCodeScriptEngine.Java:390)
at com.alibaba.ChaosBlade.exec.plugin.jvm.script.Java.JavaCodeScriptEngine$InMemoryJavaFileManager.list(JavaCodeScriptEngine.Java:375)
at com.sun.tools.Javac.api.ClientCodeWrapper$WrappedJavaFileManager.list(ClientCodeWrapper.Java:231)
at com.sun.tools.Javac.jvm.ClassReader.fillIn(ClassReader.Java:2796)
at com.sun.tools.Javac.jvm.ClassReader.complete(ClassReader.Java:2446)
at com.sun.tools.Javac.jvm.ClassReader.access$000(ClassReader.Java:76)
at com.sun.tools.Javac.jvm.ClassReader$1.complete(ClassReader.Java:240)
at com.sun.tools.Javac.code.Symbol.complete(Symbol.Java:574)
at com.sun.tools.Javac.comp.MemberEnter.visitTopLevel(MemberEnter.Java:507)
at com.sun.tools.Javac.tree.JCTree$JCCompilationUnit.accept(JCTree.Java:518)
at com.sun.tools.Javac.comp.MemberEnter.memberEnter(MemberEnter.Java:437)
at com.sun.tools.Javac.comp.MemberEnter.complete(MemberEnter.Java:1038)
at com.sun.tools.Javac.code.Symbol.complete(Symbol.Java:574)
at com.sun.tools.Javac.code.Symbol$ClassSymbol.complete(Symbol.Java:1037)
at com.sun.tools.Javac.comp.Enter.complete(Enter.Java:493)
at com.sun.tools.Javac.comp.Enter.main(Enter.Java:471)
at com.sun.tools.Javac.main.JavaCompiler.enterTrees(JavaCompiler.Java:982)
at com.sun.tools.Javac.main.JavaCompiler.compile(JavaCompiler.Java:857)
at com.sun.tools.Javac.main.Main.compile(Main.Java:523)
at com.sun.tools.Javac.api.JavacTaskImpl.doCall(JavacTaskImpl.Java:129)
at com.sun.tools.Javac.api.JavacTaskImpl.call(JavacTaskImpl.Java:138)
at com.alibaba.ChaosBlade.exec.plugin.jvm.script.Java.JavaCodeScriptEngine.compileClass(JavaCodeScriptEngine.Java:149)
at com.alibaba.ChaosBlade.exec.plugin.jvm.script.Java.JavaCodeScriptEngine.compile(JavaCodeScriptEngine.Java:113)
at com.alibaba.ChaosBlade.exec.plugin.jvm.script.base.AbstractScriptEngineService.doCompile(AbstractScriptEngineService.Java:82)
at com.alibaba.ChaosBlade.exec.plugin.jvm.script.base.AbstractScriptEngineService.compile(AbstractScriptEngineService.Java:69)
at com.alibaba.ChaosBlade.exec.plugin.jvm.script.model.DynamicScriptExecutor.run(DynamicScriptExecutor.Java:74)
at com.alibaba.ChaosBlade.exec.common.injection.Injector.inject(Injector.Java:73)
at com.alibaba.ChaosBlade.exec.common.aop.AfterEnhancer.afterAdvice(AfterEnhancer.Java:46)
at com.alibaba.ChaosBlade.exec.common.plugin.MethodEnhancer.afterAdvice(MethodEnhancer.Java:47)
at com.alibaba.ChaosBlade.exec.bootstrap.jvmsandbox.AfterEventListener.onEvent(AfterEventListener.Java:93)
at com.alibaba.jvm.sandbox.core.enhance.weaver.EventListenerHandler.handleEvent(EventListenerHandler.Java:116)
at com.alibaba.jvm.sandbox.core.enhance.weaver.EventListenerHandler.handleOnEnd(EventListenerHandler.Java:426)
at com.alibaba.jvm.sandbox.core.enhance.weaver.EventListenerHandler.handleOnReturn(EventListenerHandler.Java:363)
优化
public interface PreActionExecutor {
/**
* Pre run executor
*
* @param enhancerModel
* @throws Exception
*/
void preRun(EnhancerModel enhancerModel) throws ExperimentException;
}
private void applyPreActionExecutorHandler(ModelSpec modelSpec, Model model)
throws ExperimentException {
ActionExecutor actionExecutor = modelSpec.getActionSpec(model.getActionName()).getActionExecutor();
if (actionExecutor instanceof PreActionExecutor) {
EnhancerModel enhancerModel = new EnhancerModel(EnhancerModel.class.getClassLoader(), model.getMatcher());
enhancerModel.merge(model);
((PreActionExecutor) actionExecutor).preRun(enhancerModel);
}
}
相关 PR: https://github.com/ChaosBlade-io/ChaosBlade-exec-jvm/pull/269
日志打印优化
问题描述
- locked <0x00000006f08422d0> (a org.apache.log4j.DailyRollingFileAppender)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.Java:66)
at org.apache.log4j.Category.callAppenders(Category.Java:206)
- locked <0x00000006f086daf8> (a org.apache.log4j.Logger)
at org.apache.log4j.Category.forcedLog(Category.Java:391)
at org.apache.log4j.Category.log(Category.Java:856)
at org.slf4j.impl.Log4jLoggerAdapter.log(Log4jLoggerAdapter.Java:601)
LOGGER.info("Match rule: {}", JsonUtil.writer().writeValueAsString(model));
Java.lang.Thread.State: RUNNABLE
at Java.lang.String.charAt(String.Java:657)
at Java.io.UnixFileSystem.normalize(UnixFileSystem.Java:87)
at Java.io.File.<init>(File.Java:279)
at sun.net.www.protocol.file.Handler.openConnection(Handler.Java:80)
- locked <0x00000000c01f2740> (a sun.net.www.protocol.file.Handler)
at sun.net.www.protocol.file.Handler.openConnection(Handler.Java:72)
- locked <0x00000000c01f2740> (a sun.net.www.protocol.file.Handler)
at Java.net.URL.openConnection(URL.Java:979)
at sun.net.www.protocol.jar.JarFileFactory.getConnection(JarFileFactory.Java:65)
at sun.net.www.protocol.jar.JarFileFactory.getPermission(JarFileFactory.Java:154)
at sun.net.www.protocol.jar.JarFileFactory.getCachedJarFile(JarFileFactory.Java:126)
at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.Java:81)
- locked <0x00000000c00171f0> (a sun.net.www.protocol.jar.JarFileFactory)
at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.Java:122)
at sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.Java:152)
at Java.net.URL.openStream(URL.Java:1045)
at Java.lang.ClassLoader.getResourceAsStream(ClassLoader.Java:1309)
......
at Java.lang.reflect.Method.invoke(Method.Java:498)
at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.Java:689)
at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.Java:755)
at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.Java:178)
at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.Java:728)
at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.Java:755)
at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.Java:178)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.Java:480)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.Java:319)
at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.Java:1516)
at com.fasterxml.jackson.databind.ObjectWriter._writeValueAndClose(ObjectWriter.Java:1217)
at com.fasterxml.jackson.databind.ObjectWriter.writeValueAsString(ObjectWriter.Java:1086)
at com.alibaba.ChaosBlade.exec.common.injection.Injector.inject(Injector.Java:69)
at com.alibaba.ChaosBlade.exec.common.aop.AfterEnhancer.afterAdvice(AfterEnhancer.Java:46)
at com.alibaba.ChaosBlade.exec.common.plugin.MethodEnhancer.afterAdvice(MethodEnhancer.Java:47)
at com.alibaba.ChaosBlade.exec.bootstrap.jvmsandbox.AfterEventListener.onEvent(AfterEventListener.Java:93)
at com.alibaba.jvm.sandbox.core.enhance.weaver.EventListenerHandler.handleEvent(EventListenerHandler.Java:116)
at com.alibaba.jvm.sandbox.core.enhance.weaver.EventListenerHandler.handleOnEnd(EventListenerHandler.Java:426)
at com.alibaba.jvm.sandbox.core.enhance.weaver.EventListenerHandler.handleOnReturn(EventListenerHandler.Java:363)
at Java.com.alibaba.jvm.sandbox.spy.Spy.spyMethodOnReturn(Spy.Java:192)
优化
LOGGER.info("Match rule: {}", model);
@Override
public String toString() {
return "Model{" +
"target='" + target + '\'' +
", matchers=" + matcher.getMatchers().toString() +
", action=" + action.getName() +
'}';
}
相关 PR: https://github.com/ChaosBlade-io/ChaosBlade-exec-jvm/pull/260
Metaspace OOM 优化
Aliware
Metaspace is a native (as in: off-heap) memory manager in the hotspot.
It is used to manage memory for class metadata. Class metadata are allocated when classes are loaded.
Their lifetime is usually scoped to that of the loading classloader - when a loader gets collected, all class metadata it accumulated are released in bulk.
现象
日志表现
定位
在文章开始介绍了ChaosBlade注入Java故障的流程,知道在故障注入时会将 jvm-sandbox 动态的挂载(attach)到目标进程 JVM 上,在 attach 后会加载 sandbox 内部 jar 以及 sandbox 的自定义模块 jar 等,在这个过程中会加载大量的类,当加载类时会分配 Metaspace 空间存储类的元数据。
本地复现
How to Close a URLClassLoader?
The URLClassLoader close() method effectively eliminates the problem of how to support updated implementations of the classes and resources loaded from a particular codebase, and in particular from JAR files. In principle, once the application clears all references to a loader object, the garbage collector and finalization mechanisms will eventually ensure that all resources (such as the JarFile objects) are released and closed.
猜想
验证猜想
内部类的引用: EventProcesser$Process SandboxProtector
优化
由于jvm-sandbox项目已经不在活跃了,我们将jvm-sandbox项目fork到了ChaosBlade中。
优化后的相关pr: https://github.com/ChaosBlade-io/jvm-sandbox/pull/1
改进后效果
再次优化
虽然我们解决了 JVM-Sandbox 的 ThreadLocal 泄漏问题,但是由于 Metaspace 的内存分配以及回收机制还是有可能导致 OOM!!!
关于 Metaspace 的内存分配以及回收的相关内容可以参考文章: https://www.javadoop.com/post/metaspace
上面的优化基础上还需要在每一次故障注入前触发一次 full gc,目的是让上一次 jvm-sandbox 占用的元空间强制释放掉。
public static void agentmain(String featureString, Instrumentation inst) {
System.gc();
LAUNCH_MODE = LAUNCH_MODE_ATTACH;
final Map<String, String> featureMap = toFeatureMap(featureString);
writeAttachResult(
getNamespace(featureMap),
getToken(featureMap),
install(featureMap, inst)
);
}
相关 PR: https://github.com/ChaosBlade-io/jvm-sandbox/pull/6
那么如何彻底解决 Metaspace OOM 问题呢?先说结论:不能彻底解决,因为在使用反射的情况下会自动生成一些,所以在业务代码中很难去关闭,那就导致 DelegatingClassLoader 会一直存活,从而引发 Metaspace 碎片化的问题,最终导致 Metaspace 空间无法被正确的回收(这部分内容比较复杂,一言两语很难描述清楚)
自动生成:
sun.reflect.DelegatingClassLoader
思考
关于 Metaspace OOM 的问题,其实优化是一方面,换个角度想也许是我们使用的方式不正确。在我们的业务场景下是会频繁的对一个服务进行故障注入&卸载,每次的注入点不同。
JIT(及时编译)导致 CPU 抖动
Cloud Native
问题描述
在 Java 中编译器主要分为三类:
关于即时编译的内容可以参考文章: https://xie.infoq.cn/article/dacbe19251f8ec828efacdfde
总结
Cloud Native
ChaosBlade 支持丰富的故障注入场景,尤其是在Java 生态中支持大量的插件。对于Java 场景的故障注入优势比较明显。
通过对上面介绍的问题进行优化,使用ChaosBlade进行Java场景的故障注入不会再导致CPU Idle跌底,即使在线上运行的服务进行故障注入也会将CPU的抖动控制在一个较小的波动范围。
作者简介:
张斌斌(Github账号:binbin0325)ChaosBlade Committer , Nacos PMC ,Apache Dubbo-Go Committer, Sentinel-Golang Committer 。目前主要关注于混沌工程、中间件以及云原生方向。