
libvirtd service becomes unresponsive under CloudStack

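On CloudStack 4.5.2, the libvirtd service occasionally becomes unresponsive. The virsh command then stops working, and the CloudStack master loses its connection to the affected slave host. At first the libvirtd service or its version was suspected; after analysis and troubleshooting, the problem was traced to cloudstack-agent. No similar bug report was found on the official site, so the problem may still exist in later versions, and finding the root cause will take more time. The handling process is recorded below for anyone who follows or uses CloudStack.

It is well known that the CloudStack community is far less active than OpenStack's, so why choose CloudStack at all? That is a topic for another time. Back to the matter at hand.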

Environment

Host operating system: CentOS 6.5 x64 (2.6.32-431.el6.x86_64)
CloudStack version: 4.5.2
libvirt version: libvirt-0.10.2-54.el6_7.2.x86_64

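Problem description

An alert raised through the CloudStack API listHosts call showed:
node5.cloud.rtmap:192.168.14.20 state is Down at 2016-05-13T07:19:04+0800
# Using the CloudStack API is covered in other articles and is not described here.

Logging in to the affected host to check:

[root@node5 log]# virsh list --all

No response; Ctrl+C was needed to exit.

At this point the VMs keep working normally, but they can no longer be managed.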
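A hung virsh call blocks the terminal until it is interrupted, so a monitoring check is easier to script with a time limit. The following is a minimal bash sketch, not part of the original troubleshooting: the function name probe_cmd is made up for illustration, and it relies only on the coreutils timeout command, which exits with status 124 when the limit expires.

```shell
# probe_cmd SECONDS CMD [ARGS...]   (hypothetical helper name)
# Runs CMD under a time limit and prints one word describing the outcome:
#   ok    - CMD finished in time with exit status 0
#   hung  - CMD was still running when the limit expired (timeout exits 124)
#   error - CMD finished in time but returned a non-zero status
probe_cmd() {
    local limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    case $? in
        0)   echo ok ;;
        124) echo hung ;;
        *)   echo error ;;
    esac
}
```

On the affected host, something like `probe_cmd 10 virsh list --all` would be expected to print hung instead of blocking indefinitely.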

Trying to restart the libvirtd service:

[root@node5 log]# service libvirtd stop

Stopping libvirtd daemon:                               [FAILED]  # the libvirtd service cannot be stopped

Trying to restart the cloudstack-agent service:

[root@node5 libvirt]# service cloudstack-agent restart
Stopping Cloud Agent: 
Starting Cloud Agent: 

The libvirtd fault persisted.

Quick workaround

[root@node5 ping]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf 

libvirtd: error: Unable to initialize network sockets. Check /var/log/messages or run without --daemon for more info.

[root@node5 log]# libvirtd -d

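This succeeds, and virsh list --all can now list and operate on the VMs.

[root@node5 log]# virsh list --all
Id  Name             State
----------------------------------------------------
 2   i-4-185-VM           running

Although the VMs are running normally and can now be managed from the command line, from the CloudStack platform's point of view the host is still Down and the VMs are unmanaged.

The temporary fix is to reboot the server during some other major upgrade or maintenance window; a real fix requires analysing this specific problem.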

Analysis and troubleshooting

Checking the process

[root@node5 log]# ps ax |grep libvirtd
 6485 ?    R  863:37 libvirtd --daemon -l  # the process stays in the running (R) state the whole time

[root@node5 log]# top -p 6485
top -p 6485
top - 09:19:41 up 12 days, 22:27, 1 user, load average: 3.05, 5.07, 6.64
Tasks:  1 total,  0 running,  1 sleeping,  0 stopped,  0 zombie
Cpu(s): 4.8%us, 1.4%sy, 0.0%ni, 93.1%id, 0.6%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 264420148k total, 182040780k used, 82379368k free,  834232k buffers
Swap: 8388600k total,    92k used, 8388508k free, 100453708k cached

  PID USER   PR NI VIRT RES SHR S %CPU %MEM  TIME+ COMMAND
 6485 root   20  0 984m 12m 4440 R 100.2 0.0 844:22.68 libvirtd       # CPU usage is stuck at 100% and never drops, which affects system stability
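A single ps or top snapshot only shows the state at one instant. To back up the claim that the process is always in the R state, the state can be sampled several times. The sketch below reads the State: line from /proc/PID/status on Linux; the helper names proc_state and count_running are invented for this example.

```shell
# proc_state PID   (hypothetical helper)
# Prints the one-letter process state (R, S, D, Z, ...) taken from the
# "State:" line of /proc/PID/status; prints nothing if the PID does not exist.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# count_running PID SAMPLES   (hypothetical helper)
# Samples the state once per second and prints how many samples were "R".
count_running() {
    local pid=$1 samples=$2 hits=0 i=0
    while [ "$i" -lt "$samples" ]; do
        if [ "$(proc_state "$pid")" = "R" ]; then
            hits=$((hits + 1))
        fi
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep 1
        fi
    done
    echo "$hits"
}
```

For example, `count_running 6485 10` printing 10 would mean the process was on the run queue in every sample, which fits a busy loop rather than a blocked process.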

Killing the process

[root@node5 log]# kill -9 6485
[root@node5 log]# kill -9 6485
[root@master log]# ps ax |grep libvirtd  # the process is still there
 6485 ?    R  863:37 libvirtd --daemon -l
[root@node5 ~]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf
libvirtd: error: Unable to initialize network sockets. Check /var/log/messages or run without --daemon for more info.
[root@node5 ~]# netstat -antp |grep 16509
tcp    0   0 0.0.0.0:16509        0.0.0.0:*          LISTEN   3658/libvirtd    
tcp    1   0 192.168.14.25:16509     192.168.14.22:8717     CLOSE_WAIT -          
tcp    1   0 192.168.14.25:16509     192.168.14.20:5152     CLOSE_WAIT -          
tcp    1   0 192.168.14.25:16509     192.168.14.10:39359     CLOSE_WAIT -          
tcp    0   0 :::16509          :::*            LISTEN   3658/libvirtd    
tcp    39   0 ::1:16509          ::1:19715          CLOSE_WAIT - 

After the steps above, the preliminary conclusion was that libvirtd had hung.
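In the netstat output, the CLOSE_WAIT entries are connections that the peer has already closed but that the local process has not yet closed itself, which fits a daemon that no longer services its sockets. A small filter can count them for a given port. This is an illustrative sketch; count_close_wait is a made-up name, and it assumes the column layout of netstat -ant (local address in column 4, state in column 6).

```shell
# count_close_wait PORT   (hypothetical helper)
# Reads `netstat -ant`-style lines on stdin and prints how many connections
# whose local address ends in :PORT are in the CLOSE_WAIT state.
count_close_wait() {
    awk -v port="$1" '
        $6 == "CLOSE_WAIT" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}
```

Typical use would be `netstat -ant | count_close_wait 16509`; a count that keeps growing suggests the listener is not accepting or closing connections.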

Tracing the process

[root@node5 log]# strace -f libvirtd
[pid 107570] close(23058)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23059)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23060)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23061)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23062)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23063)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23064)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23065)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23066)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23067)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23068)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23069)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23070)        = -1 EBADF (Bad file descriptor)
[pid 107570] close(23071)        = -1 EBADF (Bad file descriptor)
^C[pid 107570] close(23072 
Process 107559 detached
Process 107560 detached
Process 107561 detached
Process 107562 detached
Process 107563 detached
Process 107564 detached
Process 107565 detached
Process 107566 detached
Process 107567 detached
Process 107568 detached
Process 107569 detached
Process 107570 detached

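The trace shows close() being called on one descriptor number after another, with every call failing with EBADF (Bad file descriptor). The working assumption was that the parent process 6485 keeps spawning and closing child processes that return this error. Note that strace -f libvirtd, as typed, starts a new libvirtd under strace; attaching to the hung process itself would be strace -f -p 6485. Open questions: what causes the bad file descriptors (how is it triggered, and by whom)? Why does the loop never exit? How can the problem be reproduced?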
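Scrolling through thousands of identical close() lines is not very informative, so it can help to condense them into a count and a descriptor range. The sketch below does that for strace output on stdin; summarize_ebadf is a hypothetical helper name, and the parsing assumes the line format shown in the trace above.

```shell
# summarize_ebadf   (hypothetical helper)
# Reads strace output on stdin and prints "<count> <lowest fd> <highest fd>"
# for close() calls that failed with EBADF, or "0" if there were none.
summarize_ebadf() {
    sed -n 's/.*close(\([0-9][0-9]*\)).*EBADF.*/\1/p' |
    awk '{ fd = $1 + 0
           if (NR == 1 || fd < min) min = fd
           if (NR == 1 || fd > max) max = fd }
         END { if (NR > 0) print NR, min, max; else print 0 }'
}
```

Since strace writes to stderr, a usage example would be `strace -f -p 6485 2>&1 | head -n 1000 | summarize_ebadf`. A steadily climbing range would suggest a loop walking the whole descriptor table.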

Gathering more clues

Official documentation (its records of libvirtd failure diagnosis and fixes are very thorough)
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-libvirtd_failed_to_start

Enabling system logging

Change libvirt's logging in /etc/libvirt/libvirtd.conf by enabling the line below. To enable the setting, open the /etc/libvirt/libvirtd.conf file in a text editor, remove the hash (#) symbol from the beginning of the following line, and save the change:
log_outputs="3:syslog:libvirtd"

With this setting applied, the server was rebooted and the logs were watched for the next occurrence.
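The manual edit described above can also be scripted so it is applied the same way on every host. This is a sketch under assumptions: enable_syslog_output is an invented function name, it expects the commented-out default line to look exactly like the one quoted above, and it uses GNU sed -i. It leaves any already active log_outputs setting untouched.

```shell
# enable_syslog_output FILE   (hypothetical helper)
# Ensures FILE contains an active log_outputs="3:syslog:libvirtd" line:
#   - does nothing if some log_outputs setting is already active,
#   - uncomments the stock commented-out line if it is present,
#   - otherwise appends the line at the end of the file.
enable_syslog_output() {
    local conf=$1
    local line='log_outputs="3:syslog:libvirtd"'
    if grep -q '^[[:space:]]*log_outputs[[:space:]]*=' "$conf"; then
        return 0
    fi
    if grep -q '^#[[:space:]]*log_outputs="3:syslog:libvirtd"' "$conf"; then
        sed -i 's/^#[[:space:]]*\(log_outputs="3:syslog:libvirtd"\)/\1/' "$conf"
    else
        printf '%s\n' "$line" >> "$conf"
    fi
}
```

It would be called as `enable_syslog_output /etc/libvirt/libvirtd.conf`, ideally after taking a backup copy of the file; libvirtd only reads the setting when it starts.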

......

Jun 1 12:42:26 node5 abrtd: New client connected
Jun 1 12:42:26 node5 abrtd: Directory 'pyhook-2016-06-01-12:42:26-70065' creation detected
Jun 1 12:42:26 node5 abrt-server[70066]: Saved Python crash dump of pid 70065 to /var/spool/abrt/pyhook-2016-06-01-12:42:26-70065
Jun 1 12:42:26 node5 abrtd: Package 'cloudstack-common' isn't signed with proper key
Jun 1 12:42:26 node5 abrtd: 'post-create' on '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065' exited with 1
Jun 1 12:42:26 node5 abrtd: Deleting problem directory '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065'
Jun 1 12:43:26 node5 abrt: detected unhandled Python exception in '/usr/share/cloudstack-common/scripts/vm/network/security_group.py'
......

Jun 6 10:36:21 node5 libvirtd: 102840: warning : qemuDomainObjBeginJobInternal:878 : Cannot start job (modify, none) for domain i-4-30-VM; current job is (modify, none) owned by (102925, 0)
Jun 6 10:36:21 node5 libvirtd: 102840: error : qemuDomainObjBeginJobInternal:883 : Timed out during operation: cannot acquire state change lock
Jun 6 10:39:59 node5 libvirtd: 114071: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem , 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:39:59 node5 libvirtd: 114071: error : virNetSocketNewListenTCP:312 : Unable to bind to port: Address already in use
Jun 6 10:40:46 node5 libvirtd: 114147: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem , 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:40:46 node5 libvirtd: 114147: error : virNetSocketNewListenTCP:312 : Unable to bind to port: Address already in use
Jun 6 10:42:15 node5 libvirtd: 114204: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem , 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:42:15 node5 libvirtd: 114204: error : virNetSocketNewListenTCP:312 : Unable to bind to port: Address already in use
Jun 6 10:47:05 node5 libvirtd: 114375: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem , 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:47:05 node5 libvirtd: 114375: error : virNetSocketNewListenTCP:312 : Unable to bind to port: Address already in use
Jun 6 10:47:23 node5 libvirtd: 114412: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem , 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:47:23 node5 libvirtd: 114412: error : virNetSocketNewListenTCP:312 : Unable to bind to port: Address already in use
......

Jun 12 03:08:02 node5 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="3111" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Jun 12 09:20:40 node5 libvirtd: 72575: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem , 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 12 09:20:40 node5 libvirtd: 72575: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable

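No fatal error or further clue turned up. The "Unable to bind to port" and pid-file errors are probably just the manual attempts to start a second libvirtd while the hung one still held the listening port and the pid file. (This logging option is still well worth enabling; many problems can be pinpointed with it.)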
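With logging enabled, /var/log/messages fills up quickly, and a per-function tally of libvirtd error lines makes repeated failures stand out. The following sketch assumes the syslog line layout seen in the excerpt above ("libvirtd: PID: error : function:line : message"); summarize_libvirtd_errors is a hypothetical name.

```shell
# summarize_libvirtd_errors   (hypothetical helper)
# Reads syslog lines on stdin and prints "<count> <function>" for every
# libvirtd function that logged an error, most frequent first.
summarize_libvirtd_errors() {
    awk '/ libvirtd: [0-9]+: error : / {
             for (i = 1; i + 2 <= NF; i++) {
                 if ($i == "error" && $(i + 1) == ":") {
                     split($(i + 2), part, ":")   # strip the ":line" suffix
                     count[part[1]]++
                     break
                 }
             }
         }
         END { for (fn in count) print count[fn], fn }' | sort -rn
}
```

Usage would be along the lines of `summarize_libvirtd_errors < /var/log/messages`.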

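Resolution process

Approach

Look for a way to terminate the process and restart the service
File a bug and wait for a patched release
Analyse the source code, reproduce the problem, and fix it (costs development effort and time)
Since the problem could not be reproduced, it made sense to start with the simplest option. Who is the culprit that triggers these child processes? cloudstack-agent is still the prime suspect, yet restarting that service earlier did not fix anything, so what is actually going on with the agent service?

The init script gives a basic picture: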

[root@node5 libvirt]# cat /etc/rc.d/init.d/cloudstack-agent

#!/bin/bash

# chkconfig: 35 99 10
# description: Cloud Agent

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# WARNING: if this script is changed, then all other initscripts MUST BE changed to match it as well

. /etc/rc.d/init.d/functions

# set environment variables

SHORTNAME=$(basename $0 | sed -e 's/^[SK][0-9][0-9]//')
PIDFILE=/var/run/"$SHORTNAME".pid
LOCKFILE=/var/lock/subsys/"$SHORTNAME"
LOGDIR=/var/log/cloudstack/agent
LOGFILE=${LOGDIR}/agent.log
PROGNAME="Cloud Agent"
CLASS="com.cloud.agent.AgentShell"
JSVC=`which jsvc 2>/dev/null`;

# exit if we don't find jsvc
if [ -z "$JSVC" ]; then
  echo no jsvc found in path;
  exit 1;
fi

unset OPTIONS
[ -r /etc/sysconfig/"$SHORTNAME" ] && source /etc/sysconfig/"$SHORTNAME"

# The first existing directory is used for JAVA_HOME (if JAVA_HOME is not defined in $DEFAULT)
JDK_DIRS="/usr/lib/jvm/jre /usr/lib/jvm/java-7-openjdk /usr/lib/jvm/java-7-openjdk-i386 /usr/lib/jvm/java-7-openjdk-amd64 /usr/lib/jvm/java-6-openjdk /usr/lib/jvm/java-6-openjdk-i386 /usr/lib/jvm/java-6-openjdk-amd64 /usr/lib/jvm/java-6-sun"

for jdir in $JDK_DIRS; do
  if [ -r "$jdir/bin/java" -a -z "${JAVA_HOME}" ]; then
    JAVA_HOME="$jdir"
  fi
done
export JAVA_HOME

ACP=`ls /usr/share/cloudstack-agent/lib/*.jar | tr '\n' ':' | sed s'/.$//'`
PCP=`ls /usr/share/cloudstack-agent/plugins/*.jar 2>/dev/null | tr '\n' ':' | sed s'/.$//'`

# We need to append the JSVC daemon JAR to the classpath
# AgentShell implements the JSVC daemon methods
export CLASSPATH="/usr/share/java/commons-daemon.jar:$ACP:$PCP:/etc/cloudstack/agent:/usr/share/cloudstack-common/scripts"

start() {
  echo -n $"Starting $PROGNAME: "
  if hostname --fqdn >/dev/null 2>&1 ; then
    $JSVC -Xms256m -Xmx2048m -cp "$CLASSPATH" -pidfile "$PIDFILE" \
      -errfile $LOGDIR/cloudstack-agent.err -outfile $LOGDIR/cloudstack-agent.out $CLASS
    RETVAL=$?
    echo
  else
    failure
    echo
    echo The host name does not resolve properly to an IP address. Cannot start "$PROGNAME". > /dev/stderr
    RETVAL=9
  fi
  [ $RETVAL = 0 ] && touch ${LOCKFILE}
  return $RETVAL
}

stop() {
  echo -n $"Stopping $PROGNAME: "
  $JSVC -pidfile "$PIDFILE" -stop $CLASS
  RETVAL=$?
  echo
  [ $RETVAL = 0 ] && rm -f ${LOCKFILE} ${PIDFILE}
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  status)
    status -p ${PIDFILE} $SHORTNAME
    RETVAL=$?
    ;;
  restart)
    stop
    sleep 3
    start
    ;;
  condrestart)
    if status -p ${PIDFILE} $SHORTNAME >&/dev/null; then
      stop
      sleep 3
      start
    fi
    ;;
  *)
  echo $"Usage: $SHORTNAME {start|stop|restart|condrestart|status|help}"
  RETVAL=3
esac

exit $RETVAL

[root@node5 libvirt]# ps ax |grep jsvc.exec
 6655 ?    Ss   0:00 jsvc.exec -Xms256m -Xmx2048m -cp /usr/share/java/commons-daemon.jar:/usr/share/cloudstack-agent/lib/activation-1.1.jar:/usr/share/cloudstack-agent/lib/antisamy-1.4.3.jar:/usr/share/cloudstack-agent/lib/aopalliance-1.0.jar:/usr/share/cloudstack-agent/lib/apache-log4j-extras-1.1.jar:/usr/share/cloudstack-agent/lib/aspectjweaver-1.7.0.jar:/usr/share/cloudstack-agent/lib/aws-java-sdk-1.3.22.jar:/usr/share/cloudstack-agent/lib/batik-css-1.7.jar:/usr/share/cloudstack-agent/lib/batik-ext-1.7.jar:/usr/share/cloudstack-agent/lib/batik-util-1.7.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk15-1.46.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk16-1.46.jar:/usr/share/cloudstack-agent/lib/bsh-core-2.0b4.jar:/usr/share/cloudstack-agent/lib/cglib-nodep-2.2.2.jar:/usr/share/cloudstack-agent/lib/cloud-agent-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-core-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-components-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-schema-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-cluster-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-config-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-db-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-events-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-ipc-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-jobs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-managed-context-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-rest-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-security-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-hypervisor-kvm-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-network-ovs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-server-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-utils-4.5.2.jar:/usr/share/cloudstack-agent/lib/commons
-beanutils-core-1.7.0.jar:/usr/share/cloudstack-agent/lib/commons-codec-1.6.jar:/usr/share/cloudstack-agent/lib/commons-collections-3.2.1.jar:/usr/share/cloudstack-agent/lib/commons-configuration-1.8.jar:/usr/share/cloudstack-agent/lib/commons-daemon-1.0.10.jar:/usr/share/cloudstack-agent/lib/commons-dbcp-1.4.jar:/usr/share/cloudstack-agent/lib/commons-fileupload-1.2.jar:/usr/share/cloudstack-agent/lib/commons-httpclient-3.1.jar:/usr/share/cloudstack-agent/lib/commons-io-1.4.jar:/usr/share/cloudstack-agent/lib/commons-lang-2.6.jar:/usr/share/cloudstack-agent/lib/commons-logging-1.1.3.jar:/usr/share/cloudstack-agent/lib/commons-net-3.3.jar:/usr/share/cloudstack-agent/lib/commons-pool-1.6.jar:/usr/share/cloudstack-agent/lib/cxf-bundle-jaxrs-2.7.0.jar:/usr/share/cloudstack-agent/lib/dom4j-1.6.1.jar:/usr/share/cloudstack-agent/lib/ehcache-core-2.6.6.jar:/usr/share/cloudstack-agent/lib/ejb-api-3.0.jar:/usr/share/cloudstack-agent/lib/esapi-2.0.1.jar:/usr/share/cloudstack-agent/lib/geronimo-javamail_1.4_spec-1.7.1.jar:/usr/share/cloudstack-agent/lib/geronimo-servlet_3.0_spec-1.0.jar:/usr/share/cloudstack-agent/lib/gson-1.7.2.jar:/usr/share/cloudstack-agent/lib/guava-14.0-rc1.jar:/usr/share/cloudstack-agent/lib/httpclient-4.3.6.jar:/usr/share/cloudstack-agent/lib/httpcore-4.3.3.jar:/usr/share/cloudstack-agent/lib/jackson-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-databind-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-jaxrs-json-provider-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-mapper-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-module-jaxb-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jasypt-1.9.0.jar:/usr/share/cloudstack-agent/lib/java-ipv6-0.10.jar:/usr/share/cloudstack-agent/lib/javassist-3.12.1.GA.jar:/usr/share/cloudstack-agent/lib/javassist-3.18.1-GA.jar:/usr/share/cloudstack-agent/lib/javax.inject-1.jar:/u
sr/share/cloudstack-agent/lib/javax.persistence-2.0.0.jar:/usr/share/cloudstack-agent/lib/javax.ws.rs-api-2.0-m10.jar
 6657 ?    Sl   0:05 jsvc.exec -Xms256m -Xmx2048m -cp /usr/share/java/commons-daemon.jar:/usr/share/cloudstack-agent/lib/activation-1.1.jar:/usr/share/cloudstack-agent/lib/antisamy-1.4.3.jar:/usr/share/cloudstack-agent/lib/aopalliance-1.0.jar:/usr/share/cloudstack-agent/lib/apache-log4j-extras-1.1.jar:/usr/share/cloudstack-agent/lib/aspectjweaver-1.7.0.jar:/usr/share/cloudstack-agent/lib/aws-java-sdk-1.3.22.jar:/usr/share/cloudstack-agent/lib/batik-css-1.7.jar:/usr/share/cloudstack-agent/lib/batik-ext-1.7.jar:/usr/share/cloudstack-agent/lib/batik-util-1.7.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk15-1.46.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk16-1.46.jar:/usr/share/cloudstack-agent/lib/bsh-core-2.0b4.jar:/usr/share/cloudstack-agent/lib/cglib-nodep-2.2.2.jar:/usr/share/cloudstack-agent/lib/cloud-agent-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-core-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-components-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-schema-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-cluster-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-config-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-db-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-events-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-ipc-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-jobs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-managed-context-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-rest-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-security-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-hypervisor-kvm-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-network-ovs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-server-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-utils-4.5.2.jar:/usr/share/cloudstack-agent/lib/commons
-beanutils-core-1.7.0.jar:/usr/share/cloudstack-agent/lib/commons-codec-1.6.jar:/usr/share/cloudstack-agent/lib/commons-collections-3.2.1.jar:/usr/share/cloudstack-agent/lib/commons-configuration-1.8.jar:/usr/share/cloudstack-agent/lib/commons-daemon-1.0.10.jar:/usr/share/cloudstack-agent/lib/commons-dbcp-1.4.jar:/usr/share/cloudstack-agent/lib/commons-fileupload-1.2.jar:/usr/share/cloudstack-agent/lib/commons-httpclient-3.1.jar:/usr/share/cloudstack-agent/lib/commons-io-1.4.jar:/usr/share/cloudstack-agent/lib/commons-lang-2.6.jar:/usr/share/cloudstack-agent/lib/commons-logging-1.1.3.jar:/usr/share/cloudstack-agent/lib/commons-net-3.3.jar:/usr/share/cloudstack-agent/lib/commons-pool-1.6.jar:/usr/share/cloudstack-agent/lib/cxf-bundle-jaxrs-2.7.0.jar:/usr/share/cloudstack-agent/lib/dom4j-1.6.1.jar:/usr/share/cloudstack-agent/lib/ehcache-core-2.6.6.jar:/usr/share/cloudstack-agent/lib/ejb-api-3.0.jar:/usr/share/cloudstack-agent/lib/esapi-2.0.1.jar:/usr/share/cloudstack-agent/lib/geronimo-javamail_1.4_spec-1.7.1.jar:/usr/share/cloudstack-agent/lib/geronimo-servlet_3.0_spec-1.0.jar:/usr/share/cloudstack-agent/lib/gson-1.7.2.jar:/usr/share/cloudstack-agent/lib/guava-14.0-rc1.jar:/usr/share/cloudstack-agent/lib/httpclient-4.3.6.jar:/usr/share/cloudstack-agent/lib/httpcore-4.3.3.jar:/usr/share/cloudstack-agent/lib/jackson-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-databind-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-jaxrs-json-provider-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-mapper-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-module-jaxb-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jasypt-1.9.0.jar:/usr/share/cloudstack-agent/lib/java-ipv6-0.10.jar:/usr/share/cloudstack-agent/lib/javassist-3.12.1.GA.jar:/usr/share/cloudstack-agent/lib/javassist-3.18.1-GA.jar:/usr/share/cloudstack-agent/lib/javax.inject-1.jar:/u
sr/share/cloudstack-agent/lib/javax.persistence-2.0.0.jar:/usr/share/cloudstack-agent/lib/javax.ws.rs-api-2.0-m10.jar

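Restarting the service

[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid 6657) is running...
[root@node5 bin]# service cloudstack-agent stop
Stopping Cloud Agent:

[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid 6657) is running...

ps ax |grep jsvc.exec also confirmed that the processes were still there.

This was the eye-opener: it exposed the problem with having used restart earlier, which had masked the fact that stop was failing. Annoying, but there was no time to dwell on it, because what came next was far from simple.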

[root@node5 bin]# kill -9 6655 6657
[root@node5 bin]# kill -9 6655 6657
-bash: kill: (6655) - No such process
-bash: kill: (6657) - No such process
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent dead but pid file exists
[root@node5 bin]# rm /var/run/cloudstack-agent.pid
rm: remove regular file `/var/run/cloudstack-agent.pid'? y
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent dead but subsys locked
[root@node5 bin]# service cloudstack-agent start
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid 109382) is running...
[root@node5 bin]# netstat -antp |grep 8250
tcp    0   0 192.168.14.20:22220     192.168.14.10:8250     ESTABLISHED 109382/jsvc.exec 

After this the agent state was back to normal, but libvirtd still could not be killed. Soon the connection shown by netstat -antp |grep 8250 disappeared again, and the host's record in the CloudStack master's monitoring changed from Up to Disconnected. Still, that is not Down, so it was progress compared with before.
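The "dead but pid file exists" state in the transcript above is something a script can detect before deciding whether cleanup is needed. Below is a minimal sketch, assuming Linux /proc; pidfile_state is a made-up helper and is not part of CloudStack or the init scripts.

```shell
# pidfile_state FILE   (hypothetical helper)
# Prints one word describing the pid file:
#   missing - FILE does not exist
#   running - FILE names a PID that currently exists under /proc
#   stale   - FILE exists but the PID in it is gone (or the file is empty)
pidfile_state() {
    local file=$1 pid
    if [ ! -f "$file" ]; then
        echo missing
        return 0
    fi
    pid=$(tr -cd '0-9' < "$file")   # keep digits only
    if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
        echo running
    else
        echo stale
    fi
}
```

For instance, `pidfile_state /var/run/cloudstack-agent.pid` printing stale corresponds to the situation right after the kill -9 above, where the pid file has to be removed by hand. The check only proves that some process has that PID, not that it is the agent.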

Starting another libvirtd -d to see what happens:

[root@node5 bin]# libvirtd -d
[root@node5 bin]# ps ax |grep libvirtd
  6485 ?    R  863:37 libvirtd --daemon -l
 130057 ?    Sl   0:38 libvirtd -d
 28904 pts/0   S+   0:00 grep libvirtd

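Then, on the CloudStack master, the host was manually forced to reconnect, and it worked: the host's monitored state changed from Disconnected to Up. Another attempt to kill 6485 at this point still failed, so a pause-VM command was tried from the CloudStack master management console, and the VM paused successfully. Back on the server, the previously hung libvirtd process had disappeared.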

[root@node5 bin]# libvirtd -d
[root@node5 bin]# ps ax |grep libvirtd
 130057 ?    Sl   0:38 libvirtd -d
 28904 pts/0   S+   0:00 grep libvirtd

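At this point the platform had regained control of the host and the abnormal libvirtd process had been terminated. The tentative conclusion is that cloudstack-agent has some small problem in how it handles the signals it sends to libvirtd. The jsvc process will be analysed separately later to reproduce the problem and fix it at the root.

Lessons learned

When dealing with a misbehaving service, do not use restart on the command line; debug with stop and kill instead. A painful lesson!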
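One way to apply this lesson in scripts is to stop the service and check its status explicitly before starting it again, so a failed stop cannot go unnoticed. The sketch below is illustrative only: stop_checked is a hypothetical name, and it relies on the convention, followed by the status function in /etc/rc.d/init.d/functions, that status exits 0 only while the service is running.

```shell
# stop_checked SERVICE   (hypothetical helper)
# Stops SERVICE, then asks for its status. Returns 1 with a message on stderr
# if the service still reports itself as running, 0 otherwise.
stop_checked() {
    local svc=$1
    service "$svc" stop
    if service "$svc" status >/dev/null 2>&1; then
        echo "$svc is still running after stop" >&2
        return 1
    fi
    return 0
}
```

A guarded restart then becomes `stop_checked cloudstack-agent && service cloudstack-agent start`, which would have stopped at the first step in the incident described here instead of silently starting on top of the old jsvc processes.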


Coco李可儿