本博客使用了一个自动任务每天23点零5秒自动生成订阅,部署时正常执行,但是最近发现不能执行了。

spring配置文件

<task:scheduled-tasks>
    <task:scheduled ref="atomService" method="genAtomXmlWithoutException" cron="5 0 23 * * ?"/>
    <task:scheduled ref="dictManager" method="flushDictList" cron="5 1/25 * * * ?"/>
    <task:scheduled ref="backupService" method="backup" cron="30 1 0/10 * * ?"/>
</task:scheduled-tasks>

排查原因:

首先进入日志目录,搜索日志

grep -rn '生成atom.xml' | sort | tail -10

发现在4月7日之后就没有执行了

blog-operate-web.2020-04-03.log:114:[blog-operate-web 2020-04-03 23:00:05.000] [INFO ] [pool-1-thread-1](AtomService.java:58) - 开始生成atom.xml
blog-operate-web.2020-04-03.log:115:[blog-operate-web 2020-04-03 23:00:05.132] [INFO ] [pool-1-thread-1](AtomService.java:85) - 生成atom.xml成功, 服务端文件路径: /export/App/www.6zeroq.com/atom.xml
blog-operate-web.2020-04-04.log:118:[blog-operate-web 2020-04-04 23:00:05.001] [INFO ] [pool-1-thread-1](AtomService.java:58) - 开始生成atom.xml
blog-operate-web.2020-04-04.log:119:[blog-operate-web 2020-04-04 23:00:05.131] [INFO ] [pool-1-thread-1](AtomService.java:85) - 生成atom.xml成功, 服务端文件路径: /export/App/www.6zeroq.com/atom.xml
blog-operate-web.2020-04-05.log:173:[blog-operate-web 2020-04-05 23:00:05.001] [INFO ] [pool-1-thread-1](AtomService.java:58) - 开始生成atom.xml
blog-operate-web.2020-04-05.log:174:[blog-operate-web 2020-04-05 23:00:05.144] [INFO ] [pool-1-thread-1](AtomService.java:85) - 生成atom.xml成功, 服务端文件路径: /export/App/www.6zeroq.com/atom.xml
blog-operate-web.2020-04-06.log:239:[blog-operate-web 2020-04-06 23:00:05.001] [INFO ] [pool-1-thread-1](AtomService.java:58) - 开始生成atom.xml
blog-operate-web.2020-04-06.log:240:[blog-operate-web 2020-04-06 23:00:05.119] [INFO ] [pool-1-thread-1](AtomService.java:85) - 生成atom.xml成功, 服务端文件路径: /export/App/www.6zeroq.com/atom.xml
blog-operate-web.2020-04-07.log:112:[blog-operate-web 2020-04-07 23:00:05.001] [INFO ] [pool-1-thread-1](AtomService.java:58) - 开始生成atom.xml
blog-operate-web.2020-04-07.log:113:[blog-operate-web 2020-04-07 23:00:05.140] [INFO ] [pool-1-thread-1](AtomService.java:85) - 生成atom.xml成功, 服务端文件路径: /export/App/www.6zeroq.com/atom.xml

再查询其他任务执行情况

grep -rn '刷新字典' | sort | tail -10
blog-operate-web.2020-04-08.log:37:[blog-operate-web 2020-04-08 07:01:05.005] [INFO ] [pool-1-thread-1](DictManager.java:45) - 刷新字典成功
blog-operate-web.2020-04-08.log:38:[blog-operate-web 2020-04-08 07:26:05.005] [INFO ] [pool-1-thread-1](DictManager.java:45) - 刷新字典成功
blog-operate-web.2020-04-08.log:41:[blog-operate-web 2020-04-08 07:51:05.004] [INFO ] [pool-1-thread-1](DictManager.java:45) - 刷新字典成功
blog-operate-web.2020-04-08.log:42:[blog-operate-web 2020-04-08 08:01:05.006] [INFO ] [pool-1-thread-1](DictManager.java:45) - 刷新字典成功
blog-operate-web.2020-04-08.log:43:[blog-operate-web 2020-04-08 08:26:05.005] [INFO ] [pool-1-thread-1](DictManager.java:45) - 刷新字典成功
blog-operate-web.2020-04-08.log:45:[blog-operate-web 2020-04-08 08:51:05.005] [INFO ] [pool-1-thread-1](DictManager.java:45) - 刷新字典成功
blog-operate-web.2020-04-08.log:46:[blog-operate-web 2020-04-08 09:01:05.005] [INFO ] [pool-1-thread-1](DictManager.java:45) - 刷新字典成功
blog-operate-web.2020-04-08.log:47:[blog-operate-web 2020-04-08 09:26:05.005] [INFO ] [pool-1-thread-1](DictManager.java:45) - 刷新字典成功
blog-operate-web.2020-04-08.log:48:[blog-operate-web 2020-04-08 09:51:05.005] [INFO ] [pool-1-thread-1](DictManager.java:45) - 刷新字典成功
blog-operate-web.2020-04-08.log:49:[blog-operate-web 2020-04-08 10:01:05.005] [INFO ] [pool-1-thread-1](DictManager.java:45) - 刷新字典成功

再看下备份的任务

grep -rn '备份' | grep -v '当前备份文件列表' | grep -v '备份配置' | sort  | tail -10
blog-operate-web.2020-04-07.log:10:[blog-operate-web 2020-04-07 00:01:30.127] [INFO ] [pool-1-thread-1](BackupService.java:209) - 备份结束
blog-operate-web.2020-04-07.log:49:[blog-operate-web 2020-04-07 10:01:30.008] [INFO ] [pool-1-thread-1](BackupService.java:141) - 开始备份
blog-operate-web.2020-04-07.log:52:[blog-operate-web 2020-04-07 10:01:30.156] [WARN ] [pool-1-thread-1](BackupService.java:185) - 原备份文件大小和现备份文件大小一致,跳过备份,sizeInBytes=1130547(1.07 MiB)
blog-operate-web.2020-04-07.log:53:[blog-operate-web 2020-04-07 10:01:30.157] [INFO ] [pool-1-thread-1](BackupService.java:209) - 备份结束
blog-operate-web.2020-04-07.log:6:[blog-operate-web 2020-04-07 00:01:30.004] [INFO ] [pool-1-thread-1](BackupService.java:141) - 开始备份
blog-operate-web.2020-04-07.log:99:[blog-operate-web 2020-04-07 20:01:30.007] [INFO ] [pool-1-thread-1](BackupService.java:141) - 开始备份
blog-operate-web.2020-04-07.log:9:[blog-operate-web 2020-04-07 00:01:30.125] [WARN ] [pool-1-thread-1](BackupService.java:185) - 原备份文件大小和现备份文件大小一致,跳过备份,sizeInBytes=1130547(1.07 MiB)
blog-operate-web.2020-04-08.log:54:[blog-operate-web 2020-04-08 10:01:30.008] [INFO ] [pool-1-thread-1](BackupService.java:141) - 开始备份
blog-operate-web.2020-04-08.log:6:[blog-operate-web 2020-04-08 00:01:30.007] [INFO ] [pool-1-thread-1](BackupService.java:141) - 开始备份
blog-operate-web.2020-04-08.log:9:[blog-operate-web 2020-04-08 00:01:39.579] [INFO ] [pool-1-thread-1](BackupService.java:209) - 备份结束

从上我们可以知道,生成订阅最后执行成功时间为:2020-04-07 23:00:05.140,刷新字典最后执行成功时间为:2020-04-08 00:01:39.579,而从2020-04-08 10:01:30.008开始的开始备份并没有对应备份结束关键字,所以我们可以推导是该时间触发的备份出了问题。

使用jps和jstack查找目前出问题的线程栈

blog@web0:~$ jps
1315 Bootstrap
1176 Jps
blog@web0:~$ jstack 1315 | grep -B 30 com.zeroq6
"ContainerBackgroundProcessor[StandardEngine[Catalina]]" #18 daemon prio=5 os_prio=0 tid=0x00007f5ba433b800 nid=0x537 waiting on condition [0x00007f5b77093000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1359)
	at java.lang.Thread.run(Thread.java:748)

"pool-1-thread-1" #17 prio=5 os_prio=0 tid=0x00007f5b80db3800 nid=0x536 runnable [0x00007f5b75226000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at com.sun.mail.util.TraceInputStream.read(TraceInputStream.java:126)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
	- locked <0x00000000faf5a568> (a java.io.BufferedInputStream)
	at com.sun.mail.util.LineInputStream.readLine(LineInputStream.java:104)
	at com.sun.mail.smtp.SMTPTransport.readServerResponse(SMTPTransport.java:2425)
	at com.sun.mail.smtp.SMTPTransport.simpleCommand(SMTPTransport.java:2375)
	at com.sun.mail.smtp.SMTPTransport$LoginAuthenticator.doAuth(SMTPTransport.java:986)
	at com.sun.mail.smtp.SMTPTransport$Authenticator.authenticate(SMTPTransport.java:927)
	at com.sun.mail.smtp.SMTPTransport.authenticate(SMTPTransport.java:861)
	at com.sun.mail.smtp.SMTPTransport.protocolConnect(SMTPTransport.java:766)
	- locked <0x00000000faf5a650> (a com.sun.mail.smtp.SMTPTransport)
	at javax.mail.Service.connect(Service.java:388)
	- locked <0x00000000faf5a650> (a com.sun.mail.smtp.SMTPTransport)
	at javax.mail.Service.connect(Service.java:246)
	at javax.mail.Service.connect(Service.java:195)
	at javax.mail.Transport.send0(Transport.java:254)
	at javax.mail.Transport.send(Transport.java:146)
	at com.zeroq6.common.mail.MailSender.sendMail(MailSender.java:150)
	at com.zeroq6.blog.operate.service.BackupService.backup(BackupService.java:180)
	at com.zeroq6.blog.operate.service.BackupService.backup(BackupService.java:64)
blog@web0:~$ 

问题定位

通过线程栈我们知道,线程卡在了java.net.SocketInputStream.socketRead0这个本地方法,查阅资料,由于没有设置邮件发送时读取超时时间(默认无限等待),导致该方法在连接丢失没有数据时无限等待,而Spring中任务调度默认一个线程,所以其他调度也无法执行了

解决方案

  • 自定义Spring任务线程个数,减少任务之前的影响
<task:scheduler id="blogTaskScheduler" pool-size="3"/>
    
<task:scheduled-tasks scheduler="blogTaskScheduler">
    <task:scheduled ref="atomService" method="genAtomXmlWithoutException" cron="5 0 23 * * ?"/>
    <task:scheduled ref="dictManager" method="flushDictList" cron="5 1/25 * * * ?"/>
    <task:scheduled ref="backupService" method="backup" cron="30 1 0/10 * * ?"/>
</task:scheduled-tasks>
  • 使用发送邮件使用连接超时,读取/写入超时设置
// 连接超时,5s
properties.setProperty("mail.smtp.connectiontimeout", "5000");
// 读超时,30s
properties.setProperty("mail.smtp.timeout", "30000");
// 写超时,30s
properties.setProperty("mail.smtp.writetimeout", "30000");

参考:

https://blog.csdn.net/Colton_Null/article/details/89218011

http://blog.sina.com.cn/s/blog_48d4cf2d0102w4mw.html

https://stackoverflow.com/questions/18970409/why-javamail-connection-timeout-is-too-long

https://docs.spring.io/spring/docs/4.3.9.RELEASE/spring-framework-reference/html/scheduling.html#scheduling-task-namespace