Core dump when running rr_interval feature computation

Description

18/06/07 16:51:38 WARN hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1219024977-10.40.40.46-1519873848477:blk_1099707892_25967105 file=/cerebralcortex/data/42bbf143-b184-4cf7-9afa-289468d9e36b/2e42515e-1843-3984-a31a-70387ae69807/20171111.gz
at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1076)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:647)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:941)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:996)
at java.io.DataInputStream.read(DataInputStream.java:100)
hdfsRead: FSDataInputStream#read error:
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1219024977-10.40.40.46-1519873848477:blk_1099707892_25967105 file=/cerebralcortex/data/42bbf143-b184-4cf7-9afa-289468d9e36b/2e42515e-1843-3984-a31a-70387ae69807/20171111.gz
at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1076)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:647)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:941)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:996)
at java.io.DataInputStream.read(DataInputStream.java:100)
2018-06-07 16:51:38.708119 - [./MD2K_Cerebral_Cortex-2.2.2-py3.6.egg/cerebralcortex/core/data_manager/raw/stream_handler.py - read_hdfs_day_file - 292] - Error loading from HDFS: Cannot parse row. Traceback (most recent call last):
File "./MD2K_Cerebral_Cortex-2.2.2-py3.6.egg/cerebralcortex/core/data_manager/raw/stream_handler.py", line 275, in read_hdfs_day_file
data = curfile.read()
File "pyarrow/io.pxi", line 219, in pyarrow.lib.NativeFile.read (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:50632)
File "pyarrow/error.pxi", line 79, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:8345)
pyarrow.lib.ArrowIOError: HDFS read failed, errno: 255

18/06/07 16:51:39 WARN hdfs.DFSClient: zero
FSDataInputStream#close error:
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:471)
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:737)
at java.io.FilterInputStream.close(FilterInputStream.java:181)
Exception ignored in: 'pyarrow.lib.NativeFile.__dealloc__'
Traceback (most recent call last):
File "pyarrow/io.pxi", line 81, in pyarrow.lib.NativeFile.close (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:48901)
File "pyarrow/error.pxi", line 79, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:8345)
pyarrow.lib.ArrowIOError: HDFS: CloseFile failed
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f75f495dfc7, pid=66412, tid=0x00007f766394e740
#
# JRE version: OpenJDK Runtime Environment (8.0_141-b16) (build 1.8.0_141-b16)
# Java VM: OpenJDK 64-Bit Server VM (25.141-b16 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x655fc7]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /usr/local/spark-2.2.0-bin-hadoop2.7/work/app-20180607164845-1084/1/hs_err_pid66412.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
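In short, the Spark executor JVM crashes with a SIGSEGV in libjvm.so while the Python side is reading a day file from HDFS through pyarrow, immediately after a BlockMissingException on the underlying block. The failing call in read_hdfs_day_file is a plain NativeFile.read() (line 275 of stream_handler.py above). For reference only, a minimal sketch of guarding that read follows; the function name, signature, and the hdfs handle (assumed to come from pyarrow.hdfs.connect()) are illustrative and not the actual CerebralCortex API:

import pyarrow

def read_hdfs_day_file_guarded(hdfs, file_path):
    """Hypothetical guard around the failing HDFS read; illustrative only."""
    data = None
    curfile = None
    try:
        curfile = hdfs.open(file_path, 'rb')
        data = curfile.read()
    except pyarrow.lib.ArrowIOError as err:
        # A missing HDFS block surfaces here as "HDFS read failed, errno: 255";
        # log and skip the day file instead of letting the error cascade.
        print("Error loading %s from HDFS: %s" % (file_path, err))
    finally:
        if curfile is not None:
            try:
                curfile.close()
            except pyarrow.lib.ArrowIOError:
                # Mirrors the "HDFS: CloseFile failed" seen above when the
                # underlying FileSystem has already been shut down.
                pass
    return data

Catching ArrowIOError keeps the missing-block failure on the Python side, but note that the SIGSEGV itself occurs inside libjvm during the libhdfs call, so this guards the read path rather than fixing the native crash.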

Environment

None

Assignee

Nasir Ali

Reporter

Timothy Hnat

Labels

None

Priority

Highest