--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device are not used by default.
The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter to true.

  Local host: c001
  Local adapter: mlx5_0
  Local port: 1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host: c001
  Local device: mlx5_0
--------------------------------------------------------------------------
[c001:384308] *** Process received signal ***
[c001:384308] Signal: Segmentation fault (11)
[c001:384308] Signal code: Address not mapped (1)
[c001:384308] Failing at address: 0x20
[c001:384308] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7f7dad45c090]
[c001:384308] [ 1] /usr/local/lib/libtorch_cpu.so(+0x46b846e)[0x7f7d8342346e]
[c001:384308] [ 2] /usr/local/lib/libtorch_cpu.so(+0x46b9b70)[0x7f7d83424b70]
[c001:384308] [ 3] /usr/local/lib/libtorch_cpu.so(+0x46c71fb)[0x7f7d834321fb]
[c001:384308] [ 4] /usr/local/lib/libtorch_cpu.so(_ZN5torch3jit16InterpreterState3runERSt6vectorIN3c106IValueESaIS4_EE+0x52)[0x7f7d8341f212]
[c001:384308] [ 5] /usr/local/lib/libtorch_cpu.so(+0x46a71a6)[0x7f7d834121a6]
[c001:384308] [ 6] /usr/local/lib/libtorch_cpu.so(_ZNK5torch3jit6MethodclESt6vectorIN3c106IValueESaIS4_EERKSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_St4hashISD_ESt8equal_toISD_ESaISt4pairIKSD_S4_EEE+0x180)[0x7f7d83095170]
[c001:384308] [ 7] /mmfs1/scratch/bwaters2/bwaters/job-92c45b02-3852-49c3-bc69-8df5b7330661-007-e4a28f0e-0e38-49c4-9c72-21d392bf51e1/TE_153579645116_004-and-MO_781946209112_000-1711823197/staged_job_files/repository/md/TorchML__MD_173118614730_000/libkim-api-model-driver.so(_ZN12PytorchModel3RunERN3c106IValueE+0x381)[0x7f7da8ddafb1]
[c001:384308] [ 8] /mmfs1/scratch/bwaters2/bwaters/job-92c45b02-3852-49c3-bc69-8df5b7330661-007-e4a28f0e-0e38-49c4-9c72-21d392bf51e1/TE_153579645116_004-and-MO_781946209112_000-1711823197/staged_job_files/repository/md/TorchML__MD_173118614730_000/libkim-api-model-driver.so(_ZN32TorchMLModelDriverImplementation3RunEPKN3KIM21ModelComputeArgumentsE+0x5c)[0x7f7da8dd8d7c]
[c001:384308] [ 9] /mmfs1/scratch/bwaters2/bwaters/job-92c45b02-3852-49c3-bc69-8df5b7330661-007-e4a28f0e-0e38-49c4-9c72-21d392bf51e1/TE_153579645116_004-and-MO_781946209112_000-1711823197/staged_job_files/repository/md/TorchML__MD_173118614730_000/libkim-api-model-driver.so(_ZN32TorchMLModelDriverImplementation7ComputeEPKN3KIM21ModelComputeArgumentsE+0xd)[0x7f7da8dd8e3d]
[c001:384308] [10] /mmfs1/scratch/bwaters2/bwaters/job-92c45b02-3852-49c3-bc69-8df5b7330661-007-e4a28f0e-0e38-49c4-9c72-21d392bf51e1/TE_153579645116_004-and-MO_781946209112_000-1711823197/staged_job_files/repository/md/TorchML__MD_173118614730_000/libkim-api-model-driver.so(_ZN18TorchMLModelDriver7ComputeEPKN3KIM12ModelComputeEPKNS0_21ModelComputeArgumentsE+0x33)[0x7f7da8dd08e3]
[c001:384308] [11] /usr/local/lib/libkim-api.so.2(_ZNK3KIM19ModelImplementation12ModelComputeEPKNS_16ComputeArgumentsE+0x4a0)[0x7f7dacda0900]
[c001:384308] [12] /usr/local/lib/libkim-api.so.2(_ZNK3KIM19ModelImplementation7ComputeEPKNS_16ComputeArgumentsE+0x667)[0x7f7dacdaee67]
[c001:384308] [13] /usr/local/lib/liblammps.so.0(_ZN9LAMMPS_NS7PairKIM7computeEii+0x20d)[0x7f7dae0e5c2d]
[c001:384308] [14] /usr/local/lib/liblammps.so.0(_ZN9LAMMPS_NS6Verlet5setupEi+0x3a2)[0x7f7dae02b982]
[c001:384308] [15] /usr/local/lib/liblammps.so.0(_ZN9LAMMPS_NS3Run7commandEiPPc+0xc7e)[0x7f7dadfc0b8e]
[c001:384308] [16] /usr/local/lib/liblammps.so.0(_ZN9LAMMPS_NS5Input15execute_commandEv+0xc0f)[0x7f7dade3896f]
[c001:384308] [17] /usr/local/lib/liblammps.so.0(_ZN9LAMMPS_NS5Input4fileEv+0x175)[0x7f7dade38c25]
[c001:384308] [18] lammps(+0x13fe)[0x56210f8ed3fe]
[c001:384308] [19] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f7dad43d083]
[c001:384308] [20] lammps(+0x148e)[0x56210f8ed48e]
[c001:384308] *** End of error message ***
../../td/CohesiveEnergyVsLatticeConstant__TD_554653289799_003/runner: line 194: 384308 Segmentation fault (core dumped) lammps -in output/isolated_atom.lammps.in > output/isolated_atom.lammps.log
Error: Isolated atom energy parsed from LAMMPS is not numerical. Check the LAMMPS log for errors. Exiting...
Command exited with non-zero status 1
{"realtime":11.88,"usertime":2.08,"systime":0.60,"memmax":322336,"memavg":0}