--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default. The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.

  Local host: c004
  Local adapter: mlx5_0
  Local port: 1

--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host: c004
  Local device: mlx5_0
--------------------------------------------------------------------------
[c004:2649547] *** Process received signal ***
[c004:2649547] Signal: Segmentation fault (11)
[c004:2649547] Signal code: Address not mapped (1)
[c004:2649547] Failing at address: 0x6d
[c004:2649547] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7f1f483a3090]
[c004:2649547] [ 1] /lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_assignERKS4_+0x2c)[0x7f1f486b024c]
[c004:2649547] [ 2] /usr/local/lib/libtorch_cpu.so(+0x46b847d)[0x7f1f1f2eb47d]
[c004:2649547] [ 3] /usr/local/lib/libtorch_cpu.so(+0x46b9b70)[0x7f1f1f2ecb70]
[c004:2649547] [ 4] /usr/local/lib/libtorch_cpu.so(+0x46c71fb)[0x7f1f1f2fa1fb]
[c004:2649547] [ 5] /usr/local/lib/libtorch_cpu.so(_ZN5torch3jit16InterpreterState3runERSt6vectorIN3c106IValueESaIS4_EE+0x52)[0x7f1f1f2e7212]
[c004:2649547] [ 6] /usr/local/lib/libtorch_cpu.so(+0x46a71a6)[0x7f1f1f2da1a6]
[c004:2649547] [ 7] /usr/local/lib/libtorch_cpu.so(_ZNK5torch3jit6MethodclESt6vectorIN3c106IValueESaIS4_EERKSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_St4hashISD_ESt8equal_toISD_ESaISt4pairIKSD_S4_EEE+0x180)[0x7f1f1ef5d170]
[c004:2649547] [ 8] /mmfs1/scratch/bwaters2/bwaters/job-96cd42be-2438-4e8c-905f-77264bad7943-007-3fce83c8-b7bf-42b3-b2dc-82578dcbab22/TE_445358041415_003-and-MO_781946209112_000-1711746978/staged_job_files/repository/md/TorchML__MD_173118614730_000/libkim-api-model-driver.so(_ZN12PytorchModel3RunERN3c106IValueE+0x381)[0x7f1f3e262fb1]
[c004:2649547] [ 9] /mmfs1/scratch/bwaters2/bwaters/job-96cd42be-2438-4e8c-905f-77264bad7943-007-3fce83c8-b7bf-42b3-b2dc-82578dcbab22/TE_445358041415_003-and-MO_781946209112_000-1711746978/staged_job_files/repository/md/TorchML__MD_173118614730_000/libkim-api-model-driver.so(_ZN32TorchMLModelDriverImplementation3RunEPKN3KIM21ModelComputeArgumentsE+0x5c)[0x7f1f3e260d7c]
[c004:2649547] [10] /mmfs1/scratch/bwaters2/bwaters/job-96cd42be-2438-4e8c-905f-77264bad7943-007-3fce83c8-b7bf-42b3-b2dc-82578dcbab22/TE_445358041415_003-and-MO_781946209112_000-1711746978/staged_job_files/repository/md/TorchML__MD_173118614730_000/libkim-api-model-driver.so(_ZN32TorchMLModelDriverImplementation7ComputeEPKN3KIM21ModelComputeArgumentsE+0xd)[0x7f1f3e260e3d]
[c004:2649547] [11] /mmfs1/scratch/bwaters2/bwaters/job-96cd42be-2438-4e8c-905f-77264bad7943-007-3fce83c8-b7bf-42b3-b2dc-82578dcbab22/TE_445358041415_003-and-MO_781946209112_000-1711746978/staged_job_files/repository/md/TorchML__MD_173118614730_000/libkim-api-model-driver.so(_ZN18TorchMLModelDriver7ComputeEPKN3KIM12ModelComputeEPKNS0_21ModelComputeArgumentsE+0x33)[0x7f1f3e2588e3]
[c004:2649547] [12] /usr/local/lib/libkim-api.so.2(_ZNK3KIM19ModelImplementation12ModelComputeEPKNS_16ComputeArgumentsE+0x4a0)[0x7f1f47ce7900]
[c004:2649547] [13] /usr/local/lib/libkim-api.so.2(_ZNK3KIM19ModelImplementation7ComputeEPKNS_16ComputeArgumentsE+0x667)[0x7f1f47cf5e67]
[c004:2649547] [14] /usr/local/lib/liblammps.so.0(_ZN9LAMMPS_NS7PairKIM7computeEii+0x20d)[0x7f1f4902cc2d]
[c004:2649547] [15] /usr/local/lib/liblammps.so.0(_ZN9LAMMPS_NS6Verlet5setupEi+0x3a2)[0x7f1f48f72982]
[c004:2649547] [16] /usr/local/lib/liblammps.so.0(_ZN9LAMMPS_NS3Run7commandEiPPc+0xc7e)[0x7f1f48f07b8e]
[c004:2649547] [17] /usr/local/lib/liblammps.so.0(_ZN9LAMMPS_NS5Input15execute_commandEv+0xc0f)[0x7f1f48d7f96f]
[c004:2649547] [18] /usr/local/lib/liblammps.so.0(_ZN9LAMMPS_NS5Input4fileEv+0x175)[0x7f1f48d7fc25]
[c004:2649547] [19] lammps(+0x13fe)[0x55913a4a83fe]
[c004:2649547] [20] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f1f48384083]
[c004:2649547] [21] lammps(+0x148e)[0x55913a4a848e]
[c004:2649547] *** End of error message ***
Traceback (most recent call last):
  File "../../td/TriclinicPBCEnergyAndForces__TD_892847239811_003/runner", line 45, in run_lammps
    lammps_process = subprocess.check_call(
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['lammps', '-in', 'isolated_atom.lammps.Si.in']' died with .

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../../td/TriclinicPBCEnergyAndForces__TD_892847239811_003/runner", line 207, in
    isolated_atom_energies[symbol] = get_isolated_atom_energy(
  File "../../td/TriclinicPBCEnergyAndForces__TD_892847239811_003/runner", line 71, in get_isolated_atom_energy
    run_lammps(templated_input, lammps_output)
  File "../../td/TriclinicPBCEnergyAndForces__TD_892847239811_003/runner", line 55, in run_lammps
    raise Exception("LAMMPS did not exit properly:\n" + extrainfo)
Exception: LAMMPS did not exit properly:
LAMMPS (2 Aug 2023 - Update 1)

Command exited with non-zero status 1
{"realtime":4.43,"usertime":2.99,"systime":5.37,"memmax":321508,"memavg":0}