- From: Emmanuel Thomé <Emmanuel.Thome@inria.fr>
- To: cado-nfs-discuss@lists.gforge.inria.fr
- Subject: Re: [Cado-nfs-discuss] OpenMPI trouble
- Date: Thu, 20 Apr 2017 00:14:03 +0200
Hi,
> What's the problem?
It seems unlikely that your problem is related to cado-nfs. Something
is apparently wrong with your hardware setup, your openmpi invocation, or
both.
To start with, it would be useful to know which openmpi version you're
using.
The only thing I can do is offer a few suggestions:
- it seems that you're using mpi over tcp (and hence, over ip). That may
be acceptable if you have 100GbE. However, if your interconnect of
choice is not ethernet-based (e.g., if you have infiniband), then
you're going to get sub-par performance (and mpi-over-tcp-over-ip-over-ib
is an awkward, but sometimes encountered, situation).
- 192.168.122.1 is typically the address of the virtual bridge set up by
libvirt-based hypervisors, and maybe others. That you get a connection
from that address means that either you have a vm host and a vm guest
among your nodes, which is a bit odd, or you have two vm guests (I hope
on different metal...) with the hosts doing bidirectional nat between
them, or something like this. Very odd. You would need to fix that.
Maybe "--mca btl_tcp_if_exclude xxx,yyy" can be useful (see the example
commands after this list), but I don't think it'll spare you the need
to do some cleanup first.
- in general, more info about your setup would be useful: e.g., which
interconnect you are planning to use, and the output of "ip addr show"
on both nodes.
Best regards,
E.
On Thu, Apr 20, 2017 at 12:29:36AM +0300, Христофор Бобров wrote:
>
> Previously, I used MPICH2 to launch my application. But recently I had to
> switch to openmpi, and when I started on 2 nodes, it gave this error:
>
> Open MPI detected an inbound MPI TCP connection request from a peer that
> appears to be part of this MPI job (i.e., it identified itself as part of
> this Open MPI job), but it is from an IP address that is unexpected. This
> is highly unusual.
>
> The inbound connection has been dropped, and the peer should simply try
> again with a different IP interface (i.e., the job should hopefully be able
> to continue).
>
> Local host: node2
>
> Local PID: 6029
>
> Peer hostname: node2([[40861,1],2])
>
> Source IP of socket: 192.168.122.1
>
> Known IPs of peer:
>
> I run mpiexec -np 4 -machinefile ./hostfile ./test_hello mpi=2x2.
>
> Contents of the file hostfile:
> 192.168.56.1 slots=2
> 192.168.56.2 slots=2
>
> What's the problem?
> --
> Best regards
> Бобров Х.