Subject: Discussion related to cado-nfs
- From: Junyi <9jhzguy@gmail.com>
- To: Emmanuel Thomé <emmanuel.thome@gmail.com>
- Cc: "cado-nfs-discuss@lists.gforge.inria.fr" <cado-nfs-discuss@lists.gforge.inria.fr>
- Subject: Re: [Cado-nfs-discuss] interleaving queries
- Date: Thu, 11 Apr 2013 15:50:58 +0800
- List-archive: <http://lists.gforge.inria.fr/pipermail/cado-nfs-discuss>
- List-id: A discussion list for Cado-NFS <cado-nfs-discuss.lists.gforge.inria.fr>
Apologies for the terribly unhelpful debug request earlier, and thanks for the very speedy response; I just wanted to make sure I had the syntax right.
I started with something like /bwc.pl :complete matrix=rsa640.bin nullspace=left wdir=lustre/bwc.split thr=2x2 mpi=8x8 mn=64 ys=0..64 interleaving=0 hosts=n001,n002 interval=100. This ran without issue.
However, the COMMS time was about three times the CPU time (about 4 s/iter, still on 10GbE unfortunately), so I wanted to enable interleaving. I cleaned the wdir and ran the same command with interleaving=1. This resulted in a segmentation fault in the dispatch stage. The README indicates that I should change ys so that each Krylov instance gets a 64-bit width, and set the splits parameter accordingly.
I then ran it with interleaving=1, ys=0..128, splits=0,64,128, and it ran well until the split stage, which complained that "last split do not coincide with configured n".
Checking the source, I understood that I had to set n=128 while leaving m=64. This reaches the krylov stage, but throws "[err] event_queue_remove: 0x26b2bd0 (fd 11) not on queue 8" several times before exiting.
Following your suggested syntax for interleaving, I ran it as follows: bwc.pl :complete matrix=rsa640.bin nullspace=left wdir=lustre/bwc.split thr=2x2 mpi=8x8 mn=128 ys=0..128 interleaving=0 hosts=n001,n002 interval=100. However, this meets an untimely death at the secure stage, with the message: abase_u64kl_set_ui_at: Assertion 'k < 64' failed.
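The parameter constraints that surface in this thread can be collected into a small pre-flight check. This is a hypothetical Python sketch, not part of cado-nfs: `BwcParams`, `check_params` and the "64 bits per Krylov instance" rule are assumptions drawn from the README remark and the error messages quoted above, not from bwc.pl's own validation logic, and they are only meant for the 1.1 setup discussed here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption (from the README remark in this thread): each Krylov
# instance handles a 64-bit-wide slice of the n-wide block.
BITS_PER_INSTANCE = 64


@dataclass
class BwcParams:
    """Hypothetical mirror of a few bwc.pl command-line parameters."""
    m: int
    n: int
    ys: str                            # e.g. "0..128"
    interleaving: int                  # 0 or 1
    splits: Optional[List[int]] = None


def parse_range(spec: str) -> range:
    """Turn a 'lo..hi' string into a half-open Python range."""
    lo, hi = spec.split("..")
    return range(int(lo), int(hi))


def check_params(p: BwcParams) -> List[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    ys = parse_range(p.ys)
    # Interleaving runs two sequences side by side, so ys is assumed
    # to span twice the per-instance width.
    expected = BITS_PER_INSTANCE * (2 if p.interleaving else 1)
    if len(ys) != expected:
        problems.append(
            f"ys width {len(ys)} != {expected} for interleaving={p.interleaving}")
    if ys.stop > p.n:
        problems.append(f"ys upper bound {ys.stop} exceeds n={p.n}")
    # Mirrors the "last split do not coincide with configured n" error.
    if p.splits is not None and p.splits[-1] != p.n:
        problems.append(f"last split {p.splits[-1]} does not coincide with n={p.n}")
    return problems


if __name__ == "__main__":
    for params in [
        BwcParams(m=64, n=64, ys="0..64", interleaving=0, splits=[0, 64]),
        BwcParams(m=64, n=64, ys="0..128", interleaving=1, splits=[0, 64, 128]),
        BwcParams(m=128, n=128, ys="0..128", interleaving=0),
    ]:
        print(params, check_params(params))
```

Fed the configurations from this thread, the sketch flags the interleaving=1, mn=64, ys=0..128 attempt (ys runs past n and the last split is not n) and the mn=128, ys=0..128, interleaving=0 run (a 128-bit ys without interleaving, which may be what the 'k < 64' assertion trips on). The m=64, n=128 attempt passes these checks, so the event_queue_remove error is not explained by them.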
---
OpenMPI is 1.5.4; the version of cado used is the 1.1 release, not the latest version from git. Non-interleaved krylov currently appears to be churning along happily on a two-node, 64-core InfiniBand testbed (2.04 s/iter, N = 1197000). The matrix appears to be around 38M x 38M, based on merge.log.
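As a rough sanity check on these figures, the block Wiedemann sequence needs about dim/m + dim/n terms, which for a 38M matrix with m = n = 64 lands close to the N = 1197000 quoted above. The Python sketch below assumes that N is the Krylov iteration count and that the 4 s/iter 10GbE figure applies to the same matrix; both are assumptions, and the estimate ignores checkpoint I/O and restarts.

```python
SECONDS_PER_DAY = 86_400


def expected_krylov_length(dim: int, m: int, n: int) -> int:
    """Block Wiedemann heuristic: roughly dim/m + dim/n terms (no safety margin)."""
    return dim // m + dim // n


def wall_clock_days(iterations: int, seconds_per_iter: float) -> float:
    """Naive wall-clock estimate in days, ignoring checkpointing overhead."""
    return iterations * seconds_per_iter / SECONDS_PER_DAY


if __name__ == "__main__":
    dim = 38_000_000  # approximate matrix dimension read from merge.log
    print(expected_krylov_length(dim, 64, 64))         # 1187500
    print(round(wall_clock_days(1_197_000, 2.04), 1))  # 28.3 (InfiniBand testbed)
    print(round(wall_clock_days(1_197_000, 4.0), 1))   # 55.4 (10GbE run)
```

Under these assumptions the InfiniBand run finishes in roughly half the time of the 10GbE run, which is the gap interleaving was meant to narrow.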
On Wed, Apr 10, 2013 at 8:43 PM, Emmanuel Thomé <emmanuel.thome@gmail.com> wrote:
FYI, here is a command line which successfully computes a kernel using
bwc's interleaving:
./linalg/bwc/../.././build/fondue.mpi/linalg/bwc/bwc.pl :complete
matrix=/local/rsa768/mats/rsa100.bin nullspace=left wdir=/tmp/bwc
thr=2x2 mpi=2x2 mn=128 ys=0..128 interleaving=1
hosts=fondue,raclette,tartiflette,berthoud mpi_extra_args="--mca
btl_tcp_if_exclude lo,virbr0" interval=200
(mpi is openmpi 1.6.1 here).
E.
On Wed, Apr 10, 2013 at 1:07 PM, Emmanuel Thomé
<emmanuel.thome@gmail.com> wrote:
> Can you expand on "crashes out" ?
>
> E.
>
> On Wed, Apr 10, 2013 at 12:35 PM, Junyi <a0032547@nus.edu.sg> wrote:
>> I'm running the cado-1.1-released version on a cluster, and have been trying
>> to enable interleaving at the krylov stage to mitigate the high comms
>> overhead.
>>
>> Interleaving = 0, mn = 64, ys=0..64, splits=0,64 currently runs well, but
>> Interleaving = 1, m = 64, n = 128, ys=0..128, splits=0,64,128 crashes out
>>
>> Have I misused the parameters? Please assist, thanks!
>> _______________________________________________
>> Cado-nfs-discuss mailing list
>> Cado-nfs-discuss@lists.gforge.inria.fr
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/cado-nfs-discuss
>>
- [Cado-nfs-discuss] interleaving queries, Junyi, 04/10/2013
- Re: [Cado-nfs-discuss] interleaving queries, Emmanuel Thomé, 04/10/2013
- Re: [Cado-nfs-discuss] interleaving queries, Emmanuel Thomé, 04/10/2013
- <Possible follow-up(s)>
- Re: [Cado-nfs-discuss] interleaving queries, Junyi, 04/11/2013
- Re: [Cado-nfs-discuss] interleaving queries, Emmanuel Thomé, 04/11/2013
- Re: [Cado-nfs-discuss] interleaving queries, Junyi, 04/12/2013
- Re: [Cado-nfs-discuss] interleaving queries, Emmanuel Thomé, 04/12/2013
- Re: [Cado-nfs-discuss] interleaving queries, Junyi, 04/12/2013
- Re: [Cado-nfs-discuss] interleaving queries, Emmanuel Thomé, 04/11/2013
- Re: [Cado-nfs-discuss] interleaving queries, Emmanuel Thomé, 04/10/2013