
cado-nfs - Re: [Cado-nfs-discuss] interleaving queries

Subject: Discussion related to cado-nfs

  • From: Junyi <9jhzguy@gmail.com>
  • To: Emmanuel Thomé <emmanuel.thome@gmail.com>
  • Cc: "cado-nfs-discuss@lists.gforge.inria.fr" <cado-nfs-discuss@lists.gforge.inria.fr>
  • Subject: Re: [Cado-nfs-discuss] interleaving queries
  • Date: Thu, 11 Apr 2013 15:50:58 +0800
  • List-archive: <http://lists.gforge.inria.fr/pipermail/cado-nfs-discuss>
  • List-id: A discussion list for Cado-NFS <cado-nfs-discuss.lists.gforge.inria.fr>

Apologies for the terribly unhelpful debug request earlier, and thank you for the very speedy response; I just wanted to make sure I had the syntax right.

I started with something like: /bwc.pl :complete matrix=rsa640.bin nullspace=left wdir=lustre/bwc.split thr=2x2 mpi=8x8 mn=64 ys=0..64 interleaving=0 hosts=n001,n002 interval=100. This ran without issue.
However, the COMMS time was taking 3x longer than the CPU time (about 4s / iter, still on 10GbE unfortunately), so I wanted to enable interleaving.

I cleaned the wdir and ran the same command, but with interleaving=1. This resulted in a Segmentation Fault in the dispatch stage. The README indicates that I should change ys so that each krylov instance gets a 64-bit width, and use the splits parameter accordingly.

I then ran it with interleaving=1, ys=0..128, splits=0,64,128, and it ran well until the split stage, which reported "last split do not coincide with configured n".
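
For reference, the constraints described above (64-bit-wide chunks, and a final split equal to n) can be sketched as a small pre-flight check. This is only an illustration of the rules as reported in this thread; `check_splits` is a hypothetical helper, not part of bwc.pl, and the real checks in the bwc source may differ.

```python
def check_splits(splits, n, width=64):
    """Return a list of problems with a splits list; empty means consistent.

    Assumptions (taken from this thread, not from the bwc source):
      - splits starts at 0 and is strictly increasing
      - the last split must equal the configured n
      - each chunk [splits[i], splits[i+1]) is `width` bits wide
    """
    problems = []
    if not splits or splits[0] != 0:
        problems.append("splits must start at 0")
    if any(b <= a for a, b in zip(splits, splits[1:])):
        problems.append("splits must be strictly increasing")
    if splits and splits[-1] != n:
        problems.append(f"last split {splits[-1]} does not coincide with n={n}")
    for a, b in zip(splits, splits[1:]):
        if b - a != width:
            problems.append(f"chunk {a}..{b} is not {width} bits wide")
    return problems


if __name__ == "__main__":
    # The combination that failed at the split stage: splits end at 128, n is 64.
    print(check_splits([0, 64, 128], n=64))
    # The same splits with n=128 pass this particular check.
    print(check_splits([0, 64, 128], n=128))
```

With splits=0,64,128 the check only passes when n=128, which matches the message seen at the split stage when n was still 64.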

Checking the source, I understood this to mean that I have to set n=128 while leaving m=64. This reaches the krylov stage, but throws an "[err] event_queue_remove: 0x26b2bd0 (fd 11) not on queue 8" several times before exiting.

Following your suggested syntax for interleaving, I ran it as follows: bwc.pl :complete matrix=rsa640.bin nullspace=left wdir=lustre/bwc.split thr=2x2 mpi=8x8 mn=128 ys=0..128 interleaving=0 hosts=n001,n002 interval=100. However, this meets an untimely death at the secure stage, with the message: abase_u64kl_set_ui_at: Assertion 'k < 64' failed.

---

OpenMPI is 1.5.4; the version of cado used is 1.1, not the latest from git. The non-interleaved krylov currently appears to be churning along happily on a two-node, 64-core InfiniBand testbed (2.04s / iter, N = 1197000). The matrix appears to be around 38M x 38M, based on the merge.log.
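
As a rough sanity check on those figures, here is a back-of-the-envelope sketch. It assumes the usual block Wiedemann estimate that the Krylov sequence needs about dim/m + dim/n iterations (small additive slack ignored), and it takes the matrix dimension as exactly 38,000,000, which is only the approximate size quoted above. The function names are made up for illustration and are not part of cado-nfs.

```python
import math


def krylov_iterations(dim, m, n):
    """Approximate Krylov iteration count for block Wiedemann.

    Assumption: sequence length is about dim/m + dim/n, ignoring the
    small constant slack a real implementation adds.
    """
    return math.ceil(dim / m) + math.ceil(dim / n)


def estimated_days(dim, m, n, seconds_per_iter):
    """Wall-clock estimate in days at a fixed per-iteration cost."""
    return krylov_iterations(dim, m, n) * seconds_per_iter / 86400.0


if __name__ == "__main__":
    dim = 38_000_000  # approximate size, from merge.log as quoted above
    print(krylov_iterations(dim, 64, 64))
    print(round(estimated_days(dim, 64, 64, 2.04), 1), "days at 2.04 s/iter")
```

For m = n = 64 this gives a count in the same range as the N = 1197000 quoted above, although the message does not say whether that N is the total or the current iteration.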


On Wed, Apr 10, 2013 at 8:43 PM, Emmanuel Thomé <emmanuel.thome@gmail.com> wrote:
FYI, here is a command line which successfully computes a kernel using
bwc's interleaving:

./linalg/bwc/../.././build/fondue.mpi/linalg/bwc/bwc.pl  :complete
matrix=/local/rsa768/mats/rsa100.bin nullspace=left wdir=/tmp/bwc
thr=2x2  mpi=2x2 mn=128 ys=0..128 interleaving=1
hosts=fondue,raclette,tartiflette,berthoud mpi_extra_args="--mca
btl_tcp_if_exclude lo,virbr0" interval=200

(mpi is openmpi 1.6.1 here).

E.

On Wed, Apr 10, 2013 at 1:07 PM, Emmanuel Thomé
<emmanuel.thome@gmail.com> wrote:
> Can you expand on "crashes out" ?
>
> E.
>
> On Wed, Apr 10, 2013 at 12:35 PM, Junyi <a0032547@nus.edu.sg> wrote:
>> I'm running the cado-1.1-released version on a cluster, and have been trying
>> to enable interleaving at the krylov stage to mitigate the high comms
>> overhead.
>>
>> Interleaving = 0, mn = 64, ys=0..64, splits=0,64 currently runs well, but
>> Interleaving = 1, m = 64, n = 128, ys=0..128, splits=0,64,128 crashes out
>>
>> have i misused the parameters? please assist, thanks!
>>
>>
>>
>>
>> _______________________________________________
>> Cado-nfs-discuss mailing list
>> Cado-nfs-discuss@lists.gforge.inria.fr
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/cado-nfs-discuss
>>




