
cado-nfs - Re: [Cado-nfs-discuss] Using GPU in Sieving Step

Subject: Discussion related to cado-nfs

  • From: Pierrick Gaudry <pierrick.gaudry@loria.fr>
  • To: hamid reza arkian <hamid.arkian@gmail.com>
  • Cc: cado-nfs-discuss@lists.gforge.inria.fr
  • Subject: Re: [Cado-nfs-discuss] Using GPU in Sieving Step
  • Date: Mon, 20 Aug 2012 09:29:29 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/cado-nfs-discuss>
  • List-id: A discussion list for Cado-NFS <cado-nfs-discuss.lists.gforge.inria.fr>

On Tue, Aug 07, 2012 at 11:36:31AM +0430, hamid reza arkian wrote:
> Hi,
>
> I tried GMP-ECM project (the branch that is currently under active
> development for GPU devices) on a GPU cluster and got good results.
> I want to replace the ECM part of the sieving step in CADO-NFS with
> GMP-ECM (which supports GPUs), but I have the following questions:
>
> - Basically, is it a good idea or not?
> - Is GMP-ECM faster than the ECM part of CADO-NFS? And if so, why haven't
> the developers done this replacement yet?
> In addition, if everything is OK and it's a good idea, it would be a
> pleasure if you could give me some tips and pointers before doing that.
>
> Thanks in advance,
> Hamid

Hi,

Using a GPU for the ECM step in the so-called cofactorization step of the
sieving phase is something to consider. However, this is not as simple as
plugging GMP-ECM into CADO-NFS.

Here are a few thoughts on the topic of ECM for NFS:
- the size of the numbers to be tested within NFS is small, and few
curves are tested. This is not the traditional target of GMP-ECM, and
therefore another version of ECM was written in CADO-NFS (mostly by
Alex Kruppa, who is also a developer of GMP-ECM).
- Kruppa's thesis contains many details on this topic.
http://tel.archives-ouvertes.fr/tel-00477005/en/
- the main question for using a GPU is how to take advantage of the
parallelism. I think that the GPU implementation of GMP-ECM currently
runs many curves in parallel to make optimal use of the card. But for
the application to NFS, only a few curves are needed, and we want
parallelization at the input number level.
- the question of the transfers between the card and the CPU must be
addressed. I tend to think that it would be interesting to accumulate a
list of numbers to be tested and then send all of them at once, and
keep the CPU busy with sieving while the GPU tries to finish the
cofactorization. This implies some reorganization of the sieving code
that is not so simple.
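
To make the last two points concrete, here is a minimal sketch in Python
of the batching idea. It is only an illustration under stated assumptions,
not how CADO-NFS or GMP-ECM are organized: the names BATCH_SIZE,
pm1_stage1, device_worker and sieve_and_cofactorize are hypothetical, a
worker thread stands in for the GPU, and a Pollard p-1 stage 1 stands in
for one ECM curve (it is not ECM). What it shows is the shape of the
pipeline: the producer accumulates cofactors into a list, hands over a
whole batch at once, and keeps going while the batch is processed, with
the same operations applied to every number in the batch.

```python
import math
import queue
import threading

BATCH_SIZE = 4  # hypothetical; a real GPU batch would be far larger


def pm1_stage1(n, bound=50):
    """Pollard p-1 stage 1, used here as a stand-in for one ECM curve."""
    a = 2
    for k in range(2, bound + 1):
        a = pow(a, k, n)  # after the loop, a = 2^(bound!) mod n
    g = math.gcd(a - 1, n)
    return g if 1 < g < n else None  # None: no proper factor found


def device_worker(batches, results):
    """Stand-in for the GPU: receives one whole batch per 'transfer'."""
    while True:
        batch = batches.get()
        if batch is None:  # sentinel: no more batches
            break
        # Parallelism is across input numbers: same operations on each n.
        results.append([(n, pm1_stage1(n)) for n in batch])


def sieve_and_cofactorize(survivors):
    """Accumulate cofactors into batches while the worker processes them."""
    batches = queue.Queue()
    results = []
    worker = threading.Thread(target=device_worker, args=(batches, results))
    worker.start()
    pending = []
    for n in survivors:  # stands in for the CPU sieving loop
        pending.append(n)
        if len(pending) == BATCH_SIZE:
            batches.put(pending)  # send the whole list at once
            pending = []
    if pending:
        batches.put(pending)  # flush the last, partial batch
    batches.put(None)
    worker.join()
    return [item for batch in results for item in batch]
```

For example, sieve_and_cofactorize([1009 * 10007, 15, 21]) returns one
(number, factor-or-None) pair per input, in input order. In a real
implementation the batch size would have to be tuned so that the transfer
cost is amortized and the card is fully occupied.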

Regards,
Pierrick




