20101201, 21:57  #1 
Nov 2010
9_{16} Posts 
Factored my first SemiPrime Some Questions
Hey all  I have some beginners questions about the use of msieve in conjunction with ggnfs, hopefully I put this in (mostly) the right place.
The idea of factoring some large numbers appealed to me, but I know it's a bit more complicated than just running a script  so I've been reading up on it a lot lately. I followed the guide and successfully factored RSA100, which was really exciting  but I have some questions so I actually understand what I was doing besides just doing the magical incantations. I realize some of this process might be overkill when talking about a 100digit semiprime, but I want to get the process right for something small, so when I move on to larger numbers I can understand it all. For the record, I'm running on a Linux box, 2GB RAM, DualCore Athlon64 4200+ with Hyperthreading enabled, in 32 bit (x86)  yes that's not a typo. When I installed the OS I made the mistake of not doing x64, and rebuilding the toolchain in gentoo is impossible, so I will probably wipe and reinstall soon, but for now it's 32 bit. I intend to do benchmarking to compare with and without Hyperthreading, x32 and x64, how optimal and suboptimal L1_BITS affects it... I have two GeForce 480's in my desktop (running Win7) to do CUDA testing on, and I have some more boxes I can borrow or turn on to do distributed sieving. Part 1: Polynomial Selection Selecting a good polynomial reduces the time spent sieving, and there are tests to determine what a 'good' polynomial is  the one I see mentioned a lot being the Murphy test. And I saw people comparing polynomials using a Murphy Score, which was of the form 2.64e12... here are some selections from rsa100.dat.p: ...I recognize c0c4 are the coefficients used in the polynomial, and I assume that Y0 and Y1 are more polynomial search parameters that are way over my head. But what are norm, e, alpha, and rroots (real roots?)? Is e the Murphy Score? It goes up and down over the course of the search  I thought it would continually get better. (Why keep track of polynomials that are worse than the one you have?) Here is the log for rsa100.dat.p Later on, I'd like to run multiple polynomial selections in parallel over different ranges and choose the best. How can I objectively compare two polynomials to see which is 'better'? Is it only worth comparing Murphy Scores? Part 2: Sieving After polynomial selection ran for about an hour (longer than the .35 hours it promised) I killed it, and ran the script to start sieving. It estimated about 4M relations needed, and kicked off four jobs at once to compute them. It started q at q = 900000. I understand Q is the range over which we are searching for relations, and the most distrubutable part of the operation, and it seems that you find more relations at lower Q values  but I have a few academic questions. Why did it start Q at 900K? Similarly in over in this thread Q is started at 20M, and here, Q=30M. How do you determine a good Qstartvalue? Related, I have no idea what the M 1 parameter means. I know M 0 means start on the "algebraic side of mpqs treatment" and M 1 means the rational side  but I haven't the faintest what that _actually_ means. I also don't understand the difference between the gnfslasieve4IXXe binaries. 11 through 16, they seem to be suited for different jobs. I've seen one thread talk about using 15e, another 16e. Looking at the code of ./factmsieve.py, it seems each is for a different range of digits in the composite (when working with the gnfs, I'm not too interested in the sfns):  <95 = lasieve4I11e  96110 = lasieve4I12e  111140 = lasieve4I13e  141158 = lasieve4I14e  158185 = lasieve4I15e  >185 = lasieve4I16e If that is correct, what is the (approximate) difference between them? I suppose as far as telling when it is better to take a step up (or down)  it would just do test sieving and see which performs faster? So I sieved and I sieved and it didn't take very long, by the time Q got to 1.3M, I had 4778012 relations, 116.7% of the estimated minimum (4095000). Time to move on. Part 3: Linear Algebra Once upon a time I took a computer vision course, things like edge detection and pattern recognition, and movement prediction  without taking linear algebra. I'm proud (or ashamed) to say that after me they made Linear Algebra a prequisite for the course. So I'm not too hot at it. Step nc1 has to do with checking the relations. Can this be done incrementally as sieve jobs complete? Or must it be run on the entire sieve results? Can it be run, and the results discarded, to check the validity of a sieve job, as a sanity check? If possible, can you explain what it's actually *doing*? Anway, I was so excited when I saw the news  my primes were ready to come out of the oven. I have no primes, no primes for me =( I had no idea what went wrong. But I typed up nearly this whole post, and finally found them! My primes! I haven't the faintest why the python script failed to give the result of all my (okay not really 'my') hard work, but I found them. So I have a lot of academic questions wrapped up in this narrative (I tried to make them stand out), but I also wanted to thank everyone (especially jasonp) profusely for all the hard work they've put in to making these tools  it's really exciting. 
20101202, 02:18  #2 
Tribal Bullet
Oct 2004
3·1,181 Posts 
For the poly selection, Y0 and Y1 are the coefficients of the rational polynomial. The other numbers attempt to measure how good the polynomial will be if you sieved with it; the Murphy E score is the most accurate, and the most expensive to compute. It was basically invented so that two polynomials could be directly compared by their E values. The reason the code prints out large numbers of polynomials is that the E value is not a perfect measure of the potential in a polynomial, and for larger problems there will be many polynomials that have very similar E values. So for the largest problems you have to choose the best polynomial via test sieving, and the E score comparison is there to reduce the amount of test sieving you have to do. I agree that it's more helpful to print out, say, the top 10 polynomials, but you also don't want the code to print nothing for weeks until it's done. Right now any polynomial whose E value exceeds a given (fairly stringent) bound is printed.
The linear algebra is divided into two halves; nc1 does the first and nc2 does the second. Both require all the relations to be available at once. The first half is the 'filtering' phase, which throws away relations that will not help complete the factorization, and combines the rest into small groups that make the resulting matrix much smaller. The actual linear algebra then selects a very specific collection of (about half of) those groups that allow the factorization to proceed. I'm afraid the details here are complicated. Congrats on getting started! 
20101202, 03:21  #3 
Nov 2003
2^{2}×5×373 Posts 
Full (very long) quote removed
Please only quote what you are replying to Would you like references? If you really want to learn about the algorithm, I can provide them. Otherwise, it will probably remain just a black box. And I would be happy to answer any mathematical questions you might have about the algorithm. And you should start by learning about QS and how it works. Trying to understand NFS before studying QS will be nigh to impossible unless you have a very strong math background. (advanced undergrad with at least one course in algebraic number theory or abstract algebra; do you know what a prime ideal is?) An excellent general discussion can be found in the Crandall & Pomerance book: Prime Numbers; A Computational Perspective. It will also give the number theory background material that is needed. Last fiddled with by smh on 20101216 at 20:38 Reason: eliminate some redundancy 
20101202, 03:27  #4  
Aug 2006
3·1,993 Posts 
tal  For what it's worth, Silverman here is a great resource on this subject, but he doesn't like beginner questions. :) He's a real expert on the NFS and quite generous with his time.
Quote:
Quote:
Last fiddled with by CRGreathouse on 20101202 at 03:28 

20101202, 03:41  #5  
Nov 2003
2^{2}×5×373 Posts 
Quote:
is beginner assertions and presumptions. I also openly admit to intolerance toward "willful ignorance": those the claim to be interested, but are unwilling to learn. 

20101202, 05:18  #6  
Aug 2006
13533_{8} Posts 
Quote:


20101202, 14:26  #7 
May 2008
Worcester, United Kingdom
7^{2}×11 Posts 
Hi Tal,
With reference to your 'missing primes' when using my factmsieve.py script, I have had bug reports indicating that the script sometimes fails to run the square root step for no apparent reason. If the final summary file is deleted and the script is then run again, it then completes the factorisation. I have not been able to debug this as I have not seen the behaviour. But I have now spent some time trying to find what might cause this and I think that I have may have found the issue. It seems to be caused by some Python versions becoming confused about indentation when a script contains mixed tab/space indents. I hope the attached version (v0.76) corrects this problem. This file now uses only spaces (no tabs) and is a Windows convention file with CRLF line endings (if it is run on *nix, it _might_ need to be converted for *nix line endings). I have also made a few other minor changes to improve the scripts termination behaviour. Please report any issues here. Brian 
20101202, 14:51  #8 
Nov 2010
3^{2} Posts 
R.D.  I certainly don't want to waste your time or patience, so I'll treat it more as a black box for now, brush up on my background and read up about the Quadratic Sieve for now. I think I understand enough right now to go through factoring larger numbers, and benchmarking each step should give me an idea going forward if things are taking more or less time than expected, and if I should start experimenting with other options.
Brian  I was running v0.74 on a Linux box, under Python 2.6.5  I run gentoo, so everything, including python, is compiled from source. I'll take a look at the newest version and hopefully time will permit me to run through the whole process again today and see what might have gone awry. 
20101202, 15:15  #9  
Aug 2006
3·1,993 Posts 
Quote:


20101202, 15:40  #10  
Nov 2003
1110100100100_{2} Posts 
Quote:
But you can't ask any meaningful questions until you have at least minimal understanding of how the algorithms work. Even something simple as a reply to your question about the meaning of Y0 and Y1 in the inputs will not make much sense unless you have some understanding of the algorithm itself. Similarly for other parametric inputs. There are some simplified explanations that you can find for both QS and NFS on the web. I will give you one of my favorite mantras: "Google is my friend" What is your CS background? Have you had a course in algorithms? You say that you have taken Linear Algebra. Have you taken any other math courses? 

20101202, 16:01  #11  
Nov 2010
3^{2} Posts 
Quote:
Quote:
I'm not scared to start at QS, and work my way backwards googling and reading, ingesting each building block as I come across it  it's just going to take some time. Last fiddled with by tal on 20101202 at 16:04 Reason: clarification 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Semiprime and nalmost prime candidate for the k's with algebra for the Sierpinski/Riesel problem  sweety439  sweety439  11  20200923 01:42 
Generating a uniform random nbit semiprime  danaj  Programming  17  20170915 16:41 
Factored vs. Completely factored  aketilander  Factoring  4  20120808 18:09 
2,709+ factored  JHansen  Factoring  47  20050629 17:59 
RSA200 factored  xilman  Factoring  23  20050602 03:24 