Mersenne Digest Tuesday, 12 May 1998 Volume 01 : Number 358 ---------------------------------------------------------------------- From: GivenRandy Date: Sun, 10 May 1998 09:29:49 EDT Subject: Mersenne: Performance Improvements? > Its heavily hand tuned heavily pipelined pentium floating point assembler > language. The "C" code in Prime95 is purely user interface, setup, etc. > The entire core is assembler. A while ago, I spent a couple days trying to speed up the assembly code, but did not come up with anything better. Hats off to the programmers! Randy Given GivenRandy@aol.com http://members.aol.com/GivenRandy public key at http://members.aol.com/GivenRandy/pgpkey.asc ------------------------------ From: mikus@bga.com (Mikus Grinbergs) Date: Sun, 10 May 1998 08:41:25 -0500 Subject: Re: Mersenne: program The current AMD K6 does not have a FPU pipeline. Further, it takes a full two clock cycles to execute an FXCH instruction. The result is that the K6 is twice as slow as a plain Pentium (of the same clock speed) while running the prime program. The new K6-2 (formerly called the K6-3D) has a new, fast, 32-bit (i.e., SHORT mantissa) floating point multiply instruction. I was wondering - is there a way to do the prime algorithms using 32-bit floating point multiplies (in other words, if there were a version of prime coded specifically to take advantage of the K6-2, could it run noticeably faster on the K6-2 than the current prime version) ? mikus (using the old K6) ------------------------------ From: Will Edgington Date: Sun, 10 May 1998 11:53:17 -0700 Subject: Re: Mersenne: program Mikus Grinbergs writes: The current AMD K6 does not have a FPU pipeline. Further, it takes a full two clock cycles to execute an FXCH instruction. The result is that the K6 is twice as slow as a plain Pentium (of the same clock speed) while running the prime program. The new K6-2 (formerly called the K6-3D) has a new, fast, 32-bit (i.e., SHORT mantissa) floating point multiply instruction. I was wondering - is there a way to do the prime algorithms using 32-bit floating point multiplies (in other words, if there were a version of prime coded specifically to take advantage of the K6-2, could it run noticeably faster on the K6-2 than the current prime version) ? Using a 32 bit mantissa (note: _not_ an IEEE float, which puts the exponent, its sign, _and_ the mantissa within 32 bits) would reduce the FFT word length to about half the current length within a 64 bit mantissa for the Intel CPUs, meaning the FFT run length would have to be doubled. Since the CPU time required goes up as the run length times its logarithm, the increase in CPU time would be more than double and staying with the FXCH instruction is actually a win, based on the above. Right? Will ------------------------------ From: STL137 Date: Sun, 10 May 1998 15:22:22 EDT Subject: Mersenne: Re: Mersenne Digest V1 #357 Hey all. << Hey if some one sends me the algorithm and some basic flow structure guidelines for the things used in prime95 will try and create new faster program. dont expect any results any time soon. will be using different language this is just something for me to try but i dont have the info(algorithms, etc). >> Um, whoever wrote this: "Different language" makes me think of "same computer". If you really want to code a LL test for an Intel computer, I would say: forget it. First of all, you don't know the algorithm real well. Secondly, you'd have to communicate with the PrimeNet server. Thirdly, I believe you cannot create anything faster/better. Mr. Woltman has been working on this for something around 5 years (is this right?), and tweaking it all the way. It is extremely efficient and purrs like a kitten. I wouldn't try making a new Prime95 just like I would never bother creating an operating system. But hey, if you have any speed suggestions, send them to Mr. Woltman. (he says his code for exponents >10 million is slow) STL137 ------------------------------ From: "Scott Kurowski" Date: Sun, 10 May 1998 13:58:04 -0700 Subject: RE: Mersenne: Prime95 version 16.2 --> PrimeNet error 87 [John Simmons ] > I tried running [the v16] version and was getting Error 87 I will update the PrimeNet FAQ page to reflect more v16 issues when George releases it. Chuck, John P. and David did a good job identifying the key checks to make: (a) new v16 versions of the primenet protocol DLLs (b) ensuring the entropia.com domain is available - when I change software or fix things I shut off the HTTP server (c) clearing outstanding transactions before upgrading - always very important (d) resetting the proxy password and mask flag if upgrading from before the 15 March v15.x HTTP DLL. John, thanks for your observation about (d). A change I made in March to handle less robust proxy servers is the source of this, and I didn't catch the PW impact. For 15.4 clients, there's a 23 April patch to the HTTP PrimeNet.dll for error 87 when sending update completion dates, at http://entropia.com/ips/HttpNet1.zip, where (d) also applies. Regards, - -scott ------------------------------ From: Nick Craig-Wood Date: Mon, 11 May 1998 00:09:43 +0100 (BST) Subject: Re: Mersenne: program Will Edgington wrote: > Using a 32 bit mantissa (note: _not_ an IEEE float, which puts the > exponent, its sign, _and_ the mantissa within 32 bits) would reduce the > FFT word length to about half the current length within a 64 bit mantissa > for the Intel CPUs, meaning the FFT run length would have to be doubled. I don't think thats quite right. The current implementation stores about 17 bits of number per double. For the result you need 2 * 17 bits (because you are squaring) + 18 bits carry (one for each stage of the 2^18 = 256k length transform) which is 52 bits. That's how many bits you can fit in a double (which has exponent and sign in the 64 bits as well) plus some round off bits. For a 32 bit mantissa you would need a transform length of 2^20 and you would be putting 6 bits of number in each 32 bit mantissa. I think you would need a few bits for round off errors which would push your transform length up to 2^22 or more making it totally impractical. - -- Nick Craig-Wood ncw1@axis.demon.co.uk http://www.axis.demon.co.uk/ ------------------------------ From: Michiel van Loon Date: Mon, 11 May 1998 06:33:36 +0200 Subject: Mersenne: PRIMEOS2 3.0.3 now supports SOCKS Today I have uploaded version 3.0.3 of PRIMEOS2 which will also run in a SOCKS environment. Thanks to Brian Jongekryg, who investigated the problem. One of the consequences is that the EMX environment is no longer needed. The new version can be found at: http://www.xs4all.nl/~mfvl/prime/ Regards, Michiel van Loon ------------------------------ From: Jean-Luc Cooke Date: Mon, 11 May 1998 12:30:38 -0400 Subject: Mersenne: Why not stop early? This is a multi-part message in MIME format. - --------------A20F4DDEA826BE587B286EAF Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Eh there, I have a question that I'm sure is related to the lack of understanding of the LL test used in Prime95. I understand the LL test to check for 2^i + 1 as a factor of 2^p - 1 for all 0 <= i <= p. If one fails, then 2^p - 1 is not prime. Then why is it that Prime 95 always seems to go through all i's? TTYL JLC - -- Jean-Luc Cooke Carleton University, Electrical Engineering There is a 90% chance that right at this moment I am doing Calculus. God damn Newton, damn him to hell! - ---------------------------------------------------------------------- Hyperactivity Unbounded http://www.engsoc.carleton.ca/~jlcooke - ---------------------------------------------------------------------- AOL-IM ID: JLinOTTAWA http://register.netscape.oscar.aol.com - ---------------------------------------------------------------------- ICQ# 4474419 http://www.mirabilis.com - ---------------------------------------------------------------------- Email addresses jlcooke@ottawa.com jlcooke@magma.ca jlcooke@magmacom.com jlcooke@engsoc.carleton.ca jlcooke@chat.carleton.ca - ---------------------------------------------------------------------- - --------------A20F4DDEA826BE587B286EAF Content-Type: text/html; charset=us-ascii; name="ICQemail.html" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ICQemail.html" Content-Base: "file:///D|/WINNT/Profiles/Administrato r/Desktop/ICQemail.html" JL is... - --------------A20F4DDEA826BE587B286EAF Content-Type: text/x-vcard; charset=us-ascii; name="vcard.vcf" Content-Transfer-Encoding: 7bit Content-Description: Card for Jean-Luc Cooke Content-Disposition: attachment; filename="vcard.vcf" begin: vcard fn: Jean-Luc Cooke n: Cooke;Jean-Luc org: Carleton University adr: 270A Dalehurst Dr.;;;Nepean;ON;K2G 4M8;Canada email;internet: jlcooke@ottawa.com title: Electrical Engineering Student tel;work: na tel;fax: na tel;home: na note: My web page address is: www.engsoc.carleton.ca/~jlcooke/ x-mozilla-cpt: ;0 x-mozilla-html: TRUE version: 2.1 end: vcard - --------------A20F4DDEA826BE587B286EAF-- ------------------------------ From: George Woltman Date: Mon, 11 May 1998 14:11:39 -0400 Subject: Re: Mersenne: Why not stop early? Hi, At 12:30 PM 5/11/98 -0400, Jean-Luc Cooke wrote: >Eh there, > >I have a question that I'm sure is related to the lack of understanding >of the LL test used in Prime95. > >I understand the LL test to check for 2^i + 1 as a factor of 2^p - 1 for >all 0 <= i <= p. If one fails, then 2^p - 1 is not prime. Then why is >it that Prime 95 always seems to go through all i's? The LL test does not test for factors, rather it is a primality test that must run though P iterations to determine if 2^P-1 is prime or composite. The Lucas-Lehmer test is described in most books on Number Theory. Best regards, George ------------------------------ From: Susanne Schulz Date: Mon, 11 May 1998 20:54:09 +0200 (MET DST) Subject: Mersenne: Re: Mersenne Digest V1 #357 Dear everyone, I hope this mail makes it through, the last one didn't. I just wish to thank everyone who has helped me to understand prime numbers and mathematics a bit better, Your views have not only been highly instructive, but very inspiring, too. I will get back to those who replied within the next days...to ask some more questions :) I also wish to thank Joel Smith for his help with the e-mails, and the support. Furthermore I wish You all good luck in the search for the next record prime, with kind regards, Susanne Schulz ------------------------------ From: Will Edgington Date: Mon, 11 May 1998 12:22:55 -0700 Subject: Mersenne: Why not stop early? Jean-Luc Cooke writes: I understand the LL test to check for 2^i + 1 as a factor of 2^p - 1 for all 0 <= i <= p. If one fails, then 2^p - 1 is not prime. Then why is it that Prime 95 always seems to go through all i's? Because the LL test does not test for factors; it only tests whether 2^p - 1 is prime or not and gives no information at all until it completes. Since the LL test takes a long time to run, though, George Woltman's prime95 and related programs do some trial factoring first, to see if there's a small factor. The amount of trial factoring done is related to the time it takes to do the LL test, the time it takes to do the trial factoring, and the chance of finding a small factor. Will ------------------------------ From: SpoolDog Date: Mon, 11 May 1998 15:54:56 EDT Subject: Re: Mersenne: program(alpha) In a message dated 98-05-11 12:55:18 EDT, you write: << SpoolDog wrote: > > Hey if some one sends me the algorithm and some basic flow structure > guidelines for the things used in prime95 > > will try and create new faster program. > dont expect any results any time soon. will be using different language > > this is just something for me to try but i dont have the info(algorithms, > etc). Something that we _need_ but don't have is a port of Woltman's optimized FFTs to Alpha machine language instead of 386/pentium machine language The floating point units use a different length data buffer, so some tinkering will be required beyond an opcode-to-opcode porting >> What files on the source.zip collection is this? ------------------------------ From: SpoolDog Date: Mon, 11 May 1998 20:35:56 EDT Subject: Mersenne: the 4-point star hey say you have a star and label all the points on it all the points 4 in a row have to equal up to 22 you have to use the numbers 1-10 once to make all rows equal 22 * * * * * bad sketch but you can draw your own. * * label each * say with a letter a,b,c,d,e... * * * i cannot get a result attached is a BASIC program i made i need a review of this(its a simple program witha ton of if then crap) have not found any better way of doing it (meaing no better results) going to put it into faster language. ------------------------------ From: Will Edgington Date: Mon, 11 May 1998 17:50:44 -0700 Subject: Re: Mersenne: program(alpha) SpoolDog writes: Something that we _need_ but don't have is a port of Woltman's optimized FFTs to Alpha machine language instead of 386/pentium machine language The floating point units use a different length data buffer, so some tinkering will be required beyond an opcode-to-opcode porting >> What files on the source.zip collection is this? First, note that there is already a quite fast program for Alphas that Ernst Mayer wrote; you can get it from his web pages, including the Fortran90 source code, the binaries, and the Fortran90 run time libraries. Also, there is a fairly fast C program in the mers package called MacLucasUNIX and a file called Wisdom with some notes about it. Both are linked from my mersenne page. If you look at those and still feel you can write a faster program, go for it, but the Fortran90 and C sources and the Wisdom file will likely still be useful to you. Will http://www.garlic.com/~wedgingt/mersenne.html ------------------------------ From: SpoolDog Date: Mon, 11 May 1998 21:04:57 EDT Subject: Mersenne: delete my last email my fault ------------------------------ From: Will Edgington Date: Mon, 11 May 1998 18:13:35 -0700 Subject: Re: Mersenne: program Nick Craig-Wood writes: Will Edgington wrote: > Using a 32 bit mantissa (note: _not_ an IEEE float, which puts the > exponent, its sign, _and_ the mantissa within 32 bits) would reduce the > FFT word length to about half the current length within a 64 bit mantissa > for the Intel CPUs, meaning the FFT run length would have to be doubled. I don't think thats quite right. Yes and no; I was purposely simplifying/erring toward 32 bit mantissas since I knew the comparison would still favor the 64 bit instructions. The current implementation stores about 17 bits of number per double. For the result you need 2 * 17 bits (because you are squaring) + 18 bits carry (one for each stage of the 2^18 = 256k length transform) which is 52 bits. That's how many bits you can fit in a double (which has exponent and sign in the 64 bits as well) plus some round off bits. In a 64 bit (IEEE) double, yes, but Intel CPUs have a larger floating point type that has 64 bits of mantissa within a 96 bit "long double". I'm pretty sure that George Woltman's programs use this long double type; I know that the mers package LL testers use it when they can. For a 32 bit mantissa you would need a transform length of 2^20 and you would be putting 6 bits of number in each 32 bit mantissa. I think you would need a few bits for round off errors which would push your transform length up to 2^22 or more making it totally impractical. I guessed that there was probably an effect like this, but didn't know for sure. Will ------------------------------ From: JGrif91665 Date: Mon, 11 May 1998 21:18:27 EDT Subject: Mersenne: No more mail Please, do not send any more e-mail to parmstrong or gdennis @ntcc.cc.tx.us Thank you James ------------------------------ From: "John R Pierce" Date: Mon, 11 May 1998 20:01:12 -0700 Subject: Re: Mersenne: program >In a 64 bit (IEEE) double, yes, but Intel CPUs have a larger floating >point type that has 64 bits of mantissa within a 96 bit "long double". >I'm pretty sure that George Woltman's programs use this long double >type; I know that the mers package LL testers use it when they can. Actually, its 80 bits for the long double format aka FP temp. 64 bits of mantissa, and 16 bits of characteristic. The catch is the loads and stores of these are fairly expensive as they don't align on cache boundaries. ALL x86 FPU calculations are actually performed on long doubles, its just that the normal FPLOAD and FPSTORE instructions convert these to the standard IEEE double format of 48 bit mantissa with 16 bit exponents, effectively treating the extra bits as guard digits. - -jrp ------------------------------ From: Brian Cranton Date: Tue, 12 May 1998 00:59:38 -0400 Subject: RE: Mersenne: the 4-point star This post seems a bit off topic, but a tad interesting. The way I approach these things is to first make sure there isn't a simple way to prove it can't be done, and then simply it so you can limit the brute force search. In a way that's a bit related to Mersenne primes (I guess) since one of the reasons for restricting the search to Mersenne primes is they have special properties which makes for easier primality tests than your normal prime numbers. In GIMPS, we have the "first pass" technique to try and eliminate numbers from the search right away, and then if that fails we do a more in-depth search that is relatively quick because we chose only to search the set of possible Mersenne primes, which are easier to test for primality than your run of the mill whole number. If the star is represented as: ........A........ ..B...C...D...E.. ....F.......G.... ........H........ .I............J.. You can mathematically represent the conditions of your problem as: (1) A + C + F + I = 22 (2) A + D + G + J = 22 (3) B + C + D + E = 22 (4) B + F + H + J = 22 (5) E + G + H + I = 22 Now add those five conditions together to get: (6) 2A+2B+2C+2D+2E+2F+2G+2H+2I+2J=110 Note the significance here in that each numbers appears in two and only two of the five "condition" constraints (equations 1 to 5). The above sum, equation 6, can be simplified to: (7) A+B+C+D+E+F+G+H+I+J=55 Now, on the surface this seems okay, since the sum of the numbers between one and ten is indeed 55. To go back to the original conditions, there are only 17 combinations of numbers between one and ten which will produce a sum of 22: (8) 10 9 2 1 (9) 10 8 3 1 (10) 10 7 3 2 (11) 10 7 4 1 (12) 10 6 5 1 (13) 10 6 4 2 (14) 10 5 4 3 (15) 9 8 4 1 (16) 9 8 3 2 (17) 9 7 5 1 (18) 9 7 4 2 (19) 9 6 5 2 (20) 9 6 4 3 (21) 8 7 6 1 (22) 8 7 5 2 (23) 8 7 4 3 (24) 7 6 5 4 So to solve your problem, we need to pick five of the above 17 which will satisfy the constraint conditions outlined in equations 1 to 5. In addition, you can reduce the problem by realizing that each number can appear only appear twice so, for example, of the solutions 8 to 14 above only two of them are in the final solution while three of the line solutions from 15 to 24 must be in the final solution. I get this by reasoning that the number 10 can only appear twice in the final solution. You can keep working this logic out further and further to keep restricting the set of valid solutions. Now, I wrote a C program to do this brute force and I went through it logically and my conclusion is I can't find a valid solution. So I think your BASIC program may have been giving you the correct response when it told you that the solution set for this problem is indeed NULL. Brian Cranton bcranton@ultranet.com - ---------- From: SpoolDog[SMTP:SpoolDog@aol.com] Sent: Monday, May 11, 1998 8:35 PM To: mersenne@base.com Subject: Mersenne: the 4-point star hey say you have a star and label all the points on it all the points 4 in a row have to equal up to 22 you have to use the numbers 1-10 once to make all rows equal 22 * * * * * bad sketch but you can draw your own. * * label each * say with a letter a,b,c,d,e... * * * i cannot get a result attached is a BASIC program i made i need a review of this(its a simple program witha ton of if then crap) have not found any better way of doing it (meaing no better results) going to put it into faster language. ------------------------------ From: "Scott Kurowski" Date: Tue, 12 May 1998 00:25:04 -0700 Subject: Mersenne: PrimeNet Server 3.1 Factoring CPU Time Adjustment Hi everyone, Between 29 April 05:00 UTC, and 11 May 21:14 UTC, PrimeNet's factoring CPU time credit was low by about 90%. I made an error in the 3.1 changes. For each of 3806 factoring checkins during this window, an exact adjustment, about 9x the original result CPU time, has been added to affected user accounts. A list of the adjustments can be downloaded through 18 May from http://entropia.com/ips/factor_cpu_fix1.txt (250k). Regards, - -scott ------------------------------ End of Mersenne Digest V1 #358 ******************************