Factorization of "big" numbers #159

s-celles · 2024-12-11T10:53:40Z

Hello,

I'm trying to factor a quite big number 632459103267572196107100983820469021721602147490918660274601
I tried

using Primes
factor(632459103267572196107100983820469021721602147490918660274601)

but its hanging

https://factordb.com/ https://www.alpertron.com.ar/ECM.HTM or https://sagecell.sagemath.org/ return result in a few second.

How to proceed to achieve such quick proceeding using Julia.

Thanks

PS: seems to be duplicate of #49

oscardssmith · 2024-12-11T14:23:55Z

Can you try with #104? I think it should be a good bit faster, but I never finished it up.

s-celles · 2024-12-11T15:45:33Z

Is it possible to proceed directly with package manager mode?

(@v1.11) pkg> dev https://github.com/JuliaMath/Primes.jl/pull/104
     Cloning git-repo `https://github.com/JuliaMath/Primes.jl/pull/104`
ERROR: failed to clone from https://github.com/JuliaMath/Primes.jl/pull/104, error: GitError(Code:ERROR, Class:HTTP, request failed with status code: 404)

(@v1.11) pkg> dev https://github.com/oscardssmith/Primes.jl/tree/Lenstra-elliptic-curve
ERROR: rev argument not supported by `develop`; consider using `add` instead

oscardssmith · 2024-12-11T16:44:49Z

you need ]add https://github.com/oscardssmith/Primes.jl.git#Lenstra-elliptic-curve

oscardssmith · 2024-12-11T16:47:08Z

ah. Looks like the branch isn't currently working. that's somewhat unfortunate.

oscardssmith · 2024-12-11T18:47:00Z

The branch is now working. However, ECM isn't up to the task here Looks like to solve this quickly we would need a quadratic sieve implementation. (for comparison, your number is has both factors at roughly 2^100)

julia> @time factor(nextprime(big(2)^62)*nextprime(big(2)^102, 2))
  8.508423 seconds (107.79 M allocations: 2.661 GiB, 11.26% gc time)
4611686018427388039 * 5070602400912917605986812821829

julia> @time factor(nextprime(big(2)^72)*nextprime(big(2)^102, 2))
 61.739692 seconds (788.39 M allocations: 19.518 GiB, 11.07% gc time)
4722366482869645213711 * 5070602400912917605986812821829

s-celles · 2024-12-12T07:59:12Z

It's hanging.
I wonder if we shouldn't have a factor function which is using multiple dispatch according to factorization method that user wants to use. I think it will make the code clearer.

Here is an example (LLM code)

using Primes
using Combinatorics

# Abstract type for different factorization strategies
abstract type FactorizationMethod end

# Concrete types for each factorization method
struct TrialDivision <: FactorizationMethod end
struct PollardRho <: FactorizationMethod end
struct PollardPMinusOne <: FactorizationMethod end
struct FermatMethod <: FactorizationMethod end
struct ECM <: FactorizationMethod end
struct QuadraticSieve <: FactorizationMethod end

# ECPoint struct at top level
mutable struct ECPoint
    x::BigInt
    y::BigInt
    a::BigInt
end

# Default method using Trial Division
function factor(n::Integer)
    return factor(TrialDivision(), n)
end

# Trial Division method
function factor(::TrialDivision, n::Integer)
    factors = Pair{Int,Int}[]
    n <= 1 && return factors
    
    count = 0
    while iseven(n)
        count += 1
        n = div(n, 2)
    end
    count > 0 && push!(factors, 2 => count)
    
    for d in 3:2:isqrt(n)
        count = 0
        while n % d == 0
            count += 1
            n = div(n, d)
        end
        count > 0 && push!(factors, d => count)
    end
    
    n > 1 && push!(factors, n => 1)
    return factors
end

# Pollard's Rho method
function factor(::PollardRho, n::Integer)
    function g(x, n)
        return (powermod(x, 2, n) + 1) % n
    end
    
    function find_factor(n)
        if n == 1 return 1 end
        if isprime(n) return n end
        
        x, y, d = 2, 2, 1
        while d == 1
            x = g(x, n)
            y = g(g(y, n), n)
            d = gcd(abs(x - y), n)
        end
        return d
    end
    
    factors = Pair{Int,Int}[]
    
    function find_all_factors(n)
        if n == 1 return end
        if isprime(n)
            # Update count for this prime factor
            idx = findfirst(p -> p.first == n, factors)
            if idx === nothing
                push!(factors, n => 1)
            else
                factors[idx] = n => (factors[idx].second + 1)
            end
            return
        end
        
        d = find_factor(n)
        if d == n  # If failed to find factor, try trial division
            d = minimum(p for p in 2:isqrt(n) if n % p == 0)
        end
        
        find_all_factors(d)
        find_all_factors(div(n, d))
    end
    
    find_all_factors(n)
    return sort(factors)
end

# Pollard's P-1 method
function factor(::PollardPMinusOne, n::Integer, B::Integer=1000)
    factors = Pair{Int,Int}[]
    
    function find_factor(n, B)
        a = BigInt(2)
        for j in 2:B
            a = powermod(a, j, n)
        end
        return gcd(a - 1, n)
    end
    
    function find_all_factors(n, B)
        if n == 1 return end
        if isprime(n)
            idx = findfirst(p -> p.first == n, factors)
            if idx === nothing
                push!(factors, n => 1)
            else
                factors[idx] = n => (factors[idx].second + 1)
            end
            return
        end
        
        d = find_factor(n, B)
        if d == 1 || d == n
            # If method fails, fall back to trial division
            d = minimum(p for p in 2:isqrt(n) if n % p == 0)
        end
        
        find_all_factors(d, B)
        find_all_factors(div(n, d), B)
    end
    
    find_all_factors(n, B)
    return sort(factors)
end

# Fermat method
function factor(::FermatMethod, n::Integer)
    factors = Pair{Int,Int}[]
    iseven(n) && return factor(TrialDivision(), n)
    
    a = ceil(Int, sqrt(n))
    b2 = a^2 - n
    
    # Fixed the condition: removed incorrect use of !
    while !(isqrt(b2)^2 == b2)
        a += 1
        b2 = a^2 - n
    end
    
    b = isqrt(b2)
    p = a - b
    q = a + b
    
    if p != 1 && q != 1
        push!(factors, p => 1)
        p != q && push!(factors, q => 1)
    end
    
    return sort(factors)
end


# ECM helper functions
function ec_add(P::ECPoint, Q::ECPoint, n)
    if P.x == Q.x && P.y == Q.y
        if P.y == 0
            return ECPoint(BigInt(0), BigInt(1), P.a)
        end
        λ = (3 * P.x^2 + P.a) * invmod(2 * P.y, n) % n
    else
        λ = (Q.y - P.y) * invmod(Q.x - P.x, n) % n
    end
    
    x3 = (λ^2 - P.x - Q.x) % n
    y3 = (λ * (P.x - x3) - P.y) % n
    
    return ECPoint(x3, y3, P.a)
end

function ec_multiply(k::Integer, P::ECPoint, n)
    R = ECPoint(P.x, P.y, P.a)
    Q = ECPoint(BigInt(0), BigInt(1), P.a)
    
    while k > 0
        if k & 1 == 1
            Q = ec_add(Q, R, n)
        end
        R = ec_add(R, R, n)
        k >>= 1
    end
    
    return Q
end

# ECM method
"""
    factor(::ECM, n::Union{Integer,BigInt}, B1::Integer=100, B2::Integer=1000)

Elliptic Curve Method (ECM) for integer factorization with BigInt support.
"""
function factor(::ECM, n::Union{Integer,BigInt}, B1::Integer=100, B2::Integer=1000)
    factors = Pair{BigInt,Int}[]
    if n <= 1 return factors end
    if isprime(n)
        push!(factors, BigInt(n) => 1)
        return factors
    end
    
    function find_factor(n::BigInt, B1::Integer, B2::Integer)
        if n <= 1 return BigInt(1) end
        if isprime(n) return n end
        
        # Convert bounds to Int64 safely
        B1_safe = Int64(min(B1, typemax(Int32)))
        B2_safe = Int64(min(B2, typemax(Int32)))
        
        for curve in 1:20
            try
                # Random starting point and curve parameter
                x = BigInt(rand(1:n-1))
                y = BigInt(rand(1:n-1))
                a = BigInt(rand(1:n-1))
                P = ECPoint(x, y, a)
                
                # Stage 1
                Q = P
                for p in primes(B1_safe)
                    e = floor(Int, log(B1) / log(p))
                    Q = ec_multiply(p^e, Q, n)
                end
                
                # Stage 2
                for p in primes(B1_safe, B2_safe)
                    try
                        Q = ec_multiply(p, Q, n)
                    catch e
                        if isa(e, DomainError)
                            d = gcd(e.val, n)
                            if 1 < d < n
                                return d
                            end
                        end
                    end
                end
            catch
                continue
            end
        end
        
        # If ECM fails, fall back to finding a small factor
        small_limit = Int64(min(isqrt(n), 10^6))
        for p in primes(small_limit)
            if n % p == 0
                return BigInt(p)
            end
        end
        return n
    end
    
    function find_all_factors(n::BigInt)
        if n == 1 return end
        if isprime(n)
            idx = findfirst(p -> p.first == n, factors)
            if idx === nothing
                push!(factors, n => 1)
            else
                factors[idx] = n => (factors[idx].second + 1)
            end
            return
        end
        
        d = find_factor(n, B1, B2)
        if d == n  # If failed to find a proper factor
            # Try increasing bounds
            d = find_factor(n, B1 * 2, B2 * 2)
            if d == n  # If still failed, try larger bounds
                d = find_factor(n, B1 * 10, B2 * 10)
            end
        end
        
        find_all_factors(d)
        find_all_factors(div(n, d))
    end
    
    find_all_factors(BigInt(n))
    return sort(factors)
end

# Quadratic Sieve method
"""
    factor(::QuadraticSieve, n::Union{Integer,BigInt})

Improved Quadratic Sieve implementation with BigInt support.
"""
function factor(::QuadraticSieve, n::Union{Integer,BigInt})
    n = BigInt(n)  # Ensure n is BigInt
    factors = Pair{BigInt,Int}[]
    
    # For small numbers, use trial division
    if n < 10^6
        trial_factors = factor(TrialDivision(), n)
        return [BigInt(p) => e for (p, e) in trial_factors]
    end
    
    # For prime numbers, return immediately
    if isprime(n)
        return [n => 1]
    end
    
    # Calculate factor base size and limit
    digits = ndigits(n)
    base_size = min(digits * 2, 50)
    
    # Safe limit calculation for large numbers
    log_n = log(BigFloat(n))
    log_log_n = log(log_n)
    limit_big = exp(sqrt(log_n * log_log_n))
    
    # Convert to Int64 safely
    limit = Int64(min(
        floor(limit_big),
        typemax(Int32),  # Safe upper bound for primes()
        isqrt(n)
    ))
    
    # Generate factor base with safe integer handling
    factor_base = Int64[]
    for p in primes(limit)
        if powermod(n, (p-1)÷2, p) == 1
            push!(factor_base, p)
            length(factor_base) >= base_size && break
        end
    end
    
    # Find relations with BigInt arithmetic
    relations = Vector{Tuple{BigInt,Vector{Int}}}()
    x = isqrt(n)
    num_relations = length(factor_base) + 5
    max_tries = 1000
    tries = 0
    
    while length(relations) < num_relations && tries < max_tries
        tries += 1
        x += 1
        q = x^2 - n
        q < 0 && continue
        
        temp_q = abs(q)
        exponents = zeros(Int, length(factor_base))
        fully_factored = true
        
        for (i, p) in enumerate(factor_base)
            while temp_q % p == 0
                exponents[i] += 1
                temp_q = div(temp_q, p)
            end
            if temp_q == 1
                break
            end
        end
        
        if temp_q == 1
            push!(relations, (x, exponents))
        end
    end
    
    # If not enough relations found, try ECM with larger bounds
    if length(relations) < num_relations
        @warn "Not enough relations found, switching to ECM with large bounds"
        return factor(ECM(), n, 10^4, 10^5)
    end
    
    # Try to find factors using combinations
    max_subset_size = min(length(relations), 10)
    
    for size in 1:max_subset_size
        for subset in combinations(1:length(relations), size)
            combined_exponents = zeros(Int, length(factor_base))
            x_prod = BigInt(1)
            
            for i in subset
                x_prod = (x_prod * relations[i][1]) % n
                combined_exponents .+= relations[i][2]
            end
            
            if all(iseven, combined_exponents)
                y = BigInt(1)
                for (i, e) in enumerate(combined_exponents)
                    y = (y * powermod(BigInt(factor_base[i]), e ÷ 2, n)) % n
                end
                
                d = gcd(x_prod - y, n)
                if 1 < d < n
                    count = 0
                    temp_n = n
                    while temp_n % d == 0
                        count += 1
                        temp_n = div(temp_n, d)
                    end
                    push!(factors, d => count)
                    
                    if temp_n > 1
                        remaining_factors = factor(QuadraticSieve(), temp_n)
                        append!(factors, remaining_factors)
                    end
                    return sort(factors)
                end
            end
        end
    end
    
    # If no factors found, try ECM with larger bounds
    @warn "No factors found, switching to ECM with large bounds"
    return factor(ECM(), n, 10^4, 10^5)
end

# Testing function
function test_methods(n::Integer, methods::Vector{FactorizationMethod})
    println("Number to factorize: ", n)
        
    for method in methods
        println("\nMethod: ", typeof(method))
        try
            @time result = factor(method, n)
            println("Factors found: ", result)
            
            # Verification
            product = prod([p^e for (p, e) in result])
            println("Verification: ", product == n ? "correct" : "incorrect")
        catch e
            println("Error: ", e)
            println("Backtrace:")
            for (exc, bt) in Base.catch_stack()
                showerror(stdout, exc, bt)
                println()
            end
        end
    end
end

# Main function
function main()
    n = 2021
    println("\nTesting with n = ", n)
    methods = [
        TrialDivision(),
        PollardRho(),
        PollardPMinusOne(),
        FermatMethod(),
        ECM(),
        QuadraticSieve()
    ]
    test_methods(n, methods)

    # n = 1234567891  # Large prime number (FermatMethod is failing)
    # methods = [
    #     TrialDivision(),
    #     PollardRho(),
    #     PollardPMinusOne(),
    #     FermatMethod(),
    #     ECM(),
    #     QuadraticSieve()
    # ]
    # test_methods(n, methods)

    n = 632459103267572196107100983820469021721602147490918660274601
    methods = FactorizationMethod[
        QuadraticSieve()
    ]
    test_methods(n, methods)

end

main()

Unfortunately this code doesn't help factorizing 632459103267572196107100983820469021721602147490918660274601 in a decent duration

Maybe it could be worth looking at SageMath code to see how they proceed so quickly?

oscardssmith · 2024-12-12T14:10:55Z

the primary reason this doesn't help is that the QuadraticSieve it implements isn't actually a Quadratic Sieve implimentation. Specifically, it seems to be trying to do the gausian elimination by trial and error rather than anything sensible.

oscardssmith · 2024-12-13T06:22:30Z

unfortunately, I've now been nerd snipped into implementing a quadratic sieve... By now I have a very mediocre version of Dixon's method, and tomorrow evening I'll try to implement the sieving to make it fast.

s-celles · 2024-12-13T07:42:54Z

What is your opinion about API implementation which make use of multiple dispatch ie factor accepting 2 arguments : one for integer number to factor and an other one about factorization method?

Did you looked how SageMath is implementing factor?
https://github.com/search?q=repo%3Asagemath%2Fsage%20factor&type=code

oscardssmith · 2024-12-13T13:08:04Z

I don't think a dispatch API makes much sense here. we know what the right algorithms are, we just have to implement them.

oscardssmith · 2024-12-15T06:22:43Z

#161 is a first draft. It definitely could use some cleanup, but it's already pretty darn fast for semi-primes (not yet at the level of the well optimized sieves, but way way way better than lenstra for this sort of case.

s-celles · 2024-12-16T08:05:02Z

Hi @oscardssmith

I run out of memory

(@v1.11) pkg> add https://github.com/oscardssmith/Primes.jl.git#os/quadratic-sieve
    Updating git-repo `https://github.com/oscardssmith/Primes.jl.git`
    Updating registry at `C:\Users\scelles\.julia\registries\General.toml`
(...)
   Resolving package versions...
    Updating `C:\Users\scelles\.julia\environments\v1.11\Project.toml`
  [27ebfcd6] + Primes v0.5.6 `https://github.com/oscardssmith/Primes.jl.git#os/quadratic-sieve`
    Updating `C:\Users\scelles\.julia\environments\v1.11\Manifest.toml`
  [18e54dd8] + IntegerMathUtils v0.1.2
  [27ebfcd6] + Primes v0.5.6 `https://github.com/oscardssmith/Primes.jl.git#os/quadratic-sieve`
Precompiling project...
  9 dependencies successfully precompiled in 120 seconds. 446 already precompiled.

julia> using Primes

julia> factor(632459103267572196107100983820469021721602147490918660274601)
ERROR: OutOfMemoryError()

Quadratic Sieve implementation in CrypTool 2 (C#) returns result in less than a second.

Code is available at
https://github.com/CrypToolProject/CrypTool-2/tree/main/CrypPlugins/QuadraticSieve

Hope that can help!

Best regards

s-celles changed the title ~~Factorisation of "big" numbers~~ Factorization of "big" numbers Dec 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Factorization of "big" numbers #159

Factorization of "big" numbers #159

s-celles commented Dec 11, 2024 •

edited

Loading

oscardssmith commented Dec 11, 2024

s-celles commented Dec 11, 2024

oscardssmith commented Dec 11, 2024

oscardssmith commented Dec 11, 2024

oscardssmith commented Dec 11, 2024

s-celles commented Dec 12, 2024

oscardssmith commented Dec 12, 2024

oscardssmith commented Dec 13, 2024

s-celles commented Dec 13, 2024

oscardssmith commented Dec 13, 2024

oscardssmith commented Dec 15, 2024

s-celles commented Dec 16, 2024

Factorization of "big" numbers #159

Factorization of "big" numbers #159

Comments

s-celles commented Dec 11, 2024 • edited Loading

oscardssmith commented Dec 11, 2024

s-celles commented Dec 11, 2024

oscardssmith commented Dec 11, 2024

oscardssmith commented Dec 11, 2024

oscardssmith commented Dec 11, 2024

s-celles commented Dec 12, 2024

oscardssmith commented Dec 12, 2024

oscardssmith commented Dec 13, 2024

s-celles commented Dec 13, 2024

oscardssmith commented Dec 13, 2024

oscardssmith commented Dec 15, 2024

s-celles commented Dec 16, 2024

s-celles commented Dec 11, 2024 •

edited

Loading