Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factorization of "big" numbers #159

Open
s-celles opened this issue Dec 11, 2024 · 12 comments
Open

Factorization of "big" numbers #159

s-celles opened this issue Dec 11, 2024 · 12 comments

Comments

@s-celles
Copy link

s-celles commented Dec 11, 2024

Hello,

I'm trying to factor a quite big number 632459103267572196107100983820469021721602147490918660274601
I tried

using Primes
factor(632459103267572196107100983820469021721602147490918660274601)

but its hanging

https://factordb.com/ https://www.alpertron.com.ar/ECM.HTM or https://sagecell.sagemath.org/ return result in a few second.

How to proceed to achieve such quick proceeding using Julia.

Thanks

PS: seems to be duplicate of #49

@oscardssmith
Copy link
Member

Can you try with #104? I think it should be a good bit faster, but I never finished it up.

@s-celles
Copy link
Author

Is it possible to proceed directly with package manager mode?

(@v1.11) pkg> dev https://github.com/JuliaMath/Primes.jl/pull/104
     Cloning git-repo `https://github.com/JuliaMath/Primes.jl/pull/104`
ERROR: failed to clone from https://github.com/JuliaMath/Primes.jl/pull/104, error: GitError(Code:ERROR, Class:HTTP, request failed with status code: 404)

(@v1.11) pkg> dev https://github.com/oscardssmith/Primes.jl/tree/Lenstra-elliptic-curve
ERROR: rev argument not supported by `develop`; consider using `add` instead

@oscardssmith
Copy link
Member

you need ]add https://github.com/oscardssmith/Primes.jl.git#Lenstra-elliptic-curve

@oscardssmith
Copy link
Member

ah. Looks like the branch isn't currently working. that's somewhat unfortunate.

@oscardssmith
Copy link
Member

The branch is now working. However, ECM isn't up to the task here Looks like to solve this quickly we would need a quadratic sieve implementation. (for comparison, your number is has both factors at roughly 2^100)

julia> @time factor(nextprime(big(2)^62)*nextprime(big(2)^102, 2))
  8.508423 seconds (107.79 M allocations: 2.661 GiB, 11.26% gc time)
4611686018427388039 * 5070602400912917605986812821829

julia> @time factor(nextprime(big(2)^72)*nextprime(big(2)^102, 2))
 61.739692 seconds (788.39 M allocations: 19.518 GiB, 11.07% gc time)
4722366482869645213711 * 5070602400912917605986812821829

@s-celles
Copy link
Author

It's hanging.
I wonder if we shouldn't have a factor function which is using multiple dispatch according to factorization method that user wants to use. I think it will make the code clearer.

Here is an example (LLM code)

using Primes
using Combinatorics

# Abstract type for different factorization strategies
abstract type FactorizationMethod end

# Concrete types for each factorization method
struct TrialDivision <: FactorizationMethod end
struct PollardRho <: FactorizationMethod end
struct PollardPMinusOne <: FactorizationMethod end
struct FermatMethod <: FactorizationMethod end
struct ECM <: FactorizationMethod end
struct QuadraticSieve <: FactorizationMethod end

# ECPoint struct at top level
mutable struct ECPoint
    x::BigInt
    y::BigInt
    a::BigInt
end

# Default method using Trial Division
function factor(n::Integer)
    return factor(TrialDivision(), n)
end

# Trial Division method
function factor(::TrialDivision, n::Integer)
    factors = Pair{Int,Int}[]
    n <= 1 && return factors
    
    count = 0
    while iseven(n)
        count += 1
        n = div(n, 2)
    end
    count > 0 && push!(factors, 2 => count)
    
    for d in 3:2:isqrt(n)
        count = 0
        while n % d == 0
            count += 1
            n = div(n, d)
        end
        count > 0 && push!(factors, d => count)
    end
    
    n > 1 && push!(factors, n => 1)
    return factors
end

# Pollard's Rho method
function factor(::PollardRho, n::Integer)
    function g(x, n)
        return (powermod(x, 2, n) + 1) % n
    end
    
    function find_factor(n)
        if n == 1 return 1 end
        if isprime(n) return n end
        
        x, y, d = 2, 2, 1
        while d == 1
            x = g(x, n)
            y = g(g(y, n), n)
            d = gcd(abs(x - y), n)
        end
        return d
    end
    
    factors = Pair{Int,Int}[]
    
    function find_all_factors(n)
        if n == 1 return end
        if isprime(n)
            # Update count for this prime factor
            idx = findfirst(p -> p.first == n, factors)
            if idx === nothing
                push!(factors, n => 1)
            else
                factors[idx] = n => (factors[idx].second + 1)
            end
            return
        end
        
        d = find_factor(n)
        if d == n  # If failed to find factor, try trial division
            d = minimum(p for p in 2:isqrt(n) if n % p == 0)
        end
        
        find_all_factors(d)
        find_all_factors(div(n, d))
    end
    
    find_all_factors(n)
    return sort(factors)
end

# Pollard's P-1 method
function factor(::PollardPMinusOne, n::Integer, B::Integer=1000)
    factors = Pair{Int,Int}[]
    
    function find_factor(n, B)
        a = BigInt(2)
        for j in 2:B
            a = powermod(a, j, n)
        end
        return gcd(a - 1, n)
    end
    
    function find_all_factors(n, B)
        if n == 1 return end
        if isprime(n)
            idx = findfirst(p -> p.first == n, factors)
            if idx === nothing
                push!(factors, n => 1)
            else
                factors[idx] = n => (factors[idx].second + 1)
            end
            return
        end
        
        d = find_factor(n, B)
        if d == 1 || d == n
            # If method fails, fall back to trial division
            d = minimum(p for p in 2:isqrt(n) if n % p == 0)
        end
        
        find_all_factors(d, B)
        find_all_factors(div(n, d), B)
    end
    
    find_all_factors(n, B)
    return sort(factors)
end

# Fermat method
function factor(::FermatMethod, n::Integer)
    factors = Pair{Int,Int}[]
    iseven(n) && return factor(TrialDivision(), n)
    
    a = ceil(Int, sqrt(n))
    b2 = a^2 - n
    
    # Fixed the condition: removed incorrect use of !
    while !(isqrt(b2)^2 == b2)
        a += 1
        b2 = a^2 - n
    end
    
    b = isqrt(b2)
    p = a - b
    q = a + b
    
    if p != 1 && q != 1
        push!(factors, p => 1)
        p != q && push!(factors, q => 1)
    end
    
    return sort(factors)
end


# ECM helper functions
function ec_add(P::ECPoint, Q::ECPoint, n)
    if P.x == Q.x && P.y == Q.y
        if P.y == 0
            return ECPoint(BigInt(0), BigInt(1), P.a)
        end
        λ = (3 * P.x^2 + P.a) * invmod(2 * P.y, n) % n
    else
        λ = (Q.y - P.y) * invmod(Q.x - P.x, n) % n
    end
    
    x3 =^2 - P.x - Q.x) % n
    y3 =* (P.x - x3) - P.y) % n
    
    return ECPoint(x3, y3, P.a)
end

function ec_multiply(k::Integer, P::ECPoint, n)
    R = ECPoint(P.x, P.y, P.a)
    Q = ECPoint(BigInt(0), BigInt(1), P.a)
    
    while k > 0
        if k & 1 == 1
            Q = ec_add(Q, R, n)
        end
        R = ec_add(R, R, n)
        k >>= 1
    end
    
    return Q
end

# ECM method
"""
    factor(::ECM, n::Union{Integer,BigInt}, B1::Integer=100, B2::Integer=1000)

Elliptic Curve Method (ECM) for integer factorization with BigInt support.
"""
function factor(::ECM, n::Union{Integer,BigInt}, B1::Integer=100, B2::Integer=1000)
    factors = Pair{BigInt,Int}[]
    if n <= 1 return factors end
    if isprime(n)
        push!(factors, BigInt(n) => 1)
        return factors
    end
    
    function find_factor(n::BigInt, B1::Integer, B2::Integer)
        if n <= 1 return BigInt(1) end
        if isprime(n) return n end
        
        # Convert bounds to Int64 safely
        B1_safe = Int64(min(B1, typemax(Int32)))
        B2_safe = Int64(min(B2, typemax(Int32)))
        
        for curve in 1:20
            try
                # Random starting point and curve parameter
                x = BigInt(rand(1:n-1))
                y = BigInt(rand(1:n-1))
                a = BigInt(rand(1:n-1))
                P = ECPoint(x, y, a)
                
                # Stage 1
                Q = P
                for p in primes(B1_safe)
                    e = floor(Int, log(B1) / log(p))
                    Q = ec_multiply(p^e, Q, n)
                end
                
                # Stage 2
                for p in primes(B1_safe, B2_safe)
                    try
                        Q = ec_multiply(p, Q, n)
                    catch e
                        if isa(e, DomainError)
                            d = gcd(e.val, n)
                            if 1 < d < n
                                return d
                            end
                        end
                    end
                end
            catch
                continue
            end
        end
        
        # If ECM fails, fall back to finding a small factor
        small_limit = Int64(min(isqrt(n), 10^6))
        for p in primes(small_limit)
            if n % p == 0
                return BigInt(p)
            end
        end
        return n
    end
    
    function find_all_factors(n::BigInt)
        if n == 1 return end
        if isprime(n)
            idx = findfirst(p -> p.first == n, factors)
            if idx === nothing
                push!(factors, n => 1)
            else
                factors[idx] = n => (factors[idx].second + 1)
            end
            return
        end
        
        d = find_factor(n, B1, B2)
        if d == n  # If failed to find a proper factor
            # Try increasing bounds
            d = find_factor(n, B1 * 2, B2 * 2)
            if d == n  # If still failed, try larger bounds
                d = find_factor(n, B1 * 10, B2 * 10)
            end
        end
        
        find_all_factors(d)
        find_all_factors(div(n, d))
    end
    
    find_all_factors(BigInt(n))
    return sort(factors)
end

# Quadratic Sieve method
"""
    factor(::QuadraticSieve, n::Union{Integer,BigInt})

Improved Quadratic Sieve implementation with BigInt support.
"""
function factor(::QuadraticSieve, n::Union{Integer,BigInt})
    n = BigInt(n)  # Ensure n is BigInt
    factors = Pair{BigInt,Int}[]
    
    # For small numbers, use trial division
    if n < 10^6
        trial_factors = factor(TrialDivision(), n)
        return [BigInt(p) => e for (p, e) in trial_factors]
    end
    
    # For prime numbers, return immediately
    if isprime(n)
        return [n => 1]
    end
    
    # Calculate factor base size and limit
    digits = ndigits(n)
    base_size = min(digits * 2, 50)
    
    # Safe limit calculation for large numbers
    log_n = log(BigFloat(n))
    log_log_n = log(log_n)
    limit_big = exp(sqrt(log_n * log_log_n))
    
    # Convert to Int64 safely
    limit = Int64(min(
        floor(limit_big),
        typemax(Int32),  # Safe upper bound for primes()
        isqrt(n)
    ))
    
    # Generate factor base with safe integer handling
    factor_base = Int64[]
    for p in primes(limit)
        if powermod(n, (p-1)÷2, p) == 1
            push!(factor_base, p)
            length(factor_base) >= base_size && break
        end
    end
    
    # Find relations with BigInt arithmetic
    relations = Vector{Tuple{BigInt,Vector{Int}}}()
    x = isqrt(n)
    num_relations = length(factor_base) + 5
    max_tries = 1000
    tries = 0
    
    while length(relations) < num_relations && tries < max_tries
        tries += 1
        x += 1
        q = x^2 - n
        q < 0 && continue
        
        temp_q = abs(q)
        exponents = zeros(Int, length(factor_base))
        fully_factored = true
        
        for (i, p) in enumerate(factor_base)
            while temp_q % p == 0
                exponents[i] += 1
                temp_q = div(temp_q, p)
            end
            if temp_q == 1
                break
            end
        end
        
        if temp_q == 1
            push!(relations, (x, exponents))
        end
    end
    
    # If not enough relations found, try ECM with larger bounds
    if length(relations) < num_relations
        @warn "Not enough relations found, switching to ECM with large bounds"
        return factor(ECM(), n, 10^4, 10^5)
    end
    
    # Try to find factors using combinations
    max_subset_size = min(length(relations), 10)
    
    for size in 1:max_subset_size
        for subset in combinations(1:length(relations), size)
            combined_exponents = zeros(Int, length(factor_base))
            x_prod = BigInt(1)
            
            for i in subset
                x_prod = (x_prod * relations[i][1]) % n
                combined_exponents .+= relations[i][2]
            end
            
            if all(iseven, combined_exponents)
                y = BigInt(1)
                for (i, e) in enumerate(combined_exponents)
                    y = (y * powermod(BigInt(factor_base[i]), e ÷ 2, n)) % n
                end
                
                d = gcd(x_prod - y, n)
                if 1 < d < n
                    count = 0
                    temp_n = n
                    while temp_n % d == 0
                        count += 1
                        temp_n = div(temp_n, d)
                    end
                    push!(factors, d => count)
                    
                    if temp_n > 1
                        remaining_factors = factor(QuadraticSieve(), temp_n)
                        append!(factors, remaining_factors)
                    end
                    return sort(factors)
                end
            end
        end
    end
    
    # If no factors found, try ECM with larger bounds
    @warn "No factors found, switching to ECM with large bounds"
    return factor(ECM(), n, 10^4, 10^5)
end

# Testing function
function test_methods(n::Integer, methods::Vector{FactorizationMethod})
    println("Number to factorize: ", n)
        
    for method in methods
        println("\nMethod: ", typeof(method))
        try
            @time result = factor(method, n)
            println("Factors found: ", result)
            
            # Verification
            product = prod([p^e for (p, e) in result])
            println("Verification: ", product == n ? "correct" : "incorrect")
        catch e
            println("Error: ", e)
            println("Backtrace:")
            for (exc, bt) in Base.catch_stack()
                showerror(stdout, exc, bt)
                println()
            end
        end
    end
end

# Main function
function main()
    n = 2021
    println("\nTesting with n = ", n)
    methods = [
        TrialDivision(),
        PollardRho(),
        PollardPMinusOne(),
        FermatMethod(),
        ECM(),
        QuadraticSieve()
    ]
    test_methods(n, methods)

    # n = 1234567891  # Large prime number (FermatMethod is failing)
    # methods = [
    #     TrialDivision(),
    #     PollardRho(),
    #     PollardPMinusOne(),
    #     FermatMethod(),
    #     ECM(),
    #     QuadraticSieve()
    # ]
    # test_methods(n, methods)

    n = 632459103267572196107100983820469021721602147490918660274601
    methods = FactorizationMethod[
        QuadraticSieve()
    ]
    test_methods(n, methods)

end

main()

Unfortunately this code doesn't help factorizing 632459103267572196107100983820469021721602147490918660274601 in a decent duration

Maybe it could be worth looking at SageMath code to see how they proceed so quickly?

@s-celles s-celles changed the title Factorisation of "big" numbers Factorization of "big" numbers Dec 12, 2024
@oscardssmith
Copy link
Member

the primary reason this doesn't help is that the QuadraticSieve it implements isn't actually a Quadratic Sieve implimentation. Specifically, it seems to be trying to do the gausian elimination by trial and error rather than anything sensible.

@oscardssmith
Copy link
Member

unfortunately, I've now been nerd snipped into implementing a quadratic sieve... By now I have a very mediocre version of Dixon's method, and tomorrow evening I'll try to implement the sieving to make it fast.

@s-celles
Copy link
Author

What is your opinion about API implementation which make use of multiple dispatch ie factor accepting 2 arguments : one for integer number to factor and an other one about factorization method?

Did you looked how SageMath is implementing factor?
https://github.com/search?q=repo%3Asagemath%2Fsage%20factor&type=code

@oscardssmith
Copy link
Member

I don't think a dispatch API makes much sense here. we know what the right algorithms are, we just have to implement them.

@oscardssmith
Copy link
Member

#161 is a first draft. It definitely could use some cleanup, but it's already pretty darn fast for semi-primes (not yet at the level of the well optimized sieves, but way way way better than lenstra for this sort of case.

@s-celles
Copy link
Author

Hi @oscardssmith

I run out of memory

(@v1.11) pkg> add https://github.com/oscardssmith/Primes.jl.git#os/quadratic-sieve
    Updating git-repo `https://github.com/oscardssmith/Primes.jl.git`
    Updating registry at `C:\Users\scelles\.julia\registries\General.toml`
(...)
   Resolving package versions...
    Updating `C:\Users\scelles\.julia\environments\v1.11\Project.toml`
  [27ebfcd6] + Primes v0.5.6 `https://github.com/oscardssmith/Primes.jl.git#os/quadratic-sieve`
    Updating `C:\Users\scelles\.julia\environments\v1.11\Manifest.toml`
  [18e54dd8] + IntegerMathUtils v0.1.2
  [27ebfcd6] + Primes v0.5.6 `https://github.com/oscardssmith/Primes.jl.git#os/quadratic-sieve`
Precompiling project...
  9 dependencies successfully precompiled in 120 seconds. 446 already precompiled.

julia> using Primes

julia> factor(632459103267572196107100983820469021721602147490918660274601)
ERROR: OutOfMemoryError()

Quadratic Sieve implementation in CrypTool 2 (C#) returns result in less than a second.

image

Code is available at
https://github.com/CrypToolProject/CrypTool-2/tree/main/CrypPlugins/QuadraticSieve

Hope that can help!

Best regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants