Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

String permutations

in General Posts: 1,223

I know this is a textbook problem — I’m just unhappy with what I’ve found of textbook answers.

Given a string, I’d like to return all the unique permutations. However, examples that I’ve found often produce duplicate strings, or repeat the use of letters. That’s an issue when dealing with a problem that already requires recursion to avoid massive iteration.

So, for example, given the string “abc”, I’d like to get back abc, acb, bac, bca, cab, cba. But not aaa or aab, etc.

Surely there’s a nice clean implementation that’s eluded my googling. Thanks.

Tagged:

Comments

  • Posts: 708
    What do you want returned for an input of aab? 3 permutations (aab aba baa) or 6 (aAb Aab abA Aba baA bAa)
  • Posts: 1,223

    The six set. Really, a character array permutation is a clearer statement.

  • dave1707dave1707 Mod
    edited June 6 Posts: 7,195

    @Mark Here’s a version.

    function setup()
        str="abc"
        combination(str,1)
    end
    
    function combination(t,v)
        if v==#t then
            print(str)
            return
        end
        for i=v,#t do
            swap(v,i)      
            combination(t,v+1)
            swap(v,i)
        end
    end
    
    function swap(x,y)
        tab={}
        for z=1,#str do
            table.insert(tab,string.sub(str,z,z))        
        end
        tab[x],tab[y]=tab[y],tab[x]
        str=table.concat(tab)
    end
    
  • Posts: 708
    What @dave1707 said
  • dave1707dave1707 Mod
    edited June 6 Posts: 7,195

    Here’s a better version to list the permutations for larger string values.

    EDIT: I did the string abcdefghij and it showed 3,628,800 permutations. I didn’t scroll thru them to verify all were correct, I’ll leave that to someone else if they want to.
    If I did the string abcdefghijklmnopqrstuvwxyz, there would be 403,291,461,126,605,635,584,000,000 permutations.

    function setup()
        str="abcdef"
        perm={}
        combination(str,1)
        st=1
    end
    
    function combination(t,v)
        if v==#t then
            table.insert(perm,str)
            return
        end
        for i=v,#t do
            swap(v,i)      
            combination(t,v+1)
            swap(v,i)
        end
    end
    
    function swap(x,y)
        tab={}
        for z=1,#str do
            table.insert(tab,string.sub(str,z,z))        
        end
        tab[x],tab[y]=tab[y],tab[x]
        str=table.concat(tab)
    end
    
    function draw()
        background(0)
        fill(255)
        text("Slide finger to scroll list",200,HEIGHT/2)
        text("Permutations "..#perm,200,HEIGHT/2+50)
        en=math.min(st+HEIGHT//20,#perm)
        for z=st,en do
            text(perm[z],WIDTH/2,HEIGHT-(z-st+1)*20)        
        end
    end
    
    function touched(t)
        st=(st+t.deltaY)//1
        if st>#perm-10 then
            st=#perm-10
        end
        if st<1 then
            st=1
        end
    end
    
  • Posts: 1,223

    Thanks, Dave. I dragged the old 178K word dictionary out of ScramWords and am spinning the results of your routine past it to find words-within-words. I was pretty happy with how that old program used the r,g,b channels to snug all those words into. 768x768 image. Now I’m wishing I’d used the a channel to encode grade level. Some of the words popping out are pretty obscure.

  • Posts: 422

    This is another good one for coroutines:

    function permgen (a, n)
       if n == 0 then
          coroutine.yield(a)
       else
          for i=1,n do
    
         -- put i-th element as the last one
         a[n], a[i] = a[i], a[n]
    
         -- generate all permutations of the other elements
         permgen(a, n - 1)
    
         -- restore i-th element
         a[n], a[i] = a[i], a[n]
    
          end
       end
    end
    
    function perm (a)
       local n = table.getn(a)
       return coroutine.wrap(function () permgen(a, n) end)
    end
    
    str = "abcdef"
    tbl = {}
    while str ~= "" do
       table.insert(tbl,str:sub(1,1))
       str = str:sub(2)
    end
    for p in perm(tbl) do
       print(table.concat(p," "))
    end
    
  • dave1707dave1707 Mod
    Posts: 7,195

    I looked up the ScramWords link and saw a post of mine there about a 1025x1025 pixel image that I created that holds 300,249 words.

  • Posts: 1,223

    The Scramwords dictionary is a pretty close match for the Scrabble dictionary, though I compiled it from unofficial sources. Scrabble has 187K words, Scramwords 178k. A big part of that difference comes from two additional lists added to Scrabble since I made the original image.

    But honestly, I’m still just pleased that it worked out. Certainly a lot simpler toting around a modest image than a dictionary of strings.

  • dave1707dave1707 Mod
    Posts: 7,195

    @Mark Is this to play scrabble. If so, then you have to add values to the letters so you can find the words that create the most points.

  • dave1707dave1707 Mod
    Posts: 7,195

    @Mark If you’re searching for scrabble words, then this code will work better than word permutations. It finds and prints any word in a dictionary file that matches the scrabble letters from 1 character to how many letters you have. For testing, you can change the characters in the variable letters and you can change the words in the table dict to try different letters/words combinations. The table dict should be the dictionary table, one word per table entry. The upper/lower case of the letters must match the case of the dictionary or more code could be added to force the case to match. More code can be added to give a scrabble value to each word found.

    function setup() 
        letters="tienvofxs" -- upper/lower case of letters must match case of dictionary
    
        st={}
        for z=1,#letters do
            table.insert(st,string.sub(letters,z,z))
        end
        table.sort(st)
    
        -- read a dictionary file and create a dict table of words
        dict={} 
        dict={"one","two","three","four","five","six","seven","eight","nine","ten"}
    
        foundTab={}
        for z=1,#dict do
            find(dict[z],st)
        end
    
        print("letters = "..letters)
        print("words that match some letters")
        print(table.concat(foundTab,"\n"))
    end
    
    function find(d,s)
        if #d>#s then
            return
        end
        local dt={}
        for z=1,#d do
            table.insert(dt,string.sub(d,z,z))
        end
        table.sort(dt)
        local pos=1
        for z=1,#dt do
            if pos>#s then
                return
            end
            for y=pos,#s do
                if dt[z]==s[y] then
                    pos=y+1
                    break                
                elseif dt[z]>s[y] then
                    if y>=#s then
                        return
                    end
                else
                    return
                end
            end
        end
        table.insert(foundTab,d)
    end
    
  • dave1707dave1707 Mod
    Posts: 7,195

    Here’s an updated version that shows the scrabble value of the words matching the letters.

    function setup() 
        -- scrabble value of the letters a thru z
        valTab={1,3,3,2,1,4,2,4,1,8,5,1,3,1,1,3,10,1,1,1,1,4,4,8,4,10}
    
        local letters="tienvofxs" -- upper/lower case of letters must match case of dictionary
    
        local st={}
        for z=1,#letters do
            table.insert(st,string.sub(letters,z,z))
        end
        table.sort(st)
    
        -- read a dictionary file and create a dict table of words
        local dict={} 
        dict={"one","two","three","four","five","six","seven","eight","nine","ten"}
    
        foundTab={}
        for z=1,#dict do
            find(dict[z],st)
        end
    
        print("letters = "..letters)
        print("words that match some letters")
        print(table.concat(foundTab,"\n"))
    end
    
    function find(d,s)
        if #d>#s then
            return
        end
        local dt={}
        for z=1,#d do
            table.insert(dt,string.sub(d,z,z))
        end
        table.sort(dt)
        local pos=1
        for z=1,#dt do
            if pos>#s then
                return
            end
            for y=pos,#s do
                if dt[z]==s[y] then
                    pos=y+1
                    break                
                elseif dt[z]>s[y] then
                    if y>=#s then
                        return
                    end
                else
                    return
                end
            end
        end
        table.insert(foundTab,d.." "..getValue(d))
    end
    
    function getValue(word)
        local tot=0
        for z=1,#word do
            tot=tot+valTab[string.byte(string.sub(word,z,z))-string.byte("a")+1]
        end  
        return tot
    end
    
  • Posts: 422

    One minor thing: wouldn't it be more efficient to create a list of sorted words at the start rather than sort them every time you invoke the find function?

  • dave1707dave1707 Mod
    edited June 24 Posts: 7,195

    @LoopSpace The dictionary file should already be sorted. I do one sort in setup to get the letters in order, then in find(), I sort the characters of each word. Having the letters and the characters of each word in order make it easy to do the compare. That way I can compare multiple characters that are the same without a problem.

    Edit: Actually, the dictionary file doesn’t need to be sorted. But if you wanted to speed things up, then you could stop the dictionary search once you past the highest letter you have.

  • Posts: 422

    @dave1707 By "sorted words" I meant with each word rearranged into alphabetical order. The point is that the step "in find(), I sort the characters of each word" only needs to be done once, but you're doing it every time.

  • dave1707dave1707 Mod
    Posts: 7,195

    @LoopSpace Are you saying that I should sort each word in the dictionary file so that the letters of each word are in alpha order. I could do that easy enough, but if that’s the case then when I find a match, I won’t know what the word is because the letters are in alpha order. I took my code above and used my 300,249 word dictionary file and it found 290 words matching the letters tienvofxs. It took 1.7 seconds which isn’t that long.

  • Posts: 1,223

    Thanks. I’d already cranked a version on my old ScramWords dictionary. BTW, I do something with my dictionary that’s likely heavy abuse of lua tables, but seems to work fine—I just use the words as indexes and fill the table with empty strings. That makes it very simple to determine if a word is present without iteration.

  • edited June 25 Posts: 422

    If this ends up in a larger program, 1.7s might end up being too long.

    Keep the original dictionary then when a word is found in the alpha-sorted one, return the corresponding entry in the original one.

    Even better would be to generate the corresponding pattern, something like a.?d.?n for and, and use string.find since then it's working at the c-level so will be much faster yet.

    To use hashing, as @Mark suggests, you'd be attacking the problem the other way around: generate all substrings of the tiles and look for those. Hash lookup is very fast, and there are only 127 non-empty alpha-ordered substrings of 7 tiles so that could well be fastest.

    I did have another silly idea: convert each word to a number by assigning a prime to each letter and multiplying. Then a word can be made from the tiles if its number divides the tiles number.

  • dave1707dave1707 Mod
    Posts: 7,195

    @LoopSpace I was thinking about converting each word to a number also, but I wasn’t sure if that would work or not so I just went the other way. When I have nothing to do later today, I’ll see if that might work.

  • dave1707dave1707 Mod
    Posts: 7,195

    @LoopSpace I tried converting words and letters to numbers, but I didn’t see how that would work when I tried match anything. The next thing I tried was sorting the letters in each word and having a dictionary table of words with their sorted letters with each word. It took almost 5 seconds to read the dictionary file and create a table of words and their sorted letters. But on the bright side, instead of taking 1.7 seconds to find the 290 matching words, it took only .3 seconds. The thing is, I don’t think the 1.7 seconds to find matching words is that bad. You have to take into consideration that each time you pick new letters, you have to enter those letters in a input field, tap a button to find matching words, then look thru the list of matched words to determine what word you want to use.

  • edited June 26 Posts: 422

    @dave1707 To convert to numbers, you have to choose a prime for each letter and then multiply the numbers together. So cab = 5 x 2 x 3 = 30. Then t is a substring of s if s%t == 0.

    Re speed: what about when the computer takes a turn? Then you don't want the user waiting around doing nothing.

  • dave1707dave1707 Mod
    Posts: 7,195

    @LoopSpace I’ll play around with what you show above later, but here’s what I think is a good routine. You don’t need to do any sorting at all. You just create the dictionary as a table of words. Using the string.sub function and both the words and letters as strings, you just match a character in one string to a character in the other string. If there’s a match, you just replace the matched character with a ! so it’s not used again. Using this routine on my 300,000+ word dictionary file, it took only .4 seconds to find the 290 words that took 1.7 seconds with the other routine. So you create the dictionary file once and then change the letters with parameter.text and press parameter.find .

    function setup()
        parameter.text("letters","ivnetxovfsne")
        parameter.action("find",find)
        found={}
        dictionary={"one","two","three","four","five","six","seven","eight","nine","ten"}
        for a,b in pairs(dictionary) do
            find(b,letters)
        end
        print("letters = "..letters)
        print("matched words")
        print(table.concat(found,"\n"))
    end
    
    function find(word,letter)
        for z=1,#word do
            c=string.sub(word,z,z)
            letter,n=string.gsub(letter,c,"!",1)
            if n==0 then
                return
            end
        end  
        table.insert(found,word)
    end
    
  • dave1707dave1707 Mod
    Posts: 7,195

    @LoopSpace Here’s an example of converting words to a number value like you suggested. Using the find routine on my 300,000+ dictionary took .8 seconds to find the 290 words. That was 2x longer then the time using string.gsub. I guess the number for each word could be calculated in advance to speed things up.

    function setup()
        val={a=2,b=3,c=5,d=7,e=11,f=13,g=17,h=19,i=23,j=29,k=31,l=37,m=41,n=43,
                o=47,p=53,q=59,r=61,s=67,t=71,u=73,v=79,w=83,x=89,y=97,z=101}
    
        -- calc value of letters
        letters="ivnetxovfsne"
        lettersVal=1
        for z=1,#letters do
            c=string.sub(letters,z,z)
            lettersVal=lettersVal*val[c]
        end
    
        found={}
    
        dictionary={"one","two","three","four","five","six","seven","eight","nine","ten"}
        for a,b in pairs(dictionary) do
            find(lettersVal,b)
        end
    
        print("letters = "..letters)
        print("matched words")
        print(table.concat(found,"\n"))
    end
    
    function find(letter,word)
        local wv=1
        for z=1,#word do
            wv=wv*val[string.sub(word,z,z)]    -- calc value of word
        end
        if letter%wv==0 then
            table.insert(found,word)    
        end
    end
    
  • Posts: 422

    Interesting, but I would expect a significant speed-up by precalculating. I'll have to load in a dictionary to test these ... I'm getting intrigued.

  • dave1707dave1707 Mod
    Posts: 7,195

    @LoopSpace I tried the precalculate on each word and got a run time of .18 seconds. Quite a speed up.

  • Posts: 422

    @dave1707 I'm trying a hash table lookup first. How are you calculating the times? I want to be sure I'm comparing like with like.

  • dave1707dave1707 Mod
    Posts: 7,195

    @LoopSpace

        st=os.clock()
        ..... code to time
        en=os.clock()
        print(“time ”,en-st)
    
  • Posts: 422

    This code compares the primes method with a method based on hashing.

    -- Dictionary
    
    function setup()
        local d = readText("Documents:Dictionary")
        local dict = split(d,"[^\n\r]+")
        primes = {2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,
        101}
        local nd = 0
        for k,v in ipairs(dict) do
            -- for k=1,100 do
                -- v = dict[k]
            if v:find(" ") == nil then
                nd = nd + 1
            end
        end
        print("Dictionary size: " .. nd)
        ndict = {}
        sdict = {}
        local w,t,e,s,n
        s = os.clock()
        for k,v in ipairs(dict) do
        -- for k=1,100 do
            -- v = dict[k]
            if v:find(" ") == nil then
                t = split(v,".")
                table.sort(t)
                w = table.concat(t)
                if not sdict[w] then
                    sdict[w] = {}
                end
                table.insert(sdict[w],v)
            end
        end
        e = os.clock()
        print("Hashing: " .. (e-s))
    
        s = os.clock()
        for k,v in ipairs(dict) do
        -- for k=1,100 do
            -- v = dict[k]
            if v:find(" ") == nil then
                t = split(v,".")
                n = 1
                for l,u in ipairs(t) do
                    n = n * primes[u:upper():byte()-64]
                end
    
                if not ndict[n] then
                    ndict[n] = {}
                end
                table.insert(ndict[n],v)
            end
        end
        e = os.clock()
        print("Primes: " .. (e-s))
    
        parameter.text("letters","gtexbju")
        parameter.action("doFind",doFind)
        --]]
    end
    
    
    --- splits 'str' into pieces matching 'pat', returns them as an array
    function split(str,pat)
      local tbl={}
      str:gsub(pat,function(x) tbl[#tbl+1]=x end)
      return tbl
    end
    
    function findHash(l)
        local w = split(l,".")
        table.sort(w)
    
        local i = {}
        for k=1,#w do
            table.insert(i,0)
        end
        local r = {}
        local j = #w
        local s
        while j > 0 do
            i[j] = i[j] + 1
            if i[j] == 2 then
                i[j] = 0
                j = j - 1
            else
                j = #w
                s = {}
                for l=1,#w do
                    if i[l] == 1 then
                        table.insert(s,w[l])
                    end
                end
                s = table.concat(s)
                if sdict[s] then
                    for k,v in ipairs(sdict[s]) do
                        table.insert(r,v)
                    end
                end
            end
    
        end
    
        return r
    end
    
    function findPrime(l)
        local t = split(l,".")
        local n = 1
        for l,u in ipairs(t) do
            n = n * primes[u:upper():byte()-64]
        end
        local r = {}
        for k,v in pairs(ndict) do
            if n%k == 0 then
                for l,u in ipairs(v) do
                    table.insert(r,u)
                end
            end
        end
        return r
    end
    
    function doFind()
        local s,t,e
    
        s = os.clock()
        findHash(letters)
        e = os.clock()
    
        t = findHash(letters)
        print("Hashing: ")
        print(#t .. " words found in " .. (e-s) .. "s")
    
        s = os.clock()
        findPrime(letters)
        e = os.clock()
    
        t = findPrime(letters)
        print("Primes: ")
        print(#t .. " words found in " .. (e-s) .. "s")
    end
    

    I have a dictionary stored as a text asset. It reports just over 370,000 words.

    In terms of initialising, I'm getting about 18s for the hash table and 25s for the primes table. Obviously, this could be stored in the file itself.

    When it comes to finding all the words, with 7 letters I get of the order of 0.2s for the primes method, but an amazing 0.001s for the hash method.

  • Posts: 422

    I need to double-check my routines. With the letters ivnetxovfsne then the hash method found about 1400 words in less than 0.08s while the prime method only found about 500 words in 0.25s

  • dave1707dave1707 Mod
    Posts: 7,195

    @LoopSpace Your reusing letters in your hash routine. I put the word one in the dict table. When I used the letters one, hash found 1 word. When I entered onee in letters, hash found 2 words. When I entered oneee in letters, hash found 3 words. It kept reusing the letters on for the three e's. When you said you were going to use hash for the dictionary, I wasn’t sure how that was going to work. But I took your hash code and used just a few words in the dict table. I printed out sdict to see what was there and I saw what you were doing. You sorted the letters in each word and any words that had the same letters were considered the same word with the actual words in another table. I wonder how much the full dictionary table was reduced.

  • dave1707dave1707 Mod
    edited June 28 Posts: 7,195

    @LoopSpace I ran your hash routine on my dictionary file. The full dictionary table is 300,130 words. The hash table was 269,725 words. So hash reduced the table entry size by 30,405 words. That's about 10% smaller. So I think there should be about a 10% reduction in speed.

    Edit: I used the letters ivnetxovfsne on my 300,130 words and the hash routine found 1,189 words in .04 sec and the prime routine found 436 words in .08 sec.

    Edit 2: With the letters ivnetxovfsne, you have 2 v's, 2 n's, 2 e's. I removed the duplicate letters and got 299 matches with both the hash and prime routines.

  • Posts: 422

    Good catch!

    Here's my modified search:

    function findHash(l)
        local w = split(l,".")
        table.sort(w)
    
        local i = {}
        for k=1,#w do
            table.insert(i,0)
        end
        local r = {}
        local j = #w
        local s
        local f = {}
        while j > 0 do
            i[j] = i[j] + 1
            if i[j] == 2 then
                i[j] = 0
                j = j - 1
            else
                j = #w
                s = {}
                for l=1,#w do
                    if i[l] == 1 then
                        table.insert(s,w[l])
                    end
                end
                s = table.concat(s)
                f[s] = true
            end
        end
        for s,u in pairs(f) do
            if sdict[s] then
                for k,v in ipairs(sdict[s]) do
                    table.insert(r,v)
                end
            end
        end
    
        return r
    end
    

    With that, I get the same number of matches for the two routines. With the ive... set, I get 528 in .25s for primes and .073 for hash.

    My dictionary also shows about a 10% reduction in size.

  • Loopspace How do you print the actual words found by your routine

  • dave1707dave1707 Mod
    Posts: 7,195

    @colincooper One way you can print the words in @LoopSpace code above is add the line print(table.concat(t,"\n")) after the line of code t=findHash(letters) or t=findPrime(letters) in the function doFind() .

  • Thank you

Sign In or Register to comment.