sequence mapping

mapping sequences

Version 1 / April 15, 2012 / Benjamin Aaron Degenhart

The question this algorithm answers is: given that sequence A1 is coupled with B1 and A2 is coupled with B2, what will B3 be if we know A3?
Or, put differently: what is the result if we send the sequence A3 through a mapping process that has been informed by the coupling of A1 with B1 and the coupling of A2 with B2?

You can enter one, two or more couples. The number of source sequences has to be the same as the number of target sequences.

I thought about what an artificial creativity would need to be able to do. I came to think that creativity means recombining existing elements so that the combination becomes an element in itself. Because the number of possible combinations grows exponentially with the number of elements, better and better ways are needed to choose which branch to run down. Human minds use metaphors to find relevant combinations: they mimic the structural information from successful combinations in other domains. The metaphor establishes a mapping between a source level and a target level. Changes in either one of them will then be interpreted in the other one. 'Artificial creativity' needs a way to pull in structural information from other domains to test possible combinations in a pool of elements more effectively - a digital metaphor? It has to go 'collect' graph-navigational data from targets in other domains. By making the mapping process complex enough, the source can be mapped upon any branch in any target - of course it is of interest to first choose targets that are likely to seed potent structural information into the source.

This algorithm is restricted to monochronic sequences and requires that the data has already been processed so that all elements come from a pool that covers the whole of its domain. Things like "pasta" and "noodles" have to be unified/matched beforehand.

In this simplified example with phrases there are two source sequences from the category <daily routines> and two target sequences from the category <making food>. The first coupling says "these 4 steps I go through every morning are like these 5 steps I go through for making pasta" and the second coupling says "these 4 steps I go through every evening are like these 5 steps I go through when eating a cheese sandwich" (for whatever reason I choose to compare these two levels). Then I go wild in my daily routine and find myself performing a new sequence of 4 steps. I wonder: what would this new chain of events look like if it were for making food? Most results will be complete nonsense or impossible to apply to actual food making - but maybe, maybe there's one that pokes me towards a new recipe, a combination that hasn't been tested before :)

given source sequence(s):

given target sequence(s):

new source sequence:

corresponding target sequence: ?

desired length of new target:

How it works:

2 couples given:
DFB -> acre
FAD -> brncd

task:
AFDB -> ?

Find the LCM (least common multiple) of the two sequence lengths in each coupling and repeat each element [couple-LCM / own sequence length] times, so that both sequences in the coupling have the same length.

LCM of 3 & 4 is 12:

D	D	D	D	F	F	F	F	B	B	B	B
a	a	a	c	c	c	r	r	r	e	e	e

LCM of 3 & 5 is 15:

F	F	F	F	F	A	A	A	A	A	D	D	D	D	D
b	b	b	r	r	r	n	n	n	c	c	c	d	d	d
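
The expansion step can be sketched in a few lines of Python. This is only an illustration, not the code behind this page; the function name `expand_couple` is made up for the example and elements are assumed to be single characters.

```python
from math import gcd

def expand_couple(source, target):
    """Repeat every element so both sequences reach their common LCM length."""
    common = len(source) * len(target) // gcd(len(source), len(target))
    stretched_source = [s for s in source for _ in range(common // len(source))]
    stretched_target = [t for t in target for _ in range(common // len(target))]
    return stretched_source, stretched_target

for source, target in [("DFB", "acre"), ("FAD", "brncd")]:
    src, tgt = expand_couple(source, target)
    print("".join(src), "".join(tgt))
# DDDDFFFFBBBB aaacccrrreee
# FFFFFAAAAADDDDD bbbrrrnnncccddd
```

Reading the two stretched rows column by column gives, for every source element, the target letters it sits on top of.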

Check whether the new source sequence contains elements that are not present in any given source sequence. If so (e.g. "X"), add ",X" to the source input and ",#" to the target input. Code-internally this is treated as another coupling, and a "#" in the new target result tells us that it is a placeholder.
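
A minimal sketch of this placeholder step, assuming the inputs are already split into lists of single-character sequences (the helper name `add_placeholders` is hypothetical):

```python
def add_placeholders(sources, targets, new_source, placeholder="#"):
    """Add an extra (element -> placeholder) coupling for every unknown element."""
    known = set("".join(sources))
    sources, targets = list(sources), list(targets)
    for element in new_source:
        if element not in known:
            sources.append(element)      # corresponds to adding ",X" to the source input
            targets.append(placeholder)  # corresponds to adding ",#" to the target input
            known.add(element)
    return sources, targets

print(add_placeholders(["DFB", "FAD"], ["acre", "brncd"], "AFXB"))
# (['DFB', 'FAD', 'X'], ['acre', 'brncd', '#'])
```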

Extract the corresponding target elements for each element in the new source sequence <AFDB> and rewrite them in encoded form (count:letter, with "/" separating the packages that come from different couplings):

A: r n n n c [1:r,3:n,1:c]

F: c c r r / b b b r r [2:c,2:r/3:b,2:r]

D: a a a c / c c d d d [3:a,1:c/2:c,3:d]

B: r e e e [1:r,3:e]
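
The extraction and encoding could look roughly like this in Python. It is a sketch under assumptions: single-character elements, each element occurring at most once per source sequence, and the hypothetical helper names `expand_couple` and `packages_for`.

```python
from itertools import groupby
from math import gcd

def expand_couple(source, target):
    """Stretch both sequences of a coupling to their common LCM length."""
    common = len(source) * len(target) // gcd(len(source), len(target))
    return ([s for s in source for _ in range(common // len(source))],
            [t for t in target for _ in range(common // len(target))])

def packages_for(element, couples):
    """One run-length encoded letter package per coupling that contains the element."""
    packages = []
    for source, target in couples:
        src, tgt = expand_couple(source, target)
        letters = [t for s, t in zip(src, tgt) if s == element]
        if letters:
            packages.append([(len(list(run)), letter) for letter, run in groupby(letters)])
    return packages

couples = [("DFB", "acre"), ("FAD", "brncd")]
for element in "AFDB":
    encoded = "/".join(",".join(f"{n}:{letter}" for n, letter in package)
                       for package in packages_for(element, couples))
    print(f"{element}: {encoded}")
# A: 1:r,3:n,1:c
# F: 2:c,2:r/3:b,2:r
# D: 3:a,1:c/2:c,3:d
# B: 1:r,3:e
```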

Concatenate encoded raw-string:

AFDB -> 1:r,3:n,1:c + 2:c,2:r/3:b,2:r + 3:a,1:c/2:c,3:d + 1:r,3:e

Multiply the letter counts by two different factors to give each target letter a weight that reflects its correct chance of appearing in the result. The first factor comes from the LCM of the sizes of all collected letter packages; the second comes from the LCM of all vertical expansions (bifurcations in the graph) that have to be undergone because a source element is occupied by target elements from more than one coupling.

To determine the LCM of the letter-packages only look at the sums of each letter package:

5 + 4/5 + 4/5 + 4

The two different sums, 4 and 5, give an LCM of 20. Multiplying each package by 20 divided by its own sum turns the raw string into:

4:r,12:n,4:c + 10:c,10:r/12:b,8:r + 15:a,5:c/8:c,12:d + 5:r,15:e
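
As a sketch, the package-size weighting can be written like this. The nested-list layout (element -> packages -> (count, letter) pairs) and the names `lcm_of`, `scale_by_package_size` and `show` are assumptions made for this example.

```python
from functools import reduce
from math import gcd

def lcm_of(numbers):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), numbers, 1)

# Raw string for AFDB: one entry per source element, one list per package.
raw = [
    [[(1, "r"), (3, "n"), (1, "c")]],              # A
    [[(2, "c"), (2, "r")], [(3, "b"), (2, "r")]],  # F
    [[(3, "a"), (1, "c")], [(2, "c"), (3, "d")]],  # D
    [[(1, "r"), (3, "e")]],                        # B
]

def scale_by_package_size(elements):
    """Multiply every package by (LCM of all package sums) / (its own sum)."""
    sums = [sum(n for n, _ in package) for packages in elements for package in packages]
    common = lcm_of(sums)
    return [[[(n * common // sum(m for m, _ in package), letter) for n, letter in package]
             for package in packages] for packages in elements]

def show(elements):
    return " + ".join("/".join(",".join(f"{n}:{letter}" for n, letter in package)
                               for package in packages) for packages in elements)

print(show(scale_by_package_size(raw)))
# 4:r,12:n,4:c + 10:c,10:r/12:b,8:r + 15:a,5:c/8:c,12:d + 5:r,15:e
```

After this step every package sums to 20, so a package of 4 letters and a package of 5 letters carry the same total weight.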

Look at the occupancy-numbers to determine the LCM required to achieve compensation for the bifurcations:

1 + 2 + 2 + 1

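The LCM of these is 2. Multiply each package by 2 divided by the occupancy number of its element, so that elements with only one package are doubled and elements with two packages stay as they are: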

8:r,24:n,8:c + 10:c,10:r/12:b,8:r + 15:a,5:c/8:c,12:d + 10:r,30:e
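
The bifurcation compensation can be sketched the same way. Again this is only an illustration; the data layout and the names `lcm_of`, `scale_by_occupancy` and `show` are assumed for the example, and the input is the raw string after the package-size weighting.

```python
from functools import reduce
from math import gcd

def lcm_of(numbers):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), numbers, 1)

# One entry per source element (A, F, D, B), one list per package.
weighted = [
    [[(4, "r"), (12, "n"), (4, "c")]],
    [[(10, "c"), (10, "r")], [(12, "b"), (8, "r")]],
    [[(15, "a"), (5, "c")], [(8, "c"), (12, "d")]],
    [[(5, "r"), (15, "e")]],
]

def scale_by_occupancy(elements):
    """Multiply every package by (LCM of all occupancy numbers) / (own occupancy)."""
    common = lcm_of([len(packages) for packages in elements])
    return [[[(n * common // len(packages), letter) for n, letter in package]
             for package in packages] for packages in elements]

def show(elements):
    return " + ".join("/".join(",".join(f"{n}:{letter}" for n, letter in package)
                               for package in packages) for packages in elements)

print(show(scale_by_occupancy(weighted)))
# 8:r,24:n,8:c + 10:c,10:r/12:b,8:r + 15:a,5:c/8:c,12:d + 10:r,30:e
```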

Multiplying all occupancy numbers gives 4, the number of branches required to investigate all equally valid options. Construct the branches and strip off the code-internal string markers "+" and ",":

8:r 24:n 8:c 10:c 10:r 15:a 5:c 10:r 30:e
8:r 24:n 8:c 12:b 8:r 8:c 12:d 10:r 30:e
8:r 24:n 8:c 10:c 10:r 15:a 5:c 10:r 30:e
8:r 24:n 8:c 12:b 8:r 8:c 12:d 10:r 30:e

Leaving it here would give target sequences of length 9. The given couples have target-to-source length ratios of 4/3 and 5/3, which average to 1.5, so for a new source sequence of length 4 the new target sequence is expected to have length 6. Take one branch at a time and look at all the different ways subgroups of 6 can be formed - weigh each subgroup by adding up the numbers of its elements.
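
The length estimate is a one-liner. In this sketch the rounding to a whole number is my assumption for cases where the average does not come out even, and `expected_target_length` is a hypothetical name.

```python
def expected_target_length(couples, new_source):
    """Average the target/source length ratios and apply them to the new source."""
    ratios = [len(target) / len(source) for source, target in couples]
    average = sum(ratios) / len(ratios)
    return round(len(new_source) * average)  # rounding is an assumption

print(expected_target_length([("DFB", "acre"), ("FAD", "brncd")], "AFDB"))  # 6
```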

The first subgroup of 6 in the first branch for instance [8:r 24:n 8:c 10:c 10:r 15:a] has a weight of 75.

Running through all 336 constellations of 6 across the 4 branches (84 per branch) finds a maximum weight of 99. If required, solutions with letter duplicates could be suppressed at this point (not implemented) by introducing a downgrading factor in the weighing process whenever duplicates are found.

In this case this maximum was achieved by only one solution; our final mapping result:

ncrare

This solution says: Sending the sequence <AFDB> through a mapping process that has been informed by the coupling of <DFB> with <acre> and the coupling of <FAD> with <brncd>, the resulting sequence is <ncrare>.
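
The final search over all subgroups can be sketched with `itertools.combinations`, which keeps the elements in their original order. The four branch strings are copied as listed in the walkthrough; `parse` and `best_subgroups` are names invented for this example.

```python
from itertools import combinations

branches = [
    "8:r 24:n 8:c 10:c 10:r 15:a 5:c 10:r 30:e",
    "8:r 24:n 8:c 12:b 8:r 8:c 12:d 10:r 30:e",
    "8:r 24:n 8:c 10:c 10:r 15:a 5:c 10:r 30:e",
    "8:r 24:n 8:c 12:b 8:r 8:c 12:d 10:r 30:e",
]

def parse(branch):
    """Turn '8:r 24:n ...' into [(8, 'r'), (24, 'n'), ...]."""
    return [(int(weight), letter)
            for weight, letter in (item.split(":") for item in branch.split())]

def best_subgroups(branches, length):
    """Weigh every order-preserving subgroup and return count, maximum and winners."""
    scored = []
    for branch in branches:
        for subgroup in combinations(parse(branch), length):
            weight = sum(w for w, _ in subgroup)
            scored.append((weight, "".join(letter for _, letter in subgroup)))
    top = max(weight for weight, _ in scored)
    return len(scored), top, sorted({s for weight, s in scored if weight == top})

print(best_subgroups(branches, 6))
# (336, 99, ['ncrare'])
```

Identical strings coming from identical branches are collapsed into one solution here by using a set.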

Alternative approaches:

I am sure there are different and probably more efficient (less line- and loop-costly) approaches to a sequence-mapping algorithm.

Scanning first for meta-patterns (like reverse, mirror) can surely save lots of processing power. In this situation for instance [given: ABC -> defg, task: CBA -> ?], it's clearly quicker to recognize that the source sequence was reversed and that the target can therefore just be reversed as well.
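
A reversal check like this could run before the full mapping. It is just a sketch of the idea with a made-up function name, returning `None` when the shortcut does not apply.

```python
def map_by_reversal(given_source, given_target, new_source):
    """Shortcut: if the new source is the given source reversed, reverse the target."""
    if new_source == given_source[::-1]:
        return given_target[::-1]
    return None  # no meta-pattern found, fall back to the full algorithm

print(map_by_reversal("ABC", "defg", "CBA"))  # gfed
print(map_by_reversal("ABC", "defg", "BAC"))  # None
```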

I can also think of an approach that looks at the relative position of each element within the sequence as a whole. So in the situation [given: ABCD -> efghij, task: BAC -> ?], the 'movement' that happened on source level from given to task could be encoded as [1/4 -> 2/3, 2/4 -> 1/3, 3/4 -> 3/3, 4/4 -> _ ] (A moved from 1st among 4 to 2nd among 3, B moved from...) and mapped upon the movement that needs to happen accordingly on target level.
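
The source-level encoding of that movement might be sketched like this. Only the encoding is shown, not the mapping onto the target level; `encode_movement` is a hypothetical name and elements are assumed to be unique within a sequence.

```python
def encode_movement(given_source, new_source):
    """Describe where each given element ends up: 'position/length -> position/length'."""
    moves = []
    for position, element in enumerate(given_source, start=1):
        origin = f"{position}/{len(given_source)}"
        if element in new_source:
            destination = f"{new_source.index(element) + 1}/{len(new_source)}"
        else:
            destination = "_"  # element dropped out of the new sequence
        moves.append(f"{origin} -> {destination}")
    return moves

print(encode_movement("ABCD", "BAC"))
# ['1/4 -> 2/3', '2/4 -> 1/3', '3/4 -> 3/3', '4/4 -> _']
```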

Distances between all subgroups could be another approach - probably the most costly but the most precise?