1

I am looking for a way in VBA for Excel that's quicker than arrays for assigning serials that are in a group 1 or 2, into an array match for each value depending on the group value and value of match(a,2).

Criteria / process for matching:

  1. match array is initially empty
  2. serial from serial1 list must match serial in serial2 list.
  3. value of match(a,2), which corresponds to serial1 list, must be empty (ie vbnullstring) to be considered - otherwise next value of match(a,2) is considered.
  4. if the serial in serial2 list has a corresponding value of 1 for the group, then the position of that group value - pos is set in match(a,1) corresponding to the serial1 list.
  5. if the serial in serial2 list has a corresponding value of 2 for the group, then the position of that group value - pos is set in match(a,2) corresponding to the serial1 list.

I have tried using scripting.dictionary but got stuck (like the linked question).

Sample data and current code that works are below. Values for serial1 and serial2 are non-unique, and should be taken as having the same amount of values (count)

There is currently over 23000 rows of data and this is set to increase gradually over next few months.

Currently the code below under 8 mins on an i7 processor. The main aim is to reduce this time if possible. A formula might be quicker, but also looking for other solutions such as dictionaries, collections etc.

A database can't be used - client won't permit this.

This is a question following from, but unrelated to: Excel VBA updating dates efficiently with non-unique string values and boolean data

Sample input data:

match1    match2    serial1    serial2    group    pos
(blank)    (blank)    ABC001   ABC002    1    1    
(blank)    (blank)    ABC002   ABC004    2    2  
(blank)    (blank)    ABC003   ABC003    1    3  
(blank)    (blank)    ABC005   ABC006    2    4  
(blank)    (blank)    ABC007   ABC001    2    5  
(blank)    (blank)    ABC004   ABC005    1    6  
(blank)    (blank)    ABC006   ABC007    1    7  

Expected output data:

match1    match2    serial1    serial2    group    pos
(blank)    5    ABC001   ABC002    1    1    
1    (blank)    ABC002   ABC004    2    2  
3    (blank)    ABC003   ABC003    1    3  
6    (blank)    ABC005   ABC006    2    4  
7    (blank)    ABC007   ABC001    2    5  
(blank)    2    ABC004   ABC005    1    6  
(blank)    4    ABC006   ABC007    1    7  

Current code:

match() = sheetnm1.Range("match_nr").Value 'Here match(a,1) is first argument and match(a,2) is second argument
serial1() = sheetnm1.Range("serial_nr1").Value 
serial2() = sheetnm1.Range("serial_nr2").Value 
group() = sheetnm1.Range("group_nr").Value

For a = 1 To UBound(match, 1)
    If match(a, 2) = Empty Then
        For b = 1 To UBound(serial1, 1)
            If serial2(a, 1) = serial1(b, 1) Then
                If group(b, 1) = 2 Then
                    match(a, 2) = b
                Else
                    match(a, 1) = b
                End If
            End If
        Next b
    End If
Next a

Attempted dictionary code - wasn't sure how to structure this

For b = 1 To UBound(serial1, 1)
    If Not Dict1.Exists(serial1(b, 1)) Then
        Dict1.Add serial1(b, 1), b
    End If
Next b

For a = 1 To UBound(match, 1)
    If Not Dict2.Exists(serial2(a, 1)) Then
        Dict2.Add serial2(a, 1), a
    End If
Next b
unseen_rider
  • 324
  • 5
  • 23
  • The slow performance you are seeing is due to the fact that you are linear searching an unsorted array over and over again. You will see improvement by sorting the array first, then binary searching it each time after that. This will require a lot more code than you have now, since you will have to do both the sort and search functions. You can find a sort algorithm for arrays here: https://stackoverflow.com/questions/4873182/sorting-a-multidimensionnal-array-in-vba/5104206#5104206 – Greg Viers Jun 27 '19 at 13:47
  • You could also use dictionaries to index the array data that way you avoid the searching. – Damian Jun 27 '19 at 13:49
  • Which would be quicker - using dictionaries or sorting array then binary searching? I've also added the incomplete dictionary code I tried mentioned – unseen_rider Jun 27 '19 at 13:53
  • 1
    In your example you do 23K^2 comparisions, but match(a,x) will be defined by the entry with the highest b. So why don't you loop backward and break the loop after the 1st match? You'd save quite a few comparisions. Similarly, you could make a try with Excel.Find (SEarchDirection:=xlPrevious). It may be more efficient for this amount of comparisons. – AcsErno Jun 27 '19 at 14:07
  • This follows sort of the same framework I wrote for you yesterday, you can use the logic and build this @unseen_rider. – Damian Jun 27 '19 at 14:07
  • You say your client won't allow a database but you could still use a disconnected recordset. You might find this easier if you are used to dealing with databases. Create a disconnected recordset and clone it. Loop through one and filter on the other then dump the whole lot back. –  Jun 27 '19 at 14:09
  • ... and use `Option Compare Binary` :) – AcsErno Jun 27 '19 at 14:18
  • @AcsErno can you provide some code illustrating the looping backwards method you suggest please? – unseen_rider Jul 04 '19 at 09:19
  • @Damian with the dictionary method, it does not work at present due to data issue preventing key to be added to dictionary needing to be unique. – unseen_rider Jul 04 '19 at 09:23
  • @unseen_rider Just as simply as `For b = UBound(serial1, 1) To 1 Step -1`, and `If Then ... Exit For` – AcsErno Jul 04 '19 at 13:40
  • @AcsErno this won't have the desired effect imo - we can't `exit for` early since need to consider all matches not just the first one as serials are non-unique – unseen_rider Jul 04 '19 at 14:09

1 Answers1

0

Is your row 3 data incorrect in the expected output example?

If so you should be able to easily solve this using formulas rather than VBA. For example assuming a header row exists and data starts in row 2 then the code for the first row columns would be...

=IF(E2=1,"",MATCH(C2,D:D,0)-1)        =IF(E2=2,"",MATCH(C2,D:D,0)-1)
Tragamor
  • 3,594
  • 3
  • 15
  • 32