A new approach to multithreading in Excel

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
31
down vote

favorite
4












Intro





Multithreading tools exist in Excel - often to run macros in multiple instances of Excel, or to convert macros to vbscripts that can run independently. However I've often come across projects where I'd like to delegate multiple tasks out to async processes, and creating multiple instances of Excel itself is overkill for this.



After running into several projects where I needed to execute multiple async requests (think internet scraping, or radioactive decay simulation) I decided to make a general class that, when given an async process, can execute and handle several in parallel.




For example, this Daisy Test makes a multithread group which sends html requests to all the urls in column B. The first link of these google searches are returned to column Cin the order the responses arrive. That triggers a second group (daisy chained onto the 1st one's events) to send internet explorer request to that url, and these are returned in D



Example implementation



To clarify some of the comments; it should be noted that these requests are sent off in order (B1,B2,...), but return unordered (C2 first). That's because my class allows the threads to run in parallel (hence multithreading). These are still only managed in a single Excel thread, but the requests are asynchronous and in different processes so are effectively running in other threads.



Summary



N.B. The term 'thread' will be used loosely here, with no reference to the actual processor. Instead, when I say 'thread' I am talking about a handler for a task (which is running in parallel to other tasks on other handlers)



The multi thread setup consists of a main clsMultiThread parent class which controls the shape of the multi thread collection (i.e. how many tasks are running at any given time), as well as several clsThreadHandle classes.



These thread handlers are each responsible for running an async task, and informing the parent multithread class when each task is finished.



Internally, the tasks are actually run by WorkerClass objects, one worker for each thread. These receive input arguments from their parent threads, run their respective async task, and raise an event to their parent clsThreadHandle when finished. The thread handle then passes this event, and any optional return value, back up to the main clsMultiThread, whose job it is either to close the thread once it's done, or prompt the thread to run another task. The chain of command is summarised in the image below:



Chain of command



Feedback I'm after



  • General feedback on structure, event handling, use of interfaces etc.

  • Proper exiting (which I don't think I'm doing right now)

  • User-side interface

    • Whether this approach to the problem is appropriate and intuitive (passing worker classes etc.)

    • Whether I'm missing some functionality that should be there


This is also the first project I've ever done with the primary aim of making something I can re-use (as well as the longest & most complicated bit of code I've written). For that reason I'd also greatly appreciate any comments on



  • Coding style

  • Use of comments

  • Anything else I should bear in mind when working on such projects

Implementation



Main class clsMultiThread



Right, some code. Here's the main class which handles all the sub-classes



Option Explicit

'''
'VBA class to run multiple asynchronous processes
'Interfaces directly with clsThreadHandle
'Requires references to:
'mscrolib.dll
'''

'THREAD GROUP SHAPE PROPERTIES
Private threadGroup As New Collection 'holds all the treads
Private maxThreads As Long 'maximum number of threads that can be open
Private minThreads As Long '[minimum number of threads]
Private iterableQueue As mscorlib.Queue 'this item holds all the items from iterator set in queue
'replaces iterableGroup, newtaskindex, taskidset
Private iterableSize As Long 'number of items in iterable group or
Private passesArguments As Boolean 'true if iterableGroup exists

'THREAD GROUP REFERENCES
Private WithEvents threadEvents As clsHandleEvents 'Event object to raise events from each thread handle
Private workerClass As IWorker

'THREAD GROUP SETTINGS
Private autoQuitEnabled As Boolean 'dictates whether to quit on Complete event, should be false if daisychaining

'THREAD GROUP STATE PROPERTIES
Private openThreadCount As Long 'number of threads/handles currently open
Private openTaskCount As Long 'number of tasks running on those threads
Private closedTaskCount As Long 'number of threads closed (failed and successful)
Private successfulTaskCount As Long 'number of threads completed sucessfully
Private newThreadIndex As Long 'Iterator over handles (next new handle)
Private newTaskIndex As Long 'Iterator over open tasks (next thread to be started)
Private taskIDset As Collection 'Dictionary mapping taskIDs to iterableGroup location "REPLACE THIS. MERGE COLLECTION JUMBLES"
Private freeThreads As Collection 'holds all the free thread ids

'THREAD GROUP PERFORMANCE PROPERTIES
Private startTime As Date
'Private endTime As Date

'THREAD GROUP EVENTS
Public Event TaskComplete(returnVal As Variant, taskID As String, threadID As String) 'when a task is complete on a thread, maybe if failed
Public Event ThreadOpened(threadCount As Long, threadID As String) 'when a thread is opened, pass the new number of threads
Public Event ThreadClosed(threadCount As Long, threadID As String) 'when closed, pass closed thread ID
Public Event Complete(timeTaken As Date) 'when everything is (nearly) finished
Public Event Closed(timeTaken As Date) 'when entire group is closed
Public Event Opened(startTime As Date) 'when entire group is closed

'PRIVATE TYPES/ENUMS
Private Type Instruction 'instruction on what to do next, and any necessary arguments that can be passed
threadID As String
instructionBody As InstructionType
End Type

Private Enum InstructionType
mltCloseThread
mltOpenThread
mltSetTask
mltDoNothing
mltQuit
End Enum

Private Sub Class_Initialize()
'Set defaults
maxThreads = 5
minThreads = 1
newThreadIndex = 1
newTaskIndex = 1
autoQuitEnabled = True
Set threadEvents = New clsHandleEvents
Set taskIDset = New Collection
Set freeThreads = New Collection
startTime = Now
RaiseEvent Opened(startTime)
'''
'Test space
'''
End Sub

Private Sub threadEvents_Closed(threadID As String)
RaiseEvent ThreadClosed(openThreadCount, threadID)
End Sub

Private Sub threadEvents_Opened(threadID As String)
RaiseEvent ThreadOpened(openThreadCount, threadID)
End Sub

Private Sub threadEvents_Complete(obj As clsThreadHandle, returnVal As Variant) 'called when thread becomes free
'DO NOT mark as free here
RaiseEvent TaskComplete(returnVal, obj.Task, obj.Name) 'failed as boolean
openTaskCount = openTaskCount - 1
closedTaskCount = closedTaskCount + 1
successfulTaskCount = successfulTaskCount + 1 'could be unsuccessful too though
doInstructions obj.Name 'pass object name so it can be marked free
' If failed Then
' failedTaskCount = failedTaskCount + 1
' Else
' successfulTaskCount = successfulTaskCount + 1
' End If
End Sub

Public Sub Execute()
'check validity of user data, if valid, then execute task
If iterableSize = 0 Then
Err.Raise 5, Description:="You must set size argument to a non-zero value, or a non-empty iterable first"
ElseIf workerClass Is Nothing Then
Err.Raise 5, Description:="You must set the async class argument first"
Else
doInstructions
End If
End Sub

Public Sub Quit()
'Remove any references that would prevent proper closing
'Default automatically called when openThreadCount = 0
RaiseEvent Complete(Now - startTime)
Set threadEvents = Nothing
End Sub

Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction

'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID

'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy

BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If

Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select

'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
doInstructions loopcount:=loopcount + 1 'watch for infinite loop
End If

End Sub

Private Function getInstruction() As Instruction
'function to determine what action to take next
'called until do nothing returned
'caller to doinstructions can specify a free thread in which case some parts skipped
Dim results As Instruction 'variable to hold instruction and any arguments

Me.printState

'Do we need to open or close threads?
'Threads free? (threads open > tasks open):
If openThreadCount > openTaskCount Then
'Great we have a free thread, now use it or delete it (cos we have too many or no tasks remaining)
If newTaskIndex > iterableSize Then 'already passed all tasks
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
ElseIf openThreadCount <= maxThreads Then
'[find] & use free thread (run a task on it)
results.instructionBody = mltSetTask
results.threadID = freeThread
Else
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
End If
Else
'No threads free, either open one (if not exceeding max, and there's a task left to put on it)
'Or do nothing (can't close it if not free, shouldn't open new if no more tasks)
If openThreadCount < maxThreads And newTaskIndex <= iterableSize Then
results.instructionBody = mltOpenThread
ElseIf openThreadCount = 0 And autoQuitEnabled Then
results.instructionBody = mltQuit
Else
results.instructionBody = mltDoNothing
End If
End If

getInstruction = results
End Function

Private Sub openThread()
'opens a thread and assigns a task ID to it
Dim newThread As New clsThreadHandle 'create new handle
newThread.OpenHandle Me, threadEvents 'passes parent reference which allows handle to obtain thread ID
threadGroup.Add newThread, newThread.Name 'add it to the group with a new id (set by itself)
openThreadCount = openThreadCount + 1
freeThread = newThread.Name 'mark as free so task can be assigned to it

End Sub

Private Property Let freeThread(threadID As String)
'NOT WORKING"""""
'when a thread comes free, add it to the collection
freeThreads.Add threadID, threadID
Debug.Print threadID; " marked as free; now"; freeThreads.Count; "threads are free"
End Property

Private Property Let BusyThread(threadID As String)
'when a thread is not free or is closed, mark as busy by removing from free group
On Error Resume Next 'only remove ones what are there actually
freeThreads.Remove threadID
Debug.Print threadID; " marked as busy"; IIf(Err.Number <> 0, ", but wasn't in free group", vbNullString)
End Property

Private Property Get freeThread() As String
'gives up a free thread and adds it to the list
freeThread = freeThreads(1)
freeThreads.Remove (1)
End Property

Private Sub assignTaskID(threadID As String)
'@Ignore WriteOnlyProperty
'assigns task ID to thread
'nb does NOT actually run the task (this is instruction stage still)
Dim newThread As clsThreadHandle
Set newThread = threadGroup(threadID)
newThread.Task = NewTaskID
Set newThread.Worker = AsyncClass
End Sub

Private Sub closeThread(threadID As String, Optional failed As Boolean = False)

'close thread with appropriate id
Dim oldThread As clsThreadHandle
Set oldThread = threadGroup(threadID)
'remove from all collections
'taskIDset.Remove oldThread.Task remove from task id set if it was in there
threadGroup.Remove oldThread.Name
BusyThread = oldThread.Name 'remove from free collection
Set oldThread = Nothing
'iterate counters
openThreadCount = openThreadCount - 1
End Sub

Public Property Let Size(sizeFactor As Variant)
'property of the thread group which dictates how many processes to run in total
'size factor is either an iterable item, or an integer to dictate the size

'Check if size factor is number
If IsNumeric(sizeFactor) Then
'If so, size is that
iterableSize = CLng(sizeFactor)
passesArguments = False 'no argument to pass to thread, just run it a load of times

'If not, *check if iterable
ElseIf isIterable(sizeFactor) Then
'If so, size is size of collection from extration
Set iterableQueue = New Queue
iterableSize = addIterableToQueue(sizeFactor, iterableQueue)
passesArguments = True
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If

End Property

Public Sub IncreaseSize(sizeFactor As Variant)
'method of threadGroup which adds more tasks to the queue, and immediately runs them
'size factor is either an iterable item, or an integer to dictate the size

'Check whether size is set yet
If Me.Size = 0 Then
Err.Raise 5, Description:="You must set Size before you can IncreaseSize"
End If

'check whether new data matches old type
If IsNumeric(sizeFactor) Then
If passesArguments Then
Err.Raise 5, Description:="Size factor type doesn't match original type"
Else
'is numeric and was numeric, grand
iterableSize = iterableSize + CLng(sizeFactor)
End If
ElseIf isIterable(sizeFactor) Then
If passesArguments Then
'was iterable and still is, great!
Dim itemsAdded As Long
itemsAdded = addIterableToQueue(sizeFactor, iterableQueue)
iterableSize = iterableSize + itemsAdded
Else
'wasn't iterble, now is
Err.Raise 5, Description:="Size factor type doesn't match original type"
End If

Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
Me.Execute
End Sub

Public Property Set AsyncClass(ByVal workObj As IWorker) 'Set the worker who carries out the tasks
Set workerClass = workObj
End Property

Public Property Get AsyncClass() As IWorker
Set AsyncClass = workerClass
End Property

Public Property Get Size() As Variant
Size = iterableSize
End Property

Public Property Let autoQuit(ByVal value As Boolean)
autoQuitEnabled = value
End Property

Public Property Get NewHandleID() As String
NewHandleID = "Handle " & newThreadIndex
newThreadIndex = newThreadIndex + 1 'use next one next time
End Property

Private Property Get NewTaskID() As String
'generates new task, saves its ID to taskIDset, then bumps the task counter along one
NewTaskID = "Task " & newTaskIndex
taskIDset.Add newTaskIndex, NewTaskID 'add id to map
newTaskIndex = newTaskIndex + 1
End Property

Private Sub Class_Terminate()
'Set threadGroup = Nothing
Debug.Print "Terminating group"
RaiseEvent Closed(Now - startTime)
End Sub

Public Sub printState() 'for debugging
Debug.Print _
"State:"; vbCrLf _
; Space(5); "Threads open: "; openThreadCount; vbCrLf _
; Space(5); "Threads in use: "; openTaskCount; vbCrLf _
; Space(5); "Threads marked as free: "; freeThreads.Count; vbCrLf _
; Space(5); "Tasks remaining: "; iterableSize - successfulTaskCount; vbCrLf _
; Space(5); "Next task index: "; newTaskIndex
End Sub


Its key methods are doInstruction (calling getInstruction) and Size and IncreaseSize



The class runs iteratively; each cycle the class finds out what to do and executes that (doInstruction). doInstruction always calls itself unless it's told to do nothing, which allows the call stack to shrink back. There are several options for what to do each cycle



  • Open a thread (create a new instance of clsThreadHandle and add to a collection of possible places to run tasks)

  • Close a thread (quit a handle and remove it from that collection)

  • Run a task on a thread

  • [Force quit a task - t.b. implemented]

  • Do Nothing (allow call stack to go back to zero)

The getInstruction method will tell the class to



  • Open a thread if it doesn't exceed a max count, and if there are tasks to run on it

  • Close a thread if there are no tasks left to run or if there are too many

  • Run a task on a thread if there's a thread marked free

  • Do nothing if there are no threads free, and there are the right number of threads open

Size is what dictates the number of tasks to carry out



  • If Size is numeric, the class will keep running tasks on threads until that number of tasks is run

  • If Sizeis iterable, then the class will keep running tasks and passing arguments by essentially For...Eaching through the iterable argument

    • This allows something like a url to be passed as an argument to each task, or even a range so that the worker knows where on the sheet to write its result to


IncreaseSize is like Size; it is useful if you want to drip feed tasks into the multithread set (say you are daisy chaining one onto the other using the first one's threadComplete events). It increases the size of the numeric/iterable argument.



Thread Handles clsThreadHandle



The main class creates multiple instances of this thread handle class.



Option Explicit


'THREAD HANDLE BASE PROPERTIES
Private eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
Private taskID As String 'holds the id of the current task
Private handleID As String 'holds the id of this handle
Private handleArgs As Variant 'holds any arguments that need to be passed to the task

'THREAD EVENTS
Private WithEvents workerEvents As IWorkerEvents
Private workerObject As IWorker 'interface to whatever worker may be passed to thread


Private Sub workerEvents_Complete(returnVal As Variant)
eventHandle.NotifyComplete Me, returnVal
End Sub

Private Sub workerEvents_Started()
Debug.Print Me.Task; " started event was raised"
End Sub

Public Property Set Worker(ByVal workObj As IWorker)
Set workerObject = workObj.CreateNew 'set worker to be a copy of the passed one
Set workerEvents = New IWorkerEvents 'create event handler
Set workerObject.Events = workerEvents 'pass it to the worker so it can listen in
End Property

Public Sub OpenHandle(multiThreadGroup As clsMultiThread, delegate As clsHandleEvents)
'called when the handle is opened, sets the reference IDs of the string and the handle, as well as parent g
Set eventHandle = delegate
handleID = multiThreadGroup.NewHandleID
eventHandle.NotifyThreadOpened (Name)
Debug.Print Name; " was opened"
End Sub

Public Sub Execute(Optional args As Variant)
Debug.Print Task; " executed on "; Name; " with "; IIf(IsMissing(args), "no arguments", "some arguments")

workerObject.Execute args 'run the event
End Sub

Public Property Get Task() As String
Task = taskID
End Property

Public Property Let Task(val As String)
taskID = val
Debug.Print Name; "'s task was set to "; taskID
End Property

Public Property Get Name() As String
Name = handleID
End Property

Private Sub Class_Initialize()
Debug.Print "I'm made"
End Sub

Private Sub Class_Terminate()
eventHandle.NotifyThreadClosed (Me.Name)
Set eventHandle = Nothing
Set workerObject = Nothing
End Sub

Private Sub workerEvents_StatusChange(statusVal As Variant)
'not yet implemented, probably unnecessary
End Sub


I've chosen individual event handlers rather than a single one (like I did with clsHandleEvents) because



  • I find having an individual thread class for each task/worker object easier to picture mentally

  • I intend to add a functionality where a worker can cache objects in its parent handle (such as an InternetExplorer application) to save re-initialising it between successive tasks on the same thread

    • Having a single cache for each thread makes this simpler


Handle Events class clsHandleEvents



A reference to this class is held by each thread so that it can raise an event to the multiThread class, without directly holding a reference to it (this would mess up garbage collection I think)



Option Explicit
'class to convert calls from the thread handle into events which the multi thread group can tap into

Public Event Complete(obj As clsThreadHandle, returnVal As Variant)
Public Event Opened(threadID As String) 'when thread is actually opened
Public Event Closed(threadID As String) 'when thread is closed

Public Sub NotifyComplete(obj As clsThreadHandle, Optional returnVal As Variant)
RaiseEvent Complete(obj, returnVal)
End Sub

Public Sub NotifyThreadOpened(threadID As String)
RaiseEvent Opened(threadID)
End Sub

Public Sub NotifyThreadClosed(threadID As String)
RaiseEvent Closed(threadID)
End Sub

Private Sub Class_Terminate()
Debug.Print "Events Terminated"
End Sub


Interfaces



There are 2 interface classes (well onlyIWorker is really one, but I'm calling IWorkerEvents one too, similar to this example)



IWorker forms a general template of an asynchronous process you can run, which raises appropriate events as per IWorkerEvents



IWorker



Option Explicit

'class acts as interface for any thread task
'Execute runs the task
'Events are raised by the task if it interfaces properly

Public Property Set Events(ByRef value As IWorkerEvents)
End Property

Public Sub Execute(Optional argument As Variant)
End Sub

Public Function CreateNew() As IWorker
End Function


IWorkerEvents



Option Explicit

'class holds all the events that a thread task can raise

Public Event Complete(returnVal As Variant)
Public Event StatusChange(statusVal As Variant)
Public Event Started()

Public Sub Complete(Optional returnVal As Variant)
RaiseEvent Complete(returnVal)
End Sub

Public Sub StatusChange(statusVal As Variant)
RaiseEvent StatusChange(statusVal)
End Sub

Public Sub Started()
RaiseEvent Started
End Sub


Finally...



There's a module of supplementary functions that I don't especially need review on, but I'll include as they are required for the clsMultiThread to execute



Option Explicit

Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue) As Long
'function to take iterable group and add it to the queue
'returns the number of items added
Dim item As Variant
Dim itemsAdded As Long
itemsAdded = 0
For Each item In iterator
resultQueue.enqueue item
itemsAdded = itemsAdded + 1
Next item
addIterableToQueue = itemsAdded
End Function

Function isIterable(obj As Variant) As Boolean
On Error Resume Next
Dim iterator As Variant
For Each iterator In obj
Exit For
Next
isIterable = Err.Number = 0
End Function



Test code



Don't need feedback on this stuff, except with regard to the way in which a worker is implemented Download example file here



It just occurred to me that I hadn't actually included a worker to test this with. Well here's an example that uses an MSHTML request to return an HTML document from a webpage. It takes a String/Range argument representing a url, and returns an HTMLDocument. NB, this has to be imported as it requires Attribute .VB_UserMemId = 0 as per this article



VERSION 1.0 CLASS
BEGIN
MultiUse = -1 'True
END
Attribute VB_Name = "clsHtmlWorker"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit
'''
'Basic worker object sends MSHTML GET request to webpage and returns an HTMLDocument or Nothing
'Requires reference to
' Microsoft HTML Object library (mshtml.tlb)
' Microsoft XML, v6.0 (msxml6.dll)
'''

Private httpRequest As MSXML2.XMLHTTP60

Implements IWorker

Private Type TWorker
Events As IWorkerEvents
End Type

Private this As TWorker

Private Function IWorker_CreateNew() As IWorker
Set IWorker_CreateNew = New clsHtmlWorker
End Function

Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property

Private Sub IWorker_Execute(Optional argument As Variant)
Started 'raise event to thread handle
'Do some task
sendRequest argument
End Sub

'''
'Event raising
'''

Private Sub Started()
If Not this.Events Is Nothing Then
this.Events.Started
End If
End Sub

Private Sub statusChange(ByVal statusText As String)
If Not this.Events Is Nothing Then
'status change is not fully implemented yet in clsMultiThread, I may get rid of it
this.Events.statusChange statusText
End If
End Sub

Private Sub Complete(Optional ByVal resultPage As HTMLDocument)
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
If Not this.Events Is Nothing Then
this.Events.Complete resultPage
End If
End Sub

Private Sub sendRequest(ByVal url As String)
'''
'Sub to open a new XMLHTTP request at a given url
'Also assigns OnReadyStateChange callback function to this class' default routine
'''

If httpRequest Is Nothing Then Set httpRequest = New MSXML2.XMLHTTP60

With httpRequest
'Assign callback function to handler class (by default property)
.OnReadyStateChange = Me
'open and send the request
.Open "GET", url, True
.send vbNullString
End With

End Sub

Public Sub OnReadyStateChange()
Attribute OnReadyStateChange.VB_UserMemId = 0
'''
'This is the default callback routine of the class
'''
With httpRequest

statusChange .statusText

If .ReadyState = 4 Then 'loaded
If .Status = 200 Then 'successful
'mark complete and pass document
Dim htmlDoc As HTMLDocument
Set htmlDoc = New HTMLDocument
htmlDoc.body.innerHTML = .responseText
Complete htmlDoc
Else 'unsuccessful
Complete
End If
End If
End With
End Sub

Private Sub Class_Terminate()
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
End Sub


A multi-thread group implementing it can be run in a caller class like codeReviewTest. Sends requests to urls in A1:A10, returns e-mails from those urls in neighbouring columns.



Option Explicit

'''
'This class creates and runs a new multithread instance which runs clsHtmlWorker
'When each HTMLDocument is complete, the class scans it for e-mails
'''
Private WithEvents multiThreadGroup As clsMultiThread
'clsMultiThread is async so must be declared separately (or in a doEvents loop)
Private Const REGEX_PATTERN As String = _
"(?:[a-z0-9!#$%&'*+/=?^_`~-]+(?:.[a-z0-9!#$%&'*+/=?^_`~-]+)*|""(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|\[x01-x09x0bx0cx0e-x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[(?:(?:(2(5[0-5]|[0-4])|1[0-9]|[0-9]?[0-9])).)3(?:(2(5[1-9]|[0-9])|1[0-5]|[0-4]?[0-9])|[0-9]*[0-9]:(?:[1-9]|\[0-9])+)])"

Public Sub run()
'urls to check for emails are in a1:a10
htmlRequestToUrls [a1:a10]
End Sub

Private Sub htmlRequestToUrls(urlCells As Range)

Set multiThreadGroup = New clsMultiThread
With multiThreadGroup
.Size = urlCells 'set iterable, here a load of urls
Set .AsyncClass = New clsHtmlWorker 'set async worker
.Execute 'run the group
End With

End Sub

Private Sub multiThreadGroup_TaskComplete(returnVal As Variant, taskID As String, threadID As String)

Dim rowI As Long, colI As Long
rowI = Right(taskID, Len(taskID) - 4)

If returnVal Is Nothing Then
Cells(rowI, 2) = "Error in loading page"
ElseIf TypeOf returnVal Is HTMLDocument Then
Dim emailMatches() As String
emailMatches = regexMatches(returnVal.body.innerText)
If (Not emailMatches) = -1 Then
'no emails on page
Cells(rowI, 2) = "No e-mail matches"
Else
For colI = LBound(emailMatches) To UBound(emailMatches)
Cells(rowI, colI + 2) = emailMatches(colI)
Next colI
End If
Else 'nothing returned
Cells(rowI, 2) = "Error in loading page"
End If


End Sub

Private Function regexMatches(strInput As String) As String()

Dim rMatch As Object
Dim s As String
Dim arrayMatches() As String
Dim i As Long

With CreateObject("VBScript.Regexp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = REGEX_PATTERN
If .test(strInput) Then
For Each rMatch In .Execute(strInput)
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.value
i = i + 1
Next
End If
End With

regexMatches = arrayMatches

End Function


The test class will create a new multi thread group. The group will open the default 5 threads, on each thread it will create an instance of clsHtmlWorker. It will convert the range [A1:A10] into 10 arguments which it will pass, 1 at a time, to the workers on each thread when they aren't busy. Once all tasks are run the class will autoQuit - cutting references to all sub classes, allowing it to go out of scope



You can download the example workbook if you want, works best with Rubberduck to organise folders. Test code is in the CodeReviewTestRunner, or just hit the big button







share|improve this question





















  • I'm wondering why we have comments saying reference to mscorlib.dll is needed yet I see no CreateObject or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?
    – this
    Jan 16 at 12:52










  • @This 1) mscorlib.dll I'm using early binding, and Private iterableQueue As mscorlib.Queue in clsMultiThread is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances of EXCEL.EXE. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes sense
    – Greedo
    Jan 16 at 14:06











  • I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
    – M.Doerner
    Jan 16 at 14:52










  • @M.Doerner Exactly, an InternetExplorer.Application or an XmlHttprequest (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use of IWorker interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approach
    – Greedo
    Jan 16 at 15:25







  • 1




    @Raystafarian I've added a download file in the test area if that helps
    – Greedo
    Feb 12 at 16:16
















up vote
31
down vote

favorite
4












Intro





Multithreading tools exist in Excel - often to run macros in multiple instances of Excel, or to convert macros to vbscripts that can run independently. However I've often come across projects where I'd like to delegate multiple tasks out to async processes, and creating multiple instances of Excel itself is overkill for this.



After running into several projects where I needed to execute multiple async requests (think internet scraping, or radioactive decay simulation) I decided to make a general class that, when given an async process, can execute and handle several in parallel.




For example, this Daisy Test makes a multithread group which sends html requests to all the urls in column B. The first link of these google searches are returned to column Cin the order the responses arrive. That triggers a second group (daisy chained onto the 1st one's events) to send internet explorer request to that url, and these are returned in D



Example implementation



To clarify some of the comments; it should be noted that these requests are sent off in order (B1,B2,...), but return unordered (C2 first). That's because my class allows the threads to run in parallel (hence multithreading). These are still only managed in a single Excel thread, but the requests are asynchronous and in different processes so are effectively running in other threads.



Summary



N.B. The term 'thread' will be used loosely here, with no reference to the actual processor. Instead, when I say 'thread' I am talking about a handler for a task (which is running in parallel to other tasks on other handlers)



The multi thread setup consists of a main clsMultiThread parent class which controls the shape of the multi thread collection (i.e. how many tasks are running at any given time), as well as several clsThreadHandle classes.



These thread handlers are each responsible for running an async task, and informing the parent multithread class when each task is finished.



Internally, the tasks are actually run by WorkerClass objects, one worker for each thread. These receive input arguments from their parent threads, run their respective async task, and raise an event to their parent clsThreadHandle when finished. The thread handle then passes this event, and any optional return value, back up to the main clsMultiThread, whose job it is either to close the thread once it's done, or prompt the thread to run another task. The chain of command is summarised in the image below:



Chain of command



Feedback I'm after



  • General feedback on structure, event handling, use of interfaces etc.

  • Proper exiting (which I don't think I'm doing right now)

  • User-side interface

    • Whether this approach to the problem is appropriate and intuitive (passing worker classes etc.)

    • Whether I'm missing some functionality that should be there


This is also the first project I've ever done with the primary aim of making something I can re-use (as well as the longest & most complicated bit of code I've written). For that reason I'd also greatly appreciate any comments on



  • Coding style

  • Use of comments

  • Anything else I should bear in mind when working on such projects

Implementation



Main class clsMultiThread



Right, some code. Here's the main class which handles all the sub-classes



Option Explicit

'''
'VBA class to run multiple asynchronous processes
'Interfaces directly with clsThreadHandle
'Requires references to:
'mscrolib.dll
'''

'THREAD GROUP SHAPE PROPERTIES
Private threadGroup As New Collection 'holds all the treads
Private maxThreads As Long 'maximum number of threads that can be open
Private minThreads As Long '[minimum number of threads]
Private iterableQueue As mscorlib.Queue 'this item holds all the items from iterator set in queue
'replaces iterableGroup, newtaskindex, taskidset
Private iterableSize As Long 'number of items in iterable group or
Private passesArguments As Boolean 'true if iterableGroup exists

'THREAD GROUP REFERENCES
Private WithEvents threadEvents As clsHandleEvents 'Event object to raise events from each thread handle
Private workerClass As IWorker

'THREAD GROUP SETTINGS
Private autoQuitEnabled As Boolean 'dictates whether to quit on Complete event, should be false if daisychaining

'THREAD GROUP STATE PROPERTIES
Private openThreadCount As Long 'number of threads/handles currently open
Private openTaskCount As Long 'number of tasks running on those threads
Private closedTaskCount As Long 'number of threads closed (failed and successful)
Private successfulTaskCount As Long 'number of threads completed sucessfully
Private newThreadIndex As Long 'Iterator over handles (next new handle)
Private newTaskIndex As Long 'Iterator over open tasks (next thread to be started)
Private taskIDset As Collection 'Dictionary mapping taskIDs to iterableGroup location "REPLACE THIS. MERGE COLLECTION JUMBLES"
Private freeThreads As Collection 'holds all the free thread ids

'THREAD GROUP PERFORMANCE PROPERTIES
Private startTime As Date
'Private endTime As Date

'THREAD GROUP EVENTS
Public Event TaskComplete(returnVal As Variant, taskID As String, threadID As String) 'when a task is complete on a thread, maybe if failed
Public Event ThreadOpened(threadCount As Long, threadID As String) 'when a thread is opened, pass the new number of threads
Public Event ThreadClosed(threadCount As Long, threadID As String) 'when closed, pass closed thread ID
Public Event Complete(timeTaken As Date) 'when everything is (nearly) finished
Public Event Closed(timeTaken As Date) 'when entire group is closed
Public Event Opened(startTime As Date) 'when entire group is closed

'PRIVATE TYPES/ENUMS
Private Type Instruction 'instruction on what to do next, and any necessary arguments that can be passed
threadID As String
instructionBody As InstructionType
End Type

Private Enum InstructionType
mltCloseThread
mltOpenThread
mltSetTask
mltDoNothing
mltQuit
End Enum

Private Sub Class_Initialize()
'Set defaults
maxThreads = 5
minThreads = 1
newThreadIndex = 1
newTaskIndex = 1
autoQuitEnabled = True
Set threadEvents = New clsHandleEvents
Set taskIDset = New Collection
Set freeThreads = New Collection
startTime = Now
RaiseEvent Opened(startTime)
'''
'Test space
'''
End Sub

Private Sub threadEvents_Closed(threadID As String)
RaiseEvent ThreadClosed(openThreadCount, threadID)
End Sub

Private Sub threadEvents_Opened(threadID As String)
RaiseEvent ThreadOpened(openThreadCount, threadID)
End Sub

Private Sub threadEvents_Complete(obj As clsThreadHandle, returnVal As Variant) 'called when thread becomes free
'DO NOT mark as free here
RaiseEvent TaskComplete(returnVal, obj.Task, obj.Name) 'failed as boolean
openTaskCount = openTaskCount - 1
closedTaskCount = closedTaskCount + 1
successfulTaskCount = successfulTaskCount + 1 'could be unsuccessful too though
doInstructions obj.Name 'pass object name so it can be marked free
' If failed Then
' failedTaskCount = failedTaskCount + 1
' Else
' successfulTaskCount = successfulTaskCount + 1
' End If
End Sub

Public Sub Execute()
'check validity of user data, if valid, then execute task
If iterableSize = 0 Then
Err.Raise 5, Description:="You must set size argument to a non-zero value, or a non-empty iterable first"
ElseIf workerClass Is Nothing Then
Err.Raise 5, Description:="You must set the async class argument first"
Else
doInstructions
End If
End Sub

Public Sub Quit()
'Remove any references that would prevent proper closing
'Default automatically called when openThreadCount = 0
RaiseEvent Complete(Now - startTime)
Set threadEvents = Nothing
End Sub

Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction

'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID

'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy

BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If

Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select

'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
doInstructions loopcount:=loopcount + 1 'watch for infinite loop
End If

End Sub

Private Function getInstruction() As Instruction
'function to determine what action to take next
'called until do nothing returned
'caller to doinstructions can specify a free thread in which case some parts skipped
Dim results As Instruction 'variable to hold instruction and any arguments

Me.printState

'Do we need to open or close threads?
'Threads free? (threads open > tasks open):
If openThreadCount > openTaskCount Then
'Great we have a free thread, now use it or delete it (cos we have too many or no tasks remaining)
If newTaskIndex > iterableSize Then 'already passed all tasks
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
ElseIf openThreadCount <= maxThreads Then
'[find] & use free thread (run a task on it)
results.instructionBody = mltSetTask
results.threadID = freeThread
Else
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
End If
Else
'No threads free, either open one (if not exceeding max, and there's a task left to put on it)
'Or do nothing (can't close it if not free, shouldn't open new if no more tasks)
If openThreadCount < maxThreads And newTaskIndex <= iterableSize Then
results.instructionBody = mltOpenThread
ElseIf openThreadCount = 0 And autoQuitEnabled Then
results.instructionBody = mltQuit
Else
results.instructionBody = mltDoNothing
End If
End If

getInstruction = results
End Function

Private Sub openThread()
'opens a thread and assigns a task ID to it
Dim newThread As New clsThreadHandle 'create new handle
newThread.OpenHandle Me, threadEvents 'passes parent reference which allows handle to obtain thread ID
threadGroup.Add newThread, newThread.Name 'add it to the group with a new id (set by itself)
openThreadCount = openThreadCount + 1
freeThread = newThread.Name 'mark as free so task can be assigned to it

End Sub

Private Property Let freeThread(threadID As String)
'NOT WORKING"""""
'when a thread comes free, add it to the collection
freeThreads.Add threadID, threadID
Debug.Print threadID; " marked as free; now"; freeThreads.Count; "threads are free"
End Property

Private Property Let BusyThread(threadID As String)
'when a thread is not free or is closed, mark as busy by removing from free group
On Error Resume Next 'only remove ones what are there actually
freeThreads.Remove threadID
Debug.Print threadID; " marked as busy"; IIf(Err.Number <> 0, ", but wasn't in free group", vbNullString)
End Property

Private Property Get freeThread() As String
'gives up a free thread and adds it to the list
freeThread = freeThreads(1)
freeThreads.Remove (1)
End Property

Private Sub assignTaskID(threadID As String)
'@Ignore WriteOnlyProperty
'assigns task ID to thread
'nb does NOT actually run the task (this is instruction stage still)
Dim newThread As clsThreadHandle
Set newThread = threadGroup(threadID)
newThread.Task = NewTaskID
Set newThread.Worker = AsyncClass
End Sub

Private Sub closeThread(threadID As String, Optional failed As Boolean = False)

'close thread with appropriate id
Dim oldThread As clsThreadHandle
Set oldThread = threadGroup(threadID)
'remove from all collections
'taskIDset.Remove oldThread.Task remove from task id set if it was in there
threadGroup.Remove oldThread.Name
BusyThread = oldThread.Name 'remove from free collection
Set oldThread = Nothing
'iterate counters
openThreadCount = openThreadCount - 1
End Sub

Public Property Let Size(sizeFactor As Variant)
'property of the thread group which dictates how many processes to run in total
'size factor is either an iterable item, or an integer to dictate the size

'Check if size factor is number
If IsNumeric(sizeFactor) Then
'If so, size is that
iterableSize = CLng(sizeFactor)
passesArguments = False 'no argument to pass to thread, just run it a load of times

'If not, *check if iterable
ElseIf isIterable(sizeFactor) Then
'If so, size is size of collection from extration
Set iterableQueue = New Queue
iterableSize = addIterableToQueue(sizeFactor, iterableQueue)
passesArguments = True
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If

End Property

Public Sub IncreaseSize(sizeFactor As Variant)
'method of threadGroup which adds more tasks to the queue, and immediately runs them
'size factor is either an iterable item, or an integer to dictate the size

'Check whether size is set yet
If Me.Size = 0 Then
Err.Raise 5, Description:="You must set Size before you can IncreaseSize"
End If

'check whether new data matches old type
If IsNumeric(sizeFactor) Then
If passesArguments Then
Err.Raise 5, Description:="Size factor type doesn't match original type"
Else
'is numeric and was numeric, grand
iterableSize = iterableSize + CLng(sizeFactor)
End If
ElseIf isIterable(sizeFactor) Then
If passesArguments Then
'was iterable and still is, great!
Dim itemsAdded As Long
itemsAdded = addIterableToQueue(sizeFactor, iterableQueue)
iterableSize = iterableSize + itemsAdded
Else
'wasn't iterble, now is
Err.Raise 5, Description:="Size factor type doesn't match original type"
End If

Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
Me.Execute
End Sub

Public Property Set AsyncClass(ByVal workObj As IWorker) 'Set the worker who carries out the tasks
Set workerClass = workObj
End Property

Public Property Get AsyncClass() As IWorker
Set AsyncClass = workerClass
End Property

Public Property Get Size() As Variant
Size = iterableSize
End Property

Public Property Let autoQuit(ByVal value As Boolean)
autoQuitEnabled = value
End Property

Public Property Get NewHandleID() As String
NewHandleID = "Handle " & newThreadIndex
newThreadIndex = newThreadIndex + 1 'use next one next time
End Property

Private Property Get NewTaskID() As String
'generates new task, saves its ID to taskIDset, then bumps the task counter along one
NewTaskID = "Task " & newTaskIndex
taskIDset.Add newTaskIndex, NewTaskID 'add id to map
newTaskIndex = newTaskIndex + 1
End Property

Private Sub Class_Terminate()
'Set threadGroup = Nothing
Debug.Print "Terminating group"
RaiseEvent Closed(Now - startTime)
End Sub

Public Sub printState() 'for debugging
Debug.Print _
"State:"; vbCrLf _
; Space(5); "Threads open: "; openThreadCount; vbCrLf _
; Space(5); "Threads in use: "; openTaskCount; vbCrLf _
; Space(5); "Threads marked as free: "; freeThreads.Count; vbCrLf _
; Space(5); "Tasks remaining: "; iterableSize - successfulTaskCount; vbCrLf _
; Space(5); "Next task index: "; newTaskIndex
End Sub


Its key methods are doInstruction (calling getInstruction) and Size and IncreaseSize



The class runs iteratively; each cycle the class finds out what to do and executes that (doInstruction). doInstruction always calls itself unless it's told to do nothing, which allows the call stack to shrink back. There are several options for what to do each cycle



  • Open a thread (create a new instance of clsThreadHandle and add to a collection of possible places to run tasks)

  • Close a thread (quit a handle and remove it from that collection)

  • Run a task on a thread

  • [Force quit a task - t.b. implemented]

  • Do Nothing (allow call stack to go back to zero)

The getInstruction method will tell the class to



  • Open a thread if it doesn't exceed a max count, and if there are tasks to run on it

  • Close a thread if there are no tasks left to run or if there are too many

  • Run a task on a thread if there's a thread marked free

  • Do nothing if there are no threads free, and there are the right number of threads open

Size is what dictates the number of tasks to carry out



  • If Size is numeric, the class will keep running tasks on threads until that number of tasks is run

  • If Sizeis iterable, then the class will keep running tasks and passing arguments by essentially For...Eaching through the iterable argument

    • This allows something like a url to be passed as an argument to each task, or even a range so that the worker knows where on the sheet to write its result to


IncreaseSize is like Size; it is useful if you want to drip feed tasks into the multithread set (say you are daisy chaining one onto the other using the first one's threadComplete events). It increases the size of the numeric/iterable argument.



Thread Handles clsThreadHandle



The main class creates multiple instances of this thread handle class.



Option Explicit


'THREAD HANDLE BASE PROPERTIES
Private eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
Private taskID As String 'holds the id of the current task
Private handleID As String 'holds the id of this handle
Private handleArgs As Variant 'holds any arguments that need to be passed to the task

'THREAD EVENTS
Private WithEvents workerEvents As IWorkerEvents
Private workerObject As IWorker 'interface to whatever worker may be passed to thread


Private Sub workerEvents_Complete(returnVal As Variant)
eventHandle.NotifyComplete Me, returnVal
End Sub

Private Sub workerEvents_Started()
Debug.Print Me.Task; " started event was raised"
End Sub

Public Property Set Worker(ByVal workObj As IWorker)
Set workerObject = workObj.CreateNew 'set worker to be a copy of the passed one
Set workerEvents = New IWorkerEvents 'create event handler
Set workerObject.Events = workerEvents 'pass it to the worker so it can listen in
End Property

Public Sub OpenHandle(multiThreadGroup As clsMultiThread, delegate As clsHandleEvents)
'called when the handle is opened, sets the reference IDs of the string and the handle, as well as parent g
Set eventHandle = delegate
handleID = multiThreadGroup.NewHandleID
eventHandle.NotifyThreadOpened (Name)
Debug.Print Name; " was opened"
End Sub

Public Sub Execute(Optional args As Variant)
Debug.Print Task; " executed on "; Name; " with "; IIf(IsMissing(args), "no arguments", "some arguments")

workerObject.Execute args 'run the event
End Sub

Public Property Get Task() As String
Task = taskID
End Property

Public Property Let Task(val As String)
taskID = val
Debug.Print Name; "'s task was set to "; taskID
End Property

Public Property Get Name() As String
Name = handleID
End Property

Private Sub Class_Initialize()
Debug.Print "I'm made"
End Sub

Private Sub Class_Terminate()
eventHandle.NotifyThreadClosed (Me.Name)
Set eventHandle = Nothing
Set workerObject = Nothing
End Sub

Private Sub workerEvents_StatusChange(statusVal As Variant)
'not yet implemented, probably unnecessary
End Sub


I've chosen individual event handlers rather than a single one (like I did with clsHandleEvents) because



  • I find having an individual thread class for each task/worker object easier to picture mentally

  • I intend to add a functionality where a worker can cache objects in its parent handle (such as an InternetExplorer application) to save re-initialising it between successive tasks on the same thread

    • Having a single cache for each thread makes this simpler


Handle Events class clsHandleEvents



A reference to this class is held by each thread so that it can raise an event to the multiThread class, without directly holding a reference to it (this would mess up garbage collection I think)



Option Explicit
'class to convert calls from the thread handle into events which the multi thread group can tap into

Public Event Complete(obj As clsThreadHandle, returnVal As Variant)
Public Event Opened(threadID As String) 'when thread is actually opened
Public Event Closed(threadID As String) 'when thread is closed

Public Sub NotifyComplete(obj As clsThreadHandle, Optional returnVal As Variant)
RaiseEvent Complete(obj, returnVal)
End Sub

Public Sub NotifyThreadOpened(threadID As String)
RaiseEvent Opened(threadID)
End Sub

Public Sub NotifyThreadClosed(threadID As String)
RaiseEvent Closed(threadID)
End Sub

Private Sub Class_Terminate()
Debug.Print "Events Terminated"
End Sub


Interfaces



There are 2 interface classes (well onlyIWorker is really one, but I'm calling IWorkerEvents one too, similar to this example)



IWorker forms a general template of an asynchronous process you can run, which raises appropriate events as per IWorkerEvents



IWorker



Option Explicit

'class acts as interface for any thread task
'Execute runs the task
'Events are raised by the task if it interfaces properly

Public Property Set Events(ByRef value As IWorkerEvents)
End Property

Public Sub Execute(Optional argument As Variant)
End Sub

Public Function CreateNew() As IWorker
End Function


IWorkerEvents



Option Explicit

'class holds all the events that a thread task can raise

Public Event Complete(returnVal As Variant)
Public Event StatusChange(statusVal As Variant)
Public Event Started()

Public Sub Complete(Optional returnVal As Variant)
RaiseEvent Complete(returnVal)
End Sub

Public Sub StatusChange(statusVal As Variant)
RaiseEvent StatusChange(statusVal)
End Sub

Public Sub Started()
RaiseEvent Started
End Sub


Finally...



There's a module of supplementary functions that I don't especially need review on, but I'll include as they are required for the clsMultiThread to execute



Option Explicit

Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue) As Long
'function to take iterable group and add it to the queue
'returns the number of items added
Dim item As Variant
Dim itemsAdded As Long
itemsAdded = 0
For Each item In iterator
resultQueue.enqueue item
itemsAdded = itemsAdded + 1
Next item
addIterableToQueue = itemsAdded
End Function

Function isIterable(obj As Variant) As Boolean
On Error Resume Next
Dim iterator As Variant
For Each iterator In obj
Exit For
Next
isIterable = Err.Number = 0
End Function



Test code



Don't need feedback on this stuff, except with regard to the way in which a worker is implemented Download example file here



It just occurred to me that I hadn't actually included a worker to test this with. Well here's an example that uses an MSHTML request to return an HTML document from a webpage. It takes a String/Range argument representing a url, and returns an HTMLDocument. NB, this has to be imported as it requires Attribute .VB_UserMemId = 0 as per this article



VERSION 1.0 CLASS
BEGIN
MultiUse = -1 'True
END
Attribute VB_Name = "clsHtmlWorker"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit
'''
'Basic worker object sends MSHTML GET request to webpage and returns an HTMLDocument or Nothing
'Requires reference to
' Microsoft HTML Object library (mshtml.tlb)
' Microsoft XML, v6.0 (msxml6.dll)
'''

Private httpRequest As MSXML2.XMLHTTP60

Implements IWorker

Private Type TWorker
Events As IWorkerEvents
End Type

Private this As TWorker

Private Function IWorker_CreateNew() As IWorker
Set IWorker_CreateNew = New clsHtmlWorker
End Function

Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property

Private Sub IWorker_Execute(Optional argument As Variant)
Started 'raise event to thread handle
'Do some task
sendRequest argument
End Sub

'''
'Event raising
'''

Private Sub Started()
If Not this.Events Is Nothing Then
this.Events.Started
End If
End Sub

Private Sub statusChange(ByVal statusText As String)
If Not this.Events Is Nothing Then
'status change is not fully implemented yet in clsMultiThread, I may get rid of it
this.Events.statusChange statusText
End If
End Sub

Private Sub Complete(Optional ByVal resultPage As HTMLDocument)
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
If Not this.Events Is Nothing Then
this.Events.Complete resultPage
End If
End Sub

Private Sub sendRequest(ByVal url As String)
'''
'Sub to open a new XMLHTTP request at a given url
'Also assigns OnReadyStateChange callback function to this class' default routine
'''

If httpRequest Is Nothing Then Set httpRequest = New MSXML2.XMLHTTP60

With httpRequest
'Assign callback function to handler class (by default property)
.OnReadyStateChange = Me
'open and send the request
.Open "GET", url, True
.send vbNullString
End With

End Sub

Public Sub OnReadyStateChange()
Attribute OnReadyStateChange.VB_UserMemId = 0
'''
'This is the default callback routine of the class
'''
With httpRequest

statusChange .statusText

If .ReadyState = 4 Then 'loaded
If .Status = 200 Then 'successful
'mark complete and pass document
Dim htmlDoc As HTMLDocument
Set htmlDoc = New HTMLDocument
htmlDoc.body.innerHTML = .responseText
Complete htmlDoc
Else 'unsuccessful
Complete
End If
End If
End With
End Sub

Private Sub Class_Terminate()
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
End Sub


A multi-thread group implementing it can be run in a caller class like codeReviewTest. Sends requests to urls in A1:A10, returns e-mails from those urls in neighbouring columns.



Option Explicit

'''
'This class creates and runs a new multithread instance which runs clsHtmlWorker
'When each HTMLDocument is complete, the class scans it for e-mails
'''
Private WithEvents multiThreadGroup As clsMultiThread
'clsMultiThread is async so must be declared separately (or in a doEvents loop)
Private Const REGEX_PATTERN As String = _
"(?:[a-z0-9!#$%&'*+/=?^_`~-]+(?:.[a-z0-9!#$%&'*+/=?^_`~-]+)*|""(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|\[x01-x09x0bx0cx0e-x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[(?:(?:(2(5[0-5]|[0-4])|1[0-9]|[0-9]?[0-9])).)3(?:(2(5[1-9]|[0-9])|1[0-5]|[0-4]?[0-9])|[0-9]*[0-9]:(?:[1-9]|\[0-9])+)])"

Public Sub run()
'urls to check for emails are in a1:a10
htmlRequestToUrls [a1:a10]
End Sub

Private Sub htmlRequestToUrls(urlCells As Range)

Set multiThreadGroup = New clsMultiThread
With multiThreadGroup
.Size = urlCells 'set iterable, here a load of urls
Set .AsyncClass = New clsHtmlWorker 'set async worker
.Execute 'run the group
End With

End Sub

Private Sub multiThreadGroup_TaskComplete(returnVal As Variant, taskID As String, threadID As String)

Dim rowI As Long, colI As Long
rowI = Right(taskID, Len(taskID) - 4)

If returnVal Is Nothing Then
Cells(rowI, 2) = "Error in loading page"
ElseIf TypeOf returnVal Is HTMLDocument Then
Dim emailMatches() As String
emailMatches = regexMatches(returnVal.body.innerText)
If (Not emailMatches) = -1 Then
'no emails on page
Cells(rowI, 2) = "No e-mail matches"
Else
For colI = LBound(emailMatches) To UBound(emailMatches)
Cells(rowI, colI + 2) = emailMatches(colI)
Next colI
End If
Else 'nothing returned
Cells(rowI, 2) = "Error in loading page"
End If


End Sub

Private Function regexMatches(strInput As String) As String()

Dim rMatch As Object
Dim s As String
Dim arrayMatches() As String
Dim i As Long

With CreateObject("VBScript.Regexp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = REGEX_PATTERN
If .test(strInput) Then
For Each rMatch In .Execute(strInput)
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.value
i = i + 1
Next
End If
End With

regexMatches = arrayMatches

End Function


The test class will create a new multi thread group. The group will open the default 5 threads, on each thread it will create an instance of clsHtmlWorker. It will convert the range [A1:A10] into 10 arguments which it will pass, 1 at a time, to the workers on each thread when they aren't busy. Once all tasks are run the class will autoQuit - cutting references to all sub classes, allowing it to go out of scope



You can download the example workbook if you want, works best with Rubberduck to organise folders. Test code is in the CodeReviewTestRunner, or just hit the big button







share|improve this question





















  • I'm wondering why we have comments saying reference to mscorlib.dll is needed yet I see no CreateObject or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?
    – this
    Jan 16 at 12:52










  • @This 1) mscorlib.dll I'm using early binding, and Private iterableQueue As mscorlib.Queue in clsMultiThread is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances of EXCEL.EXE. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes sense
    – Greedo
    Jan 16 at 14:06











  • I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
    – M.Doerner
    Jan 16 at 14:52










  • @M.Doerner Exactly, an InternetExplorer.Application or an XmlHttprequest (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use of IWorker interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approach
    – Greedo
    Jan 16 at 15:25







  • 1




    @Raystafarian I've added a download file in the test area if that helps
    – Greedo
    Feb 12 at 16:16












up vote
31
down vote

favorite
4









up vote
31
down vote

favorite
4






4





Intro





Multithreading tools exist in Excel - often to run macros in multiple instances of Excel, or to convert macros to vbscripts that can run independently. However I've often come across projects where I'd like to delegate multiple tasks out to async processes, and creating multiple instances of Excel itself is overkill for this.



After running into several projects where I needed to execute multiple async requests (think internet scraping, or radioactive decay simulation) I decided to make a general class that, when given an async process, can execute and handle several in parallel.




For example, this Daisy Test makes a multithread group which sends html requests to all the urls in column B. The first link of these google searches are returned to column Cin the order the responses arrive. That triggers a second group (daisy chained onto the 1st one's events) to send internet explorer request to that url, and these are returned in D



Example implementation



To clarify some of the comments; it should be noted that these requests are sent off in order (B1,B2,...), but return unordered (C2 first). That's because my class allows the threads to run in parallel (hence multithreading). These are still only managed in a single Excel thread, but the requests are asynchronous and in different processes so are effectively running in other threads.



Summary



N.B. The term 'thread' will be used loosely here, with no reference to the actual processor. Instead, when I say 'thread' I am talking about a handler for a task (which is running in parallel to other tasks on other handlers)



The multi thread setup consists of a main clsMultiThread parent class which controls the shape of the multi thread collection (i.e. how many tasks are running at any given time), as well as several clsThreadHandle classes.



These thread handlers are each responsible for running an async task, and informing the parent multithread class when each task is finished.



Internally, the tasks are actually run by WorkerClass objects, one worker for each thread. These receive input arguments from their parent threads, run their respective async task, and raise an event to their parent clsThreadHandle when finished. The thread handle then passes this event, and any optional return value, back up to the main clsMultiThread, whose job it is either to close the thread once it's done, or prompt the thread to run another task. The chain of command is summarised in the image below:



Chain of command



Feedback I'm after



  • General feedback on structure, event handling, use of interfaces etc.

  • Proper exiting (which I don't think I'm doing right now)

  • User-side interface

    • Whether this approach to the problem is appropriate and intuitive (passing worker classes etc.)

    • Whether I'm missing some functionality that should be there


This is also the first project I've ever done with the primary aim of making something I can re-use (as well as the longest & most complicated bit of code I've written). For that reason I'd also greatly appreciate any comments on



  • Coding style

  • Use of comments

  • Anything else I should bear in mind when working on such projects

Implementation



Main class clsMultiThread



Right, some code. Here's the main class which handles all the sub-classes



Option Explicit

'''
'VBA class to run multiple asynchronous processes
'Interfaces directly with clsThreadHandle
'Requires references to:
'mscrolib.dll
'''

'THREAD GROUP SHAPE PROPERTIES
Private threadGroup As New Collection 'holds all the treads
Private maxThreads As Long 'maximum number of threads that can be open
Private minThreads As Long '[minimum number of threads]
Private iterableQueue As mscorlib.Queue 'this item holds all the items from iterator set in queue
'replaces iterableGroup, newtaskindex, taskidset
Private iterableSize As Long 'number of items in iterable group or
Private passesArguments As Boolean 'true if iterableGroup exists

'THREAD GROUP REFERENCES
Private WithEvents threadEvents As clsHandleEvents 'Event object to raise events from each thread handle
Private workerClass As IWorker

'THREAD GROUP SETTINGS
Private autoQuitEnabled As Boolean 'dictates whether to quit on Complete event, should be false if daisychaining

'THREAD GROUP STATE PROPERTIES
Private openThreadCount As Long 'number of threads/handles currently open
Private openTaskCount As Long 'number of tasks running on those threads
Private closedTaskCount As Long 'number of threads closed (failed and successful)
Private successfulTaskCount As Long 'number of threads completed sucessfully
Private newThreadIndex As Long 'Iterator over handles (next new handle)
Private newTaskIndex As Long 'Iterator over open tasks (next thread to be started)
Private taskIDset As Collection 'Dictionary mapping taskIDs to iterableGroup location "REPLACE THIS. MERGE COLLECTION JUMBLES"
Private freeThreads As Collection 'holds all the free thread ids

'THREAD GROUP PERFORMANCE PROPERTIES
Private startTime As Date
'Private endTime As Date

'THREAD GROUP EVENTS
Public Event TaskComplete(returnVal As Variant, taskID As String, threadID As String) 'when a task is complete on a thread, maybe if failed
Public Event ThreadOpened(threadCount As Long, threadID As String) 'when a thread is opened, pass the new number of threads
Public Event ThreadClosed(threadCount As Long, threadID As String) 'when closed, pass closed thread ID
Public Event Complete(timeTaken As Date) 'when everything is (nearly) finished
Public Event Closed(timeTaken As Date) 'when entire group is closed
Public Event Opened(startTime As Date) 'when entire group is closed

'PRIVATE TYPES/ENUMS
Private Type Instruction 'instruction on what to do next, and any necessary arguments that can be passed
threadID As String
instructionBody As InstructionType
End Type

Private Enum InstructionType
mltCloseThread
mltOpenThread
mltSetTask
mltDoNothing
mltQuit
End Enum

Private Sub Class_Initialize()
'Set defaults
maxThreads = 5
minThreads = 1
newThreadIndex = 1
newTaskIndex = 1
autoQuitEnabled = True
Set threadEvents = New clsHandleEvents
Set taskIDset = New Collection
Set freeThreads = New Collection
startTime = Now
RaiseEvent Opened(startTime)
'''
'Test space
'''
End Sub

Private Sub threadEvents_Closed(threadID As String)
RaiseEvent ThreadClosed(openThreadCount, threadID)
End Sub

Private Sub threadEvents_Opened(threadID As String)
RaiseEvent ThreadOpened(openThreadCount, threadID)
End Sub

Private Sub threadEvents_Complete(obj As clsThreadHandle, returnVal As Variant) 'called when thread becomes free
'DO NOT mark as free here
RaiseEvent TaskComplete(returnVal, obj.Task, obj.Name) 'failed as boolean
openTaskCount = openTaskCount - 1
closedTaskCount = closedTaskCount + 1
successfulTaskCount = successfulTaskCount + 1 'could be unsuccessful too though
doInstructions obj.Name 'pass object name so it can be marked free
' If failed Then
' failedTaskCount = failedTaskCount + 1
' Else
' successfulTaskCount = successfulTaskCount + 1
' End If
End Sub

Public Sub Execute()
'check validity of user data, if valid, then execute task
If iterableSize = 0 Then
Err.Raise 5, Description:="You must set size argument to a non-zero value, or a non-empty iterable first"
ElseIf workerClass Is Nothing Then
Err.Raise 5, Description:="You must set the async class argument first"
Else
doInstructions
End If
End Sub

Public Sub Quit()
'Remove any references that would prevent proper closing
'Default automatically called when openThreadCount = 0
RaiseEvent Complete(Now - startTime)
Set threadEvents = Nothing
End Sub

Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction

'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID

'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy

BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If

Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select

'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
doInstructions loopcount:=loopcount + 1 'watch for infinite loop
End If

End Sub

Private Function getInstruction() As Instruction
'function to determine what action to take next
'called until do nothing returned
'caller to doinstructions can specify a free thread in which case some parts skipped
Dim results As Instruction 'variable to hold instruction and any arguments

Me.printState

'Do we need to open or close threads?
'Threads free? (threads open > tasks open):
If openThreadCount > openTaskCount Then
'Great we have a free thread, now use it or delete it (cos we have too many or no tasks remaining)
If newTaskIndex > iterableSize Then 'already passed all tasks
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
ElseIf openThreadCount <= maxThreads Then
'[find] & use free thread (run a task on it)
results.instructionBody = mltSetTask
results.threadID = freeThread
Else
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
End If
Else
'No threads free, either open one (if not exceeding max, and there's a task left to put on it)
'Or do nothing (can't close it if not free, shouldn't open new if no more tasks)
If openThreadCount < maxThreads And newTaskIndex <= iterableSize Then
results.instructionBody = mltOpenThread
ElseIf openThreadCount = 0 And autoQuitEnabled Then
results.instructionBody = mltQuit
Else
results.instructionBody = mltDoNothing
End If
End If

getInstruction = results
End Function

Private Sub openThread()
'opens a thread and assigns a task ID to it
Dim newThread As New clsThreadHandle 'create new handle
newThread.OpenHandle Me, threadEvents 'passes parent reference which allows handle to obtain thread ID
threadGroup.Add newThread, newThread.Name 'add it to the group with a new id (set by itself)
openThreadCount = openThreadCount + 1
freeThread = newThread.Name 'mark as free so task can be assigned to it

End Sub

Private Property Let freeThread(threadID As String)
'NOT WORKING"""""
'when a thread comes free, add it to the collection
freeThreads.Add threadID, threadID
Debug.Print threadID; " marked as free; now"; freeThreads.Count; "threads are free"
End Property

Private Property Let BusyThread(threadID As String)
'when a thread is not free or is closed, mark as busy by removing from free group
On Error Resume Next 'only remove ones what are there actually
freeThreads.Remove threadID
Debug.Print threadID; " marked as busy"; IIf(Err.Number <> 0, ", but wasn't in free group", vbNullString)
End Property

Private Property Get freeThread() As String
'gives up a free thread and adds it to the list
freeThread = freeThreads(1)
freeThreads.Remove (1)
End Property

Private Sub assignTaskID(threadID As String)
'@Ignore WriteOnlyProperty
'assigns task ID to thread
'nb does NOT actually run the task (this is instruction stage still)
Dim newThread As clsThreadHandle
Set newThread = threadGroup(threadID)
newThread.Task = NewTaskID
Set newThread.Worker = AsyncClass
End Sub

Private Sub closeThread(threadID As String, Optional failed As Boolean = False)

'close thread with appropriate id
Dim oldThread As clsThreadHandle
Set oldThread = threadGroup(threadID)
'remove from all collections
'taskIDset.Remove oldThread.Task remove from task id set if it was in there
threadGroup.Remove oldThread.Name
BusyThread = oldThread.Name 'remove from free collection
Set oldThread = Nothing
'iterate counters
openThreadCount = openThreadCount - 1
End Sub

Public Property Let Size(sizeFactor As Variant)
'property of the thread group which dictates how many processes to run in total
'size factor is either an iterable item, or an integer to dictate the size

'Check if size factor is number
If IsNumeric(sizeFactor) Then
'If so, size is that
iterableSize = CLng(sizeFactor)
passesArguments = False 'no argument to pass to thread, just run it a load of times

'If not, *check if iterable
ElseIf isIterable(sizeFactor) Then
'If so, size is size of collection from extration
Set iterableQueue = New Queue
iterableSize = addIterableToQueue(sizeFactor, iterableQueue)
passesArguments = True
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If

End Property

Public Sub IncreaseSize(sizeFactor As Variant)
'method of threadGroup which adds more tasks to the queue, and immediately runs them
'size factor is either an iterable item, or an integer to dictate the size

'Check whether size is set yet
If Me.Size = 0 Then
Err.Raise 5, Description:="You must set Size before you can IncreaseSize"
End If

'check whether new data matches old type
If IsNumeric(sizeFactor) Then
If passesArguments Then
Err.Raise 5, Description:="Size factor type doesn't match original type"
Else
'is numeric and was numeric, grand
iterableSize = iterableSize + CLng(sizeFactor)
End If
ElseIf isIterable(sizeFactor) Then
If passesArguments Then
'was iterable and still is, great!
Dim itemsAdded As Long
itemsAdded = addIterableToQueue(sizeFactor, iterableQueue)
iterableSize = iterableSize + itemsAdded
Else
'wasn't iterble, now is
Err.Raise 5, Description:="Size factor type doesn't match original type"
End If

Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
Me.Execute
End Sub

Public Property Set AsyncClass(ByVal workObj As IWorker) 'Set the worker who carries out the tasks
Set workerClass = workObj
End Property

Public Property Get AsyncClass() As IWorker
Set AsyncClass = workerClass
End Property

Public Property Get Size() As Variant
Size = iterableSize
End Property

Public Property Let autoQuit(ByVal value As Boolean)
autoQuitEnabled = value
End Property

Public Property Get NewHandleID() As String
NewHandleID = "Handle " & newThreadIndex
newThreadIndex = newThreadIndex + 1 'use next one next time
End Property

Private Property Get NewTaskID() As String
'generates new task, saves its ID to taskIDset, then bumps the task counter along one
NewTaskID = "Task " & newTaskIndex
taskIDset.Add newTaskIndex, NewTaskID 'add id to map
newTaskIndex = newTaskIndex + 1
End Property

Private Sub Class_Terminate()
'Set threadGroup = Nothing
Debug.Print "Terminating group"
RaiseEvent Closed(Now - startTime)
End Sub

Public Sub printState() 'for debugging
Debug.Print _
"State:"; vbCrLf _
; Space(5); "Threads open: "; openThreadCount; vbCrLf _
; Space(5); "Threads in use: "; openTaskCount; vbCrLf _
; Space(5); "Threads marked as free: "; freeThreads.Count; vbCrLf _
; Space(5); "Tasks remaining: "; iterableSize - successfulTaskCount; vbCrLf _
; Space(5); "Next task index: "; newTaskIndex
End Sub


Its key methods are doInstruction (calling getInstruction) and Size and IncreaseSize



The class runs iteratively; each cycle the class finds out what to do and executes that (doInstruction). doInstruction always calls itself unless it's told to do nothing, which allows the call stack to shrink back. There are several options for what to do each cycle



  • Open a thread (create a new instance of clsThreadHandle and add to a collection of possible places to run tasks)

  • Close a thread (quit a handle and remove it from that collection)

  • Run a task on a thread

  • [Force quit a task - t.b. implemented]

  • Do Nothing (allow call stack to go back to zero)

The getInstruction method will tell the class to



  • Open a thread if it doesn't exceed a max count, and if there are tasks to run on it

  • Close a thread if there are no tasks left to run or if there are too many

  • Run a task on a thread if there's a thread marked free

  • Do nothing if there are no threads free, and there are the right number of threads open

Size is what dictates the number of tasks to carry out



  • If Size is numeric, the class will keep running tasks on threads until that number of tasks is run

  • If Sizeis iterable, then the class will keep running tasks and passing arguments by essentially For...Eaching through the iterable argument

    • This allows something like a url to be passed as an argument to each task, or even a range so that the worker knows where on the sheet to write its result to


IncreaseSize is like Size; it is useful if you want to drip feed tasks into the multithread set (say you are daisy chaining one onto the other using the first one's threadComplete events). It increases the size of the numeric/iterable argument.



Thread Handles clsThreadHandle



The main class creates multiple instances of this thread handle class.



Option Explicit


'THREAD HANDLE BASE PROPERTIES
Private eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
Private taskID As String 'holds the id of the current task
Private handleID As String 'holds the id of this handle
Private handleArgs As Variant 'holds any arguments that need to be passed to the task

'THREAD EVENTS
Private WithEvents workerEvents As IWorkerEvents
Private workerObject As IWorker 'interface to whatever worker may be passed to thread


Private Sub workerEvents_Complete(returnVal As Variant)
eventHandle.NotifyComplete Me, returnVal
End Sub

Private Sub workerEvents_Started()
Debug.Print Me.Task; " started event was raised"
End Sub

Public Property Set Worker(ByVal workObj As IWorker)
Set workerObject = workObj.CreateNew 'set worker to be a copy of the passed one
Set workerEvents = New IWorkerEvents 'create event handler
Set workerObject.Events = workerEvents 'pass it to the worker so it can listen in
End Property

Public Sub OpenHandle(multiThreadGroup As clsMultiThread, delegate As clsHandleEvents)
'called when the handle is opened, sets the reference IDs of the string and the handle, as well as parent g
Set eventHandle = delegate
handleID = multiThreadGroup.NewHandleID
eventHandle.NotifyThreadOpened (Name)
Debug.Print Name; " was opened"
End Sub

Public Sub Execute(Optional args As Variant)
Debug.Print Task; " executed on "; Name; " with "; IIf(IsMissing(args), "no arguments", "some arguments")

workerObject.Execute args 'run the event
End Sub

Public Property Get Task() As String
Task = taskID
End Property

Public Property Let Task(val As String)
taskID = val
Debug.Print Name; "'s task was set to "; taskID
End Property

Public Property Get Name() As String
Name = handleID
End Property

Private Sub Class_Initialize()
Debug.Print "I'm made"
End Sub

Private Sub Class_Terminate()
eventHandle.NotifyThreadClosed (Me.Name)
Set eventHandle = Nothing
Set workerObject = Nothing
End Sub

Private Sub workerEvents_StatusChange(statusVal As Variant)
'not yet implemented, probably unnecessary
End Sub


I've chosen individual event handlers rather than a single one (like I did with clsHandleEvents) because



  • I find having an individual thread class for each task/worker object easier to picture mentally

  • I intend to add a functionality where a worker can cache objects in its parent handle (such as an InternetExplorer application) to save re-initialising it between successive tasks on the same thread

    • Having a single cache for each thread makes this simpler


Handle Events class clsHandleEvents



A reference to this class is held by each thread so that it can raise an event to the multiThread class, without directly holding a reference to it (this would mess up garbage collection I think)



Option Explicit
'class to convert calls from the thread handle into events which the multi thread group can tap into

Public Event Complete(obj As clsThreadHandle, returnVal As Variant)
Public Event Opened(threadID As String) 'when thread is actually opened
Public Event Closed(threadID As String) 'when thread is closed

Public Sub NotifyComplete(obj As clsThreadHandle, Optional returnVal As Variant)
RaiseEvent Complete(obj, returnVal)
End Sub

Public Sub NotifyThreadOpened(threadID As String)
RaiseEvent Opened(threadID)
End Sub

Public Sub NotifyThreadClosed(threadID As String)
RaiseEvent Closed(threadID)
End Sub

Private Sub Class_Terminate()
Debug.Print "Events Terminated"
End Sub


Interfaces



There are 2 interface classes (well onlyIWorker is really one, but I'm calling IWorkerEvents one too, similar to this example)



IWorker forms a general template of an asynchronous process you can run, which raises appropriate events as per IWorkerEvents



IWorker



Option Explicit

'class acts as interface for any thread task
'Execute runs the task
'Events are raised by the task if it interfaces properly

Public Property Set Events(ByRef value As IWorkerEvents)
End Property

Public Sub Execute(Optional argument As Variant)
End Sub

Public Function CreateNew() As IWorker
End Function


IWorkerEvents



Option Explicit

'class holds all the events that a thread task can raise

Public Event Complete(returnVal As Variant)
Public Event StatusChange(statusVal As Variant)
Public Event Started()

Public Sub Complete(Optional returnVal As Variant)
RaiseEvent Complete(returnVal)
End Sub

Public Sub StatusChange(statusVal As Variant)
RaiseEvent StatusChange(statusVal)
End Sub

Public Sub Started()
RaiseEvent Started
End Sub


Finally...



There's a module of supplementary functions that I don't especially need review on, but I'll include as they are required for the clsMultiThread to execute



Option Explicit

Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue) As Long
'function to take iterable group and add it to the queue
'returns the number of items added
Dim item As Variant
Dim itemsAdded As Long
itemsAdded = 0
For Each item In iterator
resultQueue.enqueue item
itemsAdded = itemsAdded + 1
Next item
addIterableToQueue = itemsAdded
End Function

Function isIterable(obj As Variant) As Boolean
On Error Resume Next
Dim iterator As Variant
For Each iterator In obj
Exit For
Next
isIterable = Err.Number = 0
End Function



Test code



Don't need feedback on this stuff, except with regard to the way in which a worker is implemented Download example file here



It just occurred to me that I hadn't actually included a worker to test this with. Well here's an example that uses an MSHTML request to return an HTML document from a webpage. It takes a String/Range argument representing a url, and returns an HTMLDocument. NB, this has to be imported as it requires Attribute .VB_UserMemId = 0 as per this article



VERSION 1.0 CLASS
BEGIN
MultiUse = -1 'True
END
Attribute VB_Name = "clsHtmlWorker"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit
'''
'Basic worker object sends MSHTML GET request to webpage and returns an HTMLDocument or Nothing
'Requires reference to
' Microsoft HTML Object library (mshtml.tlb)
' Microsoft XML, v6.0 (msxml6.dll)
'''

Private httpRequest As MSXML2.XMLHTTP60

Implements IWorker

Private Type TWorker
Events As IWorkerEvents
End Type

Private this As TWorker

Private Function IWorker_CreateNew() As IWorker
Set IWorker_CreateNew = New clsHtmlWorker
End Function

Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property

Private Sub IWorker_Execute(Optional argument As Variant)
Started 'raise event to thread handle
'Do some task
sendRequest argument
End Sub

'''
'Event raising
'''

Private Sub Started()
If Not this.Events Is Nothing Then
this.Events.Started
End If
End Sub

Private Sub statusChange(ByVal statusText As String)
If Not this.Events Is Nothing Then
'status change is not fully implemented yet in clsMultiThread, I may get rid of it
this.Events.statusChange statusText
End If
End Sub

Private Sub Complete(Optional ByVal resultPage As HTMLDocument)
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
If Not this.Events Is Nothing Then
this.Events.Complete resultPage
End If
End Sub

Private Sub sendRequest(ByVal url As String)
'''
'Sub to open a new XMLHTTP request at a given url
'Also assigns OnReadyStateChange callback function to this class' default routine
'''

If httpRequest Is Nothing Then Set httpRequest = New MSXML2.XMLHTTP60

With httpRequest
'Assign callback function to handler class (by default property)
.OnReadyStateChange = Me
'open and send the request
.Open "GET", url, True
.send vbNullString
End With

End Sub

Public Sub OnReadyStateChange()
Attribute OnReadyStateChange.VB_UserMemId = 0
'''
'This is the default callback routine of the class
'''
With httpRequest

statusChange .statusText

If .ReadyState = 4 Then 'loaded
If .Status = 200 Then 'successful
'mark complete and pass document
Dim htmlDoc As HTMLDocument
Set htmlDoc = New HTMLDocument
htmlDoc.body.innerHTML = .responseText
Complete htmlDoc
Else 'unsuccessful
Complete
End If
End If
End With
End Sub

Private Sub Class_Terminate()
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
End Sub


A multi-thread group implementing it can be run in a caller class like codeReviewTest. Sends requests to urls in A1:A10, returns e-mails from those urls in neighbouring columns.



Option Explicit

'''
'This class creates and runs a new multithread instance which runs clsHtmlWorker
'When each HTMLDocument is complete, the class scans it for e-mails
'''
Private WithEvents multiThreadGroup As clsMultiThread
'clsMultiThread is async so must be declared separately (or in a doEvents loop)
Private Const REGEX_PATTERN As String = _
"(?:[a-z0-9!#$%&'*+/=?^_`~-]+(?:.[a-z0-9!#$%&'*+/=?^_`~-]+)*|""(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|\[x01-x09x0bx0cx0e-x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[(?:(?:(2(5[0-5]|[0-4])|1[0-9]|[0-9]?[0-9])).)3(?:(2(5[1-9]|[0-9])|1[0-5]|[0-4]?[0-9])|[0-9]*[0-9]:(?:[1-9]|\[0-9])+)])"

Public Sub run()
'urls to check for emails are in a1:a10
htmlRequestToUrls [a1:a10]
End Sub

Private Sub htmlRequestToUrls(urlCells As Range)

Set multiThreadGroup = New clsMultiThread
With multiThreadGroup
.Size = urlCells 'set iterable, here a load of urls
Set .AsyncClass = New clsHtmlWorker 'set async worker
.Execute 'run the group
End With

End Sub

Private Sub multiThreadGroup_TaskComplete(returnVal As Variant, taskID As String, threadID As String)

Dim rowI As Long, colI As Long
rowI = Right(taskID, Len(taskID) - 4)

If returnVal Is Nothing Then
Cells(rowI, 2) = "Error in loading page"
ElseIf TypeOf returnVal Is HTMLDocument Then
Dim emailMatches() As String
emailMatches = regexMatches(returnVal.body.innerText)
If (Not emailMatches) = -1 Then
'no emails on page
Cells(rowI, 2) = "No e-mail matches"
Else
For colI = LBound(emailMatches) To UBound(emailMatches)
Cells(rowI, colI + 2) = emailMatches(colI)
Next colI
End If
Else 'nothing returned
Cells(rowI, 2) = "Error in loading page"
End If


End Sub

Private Function regexMatches(strInput As String) As String()

Dim rMatch As Object
Dim s As String
Dim arrayMatches() As String
Dim i As Long

With CreateObject("VBScript.Regexp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = REGEX_PATTERN
If .test(strInput) Then
For Each rMatch In .Execute(strInput)
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.value
i = i + 1
Next
End If
End With

regexMatches = arrayMatches

End Function


The test class will create a new multi thread group. The group will open the default 5 threads, on each thread it will create an instance of clsHtmlWorker. It will convert the range [A1:A10] into 10 arguments which it will pass, 1 at a time, to the workers on each thread when they aren't busy. Once all tasks are run the class will autoQuit - cutting references to all sub classes, allowing it to go out of scope



You can download the example workbook if you want, works best with Rubberduck to organise folders. Test code is in the CodeReviewTestRunner, or just hit the big button







share|improve this question













Intro





Multithreading tools exist in Excel - often to run macros in multiple instances of Excel, or to convert macros to vbscripts that can run independently. However I've often come across projects where I'd like to delegate multiple tasks out to async processes, and creating multiple instances of Excel itself is overkill for this.



After running into several projects where I needed to execute multiple async requests (think internet scraping, or radioactive decay simulation) I decided to make a general class that, when given an async process, can execute and handle several in parallel.




For example, this Daisy Test makes a multithread group which sends html requests to all the urls in column B. The first link of these google searches are returned to column Cin the order the responses arrive. That triggers a second group (daisy chained onto the 1st one's events) to send internet explorer request to that url, and these are returned in D



Example implementation



To clarify some of the comments; it should be noted that these requests are sent off in order (B1,B2,...), but return unordered (C2 first). That's because my class allows the threads to run in parallel (hence multithreading). These are still only managed in a single Excel thread, but the requests are asynchronous and in different processes so are effectively running in other threads.



Summary



N.B. The term 'thread' will be used loosely here, with no reference to the actual processor. Instead, when I say 'thread' I am talking about a handler for a task (which is running in parallel to other tasks on other handlers)



The multi thread setup consists of a main clsMultiThread parent class which controls the shape of the multi thread collection (i.e. how many tasks are running at any given time), as well as several clsThreadHandle classes.



These thread handlers are each responsible for running an async task, and informing the parent multithread class when each task is finished.



Internally, the tasks are actually run by WorkerClass objects, one worker for each thread. These receive input arguments from their parent threads, run their respective async task, and raise an event to their parent clsThreadHandle when finished. The thread handle then passes this event, and any optional return value, back up to the main clsMultiThread, whose job it is either to close the thread once it's done, or prompt the thread to run another task. The chain of command is summarised in the image below:



Chain of command



Feedback I'm after



  • General feedback on structure, event handling, use of interfaces etc.

  • Proper exiting (which I don't think I'm doing right now)

  • User-side interface

    • Whether this approach to the problem is appropriate and intuitive (passing worker classes etc.)

    • Whether I'm missing some functionality that should be there


This is also the first project I've ever done with the primary aim of making something I can re-use (as well as the longest & most complicated bit of code I've written). For that reason I'd also greatly appreciate any comments on



  • Coding style

  • Use of comments

  • Anything else I should bear in mind when working on such projects

Implementation



Main class clsMultiThread



Right, some code. Here's the main class which handles all the sub-classes



Option Explicit

'''
'VBA class to run multiple asynchronous processes
'Interfaces directly with clsThreadHandle
'Requires references to:
'mscrolib.dll
'''

'THREAD GROUP SHAPE PROPERTIES
Private threadGroup As New Collection 'holds all the treads
Private maxThreads As Long 'maximum number of threads that can be open
Private minThreads As Long '[minimum number of threads]
Private iterableQueue As mscorlib.Queue 'this item holds all the items from iterator set in queue
'replaces iterableGroup, newtaskindex, taskidset
Private iterableSize As Long 'number of items in iterable group or
Private passesArguments As Boolean 'true if iterableGroup exists

'THREAD GROUP REFERENCES
Private WithEvents threadEvents As clsHandleEvents 'Event object to raise events from each thread handle
Private workerClass As IWorker

'THREAD GROUP SETTINGS
Private autoQuitEnabled As Boolean 'dictates whether to quit on Complete event, should be false if daisychaining

'THREAD GROUP STATE PROPERTIES
Private openThreadCount As Long 'number of threads/handles currently open
Private openTaskCount As Long 'number of tasks running on those threads
Private closedTaskCount As Long 'number of threads closed (failed and successful)
Private successfulTaskCount As Long 'number of threads completed sucessfully
Private newThreadIndex As Long 'Iterator over handles (next new handle)
Private newTaskIndex As Long 'Iterator over open tasks (next thread to be started)
Private taskIDset As Collection 'Dictionary mapping taskIDs to iterableGroup location "REPLACE THIS. MERGE COLLECTION JUMBLES"
Private freeThreads As Collection 'holds all the free thread ids

'THREAD GROUP PERFORMANCE PROPERTIES
Private startTime As Date
'Private endTime As Date

'THREAD GROUP EVENTS
Public Event TaskComplete(returnVal As Variant, taskID As String, threadID As String) 'when a task is complete on a thread, maybe if failed
Public Event ThreadOpened(threadCount As Long, threadID As String) 'when a thread is opened, pass the new number of threads
Public Event ThreadClosed(threadCount As Long, threadID As String) 'when closed, pass closed thread ID
Public Event Complete(timeTaken As Date) 'when everything is (nearly) finished
Public Event Closed(timeTaken As Date) 'when entire group is closed
Public Event Opened(startTime As Date) 'when entire group is closed

'PRIVATE TYPES/ENUMS
Private Type Instruction 'instruction on what to do next, and any necessary arguments that can be passed
threadID As String
instructionBody As InstructionType
End Type

Private Enum InstructionType
mltCloseThread
mltOpenThread
mltSetTask
mltDoNothing
mltQuit
End Enum

Private Sub Class_Initialize()
'Set defaults
maxThreads = 5
minThreads = 1
newThreadIndex = 1
newTaskIndex = 1
autoQuitEnabled = True
Set threadEvents = New clsHandleEvents
Set taskIDset = New Collection
Set freeThreads = New Collection
startTime = Now
RaiseEvent Opened(startTime)
'''
'Test space
'''
End Sub

Private Sub threadEvents_Closed(threadID As String)
RaiseEvent ThreadClosed(openThreadCount, threadID)
End Sub

Private Sub threadEvents_Opened(threadID As String)
RaiseEvent ThreadOpened(openThreadCount, threadID)
End Sub

Private Sub threadEvents_Complete(obj As clsThreadHandle, returnVal As Variant) 'called when thread becomes free
'DO NOT mark as free here
RaiseEvent TaskComplete(returnVal, obj.Task, obj.Name) 'failed as boolean
openTaskCount = openTaskCount - 1
closedTaskCount = closedTaskCount + 1
successfulTaskCount = successfulTaskCount + 1 'could be unsuccessful too though
doInstructions obj.Name 'pass object name so it can be marked free
' If failed Then
' failedTaskCount = failedTaskCount + 1
' Else
' successfulTaskCount = successfulTaskCount + 1
' End If
End Sub

Public Sub Execute()
'check validity of user data, if valid, then execute task
If iterableSize = 0 Then
Err.Raise 5, Description:="You must set size argument to a non-zero value, or a non-empty iterable first"
ElseIf workerClass Is Nothing Then
Err.Raise 5, Description:="You must set the async class argument first"
Else
doInstructions
End If
End Sub

Public Sub Quit()
'Remove any references that would prevent proper closing
'Default automatically called when openThreadCount = 0
RaiseEvent Complete(Now - startTime)
Set threadEvents = Nothing
End Sub

Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction

'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID

'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy

BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If

Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select

'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
doInstructions loopcount:=loopcount + 1 'watch for infinite loop
End If

End Sub

Private Function getInstruction() As Instruction
'function to determine what action to take next
'called until do nothing returned
'caller to doinstructions can specify a free thread in which case some parts skipped
Dim results As Instruction 'variable to hold instruction and any arguments

Me.printState

'Do we need to open or close threads?
'Threads free? (threads open > tasks open):
If openThreadCount > openTaskCount Then
'Great we have a free thread, now use it or delete it (cos we have too many or no tasks remaining)
If newTaskIndex > iterableSize Then 'already passed all tasks
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
ElseIf openThreadCount <= maxThreads Then
'[find] & use free thread (run a task on it)
results.instructionBody = mltSetTask
results.threadID = freeThread
Else
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
End If
Else
'No threads free, either open one (if not exceeding max, and there's a task left to put on it)
'Or do nothing (can't close it if not free, shouldn't open new if no more tasks)
If openThreadCount < maxThreads And newTaskIndex <= iterableSize Then
results.instructionBody = mltOpenThread
ElseIf openThreadCount = 0 And autoQuitEnabled Then
results.instructionBody = mltQuit
Else
results.instructionBody = mltDoNothing
End If
End If

getInstruction = results
End Function

Private Sub openThread()
'opens a thread and assigns a task ID to it
Dim newThread As New clsThreadHandle 'create new handle
newThread.OpenHandle Me, threadEvents 'passes parent reference which allows handle to obtain thread ID
threadGroup.Add newThread, newThread.Name 'add it to the group with a new id (set by itself)
openThreadCount = openThreadCount + 1
freeThread = newThread.Name 'mark as free so task can be assigned to it

End Sub

Private Property Let freeThread(threadID As String)
'NOT WORKING"""""
'when a thread comes free, add it to the collection
freeThreads.Add threadID, threadID
Debug.Print threadID; " marked as free; now"; freeThreads.Count; "threads are free"
End Property

Private Property Let BusyThread(threadID As String)
'when a thread is not free or is closed, mark as busy by removing from free group
On Error Resume Next 'only remove ones what are there actually
freeThreads.Remove threadID
Debug.Print threadID; " marked as busy"; IIf(Err.Number <> 0, ", but wasn't in free group", vbNullString)
End Property

Private Property Get freeThread() As String
'gives up a free thread and adds it to the list
freeThread = freeThreads(1)
freeThreads.Remove (1)
End Property

Private Sub assignTaskID(threadID As String)
'@Ignore WriteOnlyProperty
'assigns task ID to thread
'nb does NOT actually run the task (this is instruction stage still)
Dim newThread As clsThreadHandle
Set newThread = threadGroup(threadID)
newThread.Task = NewTaskID
Set newThread.Worker = AsyncClass
End Sub

Private Sub closeThread(threadID As String, Optional failed As Boolean = False)

'close thread with appropriate id
Dim oldThread As clsThreadHandle
Set oldThread = threadGroup(threadID)
'remove from all collections
'taskIDset.Remove oldThread.Task remove from task id set if it was in there
threadGroup.Remove oldThread.Name
BusyThread = oldThread.Name 'remove from free collection
Set oldThread = Nothing
'iterate counters
openThreadCount = openThreadCount - 1
End Sub

Public Property Let Size(sizeFactor As Variant)
'property of the thread group which dictates how many processes to run in total
'size factor is either an iterable item, or an integer to dictate the size

'Check if size factor is number
If IsNumeric(sizeFactor) Then
'If so, size is that
iterableSize = CLng(sizeFactor)
passesArguments = False 'no argument to pass to thread, just run it a load of times

'If not, *check if iterable
ElseIf isIterable(sizeFactor) Then
'If so, size is size of collection from extration
Set iterableQueue = New Queue
iterableSize = addIterableToQueue(sizeFactor, iterableQueue)
passesArguments = True
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If

End Property

Public Sub IncreaseSize(sizeFactor As Variant)
'method of threadGroup which adds more tasks to the queue, and immediately runs them
'size factor is either an iterable item, or an integer to dictate the size

'Check whether size is set yet
If Me.Size = 0 Then
Err.Raise 5, Description:="You must set Size before you can IncreaseSize"
End If

'check whether new data matches old type
If IsNumeric(sizeFactor) Then
If passesArguments Then
Err.Raise 5, Description:="Size factor type doesn't match original type"
Else
'is numeric and was numeric, grand
iterableSize = iterableSize + CLng(sizeFactor)
End If
ElseIf isIterable(sizeFactor) Then
If passesArguments Then
'was iterable and still is, great!
Dim itemsAdded As Long
itemsAdded = addIterableToQueue(sizeFactor, iterableQueue)
iterableSize = iterableSize + itemsAdded
Else
'wasn't iterble, now is
Err.Raise 5, Description:="Size factor type doesn't match original type"
End If

Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
Me.Execute
End Sub

Public Property Set AsyncClass(ByVal workObj As IWorker) 'Set the worker who carries out the tasks
Set workerClass = workObj
End Property

Public Property Get AsyncClass() As IWorker
Set AsyncClass = workerClass
End Property

Public Property Get Size() As Variant
Size = iterableSize
End Property

Public Property Let autoQuit(ByVal value As Boolean)
autoQuitEnabled = value
End Property

Public Property Get NewHandleID() As String
NewHandleID = "Handle " & newThreadIndex
newThreadIndex = newThreadIndex + 1 'use next one next time
End Property

Private Property Get NewTaskID() As String
'generates new task, saves its ID to taskIDset, then bumps the task counter along one
NewTaskID = "Task " & newTaskIndex
taskIDset.Add newTaskIndex, NewTaskID 'add id to map
newTaskIndex = newTaskIndex + 1
End Property

Private Sub Class_Terminate()
'Set threadGroup = Nothing
Debug.Print "Terminating group"
RaiseEvent Closed(Now - startTime)
End Sub

Public Sub printState() 'for debugging
Debug.Print _
"State:"; vbCrLf _
; Space(5); "Threads open: "; openThreadCount; vbCrLf _
; Space(5); "Threads in use: "; openTaskCount; vbCrLf _
; Space(5); "Threads marked as free: "; freeThreads.Count; vbCrLf _
; Space(5); "Tasks remaining: "; iterableSize - successfulTaskCount; vbCrLf _
; Space(5); "Next task index: "; newTaskIndex
End Sub


Its key methods are doInstruction (calling getInstruction) and Size and IncreaseSize



The class runs iteratively; each cycle the class finds out what to do and executes that (doInstruction). doInstruction always calls itself unless it's told to do nothing, which allows the call stack to shrink back. There are several options for what to do each cycle



  • Open a thread (create a new instance of clsThreadHandle and add to a collection of possible places to run tasks)

  • Close a thread (quit a handle and remove it from that collection)

  • Run a task on a thread

  • [Force quit a task - t.b. implemented]

  • Do Nothing (allow call stack to go back to zero)

The getInstruction method will tell the class to



  • Open a thread if it doesn't exceed a max count, and if there are tasks to run on it

  • Close a thread if there are no tasks left to run or if there are too many

  • Run a task on a thread if there's a thread marked free

  • Do nothing if there are no threads free, and there are the right number of threads open

Size is what dictates the number of tasks to carry out



  • If Size is numeric, the class will keep running tasks on threads until that number of tasks is run

  • If Sizeis iterable, then the class will keep running tasks and passing arguments by essentially For...Eaching through the iterable argument

    • This allows something like a url to be passed as an argument to each task, or even a range so that the worker knows where on the sheet to write its result to


IncreaseSize is like Size; it is useful if you want to drip feed tasks into the multithread set (say you are daisy chaining one onto the other using the first one's threadComplete events). It increases the size of the numeric/iterable argument.



Thread Handles clsThreadHandle



The main class creates multiple instances of this thread handle class.



Option Explicit


'THREAD HANDLE BASE PROPERTIES
Private eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
Private taskID As String 'holds the id of the current task
Private handleID As String 'holds the id of this handle
Private handleArgs As Variant 'holds any arguments that need to be passed to the task

'THREAD EVENTS
Private WithEvents workerEvents As IWorkerEvents
Private workerObject As IWorker 'interface to whatever worker may be passed to thread


Private Sub workerEvents_Complete(returnVal As Variant)
eventHandle.NotifyComplete Me, returnVal
End Sub

Private Sub workerEvents_Started()
Debug.Print Me.Task; " started event was raised"
End Sub

Public Property Set Worker(ByVal workObj As IWorker)
Set workerObject = workObj.CreateNew 'set worker to be a copy of the passed one
Set workerEvents = New IWorkerEvents 'create event handler
Set workerObject.Events = workerEvents 'pass it to the worker so it can listen in
End Property

Public Sub OpenHandle(multiThreadGroup As clsMultiThread, delegate As clsHandleEvents)
'called when the handle is opened, sets the reference IDs of the string and the handle, as well as parent g
Set eventHandle = delegate
handleID = multiThreadGroup.NewHandleID
eventHandle.NotifyThreadOpened (Name)
Debug.Print Name; " was opened"
End Sub

Public Sub Execute(Optional args As Variant)
Debug.Print Task; " executed on "; Name; " with "; IIf(IsMissing(args), "no arguments", "some arguments")

workerObject.Execute args 'run the event
End Sub

Public Property Get Task() As String
Task = taskID
End Property

Public Property Let Task(val As String)
taskID = val
Debug.Print Name; "'s task was set to "; taskID
End Property

Public Property Get Name() As String
Name = handleID
End Property

Private Sub Class_Initialize()
Debug.Print "I'm made"
End Sub

Private Sub Class_Terminate()
eventHandle.NotifyThreadClosed (Me.Name)
Set eventHandle = Nothing
Set workerObject = Nothing
End Sub

Private Sub workerEvents_StatusChange(statusVal As Variant)
'not yet implemented, probably unnecessary
End Sub


I've chosen individual event handlers rather than a single one (like I did with clsHandleEvents) because



  • I find having an individual thread class for each task/worker object easier to picture mentally

  • I intend to add a functionality where a worker can cache objects in its parent handle (such as an InternetExplorer application) to save re-initialising it between successive tasks on the same thread

    • Having a single cache for each thread makes this simpler


Handle Events class clsHandleEvents



A reference to this class is held by each thread so that it can raise an event to the multiThread class, without directly holding a reference to it (this would mess up garbage collection I think)



Option Explicit
'class to convert calls from the thread handle into events which the multi thread group can tap into

Public Event Complete(obj As clsThreadHandle, returnVal As Variant)
Public Event Opened(threadID As String) 'when thread is actually opened
Public Event Closed(threadID As String) 'when thread is closed

Public Sub NotifyComplete(obj As clsThreadHandle, Optional returnVal As Variant)
RaiseEvent Complete(obj, returnVal)
End Sub

Public Sub NotifyThreadOpened(threadID As String)
RaiseEvent Opened(threadID)
End Sub

Public Sub NotifyThreadClosed(threadID As String)
RaiseEvent Closed(threadID)
End Sub

Private Sub Class_Terminate()
Debug.Print "Events Terminated"
End Sub


Interfaces



There are 2 interface classes (well onlyIWorker is really one, but I'm calling IWorkerEvents one too, similar to this example)



IWorker forms a general template of an asynchronous process you can run, which raises appropriate events as per IWorkerEvents



IWorker



Option Explicit

'class acts as interface for any thread task
'Execute runs the task
'Events are raised by the task if it interfaces properly

Public Property Set Events(ByRef value As IWorkerEvents)
End Property

Public Sub Execute(Optional argument As Variant)
End Sub

Public Function CreateNew() As IWorker
End Function


IWorkerEvents



Option Explicit

'class holds all the events that a thread task can raise

Public Event Complete(returnVal As Variant)
Public Event StatusChange(statusVal As Variant)
Public Event Started()

Public Sub Complete(Optional returnVal As Variant)
RaiseEvent Complete(returnVal)
End Sub

Public Sub StatusChange(statusVal As Variant)
RaiseEvent StatusChange(statusVal)
End Sub

Public Sub Started()
RaiseEvent Started
End Sub


Finally...



There's a module of supplementary functions that I don't especially need review on, but I'll include as they are required for the clsMultiThread to execute



Option Explicit

Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue) As Long
'function to take iterable group and add it to the queue
'returns the number of items added
Dim item As Variant
Dim itemsAdded As Long
itemsAdded = 0
For Each item In iterator
resultQueue.enqueue item
itemsAdded = itemsAdded + 1
Next item
addIterableToQueue = itemsAdded
End Function

Function isIterable(obj As Variant) As Boolean
On Error Resume Next
Dim iterator As Variant
For Each iterator In obj
Exit For
Next
isIterable = Err.Number = 0
End Function



Test code



Don't need feedback on this stuff, except with regard to the way in which a worker is implemented Download example file here



It just occurred to me that I hadn't actually included a worker to test this with. Well here's an example that uses an MSHTML request to return an HTML document from a webpage. It takes a String/Range argument representing a url, and returns an HTMLDocument. NB, this has to be imported as it requires Attribute .VB_UserMemId = 0 as per this article



VERSION 1.0 CLASS
BEGIN
MultiUse = -1 'True
END
Attribute VB_Name = "clsHtmlWorker"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit
'''
'Basic worker object sends MSHTML GET request to webpage and returns an HTMLDocument or Nothing
'Requires reference to
' Microsoft HTML Object library (mshtml.tlb)
' Microsoft XML, v6.0 (msxml6.dll)
'''

Private httpRequest As MSXML2.XMLHTTP60

Implements IWorker

Private Type TWorker
Events As IWorkerEvents
End Type

Private this As TWorker

Private Function IWorker_CreateNew() As IWorker
Set IWorker_CreateNew = New clsHtmlWorker
End Function

Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property

Private Sub IWorker_Execute(Optional argument As Variant)
Started 'raise event to thread handle
'Do some task
sendRequest argument
End Sub

'''
'Event raising
'''

Private Sub Started()
If Not this.Events Is Nothing Then
this.Events.Started
End If
End Sub

Private Sub statusChange(ByVal statusText As String)
If Not this.Events Is Nothing Then
'status change is not fully implemented yet in clsMultiThread, I may get rid of it
this.Events.statusChange statusText
End If
End Sub

Private Sub Complete(Optional ByVal resultPage As HTMLDocument)
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
If Not this.Events Is Nothing Then
this.Events.Complete resultPage
End If
End Sub

Private Sub sendRequest(ByVal url As String)
'''
'Sub to open a new XMLHTTP request at a given url
'Also assigns OnReadyStateChange callback function to this class' default routine
'''

If httpRequest Is Nothing Then Set httpRequest = New MSXML2.XMLHTTP60

With httpRequest
'Assign callback function to handler class (by default property)
.OnReadyStateChange = Me
'open and send the request
.Open "GET", url, True
.send vbNullString
End With

End Sub

Public Sub OnReadyStateChange()
Attribute OnReadyStateChange.VB_UserMemId = 0
'''
'This is the default callback routine of the class
'''
With httpRequest

statusChange .statusText

If .ReadyState = 4 Then 'loaded
If .Status = 200 Then 'successful
'mark complete and pass document
Dim htmlDoc As HTMLDocument
Set htmlDoc = New HTMLDocument
htmlDoc.body.innerHTML = .responseText
Complete htmlDoc
Else 'unsuccessful
Complete
End If
End If
End With
End Sub

Private Sub Class_Terminate()
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
End Sub


A multi-thread group implementing it can be run in a caller class like codeReviewTest. Sends requests to urls in A1:A10, returns e-mails from those urls in neighbouring columns.



Option Explicit

'''
'This class creates and runs a new multithread instance which runs clsHtmlWorker
'When each HTMLDocument is complete, the class scans it for e-mails
'''
Private WithEvents multiThreadGroup As clsMultiThread
'clsMultiThread is async so must be declared separately (or in a doEvents loop)
Private Const REGEX_PATTERN As String = _
"(?:[a-z0-9!#$%&'*+/=?^_`~-]+(?:.[a-z0-9!#$%&'*+/=?^_`~-]+)*|""(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|\[x01-x09x0bx0cx0e-x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[(?:(?:(2(5[0-5]|[0-4])|1[0-9]|[0-9]?[0-9])).)3(?:(2(5[1-9]|[0-9])|1[0-5]|[0-4]?[0-9])|[0-9]*[0-9]:(?:[1-9]|\[0-9])+)])"

Public Sub run()
'urls to check for emails are in a1:a10
htmlRequestToUrls [a1:a10]
End Sub

Private Sub htmlRequestToUrls(urlCells As Range)

Set multiThreadGroup = New clsMultiThread
With multiThreadGroup
.Size = urlCells 'set iterable, here a load of urls
Set .AsyncClass = New clsHtmlWorker 'set async worker
.Execute 'run the group
End With

End Sub

Private Sub multiThreadGroup_TaskComplete(returnVal As Variant, taskID As String, threadID As String)

Dim rowI As Long, colI As Long
rowI = Right(taskID, Len(taskID) - 4)

If returnVal Is Nothing Then
Cells(rowI, 2) = "Error in loading page"
ElseIf TypeOf returnVal Is HTMLDocument Then
Dim emailMatches() As String
emailMatches = regexMatches(returnVal.body.innerText)
If (Not emailMatches) = -1 Then
'no emails on page
Cells(rowI, 2) = "No e-mail matches"
Else
For colI = LBound(emailMatches) To UBound(emailMatches)
Cells(rowI, colI + 2) = emailMatches(colI)
Next colI
End If
Else 'nothing returned
Cells(rowI, 2) = "Error in loading page"
End If


End Sub

Private Function regexMatches(strInput As String) As String()

Dim rMatch As Object
Dim s As String
Dim arrayMatches() As String
Dim i As Long

With CreateObject("VBScript.Regexp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = REGEX_PATTERN
If .test(strInput) Then
For Each rMatch In .Execute(strInput)
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.value
i = i + 1
Next
End If
End With

regexMatches = arrayMatches

End Function


The test class will create a new multi thread group. The group will open the default 5 threads, on each thread it will create an instance of clsHtmlWorker. It will convert the range [A1:A10] into 10 arguments which it will pass, 1 at a time, to the workers on each thread when they aren't busy. Once all tasks are run the class will autoQuit - cutting references to all sub classes, allowing it to go out of scope



You can download the example workbook if you want, works best with Rubberduck to organise folders. Test code is in the CodeReviewTestRunner, or just hit the big button









share|improve this question












share|improve this question




share|improve this question








edited Mar 22 at 17:50
























asked Jan 16 at 12:21









Greedo

255312




255312











  • I'm wondering why we have comments saying reference to mscorlib.dll is needed yet I see no CreateObject or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?
    – this
    Jan 16 at 12:52










  • @This 1) mscorlib.dll I'm using early binding, and Private iterableQueue As mscorlib.Queue in clsMultiThread is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances of EXCEL.EXE. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes sense
    – Greedo
    Jan 16 at 14:06











  • I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
    – M.Doerner
    Jan 16 at 14:52










  • @M.Doerner Exactly, an InternetExplorer.Application or an XmlHttprequest (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use of IWorker interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approach
    – Greedo
    Jan 16 at 15:25







  • 1




    @Raystafarian I've added a download file in the test area if that helps
    – Greedo
    Feb 12 at 16:16
















  • I'm wondering why we have comments saying reference to mscorlib.dll is needed yet I see no CreateObject or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?
    – this
    Jan 16 at 12:52










  • @This 1) mscorlib.dll I'm using early binding, and Private iterableQueue As mscorlib.Queue in clsMultiThread is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances of EXCEL.EXE. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes sense
    – Greedo
    Jan 16 at 14:06











  • I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
    – M.Doerner
    Jan 16 at 14:52










  • @M.Doerner Exactly, an InternetExplorer.Application or an XmlHttprequest (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use of IWorker interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approach
    – Greedo
    Jan 16 at 15:25







  • 1




    @Raystafarian I've added a download file in the test area if that helps
    – Greedo
    Feb 12 at 16:16















I'm wondering why we have comments saying reference to mscorlib.dll is needed yet I see no CreateObject or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?
– this
Jan 16 at 12:52




I'm wondering why we have comments saying reference to mscorlib.dll is needed yet I see no CreateObject or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?
– this
Jan 16 at 12:52












@This 1) mscorlib.dll I'm using early binding, and Private iterableQueue As mscorlib.Queue in clsMultiThread is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances of EXCEL.EXE. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes sense
– Greedo
Jan 16 at 14:06





@This 1) mscorlib.dll I'm using early binding, and Private iterableQueue As mscorlib.Queue in clsMultiThread is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances of EXCEL.EXE. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes sense
– Greedo
Jan 16 at 14:06













I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
– M.Doerner
Jan 16 at 14:52




I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
– M.Doerner
Jan 16 at 14:52












@M.Doerner Exactly, an InternetExplorer.Application or an XmlHttprequest (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use of IWorker interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approach
– Greedo
Jan 16 at 15:25





@M.Doerner Exactly, an InternetExplorer.Application or an XmlHttprequest (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use of IWorker interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approach
– Greedo
Jan 16 at 15:25





1




1




@Raystafarian I've added a download file in the test area if that helps
– Greedo
Feb 12 at 16:16




@Raystafarian I've added a download file in the test area if that helps
– Greedo
Feb 12 at 16:16










2 Answers
2






active

oldest

votes

















up vote
10
down vote



accepted
+500










Interesting idea and well-done!



Naming



I really don't like the names. The names like clsMultiThread is somewhat misleading, since as you noted they don't actually provide any true multi-threading. A unwary user would expect it to work with anything and would be disappointed when all of their queued works painfully completes synchronously. ;)



Also, we are not really using threads, but objects which may or may not run in-process or not. You used MSXML2.XMLHTTP60 so it should run in-process. However, it need not be the case if we were to use something like ShDocVw.WebBrowser or even Excel.Application which may run out-of-process. That's not threading. So in that case we are actually talking about running things asynchronously more than threading.



May I instead suggest names like ParallelTaskCoordinator, TaskHandle, and AsyncObjectWatcher? The point being that it should convey the notion that those objects have nothing to do with running asynchronously; we're just orchestrating the asynchronous tasks to run in parallel.



In the comments you asked about hungarian notations. My personal preference is to not use hungarian notation for objects, so I wouldn't have used cls prefix. The concerns about namespace is not going to be helped by whether you prefix or not, mainly because a good namespace is about logical grouping, as opposed to clumping them together based on their type of modules. I'm OK with using HN for private variables, not so much with public-facing properties such as the module's name, the public property since they just distract from their semantic meaning. The semantic part of the name is far more important, and a good naming convention should support that.



On general, your naming scheme seems reasonably well thought out. An inconsistency I do see is that in Execute method, you have delegate as an argument name, which then get assigned to the eventHandle. Why not call the argument the same way, so it's clear when writing the Execute method what it should be?



Not fully asynchronous, may be blocked by main thread



On its own, it works. However, I do want to call your attention to the fact that it won't work in more complicated scenarios. Here's a example where this breaks:



Executed in immediate windows:



call runtest: for i = 0 to 100 : debug.Print i: doevents: next


In theory, the output of the Debug.Print i should be interleaved with other tasks progress. In practice, I'm consistently getting 5 tasks completed, then 100s of the outputted i, then the other 5 tasks completed. This tells me that if the main thread is allow to do things while the tasks in other threads are running, those will be blocked until the main thread becomes idle. Thus it is easy to destroy the asynchrony if one is not careful.



As you can see, even sprinkling DoEvents is apparently not sufficient. Thinking about it, it's not that surprising because the "events" come from the same VBA code which must run in the UI thread. Therefore, if this becomes a problem, you must fully delegate the asynchronously to an external process/thread (e.g. via an external library) in which you can communicate progress to VBA code via events, rather than relying on the external object to raise event into the worker, which must raise event to the thread handle then to the manager.



Note that even if you went this route, you would need to have a retry logic to successfully raise the event because running VBA code may prevent the entrancy. There is an opportunity for an event to be missed, especially if there is high rate of activity and too much inputs. That has been observed before here, here, here. Those examples illustrates the case where something doesn't always fire the way we expect it to.



As suggested in the comments, changing the # of max threads may work better. For example, setting the max threads to 2, we see the the task 1 and task 2 completing before blocking to print the i for 90 times, then task 3 is allowed to start then complete before we block again for last 10 more times and the rest complete. However, this is very consistent on my system. In 3 times I ran the test, task 3 don't start until 90 has printed, and the rest don't start until 100 has printed. On the other hand, if I set maxthreads to 1, I get one task completed, then am blocked for 100 times before the rest of tasks are allowed to run. I do not expect it to be easily reproducible since this will be affected by host of factors (the hardware, the version of Windows, the version of Excel). This is simply something to be aware and explicitly design for the possibility.



If it's critical to you that you do not get blocked, you need to consider a different approach. For instance, you'd need an external .NET library that creates a thread, run the task, then write the output to a file. That allows the main thread to read it at its leisure and none of the spawned threads will accidentally get blocked when the main thread needs to do something. Even so, it is still subject to the fact that it can be blocked when trying to spawn a new thread (since you need VBA to run code to create one, even if it's just calling a external function in a DLL).



Note: in all of my tests, only Debug.Print printed were the Task N started and Task N completed and Events terminated. I had commented out the PrintState; otherwise, the immediate window would overflow and I can't see all the output from the start to end.



WinHttp instead of MSXML



Also, I want bring your attention to the fact that you could have had a instance of WinHttp.WinHttpRequest which supports events natively. Therefore you could declare a variable like Private WithEvents request As WinHttp.WinHttpRequest and instead listen for the event. That means you don't need to set up default member as you have to with the MSXML and if the internet request is all you do, you don't even need the thread collection; just have a collection of WinHttp.WinHttpRequest and listen to their events.



But that's obviously not a generic solution, and using WinHttp.WinHttpRequest doesn't preclude us from using it with the above solution for case where we want a worker to handle the setup.



Don't recurse if you don't have to



You have a doInstructions which calls itself recursively. But IMPOV, there is no reason to recurse. You can do the same thing with a simple loop as demonstrated with my hacky change. A more proper solution may be to use multiple conditions or a flag variable on the bottom of the loop (guaranteeing that it executes at least once). That ensure you never need to worry about stack overflow.



Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction

Do
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID

'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy

BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If

Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select

'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
'doInstructions loopcount:=loopcount + 1 'watch for infinite loop
freeThreadID = vbNullString
loopcount = loopcount + 1
Else
Exit Do
End If
Loop

End Sub


Only one worker type per manager class



In the sample, we have this:



Set .AsyncClass = New clsHtmlWorker


I was under the impression that the point of the manager was to allow us to create threads for different workers each running asynchronously. For that reason, it seems strange that I can only use AsyncClass to set a single implementation of IWorker. Shouldn't I be able to assign a bunch of workers, with their own arguments to be then enlisted? That would be more intuitive use of the manager, I think. Something like this:



Set .AsyncObjectsToExecute = Array( _
WorkerType1Factory.Create("some argument one"), _
WorkerType1Factory.Create("another argument"), _
WorkerType2Factory.Create("do that", 123, 495), _
WorkerType2Factory.Create("but not that", 0, 0), _
WorkerType3Factory.Create() _
)


From that, we can obviously see what tasks we are setting up to be executed, by passing in IWorker factories. That would not constraint us to only one particular worker class, and it helps sees what arguments we are sending for each task.



Encapsulate your private fields as a type



I would suggest that you take a page out of @MathieuGuindon's book and use his method:



```'THREAD HANDLE BASE PROPERTIES
Private Type THandle
eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
taskID As String 'holds the id of the current task
handleID As String 'holds the id of this handle
handleArgs As Variant 'holds any arguments that need to be passed to the task
End Type
Private this As THandle


That gives you intellisense of the class' fields just by using this. and thus help clearly differentiating between the class' private backing fields and exposed properties. It's one way to help wrong code look wrong.



RaiseEvents in the Initialize event



This seems to me possibly troublesome:



Private Sub Class_Initialize()
RaiseEvent Opened(startTime)
End Sub


My personal rule for any work inside Initialize and Terminate events is to keep the code totally isolated to the class. It is generally not appropriate to reach out outside the class because it hasn't finished constructing itself and as such, the results may be unpredictable. In this case the effect may be benign because we only raise an event with a reported parameter. However, in more complicated implementation, that can act up in unexpected manners. Besides, it really doesn't buy us anything because if the Initialize was called, we already know about it, since we usually have to had initialized it in the first place.



Do you really need a queue



It's cool that you're using .NET queue to help set up the arguments. However, that means you have a extra reference and you already have a number of references, which makes it hard to distribute the code to different environment.



One way would be to late-bind the queue, by declaring it as object and doing a CreateObject("System.Collections.Queue") instead. That lets you avoid the need to add an explicit reference to the .NET core library and thus be agnostic about the .NET framework version since in this case, the class won't change between versions.



The alternative is to just use a built-in VBA collection. IINM, you only use queues to collect the arguments, which a VBA colection can do just as well. This would give you queue-like behavior:



col.Add ...
col.Add ...
col.Add ...

Do Until col.Count = 0
col.Remove 1
Loop


without any external references.



Don't use default naming from interface



As @Raystafarian alluded to, you shouldn't accept the default RHS naming when you implement an interface. I hate it personally, and always change the name. The implementation of interfaces doesn't care what name you use. The only thing it care about is that there's a procedure named something (check), and it has N number of arguments (check), and that each argument are of same data type (check). It doesn't even look at the names of the arguments. So you should change them to something more sensible. If you're feeling unimaginative, call them Value, which is what I usually do. Just don't leave it as RHS.






share|improve this answer























  • A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (except RHS)? I mostly prefix with cls in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
    – Greedo
    Mar 29 at 9:08











  • Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of using MSXML2.XMLHTTP60) - they completely occupy the UI thread so no printing. They raise complete events, then we get into the For loop, and I get a 0 printed - then the next 5 events from DoEvents, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
    – Greedo
    Mar 29 at 9:16










  • Try commenting out the content of PrintState and changing maxThreads to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
    – Greedo
    Mar 29 at 9:20










  • @Greedo I have addressed your questions and edited my answer accordingly.
    – this
    Mar 29 at 10:49







  • 1




    This is a great answer
    – Raystafarian
    Mar 30 at 11:36

















up vote
6
down vote



+25










This is way above my expertise, but maybe adding an answer would cause more views/answers? Also, what's that beginner tag doing there? ;)



I want to say first off, really solid work. That's probably why there hasn't been too much activity here. Maybe some of this will seem like nit-picks and if so, Sorry!




ByRef or ByVal arguments



Every one of the arguments, as you probably know, that aren't declared ByVal are implicitly ByRef. I imagine you probably need a lot of these to be ByRef, but it's better to explicitly declare them ByRef so it's easier to tell that it's supposed to be ByRef.



obj parameter



clsHandleEvents.Complete(obj as clsThreadHandle)
clsHandleEvents.NotifyComplete(obj as clsThreadHandle)
clsMultiThread.threadEvents_Complete(obj as clsThreadHandle)
multiThreadMethods.isIterable(obj as Variant)


The first three could probably use a new name given they are all the same type. Maybe threadObject or threadHandle - up to you.



The fourth one is taking a variant and returning a boolean, it could be named anything from testObject to iteratorGroup. You might want to try renaming some of those



Queue



clsMultiThread



Private iterableQueue As mscorlib.Queue 
Set iterableQueue = New Queue


multiThreadMethods



Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue)


Maybe I'm dense, being a VBA guy, but how would I know what a queue is and what methods it uses? It's not a standard reference library for VBA, is it? I'd probably add a comment in there explaining why you've chosen to do it this way, so nobody would have to go back through all of it to figure out why it's better than some other way.



RHS?



clsHtmlWorker



Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property


What is RHS? Is it a constant, it's in all caps. I can tell you know better using Html in your project rather than HTML - so what is it?




All that being said, I can't even really figure out how this is working when I step through it, I see where requests should be sent (httpRequest), but I can't figure out how they come back and populate the sheet, I don't see it happening, which I think is the point, being asynchronous?






share|improve this answer























  • Thanks alot for this - some good points here I hadn't considered. I put the Beginner tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! Those RHSs were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so much debug.print, as you can't step through the operational code. Good point on Queue. Fingers crossed I can get some more good answers like this on some of the other bits of the code
    – Greedo
    Feb 15 at 19:39











  • I like the debugging = true part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
    – Raystafarian
    Feb 15 at 21:28










  • RHS = Right Hand Side?
    – Solomon Ucko
    Mar 24 at 1:56










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f185212%2fa-new-approach-to-multithreading-in-excel%23new-answer', 'question_page');

);

Post as a guest






























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
10
down vote



accepted
+500










Interesting idea and well-done!



Naming



I really don't like the names. The names like clsMultiThread is somewhat misleading, since as you noted they don't actually provide any true multi-threading. A unwary user would expect it to work with anything and would be disappointed when all of their queued works painfully completes synchronously. ;)



Also, we are not really using threads, but objects which may or may not run in-process or not. You used MSXML2.XMLHTTP60 so it should run in-process. However, it need not be the case if we were to use something like ShDocVw.WebBrowser or even Excel.Application which may run out-of-process. That's not threading. So in that case we are actually talking about running things asynchronously more than threading.



May I instead suggest names like ParallelTaskCoordinator, TaskHandle, and AsyncObjectWatcher? The point being that it should convey the notion that those objects have nothing to do with running asynchronously; we're just orchestrating the asynchronous tasks to run in parallel.



In the comments you asked about hungarian notations. My personal preference is to not use hungarian notation for objects, so I wouldn't have used cls prefix. The concerns about namespace is not going to be helped by whether you prefix or not, mainly because a good namespace is about logical grouping, as opposed to clumping them together based on their type of modules. I'm OK with using HN for private variables, not so much with public-facing properties such as the module's name, the public property since they just distract from their semantic meaning. The semantic part of the name is far more important, and a good naming convention should support that.



On general, your naming scheme seems reasonably well thought out. An inconsistency I do see is that in Execute method, you have delegate as an argument name, which then get assigned to the eventHandle. Why not call the argument the same way, so it's clear when writing the Execute method what it should be?



Not fully asynchronous, may be blocked by main thread



On its own, it works. However, I do want to call your attention to the fact that it won't work in more complicated scenarios. Here's a example where this breaks:



Executed in immediate windows:



call runtest: for i = 0 to 100 : debug.Print i: doevents: next


In theory, the output of the Debug.Print i should be interleaved with other tasks progress. In practice, I'm consistently getting 5 tasks completed, then 100s of the outputted i, then the other 5 tasks completed. This tells me that if the main thread is allow to do things while the tasks in other threads are running, those will be blocked until the main thread becomes idle. Thus it is easy to destroy the asynchrony if one is not careful.



As you can see, even sprinkling DoEvents is apparently not sufficient. Thinking about it, it's not that surprising because the "events" come from the same VBA code which must run in the UI thread. Therefore, if this becomes a problem, you must fully delegate the asynchronously to an external process/thread (e.g. via an external library) in which you can communicate progress to VBA code via events, rather than relying on the external object to raise event into the worker, which must raise event to the thread handle then to the manager.



Note that even if you went this route, you would need to have a retry logic to successfully raise the event because running VBA code may prevent the entrancy. There is an opportunity for an event to be missed, especially if there is high rate of activity and too much inputs. That has been observed before here, here, here. Those examples illustrates the case where something doesn't always fire the way we expect it to.



As suggested in the comments, changing the # of max threads may work better. For example, setting the max threads to 2, we see the the task 1 and task 2 completing before blocking to print the i for 90 times, then task 3 is allowed to start then complete before we block again for last 10 more times and the rest complete. However, this is very consistent on my system. In 3 times I ran the test, task 3 don't start until 90 has printed, and the rest don't start until 100 has printed. On the other hand, if I set maxthreads to 1, I get one task completed, then am blocked for 100 times before the rest of tasks are allowed to run. I do not expect it to be easily reproducible since this will be affected by host of factors (the hardware, the version of Windows, the version of Excel). This is simply something to be aware and explicitly design for the possibility.



If it's critical to you that you do not get blocked, you need to consider a different approach. For instance, you'd need an external .NET library that creates a thread, run the task, then write the output to a file. That allows the main thread to read it at its leisure and none of the spawned threads will accidentally get blocked when the main thread needs to do something. Even so, it is still subject to the fact that it can be blocked when trying to spawn a new thread (since you need VBA to run code to create one, even if it's just calling a external function in a DLL).



Note: in all of my tests, only Debug.Print printed were the Task N started and Task N completed and Events terminated. I had commented out the PrintState; otherwise, the immediate window would overflow and I can't see all the output from the start to end.



WinHttp instead of MSXML



Also, I want bring your attention to the fact that you could have had a instance of WinHttp.WinHttpRequest which supports events natively. Therefore you could declare a variable like Private WithEvents request As WinHttp.WinHttpRequest and instead listen for the event. That means you don't need to set up default member as you have to with the MSXML and if the internet request is all you do, you don't even need the thread collection; just have a collection of WinHttp.WinHttpRequest and listen to their events.



But that's obviously not a generic solution, and using WinHttp.WinHttpRequest doesn't preclude us from using it with the above solution for case where we want a worker to handle the setup.



Don't recurse if you don't have to



You have a doInstructions which calls itself recursively. But IMPOV, there is no reason to recurse. You can do the same thing with a simple loop as demonstrated with my hacky change. A more proper solution may be to use multiple conditions or a flag variable on the bottom of the loop (guaranteeing that it executes at least once). That ensure you never need to worry about stack overflow.



Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction

Do
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID

'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy

BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If

Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select

'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
'doInstructions loopcount:=loopcount + 1 'watch for infinite loop
freeThreadID = vbNullString
loopcount = loopcount + 1
Else
Exit Do
End If
Loop

End Sub


Only one worker type per manager class



In the sample, we have this:



Set .AsyncClass = New clsHtmlWorker


I was under the impression that the point of the manager was to allow us to create threads for different workers each running asynchronously. For that reason, it seems strange that I can only use AsyncClass to set a single implementation of IWorker. Shouldn't I be able to assign a bunch of workers, with their own arguments to be then enlisted? That would be more intuitive use of the manager, I think. Something like this:



Set .AsyncObjectsToExecute = Array( _
WorkerType1Factory.Create("some argument one"), _
WorkerType1Factory.Create("another argument"), _
WorkerType2Factory.Create("do that", 123, 495), _
WorkerType2Factory.Create("but not that", 0, 0), _
WorkerType3Factory.Create() _
)


From that, we can obviously see what tasks we are setting up to be executed, by passing in IWorker factories. That would not constraint us to only one particular worker class, and it helps sees what arguments we are sending for each task.



Encapsulate your private fields as a type



I would suggest that you take a page out of @MathieuGuindon's book and use his method:



```'THREAD HANDLE BASE PROPERTIES
Private Type THandle
eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
taskID As String 'holds the id of the current task
handleID As String 'holds the id of this handle
handleArgs As Variant 'holds any arguments that need to be passed to the task
End Type
Private this As THandle


That gives you intellisense of the class' fields just by using this. and thus help clearly differentiating between the class' private backing fields and exposed properties. It's one way to help wrong code look wrong.



RaiseEvents in the Initialize event



This seems to me possibly troublesome:



Private Sub Class_Initialize()
RaiseEvent Opened(startTime)
End Sub


My personal rule for any work inside Initialize and Terminate events is to keep the code totally isolated to the class. It is generally not appropriate to reach out outside the class because it hasn't finished constructing itself and as such, the results may be unpredictable. In this case the effect may be benign because we only raise an event with a reported parameter. However, in more complicated implementation, that can act up in unexpected manners. Besides, it really doesn't buy us anything because if the Initialize was called, we already know about it, since we usually have to had initialized it in the first place.



Do you really need a queue



It's cool that you're using .NET queue to help set up the arguments. However, that means you have a extra reference and you already have a number of references, which makes it hard to distribute the code to different environment.



One way would be to late-bind the queue, by declaring it as object and doing a CreateObject("System.Collections.Queue") instead. That lets you avoid the need to add an explicit reference to the .NET core library and thus be agnostic about the .NET framework version since in this case, the class won't change between versions.



The alternative is to just use a built-in VBA collection. IINM, you only use queues to collect the arguments, which a VBA colection can do just as well. This would give you queue-like behavior:



col.Add ...
col.Add ...
col.Add ...

Do Until col.Count = 0
col.Remove 1
Loop


without any external references.



Don't use default naming from interface



As @Raystafarian alluded to, you shouldn't accept the default RHS naming when you implement an interface. I hate it personally, and always change the name. The implementation of interfaces doesn't care what name you use. The only thing it care about is that there's a procedure named something (check), and it has N number of arguments (check), and that each argument are of same data type (check). It doesn't even look at the names of the arguments. So you should change them to something more sensible. If you're feeling unimaginative, call them Value, which is what I usually do. Just don't leave it as RHS.






share|improve this answer























  • A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (except RHS)? I mostly prefix with cls in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
    – Greedo
    Mar 29 at 9:08











  • Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of using MSXML2.XMLHTTP60) - they completely occupy the UI thread so no printing. They raise complete events, then we get into the For loop, and I get a 0 printed - then the next 5 events from DoEvents, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
    – Greedo
    Mar 29 at 9:16










  • Try commenting out the content of PrintState and changing maxThreads to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
    – Greedo
    Mar 29 at 9:20










  • @Greedo I have addressed your questions and edited my answer accordingly.
    – this
    Mar 29 at 10:49







  • 1




    This is a great answer
    – Raystafarian
    Mar 30 at 11:36














up vote
10
down vote



accepted
+500










Interesting idea and well-done!



Naming



I really don't like the names. The names like clsMultiThread is somewhat misleading, since as you noted they don't actually provide any true multi-threading. A unwary user would expect it to work with anything and would be disappointed when all of their queued works painfully completes synchronously. ;)



Also, we are not really using threads, but objects which may or may not run in-process or not. You used MSXML2.XMLHTTP60 so it should run in-process. However, it need not be the case if we were to use something like ShDocVw.WebBrowser or even Excel.Application which may run out-of-process. That's not threading. So in that case we are actually talking about running things asynchronously more than threading.



May I instead suggest names like ParallelTaskCoordinator, TaskHandle, and AsyncObjectWatcher? The point being that it should convey the notion that those objects have nothing to do with running asynchronously; we're just orchestrating the asynchronous tasks to run in parallel.



In the comments you asked about hungarian notations. My personal preference is to not use hungarian notation for objects, so I wouldn't have used cls prefix. The concerns about namespace is not going to be helped by whether you prefix or not, mainly because a good namespace is about logical grouping, as opposed to clumping them together based on their type of modules. I'm OK with using HN for private variables, not so much with public-facing properties such as the module's name, the public property since they just distract from their semantic meaning. The semantic part of the name is far more important, and a good naming convention should support that.



On general, your naming scheme seems reasonably well thought out. An inconsistency I do see is that in Execute method, you have delegate as an argument name, which then get assigned to the eventHandle. Why not call the argument the same way, so it's clear when writing the Execute method what it should be?



Not fully asynchronous, may be blocked by main thread



On its own, it works. However, I do want to call your attention to the fact that it won't work in more complicated scenarios. Here's a example where this breaks:



Executed in immediate windows:



call runtest: for i = 0 to 100 : debug.Print i: doevents: next


In theory, the output of the Debug.Print i should be interleaved with other tasks progress. In practice, I'm consistently getting 5 tasks completed, then 100s of the outputted i, then the other 5 tasks completed. This tells me that if the main thread is allow to do things while the tasks in other threads are running, those will be blocked until the main thread becomes idle. Thus it is easy to destroy the asynchrony if one is not careful.



As you can see, even sprinkling DoEvents is apparently not sufficient. Thinking about it, it's not that surprising because the "events" come from the same VBA code which must run in the UI thread. Therefore, if this becomes a problem, you must fully delegate the asynchronously to an external process/thread (e.g. via an external library) in which you can communicate progress to VBA code via events, rather than relying on the external object to raise event into the worker, which must raise event to the thread handle then to the manager.



Note that even if you went this route, you would need to have a retry logic to successfully raise the event because running VBA code may prevent the entrancy. There is an opportunity for an event to be missed, especially if there is high rate of activity and too much inputs. That has been observed before here, here, here. Those examples illustrates the case where something doesn't always fire the way we expect it to.



As suggested in the comments, changing the # of max threads may work better. For example, setting the max threads to 2, we see the the task 1 and task 2 completing before blocking to print the i for 90 times, then task 3 is allowed to start then complete before we block again for last 10 more times and the rest complete. However, this is very consistent on my system. In 3 times I ran the test, task 3 don't start until 90 has printed, and the rest don't start until 100 has printed. On the other hand, if I set maxthreads to 1, I get one task completed, then am blocked for 100 times before the rest of tasks are allowed to run. I do not expect it to be easily reproducible since this will be affected by host of factors (the hardware, the version of Windows, the version of Excel). This is simply something to be aware and explicitly design for the possibility.



If it's critical to you that you do not get blocked, you need to consider a different approach. For instance, you'd need an external .NET library that creates a thread, run the task, then write the output to a file. That allows the main thread to read it at its leisure and none of the spawned threads will accidentally get blocked when the main thread needs to do something. Even so, it is still subject to the fact that it can be blocked when trying to spawn a new thread (since you need VBA to run code to create one, even if it's just calling a external function in a DLL).



Note: in all of my tests, only Debug.Print printed were the Task N started and Task N completed and Events terminated. I had commented out the PrintState; otherwise, the immediate window would overflow and I can't see all the output from the start to end.



WinHttp instead of MSXML



Also, I want bring your attention to the fact that you could have had a instance of WinHttp.WinHttpRequest which supports events natively. Therefore you could declare a variable like Private WithEvents request As WinHttp.WinHttpRequest and instead listen for the event. That means you don't need to set up default member as you have to with the MSXML and if the internet request is all you do, you don't even need the thread collection; just have a collection of WinHttp.WinHttpRequest and listen to their events.



But that's obviously not a generic solution, and using WinHttp.WinHttpRequest doesn't preclude us from using it with the above solution for case where we want a worker to handle the setup.



Don't recurse if you don't have to



You have a doInstructions which calls itself recursively. But IMPOV, there is no reason to recurse. You can do the same thing with a simple loop as demonstrated with my hacky change. A more proper solution may be to use multiple conditions or a flag variable on the bottom of the loop (guaranteeing that it executes at least once). That ensure you never need to worry about stack overflow.



Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction

Do
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID

'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy

BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If

Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select

'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
'doInstructions loopcount:=loopcount + 1 'watch for infinite loop
freeThreadID = vbNullString
loopcount = loopcount + 1
Else
Exit Do
End If
Loop

End Sub


Only one worker type per manager class



In the sample, we have this:



Set .AsyncClass = New clsHtmlWorker


I was under the impression that the point of the manager was to allow us to create threads for different workers each running asynchronously. For that reason, it seems strange that I can only use AsyncClass to set a single implementation of IWorker. Shouldn't I be able to assign a bunch of workers, with their own arguments to be then enlisted? That would be more intuitive use of the manager, I think. Something like this:



Set .AsyncObjectsToExecute = Array( _
WorkerType1Factory.Create("some argument one"), _
WorkerType1Factory.Create("another argument"), _
WorkerType2Factory.Create("do that", 123, 495), _
WorkerType2Factory.Create("but not that", 0, 0), _
WorkerType3Factory.Create() _
)


From that, we can obviously see what tasks we are setting up to be executed, by passing in IWorker factories. That would not constraint us to only one particular worker class, and it helps sees what arguments we are sending for each task.



Encapsulate your private fields as a type



I would suggest that you take a page out of @MathieuGuindon's book and use his method:



```'THREAD HANDLE BASE PROPERTIES
Private Type THandle
eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
taskID As String 'holds the id of the current task
handleID As String 'holds the id of this handle
handleArgs As Variant 'holds any arguments that need to be passed to the task
End Type
Private this As THandle


That gives you intellisense of the class' fields just by using this. and thus help clearly differentiating between the class' private backing fields and exposed properties. It's one way to help wrong code look wrong.



RaiseEvents in the Initialize event



This seems to me possibly troublesome:



Private Sub Class_Initialize()
RaiseEvent Opened(startTime)
End Sub


My personal rule for any work inside Initialize and Terminate events is to keep the code totally isolated to the class. It is generally not appropriate to reach out outside the class because it hasn't finished constructing itself and as such, the results may be unpredictable. In this case the effect may be benign because we only raise an event with a reported parameter. However, in more complicated implementation, that can act up in unexpected manners. Besides, it really doesn't buy us anything because if the Initialize was called, we already know about it, since we usually have to had initialized it in the first place.



Do you really need a queue



It's cool that you're using .NET queue to help set up the arguments. However, that means you have a extra reference and you already have a number of references, which makes it hard to distribute the code to different environment.



One way would be to late-bind the queue, by declaring it as object and doing a CreateObject("System.Collections.Queue") instead. That lets you avoid the need to add an explicit reference to the .NET core library and thus be agnostic about the .NET framework version since in this case, the class won't change between versions.



The alternative is to just use a built-in VBA collection. IINM, you only use queues to collect the arguments, which a VBA colection can do just as well. This would give you queue-like behavior:



col.Add ...
col.Add ...
col.Add ...

Do Until col.Count = 0
col.Remove 1
Loop


without any external references.



Don't use default naming from interface



As @Raystafarian alluded to, you shouldn't accept the default RHS naming when you implement an interface. I hate it personally, and always change the name. The implementation of interfaces doesn't care what name you use. The only thing it care about is that there's a procedure named something (check), and it has N number of arguments (check), and that each argument are of same data type (check). It doesn't even look at the names of the arguments. So you should change them to something more sensible. If you're feeling unimaginative, call them Value, which is what I usually do. Just don't leave it as RHS.






share|improve this answer























  • A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (except RHS)? I mostly prefix with cls in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
    – Greedo
    Mar 29 at 9:08











  • Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of using MSXML2.XMLHTTP60) - they completely occupy the UI thread so no printing. They raise complete events, then we get into the For loop, and I get a 0 printed - then the next 5 events from DoEvents, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
    – Greedo
    Mar 29 at 9:16










  • Try commenting out the content of PrintState and changing maxThreads to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
    – Greedo
    Mar 29 at 9:20










  • @Greedo I have addressed your questions and edited my answer accordingly.
    – this
    Mar 29 at 10:49







  • 1




    This is a great answer
    – Raystafarian
    Mar 30 at 11:36












up vote
10
down vote



accepted
+500







up vote
10
down vote



accepted
+500




+500




Interesting idea and well-done!



Naming



I really don't like the names. The names like clsMultiThread is somewhat misleading, since as you noted they don't actually provide any true multi-threading. A unwary user would expect it to work with anything and would be disappointed when all of their queued works painfully completes synchronously. ;)



Also, we are not really using threads, but objects which may or may not run in-process or not. You used MSXML2.XMLHTTP60 so it should run in-process. However, it need not be the case if we were to use something like ShDocVw.WebBrowser or even Excel.Application which may run out-of-process. That's not threading. So in that case we are actually talking about running things asynchronously more than threading.



May I instead suggest names like ParallelTaskCoordinator, TaskHandle, and AsyncObjectWatcher? The point being that it should convey the notion that those objects have nothing to do with running asynchronously; we're just orchestrating the asynchronous tasks to run in parallel.



In the comments you asked about hungarian notations. My personal preference is to not use hungarian notation for objects, so I wouldn't have used cls prefix. The concerns about namespace is not going to be helped by whether you prefix or not, mainly because a good namespace is about logical grouping, as opposed to clumping them together based on their type of modules. I'm OK with using HN for private variables, not so much with public-facing properties such as the module's name, the public property since they just distract from their semantic meaning. The semantic part of the name is far more important, and a good naming convention should support that.



On general, your naming scheme seems reasonably well thought out. An inconsistency I do see is that in Execute method, you have delegate as an argument name, which then get assigned to the eventHandle. Why not call the argument the same way, so it's clear when writing the Execute method what it should be?



Not fully asynchronous, may be blocked by main thread



On its own, it works. However, I do want to call your attention to the fact that it won't work in more complicated scenarios. Here's a example where this breaks:



Executed in immediate windows:



call runtest: for i = 0 to 100 : debug.Print i: doevents: next


In theory, the output of the Debug.Print i should be interleaved with other tasks progress. In practice, I'm consistently getting 5 tasks completed, then 100s of the outputted i, then the other 5 tasks completed. This tells me that if the main thread is allow to do things while the tasks in other threads are running, those will be blocked until the main thread becomes idle. Thus it is easy to destroy the asynchrony if one is not careful.



As you can see, even sprinkling DoEvents is apparently not sufficient. Thinking about it, it's not that surprising because the "events" come from the same VBA code which must run in the UI thread. Therefore, if this becomes a problem, you must fully delegate the asynchronously to an external process/thread (e.g. via an external library) in which you can communicate progress to VBA code via events, rather than relying on the external object to raise event into the worker, which must raise event to the thread handle then to the manager.



Note that even if you went this route, you would need to have a retry logic to successfully raise the event because running VBA code may prevent the entrancy. There is an opportunity for an event to be missed, especially if there is high rate of activity and too much inputs. That has been observed before here, here, here. Those examples illustrates the case where something doesn't always fire the way we expect it to.



As suggested in the comments, changing the # of max threads may work better. For example, setting the max threads to 2, we see the the task 1 and task 2 completing before blocking to print the i for 90 times, then task 3 is allowed to start then complete before we block again for last 10 more times and the rest complete. However, this is very consistent on my system. In 3 times I ran the test, task 3 don't start until 90 has printed, and the rest don't start until 100 has printed. On the other hand, if I set maxthreads to 1, I get one task completed, then am blocked for 100 times before the rest of tasks are allowed to run. I do not expect it to be easily reproducible since this will be affected by host of factors (the hardware, the version of Windows, the version of Excel). This is simply something to be aware and explicitly design for the possibility.



If it's critical to you that you do not get blocked, you need to consider a different approach. For instance, you'd need an external .NET library that creates a thread, run the task, then write the output to a file. That allows the main thread to read it at its leisure and none of the spawned threads will accidentally get blocked when the main thread needs to do something. Even so, it is still subject to the fact that it can be blocked when trying to spawn a new thread (since you need VBA to run code to create one, even if it's just calling a external function in a DLL).



Note: in all of my tests, only Debug.Print printed were the Task N started and Task N completed and Events terminated. I had commented out the PrintState; otherwise, the immediate window would overflow and I can't see all the output from the start to end.



WinHttp instead of MSXML



Also, I want bring your attention to the fact that you could have had a instance of WinHttp.WinHttpRequest which supports events natively. Therefore you could declare a variable like Private WithEvents request As WinHttp.WinHttpRequest and instead listen for the event. That means you don't need to set up default member as you have to with the MSXML and if the internet request is all you do, you don't even need the thread collection; just have a collection of WinHttp.WinHttpRequest and listen to their events.



But that's obviously not a generic solution, and using WinHttp.WinHttpRequest doesn't preclude us from using it with the above solution for case where we want a worker to handle the setup.



Don't recurse if you don't have to



You have a doInstructions which calls itself recursively. But IMPOV, there is no reason to recurse. You can do the same thing with a simple loop as demonstrated with my hacky change. A more proper solution may be to use multiple conditions or a flag variable on the bottom of the loop (guaranteeing that it executes at least once). That ensure you never need to worry about stack overflow.



Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction

Do
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID

'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy

BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If

Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select

'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
'doInstructions loopcount:=loopcount + 1 'watch for infinite loop
freeThreadID = vbNullString
loopcount = loopcount + 1
Else
Exit Do
End If
Loop

End Sub


Only one worker type per manager class



In the sample, we have this:



Set .AsyncClass = New clsHtmlWorker


I was under the impression that the point of the manager was to allow us to create threads for different workers each running asynchronously. For that reason, it seems strange that I can only use AsyncClass to set a single implementation of IWorker. Shouldn't I be able to assign a bunch of workers, with their own arguments to be then enlisted? That would be more intuitive use of the manager, I think. Something like this:



Set .AsyncObjectsToExecute = Array( _
WorkerType1Factory.Create("some argument one"), _
WorkerType1Factory.Create("another argument"), _
WorkerType2Factory.Create("do that", 123, 495), _
WorkerType2Factory.Create("but not that", 0, 0), _
WorkerType3Factory.Create() _
)


From that, we can obviously see what tasks we are setting up to be executed, by passing in IWorker factories. That would not constraint us to only one particular worker class, and it helps sees what arguments we are sending for each task.



Encapsulate your private fields as a type



I would suggest that you take a page out of @MathieuGuindon's book and use his method:



```'THREAD HANDLE BASE PROPERTIES
Private Type THandle
eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
taskID As String 'holds the id of the current task
handleID As String 'holds the id of this handle
handleArgs As Variant 'holds any arguments that need to be passed to the task
End Type
Private this As THandle


That gives you intellisense of the class' fields just by using this. and thus help clearly differentiating between the class' private backing fields and exposed properties. It's one way to help wrong code look wrong.



RaiseEvents in the Initialize event



This seems to me possibly troublesome:



Private Sub Class_Initialize()
RaiseEvent Opened(startTime)
End Sub


My personal rule for any work inside Initialize and Terminate events is to keep the code totally isolated to the class. It is generally not appropriate to reach out outside the class because it hasn't finished constructing itself and as such, the results may be unpredictable. In this case the effect may be benign because we only raise an event with a reported parameter. However, in more complicated implementation, that can act up in unexpected manners. Besides, it really doesn't buy us anything because if the Initialize was called, we already know about it, since we usually have to had initialized it in the first place.



Do you really need a queue



It's cool that you're using .NET queue to help set up the arguments. However, that means you have a extra reference and you already have a number of references, which makes it hard to distribute the code to different environment.



One way would be to late-bind the queue, by declaring it as object and doing a CreateObject("System.Collections.Queue") instead. That lets you avoid the need to add an explicit reference to the .NET core library and thus be agnostic about the .NET framework version since in this case, the class won't change between versions.



The alternative is to just use a built-in VBA collection. IINM, you only use queues to collect the arguments, which a VBA colection can do just as well. This would give you queue-like behavior:



col.Add ...
col.Add ...
col.Add ...

Do Until col.Count = 0
col.Remove 1
Loop


without any external references.



Don't use default naming from interface



As @Raystafarian alluded to, you shouldn't accept the default RHS naming when you implement an interface. I hate it personally, and always change the name. The implementation of interfaces doesn't care what name you use. The only thing it care about is that there's a procedure named something (check), and it has N number of arguments (check), and that each argument are of same data type (check). It doesn't even look at the names of the arguments. So you should change them to something more sensible. If you're feeling unimaginative, call them Value, which is what I usually do. Just don't leave it as RHS.






share|improve this answer















Interesting idea and well-done!



Naming



I really don't like the names. The names like clsMultiThread is somewhat misleading, since as you noted they don't actually provide any true multi-threading. A unwary user would expect it to work with anything and would be disappointed when all of their queued works painfully completes synchronously. ;)



Also, we are not really using threads, but objects which may or may not run in-process or not. You used MSXML2.XMLHTTP60 so it should run in-process. However, it need not be the case if we were to use something like ShDocVw.WebBrowser or even Excel.Application which may run out-of-process. That's not threading. So in that case we are actually talking about running things asynchronously more than threading.



May I instead suggest names like ParallelTaskCoordinator, TaskHandle, and AsyncObjectWatcher? The point being that it should convey the notion that those objects have nothing to do with running asynchronously; we're just orchestrating the asynchronous tasks to run in parallel.



In the comments you asked about hungarian notations. My personal preference is to not use hungarian notation for objects, so I wouldn't have used cls prefix. The concerns about namespace is not going to be helped by whether you prefix or not, mainly because a good namespace is about logical grouping, as opposed to clumping them together based on their type of modules. I'm OK with using HN for private variables, not so much with public-facing properties such as the module's name, the public property since they just distract from their semantic meaning. The semantic part of the name is far more important, and a good naming convention should support that.



On general, your naming scheme seems reasonably well thought out. An inconsistency I do see is that in Execute method, you have delegate as an argument name, which then get assigned to the eventHandle. Why not call the argument the same way, so it's clear when writing the Execute method what it should be?



Not fully asynchronous, may be blocked by main thread



On its own, it works. However, I do want to call your attention to the fact that it won't work in more complicated scenarios. Here's a example where this breaks:



Executed in immediate windows:



call runtest: for i = 0 to 100 : debug.Print i: doevents: next


In theory, the output of the Debug.Print i should be interleaved with other tasks progress. In practice, I'm consistently getting 5 tasks completed, then 100s of the outputted i, then the other 5 tasks completed. This tells me that if the main thread is allow to do things while the tasks in other threads are running, those will be blocked until the main thread becomes idle. Thus it is easy to destroy the asynchrony if one is not careful.



As you can see, even sprinkling DoEvents is apparently not sufficient. Thinking about it, it's not that surprising because the "events" come from the same VBA code which must run in the UI thread. Therefore, if this becomes a problem, you must fully delegate the asynchronously to an external process/thread (e.g. via an external library) in which you can communicate progress to VBA code via events, rather than relying on the external object to raise event into the worker, which must raise event to the thread handle then to the manager.



Note that even if you went this route, you would need to have a retry logic to successfully raise the event because running VBA code may prevent the entrancy. There is an opportunity for an event to be missed, especially if there is high rate of activity and too much inputs. That has been observed before here, here, here. Those examples illustrates the case where something doesn't always fire the way we expect it to.



As suggested in the comments, changing the # of max threads may work better. For example, setting the max threads to 2, we see the the task 1 and task 2 completing before blocking to print the i for 90 times, then task 3 is allowed to start then complete before we block again for last 10 more times and the rest complete. However, this is very consistent on my system. In 3 times I ran the test, task 3 don't start until 90 has printed, and the rest don't start until 100 has printed. On the other hand, if I set maxthreads to 1, I get one task completed, then am blocked for 100 times before the rest of tasks are allowed to run. I do not expect it to be easily reproducible since this will be affected by host of factors (the hardware, the version of Windows, the version of Excel). This is simply something to be aware and explicitly design for the possibility.



If it's critical to you that you do not get blocked, you need to consider a different approach. For instance, you'd need an external .NET library that creates a thread, run the task, then write the output to a file. That allows the main thread to read it at its leisure and none of the spawned threads will accidentally get blocked when the main thread needs to do something. Even so, it is still subject to the fact that it can be blocked when trying to spawn a new thread (since you need VBA to run code to create one, even if it's just calling a external function in a DLL).



Note: in all of my tests, only Debug.Print printed were the Task N started and Task N completed and Events terminated. I had commented out the PrintState; otherwise, the immediate window would overflow and I can't see all the output from the start to end.



WinHttp instead of MSXML



Also, I want bring your attention to the fact that you could have had a instance of WinHttp.WinHttpRequest which supports events natively. Therefore you could declare a variable like Private WithEvents request As WinHttp.WinHttpRequest and instead listen for the event. That means you don't need to set up default member as you have to with the MSXML and if the internet request is all you do, you don't even need the thread collection; just have a collection of WinHttp.WinHttpRequest and listen to their events.



But that's obviously not a generic solution, and using WinHttp.WinHttpRequest doesn't preclude us from using it with the above solution for case where we want a worker to handle the setup.



Don't recurse if you don't have to



You have a doInstructions which calls itself recursively. But IMPOV, there is no reason to recurse. You can do the same thing with a simple loop as demonstrated with my hacky change. A more proper solution may be to use multiple conditions or a flag variable on the bottom of the loop (guaranteeing that it executes at least once). That ensure you never need to worry about stack overflow.



Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction

Do
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID

'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy

BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If

Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select

'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
'doInstructions loopcount:=loopcount + 1 'watch for infinite loop
freeThreadID = vbNullString
loopcount = loopcount + 1
Else
Exit Do
End If
Loop

End Sub


Only one worker type per manager class



In the sample, we have this:



Set .AsyncClass = New clsHtmlWorker


I was under the impression that the point of the manager was to allow us to create threads for different workers each running asynchronously. For that reason, it seems strange that I can only use AsyncClass to set a single implementation of IWorker. Shouldn't I be able to assign a bunch of workers, with their own arguments to be then enlisted? That would be more intuitive use of the manager, I think. Something like this:



Set .AsyncObjectsToExecute = Array( _
WorkerType1Factory.Create("some argument one"), _
WorkerType1Factory.Create("another argument"), _
WorkerType2Factory.Create("do that", 123, 495), _
WorkerType2Factory.Create("but not that", 0, 0), _
WorkerType3Factory.Create() _
)


From that, we can obviously see what tasks we are setting up to be executed, by passing in IWorker factories. That would not constraint us to only one particular worker class, and it helps sees what arguments we are sending for each task.



Encapsulate your private fields as a type



I would suggest that you take a page out of @MathieuGuindon's book and use his method:



```'THREAD HANDLE BASE PROPERTIES
Private Type THandle
eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
taskID As String 'holds the id of the current task
handleID As String 'holds the id of this handle
handleArgs As Variant 'holds any arguments that need to be passed to the task
End Type
Private this As THandle


That gives you intellisense of the class' fields just by using this. and thus help clearly differentiating between the class' private backing fields and exposed properties. It's one way to help wrong code look wrong.



RaiseEvents in the Initialize event



This seems to me possibly troublesome:



Private Sub Class_Initialize()
RaiseEvent Opened(startTime)
End Sub


My personal rule for any work inside Initialize and Terminate events is to keep the code totally isolated to the class. It is generally not appropriate to reach out outside the class because it hasn't finished constructing itself and as such, the results may be unpredictable. In this case the effect may be benign because we only raise an event with a reported parameter. However, in more complicated implementation, that can act up in unexpected manners. Besides, it really doesn't buy us anything because if the Initialize was called, we already know about it, since we usually have to had initialized it in the first place.



Do you really need a queue



It's cool that you're using .NET queue to help set up the arguments. However, that means you have a extra reference and you already have a number of references, which makes it hard to distribute the code to different environment.



One way would be to late-bind the queue, by declaring it as object and doing a CreateObject("System.Collections.Queue") instead. That lets you avoid the need to add an explicit reference to the .NET core library and thus be agnostic about the .NET framework version since in this case, the class won't change between versions.



The alternative is to just use a built-in VBA collection. IINM, you only use queues to collect the arguments, which a VBA colection can do just as well. This would give you queue-like behavior:



col.Add ...
col.Add ...
col.Add ...

Do Until col.Count = 0
col.Remove 1
Loop


without any external references.



Don't use default naming from interface



As @Raystafarian alluded to, you shouldn't accept the default RHS naming when you implement an interface. I hate it personally, and always change the name. The implementation of interfaces doesn't care what name you use. The only thing it care about is that there's a procedure named something (check), and it has N number of arguments (check), and that each argument are of same data type (check). It doesn't even look at the names of the arguments. So you should change them to something more sensible. If you're feeling unimaginative, call them Value, which is what I usually do. Just don't leave it as RHS.







share|improve this answer















share|improve this answer



share|improve this answer








edited Mar 29 at 10:48


























answered Mar 29 at 4:14









this

1,242318




1,242318











  • A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (except RHS)? I mostly prefix with cls in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
    – Greedo
    Mar 29 at 9:08











  • Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of using MSXML2.XMLHTTP60) - they completely occupy the UI thread so no printing. They raise complete events, then we get into the For loop, and I get a 0 printed - then the next 5 events from DoEvents, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
    – Greedo
    Mar 29 at 9:16










  • Try commenting out the content of PrintState and changing maxThreads to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
    – Greedo
    Mar 29 at 9:20










  • @Greedo I have addressed your questions and edited my answer accordingly.
    – this
    Mar 29 at 10:49







  • 1




    This is a great answer
    – Raystafarian
    Mar 30 at 11:36
















  • A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (except RHS)? I mostly prefix with cls in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
    – Greedo
    Mar 29 at 9:08











  • Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of using MSXML2.XMLHTTP60) - they completely occupy the UI thread so no printing. They raise complete events, then we get into the For loop, and I get a 0 printed - then the next 5 events from DoEvents, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
    – Greedo
    Mar 29 at 9:16










  • Try commenting out the content of PrintState and changing maxThreads to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
    – Greedo
    Mar 29 at 9:20










  • @Greedo I have addressed your questions and edited my answer accordingly.
    – this
    Mar 29 at 10:49







  • 1




    This is a great answer
    – Raystafarian
    Mar 30 at 11:36















A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (except RHS)? I mostly prefix with cls in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
– Greedo
Mar 29 at 9:08





A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (except RHS)? I mostly prefix with cls in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
– Greedo
Mar 29 at 9:08













Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of using MSXML2.XMLHTTP60) - they completely occupy the UI thread so no printing. They raise complete events, then we get into the For loop, and I get a 0 printed - then the next 5 events from DoEvents, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
– Greedo
Mar 29 at 9:16




Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of using MSXML2.XMLHTTP60) - they completely occupy the UI thread so no printing. They raise complete events, then we get into the For loop, and I get a 0 printed - then the next 5 events from DoEvents, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
– Greedo
Mar 29 at 9:16












Try commenting out the content of PrintState and changing maxThreads to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
– Greedo
Mar 29 at 9:20




Try commenting out the content of PrintState and changing maxThreads to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
– Greedo
Mar 29 at 9:20












@Greedo I have addressed your questions and edited my answer accordingly.
– this
Mar 29 at 10:49





@Greedo I have addressed your questions and edited my answer accordingly.
– this
Mar 29 at 10:49





1




1




This is a great answer
– Raystafarian
Mar 30 at 11:36




This is a great answer
– Raystafarian
Mar 30 at 11:36












up vote
6
down vote



+25










This is way above my expertise, but maybe adding an answer would cause more views/answers? Also, what's that beginner tag doing there? ;)



I want to say first off, really solid work. That's probably why there hasn't been too much activity here. Maybe some of this will seem like nit-picks and if so, Sorry!




ByRef or ByVal arguments



Every one of the arguments, as you probably know, that aren't declared ByVal are implicitly ByRef. I imagine you probably need a lot of these to be ByRef, but it's better to explicitly declare them ByRef so it's easier to tell that it's supposed to be ByRef.



obj parameter



clsHandleEvents.Complete(obj as clsThreadHandle)
clsHandleEvents.NotifyComplete(obj as clsThreadHandle)
clsMultiThread.threadEvents_Complete(obj as clsThreadHandle)
multiThreadMethods.isIterable(obj as Variant)


The first three could probably use a new name given they are all the same type. Maybe threadObject or threadHandle - up to you.



The fourth one is taking a variant and returning a boolean, it could be named anything from testObject to iteratorGroup. You might want to try renaming some of those



Queue



clsMultiThread



Private iterableQueue As mscorlib.Queue 
Set iterableQueue = New Queue


multiThreadMethods



Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue)


Maybe I'm dense, being a VBA guy, but how would I know what a queue is and what methods it uses? It's not a standard reference library for VBA, is it? I'd probably add a comment in there explaining why you've chosen to do it this way, so nobody would have to go back through all of it to figure out why it's better than some other way.



RHS?



clsHtmlWorker



Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property


What is RHS? Is it a constant, it's in all caps. I can tell you know better using Html in your project rather than HTML - so what is it?




All that being said, I can't even really figure out how this is working when I step through it, I see where requests should be sent (httpRequest), but I can't figure out how they come back and populate the sheet, I don't see it happening, which I think is the point, being asynchronous?






share|improve this answer























  • Thanks alot for this - some good points here I hadn't considered. I put the Beginner tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! Those RHSs were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so much debug.print, as you can't step through the operational code. Good point on Queue. Fingers crossed I can get some more good answers like this on some of the other bits of the code
    – Greedo
    Feb 15 at 19:39











  • I like the debugging = true part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
    – Raystafarian
    Feb 15 at 21:28










  • RHS = Right Hand Side?
    – Solomon Ucko
    Mar 24 at 1:56














up vote
6
down vote



+25










This is way above my expertise, but maybe adding an answer would cause more views/answers? Also, what's that beginner tag doing there? ;)



I want to say first off, really solid work. That's probably why there hasn't been too much activity here. Maybe some of this will seem like nit-picks and if so, Sorry!




ByRef or ByVal arguments



Every one of the arguments, as you probably know, that aren't declared ByVal are implicitly ByRef. I imagine you probably need a lot of these to be ByRef, but it's better to explicitly declare them ByRef so it's easier to tell that it's supposed to be ByRef.



obj parameter



clsHandleEvents.Complete(obj as clsThreadHandle)
clsHandleEvents.NotifyComplete(obj as clsThreadHandle)
clsMultiThread.threadEvents_Complete(obj as clsThreadHandle)
multiThreadMethods.isIterable(obj as Variant)


The first three could probably use a new name given they are all the same type. Maybe threadObject or threadHandle - up to you.



The fourth one is taking a variant and returning a boolean, it could be named anything from testObject to iteratorGroup. You might want to try renaming some of those



Queue



clsMultiThread



Private iterableQueue As mscorlib.Queue 
Set iterableQueue = New Queue


multiThreadMethods



Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue)


Maybe I'm dense, being a VBA guy, but how would I know what a queue is and what methods it uses? It's not a standard reference library for VBA, is it? I'd probably add a comment in there explaining why you've chosen to do it this way, so nobody would have to go back through all of it to figure out why it's better than some other way.



RHS?



clsHtmlWorker



Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property


What is RHS? Is it a constant, it's in all caps. I can tell you know better using Html in your project rather than HTML - so what is it?




All that being said, I can't even really figure out how this is working when I step through it, I see where requests should be sent (httpRequest), but I can't figure out how they come back and populate the sheet, I don't see it happening, which I think is the point, being asynchronous?






share|improve this answer























  • Thanks alot for this - some good points here I hadn't considered. I put the Beginner tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! Those RHSs were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so much debug.print, as you can't step through the operational code. Good point on Queue. Fingers crossed I can get some more good answers like this on some of the other bits of the code
    – Greedo
    Feb 15 at 19:39











  • I like the debugging = true part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
    – Raystafarian
    Feb 15 at 21:28










  • RHS = Right Hand Side?
    – Solomon Ucko
    Mar 24 at 1:56












up vote
6
down vote



+25







up vote
6
down vote



+25




+25




This is way above my expertise, but maybe adding an answer would cause more views/answers? Also, what's that beginner tag doing there? ;)



I want to say first off, really solid work. That's probably why there hasn't been too much activity here. Maybe some of this will seem like nit-picks and if so, Sorry!




ByRef or ByVal arguments



Every one of the arguments, as you probably know, that aren't declared ByVal are implicitly ByRef. I imagine you probably need a lot of these to be ByRef, but it's better to explicitly declare them ByRef so it's easier to tell that it's supposed to be ByRef.



obj parameter



clsHandleEvents.Complete(obj as clsThreadHandle)
clsHandleEvents.NotifyComplete(obj as clsThreadHandle)
clsMultiThread.threadEvents_Complete(obj as clsThreadHandle)
multiThreadMethods.isIterable(obj as Variant)


The first three could probably use a new name given they are all the same type. Maybe threadObject or threadHandle - up to you.



The fourth one is taking a variant and returning a boolean, it could be named anything from testObject to iteratorGroup. You might want to try renaming some of those



Queue



clsMultiThread



Private iterableQueue As mscorlib.Queue 
Set iterableQueue = New Queue


multiThreadMethods



Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue)


Maybe I'm dense, being a VBA guy, but how would I know what a queue is and what methods it uses? It's not a standard reference library for VBA, is it? I'd probably add a comment in there explaining why you've chosen to do it this way, so nobody would have to go back through all of it to figure out why it's better than some other way.



RHS?



clsHtmlWorker



Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property


What is RHS? Is it a constant, it's in all caps. I can tell you know better using Html in your project rather than HTML - so what is it?




All that being said, I can't even really figure out how this is working when I step through it, I see where requests should be sent (httpRequest), but I can't figure out how they come back and populate the sheet, I don't see it happening, which I think is the point, being asynchronous?






share|improve this answer















This is way above my expertise, but maybe adding an answer would cause more views/answers? Also, what's that beginner tag doing there? ;)



I want to say first off, really solid work. That's probably why there hasn't been too much activity here. Maybe some of this will seem like nit-picks and if so, Sorry!




ByRef or ByVal arguments



Every one of the arguments, as you probably know, that aren't declared ByVal are implicitly ByRef. I imagine you probably need a lot of these to be ByRef, but it's better to explicitly declare them ByRef so it's easier to tell that it's supposed to be ByRef.



obj parameter



clsHandleEvents.Complete(obj as clsThreadHandle)
clsHandleEvents.NotifyComplete(obj as clsThreadHandle)
clsMultiThread.threadEvents_Complete(obj as clsThreadHandle)
multiThreadMethods.isIterable(obj as Variant)


The first three could probably use a new name given they are all the same type. Maybe threadObject or threadHandle - up to you.



The fourth one is taking a variant and returning a boolean, it could be named anything from testObject to iteratorGroup. You might want to try renaming some of those



Queue



clsMultiThread



Private iterableQueue As mscorlib.Queue 
Set iterableQueue = New Queue


multiThreadMethods



Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue)


Maybe I'm dense, being a VBA guy, but how would I know what a queue is and what methods it uses? It's not a standard reference library for VBA, is it? I'd probably add a comment in there explaining why you've chosen to do it this way, so nobody would have to go back through all of it to figure out why it's better than some other way.



RHS?



clsHtmlWorker



Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property


What is RHS? Is it a constant, it's in all caps. I can tell you know better using Html in your project rather than HTML - so what is it?




All that being said, I can't even really figure out how this is working when I step through it, I see where requests should be sent (httpRequest), but I can't figure out how they come back and populate the sheet, I don't see it happening, which I think is the point, being asynchronous?







share|improve this answer















share|improve this answer



share|improve this answer








edited Feb 16 at 22:02









Gerrit0

2,6701518




2,6701518











answered Feb 15 at 5:12









Raystafarian

5,4981046




5,4981046











  • Thanks alot for this - some good points here I hadn't considered. I put the Beginner tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! Those RHSs were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so much debug.print, as you can't step through the operational code. Good point on Queue. Fingers crossed I can get some more good answers like this on some of the other bits of the code
    – Greedo
    Feb 15 at 19:39











  • I like the debugging = true part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
    – Raystafarian
    Feb 15 at 21:28










  • RHS = Right Hand Side?
    – Solomon Ucko
    Mar 24 at 1:56
















  • Thanks alot for this - some good points here I hadn't considered. I put the Beginner tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! Those RHSs were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so much debug.print, as you can't step through the operational code. Good point on Queue. Fingers crossed I can get some more good answers like this on some of the other bits of the code
    – Greedo
    Feb 15 at 19:39











  • I like the debugging = true part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
    – Raystafarian
    Feb 15 at 21:28










  • RHS = Right Hand Side?
    – Solomon Ucko
    Mar 24 at 1:56















Thanks alot for this - some good points here I hadn't considered. I put the Beginner tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! Those RHSs were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so much debug.print, as you can't step through the operational code. Good point on Queue. Fingers crossed I can get some more good answers like this on some of the other bits of the code
– Greedo
Feb 15 at 19:39





Thanks alot for this - some good points here I hadn't considered. I put the Beginner tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! Those RHSs were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so much debug.print, as you can't step through the operational code. Good point on Queue. Fingers crossed I can get some more good answers like this on some of the other bits of the code
– Greedo
Feb 15 at 19:39













I like the debugging = true part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
– Raystafarian
Feb 15 at 21:28




I like the debugging = true part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
– Raystafarian
Feb 15 at 21:28












RHS = Right Hand Side?
– Solomon Ucko
Mar 24 at 1:56




RHS = Right Hand Side?
– Solomon Ucko
Mar 24 at 1:56












 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f185212%2fa-new-approach-to-multithreading-in-excel%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Chat program with C++ and SFML

Function to Return a JSON Like Objects Using VBA Collections and Arrays

Will my employers contract hold up in court?