A new approach to multithreading in Excel
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
31
down vote
favorite
Intro
Multithreading tools exist in Excel - often to run macros in multiple instances of Excel, or to convert macros to vbscripts that can run independently. However I've often come across projects where I'd like to delegate multiple tasks out to async processes, and creating multiple instances of Excel itself is overkill for this.
After running into several projects where I needed to execute multiple async requests (think internet scraping, or radioactive decay simulation) I decided to make a general class that, when given an async process, can execute and handle several in parallel.
For example, this Daisy Test
makes a multithread group which sends html requests to all the urls in column B
. The first link of these google searches are returned to column C
in the order the responses arrive. That triggers a second group (daisy chained onto the 1st one's events) to send internet explorer request to that url, and these are returned in D
To clarify some of the comments; it should be noted that these requests are sent off in order (B1
,B2
,...), but return unordered (C2
first). That's because my class allows the threads to run in parallel (hence multithreading
). These are still only managed in a single Excel thread, but the requests are asynchronous and in different processes so are effectively running in other threads.
Summary
N.B. The term 'thread' will be used loosely here, with no reference to the actual processor. Instead, when I say 'thread' I am talking about a handler for a task (which is running in parallel to other tasks on other handlers)
The multi thread setup consists of a main clsMultiThread
parent class which controls the shape of the multi thread collection (i.e. how many tasks are running at any given time), as well as several clsThreadHandle
classes.
These thread handlers are each responsible for running an async task, and informing the parent multithread class when each task is finished.
Internally, the tasks are actually run by WorkerClass
objects, one worker for each thread. These receive input arguments from their parent threads, run their respective async task, and raise an event to their parent clsThreadHandle
when finished. The thread handle then passes this event, and any optional return value, back up to the main clsMultiThread
, whose job it is either to close the thread once it's done, or prompt the thread to run another task. The chain of command is summarised in the image below:
Feedback I'm after
- General feedback on structure, event handling, use of interfaces etc.
- Proper exiting (which I don't think I'm doing right now)
- User-side interface
- Whether this approach to the problem is appropriate and intuitive (passing worker classes etc.)
- Whether I'm missing some functionality that should be there
This is also the first project I've ever done with the primary aim of making something I can re-use (as well as the longest & most complicated bit of code I've written). For that reason I'd also greatly appreciate any comments on
- Coding style
- Use of comments
- Anything else I should bear in mind when working on such projects
Implementation
Main class clsMultiThread
Right, some code. Here's the main class which handles all the sub-classes
Option Explicit
'''
'VBA class to run multiple asynchronous processes
'Interfaces directly with clsThreadHandle
'Requires references to:
'mscrolib.dll
'''
'THREAD GROUP SHAPE PROPERTIES
Private threadGroup As New Collection 'holds all the treads
Private maxThreads As Long 'maximum number of threads that can be open
Private minThreads As Long '[minimum number of threads]
Private iterableQueue As mscorlib.Queue 'this item holds all the items from iterator set in queue
'replaces iterableGroup, newtaskindex, taskidset
Private iterableSize As Long 'number of items in iterable group or
Private passesArguments As Boolean 'true if iterableGroup exists
'THREAD GROUP REFERENCES
Private WithEvents threadEvents As clsHandleEvents 'Event object to raise events from each thread handle
Private workerClass As IWorker
'THREAD GROUP SETTINGS
Private autoQuitEnabled As Boolean 'dictates whether to quit on Complete event, should be false if daisychaining
'THREAD GROUP STATE PROPERTIES
Private openThreadCount As Long 'number of threads/handles currently open
Private openTaskCount As Long 'number of tasks running on those threads
Private closedTaskCount As Long 'number of threads closed (failed and successful)
Private successfulTaskCount As Long 'number of threads completed sucessfully
Private newThreadIndex As Long 'Iterator over handles (next new handle)
Private newTaskIndex As Long 'Iterator over open tasks (next thread to be started)
Private taskIDset As Collection 'Dictionary mapping taskIDs to iterableGroup location "REPLACE THIS. MERGE COLLECTION JUMBLES"
Private freeThreads As Collection 'holds all the free thread ids
'THREAD GROUP PERFORMANCE PROPERTIES
Private startTime As Date
'Private endTime As Date
'THREAD GROUP EVENTS
Public Event TaskComplete(returnVal As Variant, taskID As String, threadID As String) 'when a task is complete on a thread, maybe if failed
Public Event ThreadOpened(threadCount As Long, threadID As String) 'when a thread is opened, pass the new number of threads
Public Event ThreadClosed(threadCount As Long, threadID As String) 'when closed, pass closed thread ID
Public Event Complete(timeTaken As Date) 'when everything is (nearly) finished
Public Event Closed(timeTaken As Date) 'when entire group is closed
Public Event Opened(startTime As Date) 'when entire group is closed
'PRIVATE TYPES/ENUMS
Private Type Instruction 'instruction on what to do next, and any necessary arguments that can be passed
threadID As String
instructionBody As InstructionType
End Type
Private Enum InstructionType
mltCloseThread
mltOpenThread
mltSetTask
mltDoNothing
mltQuit
End Enum
Private Sub Class_Initialize()
'Set defaults
maxThreads = 5
minThreads = 1
newThreadIndex = 1
newTaskIndex = 1
autoQuitEnabled = True
Set threadEvents = New clsHandleEvents
Set taskIDset = New Collection
Set freeThreads = New Collection
startTime = Now
RaiseEvent Opened(startTime)
'''
'Test space
'''
End Sub
Private Sub threadEvents_Closed(threadID As String)
RaiseEvent ThreadClosed(openThreadCount, threadID)
End Sub
Private Sub threadEvents_Opened(threadID As String)
RaiseEvent ThreadOpened(openThreadCount, threadID)
End Sub
Private Sub threadEvents_Complete(obj As clsThreadHandle, returnVal As Variant) 'called when thread becomes free
'DO NOT mark as free here
RaiseEvent TaskComplete(returnVal, obj.Task, obj.Name) 'failed as boolean
openTaskCount = openTaskCount - 1
closedTaskCount = closedTaskCount + 1
successfulTaskCount = successfulTaskCount + 1 'could be unsuccessful too though
doInstructions obj.Name 'pass object name so it can be marked free
' If failed Then
' failedTaskCount = failedTaskCount + 1
' Else
' successfulTaskCount = successfulTaskCount + 1
' End If
End Sub
Public Sub Execute()
'check validity of user data, if valid, then execute task
If iterableSize = 0 Then
Err.Raise 5, Description:="You must set size argument to a non-zero value, or a non-empty iterable first"
ElseIf workerClass Is Nothing Then
Err.Raise 5, Description:="You must set the async class argument first"
Else
doInstructions
End If
End Sub
Public Sub Quit()
'Remove any references that would prevent proper closing
'Default automatically called when openThreadCount = 0
RaiseEvent Complete(Now - startTime)
Set threadEvents = Nothing
End Sub
Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID
'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy
BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If
Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select
'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
doInstructions loopcount:=loopcount + 1 'watch for infinite loop
End If
End Sub
Private Function getInstruction() As Instruction
'function to determine what action to take next
'called until do nothing returned
'caller to doinstructions can specify a free thread in which case some parts skipped
Dim results As Instruction 'variable to hold instruction and any arguments
Me.printState
'Do we need to open or close threads?
'Threads free? (threads open > tasks open):
If openThreadCount > openTaskCount Then
'Great we have a free thread, now use it or delete it (cos we have too many or no tasks remaining)
If newTaskIndex > iterableSize Then 'already passed all tasks
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
ElseIf openThreadCount <= maxThreads Then
'[find] & use free thread (run a task on it)
results.instructionBody = mltSetTask
results.threadID = freeThread
Else
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
End If
Else
'No threads free, either open one (if not exceeding max, and there's a task left to put on it)
'Or do nothing (can't close it if not free, shouldn't open new if no more tasks)
If openThreadCount < maxThreads And newTaskIndex <= iterableSize Then
results.instructionBody = mltOpenThread
ElseIf openThreadCount = 0 And autoQuitEnabled Then
results.instructionBody = mltQuit
Else
results.instructionBody = mltDoNothing
End If
End If
getInstruction = results
End Function
Private Sub openThread()
'opens a thread and assigns a task ID to it
Dim newThread As New clsThreadHandle 'create new handle
newThread.OpenHandle Me, threadEvents 'passes parent reference which allows handle to obtain thread ID
threadGroup.Add newThread, newThread.Name 'add it to the group with a new id (set by itself)
openThreadCount = openThreadCount + 1
freeThread = newThread.Name 'mark as free so task can be assigned to it
End Sub
Private Property Let freeThread(threadID As String)
'NOT WORKING"""""
'when a thread comes free, add it to the collection
freeThreads.Add threadID, threadID
Debug.Print threadID; " marked as free; now"; freeThreads.Count; "threads are free"
End Property
Private Property Let BusyThread(threadID As String)
'when a thread is not free or is closed, mark as busy by removing from free group
On Error Resume Next 'only remove ones what are there actually
freeThreads.Remove threadID
Debug.Print threadID; " marked as busy"; IIf(Err.Number <> 0, ", but wasn't in free group", vbNullString)
End Property
Private Property Get freeThread() As String
'gives up a free thread and adds it to the list
freeThread = freeThreads(1)
freeThreads.Remove (1)
End Property
Private Sub assignTaskID(threadID As String)
'@Ignore WriteOnlyProperty
'assigns task ID to thread
'nb does NOT actually run the task (this is instruction stage still)
Dim newThread As clsThreadHandle
Set newThread = threadGroup(threadID)
newThread.Task = NewTaskID
Set newThread.Worker = AsyncClass
End Sub
Private Sub closeThread(threadID As String, Optional failed As Boolean = False)
'close thread with appropriate id
Dim oldThread As clsThreadHandle
Set oldThread = threadGroup(threadID)
'remove from all collections
'taskIDset.Remove oldThread.Task remove from task id set if it was in there
threadGroup.Remove oldThread.Name
BusyThread = oldThread.Name 'remove from free collection
Set oldThread = Nothing
'iterate counters
openThreadCount = openThreadCount - 1
End Sub
Public Property Let Size(sizeFactor As Variant)
'property of the thread group which dictates how many processes to run in total
'size factor is either an iterable item, or an integer to dictate the size
'Check if size factor is number
If IsNumeric(sizeFactor) Then
'If so, size is that
iterableSize = CLng(sizeFactor)
passesArguments = False 'no argument to pass to thread, just run it a load of times
'If not, *check if iterable
ElseIf isIterable(sizeFactor) Then
'If so, size is size of collection from extration
Set iterableQueue = New Queue
iterableSize = addIterableToQueue(sizeFactor, iterableQueue)
passesArguments = True
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
End Property
Public Sub IncreaseSize(sizeFactor As Variant)
'method of threadGroup which adds more tasks to the queue, and immediately runs them
'size factor is either an iterable item, or an integer to dictate the size
'Check whether size is set yet
If Me.Size = 0 Then
Err.Raise 5, Description:="You must set Size before you can IncreaseSize"
End If
'check whether new data matches old type
If IsNumeric(sizeFactor) Then
If passesArguments Then
Err.Raise 5, Description:="Size factor type doesn't match original type"
Else
'is numeric and was numeric, grand
iterableSize = iterableSize + CLng(sizeFactor)
End If
ElseIf isIterable(sizeFactor) Then
If passesArguments Then
'was iterable and still is, great!
Dim itemsAdded As Long
itemsAdded = addIterableToQueue(sizeFactor, iterableQueue)
iterableSize = iterableSize + itemsAdded
Else
'wasn't iterble, now is
Err.Raise 5, Description:="Size factor type doesn't match original type"
End If
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
Me.Execute
End Sub
Public Property Set AsyncClass(ByVal workObj As IWorker) 'Set the worker who carries out the tasks
Set workerClass = workObj
End Property
Public Property Get AsyncClass() As IWorker
Set AsyncClass = workerClass
End Property
Public Property Get Size() As Variant
Size = iterableSize
End Property
Public Property Let autoQuit(ByVal value As Boolean)
autoQuitEnabled = value
End Property
Public Property Get NewHandleID() As String
NewHandleID = "Handle " & newThreadIndex
newThreadIndex = newThreadIndex + 1 'use next one next time
End Property
Private Property Get NewTaskID() As String
'generates new task, saves its ID to taskIDset, then bumps the task counter along one
NewTaskID = "Task " & newTaskIndex
taskIDset.Add newTaskIndex, NewTaskID 'add id to map
newTaskIndex = newTaskIndex + 1
End Property
Private Sub Class_Terminate()
'Set threadGroup = Nothing
Debug.Print "Terminating group"
RaiseEvent Closed(Now - startTime)
End Sub
Public Sub printState() 'for debugging
Debug.Print _
"State:"; vbCrLf _
; Space(5); "Threads open: "; openThreadCount; vbCrLf _
; Space(5); "Threads in use: "; openTaskCount; vbCrLf _
; Space(5); "Threads marked as free: "; freeThreads.Count; vbCrLf _
; Space(5); "Tasks remaining: "; iterableSize - successfulTaskCount; vbCrLf _
; Space(5); "Next task index: "; newTaskIndex
End Sub
Its key methods are doInstruction
(calling getInstruction
) and Size
and IncreaseSize
The class runs iteratively; each cycle the class finds out what to do and executes that (doInstruction
). doInstruction always calls itself unless it's told to do nothing, which allows the call stack to shrink back. There are several options for what to do each cycle
- Open a thread (create a new instance of
clsThreadHandle
and add to a collection of possible places to run tasks) - Close a thread (quit a handle and remove it from that collection)
- Run a task on a thread
- [Force quit a task - t.b. implemented]
- Do Nothing (allow call stack to go back to zero)
The getInstruction
method will tell the class to
- Open a thread if it doesn't exceed a max count, and if there are tasks to run on it
- Close a thread if there are no tasks left to run or if there are too many
- Run a task on a thread if there's a thread marked free
- Do nothing if there are no threads free, and there are the right number of threads open
Size
is what dictates the number of tasks to carry out
- If
Size
is numeric, the class will keep running tasks on threads until that number of tasks is run - If
Size
is iterable, then the class will keep running tasks and passing arguments by essentiallyFor...Each
ing through the iterable argument- This allows something like a url to be passed as an argument to each task, or even a range so that the worker knows where on the sheet to write its result to
IncreaseSize
is like Size
; it is useful if you want to drip feed tasks into the multithread set (say you are daisy chaining one onto the other using the first one's threadComplete
events). It increases the size of the numeric/iterable argument.
Thread Handles clsThreadHandle
The main class creates multiple instances of this thread handle class.
Option Explicit
'THREAD HANDLE BASE PROPERTIES
Private eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
Private taskID As String 'holds the id of the current task
Private handleID As String 'holds the id of this handle
Private handleArgs As Variant 'holds any arguments that need to be passed to the task
'THREAD EVENTS
Private WithEvents workerEvents As IWorkerEvents
Private workerObject As IWorker 'interface to whatever worker may be passed to thread
Private Sub workerEvents_Complete(returnVal As Variant)
eventHandle.NotifyComplete Me, returnVal
End Sub
Private Sub workerEvents_Started()
Debug.Print Me.Task; " started event was raised"
End Sub
Public Property Set Worker(ByVal workObj As IWorker)
Set workerObject = workObj.CreateNew 'set worker to be a copy of the passed one
Set workerEvents = New IWorkerEvents 'create event handler
Set workerObject.Events = workerEvents 'pass it to the worker so it can listen in
End Property
Public Sub OpenHandle(multiThreadGroup As clsMultiThread, delegate As clsHandleEvents)
'called when the handle is opened, sets the reference IDs of the string and the handle, as well as parent g
Set eventHandle = delegate
handleID = multiThreadGroup.NewHandleID
eventHandle.NotifyThreadOpened (Name)
Debug.Print Name; " was opened"
End Sub
Public Sub Execute(Optional args As Variant)
Debug.Print Task; " executed on "; Name; " with "; IIf(IsMissing(args), "no arguments", "some arguments")
workerObject.Execute args 'run the event
End Sub
Public Property Get Task() As String
Task = taskID
End Property
Public Property Let Task(val As String)
taskID = val
Debug.Print Name; "'s task was set to "; taskID
End Property
Public Property Get Name() As String
Name = handleID
End Property
Private Sub Class_Initialize()
Debug.Print "I'm made"
End Sub
Private Sub Class_Terminate()
eventHandle.NotifyThreadClosed (Me.Name)
Set eventHandle = Nothing
Set workerObject = Nothing
End Sub
Private Sub workerEvents_StatusChange(statusVal As Variant)
'not yet implemented, probably unnecessary
End Sub
I've chosen individual event handlers rather than a single one (like I did with clsHandleEvents
) because
- I find having an individual thread class for each task/worker object easier to picture mentally
- I intend to add a functionality where a worker can cache objects in its parent handle (such as an InternetExplorer application) to save re-initialising it between successive tasks on the same thread
- Having a single cache for each thread makes this simpler
Handle Events class clsHandleEvents
A reference to this class is held by each thread so that it can raise an event to the multiThread class, without directly holding a reference to it (this would mess up garbage collection I think)
Option Explicit
'class to convert calls from the thread handle into events which the multi thread group can tap into
Public Event Complete(obj As clsThreadHandle, returnVal As Variant)
Public Event Opened(threadID As String) 'when thread is actually opened
Public Event Closed(threadID As String) 'when thread is closed
Public Sub NotifyComplete(obj As clsThreadHandle, Optional returnVal As Variant)
RaiseEvent Complete(obj, returnVal)
End Sub
Public Sub NotifyThreadOpened(threadID As String)
RaiseEvent Opened(threadID)
End Sub
Public Sub NotifyThreadClosed(threadID As String)
RaiseEvent Closed(threadID)
End Sub
Private Sub Class_Terminate()
Debug.Print "Events Terminated"
End Sub
Interfaces
There are 2 interface
classes (well onlyIWorker
is really one, but I'm calling IWorkerEvents
one too, similar to this example)
IWorker
forms a general template of an asynchronous process you can run, which raises appropriate events as per IWorkerEvents
IWorker
Option Explicit
'class acts as interface for any thread task
'Execute runs the task
'Events are raised by the task if it interfaces properly
Public Property Set Events(ByRef value As IWorkerEvents)
End Property
Public Sub Execute(Optional argument As Variant)
End Sub
Public Function CreateNew() As IWorker
End Function
IWorkerEvents
Option Explicit
'class holds all the events that a thread task can raise
Public Event Complete(returnVal As Variant)
Public Event StatusChange(statusVal As Variant)
Public Event Started()
Public Sub Complete(Optional returnVal As Variant)
RaiseEvent Complete(returnVal)
End Sub
Public Sub StatusChange(statusVal As Variant)
RaiseEvent StatusChange(statusVal)
End Sub
Public Sub Started()
RaiseEvent Started
End Sub
Finally...
There's a module of supplementary functions that I don't especially need review on, but I'll include as they are required for the clsMultiThread
to execute
Option Explicit
Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue) As Long
'function to take iterable group and add it to the queue
'returns the number of items added
Dim item As Variant
Dim itemsAdded As Long
itemsAdded = 0
For Each item In iterator
resultQueue.enqueue item
itemsAdded = itemsAdded + 1
Next item
addIterableToQueue = itemsAdded
End Function
Function isIterable(obj As Variant) As Boolean
On Error Resume Next
Dim iterator As Variant
For Each iterator In obj
Exit For
Next
isIterable = Err.Number = 0
End Function
Test code
Don't need feedback on this stuff, except with regard to the way in which a worker
is implemented Download example file here
It just occurred to me that I hadn't actually included a worker to test this with. Well here's an example that uses an MSHTML
request to return an HTML document from a webpage. It takes a String
/Range
argument representing a url, and returns an HTMLDocument
. NB, this has to be imported
as it requires Attribute .VB_UserMemId = 0
as per this article
VERSION 1.0 CLASS
BEGIN
MultiUse = -1 'True
END
Attribute VB_Name = "clsHtmlWorker"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit
'''
'Basic worker object sends MSHTML GET request to webpage and returns an HTMLDocument or Nothing
'Requires reference to
' Microsoft HTML Object library (mshtml.tlb)
' Microsoft XML, v6.0 (msxml6.dll)
'''
Private httpRequest As MSXML2.XMLHTTP60
Implements IWorker
Private Type TWorker
Events As IWorkerEvents
End Type
Private this As TWorker
Private Function IWorker_CreateNew() As IWorker
Set IWorker_CreateNew = New clsHtmlWorker
End Function
Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property
Private Sub IWorker_Execute(Optional argument As Variant)
Started 'raise event to thread handle
'Do some task
sendRequest argument
End Sub
'''
'Event raising
'''
Private Sub Started()
If Not this.Events Is Nothing Then
this.Events.Started
End If
End Sub
Private Sub statusChange(ByVal statusText As String)
If Not this.Events Is Nothing Then
'status change is not fully implemented yet in clsMultiThread, I may get rid of it
this.Events.statusChange statusText
End If
End Sub
Private Sub Complete(Optional ByVal resultPage As HTMLDocument)
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
If Not this.Events Is Nothing Then
this.Events.Complete resultPage
End If
End Sub
Private Sub sendRequest(ByVal url As String)
'''
'Sub to open a new XMLHTTP request at a given url
'Also assigns OnReadyStateChange callback function to this class' default routine
'''
If httpRequest Is Nothing Then Set httpRequest = New MSXML2.XMLHTTP60
With httpRequest
'Assign callback function to handler class (by default property)
.OnReadyStateChange = Me
'open and send the request
.Open "GET", url, True
.send vbNullString
End With
End Sub
Public Sub OnReadyStateChange()
Attribute OnReadyStateChange.VB_UserMemId = 0
'''
'This is the default callback routine of the class
'''
With httpRequest
statusChange .statusText
If .ReadyState = 4 Then 'loaded
If .Status = 200 Then 'successful
'mark complete and pass document
Dim htmlDoc As HTMLDocument
Set htmlDoc = New HTMLDocument
htmlDoc.body.innerHTML = .responseText
Complete htmlDoc
Else 'unsuccessful
Complete
End If
End If
End With
End Sub
Private Sub Class_Terminate()
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
End Sub
A multi-thread group implementing it can be run in a caller class like codeReviewTest
. Sends requests to urls in A1:A10
, returns e-mails from those urls in neighbouring columns.
Option Explicit
'''
'This class creates and runs a new multithread instance which runs clsHtmlWorker
'When each HTMLDocument is complete, the class scans it for e-mails
'''
Private WithEvents multiThreadGroup As clsMultiThread
'clsMultiThread is async so must be declared separately (or in a doEvents loop)
Private Const REGEX_PATTERN As String = _
"(?:[a-z0-9!#$%&'*+/=?^_`~-]+(?:.[a-z0-9!#$%&'*+/=?^_`~-]+)*|""(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|\[x01-x09x0bx0cx0e-x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[(?:(?:(2(5[0-5]|[0-4])|1[0-9]|[0-9]?[0-9])).)3(?:(2(5[1-9]|[0-9])|1[0-5]|[0-4]?[0-9])|[0-9]*[0-9]:(?:[1-9]|\[0-9])+)])"
Public Sub run()
'urls to check for emails are in a1:a10
htmlRequestToUrls [a1:a10]
End Sub
Private Sub htmlRequestToUrls(urlCells As Range)
Set multiThreadGroup = New clsMultiThread
With multiThreadGroup
.Size = urlCells 'set iterable, here a load of urls
Set .AsyncClass = New clsHtmlWorker 'set async worker
.Execute 'run the group
End With
End Sub
Private Sub multiThreadGroup_TaskComplete(returnVal As Variant, taskID As String, threadID As String)
Dim rowI As Long, colI As Long
rowI = Right(taskID, Len(taskID) - 4)
If returnVal Is Nothing Then
Cells(rowI, 2) = "Error in loading page"
ElseIf TypeOf returnVal Is HTMLDocument Then
Dim emailMatches() As String
emailMatches = regexMatches(returnVal.body.innerText)
If (Not emailMatches) = -1 Then
'no emails on page
Cells(rowI, 2) = "No e-mail matches"
Else
For colI = LBound(emailMatches) To UBound(emailMatches)
Cells(rowI, colI + 2) = emailMatches(colI)
Next colI
End If
Else 'nothing returned
Cells(rowI, 2) = "Error in loading page"
End If
End Sub
Private Function regexMatches(strInput As String) As String()
Dim rMatch As Object
Dim s As String
Dim arrayMatches() As String
Dim i As Long
With CreateObject("VBScript.Regexp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = REGEX_PATTERN
If .test(strInput) Then
For Each rMatch In .Execute(strInput)
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.value
i = i + 1
Next
End If
End With
regexMatches = arrayMatches
End Function
The test class will create a new multi thread group. The group will open the default 5 threads, on each thread it will create an instance of clsHtmlWorker
. It will convert the range [A1:A10]
into 10 arguments which it will pass, 1 at a time, to the workers on each thread when they aren't busy. Once all tasks are run the class will autoQuit
- cutting references to all sub classes, allowing it to go out of scope
You can download the example workbook if you want, works best with Rubberduck to organise folders. Test code is in the CodeReviewTestRunner
, or just hit the big button
beginner object-oriented multithreading vba asynchronous
 |Â
show 5 more comments
up vote
31
down vote
favorite
Intro
Multithreading tools exist in Excel - often to run macros in multiple instances of Excel, or to convert macros to vbscripts that can run independently. However I've often come across projects where I'd like to delegate multiple tasks out to async processes, and creating multiple instances of Excel itself is overkill for this.
After running into several projects where I needed to execute multiple async requests (think internet scraping, or radioactive decay simulation) I decided to make a general class that, when given an async process, can execute and handle several in parallel.
For example, this Daisy Test
makes a multithread group which sends html requests to all the urls in column B
. The first link of these google searches are returned to column C
in the order the responses arrive. That triggers a second group (daisy chained onto the 1st one's events) to send internet explorer request to that url, and these are returned in D
To clarify some of the comments; it should be noted that these requests are sent off in order (B1
,B2
,...), but return unordered (C2
first). That's because my class allows the threads to run in parallel (hence multithreading
). These are still only managed in a single Excel thread, but the requests are asynchronous and in different processes so are effectively running in other threads.
Summary
N.B. The term 'thread' will be used loosely here, with no reference to the actual processor. Instead, when I say 'thread' I am talking about a handler for a task (which is running in parallel to other tasks on other handlers)
The multi thread setup consists of a main clsMultiThread
parent class which controls the shape of the multi thread collection (i.e. how many tasks are running at any given time), as well as several clsThreadHandle
classes.
These thread handlers are each responsible for running an async task, and informing the parent multithread class when each task is finished.
Internally, the tasks are actually run by WorkerClass
objects, one worker for each thread. These receive input arguments from their parent threads, run their respective async task, and raise an event to their parent clsThreadHandle
when finished. The thread handle then passes this event, and any optional return value, back up to the main clsMultiThread
, whose job it is either to close the thread once it's done, or prompt the thread to run another task. The chain of command is summarised in the image below:
Feedback I'm after
- General feedback on structure, event handling, use of interfaces etc.
- Proper exiting (which I don't think I'm doing right now)
- User-side interface
- Whether this approach to the problem is appropriate and intuitive (passing worker classes etc.)
- Whether I'm missing some functionality that should be there
This is also the first project I've ever done with the primary aim of making something I can re-use (as well as the longest & most complicated bit of code I've written). For that reason I'd also greatly appreciate any comments on
- Coding style
- Use of comments
- Anything else I should bear in mind when working on such projects
Implementation
Main class clsMultiThread
Right, some code. Here's the main class which handles all the sub-classes
Option Explicit
'''
'VBA class to run multiple asynchronous processes
'Interfaces directly with clsThreadHandle
'Requires references to:
'mscrolib.dll
'''
'THREAD GROUP SHAPE PROPERTIES
Private threadGroup As New Collection 'holds all the treads
Private maxThreads As Long 'maximum number of threads that can be open
Private minThreads As Long '[minimum number of threads]
Private iterableQueue As mscorlib.Queue 'this item holds all the items from iterator set in queue
'replaces iterableGroup, newtaskindex, taskidset
Private iterableSize As Long 'number of items in iterable group or
Private passesArguments As Boolean 'true if iterableGroup exists
'THREAD GROUP REFERENCES
Private WithEvents threadEvents As clsHandleEvents 'Event object to raise events from each thread handle
Private workerClass As IWorker
'THREAD GROUP SETTINGS
Private autoQuitEnabled As Boolean 'dictates whether to quit on Complete event, should be false if daisychaining
'THREAD GROUP STATE PROPERTIES
Private openThreadCount As Long 'number of threads/handles currently open
Private openTaskCount As Long 'number of tasks running on those threads
Private closedTaskCount As Long 'number of threads closed (failed and successful)
Private successfulTaskCount As Long 'number of threads completed sucessfully
Private newThreadIndex As Long 'Iterator over handles (next new handle)
Private newTaskIndex As Long 'Iterator over open tasks (next thread to be started)
Private taskIDset As Collection 'Dictionary mapping taskIDs to iterableGroup location "REPLACE THIS. MERGE COLLECTION JUMBLES"
Private freeThreads As Collection 'holds all the free thread ids
'THREAD GROUP PERFORMANCE PROPERTIES
Private startTime As Date
'Private endTime As Date
'THREAD GROUP EVENTS
Public Event TaskComplete(returnVal As Variant, taskID As String, threadID As String) 'when a task is complete on a thread, maybe if failed
Public Event ThreadOpened(threadCount As Long, threadID As String) 'when a thread is opened, pass the new number of threads
Public Event ThreadClosed(threadCount As Long, threadID As String) 'when closed, pass closed thread ID
Public Event Complete(timeTaken As Date) 'when everything is (nearly) finished
Public Event Closed(timeTaken As Date) 'when entire group is closed
Public Event Opened(startTime As Date) 'when entire group is closed
'PRIVATE TYPES/ENUMS
Private Type Instruction 'instruction on what to do next, and any necessary arguments that can be passed
threadID As String
instructionBody As InstructionType
End Type
Private Enum InstructionType
mltCloseThread
mltOpenThread
mltSetTask
mltDoNothing
mltQuit
End Enum
Private Sub Class_Initialize()
'Set defaults
maxThreads = 5
minThreads = 1
newThreadIndex = 1
newTaskIndex = 1
autoQuitEnabled = True
Set threadEvents = New clsHandleEvents
Set taskIDset = New Collection
Set freeThreads = New Collection
startTime = Now
RaiseEvent Opened(startTime)
'''
'Test space
'''
End Sub
Private Sub threadEvents_Closed(threadID As String)
RaiseEvent ThreadClosed(openThreadCount, threadID)
End Sub
Private Sub threadEvents_Opened(threadID As String)
RaiseEvent ThreadOpened(openThreadCount, threadID)
End Sub
Private Sub threadEvents_Complete(obj As clsThreadHandle, returnVal As Variant) 'called when thread becomes free
'DO NOT mark as free here
RaiseEvent TaskComplete(returnVal, obj.Task, obj.Name) 'failed as boolean
openTaskCount = openTaskCount - 1
closedTaskCount = closedTaskCount + 1
successfulTaskCount = successfulTaskCount + 1 'could be unsuccessful too though
doInstructions obj.Name 'pass object name so it can be marked free
' If failed Then
' failedTaskCount = failedTaskCount + 1
' Else
' successfulTaskCount = successfulTaskCount + 1
' End If
End Sub
Public Sub Execute()
'check validity of user data, if valid, then execute task
If iterableSize = 0 Then
Err.Raise 5, Description:="You must set size argument to a non-zero value, or a non-empty iterable first"
ElseIf workerClass Is Nothing Then
Err.Raise 5, Description:="You must set the async class argument first"
Else
doInstructions
End If
End Sub
Public Sub Quit()
'Remove any references that would prevent proper closing
'Default automatically called when openThreadCount = 0
RaiseEvent Complete(Now - startTime)
Set threadEvents = Nothing
End Sub
Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID
'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy
BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If
Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select
'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
doInstructions loopcount:=loopcount + 1 'watch for infinite loop
End If
End Sub
Private Function getInstruction() As Instruction
'function to determine what action to take next
'called until do nothing returned
'caller to doinstructions can specify a free thread in which case some parts skipped
Dim results As Instruction 'variable to hold instruction and any arguments
Me.printState
'Do we need to open or close threads?
'Threads free? (threads open > tasks open):
If openThreadCount > openTaskCount Then
'Great we have a free thread, now use it or delete it (cos we have too many or no tasks remaining)
If newTaskIndex > iterableSize Then 'already passed all tasks
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
ElseIf openThreadCount <= maxThreads Then
'[find] & use free thread (run a task on it)
results.instructionBody = mltSetTask
results.threadID = freeThread
Else
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
End If
Else
'No threads free, either open one (if not exceeding max, and there's a task left to put on it)
'Or do nothing (can't close it if not free, shouldn't open new if no more tasks)
If openThreadCount < maxThreads And newTaskIndex <= iterableSize Then
results.instructionBody = mltOpenThread
ElseIf openThreadCount = 0 And autoQuitEnabled Then
results.instructionBody = mltQuit
Else
results.instructionBody = mltDoNothing
End If
End If
getInstruction = results
End Function
Private Sub openThread()
'opens a thread and assigns a task ID to it
Dim newThread As New clsThreadHandle 'create new handle
newThread.OpenHandle Me, threadEvents 'passes parent reference which allows handle to obtain thread ID
threadGroup.Add newThread, newThread.Name 'add it to the group with a new id (set by itself)
openThreadCount = openThreadCount + 1
freeThread = newThread.Name 'mark as free so task can be assigned to it
End Sub
Private Property Let freeThread(threadID As String)
'NOT WORKING"""""
'when a thread comes free, add it to the collection
freeThreads.Add threadID, threadID
Debug.Print threadID; " marked as free; now"; freeThreads.Count; "threads are free"
End Property
Private Property Let BusyThread(threadID As String)
'when a thread is not free or is closed, mark as busy by removing from free group
On Error Resume Next 'only remove ones what are there actually
freeThreads.Remove threadID
Debug.Print threadID; " marked as busy"; IIf(Err.Number <> 0, ", but wasn't in free group", vbNullString)
End Property
Private Property Get freeThread() As String
'gives up a free thread and adds it to the list
freeThread = freeThreads(1)
freeThreads.Remove (1)
End Property
Private Sub assignTaskID(threadID As String)
'@Ignore WriteOnlyProperty
'assigns task ID to thread
'nb does NOT actually run the task (this is instruction stage still)
Dim newThread As clsThreadHandle
Set newThread = threadGroup(threadID)
newThread.Task = NewTaskID
Set newThread.Worker = AsyncClass
End Sub
Private Sub closeThread(threadID As String, Optional failed As Boolean = False)
'close thread with appropriate id
Dim oldThread As clsThreadHandle
Set oldThread = threadGroup(threadID)
'remove from all collections
'taskIDset.Remove oldThread.Task remove from task id set if it was in there
threadGroup.Remove oldThread.Name
BusyThread = oldThread.Name 'remove from free collection
Set oldThread = Nothing
'iterate counters
openThreadCount = openThreadCount - 1
End Sub
Public Property Let Size(sizeFactor As Variant)
'property of the thread group which dictates how many processes to run in total
'size factor is either an iterable item, or an integer to dictate the size
'Check if size factor is number
If IsNumeric(sizeFactor) Then
'If so, size is that
iterableSize = CLng(sizeFactor)
passesArguments = False 'no argument to pass to thread, just run it a load of times
'If not, *check if iterable
ElseIf isIterable(sizeFactor) Then
'If so, size is size of collection from extration
Set iterableQueue = New Queue
iterableSize = addIterableToQueue(sizeFactor, iterableQueue)
passesArguments = True
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
End Property
Public Sub IncreaseSize(sizeFactor As Variant)
'method of threadGroup which adds more tasks to the queue, and immediately runs them
'size factor is either an iterable item, or an integer to dictate the size
'Check whether size is set yet
If Me.Size = 0 Then
Err.Raise 5, Description:="You must set Size before you can IncreaseSize"
End If
'check whether new data matches old type
If IsNumeric(sizeFactor) Then
If passesArguments Then
Err.Raise 5, Description:="Size factor type doesn't match original type"
Else
'is numeric and was numeric, grand
iterableSize = iterableSize + CLng(sizeFactor)
End If
ElseIf isIterable(sizeFactor) Then
If passesArguments Then
'was iterable and still is, great!
Dim itemsAdded As Long
itemsAdded = addIterableToQueue(sizeFactor, iterableQueue)
iterableSize = iterableSize + itemsAdded
Else
'wasn't iterble, now is
Err.Raise 5, Description:="Size factor type doesn't match original type"
End If
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
Me.Execute
End Sub
Public Property Set AsyncClass(ByVal workObj As IWorker) 'Set the worker who carries out the tasks
Set workerClass = workObj
End Property
Public Property Get AsyncClass() As IWorker
Set AsyncClass = workerClass
End Property
Public Property Get Size() As Variant
Size = iterableSize
End Property
Public Property Let autoQuit(ByVal value As Boolean)
autoQuitEnabled = value
End Property
Public Property Get NewHandleID() As String
NewHandleID = "Handle " & newThreadIndex
newThreadIndex = newThreadIndex + 1 'use next one next time
End Property
Private Property Get NewTaskID() As String
'generates new task, saves its ID to taskIDset, then bumps the task counter along one
NewTaskID = "Task " & newTaskIndex
taskIDset.Add newTaskIndex, NewTaskID 'add id to map
newTaskIndex = newTaskIndex + 1
End Property
Private Sub Class_Terminate()
'Set threadGroup = Nothing
Debug.Print "Terminating group"
RaiseEvent Closed(Now - startTime)
End Sub
Public Sub printState() 'for debugging
Debug.Print _
"State:"; vbCrLf _
; Space(5); "Threads open: "; openThreadCount; vbCrLf _
; Space(5); "Threads in use: "; openTaskCount; vbCrLf _
; Space(5); "Threads marked as free: "; freeThreads.Count; vbCrLf _
; Space(5); "Tasks remaining: "; iterableSize - successfulTaskCount; vbCrLf _
; Space(5); "Next task index: "; newTaskIndex
End Sub
Its key methods are doInstruction
(calling getInstruction
) and Size
and IncreaseSize
The class runs iteratively; each cycle the class finds out what to do and executes that (doInstruction
). doInstruction always calls itself unless it's told to do nothing, which allows the call stack to shrink back. There are several options for what to do each cycle
- Open a thread (create a new instance of
clsThreadHandle
and add to a collection of possible places to run tasks) - Close a thread (quit a handle and remove it from that collection)
- Run a task on a thread
- [Force quit a task - t.b. implemented]
- Do Nothing (allow call stack to go back to zero)
The getInstruction
method will tell the class to
- Open a thread if it doesn't exceed a max count, and if there are tasks to run on it
- Close a thread if there are no tasks left to run or if there are too many
- Run a task on a thread if there's a thread marked free
- Do nothing if there are no threads free, and there are the right number of threads open
Size
is what dictates the number of tasks to carry out
- If
Size
is numeric, the class will keep running tasks on threads until that number of tasks is run - If
Size
is iterable, then the class will keep running tasks and passing arguments by essentiallyFor...Each
ing through the iterable argument- This allows something like a url to be passed as an argument to each task, or even a range so that the worker knows where on the sheet to write its result to
IncreaseSize
is like Size
; it is useful if you want to drip feed tasks into the multithread set (say you are daisy chaining one onto the other using the first one's threadComplete
events). It increases the size of the numeric/iterable argument.
Thread Handles clsThreadHandle
The main class creates multiple instances of this thread handle class.
Option Explicit
'THREAD HANDLE BASE PROPERTIES
Private eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
Private taskID As String 'holds the id of the current task
Private handleID As String 'holds the id of this handle
Private handleArgs As Variant 'holds any arguments that need to be passed to the task
'THREAD EVENTS
Private WithEvents workerEvents As IWorkerEvents
Private workerObject As IWorker 'interface to whatever worker may be passed to thread
Private Sub workerEvents_Complete(returnVal As Variant)
eventHandle.NotifyComplete Me, returnVal
End Sub
Private Sub workerEvents_Started()
Debug.Print Me.Task; " started event was raised"
End Sub
Public Property Set Worker(ByVal workObj As IWorker)
Set workerObject = workObj.CreateNew 'set worker to be a copy of the passed one
Set workerEvents = New IWorkerEvents 'create event handler
Set workerObject.Events = workerEvents 'pass it to the worker so it can listen in
End Property
Public Sub OpenHandle(multiThreadGroup As clsMultiThread, delegate As clsHandleEvents)
'called when the handle is opened, sets the reference IDs of the string and the handle, as well as parent g
Set eventHandle = delegate
handleID = multiThreadGroup.NewHandleID
eventHandle.NotifyThreadOpened (Name)
Debug.Print Name; " was opened"
End Sub
Public Sub Execute(Optional args As Variant)
Debug.Print Task; " executed on "; Name; " with "; IIf(IsMissing(args), "no arguments", "some arguments")
workerObject.Execute args 'run the event
End Sub
Public Property Get Task() As String
Task = taskID
End Property
Public Property Let Task(val As String)
taskID = val
Debug.Print Name; "'s task was set to "; taskID
End Property
Public Property Get Name() As String
Name = handleID
End Property
Private Sub Class_Initialize()
Debug.Print "I'm made"
End Sub
Private Sub Class_Terminate()
eventHandle.NotifyThreadClosed (Me.Name)
Set eventHandle = Nothing
Set workerObject = Nothing
End Sub
Private Sub workerEvents_StatusChange(statusVal As Variant)
'not yet implemented, probably unnecessary
End Sub
I've chosen individual event handlers rather than a single one (like I did with clsHandleEvents
) because
- I find having an individual thread class for each task/worker object easier to picture mentally
- I intend to add a functionality where a worker can cache objects in its parent handle (such as an InternetExplorer application) to save re-initialising it between successive tasks on the same thread
- Having a single cache for each thread makes this simpler
Handle Events class clsHandleEvents
A reference to this class is held by each thread so that it can raise an event to the multiThread class, without directly holding a reference to it (this would mess up garbage collection I think)
Option Explicit
'class to convert calls from the thread handle into events which the multi thread group can tap into
Public Event Complete(obj As clsThreadHandle, returnVal As Variant)
Public Event Opened(threadID As String) 'when thread is actually opened
Public Event Closed(threadID As String) 'when thread is closed
Public Sub NotifyComplete(obj As clsThreadHandle, Optional returnVal As Variant)
RaiseEvent Complete(obj, returnVal)
End Sub
Public Sub NotifyThreadOpened(threadID As String)
RaiseEvent Opened(threadID)
End Sub
Public Sub NotifyThreadClosed(threadID As String)
RaiseEvent Closed(threadID)
End Sub
Private Sub Class_Terminate()
Debug.Print "Events Terminated"
End Sub
Interfaces
There are 2 interface
classes (well onlyIWorker
is really one, but I'm calling IWorkerEvents
one too, similar to this example)
IWorker
forms a general template of an asynchronous process you can run, which raises appropriate events as per IWorkerEvents
IWorker
Option Explicit
'class acts as interface for any thread task
'Execute runs the task
'Events are raised by the task if it interfaces properly
Public Property Set Events(ByRef value As IWorkerEvents)
End Property
Public Sub Execute(Optional argument As Variant)
End Sub
Public Function CreateNew() As IWorker
End Function
IWorkerEvents
Option Explicit
'class holds all the events that a thread task can raise
Public Event Complete(returnVal As Variant)
Public Event StatusChange(statusVal As Variant)
Public Event Started()
Public Sub Complete(Optional returnVal As Variant)
RaiseEvent Complete(returnVal)
End Sub
Public Sub StatusChange(statusVal As Variant)
RaiseEvent StatusChange(statusVal)
End Sub
Public Sub Started()
RaiseEvent Started
End Sub
Finally...
There's a module of supplementary functions that I don't especially need review on, but I'll include as they are required for the clsMultiThread
to execute
Option Explicit
Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue) As Long
'function to take iterable group and add it to the queue
'returns the number of items added
Dim item As Variant
Dim itemsAdded As Long
itemsAdded = 0
For Each item In iterator
resultQueue.enqueue item
itemsAdded = itemsAdded + 1
Next item
addIterableToQueue = itemsAdded
End Function
Function isIterable(obj As Variant) As Boolean
On Error Resume Next
Dim iterator As Variant
For Each iterator In obj
Exit For
Next
isIterable = Err.Number = 0
End Function
Test code
Don't need feedback on this stuff, except with regard to the way in which a worker
is implemented Download example file here
It just occurred to me that I hadn't actually included a worker to test this with. Well here's an example that uses an MSHTML
request to return an HTML document from a webpage. It takes a String
/Range
argument representing a url, and returns an HTMLDocument
. NB, this has to be imported
as it requires Attribute .VB_UserMemId = 0
as per this article
VERSION 1.0 CLASS
BEGIN
MultiUse = -1 'True
END
Attribute VB_Name = "clsHtmlWorker"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit
'''
'Basic worker object sends MSHTML GET request to webpage and returns an HTMLDocument or Nothing
'Requires reference to
' Microsoft HTML Object library (mshtml.tlb)
' Microsoft XML, v6.0 (msxml6.dll)
'''
Private httpRequest As MSXML2.XMLHTTP60
Implements IWorker
Private Type TWorker
Events As IWorkerEvents
End Type
Private this As TWorker
Private Function IWorker_CreateNew() As IWorker
Set IWorker_CreateNew = New clsHtmlWorker
End Function
Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property
Private Sub IWorker_Execute(Optional argument As Variant)
Started 'raise event to thread handle
'Do some task
sendRequest argument
End Sub
'''
'Event raising
'''
Private Sub Started()
If Not this.Events Is Nothing Then
this.Events.Started
End If
End Sub
Private Sub statusChange(ByVal statusText As String)
If Not this.Events Is Nothing Then
'status change is not fully implemented yet in clsMultiThread, I may get rid of it
this.Events.statusChange statusText
End If
End Sub
Private Sub Complete(Optional ByVal resultPage As HTMLDocument)
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
If Not this.Events Is Nothing Then
this.Events.Complete resultPage
End If
End Sub
Private Sub sendRequest(ByVal url As String)
'''
'Sub to open a new XMLHTTP request at a given url
'Also assigns OnReadyStateChange callback function to this class' default routine
'''
If httpRequest Is Nothing Then Set httpRequest = New MSXML2.XMLHTTP60
With httpRequest
'Assign callback function to handler class (by default property)
.OnReadyStateChange = Me
'open and send the request
.Open "GET", url, True
.send vbNullString
End With
End Sub
Public Sub OnReadyStateChange()
Attribute OnReadyStateChange.VB_UserMemId = 0
'''
'This is the default callback routine of the class
'''
With httpRequest
statusChange .statusText
If .ReadyState = 4 Then 'loaded
If .Status = 200 Then 'successful
'mark complete and pass document
Dim htmlDoc As HTMLDocument
Set htmlDoc = New HTMLDocument
htmlDoc.body.innerHTML = .responseText
Complete htmlDoc
Else 'unsuccessful
Complete
End If
End If
End With
End Sub
Private Sub Class_Terminate()
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
End Sub
A multi-thread group implementing it can be run in a caller class like codeReviewTest
. Sends requests to urls in A1:A10
, returns e-mails from those urls in neighbouring columns.
Option Explicit
'''
'This class creates and runs a new multithread instance which runs clsHtmlWorker
'When each HTMLDocument is complete, the class scans it for e-mails
'''
Private WithEvents multiThreadGroup As clsMultiThread
'clsMultiThread is async so must be declared separately (or in a doEvents loop)
Private Const REGEX_PATTERN As String = _
"(?:[a-z0-9!#$%&'*+/=?^_`~-]+(?:.[a-z0-9!#$%&'*+/=?^_`~-]+)*|""(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|\[x01-x09x0bx0cx0e-x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[(?:(?:(2(5[0-5]|[0-4])|1[0-9]|[0-9]?[0-9])).)3(?:(2(5[1-9]|[0-9])|1[0-5]|[0-4]?[0-9])|[0-9]*[0-9]:(?:[1-9]|\[0-9])+)])"
Public Sub run()
'urls to check for emails are in a1:a10
htmlRequestToUrls [a1:a10]
End Sub
Private Sub htmlRequestToUrls(urlCells As Range)
Set multiThreadGroup = New clsMultiThread
With multiThreadGroup
.Size = urlCells 'set iterable, here a load of urls
Set .AsyncClass = New clsHtmlWorker 'set async worker
.Execute 'run the group
End With
End Sub
Private Sub multiThreadGroup_TaskComplete(returnVal As Variant, taskID As String, threadID As String)
Dim rowI As Long, colI As Long
rowI = Right(taskID, Len(taskID) - 4)
If returnVal Is Nothing Then
Cells(rowI, 2) = "Error in loading page"
ElseIf TypeOf returnVal Is HTMLDocument Then
Dim emailMatches() As String
emailMatches = regexMatches(returnVal.body.innerText)
If (Not emailMatches) = -1 Then
'no emails on page
Cells(rowI, 2) = "No e-mail matches"
Else
For colI = LBound(emailMatches) To UBound(emailMatches)
Cells(rowI, colI + 2) = emailMatches(colI)
Next colI
End If
Else 'nothing returned
Cells(rowI, 2) = "Error in loading page"
End If
End Sub
Private Function regexMatches(strInput As String) As String()
Dim rMatch As Object
Dim s As String
Dim arrayMatches() As String
Dim i As Long
With CreateObject("VBScript.Regexp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = REGEX_PATTERN
If .test(strInput) Then
For Each rMatch In .Execute(strInput)
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.value
i = i + 1
Next
End If
End With
regexMatches = arrayMatches
End Function
The test class will create a new multi thread group. The group will open the default 5 threads, on each thread it will create an instance of clsHtmlWorker
. It will convert the range [A1:A10]
into 10 arguments which it will pass, 1 at a time, to the workers on each thread when they aren't busy. Once all tasks are run the class will autoQuit
- cutting references to all sub classes, allowing it to go out of scope
You can download the example workbook if you want, works best with Rubberduck to organise folders. Test code is in the CodeReviewTestRunner
, or just hit the big button
beginner object-oriented multithreading vba asynchronous
I'm wondering why we have comments saying reference tomscorlib.dll
is needed yet I see noCreateObject
or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?
â this
Jan 16 at 12:52
@This 1)mscorlib.dll
I'm using early binding, andPrivate iterableQueue As mscorlib.Queue
inclsMultiThread
is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances ofEXCEL.EXE
. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes sense
â Greedo
Jan 16 at 14:06
I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
â M.Doerner
Jan 16 at 14:52
@M.Doerner Exactly, anInternetExplorer.Application
or anXmlHttp
request (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use ofIWorker
interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approach
â Greedo
Jan 16 at 15:25
1
@Raystafarian I've added a download file in the test area if that helps
â Greedo
Feb 12 at 16:16
 |Â
show 5 more comments
up vote
31
down vote
favorite
up vote
31
down vote
favorite
Intro
Multithreading tools exist in Excel - often to run macros in multiple instances of Excel, or to convert macros to vbscripts that can run independently. However I've often come across projects where I'd like to delegate multiple tasks out to async processes, and creating multiple instances of Excel itself is overkill for this.
After running into several projects where I needed to execute multiple async requests (think internet scraping, or radioactive decay simulation) I decided to make a general class that, when given an async process, can execute and handle several in parallel.
For example, this Daisy Test
makes a multithread group which sends html requests to all the urls in column B
. The first link of these google searches are returned to column C
in the order the responses arrive. That triggers a second group (daisy chained onto the 1st one's events) to send internet explorer request to that url, and these are returned in D
To clarify some of the comments; it should be noted that these requests are sent off in order (B1
,B2
,...), but return unordered (C2
first). That's because my class allows the threads to run in parallel (hence multithreading
). These are still only managed in a single Excel thread, but the requests are asynchronous and in different processes so are effectively running in other threads.
Summary
N.B. The term 'thread' will be used loosely here, with no reference to the actual processor. Instead, when I say 'thread' I am talking about a handler for a task (which is running in parallel to other tasks on other handlers)
The multi thread setup consists of a main clsMultiThread
parent class which controls the shape of the multi thread collection (i.e. how many tasks are running at any given time), as well as several clsThreadHandle
classes.
These thread handlers are each responsible for running an async task, and informing the parent multithread class when each task is finished.
Internally, the tasks are actually run by WorkerClass
objects, one worker for each thread. These receive input arguments from their parent threads, run their respective async task, and raise an event to their parent clsThreadHandle
when finished. The thread handle then passes this event, and any optional return value, back up to the main clsMultiThread
, whose job it is either to close the thread once it's done, or prompt the thread to run another task. The chain of command is summarised in the image below:
Feedback I'm after
- General feedback on structure, event handling, use of interfaces etc.
- Proper exiting (which I don't think I'm doing right now)
- User-side interface
- Whether this approach to the problem is appropriate and intuitive (passing worker classes etc.)
- Whether I'm missing some functionality that should be there
This is also the first project I've ever done with the primary aim of making something I can re-use (as well as the longest & most complicated bit of code I've written). For that reason I'd also greatly appreciate any comments on
- Coding style
- Use of comments
- Anything else I should bear in mind when working on such projects
Implementation
Main class clsMultiThread
Right, some code. Here's the main class which handles all the sub-classes
Option Explicit
'''
'VBA class to run multiple asynchronous processes
'Interfaces directly with clsThreadHandle
'Requires references to:
'mscrolib.dll
'''
'THREAD GROUP SHAPE PROPERTIES
Private threadGroup As New Collection 'holds all the treads
Private maxThreads As Long 'maximum number of threads that can be open
Private minThreads As Long '[minimum number of threads]
Private iterableQueue As mscorlib.Queue 'this item holds all the items from iterator set in queue
'replaces iterableGroup, newtaskindex, taskidset
Private iterableSize As Long 'number of items in iterable group or
Private passesArguments As Boolean 'true if iterableGroup exists
'THREAD GROUP REFERENCES
Private WithEvents threadEvents As clsHandleEvents 'Event object to raise events from each thread handle
Private workerClass As IWorker
'THREAD GROUP SETTINGS
Private autoQuitEnabled As Boolean 'dictates whether to quit on Complete event, should be false if daisychaining
'THREAD GROUP STATE PROPERTIES
Private openThreadCount As Long 'number of threads/handles currently open
Private openTaskCount As Long 'number of tasks running on those threads
Private closedTaskCount As Long 'number of threads closed (failed and successful)
Private successfulTaskCount As Long 'number of threads completed sucessfully
Private newThreadIndex As Long 'Iterator over handles (next new handle)
Private newTaskIndex As Long 'Iterator over open tasks (next thread to be started)
Private taskIDset As Collection 'Dictionary mapping taskIDs to iterableGroup location "REPLACE THIS. MERGE COLLECTION JUMBLES"
Private freeThreads As Collection 'holds all the free thread ids
'THREAD GROUP PERFORMANCE PROPERTIES
Private startTime As Date
'Private endTime As Date
'THREAD GROUP EVENTS
Public Event TaskComplete(returnVal As Variant, taskID As String, threadID As String) 'when a task is complete on a thread, maybe if failed
Public Event ThreadOpened(threadCount As Long, threadID As String) 'when a thread is opened, pass the new number of threads
Public Event ThreadClosed(threadCount As Long, threadID As String) 'when closed, pass closed thread ID
Public Event Complete(timeTaken As Date) 'when everything is (nearly) finished
Public Event Closed(timeTaken As Date) 'when entire group is closed
Public Event Opened(startTime As Date) 'when entire group is closed
'PRIVATE TYPES/ENUMS
Private Type Instruction 'instruction on what to do next, and any necessary arguments that can be passed
threadID As String
instructionBody As InstructionType
End Type
Private Enum InstructionType
mltCloseThread
mltOpenThread
mltSetTask
mltDoNothing
mltQuit
End Enum
Private Sub Class_Initialize()
'Set defaults
maxThreads = 5
minThreads = 1
newThreadIndex = 1
newTaskIndex = 1
autoQuitEnabled = True
Set threadEvents = New clsHandleEvents
Set taskIDset = New Collection
Set freeThreads = New Collection
startTime = Now
RaiseEvent Opened(startTime)
'''
'Test space
'''
End Sub
Private Sub threadEvents_Closed(threadID As String)
RaiseEvent ThreadClosed(openThreadCount, threadID)
End Sub
Private Sub threadEvents_Opened(threadID As String)
RaiseEvent ThreadOpened(openThreadCount, threadID)
End Sub
Private Sub threadEvents_Complete(obj As clsThreadHandle, returnVal As Variant) 'called when thread becomes free
'DO NOT mark as free here
RaiseEvent TaskComplete(returnVal, obj.Task, obj.Name) 'failed as boolean
openTaskCount = openTaskCount - 1
closedTaskCount = closedTaskCount + 1
successfulTaskCount = successfulTaskCount + 1 'could be unsuccessful too though
doInstructions obj.Name 'pass object name so it can be marked free
' If failed Then
' failedTaskCount = failedTaskCount + 1
' Else
' successfulTaskCount = successfulTaskCount + 1
' End If
End Sub
Public Sub Execute()
'check validity of user data, if valid, then execute task
If iterableSize = 0 Then
Err.Raise 5, Description:="You must set size argument to a non-zero value, or a non-empty iterable first"
ElseIf workerClass Is Nothing Then
Err.Raise 5, Description:="You must set the async class argument first"
Else
doInstructions
End If
End Sub
Public Sub Quit()
'Remove any references that would prevent proper closing
'Default automatically called when openThreadCount = 0
RaiseEvent Complete(Now - startTime)
Set threadEvents = Nothing
End Sub
Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID
'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy
BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If
Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select
'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
doInstructions loopcount:=loopcount + 1 'watch for infinite loop
End If
End Sub
Private Function getInstruction() As Instruction
'function to determine what action to take next
'called until do nothing returned
'caller to doinstructions can specify a free thread in which case some parts skipped
Dim results As Instruction 'variable to hold instruction and any arguments
Me.printState
'Do we need to open or close threads?
'Threads free? (threads open > tasks open):
If openThreadCount > openTaskCount Then
'Great we have a free thread, now use it or delete it (cos we have too many or no tasks remaining)
If newTaskIndex > iterableSize Then 'already passed all tasks
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
ElseIf openThreadCount <= maxThreads Then
'[find] & use free thread (run a task on it)
results.instructionBody = mltSetTask
results.threadID = freeThread
Else
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
End If
Else
'No threads free, either open one (if not exceeding max, and there's a task left to put on it)
'Or do nothing (can't close it if not free, shouldn't open new if no more tasks)
If openThreadCount < maxThreads And newTaskIndex <= iterableSize Then
results.instructionBody = mltOpenThread
ElseIf openThreadCount = 0 And autoQuitEnabled Then
results.instructionBody = mltQuit
Else
results.instructionBody = mltDoNothing
End If
End If
getInstruction = results
End Function
Private Sub openThread()
'opens a thread and assigns a task ID to it
Dim newThread As New clsThreadHandle 'create new handle
newThread.OpenHandle Me, threadEvents 'passes parent reference which allows handle to obtain thread ID
threadGroup.Add newThread, newThread.Name 'add it to the group with a new id (set by itself)
openThreadCount = openThreadCount + 1
freeThread = newThread.Name 'mark as free so task can be assigned to it
End Sub
Private Property Let freeThread(threadID As String)
'NOT WORKING"""""
'when a thread comes free, add it to the collection
freeThreads.Add threadID, threadID
Debug.Print threadID; " marked as free; now"; freeThreads.Count; "threads are free"
End Property
Private Property Let BusyThread(threadID As String)
'when a thread is not free or is closed, mark as busy by removing from free group
On Error Resume Next 'only remove ones what are there actually
freeThreads.Remove threadID
Debug.Print threadID; " marked as busy"; IIf(Err.Number <> 0, ", but wasn't in free group", vbNullString)
End Property
Private Property Get freeThread() As String
'gives up a free thread and adds it to the list
freeThread = freeThreads(1)
freeThreads.Remove (1)
End Property
Private Sub assignTaskID(threadID As String)
'@Ignore WriteOnlyProperty
'assigns task ID to thread
'nb does NOT actually run the task (this is instruction stage still)
Dim newThread As clsThreadHandle
Set newThread = threadGroup(threadID)
newThread.Task = NewTaskID
Set newThread.Worker = AsyncClass
End Sub
Private Sub closeThread(threadID As String, Optional failed As Boolean = False)
'close thread with appropriate id
Dim oldThread As clsThreadHandle
Set oldThread = threadGroup(threadID)
'remove from all collections
'taskIDset.Remove oldThread.Task remove from task id set if it was in there
threadGroup.Remove oldThread.Name
BusyThread = oldThread.Name 'remove from free collection
Set oldThread = Nothing
'iterate counters
openThreadCount = openThreadCount - 1
End Sub
Public Property Let Size(sizeFactor As Variant)
'property of the thread group which dictates how many processes to run in total
'size factor is either an iterable item, or an integer to dictate the size
'Check if size factor is number
If IsNumeric(sizeFactor) Then
'If so, size is that
iterableSize = CLng(sizeFactor)
passesArguments = False 'no argument to pass to thread, just run it a load of times
'If not, *check if iterable
ElseIf isIterable(sizeFactor) Then
'If so, size is size of collection from extration
Set iterableQueue = New Queue
iterableSize = addIterableToQueue(sizeFactor, iterableQueue)
passesArguments = True
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
End Property
Public Sub IncreaseSize(sizeFactor As Variant)
'method of threadGroup which adds more tasks to the queue, and immediately runs them
'size factor is either an iterable item, or an integer to dictate the size
'Check whether size is set yet
If Me.Size = 0 Then
Err.Raise 5, Description:="You must set Size before you can IncreaseSize"
End If
'check whether new data matches old type
If IsNumeric(sizeFactor) Then
If passesArguments Then
Err.Raise 5, Description:="Size factor type doesn't match original type"
Else
'is numeric and was numeric, grand
iterableSize = iterableSize + CLng(sizeFactor)
End If
ElseIf isIterable(sizeFactor) Then
If passesArguments Then
'was iterable and still is, great!
Dim itemsAdded As Long
itemsAdded = addIterableToQueue(sizeFactor, iterableQueue)
iterableSize = iterableSize + itemsAdded
Else
'wasn't iterble, now is
Err.Raise 5, Description:="Size factor type doesn't match original type"
End If
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
Me.Execute
End Sub
Public Property Set AsyncClass(ByVal workObj As IWorker) 'Set the worker who carries out the tasks
Set workerClass = workObj
End Property
Public Property Get AsyncClass() As IWorker
Set AsyncClass = workerClass
End Property
Public Property Get Size() As Variant
Size = iterableSize
End Property
Public Property Let autoQuit(ByVal value As Boolean)
autoQuitEnabled = value
End Property
Public Property Get NewHandleID() As String
NewHandleID = "Handle " & newThreadIndex
newThreadIndex = newThreadIndex + 1 'use next one next time
End Property
Private Property Get NewTaskID() As String
'generates new task, saves its ID to taskIDset, then bumps the task counter along one
NewTaskID = "Task " & newTaskIndex
taskIDset.Add newTaskIndex, NewTaskID 'add id to map
newTaskIndex = newTaskIndex + 1
End Property
Private Sub Class_Terminate()
'Set threadGroup = Nothing
Debug.Print "Terminating group"
RaiseEvent Closed(Now - startTime)
End Sub
Public Sub printState() 'for debugging
Debug.Print _
"State:"; vbCrLf _
; Space(5); "Threads open: "; openThreadCount; vbCrLf _
; Space(5); "Threads in use: "; openTaskCount; vbCrLf _
; Space(5); "Threads marked as free: "; freeThreads.Count; vbCrLf _
; Space(5); "Tasks remaining: "; iterableSize - successfulTaskCount; vbCrLf _
; Space(5); "Next task index: "; newTaskIndex
End Sub
Its key methods are doInstruction
(calling getInstruction
) and Size
and IncreaseSize
The class runs iteratively; each cycle the class finds out what to do and executes that (doInstruction
). doInstruction always calls itself unless it's told to do nothing, which allows the call stack to shrink back. There are several options for what to do each cycle
- Open a thread (create a new instance of
clsThreadHandle
and add to a collection of possible places to run tasks) - Close a thread (quit a handle and remove it from that collection)
- Run a task on a thread
- [Force quit a task - t.b. implemented]
- Do Nothing (allow call stack to go back to zero)
The getInstruction
method will tell the class to
- Open a thread if it doesn't exceed a max count, and if there are tasks to run on it
- Close a thread if there are no tasks left to run or if there are too many
- Run a task on a thread if there's a thread marked free
- Do nothing if there are no threads free, and there are the right number of threads open
Size
is what dictates the number of tasks to carry out
- If
Size
is numeric, the class will keep running tasks on threads until that number of tasks is run - If
Size
is iterable, then the class will keep running tasks and passing arguments by essentiallyFor...Each
ing through the iterable argument- This allows something like a url to be passed as an argument to each task, or even a range so that the worker knows where on the sheet to write its result to
IncreaseSize
is like Size
; it is useful if you want to drip feed tasks into the multithread set (say you are daisy chaining one onto the other using the first one's threadComplete
events). It increases the size of the numeric/iterable argument.
Thread Handles clsThreadHandle
The main class creates multiple instances of this thread handle class.
Option Explicit
'THREAD HANDLE BASE PROPERTIES
Private eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
Private taskID As String 'holds the id of the current task
Private handleID As String 'holds the id of this handle
Private handleArgs As Variant 'holds any arguments that need to be passed to the task
'THREAD EVENTS
Private WithEvents workerEvents As IWorkerEvents
Private workerObject As IWorker 'interface to whatever worker may be passed to thread
Private Sub workerEvents_Complete(returnVal As Variant)
eventHandle.NotifyComplete Me, returnVal
End Sub
Private Sub workerEvents_Started()
Debug.Print Me.Task; " started event was raised"
End Sub
Public Property Set Worker(ByVal workObj As IWorker)
Set workerObject = workObj.CreateNew 'set worker to be a copy of the passed one
Set workerEvents = New IWorkerEvents 'create event handler
Set workerObject.Events = workerEvents 'pass it to the worker so it can listen in
End Property
Public Sub OpenHandle(multiThreadGroup As clsMultiThread, delegate As clsHandleEvents)
'called when the handle is opened, sets the reference IDs of the string and the handle, as well as parent g
Set eventHandle = delegate
handleID = multiThreadGroup.NewHandleID
eventHandle.NotifyThreadOpened (Name)
Debug.Print Name; " was opened"
End Sub
Public Sub Execute(Optional args As Variant)
Debug.Print Task; " executed on "; Name; " with "; IIf(IsMissing(args), "no arguments", "some arguments")
workerObject.Execute args 'run the event
End Sub
Public Property Get Task() As String
Task = taskID
End Property
Public Property Let Task(val As String)
taskID = val
Debug.Print Name; "'s task was set to "; taskID
End Property
Public Property Get Name() As String
Name = handleID
End Property
Private Sub Class_Initialize()
Debug.Print "I'm made"
End Sub
Private Sub Class_Terminate()
eventHandle.NotifyThreadClosed (Me.Name)
Set eventHandle = Nothing
Set workerObject = Nothing
End Sub
Private Sub workerEvents_StatusChange(statusVal As Variant)
'not yet implemented, probably unnecessary
End Sub
I've chosen individual event handlers rather than a single one (like I did with clsHandleEvents
) because
- I find having an individual thread class for each task/worker object easier to picture mentally
- I intend to add a functionality where a worker can cache objects in its parent handle (such as an InternetExplorer application) to save re-initialising it between successive tasks on the same thread
- Having a single cache for each thread makes this simpler
Handle Events class clsHandleEvents
A reference to this class is held by each thread so that it can raise an event to the multiThread class, without directly holding a reference to it (this would mess up garbage collection I think)
Option Explicit
'class to convert calls from the thread handle into events which the multi thread group can tap into
Public Event Complete(obj As clsThreadHandle, returnVal As Variant)
Public Event Opened(threadID As String) 'when thread is actually opened
Public Event Closed(threadID As String) 'when thread is closed
Public Sub NotifyComplete(obj As clsThreadHandle, Optional returnVal As Variant)
RaiseEvent Complete(obj, returnVal)
End Sub
Public Sub NotifyThreadOpened(threadID As String)
RaiseEvent Opened(threadID)
End Sub
Public Sub NotifyThreadClosed(threadID As String)
RaiseEvent Closed(threadID)
End Sub
Private Sub Class_Terminate()
Debug.Print "Events Terminated"
End Sub
Interfaces
There are 2 interface
classes (well onlyIWorker
is really one, but I'm calling IWorkerEvents
one too, similar to this example)
IWorker
forms a general template of an asynchronous process you can run, which raises appropriate events as per IWorkerEvents
IWorker
Option Explicit
'class acts as interface for any thread task
'Execute runs the task
'Events are raised by the task if it interfaces properly
Public Property Set Events(ByRef value As IWorkerEvents)
End Property
Public Sub Execute(Optional argument As Variant)
End Sub
Public Function CreateNew() As IWorker
End Function
IWorkerEvents
Option Explicit
'class holds all the events that a thread task can raise
Public Event Complete(returnVal As Variant)
Public Event StatusChange(statusVal As Variant)
Public Event Started()
Public Sub Complete(Optional returnVal As Variant)
RaiseEvent Complete(returnVal)
End Sub
Public Sub StatusChange(statusVal As Variant)
RaiseEvent StatusChange(statusVal)
End Sub
Public Sub Started()
RaiseEvent Started
End Sub
Finally...
There's a module of supplementary functions that I don't especially need review on, but I'll include as they are required for the clsMultiThread
to execute
Option Explicit
Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue) As Long
'function to take iterable group and add it to the queue
'returns the number of items added
Dim item As Variant
Dim itemsAdded As Long
itemsAdded = 0
For Each item In iterator
resultQueue.enqueue item
itemsAdded = itemsAdded + 1
Next item
addIterableToQueue = itemsAdded
End Function
Function isIterable(obj As Variant) As Boolean
On Error Resume Next
Dim iterator As Variant
For Each iterator In obj
Exit For
Next
isIterable = Err.Number = 0
End Function
Test code
Don't need feedback on this stuff, except with regard to the way in which a worker
is implemented Download example file here
It just occurred to me that I hadn't actually included a worker to test this with. Well here's an example that uses an MSHTML
request to return an HTML document from a webpage. It takes a String
/Range
argument representing a url, and returns an HTMLDocument
. NB, this has to be imported
as it requires Attribute .VB_UserMemId = 0
as per this article
VERSION 1.0 CLASS
BEGIN
MultiUse = -1 'True
END
Attribute VB_Name = "clsHtmlWorker"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit
'''
'Basic worker object sends MSHTML GET request to webpage and returns an HTMLDocument or Nothing
'Requires reference to
' Microsoft HTML Object library (mshtml.tlb)
' Microsoft XML, v6.0 (msxml6.dll)
'''
Private httpRequest As MSXML2.XMLHTTP60
Implements IWorker
Private Type TWorker
Events As IWorkerEvents
End Type
Private this As TWorker
Private Function IWorker_CreateNew() As IWorker
Set IWorker_CreateNew = New clsHtmlWorker
End Function
Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property
Private Sub IWorker_Execute(Optional argument As Variant)
Started 'raise event to thread handle
'Do some task
sendRequest argument
End Sub
'''
'Event raising
'''
Private Sub Started()
If Not this.Events Is Nothing Then
this.Events.Started
End If
End Sub
Private Sub statusChange(ByVal statusText As String)
If Not this.Events Is Nothing Then
'status change is not fully implemented yet in clsMultiThread, I may get rid of it
this.Events.statusChange statusText
End If
End Sub
Private Sub Complete(Optional ByVal resultPage As HTMLDocument)
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
If Not this.Events Is Nothing Then
this.Events.Complete resultPage
End If
End Sub
Private Sub sendRequest(ByVal url As String)
'''
'Sub to open a new XMLHTTP request at a given url
'Also assigns OnReadyStateChange callback function to this class' default routine
'''
If httpRequest Is Nothing Then Set httpRequest = New MSXML2.XMLHTTP60
With httpRequest
'Assign callback function to handler class (by default property)
.OnReadyStateChange = Me
'open and send the request
.Open "GET", url, True
.send vbNullString
End With
End Sub
Public Sub OnReadyStateChange()
Attribute OnReadyStateChange.VB_UserMemId = 0
'''
'This is the default callback routine of the class
'''
With httpRequest
statusChange .statusText
If .ReadyState = 4 Then 'loaded
If .Status = 200 Then 'successful
'mark complete and pass document
Dim htmlDoc As HTMLDocument
Set htmlDoc = New HTMLDocument
htmlDoc.body.innerHTML = .responseText
Complete htmlDoc
Else 'unsuccessful
Complete
End If
End If
End With
End Sub
Private Sub Class_Terminate()
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
End Sub
A multi-thread group implementing it can be run in a caller class like codeReviewTest
. Sends requests to urls in A1:A10
, returns e-mails from those urls in neighbouring columns.
Option Explicit
'''
'This class creates and runs a new multithread instance which runs clsHtmlWorker
'When each HTMLDocument is complete, the class scans it for e-mails
'''
Private WithEvents multiThreadGroup As clsMultiThread
'clsMultiThread is async so must be declared separately (or in a doEvents loop)
Private Const REGEX_PATTERN As String = _
"(?:[a-z0-9!#$%&'*+/=?^_`~-]+(?:.[a-z0-9!#$%&'*+/=?^_`~-]+)*|""(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|\[x01-x09x0bx0cx0e-x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[(?:(?:(2(5[0-5]|[0-4])|1[0-9]|[0-9]?[0-9])).)3(?:(2(5[1-9]|[0-9])|1[0-5]|[0-4]?[0-9])|[0-9]*[0-9]:(?:[1-9]|\[0-9])+)])"
Public Sub run()
'urls to check for emails are in a1:a10
htmlRequestToUrls [a1:a10]
End Sub
Private Sub htmlRequestToUrls(urlCells As Range)
Set multiThreadGroup = New clsMultiThread
With multiThreadGroup
.Size = urlCells 'set iterable, here a load of urls
Set .AsyncClass = New clsHtmlWorker 'set async worker
.Execute 'run the group
End With
End Sub
Private Sub multiThreadGroup_TaskComplete(returnVal As Variant, taskID As String, threadID As String)
Dim rowI As Long, colI As Long
rowI = Right(taskID, Len(taskID) - 4)
If returnVal Is Nothing Then
Cells(rowI, 2) = "Error in loading page"
ElseIf TypeOf returnVal Is HTMLDocument Then
Dim emailMatches() As String
emailMatches = regexMatches(returnVal.body.innerText)
If (Not emailMatches) = -1 Then
'no emails on page
Cells(rowI, 2) = "No e-mail matches"
Else
For colI = LBound(emailMatches) To UBound(emailMatches)
Cells(rowI, colI + 2) = emailMatches(colI)
Next colI
End If
Else 'nothing returned
Cells(rowI, 2) = "Error in loading page"
End If
End Sub
Private Function regexMatches(strInput As String) As String()
Dim rMatch As Object
Dim s As String
Dim arrayMatches() As String
Dim i As Long
With CreateObject("VBScript.Regexp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = REGEX_PATTERN
If .test(strInput) Then
For Each rMatch In .Execute(strInput)
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.value
i = i + 1
Next
End If
End With
regexMatches = arrayMatches
End Function
The test class will create a new multi thread group. The group will open the default 5 threads, on each thread it will create an instance of clsHtmlWorker
. It will convert the range [A1:A10]
into 10 arguments which it will pass, 1 at a time, to the workers on each thread when they aren't busy. Once all tasks are run the class will autoQuit
- cutting references to all sub classes, allowing it to go out of scope
You can download the example workbook if you want, works best with Rubberduck to organise folders. Test code is in the CodeReviewTestRunner
, or just hit the big button
beginner object-oriented multithreading vba asynchronous
Intro
Multithreading tools exist in Excel - often to run macros in multiple instances of Excel, or to convert macros to vbscripts that can run independently. However I've often come across projects where I'd like to delegate multiple tasks out to async processes, and creating multiple instances of Excel itself is overkill for this.
After running into several projects where I needed to execute multiple async requests (think internet scraping, or radioactive decay simulation) I decided to make a general class that, when given an async process, can execute and handle several in parallel.
For example, this Daisy Test
makes a multithread group which sends html requests to all the urls in column B
. The first link of these google searches are returned to column C
in the order the responses arrive. That triggers a second group (daisy chained onto the 1st one's events) to send internet explorer request to that url, and these are returned in D
To clarify some of the comments; it should be noted that these requests are sent off in order (B1
,B2
,...), but return unordered (C2
first). That's because my class allows the threads to run in parallel (hence multithreading
). These are still only managed in a single Excel thread, but the requests are asynchronous and in different processes so are effectively running in other threads.
Summary
N.B. The term 'thread' will be used loosely here, with no reference to the actual processor. Instead, when I say 'thread' I am talking about a handler for a task (which is running in parallel to other tasks on other handlers)
The multi thread setup consists of a main clsMultiThread
parent class which controls the shape of the multi thread collection (i.e. how many tasks are running at any given time), as well as several clsThreadHandle
classes.
These thread handlers are each responsible for running an async task, and informing the parent multithread class when each task is finished.
Internally, the tasks are actually run by WorkerClass
objects, one worker for each thread. These receive input arguments from their parent threads, run their respective async task, and raise an event to their parent clsThreadHandle
when finished. The thread handle then passes this event, and any optional return value, back up to the main clsMultiThread
, whose job it is either to close the thread once it's done, or prompt the thread to run another task. The chain of command is summarised in the image below:
Feedback I'm after
- General feedback on structure, event handling, use of interfaces etc.
- Proper exiting (which I don't think I'm doing right now)
- User-side interface
- Whether this approach to the problem is appropriate and intuitive (passing worker classes etc.)
- Whether I'm missing some functionality that should be there
This is also the first project I've ever done with the primary aim of making something I can re-use (as well as the longest & most complicated bit of code I've written). For that reason I'd also greatly appreciate any comments on
- Coding style
- Use of comments
- Anything else I should bear in mind when working on such projects
Implementation
Main class clsMultiThread
Right, some code. Here's the main class which handles all the sub-classes
Option Explicit
'''
'VBA class to run multiple asynchronous processes
'Interfaces directly with clsThreadHandle
'Requires references to:
'mscrolib.dll
'''
'THREAD GROUP SHAPE PROPERTIES
Private threadGroup As New Collection 'holds all the treads
Private maxThreads As Long 'maximum number of threads that can be open
Private minThreads As Long '[minimum number of threads]
Private iterableQueue As mscorlib.Queue 'this item holds all the items from iterator set in queue
'replaces iterableGroup, newtaskindex, taskidset
Private iterableSize As Long 'number of items in iterable group or
Private passesArguments As Boolean 'true if iterableGroup exists
'THREAD GROUP REFERENCES
Private WithEvents threadEvents As clsHandleEvents 'Event object to raise events from each thread handle
Private workerClass As IWorker
'THREAD GROUP SETTINGS
Private autoQuitEnabled As Boolean 'dictates whether to quit on Complete event, should be false if daisychaining
'THREAD GROUP STATE PROPERTIES
Private openThreadCount As Long 'number of threads/handles currently open
Private openTaskCount As Long 'number of tasks running on those threads
Private closedTaskCount As Long 'number of threads closed (failed and successful)
Private successfulTaskCount As Long 'number of threads completed sucessfully
Private newThreadIndex As Long 'Iterator over handles (next new handle)
Private newTaskIndex As Long 'Iterator over open tasks (next thread to be started)
Private taskIDset As Collection 'Dictionary mapping taskIDs to iterableGroup location "REPLACE THIS. MERGE COLLECTION JUMBLES"
Private freeThreads As Collection 'holds all the free thread ids
'THREAD GROUP PERFORMANCE PROPERTIES
Private startTime As Date
'Private endTime As Date
'THREAD GROUP EVENTS
Public Event TaskComplete(returnVal As Variant, taskID As String, threadID As String) 'when a task is complete on a thread, maybe if failed
Public Event ThreadOpened(threadCount As Long, threadID As String) 'when a thread is opened, pass the new number of threads
Public Event ThreadClosed(threadCount As Long, threadID As String) 'when closed, pass closed thread ID
Public Event Complete(timeTaken As Date) 'when everything is (nearly) finished
Public Event Closed(timeTaken As Date) 'when entire group is closed
Public Event Opened(startTime As Date) 'when entire group is closed
'PRIVATE TYPES/ENUMS
Private Type Instruction 'instruction on what to do next, and any necessary arguments that can be passed
threadID As String
instructionBody As InstructionType
End Type
Private Enum InstructionType
mltCloseThread
mltOpenThread
mltSetTask
mltDoNothing
mltQuit
End Enum
Private Sub Class_Initialize()
'Set defaults
maxThreads = 5
minThreads = 1
newThreadIndex = 1
newTaskIndex = 1
autoQuitEnabled = True
Set threadEvents = New clsHandleEvents
Set taskIDset = New Collection
Set freeThreads = New Collection
startTime = Now
RaiseEvent Opened(startTime)
'''
'Test space
'''
End Sub
Private Sub threadEvents_Closed(threadID As String)
RaiseEvent ThreadClosed(openThreadCount, threadID)
End Sub
Private Sub threadEvents_Opened(threadID As String)
RaiseEvent ThreadOpened(openThreadCount, threadID)
End Sub
Private Sub threadEvents_Complete(obj As clsThreadHandle, returnVal As Variant) 'called when thread becomes free
'DO NOT mark as free here
RaiseEvent TaskComplete(returnVal, obj.Task, obj.Name) 'failed as boolean
openTaskCount = openTaskCount - 1
closedTaskCount = closedTaskCount + 1
successfulTaskCount = successfulTaskCount + 1 'could be unsuccessful too though
doInstructions obj.Name 'pass object name so it can be marked free
' If failed Then
' failedTaskCount = failedTaskCount + 1
' Else
' successfulTaskCount = successfulTaskCount + 1
' End If
End Sub
Public Sub Execute()
'check validity of user data, if valid, then execute task
If iterableSize = 0 Then
Err.Raise 5, Description:="You must set size argument to a non-zero value, or a non-empty iterable first"
ElseIf workerClass Is Nothing Then
Err.Raise 5, Description:="You must set the async class argument first"
Else
doInstructions
End If
End Sub
Public Sub Quit()
'Remove any references that would prevent proper closing
'Default automatically called when openThreadCount = 0
RaiseEvent Complete(Now - startTime)
Set threadEvents = Nothing
End Sub
Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID
'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy
BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If
Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select
'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
doInstructions loopcount:=loopcount + 1 'watch for infinite loop
End If
End Sub
Private Function getInstruction() As Instruction
'function to determine what action to take next
'called until do nothing returned
'caller to doinstructions can specify a free thread in which case some parts skipped
Dim results As Instruction 'variable to hold instruction and any arguments
Me.printState
'Do we need to open or close threads?
'Threads free? (threads open > tasks open):
If openThreadCount > openTaskCount Then
'Great we have a free thread, now use it or delete it (cos we have too many or no tasks remaining)
If newTaskIndex > iterableSize Then 'already passed all tasks
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
ElseIf openThreadCount <= maxThreads Then
'[find] & use free thread (run a task on it)
results.instructionBody = mltSetTask
results.threadID = freeThread
Else
'[find] & close free thread
results.instructionBody = mltCloseThread
results.threadID = freeThread
End If
Else
'No threads free, either open one (if not exceeding max, and there's a task left to put on it)
'Or do nothing (can't close it if not free, shouldn't open new if no more tasks)
If openThreadCount < maxThreads And newTaskIndex <= iterableSize Then
results.instructionBody = mltOpenThread
ElseIf openThreadCount = 0 And autoQuitEnabled Then
results.instructionBody = mltQuit
Else
results.instructionBody = mltDoNothing
End If
End If
getInstruction = results
End Function
Private Sub openThread()
'opens a thread and assigns a task ID to it
Dim newThread As New clsThreadHandle 'create new handle
newThread.OpenHandle Me, threadEvents 'passes parent reference which allows handle to obtain thread ID
threadGroup.Add newThread, newThread.Name 'add it to the group with a new id (set by itself)
openThreadCount = openThreadCount + 1
freeThread = newThread.Name 'mark as free so task can be assigned to it
End Sub
Private Property Let freeThread(threadID As String)
'NOT WORKING"""""
'when a thread comes free, add it to the collection
freeThreads.Add threadID, threadID
Debug.Print threadID; " marked as free; now"; freeThreads.Count; "threads are free"
End Property
Private Property Let BusyThread(threadID As String)
'when a thread is not free or is closed, mark as busy by removing from free group
On Error Resume Next 'only remove ones what are there actually
freeThreads.Remove threadID
Debug.Print threadID; " marked as busy"; IIf(Err.Number <> 0, ", but wasn't in free group", vbNullString)
End Property
Private Property Get freeThread() As String
'gives up a free thread and adds it to the list
freeThread = freeThreads(1)
freeThreads.Remove (1)
End Property
Private Sub assignTaskID(threadID As String)
'@Ignore WriteOnlyProperty
'assigns task ID to thread
'nb does NOT actually run the task (this is instruction stage still)
Dim newThread As clsThreadHandle
Set newThread = threadGroup(threadID)
newThread.Task = NewTaskID
Set newThread.Worker = AsyncClass
End Sub
Private Sub closeThread(threadID As String, Optional failed As Boolean = False)
'close thread with appropriate id
Dim oldThread As clsThreadHandle
Set oldThread = threadGroup(threadID)
'remove from all collections
'taskIDset.Remove oldThread.Task remove from task id set if it was in there
threadGroup.Remove oldThread.Name
BusyThread = oldThread.Name 'remove from free collection
Set oldThread = Nothing
'iterate counters
openThreadCount = openThreadCount - 1
End Sub
Public Property Let Size(sizeFactor As Variant)
'property of the thread group which dictates how many processes to run in total
'size factor is either an iterable item, or an integer to dictate the size
'Check if size factor is number
If IsNumeric(sizeFactor) Then
'If so, size is that
iterableSize = CLng(sizeFactor)
passesArguments = False 'no argument to pass to thread, just run it a load of times
'If not, *check if iterable
ElseIf isIterable(sizeFactor) Then
'If so, size is size of collection from extration
Set iterableQueue = New Queue
iterableSize = addIterableToQueue(sizeFactor, iterableQueue)
passesArguments = True
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
End Property
Public Sub IncreaseSize(sizeFactor As Variant)
'method of threadGroup which adds more tasks to the queue, and immediately runs them
'size factor is either an iterable item, or an integer to dictate the size
'Check whether size is set yet
If Me.Size = 0 Then
Err.Raise 5, Description:="You must set Size before you can IncreaseSize"
End If
'check whether new data matches old type
If IsNumeric(sizeFactor) Then
If passesArguments Then
Err.Raise 5, Description:="Size factor type doesn't match original type"
Else
'is numeric and was numeric, grand
iterableSize = iterableSize + CLng(sizeFactor)
End If
ElseIf isIterable(sizeFactor) Then
If passesArguments Then
'was iterable and still is, great!
Dim itemsAdded As Long
itemsAdded = addIterableToQueue(sizeFactor, iterableQueue)
iterableSize = iterableSize + itemsAdded
Else
'wasn't iterble, now is
Err.Raise 5, Description:="Size factor type doesn't match original type"
End If
Else
'[if not, raise error]
Err.Raise 5 'invalid argument
End If
Me.Execute
End Sub
Public Property Set AsyncClass(ByVal workObj As IWorker) 'Set the worker who carries out the tasks
Set workerClass = workObj
End Property
Public Property Get AsyncClass() As IWorker
Set AsyncClass = workerClass
End Property
Public Property Get Size() As Variant
Size = iterableSize
End Property
Public Property Let autoQuit(ByVal value As Boolean)
autoQuitEnabled = value
End Property
Public Property Get NewHandleID() As String
NewHandleID = "Handle " & newThreadIndex
newThreadIndex = newThreadIndex + 1 'use next one next time
End Property
Private Property Get NewTaskID() As String
'generates new task, saves its ID to taskIDset, then bumps the task counter along one
NewTaskID = "Task " & newTaskIndex
taskIDset.Add newTaskIndex, NewTaskID 'add id to map
newTaskIndex = newTaskIndex + 1
End Property
Private Sub Class_Terminate()
'Set threadGroup = Nothing
Debug.Print "Terminating group"
RaiseEvent Closed(Now - startTime)
End Sub
Public Sub printState() 'for debugging
Debug.Print _
"State:"; vbCrLf _
; Space(5); "Threads open: "; openThreadCount; vbCrLf _
; Space(5); "Threads in use: "; openTaskCount; vbCrLf _
; Space(5); "Threads marked as free: "; freeThreads.Count; vbCrLf _
; Space(5); "Tasks remaining: "; iterableSize - successfulTaskCount; vbCrLf _
; Space(5); "Next task index: "; newTaskIndex
End Sub
Its key methods are doInstruction
(calling getInstruction
) and Size
and IncreaseSize
The class runs iteratively; each cycle the class finds out what to do and executes that (doInstruction
). doInstruction always calls itself unless it's told to do nothing, which allows the call stack to shrink back. There are several options for what to do each cycle
- Open a thread (create a new instance of
clsThreadHandle
and add to a collection of possible places to run tasks) - Close a thread (quit a handle and remove it from that collection)
- Run a task on a thread
- [Force quit a task - t.b. implemented]
- Do Nothing (allow call stack to go back to zero)
The getInstruction
method will tell the class to
- Open a thread if it doesn't exceed a max count, and if there are tasks to run on it
- Close a thread if there are no tasks left to run or if there are too many
- Run a task on a thread if there's a thread marked free
- Do nothing if there are no threads free, and there are the right number of threads open
Size
is what dictates the number of tasks to carry out
- If
Size
is numeric, the class will keep running tasks on threads until that number of tasks is run - If
Size
is iterable, then the class will keep running tasks and passing arguments by essentiallyFor...Each
ing through the iterable argument- This allows something like a url to be passed as an argument to each task, or even a range so that the worker knows where on the sheet to write its result to
IncreaseSize
is like Size
; it is useful if you want to drip feed tasks into the multithread set (say you are daisy chaining one onto the other using the first one's threadComplete
events). It increases the size of the numeric/iterable argument.
Thread Handles clsThreadHandle
The main class creates multiple instances of this thread handle class.
Option Explicit
'THREAD HANDLE BASE PROPERTIES
Private eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
Private taskID As String 'holds the id of the current task
Private handleID As String 'holds the id of this handle
Private handleArgs As Variant 'holds any arguments that need to be passed to the task
'THREAD EVENTS
Private WithEvents workerEvents As IWorkerEvents
Private workerObject As IWorker 'interface to whatever worker may be passed to thread
Private Sub workerEvents_Complete(returnVal As Variant)
eventHandle.NotifyComplete Me, returnVal
End Sub
Private Sub workerEvents_Started()
Debug.Print Me.Task; " started event was raised"
End Sub
Public Property Set Worker(ByVal workObj As IWorker)
Set workerObject = workObj.CreateNew 'set worker to be a copy of the passed one
Set workerEvents = New IWorkerEvents 'create event handler
Set workerObject.Events = workerEvents 'pass it to the worker so it can listen in
End Property
Public Sub OpenHandle(multiThreadGroup As clsMultiThread, delegate As clsHandleEvents)
'called when the handle is opened, sets the reference IDs of the string and the handle, as well as parent g
Set eventHandle = delegate
handleID = multiThreadGroup.NewHandleID
eventHandle.NotifyThreadOpened (Name)
Debug.Print Name; " was opened"
End Sub
Public Sub Execute(Optional args As Variant)
Debug.Print Task; " executed on "; Name; " with "; IIf(IsMissing(args), "no arguments", "some arguments")
workerObject.Execute args 'run the event
End Sub
Public Property Get Task() As String
Task = taskID
End Property
Public Property Let Task(val As String)
taskID = val
Debug.Print Name; "'s task was set to "; taskID
End Property
Public Property Get Name() As String
Name = handleID
End Property
Private Sub Class_Initialize()
Debug.Print "I'm made"
End Sub
Private Sub Class_Terminate()
eventHandle.NotifyThreadClosed (Me.Name)
Set eventHandle = Nothing
Set workerObject = Nothing
End Sub
Private Sub workerEvents_StatusChange(statusVal As Variant)
'not yet implemented, probably unnecessary
End Sub
I've chosen individual event handlers rather than a single one (like I did with clsHandleEvents
) because
- I find having an individual thread class for each task/worker object easier to picture mentally
- I intend to add a functionality where a worker can cache objects in its parent handle (such as an InternetExplorer application) to save re-initialising it between successive tasks on the same thread
- Having a single cache for each thread makes this simpler
Handle Events class clsHandleEvents
A reference to this class is held by each thread so that it can raise an event to the multiThread class, without directly holding a reference to it (this would mess up garbage collection I think)
Option Explicit
'class to convert calls from the thread handle into events which the multi thread group can tap into
Public Event Complete(obj As clsThreadHandle, returnVal As Variant)
Public Event Opened(threadID As String) 'when thread is actually opened
Public Event Closed(threadID As String) 'when thread is closed
Public Sub NotifyComplete(obj As clsThreadHandle, Optional returnVal As Variant)
RaiseEvent Complete(obj, returnVal)
End Sub
Public Sub NotifyThreadOpened(threadID As String)
RaiseEvent Opened(threadID)
End Sub
Public Sub NotifyThreadClosed(threadID As String)
RaiseEvent Closed(threadID)
End Sub
Private Sub Class_Terminate()
Debug.Print "Events Terminated"
End Sub
Interfaces
There are 2 interface
classes (well onlyIWorker
is really one, but I'm calling IWorkerEvents
one too, similar to this example)
IWorker
forms a general template of an asynchronous process you can run, which raises appropriate events as per IWorkerEvents
IWorker
Option Explicit
'class acts as interface for any thread task
'Execute runs the task
'Events are raised by the task if it interfaces properly
Public Property Set Events(ByRef value As IWorkerEvents)
End Property
Public Sub Execute(Optional argument As Variant)
End Sub
Public Function CreateNew() As IWorker
End Function
IWorkerEvents
Option Explicit
'class holds all the events that a thread task can raise
Public Event Complete(returnVal As Variant)
Public Event StatusChange(statusVal As Variant)
Public Event Started()
Public Sub Complete(Optional returnVal As Variant)
RaiseEvent Complete(returnVal)
End Sub
Public Sub StatusChange(statusVal As Variant)
RaiseEvent StatusChange(statusVal)
End Sub
Public Sub Started()
RaiseEvent Started
End Sub
Finally...
There's a module of supplementary functions that I don't especially need review on, but I'll include as they are required for the clsMultiThread
to execute
Option Explicit
Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue) As Long
'function to take iterable group and add it to the queue
'returns the number of items added
Dim item As Variant
Dim itemsAdded As Long
itemsAdded = 0
For Each item In iterator
resultQueue.enqueue item
itemsAdded = itemsAdded + 1
Next item
addIterableToQueue = itemsAdded
End Function
Function isIterable(obj As Variant) As Boolean
On Error Resume Next
Dim iterator As Variant
For Each iterator In obj
Exit For
Next
isIterable = Err.Number = 0
End Function
Test code
Don't need feedback on this stuff, except with regard to the way in which a worker
is implemented Download example file here
It just occurred to me that I hadn't actually included a worker to test this with. Well here's an example that uses an MSHTML
request to return an HTML document from a webpage. It takes a String
/Range
argument representing a url, and returns an HTMLDocument
. NB, this has to be imported
as it requires Attribute .VB_UserMemId = 0
as per this article
VERSION 1.0 CLASS
BEGIN
MultiUse = -1 'True
END
Attribute VB_Name = "clsHtmlWorker"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit
'''
'Basic worker object sends MSHTML GET request to webpage and returns an HTMLDocument or Nothing
'Requires reference to
' Microsoft HTML Object library (mshtml.tlb)
' Microsoft XML, v6.0 (msxml6.dll)
'''
Private httpRequest As MSXML2.XMLHTTP60
Implements IWorker
Private Type TWorker
Events As IWorkerEvents
End Type
Private this As TWorker
Private Function IWorker_CreateNew() As IWorker
Set IWorker_CreateNew = New clsHtmlWorker
End Function
Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property
Private Sub IWorker_Execute(Optional argument As Variant)
Started 'raise event to thread handle
'Do some task
sendRequest argument
End Sub
'''
'Event raising
'''
Private Sub Started()
If Not this.Events Is Nothing Then
this.Events.Started
End If
End Sub
Private Sub statusChange(ByVal statusText As String)
If Not this.Events Is Nothing Then
'status change is not fully implemented yet in clsMultiThread, I may get rid of it
this.Events.statusChange statusText
End If
End Sub
Private Sub Complete(Optional ByVal resultPage As HTMLDocument)
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
If Not this.Events Is Nothing Then
this.Events.Complete resultPage
End If
End Sub
Private Sub sendRequest(ByVal url As String)
'''
'Sub to open a new XMLHTTP request at a given url
'Also assigns OnReadyStateChange callback function to this class' default routine
'''
If httpRequest Is Nothing Then Set httpRequest = New MSXML2.XMLHTTP60
With httpRequest
'Assign callback function to handler class (by default property)
.OnReadyStateChange = Me
'open and send the request
.Open "GET", url, True
.send vbNullString
End With
End Sub
Public Sub OnReadyStateChange()
Attribute OnReadyStateChange.VB_UserMemId = 0
'''
'This is the default callback routine of the class
'''
With httpRequest
statusChange .statusText
If .ReadyState = 4 Then 'loaded
If .Status = 200 Then 'successful
'mark complete and pass document
Dim htmlDoc As HTMLDocument
Set htmlDoc = New HTMLDocument
htmlDoc.body.innerHTML = .responseText
Complete htmlDoc
Else 'unsuccessful
Complete
End If
End If
End With
End Sub
Private Sub Class_Terminate()
If Not httpRequest Is Nothing Then Set httpRequest = Nothing
End Sub
A multi-thread group implementing it can be run in a caller class like codeReviewTest
. Sends requests to urls in A1:A10
, returns e-mails from those urls in neighbouring columns.
Option Explicit
'''
'This class creates and runs a new multithread instance which runs clsHtmlWorker
'When each HTMLDocument is complete, the class scans it for e-mails
'''
Private WithEvents multiThreadGroup As clsMultiThread
'clsMultiThread is async so must be declared separately (or in a doEvents loop)
Private Const REGEX_PATTERN As String = _
"(?:[a-z0-9!#$%&'*+/=?^_`~-]+(?:.[a-z0-9!#$%&'*+/=?^_`~-]+)*|""(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|\[x01-x09x0bx0cx0e-x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[(?:(?:(2(5[0-5]|[0-4])|1[0-9]|[0-9]?[0-9])).)3(?:(2(5[1-9]|[0-9])|1[0-5]|[0-4]?[0-9])|[0-9]*[0-9]:(?:[1-9]|\[0-9])+)])"
Public Sub run()
'urls to check for emails are in a1:a10
htmlRequestToUrls [a1:a10]
End Sub
Private Sub htmlRequestToUrls(urlCells As Range)
Set multiThreadGroup = New clsMultiThread
With multiThreadGroup
.Size = urlCells 'set iterable, here a load of urls
Set .AsyncClass = New clsHtmlWorker 'set async worker
.Execute 'run the group
End With
End Sub
Private Sub multiThreadGroup_TaskComplete(returnVal As Variant, taskID As String, threadID As String)
Dim rowI As Long, colI As Long
rowI = Right(taskID, Len(taskID) - 4)
If returnVal Is Nothing Then
Cells(rowI, 2) = "Error in loading page"
ElseIf TypeOf returnVal Is HTMLDocument Then
Dim emailMatches() As String
emailMatches = regexMatches(returnVal.body.innerText)
If (Not emailMatches) = -1 Then
'no emails on page
Cells(rowI, 2) = "No e-mail matches"
Else
For colI = LBound(emailMatches) To UBound(emailMatches)
Cells(rowI, colI + 2) = emailMatches(colI)
Next colI
End If
Else 'nothing returned
Cells(rowI, 2) = "Error in loading page"
End If
End Sub
Private Function regexMatches(strInput As String) As String()
Dim rMatch As Object
Dim s As String
Dim arrayMatches() As String
Dim i As Long
With CreateObject("VBScript.Regexp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = REGEX_PATTERN
If .test(strInput) Then
For Each rMatch In .Execute(strInput)
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.value
i = i + 1
Next
End If
End With
regexMatches = arrayMatches
End Function
The test class will create a new multi thread group. The group will open the default 5 threads, on each thread it will create an instance of clsHtmlWorker
. It will convert the range [A1:A10]
into 10 arguments which it will pass, 1 at a time, to the workers on each thread when they aren't busy. Once all tasks are run the class will autoQuit
- cutting references to all sub classes, allowing it to go out of scope
You can download the example workbook if you want, works best with Rubberduck to organise folders. Test code is in the CodeReviewTestRunner
, or just hit the big button
beginner object-oriented multithreading vba asynchronous
edited Mar 22 at 17:50
asked Jan 16 at 12:21
Greedo
255312
255312
I'm wondering why we have comments saying reference tomscorlib.dll
is needed yet I see noCreateObject
or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?
â this
Jan 16 at 12:52
@This 1)mscorlib.dll
I'm using early binding, andPrivate iterableQueue As mscorlib.Queue
inclsMultiThread
is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances ofEXCEL.EXE
. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes sense
â Greedo
Jan 16 at 14:06
I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
â M.Doerner
Jan 16 at 14:52
@M.Doerner Exactly, anInternetExplorer.Application
or anXmlHttp
request (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use ofIWorker
interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approach
â Greedo
Jan 16 at 15:25
1
@Raystafarian I've added a download file in the test area if that helps
â Greedo
Feb 12 at 16:16
 |Â
show 5 more comments
I'm wondering why we have comments saying reference tomscorlib.dll
is needed yet I see noCreateObject
or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?
â this
Jan 16 at 12:52
@This 1)mscorlib.dll
I'm using early binding, andPrivate iterableQueue As mscorlib.Queue
inclsMultiThread
is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances ofEXCEL.EXE
. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes sense
â Greedo
Jan 16 at 14:06
I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
â M.Doerner
Jan 16 at 14:52
@M.Doerner Exactly, anInternetExplorer.Application
or anXmlHttp
request (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use ofIWorker
interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approach
â Greedo
Jan 16 at 15:25
1
@Raystafarian I've added a download file in the test area if that helps
â Greedo
Feb 12 at 16:16
I'm wondering why we have comments saying reference to
mscorlib.dll
is needed yet I see no CreateObject
or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?â this
Jan 16 at 12:52
I'm wondering why we have comments saying reference to
mscorlib.dll
is needed yet I see no CreateObject
or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?â this
Jan 16 at 12:52
@This 1)
mscorlib.dll
I'm using early binding, and Private iterableQueue As mscorlib.Queue
in clsMultiThread
is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances of EXCEL.EXE
. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes senseâ Greedo
Jan 16 at 14:06
@This 1)
mscorlib.dll
I'm using early binding, and Private iterableQueue As mscorlib.Queue
in clsMultiThread
is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances of EXCEL.EXE
. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes senseâ Greedo
Jan 16 at 14:06
I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
â M.Doerner
Jan 16 at 14:52
I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
â M.Doerner
Jan 16 at 14:52
@M.Doerner Exactly, an
InternetExplorer.Application
or an XmlHttp
request (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use of IWorker
interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approachâ Greedo
Jan 16 at 15:25
@M.Doerner Exactly, an
InternetExplorer.Application
or an XmlHttp
request (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use of IWorker
interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approachâ Greedo
Jan 16 at 15:25
1
1
@Raystafarian I've added a download file in the test area if that helps
â Greedo
Feb 12 at 16:16
@Raystafarian I've added a download file in the test area if that helps
â Greedo
Feb 12 at 16:16
 |Â
show 5 more comments
2 Answers
2
active
oldest
votes
up vote
10
down vote
accepted
Interesting idea and well-done!
Naming
I really don't like the names. The names like clsMultiThread
is somewhat misleading, since as you noted they don't actually provide any true multi-threading. A unwary user would expect it to work with anything and would be disappointed when all of their queued works painfully completes synchronously. ;)
Also, we are not really using threads, but objects which may or may not run in-process or not. You used MSXML2.XMLHTTP60
so it should run in-process. However, it need not be the case if we were to use something like ShDocVw.WebBrowser
or even Excel.Application
which may run out-of-process. That's not threading. So in that case we are actually talking about running things asynchronously more than threading.
May I instead suggest names like ParallelTaskCoordinator
, TaskHandle
, and AsyncObjectWatcher
? The point being that it should convey the notion that those objects have nothing to do with running asynchronously; we're just orchestrating the asynchronous tasks to run in parallel.
In the comments you asked about hungarian notations. My personal preference is to not use hungarian notation for objects, so I wouldn't have used cls
prefix. The concerns about namespace is not going to be helped by whether you prefix or not, mainly because a good namespace is about logical grouping, as opposed to clumping them together based on their type of modules. I'm OK with using HN for private variables, not so much with public-facing properties such as the module's name, the public property since they just distract from their semantic meaning. The semantic part of the name is far more important, and a good naming convention should support that.
On general, your naming scheme seems reasonably well thought out. An inconsistency I do see is that in Execute
method, you have delegate
as an argument name, which then get assigned to the eventHandle
. Why not call the argument the same way, so it's clear when writing the Execute
method what it should be?
Not fully asynchronous, may be blocked by main thread
On its own, it works. However, I do want to call your attention to the fact that it won't work in more complicated scenarios. Here's a example where this breaks:
Executed in immediate windows:
call runtest: for i = 0 to 100 : debug.Print i: doevents: next
In theory, the output of the Debug.Print i
should be interleaved with other tasks progress. In practice, I'm consistently getting 5 tasks completed, then 100s of the outputted i
, then the other 5 tasks completed. This tells me that if the main thread is allow to do things while the tasks in other threads are running, those will be blocked until the main thread becomes idle. Thus it is easy to destroy the asynchrony if one is not careful.
As you can see, even sprinkling DoEvents
is apparently not sufficient. Thinking about it, it's not that surprising because the "events" come from the same VBA code which must run in the UI thread. Therefore, if this becomes a problem, you must fully delegate the asynchronously to an external process/thread (e.g. via an external library) in which you can communicate progress to VBA code via events, rather than relying on the external object to raise event into the worker, which must raise event to the thread handle then to the manager.
Note that even if you went this route, you would need to have a retry logic to successfully raise the event because running VBA code may prevent the entrancy. There is an opportunity for an event to be missed, especially if there is high rate of activity and too much inputs. That has been observed before here, here, here. Those examples illustrates the case where something doesn't always fire the way we expect it to.
As suggested in the comments, changing the # of max threads may work better. For example, setting the max threads to 2, we see the the task 1 and task 2 completing before blocking to print the i
for 90 times, then task 3 is allowed to start then complete before we block again for last 10 more times and the rest complete. However, this is very consistent on my system. In 3 times I ran the test, task 3 don't start until 90 has printed, and the rest don't start until 100 has printed. On the other hand, if I set maxthreads to 1, I get one task completed, then am blocked for 100 times before the rest of tasks are allowed to run. I do not expect it to be easily reproducible since this will be affected by host of factors (the hardware, the version of Windows, the version of Excel). This is simply something to be aware and explicitly design for the possibility.
If it's critical to you that you do not get blocked, you need to consider a different approach. For instance, you'd need an external .NET library that creates a thread, run the task, then write the output to a file. That allows the main thread to read it at its leisure and none of the spawned threads will accidentally get blocked when the main thread needs to do something. Even so, it is still subject to the fact that it can be blocked when trying to spawn a new thread (since you need VBA to run code to create one, even if it's just calling a external function in a DLL).
Note: in all of my tests, only Debug.Print
printed were the Task N started
and Task N completed
and Events terminated
. I had commented out the PrintState
; otherwise, the immediate window would overflow and I can't see all the output from the start to end.
WinHttp instead of MSXML
Also, I want bring your attention to the fact that you could have had a instance of WinHttp.WinHttpRequest
which supports events natively. Therefore you could declare a variable like Private WithEvents request As WinHttp.WinHttpRequest
and instead listen for the event. That means you don't need to set up default member as you have to with the MSXML
and if the internet request is all you do, you don't even need the thread collection; just have a collection of WinHttp.WinHttpRequest
and listen to their events.
But that's obviously not a generic solution, and using WinHttp.WinHttpRequest
doesn't preclude us from using it with the above solution for case where we want a worker to handle the setup.
Don't recurse if you don't have to
You have a doInstructions
which calls itself recursively. But IMPOV, there is no reason to recurse. You can do the same thing with a simple loop as demonstrated with my hacky change. A more proper solution may be to use multiple conditions or a flag variable on the bottom of the loop (guaranteeing that it executes at least once). That ensure you never need to worry about stack overflow.
Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction
Do
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID
'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy
BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If
Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select
'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
'doInstructions loopcount:=loopcount + 1 'watch for infinite loop
freeThreadID = vbNullString
loopcount = loopcount + 1
Else
Exit Do
End If
Loop
End Sub
Only one worker type per manager class
In the sample, we have this:
Set .AsyncClass = New clsHtmlWorker
I was under the impression that the point of the manager was to allow us to create threads for different workers each running asynchronously. For that reason, it seems strange that I can only use AsyncClass
to set a single implementation of IWorker
. Shouldn't I be able to assign a bunch of workers, with their own arguments to be then enlisted? That would be more intuitive use of the manager, I think. Something like this:
Set .AsyncObjectsToExecute = Array( _
WorkerType1Factory.Create("some argument one"), _
WorkerType1Factory.Create("another argument"), _
WorkerType2Factory.Create("do that", 123, 495), _
WorkerType2Factory.Create("but not that", 0, 0), _
WorkerType3Factory.Create() _
)
From that, we can obviously see what tasks we are setting up to be executed, by passing in IWorker
factories. That would not constraint us to only one particular worker class, and it helps sees what arguments we are sending for each task.
Encapsulate your private fields as a type
I would suggest that you take a page out of @MathieuGuindon's book and use his method:
```'THREAD HANDLE BASE PROPERTIES
Private Type THandle
eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
taskID As String 'holds the id of the current task
handleID As String 'holds the id of this handle
handleArgs As Variant 'holds any arguments that need to be passed to the task
End Type
Private this As THandle
That gives you intellisense of the class' fields just by using this.
and thus help clearly differentiating between the class' private backing fields and exposed properties. It's one way to help wrong code look wrong.
RaiseEvents in the Initialize event
This seems to me possibly troublesome:
Private Sub Class_Initialize()
RaiseEvent Opened(startTime)
End Sub
My personal rule for any work inside Initialize
and Terminate
events is to keep the code totally isolated to the class. It is generally not appropriate to reach out outside the class because it hasn't finished constructing itself and as such, the results may be unpredictable. In this case the effect may be benign because we only raise an event with a reported parameter. However, in more complicated implementation, that can act up in unexpected manners. Besides, it really doesn't buy us anything because if the Initialize
was called, we already know about it, since we usually have to had initialized it in the first place.
Do you really need a queue
It's cool that you're using .NET queue to help set up the arguments. However, that means you have a extra reference and you already have a number of references, which makes it hard to distribute the code to different environment.
One way would be to late-bind the queue, by declaring it as object and doing a CreateObject("System.Collections.Queue")
instead. That lets you avoid the need to add an explicit reference to the .NET core library and thus be agnostic about the .NET framework version since in this case, the class won't change between versions.
The alternative is to just use a built-in VBA collection. IINM, you only use queues to collect the arguments, which a VBA colection can do just as well. This would give you queue-like behavior:
col.Add ...
col.Add ...
col.Add ...
Do Until col.Count = 0
col.Remove 1
Loop
without any external references.
Don't use default naming from interface
As @Raystafarian alluded to, you shouldn't accept the default RHS
naming when you implement an interface. I hate it personally, and always change the name. The implementation of interfaces doesn't care what name you use. The only thing it care about is that there's a procedure named something (check), and it has N number of arguments (check), and that each argument are of same data type (check). It doesn't even look at the names of the arguments. So you should change them to something more sensible. If you're feeling unimaginative, call them Value
, which is what I usually do. Just don't leave it as RHS.
A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (exceptRHS
)? I mostly prefix withcls
in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
â Greedo
Mar 29 at 9:08
Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of usingMSXML2.XMLHTTP60
) - they completely occupy the UI thread so no printing. They raisecomplete
events, then we get into theFor
loop, and I get a 0 printed - then the next 5 events fromDoEvents
, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
â Greedo
Mar 29 at 9:16
Try commenting out the content ofPrintState
and changingmaxThreads
to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
â Greedo
Mar 29 at 9:20
@Greedo I have addressed your questions and edited my answer accordingly.
â this
Mar 29 at 10:49
1
This is a great answer
â Raystafarian
Mar 30 at 11:36
add a comment |Â
up vote
6
down vote
This is way above my expertise, but maybe adding an answer would cause more views/answers? Also, what's that beginner tag doing there? ;)
I want to say first off, really solid work. That's probably why there hasn't been too much activity here. Maybe some of this will seem like nit-picks and if so, Sorry!
ByRef or ByVal arguments
Every one of the arguments, as you probably know, that aren't declared ByVal
are implicitly ByRef
. I imagine you probably need a lot of these to be ByRef
, but it's better to explicitly declare them ByRef
so it's easier to tell that it's supposed to be ByRef
.
obj parameter
clsHandleEvents.Complete(obj as clsThreadHandle)
clsHandleEvents.NotifyComplete(obj as clsThreadHandle)
clsMultiThread.threadEvents_Complete(obj as clsThreadHandle)
multiThreadMethods.isIterable(obj as Variant)
The first three could probably use a new name given they are all the same type. Maybe threadObject
or threadHandle
- up to you.
The fourth one is taking a variant and returning a boolean, it could be named anything from testObject
to iteratorGroup
. You might want to try renaming some of those
Queue
clsMultiThread
Private iterableQueue As mscorlib.Queue
Set iterableQueue = New Queue
multiThreadMethods
Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue)
Maybe I'm dense, being a VBA guy, but how would I know what a queue
is and what methods it uses? It's not a standard reference library for VBA, is it? I'd probably add a comment in there explaining why you've chosen to do it this way, so nobody would have to go back through all of it to figure out why it's better than some other way.
RHS?
clsHtmlWorker
Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property
What is RHS? Is it a constant, it's in all caps. I can tell you know better using Html
in your project rather than HTML
- so what is it?
All that being said, I can't even really figure out how this is working when I step through it, I see where requests should be sent (httpRequest), but I can't figure out how they come back and populate the sheet, I don't see it happening, which I think is the point, being asynchronous?
Thanks alot for this - some good points here I hadn't considered. I put theBeginner
tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! ThoseRHS
s were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so muchdebug.print
, as you can't step through the operational code. Good point onQueue
. Fingers crossed I can get some more good answers like this on some of the other bits of the code
â Greedo
Feb 15 at 19:39
I like thedebugging = true
part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
â Raystafarian
Feb 15 at 21:28
RHS = Right Hand Side?
â Solomon Ucko
Mar 24 at 1:56
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
10
down vote
accepted
Interesting idea and well-done!
Naming
I really don't like the names. The names like clsMultiThread
is somewhat misleading, since as you noted they don't actually provide any true multi-threading. A unwary user would expect it to work with anything and would be disappointed when all of their queued works painfully completes synchronously. ;)
Also, we are not really using threads, but objects which may or may not run in-process or not. You used MSXML2.XMLHTTP60
so it should run in-process. However, it need not be the case if we were to use something like ShDocVw.WebBrowser
or even Excel.Application
which may run out-of-process. That's not threading. So in that case we are actually talking about running things asynchronously more than threading.
May I instead suggest names like ParallelTaskCoordinator
, TaskHandle
, and AsyncObjectWatcher
? The point being that it should convey the notion that those objects have nothing to do with running asynchronously; we're just orchestrating the asynchronous tasks to run in parallel.
In the comments you asked about hungarian notations. My personal preference is to not use hungarian notation for objects, so I wouldn't have used cls
prefix. The concerns about namespace is not going to be helped by whether you prefix or not, mainly because a good namespace is about logical grouping, as opposed to clumping them together based on their type of modules. I'm OK with using HN for private variables, not so much with public-facing properties such as the module's name, the public property since they just distract from their semantic meaning. The semantic part of the name is far more important, and a good naming convention should support that.
On general, your naming scheme seems reasonably well thought out. An inconsistency I do see is that in Execute
method, you have delegate
as an argument name, which then get assigned to the eventHandle
. Why not call the argument the same way, so it's clear when writing the Execute
method what it should be?
Not fully asynchronous, may be blocked by main thread
On its own, it works. However, I do want to call your attention to the fact that it won't work in more complicated scenarios. Here's a example where this breaks:
Executed in immediate windows:
call runtest: for i = 0 to 100 : debug.Print i: doevents: next
In theory, the output of the Debug.Print i
should be interleaved with other tasks progress. In practice, I'm consistently getting 5 tasks completed, then 100s of the outputted i
, then the other 5 tasks completed. This tells me that if the main thread is allow to do things while the tasks in other threads are running, those will be blocked until the main thread becomes idle. Thus it is easy to destroy the asynchrony if one is not careful.
As you can see, even sprinkling DoEvents
is apparently not sufficient. Thinking about it, it's not that surprising because the "events" come from the same VBA code which must run in the UI thread. Therefore, if this becomes a problem, you must fully delegate the asynchronously to an external process/thread (e.g. via an external library) in which you can communicate progress to VBA code via events, rather than relying on the external object to raise event into the worker, which must raise event to the thread handle then to the manager.
Note that even if you went this route, you would need to have a retry logic to successfully raise the event because running VBA code may prevent the entrancy. There is an opportunity for an event to be missed, especially if there is high rate of activity and too much inputs. That has been observed before here, here, here. Those examples illustrates the case where something doesn't always fire the way we expect it to.
As suggested in the comments, changing the # of max threads may work better. For example, setting the max threads to 2, we see the the task 1 and task 2 completing before blocking to print the i
for 90 times, then task 3 is allowed to start then complete before we block again for last 10 more times and the rest complete. However, this is very consistent on my system. In 3 times I ran the test, task 3 don't start until 90 has printed, and the rest don't start until 100 has printed. On the other hand, if I set maxthreads to 1, I get one task completed, then am blocked for 100 times before the rest of tasks are allowed to run. I do not expect it to be easily reproducible since this will be affected by host of factors (the hardware, the version of Windows, the version of Excel). This is simply something to be aware and explicitly design for the possibility.
If it's critical to you that you do not get blocked, you need to consider a different approach. For instance, you'd need an external .NET library that creates a thread, run the task, then write the output to a file. That allows the main thread to read it at its leisure and none of the spawned threads will accidentally get blocked when the main thread needs to do something. Even so, it is still subject to the fact that it can be blocked when trying to spawn a new thread (since you need VBA to run code to create one, even if it's just calling a external function in a DLL).
Note: in all of my tests, only Debug.Print
printed were the Task N started
and Task N completed
and Events terminated
. I had commented out the PrintState
; otherwise, the immediate window would overflow and I can't see all the output from the start to end.
WinHttp instead of MSXML
Also, I want bring your attention to the fact that you could have had a instance of WinHttp.WinHttpRequest
which supports events natively. Therefore you could declare a variable like Private WithEvents request As WinHttp.WinHttpRequest
and instead listen for the event. That means you don't need to set up default member as you have to with the MSXML
and if the internet request is all you do, you don't even need the thread collection; just have a collection of WinHttp.WinHttpRequest
and listen to their events.
But that's obviously not a generic solution, and using WinHttp.WinHttpRequest
doesn't preclude us from using it with the above solution for case where we want a worker to handle the setup.
Don't recurse if you don't have to
You have a doInstructions
which calls itself recursively. But IMPOV, there is no reason to recurse. You can do the same thing with a simple loop as demonstrated with my hacky change. A more proper solution may be to use multiple conditions or a flag variable on the bottom of the loop (guaranteeing that it executes at least once). That ensure you never need to worry about stack overflow.
Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction
Do
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID
'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy
BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If
Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select
'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
'doInstructions loopcount:=loopcount + 1 'watch for infinite loop
freeThreadID = vbNullString
loopcount = loopcount + 1
Else
Exit Do
End If
Loop
End Sub
Only one worker type per manager class
In the sample, we have this:
Set .AsyncClass = New clsHtmlWorker
I was under the impression that the point of the manager was to allow us to create threads for different workers each running asynchronously. For that reason, it seems strange that I can only use AsyncClass
to set a single implementation of IWorker
. Shouldn't I be able to assign a bunch of workers, with their own arguments to be then enlisted? That would be more intuitive use of the manager, I think. Something like this:
Set .AsyncObjectsToExecute = Array( _
WorkerType1Factory.Create("some argument one"), _
WorkerType1Factory.Create("another argument"), _
WorkerType2Factory.Create("do that", 123, 495), _
WorkerType2Factory.Create("but not that", 0, 0), _
WorkerType3Factory.Create() _
)
From that, we can obviously see what tasks we are setting up to be executed, by passing in IWorker
factories. That would not constraint us to only one particular worker class, and it helps sees what arguments we are sending for each task.
Encapsulate your private fields as a type
I would suggest that you take a page out of @MathieuGuindon's book and use his method:
```'THREAD HANDLE BASE PROPERTIES
Private Type THandle
eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
taskID As String 'holds the id of the current task
handleID As String 'holds the id of this handle
handleArgs As Variant 'holds any arguments that need to be passed to the task
End Type
Private this As THandle
That gives you intellisense of the class' fields just by using this.
and thus help clearly differentiating between the class' private backing fields and exposed properties. It's one way to help wrong code look wrong.
RaiseEvents in the Initialize event
This seems to me possibly troublesome:
Private Sub Class_Initialize()
RaiseEvent Opened(startTime)
End Sub
My personal rule for any work inside Initialize
and Terminate
events is to keep the code totally isolated to the class. It is generally not appropriate to reach out outside the class because it hasn't finished constructing itself and as such, the results may be unpredictable. In this case the effect may be benign because we only raise an event with a reported parameter. However, in more complicated implementation, that can act up in unexpected manners. Besides, it really doesn't buy us anything because if the Initialize
was called, we already know about it, since we usually have to had initialized it in the first place.
Do you really need a queue
It's cool that you're using .NET queue to help set up the arguments. However, that means you have a extra reference and you already have a number of references, which makes it hard to distribute the code to different environment.
One way would be to late-bind the queue, by declaring it as object and doing a CreateObject("System.Collections.Queue")
instead. That lets you avoid the need to add an explicit reference to the .NET core library and thus be agnostic about the .NET framework version since in this case, the class won't change between versions.
The alternative is to just use a built-in VBA collection. IINM, you only use queues to collect the arguments, which a VBA colection can do just as well. This would give you queue-like behavior:
col.Add ...
col.Add ...
col.Add ...
Do Until col.Count = 0
col.Remove 1
Loop
without any external references.
Don't use default naming from interface
As @Raystafarian alluded to, you shouldn't accept the default RHS
naming when you implement an interface. I hate it personally, and always change the name. The implementation of interfaces doesn't care what name you use. The only thing it care about is that there's a procedure named something (check), and it has N number of arguments (check), and that each argument are of same data type (check). It doesn't even look at the names of the arguments. So you should change them to something more sensible. If you're feeling unimaginative, call them Value
, which is what I usually do. Just don't leave it as RHS.
A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (exceptRHS
)? I mostly prefix withcls
in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
â Greedo
Mar 29 at 9:08
Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of usingMSXML2.XMLHTTP60
) - they completely occupy the UI thread so no printing. They raisecomplete
events, then we get into theFor
loop, and I get a 0 printed - then the next 5 events fromDoEvents
, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
â Greedo
Mar 29 at 9:16
Try commenting out the content ofPrintState
and changingmaxThreads
to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
â Greedo
Mar 29 at 9:20
@Greedo I have addressed your questions and edited my answer accordingly.
â this
Mar 29 at 10:49
1
This is a great answer
â Raystafarian
Mar 30 at 11:36
add a comment |Â
up vote
10
down vote
accepted
Interesting idea and well-done!
Naming
I really don't like the names. The names like clsMultiThread
is somewhat misleading, since as you noted they don't actually provide any true multi-threading. A unwary user would expect it to work with anything and would be disappointed when all of their queued works painfully completes synchronously. ;)
Also, we are not really using threads, but objects which may or may not run in-process or not. You used MSXML2.XMLHTTP60
so it should run in-process. However, it need not be the case if we were to use something like ShDocVw.WebBrowser
or even Excel.Application
which may run out-of-process. That's not threading. So in that case we are actually talking about running things asynchronously more than threading.
May I instead suggest names like ParallelTaskCoordinator
, TaskHandle
, and AsyncObjectWatcher
? The point being that it should convey the notion that those objects have nothing to do with running asynchronously; we're just orchestrating the asynchronous tasks to run in parallel.
In the comments you asked about hungarian notations. My personal preference is to not use hungarian notation for objects, so I wouldn't have used cls
prefix. The concerns about namespace is not going to be helped by whether you prefix or not, mainly because a good namespace is about logical grouping, as opposed to clumping them together based on their type of modules. I'm OK with using HN for private variables, not so much with public-facing properties such as the module's name, the public property since they just distract from their semantic meaning. The semantic part of the name is far more important, and a good naming convention should support that.
On general, your naming scheme seems reasonably well thought out. An inconsistency I do see is that in Execute
method, you have delegate
as an argument name, which then get assigned to the eventHandle
. Why not call the argument the same way, so it's clear when writing the Execute
method what it should be?
Not fully asynchronous, may be blocked by main thread
On its own, it works. However, I do want to call your attention to the fact that it won't work in more complicated scenarios. Here's a example where this breaks:
Executed in immediate windows:
call runtest: for i = 0 to 100 : debug.Print i: doevents: next
In theory, the output of the Debug.Print i
should be interleaved with other tasks progress. In practice, I'm consistently getting 5 tasks completed, then 100s of the outputted i
, then the other 5 tasks completed. This tells me that if the main thread is allow to do things while the tasks in other threads are running, those will be blocked until the main thread becomes idle. Thus it is easy to destroy the asynchrony if one is not careful.
As you can see, even sprinkling DoEvents
is apparently not sufficient. Thinking about it, it's not that surprising because the "events" come from the same VBA code which must run in the UI thread. Therefore, if this becomes a problem, you must fully delegate the asynchronously to an external process/thread (e.g. via an external library) in which you can communicate progress to VBA code via events, rather than relying on the external object to raise event into the worker, which must raise event to the thread handle then to the manager.
Note that even if you went this route, you would need to have a retry logic to successfully raise the event because running VBA code may prevent the entrancy. There is an opportunity for an event to be missed, especially if there is high rate of activity and too much inputs. That has been observed before here, here, here. Those examples illustrates the case where something doesn't always fire the way we expect it to.
As suggested in the comments, changing the # of max threads may work better. For example, setting the max threads to 2, we see the the task 1 and task 2 completing before blocking to print the i
for 90 times, then task 3 is allowed to start then complete before we block again for last 10 more times and the rest complete. However, this is very consistent on my system. In 3 times I ran the test, task 3 don't start until 90 has printed, and the rest don't start until 100 has printed. On the other hand, if I set maxthreads to 1, I get one task completed, then am blocked for 100 times before the rest of tasks are allowed to run. I do not expect it to be easily reproducible since this will be affected by host of factors (the hardware, the version of Windows, the version of Excel). This is simply something to be aware and explicitly design for the possibility.
If it's critical to you that you do not get blocked, you need to consider a different approach. For instance, you'd need an external .NET library that creates a thread, run the task, then write the output to a file. That allows the main thread to read it at its leisure and none of the spawned threads will accidentally get blocked when the main thread needs to do something. Even so, it is still subject to the fact that it can be blocked when trying to spawn a new thread (since you need VBA to run code to create one, even if it's just calling a external function in a DLL).
Note: in all of my tests, only Debug.Print
printed were the Task N started
and Task N completed
and Events terminated
. I had commented out the PrintState
; otherwise, the immediate window would overflow and I can't see all the output from the start to end.
WinHttp instead of MSXML
Also, I want bring your attention to the fact that you could have had a instance of WinHttp.WinHttpRequest
which supports events natively. Therefore you could declare a variable like Private WithEvents request As WinHttp.WinHttpRequest
and instead listen for the event. That means you don't need to set up default member as you have to with the MSXML
and if the internet request is all you do, you don't even need the thread collection; just have a collection of WinHttp.WinHttpRequest
and listen to their events.
But that's obviously not a generic solution, and using WinHttp.WinHttpRequest
doesn't preclude us from using it with the above solution for case where we want a worker to handle the setup.
Don't recurse if you don't have to
You have a doInstructions
which calls itself recursively. But IMPOV, there is no reason to recurse. You can do the same thing with a simple loop as demonstrated with my hacky change. A more proper solution may be to use multiple conditions or a flag variable on the bottom of the loop (guaranteeing that it executes at least once). That ensure you never need to worry about stack overflow.
Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction
Do
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID
'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy
BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If
Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select
'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
'doInstructions loopcount:=loopcount + 1 'watch for infinite loop
freeThreadID = vbNullString
loopcount = loopcount + 1
Else
Exit Do
End If
Loop
End Sub
Only one worker type per manager class
In the sample, we have this:
Set .AsyncClass = New clsHtmlWorker
I was under the impression that the point of the manager was to allow us to create threads for different workers each running asynchronously. For that reason, it seems strange that I can only use AsyncClass
to set a single implementation of IWorker
. Shouldn't I be able to assign a bunch of workers, with their own arguments to be then enlisted? That would be more intuitive use of the manager, I think. Something like this:
Set .AsyncObjectsToExecute = Array( _
WorkerType1Factory.Create("some argument one"), _
WorkerType1Factory.Create("another argument"), _
WorkerType2Factory.Create("do that", 123, 495), _
WorkerType2Factory.Create("but not that", 0, 0), _
WorkerType3Factory.Create() _
)
From that, we can obviously see what tasks we are setting up to be executed, by passing in IWorker
factories. That would not constraint us to only one particular worker class, and it helps sees what arguments we are sending for each task.
Encapsulate your private fields as a type
I would suggest that you take a page out of @MathieuGuindon's book and use his method:
```'THREAD HANDLE BASE PROPERTIES
Private Type THandle
eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
taskID As String 'holds the id of the current task
handleID As String 'holds the id of this handle
handleArgs As Variant 'holds any arguments that need to be passed to the task
End Type
Private this As THandle
That gives you intellisense of the class' fields just by using this.
and thus help clearly differentiating between the class' private backing fields and exposed properties. It's one way to help wrong code look wrong.
RaiseEvents in the Initialize event
This seems to me possibly troublesome:
Private Sub Class_Initialize()
RaiseEvent Opened(startTime)
End Sub
My personal rule for any work inside Initialize
and Terminate
events is to keep the code totally isolated to the class. It is generally not appropriate to reach out outside the class because it hasn't finished constructing itself and as such, the results may be unpredictable. In this case the effect may be benign because we only raise an event with a reported parameter. However, in more complicated implementation, that can act up in unexpected manners. Besides, it really doesn't buy us anything because if the Initialize
was called, we already know about it, since we usually have to had initialized it in the first place.
Do you really need a queue
It's cool that you're using .NET queue to help set up the arguments. However, that means you have a extra reference and you already have a number of references, which makes it hard to distribute the code to different environment.
One way would be to late-bind the queue, by declaring it as object and doing a CreateObject("System.Collections.Queue")
instead. That lets you avoid the need to add an explicit reference to the .NET core library and thus be agnostic about the .NET framework version since in this case, the class won't change between versions.
The alternative is to just use a built-in VBA collection. IINM, you only use queues to collect the arguments, which a VBA colection can do just as well. This would give you queue-like behavior:
col.Add ...
col.Add ...
col.Add ...
Do Until col.Count = 0
col.Remove 1
Loop
without any external references.
Don't use default naming from interface
As @Raystafarian alluded to, you shouldn't accept the default RHS
naming when you implement an interface. I hate it personally, and always change the name. The implementation of interfaces doesn't care what name you use. The only thing it care about is that there's a procedure named something (check), and it has N number of arguments (check), and that each argument are of same data type (check). It doesn't even look at the names of the arguments. So you should change them to something more sensible. If you're feeling unimaginative, call them Value
, which is what I usually do. Just don't leave it as RHS.
A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (exceptRHS
)? I mostly prefix withcls
in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
â Greedo
Mar 29 at 9:08
Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of usingMSXML2.XMLHTTP60
) - they completely occupy the UI thread so no printing. They raisecomplete
events, then we get into theFor
loop, and I get a 0 printed - then the next 5 events fromDoEvents
, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
â Greedo
Mar 29 at 9:16
Try commenting out the content ofPrintState
and changingmaxThreads
to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
â Greedo
Mar 29 at 9:20
@Greedo I have addressed your questions and edited my answer accordingly.
â this
Mar 29 at 10:49
1
This is a great answer
â Raystafarian
Mar 30 at 11:36
add a comment |Â
up vote
10
down vote
accepted
up vote
10
down vote
accepted
Interesting idea and well-done!
Naming
I really don't like the names. The names like clsMultiThread
is somewhat misleading, since as you noted they don't actually provide any true multi-threading. A unwary user would expect it to work with anything and would be disappointed when all of their queued works painfully completes synchronously. ;)
Also, we are not really using threads, but objects which may or may not run in-process or not. You used MSXML2.XMLHTTP60
so it should run in-process. However, it need not be the case if we were to use something like ShDocVw.WebBrowser
or even Excel.Application
which may run out-of-process. That's not threading. So in that case we are actually talking about running things asynchronously more than threading.
May I instead suggest names like ParallelTaskCoordinator
, TaskHandle
, and AsyncObjectWatcher
? The point being that it should convey the notion that those objects have nothing to do with running asynchronously; we're just orchestrating the asynchronous tasks to run in parallel.
In the comments you asked about hungarian notations. My personal preference is to not use hungarian notation for objects, so I wouldn't have used cls
prefix. The concerns about namespace is not going to be helped by whether you prefix or not, mainly because a good namespace is about logical grouping, as opposed to clumping them together based on their type of modules. I'm OK with using HN for private variables, not so much with public-facing properties such as the module's name, the public property since they just distract from their semantic meaning. The semantic part of the name is far more important, and a good naming convention should support that.
On general, your naming scheme seems reasonably well thought out. An inconsistency I do see is that in Execute
method, you have delegate
as an argument name, which then get assigned to the eventHandle
. Why not call the argument the same way, so it's clear when writing the Execute
method what it should be?
Not fully asynchronous, may be blocked by main thread
On its own, it works. However, I do want to call your attention to the fact that it won't work in more complicated scenarios. Here's a example where this breaks:
Executed in immediate windows:
call runtest: for i = 0 to 100 : debug.Print i: doevents: next
In theory, the output of the Debug.Print i
should be interleaved with other tasks progress. In practice, I'm consistently getting 5 tasks completed, then 100s of the outputted i
, then the other 5 tasks completed. This tells me that if the main thread is allow to do things while the tasks in other threads are running, those will be blocked until the main thread becomes idle. Thus it is easy to destroy the asynchrony if one is not careful.
As you can see, even sprinkling DoEvents
is apparently not sufficient. Thinking about it, it's not that surprising because the "events" come from the same VBA code which must run in the UI thread. Therefore, if this becomes a problem, you must fully delegate the asynchronously to an external process/thread (e.g. via an external library) in which you can communicate progress to VBA code via events, rather than relying on the external object to raise event into the worker, which must raise event to the thread handle then to the manager.
Note that even if you went this route, you would need to have a retry logic to successfully raise the event because running VBA code may prevent the entrancy. There is an opportunity for an event to be missed, especially if there is high rate of activity and too much inputs. That has been observed before here, here, here. Those examples illustrates the case where something doesn't always fire the way we expect it to.
As suggested in the comments, changing the # of max threads may work better. For example, setting the max threads to 2, we see the the task 1 and task 2 completing before blocking to print the i
for 90 times, then task 3 is allowed to start then complete before we block again for last 10 more times and the rest complete. However, this is very consistent on my system. In 3 times I ran the test, task 3 don't start until 90 has printed, and the rest don't start until 100 has printed. On the other hand, if I set maxthreads to 1, I get one task completed, then am blocked for 100 times before the rest of tasks are allowed to run. I do not expect it to be easily reproducible since this will be affected by host of factors (the hardware, the version of Windows, the version of Excel). This is simply something to be aware and explicitly design for the possibility.
If it's critical to you that you do not get blocked, you need to consider a different approach. For instance, you'd need an external .NET library that creates a thread, run the task, then write the output to a file. That allows the main thread to read it at its leisure and none of the spawned threads will accidentally get blocked when the main thread needs to do something. Even so, it is still subject to the fact that it can be blocked when trying to spawn a new thread (since you need VBA to run code to create one, even if it's just calling a external function in a DLL).
Note: in all of my tests, only Debug.Print
printed were the Task N started
and Task N completed
and Events terminated
. I had commented out the PrintState
; otherwise, the immediate window would overflow and I can't see all the output from the start to end.
WinHttp instead of MSXML
Also, I want bring your attention to the fact that you could have had a instance of WinHttp.WinHttpRequest
which supports events natively. Therefore you could declare a variable like Private WithEvents request As WinHttp.WinHttpRequest
and instead listen for the event. That means you don't need to set up default member as you have to with the MSXML
and if the internet request is all you do, you don't even need the thread collection; just have a collection of WinHttp.WinHttpRequest
and listen to their events.
But that's obviously not a generic solution, and using WinHttp.WinHttpRequest
doesn't preclude us from using it with the above solution for case where we want a worker to handle the setup.
Don't recurse if you don't have to
You have a doInstructions
which calls itself recursively. But IMPOV, there is no reason to recurse. You can do the same thing with a simple loop as demonstrated with my hacky change. A more proper solution may be to use multiple conditions or a flag variable on the bottom of the loop (guaranteeing that it executes at least once). That ensure you never need to worry about stack overflow.
Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction
Do
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID
'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy
BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If
Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select
'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
'doInstructions loopcount:=loopcount + 1 'watch for infinite loop
freeThreadID = vbNullString
loopcount = loopcount + 1
Else
Exit Do
End If
Loop
End Sub
Only one worker type per manager class
In the sample, we have this:
Set .AsyncClass = New clsHtmlWorker
I was under the impression that the point of the manager was to allow us to create threads for different workers each running asynchronously. For that reason, it seems strange that I can only use AsyncClass
to set a single implementation of IWorker
. Shouldn't I be able to assign a bunch of workers, with their own arguments to be then enlisted? That would be more intuitive use of the manager, I think. Something like this:
Set .AsyncObjectsToExecute = Array( _
WorkerType1Factory.Create("some argument one"), _
WorkerType1Factory.Create("another argument"), _
WorkerType2Factory.Create("do that", 123, 495), _
WorkerType2Factory.Create("but not that", 0, 0), _
WorkerType3Factory.Create() _
)
From that, we can obviously see what tasks we are setting up to be executed, by passing in IWorker
factories. That would not constraint us to only one particular worker class, and it helps sees what arguments we are sending for each task.
Encapsulate your private fields as a type
I would suggest that you take a page out of @MathieuGuindon's book and use his method:
```'THREAD HANDLE BASE PROPERTIES
Private Type THandle
eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
taskID As String 'holds the id of the current task
handleID As String 'holds the id of this handle
handleArgs As Variant 'holds any arguments that need to be passed to the task
End Type
Private this As THandle
That gives you intellisense of the class' fields just by using this.
and thus help clearly differentiating between the class' private backing fields and exposed properties. It's one way to help wrong code look wrong.
RaiseEvents in the Initialize event
This seems to me possibly troublesome:
Private Sub Class_Initialize()
RaiseEvent Opened(startTime)
End Sub
My personal rule for any work inside Initialize
and Terminate
events is to keep the code totally isolated to the class. It is generally not appropriate to reach out outside the class because it hasn't finished constructing itself and as such, the results may be unpredictable. In this case the effect may be benign because we only raise an event with a reported parameter. However, in more complicated implementation, that can act up in unexpected manners. Besides, it really doesn't buy us anything because if the Initialize
was called, we already know about it, since we usually have to had initialized it in the first place.
Do you really need a queue
It's cool that you're using .NET queue to help set up the arguments. However, that means you have a extra reference and you already have a number of references, which makes it hard to distribute the code to different environment.
One way would be to late-bind the queue, by declaring it as object and doing a CreateObject("System.Collections.Queue")
instead. That lets you avoid the need to add an explicit reference to the .NET core library and thus be agnostic about the .NET framework version since in this case, the class won't change between versions.
The alternative is to just use a built-in VBA collection. IINM, you only use queues to collect the arguments, which a VBA colection can do just as well. This would give you queue-like behavior:
col.Add ...
col.Add ...
col.Add ...
Do Until col.Count = 0
col.Remove 1
Loop
without any external references.
Don't use default naming from interface
As @Raystafarian alluded to, you shouldn't accept the default RHS
naming when you implement an interface. I hate it personally, and always change the name. The implementation of interfaces doesn't care what name you use. The only thing it care about is that there's a procedure named something (check), and it has N number of arguments (check), and that each argument are of same data type (check). It doesn't even look at the names of the arguments. So you should change them to something more sensible. If you're feeling unimaginative, call them Value
, which is what I usually do. Just don't leave it as RHS.
Interesting idea and well-done!
Naming
I really don't like the names. The names like clsMultiThread
is somewhat misleading, since as you noted they don't actually provide any true multi-threading. A unwary user would expect it to work with anything and would be disappointed when all of their queued works painfully completes synchronously. ;)
Also, we are not really using threads, but objects which may or may not run in-process or not. You used MSXML2.XMLHTTP60
so it should run in-process. However, it need not be the case if we were to use something like ShDocVw.WebBrowser
or even Excel.Application
which may run out-of-process. That's not threading. So in that case we are actually talking about running things asynchronously more than threading.
May I instead suggest names like ParallelTaskCoordinator
, TaskHandle
, and AsyncObjectWatcher
? The point being that it should convey the notion that those objects have nothing to do with running asynchronously; we're just orchestrating the asynchronous tasks to run in parallel.
In the comments you asked about hungarian notations. My personal preference is to not use hungarian notation for objects, so I wouldn't have used cls
prefix. The concerns about namespace is not going to be helped by whether you prefix or not, mainly because a good namespace is about logical grouping, as opposed to clumping them together based on their type of modules. I'm OK with using HN for private variables, not so much with public-facing properties such as the module's name, the public property since they just distract from their semantic meaning. The semantic part of the name is far more important, and a good naming convention should support that.
On general, your naming scheme seems reasonably well thought out. An inconsistency I do see is that in Execute
method, you have delegate
as an argument name, which then get assigned to the eventHandle
. Why not call the argument the same way, so it's clear when writing the Execute
method what it should be?
Not fully asynchronous, may be blocked by main thread
On its own, it works. However, I do want to call your attention to the fact that it won't work in more complicated scenarios. Here's a example where this breaks:
Executed in immediate windows:
call runtest: for i = 0 to 100 : debug.Print i: doevents: next
In theory, the output of the Debug.Print i
should be interleaved with other tasks progress. In practice, I'm consistently getting 5 tasks completed, then 100s of the outputted i
, then the other 5 tasks completed. This tells me that if the main thread is allow to do things while the tasks in other threads are running, those will be blocked until the main thread becomes idle. Thus it is easy to destroy the asynchrony if one is not careful.
As you can see, even sprinkling DoEvents
is apparently not sufficient. Thinking about it, it's not that surprising because the "events" come from the same VBA code which must run in the UI thread. Therefore, if this becomes a problem, you must fully delegate the asynchronously to an external process/thread (e.g. via an external library) in which you can communicate progress to VBA code via events, rather than relying on the external object to raise event into the worker, which must raise event to the thread handle then to the manager.
Note that even if you went this route, you would need to have a retry logic to successfully raise the event because running VBA code may prevent the entrancy. There is an opportunity for an event to be missed, especially if there is high rate of activity and too much inputs. That has been observed before here, here, here. Those examples illustrates the case where something doesn't always fire the way we expect it to.
As suggested in the comments, changing the # of max threads may work better. For example, setting the max threads to 2, we see the the task 1 and task 2 completing before blocking to print the i
for 90 times, then task 3 is allowed to start then complete before we block again for last 10 more times and the rest complete. However, this is very consistent on my system. In 3 times I ran the test, task 3 don't start until 90 has printed, and the rest don't start until 100 has printed. On the other hand, if I set maxthreads to 1, I get one task completed, then am blocked for 100 times before the rest of tasks are allowed to run. I do not expect it to be easily reproducible since this will be affected by host of factors (the hardware, the version of Windows, the version of Excel). This is simply something to be aware and explicitly design for the possibility.
If it's critical to you that you do not get blocked, you need to consider a different approach. For instance, you'd need an external .NET library that creates a thread, run the task, then write the output to a file. That allows the main thread to read it at its leisure and none of the spawned threads will accidentally get blocked when the main thread needs to do something. Even so, it is still subject to the fact that it can be blocked when trying to spawn a new thread (since you need VBA to run code to create one, even if it's just calling a external function in a DLL).
Note: in all of my tests, only Debug.Print
printed were the Task N started
and Task N completed
and Events terminated
. I had commented out the PrintState
; otherwise, the immediate window would overflow and I can't see all the output from the start to end.
WinHttp instead of MSXML
Also, I want bring your attention to the fact that you could have had a instance of WinHttp.WinHttpRequest
which supports events natively. Therefore you could declare a variable like Private WithEvents request As WinHttp.WinHttpRequest
and instead listen for the event. That means you don't need to set up default member as you have to with the MSXML
and if the internet request is all you do, you don't even need the thread collection; just have a collection of WinHttp.WinHttpRequest
and listen to their events.
But that's obviously not a generic solution, and using WinHttp.WinHttpRequest
doesn't preclude us from using it with the above solution for case where we want a worker to handle the setup.
Don't recurse if you don't have to
You have a doInstructions
which calls itself recursively. But IMPOV, there is no reason to recurse. You can do the same thing with a simple loop as demonstrated with my hacky change. A more proper solution may be to use multiple conditions or a flag variable on the bottom of the loop (guaranteeing that it executes at least once). That ensure you never need to worry about stack overflow.
Private Sub doInstructions(Optional freeThreadID As String, Optional loopcount As Long = 1)
Dim instructionVal As Instruction
Do
'mark thread free if applicable
If freeThreadID <> vbNullString Then freeThread = freeThreadID
'find out what to do
instructionVal = getInstruction()
'carry out instruction
Select Case instructionVal.instructionBody
Case InstructionType.mltCloseThread
closeThread instructionVal.threadID
Case InstructionType.mltOpenThread
openThread
Case InstructionType.mltSetTask
Dim taskThread As clsThreadHandle
Dim taskArguments As Variant
Set taskThread = threadGroup(instructionVal.threadID)
'assign task to thread
assignTaskID (taskThread.Name)
'get any arguments there may be
'mark thread as busy
BusyThread = taskThread.Name
'iterate open tasks
openTaskCount = openTaskCount + 1
'execute task
If passesArguments Then
'pop appropriate item from queue
Set taskArguments = iterableQueue.Dequeue
taskThread.Execute taskArguments
Else
taskThread.Execute
End If
Case InstructionType.mltQuit
'quit then do nothing
Me.Quit
instructionVal.instructionBody = mltDoNothing
Case InstructionType.mltDoNothing
'do nothing
Case Else
Err.Raise 5 'invalid argument
End Select
'call self until no instruction
If instructionVal.instructionBody <> mltDoNothing Then
Debug.Assert loopcount < maxThreads * 3 + 5 'max loop should be open all threads then run all tasks + a little
'doInstructions loopcount:=loopcount + 1 'watch for infinite loop
freeThreadID = vbNullString
loopcount = loopcount + 1
Else
Exit Do
End If
Loop
End Sub
Only one worker type per manager class
In the sample, we have this:
Set .AsyncClass = New clsHtmlWorker
I was under the impression that the point of the manager was to allow us to create threads for different workers each running asynchronously. For that reason, it seems strange that I can only use AsyncClass
to set a single implementation of IWorker
. Shouldn't I be able to assign a bunch of workers, with their own arguments to be then enlisted? That would be more intuitive use of the manager, I think. Something like this:
Set .AsyncObjectsToExecute = Array( _
WorkerType1Factory.Create("some argument one"), _
WorkerType1Factory.Create("another argument"), _
WorkerType2Factory.Create("do that", 123, 495), _
WorkerType2Factory.Create("but not that", 0, 0), _
WorkerType3Factory.Create() _
)
From that, we can obviously see what tasks we are setting up to be executed, by passing in IWorker
factories. That would not constraint us to only one particular worker class, and it helps sees what arguments we are sending for each task.
Encapsulate your private fields as a type
I would suggest that you take a page out of @MathieuGuindon's book and use his method:
```'THREAD HANDLE BASE PROPERTIES
Private Type THandle
eventHandle As clsHandleEvents 'Events module multithread set which handle belongs to. Called when handle state changes
taskID As String 'holds the id of the current task
handleID As String 'holds the id of this handle
handleArgs As Variant 'holds any arguments that need to be passed to the task
End Type
Private this As THandle
That gives you intellisense of the class' fields just by using this.
and thus help clearly differentiating between the class' private backing fields and exposed properties. It's one way to help wrong code look wrong.
RaiseEvents in the Initialize event
This seems to me possibly troublesome:
Private Sub Class_Initialize()
RaiseEvent Opened(startTime)
End Sub
My personal rule for any work inside Initialize
and Terminate
events is to keep the code totally isolated to the class. It is generally not appropriate to reach out outside the class because it hasn't finished constructing itself and as such, the results may be unpredictable. In this case the effect may be benign because we only raise an event with a reported parameter. However, in more complicated implementation, that can act up in unexpected manners. Besides, it really doesn't buy us anything because if the Initialize
was called, we already know about it, since we usually have to had initialized it in the first place.
Do you really need a queue
It's cool that you're using .NET queue to help set up the arguments. However, that means you have a extra reference and you already have a number of references, which makes it hard to distribute the code to different environment.
One way would be to late-bind the queue, by declaring it as object and doing a CreateObject("System.Collections.Queue")
instead. That lets you avoid the need to add an explicit reference to the .NET core library and thus be agnostic about the .NET framework version since in this case, the class won't change between versions.
The alternative is to just use a built-in VBA collection. IINM, you only use queues to collect the arguments, which a VBA colection can do just as well. This would give you queue-like behavior:
col.Add ...
col.Add ...
col.Add ...
Do Until col.Count = 0
col.Remove 1
Loop
without any external references.
Don't use default naming from interface
As @Raystafarian alluded to, you shouldn't accept the default RHS
naming when you implement an interface. I hate it personally, and always change the name. The implementation of interfaces doesn't care what name you use. The only thing it care about is that there's a procedure named something (check), and it has N number of arguments (check), and that each argument are of same data type (check). It doesn't even look at the names of the arguments. So you should change them to something more sensible. If you're feeling unimaginative, call them Value
, which is what I usually do. Just don't leave it as RHS.
edited Mar 29 at 10:48
answered Mar 29 at 4:14
this
1,242318
1,242318
A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (exceptRHS
)? I mostly prefix withcls
in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
â Greedo
Mar 29 at 9:08
Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of usingMSXML2.XMLHTTP60
) - they completely occupy the UI thread so no printing. They raisecomplete
events, then we get into theFor
loop, and I get a 0 printed - then the next 5 events fromDoEvents
, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
â Greedo
Mar 29 at 9:16
Try commenting out the content ofPrintState
and changingmaxThreads
to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
â Greedo
Mar 29 at 9:20
@Greedo I have addressed your questions and edited my answer accordingly.
â this
Mar 29 at 10:49
1
This is a great answer
â Raystafarian
Mar 30 at 11:36
add a comment |Â
A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (exceptRHS
)? I mostly prefix withcls
in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)
â Greedo
Mar 29 at 9:08
Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of usingMSXML2.XMLHTTP60
) - they completely occupy the UI thread so no printing. They raisecomplete
events, then we get into theFor
loop, and I get a 0 printed - then the next 5 events fromDoEvents
, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?
â Greedo
Mar 29 at 9:16
Try commenting out the content ofPrintState
and changingmaxThreads
to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?
â Greedo
Mar 29 at 9:20
@Greedo I have addressed your questions and edited my answer accordingly.
â this
Mar 29 at 10:49
1
This is a great answer
â Raystafarian
Mar 30 at 11:36
A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (except
RHS
)? I mostly prefix with cls
in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)â Greedo
Mar 29 at 9:08
A lot of insightful tips here; recursion, default naming, queueing - sometimes you need someone to look at your code from a different POV to spot those things, thanks. I've also recently been trying out the private field type, (actually did it by accident here too!). A couple of questions though: Naming - you say you don't like the names, and go on to highlight some misleading ones, but are there any other problems with naming (except
RHS
)? I mostly prefix with cls
in Hungarian style purely so all my stuff is within easy intellisense reach (because no namespaces)â Greedo
Mar 29 at 9:08
Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of using
MSXML2.XMLHTTP60
) - they completely occupy the UI thread so no printing. They raise complete
events, then we get into the For
loop, and I get a 0 printed - then the next 5 events from DoEvents
, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?â Greedo
Mar 29 at 9:16
Also the may be blocked by main thread test you ran. I was initially worried when you described what was going on, but I find the class is actually working as expected. The first 5 run asynchronously but all in the same process (an embarrassing oversight on my part of using
MSXML2.XMLHTTP60
) - they completely occupy the UI thread so no printing. They raise complete
events, then we get into the For
loop, and I get a 0 printed - then the next 5 events from DoEvents
, then the class quits, then the thread is free to print the remaining numbers. So as expected given 1 thread?â Greedo
Mar 29 at 9:16
Try commenting out the content of
PrintState
and changing maxThreads
to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?â Greedo
Mar 29 at 9:20
Try commenting out the content of
PrintState
and changing maxThreads
to 2 or something - it reveals more about what's going on, though I'm still not entirely sure myself and was wondering if you could take another look?â Greedo
Mar 29 at 9:20
@Greedo I have addressed your questions and edited my answer accordingly.
â this
Mar 29 at 10:49
@Greedo I have addressed your questions and edited my answer accordingly.
â this
Mar 29 at 10:49
1
1
This is a great answer
â Raystafarian
Mar 30 at 11:36
This is a great answer
â Raystafarian
Mar 30 at 11:36
add a comment |Â
up vote
6
down vote
This is way above my expertise, but maybe adding an answer would cause more views/answers? Also, what's that beginner tag doing there? ;)
I want to say first off, really solid work. That's probably why there hasn't been too much activity here. Maybe some of this will seem like nit-picks and if so, Sorry!
ByRef or ByVal arguments
Every one of the arguments, as you probably know, that aren't declared ByVal
are implicitly ByRef
. I imagine you probably need a lot of these to be ByRef
, but it's better to explicitly declare them ByRef
so it's easier to tell that it's supposed to be ByRef
.
obj parameter
clsHandleEvents.Complete(obj as clsThreadHandle)
clsHandleEvents.NotifyComplete(obj as clsThreadHandle)
clsMultiThread.threadEvents_Complete(obj as clsThreadHandle)
multiThreadMethods.isIterable(obj as Variant)
The first three could probably use a new name given they are all the same type. Maybe threadObject
or threadHandle
- up to you.
The fourth one is taking a variant and returning a boolean, it could be named anything from testObject
to iteratorGroup
. You might want to try renaming some of those
Queue
clsMultiThread
Private iterableQueue As mscorlib.Queue
Set iterableQueue = New Queue
multiThreadMethods
Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue)
Maybe I'm dense, being a VBA guy, but how would I know what a queue
is and what methods it uses? It's not a standard reference library for VBA, is it? I'd probably add a comment in there explaining why you've chosen to do it this way, so nobody would have to go back through all of it to figure out why it's better than some other way.
RHS?
clsHtmlWorker
Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property
What is RHS? Is it a constant, it's in all caps. I can tell you know better using Html
in your project rather than HTML
- so what is it?
All that being said, I can't even really figure out how this is working when I step through it, I see where requests should be sent (httpRequest), but I can't figure out how they come back and populate the sheet, I don't see it happening, which I think is the point, being asynchronous?
Thanks alot for this - some good points here I hadn't considered. I put theBeginner
tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! ThoseRHS
s were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so muchdebug.print
, as you can't step through the operational code. Good point onQueue
. Fingers crossed I can get some more good answers like this on some of the other bits of the code
â Greedo
Feb 15 at 19:39
I like thedebugging = true
part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
â Raystafarian
Feb 15 at 21:28
RHS = Right Hand Side?
â Solomon Ucko
Mar 24 at 1:56
add a comment |Â
up vote
6
down vote
This is way above my expertise, but maybe adding an answer would cause more views/answers? Also, what's that beginner tag doing there? ;)
I want to say first off, really solid work. That's probably why there hasn't been too much activity here. Maybe some of this will seem like nit-picks and if so, Sorry!
ByRef or ByVal arguments
Every one of the arguments, as you probably know, that aren't declared ByVal
are implicitly ByRef
. I imagine you probably need a lot of these to be ByRef
, but it's better to explicitly declare them ByRef
so it's easier to tell that it's supposed to be ByRef
.
obj parameter
clsHandleEvents.Complete(obj as clsThreadHandle)
clsHandleEvents.NotifyComplete(obj as clsThreadHandle)
clsMultiThread.threadEvents_Complete(obj as clsThreadHandle)
multiThreadMethods.isIterable(obj as Variant)
The first three could probably use a new name given they are all the same type. Maybe threadObject
or threadHandle
- up to you.
The fourth one is taking a variant and returning a boolean, it could be named anything from testObject
to iteratorGroup
. You might want to try renaming some of those
Queue
clsMultiThread
Private iterableQueue As mscorlib.Queue
Set iterableQueue = New Queue
multiThreadMethods
Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue)
Maybe I'm dense, being a VBA guy, but how would I know what a queue
is and what methods it uses? It's not a standard reference library for VBA, is it? I'd probably add a comment in there explaining why you've chosen to do it this way, so nobody would have to go back through all of it to figure out why it's better than some other way.
RHS?
clsHtmlWorker
Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property
What is RHS? Is it a constant, it's in all caps. I can tell you know better using Html
in your project rather than HTML
- so what is it?
All that being said, I can't even really figure out how this is working when I step through it, I see where requests should be sent (httpRequest), but I can't figure out how they come back and populate the sheet, I don't see it happening, which I think is the point, being asynchronous?
Thanks alot for this - some good points here I hadn't considered. I put theBeginner
tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! ThoseRHS
s were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so muchdebug.print
, as you can't step through the operational code. Good point onQueue
. Fingers crossed I can get some more good answers like this on some of the other bits of the code
â Greedo
Feb 15 at 19:39
I like thedebugging = true
part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
â Raystafarian
Feb 15 at 21:28
RHS = Right Hand Side?
â Solomon Ucko
Mar 24 at 1:56
add a comment |Â
up vote
6
down vote
up vote
6
down vote
This is way above my expertise, but maybe adding an answer would cause more views/answers? Also, what's that beginner tag doing there? ;)
I want to say first off, really solid work. That's probably why there hasn't been too much activity here. Maybe some of this will seem like nit-picks and if so, Sorry!
ByRef or ByVal arguments
Every one of the arguments, as you probably know, that aren't declared ByVal
are implicitly ByRef
. I imagine you probably need a lot of these to be ByRef
, but it's better to explicitly declare them ByRef
so it's easier to tell that it's supposed to be ByRef
.
obj parameter
clsHandleEvents.Complete(obj as clsThreadHandle)
clsHandleEvents.NotifyComplete(obj as clsThreadHandle)
clsMultiThread.threadEvents_Complete(obj as clsThreadHandle)
multiThreadMethods.isIterable(obj as Variant)
The first three could probably use a new name given they are all the same type. Maybe threadObject
or threadHandle
- up to you.
The fourth one is taking a variant and returning a boolean, it could be named anything from testObject
to iteratorGroup
. You might want to try renaming some of those
Queue
clsMultiThread
Private iterableQueue As mscorlib.Queue
Set iterableQueue = New Queue
multiThreadMethods
Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue)
Maybe I'm dense, being a VBA guy, but how would I know what a queue
is and what methods it uses? It's not a standard reference library for VBA, is it? I'd probably add a comment in there explaining why you've chosen to do it this way, so nobody would have to go back through all of it to figure out why it's better than some other way.
RHS?
clsHtmlWorker
Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property
What is RHS? Is it a constant, it's in all caps. I can tell you know better using Html
in your project rather than HTML
- so what is it?
All that being said, I can't even really figure out how this is working when I step through it, I see where requests should be sent (httpRequest), but I can't figure out how they come back and populate the sheet, I don't see it happening, which I think is the point, being asynchronous?
This is way above my expertise, but maybe adding an answer would cause more views/answers? Also, what's that beginner tag doing there? ;)
I want to say first off, really solid work. That's probably why there hasn't been too much activity here. Maybe some of this will seem like nit-picks and if so, Sorry!
ByRef or ByVal arguments
Every one of the arguments, as you probably know, that aren't declared ByVal
are implicitly ByRef
. I imagine you probably need a lot of these to be ByRef
, but it's better to explicitly declare them ByRef
so it's easier to tell that it's supposed to be ByRef
.
obj parameter
clsHandleEvents.Complete(obj as clsThreadHandle)
clsHandleEvents.NotifyComplete(obj as clsThreadHandle)
clsMultiThread.threadEvents_Complete(obj as clsThreadHandle)
multiThreadMethods.isIterable(obj as Variant)
The first three could probably use a new name given they are all the same type. Maybe threadObject
or threadHandle
- up to you.
The fourth one is taking a variant and returning a boolean, it could be named anything from testObject
to iteratorGroup
. You might want to try renaming some of those
Queue
clsMultiThread
Private iterableQueue As mscorlib.Queue
Set iterableQueue = New Queue
multiThreadMethods
Public Function addIterableToQueue(iterator As Variant, ByRef resultQueue As Queue)
Maybe I'm dense, being a VBA guy, but how would I know what a queue
is and what methods it uses? It's not a standard reference library for VBA, is it? I'd probably add a comment in there explaining why you've chosen to do it this way, so nobody would have to go back through all of it to figure out why it's better than some other way.
RHS?
clsHtmlWorker
Private Property Set IWorker_Events(RHS As IWorkerEvents)
Set this.Events = RHS
End Property
What is RHS? Is it a constant, it's in all caps. I can tell you know better using Html
in your project rather than HTML
- so what is it?
All that being said, I can't even really figure out how this is working when I step through it, I see where requests should be sent (httpRequest), but I can't figure out how they come back and populate the sheet, I don't see it happening, which I think is the point, being asynchronous?
edited Feb 16 at 22:02
Gerrit0
2,6701518
2,6701518
answered Feb 15 at 5:12
Raystafarian
5,4981046
5,4981046
Thanks alot for this - some good points here I hadn't considered. I put theBeginner
tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! ThoseRHS
s were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so muchdebug.print
, as you can't step through the operational code. Good point onQueue
. Fingers crossed I can get some more good answers like this on some of the other bits of the code
â Greedo
Feb 15 at 19:39
I like thedebugging = true
part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
â Raystafarian
Feb 15 at 21:28
RHS = Right Hand Side?
â Solomon Ucko
Mar 24 at 1:56
add a comment |Â
Thanks alot for this - some good points here I hadn't considered. I put theBeginner
tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! ThoseRHS
s were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so muchdebug.print
, as you can't step through the operational code. Good point onQueue
. Fingers crossed I can get some more good answers like this on some of the other bits of the code
â Greedo
Feb 15 at 19:39
I like thedebugging = true
part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.
â Raystafarian
Feb 15 at 21:28
RHS = Right Hand Side?
â Solomon Ucko
Mar 24 at 1:56
Thanks alot for this - some good points here I hadn't considered. I put the
Beginner
tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! Those RHS
s were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so much debug.print
, as you can't step through the operational code. Good point on Queue
. Fingers crossed I can get some more good answers like this on some of the other bits of the codeâ Greedo
Feb 15 at 19:39
Thanks alot for this - some good points here I hadn't considered. I put the
Beginner
tag because this is the first bit of reusable code I've thought worthy of review, I'm glad you think it's solid! Those RHS
s were grabbed directly from one of the examples I referenced, but I agree, they are in poor taste. You're exactly right about the debugging asynchronous stuff, that's partly why I have so much debug.print
, as you can't step through the operational code. Good point on Queue
. Fingers crossed I can get some more good answers like this on some of the other bits of the codeâ Greedo
Feb 15 at 19:39
I like the
debugging = true
part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.â Raystafarian
Feb 15 at 21:28
I like the
debugging = true
part of it so you can see what's happening in the immediate window, with all the updates that get written to it. Some would say that's some wasted effort, but there's not much else you could do for debugging it.â Raystafarian
Feb 15 at 21:28
RHS = Right Hand Side?
â Solomon Ucko
Mar 24 at 1:56
RHS = Right Hand Side?
â Solomon Ucko
Mar 24 at 1:56
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f185212%2fa-new-approach-to-multithreading-in-excel%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
I'm wondering why we have comments saying reference to
mscorlib.dll
is needed yet I see noCreateObject
or even `GetObject calls to create something from that library? This is all pure VBA and because VBA is single-threaded, everything will run as one thread?â this
Jan 16 at 12:52
@This 1)
mscorlib.dll
I'm using early binding, andPrivate iterableQueue As mscorlib.Queue
inclsMultiThread
is what requires it. 2) True, to fully simulate multithreading in Excel you need to create multiple instances ofEXCEL.EXE
. However this project targets asynchronous processes specifically, as these don't run in Excel directly. Sure the handling is all single-threaded, but in internet applications the main overhead is in waiting for response to load. That can be done asynchronously, and with several instances in parallel. I hope that makes senseâ Greedo
Jan 16 at 14:06
I think whether this could run code asynchronously very much depends on the existence of a worker class that supports yielding its execution until some asynchronous work has finished. Without such a worker tasks would simply run to completion synchronously whenever they are executed.
â M.Doerner
Jan 16 at 14:52
@M.Doerner Exactly, an
InternetExplorer.Application
or anXmlHttp
request (see this) article can be run asynchronously, outside Excel, using Events and the default property callback function hack respectively. My use ofIWorker
interface was designed to make sure workers raise events when complete, which is a sort of reminder that these should be async workers, not normal routines. Multithreading standard routines requires this sort of approachâ Greedo
Jan 16 at 15:25
1
@Raystafarian I've added a download file in the test area if that helps
â Greedo
Feb 12 at 16:16