Preface

Last Saturday, I hosted one weekly session for TokyoAEC. Mainly I talked about how we could develop the RevitAPI tools in Grasshopper. People who found it interested can take a look the link below:

https://github.com/ChenChihYuan/RhinoInsideRevit_ScriptingIntroduction

LineSegmentIntersection_SweepLine

Posted on 2021-11-22

reference

https://alrightchiu.github.io/SecondRound/priority-queuebinary-heap.html

https://alrightchiu.github.io/SecondRound/single-source-shortest-pathdijkstras-algorithm.html

ParametericArchitectureTalkWithRorbertMcNeel

Posted on 2021-11-18 Edited on 2021-11-19 In Rhino , Grasshopper , Interview

https://www.youtube.com/watch?v=QsaOPyipR-s

Making_Task_Capable_ComponentsInGH

Posted on 2021-11-17 In multithread , grasshopper , Csharp

Preface

I just finished the first lesson of Performance_Engineering_Introduction, and saw these posts from McNeel.
https://developer.rhino3d.com/guides/grasshopper/multi-treaded-components/

https://developer.rhino3d.com/guides/grasshopper/programming-task-capable-component/

I might just leave the link here for a while, till I get time to really implement it by myself.

Performance Engineering_IntroductionMatrixMultiplication

Posted on 2021-11-16 Edited on 2021-11-17

Preface

There are usually some recommendation videos from MIT OCW on youtube for me. Shameless to say, although I am interested in the courses, I never finished any whole semester. But, Nah. I just want to note down those I clicked and gain some knowledge after all.

Reference

https://www.youtube.com/watch?v=o7h_sYMk_oc&list=PLUl4u3cNGP63VIBQVWguXxZZi0566y7Wf&index=1

Software Properties

What software properties are more important than performace?

There are many things are more important than performance, but it doesn’t mean that performance is not important.

Performance is like some kind of currency(water resource).

Computer Programming in the Early Days

Software performace engineering was common, because machine resoures were limited.

Many programs strained the machine’s resources.

Programs had to be planned around the machine
Many program would not “fit” without intense performance engineering.

Clockrate, memory was low back in 70s

Clockrate

Core Speed = (Bus/Base Speed) * (Multiplier)

Lessons learned from the 70’s and 80’s

Premature optimization is the root of all evil. - Donald Knuth

More computing sins are committed in the name of efficiency(without necessarily achieving it) than for any other single reason - including blind stupidity -William Wulf

The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization - For experts only: Don’t do it yet. - Michael Jaskson

Making code readable and Fast

Vendor Solution: Multicore

Clock rate seem stop growing after around 2004.

To scale performance, processor manufacturers put many processing cores on the microprocessor chip
Each generation of Moore’s Law potentially doubles the number of cores.

Performance Is No Longer Free

Moore’s Law continues to increase computer performance.
But now that performance looks like big multicore processors with complex cache hierarchies, wide vector units, GPU’s, FGPA’s, etc

Generally, software must be adapted to utilize this hardware efficiently.

Same code writen in different language, Python, Java and C, for example

Interpreters are versatile, but slow

The interpreter reads, interprets, and preforms each program statement and updates the machine state.
Interpreters can easily support high-level programming features - such as dynamic code alteration - at the cost of performance

JIT compilation

JIT compilers can recover some of the performance lost by interpretation.
When code is first executed, it is interpreted.
The runtime system keeps track of how often the various pieces of code are executed.
Whenever some piece of code executes sufficiently frequently, it gets compiled to machine code in real time.
Future executions of that code use the more-efficient compiled version.

Performance of Differnent Order

Loop Order

for(int i = 0; i < n; i++){
    for(int j = 0; j < n; j++){
        for(int k = 0; k < n; k++){
            c[i][j] += A[i][j] * B[k][j];
        }
    }   
}

what if we change the order of the loops, let’s say,

        for(int k = 0; k < n; k++){
    for(int j = 0; j < n; j++){
for(int i = 0; i < n; i++){                    
            c[i][j] += A[i][j] * B[k][j];
        }
    }   
}

Does this cause any performance difference?

cache location

Hardware Caches

Each processor reads and writes main memory in contiguous blocks, called cache lines.

Previously accessed cache lines are stored in a small memory, called a cache, that sits near the processor
Cache hits: accesses to data in cache -> are fast
Cache misses: accesses to data not in cache -> are slow

We can measure the effect of different access patterns using the Cachegrind cache simulator

1	valgrind --tool=cachegrind ./mm

Compiler Optimization

Clang provides a collection of optimization switches. You can specify a switch to the compiler to ask it to optimize.

Parallel Loops

The cilk_for loop allows all iterations of the loop to execute in parallel.

Experimenting with Parallel Loops

Parallel `i` Loop

clik_for(int i = 0; i < n; ++i){
    for(int k =0; k < n ;++k){
        for(int j =0; j < n ;++j){
            c[i][j] += A[i][j] * B[k][j];
        }
    }
}

Running time: 3.18s

Parallel `j` Loop

for(int i = 0; i < n; ++i){
    for(int k =0; k < n ;++k){
        cilk_for(int j =0; j < n ;++j){
            c[i][j] += A[i][j] * B[k][j];
        }
    }
}

Running time: 531.71s

Parallel `i` and `j` loops

cilk_for(int i = 0; i < n; ++i){
    for(int k =0; k < n ;++k){
        cilk_for(int j =0; j < n ;++j){
            c[i][j] += A[i][j] * B[k][j];
        }
    }
}

Running time: 10.64s

For some reason, we cannot parallel k to get the correct answer.

Rule of Thumb: Parallelize outer loops rather than inner loops.

Using parallel loops gets us almost 18x speedup on 18 cores! (Disclaimer: Not all code is so easy to parallelize effictively.)

Vector Hardware

Modern microprocessors incorporate vector hardware to process data in single-instruction stream, multiple data stream(SIMD) fashion

Compiler Vectorization

Clang/LLVM uses vector instructions automatically when compiling at optimization level -02 or higher.

Many machines don’t support the newest set of vector instructions, however, so the compiler uses vector instructions conservatively by default.

Vectorization Flags

Programmers can direct the compiler to use modern vector instructions using compiler flags such as the following:

mavx: Use Intel AVX vector instructions
mavx2: Use Intel AVX2 vector instructions

Version Implementation

Python
Java
C
+interchange loops
+optimization flags
Parallel loops
+tilling
Parallel divide-and-conquer
+compiler vectorization
+AVX intrinsics

Plus More Optimizations

We can apply several more insights and performance-engineering trick to make this code run faster, including:

Preprocessing
Matrix transposition
Data alignment
Memory-managemnet optimizations
A clever algorithm for the base case that uses AVX intrinsic instructions explicitly

SelectGeometryInSameLayer_GH

Posted on 2021-11-16

Preface

Just as another regular work day, I memo down some new things along the task I am facing. Recently I have been crazy about the Voronoi diagram. Although as bad as I wanted to implement the sweepline algorithm, I found the best practice might be patient and learn thing little by little. Also, I am kinds into “Study with me” series on youtube. I feel clam with someone’s company alongside.

Reference

https://discourse.mcneel.com/t/c-select-objects-by-layer/84420/6
Appreciate the help from username Mahdiyar.

Two things are different

select object in Rhino DocView
select object in GH

var ghobjs = new List<object>();
var selectedLayer = RhinoDoc.ActiveDoc.Layers.FindName(layerName);
var layers = new List<Rhino.DocObjects.Layer>(){selectedLayer};
if(subLayer)
{
  foreach(var layer in RhinoDoc.ActiveDoc.Layers)
    if(layer.IsChildOf(selectedLayer))
      layers.Add(layer);
}
B = layers;
foreach(var layer in layers)
{
  var rhobjs = doc.Objects.FindByLayer(layer);
  foreach (var rhObj in rhobjs)
    ghobjs.Add(rhObj.Geometry);
}
A = ghobjs;

VoronoiDiagram_Lectures

Posted on 2021-11-12 Edited on 2021-11-20

Lecture Link

https://www.youtube.com/watch?v=Pn19deUBsM8&t=13s

The Post-Office Problem

https://link.springer.com/chapter/10.1007%2F3-540-10291-4_1

http://blog.ivank.net/voronoi-diagram-in-javascript.html

http://paperjs.org/examples/voronoi/

https://shaneosullivan.wordpress.com/2011/03/22/sweep-line-voronoi-algorithm-in-javascript/

https://jacquesheunis.com/post/fortunes-algorithm/

http://blog.ivank.net/fortunes-algorithm-and-implementation.html

https://github.com/pvigier/FortuneAlgorithm

http://www.cs.uu.nl/geobook/

https://pvigier.github.io/2018/11/18/fortune-algorithm-details.html

TakeAwayFromPvigier’s blog

https://pvigier.github.io/2018/11/18/fortune-algorithm-details.html

pseudo-code from Pvigier’s Blog

add a site event in the event queue for each site
while the event queue is not empty
    pop the top event
    if the event is a site event
        insert a new arc in the beachline
        check for new circle events
    else
        create a vertex in the diagram
        remove the shrunk arc from the beachline
        delete invalidated events
        check for new circle events

site event
circle event

Diagram’s Data Structure

Double connected edge list(DCEL).

class VoronoiDiagram{
    public:
        //...
    private:
        std::vector<Site> mSite;
        std::vector<Face> mFace;
        std::list<Vertex> mVertices;
        std::list<HalfEdge> mHalfEdges;
}

The Site class represents an input point. Each site has an index, which is useful to query them, coordinates and a pointer to their cell(face):

struct Site{
    std::size_t index;
    Vector2 point;
    Face* face;
};

The vertices of the cells are represented by the Vertex class, they just have a cooridinates field:

1
2
3

struct Vertex{
    Vector2 point;
};

Here is the implementation of half-edges:

struct HalfEdge{
    Vertex* origin;
    Vertex* destination;
    HalfEdge* twin;
    Face* incidentFace;
    HalfEdge* prev;
    HalfEdge* next;
};

An edge in a Voronoi diagram is shared by two adjacent cells. In the DCEL data structure, we splite these edfes in two half-edges, one for each cell, and they are linked by the twin pointer.

Moreover, a hald-edge has an origin vertex and a desrination vertex.
The incidenetFace field points to the face to which the half-edge belongs to.

Finally, in DCEL, cells are implemented as a circular doubly linked list of half-edges where adjacent half-edges are linked together. Thus the prev and next fields points to the previous and next half-edges in the cell.

Finally, the Face class represents a cell and it just contains a pointer to its site and another to one of its half-edges. It does not matter which half-edge it is because a cell is a closed polygon. Thus we can access to all of its hald-edges by traversing the circular linked list.

struct Face{
    Site* site;
    HalfEdge* outerComponent;
};

Event Queue

The standard way to implement the event queue is to use a priority queue.

During the processing of site and circle events we may need to remove circle events from the queue because they are not valid anymore.

But most of the standard implementations of priority queues do not allow to remove an element which is not the top one. In particular that is the case for std::priority_queue.

There are two ways to tackle this problem.

The first and simplest one is to add a valid flag to true initially. Then instead of removing the circle event from the queue, we just set its flag to false. Finially, when we process the events in the main loop, if the valid flag of an event equals to false, we simply discard it and process the next one.
The second method which I adopted is not to use std::priority_queue. Instead, I implemented my own priority queue which supports removal of any element it contains. The implementation of such a queue is pretty straightforward. I choose this method because I find it makes the code for the algorithm clearer.

Beachline

If not implemented correctly there is no guarantee that the algorithm will run in O(log(n)). The key to reach this time complecity is to use a self-balancing tree.

In my implementation the beachline is still a tree, but all nodes represent an arc.

struct Arc{
    enum class Color{RED, BLACK};

    // Hierarchy
    Arc* parent;
    Arc* left;
    Arc* right;

    // Diagram
    VoronoiDiagram::Site* site;
    VoronoiDiagram::HalfEdge* leftHalfEdge;
    VoronoiDiagram::HalfEdge* rightHalfEdge;
    Event* event;
    // Optimizations
    Arc* prev;
    Arc* next;
    // Only for balancing
    Color color;
};

In Fortune’s algorithm we have no key, we only know that we want to insert an arc before or after another in the beachline. To do that, I create methods insertBefore and insertAfter:

void Beachline::insertBefore(Arc* x, Arc* y){
    // Find the right place
    if(isNil(x->left)){
        x->left = y;
        y->parent = x;
    }
    else{
        x->prev->right = y;
        y->parent = x->prev;
    }
    // Set the pointers
    y->prev = x->prev;
    if(!isNil(y->prev))
        y->prev->next = y;
    
    y->next = x;
    x->prev = y;

    // Balance the tree
    insertFixup(y);
}

Priority Queues

https://www.youtube.com/watch?v=wptevk0bshY

What is a Priority Queue

A priority queue is an Abstract Data Type(ADT) that operates similar to a normal queue excpet that each element has a certain priority. The priority of the elements in priority queue determin the order in which elements are removed from the PQ.

NOTE: Priority queues only supports comparable data.

What is a Heap?

A heap is a tree based DS that satifies the heap invariant(also called heap property):
If A is a parent node of B then A is ordered with repect to B for all nodes A, B in the heap.

Max Heap
Min Heap

Heaps must be trees.

When and where is a PQ used?

Used in certain implementaitons of Dijkstra’s shortest Path algorithm.
Anytime you need the dynamically fetch the next best or next worst element
Used in Huffman coding (which is often used for lossless data compression).
Best First Search(BFS) algorithms such as A* use PQs to continuously grab the next most promising node
Used by minimum Spanning Tree(MST) algorithms

Complexity PQ with binary heap

Binary Heap construction: O(n)
Polling: O(log(n))
Peeking: O(1)
Adding: O(log(n))

Self balance Tree (AVL Tree)

https://www.youtube.com/watch?v=vRwi_UcZGjU

Balance
Rotation
Implementation

Rebalancing threshold

pseudo-code from Jacquesheunis’ blog

https://jacquesheunis.com/post/fortunes-algorithm/

Fill the event queue with site events for each input site.
While the event queue still has items in it:
    If the next event on the queue is a site event:
        Add the new site to the beachline
    Otherwise it must be an edge-intersection event:
        Remove the squeezed cell from the beachline
Cleanup any remaining intermediate state

Voronoi Diagram - Computataional Geometry

Fortune’s Sweep

VoronoiDiagram(P are in R2)
Q : new PriorityQueue(P) // site events sortes by y-coord.
T : new BalancedBinarySearchTree() // sweep status(B)
D : new DCEL() // to-be Vor(P)

while not Q.empty() do
    p <- Q.ExtractMax()
    if p site event then
        HandleSiteEvent(p)
    else
        a <- arc on B that will disappear
        HandleCircleEvent(a)

treat remaining int. nodes of T(unbnd. edges of Vor(P))
return D

Handling Events

HandleSiteEvent(point p)
- Search in T for the arc a vertically above p. If a has pointer to circle event in Q, delete this event.
- Split a into a0 and a2. Let a1 be the new arc of p
- Add Vor-edges <q, p> and <p, q> to DECL
- Check <., a0, a1> and <a1, a2, .> for circle events.
HandleCircleEvent(arc a)
- T.delete(a); update breakpts
- Delete all circle events involving a from Q
- Add Vor-vtx a_left and a_right

vscodeInBrower

Posted on 2021-11-12

Just want to a quick note for the cool thing I found we can do in github.

https://www.youtube.com/watch?v=ow1gea4FHMc

Basically, as in this video.

Now we can put

1	vscode.dev/ + `any_githubRepo`

then we can editting code on our browser

Preface

reference

Preface

Preface

Reference

Software Properties

Computer Programming in the Early Days

Clockrate, memory was low back in 70s

Lessons learned from the 70’s and 80’s

Vendor Solution: Multicore

Performance Is No Longer Free

Same code writen in different language, Python, Java and C, for example

Interpreters are versatile, but slow

JIT compilation

Performance of Differnent Order

Loop Order

Hardware Caches

Compiler Optimization

Parallel Loops

Experimenting with Parallel Loops

Parallel i Loop

Parallel j Loop

Parallel i and j loops

Vector Hardware

Compiler Vectorization

Vectorization Flags

Version Implementation

Plus More Optimizations

Preface

Reference

Lecture Link

The Post-Office Problem

TakeAwayFromPvigier’s blog

pseudo-code from Pvigier’s Blog

Diagram’s Data Structure

Event Queue

Beachline

Priority Queues

What is a Priority Queue

What is a Heap?

When and where is a PQ used?

Complexity PQ with binary heap

Self balance Tree (AVL Tree)

pseudo-code from Jacquesheunis’ blog

Voronoi Diagram - Computataional Geometry

Fortune’s Sweep

Handling Events

Parallel `i` Loop

Parallel `j` Loop

Parallel `i` and `j` loops