RhinoInsideRevit_Intro
Preface
Last Saturday, I hosted one weekly session for TokyoAEC. Mainly I talked about how we could develop the RevitAPI tools in Grasshopper. People who found it interested can take a look the link below:
https://github.com/ChenChihYuan/RhinoInsideRevit_ScriptingIntroduction
LineSegmentIntersection_SweepLine
ParametericArchitectureTalkWithRorbertMcNeel
Making_Task_Capable_ComponentsInGH
Preface
I just finished the first lesson of Performance_Engineering_Introduction, and saw these posts from McNeel.
https://developer.rhino3d.com/guides/grasshopper/multi-treaded-components/
https://developer.rhino3d.com/guides/grasshopper/programming-task-capable-component/
I might just leave the link here for a while, till I get time to really implement it by myself.
ReadCSVinGH
Performance Engineering_IntroductionMatrixMultiplication
Preface
There are usually some recommendation videos from MIT OCW on youtube for me. Shameless to say, although I am interested in the courses, I never finished any whole semester. But, Nah. I just want to note down those I clicked and gain some knowledge after all.
Reference
https://www.youtube.com/watch?v=o7h_sYMk_oc&list=PLUl4u3cNGP63VIBQVWguXxZZi0566y7Wf&index=1
Software Properties
What software properties are more important than performace?
There are many things are more important than performance, but it doesn’t mean that performance is not important.
Performance is like some kind of currency(water resource).
Computer Programming in the Early Days
Software performace engineering was common, because machine resoures were limited.
Many programs strained the machine’s resources.
- Programs had to be planned around the machine
- Many program would not “fit” without intense performance engineering.
Clockrate, memory was low back in 70s
Core Speed = (Bus/Base Speed) * (Multiplier)
Lessons learned from the 70’s and 80’s
Premature optimization is the root of all evil. - Donald Knuth
More computing sins are committed in the name of efficiency(without necessarily achieving it) than for any other single reason - including blind stupidity -William Wulf
The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization - For experts only: Don’t do it yet. - Michael Jaskson
Making code readable and Fast
Vendor Solution: Multicore
Clock rate seem stop growing after around 2004.
- To scale performance, processor manufacturers put many processing cores on the microprocessor chip
- Each generation of Moore’s Law potentially doubles the number of cores.
Performance Is No Longer Free
Moore’s Law continues to increase computer performance.
But now that performance looks like big multicore processors with complex cache hierarchies, wide vector units, GPU’s, FGPA’s, etc
Generally, software must be adapted
to utilize this hardware efficiently.
Same code writen in different language, Python, Java and C, for example
Interpreters are versatile, but slow
- The interpreter reads, interprets, and preforms each program statement and updates the machine state.
- Interpreters can easily support high-level programming features - such as dynamic code alteration - at the cost of performance
JIT compilation
- JIT compilers can recover some of the performance lost by interpretation.
- When code is first executed, it is interpreted.
- The runtime system keeps track of how often the various pieces of code are executed.
- Whenever some piece of code executes sufficiently frequently, it gets compiled to machine code in real time.
- Future executions of that code use the more-efficient compiled version.
Performance of Differnent Order
Loop Order
1 | for(int i = 0; i < n; i++){ |
what if we change the order of the loops, let’s say,
1 | for(int k = 0; k < n; k++){ |
Does this cause any performance difference?
cache location
Hardware Caches
Each processor reads and writes main memory in contiguous blocks, called cache lines
.
- Previously accessed cache lines are stored in a small memory, called a cache, that sits near the processor
- Cache hits: accesses to data in cache -> are fast
- Cache misses: accesses to data not in cache -> are slow
We can measure the effect of different access patterns using the Cachegrind
cache simulator
1 | valgrind --tool=cachegrind ./mm |
Compiler Optimization
Clang provides a collection of optimization switches. You can specify a switch to the compiler to ask it to optimize.
Parallel Loops
The cilk_for
loop allows all iterations of the loop to execute in parallel.
Experimenting with Parallel Loops
Parallel i
Loop
1 | clik_for(int i = 0; i < n; ++i){ |
Running time: 3.18s
Parallel j
Loop
1 | for(int i = 0; i < n; ++i){ |
Running time: 531.71s
Parallel i
and j
loops
1 | cilk_for(int i = 0; i < n; ++i){ |
Running time: 10.64s
For some reason, we cannot parallel k to get the correct answer.
Rule of Thumb: Parallelize outer loops rather than inner loops.
Using parallel loops gets us almost 18x speedup on 18 cores! (Disclaimer: Not all code is so easy to parallelize effictively.)
Vector Hardware
Modern microprocessors incorporate vector hardware to process data in single-instruction stream, multiple data stream(SIMD) fashion
Compiler Vectorization
Clang/LLVM uses vector instructions automatically when compiling at optimization level -02
or higher.
Many machines don’t support the newest set of vector instructions, however, so the compiler uses vector instructions conservatively by default.
Vectorization Flags
Programmers can direct the compiler to use modern vector instructions using compiler flags such as the following:
- mavx: Use Intel AVX vector instructions
- mavx2: Use Intel AVX2 vector instructions
Version Implementation
- Python
- Java
- C
- +interchange loops
- +optimization flags
- Parallel loops
- +tilling
- Parallel divide-and-conquer
- +compiler vectorization
- +AVX intrinsics
Plus More Optimizations
We can apply several more insights and performance-engineering trick to make this code run faster, including:
- Preprocessing
- Matrix transposition
- Data alignment
- Memory-managemnet optimizations
- A clever algorithm for the base case that uses AVX intrinsic instructions explicitly
SelectGeometryInSameLayer_GH
Preface
Just as another regular work day, I memo down some new things along the task I am facing. Recently I have been crazy about the Voronoi diagram. Although as bad as I wanted to implement the sweepline algorithm, I found the best practice might be patient and learn thing little by little. Also, I am kinds into “Study with me” series on youtube. I feel clam with someone’s company alongside.
Reference
https://discourse.mcneel.com/t/c-select-objects-by-layer/84420/6
Appreciate the help from username Mahdiyar.
Two things are different
- select object in Rhino DocView
- select object in GH
1 | var ghobjs = new List<object>(); |
VoronoiDiagram_Lectures
Lecture Link
https://www.youtube.com/watch?v=Pn19deUBsM8&t=13s
The Post-Office Problem
https://link.springer.com/chapter/10.1007%2F3-540-10291-4_1
http://blog.ivank.net/voronoi-diagram-in-javascript.html
http://paperjs.org/examples/voronoi/
https://shaneosullivan.wordpress.com/2011/03/22/sweep-line-voronoi-algorithm-in-javascript/
https://jacquesheunis.com/post/fortunes-algorithm/
http://blog.ivank.net/fortunes-algorithm-and-implementation.html
https://github.com/pvigier/FortuneAlgorithm
https://pvigier.github.io/2018/11/18/fortune-algorithm-details.html
TakeAwayFromPvigier’s blog
https://pvigier.github.io/2018/11/18/fortune-algorithm-details.html
pseudo-code from Pvigier’s Blog
1 | add a site event in the event queue for each site |
- site event
- circle event
Diagram’s Data Structure
Double connected edge list(DCEL).
1 | class VoronoiDiagram{ |
The Site
class represents an input point. Each site has an index, which is useful to query them, coordinates and a pointer to their cell(face
):
1 | struct Site{ |
The vertices of the cells are represented by the Vertex
class, they just have a cooridinates field:
1 | struct Vertex{ |
Here is the implementation of half-edges
:
1 | struct HalfEdge{ |
An edge in a Voronoi diagram is shared by two adjacent cells. In the DCEL data structure, we splite these edfes in two half-edges, one for each cell, and they are linked by the twin
pointer.
Moreover, a hald-edge has an origin vertex and a desrination vertex.
The incidenetFace
field points to the face to which the half-edge belongs to.
Finally, in DCEL, cells are implemented as a circular doubly linked list of half-edges where adjacent half-edges are linked together. Thus the prev
and next
fields points to the previous and next half-edges in the cell.
Finally, the Face
class represents a cell and it just contains a pointer to its site and another to one of its half-edges. It does not matter which half-edge it is because a cell is a closed polygon. Thus we can access to all of its hald-edges by traversing the circular linked list.
1 | struct Face{ |
Event Queue
The standard way to implement the event queue is to use a priority queue.
During the processing of site and circle events we may need to remove circle events from the queue because they are not valid anymore.
But most of the standard implementations of priority queues do not allow to remove an element which is not the top one. In particular that is the case for std::priority_queue
.
There are two ways to tackle this problem.
- The first and simplest one is to add a
valid
flag totrue
initially. Then instead of removing the circle event from the queue, we just set its flag tofalse
. Finially, when we process the events in the main loop, if thevalid
flag of an event equals tofalse
, we simply discard it and process the next one. - The second method which I adopted is not to use
std::priority_queue
. Instead, I implemented my own priority queue which supports removal of any element it contains. The implementation of such a queue is pretty straightforward. I choose this method because I find it makes the code for the algorithm clearer.
Beachline
If not implemented correctly there is no guarantee that the algorithm will run in O(log(n)). The key to reach this time complecity is to use a self-balancing tree.
In my implementation the beachline is still a tree, but all nodes represent an arc.
1 | struct Arc{ |
In Fortune’s algorithm we have no key, we only know that we want to insert an arc before or after another in the beachline. To do that, I create methods insertBefore
and insertAfter
:
1 | void Beachline::insertBefore(Arc* x, Arc* y){ |
Priority Queues
https://www.youtube.com/watch?v=wptevk0bshY
What is a Priority Queue
A priority queue is an Abstract Data Type(ADT) that operates similar to a normal queue excpet that each element has a certain priority. The priority of the elements in priority queue determin the order in which elements are removed from the PQ.
NOTE: Priority queues only supports comparable data.
What is a Heap?
A heap is a tree based DS that satifies the heap invariant(also called heap property):
If A is a parent node of B then A is ordered with repect to B for all nodes A, B in the heap.
- Max Heap
- Min Heap
Heaps must be trees.
When and where is a PQ used?
- Used in certain implementaitons of Dijkstra’s shortest Path algorithm.
- Anytime you need the dynamically fetch the
next best
ornext worst
element - Used in Huffman coding (which is often used for lossless data compression).
- Best First Search(BFS) algorithms such as A* use PQs to continuously grab the next most promising node
- Used by minimum Spanning Tree(MST) algorithms
Complexity PQ with binary heap
- Binary Heap construction: O(n)
- Polling: O(log(n))
- Peeking: O(1)
- Adding: O(log(n))
Self balance Tree (AVL Tree)
https://www.youtube.com/watch?v=vRwi_UcZGjU
- Balance
- Rotation
- Implementation
Rebalancing threshold
pseudo-code from Jacquesheunis’ blog
https://jacquesheunis.com/post/fortunes-algorithm/
1 | Fill the event queue with site events for each input site. |
Voronoi Diagram - Computataional Geometry
Fortune’s Sweep
1 | VoronoiDiagram(P are in R2) |
Handling Events
- HandleSiteEvent(point p)
- Search in
T
for the arca
vertically abovep
. Ifa
has pointer to circle event inQ
, delete this event. - Split
a
intoa0
anda2
. Leta1
be the new arc ofp
- Add Vor-edges
<q, p>
and<p, q>
to DECL - Check
<., a0, a1>
and<a1, a2, .>
for circle events.
- Search in
- HandleCircleEvent(arc
a
)- T.delete(a); update breakpts
- Delete all circle events involving
a
fromQ
- Add Vor-vtx
a_left
anda_right
vscodeInBrower
Just want to a quick note for the cool thing I found we can do in github.
https://www.youtube.com/watch?v=ow1gea4FHMc
Basically, as in this video.
Now we can put
1 | vscode.dev/ + `any_githubRepo` |
then we can editting code on our browser