Carpe Libertatem Mac OS

This blog reports on how Dropbox uses C++ for cross-platform iOS and Android development.

All major platforms and operating systems support C++, including server, desktop, embedded, and mobile platforms, and the *nix (including Android), Windows, Mac OS, and iOS OSes, among others. Developers can use C++ to create services with an API for Service-oriented Architecture systems and these services can be compiled and run on virtually any mainstream client and server platform. This approach has several advantages:

  1. Write most or all non-UI logic in C++ just once and deploy on multiple platforms.
    1. Use Qt (see 3.1) to write cross-platform UIs as well.
  2. There is a very large code base of C++ libraries (sample listing), many of them open-source, that implement a wide range of functionalities.
    1. Full, cross-platform support for standard I/O interfaces, such as file systems (Boost Filesystem), USB (libusb), sockets (Boost asio), REST (C++ REST SDK), TCP, SSL, UDP, HTTP, JSON, XML, STUN, SDP, and SocketIO (LibSourcey), parallel-processing (Boost MPI and Open MPI), interprocess communication (Boost Interprocess), and threading (Boost Thread).
      1. Note: the Boost Filesystem library reads and writes to file systems using an OS-agnostic hierarchy, so you need only write code to navigate directory structures once.
    2. The Boost libraries
    3. The Standard Template Library
    4. Library support for all major databases
    5. Cryptography, such as OpenSSL and Crypto++
    6. Most C libraries
  3. Many technologies beyond platforms and OSes are designed to interact with C++. For example,
    1. The Qt Project includes a C++ IDE and a full set of libraries for cross-platform development including UIs, cloud services, and a Webkit scripting language (QML). QML runs in an app's built-in browser and links directly to C++ code. The platforms supported include:
      1. Windows, Linux/X11, Mac OS X desktop platforms,
      2. Embedded Android, Embedded Linux, Windows Embedded (Compact and Standard) embedded platforms,
      3. QNX, VxWorks, and INTEGRITY Real-Time Operating Systems,
      4. Android and iOS mobile platforms,
      5. BlackBerry 10 and Sailfish OS, which also support Qt, and
      6. WinRT (including Windows Phone) and Tizen, for which Qt support is a work in progress.
    2. The Boost libraries contain a Python library that allows C++ code and Python to interoperate seamlessly.
    3. The vast majority of scripting and compiled languages allow the calling of an executable, such as one written in C++.
  4. There is a rich assortment of free, sophisticated C++ compilers, IDEs, and tools.
  5. C++ gives developers complete control over memory management, unlike Java. Objective-C offers this too, but it requires extra effort and is not highly portable outside of Apple systems.
  6. If designing a custom client-server interaction, C++ interface code can be written for both sides, which has the benefit of using shared code and shared assumptions.

Over the last 4 months I've been working on a good hash table implementation for OPIC (Object Persistence in C). During development I ran a lot of experiments, not only to get better performance but also to understand more deeply what happens inside a hash table. Many of the findings are surprising and inspiring. Now that the project is maturing, I'm taking a pause to start a hash table deep dive series. I had a lot of fun discovering these properties, and I hope you enjoy them as much as I did.

Same disclaimer: I now work at Google, and this project (OPIC, including the hash table implementation) is approved by the Google Invention Assignment Review Committee as my personal project. The work is done only in my spare time on my own machine, and does not use or reference any Google internal resources.

Background

The hash table is one of the most commonly used data structures. Most standard libraries use a chaining hash table, but there are more options in the wild. In contrast to chaining, open addressing does not create a linked list on a bucket with a collision; it inserts the item into another bucket instead. By inserting the item into a nearby bucket, open addressing gains better cache locality, and it has been shown to be faster in many benchmarks. The action of searching through candidate buckets for insertion, lookup, or deletion is known as probing. There are many probing strategies: linear probing, quadratic probing, double hashing, Robin Hood hashing, hopscotch hashing, and cuckoo hashing. This first post examines and analyzes the probe distribution among these strategies.

To write a good open addressing table, there are several factors to consider:

  1. Load: the load is the number of occupied buckets divided by the bucket capacity. The higher the load, the better the memory utilization. However, a higher load also means a higher probability of collision.
  2. Probe count: the number of probes is the number of lookups needed to reach the desired item. Cache efficiency aside, the lower the total probe count, the better the performance.
  3. CPU cache hits and page faults: we can count both cache hits and page faults analytically and from CPU counters. I'll write up such an analysis in a later post.
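The first two factors can be made concrete with a toy table. The following is a minimal sketch in Python, not OPIC's implementation: the class name `LinearProbingTable` and its methods are hypothetical, it stores keys only, and it uses Python's built-in `hash` with linear probing.

```python
class LinearProbingTable:
    """Toy open-addressing table storing keys only (illustrative, not OPIC's code).

    None marks an empty bucket, so None itself cannot be used as a key.
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.size = 0

    def load(self):
        # Load = occupied buckets / bucket capacity.
        return self.size / len(self.slots)

    def insert(self, key):
        """Insert key; return its probe count (0 means no collision)."""
        n = len(self.slots)
        home = hash(key) % n
        for i in range(n):
            idx = (home + i) % n  # linear probing: try the next bucket
            if self.slots[idx] is None:
                self.slots[idx] = key
                self.size += 1
                return i
            if self.slots[idx] == key:
                return i  # key already present
        raise RuntimeError("table is full")

    def lookup(self, key):
        """Return the probe count needed to find key, or None if absent."""
        n = len(self.slots)
        home = hash(key) % n
        for i in range(n):
            idx = (home + i) % n
            if self.slots[idx] is None:
                return None  # an empty bucket ends the probe sequence
            if self.slots[idx] == key:
                return i
        return None
```

Keys that share a home bucket pile up next to each other, so each further collision costs one more probe than the previous one.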

Linear probing, quadratic probing, and double hashing

Linear probing can be represented as a hash function of a key and a probe number: $h(k, i) = (h(k) + i) \bmod N$. Similarly, quadratic probing is usually written as $h(k, i) = (h(k) + i^2) \bmod N$. Double hashing is defined as $h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod N$.
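The three formulas translate directly into code. This is a small sketch with hypothetical function names; the hash values `h`, `h1`, and `h2` are passed in as plain integers rather than computed from a key. For double hashing, `h2` is assumed to be nonzero (and ideally coprime with `n`), otherwise the sequence would not move through the table.

```python
def linear_probe(h, i, n):
    """Bucket index for probe i: (h + i) mod n."""
    return (h + i) % n


def quadratic_probe(h, i, n):
    """Bucket index for probe i: (h + i^2) mod n."""
    return (h + i * i) % n


def double_hash_probe(h1, h2, i, n):
    """Bucket index for probe i: (h1 + i * h2) mod n; h2 assumed nonzero."""
    return (h1 + i * h2) % n


# First four probes for a key with home bucket 3 in a table of 16 buckets.
linear = [linear_probe(3, i, 16) for i in range(4)]
quadratic = [quadratic_probe(3, i, 16) for i in range(4)]
double = [double_hash_probe(3, 5, i, 16) for i in range(4)]
```

Linear probing stays in adjacent buckets, quadratic probing spreads out with growing steps, and double hashing jumps by a key-dependent stride from the first collision on.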

Quadratic probing is used by dense hash map. To my knowledge, this is the fastest hash map with wide adoption. Dense hash map sets the default maximum load to 50%, and its table capacity is restricted to powers of 2. Given a table of size $2^n$, inserting $2^{n-1} + 1$ items triggers a table expansion, after which the load is about 25%. We can therefore claim that if the user only inserts and queries items, the table load always stays between 25% and 50% once the table has expanded at least once.
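The 25% to 50% bound can be checked with a few lines of arithmetic. This sketch does not use dense hash map itself; it only models the policy described above (capacity doubles when an insert would push the load past the maximum), and `simulate_loads` is a hypothetical helper name.

```python
def simulate_loads(initial_capacity, inserts, max_load=0.5):
    """Return the load after each insert for a table that doubles when full.

    Models only the resize policy described in the text, not a real table.
    """
    capacity = initial_capacity
    loads = []
    for size in range(1, inserts + 1):
        if size > capacity * max_load:
            capacity *= 2  # expansion: same items, twice the buckets
        loads.append(size / capacity)
    return loads


loads = simulate_loads(initial_capacity=32, inserts=1000)
# The 17th insert (2^(n-1) + 1 with n = 5) triggers the first expansion.
after_first_expansion = loads[16:]
```

From the first expansion onward, every recorded load is above 25% and at most 50%.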

I implemented a generic hash table to simulate dense hash map's probing behavior. Its performance is identical to dense hash map. The major difference is that I allow table sizes that are not powers of 2; see my previous post for why the performance does not degrade.

I set up the test with 1M inserted items. Each test differs in its load (adjusted by changing the capacity) and its probing strategy. Although a hash table is O(1) for amortized lookup, we still hope the worst case is no larger than O(log(N)), which is $\log_2(1\text{M}) \approx 20$ in this case. Let's first look at linear probing, quadratic probing, and double hashing under 30%, 40%, and 50% load.
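A scaled-down version of this kind of measurement can be sketched as follows. This is not the benchmark code behind the article: `probe_histogram` and `next_prime` are hypothetical helpers, hash values are drawn from a seeded random generator to stand in for an ideal hash function, and the capacity is rounded up to a prime so that the quadratic and double hashing sequences are well behaved at loads up to 50%.

```python
import random
from collections import Counter


def next_prime(n):
    """Smallest prime >= n (trial division; fine for small tables)."""
    def is_prime(m):
        if m < 2:
            return False
        d = 2
        while d * d <= m:
            if m % d == 0:
                return False
            d += 1
        return True

    while not is_prime(n):
        n += 1
    return n


def probe_histogram(num_items, load, strategy, seed=0):
    """Insert num_items random hashes and count the probes each one needed."""
    rng = random.Random(seed)
    capacity = next_prime(int(num_items / load) + 1)
    occupied = [False] * capacity
    histogram = Counter()
    for _ in range(num_items):
        h1 = rng.randrange(capacity)
        h2 = rng.randrange(1, capacity)  # nonzero stride for double hashing
        for i in range(capacity):
            if strategy == "linear":
                idx = (h1 + i) % capacity
            elif strategy == "quadratic":
                idx = (h1 + i * i) % capacity
            elif strategy == "double":
                idx = (h1 + i * h2) % capacity
            else:
                raise ValueError(strategy)
            if not occupied[idx]:
                occupied[idx] = True
                histogram[i] += 1  # i = probe count of this item
                break
        else:
            raise RuntimeError("probe sequence found no empty bucket")
    return histogram
```

Calling `probe_histogram(100_000, 0.5, "linear")` and comparing `max(...)` over the returned keys against the other two strategies gives a rough feel for the tails discussed here, although the exact numbers depend on the seed and table size.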

The histogram of probe counts uses a log-scale Y axis. One can see that, other than for linear probing, most probes are below 15. Double hashing gives the smallest probe counts; however, each probe has a high probability of triggering a CPU cache miss, so it is slower in practice. Next, we look at these methods under high load.

The probe distribution now has a very high variance. Many probes exceed the threshold of 20, and some even reach 800. Linear probing has a very bad variance under high load compared with the other methods. Quadratic probing is slightly better, but still has some probes higher than 100. Double hashing still gives the best probe statistics. Below is a zoomed-in view for each probing strategy:

Robin Hood hashing to the rescue

The Robin Hood hashing heuristic is simple and clever. When a collision occurs, compare the probe counts of the two items: the one with the larger probe count stays and the other continues to probe. Repeat until the probing item finds an empty spot. For a more detailed analysis, check out the original paper. Using this heuristic, we can reduce the variance dramatically.

Linear probing now has a worst case no larger than 50, quadratic probing has a worst case no larger than 10, and double hashing has a worst case no larger than 5! Robin Hood hashing adds some extra cost on insertion and deletion, but if your table is read heavy, it's really suitable for the job.
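The swap rule can be sketched for the linear probing case. This is an illustration under assumptions, not OPIC's implementation: `robin_hood_insert` is a hypothetical function, each bucket holds a `(key, probe_count)` tuple or `None`, duplicate keys are not handled, and Python's built-in `hash` picks the home bucket.

```python
def robin_hood_insert(slots, key):
    """Insert key into slots using Robin Hood hashing on top of linear probing.

    slots is a list whose entries are None (empty) or (key, probe_count).
    """
    if None not in slots:
        raise RuntimeError("table is full")
    n = len(slots)
    idx = hash(key) % n
    probes = 0
    while True:
        entry = slots[idx]
        if entry is None:
            slots[idx] = (key, probes)
            return
        resident_key, resident_probes = entry
        if resident_probes < probes:
            # The item that has probed further keeps the bucket;
            # the displaced resident continues probing from here.
            slots[idx] = (key, probes)
            key, probes = resident_key, resident_probes
        idx = (idx + 1) % n
        probes += 1


slots = [None] * 8
for k in (1, 0, 8):  # 0 and 8 share home bucket 0; 1 lives in bucket 1
    robin_hood_insert(slots, k)
```

With plain linear probing, key 8 would end up two probes from home. Here it takes bucket 1 and pushes key 1 one bucket further, so the worst probe count in the table is 1 instead of 2. The total work is the same; it is only shared more evenly, which is where the lower variance comes from.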

Dive deep and ask why

From an engineering perspective, the statistics are sufficient to make design decisions and move on to the next steps (though hopscotch and cuckoo hashing were not tested). That's what I did 3 months ago. However, I could never stop asking why. How do we explain the differences? Can we model the distribution mathematically?

The analysis of linear probing can be traced back to Donald Knuth in 1963. (It was an unpublished memo dated July 22, 1963, with the annotation 'My first analysis of an algorithm, originally done during Summer 1962 in Madison'.) Later papers worth reading are:

Unfortunately, this research is super hard. Just linear probing (and its Robin Hood variant) is very challenging. Due to my poor survey ability, I have yet to find a good reference that explains why linear probing, quadratic probing, and double hashing differ in their probe distributions. Building a full distribution model is hard, but creating a simpler one to convince myself turns out to be not too hard.

Rich get richer

The main reason why linear probing (and probably quadratic probing) gets high probe counts is that the rich get richer: if you have a big chunk of elements, it is more likely to get hit; when it gets hit, the chunk grows, and it just gets worse.

Let's look at a simplified case. Say the hash table only has 5 items, and all the items are in one consecutive block. What is the expected probe number for the next inserted item?

See the linear probing example above. If the element gets inserted at bucket 1, it has to probe 5 times to reach the first empty bucket. (Here we start the probe sequence from index 0; probe number = 0 means you inserted into an empty spot without collision.) The expected probe number for the next inserted item is $\frac{5+4+3+2+1}{N} = \frac{15}{N}$.

For quadratic probing, you have to look at each item and track where it first probes outside of the block.

The expected probe number for the next item in quadratic probing is $\frac{3+2+2+2+1}{N} = \frac{10}{N}$. Double hashing is the easiest: $1\cdot\frac{5}{N}+2\cdot\left(\frac{5}{N}\right)^2+3\cdot\left(\frac{5}{N}\right)^3+\cdots$. If we only look at the first-order term (because $N \gg 5$), we can simplify it to $\frac{5}{N}$.

  • Linear probing: $\frac{15}{N}$
  • Quadratic probing: $\frac{10}{N}$
  • Double hashing: $\sum_{i=1}^{\infty} i\cdot\left(\frac{5}{N}\right)^i$
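These numerators can be reproduced by brute force. The sketch below assumes the same simplified setting: one block of consecutive occupied buckets, an otherwise empty table, and $N$ large enough that wrap-around can be ignored. The helper names are hypothetical, and the block is placed at buckets 0 to 4 for convenience.

```python
def probes_to_escape(start, block_size, step):
    """Probe count for a key whose home is `start` inside a block.

    The block occupies buckets 0..block_size-1; step(i) is the offset of
    probe i from the home bucket (i for linear, i*i for quadratic).
    """
    i = 0
    while 0 <= start + step(i) < block_size:
        i += 1
    return i


def expected_probe_numerator(block_size, step):
    """Sum of probe counts over all home buckets inside the block."""
    return sum(probes_to_escape(s, block_size, step) for s in range(block_size))


def double_hashing_series(block_size, n, terms=50):
    """Partial sum of sum_{i>=1} i * (block_size / n)^i."""
    x = block_size / n
    return sum(i * x**i for i in range(1, terms + 1))


linear_numerator = expected_probe_numerator(5, lambda i: i)
quadratic_numerator = expected_probe_numerator(5, lambda i: i * i)
double_value = double_hashing_series(5, 1000)
```

Dividing the first two numerators by $N$ gives the entries in the list. For the double hashing series, the geometric identity $\sum_{i \ge 1} i x^i = \frac{x}{(1-x)^2}$ shows that the value is close to $\frac{5}{N}$ when $N \gg 5$.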

The expected probe number of the next item shows that linear probing is worse than the other methods, but not by much. Next, let's look at the probability that the block grows.

To calculate the probability that the block grows on the next insert, we have to account for the two buckets adjacent to the block. For linear probing, the probability is $\frac{5+2}{N}$. For quadratic probing, we add the adjacent buckets, but we also have to remove the home buckets whose probe sequence jumps past them. For double hashing, the probability of growing the block has little to do with the size of the block, because you only need to consider the cases where the item lands in one of the 2 adjacent buckets.

  • Linear probing: $\frac{7}{N}$
  • Quadratic probing: $\frac{4}{N}$
  • Double hashing: $\frac{2}{N}\cdot\sum_{i=0}^{\infty}\left(\frac{5}{N}\right)^i = \frac{2}{N}\cdot\frac{N}{N-5} = \frac{2}{N-5}$
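The growth numerators can be checked the same way. This sketch again assumes a single block in an otherwise empty, large table; here the block occupies buckets 1 to `block_size`, so its two neighbours are buckets 0 and `block_size + 1`. The function names are hypothetical.

```python
def landing_bucket(home, block_size, step):
    """Bucket where a key with the given home finally lands."""
    i = 0
    while 1 <= home + step(i) <= block_size:
        i += 1
    return home + step(i)


def growth_numerator(block_size, step):
    """Count the home buckets whose key lands right next to the block."""
    neighbours = {0, block_size + 1}
    return sum(
        landing_bucket(home, block_size, step) in neighbours
        for home in range(0, block_size + 2)
    )


def double_hashing_growth(block_size, n, terms=60):
    """Partial sum of (2 / n) * sum_{i>=0} (block_size / n)^i."""
    x = block_size / n
    return (2 / n) * sum(x**i for i in range(terms))


linear_growth = growth_numerator(5, lambda i: i)
quadratic_growth = growth_numerator(5, lambda i: i * i)
double_growth = double_hashing_growth(5, 1000)
```

For quadratic probing, only the home buckets 0, 2, 5, and 6 lead to a neighbouring bucket; the other three jump past it, which is why the numerator drops from 7 to 4.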

Using the same calculation, but treating the block size as a variable, we can now visualize the block growth of linear probing, quadratic probing, and double hashing.

This is not a very formal analysis. However, it gives us a sense of why linear probing degrades so much faster than the others. We learn not only which one is better, but also how large the differences are.

How about the Robin Hood variants of these three probing methods? Unfortunately, I wasn't able to build a good model that can explain the differences. A formal analysis of Robin Hood hashing with linear probing was developed by Viola. I have yet to find a good analysis of Robin Hood applied to the other probing methods. If you find one, please leave a comment!

Conclusion

Writing a (chaining) hash table to pass an interview is trivial, but writing a good one turns out to be very hard. The key to writing high-performance software is to stop guessing.

Measure, measure, and measure. Program elapsed time is just one sample point, and it can be biased by many things. To understand a program's runtime performance, we need to look further at its internal statistics (like the probe distribution in this article), CPU cache misses, memory usage, page fault count, and so on. Capture the information and analyze it scientifically. This is the only way to push the program to its limit.

This is my first article in the 'Learn hash table the hard way' series. In the following posts I'll present more angles on examining hash table performance. Hope you enjoy it!




