September 6, 2019
We would like to inform you that this week we will perform maintenance on cLHy, one of our partner websites, to improve one of its core cAPIs. We have also been researching the implementation of a core cAPI for Resilient Distributed Datasets (RDDs) on cLHy.
You may already be familiar with RDDs. At its core, an RDD is an immutable, distributed collection of records (data elements or objects), also called a dataset. Each RDD is logically divided into partitions that are spread across the nodes of a cluster, so those partitions can be computed in parallel. We will use a low-level cLHy API (cAPI) for RDDs.
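As a rough illustration of partitioning and parallel computation, here is a small Python sketch. It is not cLHy's cAPI, whose interface this post does not show: `partition` and `compute_in_parallel` are hypothetical helpers, partitions are modeled as tuples to reflect immutability, and threads in a single process stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence, Tuple


def partition(records: Sequence[Any], num_partitions: int) -> Tuple[Tuple[Any, ...], ...]:
    """Split records into num_partitions contiguous, immutable (tuple) chunks."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    size, extra = divmod(len(records), num_partitions)
    parts = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional record each.
        end = start + size + (1 if i < extra else 0)
        parts.append(tuple(records[start:end]))
        start = end
    return tuple(parts)


def compute_in_parallel(
    parts: Sequence[Tuple[Any, ...]], func: Callable[[Tuple[Any, ...]], Any]
) -> List[Any]:
    """Apply func to every partition concurrently; each worker stands in for a node."""
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        return list(pool.map(func, parts))  # results keep partition order


parts = partition(list(range(10)), 3)
print(parts)                            # ((0, 1, 2, 3), (4, 5, 6), (7, 8, 9))
print(compute_in_parallel(parts, sum))  # [6, 15, 24]
```

Each partition is processed independently, which is what makes the per-partition work parallelizable; the caller only combines the per-partition results.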
The RDD cAPI can hold any type of Hyang object, including dynamically defined user classes, and may offer both transformations and actions. RDDs can be created through deterministic operations on either data in stable storage or other RDDs, for example by parallelizing an existing dataset or by referencing a dataset in an external storage system, such as a shared file system or HDFS (Hadoop Distributed File System).
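The next sketch models the two creation paths and the split between transformations and actions. It is a hypothetical Python toy, not the cAPI: the class `ToyRDD` is illustrative, its method names (`parallelize`, `map`, `filter`, `collect`, `count`) are borrowed from Apache Spark's RDD API, `from_text_file` reads a local file as a stand-in for external storage such as HDFS, and it assumes, as in Spark, that transformations are lazy and only actions trigger computation.

```python
from typing import Any, Callable, Iterable, List


class ToyRDD:
    """Hypothetical immutable dataset: transformations build a recipe, actions run it."""

    def __init__(self, source: Callable[[], Iterable[Any]]):
        # Deterministic function that (re)produces the records; it closes over
        # the parent dataset, so it also serves as the lineage.
        self._source = source

    @staticmethod
    def parallelize(data: Iterable[Any]) -> "ToyRDD":
        snapshot = tuple(data)  # freeze the input so later edits cannot leak in
        return ToyRDD(lambda: snapshot)

    @staticmethod
    def from_text_file(path: str) -> "ToyRDD":
        # Stand-in for referencing a dataset in external storage such as HDFS.
        def read_lines() -> List[str]:
            with open(path, encoding="utf-8") as f:
                return [line.rstrip("\n") for line in f]
        return ToyRDD(read_lines)

    # Transformations: return a new ToyRDD and compute nothing yet.
    def map(self, func: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(lambda: (func(x) for x in self._source()))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: evaluate the recipe and return a result to the caller.
    def collect(self) -> List[Any]:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


numbers = ToyRDD.parallelize([1, 2, 3, 4, 5])
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.collect())  # [4, 16]
print(even_squares.count())    # 2
```

Because each dataset is derived from its parent by a deterministic function, it can always be rebuilt from its source, which is the property the recomputation discussion below relies on.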
This maintenance will add support for either persisting RDDs on disk or replicating them across multiple nodes.
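A minimal sketch of the two persistence options, assuming partitions are plain Python lists. The helpers `persist_to_disk`, `load_partition`, and `place_replicas` are hypothetical, pickle files in a temporary directory stand in for node-local disk, and the node names are placeholders; how cLHy actually stores partitions or places replicas is not described in this post.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, List, Sequence


def persist_to_disk(partitions: Sequence[Sequence[Any]], directory: str) -> List[str]:
    """Write each partition to its own file so it outlives process memory."""
    paths = []
    for i, part in enumerate(partitions):
        path = os.path.join(directory, f"part-{i:05d}.pkl")
        with open(path, "wb") as f:
            pickle.dump(list(part), f)
        paths.append(path)
    return paths


def load_partition(path: str) -> List[Any]:
    """Read one persisted partition back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def place_replicas(num_partitions: int, nodes: Sequence[str], replicas: int) -> Dict[int, List[str]]:
    """Assign each partition to `replicas` distinct nodes, round-robin."""
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    return {
        p: [nodes[(p + k) % len(nodes)] for k in range(replicas)]
        for p in range(num_partitions)
    }


with tempfile.TemporaryDirectory() as tmp:
    paths = persist_to_disk([[1, 2], [3, 4], [5]], tmp)
    print(load_partition(paths[2]))  # [5]

print(place_replicas(3, ["node-a", "node-b", "node-c"], replicas=2))
# {0: ['node-a', 'node-b'], 1: ['node-b', 'node-c'], 2: ['node-c', 'node-a']}
```

Round-robin is simply the easiest rule that keeps every copy of a partition on a different node, so losing any single node still leaves one replica available.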
By default, an RDD may be recomputed every time another transformation or action uses it. However, an RDD can also be persisted (cached) in memory, in which case cLHy keeps its elements on the cluster, so the next query on it executes faster and its data can be accessed much more quickly.
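The recompute-versus-cache trade-off can be sketched as a toy Python model. This is not cLHy's cache: `LineageDataset` and `CountingSource` are hypothetical, `persist` and `unpersist` borrow their names from Apache Spark's RDD API, and a tuple held in one process stands in for memory spread across the cluster. The counter shows how often the underlying data is rebuilt.

```python
from typing import Any, Callable, List, Optional, Sequence


class CountingSource:
    """Deterministic record source that counts how often it is evaluated."""

    def __init__(self, n: int):
        self.n = n
        self.calls = 0

    def __call__(self) -> List[int]:
        self.calls += 1
        return [x * x for x in range(self.n)]


class LineageDataset:
    """Hypothetical dataset that is recomputed from its source unless persisted."""

    def __init__(self, compute: Callable[[], Sequence[Any]]):
        self._compute = compute
        self._persisted = False
        self._cache: Optional[tuple] = None

    def persist(self) -> "LineageDataset":
        # Mark for in-memory caching; records are materialized on first use.
        self._persisted = True
        return self

    def unpersist(self) -> None:
        self._persisted = False
        self._cache = None

    def _records(self) -> Sequence[Any]:
        if not self._persisted:
            return self._compute()  # no cache: recompute on every action
        if self._cache is None:
            self._cache = tuple(self._compute())
        return self._cache

    # Two example actions.
    def count(self) -> int:
        return len(self._records())

    def total(self) -> Any:
        return sum(self._records())


plain_src = CountingSource(1_000)
plain = LineageDataset(plain_src)
plain.count()
plain.total()
print(plain_src.calls)   # 2: recomputed for each action

cached_src = CountingSource(1_000)
cached = LineageDataset(cached_src).persist()
cached.count()
cached.total()
print(cached_src.calls)  # 1: computed once, then served from memory
```

Caching trades memory for time: it pays off when the same dataset is queried repeatedly, and `unpersist` releases the memory once those repeated queries are done.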
The illustration above shows an example of interactive operations on a cLHy RDD during a single processing run. To improve execution time, any dataset that is queried repeatedly by different calls can be cached in memory.
This maintenance involves no downtime, but your internet connection may be slow for a few seconds. We apologize for any inconvenience.