Speaker: 戴資力 Gordon Tai / VMFive / Data Engineer
Venue: 4F, International Conference Hall
Talk title: Building a Modern Data Architecture for Playable Ads Data at VMFive
VMFive has a story to tell about why and how we shifted to a modern data architecture for our ever-growing data. At VMFive, our primary source of data is AdPlay, the first product built on VMFive's proprietary virtualization technology, which allows mobile advertisements to be playable and interactive. This source is a stream of discrete, event-based, timestamped data arriving at extreme peak rates and collected from AdPlay users all around the globe. We soon recognized that our analysis requirements go far beyond what traditional databases can offer. We needed to query continuous user experience and product quality against these incoming discrete events in real time, while also leveraging accumulated data for a timely aggregated view and for building predictive models. These needs drove the data team at VMFive to embark on a journey to rebuild our data infrastructure. This journey includes a transformation from traditional databases to an on-premises cluster solution based on technologies such as Kafka, Storm, Spark, and Redis, as well as a more recent transformation to a fully cloud-based solution on AWS. Transforming an entire data infrastructure is never straightforward. We have accumulated many “do's” and “don'ts” along the way that we hope the audience can take away to make this process easier. This talk will walk through the key decisions, observations, difficulties, and results that we believe will be a useful reference for developers who are also interested in building a modern architecture from scratch for their organization's data.
Tzu-Li Tai (Gordon Tai) is an open-source software enthusiast who recently graduated after two years of dedicated research on the architectures of cluster computing systems, focusing mainly on Apache Hadoop and Apache Spark. He currently works as a data engineer at VMFive, building organization-wide processing systems that handle mobile industry data at scale.