HDE BLOG

Code and servers alike, up in the clouds

38th Monthly Technical Session (MTS) Report

The 38th Monthly Technical Session (MTS) was held on September 22nd, 2017. MTS is a knowledge-sharing event in which HDE members present topics and hold Q&A sessions, all in English.

The moderator of the 38th MTS was Kevin-san.

The first topic was "Introduction to AWS SAM" by Bagus. AWS SAM stands for AWS Serverless Application Model. As the name implies, it is a model used to define serverless applications on AWS. Serverless applications are applications composed of functions triggered by events.

AWS SAM is based on AWS CloudFormation, which is a service that allows users to manage AWS resources. AWS CloudFormation uses templates as blueprints for creating AWS resources. It manages related AWS resources as a single unit called a stack. With AWS SAM, a serverless application is defined in an AWS CloudFormation template and deployed as an AWS CloudFormation stack.

AWS SAM builds upon AWS CloudFormation to provide simpler ways to create the AWS resources that serverless applications need. It provides new resource types, event source types, and property types. Because it is based on AWS CloudFormation, AWS SAM should work well with any serverless application framework that supports AWS CloudFormation. Another advantage of using AWS SAM is AWS SAM Local, a command-line tool from AWS for managing serverless applications written with AWS SAM. One of its main features is the ability to test AWS Lambda functions locally.

The second topic was "Elasticsearch: You Know, for Search" by Bumi-san. Elasticsearch is a distributed, RESTful search and analytics engine. It can be used for a broad variety of use cases, from simple keyword search to log aggregation and geolocation queries.

Elasticsearch is highly scalable and highly available, and it serves as an all-in-one toolbox. It runs well both on a laptop and on a cluster of hundreds of servers handling petabytes of data. According to Bumi-san, it achieves its availability through automatic recovery and data replication. It also comes with features such as aggregations, suggestions, and, in the latest version, machine learning.

Elasticsearch provides full text search, and this feature is built on solid text analysis capabilities. Elasticsearch has no shortage of text analysis tools, such as analyzers, tokenizers, stemmers, and more. It also handles stopwords, synonyms, and misspellings.

The third topic was "PyCon APAC 2017" by Doi-san and Yuri-san. The event had two keynotes. The first keynote was about Python's impact on the business world. The second keynote was about the Python community. It focused on several aspects, one of which was teaching people to program.

In total, there were 29 sessions from 27 speakers. A good number of these sessions were about currently trending topics, such as artificial intelligence, machine learning, data analytics, and big data. There were also several sessions about the Python community. Based on these sessions, Doi-san concluded that Python 3 is still underutilized.

HDE was a proud sponsor of the event. We set up a booth and interacted with the attendees. Most of the nearly 200 attendees were working professionals. Interestingly, almost half of the attendees came from overseas.

The fourth topic was "Introduction to Landscape" by Kusumoto-san. Landscape is a tool to deploy, monitor, and manage Ubuntu servers. It is available both on-premises and as software as a service (SaaS). Landscape On-premises is free (for up to 10 machines), while Landscape SaaS is a paid service.

Landscape provides quite a lot of features, such as systems management, monitoring, security and compliance maintenance, inventory control, and package repository management. Kusumoto-san explained how to install the Landscape client and demonstrated each of the aforementioned features.

The fifth topic was "Join the Dark Side (What Is Metaprogramming)" by Stefan-san. He is one of our Global Internship Program (GIP) participants. Metaprogramming is a programming technique in which computer programs have the ability to treat programs as their data. Consequently, a program can be designed to generate, read, or transform other programs. Metaprogramming can even allow a program to modify itself during runtime.

Metaprogramming is done in different ways across multiple programming languages. Stefan-san explained how metaprogramming is done in Python, Java, Go, and Ruby. He had actually used metaprogramming before, in Ruby. He mentioned some use cases of metaprogramming, one of which is 'redirecting' non-existent functions to existing ones.
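The 'redirecting' use case can be illustrated with a minimal Python sketch. This is not code from the talk: the class name `Greeter` and the alias table `_ALIASES` are hypothetical. It relies on the fact that Python calls `__getattr__` only when normal attribute lookup fails, which makes it a natural hook for forwarding non-existent methods to existing ones.

```python
class Greeter:
    """Forwards calls to undefined method names to existing methods."""

    # Hypothetical alias table: missing name -> existing method name.
    _ALIASES = {"hi": "hello", "greet": "hello"}

    def hello(self, name):
        return f"Hello, {name}!"

    def __getattr__(self, attr):
        # Python calls __getattr__ only after normal lookup has failed,
        # so existing attributes such as `hello` never reach this code.
        target = type(self)._ALIASES.get(attr)
        if target is None:
            raise AttributeError(
                f"{type(self).__name__} has no attribute {attr!r}"
            )
        return getattr(self, target)


greeter = Greeter()
print(greeter.hi("MTS"))  # resolved through __getattr__ to hello()
```

Names that are neither defined nor listed in the alias table still raise `AttributeError`, so typos are not silently swallowed.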

The sixth topic was "Why I Love React" by Elvan-san. He is also one of our GIP participants. React is a JavaScript library for building user interfaces. At a glance, React covers the view part of web applications, encourages declarative user interfaces, and drives component-based development. React has some unique aspects, such as components, JSX, and the virtual DOM.

Elvan-san explained the aspects of React he likes the most. Moving between projects is easy because the core React concepts are shared. A large number of ready-made React components are available. React also has hot reloading. But perhaps more than all of these, he likes that React enables maintainability and scalability. In other words, making changes and adding features are both simple thanks to its straightforward architecture.

The seventh topic was "Startup Incubators in USA: Chasing the American Dream!" by Stephen-san. He is also one of our GIP participants. Startup incubators are companies that help startups by providing services such as management training or office space. Through incubators, startups gain 'seed' funding, advice, mentorship, networking opportunities, a strong community, and friendships, among other things.

Stephen-san told the story of his participation in a startup incubator program at his university. Non-students could participate as well, and in total there were 9 startups. The program was 10 weeks long, during which they had activities such as visiting venture capital firms and investors. He also introduced his startup from the program, including the members and the product.

As usual, we had a party afterwards :)

37th Monthly Technical Session (MTS) Report

The 37th Monthly Technical Session (MTS) was held on August 25th, 2017. MTS is a knowledge-sharing event in which HDE members present topics and hold Q&A sessions, all in English.

The moderator of the 37th MTS was David-san.

The first topic was "AWS Service Update Summary 2017 Q2+ (April - August)" by Mitsuharu Hamba-san from AWS. He began by sharing information on new regions. A new AWS Region in Paris will be opened in 2017. Another new AWS Region in Stockholm will be opened in 2018. AWS will also open a new region in Osaka in 2018. According to Hamba-san, Osaka will be a local region, intended to be used in combination with the Tokyo region.

Some existing services were also made available in the Tokyo region over the last 4 months. First is Amazon EC2 P2 instances, which are ideal for compute-intensive applications that require high-performance GPU coprocessors and massively parallel floating-point performance. Next is Amazon Lightsail, which simplifies launching and managing virtual private servers. AWS X-Ray, meanwhile, moved from preview to general availability.

There were many updates to existing services as well. Detailed, well-written information on them is readily available on the AWS Blog.

The second topic was an explanation of the data collection flow of our company's data warehouse project by Kogure-san. The project he has been working on is very important in enabling data visualization and environment analysis. It consists of several steps, but in this session he focused on explaining the first one, which is building the data collection flows.

This data warehouse project collects HDE One services data. Each HDE One service has its own data collection flow. Kogure-san explained all of the completed ones. He showed the architecture and reported the number of records for each data collection flow. Now that Amazon Kinesis Firehose is available in the Tokyo region, he would like to use it to improve his current designs.

The third topic was "How to Find DMARC Failure" by Okubo-san. Okubo-san has been teaching us about DMARC for some time now, presenting topics about it at the 26th and 27th MTS. He also wrote an article about it on this blog. This time, he explained a project he's working on that monitors and visualizes DMARC reports.

Brief review: DMARC is an email authentication, policy, and reporting protocol. Its authentication is based on SPF and DKIM. ISPs provide DMARC reports for email senders. If a DMARC record includes the rua parameter, then DMARC reports can be received via email. However, DMARC reports are XML files, so they are not exactly human-readable. Okubo-san's project summarizes and presents DMARC reports so that they are easier to understand. Its features include identifying DKIM and SPF results and showing WHOIS information.
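To give a feel for what summarizing such a report involves, here is a minimal Python sketch. It is not Okubo-san's implementation. The element names follow the DMARC aggregate report format, but the sample report is hand-written, and its addresses and counts are made up for illustration. The helper names `summarize` and `failures` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of a DMARC aggregate report;
# the addresses and counts are made up for illustration.
SAMPLE_REPORT = """\
<feedback>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>12</count>
      <policy_evaluated>
        <disposition>none</disposition>
        <dkim>pass</dkim>
        <spf>pass</spf>
      </policy_evaluated>
    </row>
  </record>
  <record>
    <row>
      <source_ip>198.51.100.7</source_ip>
      <count>3</count>
      <policy_evaluated>
        <disposition>none</disposition>
        <dkim>fail</dkim>
        <spf>fail</spf>
      </policy_evaluated>
    </row>
  </record>
</feedback>
"""


def summarize(report_xml):
    """Return one dict per <record> with the source IP, count, and results."""
    root = ET.fromstring(report_xml)
    rows = []
    for record in root.iter("record"):
        row = record.find("row")
        evaluated = row.find("policy_evaluated")
        rows.append({
            "source_ip": row.findtext("source_ip"),
            "count": int(row.findtext("count")),
            "dkim": evaluated.findtext("dkim"),
            "spf": evaluated.findtext("spf"),
        })
    return rows


def failures(rows):
    """Keep records where both DKIM and SPF failed, so DMARC cannot pass."""
    return [r for r in rows if r["dkim"] != "pass" and r["spf"] != "pass"]


for row in failures(summarize(SAMPLE_REPORT)):
    print(row)
```

A real tool would also read the report metadata and the published policy, and would look up WHOIS information for each source IP. Those parts are left out here.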

The fourth topic was "Let's Warm up IP Addresses" by Matsuura-san. IP reputation is very important for message transfer agents (MTAs). Sending email, especially in large volumes, from a brand-new IP address is risky. A sudden spike in the volume of email sent from an IP address may give it a bad reputation. Emails sent from an IP address with a bad reputation will likely fail to reach their destinations.

One way to solve this is to warm up IP addresses, that is, to gradually increase the volume of email sent from them according to a predetermined schedule. Matsuura-san explained some problems he ran into when warming up IP addresses for use with Office 365. The first problem is receiving limits, which cap the number of emails that can be received per hour. The limits are considerably lower than his target, so he used shared mailboxes to receive more emails. The second problem is the opacity of Office 365's IP throttling methods. He dealt with this by warming up the IP addresses over several weeks and pausing the increase in volume for a few days whenever an IP address was throttled.
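The idea of a warm-up schedule that pauses after throttling can be sketched in a few lines of Python. All numbers and names below (`warmup_schedule`, the growth factor, the hold period) are illustrative assumptions, not values from the talk.

```python
def warmup_schedule(start, target, growth=2.0, throttled_days=(), hold_days=3):
    """Return the planned daily send volume until `target` is reached.

    All numbers are illustrative assumptions, not values from the talk.
    The volume grows by `growth` each day; when a day is listed in
    `throttled_days`, growth pauses for the next `hold_days` days.
    """
    if start <= 0 or growth <= 1:
        raise ValueError("start must be positive and growth greater than 1")
    volumes = []
    volume = start
    hold = 0
    day = 0
    while True:
        volumes.append(min(int(volume), target))
        if volumes[-1] >= target:
            return volumes
        if day in throttled_days:
            hold = hold_days
        if hold > 0:
            hold -= 1  # keep the same volume while holding
        else:
            volume *= growth
        day += 1


print(warmup_schedule(100, 1000))
print(warmup_schedule(100, 1000, throttled_days=(1,), hold_days=2))
```

In practice, the throttled days would come from monitoring delivery results rather than being known in advance.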

The fifth topic was "3 Interesting Facts about Filipinos" by Furukawa-san. She had been learning English in Cebu, Philippines, for almost 4 months. She told us some facts that she found interesting. First, Filipinos seem to love karaoke. She saw people singing while walking on the streets, and restaurant and shop staff singing while working. Judging from the pictures she showed, there are many karaoke shops and machines in the Philippines.

Second, Filipinos seem to love taking selfies. When Furukawa-san went to tourist attractions with her Filipino teachers, they spent 1 to 2 hours taking selfies. TIME magazine named Makati City, Philippines, the Selfie Capital of the World.

Third, Filipinos celebrate Christmas for 4 months. The Christmas season runs from September to December, the so-called 'Ber Months'. Most Filipinos are Christians, and Filipinos seem to love celebrations. When September comes, Filipinos decorate their houses and shops, and Christmas songs can be heard in many places.

The sixth topic was "Auditing with Lighthouse" by Kevin-san. Lighthouse is an open-source, automated tool for improving the quality of web pages. It analyzes web apps and web pages, collecting modern performance metrics and insights on developer best practices. It has been a part of Chrome DevTools since Chrome 60. So perhaps the simplest way to check it out is to update to Chrome 60 and open the Audits panel in Chrome DevTools.

Lighthouse scores 4 categories: Progressive Web App, Performance, Accessibility, and Best Practices. The Progressive Web App audit checks whether a site or app is interactive online. The Performance audit reloads a site or app under the new 'Slow 3G' network throttle. The Accessibility audit checks for ARIA roles and uses aXe. The Best Practices audit checks for manifest.json files, passive event listeners, and so on.

The seventh topic was "EuroPython 2017" by Jonas-san. EuroPython 2017 was held from July 9th to 16th, 2017, in Rimini, Italy. It was organized by the EuroPython Society with the help of 25 sponsors and attended by more than 1000 people. In Jonas' opinion, compared to PyCon US, EuroPython is a lot smaller, a lot less commercial, and a lot less professional, but a lot more enjoyable.

Some of the talks that Jonas found interesting include "A Python for Future Generations" by Armin Ronacher; "Making Games with Python: Mission Impossible?" by Tomislav Uzelac, Martin Christen, and Roberto De Ioris; "Fighting the Controls: Tragedy and Madness for Programmers and Pilots" by Daniele Procida; "PyPy Meets Python 3 and NumPy" by Armin Rigo; and "The Encounter: Python's Adventures in Africa" by Daniele Procida and Aisha Bello. Jonas himself gave a talk titled "Why You Might Want to Go Async". Besides talks, EuroPython also had social events, sprints (people gathering in a room to develop anything they like), and a hallway track.

As usual, we had a party afterwards :)

Attending de:code 2017 in Tokyo / de:code 2017 Attendance Report

Following the big Microsoft Build 2017 conference in Seattle, Washington, in the US at the beginning of May (my colleagues have written a report on Build), Microsoft held another conference about two weeks later: de:code 2017, at The Prince Park Tower Tokyo in Japan, for 2 days from May 23 to May 24, 2017. I would like to share my experience of attending it.

de:code is an annual conference held by Microsoft and, like Build, is aimed at developers. Unlike Build, however, it focuses more on the Japanese market and Japanese developer communities, and this focus is reflected in its content. In this report, I would therefore like to focus on the differences between Build and de:code. There were a variety of sessions running in up to 8 parallel tracks. Details can be found on the official de:code blog, and videos are available on Channel 9.

36th Monthly Technical Session (MTS) Report

The 36th Monthly Technical Session (MTS) was held on July 21st, 2017. MTS is a knowledge-sharing event in which HDE members present topics and hold Q&A sessions, all in English.

The moderator of the 36th MTS was Iskandar-san.

The first topic was "Machine Learning: Intuition" by Nutt-san. He mainly focused on supervised learning. Supervised learning has two phases: training and testing. Given input-output pairs, a good mapping from input to output is identified in the training phase. This mapping is then used to predict the outputs for new inputs in the testing phase. A predictor should have the smallest error possible on test data (not training data).

Nutt-san also emphasized that supervised learning works on the basis of correlation, not causation. A predictor correlates input to output without knowing about causation, so we have to select input features carefully.

Nutt-san also explained the difference between deductive reasoning and inductive reasoning. To put it simply, in deductive reasoning, a conclusion is reached by applying general rules. On the other hand, in inductive reasoning, a conclusion is reached by extrapolating from specific cases. Deductive reasoning is always correct, while inductive reasoning is not always correct. Machine learning is a kind of inductive reasoning. In relation to this, he reminded us that no algorithm works best for all supervised learning problems.

The second topic was "Spurious" by Fukutomi-san. Threads are utilized quite extensively in a project he was working on. Threads are usually executed concurrently and share resources. Sometimes, multiple threads accessing the same resources is not preferable due to concurrency issues.

In Java, one way to solve this is to synchronize threads. Another way is to utilize guarded blocks, which involves methods such as wait() and notify(). Unfortunately, there was a problem when Fukutomi-san was working with guarded blocks. It turned out that a thread can also wake up without being notified, interrupted, or timing out. This is called a spurious wakeup. He worked around this limitation by utilizing a true_wakeup flag.
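The usual guard against spurious wakeups is to re-check a flag in a loop around the wait. The sketch below shows that pattern in Python with `threading.Condition` rather than in Java, and it is not Fukutomi-san's code. The `Mailbox` class is hypothetical, and its `_ready` attribute plays the role of the true_wakeup flag.

```python
import threading


class Mailbox:
    """A one-slot mailbox guarded by a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ready = False  # plays the role of a "true wakeup" flag
        self._item = None

    def put(self, item):
        with self._cond:
            self._item = item
            self._ready = True  # set the flag before notifying
            self._cond.notify()

    def take(self):
        with self._cond:
            # Re-check the flag after every wakeup: if the thread wakes
            # without a matching put(), the loop simply waits again.
            while not self._ready:
                self._cond.wait()
            self._ready = False
            return self._item


mailbox = Mailbox()
results = []
consumer = threading.Thread(target=lambda: results.append(mailbox.take()))
consumer.start()
mailbox.put("hello")
consumer.join()
print(results)
```

The Java equivalent has the same shape: a `while` loop that tests the flag around `wait()` inside a synchronized block, with the flag set before `notify()` is called.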

The third topic was an explanation of a new component of an HDE service by Ogawa-san. He began by explaining the role of the new component in the HDE service. Then, he explained the technologies involved in the development of the new component. He developed the component using C++14 and Windows API, and he developed the installer program using C# 7 and .NET Framework 4.6.

Ogawa-san had to use C++ due to the component's relationship with Windows' Local Security Authority Subsystem Service (LSASS). High-level features cannot be used in core operating system processes such as LSASS. In his opinion, reporting events to the Event Viewer from C++ code is not ideal. He also explained his approaches to unit testing and continuous integration.

The fourth topic was "Security Assessment with Amazon Inspector" by Jeffrey-san. Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS.

Jeffrey-san explained how to use the service. First, Amazon Inspector AWS Agents are installed on the target Amazon EC2 instances. Second, Amazon Inspector Assessment Targets, which are collections of EC2 instances to be scanned, are defined. Third, Amazon Inspector Assessment Templates, which define the standardized tests to be applied to the assessment targets, are created. Finally, the assessment is run.

Amazon Inspector handles the analysis and even generates its reports. Alternatively, findings from Amazon Inspector can also be retrieved via APIs. This allows users to generate and format their own summarized or detailed reports.

Some pros of using Amazon Inspector are that it is native to AWS, is low in cost (30 cents per AWS Agent per assessment), and is a good option for analyzing infrastructure vulnerabilities. Some cons are that it is limited to EC2 instances and that some benchmarks only work for certain operating systems.

The fifth topic was "But Will It Compile in Space!?" by Ignaty-san. He was one of our Global Internship Program (GIP) participants. This topic was a look at the effects of space radiation on electronics. There are several major radiation sources in space, such as the solar wind, the Van Allen belts, changes in solar weather, and cosmic rays.

Radiation in space is much harsher than radiation on Earth. At such harsh levels, radiation can cause several kinds of damage to electronics. It can induce single-event effects which result in data degradation, calculation or logic errors, and any number of malfunctions. It can also cause gradual component degradation, which results in certain components failing entirely.

There are some ways to mitigate the effects of space radiation on electronics. The classic solution is radiation hardening. This essentially means components are made from more durable materials, which is expensive. Other solutions include avoiding radiation belts, shielding electronic components, designing fault-tolerant software, and using redundant components.

The sixth topic was "The Sweets and Bitters of React Native" by Rachel-san. She was also one of our GIP participants. React Native is a framework for building native apps using React. The motivation behind it is the desire to write mobile apps with the same logic as web apps while achieving native behavior and without sacrificing performance. It reuses React logic in app development, acts as a bridge to native APIs, and executes JavaScript on a background thread.

Some pros of using React Native are that it is easy for web developers to pick up, provides shared logic and a shared code base for iOS and Android, gets rid of heavy IDEs, provides hot reloading, and is easy to combine with native code. Some cons are that it requires knowledge of the native mobile platforms, relies on third-party libraries and documentation, has frequent release cycles, and has many ongoing problems due to its relative immaturity.

The seventh topic was "TensorFlow - Machine Learning without PhD" by Dovile-san. She was also one of our GIP participants. TensorFlow is an open-source software library for machine intelligence. TensorFlow offers lots of speed with less computing power, uses data flow graphs for numerical computations, and provides APIs for Java, C++, Python, and Go. Other TensorFlow-related features include TensorBoard for visualization and TensorFlow Research Cloud for computational resources.

Dovile-san demonstrated the usage of TensorFlow to build artificial neural networks. Given the MNIST database of handwritten digits, the task is to train a model to look at images and predict what digits they are. Using TensorFlow, she defined the number and shape of the layers of the neural network. She also specified the learning rule and error measure calculation.

As usual, we had a party afterwards :)

builderscon tokyo 2017 Attendance Report (Sessions / The Coffee Cup Backstory / Day-of Staff / Social Gathering) #builderscon

Several of our engineers took part in builderscon tokyo 2017, held over three days from 2017/08/03 to 2017/08/05, either on sponsor tickets or as day-of staff.

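The following report, pooled from four of the attendees, covers:

  • Reports on sessions that caught our attention
    • The Evolution of PHP at Slack HQ
    • 真のコンポーネント粒度を求めて (In Search of True Component Granularity)
    • Factory Class
  • The coffee cup backstory
  • Day-of staff
  • Social gathering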

35th Monthly Technical Session (MTS) Report

The 35th Monthly Technical Session (MTS) was held on June 16th, 2017. MTS is a knowledge-sharing event in which HDE members present topics and hold Q&A sessions, all in English.

The moderator of the 35th MTS was Shihan-san.

The first topic was "Introduction of React from Angular User" by Shinohara-san. He had been using Angular most of the time, but his most recent project required him to use React instead. He mentioned that Angular is a framework, while React is a library. Despite this difference, the two are often compared. Shinohara-san compared the two of them with the help of TodoMVC. He also explained some features of React, such as JSX, props, and state.

The second topic was "Statistics Analysis Framework 'ROOT'" by Kusumoto-san. ROOT is a data analysis framework developed by CERN. It deals with big data processing, statistical analysis, visualization, and storage.

Kusumoto-san had used ROOT several years ago for research at university. He explained some of ROOT's features, such as visualizing data as histograms and trees. He also demonstrated the various types of histograms that ROOT provides.

The third topic was "Stylish Python" by Jonas-san. He suggested several ideas about project structure, code style, and best practices.

Regarding project structure, one of Jonas' ideas was to put code under a src/ directory. Doing so prevents imports from the root directory and requires developers to have a functioning setup.py in order to work locally. The latter helps find packaging bugs and makes using entry points easier.

Regarding code style, Jonas proposed many ideas based on PEP 8. He covered indentation, line length, imports, naming, class method order, literals, type hints, and function definitions.

Regarding best practices, Jonas gave recommendations on writing setup.py, defining dependencies, and using iterators, enums, and sentinels. He also recommended several third-party libraries, such as pytest and attrs.
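As one example of these practices, a sentinel is a unique object used to tell "no value was given" apart from a legitimate value such as None. The sketch below shows the general idiom and is not code from the talk. The names `_MISSING` and `get_setting` are hypothetical.

```python
_MISSING = object()  # unique sentinel: no caller can pass this by accident


def get_setting(settings, key, default=_MISSING):
    """Look up `key`, treating None as a legitimate stored value."""
    value = settings.get(key, _MISSING)
    if value is not _MISSING:
        return value
    if default is _MISSING:
        raise KeyError(key)
    return default


print(get_setting({"proxy": None}, "proxy"))    # the stored None is returned
print(get_setting({}, "proxy", default="off"))  # falls back to the default
```

Comparing with `is` rather than `==` matters here, because the check is on the sentinel's identity and not on its value.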

The fourth topic was a report on Open Source Summit Japan 2017 by Xudong-san. Open Source Summit Japan is an annual conference held by The Linux Foundation. This year's event was a combination of LinuxCon, ContainerCon, and CloudOpen. Every day, there were around 10 time slots with 4 concurrent sessions in each time slot. The number of attendees was about 600.

Xudong-san attended sessions about Kubernetes, AArch64 architecture support for servers, container security, and non-root containers / user namespace containers, as well as Red Hat's and Microsoft's product promotion sessions, among others. Attending Open Source Summit Japan 2017 made him realize several things. First is the growing interest in and support for Kubernetes, seen in services and products such as Google Cloud Platform, Microsoft Azure, and Red Hat OpenShift. Second is the fact that container security is still a big issue.

The fifth topic was "Our Culture is Our Brand" by Kenny-san. This topic came from his participation in Customer Experience Management 2017. It was held in Sydney, and more than 70 companies took part, including Google, Microsoft, and Australia Post.

In Kenny-san's opinion, 'culture' is the most important concept out of all the concepts that were frequently discussed in the event. Culture is important in achieving customer success and engagement. Both are important in our effort to become a world-class IT company.

Kenny-san defined culture as things that connect us, things that we share even as we continue to grow. These things are belief, purpose, value, and passion. The more things we share, the stronger our culture is. In a company context, culture helps us to get and keep the best people, helps us to build teamwork and improve performance, gives us competitive advantage, and gives us meaning, passion, and love for our work.

These far-reaching effects of culture led Kenny-san to believe that our culture is our brand. Therefore, we have to develop good culture. It consists of strong belief, clear vision, good value, good environment, and good behavior, among others. Good culture leads to employee engagement. In turn, employee engagement leads to customer success and engagement.

The sixth topic was "Cloud Gaming" by Kelvin-san. He was one of our Global Internship Program (GIP) participants. He began by introducing the concept of cloud gaming, which follows a client-server model. The client sends user commands, the server does all the processing, and the client receives the resulting video stream.

Kelvin-san described his own cloud gaming setup, which utilizes an Amazon EC2 g2.2xlarge instance and Parsec. According to him, such a setup is quite expensive, as it costs him 80 cents per hour. He also mentioned other commercial cloud gaming solutions, such as PlayStation Now, GeForce NOW, and LiquidSky.

Kelvin-san also explained the pros and cons of cloud gaming. Some of its pros are that it is essentially Gaming as a Service (GaaS), developers don't have to worry about DRM, developers have full control over software and hardware, and customers enjoy high availability and low setup time. Some of its cons are its dependence on an internet connection, the fact that the service is a single point of failure, the impossibility for consumers to mod games, and the fact that consumers don't really own their games.

The seventh topic was "Intro to Apache Spark" by Weiting-san. She was also one of our Global Internship Program participants. Apache Spark is an open-source cluster computing system. It is often used as an engine for large-scale data processing. Some benefits of Spark are its speed (up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk), its APIs (for Java, Scala, Python, and R), its libraries (SQL and DataFrames, Spark Streaming, MLlib, and GraphX), and its ability to run everywhere.

Weiting-san also explained some of the main concepts of Spark. Resilient Distributed Datasets (RDDs) are fault-tolerant collections of elements that can be operated on in parallel. They are read-only and distributed over a cluster of machines. RDDs support two types of operations, which are transformations and actions. Transformations create a new dataset from an existing one. Actions return a value to the driver program after running a computation on the dataset. Working with RDDs generally involves creating an RDD from a data source, applying transformations (e.g. map) to an RDD, and applying actions (e.g. reduce) to an RDD.
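The split between transformations and actions can be imitated with Python built-ins. The sketch below is only an analogy and does not use the Spark API: a lazy `map()` stands in for a transformation, and `functools.reduce()` stands in for an action. In Spark, transformations are likewise lazy and only run when an action needs a result. The `evaluated` list and the `square` function are hypothetical helpers that make the laziness visible.

```python
from functools import reduce

evaluated = []  # records which elements were actually processed


def square(x):
    evaluated.append(x)
    return x * x


data = [1, 2, 3, 4]

# "Transformation": map() returns a lazy iterator, so nothing runs yet.
squares = map(square, data)
assert evaluated == []

# "Action": reduce() consumes the iterator and returns a single value.
total = reduce(lambda a, b: a + b, squares)
print(total)  # 30
```

With a real RDD, the same pipeline would run in parallel across the cluster instead of in a single process.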

The eighth topic was "Klassify and Kluster: Machine Learning Essentials" by Jay-san. He was also one of our Global Internship Program participants. He explained two problems in machine learning, which are classification and clustering. Classification is the problem of identifying a class to which a new observation belongs. This is done based on a collection of observations whose class membership is known. On the other hand, clustering is the task of grouping a set of objects in such a way that objects in the same cluster are more similar to each other than to those in other clusters.

Jay-san also taught us some learning algorithms to solve those problems. Regarding classification, he explained k-Nearest Neighbors algorithm (k-NN). k-NN classifies an observation based on a majority vote of its neighbors. An observation is assigned to the class most common among its k nearest neighbors. Regarding clustering, he explained k-means clustering algorithm. It aims to partition n observations into k clusters. Each object belongs to the cluster with the nearest mean. He also presented the performance of both learning algorithms on the Iris Flower Data Set.
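Both algorithms are short enough to sketch in plain Python. The code below is an illustrative implementation under simple assumptions (Euclidean distance, caller-supplied initial centroids, a fixed number of iterations), not the code from the talk. The tiny data set is made up and is not the Iris data.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to the cluster with the nearest mean.
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Tiny made-up data set (not the Iris data used in the talk).
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((4.0, 4.2), "b"), ((4.1, 3.9), "b"), ((3.8, 4.0), "b"),
]
print(knn_predict(train, (1.1, 1.0)))  # a
print(knn_predict(train, (4.0, 4.0)))  # b

points = [features for features, _ in train]
centroids, clusters = kmeans(points, [(0.0, 0.0), (5.0, 5.0)])
print(clusters)
```

Note that k-NN needs the labels while k-means ignores them, which is exactly the difference between classification and clustering.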

As usual, we had a party afterwards :)

Attending Microsoft Build 2017 in Seattle

Hello, this blog post might be a bit unusual since it is written by two people. We are Iskandar and Ogawa from the cloud product development team. On this occasion, we would like to share our experience of attending Microsoft Build 2017.

Microsoft Build is an annual conference held by Microsoft and aimed at developers. This year's event took place in Seattle, Washington, and ran for 3 days from May 10 to May 12, 2017. We joined the event from Japan via an organized PTS tour, which had more than 50 participants. This was also our first time visiting Seattle, so we were excited to take a quick look around the Emerald City!
