Posts

Showing posts from December, 2015

HDFS performance, tuning, and robustness

5 questions

1. Name the configuration file which holds HDFS tuning parameters
   - mapred-site.xml
   - core-site.xml
   - hdfs-site.xml

2. Name the parameter that controls the replication factor in HDFS
   - dfs.block.replication
   - dfs.replication.count
   - dfs.replication
   - replication.xml

3. Check answers that apply when replication is lowered
   - HDFS is less robust
   - Less likely that data will be local to more workers
   - Aggregate I/O rate will be worse
   - HDFS will have more space available

4. Check answers that apply when NameNode fails to receive heartbeat from a DataNode
   - DataNode is marked dead
   - NameNode will attempt to restart DataNode
   - No new I/O is sent to particular DataNode that missed heartbeat check
   - Blocks below replication factor are re-replicated on other Dat
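The replication trade-off in question 3 can be put into rough numbers. The sketch below is only an illustration: it assumes every block is stored `dfs.replication` times and ignores reserved space and non-DFS overhead. The 120 TB raw capacity and both function names are made up for the example.

```python
def usable_capacity_tb(raw_capacity_tb: float, replication: int) -> float:
    """Approximate logical capacity when every block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return raw_capacity_tb / replication


def tolerated_replica_losses(replication: int) -> int:
    """Number of copies of a block that can be lost while one copy still survives."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return replication - 1


if __name__ == "__main__":
    raw_tb = 120.0  # hypothetical cluster size, not taken from the quiz
    for factor in (3, 2, 1):
        print(
            f"dfs.replication={factor}: "
            f"~{usable_capacity_tb(raw_tb, factor):.0f} TB usable, "
            f"survives loss of {tolerated_replica_losses(factor)} replica(s) per block"
        )
```

Lowering the factor frees up space but leaves fewer copies to fall back on, and fewer nodes that hold any given block locally.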

Hive Bike Share Assignment Passed

1. Utilizing the Bay Area Bike Share database (both Year 1 & 2, Aug. 2013 - Aug. 2015), what is the most popular start station based on trip data?
   - Embarcadero at Sansome
   - Market at 4th
   - San Francisco Caltrain
   - Townsend at 7th

    -- Run this in HIVE Editor --
    SELECT startterminal, startstation, COUNT(1) AS count
    FROM bay_area_bike_share
    GROUP BY startterminal, startstation
    ORDER BY count DESC
    LIMIT 10

2. Utilizing the Bay Area Bike Share database (Year 1 only, Aug. 2013 - Feb 2014), which is the least popular (least used) start station in the Bike share trips data? (Hint: Use the count of start station, group and order in ascending order)
   - Townsend at 7th
   - Mezes Park
   - Market at 4th
   - Embarcadero at Sansome

    -- Run this in HIVE Editor --
    SELECT startstation, COUNT(1) AS count
    FROM bay_area_bike_share
    GROUP BY startstation
    ORDER BY count ASC
    LIMIT 10

3. Utilizing the Bay Area Bike Share database (for Year 1 only, Aug. 2013 - Aug
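If you do not have a Hive editor at hand, the group-count-order pattern used in these queries can be tried locally. The sketch below runs the same query shape against an in-memory SQLite table, which is a stand-in and not Hive itself. The table and column names follow the queries above; the rows, the station names, the `trip_count` alias, and the `station_counts` helper are invented for illustration only.

```python
import sqlite3

# Toy rows (startterminal, startstation); NOT real Bay Area Bike Share data.
TOY_TRIPS = [
    (1, "Station A"), (1, "Station A"), (1, "Station A"),
    (2, "Station B"), (2, "Station B"),
    (3, "Station C"),
]


def station_counts(rows, ascending=False, limit=10):
    """Count trips per start station and order by that count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bay_area_bike_share (startterminal INTEGER, startstation TEXT)"
    )
    conn.executemany("INSERT INTO bay_area_bike_share VALUES (?, ?)", rows)
    # Only the sort direction is interpolated, and it comes from a fixed pair of values.
    order = "ASC" if ascending else "DESC"
    query = f"""
        SELECT startterminal, startstation, COUNT(1) AS trip_count
        FROM bay_area_bike_share
        GROUP BY startterminal, startstation
        ORDER BY trip_count {order}
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print("Most used first:", station_counts(TOY_TRIPS))
    print("Least used first:", station_counts(TOY_TRIPS, ascending=True))
```

Switching between `DESC` and `ASC` is the only difference between finding the most popular and the least popular station.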

How To Install Java on Ubuntu with Apt-Get

Tags: Java, Ubuntu

Introduction

Because many articles and programs require Java to be installed, this article guides you through installing and managing different versions of Java.

Installing default JRE/JDK

This is the recommended and easiest option. It installs OpenJDK 6 on Ubuntu 12.04 and earlier, and OpenJDK 7 on 12.10+.

Installing Java with apt-get is easy. First, update the package index:

    sudo apt-get update

Then, check whether Java is already installed:

    java -version

If it returns "The program java can be found in the following packages", Java hasn't been installed yet, so execute the following command:

    sudo apt-get install default-jre

This will install the Java Runtime Environment (JRE). If you instead need the Java Development Kit (JDK), which is usually needed to compile Java applications (for example Apache Ant, Apache Maven, Eclipse and IntelliJ
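The "check first, then install" step can also be scripted. This is a minimal sketch, not part of the original guide: it looks for a `java` executable on the PATH and, if one is found, returns the version banner. The helper name `java_version_banner` is hypothetical. Note that `java -version` normally writes its banner to standard error, so the sketch reads that stream first.

```python
import shutil
import subprocess


def java_version_banner():
    """Return the output of `java -version`, or None if no java is on the PATH."""
    java_path = shutil.which("java")
    if java_path is None:
        return None
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True
    )
    # The banner usually goes to stderr; fall back to stdout just in case.
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    banner = java_version_banner()
    if banner is None:
        print("Java not found; consider: sudo apt-get install default-jre")
    else:
        print(banner)
```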