Hive Bike Share Assignment Passed

1. Utilizing the Bay Area Bike Share database (both Year 1 & 2, Aug. 2013- Aug. 2015)- what is the most popular start station based on trip data?
  1. Embarcadero at Sansome
  2. Market at 4th
  3. San Francisco Caltrain
  4. Townsend at 7th
-- Run this in HIVE Editor --
SELECT startterminal, startstation, COUNT(1) AS count 
FROM bay_area_bike_share
GROUP BY startterminal, startstation 
ORDER BY count DESC LIMIT 10

2. Utilizing the Bay Area Bike Share database (Year 1 only, Aug. 2013- Feb 2014) - Which is the least popular(least used) start station in the Bike share trips data?
(Hint: Use the count of start station, group and order in ascending order)
  1. Townsend at 7th
  2. Mezes Park
  3. Market at 4th
  4. Embarcadero at Sansome
-- Run this in HIVE Editor --
SELECT startstation, COUNT(1) AS count 
FROM bay_area_bike_share
GROUP BY startstation 
ORDER BY count ASC LIMIT 10


3. Utilizing the Bay Area Bike Share database (for Year 1 only, Aug. 2013 - Aug. 2014 only) - what is the SECOND MOST popular end station based on trip data?
(Hint: Use the count of end station, group and order in descending order)

  1. Steuart at Market
  2. Market at Sansome
  3. Embarcadero at Sansome
  4. Harry Bridges Plaza(Ferry Building)
-- Run this in HIVE Editor --
SELECT endstation, COUNT(1) AS count 
FROM bay_area_bike_share
GROUP BY endstation 
ORDER BY count DESC LIMIT 10

Comments

  1. Hi Vinay, hoping you can help! With regards to the Bike Sharing Assignment.... " Utilizing the Bay Area Bike Share database (both Year 1 & 2, Aug. 2013- Aug. 2015)- what is the most popular start station based on trip data?

    Can you tell me how you joined both Year 1 & 2 to query at the same time? Your table is "bay_area_bike_share". Does this table contain both Year 1 and 2 data?"

    Thanks!

    ReplyDelete
    Replies
    1. Not sure if your still wondering about this but you should be able to do it with the command UNION....

      SELECT t.startstation, COUNT(t.startstation) AS c
      FROM (
      SELECT startstation FROM 201402_tripdata
      UNION ALL
      SELECT startstation FROM 201408_tripdata
      UNION ALL
      SELECT startstation FROM 201508_tripdata
      ) t
      GROUP BY t.startstation
      ORDER BY c DESC
      LIMIT 10

      Q3 uses both Dataset 1 and 2
      Q2 looks at only the first dataset
      Q1 looks at all 3 data sets.

      At least that is how I read it.

      Delete

Post a Comment

Popular posts from this blog

Java Practice Questions MCQ

Linux Assignments - Practise

HackerRank Introduction Challenges - Java