Roaring Elephant

Information:

Synopsis

Bite-Sized Big Data

Episodes

  • Episode 21 – The Open Data Platform Initiative

    02/08/2016 Duration: 59min

    This episode we have an interview with John Mertic about ODPi. There has been plenty of mystery and even some controversy about ODPi which we attempt to resolve for you. Big thanks to John for giving us some of his time for this interview! Sadly, this time the Skype Gods were not with us and we experienced some drops and hitches. We tried to smooth things over as much as possible, but we were not able to achieve our usual level of quality this time. 00:00 Recent events Vacation for Dave Study for Jhon 10:40 Interview with John Mertic @ ODPi https://www.odpi.org/ John Mertic, Director of Program Management for ODPi and Open Mainframe Project Find John on twitter: @jmertic If you're not familiar with the ODPi here's a few good links to get you started and interested in the area: Links to the ODPi Specifications: https://www.odpi.org/specifications Watch an interview with Alan Gates who discusses what the ODPi is trying to do to simplify the big data world: https://www.youtube.co

  • Episode 20 – Dave’s Hadoop Summit San Jose 2016 Retrospective – Part 2

    19/07/2016 Duration: 01h06min

    In this second part, we discuss the sessions that Dave attended at the San Jose Hadoop Summit and we go in depth on some related topics. Since we ran over an hour with the main topic, and we did not want to make this a three-parter, we decided to forgo the questions from the audience just this one time... 00:00 Recent events Vacation time! Edx.Org Big Data Courses 04:00 Dave's Hadoop Summit San Jose 2016 Retrospective - Part 2 Session 1: End-to-End Processing of 3.7 Million Telemetry Events per Second Using Lambda Architecture, by Saurabh Mishra @ Hortonworks and Raghavendra Nandagopal @ Symantec Talking point: Hero-culture or why nobody wants to talk about failure anymore Session 2: Top Three - Big Data Governance Issues and How Apache ATLAS resolves it for the Enterprise, by Andrew Ahn @ Hortonworks Talking point: Guaranteed Governance, who certifies the certificate? Session 3: IoT, Streaming Analytics and Machine Learning: Delivering Real-Time Intelligence With Apache NiFi,

  • Episode 19 – Dave’s Hadoop Summit San Jose 2016 Retrospective

    05/07/2016 Duration: 48min

    Dave went to the Hadoop Summit 2016 in San Jose last week and came back with a riveting tale to tell. In this first part of the Summit coverage, join me when I ask Dave all about the keynotes and the general event. Join us next episode where Dave will talk about some of the sessions he attended!   00:00 Recent events Lift and shift to IaaS Hybrid Disaster Recovery Spark & ML goodness MOOC's San Jose Hadoop Summit 09:25 Dave went to the Hadoop Summit in San Jose! Record attendance, maybe a venue change in future Sponsor exhibition area including "interesting" story The Community Corner The keynotes Hadoop is 10 years old Microsoft on Machine Learning Hadoop Assemblies Hadoop fragmentation Cyber security Car insurance premiums "to measure" Ethics session 40:55 Questions from our Listeners Beefy feedback from Kris A listener wants to know if it is worth the trip to go to the US Summit or to just go to the "local" Summit, wherever that is. Nishant would like an

  • Episode 18 – MLeap interview: Productionising Data Science – Part 2

    21/06/2016 Duration: 43min

    In this episode, we have the second part of the interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project where they go into more technical details and give tips on deploying MLeap in your environment. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode! 00:00 Recent events Yet more telco security, again. RFI for European energy company followed by "the RFI rant" Metronnnnnnnnnnn Big Data Hackathon for an airline company predicting delays Preparing an IoT hackathon on predictive maintenance Spreading the word on MLeap at a couple of customers! 11:22 Interview on MLeap with Hollin Wilkins and Mikhail Semeniuk Part 2 http://combust.ml/ http://combust.ml/blog/2016/03/30/flexible-akka-clients-and-servers-part-1.html https://github.com/TrueCar/mleap https://github.com/TrueCar/mleap-demo 35:25 Questions from o

  • Episode 17 – MLeap interview: Productionising Data Science

    07/06/2016 Duration: 54min

    In this episode, we have an interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode! 00:00 Recent events Machine Learning Hackathon on Azure Strata Europe Fighting with Kafka 09:30 Interview on MLeap with Hollin Wilkins and Mikhail Semeniuk Meet Hollin and Mikhail today (7-Jun-2016) at Spark Summit 2016 in San Francisco! https://spark-summit.org/2016/events/mleap-productionize-data-science-workflows-using-spark/ http://combust.ml/ http://combust.ml/blog/2016/03/30/flexible-akka-clients-and-servers-part-1.html https://github.com/TrueCar/mleap https://github.com/TrueCar/mleap-demo 40:50 Questions from our Listeners The Episode 12 mystery unraveled Nifi works well for prototyping, but what's your view on using Nifi in production in a normal D

  • Episode 16 – Interview part two with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo!

    24/05/2016 Duration: 46min

    Hopefully you enjoyed the first part of our interview with Sumeet, here is part two where we go into more detail about Yahoo's use of Hadoop, with lots of interesting topics coming up including the splintering of the ecosystem, governance and much much more.   00:00 Recent events Customer and partner adventures with Apache Nifi Jhon is settling in at Microsoft but is unfortunately quite jet-lagged. 08:15 Part two of our interview with Sumeet Singh - Senior Director, Cloud and Big Data Platforms @ Yahoo! 39:05 Questions from our Listeners Is Apache Atlas Ready for production today?   46:35 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 15 – Interview with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo!

    10/05/2016 Duration: 01h56s

    Having met Sumeet at the Hadoop Summit we thought he'd make a great guest for the podcast, so here he is for your listening pleasure!   00:00 Recent events Louder! iTunes and the missing episode 12 Jhon's new role at Microsoft Hadoop as a Service A fortnight of SAS + Hadoop Metron teething troubles https://issues.apache.org/jira/browse/METRON-136 17:50 Interview with Sumeet Singh - Senior Director, Cloud and Big Data Platforms @ Yahoo!   42:50 Questions from our Listeners One data-lake for all workloads? Or separate clusters for each set of workloads? How large a team do I need to manage a Hadoop cluster?   1:00:56 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 14 – Hadoop Summit – Retrospective

    26/04/2016 Duration: 51min

    After the last two special edition episodes where we quickly covered each Summit day in a "same-day" episode, we go over the full event in this episode, highlighting the sessions we enjoyed the most and sharing our general feelings about the 2016 Hadoop Summit in Dublin.   00:00 Recent events Summit! Sessions on youtube Meetings and planning, Apache Metron https://cwiki.apache.org/confluence/display/METRON/Metron+Wiki https://community.hortonworks.com/articles/26047/apche-metron-tp1-blog-series.html Setting up a new podcast recording "studio" 09:00 Hadoop Summit - Retrospective Summit Schedule App Hortonworks emphasising  Streaming ingest using Nifi, but the other talks did not so much Summit video sessions are starting to appear online https://www.youtube.com/channel/UCAPa-K_rhylDZAUHVxqqsRA/videos Next year: Munich Day one sessions: It's not the size of your cluster, It's how you use it Big Fish - David Darden & Don Smith Unified stream and batch processing w

  • Episode 13 – Hadoop Summit Dublin 2016 – Day 2

    14/04/2016 Duration: 37min

    Welcome to our second special edition podcast brought to you from day 2 of the Hadoop Summit. Breaking our normal fortnightly flow we're delivering a fresh new podcast at the end of each day of the Hadoop Summit. In this episode we cover our impressions of the second day of keynotes and yet more sessions that we enjoyed. 00:00 Recent events Introduction to the Hadoop Summit Dublin 2016 from day 2 01:45 Hadoop Summit 2016 Dublin Day 2 Review Keynote/Session - Yahoo! - Sumeet Singh Keynote - Information is Beautiful - David McCandless http://www.informationisbeautiful.net/ MLeap - Mikhail Semeniuk (Shift Technologies) Hollin Wilkins (TrueCar) Admiral - Adam Morton (Admiral) and Simon Ball (Hortonworks) Hive - Alan Gates (Hortonworks) 37:47 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 12 – Hadoop Summit Dublin 2016 – Day 1

    13/04/2016 Duration: 29min

    Welcome to our special edition podcast brought to you from day 1 of the Hadoop Summit. Breaking our normal fortnightly flow we're delivering a fresh new podcast at the end of each day of the Hadoop Summit. In this episode we cover our impressions of the keynotes and some of the sessions we enjoyed during day 1. 00:00 Recent events Introduction to the Hadoop Summit episode for day 1 01:40 Main Topic Some comments from attendees as to what they're looking forward to at the event Conversation about the keynotes and the sessions we enjoyed 29:38 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 11 – Interview with Community Award Winner Venkatesh Sellappa

    05/04/2016 Duration: 37min

    Venkatesh is a new contributor to Apache NiFi and during his talk at the Hadoop Summit next week, he takes a light-hearted look at his journey of how to become a contributor to an Apache Project. Venkatesh is one of the Community Choice winners, so congratulations are in order and we are certain you will like this interview! Enjoy, and we are looking forward to seeing you at the Hadoop Summit in Dublin next week! 00:00 Recent events Easter Break Big Data Analytics Big Telco workshops/meetings and sessions stuff Domain Knowledge is important 05:40 Main Topic Interview with Venkatesh Sellappa 33:50 Questions from our Listeners: No questions this time but information on our activities during the upcoming Hadoop Summit. 37:18 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 10 – Preparing for the 2016 Hadoop Summit in Dublin

    22/03/2016 Duration: 01h03min

    Next month, the European Hadoop Summit will take place in Dublin. Now that the agenda for the event has been nearly finalised we take it upon ourselves to provide a virtual guide to the event. There's a lot of good things happening during the event so we share with you what sessions we think we'll be attending and why. Enjoy, and looking forward to seeing you there! This is another long episode, going over an hour for the first time. We are really curious to know if you like these longer episodes, or if you would prefer it if we kept it under the original 30 to 35 minutes? 00:00 Recent events Hands on upgrading, express vs rolling upgrade Workshop at telecom company in Russia Nifi workshops Securing a Hadoop cluster 08:00 Main Topic Dave has assembled some statistics on the type of sessions available. What sessions we would attend and why. http://hadoopsummit.org/dublin/agenda/ General advice to visitors mixed in...   54:30 Questions from our Listeners: What else is going o

  • Episode 9 – SQL in Hadoop

    08/03/2016 Duration: 53min

    SQL was one of the first data access methods added to vanilla Hadoop. Considering that many of the people working with Hadoop in the early days came from a database background, this is not surprising. Since then, the SQL ecosystem in Hadoop has grown considerably and in this episode we do a general overview of many of the available choices. This episode runs a bit longer than normal but we hope you'll find it worthwhile! 00:00 Recent events Spark masterclasses NiFi on trains Mifid II and the active archive World Mobile Congress 08:30 Main Topic SQL solutions: Apache Hive https://hive.apache.org/ Apache Spark SQL http://spark.apache.org/sql/ Apache Phoenix https://phoenix.apache.org/ Apache Impala (incubating) https://www.cloudera.com/products/apache-hadoop/impala.html Apache Hawq (incubating) http://hawq.incubator.apache.org/ Apache Drill https://drill.apache.org/ Presto https://prestodb.io/ Oracle Big Data SQL http://www.oracle.com/us/

  • Episode 8 – NiFi Deeper Dive

    23/02/2016 Duration: 47min

    In this episode we'll go into more depth on NiFi complete with our second interview with Joe Witt, Senior Director of Engineering at Hortonworks who dives into how NiFi works under the covers and some considerations to think about when using it for real. 00:00 Recent events New logo for the podcast Hadoop use in telecom Spark masterclass details Apache Nifi "Hype Train" concerns 09:14 Main Topic Second interview with Joe Witt: a deeper dive on Apache NiFi 35:30 Questions from our Listeners: I have already implemented some of my ingest in flume/kafka/storm, do I need to replace that with NiFi? Is it true there is no chance of data loss with NiFi? Can I aggregate or combine data as part of the flow process? Do I need a hadoop cluster to use NiFi? 47:18 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 7 – An introduction to Data Ingest

    09/02/2016 Duration: 37min

    In this episode we'll cover some of the most common options for ingesting data into Hadoop including technologies like Flume, Sqoop, Kafka, NiFi and more. 00:00 Recent events Upcoming masterclasses on NiFi and Spark NiFi deployment on trains Podcast publicizing Global Systems Integrator training day 06:40 Main Topic Apache Sqoop Apache Flume Apache Kafka Apache NiFi Other Low level ingest methods 28:00 Questions from our Listeners: I want to transform the data to its final form before it lands in the Hadoop cluster. Which ingest tool should I use? What about XYZ vendors “hadoop loader/ingest” tool? Do all these tools run on my hadoop nodes? How does lambda architecture fit with data ingest? 37:15 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 6 – An introduction to NiFi

    26/01/2016 Duration: 30min

    In this episode we'll cover an introduction to NiFi complete with an interview with Joe Witt, Senior Director of Engineering at Hortonworks who explains exactly where NiFi came from and how it fits into your Big Data plans. 00:00 Recent events The usual "Start of the Year" meetings and events Using Apache NiFi as a self documenting deployment system We are now available on iTunes 04:50 Main Topic Interview with Joe Witt, one of the creators of Apache NiFi and currently Director of Engineering for HDF at Hortonworks. 22:40 Questions from our Listeners: Is NiFi really as easy to use as it looks? Is NiFi a part of Hadoop now? How do I get started with NiFi? Is NiFi an ETL tool? 30:45 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 5 – An introduction to Spark

    12/01/2016 Duration: 37min

    In this episode we'll cover the basics of Apache Spark, including typical deployment situations, architecture and usage.   00:00 Recent events Seasons Greetings! Jhon shamelessly plugs his mini cluster build Apache Mesos Amazon IoT solution 05:28 Main Topic Who would use Apache Spark, why would you use it, where would you use it Apache Spark Architecture Apache Spark Components Apache Spark MLlib Apache Spark gotcha's Typical use cases for Apache Spark 28:20 Questions from our Listeners:   What happens if all my data does not fit in memory? What is the security like for Spark? Why Spark on Hadoop instead of standalone Python, Scala, Java or something else for Spark? Can I access data on HDFS or local disk from my Spark script? 37:50 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.  

  • Episode 4 – Hadoop: Year in review

    29/12/2015 Duration: 38min

    A bit of Hadoop history of what we have seen happening over the last 12 months, some trends and interesting technologies. Some ups, some downs and possibly even some round and rounds, capped off with some Bold Predictions for 2016. 00:00 Recent events A number of engagements Apache Nifi Why some Hadoop users decide to go for separate clusters per use case or (internal) client 06:00 Main Topic A broad acceptance of Hadoop in Europe A shift from batch workload to multi-tenant, secure platform including IoT and real-time, in-memory analytics. Apache Ambari making our life easier all the time Data Governance Initiative Open Data Initiative (http://odpi.org) Public clouds offer Big Data specific environment Tech advances in Hive (CBO/ORC/Zlib) and Transparent Encryption in HDFS Apache NiFi The year of Apache "open community" open source Bold Predictions! 31:00 Questions from our Listeners: What new (incubating) projects should I invest time in today, knowing that they may ne

  • Episode 3 – High level Hadoop architectures

    15/12/2015 Duration: 37min

    What are the hardware and implementation options we see? A discussion ranging from direct attached storage versus network attached storage/storage area networks, to on-premise hardware versus cloud options. 00:00 Recent events Organisations starting their Big Data Journey A lessons learned workshop for a customer after their successful pilot Planning Masterclasses for 2016 Migration customer workshop Big Data and the Connected Car webinar (registration required) 07:30 Main Topic Direct attached storage (DAS) or “traditional” hadoop Network attached storage (NAS) / Storage Area Networks (SAN) Cloud / Azure / AWS / Google Cloud / Openstack etc... SaaS/PaaS/HaaS/HDInsight Ceph & Gluster ObjectStore(S3) and Other cloud storages 25:30 Questions from our Listeners: Doesn’t having a SAN/NAS system break data locality? Can I mix drive sizes and types within a cluster or even within the same node? Hybrid cluster environments, how to mix cloud and on premise deployment?

  • Episode 2 – How to avoid disaster

    01/12/2015 Duration: 43min

    When you are getting started on your journey with Hadoop, how do you avoid a Hadoop disaster? We have seen many people going through this journey and both of us have seen things people do that make projects successful, and things people do that make projects more difficult than they should be. 00:00 Recent events Customer pilot completion SQL on Hadoop Masterclasses Multi-tenant Spark notebook issues Spark recommendation engine webinar 11:00 Main Topic Starting too small Baseline and benchmark Config management Backup and/or disaster recovery Leaving security too late 36:00 Questions from our Listeners: Where do I find data scientists? Storage options? Install everything? 43:37 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
