NoSQL Live Boston
Yesterday I attended NoSQL Live Boston and participated in the NoSQL in the Cloud panel. During the day I captured some notes for myself, which follow a brief recap of some of the points I tried to make during the panel discussion.
Q: What benefits and pitfalls have you found with your use of NoSQL in the cloud?
My answers are based on having used Amazon's SimpleDB service over the past 18 months.
Benefits: pay-as-you-go is great for starting off, zero maintenance, zero setup, scaling by spreading data across multiple domains, SimpleDB manages data replication and high availability
Pitfalls: new mindset required to use the eventually consistent model, increased impact of network latency, keeping SimpleDB's limitations in mind (size of domains, attribute count, etc.)
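As a rough illustration of "scaling by spreading data across multiple domains", here is a minimal Python sketch of client-side partitioning. The domain names and the `domain_for` helper are hypothetical, and no SimpleDB API is called; the point is only that the application, not the service, decides which domain an item lives in.

```python
import hashlib

# Hypothetical domain names, created ahead of time by the application.
DOMAINS = ["users_0", "users_1", "users_2", "users_3"]


def domain_for(item_name: str) -> str:
    """Pick a domain by hashing the item name.

    hashlib is used instead of the built-in hash() so the mapping is
    stable across processes and machines.
    """
    digest = hashlib.md5(item_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(DOMAINS)
    return DOMAINS[index]


# Every read and write for an item goes to the same domain.
print(domain_for("alice@example.com"))
```

The trade-off is that any query spanning all items has to be issued against every domain and merged by the client.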
Q: How viable are these solutions and what is hampering their adoption?
Viability: SimpleDB has a two-year track record and recently added consistent reads and conditional puts/deletes, opening up new classes of application possibilities.
Adoption: Need a body of knowledge; best practice patterns and use cases are still emerging
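To show why conditional puts open up new kinds of applications, here is a small Python sketch of optimistic concurrency: a counter that is only updated if nobody else changed it since it was read. `FakeDomain`, `put_if`, and `increment` are made-up names for an in-memory stand-in, not the SimpleDB API; with the real service the read would also need to be a consistent read.

```python
class ConditionFailed(Exception):
    """Raised when the expected attribute value does not match."""


class FakeDomain:
    """In-memory stand-in for a key/attribute store (not the SimpleDB API)."""

    def __init__(self):
        self._items = {}

    def get(self, name):
        # Return a copy so callers cannot mutate stored state directly.
        return dict(self._items.get(name, {}))

    def put_if(self, name, attrs, expected_name, expected_value):
        """Write attrs only if expected_name currently equals expected_value."""
        current = self._items.get(name, {})
        if current.get(expected_name) != expected_value:
            raise ConditionFailed(name)
        merged = dict(current)
        merged.update(attrs)
        self._items[name] = merged


def increment(domain, name, retries=5):
    """Read, modify, and conditionally write; retry if another writer won."""
    for _ in range(retries):
        item = domain.get(name)
        version = item.get("version")  # None if the item does not exist yet
        new_attrs = {
            "count": item.get("count", 0) + 1,
            "version": (version or 0) + 1,
        }
        try:
            domain.put_if(name, new_attrs, "version", version)
            return new_attrs["count"]
        except ConditionFailed:
            continue  # someone else updated the item; re-read and try again
    raise RuntimeError("too many conflicting writers")
```

Without a conditional write, two clients that read the same count could both write back the same incremented value and lose an update.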
* Welcome to NoSQL Live (Dwight Merriman)
Defines NoSQL as: No joins and light transaction semantics
Makes scaling horizontally easier
Key questions that all NoSQL products need to answer are what are the differences and what kind of consistency model is being used
* Crossroads, Inroads, Pitfalls, Bylaws: Peering into NoSQL's Conceivable Future (Tim Anglade)
Database market is big, clearly room for NoSQL
Crossroads: just moving out of the startup-only world, but the focus is still on early adopters; NoSQL hype is about to peak; most NoSQL projects are in various states of usefulness.
Inroads: Still need to make the technology better, need better marketing of the NoSQL brand, must focus on education, for many database = SQL
Pitfalls: education disrupts current RDBMS market so existing companies will fight back, overlooking the need for education
Bylaws: For some industries innovation is a liability; it is very hard for junior developers to work without training
When NoSQL reaches the point that its search term frequency doesn't drop off during the December holiday season, then we know it's gotten into the corporate world.
How to fix things: make inroads in education, create a NoSQL book of knowledge, form an interest group to act as a liaison to the world, host two annual conferences, starting with the US and EU.
* Scaling with NoSQL (Bradford Stephens moderator, Mark Atwood, Doug Judd, Alex Feinberg, Ryan Rawson, Ryan King)
It is a myth that NoSQL maintenance is easier. NoSQL usually has more operational overhead due to the intersystem dependencies for horizontal scaling. On the other hand, failure modes require less scrambling: no need to wake up at 2am if a node fails.
Hadoop is hard to get up and running.
Murder is a BitTorrent based software deployment package.
Most of these NoSQL solutions have issues with more data than RAM.
Testing NoSQL scaling really needs at least 5-10 nodes before things get interesting.
HBase and others don't handle random reads well due to underlying HDFS usage.
NoSQL scaling bound by CAP Theorem
memcached suffers from lack of dynamic scaling
Cassandra rebalancing has issues with hot spotting
Hypertable uses masters with hot standby to avoid SPoF
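The memcached point can be made concrete with a small Python experiment. The assumption here (not something spelled out on the panel) is a client that picks a server with `hash(key) % number_of_servers`, compared against a simple consistent-hash ring. The server names, `HashRing`, and the replica count are all illustrative.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def modulo_node(key, nodes):
    """Naive placement: hash the key, take it modulo the node count."""
    return nodes[h(key) % len(nodes)]


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        self._ring = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, h(key)) % len(self._points)
        return self._ring[idx][1]


def moved_fraction(before, after, keys):
    """Fraction of keys whose node changes between two placement functions."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"key{i}" for i in range(10000)]
old_nodes = ["cache1", "cache2", "cache3", "cache4"]
new_nodes = old_nodes + ["cache5"]

modulo_moved = moved_fraction(
    lambda k: modulo_node(k, old_nodes), lambda k: modulo_node(k, new_nodes), keys
)
old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_moved = moved_fraction(old_ring.node_for, new_ring.node_for, keys)

print(f"modulo: {modulo_moved:.0%} of keys moved, ring: {ring_moved:.0%} moved")
```

With modulo placement, going from four to five servers should remap most keys (about 80% in expectation), which for a cache means a flood of misses. The ring should move only roughly the new server's share.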
* Schema Design with Document-Oriented Databases (Durran Jordan moderator, Eliot Horowitz, Bryan Fink, Paul Davis)
Many to Many is hard
Solutions make use of soft links for following data to allow sharding
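A minimal Python sketch of the soft-link idea, using plain dictionaries as stand-ins for two document collections. The student/course data and helper names are invented for illustration; the point is that documents store the ids of related documents rather than embedding them, so each collection can be sharded independently.

```python
# Two hypothetical collections keyed by document id.
students = {
    "s1": {"name": "Ana", "course_ids": ["c1", "c2"]},
    "s2": {"name": "Ben", "course_ids": ["c2"]},
}
courses = {
    "c1": {"title": "Databases"},
    "c2": {"title": "Networks"},
}


def courses_for(student_id):
    """Follow the soft links from one student to its course documents."""
    student = students[student_id]
    return [courses[cid]["title"] for cid in student["course_ids"]]


def students_in(course_id):
    """Reverse direction: scan students for a link to the given course."""
    return sorted(
        s["name"] for s in students.values() if course_id in s["course_ids"]
    )


print(courses_for("s1"))
print(students_in("c2"))
```

Following a link costs an extra lookup per related document, and the reverse query here is a full scan unless the links are also stored (or indexed) on the other side. That is part of why many-to-many is hard.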
* The Evolution of the Graph Data Structure from Research to Production (Marko Rodriguez moderator, Peter Neubauer, Borislav Iordanov, Sandro Hawke)
Graph databases are geared towards modeling relationships. Think of it as the graph is the index.
The hypergraph concept isn't that popular, but it allows a single edge to connect more than two nodes.
Neo4j is a property graph with directed edges and key/value pairs on edges and nodes. Think of it as a multi-dimensional linked list with full ACID support and fast path traversal. They are working to add replication, remove the SPoF, and improve scaling.
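To make the property graph model concrete, here is a toy in-memory version in Python. This is not Neo4j's API; `PropertyGraph` and `friends_of_friends` are hypothetical names, and the sketch has no persistence or transactions. It only shows nodes and directed edges that both carry key/value properties, plus a traversal that follows edges directly rather than joining tables.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: properties on nodes and on directed edges."""

    def __init__(self):
        self.nodes = {}                      # node id -> properties
        self.out_edges = defaultdict(list)   # node id -> [(label, target, props)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        self.out_edges[source].append((label, target, props))

    def neighbors(self, node_id, label):
        """Targets of outgoing edges with the given label."""
        return [t for (l, t, _) in self.out_edges[node_id] if l == label]


def friends_of_friends(graph, start):
    """Two-hop traversal along 'knows' edges, excluding direct friends."""
    direct = set(graph.neighbors(start, "knows"))
    result = set()
    for friend in direct:
        for candidate in graph.neighbors(friend, "knows"):
            if candidate != start and candidate not in direct:
                result.add(candidate)
    return result


g = PropertyGraph()
for person in ("alice", "bob", "carol", "dave"):
    g.add_node(person, kind="person")
g.add_edge("alice", "knows", "bob", since=2008)
g.add_edge("alice", "knows", "dave", since=2009)
g.add_edge("bob", "knows", "carol", since=2010)
print(friends_of_friends(g, "alice"))
```

Each hop is a lookup in the adjacency list of the current node, which is the sense in which "the graph is the index".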
RDF is like JSON but with loops. RDF Schema or OWL add constraints. Most RDF implementations are designed not for performance but for complex pattern matching. ELMO has JavaBean to RDF mapping. SPARQL is the standard for querying RDF.
Gremlin is a graph database programming language.
* Toward Web Standards for NoSQL (Sandro Hawke)
XML is on the way out (but it will never die)
JSON became an IETF standard
Standards bodies are not always needed. Look at popular open source programming languages: they are creating their own standards, which everyone can contribute to because of their openness.
That is hard to do for some segments. There won't be only one open source web browser, so that is where standards bodies help.
* NoSQL Lab