July 12, 2011

Give Your MySQL Account Access to Hive

Posted in Configuration, Hadoop, Hive, Scalability at 3:02 PM by Kirk True

One area of the Apache Hive documentation that’s not entirely explicit is in regard to the database privileges needed for its metastore[1]. Developers often become accustomed to creating a database account that has all privileges granted. But in the Real World, end users of Hive must configure it to point to a metastore RDBMS account with very specific privileges granted by the DBA.

Are You Privileged?

By default, Hive is configured to use a (local-to-that-system) Apache Derby database instance as its metastore. However, once an initial evaluation phase of Hive is complete, the team will likely need to move to a shared database to use as their metastore so the whole team is in sync. MySQL is a suitable database to be used as the metastore, though Hive supports other relational databases as well.

So which database roles must be granted to the account used for the Hive metastore in MySQL?

  • CREATE
  • DROP
  • ALTER
  • INDEX
  • SELECT
  • INSERT
  • UPDATE
  • DELETE

You can refer to the official MySQL documentation for granting privileges regarding the specifics of the roles and the exact syntax.

Here’s an example MySQL script that creates a dedicated user (“hive”), a MySQL database (“hivemetastore”), and grants the user some DDL[2] and DML privileges:


CREATE DATABASE hivemetastore;
CREATE USER 'hive'@'%' IDENTIFIED BY 'mypassword';
GRANT CREATE, DROP, ALTER, INDEX ON hivemetastore.* TO 'hive'@'%';
GRANT SELECT, INSERT, UPDATE, DELETE ON hivemetastore.* TO 'hive'@'%';

Once the DBA has created the account and granted it the requisite privileges, the user running Hive can specify this MySQL database as his Hive metastore. This is done either by exporting the following settings via HIVE_OPTS or by specifying them on the command line when invoking the CLI:


javax.jdo.option.ConnectionURL=jdbc:mysql://your.mysql.server/hivemetastore
javax.jdo.option.ConnectionUserName=hive
javax.jdo.option.ConnectionPassword=mypassword

[1] It can be confusing when discussing Hive someone refers to the “Hive” database, tables, or columns. To which are they referring? To the database, tables, and columns that Hive presents (with data stored in HDFS, S3, etc.) or the metastore database, tables, and columns (with data stored in MySQL, Postgres, etc.) in which Hive stores its meta data? Unfortunately the two are often used interchangeably.

[2] The reason that the CREATE, DROP, ALTER, and INDEX privileges are needed is because Hive can dynamically create the needed schema in the metastore database in which to store information about the Hive database.

June 25, 2011

SSDs in the Data Center — Is $/GB/IOPS the Only Relevant Metric?

Posted in Miscellaneous, Scalability at 8:14 AM by Kirk True

Wikia’s Artur Bergman recently gave a talk at Velocity about SSD adoption that has generated a lot of buzz.

The video can be viewed here.

Warning: the video is rated PG-13 for language and adult situations.

The focus of his talk was that the relevant metric for data center storage is $/GB/IOPS. He showed how adopting SSDs provides orders of magnitude more IOPS over traditional drives and thus will prove themselves more economical.

In addition to greatly increased read and write speeds, the switch to SSDs helps to relieve the bottleneck of random access seek times. Random I/O is often at the core of performance and scalability problems. But as a result of the performance improvements that SSDs bring, he argues that we can once again use standard practices such as monolithic database and file servers and that distributed storage systems are thus irrelevant.

There’s little argument that SSDs will continue to gain a foothold in the data center. But should you switch everything to SSDs? What are some cases where $/GB/IOPS is not the only relevant metric?

CPU-bound Processes

Everyone is focused on the explosion of big data, but not every application is I/O-based. Video encoding or scientific research applications often benefit from beefier processing capabilities than I/O speeds.

Availability

No matter how fast your database server is with SSDs, if it gets knocked off line, it really doesn’t matter. Distributed data storage systems such as Hadoop and Cassandra are also designed for availability, keeping multiple replicas of data so that if some nodes die, data and services are still available.

Not All Data is Equal

The process of moving ‘cool’ data to an archived location is still an observed practice. Companies may not need blazing speeds for all data and so blindly optimizing for IOPS over raw capacity isn’t always the right way to go.

Don’t get me wrong — I love the SSDs in my laptop and workstation. And over time–in the right cases–SSDs will provide another means in our ongoing effort to squeeze more performance out of our systems.

May 20, 2011

Pay Off Your Technical Debt

Posted in Miscellaneous, Scalability at 5:53 PM by Kirk True

The first thing I do when I get my hands on a client’s code is to figure out the size of the code base. I execute something like this to determine the number of lines of code (LoC) using a fresh checkout from trunk:

$ find . -name “*.java” -exec cat {} \; | wc -l

At one recent client, the measurement was over 180,000 lines of Java source code in the project. I was a bit taken aback as the project didn’t seem to need nearly so much code for what they were doing. After all, this was a fairly straightforward web application.

In a search to find what all those 180,000 lines of Java source code were doing, I found several tens of thousands of lines of code dedicated to custom implementations of the following:

  • Database connection pool
  • Transaction layer
  • Multiple ORMs
  • Caching layers

This was technical debt, pure and simple.

The phrase technical debt is attributed to Ward Cunningham from the OOPSLA conference in 1992:

Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite. . . . The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.

The inclusion of transaction management, caching layers, a connection pool, and ORMs represents technical debt in the form of infrastructure code, not application logic. The infrastructure code represents the foundation and plumbing atop which the application logic rests. In and of itself infrastructure code provides no value to the code and company. In fact, quite the opposite — it requires a disproportionate amount of time to design, implement, test, and maintain.

It is common for engineers to want to write infrastructure code. It’s a nice respite from the sometimes tedium of writing business applications. However, it is easy to understand why an in-house implementation will be lacking in robustness. Here are some oft-repeated causes for the poorly-performing infrastructure code:

  • There was only one engineer tackling the “problem”
  • The engineer lacked in-depth domain knowledge of the internals of such systems
  • Infrastructure wasn’t the engineer’s “real” job, the projects only received part-time attention
  • The use cases for which the infrastructure code was designed were too narrow in scope; the code often implemented a set of features needed by only the then current set of application requirements; when new application requirements are mandated by product management, the infrastructure code needed to be augmented again ad infinitum

Companies have several stages of growth, and it is my opinion that using off-the-shelf solutions will prove adequate even into the hundreds of thousands of users. As a company grows, there will be a point at which off-the-shelf tools will no longer support the load that its users generate. I would argue that the load would have to be measured in the millions of users to warrant infrastructure code implementation. And then, it’s a must that detailed rationale, research, and measurements before entering into such a responsibility. Don’t have a hunch or a gut feeling – have data.

Remember Lord Kelvin who famously said:

If you can not measure it, you can not improve it.

The bottom line is this:

Unless the company is in the business of offering infrastructure or middleware AND/OR has millions of users and terabytes of data, it is best that the company focus on its core technology differentiator and leverage existing for-cost or open source solutions for all of their infrastructure code needs.

For companies like Google, Facebook, Amazon, their needs go beyond what off-the-shelf tools provide, and thus customized tools and implementations are a legitimate effort for such organizations. But they arrive at such systems after very close scrutiny in their correct uses of the current systems before implementing their in-house systems. Even then, they will often open source these solutions and solicit contributions from the community to help in alleviating this maintenance burden.

September 30, 2010

Using Hive with Existing Files on S3

Posted in Hadoop, Hive at 8:41 PM by Kirk True

One feature that Hive gets for free by virtue of being layered atop Hadoop is the S3 file system implementation. The upshot being that all the raw, textual data you have stored in S3 is just a few hoops away from being queried using Hive’s SQL-esque language.

Imagine you have an S3 bucket un-originally named mys3bucket. It contains several really large gzipped files filled with very interesting data that you’d like to query. For the sake of simplicity for this post, let’s assume the data in each file is a simple key=value pairing, one per line.

If you don’t happen to have any data in S3 (or want to use a sample), let’s upload a very simple gzipped file with these values:

    jim=89347
    dave=313925
    noddy=21516
    don=6771

Prerequisites

Both Hive and S3 have their own design requirements which can be a little confusing when you start to use the two together. Let me outline a few things that you need to be aware of before you attempt to mix them together.

First, S3 doesn’t really support directories. Each bucket has a flat namespace of keys that map to chunks of data. However, some S3 tools will create zero-length dummy files that look a whole lot like directories (but really aren’t). It’s best if your data is all at the top level of the bucket and doesn’t try any trickery. There are ways to use these pseudo-directories to keep data separate, but let’s keep things simple for now.

Second, ensure that the S3 bucket that you want to use with Hive only includes homogeneously-formatted files. Don’t include a CSV file, Apache log, and tab-delimited file in the same bucket. We need to tell Hive the format of the data so that when it reads our data it knows what to expect.

Third, even though this tutorial doesn’t instruct you to do this, Hive allows you to overwrite your data. This could mean you might lose all your data in S3 – so please be careful! If you need to, make a copy of the data into another S3 bucket for testing. Note that there is an existing Jira ticket to make external tables optionally read only, but it’s not yet implemented.

Install Hive

Of course, the first thing you have to do is to install Hive. This is fairly straightforward and perhaps my previous post on this topic can help out. I’m doing some development (bug fixes, etc.), so I’m running off of trunk. But a reasonably recent version should work fine.

Let’s assume you’ve defined an environment variable named HIVE_HOME that points to where you’ve installed Hive on your local machine. If you don’t really want to define an environment variable, just replace $HIVE_OPTS with your installation directory in the remaining instructions.

Update Configuration

Now, let’s change our configuration a bit so that we can access the S3 bucket with all our data. First, we need to include the following configuration. This can be done via HIVE_OPTS, configuration files ($HIVE_HOME/conf/hive-site.xml), or via Hive CLI’s SET command.

Here are the configuration parameters:

Name Value
fs.s3n.awsAccessKeyId Your S3 access key
fs.s3n.awsSecretAccessKey Your S3 secret access key

Create Table Over S3 Bucket

Whether you prefer the term veneer, façade, wrapper, or whatever, we need to tell Hive where to find our data and the format of the files. Let’s create a Hive table definition that references the data in S3:

    CREATE EXTERNAL TABLE mydata (key STRING, value INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '='
    LOCATION 's3n://mys3bucket/';

Note: don’t forget the trailing slash in the LOCATION clause!

Here we’ve created a Hive table named mydata that has two columns: a key and a value. The FIELDS TERMINATED clause tells Hive that the two columns are separated by the ‘=’ character in the data files. The LOCATION clause points to our external data in mys3bucket.

Now, we can query the data:

    SELECT * FROM mydata ORDER BY key;

The result would look something like this:

    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=
    plan = file:/tmp/kirk/hive_2010-09-30_13-24-57_172_4142256212187799609/-local-10002/plan.xml
    Job running in-process (local Hadoop)
    2010-09-30 13:25:00,715 null map = 100%,  reduce = 100%
    Ended Job = job_local_0001
    OK
    dave	313925
    don	6771
    jim	89347
    noddy	21516
    Time taken: 3.742 seconds

Because we’re kicking off a map-reduce job to query the data and because the data is being pulled out of S3 to our local machine, it’s a bit slow. But at the scale at which you’d use Hive, you would probably want to move your processing to EC2/EMR for data locality.

Conclusion

Of course, there are many other ways that Hive and S3 can be combined. You may opt to use S3 as a place to store source data and tables with data generated by other tools. You can use S3 as a Hive storage from within Amazon’s EC2 and Elastic MapReduce. You can use S3 as a starting point and pull the data into HDFS-based Hive tables. Hive presents a lot of possibilities — which can be daunting at first — but the positive spin is that these options are very likely to coincide with your unique needs.

References

Please see also the following links for Hive and S3 usage from the official Hive wiki:

Overview of Using Hive with AWS
Overview of Using Hive with S3
Using Hive with Compressed Data Storage

August 9, 2010

How To Try Out Hive on Your Local Machine — And Not Upset Your Ops Team

Posted in Configuration, Hive, Java, Scalability at 8:23 PM by Kirk True

According to the Hive web site:

Hive is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, adhoc querying and analysis of large datasets data stored in Hadoop files.

Hive is built on top of various technologies, the most notable being Hadoop and HDFS. As a result, to run Hive, you need access to a Hadoop cluster running a job tracker, task trackers, DFS nodes, and so on. You will also need an external database (MySQL, PostgreSQL, etc.) to store Hive’s meta data.

Even if you have such an environment available to you, for exploratory reasons it may be faster and less error prone to work in a sandbox. It’s very easy to seemingly delete files from HDFS using common Hive commands (loading data from HDFS will by default move the file to the Hive-managed location). While you’re finding your way around Hive, it might be best to isolate yourself from the outside world (metaphorically, mind you) to avoid causing any problems.

So then, how can you try out Hive on your local machine, and in a safe sandbox? Well, it’s actually pretty easy.

First, you’ll need to download and build Hive from the source.

Next, before running Hive, simply export the following environment variable:

export HIVE_OPTS="-hiveconf mapred.job.tracker=local \
   -hiveconf fs.default.name=file://`pwd`/tmp \
   -hiveconf hive.metastore.warehouse.dir=file://`pwd`/tmp/warehouse \
   -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=`pwd`/tmp/metastore_db;create=true"

(Sorry about the WordPress formatting; watch out for the last line in the above…)

The HIVE_OPTS environment variable is used by the Hive command-line utility to provide overrides to the default Hive configuration. (Note: some references use the incorrect name HIVE_OPT (i.e. missing the S) which, of course, causes the values to be silently ignored.) Setting it prior to running the bin/hive script will cause the values set in the environment variable to be used instead.

Just for completeness, let’s review the settings:

  • mapred.job.tracker – This is a standard Hadoop configuration option to point to the URL of the job tracker. The magic value of “local” will cause Hadoop to be run on the local machine instead.
  • fs.default.name – This is another standard Hadoop configuration option to specify the root of the distributed file system used by Hadoop (often HDFS, but not necessarily). By using a file://-based URL, we use our local file system, which–for tests–is likely sufficient.
  • hive.metastore.warehouse.dir – This setting is specific to Hive, and is the directory name (relative to the fs.default.name) in which Hive’s warehouse data is stored. Again, we use a local file system-specific path.
  • javax.jdo.option.ConnectionURL – This is a standard JDBC URL used by Hive to connect to its meta data store. Using the value of jdbc:derby:;databaseName=`pwd`/tmp/metastore_db;create=true" allows us to use a local, embedded Derby database with its files stored on the local file system.

At this point, you should be ready to run Hive. Notice that as you create, load, and query from your database, the directory under the current directory (named “tmp”) is populated with a number of files. And guess what? If you want to start all over from scratch, simply exit Hive, delete that directory, and everything is new again.

Also of value in your travels is to set the logging level from the command line:

export HIVE_OPTS="-hiveconf hive.root.logger=DEBUG,console"

You can adjust the logging level to what you need. Another great benefit of running Hadoop locally is that you can get the debug logging to your console for both the Hive client and map/reduce execution.

July 23, 2009

Results of Scalability Improvements for Project Voldemort

Posted in Design, Java, Linux, Scalability at 2:13 PM by Kirk True

For the past several weeks I’ve been working on and off on an NIO implementation of Project Voldemort‘s socket server. Voldemort is a distributed hash table used at LinkedIn in order to scale certain workloads. Up until now, Voldemort used a blocking I/O (BIO), thread-per-socket model for its binary protocol. My implementation moved to supporting thousands of sockets on a single thread.

I wanted to share the results of some testing I’ve done comparing the existing blocking I/O (BIO) implementation of the socket server vs. my NIO implementation.

Test Environment

My testing environment is kept deliberately simple in order to encourage reproducibility. I’m running a single Voldemort instance on a server to which four client machines are pointing. I am simulating multiple clients by using multiple threads from each client.

In terms of the hardware involved, the server is an Intel Core 2 Duo 6700 (2.66 GHz) with 3.3 GB RAM[1]. The client machines are Intel Core 2 Duo 4300s (1.80 GHz) with a full 4 GB of RAM each. The server and clients are separated by Gigabit Ethernet on a dedicated switch.

The server and client machines are running Fedora 10 64-bit. The JVMs are standardized to version 1.6.0_14. The code is version 0.52 from my git-forked “nio-server” branch. Besides the changes made to my branch, I have made two small uncommitted changes as detailed in the patch below[2]. Basically they uncommitted changes increase the heap size for both client and server to 3 GB and increase the client timeouts to 10 minutes(!).

Client Setup

To perform the tests I’m using the voldemort-remote-test.sh script on each of the four client machines, updated to use 3 GB heap as stated above. I updated the voldemort.performance.RemoteTest to set some timeouts very long (10 minutes). I simply averaged out the result from all four machines over 100 iterations of the test.

Server Setup

The server is set up to use either the BIO or NIO implementation depending on the value of “enable.nio.connector” in server.properties. The value of “storage.configs” was set to “voldemort.store.noop.NoopStorageConfiguration,voldemort.store.bdb.BdbStorageConfiguration” as some tests used the no-op storage engine. When running the BIO implementation the value of “max.threads” was set as appropriate based on the total number of client threads. For the NIO implementation we run only two threads regardless of the fact that we’re serving tens of thousands of clients.

After each test the Voldemort server was shut down and restarted.

For both the client and the server I executed the command “ulimit -n 40000” prior to starting the JVM to run the tests in order to provide room for the socket file descriptors.

Test Results

So far I’ve run three tests comparing the BIO implementation to the NIO implementation. These all center around measuring the writes/second as reported by the voldemort-remote-test.sh script.

Transactions/second using No-op Storage Engine

The first test measured the transactions/second as the number of clients increased. The server is running the no-op storage engine to avoid skewing the results with BDB’s work (as is shown to be significant in the next test).

For this test, the invocation of voldemort-remote-test.sh provided these options:

Number of iterations: 100
Number of requests: 100,000
Operations: write and delete
Value size: 1024 bytes
Threads: as shown in the x-axis; each client ran ¼ of the threads

Here is a graph charting the results:

Transactions/second using No-op Storage Engine

While running the BIO implementation simulating 6,000 clients, the client application started throwing lots of connection timeout errors upon starting up. After several minutes the clients would generally recover and complete the number of iterations. However, the reporting output from the client would show that the number of successful operations was often less than the expected. For example, the delete operation would report “99,984 things deleted” instead of 100,000. This connection timeout behavior explains the increase in speed shown by the BIO implementation. As a fraction of the clients hung (and eventually timed out), the remaining clients would process their work that much faster, producing a more favorable measurement. These connection timeout errors were not seen when running the NIO implementation until much later.

On the server, the load average (as measured by top) was about ½ the number of server threads, e.g. around 4,000 for 8,000 clients when running the BIO implementation. For NIO, it was always ~2. For 8,000 clients, the heap size went to 2 GB for BIO while NIO was about 700 MB. This is largely attributed to the per-thread stack size.

Individual iteration measurements for clients varied wildly with the BIO implementation. Unfortunately I didn’t measure this, but I did note several occasions where iterations would yield 15,000 writes/sec and the next iteration only 4,000 writes/sec. Deviation for the NIO implementation was observed to be minimal, however.

The data for the BIO implementation ends abruptly before recording the results for 10,000 connections since the clients hung until they were forcibly stopped (via kill -9). This was attributed to the inability of the BIO implementation to keep up due to the fact that it was pegged in garbage collection.

At 20,000 connections the NIO implementation did cause one client machine to timeout on connections, though it did recover. At 32,000 connections a noticeable drop in performance was noted and at 34,000 the clients hung until forcibly stopped. This too was attributed to garbage collection times.

Transactions/second using BDB Storage Engine

The next test also measured the transactions/second as the number of clients increased. This time, however, the server is running the Berkeley Database storage engine to see how the socket server implementation is affected.

For this test, the invocation of voldemort-remote-test.sh provided these options:

Number of iterations: 100
Number of requests: 100,000
Operations: write and delete
Value size: 1024 bytes
Threads: as shown in the x-axis; each client ran ¼ of the threads

Here is a graph charting the results:

Transactions/second using BDB Storage Engine

As with the above test, around 6,000 clients with the BIO implementation the clients started throwing lots of connection timeout errors when starting up. At 8,000 clients this connection timeout behavior occurred for all four clients and explains the increase in speed shown by the BIO implementation. These connection timeout errors were not seen when running the NIO implementation until around 30,000 clients.

The BDB storage engine greatly affects the transactions/second that can be processed. My guess is that because the BDB code (and its threads) are competing for CPU time against thousands of other threads in the BIO implementation that it begins to become starved in terms of making progress. The NIO implementation, however, using a fixed two threads allows BDB to accomplish more work.

So even at a small number of clients, we achieve an increase of ~50-100% more transactions/second with the NIO implementation over the BIO implementation.

Transactions/second per Request Size

The next test also measured the transactions/second, but this time as the request size increased. The server is running the no-op storage engine to measure more clearly the nework I/O.

For this test, the invocation of voldemort-remote-test.sh provided these options:

Number of iterations: 100
Number of requests: 100,000
Operations: write and delete
Value size: as shown in the x-axis
Threads: 100 total, each client ran 25 threads

Here is a graph charting the results:

Transactions/second per Request Size

It’s pretty easy to see how the request size has an obvious effect on the number of transactions. Please note that the request sizes in the x-axis are doubling rather than linearly increasing. As can be seen in the graph, this appears to be a matter of JVM throughput as opposed to the nuances of the socket server implementation.

Conclusion

The implementation of the socket server delivers on NIO’s promise of better scalability with regard to using asynchronous I/O and readiness selection over the blocking I/O and thread-per-socket approach.

A Big Surprise

For a long time my NIO implementation suffered from weaker performance than the BIO implementation. I had scoured the code looking for problems but found nothing obvious. I had used the examples that I’d seen posted everywhere wherein the server used a single Selector to process readiness selection and for each ready socket, handed processing of to a thread in a thread pool.

What I found during profiling of the server was that ~30% of the runtime was spent in updating the SelectionKey‘s interestOps. Understandably when a SelectionKey updates its interestOps it requires acquiring a shared lock in the Selector implementation (at least for the 1.6 JDK on Linux). This caused a horrible serialization problem.

What I did instead was to move to a thread-per-Selector design wherein all of the I/O is done serially in a loop. There is literally one thread handling the I/O for tens of thousands of sockets. Because it’s done in the same thread, when the SelectionKey‘s interestOps is updated, the lock is found to already be held, thus no contention.

Even while I’m writing this I’m dumbfounded how a single thread that includes I/O and processing the operation could outperform a multi-threaded design. I’ve looked and looked and tested and tested and it’s almost always faster. The code for the NIO implementation is there, take a look and let me know what I’ve overlooked.

Moving Forward

The next step is to get the NIO implementation accepted into the Voldemort master repository. This would help so that others can peruse the code and employ it in their testing environments. I’d love to see it progress and improve based on others’ input. After that–should it stand up under scrutiny–it could become the default socket server implementation.

Well, I can dream ;)

1. The machine has 4 GB DIMMs installed, but BIOS stole 640 MB for some reason. Despite upgrading the BIOS (try doing that on a server running Linux without a floppy drive — not fun) and trudging through scores of forums I could find no solution. Suffice to say, I was/am very irritated. Is this an Asus thing?

2. Here’s an uncommitted patch that I used to increase the heap size and increase client timeouts:


diff --git a/bin/run-class.sh b/bin/run-class.sh
index ade56bb..a95d593 100755
--- a/bin/run-class.sh
+++ b/bin/run-class.sh
@@ -35,8 +35,8 @@ done
 CLASSPATH=$CLASSPATH:$base_dir/dist/resources

 if [ -z $VOLD_OPTS ]; then
-  VOLD_OPTS="-Xmx2G -server -Dcom.sun.management.jmxremote"
+  VOLD_OPTS="-Xmx3G -server -Dcom.sun.management.jmxremote"
 fi

 export CLASSPATH
-java $VOLD_OPTS -cp $CLASSPATH $@
\ No newline at end of file
+java $VOLD_OPTS -cp $CLASSPATH $@
diff --git a/bin/voldemort-server.sh b/bin/voldemort-server.sh
index 9878095..66a69e5 100755
--- a/bin/voldemort-server.sh
+++ b/bin/voldemort-server.sh
@@ -42,7 +42,7 @@ done
 CLASSPATH=$CLASSPATH:$base_dir/dist/resources

 if [ -z $VOLD_OPTS ]; then
-  VOLD_OPTS="-Xmx2G -server -Dcom.sun.management.jmxremote"
+  VOLD_OPTS="-Xmx3G -server -Dcom.sun.management.jmxremote"
 fi

 java $VOLD_OPTS -cp $CLASSPATH voldemort.server.VoldemortServer $@
diff --git a/test/integration/voldemort/performance/RemoteTest.java b/test/integration/voldemort/performance/RemoteTest.java
index 2a15955..29e945d 100644
--- a/test/integration/voldemort/performance/RemoteTest.java
+++ b/test/integration/voldemort/performance/RemoteTest.java
@@ -26,6 +26,7 @@ import java.util.List;
 import java.util.concurrent.CountDownLatch;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;

 import joptsimple.OptionParser;
@@ -132,6 +133,9 @@ public class RemoteTest {

         System.out.println("Bootstraping cluster data.");
         StoreClientFactory factory = new SocketStoreClientFactory(new ClientConfig().setMaxThreads(numThreads)
+                                                                                    .setConnectionTimeout(10, TimeUnit.MINUTES)
+                                                                                    .setRoutingTimeout(10, TimeUnit.MINUTES)
+                                                                                    .setSocketTimeout(10, TimeUnit.MINUTES)
                                                                                     .setMaxTotalConnections(numThreads)
                                                                                     .setMaxConnectionsPerNode(numThreads)
                                                                                     .setBootstrapUrls(url));

January 16, 2009

Pay Attention to your Processes!

Posted in Miscellaneous at 2:42 AM by Kirk True

I recently ran into an issue where I would intermittently get this Oracle error:

ORA-12516, TNS:listener could not find available handler with matching protocol stack

Nearly all of the search results for that error detailed the need to have the Oracle’s listener set up correctly. However, in my case the problem was not that the listener wasn’t set up correctly, but that the “processes” database variable needed to be increased. Note: no, it wasn’t an issue with the server code not releasing connections or anything. The problem crept up even when running SQL scripts from the command line :\ Unfortunately, the error message was very misleading.

So, as all of the Oracle user forum entries instructed me to do, I logged into the Oracle box as the oracle user, edited $ORACLE_HOME/dbs/init.ora to update the processes value from 50 to 100, and restarted Oracle.

Same ORA-12516 error.

Running lsnrctl service SERVICENAME showed that the connection turned to the blocked state (from a ready state) at around 50 processes. “That’s strange,” I thought, so I double-checked init.ora, bounced the server, and ran the test.

Same ORA-12516 error. You’ve heard of the ‘definition of insanity,’ right? ;)

OK, long story short, not being a DBA, I didn’t realize that–in some cases–the init.ora file doesn’t appear to be used. To find out if that’s the case, run sqlplus / as sysdba, then type the following:

SQL> show parameter spfile;

If an spfile is in fact in use, the full path will be shown in the VALUE column. If a valid file path is shown, simply update your processes in sqlplus thusly:

SQL> alter system set processes=100 scope=spfile;

You’ll need to restart the server for everything to take effect.

I then re-ran the tests and everything worked perfectly.

I love happy endings…

January 10, 2009

On the Trials of Using Websphere's JMS Provider

Posted in Configuration, Java at 2:22 AM by Kirk True

I was recently tasked with getting a stand-alone JMS-based client running using Websphere’s built-in JMS provider. I was confident that with just a bit of administration, a set of JAR files, and some jndi.properties values I could get something up and running in an hour or maybe two…

After about twelve hours I finally finished a prototype. The Java code was exceedingly simple and non-Websphere specific, but the Websphere-specific administration and settings were a nightmare. Here I’ve retraced my steps in case anyone else ever needs to do this again.

Pay very, very, very close attention to Websphere’s OS-specific requirements regarding system dependencies.

Hurdle 1: Installation

Getting the Software

The environment that I was using was a Fedora 8-based server and client. On the server we’re running Websphere 6.1 on the server as it matches what my client is running. On the server I’m using IBM’s JRE (of course) while on the client I wanted to specifically try to be JRE-agnostic, so I chose to use a late-model Sun 1.6 JRE.

The first piece of software to grab is Websphere itself. This was downloaded as was.cd.6100.trial.base.linux.ia32.tar.gz.

To satisfy Websphere’s requirements for Fedora simply run yum via root:

    # yum -y install compat-libstdc++-* compat-db

Lastly, you’ll need to get the so-called IBM Client for JMS on J2SE with IBM WebSphere Application Server installation package. This is downloaded as sibc_install-o0810.09.jar.

Installation

Installation is graphical by default. I tried to figure out how to install Websphere in some sort of console mode but all my efforts proved fruitless. (It is supposed to have a console and silent install mode, I just couldn’t get it to work.) So I begrudgingly fired up a VNC session to get to a UI.

First, extract was.cd.6100.trial.base.linux.ia32.tar.gz to a temporary directory and then run the script named launchpad.sh. Simply click to install the trial, using the default location of /opt/IBM/WebSphere/AppServer. Run the first steps console to perform “Installation verification” or to “Start the server.”

Note, if clicking the installer link in the UI appears to do nothing, please double-check that you’ve installed any OS-specific dependencies. This step took about two hours to realize why it wasn’t starting. Since Fedora isn’t an officially supported OS, I used the RHEL 4 directions.

At this point you can connect to Websphere’s “Integrated Solutions Console” (AKA the administration front-end) from another machine via the browser:


http://host:9060/ibm/console

Please substitute host with the host on which Websphere is installed.

Once you get the “Integrated Solutions Console” UI up, you’ll need to log in before you do anything else.

Hurdle 2: Configuring Websphere’s JMS Provider

Websphere’s administration UI is fairly clean, but knowing what you’re supposed to do to make it work is something else altogether. Note: there doesn’t seem to be any centralized documentation for Websphere. Every other question I had resulted in a 5-30 minute Google session to find the answer.

Configuring the Bus

The first thing to do is to configure Websphere’s “bus.” You have to do this before you set up any JMS-specific settings.

So – in the “Integrated Solutions Console” UI, expand the “Service Integration” node and then click “Buses”. Click “New” to create a new bus and name it simply “MyBus” and then click “Next” and “Finish”.

Note: You’ll be seeing the message “Changes have been made to your local configuration” a lot. Simply click the link to “Save” the configuration. We’ll restart the server from the command line once we’re all finished.

Next, go to the “MyBus” details screen. Click the link named “Bus members” and then “Add”. You’ll be asked to specify a “Server”, “Cluster”, or “WebSphere MQ server”. Select “Server” and click “Next”. Leave the “type of message store” as its default (“File store”) and click “Next”. The “message store properties” can be left as-is; click “Next” and then “Finish”. Don’t forget to “Save” the configuration as described above.

Next, go to the following screen: “Buses”, then “MyBus”, and then “Destinations”. Click “New” to create a new Destination. When prompted to “Select destination type”, choose “Topic space”, and click “Next”. Enter the “Identifier” “MyTopicSpace” and click “Next”. Leave the “Assign the queue to a bus member” select as-is and click “Next”, then “Finish”, and then save the configuration.

Configuring JMS Resources

In the “Integrated Solutions Console” UI, expand the “Resources” node in the UI, expand the “JMS” node, and then click “JMS providers”. First, select a “Scope” of “Node=Node01, Server=server1″ where is the host on which Websphere is installed. After the page has refreshed, click the link “Default messaging provider”. Click “Connection factories” and then “New”. We’re going to leave most of the options blank, but do enter the “Name” as “MyConnectionFactory” and the “JNDI name” as “jms/MyConnectionFactory”. For the “Bus name” select “MyBus” and then wait for the screen to refresh. Next enter the value “host:7276″ for the “Provider endpoints” option where host is the host on which Websphere is installed. Next, click “OK” and then save the configuration.

Next, expand the “Resources” node in the UI, expand the “JMS” node, and then click “JMS providers”. The “Scope” of “Node=Node01, Server=server1″ should still be selected. Click the link “Default messaging provider”, then click “Topics”, and then “New”. We’re also going to leave most of these options blank, but do enter the “Name” as “MyTopic” and the “JNDI name” as “jms/MyTopic”. For the “Bus name” select “MyBus” and then wait for the screen to refresh. Next select the “Topic space” named “MyTopicSpace” and wait for the screen to refresh. Next, click “OK” and then save the configuration.

Restarting Websphere

Don’t forget to restart! We need to restart so that our new configuration will become effective.

Simply drop into a terminal window and execute the following:

    # /opt/IBM/WebSphere/AppServer/bin/stopServer.sh server1 && sleep 5 && /opt/IBM/WebSphere/AppServer/bin/startServer.sh server1

Hurdle 3: Developing a Stand-alone Java Client

We’re finally ready to tackle wiring up our client to our new JMS server. Fortunately, this piece is pretty straightforward ;)

Installing the Client-side Libraries

It took me quite a bit of time to find the necessary client-side files to talk to Websphere’s JMS provider. These aren’t simply a set of JARs in the Websphere installation somewhere. They’re not up on any Maven repository or anything. Remember that scary sounding file that we downloaded before — sibc_install-o0810.09.jar? That file isn’t the JMS libraries but an installer for the libraries. So, on the client, run the installer thusly:

    $ java -jar /tmp/sibc_install-o0810.09.jar jms_jndi_sun -silent /tmp/jms

Note that the above assumes you’re running a Sun-based JRE on the client. If you’re running under IBM’s JRE you’ll need to use jms_jndi_ibm instead of jms_jndi_sun.

In the /tmp/jms/lib directory are the three JARs you’ll need to have in your CLASSPATH.

(Man, I’m so spoiled by Maven.)

Configuring JNDI Access to JMS

The last bit of the puzzle is the JNDI context configuration to connect to the JMS server. Although it took me about three hours to find these settings, they’re all that are needed. Simply put the following entries in jndi.properties (or Spring or wherever):

    java.naming.provider.url=iiop://stampy:2809
    java.naming.factory.initial=com.ibm.websphere.naming.WsnInitialContextFactory
    com.ibm.CORBA.ORBInit=com.ibm.ws.sib.client.ORB

That should be all you need.

Conclusion

My hat goes off to all Websphere administrators and developers. I have apparently been enjoying a very carefree existence in the lightweight/Spring/POJO world for the last five or so years.

If there’s one thing I’ve learned from this experience, it is the value of patience ;)

December 4, 2008

How to Predict Layoffs Using Google and LinkedIn (or "Adobe Lays Off 10% of Workforce")

Posted in Miscellaneous at 8:27 PM by Kirk True

This post will be a slight diversion from the usual technical talk…

My contract at Adobe (my main client for the last six months) was set to conclude at the end of this December. However, a series of events last week (before Adobe’s layoffs were announced) lead me to suspect it would be ending sooner, and much more abruptly…

Last week I happened to be searching for something in Google via the Firefox toolbar. (I can’t even remember what it was now, to be honest.) I tried a couple of different searches, and as I did I began to notice that as I typed in “adobe lay” the first suggested query was repeatedly “adobe layoff 2008″. Curious, I thought… I hadn’t heard of any layoffs, so I decided to search for “adobe layoff 2008.” But even more curiously, all of the results in the first page returned results about the 2005 layoffs after the Macromedia merger, with nothing about any layoffs this year. “Huh,” I thought and then dismissed it.

A day or two later I got an automated email from LinkedIn that my client manager at Adobe had updated his LinkedIn profile substantially. He’d filled in details, elucidating his responsibilities and achievements, etc. Now, I’ve come to realize one thing as a LinkedIn user – if you see someone a) update his profile summary in any significant way, or b) start receiving recommendations from colleagues, it means only one thing: he/she is looking for a job.

On Monday, I thought about sending an email to him to ask if he was looking, but decided against it…

Wednesday afternoon I found out that Adobe trimmed 10% of its employees as well as many service contractors (that’s me ;) ). I was sad, but not entirely surprised. Fortunately the team for whom I was working sustained only one FTE casualty – my client manager.

So, as of this Friday (tomorrow), my relationship with Adobe is more than likely ended. I say “likely” as there’s some questions about continuing, but I’m going to be proactive and start looking for another project. Take a look at my LinkedIn profile if you’re interested to see how I might help your company.

Lastly, I wanted to say that I really enjoyed having Adobe as client. I’m confident that the Genesis project could be the RIA killer app and wish the team the best. I unreservedly recommend those with whom I worked at Adobe:

I’ll do my best to post my individual recommendations for them via LinkedIn. But don’t take that to mean they’re looking for a job ;)

Update: Adobe did not terminate my contract earlier than its original end date (December 31st, 2008). So that gives me a little bit of time to find another project.

September 17, 2008

Simple Load Balancing using Apache mod_proxy_balancer in Just Three Steps

Posted in Configuration at 10:20 PM by Kirk True

Apache’s mod_proxy_balancer is a very easy to set up load balancer.

You could say that I was born with a hardware-based load balancer, as I have never personally been involved in the purchasing, evaluation, administration, and so forth of production load balancers. This was left up to the ops team, not the engineers. That said I found setting up a rudimentary load balancing development environment with mod_proxy_balancer to be trivial. My use was limited to being able to verify that my client’s web application was designed correctly to scale past a single server instance in as short a time as possible. For that goal, it met my needs.

To set up a trivial load balancing environment, you really only need three machines – one web server (via which we’ll load balance) and two or more backing web application servers. And there are only three steps to do it…

First, double check that mod_proxy_balancer is provided with your Apache web server and is installed. I found that it was in my Fedora-based installation, though I’ve seen reports of other configurations not having this by default. There are a couple of ways to verify the module is installed, but one way is:

# apachectl -t -D DUMP_MODULES 2>&1 | grep proxy_balancer_module

And another is to look for “LoadModule proxy_balancer_module modules/mod_proxy_balancer.so” in your httpd.conf file.

Second, specify the web application servers over which to load balance by opening your httpd.conf file and adding these lines:

ProxyPass / http://app.example.com/

<Proxy http://app.example.com/>
BalancerMember http://app-01.example.com:8080
BalancerMember http://app-02.example.com:8080
</Proxy>

Naturally, you’re going to want to change the host names, ports, etc. to be applicable for your environment. Note: IP addresses work just fine, too.

Third, restart Apache and verify that your “worker” servers are receiving and responding to the requests. (The Apache ‘error log’ has some nice output too if you’ve entered something wrong on configuration.)

That’s it. You’ve just set up a simple load balancing environment.

In my case, my web application servers are identical and my needs were fairly simple. However, mod_proxy_balancer includes options for weighting based on request counts, request byte sizes, and static factoring. It also has an optional module to support dynamic reconfiguring of the balanced machines as well. All in all, very recommended for development use.

Next page

Follow

Get every new post delivered to your Inbox.