Made in China: Country’s new supercomputer uses homegrown chips

China is stepping up its semiconductor manufacturing efforts and using domestic chips for its latest supercomputer. It’s going to be interesting to see how fast China can close in on U.S. supercomputer processor makers Intel, AMD, and Nvidia.

The New York Times reported that a supercomputer called Sunway BlueLight MPP was installed in September at the National Supercomputer Center in Jinan, China. The details emerged at a technical meeting. The real catch is that China used 8,700 ShenWei SW1600 chips.

Those semiconductors are homegrown and indicate that China is aiming to be a major chip player. The New York Times story was mostly sourced to Jack Dongarra, a computer scientist at the University of Tennessee, but Chinese sites reported on the technical meeting. Dongarra helps manage the list of Top 500 supercomputers. China’s previous supercomputers used Intel and Nvidia chips.

Meanwhile, ZDNet UK highlighted the blog of Hung-Sheng Tsao, founder of HopBit GridComputing, who posted the slides detailing the Sunway BlueLight MPP. The slides come from a source that covered China’s supercomputing powwow extensively this week.

ZDNet UK’s Jack Clark noted:

According to (Tsao’s) slides, which appear to be from a presentation describing the computer’s capabilities, the ShenWei Sunway BlueLight MPP has 150TB of main storage and 2PB of external storage. Each ShenWei SW1600 processor is 64-bit, has 16-cores and is RISC-based.

Here’s a Google Translate link offering more details via IT168.

The Wall Street Journal noted that China’s domestic supercomputing effort is very credible and signals an effort to cut the country’s reliance on western companies. It’s unclear whether China’s chips are completely original blueprints or based on a previous design. One point of interest for the Sunway chips is power consumption: the Sunway supercomputer apparently doesn’t need that much power relative to rivals.

The New York Times added that the ShenWei chip appears to be based “on some of the same design principles that are favored by Intel’s most advanced microprocessors.”

China’s efforts appear to be a few generations behind, but rest assured the country will try to close any gaps quickly.

This story was originally posted at ZDNet’s Between the Lines under the headline “China steps up its semiconductor game with homegrown supercomputer effort.”

Chinese take out U.S. in supercomputer ranking, named world’s fastest

Chinese supercomputer named world’s fastest

November 14, 2010

China overtook the United States at the head of the world of supercomputing on Sunday when a survey ranked one of its machines the fastest on the planet.

Tianhe-1, meaning Milky Way, achieved a computing speed of 2,570 trillion calculations per second, earning it the number one spot in the Top 500 survey of supercomputers.

The Jaguar computer at a US government facility in Tennessee, which had held the top spot, was ranked second with a speed of 1,750 trillion calculations per second.

Tianhe-1 does its warp-speed “thinking” at the National Centre for Supercomputing in the northern port city of Tianjin, using mostly chips designed by US companies.

Another Chinese system, the Nebulae machine at the National Supercomputing Centre in the southern city of Shenzhen, came in third.

The United States still dominates, with more than half of the entries in the Top 500 list, but China now boasts 42 systems in the rankings, putting it ahead of Japan, France, Germany and Britain.

It is not the first time that the United States has had its digital crown stolen by an Asian upstart. In 2002, Japan made a machine with more power than the top 20 American computers put together.

The supercomputers on the Top 500 list, which is produced twice a year, are rated based on speed of performance in a benchmark test by experts from Germany and the United States.

More information: http://www.physorg … omputer.html

(c) 2010 AFP

A Chinese supercomputer has been ranked the world’s fastest machine in a list issued by US and European researchers. The move highlights China’s rapid progress in the field.

The Tianhe-1A system at the National Supercomputer Center in Tianjin is capable of sustaining computation at 2.57 quadrillion calculations per second. As a result, the former number one system, the US Department of Energy’s Jaguar in Oak Ridge, is now ranked second.

Third place is also held by a Chinese system, Nebulae, located at the National Supercomputing Center in the south China city of Shenzhen.

File photo of China’s world-leading supercomputer, Tianhe-1A. (Xinhua File

Chinese take out U.S. in supercomputer ranking

The Jaguar has fallen from the top of the food chain.

When the Top 500 list of the world’s most powerful supercomputers is released today, the Cray XT5 system at Oak Ridge National Laboratory and run by the University of Tennessee, called “Jaguar,” will drop to No. 2 after a year of eating the lunch of every other supercomputer in the world. In its place will stand Tianhe-1A, a system built by China’s National University of Defense Technology, located at the National Supercomputing Center in Tianjin.

Tianhe-1A achieved a performance level of 2.57 petaflop/s (quadrillions of calculations per second). Jaguar achieved 1.75 petaflop/s. Third place went to another Chinese-built system, called Nebulae, which achieved 1.27 petaflop/s.

And while the news of China’s achievement is not exactly a surprise, the supercomputing community in the U.S. is looking at it two ways: both as an assurance that U.S. software and components are still elite in their field, and as a wake-up call that the country’s prestige in high-performance computing is not a given.

“This is what everybody expected. What the Chinese have done is they’re exploiting the power of GPUs (graphics processing units) which are…awfully close to being uniquely suited to this particular benchmark,” said Bill Gropp, computer science professor at the University of Illinois Urbana-Champaign, and co-principal investigator of the Blue Waters project, another supercomputer in the works.

The benchmark he’s speaking of is the Linpack, which tests the performance of a system for solving a dense system of linear equations. It’s measured in calculations or floating point operations per second, hence flop/s. Not everyone in this field agrees it’s the best possible way to compare machines, but it is one way.
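To make the idea concrete, here is a toy sketch of what a Linpack-style measurement does: solve a dense system of linear equations, then divide a nominal operation count by the elapsed time. It is an illustration only, not the code the Top 500 uses. The function names are made up for this example, and the count of 2n³/3 + 2n² operations is the convention of the classic Linpack benchmark.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Swap in the row with the largest pivot to keep the arithmetic stable.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_flop_count(n):
    """Nominal operation count credited for one n x n solve (classic Linpack)."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def measure_flops(n=120, seed=0):
    """Time one random n x n solve and return (solution, flop/s estimate)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    return x, linpack_flop_count(n) / elapsed


if __name__ == "__main__":
    _, rate = measure_flops()
    print(f"about {rate / 1e6:.1f} megaflop/s in pure Python")
```

A laptop running this in pure Python will land many orders of magnitude below a petaflop/s, which is rather the point: the benchmark rewards hardware and software that can push this one kind of arithmetic through as fast as possible.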

By using GPUs to accelerate the performance of the Tianhe-1A, the machine can achieve more floating point operations per second.

“The way most of us look at the Chinese machine, is it’s very good at this particular problem (the Linpack benchmark), but not problems the user community is interested in,” said Gropp.

For those worried that this is a blow to the United States’ leadership in supercomputing, it’s actually not a huge cause for alarm if you consider the provenance of the pieces of the Chinese system. Tianhe-1A is a Linux computer built from components from Intel and Nvidia, points out Charlie Zender, professor of Earth Systems Science at the University of California at Irvine.

A timeline of supercomputing speed. (Credit: AMD)

“So we find ourselves admiring an achievement that certainly couldn’t have been done without the know-how of Silicon Valley…and an operating system designed mostly by the United States and Europe,” Zender said. “It’s a time for reflection that we are now at a stage where a country that’s motivated and has the resources can take off-the-shelf components and assemble the world’s fastest supercomputer.”

Supercomputers will likely get faster every year, points out Jeremy Smith, director of the Center for Molecular Biophysics at the University of Tennessee, so China’s rise to the top this month isn’t the end of the story. The list will likely be reordered again in June, when the next edition of the Top500 is released.

“What you find historically with these supercomputers is they become the normal machines five or 10 years later that everybody uses,” said Smith, who oversees some projects run on Jaguar. “The Jaguar machine that we’re so amazed at right now, it could be every university or company has one” eventually.

And of course these high-performance computer systems aren’t just made to race each other, most scientists in the field would argue. They’re made to solve complex problems, with eventual real-world consequences like climate change and alternative fuel production.

Smith argues that research like what’s being done on Jaguar to solve the problem of superconductivity at high temperatures couldn’t necessarily be done on Tianhe-1A effectively because it requires very efficient computing and coming up with the software on a computer to do that well is difficult.

But what China has accomplished is still important for supercomputing, argues Gropp, who called the number of flop/s Tianhe-1A achieved “remarkable.”

“I don’t want to downplay what they’ve done,” he said. “It’s like pooh-poohing the original Toyota. The first Toyota was a pile of junk. But a few years later they were eating our lunch.”

It’s not the first time that a non-U.S. machine has topped the rankings: the Japanese NEC Earth Simulator did it in 2002. The U.S. of course bounced back, and as of today has 275, or more than half of the systems, on the Top 500 list. China is next with 42 systems, and Japan and Germany are tied with 26 each. Still, there is concern that China’s focused concentration of resources on supercomputing is fomenting a threat to the U.S.’ long-term dominance there. But just trying to score the highest on the Linpack benchmark, something that any group of researchers with enough money could do fairly easily, is short-sighted.

“What we should be focusing on is not losing our leadership and being able to apply computing to a broad range of science and engineering problems,” said Gropp, who is also deputy director of research at UI’s Institute for Advanced Computing Applications and Technologies.

The President’s Council of Advisors on Science and Technology (PCAST) is currently working on a report that addresses this exact topic, and didn’t have a comment when contacted. Recently PCAST did release a draft of a document that calls for more funding for scientific computing very soon after news of Tianhe-1A’s speed began to spread. And President Barack Obama weighed in briefly on the topic in a speech two weeks ago, calling for increased science funding specifically for high-performance computing.

But it’s not as if the supercomputing community in the U.S. has been sitting still while China sneaked up behind them. There are other projects in the works at U.S. labs that are planning on blowing Jaguar and Tianhe-1A out of the water in terms of speed.

Currently the University of Illinois Urbana-Champaign and the National Science Foundation are building Blue Waters, a supercomputer that researchers say will be the fastest in the world when it is turned on sometime next year.

The Department of Energy, which owns Oak Ridge’s Jaguar supercomputer, is already looking at moving from the current peta-scale computing (a quadrillion floating point operations per second) to exa-scale computing (a quintillion floating point operations per second), a speed a thousand times faster than Jaguar is currently capable of processing at. It’s a goal that’s still a ways out there, but the work is under way.
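For a sense of scale, the jump from peta to exa can be worked out in a few lines. This is a back-of-the-envelope sketch: the constant and function names are made up for the example, and the only measured figure is Jaguar's 1.75 petaflop/s Linpack result quoted earlier.

```python
PETA = 10**15  # one quadrillion
EXA = 10**18   # one quintillion


def speedup_needed(target_flops, current_flops):
    """How many times faster the target rate is than the current one."""
    return target_flops / current_flops


# Jaguar's sustained Linpack result, as quoted earlier in the story.
jaguar_sustained = 1.75 * PETA

print(f"exa / peta = {EXA // PETA}x")
print(f"exa / Jaguar = about {speedup_needed(EXA, jaguar_sustained):.0f}x")
```

The prefixes themselves are a factor of a thousand apart. Measured against Jaguar's own sustained result, an exaflop machine would be several hundred times faster.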

“To get there in the next five to 10 years, to get to 10 million cores in one room, is a major technical challenge,” noted University of Tennessee’s Jeremy Smith. “It’s going to be fundamentally different than before. It’s a hardware problem, and getting the software working is a major challenge indeed.”

For more statistics on the systems in the Top500 list, please see


Erica Ogg is a CNET News reporter who covers Apple, HP, Dell, and other PC makers, as well as the consumer electronics industry. She’s also one of the hosts of CNET News’ Daily Podcast. In her non-work life, she’s a history geek, a loyal Dodgers fan, and a mac-and-cheese connoisseur. E-mail Erica.

Top 500 supers: China rides GPUs to world domination

The People’s Republic of Petaflops

By Timothy Prickett Morgan

SC10 If the June edition of the bi-annual ranking of the Top 500 supercomputers in the world represented the dawning of the GPU co-processor as a key component in high performance computing, then the November list is breakfast time. The super centers of the world are smacking their lips for some flop-jacks with OpenCL syrup and some x64 bacon on the side.

China has the most voracious appetite for GPU co-processors, and as expected two weeks ago when the Tianhe-1A super was booted up for the first time, this hybrid CPU-GPU machine installed at the National Supercomputer Center in Tianjin has taken the top spot on the Top 500 list with a comfortable margin. Tianhe-1A’s final rating on the Linpack Fortran matrix math benchmark test is 4.7 petaflops of peak theoretical performance spread across its CPUs and GPUs (with about 70 per cent of that coming from the GPUs) and 2.56 petaflops of sustained performance on the Linpack test.

The Tianhe-1A machine is comprised of 7,168 servers, each equipped with two sockets using Intel’s X5670 processors running at 2.93 GHz and one Nvidia Tesla M2050 fanless GPU co-processor. The resulting machine spans 112 racks, and it would make a hell of a box on which to play Crysis.

While about 45 per cent of the floating-point oomph in Tianhe-1A disappears into the void where all missed clock cycles go (it’s also where missing socks from the dryer cavort), the GPU’s flops are relatively inexpensive and the overall machine should offer excellent bang for the buck, provided workloads can scale across the ceepie-geepie of course. The Tianhe-1A super uses a proprietary interconnect called Arch, which was developed by the Chinese government. The Arch switch links the server nodes together using optical-electric cables in a hybrid fat tree configuration and has a bi-directional bandwidth of 160 Gb/sec, a latency for a node hop of 1.57 microseconds, and an aggregate bandwidth of more than 61 Tb/sec.

The Tianhe-1A CPU-GPU hybrid super

This is not the first ceepie-geepie machine that the National Supercomputer Center has put together. A year ago, the Tianhe-1 machine broke onto the Top 500 list using Intel Xeon chips and Advanced Micro Devices Radeon HD 4870 GPUs (no Tesla GPUs, but actual graphics cards). This initial “Milky Way” box (that’s what “Tianhe” translates to in English) had 71,680 cores and had a peak theoretical performance of 1.2 petaflops and a sustained performance of 563.1 teraflops. The efficiency of this cluster was about 47 per cent, sustained over peak performance.

Jaguar dethroned

The “Jaguar” XT5 system at the US Department of Energy’s Oak Ridge National Laboratory was knocked out of the top spot by Tianhe-1A, which is what happens when a cat stands still in the GPU era of HPC. The Jaguar machine has 224,162 Opteron cores spinning at 2.6 GHz and delivers 1.76 petaflops of performance on the Linpack test. This Cray machine links Opteron blade servers using its SeaStar2+ interconnect, which has been superseded by the new “Gemini” XE interconnect in the XE6 supers that started rolling out this summer.

If Oak Ridge moved to twelve-core Opteron 6100 processors and the XE6 interconnect, it could have doubled the performance of Jaguar and held onto the Top 500 heavyweight title. One other thing to note: The Jaguar machine is 75.5 per cent efficient on the Linpack benchmark, a lot better than the Tianhe-1A ceepie-geepie.

The “Nebulae” ceepie-geepie built from six-core Intel Xeon 5650 processors and Nvidia M2050 GPUs that made its debut on the June 2010 Top 500 list got knocked down from number 2 to number 3 on the list. The Nebulae machine, which is a blade server design from Chinese server maker Dawning, is installed at the National Supercomputing Center in Shenzhen. It is rated at 1.27 sustained petaflops at 43 per cent efficiency against peak theoretical performance.

Number four on the list is also a ceepie-geepie: the upgraded Tsubame 2 machine at the Tokyo Institute of Technology. (That’s shortened to TiTech rather than TIT, which would be where you’d expect a machine called Milky Way to be located. But we digress). The Tsubame 2 machine is built from Hewlett-Packard’s SL390s G7 cookie sheet servers, which made their debut in early October. TiTech announced the Tsubame 2 deal back in May, and this machine includes over 1,400 of these HP servers, each with three M2050 GPUs from Nvidia.

The Tsubame 2 machine has 73,278 cores and is rated at 2.29 peak petaflops and delivered 1.19 petaflops of sustained performance on the Linpack test. That’s a 52 per cent efficiency, about what the other ceepie-geepies are getting. By the way, the prior Tsubame 1 machine was based on x64 servers from Sun Microsystems, with floating point accelerators from Clearspeed in only some of the nodes. And one more thing: Tsubame 2 runs both Linux and Windows, and according to the Top 500 rankers, both operating systems offer nearly equivalent performance.
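The efficiency figures bandied about for these machines are simply sustained Linpack performance divided by peak theoretical performance. As a rough sketch, using only the peak and sustained numbers quoted in this story for three of the hybrids (the dictionary layout and function name are made up for the example):

```python
# (peak teraflops, sustained Linpack teraflops), as quoted in this story.
machines = {
    "Tianhe-1A": (4700.0, 2560.0),
    "Tianhe-1": (1200.0, 563.1),
    "Tsubame 2": (2290.0, 1190.0),
}


def efficiency(peak, sustained):
    """Fraction of the theoretical peak actually delivered on Linpack."""
    return sustained / peak


for name, (peak, sustained) in machines.items():
    eff = efficiency(peak, sustained)
    print(f"{name}: {eff:.1%} efficient, {1 - eff:.1%} of peak lost")
```

The same division can be run backwards: a machine rated at 1.27 sustained petaflops and 43 per cent efficiency, like Nebulae, implies a peak of a little under 3 petaflops.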

In the Hopper

The fifth most-powerful super in the world based on the Linpack tests (at least the ones we know about) is a brand new box called Hopper. Installed at the US DOE’s National Energy Research Scientific Computing center, Hopper is a Cray XE6 super using that new Gemini interconnect and twelve-core Opteron 6100 processors – no fancy schmancy GPU co-processors. (Well, at least not yet, anyway.) Hopper has 153,408 cores spinning at 2.1 GHz and delivers 1.05 petaflops of sustained performance with an efficiency of 82 per cent.

If it is not yet obvious, there is a bottleneck in getting parallel supercomputer nodes to talk through their networking stacks running on their x64 processors and out over the PCI-Express 2.0 bus. If Nvidia or AMD want to do something useful, embedding a baby x64 processor inside of a GPU co-processor along with a switchable 10 Gigabit Ethernet or 40 Gb/sec InfiniBand port would make a very interesting baby server node. Throw in cache coherence between the x64 and GPU processors and maybe getting to 50 petaflops won’t seem like such a big deal.

The Bull Tera-100 super at the Commissariat a l’Energie Atomique in France, is based on Intel’s Xeon 7500 high-end processors and Bull’s bullx supercomputer blades and ranks sixth in the world. The machine uses QDR InfiniBand to lash the nodes together, and is rated at 1.05 petaflops. This machine does not have GPUs in it from either AMD or Nvidia, and neither does number eight, the Kraken XT5 super from Cray that is owned by the University of Tennessee and which is operated by DOE’s Oak Ridge National Laboratory. Kraken delivers 831.7 teraflops of sustained Linpack performance, unchanged from when it came onto the list a year ago.

Number seven on the list, the Roadrunner Opteron blade system at Los Alamos National Laboratory (another DOE site) does use accelerators, but they are IBM’s now defunct Cell co-processors, which are based on IBM’s Power cores and which have eight vector math units per chip. The Roadrunner machine demonstrated the viability of co-processors to push up to the petaflops level. But Roadrunner is stalled at 1.04 petaflops, is probably not going to be upgraded, and is therefore uninteresting even if it will do lots of good work for the DOE. (If you consider designing nuclear weapons good work, of course.)

Number nine on the list is the BlueGene/P super, named Jugene, built by IBM for the Forschungszentrum Juelich in Germany, which debuted at number three at 825.5 teraflops on the June 2009 list and hasn’t changed since then. Rounding out the top ten on the Top 500 list is the Cielo Cray XE6 at Los Alamos, a new box that is rated at 816.6 teraflops of sustained Linpack performance.

GPU is my co-pilot

On the November 2010 list, there are 28 HPC systems that use GPU accelerators, and the researchers who put together the Top 500 for the 36th time (Erich Strohmaier and Horst Simon, computer scientists at Lawrence Berkeley National Laboratory, Jack Dongarra of the University of Tennessee, and Hans Meuer of the University of Mannheim) consider IBM’s Cell chip a GPU co-processor. On this list, there are sixteen machines that use Cell chips to goose their floating point oomph, with ten using Nvidia GPUs and two using AMD Radeon graphics cards.

The Linpack Fortran matrix benchmark was created by Dongarra and colleagues Jim Bunch, Cleve Moler, and Pete Stewart back in the 1970s to gauge the relative number-crunching performance of computers and is the touchstone for ranking supercomputers.

There are three questions that will be on the minds of people at the SC10 supercomputing conference in New Orleans this week. The first is: Can the efficiency of ceepie-geepie supers be improved? The second will be: Does it matter if it can’t? And the third will be: At what point in our future will GPUs be standard components in parallel supers, just like parallel architectures now dominate supercomputing and have largely displaced vector and federated RISC machines?

To get onto the Top 500 list this time around, a machine had to come in at 31.1 teraflops, up from 24.7 teraflops only six months ago. This used to sound like a lot of math power. But these days, it really doesn’t. A cluster with 120 of the current Nvidia Tesla GPUs with only half of the flops coming through where the CUDA meets the Fortran compiler will get you on the list. If the growth is linear, then on the June list next year, you will need something like 40 teraflops or about 150 of the current generation of GPUs. And with GPU performance on the upswing, getting a ceepie-geepie onto the Top 500 list might not require so many GPUs.
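The arithmetic behind those GPU counts is easy to reproduce. The sketch below assumes a peak of 0.515 double-precision teraflops per Tesla M2050, a figure that does not appear in this story, plus the 50 per cent efficiency mentioned above. The function names are made up for the example, and it projects the next entry threshold two ways since the growth may not be strictly linear.

```python
import math

# Assumption not taken from this story: peak double-precision teraflops
# for one Nvidia Tesla M2050.
TESLA_M2050_PEAK_TF = 0.515


def gpus_needed(threshold_tf, peak_tf_per_gpu=TESLA_M2050_PEAK_TF, efficiency=0.5):
    """Smallest GPU count whose sustained flops reach the entry threshold."""
    return math.ceil(threshold_tf / (peak_tf_per_gpu * efficiency))


def project_linear(previous_tf, current_tf):
    """Next threshold if the same absolute increase repeats."""
    return current_tf + (current_tf - previous_tf)


def project_geometric(previous_tf, current_tf):
    """Next threshold if the same growth ratio repeats."""
    return current_tf * (current_tf / previous_tf)


print("GPUs for 31.1 teraflops:", gpus_needed(31.1))
print("linear projection:", round(project_linear(24.7, 31.1), 1), "teraflops")
print("geometric projection:", round(project_geometric(24.7, 31.1), 1), "teraflops")
print("GPUs for 40 teraflops:", gpus_needed(40.0))
```

Under these assumptions the counts land within a few GPUs of the round numbers of 120 and 150, and the two projections bracket the "something like 40 teraflops" estimate from below.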

Core counting

As has been the case for many years, processors from Intel absolutely dominate the current Top 500 list, with 398 machines (79.6 per cent of the boxes on the list). Of these, 56 machines are using the Xeon 5600 processors, one is still based on 32-bit Xeons, one is based on Core desktop chips, five are based on Itanium processors, and three are based on the new high-end Xeon 7500s.

In the November 2010 rankings, there are 57 machines using AMD’s Opteron processors, while there are 40 machines using one or another variant of IBM’s Power processors. While the machine counts are low for these two families of chips, the core counts sure are not because of the monster systems that are based on Power and Opteron chips.

There are 1.41 million Power cores on the Top 500 list this time around, which is 21.5 per cent of the total 6.53 million cores inside of the 500 boxes and which represent 7.35 aggregate petaflops or 11.2 per cent of the total 65.8 petaflops on the list. There are 1.54 million Opteron cores (23.5 per cent of cores) on the aggregate list for 14.2 peak petaflops (21.6 per cent of total flops).

None of these core counts include the GPU core counts, which is something that the Top 500 people should reconsider, even though in all cases the flops are counted.

Across all processor architectures, there are 365 machines using quad-core processors and 19 already are using CPUs with six or more cores per socket. It is safe to say that the HPC market will eat whatever number of cores the chip makers can bake.

There are two Sparc-based supers on the current Top 500 list and the Earth Simulator super built by NEC for the Japanese government is still barely on the list (and will probably be knocked off on the next list in June 2011).

Xeon rides the wave

Having said all of that, the 391 machines using Intel’s Xeon processors represent the belly of the Top 500 list. With a total of 3.5 million cores (53.5 per cent of the total core count on the list) and 43.2 petaflops of number-crunching oomph (65.8 per cent of total flops), the Xeon is the champion of the top-end HPC world. Of course, the Xeon CPUs are getting credit for flops that are being done by GPUs in many cases.

In terms of core count, there are 289 machines that have between 4,096 and 8,192 cores, and 96 machines that have from 8,192 to 16,384 cores. You need more than 1,000 cores to make the list, and there are only two boxes that have fewer than 2,048 cores and only 61 have between 2,048 and 4,096 cores. The system count drops off pretty fast above this core count, with 52 machines having more than 16,384 cores.

The Top 500 list is pretty evenly split between Ethernet, with 226 machines, and InfiniBand of various speeds, at 226 machines. The remaining machines are a smattering of Myrinet, Quadrics, Silicon Graphics NUMAlink, and Cray SeaStar and Gemini interconnects. There were seven machines on the list using 10 Gigabit Ethernet for lashing nodes in parallel supers together, and 29 used 40 Gb/sec (QDR) InfiniBand.

By operating system, Linux in its various incarnations dominates the list, with 450 out of 500 machines running it. Unix accounted for 20 machines, Windows five machines, and the remainder were running mixed operating systems. If Microsoft wanted to catch a new wave, it would work to get the best possible GPU runtime and programming tools to market. Just tweaking the MPI stack in Windows HPC Server 2008 R2 to get rough parity with Linux is not going to make a dent at the big supercomputer centers of the world. Then again, Microsoft is trying to move into the HPC arena from the technical workstation up, and it has other advantages that Linux platforms do not in this regard.

IBM has the most systems on the November 2010 Top 500 list, with 199 boxes (39.8 per cent of the total) and 17.1 petaflops (26 per cent of the total flops on the list) of aggregate peak performance on the Linpack test. Big Blue is followed up by Hewlett-Packard, with 158 machines and 11.7 petaflops, which works out to 31.6 per cent of machines and 17.8 per cent of total flops. Cray has only 29 machines on the current super ranking, which is 5.8 per cent of machines but 16.3 per cent of peak floating point power. Silicon Graphics has 22 machines on the list, which is 4.4 per cent of boxes and 4.5 per cent of aggregate flops. Dell has 20 boxes on the list and its hand in a few mixed boxes as well, and Oracle, Fujitsu, NEC, and Hitachi all have a handful of machines, too.
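The vendor shares above can be checked against the list totals. This is a small sketch that uses the 500 systems on the list and the 65.8 aggregate petaflops quoted earlier in this story; the names are made up for the example.

```python
TOTAL_SYSTEMS = 500
TOTAL_PETAFLOPS = 65.8  # aggregate figure quoted earlier in this story

# vendor: (systems on the list, aggregate petaflops), as quoted above
vendors = {
    "IBM": (199, 17.1),
    "Hewlett-Packard": (158, 11.7),
}


def share(part, whole):
    """Percentage of the whole represented by part."""
    return 100.0 * part / whole


for name, (systems, petaflops) in vendors.items():
    print(
        f"{name}: {share(systems, TOTAL_SYSTEMS):.1f} per cent of systems, "
        f"{share(petaflops, TOTAL_PETAFLOPS):.1f} per cent of flops"
    )
```

Comparing the two percentages for each vendor shows who sells many mid-sized boxes and who sells a few monsters: a vendor whose flops share exceeds its system share, as Cray's does, is building bigger machines than average.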

Supercomputing is inherently political (especially so given where the funding for the upper echelon of the Top 500 list comes from), and countries most certainly measure each other up in their HPC centers. The United States leads with machine count, at 275 machines with a combined 31.5 petaflops, and China has jumped well ahead of Japan to become the solid number two, with 42 machines and 12.8 petaflops in total across those machines. Japan has 26 machines that add up to 4.6 petaflops, and Germany’s 26 machines have an aggregate of 3.5 petaflops. The United Kingdom is close behind with 24 machines, for a total of 2.2 petaflops, followed by Russia with 11 machines and 1.1 petaflops. ®

Jaguar’s supercomputing reign coming to an end?

A timeline of supercomputing speed. (Credit: AMD)

The Jaguar supercomputer, housed at the Oak Ridge National Laboratory at the University of Tennessee, has been the fastest supercomputer on the planet for almost a year. But is it about to lose that title and place atop the podium?

Every six months, the Top500 project releases the rankings of the most powerful supercomputers. The current pace of technology development means the list does tend to reorder every half a year or so. But Jaguar has been poised at the top of the food chain for almost a year. Though the Top500 list doesn’t get released until next week, it’s been widely assumed that Jaguar will be taken down by a supercomputer built by China’s National University of Defense Technology, located at the National Supercomputing Center in Tianjin.

Jaguar narrowly avoided being overtaken in June, the last time the rankings were released. The Nebulae supercomputer, located at the National Supercomputing Center in Shenzhen, came in second, achieving 1.271 petaflops/s (1.271 quadrillion floating point operations per second) running something called the Linpack benchmark.

But it appears that Jaguar’s lead has been overcome this time. There have been reports about it over the last few weeks, and President Barack Obama even mentioned it during a speech last week:

“And we just learned that China now has the fastest supercomputer on Earth–that used to be us. They’re making investments because they know those investments will pay off over the long term,” he said.

The supercomputers are ranked on many factors, but the Top500 list is ordered based on the results of the Linpack benchmark. Even if it places the Tianjin supercomputer above Jaguar, it doesn’t necessarily mean the U.S. is getting bumped from its perch atop supercomputing, argue two scientists who work at Oak Ridge.

“China might have the largest number of cores in one computer, so theoretically they have the most powerful computer. But they maybe don’t have the most powerful scientific codes yet that use that computer,” said Jeremy Smith, director of the Center for Molecular Biophysics at the University of Tennessee, in an interview. “So from that perspective, they may not be at the same level as Oak Ridge.”

Jaguar is comprised of more than 250,000 AMD Opteron cores, running extremely sophisticated computer programs that try to answer complex questions like why ribosomes (components of cells that build proteins from amino acids) are dependent on magnesium, how to simulate making more environmentally-friendly ethanol out of plant material, and how to predict climate change. Jaguar’s specialty is getting all those cores running together extremely efficiently, which is a separate and perhaps harder task than just building a really powerful computer.

Smith says that the projects at Oak Ridge National Laboratory run extremely efficiently on Jaguar, and the scientific value of the computing is therefore very high.

China’s supercomputer is based on GPUs (graphics processing units), in this case built by Nvidia, and it’s technically faster because the CPU (central processing unit) uses the GPU to accelerate its speed. But if you don’t get the software to run on it properly, it’s actually harder to use, said Roland Schultz, a graduate student at the University of Tennessee’s Center for Molecular Biophysics.

What Schultz says he is much more interested in is the Gordon Bell Prize, which is awarded by the Association for Computing Machinery to the most innovative scientific application of supercomputing. Teams from Oak Ridge have won most recently in 2008 and 2009 for research into high-temperature superconductivity, or sending electricity over long distances in high temperatures with no loss of transmission.

But do we make too much of who’s faster? Smith put it in perspective.

“What you find historically with these supercomputers is they become the normal machines 5 or 10 years later that everybody uses,” said Smith. “The Jaguar machines that we’re so amazed at right now, it could be every university or company has one” eventually.

We’ll know exactly how things have shaken out next week when the Top500 List is released. But even if Jaguar does get hunted down by a Chinese supercomputer, it’s not as if the folks at Oak Ridge are sitting still. The Department of Energy, which owns Oak Ridge’s supercomputer, is already looking at moving from the current peta-scale computing (a quadrillion floating point operations per second) to exa-scale computing (a quintillion floating point operations per second), a thousandfold jump over the scale at which Jaguar currently operates.
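The jump from peta-scale to exa-scale is plain unit arithmetic, and a short sketch makes the size of the gap concrete. The constant and function names below are illustrative; the 1.75 petaflops figure for Jaguar is the Linpack number quoted elsewhere in this collection.

```python
# Unit arithmetic for the peta-scale vs. exa-scale comparison.
PETA = 10**15  # one quadrillion operations per second
EXA = 10**18   # one quintillion operations per second

# Jaguar's quoted Linpack result (1.75 petaflops), used here as an assumption.
JAGUAR_LINPACK_FLOPS = 1.75 * PETA


def scale_factor(target_flops, current_flops):
    """How many times faster the target rate is than the current rate."""
    return target_flops / current_flops


print(scale_factor(EXA, PETA))                         # 1000.0
print(round(scale_factor(EXA, JAGUAR_LINPACK_FLOPS)))  # 571
```

So one exaflop is exactly a thousand petaflops, and roughly 570 times Jaguar’s measured 1.75 petaflops.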

“To get there in the next 5 to 10 years, to get to 10 million cores in one room, is a major technical challenge,” noted Smith. “It’s going to be fundamentally different than before. It’s a hardware problem, and getting the software working is a major challenge indeed.”


Erica Ogg is a CNET News reporter who covers Apple, HP, Dell, and other PC makers, as well as the consumer electronics industry. She’s also one of the hosts of CNET News’ Daily Podcast. In her non-work life, she’s a history geek, a loyal Dodgers fan, and a mac-and-cheese connoisseur.

by Otto Holland November 11, 2010 5:50 PM PST
The article mentioned 250,000 cores of AMD Opteron. I am curious to know if those processors are the 4 cores or the new Barcelona 12 cores on 32 NANO.
If they are the older 4 or 6 cores, they can be swapped out for the new 12 cores, because they use the same ZIF socket. Just wondering….

by rip_saw November 11, 2010 6:19 PM PST

Durr, who has the most cores and flops is meaningless now. Last I checked, Folding@home trumps the crap out of anything in China, and google’s servers totally destroy any supercomputer, although they are not being used for that purpose. I understand the use of a single computer, but for many projects, it’s just not needed.

by dralw65 November 11, 2010 6:42 PM PST

This is a good article that is very informative; however, the statement about ribosomes appears incorrect: ribosomes synthesize proteins from amino acids. Amino acids are not made by ribosomes.

by realityenigma November 11, 2010 7:46 PM PST

When I first read this (on I was concerned myself. However, I was directed to an interesting link about a supercomputer (US built) that will be ready in 2012:

I am sure you guys can find more articles if you are interested; nevertheless, I think we can rest easy if we are worried about speed records.

Newscribe : get free news in real time

Asia’s largest supercomputer production base established in Tianjin

The Specialty Association of Mathematical and Scientific Software (SAMSS), the Research and Development Center for Parallel Software, and the State Key Laboratory of Computer Science jointly published the list of the Top 10 High-Performance Computers of China on Oct. 28.

On the list, the “Tianhe-1” supercomputer in Tianjin and the product series developed and produced by the Dawning Computer Base in Tianjin ranked at the top in calculation speed and in market share, respectively. After the list was released, Tianjin immediately attracted the attention of many computer industry insiders.

On the top-10 list published by authoritative organizations, the technically upgraded “Tianhe-1” supercomputer, jointly developed by the National University of Defense Technology and the Tianjin Binhai New Area, ranked at the top for its peak speed of 4.7 petaflops and sustained speed of 2.5 petaflops. Its peak and sustained speeds are both faster than the published records of the world’s supercomputers.

Currently, the “Tianhe-1” has been put into operation at the National Super Computing Center in the Tianjin Binhai New Area.

Thirty-four high-performance computers developed and produced in the Dawning Computer Base also ranked in the top 100 of the list, indicating that the base’s share in China’s high-performance computer market has exceeded 30 percent. Among the 34 computers, the Nebula, China’s first supercomputer capable of sustained computing of more than 1 petaflop, ranked second on the list with a sustained speed of nearly 1.3 petaflops. Furthermore, four Dawning series supercomputers ranked in the top-10 of the list.

The Dawning Computer Base started construction in Tianjin in July 2006. The industrial base covers an area of more than 4 hectares and has been able to produce 100,000 servers a year since the first phase of the construction project was finished. At present, the second phase of the project has been partly finished. Within the next two years, the base will be able to produce 500,000 PC servers and 2,000 high-performance computers a year and will become the largest production base of high-performance computers in Asia.

Because the Tianhe-1 has been put into use in Tianjin and the Dawning Computer Base will soon evolve into the largest production base of its kind in Asia, the city has decided to include the promotion of supercomputing applications in its 12th Five-Year Plan.

During the 12th Five-Year Plan period (2011-2015), the Tianjin Binhai New Area will provide high-performance computing support to strategic emerging industries for more technological innovations and attach equal importance to high-tech public services, the development of the information industry, and the training of information technology specialists.

By People’s Daily Online

See related earlier post:  China claims supercomputer crown, a threat?

China claims supercomputer crown, a threat?

China has claimed the top spot on the list of the world’s supercomputers.

The Tianhe-1A supercomputer is about 50% faster than its closest rival.

The title has gone to China’s Tianhe-1A supercomputer that is capable of carrying out more than 2.5 thousand trillion calculations a second.

To reach such high speeds the machine draws on more than 7,000 graphics processors and 14,000 Intel chips.

The claim to be the fastest machine on the planet has been ratified by the Top 500 Organisation which maintains a list of the most powerful machines.

High power

China’s Tianhe-1A (Milky Way) has taken over the top spot from America’s XT5 Jaguar at the Oak Ridge National Laboratory (ORNL) in Tennessee, which can reach only 1.75 petaflops. One petaflop is the equivalent of 1,000 trillion calculations per second.

The news about the machine broke just before the publication of the twice-yearly Top 500 Supercomputer list, which ranks the world’s most powerful machines.

Prof Jack Dongarra from the University of Tennessee, one of the computer scientists who helps to compile the list, said China’s claim was legitimate.

“This is all true,” he told BBC News. “I was in China last week and talked with the designers, saw the system, and verified the results.”

He added: “I would say it’s 47% faster than the Oak Ridge National Laboratory’s machine, 1.7 Pflops (ORNL system) to 2.5 Pflops (Chinese system).”
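Dongarra’s 47% figure follows directly from the two rounded speeds he quotes. A minimal sketch of that arithmetic is below; the function name is illustrative, and the inputs are only the rounded petaflops values that appear in this article.

```python
def percent_faster(new_pflops, old_pflops):
    """Relative speed advantage of the new system over the old one, in percent."""
    return (new_pflops / old_pflops - 1.0) * 100.0


# Dongarra's rounded figures: 2.5 Pflops (Tianhe-1A) vs. 1.7 Pflops (Jaguar).
print(round(percent_faster(2.5, 1.7)))   # 47

# Using the 1.75 petaflops figure quoted earlier for Jaguar instead.
print(round(percent_faster(2.5, 1.75)))  # 43
```

Depending on how the inputs are rounded, the gap comes out between roughly 43% and 47%, which is consistent with the "about 50%" used in the caption as a loose approximation.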

Tianhe-1A is unusual in that it unites thousands of Intel processors with thousands of graphics cards made by Nvidia.

The chips inside graphics cards are typically made up of small arithmetical units that can carry out simple sums very quickly. By contrast, Intel chips are typically used to carry out more complicated mathematical operations.

The machine houses its processors in more than 100 fridge-sized cabinets and together these weigh more than 155 tonnes.

Based in China’s National Center for Supercomputing in the city of Tianjin, the computer has already started to do work for the local weather service and the National Offshore Oil Corporation.

Is China a supercomputer threat?

Jack Dongarra, a professor at University of Tennessee’s department of electrical engineering. China’s supercomputer is a ‘wake-up call.’

With China expected to officially take the supercomputer performance crown next month, I asked an expert about the state of supercomputing in the U.S. and whether China poses a long-term threat to the United States’ current preeminence in supercomputing.

Nvidia announced yesterday that its chips are powering the “Tianhe-1A” Chinese supercomputer that achieved 2.507 petaflops, beating a U.S.-based system that is currently ranked No. 1 on the June Top500 list of the fastest supercomputers in the world. The Chinese system is a unique hybrid design that uses approximately 7,000 Nvidia graphics chips along with 14,000 Intel Xeon CPUs. The graphics chips are what give the system the extra oomph to catapult it into the top supercomputer spot.

I spoke with Jack Dongarra, university distinguished professor at University of Tennessee’s Department of Electrical Engineering and Computer Science and part of a group from the University of Tennessee, Oak Ridge National Laboratories, and Georgia Tech that recently purchased a hybrid system. It is important to note that Oak Ridge houses the supercomputer, dubbed “Jaguar,” cited above that is currently ranked No. 1 in the world based on the Top500 June list: it is not a hybrid system.

Q: Does Oak Ridge have anything analogous to the Chinese hybrid system?
Dongarra: Oak Ridge has a small version of a machine that is hybrid in nature. So, this is an acquisition that just took place…out of a grant from the National Science Foundation. It involved Oak Ridge National Labs, University of Tennessee, and Georgia Tech. But it’s much, much smaller than the Chinese system. The machine is in place and testing is being carried out at Oak Ridge. A node has two Intel Westmere chips and three Nvidia Fermi boards. There are 120 nodes in the system.

What makes the Chinese supercomputer so fast?
Dongarra: The Chinese designed their own interconnect. It’s not commodity. It’s based on chips, based on a router, based on a switch that they produce.

Is that in essence the secret sauce?
Dongarra: It’s similar to Cray. Cray’s contribution, besides the integration and software, is the interconnect network. They have a very fast interconnect that makes that machine perform very well. Though [the Chinese] project is based on U.S. processors, it uses a Chinese interconnect. That’s the interesting part. They’ve put something together that is roughly twice the bandwidth of an InfiniBand interconnect [which is used widely in the U.S.].

Will the Chinese system in fact take the No. 1 spot on the Top500 list in November?
Dongarra: Yes. I saw the machine. I saw the output. It’s the real thing.

Why doesn’t Oak Ridge do what the Chinese are doing?
Dongarra: Oak Ridge doesn’t have the ability or technology to develop an interconnect or a router. We don’t make computers. We buy computers and use them. It’s not within our scope or mission to be in the computer design business.

What’s your advice?
Dongarra: You have to remember that you have to not only invest in the hardware. It’s like a race car. In order to run the race car, you need a driver. You need to effectively use the machine. And we need to invest in various levels within the supercomputer ecology. The ecology is made up of the hardware, the operating system, the compiler, the applications, the numerical libraries, and so on. And you have to maintain an investment across that whole software stack in order to effectively use the hardware. And that’s an aspect that sometimes we forget about. It’s underfunded. We fund the hardware but we don’t fund the other components. The ecosystem tends to get out of balance because the hardware tends to run far ahead of what we can develop in terms of software. We have machines that have a tremendous level of parallelism. We currently have a very crude way of doing programming.

Who would do that?
Dongarra: The research is performed under the auspices of the Department of Energy, the National Science Foundation, and the Department of Defense.

Is this a red flag for the U.S.?
Dongarra: Yes, this is a wake-up call. We need to realize that other countries are capable of doing this. We’re losing an advantage.

Brooke Crothers has been an editor at large at CNET News, an analyst at IDC Japan, and an editor at The Asian Wall Street Journal Weekly. He is a member of the CNET Blog Network and is not a current employee of CNET.

China supercomputer design points to future speed kings

Jack Dongarra, a professor at University of Tennessee’s department of electrical engineering, says graphics chips will be used increasingly in supercomputers to boost performance.

(Credit: University of Tennessee)

China has muscled into the No. 2 spot on the list of the world’s fastest supercomputers thanks, in part, to specialized Nvidia graphics chips: a technology that Intel is now pursuing to keep pace with this new trend in high-performance computing.

China’s Nebulae supercomputer is located at the recently constructed National Supercomputing Centre in Shenzhen, and achieved 1.271 petaflops/s (1.271 quadrillion floating point operations per second) running the Linpack benchmark, which put it in the No. 2 spot on the widely reported Top500 list. The latest list was formally presented Monday at the International Supercomputing Conference in Hamburg, Germany. (Jaguar, a Cray system at the Oak Ridge National Laboratory in Tennessee, retained the top spot.)

Nebulae achieved this “in part due to its Nvidia GPU (graphics processing unit) accelerators…Nebulae reports an impressive theoretical peak capability of almost 3 petaflop/s–the highest ever on the TOP500,” according to a press release Friday.

Though Nebulae also uses Intel Xeon processors, those are so-called commodity processors that are also employed in standard server computers. So, Intel–despite canceling its Larrabee graphics chip project–is pursuing a technology that leverages Larrabee R&D. On Monday, Intel said the first product of this kind, code-named Knights Corner, will be made on its future 22-nanometer manufacturing process–using transistor structures as small as 22 billionths of a meter–to pack more than 50 processing cores on a single chip.

On Tuesday, I spoke with Jack Dongarra, Distinguished Professor at University of Tennessee’s Department of Electrical Engineering and Computer Science and director of the Innovative Computing Laboratory. Dongarra introduced the LINPACK Benchmark, which is used as the primary yardstick to measure supercomputer performance.

Q: Are GPU accelerators in supercomputers a trend we’ll see more of in coming years?
Jack Dongarra: This looks like this is going to be one of the modes of high-performance computing. Taking commodity processors (such as standard Intel or AMD server-class processors) together with specialized accelerators, in this case graphics processors.

How much do GPUs generally boost performance?
Dongarra: A board by Nvidia can give an order of magnitude greater performance than the commodity processor.

But programs must be written to take advantage of this, it just doesn’t happen, correct?
Dongarra: There’s nothing automatic about it. You have to write a program that explicitly passes information to the GPU and tells the GPU what to do. That can be easy or hard. In most cases it becomes a challenge to write an efficient program to do the operations. Part of the issue there is that the connection between the commodity part of the computer and the graphics processor is a very thin pipe. So, you have to pass information and think of a very thin straw through which you’re passing a lot of information. And once you move it over there, you have to do a lot of operations to gain back any benefit.
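The "thin straw" trade-off Dongarra describes can be sketched with a rough cost model: offloading only pays when the computation is heavy enough to make up for the time spent moving data across the link. The rates below are hypothetical round numbers chosen for illustration (they are not measurements of any real system), with the accelerator set to ten times the host to mirror the "order of magnitude" remark above.

```python
# Hypothetical rates, for illustration only.
CPU_FLOPS_PER_S = 50e9    # assumed host processor rate
GPU_FLOPS_PER_S = 500e9   # assumed accelerator rate (10x the host)
LINK_BYTES_PER_S = 5e9    # assumed host-to-accelerator link bandwidth


def host_time(flops):
    """Seconds to do the work on the host processor alone."""
    return flops / CPU_FLOPS_PER_S


def offload_time(flops, bytes_moved):
    """Seconds to move the data over the link and compute on the accelerator."""
    return bytes_moved / LINK_BYTES_PER_S + flops / GPU_FLOPS_PER_S


def offload_pays_off(flops, bytes_moved):
    """True when shipping the work to the accelerator is faster overall."""
    return offload_time(flops, bytes_moved) < host_time(flops)


# Vector add of n doubles: n operations, but 3 arrays of 8-byte values to move.
n = 1_000_000
print(offload_pays_off(flops=n, bytes_moved=3 * 8 * n))            # False

# Dense matrix multiply of m x m doubles: about 2*m^3 operations on 3 matrices.
m = 2000
print(offload_pays_off(flops=2 * m**3, bytes_moved=3 * 8 * m**2))  # True
```

Under these assumed numbers the vector add loses, because almost all of its time goes into the transfer, while the matrix multiply wins because it does many operations per byte moved. That is the sense in which you "have to do a lot of operations to gain back any benefit."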

And what’s the future hold for GPU supercomputing?
Dongarra: Two things will happen. One, the connection will improve slightly. And then ultimately what’s going to happen is that the graphics processor is going to be integrated into the commodity processor. So, you’ll have a chip that has both the commodity processor’s cores plus the graphics processors or an accelerator for doing floating-point arithmetic embedded into the chip itself. It’s a path a number of companies are pursuing. Intel is one. AMD is another. Companies would like to pursue that path because it does provide the best performance but it does require another ratchet up in chip design.

Dongarra added that chips have been designed in the past with accelerators, though, of course, the chip-manufacturing technology at the time yielded different results. “There were companies that made these things that attached to mainframes,” he said, citing Floating Point Systems, a company founded in 1970.

Brooke Crothers has been an editor at large at CNET News, an analyst at IDC Japan, and an editor at The Asian Wall Street Journal Weekly, among other endeavors, including co-manager of an after-school math-and-reading center. He writes for the CNET Blog Network and is not a current employee of CNET.

China’s new Nebulae Supercomputer is No. 2, right on the Tail of ORNL’s Jaguar in Newest TOP500 List of Fastest Supercomputers

Fri, 2010-05-28 00:31

HAMBURG, Germany - China’s ambition to enter the supercomputing arena has become obvious with a system called Nebulae, built from a Dawning TC3600 Blade system with Intel X5650 processors and NVidia Tesla C2050 GPUs. Nebulae is currently the fastest system worldwide in theoretical peak performance at 2.98 PFlop/s. With a Linpack performance of 1.271 PFlop/s it holds the No. 2 spot on the 35th edition of the closely watched TOP500 list of supercomputers.

The newest version of the TOP500 list, which is issued twice yearly, will be formally presented on Monday, May 31st, at the ISC’10 Conference to be held at the CCH-Congress Center in Hamburg, Germany.

Jaguar, which is located at the Department of Energy’s Oak Ridge Leadership Computing Facility, held on to the No. 1 spot on the TOP500 with its record 1.75 petaflop/s performance speed running the Linpack benchmark. Jaguar has a theoretical peak capability of 2.3 petaflop/s and nearly a quarter of a million cores. One petaflop/s refers to one quadrillion calculations per second.

Nebulae, which is located at the newly built National Supercomputing Centre in Shenzhen, China, achieved 1.271 PFlop/s running the Linpack benchmark, which puts it in the No. 2 spot on the TOP500 behind Jaguar. In part due to its NVidia GPU accelerators, Nebulae reports an impressive theoretical peak capability of almost 3 petaflop/s – the highest ever on the TOP500.
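The gap between theoretical peak and measured Linpack performance can be expressed as a simple ratio. The sketch below uses only the figures quoted above; the names `rmax` (measured Linpack result) and `rpeak` (theoretical peak) are labels chosen here, not terms used in this release.

```python
# (measured Linpack PFlop/s, theoretical peak PFlop/s), as quoted in this release.
systems = {
    "Jaguar": (1.75, 2.3),
    "Nebulae": (1.271, 2.98),
}


def linpack_efficiency(rmax, rpeak):
    """Fraction of theoretical peak actually achieved on the Linpack benchmark."""
    return rmax / rpeak


for name, (rmax, rpeak) in systems.items():
    print(f"{name}: {linpack_efficiency(rmax, rpeak):.1%}")
# Jaguar: 76.1%
# Nebulae: 42.7%
```

On these numbers Jaguar sustains about three quarters of its peak on Linpack, while the GPU-accelerated Nebulae sustains well under half, which is why it leads in peak capability yet ranks second on the list.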

Roadrunner, which was the first ever petaflop/s system at Los Alamos in June 2008, dropped to No. 3 with a performance of 1.04 petaflop/s.

At No. 5 is the most powerful system in Europe — an IBM BlueGene/P supercomputer located at the Forschungszentrum Juelich (FZJ) in Germany. It achieved 825.5 teraflop/s on the Linpack benchmark.

Tianhe-1 (meaning River in Sky), installed at the National Super Computer Center in Tianjin, China is a second Chinese system in the TOP10 and ranked at No. 7. Tianhe-1 and Nebulae are both hybrid designs with Intel Xeon processors and AMD or NVidia GPUs used as accelerators. Each node of Tianhe-1 consists of two AMD GPUs attached to two Intel Xeon processors.

The performance of Nebulae and Tianhe-1 was enough to catapult China into the No. 2 spot in installed performance (9.2 percent), ahead of various European countries but still clearly behind the U.S. (55.4 percent).

Here are some other highlights from the latest list showing changes from the November 2009 edition:

  • The entry level to the list moved up to the 24.7 teraflop/s mark on the Linpack benchmark from 20 teraflop/s six months ago. The last system on the newest list would have been listed at position 357 in the previous TOP500 just six months ago. This replacement rate was far below average. This might reflect the impact of the recession and purchase delays due to anticipation of new products with six or more core processor technologies replacing current quad-core based systems.
  • Quad-core processor based systems have saturated the TOP500, with 425 systems now using them. However, processors with six or more cores can already be found in 25 systems.
  • A total of 408 systems (81.6 percent) are now using Intel processors. This is slightly up from six months ago (402 systems, 80.4 percent). Intel continues to provide the processors for the largest share of TOP500 systems. The AMD Opteron is the second most commonly used processor family with 47 systems (9.4 percent), up from 42. They are followed by the IBM Power processors with 42 systems (8.4 percent), down from 52.
  • IBM and Hewlett-Packard continue to sell the bulk of systems at all performance levels of the TOP500. HP lost its narrow lead in systems to IBM and now has 185 systems (37 percent) compared to IBM with 198 systems (39.8 percent). HP had 210 systems (42 percent) six months ago, compared to IBM with 186 systems (37.2 percent). In the system category, Cray, SGI, and Dell follow with 4.2 percent, 3.4 percent and 3.4 percent respectively.
  • IBM remains the clear leader in the TOP500 list in performance with 33.6 percent of installed total performance (down from 35.1 percent), compared to HP with 20.4 percent (down from 23 percent). In the performance category, the manufacturers with more than 5 percent are: Cray (14.8 percent of performance) and SGI (6.6 percent), each of which benefits from large systems in the TOP10.
  • The U.S. is clearly the leading consumer of HPC systems with 282 of the 500 systems (up from 277). The European share (144 systems – down from 152) is still substantially larger than the Asian share (57 systems – up from 51). In Europe, the UK remains No. 1 with 38 systems (45 six months ago). France passed Germany and now has 29 (up from 26). Germany now holds the No. 3 spot with 24 systems (27 six months ago). Dominant countries in Asia are China with 24 systems (up from 21), Japan with 18 systems (up from 16), and India with 5 systems (up from 3).
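The percentage shares in the highlights above are system counts divided by the 500 entries on the list. A small sketch of that conversion, using counts taken from the bullets (the dictionary keys are labels chosen here for illustration):

```python
LIST_SIZE = 500  # the TOP500 has 500 entries

# System counts as quoted in the highlights above.
counts = {
    "Intel processors": 408,
    "AMD Opteron": 47,
    "IBM Power": 42,
    "HP systems": 185,
    "IBM systems": 198,
    "U.S. systems": 282,
}


def share_percent(count, total=LIST_SIZE):
    """Share of the list held by `count` systems, in percent, to one decimal."""
    return round(count / total * 100, 1)


for label, count in counts.items():
    print(f"{label}: {share_percent(count)} percent")
```

Most results match the quoted percentages (81.6, 9.4, 8.4 and 37.0). The exception is IBM: 198 of 500 systems works out to 39.6 percent rather than the 39.8 percent printed above, so either the count or the percentage in that bullet appears to be slightly off; this sketch does not settle which.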

The TOP500 list is compiled by Hans Meuer of the University of Mannheim, Germany; Erich Strohmaier and Horst Simon of NERSC/Lawrence Berkeley National Laboratory; and Jack Dongarra of the University of Tennessee, Knoxville. For more information, visit
