Clang, LLVM and Gnome @Guadec 2013

Today I gave a talk at the GUADEC 2013 conference; the slides can be downloaded here. The presentation covered the basics of LLVM: what it is, why to use it, how to use the tools and how to benefit from them. It was aimed at introducing the tools to the projects under the Gnome umbrella. Besides the benefits discussed in the talk, the static analyzer results for each Gnome subproject can be seen here.

Highlights and considerations: Divide by zero reports in gstreamer and gtk+, and Dereference of null pointer reports all around. It is worth mentioning that no memory leak bugs were reported, very likely because Gnome-related projects allocate memory through glib’s g_malloc() wrappers instead of calling functions such as malloc() directly, so the analyzer does not track those allocations. There’s a simple hack that enables the tracking of g_malloc() calls, and that will probably uncover several memory leaks all around, just a guess ;-)
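To make the wrapper issue concrete, here is a minimal C sketch. It is only an illustration under assumptions, and not necessarily the hack mentioned above: my_alloc and my_free are hypothetical stand-ins for glib’s g_malloc()/g_free(), and the annotations use Clang’s ownership_returns/ownership_takes attributes, which, as far as I know, the static analyzer can use to treat a custom allocator like malloc(). On compilers without these attributes the macros expand to nothing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: Clang's ownership attributes; they expand to nothing elsewhere. */
#ifdef __has_attribute
#  if __has_attribute(ownership_returns) && __has_attribute(ownership_takes)
#    define MY_RETURNS_OWNED __attribute__((ownership_returns(malloc)))
#    define MY_TAKES_OWNED(n) __attribute__((ownership_takes(malloc, n)))
#  endif
#endif
#ifndef MY_RETURNS_OWNED
#  define MY_RETURNS_OWNED
#  define MY_TAKES_OWNED(n)
#endif

/* Hypothetical wrappers standing in for g_malloc()/g_free(). */
MY_RETURNS_OWNED void *my_alloc(size_t n);
MY_TAKES_OWNED(1) void my_free(void *p);

void *my_alloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        abort();            /* abort on allocation failure, as g_malloc() does */
    return p;
}

void my_free(void *p)
{
    free(p);
}

/* Counts upper-case ASCII letters using a temporary copy of the string. */
size_t count_upper(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    char *copy = my_alloc(len + 1);
    memcpy(copy, s, len + 1);

    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (copy[i] >= 'A' && copy[i] <= 'Z')
            count++;

    /* Forgetting this call leaks `copy`. A checker that only models
       malloc()/free() cannot see that unless it knows my_alloc allocates. */
    my_free(copy);
    return count;
}
```

Calling count_upper("GNOME rocks") exercises the allocate/use/free path; removing the my_free() call is the kind of bug that stays invisible while the wrapper is opaque to the analyzer.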

I mentioned in the talk that some Gnome applications failed to run when built with the memory sanitizer (the -fsanitize=memory compiler flag). It seems to be related to glib’s introspection implementation; the log can be found here, and glib experts are welcome to comment on it.

Besides the technical part, the event is being held in Brno, Czech Republic, a great city with lots of cool sights. Thanks to the Gnome Foundation for the opportunity, and especially to Fabiano Fidêncio and Rui Matos, very receptive and amazing guys!

UPDATE: recorded video lecture available for download.

ISA Aging: A X86 case study

Last Sunday (23 Jun) in Tel Aviv, Israel, we presented the paper ISA Aging: A X86 case study at WIVOSCA 2013, the Seventh Annual Workshop on the Interaction amongst Virtualization, Operating Systems and Computer Architecture, held as part of ISCA’13, the 40th International Symposium on Computer Architecture. The workshop was great, thanks to the organizers, Girish Venkatasubramanian and James Poe. Below is the abstract:

Microprocessor designers such as Intel and AMD implement old instruction sets at their modern processors to ensure backward compatibility with legacy code. In addition to old backward compatibility instructions, new extensions are constantly introduced to add functionalities. In this way, the size of the IA-32 ISA is growing at a fast pace, reaching almost 1300 different instructions in 2013 with the introduction of AVX2 and FMA3 by Haswell. Increasing the size of the ISA impacts both hardware and software: it costs a complex microprocessor front-end design, which requires more silicon area, consumes more energy and demands more hardware debugging efforts; it also hinders software performance, since in IA-32 newer instructions are bigger and take up more space in the instruction cache. In this work, after analyzing x86 code from 3 different Windows versions and its respective contemporary applications plus 3 Linux distributions, from 1995 to 2012, we found that up to 30 classes of instructions get unused with time in these software. Should modern x86 processors sacrifice efficiency to provide strict conformance with old software from 30 years ago? Our results show that many old instructions may be truly retired.

Slides for the presentation can be downloaded here. Besides WIVOSCA, the AMAS-BT workshop on Binary Translation also had great talks. Moving from the workshops to the ISCA conference itself, there was an amazing talk entitled DNA-based Molecular Architecture with Spatially Localized Components, which completely changed how I think about emerging technologies. The conference isn’t over yet and I hope to attend more breakthrough research talks in the next few days.

EfikaMX impressions!

Hey everyone, it’s been a while since my last post. On 5th March 2009, I submitted a proposal to Power Developers to improve the ARM Cortex A8 support in the LLVM compiler. The proposal was accepted on 24th August 2009, and the board arrived in late December.

Fresh new EfikaMX board

Fresh new Fedex shipment with an EfikaMX board

After opening the Fedex box, this is what we found:

Unpacked efika board


The surprising news here is that by the time I received the board, all the improvements I had in mind and proposed to Power Developers had already been implemented by other LLVM developers. Things don’t always go the way we want, but looking on the bright side I had a shiny new board to play with, and compiler backends always have room for small improvements.

So, first of all, let’s see what this beautiful toy looks like. The “genesi” name actually reminds me of my good old Sega Genesis (good old times), but getting back to what matters, here is how it looks attached to an external monitor (using the HDMI port) and booting linux.

Attached to an external monitor


After booting up and gathering information about the system, I installed gnome and played with some applications to see how efficient it was. It surprised me: it was faster than I expected, even without the proprietary drivers needed to use the video decoding processor. This is an album with all the pictures I took of the EfikaMX; they were all taken on the day I received the board, 9 December 2009.

Before talking about my current plans, I have some thanks to give: thank you very much to Genesi U.S.A., Inc and to Raquel and Bill @ Genesi, for being so kind as to provide a board and make llvm developers happier. I would also like to apologize to them for the time I took to give them feedback, sorry for taking so long :)

I’m currently in the USA doing a summer internship and my board is in Brazil right now, so until October I have no way to give it attention, but on the EfikaMX/llvm/ARM side of things, these are the future plans:

  • Set up the EfikaMX as a linux ARM backend test machine, so we can run the llvm test suite on it and be sure the backend keeps working on linux (we don’t have that for linux at the moment)
  • Consider implementing assorted small optimizations aimed at ARM/linux, for example this gcc bug.

CUDA @wscad2008

A few weeks ago, Rodolfo Azevedo and I presented a workshop entitled High Performance Computing with CUDA. The workshop took place at WSCAD 2008 (Workshop em Sistemas Computacionais de Alto Desempenho), held in Campo Grande, MS, Brazil. WSCAD is a Portuguese-language (pt_BR) event where people discuss Computer Architecture, High Performance Computing and Distributed Systems.

CUDA is the architecture and programming model created by NVIDIA; it can be used on the 8 series, Tesla, Quadro and newer boards. CUDA is the NVIDIA version of the “parallel programming language to rule them all”.

The workshop was split over 2 days (1h30m each day) and covered the following topics:

  • CUDA requirements and install process
  • NVIDIA 8 Series Architecture (Stream Processors, Multiprocessors, Memory Hierarchy, Internal Thread Management)
  • Programming Model (C extensions, Built-in variables, Runtime API, Driver API)
  • Code examples (Device probe, Event management, Math operations – matrix multiplication, …)

Several code examples were presented, and some of them were compiled and executed on the fly. We were provided with a dual GeForce 8800 GTX (G80) machine, which made the live demonstrations possible.

The videos, notes and slides are available here. There is also a 42-page tutorial.

mips and llvm 2.4

LLVM 2.4 is now available, and like every very active project, a bunch of new stuff has arrived (see the release notes for a detailed description). The same is true for the Mips back-end. I talked about mips+llvm in old posts, which highlighted some of the ABI issues in both EABI and O32. A lot has changed since those posts, and here is a summary of what has been accomplished:

  • Little endian support
  • EABI is fully implemented
  • Floating point support
  • Support for allegrex core and its intrinsics
  • Improvements to the O32 ABI.

I’m the creator and current maintainer of the Mips back-end, and I implemented those features as a Google Summer Of Code 2008 project. For a more detailed description, here is the svn commit log for the Mips back-end (I filtered out all commits not directly related to mips improvements). Although it has grown a lot since June, it is still an experimental back-end and there is much to be done.

Tim Festival

The MGMT show at Tim Festival took place last Saturday (25/10), and among the shows I have been to in recent years it was second only to Sonic Youth’s performance. The event wasn’t very crowded, which let me watch the band from about 10 meters away without being suffocated, shoved and/or elbowed (of course you always take one or two, but nothing like giving one back to calm the poor soul down). Of the songs played, there was one I didn’t know, and I really have no idea where they got it from, but special mention goes to: The Youth, Electric Feel, Weekend Wars and Time To Pretend.

The craziness of the MGMT guys was written all over their faces. Antics like tearing up stuffed animals on stage and showing Brazil that there is nothing worse than warm Itaipava (the poor guy who got hit by the can understands this well…) were some of the stunts they pulled. Speaking of stunts, the biggest one of the festival was selling Itaipava for 5 reais; if it at least came with a remedy for an upset stomach… maybe the price would be fair.

Another interesting thing about the festival was the brothers Merry and Pippin, who, after fleeing Middle-earth (the latter even stopped by Lost, where he learned to play guitar and be in a band…), became members of The National, a band whose vocalist also believes that every Brazilian likes being showered with warm beer (Itaipava) at shows.

In short, it was worth seeing MGMT live; I’m looking forward to the next opportunity.

Structure return

As said in the previous post, some ABI specs are a little confusing, and sometimes there are so many (15 different variations for mips) that you just don’t remember where a given rule came from. For the more interested ones, there’s a whole history about mips ABIs and why there are so many. In the next posts I’ll write about implementation issues in the o32 and eabi ones. The llvm mips backend is moving toward full support for both (I already implemented the basic calling convention mechanism for o32 and eabi), and I’ll approach some issues from the llvm point of view.

o32

o32 is the old mips 32-bit ABI specified by SysV4. It’s easy to find o32 on a linux box or in embedded mips applications, so it is worth supporting in a compiler. o32 is known for its bad register usage (it’s very inefficient when compared to eabi, for example) and messy spec. Still, since it is the most common one, llvm should have it.

eabi

Like all the other ABIs in use, the eabi is implemented in GCC. There is no official documentation about it, and the only resources available are a post on the binutils mailing list and the gcc mips backend sources. Although this post is the only doc available, it is very direct and a thousand times better than the SysV one. This ABI is used to generate mips assembly for the psp allegrex core (the pspdev gcc patches use eabi by default). Since I’m adding support for the allegrex core, this abi must be implemented in llvm.

Structure return

If a pointer to a struct is returned, then it is considered a normal pointer and nothing special has to be done. But if a function returns the whole aggregate, the space must be allocated by the caller and the aggregate address must be passed as the first argument ($4) to the callee. The callee must fill this memory space with the returned value before exiting and must also return the pointer in $2. This mechanism is called sret in llvm, where we use an argument holding a pointer to the memory space reserved for the aggregate. To be more specific, if we consider this C function:

The asm output for function test0 is:

As the code highlights, the pointer argument comes in $4 and is copied to the return value register $2, and the struct is filled by storing the 2nd argument ($5) into the memory pointed to by $4. The eabi has almost the same behavior when dealing with struct returns; the only difference is that it directly returns single element structs and aggregates <= 64 bits, and only uses the shadow mechanism otherwise (eabi always uses registers when possible, that’s why it’s so fast; someday I’ll dedicate a post to it).

Mips calling convention review

Hi! A long time has passed since I last wrote here. A Chinese bot stole my domain, and since I hadn’t paid in time, I also lost my old blog stuff. OK, beginning all over again! ;)

I got back to mips llvm work and I’m now improving it to support the psp allegrex core. I improved a lot of minor codegen stuff this month, and my next milestone is getting llvm-gcc cross-compiled for Mips. It currently breaks on float stuff, so here we go…

The LLVM mips backend currently supports only 32-bit integer arguments (64-bit integer args are not legal and are expanded, since 64-bit ISAs aren’t supported yet).
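To illustrate what “expanded” means here, the C sketch below splits a 64-bit integer into two 32-bit halves and adds two such pairs using only 32-bit operations, which is roughly the work a code generator takes on when 64-bit integers are not legal. The type and function names are hypothetical, and this is not LLVM’s actual legalizer code.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value represented as two 32-bit registers. */
struct u32_pair { uint32_t lo; uint32_t hi; };

struct u32_pair split_u64(uint64_t v)
{
    struct u32_pair p;
    p.lo = (uint32_t)(v & 0xFFFFFFFFu);
    p.hi = (uint32_t)(v >> 32);
    return p;
}

uint64_t join_u64(struct u32_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit addition built from 32-bit adds plus an explicit carry. */
struct u32_pair add_expanded(struct u32_pair a, struct u32_pair b)
{
    struct u32_pair r;
    r.lo = a.lo + b.lo;                        /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  /* wrapped => carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Sanity check: splitting and joining must give back the same value. */
void check_roundtrip(uint64_t v)
{
    assert(join_u64(split_u64(v)) == v);
}
```

The same idea applies to argument passing: a 64-bit argument ends up occupying two 32-bit slots, one per half.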

I’m currently hacking out the ABI requirements for:

  • Floating-point arguments
  • Struct arguments (by reference and by value)
  • Struct returning

Since the GCC implementation differs a little from the SysV ABI specs (the spec is very badly written), I often get confused about whether I’m implementing the right thing. To avoid rereading this stuff every time, I may post some low-level info here, so it should be easy to find a reference when needed.

Stay tuned! ;)