How to tape out ASICs like they’re software releases

ASICs are massive software projects: millions of lines of design code, and millions of lines of QA code to make sure you are not taping out a coaster. Due to the high fabrication cost, ASICs tend to be released infrequently, as it makes economic sense to cram as many features as possible into a single release. The fabrication cost of providing N features is almost the same as providing N+M features, given the same geometry. So the more features you cram in there, the lower your cost per feature.

Except for one thing: people. Past a certain size, the project is too big. It’s too big for people to fit all the features in their minds. Too many features to track at the same time, too many interactions between features, too many implications… the schedule slips. And slips again. And again. So the business decision is: tape out now, right now. And your answer is: we can’t. The HEAD of the database has all these features under development, we’re debugging them at the same time, and some files won’t even compile. We can’t tape out; we need to freeze the feature set, stabilize the code, run regressions for two months and fix as many bugs as we can before we freeze the RTL, then maybe fix a few dozen more bugs in ECOs.

Everyone knows it sucks to do bug fixes in ECOs because it’s way harder than in RTL. It’s like bug fixing machine code, then porting the fix back to the source, without being allowed to run the compiler on the source code anymore!

But I digress. The question is: how should source code be managed so that tape out opportunities are the norm rather than the exception? You obviously can’t allow chaotic commits to the source code database, much less chaotic commits to the release branch! Let’s examine the source code management strategies used in software and see how they fit the ASIC development world.

Single branch, broken trunk model
In this model, all the commits, design and verification, go to the same unique branch. Stability is declared at certain points, when enough simulations have managed to pass and designers happen to have been quiet on the feature front. Most of the time, the HEAD of the database is simply broken.

This is possibly the easiest model to follow for people who cannot afford to learn how to use branches. However, the HEAD being broken most of the time means that tape out dates are somewhat unpredictable, as a period of stability, on whatever feature set happens to be committed, is required for integration and bug fixes. This period of stability also impacts all future code development: since there is only one place for code to go, new development cannot be committed after RTL freeze.

In this model, the only way to allow new development after RTL freeze is to open a second database, and copy and paste all the code and the bug fixes over to the new database.

Pros: easy; no source code management learning curve
Cons: tape out dates are unpredictable; new code breaks working code all the time; continuing development after the RTL freeze is not possible

Broken trunk, tape out branch model
In this model, a branch is created in the database for each tape out attempt. Since it takes a few days to a few weeks before the code is known to be good for tape out, isolating the tape out from the main development trunk is wise: development can continue and will not break the tape out code. Usually, only bug fixes are allowed on the tape out branch. A tape out branch can be abandoned when a better feature set is available on the trunk, where a new tape out branch can be created.
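As a sketch of how this looks in git (the repository, file, branch and commit names below are all invented for illustration):

```shell
# Hypothetical sketch: freeze a tape-out attempt on its own branch while
# development continues on the trunk. All names are invented.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com && git config user.name You
git checkout -qb trunk
echo "module top; endmodule" > top.v
git add top.v && git commit -qm "candidate feature set"

git branch tapeout-rev-a                  # freeze the candidate for rev A

echo "// post-freeze feature" >> top.v    # the trunk keeps moving...
git commit -qam "new feature on trunk"

git checkout -q tapeout-rev-a             # ...while only bug fixes land here
echo "// rev A bug fix" >> top.v
git commit -qam "bug fix for rev A"
```

If rev A is abandoned for a better feature set, the branch is simply left behind and a `tapeout-rev-b` branch is cut from the trunk.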

The ease with which these branches can be created and maintained depends heavily on the source code management software being used. CVS, one of the oldest SCMs around, is also the most primitive in terms of branch support. Branch support is much more developed in git, where repeated merges are supported, among other great features.

Pros: tape out and re-spins are isolated from development; back end team can control what goes into the tape out code without locking down the whole database
Cons: the development trunk is broken; a stability period is still required but it can be moved to the tape out branch while development continues on the trunk; back end engineers must learn how to use branches

Stable trunk, tape out branch model
In this model, developers are not allowed to commit directly to the trunk unless their code has passed some tests proving that it won’t have adverse effects on the rest of the code. The typical steps for this model are branch, edit, commit, test and merge. When the tests are successful, designers are allowed to merge their changes to the trunk. Changes to the tape out branches are made the same way.

Pros: the trunk is stable, tape out dates are more predictable as there is no stabilization period on the trunk; cowboy coders do not bring everyone down when they commit unstable code
Cons: stabilization happens on the branches, along with feature development; everyone must learn how to manage branches
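The branch, edit, commit, test, merge cycle can be sketched in git like this (the branch names are invented, and the regression step is a placeholder for whatever test gate the project uses):

```shell
# Hypothetical sketch of the branch, edit, commit, test, merge cycle.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com && git config user.name You
git checkout -qb trunk
echo "module top; endmodule" > top.v
git add top.v && git commit -qm "stable baseline"

git checkout -qb feature-x                # 1. branch off the stable trunk
echo "// feature x" >> top.v              # 2. edit
git commit -qam "add feature x"           # 3. commit on the branch
# 4. test: run the regression suite here; proceed only if it passes
git checkout -q trunk                     # 5. merge back to the trunk
git merge -q feature-x
```

The trunk only ever receives changes that have survived step 4, which is what keeps it stable.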

It does not take long once you have branches in your methodology to realize that you want to merge to and from the stable trunk, since a feature branch may want to integrate a feature from the trunk which was not available at the time the branch was created. Fortunately, this kind of development is made easy by new decentralized source code management software such as git.
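For instance, picking up a trunk feature from a feature branch is a routine merge in git (a sketch with invented names):

```shell
# Hypothetical sketch of a repeated merge: a feature branch picks up a
# trunk feature that landed after the branch was created. Names invented.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com && git config user.name You
git checkout -qb trunk
echo "base" > rtl.v
git add rtl.v && git commit -qm "baseline"

git checkout -qb feature-y                # branch created at the baseline
echo "feature y" > y.v
git add y.v && git commit -qm "feature y"

git checkout -q trunk                     # meanwhile the trunk gains a feature
echo "feature z" > z.v
git add z.v && git commit -qm "feature z"

git checkout -q feature-y                 # the branch pulls in the new feature
git merge -q -m "pick up feature z" trunk
```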

As a bonus, decentralized SCMs allow you to share code with others without releasing it to a central database, reducing the number of branches or tags visible to others.
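A sketch of that peer-to-peer sharing in git, using two local repositories to stand in for two engineers’ clones (all names are invented):

```shell
# Hypothetical sketch: fetch a colleague's branch directly from her clone,
# without publishing anything to the central database. Names are invented.
set -e
work=$(mktemp -d)
# "alice" plays the colleague's working clone, holding an unreleased fix.
git init -q "$work/alice"
( cd "$work/alice" &&
  git config user.email alice@example.com && git config user.name Alice &&
  git checkout -qb fix-timing &&
  echo "// timing fix" > patch.v &&
  git add patch.v && git commit -qm "timing fix" )
# "bob" fetches straight from alice's clone; the central server never sees it.
git init -q "$work/bob"
cd "$work/bob"
git config user.email bob@example.com && git config user.name Bob
git remote add alice "$work/alice"
git fetch -q alice
git checkout -qb fix-timing alice/fix-timing
```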


10 Responses to How to tape out ASICs like they’re software releases

  1. Colin Marquardt says:

    But then, you happen to have analog parts in your design, with a design software that saves its files in a proprietary binary format (and writes out up to 3 different files for each cell which must be kept in sync).

    Plus, analog designers are not the target audience for a tool such as git. We got some of them to use SVN, but more kicking and screaming than with a smile on their faces.

  2. Martin d'Anjou says:

    I guess you use SVN because of the atomic commit. This must help keep the files in sync for the end users of those files.

    There is a huge cultural difference between ASIC design and ASIC verification when it comes to file management: designers own their files and never share them. On the other hand, verifiers will edit each other’s files all the time.

  3. Colin Marquardt says:

    We chose SVN at a time when it was already pretty mature, but you couldn’t say that of any of the distributed SCMs, and the learning curve for those is also much steeper.

    I see the cultural difference more in the fact that analog designers are more used to point-and-click tools than digital guys (be they designers or verification people). They are not afraid of typing the occasional command, but the design tool makes it especially hard for them by requiring a consistent set of those 3 files. And since these files are binary, you now enter the realm of requiring file locks etc…

  4. Frank C says:

    I work in a team where designers double as verification engineers, and our time doing regressions is very frustrating as there is no protocol set up. It is especially frustrating as I am the designated regression person.

    So I am wondering if there is a thoroughly tested regression protocol which covers:
    daily regression,
    weekly regression,
    partial regression,
    debug regression,
    specifically the last item.
    Frank C

  5. Martin d'Anjou says:

    Where I work, we have developed a continuous integration system, complemented by a formal regression system. I find it best to develop regression scripts that put the regression responsibility in the hands of the users rather than in the hands of a single regression person. This way, DV engineers are responsible for running their own regressions, using scripts that automate most of it for them. It has pretty much been a full time job just to develop and maintain these systems.

  6. Frank C says:

    Martin, I could not agree with you more. This is my first job, and already I can see the faults in my company’s current system, as I am constantly interrupted while performing higher priority tasks. I am completely in awe that an activity which is performed by every single group in the company is not formalized.
    Regression is really important in my opinion, and yet they have not thought it through.
    I am trying to develop an infrastructure, at least so each DV performs this. You wouldn’t happen to have a Protocol Book handy, do you? :)

  7. Martin d'Anjou says:

    Ah, the eternal question of the right thing vs. the thing that the company needs right now. They are both important. Sometimes the right thing has to wait long periods of time, until the urgent things are not so urgent anymore.

    Verif school teaches you how to write a BFM, and the more adventurous DVs may write a perl script to run simulations multiple times, but there is nothing telling us what to do with 10G of log files per regression and thousands of simulations to run.

    I discovered that I can replace perl and python scripts with bash, and surprisingly I find that I need 4x to 10x fewer lines when I decompose the big problems into their atomic components. The way Linus structured Git inspired me to approach problems one little step/script at a time.
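    A sketch of that decomposition style, with tiny single-purpose pieces composed by a pipe (the failure marker, function names and file names are invented for illustration):

```shell
# Hypothetical sketch: each step is one atomic, single-purpose piece,
# and a pipeline composes them. Marker and file names are invented.
list_failures() { grep -l 'TEST FAILED' "$@"; }   # which log files failed?
to_testname()   { sed 's/\.log$//'; }             # strip the .log suffix

printf 'TEST PASSED\n' > smoke.log
printf 'TEST FAILED\n' > dma_burst.log
list_failures smoke.log dma_burst.log | to_testname   # prints: dma_burst
```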

    But I don’t use bash for parsing. I write state machines in python, and for grammar parsing (where regexps and shallow case statements don’t suffice), I use ANTLR.

    I frown upon all forms of automation based on GUI tools, unless they are built on scalable underlying command line tools.

  8. Frank C says:

    Hi Martin, thanks for sharing your experience. I guess I am one of the more adventurous DVs, since I am doing perl like there is no tomorrow. One task that was pretty interesting was parsing the log for different formats of errors, which is kind of a pain since there is no description of the formats, our chip being built with IPs. But there is lots more work to be done.
    Maybe you have come across this scenario:
    regression run -> failing test -> rerun the test with fixes to show it passes, all the while the rest of the regression is still running, because mind you, my regressions last for a week or more.
    What did you do in this scenario, if you have experienced it?

  9. Martin d'Anjou says:

    Rerunning regression tests when they have failed is an absolute necessity. Doing it automatically is the hard part. Doing it by hand is such a burden.

    I run my regressions on a tagged release of the code. Fixing bugs means a code base change, which means that I can only regress the fix at the next regression. However, when a regression is too long, it needs to be split into smaller “sub-regressions”. DVs and designers want to be in control of what sub-regression they run, when they run it, and what code they run it against.

    Knowing when to stop the current regression depends on how well it goes. If it finds all kinds of different bugs, I would let it run to completion. If it finds no bugs at all and you have new code to test, I would end the current regression, and simply relaunch it on the new code base. If the current regression is toasted because its test cases keep hitting the same bug over and over, I’d kill the regression.

    PS: I do not parse logs anymore, or rather, I parse them live. See Processing the output of any program on the fly
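    As a sketch of live processing (the simulator command and the error tag are invented stand-ins), the idea is to scan the stream as it is produced while keeping the full log on disk:

```shell
# Hypothetical sketch: process simulator output on the fly instead of
# parsing a stored log afterwards. run_sim and the tag are invented.
run_sim() {   # stand-in for the real simulator command
  printf 'INFO: reset done\nUVM_ERROR @ 10ns: bad packet\nINFO: finished\n'
}
# tee keeps the complete log while grep flags errors as they stream past.
run_sim | tee sim.log | grep 'UVM_ERROR'
```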

  10. Frank C says:

    Hi Martin,
    Since you have more experience than me in the industry, let me share a little of my experience so far,
    and you may or may not comment on it.

    FACTS
    1.) HUGE company
    2.) No set regression protocol at all; we are still “experimenting”
    3.) Manager keeps changing requirements, or to put it nicely, an “evolving” process
    4.) Designers are DV engineers
    5.) During the week of final netlist, we were still checking in environment changes; I had to restart the regression midway
    6.) Relies heavily on verbal communication rather than written, agreed upon specs, but then again they keep changing their mind
    7.) Job priority changes daily or twice daily
    8.) Likes to do things in parallel
    9.) Likes to assume things are “simple”; a “simple” thing has turned out to take a week of work, more than once

    With the above facts I can only say management is non-existent, or very poor, and this comment is from me, a fresh graduate with no work experience.

    sorry for the rant. you don’t need to comment

    Parsing logs on the fly, wow, that’s cool. How do you handle 20 different kinds of error formatting and ignore false positives at the same time? Is that possible with incomplete output?

    thank you
