Sunday, May 31, 2015

No High-Z, None High-Z

While trying to hook up different destination registers for an op-code, the line to signal the register to update its value was only dependent on the op-code, not the destination.  However, when trying to key off the logic to trigger the memory update there was a problem.  Using normal logic gates (at least with this simulator) a High-Z AND 1 gives a 1.  That means if I want to signal a 1 given several other conditions, if one set of logic is inactive (High-Z), it will signal true.  I don't want this!  Here is an example:

A case where I want to set a register  "Reg0" to load a value when:
     A.  Opcode is "load register with value"
     B.  Operand 1 specifies "Reg0"
     C.  Decode stage is 1.

If the condition is:

     (opcode == 0 && operand1 == 0 && stage== 1) == True

If you try that case, everything is good and thing load correctly.  However, if in the case of our selector if the operand1 is a different register, the selector for operand1 == 0 will be High-Z.  And the other conditions will still be the case.  So what we need is a special AND.  It is an AND that is true if both values are not High-Z.  Otherwise it outputs High-Z.

e.g.
     (opcode == 0 && operand1 == High-Z && stage== 1) == False
This is not what we want!


Here is how the normal AND behaves with High-Z (basically ignoring the high-z input)


Normal AND (partial table)

ABResult
0High-Z0
1High-Z1
010


Not only do we want the High-Z to be passed through the gate, but we also want any non-High-Z (Low-Z????) value to cause a true to be output.  There is data and we want it to cause a selector to be enabled.



The Not High-Z Circuit

To achieve the goal, we want to create a circuit that detects High-Z vs. normal signals.  There are probably off the shelf components for this, but not in the tool I have.  The Tri-State gate goes from normal logic and transforms the output to the 3 state domain.  But I've got nothing to go backwards to trigger a select signal.

The monstrosity below is what I've cobbled together.  It has to handle two separate issues.  The first is that we want to use normal gates to detect High-Z.  The top part of the circuit takes a single input and uses differences in how AND's and OR's behave with a High-Z signal to detect it.  We AND and OR the same signal against itself and if there is a not difference in the output, the signal must be 0 or 1.  If the output is not the same, it is High-Z.
The lower group of gates handles the initial condition issue where starting in High-Z does not give the same result as transitioning to High-Z.
These two are combined to give a reliable High-Z (or !) detector.




No High-Z (partial table)

AResult
01
11
High-Z0

To re-iterate, this takes a Tri-State input and lets us know if it is not High-Z.  Then we can just send direct the result lines of some enable-able component to an input, send the output to this and hook it up to enable (or load) and be good.

None High-Z

Armed with this idea, we can chain outputs from enabled or disabled circuits to either enable a true or output High-Z.  We'll take inputs of 0,1 or High-Z.  From these we'll change domains from 3 state to 2 state and use normal gates.  When when we are done we'll convert back to 3 State and emit either a 1 (to enable some sub-circuit) or High-Z to not affect the attached logic.




None High-Z (Partial Table)
ABResult
0High-ZHigh-Z
1High-ZHigh-Z
011
001

This can easily be extended to arbitrary inputs to have an N-input None High-Z device.

Sunday, May 17, 2015

Registers, Program and Instruction Set


Since the TEA Encryption ASIC is only currently being designed for encryption (maybe decryption will come later), I designed the instruction set and registers around the operations needed.

After some iterations, the current plan is for 10 general purpose registers plus an accumulator.  I've only planned 9 instructions worth of program.  See below for a list.  As far as instructions, I've spec'd out 11.  Those are below as well.

I'm only planning on wiring up the data flows as needed by the program.  So while every instruction should be able to apply and address to every valid register, for simplicity this will not be implemented.  As stated before, this is not intended to be a general purpose processor.

Also, the will not accessible from outside.  These are only stored as a procedures in the microcode ROM.  The user will load the key and data and signal a ready line.  When all the rounds have been completed, the IC will signal a complete line.

The registers are certainly not complete, but only a guestimate for now.  Some may not be needed.  I'll try to reduce the number as much as possible without making it unreasonably obtuse.  The goal here is not optimization.

In the diagram below you can see
B.  The input key and data
C.  The key + sum registers
D.  Intermediate value, loopcount and accumulator registers.





Decoding and Processing Instructions

Even though this is an ASIC for a very specific algorithmic operation, I'm going to implement it similarly to a general purpose computer with a very limited instruction set.  Well, actually, I don't really know how to implement a general purpose computer, so, like everything else here, this is my best guess.  Coming from a software background, these implementations are just my naive imaginings of how this might be done in hardware.  Certainly someone with a Computer Engineering or Electrical Engineering background will have learnt this in school or from work and know the optimal ways to get these things running.  Nonetheless, it's fun to experiment, right!?!

First, I threw some registers and ROM together to think about what I want the instruction set to look like.  The idea is to have the program live in ROM, some registers that map more or less to the algorithm with some more or less general purpose places for different bits of data.

I'll show you the general high level layout, and then dig into some specific regions.  Here is the high level view:


A. is the main state driver which consists of a counter for the current instruction and what I call a "multi-stager" that moves through substates of executing the instruction.  To the bottom of A are 3 ROMs which contain the instructions divided up into lists of Opcode, Operand A and Operand B.  I could have made it a single ROM with both opcodes and operands, but thought this would be easier for now.

B. is the input data with the key and data.  They are ROM now, but I'll change them to RAM since when this part would be called, there would be some lines to load the values in and then eventually a line for the 32biy result.  I haven't done the result yet.

C. has the Key registers that get loaded with the key for use during the encryption.

D. has some general purpose processors with R5 and R6 normally containing the V0 and V1 variables.

E.  is the machinery to handle the decoding of the instruction 0 - Load Constant into Register.  I used an "all-low" comparator to activate it.  That was before I made the "32bit router" part.  I'll switch it soon.

F. is the machinery for handling the instruction 1 - Load memory at index into register.

I'll post more about details of each as well as the instruction set and some of the initial program.

Saturday, May 9, 2015

Comparators

I'm getting to the point where delays in setting memory values triggered by clock signals need to be synchronized.  When I decode an opcode, I enable the necessary parts and routers to load the register, but the value doesn't update until the next tick.  I'd bet there are ways around this and optimizations that can be made to reduce this to a point.  But I imagine at some point these break down and you just need to bite the bullet and have some sort of pipeline.  While I could try to make my decode happen without pipelining, I'd rather get everything working first and then worry about optimizations later.

So, I need a part that enables subsequent lines after each tick.  A bare part could be made for this, but I decided to just use a counter with a decoder.  But at some point when the decoding is all done, and everything is in place, we'll want to move to the next instruction and begin again, i.e. reset to 0.  To be honest, I'm not sure how many stages I'll need and would rather not make the reset value be hard-wired.

So the idea is to use a constant value connected to a comparator.  When they are equal, I'll reset the pipeline counter and enable a completion signal.  But of course I need a comparator.

I earlier made a part that checked all bit were low, which is a special case of a general comparator.  It just Nor'd the inputs.  To compare bits, XNor has the right truth table.  For 0,0 and 1,1, 1 is the output, otherwise 0.  So I'll just string together 8 going to an And for the 8bit version.  Then for 32bits I'll just chain together 4 of those.

Here's what the 8bit comparator looks like:

And here is the 32bit version.


Thursday, May 7, 2015

32bit Router

When decoding instructions, each opcode and operand need to route data around to various components and registers.  I was using decoders and individual selectors to turn on the data (as opposed to having them be in their default tri-state).
But this was a big mess, so after some thought I made a part that takes a 32 but value, has 16 x 32bit outputs and takes a 16bit address value to pick the output to route the data to.  I just call it a 32bit router.  This will clean up the tangle of routes I was using previously.

Also, for this, I want address 0 to route to the first output, so I use a fat NOR for the task.

Here it is: