Sunday, May 31, 2015

No High-Z, None High-Z

While trying to hook up different destination registers for an op-code, the line to signal the register to update its value was only dependent on the op-code, not the destination.  However, when trying to key off the logic to trigger the memory update there was a problem.  Using normal logic gates (at least with this simulator) a High-Z AND 1 gives a 1.  That means if I want to signal a 1 given several other conditions, if one set of logic is inactive (High-Z), it will signal true.  I don't want this!  Here is an example:

A case where I want to set a register  "Reg0" to load a value when:
     A.  Opcode is "load register with value"
     B.  Operand 1 specifies "Reg0"
     C.  Decode stage is 1.

If the condition is:

     (opcode == 0 && operand1 == 0 && stage== 1) == True

If you try that case, everything is good and thing load correctly.  However, if in the case of our selector if the operand1 is a different register, the selector for operand1 == 0 will be High-Z.  And the other conditions will still be the case.  So what we need is a special AND.  It is an AND that is true if both values are not High-Z.  Otherwise it outputs High-Z.

e.g.
     (opcode == 0 && operand1 == High-Z && stage== 1) == False
This is not what we want!


Here is how the normal AND behaves with High-Z (basically ignoring the high-z input)


Normal AND (partial table)

ABResult
0High-Z0
1High-Z1
010


Not only do we want the High-Z to be passed through the gate, but we also want any non-High-Z (Low-Z????) value to cause a true to be output.  There is data and we want it to cause a selector to be enabled.



The Not High-Z Circuit

To achieve the goal, we want to create a circuit that detects High-Z vs. normal signals.  There are probably off the shelf components for this, but not in the tool I have.  The Tri-State gate goes from normal logic and transforms the output to the 3 state domain.  But I've got nothing to go backwards to trigger a select signal.

The monstrosity below is what I've cobbled together.  It has to handle two separate issues.  The first is that we want to use normal gates to detect High-Z.  The top part of the circuit takes a single input and uses differences in how AND's and OR's behave with a High-Z signal to detect it.  We AND and OR the same signal against itself and if there is a not difference in the output, the signal must be 0 or 1.  If the output is not the same, it is High-Z.
The lower group of gates handles the initial condition issue where starting in High-Z does not give the same result as transitioning to High-Z.
These two are combined to give a reliable High-Z (or !) detector.




No High-Z (partial table)

AResult
01
11
High-Z0

To re-iterate, this takes a Tri-State input and lets us know if it is not High-Z.  Then we can just send direct the result lines of some enable-able component to an input, send the output to this and hook it up to enable (or load) and be good.

None High-Z

Armed with this idea, we can chain outputs from enabled or disabled circuits to either enable a true or output High-Z.  We'll take inputs of 0,1 or High-Z.  From these we'll change domains from 3 state to 2 state and use normal gates.  When when we are done we'll convert back to 3 State and emit either a 1 (to enable some sub-circuit) or High-Z to not affect the attached logic.




None High-Z (Partial Table)
ABResult
0High-ZHigh-Z
1High-ZHigh-Z
011
001

This can easily be extended to arbitrary inputs to have an N-input None High-Z device.

Sunday, May 17, 2015

Registers, Program and Instruction Set


Since the TEA Encryption ASIC is only currently being designed for encryption (maybe decryption will come later), I designed the instruction set and registers around the operations needed.

After some iterations, the current plan is for 10 general purpose registers plus an accumulator.  I've only planned 9 instructions worth of program.  See below for a list.  As far as instructions, I've spec'd out 11.  Those are below as well.

I'm only planning on wiring up the data flows as needed by the program.  So while every instruction should be able to apply and address to every valid register, for simplicity this will not be implemented.  As stated before, this is not intended to be a general purpose processor.

Also, the will not accessible from outside.  These are only stored as a procedures in the microcode ROM.  The user will load the key and data and signal a ready line.  When all the rounds have been completed, the IC will signal a complete line.

The registers are certainly not complete, but only a guestimate for now.  Some may not be needed.  I'll try to reduce the number as much as possible without making it unreasonably obtuse.  The goal here is not optimization.

In the diagram below you can see
B.  The input key and data
C.  The key + sum registers
D.  Intermediate value, loopcount and accumulator registers.





Decoding and Processing Instructions

Even though this is an ASIC for a very specific algorithmic operation, I'm going to implement it similarly to a general purpose computer with a very limited instruction set.  Well, actually, I don't really know how to implement a general purpose computer, so, like everything else here, this is my best guess.  Coming from a software background, these implementations are just my naive imaginings of how this might be done in hardware.  Certainly someone with a Computer Engineering or Electrical Engineering background will have learnt this in school or from work and know the optimal ways to get these things running.  Nonetheless, it's fun to experiment, right!?!

First, I threw some registers and ROM together to think about what I want the instruction set to look like.  The idea is to have the program live in ROM, some registers that map more or less to the algorithm with some more or less general purpose places for different bits of data.

I'll show you the general high level layout, and then dig into some specific regions.  Here is the high level view:


A. is the main state driver which consists of a counter for the current instruction and what I call a "multi-stager" that moves through substates of executing the instruction.  To the bottom of A are 3 ROMs which contain the instructions divided up into lists of Opcode, Operand A and Operand B.  I could have made it a single ROM with both opcodes and operands, but thought this would be easier for now.

B. is the input data with the key and data.  They are ROM now, but I'll change them to RAM since when this part would be called, there would be some lines to load the values in and then eventually a line for the 32biy result.  I haven't done the result yet.

C. has the Key registers that get loaded with the key for use during the encryption.

D. has some general purpose processors with R5 and R6 normally containing the V0 and V1 variables.

E.  is the machinery to handle the decoding of the instruction 0 - Load Constant into Register.  I used an "all-low" comparator to activate it.  That was before I made the "32bit router" part.  I'll switch it soon.

F. is the machinery for handling the instruction 1 - Load memory at index into register.

I'll post more about details of each as well as the instruction set and some of the initial program.

Saturday, May 9, 2015

Comparators

I'm getting to the point where delays in setting memory values triggered by clock signals need to be synchronized.  When I decode an opcode, I enable the necessary parts and routers to load the register, but the value doesn't update until the next tick.  I'd bet there are ways around this and optimizations that can be made to reduce this to a point.  But I imagine at some point these break down and you just need to bite the bullet and have some sort of pipeline.  While I could try to make my decode happen without pipelining, I'd rather get everything working first and then worry about optimizations later.

So, I need a part that enables subsequent lines after each tick.  A bare part could be made for this, but I decided to just use a counter with a decoder.  But at some point when the decoding is all done, and everything is in place, we'll want to move to the next instruction and begin again, i.e. reset to 0.  To be honest, I'm not sure how many stages I'll need and would rather not make the reset value be hard-wired.

So the idea is to use a constant value connected to a comparator.  When they are equal, I'll reset the pipeline counter and enable a completion signal.  But of course I need a comparator.

I earlier made a part that checked all bit were low, which is a special case of a general comparator.  It just Nor'd the inputs.  To compare bits, XNor has the right truth table.  For 0,0 and 1,1, 1 is the output, otherwise 0.  So I'll just string together 8 going to an And for the 8bit version.  Then for 32bits I'll just chain together 4 of those.

Here's what the 8bit comparator looks like:

And here is the 32bit version.


Thursday, May 7, 2015

32bit Router

When decoding instructions, each opcode and operand need to route data around to various components and registers.  I was using decoders and individual selectors to turn on the data (as opposed to having them be in their default tri-state).
But this was a big mess, so after some thought I made a part that takes a 32 but value, has 16 x 32bit outputs and takes a 16bit address value to pick the output to route the data to.  I just call it a 32bit router.  This will clean up the tangle of routes I was using previously.

Also, for this, I want address 0 to route to the first output, so I use a fat NOR for the task.

Here it is:


Wednesday, April 29, 2015

Counter

TEA has some iterations to do multiple rounds of mixing and shuffling the data values.  To achieve this, we'll use some looping.  There are only 64 rounds, so were this algorithm designed for hardware, we might be able to unroll them into 64 parallel operations.  But, this algorithm was designed for software and it seems very little parallelism can be enlisted to speed things up.  The results of each round feedback into the previous.  Feedback loops like this are important for good mixing and permuting.  Also, since this is a Feistel cipher, half of the data block is mixed/shuffled with the other half.  Each round jumbles and mixes from one side to the other.  This dependency makes parallelization very difficult at very best to impossible.  I imagine determining any required structure or method to parallelize this is equivalent to cryptanalizing it.   Finding a usable result would probably crack it.

So, since we need to iterate, we need a counter.  Below is a counter implemented with a 32bit adder, a 32bit  register/memory and an extra chooser to feed the value 0 in when the reset signal is sent.  The 7 segment decoder + display is just for debugging.


The inc signal increments the counter.  Reset sets it to 0.  q is the output.  I forgot to change the output name, but it is the only one, so it doesn't really matter.  I'm not sure why 'q' is the default.  Maybe it is some standard.


8,32 bit Selectors

When passing data around to different destinations, we need to select where it goes.  There is an opcode that selects the operational unit.  The input registers are decoded from the operands, and directed using selectors to push the input to the operator component.  After decode happens, the pathway is enabled by data path selectors.  Here is an 8 bit selector.



You can see we use tri-state gates only send the output when the select line is enabled.  When not selected, the output's are tri-stated and won't interfere with the input they are connected to.  This way we can have many selectors connected to a common input line without interference.

With the 8 bit selector we build up a 32bit to help shuttle our data around.



Sunday, April 26, 2015

7 segment decoder

Peeking at memory and using traces is fine, but for output and checking numeric values, 7 segment displays are nice.

Here's a Numitron, a 7 segment incandescent filament display made by RCA in 1970.

"Incandescent light seven-segment display prPNr°17" by D-Kuru - Own work. Licensed under CC BY-SA 3.0 at via Wikimedia Commons 


To display digits on these, you need to convert a binary encoding to the physical layout for the digit positions.

NumberBinarySegments
00000a,b,c,d,e,f
10001a,b
20010a,b,d,e,g
30011a,b,c,d,g
40100b,c,f,g
50101a,c,d,f,g
60110a,c,d,e,f,g
70111a,b,c
81000a,b,c,d,e,f,g
91001a,b,c,d,f,g

There are probably more clever ways to wire this up to reduce the gates and connections required.  For example, the only time segment a is not lit up is for the number 4.  But I'm going to just brute force it and hook up logic for it that says "If all the 1's in the binary number are on AND all the 0's in the binary number are off, then set the signal high for the segments for that number."  As an example, for the number 1, we'll check that bit 0 is on and then use a NOR for bits 1-7 to make sure none of them are on.  If 0 is on and 1-7 are off, then we know that is 00000001 which is the number 1 (base 10).  When that is the case, we'll want to turn the signal on for segments 'a' and 'b' which connect on the right via those OR gates.  Each number that needs to activate a segment attaches to the OR gate for that segment.  So the top OR on the right goes to segment 1 and the bottom one to segment 2.

 With a seven segment display, you can show hexadecimal numbers beyond 0-9 which include a,b,c,d,e,f, but I've only hooked my 7 segment decoder to do 0-9.  Maybe I'll add a-f later.  Below is the full decoder.





32bit XOR

Tiny Encryption Algorithm needs a 32 bit XOR.  XOR is basic gate type, so we just chain 8 of those together to make an 8bit version (byte).



From there we just chain 4 of those to get a 32 bit version.

Saturday, April 25, 2015

Adtractor

To implement subtraction, one takes the one's compliment of number you want negative.  Then you add the two numbers together as normal.  This of course means that your representation is ones compliment and anything with the high bit set is negative.

To construct this, one needs to first make some IC's to make a 32bit compliment.  We do this by making an 8bit compliment and then group 4 of these to make a 32 bit compliment.

Finally, we can make an adtractor by adding a signal to choose the normal input or, a compliment for subtraction of the chosen input.  Note, adtractor is not an standard word for this.  It is normally called a adder-subtractor, but I like adtractor better.

One quick note.  When there is a choice of two outputs to an given single input, we need a special gate to disable the logical output of inputs we are not interested in.  If one were high and another connected that was low, the result is indeterminate.  The solution is a tri-state gate that, when desired, will pass the signal through.  Otherwise the output is High-Z (floating, tri-state) and does not affect the result of other connected gates to the input.  More on this later.

Here is the 8bit compliment:

And here is a 32 bit compliment composed of 4 - 8bit compliments.
Now that we have the compliment, we can make the adtractor.  Here it is:

You'll notice the addend coming in (middle input line) and going straight to the 32bit adder.  Below is the addend or possibly subtracthend.  We convert to the 32bit compliment and then pass both values to the IC named "32bit choose".  This selects one of the inputs to pass through depending on the mode flag.  This is the top value coming in and chooses the mode, addition or subtraction.  Let's see how this 32bit chooser is implemented.

   
 Here we have two 32 bits values coming in on the left.  Each is broken into 4 8it bytes and divided up to these IC's named "2 byte selector".  The 2 byte selectors get the "select in" signal and then output to be recombined to the final 32 bit value.  I don't really like the name "2 byte selector", maybe "byte selector" would be better.  The select signal chooses which of two input bytes is passed out.

Now on to the "2 byte selector"...

Here we see the use of tri-state gates.  The value of Select In determines which of the tri-state's are allowed to pass their value through to the final output.  You can see the NOT gate above the "b In" group that inverts the selection value.  Only the values of the group with a logical 1 values will allow the values to pass through.  Otherwise they are High-Z and have no influence on what they are connected to.  Without this, were one to output both high and low to the same input, the result is indeterminate.  Might they cancel?  Might it cause a short circuit?  Are there specific types of hardware that behave in specific ways?  If anyone knows, leave a comment.  I certainly don't, this is just a theoretical exercise to me.

So, returning to the adtractor you can see the compliment and chooser in action again and see how they work together to make a normal adder perform subtraction.

adders


Below is a 1 bit full adder I made.  If you are unsure what a full adder does, you can read up on it on wikipedia.  http://en.wikipedia.org/wiki/Adder_%28electronics%29

This is certainly over-complicated and not optimized.  The route to optimize it would be to either write out the boolean equations for this (or just come up with them from a truth table) and then simply the expression into fewer gates.  Looking on wikipedia from the link above, I just noticed  how inefficient this is.  But my implementation will only have a few of these, so as long as it functions properly, I'm not interested in this point in optimizing.

After that is a stack of 8 of them wired up to make an 8 bit IC.  And then after that, 4 of those to make a 32bit adder.  32bit addition is needed for TEA, so this is one of the foundational ICs.

The mathematical operations TEA needs are (all 32 bit): 
  • addition
  • subtraction (for decryption)
  • left bit-shift (by 4 bits)
  • right bit-shift (by 5 bits)
  • XOR



1 bit full adder (highly unoptimized)
8 bit full adder

32 bit full adder


Intro

Hi folks,

Well, it started out as a small exercise when I was doodling with my daughter.  I wanted to see if I could make a full adder with just basic logic gates without any help.  I worked on it a bit and then a couple days later found an online program where I could model it up and try it.

After that I made a basic calculator.  As I moved to more complicated stuff, I decided rather than make yet another simple computer, I wanted to implement an ASIC for Tiny Encryption Algorithm.  See http://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm.

The goal was to not look anything up and just see what I could get done on my own from scratch.  I tried a few programs, but am now using LogicCircuit, http://www.logiccircuit.org/.

This series of posts will be to show the IC's making and employing to construct it and also to document my progress.

Maybe someone will find this interesting.  Either way, I think this is just a lot of fun.

-Casten