Python (relative) Imports and Unittests – Derp!

Been far too long since I wrote something. I spent way longer than I feel good about getting my ass kicked by some obvious shit, so I figured I should write it up as a reminder for me and to hopefully save some anonymous Internet person some heartache! Anyway, I've been meaning to put on my big boy pants and start actually writing test cases for all the Python code I write. I've been terrible at this and figured it's time to get my life together. Rather than use a bunch of code that won't really matter, I'll just use ridiculously dumb examples here to get the issue and the "fix" (not being dumb!) across.

A pretty standard structure for a Python project would look something like this:

mycoolproject/
├── mycoolproject
│   ├── __init__.py
│   ├── my_module_1.py
│   └── my_module_2.py
└── tests
    ├── __init__.py
    └── test_basics.py

You can see that we have our "project", in this case named "mycoolproject", and within that directory we have a folder for our tests and a folder for the actual project code. In the root directory we would probably also have other stuff like a setup.py or a requirements.txt or whatever, but those things don't matter for us at the moment. One last bit: we need those "__init__.py" files in there to make these directories proper packages. From the official Python docs:

“The __init__.py files are required to make Python treat the directories as containing packages; this is done to prevent directories with a common name, such as string, from unintentionally hiding valid modules that occur later on the module search path. In the simplest case, __init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__ variable, described later.”

You can find that doc here if you want to read some more about it.
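
To make the quoted bit concrete, here's a throwaway sketch (the directory and module names here are made up for the demo, not from the project above) showing that an empty __init__.py is what lets Python import a directory as a package:

```python
import importlib
import os
import sys
import tempfile

# Build a tiny package in a temp directory
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, 'demopkg')
os.makedirs(pkg)

# An empty __init__.py marks the directory as a package
open(os.path.join(pkg, '__init__.py'), 'w').close()
with open(os.path.join(pkg, 'greeting.py'), 'w') as f:
    f.write("message = 'hello from a package'\n")

# Put the temp dir on the module search path, then import normally
sys.path.insert(0, tmp)
greeting = importlib.import_module('demopkg.greeting')
print(greeting.message)  # -> hello from a package
```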

Ok, so now that we have the overall gist of things down, let's take a look at our ridiculously oversimplified "module". Here are the contents of our file "my_module_1.py":

my_string = 'whoa, this is so kewl'
print(my_string)

Pretty serious code 🙂

Our other script “my_module_2.py” is also pretty simple, but this one refers back to the first module to use the variable “my_string” — we’ll get to why this tripped me up in a bit:

import my_module_1

my_new_string = f'carl said: {my_module_1.my_string}'
print(my_new_string)

Alrighty, so if we run "my_module_1.py" it will simply print "whoa, this is so kewl", no surprise there. If we run "my_module_2.py" it will *also* print "whoa, this is so kewl", because importing the first script executes it, and then of course it will also print "carl said: whoa, this is so kewl" as expected. Great, so our super fancy project works as desired. Now, because we are trying to be better about testing, let's write a super simple test to validate this code works as expected.
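
As an aside, the reason the import triggers the print is that all top-level code in a module runs the first time it's imported. If you ever want script-only behavior, the usual guard looks like this (a sketch, not something the post's modules use):

```python
# Top-level code runs on import; this guard limits the print to direct runs
my_string = 'whoa, this is so kewl'

if __name__ == '__main__':
    # Only executes when the file is run as a script, not when imported
    print(my_string)
```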

In our test folder, we’ll create a new script called “test_basics.py” that looks like this:

import unittest
import my_module_1


class TestMe(unittest.TestCase):
    def test_stuff(self):
        assert my_module_1.my_string == 'whoa, this is so kewl'


if __name__ == '__main__':
    unittest.main()

At the top we'll import the unittest library to use for our testing, and we'll also import our script "my_module_1" so that we can validate (assert) that our variable "my_string" is actually equal to what we think it should be ("whoa, this is so kewl").
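
A bare assert works fine here, though as a stylistic variant (not what this post uses) unittest's own helpers like assertEqual print both values when a test fails, which makes debugging nicer. A self-contained sketch, with a stand-in value instead of the real import:

```python
import io
import unittest


class TestMe(unittest.TestCase):
    def test_stuff(self):
        # Stand-in value; in the real test this comes from my_module_1
        my_string = 'whoa, this is so kewl'
        # assertEqual shows expected vs. actual on failure
        self.assertEqual(my_string, 'whoa, this is so kewl')


# Run the suite programmatically (unittest.main() would do this from a script)
suite = unittest.TestLoader().loadTestsFromTestCase(TestMe)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.wasSuccessful())  # -> True
```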

So let’s go ahead and run our test suite and see what happens:

Carls-MacBook-Pro-2:mycoolproject Carl$ python3 -m unittest tests/test_basics.py
E
======================================================================
ERROR: test_basics (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_basics
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/unittest/loader.py", line 153, in loadTestsFromName
    module = __import__(module_name)
  File "/Users/Carl/Desktop/mycoolproject/tests/test_basics.py", line 2, in <module>
    import my_module_1
ModuleNotFoundError: No module named 'my_module_1'


----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)

Well… not ideal, eh? Obviously we have some kind of import error, since Python is complaining it can't find our module "my_module_1". What gives? We know that Python is mad at the line where we import "my_module_1", so obviously we need to start there. We also know that our modules run fine from within their own directory; they do exactly what we think they should. So with this information we can understand that Python, when run via unittest from the project root, has no idea how and where to find the module we are trying to import. This makes sense when you think about it: Python searches for modules in the local folder and the system path(s). We can see where Python is looking by importing sys and printing out the path; let's see what that looks like from our tests folder:

Carls-MacBook-Pro-2:tests Carl$ python3
Python 3.6.4 (v3.6.4:d48ecebad5, Dec 18 2017, 21:07:28)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/Library/Frameworks/Python.framework/Versions/3.6/lib/python36.zip', '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6', '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/lib-dynload', '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages']
>>>
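
You can also ask Python directly whether it could locate a module on that path using importlib (a quick sketch; 'my_module_1' stands in for our module, which won't be findable from here):

```python
import importlib.util

# json ships with Python, so the finder locates it somewhere on sys.path
print(importlib.util.find_spec('json') is not None)   # -> True

# Our module isn't anywhere on sys.path, so the finder returns None
print(importlib.util.find_spec('my_module_1'))        # -> None
```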

Ok, so we know it's looking in the normal paths, and that very first entry ('') shows us it's going to look for stuff locally too, but nowhere to be seen is the module we've been building. So somehow we need to tell Python where to look. What happens if we ask Python to import "my_module_1" *from* "mycoolproject", like so:

from mycoolproject import my_module_1

Let’s run it and see what happens:

Carls-MacBook-Pro-2:mycoolproject Carl$ python3 -m unittest tests/test_basics.py
whoa, this is so kewl
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

Hey, that seems a lot better, huh? Up to this point everything has been super straightforward, and if you've done any amount of Python work you'll be more than familiar with import errors; you've undoubtedly forgotten to import something and had this happen to you. The next bit is where I got tripped up… let's add a quick test case to test our other Python file:

import unittest
from mycoolproject import my_module_1
from mycoolproject import my_module_2


class TestMe(unittest.TestCase):
    def test_stuff(self):
        assert my_module_1.my_string == 'whoa, this is so kewl'

    def test_other_stuff(self):
        assert my_module_2.my_new_string == 'carl said: whoa, this is so kewl'


if __name__ == '__main__':
    unittest.main()

Pretty straightforward stuff here too: we simply imported the other module and added a test case to assert that the string is what we think it should be. So what happens if we run our unit tests again?

Carls-MacBook-Pro-2:mycoolproject Carl$ python3 -m unittest tests/test_basics.py
whoa, this is so kewl
E
======================================================================
ERROR: test_basics (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_basics
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/unittest/loader.py", line 153, in loadTestsFromName
    module = __import__(module_name)
  File "/Users/Carl/Desktop/mycoolproject/tests/test_basics.py", line 3, in <module>
    from mycoolproject import my_module_2
  File "/Users/Carl/Desktop/mycoolproject/mycoolproject/my_module_2.py", line 1, in <module>
    import my_module_1
ModuleNotFoundError: No module named 'my_module_1'


----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)

Annnnnnd we're back to not working. So what gives? We get basically the same error as before, complaining about an import it can't resolve. Not cool, especially since we did for "my_module_2" the exact same thing we already did (and got working) for "my_module_1". This time Python is upset about not finding a module called "my_module_1", which seems ridiculous given that importing that module is the second line of our test file, right? BUT this is not failing *in* our test file, and here is what tripped me up: look closely at the traceback and you'll see the failure is inside "my_module_2.py" itself. The issue is (and I'll probably butcher the exact technical reasoning for this, but you can check out this super handy SO post) that when the tests run, the working directory is the project root, not the folder the modules live in, so the bare `import my_module_1` inside "my_module_2.py" can't be resolved. We can address this by ensuring that the imports in our modules are not relative to the local folder, but instead fully qualified, if you will. Changing our "my_module_2" file to look like this:

from mycoolproject import my_module_1

my_new_string = f'carl said: {my_module_1.my_string}'
print(my_new_string)

Instead of importing from our local file, we are now specifying the package that we are importing from. Running our test again, we get the following (good) results:

Carls-MacBook-Pro-2:mycoolproject Carl$ python3 -m unittest tests/test_basics.py
whoa, this is so kewl
carl said: whoa, this is so kewl
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK
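
Another way to write the same fix, as a sketch, is an explicit relative import (`from . import my_module_1`) inside "my_module_2.py". It resolves against the package rather than the working directory, with the caveat that the file then can't be run directly as a script. To show it resolving, this throwaway snippet rebuilds a copy of the package in a temp directory (the names mirror the post, but the temp-dir plumbing is just for the demo):

```python
import importlib
import os
import sys
import tempfile

# Recreate the package layout from the post in a scratch directory
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, 'mycoolproject')
os.makedirs(pkg)
open(os.path.join(pkg, '__init__.py'), 'w').close()

with open(os.path.join(pkg, 'my_module_1.py'), 'w') as f:
    f.write("my_string = 'whoa, this is so kewl'\n")

# The explicit relative import resolves against the package, not the cwd
with open(os.path.join(pkg, 'my_module_2.py'), 'w') as f:
    f.write("from . import my_module_1\n"
            "my_new_string = f'carl said: {my_module_1.my_string}'\n")

sys.path.insert(0, tmp)
mod = importlib.import_module('mycoolproject.my_module_2')
print(mod.my_new_string)  # -> carl said: whoa, this is so kewl
```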

TL;DR: pay attention to your imports. It's an easy thing to fix but also easy to miss; you run everything locally and it works great, and then you're dumb like me and get angry at tests for not behaving the way you think they should 🙂


NFD16 – Gigamon and Splunk (with a Dash of Phantom)

I had the opportunity to take a trip to the Gigamon mothership (no, not like the Apple mothership, it's a normal HQ) last week at Networking Field Day 16. I was pretty excited, as I've not had a ton of hands-on with Gigamon, but I've run into their products at a goodly portion of the customers I've worked with over the years.

If you'd like to take a peek at some Gigamon presentations before continuing, you can check out a ton of them here on Tech Field Day's YouTube channel.

This event's presentations were focused on leveraging Gigamon's, and some of their partners' (Splunk and Phantom), ability to react to security incidents. The core idea in their presentation is that security is a very hard problem (they ain't wrong!), and that many organizations spend a large quantity of time and money on it, generally across a broad set of tools. To make security simpler, the idea is that you can use Gigamon and Splunk in concert to detect things that you would prefer not happen on your network, things that would require tons of other tools to catch without these platforms' combined power. The integration between the platforms allows Splunk to fire off triggers which Gigamon can then react to by dropping or alerting on the offending traffic. Taking things a step further, via the integration with Phantom these events as seen in Splunk could fire off a whole host of mitigation tactics/processes to automate away a lot of the manual, tedious work that SOC personnel have to deal with. All told, this is a pretty cool story. The integration with both of these other platforms seemed, from the demo, to be pretty smooth; flexible, but not super open-source-y (i.e. your average Joe/Jane could probably figure it out without pulling out too many hairs).

In a perfect world, I can certainly see Gigamon (and team) supplanting possibly many other products by consolidating functionality. Gigamon itself is a pretty powerful platform, couple that with Splunk which is a beast and can provide very interesting data correlation/insights, and finally wrap it all up with Phantom to put the sexy bow of automation on this and things look interesting (unfortunately time was limited for the Phantom portion of the presentation so I don’t have much insight there but it really did look awesome!). That being said, there are some challenges…

Gigamon at its core relies on being inline with traffic, or at least receiving traffic (of course if you're not inline you can't drop things, so keep that in mind). This has historically been more or less a non-issue: data centers have always had choke points, so go ahead and plop your Gigamon appliance right there and you're in business. That whole Christmas-tree-type topology where we had easily defined choke points is not really a thing anymore (at least in data centers being deployed now; of course they still exist). Most data centers, and certainly the ones I'm involved in, are opting to build out in a Clos topology. In a Clos topology we can have crazy things, like 128-way ECMP! Not that 128-way ECMP is common, but even in small 2-4 spine node topologies there aren't any especially good places to put a device like a Gigamon. You can of course put Gigamon/IPS/whatever inline between leaf and spine nodes, however this is an atrociously expensive proposition for several reasons: firstly, just the sheer number of links that may entail, and secondly, capacity. If you're going 40G, 100G, or looking toward the crazy 400G, you're going to have to pay to play to run that kind of throughput through a device (Gigamon or otherwise). Depending on the topology it may be easy to snag "north/south" traffic (border leaf nodes -> whatever is "northbound", for example), but with an ever-increasing focus on microsegmentation within the data center this is *probably* not sufficient for most orgs.

One option to address some of this, which was mentioned only briefly (if at all) at the Gigamon presentation, is the GigaVUE-VM. The idea here is that this is a user-space virtual machine that can be either inline with VM traffic or sitting "listening" in promiscuous mode. Because this lives in user space there are no hypervisor requirements/caveats; it just kind of hangs out. If used in "inline mode" (which I've actually not seen, so maybe that's not a thing?) there is the potential for this to replace the big iron hardware appliances and fit more neatly into a Clos topology. I would have liked to see/hear more about this… a bit more on why at the end…

I had two major takeaways from the Gigamon presentation. Firstly, Splunk is like a magical glue to tie things together! The data being fed into Splunk could have come from any number of sources (syslog of course, agents on clients, HTTP events, etc.); in this case it came from Gigamon, and Gigamon performed drop actions based on the rules created in Splunk. I suspect that Splunk could be (relatively?) easily configured to make an API call to a firewall or other device/platform to react to the data being fed into it. Splunk without data, though, is not really all that useful, and here is where Gigamon showed its value. Being able to capture LARGE amounts of traffic, and then do something to it (really just drop, but that's an important thing), is very valuable.

That being said, my second takeaway was that this felt largely out of sync with what most customers I see are doing, at least in the data center. When pressed for how to practically adapt this to a Clos topology the answers were thin at best: (paraphrasing) "just tap all the links to leaf/spine", "tap at chokepoints", etc. This is all well and good, and depending on requirements and budget may be just fine, however I didn't exactly get warm fuzzies that Gigamon knows how to play nice in these Clos data centers. Obviously tapping everything is a non-starter financially, and chokepoints are well and good, but that means the substantial investment in Gigamon/Splunk (because it really does seem like they need to be deployed in unison to justify the expenditures) doesn't actually do you much/any good for securing east/west traffic.

Having run into Gigamon in several Cisco ACI deployments I've been a part of, I can say that customers really love Gigamon (or at least have invested so much that they feel the need to continue to get value out of it), but each time I've seen this there has been a big struggle to find a good home for the appliances. This is why I really would have liked to have seen and heard more about the GigaVUE-VM: my knowledge of it is quite limited, but it certainly seems to be a possible workaround for the challenge of finding choke points in a Clos fabric. The big caveat is that the Gigamon folks did mention that the VM does NOT have feature parity with the HC hardware appliances. It sounds like they are investing in adding these features, though, which would obviously be helpful.

One final note, as I have very data center focused goggles on I’ve more or less ignored the campus/WAN, but I definitely think this could be useful in those areas, perhaps much more so than in the data center.